Are Comedians Better at Quiz Shows?

During COVID, I started to watch a lot of British comedy and panel shows: Taskmaster, Would I Lie to You, Cats Does Countdown, Mock the Week, Big Fat Quiz, QI -- that kinda thing.

One day, my buddy observed that comedians were unusually good at winning Richard Osman's House of Games.

Could that be true?

Intuitively, it seems possible.

Comedians are thought to be quick-witted, which definitely helps on a quiz show. Also, a lot of stand-up comedy riffs on pop culture and current events. Maybe comedians have better-than-average general knowledge that would be useful.

So the question is:

Are comedians really better quizzers?

Spoiler: Yes! (probably)

But let's try to quantify that.

Why ROHoG?

I focused on Richard Osman's House of Games for a few reasons:

First, it's a fun show! (also, it's where the question arose)

Second, it's a pretty traditional gameshow dressed up in funny clothes. Contestants basically answer trivia questions to earn points and the highest score wins. Each contestant appears at least 5 times and gets the chance to answer 100+ questions over the week, so there is plenty of data to analyze.

Most importantly: someone else already did most of the work. This Google Sheet has per-episode stats for 8 seasons of the show. Thanks, whoever made that available!

Getting the Data

I manually fetched a copy of the "Players" tab from the Google Sheet. It makes heavy use of formulas and some were incompatible with Excel. I wasn't gonna debug it, so I used "Paste values" to get a snapshot without the formulas, and put that into Excel instead.

You could also fetch the data programmatically, but that's annoying because it requires Google Cloud and authentication via either OAuth or a Service Account. For a one-off project, I wasn't gonna do that.

Transforming the data

Next, we need to reformat the data a bit to make it more usable. This was all done using pandas.

First some imports..

import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm, ttest_ind

df = pd.read_excel('data/rohog.xlsx', sheet_name='Players')

Followed by some cleanup.

The sheet contained a footer that needed to be removed.

# remove the last 5 lines
df = df[:-5]

Some of the columns had funky names. I decided to rename the columns I needed, and delete the rest.

# rename columns
keep_columns = {
    'Player': 'Name',
    'Ser.': 'Series',
    'Wk.': 'WeekNumber',
    'F': 'Points5',
    'Total': 'DailyPoints',
    'Week': 'WeeklyPoints',
    'Week.1': 'WeekRank',
}

df.rename(keep_columns, axis=1, inplace=True)

# drop unneeded columns not in the kept list
to_drop = [col for col in df.columns if col not in keep_columns.values()]
df.drop(to_drop, axis=1, inplace=True)

Then I removed some weird rows. There are a handful of "specials", and some of those have fewer than 5 episodes with the same contestants (ex: some early holiday shows). Since we'll be comparing weekly total scores, we need every player to be present for the entire week. For simplicity, I decided to remove any result with fewer than 5 days.

# remove any row where we don't have 5 full days (Friday is empty)
df = df[df['Points5'].notnull()]

Next, I cleaned up the player names. Some people's names had extra characters appended to indicate it was a special episode.

# remove (C), (F), (N) or (R) from the player names
df['Name'] = df['Name'].str.replace(r'\(C\)|\(F\)|\(N\)|\(R\)', '', regex=True) 

# trim the player names
df['Name'] = df['Name'].str.strip()

And finally, I converted the rankings from strings to numbers (ex: "1st" to "1", "2nd" to "2")

# convert the WeekRank from an ordinal (1st, 2nd, 3rd) to an integer
df['WeekRank'] = df['WeekRank'].str.replace(r'\D', '', regex=True).astype(int)
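To make the cleanup steps above concrete, here is a small self-contained sketch that applies the same operations to a tiny frame. The rows are invented for illustration (they are not from the real sheet); only the column names and regular expressions come from the code above.

```python
import pandas as pd

# Toy rows, invented for illustration (not taken from the real sheet)
toy = pd.DataFrame({
    'Name': ['Alice Example (C)', ' Bob Sample', 'Carol Test (R) '],
    'Points5': [8.0, None, 6.0],
    'WeekRank': ['1st', '2nd', '3rd'],
})

# keep only rows with a Friday score, i.e. a full 5-day week
toy = toy[toy['Points5'].notnull()].copy()

# strip the special-episode markers, then any surrounding whitespace
toy['Name'] = toy['Name'].str.replace(r'\(C\)|\(F\)|\(N\)|\(R\)', '', regex=True)
toy['Name'] = toy['Name'].str.strip()

# "1st" -> 1, "3rd" -> 3
toy['WeekRank'] = toy['WeekRank'].str.replace(r'\D', '', regex=True).astype(int)

print(toy)
```

Bob's row disappears because his Friday score is missing, which is the same rule that removes the short "special" weeks.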

At this point, the raw data is in good shape! We just need to categorize our contestants.

Getting the comedians

Who we categorize as a comedian matters a lot.

Our dataset has 469 unique names. How do we decide? Certainly, we should include stand-up comedians. What about comedy actors and writers? Can a news anchor or a gameshow host be a comedian?

Despite accumulating some modest knowledge of British TV in the last few years, I didn't recognize about half the names.

So to label each person, we could..

find some British people to ask
script IMDB and look for keywords in their bios
outsource the labeling to Mechanical Turk
apply Cunningham's Law

But it's 2025, and this is a blog post, not a Master's thesis.

I just let ChatGPT decide.

Below is the prompt that I used with gpt-4o-mini:

prompt = """
Please determine whether each of the following UK television personalities is primarily known as either a Comedian or a Non-Comedian.

Only consider whether they are known for their work as a comedian, stand-up comedian or comedy writer, not for any other work they may have done.

Please output the classification of each person as a CSV file with the following format: "Name, Known For"

For example:

David Attenborough, Wildlife documentaries
David Mitchell, Comedy panel shows
David Beckham, Football
David O'Doherty, Stand-up comedy

Only reply with the CSV header and output. Do not include any other text in your response.

Here is the list of people:
%s
"""

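import io
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable
names = "\n".join(df['Name'].unique())  # one name per line, to fill the %s placeholder

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    store=True,
    messages=[ {"role": "user", "content": prompt % names} ]
)
csv = completion.choices[0].message.content
comedian_df = pd.read_csv(io.StringIO(csv))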

Originally I asked for a simple yes/no categorization, but I wasn't super happy with the results. Many names with no obvious connection to comedy were being classified as comedians.

I modified the prompt to ask what each person was "known for" and those results were much better! I decided that anybody who was "known for comedy" would be considered a comedian for this analysis.

comedian_df.columns = ['Name', 'KnownFor']
comedian_df["IsComedian"] = comedian_df["KnownFor"].str.contains("comedy", case=False)

The actual code here was a little bit longer because I decided to chunk using multiple queries. But that's the gist of it.
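The chunking code isn't shown, so the following is only a guess at what it might look like. `chunked` and `build_prompts` are hypothetical helpers, the chunk size of 50 is arbitrary, and the template is a shortened stand-in for the real prompt. Each resulting prompt would then be sent as its own request and the per-chunk results concatenated.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_prompts(template, names, size=50):
    """Fill the %s placeholder in `template` once per chunk of names."""
    return [template % "\n".join(chunk) for chunk in chunked(names, size)]


# Shortened stand-in for the real prompt; the names are placeholders
template = "Here is the list of people:\n%s\n"
names = [f"Person {i}" for i in range(120)]

prompts = build_prompts(template, names, size=50)
print(len(prompts))  # 3 prompts: 50 + 50 + 20 names
```

Smaller chunks make it easier to spot a response that dropped or renamed someone, at the cost of more requests.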

At a glance, the results look pretty good.

Name	Is Comedian?	Known For
Ed Gamble	Yes	Stand-up comedy
Clare Balding	No	Sports presenting
Adrian Chiles	No	Television presenting
Sarah Millican	Yes	Stand-up comedy
Maisie Adam	Yes	Stand-up comedy
...	...	...

You can download the CSV here and yell at me if you disagree with the classifications.

Plotting the data

Next, let's begin plotting the data. We need to join our categorizations with the original data.

df = df.merge(comedian_df, on='Name', how='left')
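Because this is a left join, any name the model skipped or spelled differently ends up with a missing `IsComedian` value rather than raising an error. A quick check along these lines can catch that. The two small frames are made up for illustration; `indicator=True` is a standard `merge` option that adds a `_merge` column.

```python
import pandas as pd

# Invented stand-ins for df and comedian_df
scores = pd.DataFrame({'Name': ['Ann', 'Ben', 'Cat'], 'DailyPoints': [40, 35, 50]})
labels = pd.DataFrame({'Name': ['Ann', 'Cat'], 'IsComedian': [True, False]})

merged = scores.merge(labels, on='Name', how='left', indicator=True)

# rows that found no matching label on the right-hand side
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Name'].tolist()
print(unmatched)  # ['Ben']
```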

First, let's set up the color scheme.

plt.style.use('ggplot')
sns.set_style("darkgrid", rc={
    'axes.facecolor': '#000',
    'axes.edgecolor': '#fff',
    'axes.labelcolor': '#fff',
    'figure.facecolor': '#000',
    'text.color': '#fff',
    'xtick.color': '#fff',
    'ytick.color': '#fff',
    'patch.edgecolor': '#fff',
    'patch.force_edgecolor': False
})

A histogram is a nice place to start.

# plot histograms for the two categories
for category in df['IsComedian'].unique():
    subset = df[df['IsComedian'] == category]
    sns.histplot(subset['DailyPoints'], kde=False, label=f'Comedian={category}', bins=20, alpha=1)
plt.legend()
plt.grid(False)
plt.xlabel('Daily Points')
plt.ylabel('Frequency')
plt.title('Total Daily Points by Contestant Type')

The chart suggests that comedians tend to score more points.

Daily points by contestant type histogram

The trend becomes clearer if we switch to a density plot.

# plot density of the total daily points
for i, category in enumerate(df['IsComedian'].unique()):
    subset = df[df['IsComedian'] == category]
    subset['DailyPoints'].plot(kind='density', label=f'Comedian={category}')
    mu, std = norm.fit(subset['DailyPoints'])
    plt.text(0.5, -0.25+.05*i, rf'Comedian={category}: $\mu={mu:.1f}$, $\sigma={std:.1f}$', ha='center', va='center', transform=plt.gca().transAxes, fontsize=12)
plt.title('Total Daily Points by Contestant Type')
plt.xlabel('Daily Points')
plt.grid(False)
plt.xlim(0, 100)
plt.ylim(0)
plt.legend()

The data look roughly normal, so the code above also fits a normal distribution to each group and prints the mean and standard deviation under the plot.

Daily points by contestant type KDE plot

On average, comedians score 6.3 points higher than non-comedians over a week. The standard deviation is slightly higher for comedians, mostly due to a few exceptionally high scores.

Now that was "total daily points". But technically, daily points don't matter and the winner is the person with the highest "weekly points". So let's plot that too.

# plot density of the total weekly points
for i, category in enumerate(df['IsComedian'].unique()):
    subset = df[df['IsComedian'] == category]
    subset['WeeklyPoints'].plot(kind='density', label=f'Comedian={category}')
    mu, std = norm.fit(subset['WeeklyPoints'])
    plt.text(0.5, -0.25+.05*i, rf'Comedian={category}: $\mu={mu:.1f}$, $\sigma={std:.1f}$', ha='center', va='center', transform=plt.gca().transAxes, fontsize=12)
plt.title('Total Weekly Points by Contestant Type')
plt.xlabel('Weekly Points')
plt.xlim(0, 40)
plt.ylim(0)
plt.grid(False)
plt.legend()

This chart strongly favors the comedians again, with a higher mean and a lower variance than for non-comedian contestants.

Weekly points by contestant type KDE plot

This makes sense. You'd expect the group of contestants with higher daily points to win the week too.

Returning to "Daily Points", a two-sample t-test suggests the difference between the groups is unlikely to be due to chance.

comedian_scores = df[df['IsComedian']].DailyPoints
non_comedian_scores = df[~df['IsComedian']].DailyPoints
t, p = ttest_ind(comedian_scores, non_comedian_scores)
print(f't={t:.5f}, p={p:.2e}')

t=6.39013, p=3.55e-10
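One caveat on the test itself: `ttest_ind` assumes equal variances by default, and the two groups here have slightly different standard deviations. Passing `equal_var=False` runs Welch's t-test instead, which drops that assumption. The sketch below uses synthetic scores (the means, spreads and sample sizes are made up, not the real data) and also reports Cohen's d as a rough effect size.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

# Synthetic scores for illustration only, not the ROHoG data
group_a = rng.normal(loc=46, scale=11, size=200)
group_b = rng.normal(loc=40, scale=10, size=300)

# Welch's t-test: does not assume the two groups share a variance
t_stat, p_value = ttest_ind(group_a, group_b, equal_var=False)

# Cohen's d using a pooled standard deviation
n_a, n_b = len(group_a), len(group_b)
pooled_var = (
    (n_a - 1) * group_a.var(ddof=1) + (n_b - 1) * group_b.var(ddof=1)
) / (n_a + n_b - 2)
cohens_d = (group_a.mean() - group_b.mean()) / np.sqrt(pooled_var)

print(f't={t_stat:.2f}, p={p_value:.2e}, d={cohens_d:.2f}')
```

On the real data, swapping `group_a` and `group_b` for `comedian_scores` and `non_comedian_scores` would show whether the conclusion depends on the equal-variance assumption.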

So... comedians are better at quiz shows?

I mean, sure, I dunno? It looks that way to me..

This was just for my amusement, and hopefully yours. I'm not a statistician. If any math wizards want to take a crack, and/or provide corrections, that would be fun to see.

The fine print

I believe there are some technical difficulties with taking a 4-player game like this, just plotting the scores, and declaring a winner.

Below, I describe some more details about how the scoring works, and I lay out some potential problems with my admittedly simple analysis.

How Scoring Works

Players have 2 scores: "daily" and "weekly". The "daily" score is the number of correct answers each day. The "weekly" score is based on the player's ranking each day: the player with the most "daily" points on Monday is ranked 1st and earns 4 "weekly" points, the 2nd ranked player earns 3 "weekly" points, and so on down to 1 point for 4th place. Friday's weekly points are doubled. Daily scores are not cumulative (points start from zero each day). The "total daily points" appearing in the charts above refers to the sum of the 5 individual daily scores.

Daily versus Weekly doesn't matter too much. 91% of the time, the two totals agree on the winner. Officially, the gameshow itself uses the weekly points to determine the winner.

Here's an example of the scoring.

Season 1, Week 1 (daily points)

Player	Monday	Tuesday	Wednesday	Thursday	Friday	Total
Nish	10 (2nd)	12	13	8	8	51
Al	13 (1st)	11	10	7	6	47
Clara	9 (3rd)	9	4	6	8	36
Anneka	6 (4th)	4	7	4	4	25

Al had the highest "daily points" on Monday, so he earned 4 "weekly points" that day. Nish was 2nd on Monday, earning 3 weekly points. Then Nish won on Tuesday, earning 4, and so on. Remember that Friday's weekly points are doubled.

Below are the resulting weekly points:

Player	Monday	Tuesday	Wednesday	Thursday	Friday	Total
Nish	3 (2nd)	4	4	4	8	23
Al	4 (1st)	3	3	3	4	17
Clara	2 (3rd)	2	1	2	6	13
Anneka	1 (4th)	1	2	1	2	7

At the end of the 5 days, Nish won with 23 weekly points.

The big difference between "daily" and "weekly" is that the weekly score is bounded to the range [6, 24] because it is the sum of the points awarded for each day's (inverse) ranking, with Friday counting double. There is no way to earn more than 24 weekly points (4 + 4 + 4 + 4 + 8), or fewer than 6 (1 + 1 + 1 + 1 + 2).
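The conversion from daily to weekly points can be sketched in a few lines. This reproduces the Season 1, Week 1 table above, with one assumption labeled in the code: Nish and Clara tied on Friday, and the source doesn't say how the show breaks ties, so the sketch simply keeps the listed order. `weekly_points` is a hypothetical helper, not the sheet's formula.

```python
# Daily points from the Season 1, Week 1 table (Monday..Friday)
daily = {
    "Nish":   [10, 12, 13, 8, 8],
    "Al":     [13, 11, 10, 7, 6],
    "Clara":  [9, 9, 4, 6, 8],
    "Anneka": [6, 4, 7, 4, 4],
}


def weekly_points(daily):
    """Convert per-day scores into weekly points (4/3/2/1, doubled on the last day)."""
    players = list(daily)
    n_days = len(next(iter(daily.values())))
    totals = {player: 0 for player in players}
    for day in range(n_days):
        # ASSUMPTION: sorted() is stable, so tied players keep their listed order.
        ranked = sorted(players, key=lambda p: -daily[p][day])
        multiplier = 2 if day == n_days - 1 else 1
        for position, player in enumerate(ranked):
            totals[player] += (len(players) - position) * multiplier
    return totals


weekly = weekly_points(daily)
print(weekly)

# Bounds for a 4-player, 5-day week with a doubled final day
best = 4 * 4 + 2 * 4    # 1st place every day
worst = 1 * 4 + 2 * 1   # 4th place every day
print(worst, best)
```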

In contrast, the daily score is bounded by the number of questions (usually over 100 per week) and negative scores are technically possible.

So which score should we consider to answer the original question?

A. the person with the highest cumulative daily score?

B. the person with the most 1st-place days?

C. the person who won the week?

Outliers

The data contain a few exceptional scores, such as the 82 points earned in Series 4, Week 6. Perhaps those outliers raised the average score for comedians?

It doesn't seem that way. Dropping outliers shifted the mean and standard deviation slightly, but didn't materially change the outcome.
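The post doesn't say which outlier rule was used. As one possibility, the sketch below applies the common 1.5 x IQR rule to a made-up list of weekly totals and compares the mean and standard deviation before and after.

```python
import numpy as np

# Invented weekly totals with one extreme value; not the real data
scores = np.array([30, 35, 38, 40, 41, 42, 44, 45, 47, 50, 82])

# interquartile range and the usual 1.5 x IQR fences
q1, q3 = np.percentile(scores, [25, 75])
iqr = q3 - q1
keep = (scores >= q1 - 1.5 * iqr) & (scores <= q3 + 1.5 * iqr)
trimmed = scores[keep]

print(f'all:     mean={scores.mean():.1f}, std={scores.std(ddof=1):.1f}')
print(f'trimmed: mean={trimmed.mean():.1f}, std={trimmed.std(ddof=1):.1f}')
```

Running the same comparison separately for each group is what shows whether the gap between comedians and non-comedians survives the trimming.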

Negative and Bonus points

There are cases where the scoring is more complex than just "one point per correct answer".

During "Answer Smash", players can lose a point by answering incorrectly. In rare cases, this has resulted in a player finishing the day with a negative score.

Sometimes the host will award a bonus point for an excellent answer. This is very rare, but it can happen, for example in the team game "Distinctly Average" when a player or team guesses the quantity exactly right.

In some games, like "The Nice Round", two or more people earn a point from a single question. Each show also has one "team round" where both players on the team earn the same points.

It seems unlikely that either comedians or non-comedians benefit disproportionately from these scoring minutiae. From the perspective of "who is the best quizzer", I'd say these details are mostly irrelevant. But a thoughtful analysis might want to tease that apart more carefully.

Some people appear in multiple weeks

18% of contestants have appeared more than once.

Num. Appearances	Contestants	%
1	379	82.0%
2	80	17.3%
3	3	0.7%

And those contestants are disproportionately comedians.

Returning	Contestants	%
Comedians	47	56.6%
Non-Comedians	36	43.4%

Lots of the games are tricky and it takes a few episodes to get "dialed in".

So perhaps returning contestants perform better? Let's see..

Nope. Actually they do slightly worse.

Daily points by appearance

The chart above compares only the contestants who appeared exactly twice (that is, they appeared in 2 weeks of 5 days each). On their second appearance, returning contestants averaged about 2 points worse over the week! The point here is that we can discard the theory that comedians are just better because they get more practice.
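The code behind that comparison isn't shown. One way to set it up, assuming one row per contestant per week in broadcast order, is to number each contestant's appearances with `groupby(...).cumcount()` and then average by appearance number. The frame below is invented for illustration.

```python
import pandas as pd

# Invented rows: one per contestant per week, in broadcast order
weeks = pd.DataFrame({
    'Name':        ['Ann', 'Ben', 'Cat', 'Ann', 'Ben', 'Dan'],
    'DailyPoints': [40, 36, 50, 38, 35, 44],
})

# 1 for a contestant's first week, 2 for their second, ...
weeks['Appearance'] = weeks.groupby('Name').cumcount() + 1

# keep only contestants who appeared exactly twice
n_weeks = weeks.groupby('Name')['DailyPoints'].transform('size')
twice = weeks[n_weeks == 2]

# average weekly total on the first versus the second appearance
by_appearance = twice.groupby('Appearance')['DailyPoints'].mean()
print(by_appearance)
```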

Different seasons have different rounds

With the exception of "Answer Smash", each episode contains a different selection of games. After 8 seasons, some new games have been added and others retired.

Arguably, this prevents or at least complicates the comparison. Is it reasonable to compare two competitors if they played different rounds? I'm just gonna say "yes, that's part of the game". But it could be interesting to analyze this in more detail.

The definition of comedian is fuzzy

Yeah, I just asked ChatGPT to classify a list of names and then did a bit of manual verification to ensure the results seemed reasonable.

In reality, "Comedian-ness" is not a binary classification. Is a comedy actor a comedian? What about someone who did a few open mic nights before changing to a different entertainment career?

I tried to consider people working professionally in comedy, but obviously there is room for interpretation on how to define a comedian.

Maybe some other cohort drags down the non-comedian average

I only looked at comedians, but there could be other sub-groups with especially good or bad performances.

Perhaps pop-stars form a particularly poor group, hidden within the non-comedians. If a particular non-comedian sub-group underperforms, then we might be labeling the wrong effect. Perhaps it's not that "comedians are good at quizzing", but rather that "pop stars are bad at quizzing", and they're bringing down the average for the non-comedian group.

I didn't investigate any other sub-groups here.

Wait, these events aren't independent

Yeah, I kind of agree. This seems like a potentially serious problem.

But I also wanted to make some charts, so here we are.

Each week has 4 contestants. But only 1 person can earn points for each question (usually). I believe the math and charts that I used here technically require independence. They aren't correct in a 4-player scenario.

There are (at least) two problems with a contest like this. Or maybe it's the same problem, with two ways of looking at it.

First, the composition of the contestants during a particular week can favor either comedians or non-comedians.

Num. Comedians	Weeks
0	8
1	70
2	45
3	9
4	2

Most weeks have either 1 or 2 comedians. But a few shows have either 4 or zero.
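A table like the one above can be produced by counting comedians within each week and then tallying how often each count occurs. This is a sketch on invented rows, reusing the `Series`, `WeekNumber` and `IsComedian` column names from earlier.

```python
import pandas as pd

# Invented rows: two weeks of four contestants each
toy = pd.DataFrame({
    'Series':     [1, 1, 1, 1, 1, 1, 1, 1],
    'WeekNumber': [1, 1, 1, 1, 2, 2, 2, 2],
    'IsComedian': [True, False, False, False, True, True, False, False],
})

# number of comedians in each week (True counts as 1)
per_week = toy.groupby(['Series', 'WeekNumber'])['IsComedian'].sum()

# how many weeks have 0, 1, 2, ... comedians
composition = per_week.value_counts().sort_index()
print(composition)
```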

That's a big problem for comparing "total wins" because sometimes it's impossible (or guaranteed) for a comedian to win the week, due to the composition of contestants.

It may also be an issue for comparing "total daily points", which leads to the 2nd problem.

In a 4-way contest, a very good player can be overshadowed by an exceptional player.

An easy way to think about this is to assume that all contestants are geniuses who know the answer to every question. They only differ in their ability to hit the buzzer quickly.

Suppose we have 5 contestants, such that:

A is faster than B
B is faster than C
C is faster than D
D is faster than E

On a show with A-B-C-D, Contestant A will always win.

On a show with B-C-D-E, Contestant B will always win.

B is really good! The second best player in the group!

But if B appears on the same show as A, they will score zero points, losing every single time.
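That thought experiment is easy to put into code. In this sketch every contestant knows every answer and the fastest one on the show always buzzes first. The speed ranking follows the setup above; the 100-question week and the `play_week` helper are arbitrary choices for illustration.

```python
# Lower number = faster on the buzzer (A is fastest)
SPEED_RANK = {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5}


def play_week(lineup, questions=100):
    """Everyone knows every answer, so the fastest player takes every point."""
    scores = {player: 0 for player in lineup}
    fastest = min(lineup, key=SPEED_RANK.get)
    for _ in range(questions):
        scores[fastest] += 1
    return scores


print(play_week(['A', 'B', 'C', 'D']))  # A takes every point, B scores 0
print(play_week(['B', 'C', 'D', 'E']))  # B takes every point
```

B's score swings from 0 to 100 without B getting any better or worse, purely because of the lineup.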

Back in reality, nobody is even close to 100% accurate, but some players are very good.

So the ordering/grouping of players into episodes within a season matters.

Imagine there are 100 points up for grabs each week. Then put the 4 very best players on the same show (this could happen, with a 1/487,635 chance if the lineup were drawn at random from a pool of 60 players!)
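As a quick check of that figure: 487,635 is the number of ways to choose 4 players out of 60, so it assumes lineups are drawn at random from a pool of that size.

```python
import math

pool_size = 60   # assumed number of players in the pool
show_size = 4    # contestants per week

# number of distinct 4-player lineups, only one of which is "the 4 best"
lineups = math.comb(pool_size, show_size)
print(lineups)  # 487635
```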

The 4 best players are almost equally good, so they will score about the same.

Player Rank	Score
1	26
2	25
3	25
4	24

But now, let's match the very best player with the 3 worst players.

Here, the scores will be much more skewed because the very best player will dominate the scoring.

Player Rank	Score
1	60
58	14
59	13
60	13

Your scoring potential on ROHoG depends on who you are playing against.

How should we feel about that?

Can we say anything meaningful about entire categories of players, when individual scores are so dependent on the composition of contestants?

I'm not sure!

Conclusion

So that's it..

Are comedians better quizzers?

I am willing to say: yes! (probably)

(Does it matter? Definitely not!)

This was just a bit of fun, drawing some charts and trying to put some numbers on a casual observation made while watching a gameshow.

It's absolutely not a rigorous statistical analysis, so please take it with a grain of salt.