Data Is Beautiful

[OC] I ranked 690 recipes by statistical confidence. Above 4.5★, star ratings are nearly meaningless — the spread tells the whole story.

April 19, 2026

Each bubble is one of 690 recipes from AllRecipes and Food.com. X-axis is star rating, Y-axis is Wilson Score confidence, bubble size is review count.

The finding: above 4.5★, star rating tells you almost nothing. Two recipes both rated 4.8★ — one has 111 reviews (90.6% confidence), one has 20,954 reviews (95.7% confidence). Same stars. Completely different reliability.

Wilson Score Lower Bound asks: given this rating AND this many reviews, what's the lowest share of positive ratings this recipe could plausibly have? It's the same algorithm Reddit uses to rank comments.
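For anyone who wants to check the numbers, here is a minimal sketch of the Wilson lower bound. The post does not say how star ratings were turned into a proportion, so two things here are assumptions: a star average is treated as a positive fraction of stars / 5, and the confidence level is 95% (z = 1.96). The function name `wilson_lower_bound` is illustrative, not taken from the original analysis.

```python
import math

def wilson_lower_bound(p_hat: float, n: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a proportion p_hat over n reviews."""
    if n <= 0:
        return 0.0  # no reviews, no evidence
    z2 = z * z
    center = p_hat + z2 / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n))
    return (center - margin) / (1 + z2 / n)

# Assumption: a 4.8-star average is read as a 4.8 / 5 = 0.96 positive fraction.
for reviews in (111, 20_954):
    print(reviews, round(wilson_lower_bound(4.8 / 5, reviews), 3))
# 111 0.906
# 20954 0.957
```

Under those assumptions the output lines up with the 90.6% and 95.7% quoted above, which suggests (but does not confirm) that this is roughly how the chart's Y-axis was computed.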

by Content-Ad-8858

3 Comments

Onepopcornman on April 19, 2026 12:04 pm

Anytime I hear “tells the whole story” I know a certain agent wrote this.

Unlike some, I don’t always have a problem with that, but it should always be noted in the citation.

OP, which model made this, and why is that not noted?

ertri on April 19, 2026 12:12 pm

We gotta ban the slop, it is not beautiful

NuclearHoagie on April 19, 2026 12:28 pm

The conclusion is kind of uninteresting. The spread for high star ratings is no greater than for lower star ratings. A dish rated above 4.5 stars is almost certainly good. And the higher the rating goes above 4.5 stars, the more likely it’s good.

It’s not interesting that recipes with higher review counts have a higher lower bound on the confidence interval for the proportion; it’s just math. If 90% of people rated two recipes positively, but the rating was among 10 people for the first and 1000 for the second, then of course you have more confidence in the second one.

The lower bound of the CI is not the “best estimate” of a recipe’s quality, though; that’s the mean rating. With the CI rating, it’s impossible for recipes that aren’t popular to rate well. This is basically the whole point of the Wilson score interval: to put more confidence where you see more data. This isn’t unique to recipes; you are mathematically guaranteed to see this pattern in any type of rating data.

If two of anything are rated 5 stars, obviously you should be more sure of the one with more ratings. For *every* individual star rating, the y-axis is simply organized by the number of reviews. It’s not unique to >4.5 stars.
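
The 10-versus-1000 example above can be checked with a short sketch. It assumes a 95% interval (z = 1.96) and the same stars / 5 reading of a star average; the helper name `lower_bound` is hypothetical.

```python
import math

def lower_bound(positive: int, total: int, z: float = 1.96) -> float:
    """Wilson score lower bound for `positive` successes out of `total` ratings."""
    p = positive / total
    z2 = z * z
    center = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    return (center - margin) / (1 + z2 / total)

# Same 90% positive share, very different review counts.
print(f"{lower_bound(9, 10):.3f} {lower_bound(900, 1000):.3f}")
# 0.596 0.880

# The same ordering shows up at every star level, not only above 4.5.
for stars in (3, 4, 5):
    # Assumption: a star average maps to a stars / 5 positive fraction.
    bounds = [lower_bound(stars * n // 5, n) for n in (10, 100, 1000)]
    print(stars, [round(b, 3) for b in bounds])
```

At each star level the bound rises as the review count grows, which is the pattern described in this comment.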