
Data available here
https://www.ssa.gov/oact/babynames/names.zip
For context, perplexity is a measure of how random something is, expressed as the number of sides N on an equally random fair die. If in some year there are 1000 unique boy names floating around, but almost all babies are evenly split between James and Joseph, the perplexity of that year's batch of boy names is about 2. Until the 1960s, the US effectively acted as though there were about 200 boy names and 400 girl names. More recently, those numbers are closer to 1400 and 2100 respectively. Girl names consistently show more variety than boy names: about twice as much in the earlier period, and about 1.5 times as much more recently.
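To make the definition concrete, here is a minimal sketch of the calculation, not necessarily the code behind the chart. It treats perplexity as 2 raised to the Shannon entropy (in bits) of the name counts. The function names `perplexity` and `counts_from_lines` are made up for illustration, and the `name,sex,count` line layout is an assumption about the yearly files inside the zip.

```python
import math


def perplexity(counts):
    """Return 2**H, where H is the Shannon entropy (in bits) of the counts."""
    total = sum(counts)
    entropy_bits = -sum(
        (c / total) * math.log2(c / total) for c in counts if c > 0
    )
    return 2 ** entropy_bits


def counts_from_lines(lines, sex):
    """Collect birth counts for one sex from 'name,sex,count' lines.

    The three-column layout is an assumption about the yearly files.
    """
    counts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, line_sex, count = line.split(",")
        if line_sex == sex:
            counts.append(int(count))
    return counts


# A fair six-sided die has perplexity 6 (up to floating-point rounding).
print(perplexity([1] * 6))

# Toy version of the James/Joseph example: 1000 names, two of them dominant.
toy_counts = [500_000, 500_000] + [1] * 998
print(round(perplexity(toy_counts), 2))
```

In the toy example the 998 rare names barely move the result, which stays close to 2 even though 1000 distinct names are present.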
Caveats of this dataset here
https://www.ssa.gov/oact/babynames/background.html
by aeftimia
7 Comments
What's the difference between entropy and perplexity in this context?
Could it be a sign of a diversifying population in the country? As diversity increases, so does the perplexity, maybe.
I thought people were naming their kids “Perplexity”
Makes me wonder a couple things:
– what happened around 1980 to begin the widening in the pool of names?
– does the gender difference hold up cross-culturally?
How can I short the perplexity bubble?
I wonder how many of these names are different spellings of the same pronunciation? Lindsay vs Lindsey, for example. Or Autumn vs Autymn.
> For context, perplexity is a measure of how random something is by equating it to a fair dice with N sides.
So information entropy is the _logarithm_ of perplexity, or perplexity is the exponentiation of information entropy. Got it.
> Until the 1960s, the US effectively acted as though there were about 200 boy names and 400 girl names. More recently, those numbers are closer to 1400 and 2100 respectively. Seems that girl names consistently have about twice the variety of boy names.
In the last 60 years, there’s an increase of 2.8 bits of information entropy in the selection of boys’ names, and 2.4 bits for girls’ names. So they’ve both increased quite a lot, but boys’ names have increased in variety somewhat _more_.
I wonder what fraction of the increase is due to _spelling_ variations, like Ashleigh/Ashley or Steven/Stephen, of identically-pronounced names.
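The entropy figures in the comment above can be checked with a few lines of arithmetic. This is a small sketch that assumes entropy in bits is the base-2 logarithm of perplexity and uses only the rounded perplexities quoted from the post (200 and 1400 for boys, 400 and 2100 for girls). The helper name `entropy_gain_bits` is hypothetical.

```python
import math


def entropy_gain_bits(old_perplexity, new_perplexity):
    """Entropy change in bits: log2(new) - log2(old) = log2(new / old)."""
    return math.log2(new_perplexity / old_perplexity)


# Rounded perplexities quoted in the post, not values recomputed from the data.
boys = entropy_gain_bits(200, 1400)
girls = entropy_gain_bits(400, 2100)

print(f"boys: {boys:.1f} bits, girls: {girls:.1f} bits")
# prints "boys: 2.8 bits, girls: 2.4 bits"
```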