There's an endless number of misspelled/localized names outside their country of origin, e.g. Maicol for Michael, Sandiago for Santiago, Uilliam, Villiam, Willian, etc.
Additional "anecdata": I live in a country (Hungary) where you have to apply for a special permit to give an "unusual" name to a kid, and since my son has an Italian name we had to do that.
You can see the list of names people requested to add, and it includes random stuff like "Magneto" (which isn't on this list either).
All this to say that basically any given word can be a name in some country, so it's likely not possible to have an exhaustive list.
Maybe replace with "extensive".
EDIT: Yeah, this list is absurdly non-exhaustive. My wife's maiden name isn't even listed and it's a common English surname, noun and verb.
Actually, I just can't find any French name that I look up. Except for "Michel" (another former Belgian prime minister) but that one was not a very high bar.
So yeah, it's pretty much non-exhaustive and it doesn't take long to find it out.
With last names this is also an important matter. McDonald is not mcdonald.
Then there is F.W. de Klerk. Not 'De Klerk' or 'de klerk'.
Some people take offence at their name being written wrongly; since this list has no Title Case at all, there could be a few people 'offended'!
I'm not going to try to do statistics, but I reckon that's quite a lot. Census would be a better source than birth registers.
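To illustrate why the lowercasing can't simply be undone, here is a minimal sketch. The sample entries are hypothetical (I'm not claiming they appear in the list in this form), and `naive_restore` is just a made-up helper that wraps Python's built-in `str.title()`.

```python
# Lowercased entries as they might appear in a list like this (hypothetical sample).
lowercased = ["mcdonald", "de klerk", "o'donnell"]

# The spellings their owners actually use.
preferred = ["McDonald", "de Klerk", "O'Donnell"]


def naive_restore(name: str) -> str:
    """Guess the capitalization with str.title(), which only looks at word boundaries."""
    return name.title()


restored = [naive_restore(n) for n in lowercased]

# Pairs where the guess differs from the preferred spelling.
mismatches = [(got, want) for got, want in zip(restored, preferred) if got != want]

for got, want in mismatches:
    print(f"guessed {got!r}, owner writes {want!r}")
```

The apostrophe case happens to come out right here, but the internal capital in McDonald and the lowercase particle in de Klerk are gone for good once the data is stored lowercased.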
I went to the domain because the GitHub didn't describe the format of the data. So I'd also beef up the README.
My wife was annoyed when she came to the US and every form has a “first” and “last” field, but she doesn’t have a last name. Her passport for example, only has a “name” field.
Recently she decided to legally change her name to something that fits in better here. Although it has been an emotional decision for her, she felt she had to as it's very hard to live in the US without a first and last name.
Edit: "Annoying" doesn't even begin to cover her experience. Everything from getting a bank account, to buying a car, to getting a marriage license was a whole process of explaining until we were blue in the face, escalating to a manager, being told basically to get lost, etc. It left her in tears more than once.
There are laws in France from revolutionary times that basically set your name in stone, because anyone using whatever name they liked without any paper trail caused administrative and bureaucratic hell.
But nowadays even the administration works around that to comply with the tradition with things like "nom d'usage" (you can see it as "lastname nickname"), which is basically allowing you to use another name for non-legal purposes, but people still think it's legal and shove it in every form as regular "nom".
I repeatedly heard from people working in HR that a significant share of the questions they get during employees' first few months are just (married) women complaining about their last name "being wrong" on their payslips, simply because they think it's no longer their legal name, though it is.
 Loi du 6 fructidor an II (August 23rd 1794), liberal translation of article 1: no citizen will carry any other name than that of the birth certificate.
The point being, reactionary bigots tend to out themselves, and that's the obnoxious worldview driving by here. Erasure of someone's identity in conformance to external expectations is something that - remarkably - remains a thriving and actively promoted idea. This makes it all the more important to confront and openly, firmly reject.
So is changing her name. But people shouldn't need to do gymnastics for such a trifling matter. This is like cutting a foot to fit a shoe.
Cultures have different naming conventions, and not all cultures pressure women to take names from husbands.
The real simple solution is to accept that the common ground of name form is a string. One string.
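A minimal sketch of what "one string" could look like in a data model. The `Person` class and its field names are my own assumptions, not anything from the dataset or a particular library; the optional sort key is there only because sorting is the usual argument for splitting names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    # The only required name field: whatever the person says their name is.
    full_name: str
    # Optional, supplied by the person if they care how they are sorted.
    sort_key: Optional[str] = None

    def sorting_key(self) -> str:
        # Fall back to the full name when no explicit sort key was given.
        return (self.sort_key or self.full_name).casefold()


people = [
    Person("Thiri"),                                   # a single name, no surname
    Person("F.W. de Klerk", sort_key="de Klerk, F.W."),
    Person("Jóhanna Sigurðardóttir"),                  # patronymic, sorted by given name
]

ordered = sorted(people, key=Person.sorting_key)
for p in ordered:
    print(p.full_name)
```

Nobody is forced to invent a last name, and people who do want surname-first sorting can still ask for it.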
In an English speaking country, having a separate surname aids in sorting and presentation of family units. The character set is typically A-Z. In Spain, for example, one expects a person to have two surnames. The default (though this can be changed) is that the first surname comes from the father and the second one from the mother. By looking at the order of your name you can get information about family structures. Character set is A-Z + Ñ + umlauts and accent marks.
Going to an English speaking country I could expect them to spell my first name correctly; but since it contains a character outside A-Z I change my name to comply with their modus operandi. Yes, their computer system probably supports UTF-8, but most people have never heard of this character and you won't find it on their keyboard. No problem, I change my name to comply with their system.
In Spain official forms often expect two surnames and certain characters. No problem, I use an extra hyphen, change my name, use my middle name as a surname or whatever makes the system happy.
Is it perfect? No. Does it really matter? No. So I just respect their customs and get on with my life as a respectful guest in the country where I am living.
The problem is it's every SaaS online service in the world, many of which happily do business in Myanmar. You can't just tell an entire country with its own unique culture and 50 million citizens "FU, you're wrong", and when you do, it's you that's wrong, not them! Everyone from Facebook, LinkedIn, Google, Microsoft, Amazon, to federated platforms like Matrix or Mastodon, to developer tools, to every forum on the Internet requires a first and last name! These platforms have literally millions of users or customers around the world - and at least one entire country - that don't fit the <first> <last> mold and I find it highly disrespectful to force it on them like this.
 My info might be out of date on some of these as I've been out of the country and not paid much attention for a few years. I would hope that by now at least some of these big multinational companies would get it right. However, suffice it to say, the vast majority of SaaS platforms today are disrespectful to people with one name.
However, I don't see how a person having no first/last name, and expecting to be referred to that way, can be construed as disrespect to the English-speaking culture. There is no need to learn another language; all it takes is to "know" that the person has one string as a name.
You said it helps with sorting, and that is exactly why I said cutting a foot to fit a shoe. It's asking people to change their name for the sake of paperwork.
I don't expect to show up and make a country change, for the sake of me. My name is already in first/last form.
I wish countries and systems (frankly many of which are expanding to be used worldwide) would change, because I believe it to be the right direction, and in the end it will save everyone's time.
In India, which is a single country but has many different languages (22 official and nearly 1,000 unofficial, including dialects) and cultures, I see the same insistence on last name/surname. There are states and districts where mononyms are quite common, but governments insist on first and last name in many forms. I’ve come to believe that those who make up the rules and are in the majority decide everything for others. In India, it’s the people around the national capital who assume that everybody has a last name or that everybody in the country knows Hindi (India has no national language, much to the chagrin of Hindi speakers who seem to believe it is the one).
Funnily, I’ve also noticed that the U.S. consulates have a system where they deal with a last name not being available, even when the name has more than one word separated by spaces in it. In such cases, they add “LNU” (Last Name Unknown) in the last name field in visas. In some cases I’ve seen “FNU” (First Name Unknown) too.
1. Given name + group descriptor (profession, locality, tribe, etc. such as Jim Baker or Joan Rivers)
2. Given name + parent’s given name and gender (such as Dwayne Johnson or Jóhanna Sigurðardóttir)
Iceland is especially fun, as they not only append a parentage name, but alongside -son and -dottir they’ve introduced a gender neutral suffix (-bur), and they don’t want you to make up first names out of the blue!
“[Iceland’s] Personal Names Committee maintains an official register of approved Icelandic given names and governs the introduction of new given names into Icelandic culture.” — https://en.wikipedia.org/wiki/Icelandic_Naming_Committee
Long ago, a lot of forms on American websites wouldn't let me fill in my last name, as it technically contains 3 words. I had to concatenate "van der" with the rest of my last name to finish registration.
Colleague's last name: O'Donnell. You can already see where this is going.
Bonus points: has an email address as "FirstName.O'Donnell"@domain.com which basically works in <1% of the sites though it's completely legal, just because nobody cares about specs. Obviously he has other email addresses. The good thing is he gets 0 spam in that inbox.
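For anyone wondering where it usually goes: a minimal sketch, assuming the breakage comes from building SQL by string concatenation (the comment doesn't say which systems failed or why). It uses Python's standard `sqlite3` module with a throwaway in-memory table; the table and column names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (last_name TEXT)")

last_name = "O'Donnell"

# Naive string building: the apostrophe ends the SQL string literal early,
# so the statement no longer parses.
try:
    conn.execute(f"INSERT INTO people (last_name) VALUES ('{last_name}')")
    naive_ok = True
except sqlite3.OperationalError:
    naive_ok = False

# Parameter binding: the driver passes the value separately from the SQL text,
# so the apostrophe is just another character.
conn.execute("INSERT INTO people (last_name) VALUES (?)", (last_name,))
stored = conn.execute("SELECT last_name FROM people").fetchall()

print(naive_ok, stored)
```

The same habit that rejects this name is the one that makes a site injectable, so fixing it is worth doing regardless of who is signing up.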
Eventually they came to me and asked if I'd mind changing it. I pointed out I'd requested this roughly 10 times by that point, and we discovered their ticket system had been eating it.
For bonus points, I'm in Ireland, the one place on earth you'd expect o'names to be handled gracefully.
Luckily, like everything in Germany, there was a form to fill to fix it.
It is not. I have a single first and a single last name, both ASCII-safe. I am not on the list.
Using ML can be useful if you can separate people by origin or work within a more homogeneous population.
I recently read a French political news story referring to O. It must be a misprint, I thought at first. But there is a French cabinet minister named Cedric O.
Although when written, if that person has a typical French name like "Jean Le" there might be people who think it's been truncated indeed.
Here in India you get very long/sometimes weird last names. Many are not on the list.
I'd rather waste my time on HN :)
Probably a good list, but far from exhaustive.
It doesn't have my last name nor the last names of some family and friends I tried.
Perhaps just "a large list of scraped names"?
It doesn't have the volume of names that this one does, but it does have custom rules for names (e.g., Icelandic last names), and it can generate Thai names.
Thanks for the effort, I intend to use this to enrich my test data generation scripts.
As an aside, it's not that exhaustive, though: her name is a combination of the first two letters of her dad's and her mom's names. It is in the wild though; I have heard of others with her name.
~100k last names
Out of 7-8 billion people this is really all (or most) of the first and last names? We aren’t a very creative species I guess. I especially would have expected the number of first names to be at least an order of magnitude larger.
Then there are also the following issues:
- The list contains mostly names in basic Latin script, which already excludes most of the world population.
- It's also pretty bold of the author to assume that everyone even has a "first" and a "last" name. Not all names work like that.
- The list contains a bunch of extraneous stuff like "�" and "</pre></body></html> (>\k���� " and "रामकिशो&"
- The list is converted to lowercase. You can't just change the capitalization of names and expect them to be the same. Not all scripts work like that.
- In some languages/cultures a first/last name could be pretty much any word you can find in a dictionary, or a combination of them. You'd pretty much need to add that language's entire dictionary.
I could go on but instead I'll just link to this: https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...
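On the lowercase point specifically, a minimal sketch using only Python's built-in string methods. The sample strings are my own picks to exercise different casing behaviours, not entries taken from the list.

```python
# Three distinct spellings collapse into a single lowercased entry.
originals = ["McDonald", "Mcdonald", "MCDONALD"]
collapsed = {name.lower() for name in originals}

# German sharp s: uppercasing expands it to "SS", and lowercasing
# that result gives "ss", not the original character.
sharp_s_round_trip = "ß".upper().lower()

# Turkish dotted capital I: lowercasing yields "i" plus a combining dot,
# i.e. two code points where there used to be one.
dotted_i_lower = "İ".lower()

# Scripts without a notion of case are simply left untouched by lower().
uncased = "山田"
uncased_unchanged = uncased.lower() == uncased

print(collapsed, sharp_s_round_trip, len(dotted_i_lower), uncased_unchanged)
```

So the lowercased list is a lossy projection of the original names: it can tell you that something like a name exists, but not how to write it.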
Going to play with this data set.
It seems to me if you have PII in text data you should treat the whole thing as PII.
 although the dataset doesn't actually contain "Brown" as a valid last or first name, go figure.
Mashword is a word mashup name generator service that we recently built that recognizes many common human names. One of our primary use cases is finding alternatives or unique spellings to traditional or common names (e.g. https://mashword.com/search?words=rebecca ). It does not support all of the names in these lists, but we are adding and growing our support for names all the time.