Normalizing can help with search. For example, for Ruby I maintain this gem: http...

noname120 · on March 24, 2024

Wow, the code[1] looks horrific!

Why not just do this: string → NFD → strip diacritics → NFC? See [2] for more.

[1] https://github.com/SixArm/sixarm_ruby_unaccent/blob/eb674a78...

[2] https://stackoverflow.com/a/74029319/3634271
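A minimal Ruby sketch of the NFD → strip diacritics → NFC pipeline. It assumes Ruby 2.2 or later, where `String#unicode_normalize` is built in, and it treats "diacritics" as characters in Unicode category Mn (nonspacing combining marks). The method name `strip_diacritics` is illustrative only. This is not the gem's implementation.

```ruby
# Illustrative helper (hypothetical name): remove diacritics by
# decomposing, dropping combining marks, then recomposing.
def strip_diacritics(str)
  decomposed = str.unicode_normalize(:nfd)   # "è" becomes "e" + U+0300 COMBINING GRAVE ACCENT
  stripped = decomposed.gsub(/\p{Mn}/, "")   # drop nonspacing combining marks (category Mn)
  stripped.unicode_normalize(:nfc)           # recompose whatever is left
end

p strip_diacritics("Crème Brûlée")  # => "Creme Brulee"
p strip_diacritics("Ørsted")        # => "Ørsted" (Ø has no canonical decomposition)
p strip_diacritics("x²")            # => "x²" (NFD leaves compatibility characters alone)
```

The last two calls show the limits of this approach. Letters such as "Ø" or "ß" are not base letter plus accent in Unicode, so NFD has nothing to split off. "²" only has a compatibility decomposition, which NFD does not apply.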

jph · on March 24, 2024

Sure does look horrific. :-) That's because it's the same code from 2008, long before Ruby had built-in Unicode normalization. In fact, it's the same code used for many other programming languages, all the way back to Perl in the mid-1990s. I didn't create it; I merely ported it from Perl to Ruby.

More importantly, the normalization does more than strip diacritics. For example, it converts superscript 2 (²) to ASCII 2. A better name probably would have been "string normalize" or "searchable string" or some such, but in 2012 the naming convention was based on Perl.
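For comparison, a rough sketch of how part of that extra folding could be done with Ruby's built-in compatibility normalization (NFKD/NFKC), which maps "²" to "2" and the "ﬁ" ligature to "fi". The helper name `searchable_string` is hypothetical, borrowed from the naming idea above. This is not what the gem does, and it is not full transliteration: characters like "ø" or "ß" still pass through unchanged.

```ruby
# Hypothetical helper (not part of any gem): build a search key using
# compatibility normalization, then drop combining marks.
def searchable_string(str)
  decomposed = str.unicode_normalize(:nfkd)  # "²" -> "2", "ﬁ" -> "fi", "é" -> "e" + U+0301
  stripped = decomposed.gsub(/\p{Mn}/, "")   # drop nonspacing combining marks
  stripped.unicode_normalize(:nfkc)          # recompose what is left
end

p searchable_string("E = mc²")  # => "E = mc2"
p searchable_string("ﬁancée")   # => "fiancee"
```

Compatibility normalization discards distinctions ("x²" and "x2" become the same string), which suits a search key but not text meant for display.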