
Normalizing can help with search. For example, for Ruby I maintain this gem: https://rubygems.org/gems/sixarm_ruby_unaccent


Wow, the code [1] looks horrific!

Why not just do this: string → NFD → strip diacritics → NFC? See [2] for more.

[1] https://github.com/SixArm/sixarm_ruby_unaccent/blob/eb674a78...

[2] https://stackoverflow.com/a/74029319/3634271
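For reference, the NFD → strip diacritics → NFC pipeline suggested above can be sketched in a few lines of Ruby using the standard `String#unicode_normalize` (available since Ruby 2.2). This is only a minimal sketch: the helper name `strip_diacritics` is hypothetical, it is not the gem's code, and it assumes "diacritics" means nonspacing combining marks (`\p{Mn}`).

```ruby
# Hypothetical helper, not part of sixarm_ruby_unaccent.
# Assumes the input string is in a Unicode encoding such as UTF-8.
def strip_diacritics(str)
  # NFD: split precomposed characters, e.g. "é" becomes "e" + U+0301.
  decomposed = str.unicode_normalize(:nfd)
  # Remove nonspacing combining marks (the accents themselves).
  stripped = decomposed.gsub(/\p{Mn}/, "")
  # NFC: recompose anything that is still composable.
  stripped.unicode_normalize(:nfc)
end

puts strip_diacritics("café")    # prints "cafe"
puts strip_diacritics("Åström")  # prints "Astrom"
```

Note that letters such as "ø" or "ß" have no canonical decomposition, so this sketch leaves them unchanged.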


Sure does look horrific. :-) That's because it's the same code from 2008, long before Ruby had the Unicode handlers. In fact it's the same code as for many other programming languages, all the way back to Perl in the mid-1990s. I didn't create it; I merely ported it from Perl to Ruby.

More important, the normalization does more than just strip diacritics. For example, it converts superscript 2 to ASCII 2. A better naming convention probably would have been "string normalize" or "searchable string" or some such, but the naming convention in 2012 was based on Perl.
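To illustrate the superscript point with modern Ruby: canonical decomposition (NFD) leaves "²" alone, but compatibility decomposition (NFKD) maps it to a plain "2". The sketch below is an assumption about how one might approximate that behavior with the standard library; it is not how the gem works, and the helper name `searchable` is hypothetical.

```ruby
# Hypothetical helper, not the gem's implementation.
# Assumes the input string is in a Unicode encoding such as UTF-8.
def searchable(str)
  # NFKD: compatibility decomposition, so "²" becomes "2"
  # and "é" becomes "e" + U+0301.
  decomposed = str.unicode_normalize(:nfkd)
  # Remove nonspacing combining marks left over from the decomposition.
  decomposed.gsub(/\p{Mn}/, "")
end

puts searchable("x²")    # prints "x2"
puts searchable("café")  # prints "cafe"
```

NFKD covers only the mappings Unicode defines, so a hand-maintained table like the gem's may still convert characters that this sketch leaves untouched.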



