Show HN: Convert a number to an approximated text expression (github.com)
114 points by zz0mm 67 days ago | 30 comments

If you want it exact rather than approximate, this is still a tricky, ambiguous problem. It often comes up in text normalization for NLP tasks (e.g. speech recognition or text-to-speech).

For Python, for English, there is the inflect library: https://pypi.org/project/inflect/

I also wrote a function in Pascal to convert integers to natural language exactly. E.g. here is the output for 100000! https://gist.github.com/benibela/f0163b02562f647e4d2f

For those on mobile (who can't see the full file), or who don't want to click through the 'file is truncated...' link: it's 8.8MB of plaintext English output describing the ridiculously long number.

As an approximation, one could just say three duoquinquagintacentillinonagintacentillion
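
The leading "three" can be sanity-checked without producing the 8.8MB of text. A minimal sketch, assuming only Python's standard library: it uses the log-gamma function to estimate the leading digits and decimal exponent of n!, so it is a floating-point estimate, not an exact conversion. The function name is made up for illustration.

```python
import math


def factorial_magnitude(n: int) -> tuple[float, int]:
    """Return (mantissa, exponent) such that n! is roughly mantissa * 10**exponent.

    Uses lgamma(n + 1) == ln(n!), so the result is an estimate limited by
    double precision, not an exact value.
    """
    log10_value = math.lgamma(n + 1) / math.log(10)
    exponent = math.floor(log10_value)
    mantissa = 10 ** (log10_value - exponent)
    return mantissa, exponent


mantissa, exponent = factorial_magnitude(100_000)
# The number of decimal digits is exponent + 1.
print(f"100000! is about {mantissa:.2f}e{exponent} ({exponent + 1} digits)")
```

The mantissa comes out a little above 2.8, which rounds to the "three" above.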

Isn't "one hundred thousand factorial" a more useful conversion? Would your conversion ever find utility?

As an aside, this is why French-style punctuation, with a space before ? or !, is superior.

I did not write it to be useful. I just wanted to say it can handle large integers.

I am implementing the XPath standard. The standard says there needs to be a function to convert integers to English, but it does not specify an upper limit on the integers.

Cool, IMO that makes it potentially useful (but being useful isn't required of course).

Out of curiosity, what part of the spec does that implement? Is it to do with fn:format-integer?

Here is the function standard: https://www.w3.org/TR/xpath-functions-31/#func-format-intege...

Number formatting is really complicated

Perl does a reasonable job of it in around 250 lines: https://github.com/neilb/Lingua-EN-Numbers/blob/master/lib/L...

For Common Lisp, there's a FORMAT directive:

  CL-USER> (format t "~r" 4096)
  four thousand ninety-six

I didn't know Common Lisp spoke American English

I've been given this (albeit limited to 7-digit numbers) in an interview before.

Lots of fun edge cases to handle.
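
For anyone curious what the interview version looks like, here is a minimal sketch in Python, assuming non-negative integers of at most 7 digits and American-style output without "and" (matching the Common Lisp example above). All names are made up for illustration; this is not the inflect API or any other library mentioned in the thread.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
SCALES = ["", "thousand", "million"]


def below_thousand(n: int) -> str:
    """Spell out 0..999."""
    parts = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        parts.append(ONES[hundreds] + " hundred")
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        # Edge case: "forty", not "forty-zero".
        parts.append(TENS[tens] + ("-" + ONES[ones] if ones else ""))
    elif rest or not parts:
        # Edge case: the teens are irregular, and a bare 0 is "zero".
        parts.append(ONES[rest])
    return " ".join(parts)


def to_words(n: int) -> str:
    """Spell out an integer in the range 0..9,999,999."""
    if not 0 <= n <= 9_999_999:
        raise ValueError("only 0..9,999,999 is supported in this sketch")
    if n == 0:
        return "zero"
    groups = []
    scale = 0
    while n:
        n, chunk = divmod(n, 1000)
        if chunk:  # Edge case: skip empty groups, e.g. 1,000,005.
            label = SCALES[scale]
            groups.append(below_thousand(chunk) + (" " + label if label else ""))
        scale += 1
    return " ".join(reversed(groups))


print(to_words(4096))
```

The edge cases in the comments (teens, round tens, empty three-digit groups, zero) are the ones that usually trip people up.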

One other aspect of approximated numbers is whether the author wants to emphasize that the number is large or small. "Less than" and "almost" carry different connotations.

You are right, hence Number Words returns different versions of the approximation: 'more', 'less' and 'around'. More nuanced versions could be added to the list. Importantly, the choice of which version to use stays with the user of the library. I wonder whether an algorithm for choosing the appropriate approximation could be devised.

> Water temperature is below 10C (input data would be 9.53C)

I can't tell the water temperature at all without context -- could be any temperature below 10C; and in the wrong context it could be confusing. "Just below"/"just under" would be a lot more accurate.

Given GP's "...whether the author wants to emphasize..." I don't think an algorithm could choose without knowing the author's intent. There's so much context and subtlety in choosing words based on intent.

Could one split the difference with "just under"? Or maybe add an expected value or prior value, and the algorithm would adjust how it is phrased?
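
A rough sketch of the "just under" idea in Python, assuming the caller supplies the rounding step and a tolerance for what counts as "just". This is a hypothetical helper, not how Number Words works, and it does not address the author's intent, which still has to come from the caller.

```python
def describe(value: float, step: float, tolerance: float = 0.05) -> str:
    """Phrase `value` relative to the nearest multiple of `step`.

    `tolerance` is the fraction of `step` within which the value counts as
    "just" under or over the round number (an assumption of this sketch).
    """
    target = round(value / step) * step
    gap = value - target
    if gap == 0:
        return f"exactly {target:g}"
    near = abs(gap) <= tolerance * step
    if gap < 0:
        return f"{'just under' if near else 'below'} {target:g}"
    return f"{'just over' if near else 'above'} {target:g}"


print("Water temperature is " + describe(9.53, step=10) + "C")
```

With the README's input of 9.53 this gives "just under 10", while 7.2 would fall back to the vaguer "below 10".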

Does this exist for time numbers?

I wish Telegram used such a thing. Someone not seen for 1h59m is 'last seen an hour ago'. Even just showing the actual time would be more helpful IMO - it displays it at the top right underneath (iOS) or very near (Android) the system time anyway, I don't know why it's thought to be a good idea to remove so much information.
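
The 1h59m case looks like truncation to whole hours. That is a guess from the behaviour described, not something known about Telegram's code. A minimal sketch of the difference between truncating and rounding, with made-up function names and assuming at least 60 minutes have passed:

```python
def _hours_label(hours: int) -> str:
    return f"{hours} hour{'' if hours == 1 else 's'}"


def last_seen_truncated(minutes: int) -> str:
    """Drop the leftover minutes entirely (assumes minutes >= 60)."""
    return f"last seen {_hours_label(minutes // 60)} ago"


def last_seen_rounded(minutes: int) -> str:
    """Round to the nearest hour, half up (assumes minutes >= 60)."""
    return f"last seen about {_hours_label((minutes + 30) // 60)} ago"


print(last_seen_truncated(119))
print(last_seen_rounded(119))
```

Rounding reports 119 minutes as about 2 hours, which loses less information than the truncated "1 hour".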

I always assumed their system is simply to tell you roughly whether to expect the person to be online soonish. And I would find it too invasive to have exact numbers - e.g. in WhatsApp almost everyone I know disables that.

It tells you the exact time anyway, you just have to click on it (or otherwise get to the contact's profile). The options for disabling it are to make it visible to everybody/contacts/nobody.

This is the type of thing WASM modules would be perfect for. There's no reason for this to have so many implementations in so many languages.

That is a frustrating aspect of open-source

In the old days you would have a binary shell tool or a DLL, and then you could call it from any language. It did not matter which language the tool was written in.

It has nothing to do with open-source. This project is written in Java (basically), which is proprietary.

It has everything to do with programming languages that don't compile down to linker symbols that can be called with the C calling convention, and the utter failure of Unix philosophy when it comes to shell tools.

What would be the intended use case for something like this?

Use automation to get human-friendly text for numbers. An example from the readme:

- Q2 sales were around 1M$

When the input was $1,002,184.

I'd be amazed if a company put "around $1M" for anything that was "over $1M" !

They could use the -pr switch when running it.

Good point! Fixed the example to reflect realities of PR language.
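
A direction-aware variant of that example could look like the sketch below. It is written in Python for illustration only and is not the Number Words API: the function name, the K/M/B suffixes and the "over"/"under" wording are all assumptions.

```python
UNITS = [(1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")]


def approx_amount(amount: int) -> str:
    """Abbreviate a dollar amount and say which side of the round figure it is on.

    Limitation of this sketch: values that round up to 1000 of a unit
    (e.g. 999,600) are not promoted to the next unit.
    """
    for unit, suffix in UNITS:
        if amount >= unit:
            rounded = round(amount / unit)
            label = f"${rounded}{suffix}"
            if amount > rounded * unit:
                return "over " + label
            if amount < rounded * unit:
                return "under " + label
            return label
    # Below 1,000 there is nothing to abbreviate.
    return f"${amount:,}"


print("Q2 sales were " + approx_amount(1_002_184))
```

For the README input of $1,002,184 this yields "over $1M" rather than "around".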

Is there a Java version of this? I understand it's open-source, but since Clojure runs on the JVM, and the lowest common denominator there is Java, I think it will be hard to include Clojure just to use the library.

Clojure is a JVM language, and it is straightforward to interact with Clojure libs from Java apps. To make it even easier, the first thing on my TODO list is to provide a simple Java interface to Number Words and to update the README with Java usage examples.
