
Can you add 100-digit numbers reliably, in a short amount of time, over a large sample size? How about without a piece of paper and a pen? Can you even remember a 100-digit number? It’s likely quite difficult…

This is not about comparing to the reliability “of a computer” but rather to a process which responds to a wide array of natural language queries (both human processes and algorithmic processes)… and obviously the reliability goes up once tool use is included.



With pen and paper? Yes, definitely. Without? LLMs have scratch space, so they're not operating without either.


At pace, though? What would your throughput be to maintain 98% reliability? How long do you think one execution would take? My guess would be 30s-1m for a single summation of two 100-digit numbers. So let’s say you want a sample size of 200 operations to estimate reliability… maybe something like 2-4 hours to get through the computations working nonstop? That’s actually a nontrivial amount of focus on a highly repetitive task, which isn’t easy for a lot of people. I’m now genuinely curious what the average HN user’s reliability and throughput would be for adding together two random 100-digit numbers working as fast as they can.
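(If anyone actually wants to try this on themselves, here’s a rough self-test harness. Purely an illustrative Python sketch; the function name and defaults are made up for the example:)

    import random, time

    def run_trials(n_trials=200, digits=100):
        correct = 0
        start = time.time()
        for _ in range(n_trials):
            a = random.randrange(10**(digits - 1), 10**digits)
            b = random.randrange(10**(digits - 1), 10**digits)
            # You do the addition by hand; the script only checks and times you.
            answer = input(f"{a}\n+ {b}\n= ")
            if answer.strip() == str(a + b):
                correct += 1
        elapsed = time.time() - start
        print(f"{correct}/{n_trials} correct ({correct / n_trials:.0%}) "
              f"in {elapsed / 60:.1f} minutes")

    run_trials()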


Computers have been faster than humans at arithmetic since the 1940s. I don't really understand your point.


The point made by a commenter a few levels up questioned whether 98% reliability for adding two 100-digit numbers together should be considered noteworthy, acceptable, etc.

My point is that assessing 98% reliability qualitatively depends on what kind of baseline system you are comparing against.

Obviously, the reliability (and speed) of an LLM at 98% accuracy is atrocious compared to any basic automated arithmetic system. If you need to reliably add 100-digit numbers, an LLM is of course a bad choice.
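(For reference, that “basic automated arithmetic computation system” is a couple of lines in any language with arbitrary-precision integers. A minimal Python sketch:)

    import random

    a = random.randrange(10**99, 10**100)  # a random 100-digit number
    b = random.randrange(10**99, 10**100)
    print(a + b)  # exact every time; Python ints are arbitrary precision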

However, compared to the general class of systems an LLM belongs to - i.e. “things that can add 100-digit numbers together quickly and also tell you about the history of the Ottoman Empire and a bunch of other stuff too, all within a relatively short amount of time” - this reliability is (potentially) impressive. We don’t have many such systems: a moderately educated human; a bespoke piece of software that parses natural language requests and decides whether to retrieve information from a database or the internet, or to write and execute code (something that could conceivably have been built in a crude manner five years ago); and LLMs (which might also take advantage of tool use, as suggested in the previous example).

In this context - compared to other systems with similar capabilities - the 98% reliability might be considered impressive, especially since it is likely much higher for more common, simpler arithmetic problems.
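(One caveat on pinning down that 98% figure: a sample of 200 trials, as proposed above, only estimates it loosely. A quick Wilson-interval sketch - the 196/200 count here is my illustrative number, not a measured one:)

    import math

    def wilson_interval(successes, n, z=1.96):
        # 95% Wilson score interval for a binomial proportion
        p = successes / n
        denom = 1 + z * z / n
        center = (p + z * z / (2 * n)) / denom
        half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
        return center - half, center + half

    print(wilson_interval(196, 200))  # roughly (0.95, 0.99)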

My intention in bringing up “an average HN user” was to offer what I consider a reasonable benchmark to compare against. An HN user is a stand-in for a moderately educated person who can respond to a variety of generic natural language requests in a short amount of time. My point was simply that adding together 100-digit numbers reliably and quickly, while also being able to chat about Shakespeare or Kalman filters or planning a wedding, is likely a harder task than we tend to give it credit for, given our familiarity with specialized systems that do arithmetic extremely well and extremely quickly.



