
>employees are the most expensive thing a SaaS business has.

I'm pretty sure that for the overwhelming majority of (successful) SaaS businesses, the most expensive line item is the marketing & advertising budget. 30-50% of revenue isn't uncommon, because the returns on successful sign-ups are enormous.


Not so. Early-stage funding goes to hiring.

The paper discusses this, and the approach taken in the paper implements a number-flip stage, so numbers are formatted with their least significant digit first.
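
A minimal sketch of that formatting step (my own illustration, not the paper's code; the function names are made up):

    def flip(n: int) -> str:
        # Write the number least-significant-digit first, e.g. 1234 -> "4321",
        # so the model emits the carry-dependent low digits before the rest.
        return str(n)[::-1]

    def training_sample(a: int, b: int) -> str:
        # A flipped training line: training_sample(34, 12) -> "43+21=64",
        # i.e. 34 + 12 = 46 with every number's digits reversed.
        return f"{flip(a)}+{flip(b)}={flip(a + b)}"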

Since models are very good at writing very short computer programs, and computer programs are very good at mathematical calculations, would it not be considerably more efficient to train them to recognise a "what is x + y" type problem, and respond by writing and executing a small JavaScript program to calculate x + y, then sharing the result?
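
Something like the following, sketched in Python rather than JavaScript; the regex is just a stand-in for however the model would actually recognise the problem:

    import re

    def answer(prompt: str):
        # Detect a "what is x + y" style question and delegate the
        # arithmetic to exact integer maths instead of token prediction.
        m = re.search(r"what is (\d+)\s*\+\s*(\d+)", prompt.lower())
        if m:
            x, y = int(m.group(1)), int(m.group(2))
            return str(x + y)
        return None  # not arithmetic: fall through to the model

    # answer("What is 123456789 + 987654321?") -> "1111111110"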

From a getting-answers perspective, yes; from an understanding-LLMs perspective, no. If you read the abstract you can see how this goes beyond arithmetic and helps with long-form reasoning.

But that's not all that relevant to the question "can LLMs do math". People don't really need ChatGPT to replace a calculator. They are interested in whether the LLM has learned higher reasoning skills from its training on language (especially since we know it has "read" more math books than any human could in a lifetime). Responding with a program that reuses the + primitive in JS proves no such thing. Even responding with a description of the addition algorithm doesn't prove that it has "understood" maths, if it can't actually run that algorithm itself - it's essentially looking up a memorized definition. The only real proof is actually having the LLM itself perform the addition (without any special-case logic).

This question is of course relevant only in a research sense, in seeking to understand to what extent and in what ways the LLM is acting as a stochastic parrot vs gaining a type of "understanding", for lack of a better word.


That's a fair summary of why the research is happening. Thanks.

That's in fact what ChatGPT does ... because 99% accurate math is not useful to anyone.

This is a cromulent approach, though it would be far more effective to have the LLM generate computer-algebra-system instructions.
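
For instance (a hypothetical sketch with SymPy standing in as the CAS; the model's only job is to emit the expression string):

    from sympy import sympify

    # Instructions the model might generate for "what is (2**64 + 1) * 3?"
    model_output = "(2**64 + 1) * 3"
    print(sympify(model_output))  # 55340232221128654851, exact, no float rounding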

The problem is that it's not particularly useful: as the problem complexity increases, the user will need to be increasingly specific in the prompt, rapidly approaching a fully exact specification. There's simply no point to it if your prompt has to (basically) spell out the entire program.

And at that point, the user might as well use the backing system directly, and we should just write a convenient input DSL for that.


Yes, this is what external tools/plugins/api calls are all about.

>deductive reasoning is just drawing specific conclusions from general patterns. something I would argue these models can do

That the models can't see a corpus of 1-5 digit addition and then generalise out to n-digit addition is an indicator that their reasoning capacities are very poor and inefficient.

Young children take a single textbook & a couple of days' worth of tuition to achieve a generalised understanding of addition. Models train for the equivalent of hundreds of years, across (nearly) the totality of human achievement in mathematics, and struggle with 10-digit addition.

This is not suggestive of an underlying capacity to draw conclusions from general patterns.


> Young children take a single textbook & a couple of days' worth of tuition to achieve a generalised understanding of addition

Maybe you did! Most young children cannot actually do bigint arithmetic reliably, or at all, after a couple of days' worth of tuition!


I think the “train for hundreds of years” argument is misleading. It’s based on parallel compute time: how long it would take to run the same training sequentially on a single GPU. That assumes an equivalence with human thought based on the model's tokens-per-second rate, which is a bad measurement because it varies with hardware. The closest human comparison would be the act of writing or speaking, but we obviously process and produce far more information at a much higher rate than we can speak or write. Imagine if you had to verbally direct each motion of your body; it would take an absurd amount of time to do anything, depending on the specificity you had to work with.

The work done in this paper is very interesting, and your dismissal of “it can’t see a corpus and then generalize to n digits” is not called for. They are training models from scratch in 24 hours per model, using only 20 million samples. It’s hard to equate that to an activity a single human could do. It’s as though you had piles of accounting ledgers filled with sums, no other information or knowledge of mathematics, numbers, or the world, and you discovered how to do addition from that information alone. It should be noted that there is no textbook or tutor helping them do this, either.

There is a form of generalization if it can derive an algorithm from operands of at most 20 digits that also works for 120 digits. Is it the same algorithm we use by limiting ourselves to adding two digits at a time? Probably not, but it may emulate some of what we are doing.


>It should be noted that there is no textbook or tutor helping them do this, either.

For this particular paper there isn't, but all of the large frontier models do have textbooks (we can assume they have almost all modern textbooks). They also have formal proofs of addition in Principia Mathematica, alongside nearly every math paper ever produced. And still, they demonstrate an incapacity to deal with relatively trivial addition - even though they can give you a step-by-step breakdown of how to correctly perform that addition with the columnar-addition approach. This juxtaposition seems transparently at odds with the idea of an underlying understanding & deductive reasoning in this context.
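
For reference, the columnar procedure they can recite but not reliably execute is only a few lines (a sketch, with a helper name of my own choosing):

    def columnar_add(a: str, b: str) -> str:
        # School-style addition: walk both decimal strings right to left,
        # adding digit pairs and propagating a carry.
        digits, carry = [], 0
        for i in range(1, max(len(a), len(b)) + 1):
            da = int(a[-i]) if i <= len(a) else 0
            db = int(b[-i]) if i <= len(b) else 0
            carry, d = divmod(da + db + carry, 10)
            digits.append(str(d))
        if carry:
            digits.append(str(carry))
        return "".join(reversed(digits))

    # columnar_add("999", "1") -> "1000"; works for any number of digits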

>There is a form of generalization if it can derive an algorithm from operands of at most 20 digits that also works for 120 digits. Is it the same algorithm we use by limiting ourselves to adding two digits at a time? Probably not, but it may emulate some of what we are doing.

The paper is technically interesting, but I think it's reasonable to conclude definitively that the model has not created an algorithm remotely as effective as columnar addition. If it had, it would be able to perform addition on n-size integers. Instead it has produced a relatively predictable result: given lots of domain-specific problems, transformers get better at approximating the answers to those problems, and when faced with problems significantly beyond their training data, their accuracy degrades.

That's not a useless result. But it's not the deductive reasoning that was being discussed in the thread - at least if you add the (relatively uncontroversial) caveat that deductive reasoning should lead to correct conclusions.


I think these examples still loosely fit the author's argument:

> There are some cases where big data is very useful. The number of situations where it is useful is limited

Even though there are some great use-cases, the overwhelming majority of organisations, institutions, and projects will never have a "let's query ten petabytes" scenario that forces them away from platforms like Postgres.

Most datasets, even at very large companies, fit comfortably into RAM on a server - which is now cost-effective, even in the dozens of terabytes.


Big data is not only about storage size but also about processing, e.g. RAM. The trend for the coming years is exponentially more IoT sensor devices than we can imagine, and the nature of their data will be big in both size (storage) and analysis (RAM). Just look at the latest cars or EVs: many have hundreds of sensors, and those sensors are already connected to the Internet.

Another upcoming example is the new DECT NR+ standard (the first non-cellular 5G); it will only fuel this massive accumulation of datasets, and these monitoring sensor devices do not even need to be connected to the Internet (think of private factory networks) [1].

There is a limited number of humans using or carrying sensors, but for non-human devices the sky is the limit. Human communication generates comparatively little data: we rarely communicate with each other, and most of our data now comes from intermittent media consumption while streaming audio/video [2]. IoT sensor devices, by contrast, sample at regular and frequent intervals, on the order of every second, minute, or hour. Some if not most of this data is not clean: it is raw data, and raw data is inherently huge compared to processed data - for example raw image data vs JPEG data, where the former can be several times bigger in both size and processing requirements.

[1] DECT NR+: A technical dive into non-cellular 5G:

https://news.ycombinator.com/item?id=39905644

[2] 50 Video Statistics You Can’t Ignore In 2024:

https://www.synthesia.io/post/video-statistics


This is quite a good allegory for the way AI is currently discussed (perhaps the outcome will be different this time round). Particularly the scary slide [1] with the up-and-to-the-right graph, which is used in a near-identical fashion today to show an apparently inevitable march of progress in the AI space due to scaling laws.

[1] https://motherduck.com/_next/image/?url=https%3A%2F%2Fweb-as...


This was the big one for me too. The juxtaposed healthy versus unhealthy lungs resemble an uncooked chicken versus a roast chicken left in the oven 30 minutes longer than necessary.

https://www.scotsman.com/webimg/legacy_elm_28724349.jpg?crop...


The antismoking PSA that made the strongest impression on me, by far, was the one that showed a grandfather encouraging a baby to take a step. Eventually, the baby starts walking, and rushes over to the grandfather.

And through the grandfather, who fades to translucency.

It wasn't just me; that PSA made enough of a splash that it was called out on Friends.

I've tried to find that PSA in the past, but with no success. Once I asked a friend if they could find it, and the response was "Oh, I know exactly the one you're talking about. I won't help you look for it. I hate that commercial and I don't want to see it again."

Looks like it's made it onto YouTube by now, in glorious 240p: https://www.youtube.com/watch?v=O6pb6XxrbmE

I note that the second comment is "This commercial was what made my father stop smoking." It's interesting to think about the balance between disturbing the smoking audience so strongly that they stop, and disturbing the non-smoking audience so strongly that they complain about being exposed to your traumatic imagery and imperil your funding.


There have been some Oceanic anti-speeding ads that had the same effect on people, apparently. There's one where time freezes right before a collision and the person at fault apologizes for the little boy he's about to murder. There's one where the driver is talking to the ghost of his friend who died in a car crash. There's one where the grim reaper spins a roulette wheel every time a driver makes a mistake at an intersection. There was one where they rewind time, nudge the speedometer slightly lower, and resume normal time, and a fatal accident (pedestrian hit by car) turns into a bruised leg.

> There's one where the grim reaper spins a roulette wheel every time a driver makes a mistake at an intersection.

In my imagination, this one would end with the roulette wheel stopping on 0, but the result of 0 not being depicted.


The wheel is labeled things like "near miss" and "death".

Hosting costs are £3m, but total expenditure is $160m - which obviously isn't covered by the interest on $250m (even at 5%, that would yield only around $12.5m a year).


The UK has an age-based advantage in this metric: Oxford & Cambridge are nearly 1,000 years old. Once you take that into account, the stat becomes "of the top 8 universities (ex. Oxbridge), 2 are in the UK and 4 are in the USA". Imperial is a very high-quality institution, definitely the peer of Berkeley/Yale. UCL normally isn't thrown into the top 10 though - it'd usually appear in the top 25.


The oldest university in the world isn't from the UK, so that clearly can't be such a huge factor.


Bologna cannot be faulted for not being able to capitalize on its age.

Northern Italy went through an awful lot of tumultuous history between 1450 and now that England didn't experience: it was cut up into lots of little fiefdoms, dominated by several external powers, and generally kept under constantly changing iron thumbs.


> I don’t think it’s any longer about access to capital

The link provided as proof for this comment is Wayve receiving a $1bn injection from Microsoft and Nvidia [1].

The $1bn raise is not the concern of a budding 23-year-old graduate leaving Imperial/Cambridge/Oxford. They're looking for the first £100k of capital to see them through the first few months. In the UK, the scene for that first capital injection is far weaker than in the US, which has an inevitable downstream impact.

[1] https://www.bbc.co.uk/news/articles/crgypzg4edvo


Been a while since I was in this position, but that first £100k used to be a cakewalk. SEIS made it almost a no-brainer for any high-net-worth individual to invest in startups.


> SEIS made it almost a no-brainer

and it still should, as I understand it. On a £100k investment that the business loses all of, the investor gets £75k back (£50k tax relief + £25k-ish loss relief) or something ridiculous, don't they? Makes me wonder why this hasn't made UK capital flow more readily.
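
Roughly, assuming additional-rate (45%) loss relief; these are back-of-envelope figures, not official ones:

    investment = 100_000
    income_tax_relief = 0.50 * investment        # SEIS: 50% back up front = 50,000
    at_risk = investment - income_tax_relief     # 50,000 actually exposed
    loss_relief = 0.45 * at_risk                 # 22,500 at a 45% marginal rate
    total_recovered = income_tax_relief + loss_relief  # 72,500, the "~£75k" above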


Spoiler: UK capital flow isn't as bad as some of the doom and gloom comments on this thread are making it out to be


The challenge in the UK at the moment is connecting willing high net worth individuals with entrepreneurs. Even with the tax incentives, and the relatively good incubator-ish organisations like Eagle Hub, there's some enormous disconnect between viable ideas and timely capital to execute on them.

