
What I don't understand is, what benefit is there for 99% of companies to get in on the ground floor of LLMs? If you're not developing your own model, you're effectively just beta testing someone else's model. And if the sales pitch of LLMs being able to do basically anything comes true, wouldn't most companies still get the same benefit if they just wait? It seems like a lot of companies are so terrified of missing the boat that they don't sit down and do actual risk analysis.



I’ll tell you why this happens. You might use ChatGPT for a bit and your initial impressions will be great. It does what I ask of it! You might be aware that it makes mistakes sometimes, but when you use it interactively, you don’t notice them.

Now, if LLMs are just as effective as your experience suggests, they are indeed extremely useful and you absolutely should see if they can help you.

It’s only when you attempt to build a product — and it could be one person writing one Python script — that uses LLMs in an automated way with minimal human input that you really get insight into LLMs’ strengths and their limitations. You realize it could be useful, but you sometimes have to baby it a lot.

How many people get to step two? That’s a select few. Most people are stuck in the dreamy phase of trying out interactive LLMs.

This is a recurring issue with all new technology. Heck, it happens with new software frameworks.


The other problem I find is that LLMs are changing so fast that what you evaluated 6-12 months ago might be completely different now with newer models.

So the strengths and weaknesses can quickly become outdated as the strengths grow and the weaknesses diminish.

The first batch of LLMs people tried in 2023 had a lot of weaknesses. By the end of 2024, we can see gains in speed and in the complexity of output. People are creating frameworks on top of the LLMs that further increase their value. We went from thousands of tokens of context to millions of tokens pretty fast.

I can see myself dividing problems up into 4 groups:

    1. LLMs currently solve the problem.
    2. LLMs don't solve it now, but we are within a couple of iterations of next-generation models or frameworks being able to solve it.
    3. LLMs are still years off from being able to solve this effectively, so wait and implement it when they can.
    4. LLMs will never solve this.
I think a lot of people building products are in group 2 right now.


Realism eventually sets in and they move to 3 and 4.


This definitely resonates, but I'm left wondering why there hasn't been a collective "sobering up" on this front. Not on a personal/team/company level, but just in terms of the general push to cram AI into everything. For how much longer will new AI features assault us in software where they ostensibly won't be that useful?

It seems that the effort required to make an LLM work robustly within a single context (spreadsheet, Word doc, email, whatever) is so gargantuan (honestly) that the returns, or even the initial manpower, wouldn't be there. So any new AI feature feels more or less like bloat, and if not fully useless, then at least a bit anxiety-inducing in that you have no clue how much you can rely on it.


Very few managers get quick promotions for NOT rolling out a high-visibility AI enhancement. LLMs can theoretically fit into an amazing diversity of products. Even if just 10% of managers say yes and the other 90% say no, that's still a lot of shoehorning every year in an attempt to book a “win” for a promotion.


I can tell you that there has been a lot of sobering up — but that the news isn’t made by those people…


Totally. And every time someone sobers up, there is a cabal of people saying "we've sunk however many $$$ into this, it's the core feature of the xx roll-out... drink up, the hype party continues, like it or not..." So now you see phenomena like the one-time premium-tier-subscriber-only feature of Copilot on GitHub now pushed to everyone, prompts to use the generative AI in iStock on every page, compulsory "use Copilot to write your draft" prompts on every new doc in MS Word - because I don't think companies are able to grok the widespread disinterest in much of it. I'm still waiting for one that will be non-networked and sit on my desktop to do my tax returns and haggle with phone company bots.


I said the same thing to a previous company before I was let go. I was confused why they were butchering their business strategy in favor of a gold rush.

The main benefit of LLMs was already abundantly clear: literally just chat with it in day to day work when you can. Ask it questions about accounting, other domains it knows, etc. That's like up to 10-20% performance increase on tasks if you align OK.

Still, they were in search of a unicorn, and it was really tiring to be asked regularly how AI could help my workflows. They were not even spending a real budget on discovering "groundbreaking" use cases, meanwhile hounding us to shove a RAG-bot into every product they owned.

The only thing that made sense was that it was a marketing strategy to promote visibility, but they would not acknowledge that or tell us that directly (but still--it was not their business strategy to get NEW customers).


> The main benefit of LLMs was already abundantly clear

In my industry the main benefit (so far) is taking all of our human-legible unstructured data and translating it into computer-legible structured data. Loving it.
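
To give a flavor of what I mean - this is a made-up sketch, not our actual pipeline, and callLLM is a hypothetical stand-in for whichever provider API you use:

    // Hypothetical helper: send a prompt to your LLM provider of choice
    // and return the raw text of its reply.
    async function callLLM(prompt) { /* provider-specific */ }

    // Turn a free-text support email into a structured record.
    async function extractTicket(emailBody) {
        const prompt =
            'Extract these fields from the email below as JSON with keys ' +
            '"customer", "product", "issue", "severity" (low|medium|high). ' +
            "Reply with JSON only.\n\n" + emailBody;
        const reply = await callLLM(prompt);
        return JSON.parse(reply); // still needs validation before anything downstream trusts it
    }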


Are you able to talk more about that? I’m curious what costs are when you run this at scale. We paid a firm $60k to write a custom parser. We parse around 50,000 pages/month. The parser is 100% accurate and has near $0 continuing costs.


How do you do quality control?


> The main benefit of LLMs was already abundantly clear: literally just chat with it

“It is valuable because you can talk to it” is the same idea that drove a tidal wave of sales for Furby in 1998


> Ask it questions about accounting, other domains it knows

Be very careful here if you're using it for anything important! LLMs are quite good at answering questions about accounting in ways which are superficially convincing-looking, yet also complete nonsense. "But the magic robot told me it was okay" will not fly in a tax audit, say.


Exactly my immediate reaction. Accounting has to follow very strict rules and needs some application of judgement.

It might answer questions in a useful way, but you have to make sure you understand the answers and that they match accounting standards or tax rules (and one danger, at least in some places, is that they are different and you might apply the wrong one).


Unfortunately, everything I've asked any of the main LLMs about where I actually knew the precise answer, they were either wrong or half-right while excluding important context.


I couldn’t be arsed typing a reference number into my online banking for a bill payment the other day, and it was a copy-protected PDF, so I fired a screenshot into Claude and GPT and asked them to extract the details I needed, and both of them repeatedly got the OCR wrong.

I don’t trust these at all for anything apart from code which I can at least read/rewrite.

It’s quite nice for unit tests I guess. And weird k8s manifests you only write now and again, like batch/v1 CronJob or whatever.

I’m not panicking about my job just yet..


I needed to normalise a big list of dates recently. I thought maybe GPT could help. It spat out a list of normalised dates which, after a bit of careful reading, were about 95% right.

How can you trust a tool that's right 95% of the time? In the end I wrote a script which handled edge cases explicitly. That took a little bit longer, but the output is deterministic. It took less time than manually cross-referencing the output and input would have.
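
For a rough idea of the script (the formats here are made up - mine were messier), it was something like:

    // Normalise a few known date formats to ISO 8601 (YYYY-MM-DD).
    // Anything unrecognised throws instead of being silently guessed at.
    const MONTHS = { Jan: "01", Feb: "02", Mar: "03", Apr: "04", May: "05", Jun: "06",
                     Jul: "07", Aug: "08", Sep: "09", Oct: "10", Nov: "11", Dec: "12" };

    function normaliseDate(raw) {
        const s = raw.trim();
        let m;
        if (s.match(/^\d{4}-\d{2}-\d{2}$/)) return s;                           // 2024-03-07
        if ((m = s.match(/^(\d{1,2})\/(\d{1,2})\/(\d{4})$/)))                   // 7/3/2024, day first
            return `${m[3]}-${m[2].padStart(2, "0")}-${m[1].padStart(2, "0")}`;
        if ((m = s.match(/^(\d{1,2}) ([A-Z][a-z]{2}) (\d{4})$/)) && MONTHS[m[2]]) // 7 Mar 2024
            return `${m[3]}-${MONTHS[m[2]]}-${m[1].padStart(2, "0")}`;
        throw new Error(`Unrecognised date: ${raw}`);
    }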

I tried asking GPT to write the conversion script instead, but the script it generated just didn't deal with the edge cases. After a few rounds of increasingly specific directions which didn't seem to be helping, I gave up.

I've been using copilot for development work. It has some magic moments, and it can be great for boilerplate. But then it introduces subtle bugs which are really hard to catch in review, or suggests completely incorrect function signatures and I wonder if it's adding very much at all.

The biggest problem with these tools is that they turn a fun problem solving exercise into an incredibly tedious reviewing exercise. I'd much rather do it myself and understand it fully than have to review the unreliable output of an LLM. I find it much simpler to be correct than to find flaws in other people's work.

Am I missing something?


Erk. I’d actually kind of assumed that the likes of ChatGPT would offload OCR to, well, conventional OCR, which is, basically, a solved problem (possibly the only ‘AI’ thing which can be considered so).


Agree, some domains are a big gray area. I used it to understand some accounting jargon while diagnosing a software issue, but anything that is not very commonplace is littered with misinformation, so you have to double-check. I find it more useful for gaining a direction or lead on a topic rather than answers directly.

Of course many executives don't deal with the obscure stuff day to day (e.g. regular stuff people actually get paid to deal with) and think that LLMs can turn anyone into a superhero overnight :) The number of times we were told that we were "putting our heads in the sand" regarding the advantages of AI was very annoying.


Talking about what “companies” are doing is misleading because it suggests that the actions taken will be driven by the medium to long term interests of the company.

In reality, the decisions are made by individual executives and managers with their own interests in mind who are being asked by the people they report to what they’re doing in AI. And this goes all the way to the top where the CEOs are basically required to tell shareholders how they’ve implemented AI and how it’s helping them a bunch.

One of the nice things about being in AI right now is that your customers will advertise it and lie about how useful it has been.


Gotta check the “Does your product have AI” box on the Gartner Magic Quadrant survey.


Stock prices go up for shareholders when the C-suite declares they are integrating "AI". This is a well-made, short video about the strategy, but the short of it is: "AI integration is not for the sake of employees, but investors' stock price". https://www.youtube.com/watch?v=6Lxk9NMeWHg


We're still in the buzzword stage where "AI" increases sales more than it decreases them, regardless of what your product is, whether it actually uses AI, or what it uses it for. So if your business sells pickles, better add "now with AI!" because more people will buy your pickles if they have AI than will stop buying them.

(This comment is mainly facetiousness borne out of frustration, but the point stands)


Pure anecdote but in the meetings I have been in eyes glaze over when the SaaS salesman mentions AI in a generic way without providing any examples of how we can use it. The only way it gets any traction is when the technical wingman can answer questions about how it will account for required business logic and permissions.


LLMs are a tool, though, so there's a benefit to getting in early and gaining experience using the tool. Companies need to either train up people on a tool or buy that expertise with the tool. Not to mention all the LLM-adjacent tools as well. It's a big, messy, wide field of new software things at this point.


Sure, but does it make sense that your cubicle row of data entry guys are on the cutting edge of LLM expertise? Because the way LLMs are being pushed into every spreadsheet software would imply that's something someone is asking for.


Cutting edge of LLM expertise? Just using someone's shitty LLM product doesn't require "cutting edge LLM expertise".


> What I don't understand is, what benefit is there for 99% of companies to get in on the ground floor of LLMs?

"Mr. President - we must not allow an LLM gap!"


The thinking is that the trajectory of LLMs will get them an AI flywheel where they can pump money in and get unlimited amounts of intelligence, either augmenting or replacing human labor for pennies on the dollar. The business 'thought leaders' on this view it as a largely zero-sum game: get there first or watch your business die as someone else beats you to it.

This has a very late-90's vibe and is quite entertaining to watch.


I think you'll find most people in leadership positions at most companies are not that forward thinking, proactive, or frankly intelligent. I thought cost-benefit and risk was analyzed on most big company decisions, until I sat in rooms for a Fortune 500 where those decisions were getting made. If you assume that everyone everywhere is doing just barely the minimum to not get fired, you're right more often than not.


Career risk is also a very real motivation. If you are an executive at a company whose competitors are jumping on the AI bandwagon, but you are not, you will have to justify that decision towards your superiors or the board and investors. They might decide that you are making a huge strategic blunder and need to be replaced. Being proven right years later doesn't do much for you when you no longer have a job. And if you were wrong, then things look even worse for you. On the other hand, if you do get on the bandwagon yourself, and things go sideways, you can always point to the fact that everyone else was making the same mistake.


>What I don't understand is, what benefit is there for 99% of companies to get in on the ground floor of LLMs?

Wall Street. These companies are largely based on increasing the share price. That's what shareholders want, that's what the board wants, and that's what the C-suite, compensated mostly in stock, wants. Nothing really matters except increasing the stock price, and right now the AI bubble is so damn superheated that if you do not announce something AI, you might be punished in the stock market as investors pull their money and put it somewhere that is willing to bullshit in order to collect some of that sweet hype money.

Because the vast majority of the stock market is literally a "greater fool" system. The point is not to make good companies that will enrich you with strong dividends forever, but rather to pump and dump. The entire point of the public market (for public companies) is that you can bullshit, pump up your stock price, and then dump those pumped up stocks onto a fool because you know the company isn't nearly as valuable as the stock says it is.

Private investors are so goddamned wealthy nowadays that it makes zero sense to engage with the public stock market, and the regulation and demands that it requires of you, unless you are trying to dump on bagholders.

Your average trader is so goddamned stupid you can literally run a company with no fundamentals and no plan and have a $13 billion market cap. Your entire business can be "sell shares repeatedly to these same losers who are blatantly illiterate, can't do basic math, and are outright gambling addicts" and retire on the free money that these people THANK YOU for taking. You can openly enrich yourself off these idiots, and they will vote for it willingly!

The point of being on the public market is to attempt to take advantage of these people and take their money.


I think companies use them because employees want to use them and there are tons of blog posts (AI-written?) that say it increases productivity and that you'll be left behind if you don't adopt it.

For startups that basically implement a frontend to chatgpt or similar… well they have no chance of ever being profitable but investors might not know that.


IMWO (w: wrong), it's because just riding along FOMO waves is a legit strat. It just doesn't matter whether LLMs or marijuana serve no purpose, or whether they're detrimental or suggested to cause schizophrenia. It's more important for leviathan-class corporates to not be different where it doesn't pay to be different.


> what benefit is there for 99% of companies to get in on the ground floor of LLMs?

Most LLM use at the corporate level is happening through Office 365 where Microsoft has put a Copilot button on everything including Word, Outlook and PowerPoint. Execs didn't necessarily ask for it, it protrudes very conspicuously in the UI now.


First-mover advantage. It doesn't matter if the product works or not, if it doesn't sell because everyone bought your competitor. The product will mature with time, what won't get better with time is your market position if you're seen as being behind the times.


Our competitors are getting in on the ground floor of LLMs.

Do you want to be trying to raise money against them with nothing to tell potential investors about your company's usage of AI?


I think people are laying the groundwork for "true" AI to be plugged in, even if they know it's not currently that effective.


I assume they believe in the advantages, to the company, of using these AI tools. Of adding them to the workflow, training their employees, and understanding quickly how it impacts their space.

If you just wait and all of your competitors have a workforce enabled and equipped with the best tools, you are at a disadvantage. It's being the company that put off computerization or automation when everyone else went wild.

And FWIW, I cannot imagine a programmer, in 2024, that remains dismissive of LLMs or the related tooling. While they are grossly oversold by the LinkedIn crowd, and are still a far ways from so-called "prompt engineers" replacing devs, they're a massive accelerator and "second set of eyes", especially if you're doing varied, novel work.


> And FWIW, I cannot imagine a programmer, in 2024, that remains dismissive of LLMs or the related tooling.

Hi.

> While they are grossly oversold by the LinkedIn crowd,

That is true.

> and are still a far ways from so-called "prompt engineers" replacing devs,

Also true.

> they're a massive accelerator and "second set of eyes", especially if you're doing varied, novel work.

That is not even remotely my experience.

Like I can envision a programmer who would get benefits from it, but bluntly put, the code I work on day to day is far, far too interesting to be handled by copilot, simply because there aren't nearly enough stackoverflow pages about it to be scraped. Honestly if you found yourself able to automate most of your job with copilot, if anything, you have my sincerest condolences. I can't imagine how utterly bored you are in your day-to-day.

If Copilot could get to a place where it could understand, comprehend, and weigh in on code, that would be incredibly useful. But that's not what Copilot is, because that's not what transformers are. They are fancy word-probability calculators. And don't get me wrong, that has uses, but it is nothing I'd be comfortable calling a second set of eyes for anything, save for maybe writing.


It's interesting that there is this division between programmers who claim LLMs are super helpful, and those saying they are useless.

While it's certainly possible that this divide is based on how 'hard' the problems are that people are using them on, my current theory is that some people use them like the proverbial rubber duck - in other words, as a way to explore the code and generate some stuff to work on while thinking through the problem.

Personally, I have not yet tried it, so I'm curious which side of the discussion I'll fall on...


I think young programmers who are less heavily invested in their skills and who haven't built a life that's highly dependent on using them are generally more interested in figuring out what programming with LLMs means.

But so are much older programmers who have seen it all, including the obsolescence of many of their skills, and who are not so dependent on continuing to use them as they could retire anyway.

It's more the middle-aged (in programmer years) senior programmers who are less likely to see any use.

I've seen the same pattern with artists' interest in generative AI.

But it's complicated because it IS also dependent on what you're doing. So it's hard to know if something is being dismissed correctly due to domain/expertise, or prematurely due to not putting the work in and figuring out what these tools mean.


This really touches on it. I'm a big advocate of these tools, but they author approximately zero lines of my code, yet I still find them invaluable and a wonderful tool to leverage, and do so constantly. Particularly in challenging projects and needs.

I suspect many who find them useless and decry them were sold an exaggerated utility and then were disappointed when they tried to generate libraries or even functions, then feeling deceived when there are errors or flaws, etc.


> I suspect many who find them useless and decry them were sold an exaggerated utility and then were disappointed when they tried to generate libraries or even functions, then feeling deceived when there are errors or flaws, etc.

No, I suspect the large majority (and this has been backed by surveys) of people that are dismissive of them are more senior and have been working in highly specific problem domains for a long time where this is rarely/never a good "general" answer for a problem, and have spent an inordinate amount of time debugging LLM-generated or LLM-Advised code by their peers that contains nefarious and subtle errors that look correct at a glance. I personally can tell you that for what I work on, in my domain, these tools have been a net time suck and not a gain, and I pretty much only use them to ask questions about documentation, which it often gets incorrect anyway (again, in subtle ways that are probably hard for someone who isn't very senior to detect).

Hope that helps.


Yes, absolutely, they’re an ideal rubber duck and I’ve come to really value them for my work. Checking your sanity, pondering how certain operations might be implemented or how they could be optimized, finding where a logic bug might be in a snippet of code…


> It's interesting that there is this division between programmers who claim LLMs are super helpful, and those saying they are useless.

My take is: if the project is doing something that has been asked a thousand times on stackoverflow and has hundreds of pages in the tutorial content mills, the LLM will tell you something reasonably meaningful about it.

I'd hazard a guess that most people overenthusiastic about those tools are gluing together javascript libs.

This is not necessarily a bad thing, we even asked a LLM today at work to generate some code for a library that we didn't know how to use but seems fairly popular, and the output looked like it would make sense. (Can't tell you how it ended up because I wasn't the one implementing the thing.)

However, we also spent 2 hours in a group debugging session because we're working on a completely custom codebase that isn't documented anywhere on geeksforgeeks, stackoverflow or anywhere else public. I highly doubt that even a local LLM would be able to help, and no way this code is leaving the premises.


>if the project is doing something that has been asked a thousand times

There are many billions of lines of high-quality, commented code online, covering just about everything. Millions of projects. All of Linux. All of Android. All of PGSQL and SQLite and MySQL and Apache and Git and OpenSSL and countless encryption libraries and countless data tools, video and audio manipulation, and...

Every single project is absolutely dominated by things that have been done many, many thousands of times. The vast bulk of your projects have zero novelty. They're mixing the same ingredients in different ways. I would think any experienced developer would realize this.

>I'd hazard a guess that most people overenthusiastic about those tools are gluing together javascript libs.

At this point it's comedy how often this "oh, I understand that the noobs get value from this, but not us Advanced Programmers" comes up. It's absurdist and honestly at this point I just shake my head. My day is filled with C++, Python, Rust, Go, the absolute cutting edge of AI research, and I find these tools absolutely invaluable now. They are a massive accelerator. Zero JavaScript libs or "LOL WEB DEV" programming in my life.


Yes, you mentioned those things that are documented everywhere. I do use LLMs to give me skeleton code for those parts I'm not familiar with.

How about a full equivalent of Qt that is proprietary and has absolutely nothing public in it? How is a LLM going to help with that? There is no public info anywhere.

> the absolute cutting edge of AI research

No offense but there are billions of public pages about "AI" research since it's the new gold rush. Of course LLMs have material about all your libs.


>but there are billions of public pages about "AI" research

Billions? For many of the things I am working on there are zero public pages outside of research papers. I said nothing about working with libs. Again, I'm not asking an AI "here's my project now finish it", I'm working with AIs for the countless little programming challenges and needs. Things that mirror things done in many, many other projects, most having nothing to do with my domain.

As an aside, starting that with "no offense" as an attempt to make it insulting is...weird.

I feel like this discussion is taking place ten years ago. The weird reference to StackOverflow is particularly funny.


Some people seem to be taking me saying "you must work on boring code" as a judgement against them as developers, and it isn't. I'm speaking directly from my experience: If I asked it beginner-tier questions about how to do X in Y language, it would get those right quite often. I could see Copilot being very useful if you're breaking into a new language, or just knocking the rust off the gears of one in your head.

And like, even for those who write a lot of boring code, like... cool man. I don't judge people for that. We need all code written and all code is not exciting, novel, or interesting and there's nothing wrong with doing it. Someone's gotta.

I'm just saying that the further up the proverbial complexity chain I went, the less able Copilot was. And once I was quite in the weeds, it seemed utterly perplexed and frankly, not worth the time in asking.


>as a judgement against them as developers

No one takes it as a judgment, and no one is offended. It's just a truth that when people make such claims, they're often exaggerating the uniqueness or novelty of what they're doing.

You described your work in another comment, and what you described is the most bog standard programming in the field. It's always the case.


Yeah, at least 90% of any job is just making license plates. I have worked on both very complex and challenging code and also very simple and easy code in my career, even within one job.


There is a lot to unpack here, but is your code so interesting that it defies understanding by an AST? Code models are trained to semantically represent code, so unless you use semantics that exist outside of software engineering, the claim that your code is too unique for an LLM is false.

Maybe you are imagining a case where the entire codebase is generated by a single prompt?


An abstract syntax tree is not semantics. And language models don’t do this kind of explicit syntax parsing and representation at any rate.


I'll admit I haven't seen the training data but some basic googling shows a subset of the labeling is syntax annotations. I am not claiming LLMs parse code in the way you are suggesting, but they certainly have token level awareness of syntax and probable relations which are the roots of any programming language.


Last time I researched it, AST parsing was quite rudimentary for LMs. The problem was preserving their structural properties, which a flattening approach via node traversal tended to remove. But you needed to do that to put it into a format that language models could parse.


The shorter and also "not breaching confidentiality" answer I can give is we're dealing with setting up custom sockets over incredibly long range wireless connections that require clear and verified transmission of packets of data, rolling both our own messaging protocol and security features as we go.

When last I tried anyway, Copilot was, frankly, useless.


Fwiw, copilot is not a particularly powerful LLM. It's at most glorified smarter autocomplete. I personally use LLMs for coding a lot, but Copilot is not really what I'd have in mind saying that.

Rather, I'd be using something like the Zed editor with its AI Assistant integration and Claude Sonnet 3.5 as the model, where I first provide it context in the chat window (relevant files, pages, database schema, documents it should reference and know) and possibly discuss the problem with it briefly, and only then (with all of that as context in the prompt) do I ask it to author/edit a piece of code (via the inline assist feature, which "sees" the current chat).

But it generally is the most useful for "I know exactly what I want to write or change, but it'll take me 30 minutes to do so, while with the LLM I can do the same in 5 minutes". They're also quite good at "tell me edge-cases I might have not considered in this code" - even if 80% of the suggestions it'll list are likely irrelevant, it'll often come up with something you might've not thought about.

There's definitely problems they're worse than useless at, though.

Where more complex reasoning is warranted, the OpenAI o1 series of models can be quite decent, but it's hit or miss, and with the above prompt sizes you're looking at $1-2 per query.


>Hi.

I was being dismissively rhetorical. I can actually imagine them because we see them on HN constantly, with a Luddism that somehow actually becomes a rather hilarious attempt at superiority. A "well if you actually get use out of them, I guess you're just not at my superior level of Unique Projects and Unique Code". It's honestly just embarrassing at this point.

>but bluntly put, the code I work on day to day is far, far too interesting to be handled by copilot

Hilarious. It's a cliche at this point.

Let me take it further and turn this on its head: The people who usually think LLMs aren't valuable to coders generally work on the most boring, copy/paste prattle. They stick to the same tiny niche all day every day. They're basically implementing the same thing again and again. It's so rote, and they're so profoundly unchallenged, that tooling has no value.


>with a Luddism that somehow actually becomes a rather hilarious attempt at superiority

>The people who usually think LLMs aren't valuable to coders generally work on the most boring, copy/paste prattle.

Here you are doing the same thing, aren't you?

Instead of calling people names, the biggest tell of a weak argument, why don't you explain the type of work you do and how using an LLM is faster than if you coded it yourself, and/or faster than any current way of doing the same thing.

I'm assuming you are a senior+ level coder.


But...I'm not doing the same thing. In actuality I'm saying I'm a fairly typical programmer in a common situation: I work across a variety of languages and platforms and toolings and projects, building solutions for problems. The truth is that extraordinarily few programmers are working on anything truly novel. Zero of the readers of this comment are, in all likelihood. The ridiculous notion that someone has so unique of a need that it hasn't been seen is kind of hilarious nonsense. It's the "I'm so random! Other girls aren't like me" bit.

>Instead of calling people names

Who called anyone a name? Luddism? Yes, many HN participants are reacting to AI in a completely common rejection of change / challenge, and it recurs constantly.

>how using an LLM is faster than if you coded it yourself

I am coding it myself. Similar to the other guy who talks about putting an LLM in "charge" of his precious, super-novel code, you're setting up a strawman where using an LLM implies some particular scenario that you envision. In reality I spend my day asking questions, getting broad strokes, getting code commented, asking for API or resources, etc.


>Who called anyone a name? Luddism?

Sorry, perhaps I misinterpreted it.

>In reality I spend my day asking questions, getting broad strokes, getting code commented, asking for API or resources, etc.

Can you give me some concrete examples? I'd like to use it, but I'm currently of the mind:

  1. If it's boring code, I can write it faster than asking LLM to do it and fixing its issues.

  2. If it's not boring code, like say a rules engine or something, I'm not sure the LLM will give me a good result based on the domain.
I mainly stick to back-end work, automation, building Web APIs and DSS engines for the medical field.

Maybe I'm under and over thinking it at the same time. FWIW, I typically stick to a single main language, but where I usually work, the companies dictate a GP language for all our stuff: C# in my example. I do a small amount of Python for LLM training, but I'm just starting out with Python. I can see it being useful saying, "convert this C# to Python," but honestly, I'd rather just learn the Python.


> Who called anyone a name? Luddism? Yes, many HN participants are reacting to AI in a completely common rejection of change / challenge, and it recurs constantly.

You should read up on what Luddism and the Luddites were actually about. They didn't think the machines were evil or satanic, which is the common cultural read. They assumed (correctly) that the managerial class would take full advantage of the increased productivity of lower-quality goods to flood the market with cheap shit that would put competitors out of business, and let them fire 4/5 of their workforces while doing so. And considering the state of the textile industry today, I think that was a pretty solid set of projections.

Luddites didn't oppose automation on the basis that machines are scary. They were the people who worked the machines that already existed at the time, after all. They opposed them on the basis that the greedy bastards who owned everything would be the only ones actually benefiting from automation, everyone else would get one kind of shaft or another, which again: is exactly what happened.

This, actually, is incredibly analogous to my opinions about LLMs. It's an interesting tech that has applications but is already being positioned to be the sole domain of massive hyperscalers and subject to every ounce of enshittification that follows every tech that goes that way, while putting creatives, and yes some coders, out of a job.

So yes, it was name calling, but also I don't object to the association. In this case, I'm a Luddite. I am suspicious of the motivations and the beneficiaries of automation being forced into my industry and I'm not going to be quiet about it.


> the greedy bastards who owned everything would be the only ones actually benefiting from automation, […] which again: is exactly what happened.

What also happened is that everyone can buy clothes incredibly cheaply. Which seems like a widespread benefit.


>And considering the state of the textile industry today, I think that was a pretty solid set of projections.

I think it's just about all industries these days.

Yes, so many quote meanings have been malformed over the years: "a rolling stone gathers no moss" is now considered good, while originally it was bad. "Blood is thicker than water," "Money is the root of all evil," etc.

The Luddites were right.


>You should read up on what Luddism and Luddists were actually about.

They were primarily opposed to automation because it devalued the work they did and the skills they held. That is the core essence of Luddism. They thought if they destroyed the machines, automation could be stopped. There were some post-facto justifications like product quality, but if that was true they'd have no problem out-competing the machines.

Yes, it is Luddism that drives a lot of the AI sentiment seen on HN, and it is utterly futile, basically people convincing themselves and each other while the world moves on. There is no "name calling", and that particular blend of pearl clutching is absurd.


IMHO a lot of the "Luddism" so labeled by pro-AI bros is just people furious about the shoddy artifacts that genAI produces. It compares to the original Luddism, the difference being that the original 19th-century opposition to the industrial revolution was eventually proven wrong by improved quality, whereas genAI hasn't managed that.


I suspect you're understating the degree to which an LLM might be unsuitable for some types of work. For example, I'm a data scientist who works primarily in the field of sales forecasting. I've found that LLMs are quite poor at this task, frequently providing answers that are inappropriate, misleading, or simply not a good fit for the data we're working with. In general I've found very limited use in engaging LLMs in discussion about my work.

I don't think I'm calling myself a super special snowflake here. These models are just ... bad at sales forecasting.

LLMs aren't entirely useless for me. I'll use ChatGPT to generate code to make plots. That's helpful.


I would never recommend an LLM for sales forecasting. It's just the wrong tool for that job.


You seem to readily agree that my use case is inappropriate for LLMs, but not ToucanLoucan's?


Zero LLMs have been trained on doing sales forecasting to my knowledge (and it isn't the right use regardless). In contrast, many LLMs have been trained on enormous quantities of code, coding languages and platforms and uses. Billions and billions of lines of code covering just about every sort of project. Millions of projects.

If someone says "Well my software dev project is too unique and novel and they are therefore of no value to me, but I understand it works for those simple folk with their simple needs", there is an overwhelming probability they are...misinformed.


Would it help if I said most “normal folk” applications of LLMs are a waste of time and money too then? Because I’m also absolutely a believer that a huge bubble burst is coming for OpenAI and company.


That was a whole lotta words to say "Nuh uh." and as such I don't really have a response.


> especially if you're doing varied, novel work.

What do you mean by that? In my experience they mostly help when you are doing mainstream, well-trodden work for which a company’s/project’s/domain’s inside knowledge isn’t needed. Maybe you mean the latter by “novel”?


>What do you mean by that?

Varied and novel to the programmer. If you're doing the same thing day in and day out it probably isn't much use. If you're like many programmers and you're jumping between libraries and languages and platforms and APIs and domains and spheres, it's immensely helpful.

>doing mainstream, well-trodden work for which a company’s/project’s/domain’s inside knowledge isn’t needed

A core disconnect in this discussion is that many people seem to be arguing from the position of "I tried to generate whole solutions with these tools and it failed, so it's useless".

Everything is well-trodden. One of the commentators in here, who declares themselves a unique snowflake where these tools are useless, does "sockets with encryption and messaging", which is some of the most well-trodden ground in this domain. Everything is glue. Everything is a lot of relatively simple things strung together. And for all of those, portions are helped and accelerated with these tools.


It's alright that you like LLMs but you don't need to name-call ("luddites", "snowflakes") those who don't find them as useful as you do.


Neither case are name-calling, and I don't think pearl clutching for effect is useful here.

Everyone is working in unique domains and spaces with weird rules and restrictions and needs. The whole point of the special snowflake comment is specifically that people aren't remotely unique in being a special snowflake.

And the Luddism argument isn't "name-calling", it's an observation of an absolute truth on here.


Everything isn’t well-trodden. I’m mostly doing non-glue, heavily domain knowledge-based work. The main issue I’m running into is that explaining the domain and context to the LLM will take significantly more time than just doing the work myself, and also that parts of the necessary knowledge and most of the source code are NDA-protected, so only a local LLM would do.


Sigh.

Everything is extremely well-trodden at the code level. You're taking some inputs and generating some outputs. You're calling some functions. You're manipulating some strings or lists or sets. You're sorting some things or filtering things. You're building a client to a given API. You're doing some messaging and some encryption.

I guarantee that your code isn't remotely as unique or novel as you think it is. Of course on here everyone is 240lbs, 6'2" and benches 350, and their magically novel, super unique code is just too secret. So everyone's special.

If you had to explain the domain and context to these tools, you might be using them wrong.


"It's all just code, how hard can it be, and your NDAs don't apply" is certainly a take, but some people do actually solve problems whose solutions aren't already on the Internet and are forbidden from exfiltrating code! The fact that you don't solve such problems and are not bound by such NDAs is not a huge piece of evidence, to be honest.

Can you imagine even in principle a piece of evidence that would convince you otherwise?


Who said that NDAs don't apply? If you need to strawman to make an argument, just save the bits and skip doing it.

And yes, in the end all code distills down to shit that is very similar to the countless billions of lines of code on the internet. Everything -- every single project that anyone on this site is working on -- is a bunch of glued together shit where 90% (more like 99% Probably 99.9%) of it is in common with code seen in countless other projects. Utterly regardless of domain or business or project specific uniqueness. Someone would have to be profoundly incompetent to not realize this.

Again, I seem to be arguing with people who think that using a tool means feeding it their whole projects and having it rewrite them in Rust or something. In reality a programmer could yield immense value from such tools having a) given it zero lines of their code, b) used zero lines of the code it generated. This isn't a difficult concept, but I'm going to continue getting insane responses by all of the unique people who work on the amazingly unique situations where instead of sorting from A-Z, they sort from Z-A!


You certainly strongly implied that NDAs don't apply with the phrase "their magically novel, super unique code is just too secret". You used the words "is just too secret" sarcastically, which led me to believe you think in fact the work is not too secret to exfiltrate to the LLMs on the Internet: that is, that NDAs don't apply.

> In reality a programmer could yield immense value from such tools having a) given it zero lines of their code, b) used zero lines of the code it generated.

We simply have different experiences of life! I think with the advent of Claude 3.5 Sonnet the LLMs have just about edged out ahead in terms of time saved vs time wasted, for me, but before Sonnet I'm fairly confident they were moderately net negative for me.

Can you give some concrete examples of where they've helped you this dramatically? With links to chat logs? I still don't understand how people are finding them so useful, and I keep asking people and they keep not providing chat logs so I can see what they're doing.


I did not imply that. I implied that people who have projects for which these tools are purportedly useless can never tell you anything about the project, even remotely, because that makes it impossible to note that it, like most projects, is likely overwhelmingly well-trodden ground with a mild remix. Loads and loads of super proprietary projects working in very unique domain spaces have code that is overwhelmingly in common with very different domains and spaces. Medical, financial, ballistic, gaming, data, etc. -- loads and loads of the code we build for these domains is common.

If someone can't use these tools for security or propriety reasons, that's an obvious hard restriction. But saying "oh my code is just too unique, my skills too advanced" is self-deluding nonsense.

>I think with the advent of Claude 3.5 Sonnet the LLMs

We are talking about now. The present. I'm stating the value of these tools now. Their state in the past is irrelevant.

>Can you give some concrete examples

Sorry, NDAs, secret code, secret projects, et al


If you want something concrete: I have never yet seen any even slightly competent code come out of an LLM when it's writing in F#, Mathematica, or POSIX sh, which are broadly the languages I write in. (It's possible Wolfram's very new LLM does better on Mathematica because it's been trained on it specifically, but I haven't tried it.) I've only had Sonnet for a day (having been burned too many times by other LLMs, but decided yesterday it was time to try again), so am not in a great position to comment on Sonnet's abilities with those languages specifically. My preliminary guess is that we've gone from 4o's "20% correct" to Sonnet's "50% correct".

> Their state in the past is irrelevant.

Apparently also their state in the present? I said "just about edged out ahead", whereas you say "immense value".


>Apparently also their state in the present?

I find a number of models fantastically useful in the present. I'm not sure why your one day evaluation of them is relevant to how I utilize them.

It was inevitable that someone was going to do the "but I insist upon writing in some fringe language" bit (when the "my project and domain is too unique" bit faltered), but many models actually have excellent F# abilities. Eh.


I'm not sure why your use of them is relevant to how I can use them; resolving that question is why I asked you for chat logs! I genuinely want to make these things work! I try again every few months in case anything has changed, and I keep asking people how they're getting value so that I can also get value, and nobody ever tells me!

> many models actually have excellent F# abilities

Name three? As I said, I badly want this stuff to work!


Claude has been pretty good at F# so far. Sometimes it makes mistakes as if it were OCaml, but other than that I found the output actually better than for C#. That is - the C# output has a higher likelihood of being correct, but because there is so much absolutely terrible code out there, you have to coerce it to give you good output, which you don't need to do with F#.

And of course Github Copilot autocomplete works well just as it does for most other languages.


Novel to the programmer.

Even the best programmers have very narrow skills relative to the whole field.

Yeah, I've used 20+ languages and hundreds of technologies and a variety of different types of product and can pick things up quickly. But it's still a drop in the bucket of technologies you can use and types of problem you can solve.

Programmer skills are deep but narrow. LLM skills are shallow but wide. It's an excellent complement for any programmer working outside their deep+narrow expertise.


Gluing together react components /s


ChatGPT thinks 9.11 > 9.9. I'm in no hurry.


I spend way too much of my working life with package version ranges. It took me a minute to understand why this was wrong.


ChatGPT knows about other domains (e.g. software versions) where that inequality is true. Try telling it you’re doing arithmetic.


> ChatGPT thinks 9.11 > 9.9

I've confirmed this. I asked ChatGPT: "9.11 > 9.9, true or false?" and it replied:

True because .11 is greater than .9


Even when ChatGPT starts getting these simple gotcha questions right it's often because they applied some brittle heuristic that doesn't generalize. For example you can directly ask it to solve a simple math problem, which nowadays it will usually do correctly by generating and executing a Python script, but then ask it to write a speech announcing the solution to the same problem, to which it will probably still hallucinate a nonsensical solution. I just tried it again and IME this prompt still makes it forget how to do the most basic math:

Write a speech announcing a momentous scientific discovery - the solution to the long standing question of (48294-1444)*0.3258
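
For reference, the correct answer it keeps fumbling is easy to check by hand:

    (48294 - 1444) * 0.3258 = 46850 * 0.3258 = 15263.73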


4o and o1 get this right.

LLMs should never do math. They shouldn't count letters or sort lists or play chess or checkers. Basically all of the easy gotcha stuff that people use to point out errors are things that they shouldn't do.

And you pointed out something they do now, which is creating and running a Python script. That really is a pretty solid, sustainable heuristic and is actually a pretty great approach. They need to apply that on their backend too so it works across all modes, but the solution was never just an LLM.

Similarly, if you ask an LLM a chess question -- e.g. the best move -- I'd expect it to consult a chess engine like Stockfish.


> LLMs should never do math. They shouldn't count letters or sort lists or play chess or checkers.

But these aren't "gotcha questions", these are just some of the basic interactions that people will want to have with intelligent assistants. Literally just two days ago I was doing some things with the compound interest formula - I asked Claude to solve for a particular variable of the formula, then plug in some numbers to calculate the results (it was able to do it). Could I have used Mathematica or something like that? Yes of course. But supposedly the whole purpose of a general purpose AI is that I can use it to do just about anything that I need to do. Likewise there have been multiple occasions where I've needed ChatGPT or Claude to work with tables or lists of data where I needed the results to be sorted.


They're gotcha in the sense that people are intentionally asking LLMs to do things that LLMs are terrible at doing. LLMs are language models. They aren't math models. Or chess models. Or sorting or counting models. They aren't even logic models.

So early on the value was completely in language. But you're absolutely correct that for these tools to really be useful they need to be better than that, and slowly we're getting there. If a math question is a component of your question, the system should first delegate that to an appropriate math engine while performing a series of CoT steps. And so forth.


If this stuff is getting sold as a revolution in information work, or a watershed moment in technology, or as a cultural step-change, etc, then I think the gotcha is totally fair. There seems to be no limit to the hype or sales pitch. So there need be no bounds for pedantic gotchas either.


I entirely agree with you. Trying to roll out just a raw LLM was always silly, and remains basically a false promise. Simply increasing the number of layers or parameters or transformer complexity will never resolve these core gaps.

But it's rapidly making progress. CoT models coupled with actual domain-specific logic engines (math, chemistry, physics, chess, and so on) will be when the promise is actually met by the reality.


With general mathematical questions, I've often found WolframAlpha surprisingly helpful.


o1 gets this correct.


And here lies the dichotomy of correctness: Context?

So indeed, 9.11 is chronologically higher than 9.8, and chronology is an extremely common use case.

However a grade F will be given by many.


9.11 > 9.9 is true for software version numbers. For floating point numbers that is false.

ChatGPT 4o gets both of these cases correct for me.
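
A quick sketch of the two readings (the versionGreater helper is just illustrative):

    // As decimal numbers:
    console.log(9.11 > 9.9);                    // false

    // As version numbers, compared segment by segment:
    function versionGreater(a, b) {
        const as = a.split(".").map(Number);
        const bs = b.split(".").map(Number);
        for (let i = 0; i < Math.max(as.length, bs.length); i++) {
            const x = as[i] ?? 0, y = bs[i] ?? 0;
            if (x !== y) return x > y;
        }
        return false;
    }
    console.log(versionGreater("9.11", "9.9")); // true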


It's weird: with "is the following statement about floating point numbers true: 9.8 > 9.11" it works, but otherwise it has no ability to do it with "decimals".


JavaScript thinks that 11 < 3, but it's still kinda useful anyway from time to time:

    > [11,9,1,3].sort()  
    [ 1, 11, 3, 9 ]


If you want to know whether javascript thinks `11 < 3`, then just evaluate it directly. There is lots of dumb stuff in JS IMO, but be honest about it.


Sure, the real gotcha here is that this:

    const list = [3, 11]
    list.sort()
    console.log(list[0] < list[1])
logs `false`. JavaScript doesn't "think" anything but its native sort function doesn't do what many people expect it would when called on a list of pure numbers.

If you found this behavior intuitive and unsurprising the first time you saw it, then your brain works differently than mine.

If this happens to be new to you (congrats on being one of today's 10,000), the reason is that JavaScript sorts by converting all elements to UTF-16 strings, except for `undefined` for some reason, which always sorts last. The MDN docs have a very clear explanation. I have been unable to find a historical explanation of why this choice was made, but I presume the initial JS v1 author either had a good reason or just really didn't expect that their language would outlive the job for which it was written.

If this is a bug for you, you can provide a comparison function explicitly, and the typical one is something like:

    list.sort((a,b)=>a-b)
Somewhat puzzlingly, this will "work" even on lists with mixed numbers and strings like:

    const list = [3, "11", 1, "2", 9, 23]
    list.sort((a,b)=>a-b)
    console.log(list)
    
    [ 1, '2', 3, 9, '11', 23 ]
Because JavaScript goes out of its way to make comparisons like 3 < "11" or "3" < 11 work in the numeric domain. JS only uses string comparison when both sides are strings.


I do not think it's intuitive. The reason it works for mixed arrays is that the minus operator coerces its operands to numbers. However, if the strings fail to convert to a numeric value, the output is arguably even less sensible. I'd imagine that's why the default is string comparison.

It may have been thought that JS would be more likely to be dealing with string arrays.


You're getting downvoted because your blatant attempt at language wars has a very simple, logical explanation. If you wanted to use a 'gotcha', there are far better examples.


I was not making an attempt at language wars. I think JS is a perfectly fine language for what it does, warts and all. I was being a bit flippant with my language, but my intent was to point out that `9.11 > 9.8` is not just an LLM thing, and that people who are quick to dismiss LLM usefulness based on contrived math examples do not apply the same rationale to other systems.

I do think that JavaScript's choice to sort numbers lexicographically instead of arithmetically is a bit silly, but of course no language is free from warts. Of course they cannot change it now, because that would break the web. `JSON.stringify` is also pretty silly while we're at it, but Python's `json.dumps` is no better.


? My calculator does too. Unless you mean (9, 11) > (9, 9) which is an entirely different thing.


You should get your calculator checked. 9.11 is definitely less than 9.9


I imagine they are not from an anglophone country and see 9.11 as 9*11


English is my 3rd language but I still made a huge mistake :D


Lol, i'm an idiot :D


Imagine it. I’m dismissive. The current crop are mediocre interns.



