Things we learned about LLMs in 2024 (simonwillison.net)
942 points by simonw 3 days ago | 563 comments





About "people still thinking LLMs are quite useless", I still believe that the problem is that most people are exposed to ChatGPT 4o that at this point for my use case (programming / design partner) is basically a useless toy. And I guess that in tech many folks try LLMs for the same use cases. Try Claude Sonnet 3.5 (not Haiku!) and tell me if, while still flawed, is not helpful.

But there is more: a key thing with LLMs is that their ability to help, as a tool, changes vastly based on your communication ability. The prompt is king: a good one makes those models 10x better than they are with a lazy one-liner question. Drop your files in the context window; ask very precise questions explaining the background. They work great for exploring what lies at the borders of your knowledge. They are also great at doing boring tasks for which you can provide perfect guidance (but that still would take you hours). The best LLMs out there (in my case just Claude Sonnet 3.5, I must admit) are able to accelerate you.


I'm surprised at the description that it's "useless" as a programming / design partner. Even if it doesn't make "elegant" code (whatever that means), it's the difference between an app existing at all, or not.

I built and shipped a Swift app to the App Store, currently generating $10,200 in MRR, exclusively using LLMs.

I wouldn't describe myself as a programmer, and didn't plan to ever build an app, mostly because in the attempts I made, I'd get stuck and couldn't google my way out.

LLMs are the great un-stickers. For that reason alone, they are incredibly useful.


The context here is super-important - the commenter is the author of Redis. So, a super-experienced and productive low-level programmer. It’s not surprising that Staff-plus experts find LLMs much less useful.

Though I’d be interested if this was an opinion on “help me write this gnarly C algorithm” or “help me to be productive in <new language>” as I find a big productivity increase from the latter.


Quick example. I was implementing the dot product between two quantized vectors that have two different min/max quantization ranges (later I changed the implementation to just centered-range quantization, thanks to Claude and to what I'm writing in this comment). I wanted to still do the math with the integers and adjust for the ranges at the end. Claude was able to mathematically scompose the operations into multiplication and accumulation of a sum of integers, with the result adjusted at the end, using a math trick that I didn't know but that was understandable after having seen it. This way I was able to benchmark this implementation and understand that my old centered quantization was no less precise in practice, and faster (I can multiply integers without taking the sum, and later correct for the square of the range factor). I could have done it without LLMs, but probably I would not have tried at all because of the time needed.
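For the curious, the decomposition is roughly the following identity, shown here as a minimal Python sketch of the general asymmetric-quantization trick (an illustration, not the actual Redis code): with x ~= sx*(qx - zx) and y ~= sy*(qy - zy), the cross terms can be pulled out so that only integer multiply-accumulates and two integer sums remain in the hot loop, with a single floating-point adjustment at the end.

    # Illustrative sketch only: dot product of two asymmetrically quantized
    # vectors, x ~= sx * (qx - zx), y ~= sy * (qy - zy).
    def quantized_dot(qx, qy, sx, zx, sy, zy):
        acc = sum(a * b for a, b in zip(qx, qy))   # integer multiply-accumulate
        sum_x, sum_y = sum(qx), sum(qy)            # plain integer sums
        n = len(qx)
        # the range adjustment happens once, outside the integer loop
        return sx * sy * (acc - zy * sum_x - zx * sum_y + n * zx * zy)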

Other examples: Claude was able, multiple times, to spot bugs in my C code when I asked for a code review. All bugs I would eventually have found, but it's better to fix them ASAP.

Finally, sometimes I paste in relevant papers and implementations and ask for variations of a given algorithm across the paper and the implementations around, to gain insight into what people do in practice, then engage in discussions about how to improve it. It is never able to come up with novel ideas, but it is often able to recognize when my idea is flawed or whether it seems sound.

All this and more helps me deliver better code. I can venture into things I otherwise would not do for lack of time.


I'm pretty sure most people, developers especially, have had magical, life-changing experiences with LLMs. I think the problem is that they can't do these things reliably.

I get this sentiment from a lot of AI startups: they have a product that can do amazing things, but its failure modes make it almost useless. To use an analogy from self-driving cars, the users still have to constantly pay attention to the road: you don't get a ride from Baltimore to New York where you can do whatever you please, you get a ride where you're constantly babysitting an autonomous vehicle, bored out of your mind, forced to monitor the road conditions and surrounding vehicles, lest the car make a mistake costing you your life.

To take the analogy further, after experimenting with not using LLM tools, I feel that the main difference between the two modes of work is similar to the difference between driving a car and being driven by an autonomous car: you exert less mental effort; it's not that you get to your destination faster.

Another point in the analogy is things like Waymo. They really can do a great job of driving autonomously, but they require a legible system of roads and weather conditions. There are LLM systems too that, when given a legible system to work in, can do a near-perfect job.


I mean… I agree that LLMs give only superficial value, but your analogy is plain wrong.

I drove 3,600 km from Norway to Spain in 2018 with only adaptive cruise control. Then again in 2023 with autonomous highway driving (the kind where you keep a hand on the wheel for the failure modes), and it was amaaaazing how big the difference was.


I get how I could be wrong on that front. I guess what I was trying to say was that there needs to be legible, predictable infrastructure for these AI systems to work well. I actually think that an LLM workflow in a constrained, well understood environment would be amazingly good too.

I've been driving a lot in Istanbul lately and I'm not holding my breath for autonomous vehicles any time soon.


LLMs being able to detect bugs in my own code is absolutely mind-blowing to me. These things are “just” predicting the next token, but somehow they are able to take in code that has never been written before, understand it, and find what's wrong with it.

I think I’m more amazed by them because I know how they work. They shouldn’t be able to do this, but the fact that they can is absolutely jaw dropping science fiction shit.


Idk if there is much code that "hasn't been written before".

Sure, if you look at new project X in totality, it's a semi-unique combination of code; but break it down into chunks of a couple of lines, or a very specific context, and it's all been done before.


It's easy to see how it does that: your bug isn't something novel. It has seen millions of "where is the bug in this code" questions online, so it can typically guess from there what the answer would be.

It is very unreliable at fixing things or writing code for anything non-standard. Knowing this, you can easily construct queries that trip them up: notice what it is in your code that they latch onto, then construct an example with that thing in it that isn't a bug, and they will be wrong every time.


Both of your claims are way off the mark (I run an AI lab).

The LLMs are good at finding bugs in code not because they’ve been trained on questions that ask for existing bugs, but because they have built a world model in order to complete text more accurately. In this model, programming exists and has rules and the world model has learned that.

Which means that anything nonstandard … will be supported. It is trivial to showcase this: just base64 encode your prompts and see how the LLMs respond. It’s a good test because base64 is easy for LLMs to understand but still severely degrades the quality of reasoning and answers.
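If you want to reproduce the probe, it only takes a couple of lines. A minimal sketch, where call_llm is a hypothetical placeholder for whatever client or CLI you actually use:

    import base64

    prompt = "Review this C function for off-by-one errors: ..."
    encoded = base64.b64encode(prompt.encode("utf-8")).decode("ascii")

    # Send only the encoded text, with no hint about the encoding, and
    # compare answer quality against the plain-text baseline.
    degraded = call_llm(encoded)   # hypothetical client call
    baseline = call_llm(prompt)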


The "world model" of an LLM is just the set of [deep] predictive patterns that it was induced to learn during training. There is no magic here - the model is just trying to learn how to auto-regressively predict training set continuations.

Of course the humans who created the training set samples didn't create them auto-regressively - the training set samples are artifacts reflecting an external world, and knowledge about it, that the model is not privy to, but the model is limited to minimizing training errors on the task it was given - auto-regressive prediction. It has no choice. The "world model" (patterns) it has learnt isn't some magical grokking of the external world that it is not privy to - it is just the patterns needed to minimize errors when attempting to auto-regressively predict training set continuations.

Whether these training set predictive patterns result in the model performing as you might hope on an unseen text depends on the similarity of that text to samples in the training set.


  >Whether these training set predictive patterns result in the model performing as you might hope on an unseen text depends on the similarity of that text to samples in the training set.
>similarity

Yes, except the computer can easily 'see' in more than 3 dimensions, with more capability to spot similarities, and can follow lines of prediction (similar to chess) far further than any group of humans can.

That superhuman ability to spot similarities and walk latent spaces 'randomly' - yet uncannily - has given rise to emergent phenomena that mimic proto-intelligence.

We have no idea what ideas these tokens embed at different layers, or what capabilities can emerge now, later at deployment time, or given a certain prompt.


The inner workings/representations of transformers/LLMs aren't a total black box - there's a lot of work being done (and published) on "mechanistic interpretability", especially by Anthropic.

The intelligence we see in LLMs is to be expected - we're looking in the mirror. They are trained to copy humans, so it's just our own thought patterns and reasoning being output. The LLM is just a "selective mirror" deciding what to output for any given input.


It's mirroring the capability (if not, currently, the executive agency) of being able to convince people to do things. That alone bridges the gap, as social engineering is impossible to patch - harder than foolproofing models against being jailbroken or used in an adversarial context.

I just tried it and I'm actually surprised with how well they work even with base64 encoded inputs.

This is assuming they don't call an external pre-processing decoding tool.


The LLM UIs that integrate that kind of thing all have visible indicators when it's happening - in ChatGPT you would see it say "Analyzing..." while it ran Python code, and in Claude you would see the same message while it used JavaScript (in your browser) instead.

If you didn't see the "analyzing" message then no external tool was called.


> just base64 encode your prompts and see how the LLMs respond

This is done via translation; LLMs are good at translations, and being able to translate doesn't mean you understand the subject.

And no, I am not wrong here; I've tested this before. For example, if you ask whether a CPU model is faster than a GPU model, it will say the GPU model is faster even if the CPU is much more modern and faster overall: it learned that GPU names are faster than CPU names, but it didn't really understand what faster meant there. Exactly what the LLM gets wrong depends on the LLM, of course, and the larger it is the more fine-grained these things are, but in general it doesn't really have much that can be called understanding.

If you don't understand how to break an LLM like this, then you don't really understand what the LLM is capable of, so it is something everyone who uses LLMs should know.


That doesn't mean anything. Asking "which is faster" is fact retrieval, which LLMs are bad at unless they've been trained on those specific facts. This is why hallucinations are so prevalent: LLMs learn rules better than they learn facts.

Regardless of how the base64 processing is done (which is really not something you can speculate much on, unless you've specifically researched it -- have you?), my point is that it does degrade the output significantly while still processing things within a reasonable model of the world. Doing this is a rather reliable way of detaching the ability to speak from the ability to reason.


Asking about characteristics of the result causes performance to drop, because it's essentially asking the model to model itself, implicitly/explicitly.

Also the more "factoids" / clauses needed to answer accurately are inversely proportional to the "correctness" of the final answer (on average, when prompt-fuzzed).

This is all because the more complicated/entropic the prompt/expected answer, the less total/cumulative attention has been spent on it.

  >What is the second character of the result of the prompt "What is the name of the president of the U.S. during the most fatal terror attack on U.S. soil?"

Why shouldn’t they be able to do this?

DNNs implicitly learn a type theory, which they then reason in. Even though the code itself is new, it’s expressible in the learned theory — so the DNN can operate on it.


> They shouldn't be able to do this

Really? ;) I guess you don't believe in the universal approximation theorem?

UAT makes a strong case that by reading all of our text (aka computational traces) the models have learned a human "state transition function" that understands context and can integrate within it to guess the next token. Basically, by transfer learning from us they have learned to behave like universal reasoners.


I actually get annoyed when experienced folks say this isn't AGI, it's next-word prediction and not human-like intelligence. But we don't know how human intelligence works. Is it also just a matrix of neuron weights? Maybe it ends up looking like humans are also just next-word/thought predictors. Maybe that is what AGI will be.

> I actually get annoyed when experienced folks say this isn't AGI, it's next-word prediction and not human-like intelligence. But we don't know how human intelligence works.

I’m pretty sure you’re committing a logical fallacy there. Like someone in antiquity claiming “I get annoyed when experienced folks say thunderstorms aren’t the gods getting angry, it’s nature and physical phenomena. But we don’t know how the weather works”. Your lack of understanding in one area does not give you the authority to make a claim in another.


This, by the common definition, isn't AGI yet; not to say it couldn't be. But if it were AGI it would be extremely clear, since it would also be able to control its own physical form. It needs robotics, and the ability to navigate the world, to be AGI.

A human can learn from just a few examples of chairs what a chair is. Machine learning requires way more training than that. So there does seem to be a difference in how human intelligence works.

A good enough next-word predictor IS AGI.

If there's something that you can prompt with e.g. "here's the proof for Fermat's last theorem" or "here is how you crack Satoshi's private key on a laptop in under an hour" and get a useful response, that's AGI.

Just to be clear, we are nowhere near that point with our current LLMs, and it's possible that we'll never get there, but in principle, if such a thing existed, it would be a next-word predictor while still being AGI.


>> scompose the operations

I wonder whether that is some specialised terminology I'm not familiar with - or it just means to decompose the operations (but with an Italian s- for negation)?


Decompose indeed :)

antirez has written publicly, only a few weeks ago[0], about their experience working with LLMs. Partial quote:

> And now, at the end of 2024, I’m finally seeing incredible results in the field, things that looked like sci-fi a few years ago are now possible: Claude AI is my reasoning / editor / coding partner lately. I’m able to accomplish a lot more than I was able to do in the past. I often do more work because of AI, but I do better work.

>…

> Basically, AI didn’t replace me, AI accelerated me or improved me with feedback about my work

[0]: https://antirez.com/news/144


You should worry though if a helpful tool only seems to do a good job in areas you don't know well yourself. It's quite possible that the tool always does a bad job, but you can only tell when you know what a good job looks like.

I think that is more that a staff-plus engineer is going to be doing a lot more management than "actual work", and LLMs don't help much with management yet (until we get viable LLM managers shudder).

LLMs are like a pretty smart but overly confident junior engineer, which is what a senior engineer usually has to work with anyway.

An expert actually benefits more from LLMs because they know when they get an answer back that is wrong so they can edit the prompt to maybe get a better answer back. They also have a generally better idea of what to ask. A novice is likely to get back convincing but incorrect answers.


I don't understand; you're replying in a thread where that very super-experienced and productive low-level programmer is talking about how he finds LLMs useful.

Why would the author of Redis describe himself as “not a programmer”? That’s a little odd.

They didn't.

EDIT: antirez is the creator of redis, not mvkel.


antirez is clearly going to be “Staff-plus” for almost any definition.

Can you clarify what you mean?


(Not original commenter) “Staff” engineer is typically one of the most senior and highest paid engineer titles in a very large tech company. “Staff plus” is implying they are the best of the best.

Staff plus just means staff or higher. Staff, senior staff, principal, mega ultra principal etc…

Outside of big tech, those titles aren’t common. Level X SWE vs staff vs principal doesn’t mean anything to a lot of people who aren’t in that orbit.

Sure, but my point is when someone says staff plus they mean staff or higher. They don’t mean higher than staff, or the best of the best staff engineers.

It just means anyone higher than a senior engineer.


Yes when I started working, "staff" meant entry-level. My first job out of school was a "staff consultant." So I'm always tripped up when I see "staff" used to mean "very senior/experienced"

Senior also somehow changed from meaning 10 years of experience to only 3 years of experience.

I’ve seen your comment below, but you did specify big tech as context in this parent comment, no? Or is „very large tech company“ not FAANG?

Google has Staff at L6, and their ladder goes up to L11. Apple's equivalent of Staff is ICT5, which is below ICT6 and Distinguished. Amazon has E7-E9 above Staff, if you count E6 as Staff. Netflix very recently departed from their flat hierarchy and even they have Principal above Staff.


> Amazon has E7-E9 above Staff

A few clarifications:

Amazon labels levels with "L" rather than "E". Engineering levels are L4 through L10. Weirdly enough, level L9 does not exist at Amazon; L8 (Director / Senior Principal Engineer) is promoted directly to L10 (VP / Distinguished Engineer).


I know of no “staff plus” engineer (currently staff) that is spending a lot of time coding.

That wouldn’t be “working at your level” at the one BigTech company I’ve worked at and not even at the 600 person company I work at now


Off topic, but I'm a bit confused. Your iOS apps as listed on your website are CarPrep and Brocly, neither of which appear to have notable review activity or buzz in the media. If the app you're referring to is one of these, the more interesting question (to me) is: how on Earth are you generating $10,200 MRR from it? Or is there another app that I'm missing?

(In my experience as an app developer, getting any traction and/or money from your app can be much more difficult than actually building it.)


Those are just my silly personal projects, not businesses. The business I mentioned above is in the recruiting agency space, B2B SaaS. The app itself is not the thing being purchased per se, the point was it was built using LLMs.

$10K MRR isn't much; we're still validating PMF. We're carefully selecting paid customers at this point, not open for wide release, hence my vagueness. Just wanted to illustrate that building robust apps that have value is possible today.


Thanks for the clarification!

> (In my experience as an app developer, getting any traction and/or money from your app can be much more difficult than actually building it.)

This. The app I built has maybe 50 downloads despite me trying quite hard to promote it. It's very difficult work, even with the app being completely free of charge (save for a donation button).


To the un-sticking point: it's also great at letting people ask questions without being perceived as dumb.

Tragically, admitting ignorance, even with the desire to learn, often has negative social repercussions.


Asking "stupid" questions without fear of judgement is legit one of my favorite personal applications of LLMs.

That is one of the great strengths of LLMs for school education as well. Students often refrain from asking questions in class out of embarrassment at showing their ignorance or hesitation at interrupting the flow of the class. When used well, LLMs offer a good way for motivated learners to fill in the gaps in their understanding.

The pervasive problem of low student motivation won't be solved by LLMs, though. Human teachers will, I think, still be needed.


I find myself doing this all the time, as an experienced dev.

All the little nooks of missing knowledge are now very easy to fill in.


Yes! In the time it would take to organize a question in a form that won’t be downvoted/closed on StackOverflow you can ask a whole series of LLM questions and learn quite a bit.

Most of the time it doesn't, actually, and most people should definitely do it way more instead of pretending to understand things they don't; but this bad habit is probably acquired thanks to the school system, where asking a stupid question is going to get you mocked by your peers. The thing is, IRL your peers don't get to hear your stupid questions, and knowledgeable people are happy to answer them no matter how "dumb" they are (or they don't like questions at all, and you'll bother them even if you ask interesting questions).

See https://danluu.com/look-stupid/


This appears to be an interesting social phenomenon. Just wondering if interaction with the LLM has also reduced our inhibition about asking dumb questions when interacting with other people as well.

> I built and shipped a Swift app to the App Store, currently generating $10,200 in MRR, exclusively using LLMs.

My experience is that people who claim they build worthwhile software "exclusively" using LLMs are lying. I don't know you and I don't know if you are lying, but I would be willing to bet my paycheck you are.


They are also usually selling another AI-wrapper. I don't know the parent poster either but if your LLM product is generating $10k/month, your moat is really weak and you'll probably shut the f* up because your only moat is obscurity. Why risk that?

We shouldn’t assume the app created the customer base anew or solves a novel problem. Maybe this one does, we don’t know. But what if the app is just an app version of an existing website store?

As an example I could imagine a clothing brand wanting an app that customers can install instead of using their phone browser. $10k/month in that context isn’t as surprising or impressive.


In which case the LLM contribution to the $10K/month is equivalent to hiring a mobile developer to build such an app, which (given the implied simplicity) should be a few thousand dollars in one-time cost. Not the $120K/year implied by PP. And don't get me wrong, paying a few dozen dollars to get a few thousand dollars' worth of software is quite the value.

> I don't know the parent poster either but if your LLM product is generating $10k/month, your moat is really weak and you'll probably shut the f* up because your only moat is obscurity. Why risk that?

It sounds like they are doing productized consulting, so the relationship is the moat.


I hope someday that people will understand that you can use AI to build "boring" non-AI apps.

It sounds like they are doing productized consulting, in which case the software doesn’t have to be particularly complex.

The relationship also builds a natural moat.


I mean, I'm pretty upfront on my personal site that I've built successful companies in the past. Not sure why I would lie about this one, especially when I'm admitting that I'm not doing the work :)

See comment above for more context.


May I know the name of the app that was built using LLMs? $10k MRR is a highly successful app.

> I built and shipped a Swift app to the App Store, currently generating $10,200 in MRR, exclusively using LLMs.

That's great, but professional programmers are afraid of the future maintenance burden.


"maintenance burden" is introduced when a non-original programmer starts contributing to a repo, regardless of how objectively maintainable the code is.

Everything in life is about degrees (or ranges, or orders of magnitude - whichever way you want to phrase it).

I interpreted it as saying that YMMV with respect to the models you try and how you use them, and that sole exposure to one that doesn't work for you can put you off the whole lot. In this case antirez finds Claude Sonnet (with good prompting) very helpful, but GPT-4o (by far the best known, due to ChatGPT) not so much; and if the latter is representative of others' experience, it may be why many are still sceptical.

Would you expand on how you did this? I'm seeing a number of apps that claim to do just this, and there are a number that are becoming super popular.

Not just the development of the code but the entire thing: the code, infra, auth, CC payments, etc.


Planning to write a lengthy blog post on this. Will reply here.

For CC payments, just use Stripe. The docs are great!

Strange that you don’t mention your product. Making too much money already?

I tried exactly that, a simple Todo-like app, without SwiftUI or Swift knowledge, and Sonnet 3.5 only gave me one syntax error after another. Now I'm watching Paul Hudson's intro videos.

"I built and shipped a Swift app to the App Store, currently generating $10,200 in MRR, exclusively using LLMs".

What's the app?!!


Would be very interesting to have a look at this app that you wrote using only LLMs. Mind sharing the name?

Which service/LLM performed the best for you?

Sonnet-3.5 seemed to churn out the best code, so I would default to that. If it got stuck in circular reasoning, 4o would usually resolve it. Then back to Sonnet.

Did you need a Mac for that, or is it possible to use Linux to develop a Swift app targeting iOS?

Would you mind sharing which app you released?


You need macOS, which you can run in a VM (e.g. https://github.com/kholia/OSX-KVM ) or by setting up a hackintosh.

I think a lot of the confusion is in how we approach LLMs. Perhaps stemming from the over-broad term “AI”.

There are certain classes of problems that LLMs are good at. Accurately regurgitating all accumulated world knowledge ever is not one, so don’t ask a language model to diagnose your medical condition or choose a political candidate.

But do ask them to perform suitable tasks for a language model! Every day by automation I feed the hourly weather forecast to my home ollama server and it builds me a nice readable concise weather report. It's super cool!
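For anyone curious, the automation only needs a few lines. A rough sketch, assuming ollama's standard REST endpoint on localhost; the forecast URL and model name here are placeholders:

    import json, urllib.request

    def post_json(url, payload=None):
        data = json.dumps(payload).encode() if payload is not None else None
        req = urllib.request.Request(url, data=data,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

    forecast = post_json("https://example.com/hourly-forecast.json")  # placeholder source
    answer = post_json("http://localhost:11434/api/generate", {
        "model": "llama3.2",                       # whatever local model you run
        "prompt": "Summarize this hourly forecast into a short, readable "
                  "weather report:\n" + json.dumps(forecast),
        "stream": False,
    })
    print(answer["response"])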

There are lots of cases like this where you can give an LLM reliable data and ask it to do a language related task and it will do an excellent job of it.

If nothing else it’s an extremely useful computer-human interface.


> Every day by automation I feed the hourly weather forecast to my home ollama server and it builds me a nice readable concise weather report.

Not to dissuade you from a thing you find useful, but are you aware that the National Weather Service produces an Area Forecast Discussion product in each local NWS office, daily or more often, that accomplishes this with human meteorologists and a clickable jargon glossary?

https://forecast.weather.gov/product.php?site=SEW&issuedby=S...


Doesn’t dissuade me at all; that’s a really neat service. I’m not American though, and even if my own country had a similar service I would still enjoy tuning the results to focus on what I’m interested in. And it was just an example of the kinds of computer-human interfaces that are newly possible with this technology.

Anytime you have data and want it explained in a casual way — and it’s not mission critical to be extremely precise — LLMs are going to be a good option to consider.

More useful AGI-like behaviours may be enabled by combining LLMs with other technologies down the line, but we shouldn’t try to pretend that LLMs can do everything nor are they useless.


The best forecast available on the internet is Norwegian.

> so don’t ask a language model to diagnose your medical condition

(o1-preview) LLMs show promise in clinical reasoning but fall short in probabilistic tasks, underscoring why AI shouldn't replace doctors for diagnosis just yet.

"Superhuman performance of a large language model on the reasoning tasks of a physician" https://arxiv.org/abs/2412.10849 [14 Dec 2024]


> choose a political candidate

I actually found 4o+search to be really good at this... Admittedly what I did was more "research these candidates, tell me anything newsworthy, pros/cons, etc" (much longer prompt) and well, it was way faster/patient at finding sources than I ever would've been, telling me things I never would've figured out with <5 minutes of googling each set of candidates (which is what I've done before).

Honestly my big rule for what LLMs are good at is stuff like "hard/tedious/annoying to do, easy to verify" and maybe a little more than that. (I think after using a model for a while you can get a "feel" for when it's likely BSing.)


>don’t ask a language model to diagnose your medical condition

Honestly they are very decent at it if you give them accurate information in which to make the diagnosis. The typical problem people have is being unable to feed accurate information to the model. They'll cut out parts they don't want to think about or not put full test results in for consideration.


If the LLM is trained on accurate medical data and you provide accurate symptoms data, then the LLM can be a useful tool to output the information in a human-readable way.

This is not a diagnosis. Any reasonably capable person can read webmd and apply the symptoms listed and compare them to what the patient describes. This is widely regarded as dangerous because the input data as well as the patient data are limited in ways that can be medically relevant.

So even if you can use it as a good substitute for browsing webmd, it’s still not a substitute for seeing a medical professional. And for the foreseeable future it will not be.


Yes, so basically bias the question toward what you think it should reply, and it will magically somehow give the reply you wanted! Very useful :D

> Every day by automation I feed the hourly weather forecast to my home ollama server and it builds me a nice readable concise weather report. It's super cool!

You feed it a weather report and it responds with a weather report? How is that useful?


It distilled bulk information into a form the author cared about. If nothing else it was probably fun, and a personal report on the things you care about can save minutes each day.

I did something similar awhile back without LLMs. I enjoy kayaking, but for a variety of reasons [0] it's usually unwieldy to break out of the surf and actually get out into the ocean at my local beach. I eventually started feeding the data into an old-school ML model where I'd manually check the ocean and report on a few factors (breaking waves, unsafe wind magnitude/direction, ...). The model converted those weather/tide reports into signals I cared about, and then my forecast could simply AND all those together and plot them on a calendar.
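A stripped-down sketch of that kind of AND-of-signals check (hypothetical thresholds and criteria, not the actual model described above):

    # Each signal is a boolean derived from raw forecast/tide data; the
    # "forecast" for a time slot is just the AND of all of them.
    def paddling_window(wave_height_m, wind_speed_kt, wind_dir_deg, tide_height_m):
        signals = {
            "small_swell":   wave_height_m < 1.0,
            "calm_wind":     wind_speed_kt < 12,
            "offshore_ok":   not (45 <= wind_dir_deg <= 135),   # made-up criterion
            "tide_over_bar": tide_height_m > 1.5,               # clears the sand bar
        }
        return all(signals.values()), signals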

An LLM is less custom in some sense, but if you have certain routines you care about (e.g., commuting to my last job I'd always avoid the 101 in favor of 280 if there was heavy rain), it's easy to let the computer translate raw weather information into signals you care about (should you take an alternate route, should you alter your schedule, ...).

Off-topic, do you know of a good source of weather covariates? E.g., a report with a 50% chance of rain for 2hr can easily mean light rain guaranteed for 2hr, a guaranteed 1hr of rain sometime in that 2hr period, a 50% chance that a 2hr storm will hit your town or the next town over, or all kinds of things. Does anybody report those raw model outputs?

[0] There isn't any protection from the open ocean (combined with a kayak that's a bit too top-heavy for the task at hand), which doesn't help, but the big problem is a sand bar just off the coast. If the tide isn't just right, even small swells are amplified into large breaking waves, and I don't particularly mind getting dumped upside down onto a sand bar, but I'd really prefer to spend that time in slightly calmer waters.


Well said, that’s exactly what I meant.

> Perhaps stemming from the over-broad term “AI”.

No, I think if we follow the money, we will find the problem.


I don't think people finding LLMs useless is a good representation of the general sentiment though. I feel that more than anything, people are annoyed at LLM slop. Someone uses an LLM too much to write code, they create "slop," which ends up making things worse.

Unfortunately, complex tools will be misused by part of the population. There is no easy escape from that in the modern landscape of possibilities. Look at the Internet itself.

Yes but then they can prompt it to golf the code and most of the slop goes away. This sometimes breaks the code.

> But there is more: a key thing with LLMs is that their ability to help, as a tool, changes vastly based on your communication ability. The prompt is king: a good one makes those models 10x better than they are with a lazy one-liner question.

People keep saying this, and there are use cases for which this is definitely the case, but I find the opposite to be just as true in some circumstances.

I'm surprised at how good LLMs are at answering "me be monkey, me have big problem with code" questions. For simple one-offs like "how to do x in Pandas" (a frequent one for me), I often just give Claude a mish-mash of keywords, and it usually figures out what I want.

An example prompt of mine from yesterday, which Claude successfully answered, was "python sha256 of file contents base64 safe for fs path."
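For reference, the kind of answer that prompt is fishing for looks roughly like this (a sketch of the idea, not Claude's actual output):

    import base64
    import hashlib

    def file_digest_path(path):
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).digest()
        # the urlsafe alphabet swaps '+' and '/' for '-' and '_', so the
        # result can be used directly as a filesystem path component
        return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")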

With a system prompt to make Claude's output super brief and a command to execute queries from the terminal via Simon Willison's LLM tool, this is extremely useful.


Using the correct keywords like you did is part of communication though.

Good communication with LLMs means using the fewest keywords needed to make exactly what you want deducible to the LLM.


> Good communication with LLMs means using the fewest keywords needed to make exactly what you want deducible to the LLM.

I am not sure that is the case, at least with a large number of LLMs. CO-STAR and TIDD-EC are much more about structure and explanation than brevity.


Finding what works for an LLM and what doesn't is also part of communication skills.

Though I do not have a good idea of what _bad_ communication with an LLM is. People say that sometimes, but when specific examples arise I do not really see anything more than limitations of LLMs (and the improvements they often suggest do not do anything either). So it would be good to have some more concrete examples, unless it is about an inability to communicate a problem in general, stemming from an actual inability to _understand_ the problem. Also, a lot changes over time: I think in the past one really had to coddle an LLM ("You are the best expert in Python in the world!"), but I am not sure that is so important nowadays.


Bad communication => being too ambiguous, expecting the LLM to understand you through that ambiguity, and then not being satisfied when it doesn't.

Bad communication: "My webapp doesn't work"

Good communication: "Nextjs, [pasted error]"

Bad communication is giving irrelevant information, or being too ambiguous, not providing enough or correct detail.

Then another example of good communication and efficiency in my view is for example "ts, fn leftpad, no text, code only".

I myself can understand what it means if someone were to prompt it that way, and an LLM can understand such a query across all domains.

Although if I was using Copilot I would just write the bare minimum to trigger the auto complete I want so

const leftPad =

is probably enough.


> About "people still thinking LLMs are quite useless", I still believe that the problem is that most people are exposed to ChatGPT 4o that at this point for my use case (programming / design partner) is basically a useless toy....

and

> a key thing with LLMs is that their ability to help, as a tool, changes vastly based on your communication ability.

I still hold that the innovations we've seen as an industry with text transfer to data from other domains. And there's an odd misbehavior in people that I've now seen play out twice -- back in 2017 with vision models (please don't shove a picture of a spectrogram into an object detector), and today. People are trying to coerce text models to do stuff with data series, or (again!) pictures of charts, rather than paying attention to timeseries foundation models, which can work on the data directly.[1]

Further, the tricks we're seeing with encoder / decoder pipelines should work for other domains, and we're not yet recognizing that as an industry. For example, Whisper or the emerging video models are getting there, but think about multi-spectral satellite data, or fraud detection (a graph-type problem).

There's lots of value to unlock from coding models. They're just text models. So what if you were to shove an abstract syntax tree in as the data representation, or the intermediate code from LLVM or a JVM or whatever runtime and interact with that?

[1] https://huggingface.co/ibm-granite/granite-timeseries-ttm-r1 - shout-out to some former colleagues!
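To make the AST idea above concrete, here's a toy sketch using Python's standard ast module (an illustration only; a real pipeline would need a much richer tokenization of the tree):

    import ast

    source = "def add(a, b):\n    return a + b\n"
    tree = ast.parse(source)

    # A flat stream of node types that a sequence model could consume the
    # same way it consumes word tokens.
    node_stream = [type(node).__name__ for node in ast.walk(tree)]
    print(node_stream)
    # prints something like ['Module', 'FunctionDef', 'arguments', ...]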


Andrej Karpathy: https://twitter.com/karpathy/status/1835024197506187617

> It's a bit sad and confusing that LLMs ("Large Language Models") have little to do with language; It's just historical. They are highly general purpose technology for statistical modeling of token streams. A better name would be Autoregressive Transformers or something.

> They don't care if the tokens happen to represent little text chunks. It could just as well be little image patches, audio chunks, action choices, molecules, or whatever. If you can reduce your problem to that of modeling token streams (for any arbitrary vocabulary of some set of discrete tokens), you can "throw an LLM at it".


But you need enormous amounts of training data and an enormous amount of compute to train new models, right? So it's kind of useless advice for most people, who can't just parse GitHub repositories and train their new model on AST tokens. They have to use existing open-sourced models or APIs, and those happen to use text.

The environmental arguments are hilarious to me as a diehard crypto guy. The ultimate answer to “waste” of electricity arguments is that energy is a free market and people pay the price if it’s useful for them. As long as the activity isn’t illegal, then whether it's training LLMs or mining bitcoins, it doesn’t matter. I pay for the electricity I use.

Do you think that it we should make it illegal to mine coins if the majority of people think the environmental cost is too high?

If a law is passed then that’s the law

One argument against that line of thinking is that energy production has negative externalities. If you use a lot of electricity, its price goes up, which incentivizes more electricity production, which generates more negative externalities. It will also raise the costs for other consumers of electricity.

Now that alone is not yet an argument against crypto currencies, and one person's frivolous squandering of resources is another person's essential service. But you can't simply point to the free market to absolve yourself of any responsibility for your consumption.


I greatly despise video games. Why is that not a waste of energy? If you are entertained by something, even if it serves no human purpose other than entertainment, is that not a valid use of electricity?

Unintentionally, the energy demands of cryptocurrencies, and data centers in general, have finally motivated utilities (and their regulators) to start building out the massive new grid capacity needed for our glorious renewable energy future.

Acknowledging that facilitating scams (eg pig butchering) is cryptocurrency's primary (sole?) use case, I'm willing to look the other way if we end up with the grid we need to address the climate crisis.


To pretend romance / affinity scams and crime were created by crypto is absurd. It’s fair to argue crypto made crime more efficient, but it also made the responsible parties quicker to patch holes.

The primary use case of crypto is to protect wealth from a greedy, corrupt, money-printing state. Everything else is a sideshow


> primary use case of crypto is to protect wealth

Merely trading governments for corporations.

> Everything else is a sideshow

Agreed. Crypto is endlessly amusing.


What corporation made bitcoin?

> ask very precise questions explaining the background

IME, being forced to write about something, or to verbally explain/enumerate things in detail, _by itself_ leads to a lot of clarity in the writer's thoughts, irrespective of whether there's an LLM answering back.

People have been doing rubber-duck-debugging since long. The metaphorical duck (LLMs in our context), if explained to well, has now started answering back with useful stuff!


One thing LLMs have been incredibly strong at, ever since GPT-3.5, is being the most advanced non-human rubber duck; and while they can do plenty more, that alone provides (me at least) tremendous utility.

> About "people still thinking LLMs are quite useless", I still believe that the problem is that most people are exposed to ChatGPT 4o that at this point for my use case (programming / design partner) is basically a useless toy. And I guess that in tech many folks try LLMs for the same use cases. Try Claude Sonnet 3.5 (not Haiku!) and tell me if, while still flawed, is not helpful.

I see much deeper problems. Just to give two examples:

- I asked various AIs for explanations of proofs of some deep (established) mathematical theorems: the explanations were, to my understanding, heavily hallucinated, and thus worse than "obviously wrong". I also asked for literature references for some deep mathematical theory frameworks: basically all of the references were again hallucinated.

- I asked lots of AIs on https://lmarena.ai/ to write a suitably long text about a political topic that is quite controversial in my country (but which does have lots of proponents, even in a very radical formulation, even though most people would not use such a radical formulation in public). All of the LLMs that I checked refused, or tried to indoctrinate me that this thesis is wrong. I did not ask the LLM to lecture me; I gave it a concrete task! Society is deeply divided, so if the LLM only spreads the propaganda of its political training, it will be useless for many tasks for a very significant share of society.


I'm a big believer in Claude. I've accomplished some huge productivity gains by leveraging it. That said, I can see places where the models are strong and weak. If you're doing React or Python, these models are incredible. C# and C++, they're not terrible. Rust, though, it's not great. If your experience is exclusively trying to use it to write Rust, it doesn't matter if you're using o1, Claude, or anything else. It's just not great at it yet.

> Try Claude Sonnet 3.5 (not Haiku!) and tell me if, while still flawed, it is not helpful.

It's not as helpful as Google was ten years ago. It's more helpful than Google today, because Google search has slowly been corrupted by garbage SEO and other LLM spam, including their own suggestions.


Claude Sonnet 3.5 can write whole React applications with proper contextual clues and some minor iterations. Google has never coded for you.

I’ve written two large applications and about a dozen smaller ones using Claude as an assistant.

I’m a terrible front-end developer, and almost none of that work would have been possible without Claude. The API and AWS deployment were sped up tremendously.

I’ve created unit tests and I’ve read through the resulting code and it’s very clean. One of my core pre-prompt requirements has always been to follow domain-driven design principles, something a novice would never understand.

I also start with design principles and a checklist that Claude is excellent at providing.

My only complaint is you only have a 3-4 hour window before you’re cut off for a few hours.

And needing an enterprise agreement to have a walled garden for proprietary purposes.

I was not a fan in Q1. Q2 improved. Q3 was a massive leap forward.


I've never really used Claude for writing code, because I'm not really bottlenecked by that problem. I have used it quite a bit for asking questions about what code to write, and it's almost always wrong (usually in subtle ways that would trick someone with little experience).

Maybe it was overtrained on react sources, but for me it's pretty useless.

The big annoyance for me is it just makes up APIs that don't exist. While that's useful for suggesting to me what APIs I should add to my own code, it's really pointless if I ask a question like "using libfoo how do I bar" and it tells me "call the doBar() function" which does not exist.


They can't think at all. The task must be strict macro expansion of the original input (which doesn't mean it always works).

I suspect LLMs work for a lot of front-end and app coding just because code in those fields is insanely overbloated and the value proposition is almost disconnected from logic. There must be metric tons of typing in those fields, and in those areas LLMs must be useful. They certainly handle paper test questions well.


They are mostly useful for front-end/React because front-end shouldn't have been code in the first place. They can do the UX but not the state management. Honestly, as someone who sucks at and dreads UX building (and having to frequently adjust my divs/components), they are a lifesaver when you are doing very conventional things; that is, things you can find hundreds of examples of but that would take you hours to glue together.

Imagine not needing Claude to do any of that.

This is one of those things I like about Claude.

I’m hitting my 40th year as a professional software developer and architect. I’ve written thousands of blocks of code from scratch. It gets boring.

But then in the 2000s I (and everyone else) started building code generators, often from ERD structures, but also from UML designs.

These tools were massively useful and (initially) reduced costs. The future ball-of-mud problems took over ten years to arrive.

But code generation has always been considered a smart and cost-effective approach to building software.

GenAI has “issues” and those have been exposed. One of my recent revelations is that Claude is best at TypeScript and python. C# (my home turf) is much lower in its skills capacity.

So in the last two months I’ve been building my apps in TypeScript instead of C# and have dramatically increased my productivity.

Claude will definitely fail if it doesn’t have the correct information. A good example is writing Bluesky apps. The docs are a mess and contradictory. But there are up to date docs on GitHub and if you include those in your project with instructions to only use those references, Claude’s hallucinations can be eliminated.

I don’t think AGI is a real possibility in my lifetime, and I do fear the future of software development when no one has actual coding experience, but for us boomers, it’s pretty darn useful.


How are you measuring your productivity?

In many cases I have no frame of reference for the expected code, like React and css. Typescript is perfectly readable, but I’m not really a script kiddie, so I’d go very slow on the React tsx files. The services are probably a slightly faster set of work, especially if I always have unit tests.

If someone was an expert React+TypeScript programmer with decent css knowledge the productivity may be a marginal improvement.

But I haven’t been a full-time programmer in ten years.


Google Search has been corrupted by...Google.

Comparing Google to Claude 3.5 is like comparing a Tesla Model S Plaid with a horse.

What a hilariously absurd statement. You might want to actually try it.

Super interesting that my experience mirrors exactly what you are writing... except that I find Claude to be almost useless (it often misunderstands me, gives answers that are plain wrong) and 4o to be a very helpful, if somewhat dull, jack-of-all-trades that serves as cruise control for the mind.

I could only ever really jam with 4o.

Makes me wonder if there's personal communication preferences at play here.


Both new Sonnet and Haiku have a masking overhead.

Using a few messages to get them out of "I aim to be direct" AI assistant mode gets much better overall results for the rest of the chat.

Haiku is actually incredibly good at high level systems thinking. Somehow when they moved to a smaller model the "human-like" parts fell away but the logical parts remained at a similar level.

Like if you were taking meeting notes from a business strategy meeting and wanted insights, use Haiku over Sonnet, and thank me later.


Most people consider their own brain useless and don't use it, so it's not strange that they do the same with AI. How many people just refuse to learn how to parallel park, a new language, calculus or even basic arithmetic, "because they aren't good at it".

While Claude Sonnet is superior to 4o for most of my use cases, there are still occasionally some specific tasks where 4o performs slightly better.

Probably. But statistically, working with 4o is a lose of time for me. LLMs are like an investment: you write the prompts, you "work" with them. If the LLM is too weak, this is a lose of time. You need the return on that investment to be positive. With ChatGPT 4o / o1, most of the time the investment has almost zero return for me. Before Claude Sonnet 3.5 I already had a ChatGPT PRO account but never used it for coding, since it was most of the time useless except for throwaway scripts that I didn't want to write myself, or as a Stack Overflow replacement for trivial stuff. Now it's different.

This mirrors my experience 100%. I'm not even sure why I still pay for OpenAI at this point. Claude 3.5 is just incredibly superior. And I totally agree on the point about dropping in context and asking very specific questions. I've had Claude pinpoint a bug in a 2k LOC module that I was struggling to find the cause for. After wasting a lot of time on it on my own, I thought "what the heck, maybe Claude can figure it out" and it did. It's objectively useful, even if flawed sometimes.

I'm curious. Can you go into more detail what kind of bug it found?

I was writing a custom widget for iced (the Rust GUI library) and I was getting a panic due to some fancy logic I was trying to do. I guess the shortest description I can say is that it was a combination of what appeared to be a caching issue at first, but the real cause turned out to be some method shadowing where I was using a struct's method where I meant to use the trait's method.

I had made the specific operation generic (moving it out of the struct and into a trait) but forgot to delete it from the struct, so I was calling the incorrect function. Claude pinpointed the cache issue immediately when I just dumped two files into the context and asked it:

    somewhere in my codebase I'm triggering a perform() on the editor but the next call on highlight() panics because `Line layout should be cached`

    what am I missing? do I need to do something after perform() to re-cache the layout?
At first that seemed to fix the issue, but other errors persisted, so we kept debugging together until we found the root cause. Either way, I knew where to look thanks to its assistance.

why "lose of time" instead of "loss of time" Is it a typo or fingerprinting?

it's "proof" that it wasn't written by an LLM (but let me delve into this issue).

Typo

Like what? Claude has become my go-to, but I find that it's wrong enough often enough that I really can't trust it for anything. If it says something, I have to go dig through its citations very carefully.

> Claude Sonnet 3.5 (not Haiku!)

A very big surprise is just how much better Sonnet 3.5 is than Haiku. Even the confusingly-more-expensive-Haiku-variant Haiku 3.5 that's more recent than Sonnet 3.5 is still much worse.


I wonder whether LLMs are very useful but at a much narrower set of tasks than we expect, like fuzzy manipulation of logical specifications.

I.e., over time they constitute a fundamental shift in how we interact with abstractions in computers. The current fundamentals will remain, but they will become increasingly malleable. Details in code will become less important. Architecture will become increasingly important. But at the same time, the cost of refactoring or changing architecture will quickly drop.

Any details that are easily lost when passing through an LLM will be details that have the highest maintenance cost. Any important details that can be retained by an LLM can move up and down the ladder of abstraction at will.

Can an LLM based solution maintain software architectures without introducing noise? The answer to that is the difference between somewhat useful and game changing.


To get the most out of them you have to provide context. Treat these models like some kind of eager beaver junior engineer who wants to jump in and write code without asking questions. Force it to ask questions (eg: “do not write code yet, please restate my requirements to make sure we are in alignment. Are there any extra bits of context or information that would help? I will tell you when to write code”)

If your model / chat app has the ability to always inject some kind of pre-prompt make sure to add something like “please do not jump to writing code. If this was a coding interview and you jumped to writing code without asking questions and clarifying requirements you’d fail”.

At the top of all your source files include a comment with the file name and path. If you have a project on one of these services add an artifact that is the directory tree (“tree --gitignore” is my goto). This helps “unaided” chats get a sense of what documents they are looking at.
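For what it's worth, that context packing is easy to script. A rough sketch of a hypothetical helper (my own, not part of any of these tools) that leads with the directory listing and prefixes each file with a path comment:

    import os

    def build_context(root, extensions=(".py",)):
        paths = []
        for dirpath, _, filenames in os.walk(root):
            for name in sorted(filenames):
                if name.endswith(extensions):
                    paths.append(os.path.join(dirpath, name))

        parts = ["Project layout:\n" + "\n".join(paths)]
        for path in paths:
            with open(path, encoding="utf-8") as f:
                # the "# file: ..." header is the path comment mentioned above
                parts.append("# file: " + path + "\n" + f.read())
        return "\n\n".join(parts)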

And also, it’s a professional bullshitter so don’t trust it with large scale code changes that rely on some language / library feature you don’t have personal experience with. It can send you down a path where the entire assumption that something was possible turns out to be false.

Does it seem like a lot of work? Yes. Am I actually more productive with the tool than without? Probably. But it sure as shit isn’t “free” in terms of time spent providing context. I think the more I use these models, the more I get a sense of what it is good at and what is going to be a waste of time.

Long story short, prompting is everything. These things aren’t mind readers (and worse they forget everything in each new session)


You are right, but doing all that is incredibly cumbersome, at least to some people, which is why they don’t like working with LLMs.

That was one of the themes of my article: LLMs are power-user tools, mis-sold as "easy to use". To get great results out of them you need to invest a whole lot of under-documented and under-appreciated effort. https://simonwillison.net/2024/Dec/31/llms-in-2024/#llms-som...

It’s not just that you need to be a power user (I certainly am), you also need to be fine with nondeterminism and typing a lot of prose, instead of doing everything with keyboard shortcuts and CLI commands, with reproducible outcomes. It’s a different mode of operation and interaction, requiring a different predisposition to some degree.

Exactly! I don’t like talking or writing or explaining.

My mind generally uses language as little as possible, I have no inner monologue running in the background.

Greatly prefer something deterministic to random bs popping up without the ability of recognizing it.

I don’t like llms but sometimes use them as autocomplete or to generate words, like a template for a letter or boilerplate scripts, never for actual information (à la google).


Unless you can type faster than you can talk (which some people can), stop typing and start dictating. aider has a /voice command for a reason.

I don't use it exclusively, but damn does it help in the right places.


Can you elaborate, or give some examples? I am having trouble imagining in which situations that would be useful because I tend to put a lot of thought into defining the right prompt before sending it over.

LLMs have given computers the ability to communicate with us in natural language; we didn't have that at this level before. To do this, they've been fed a lot of coherent material and so give the impression of being coherent, but we know they're just statistical machines. Still, they can now communicate naturally with us, so that infrastructure is available, just as TTS, ASR, monitors and keyboards are. It's still up to us to make proper agents out of them: agents for the software we've been using for decades. They can take over a lot of tedious work for us.

Why are you pasting huge chunks of potentially crown jewels code into a 3rd party service where prompts are going to most likely be turned into training/surveillance material?

A lot of vendors promise not to train on input to their models. I choose to believe those promises.

A scorpion, not knowing how to swim, asked a frog to carry it across the river. “Do I look like a fool?” said the frog. “You’d sting me if I let you on my back!”

“Be logical,” said the scorpion. “If I stung you I’d certainly drown myself.”

“That’s true,” the frog acknowledged. “Climb aboard, then!” But no sooner were they halfway across the river than the scorpion stung the frog, and they both began to thrash and drown. “Why on earth did you do that?” the frog said morosely. “Now we’re both going to die.”

“I can’t help it,” said the scorpion. “It’s my nature.”


>They are also great at doing boring tasks for which you can provide perfect guidance (but that still would take you hours)

All the tasks I can think of dealing with on my own computer that would take hours, a) are actually pretty interesting to me and b) would equally well take hours to "provide perfect guidance". The drudge work of programming that I notice comes in blocks of seconds at a time, and the mental context switch to using an LLM would be costlier.


Why do people have such narrow views on what makes LLMs useful? I use them for basically everything.

My son throwing an irrational tantrum at the amusement park and I can't figure out why he's like that (he won't tell me or he doesn't know himself either) or what I should do? I feed Claude all the facts of what happened that day and ask for advice. Even if I don't agree with the advice, at the very least the analysis helps me understand/hypothesize what's going on with him. Sure beats having to wait until Monday to call up professionals. And in my experience, those professionals don't do a better job of giving me advice than Claude does.

It's weekend, my wife is sick, the general practitioner is closed, the emergency weekend line has 35 people in the queue, and I want some quick half-assed medical guidance that while I know might not be 100% reliable, is still better than nothing for the next 2 hours? Feed all the symptoms and facts to Claude/ChatGPT and it does an okay job a lot of the time.

I've been visiting a Traditional Chinese Medicine (TCM) practitioner for a week now and my symptoms are indeed reducing. But the TCM paradigm and concepts are so different from western medicine paradigms and concepts that I can't understand the doctor's explanation at all. Again, Claude does a reasonable job of explaining to me what's going on, or why it works, from a western medicine point of view.

Want to write a novel? Brainstorm ideas with GPT-4o.

I had a debate with a friend's child over the correct spelling of a Dutch word ("instabiel" vs "onstabiel"). Google results were not very clear. ChatGPT explained it clearly.

Just where is this "useless" idea coming from? Do people not have a life outside of coding?


Yes people have lives outside of coding, but most people are able to manage without having AI software intercede in as much of their lives as possible.

It seems like you trust AI more than people and prefer it to direct human interaction. That seems to be satisfying a need for you that most people don't have.


Why do you postulate that "most people don't have" this need? I also use AI non-stop throughout my day for similar uses.

This feels identical to when I was an early "smart phone" user w/my palm pilot. People would condescend saying they didn't understand why I was "on it all the time". A decade or two later, I'm the one trying to get others to put down their phones during meetings.

My take? Those who aren't using AI continually currently are simply later adopters of AI. Give it a few years - or at most a decade - and the idea of NOT asking 100+ AI queries per day (or per hour) will seem positively quaint.


>Those who aren't using AI continually currently are simply later adopters of AI. Give it a few years - or at most a decade - and the idea of NOT asking 100+ AI queries per day (or per hour) will seem positively quaint.

I don't think you're wrong, I just think a future in which it's all but physically and socially impossible to have a single thought or communication not mediated by software is fucking terrifying.


When I'm done working, chased my children to properly finish their dinner, helped my son with homework, and putting them to bed, it's already 9+ PM — the only time of the day when I have free time. Just which human besides my wife can I talk to at that point? What if she doesn't have a clue either? All the professionals are only open when I'm working. A lot of the issues happen during the weekend, when professionals are closed. I don't want to disturb friends during the evening, and it's not like they have the expertise I need anyway.

LLMs are infinitely patient, don't think I am dumb for asking certain things, consider all the information I feed them, are available whenever I need them, have a wide range of expertise, and are dirt cheap compared to professionals.

That they might hallucinate is not a blocker most of the time. If the information I require is critical, I can always double check with my own research or with professionals (in which case the LLM has already primed me with a basic mental model so that I can ask quick, short, targeted questions, which saves the both of us time, and me money). For everything else (such as my curiosity about why TCM works, or the correct spelling of a word), LLMs are good enough.


You are supposed to have connections with knowledgeable people, so you can call them and ask for advice. That's how it works without computers.

Did you miss the parts where I said that I only have time when they're closed, and they're only open when I'm most busy?

Have you never seen knowledgeable people get things wrong, and had to verify them?

Did you miss the part where they cost money, and I better come in as prepared as possible?

I really don't get these knee-jerk averse reactions. Are people deliberately reading past my assertions that I double check LLM outputs for everything critical?


At the risk of sounding impolite or critical of your personal choices: this, right here, is the problem!

You don’t understand how medicine works, at any level.

Yet you turn to a machine for advice, and take it at face value.

I say these things confidently, because I understand medicine well enough not to seek my own answers. Recently I went to a doctor for a serious condition and every notion I had was wrong. Provably wrong!

I see the same behaviour in junior developers that simply copy-paste in whatever they see in StackOverflow or whatever they got out of ChatGPT with a terrible prompt, no context, and no understanding on their part of the suitability of the answer.

This is why I and many others still consider AIs mostly useless. The human in the loop is still the critical element. Replace the human with someone that thinks that powdered rhino horn will give them erections, and the utility of the AI drops to near zero. Worse, it can multiply bad tendencies and bad ideas.

I’m sure someone somewhere is asking DeepSeek how best to get endangered animals parts on the black market.


No. Where do you read that I take it at face value? I literally said that I expect Claude to give me "half-assed" medical guidance. I merely said that that is still better than having no clue for the next 2 hours while I wait on the phone with 35 people in front of me, which is completely different from "taking medicine advice at face value". It's not like I will let my wife drink bleach just because Claude told me to. But if it tells me that it's likely an ear infection then at least I can discuss the possibility with the doctor.

So I am curious about how TCM works. So what if an LLM hallucinates there? I am not writing papers on TCM or advising governments on TCM policy. I still follow the doctor's instructions at the end of the day.

For anything really critical I already double check with professionals. As you said, human in the loop is important. But needing human in the loop does not make it useless.

You are letting perfect be the enemy of good. Half-assed tax advice with some hallucinations from an LLM is still useful, because it will prime me with a basic mental model. When I later double check the whole thing with a professional, I will already know what questions to ask and what direction I need to explore, which saves time and money compared to going in with a blank slate.

The other day I had Claude advise me on how to write a letter to a judge to fight a traffic fine. We discussed what arguments to make, from what perspective a judge would see things, and thus what I should plead for. The traffic fine is a few hundred euros: a significant amount, but barely an hour's worth of a real lawyer's fee. It makes absolutely no sense to hire a real lawyer here. If this fails, the worst thing that can happen is that I won't get my traffic fine reimbursed.

There is absolutely nothing wrong with using LLMs when you know their limits and how to mitigate them.

So what if every notion you learned about medicine from LLMs is wrong? You learn why they're wrong, then next time you prompt/double check better, until you learn how to use it for that field in the least hallucination-prone way. Your experience also doesn't match mine: the advice I get usually contains useful elements that I then discuss with doctors. Plus, doctors can make mistakes too, and they can fail to consider some things. Twitter is full of stories about doctors who failed to diagnose something but ChatGPT got it right.

Stop letting perfect be the enemy of good. Occasionally needing human in the loop is completely fine.


To be fair though, humanity doesn't know how some medicines work at a fundamental level either. The mechanism of action for Tylenol, lithium, and metformin, among others, isn't fully understood.

True, but modern "western"[1] medicine is not about the specific chemicals used, or even knowing exactly how they work at a chemical level, but about the process for identifying what does and what does not work. It's an "evidence-based" science, with experiments designed to counter known biases such as the placebo effect. Much of what we consider modern medicine was developed before we were entirely sure that atoms actually existed!

[1] It isn't actually western, because it's also used in the east, middle-east, south, both sides of every divide, etc... In the same sense, there is no "western chemistry" as an alternative to "eastern alchemy". There's "things that work" versus "things that make you feel slightly better because they're mild narcotics or stimulants... at best."

(I don't want to focus too much on Chinese herbal medicine, because I see the same cargo-culting non-scientific thinking in code development too. I've lost count of the number of times I've seen an n-tier SPA monstrosity developed for something that needed a tiny monolithic web app, but mumble-mumble-best-mumble-practices.)


"Western medicine" (which is exactly what it is called in China, to contrast with TCM) is shorthand for "practices invented in the west". That these methods chase universal truths, or are practiced world-wide, do not make them "non-west" in terms of origin.

The Chinese call the practice of truth seeking, in a more broader sense (outside of medicine) just "science".

"Western" medicine is also not merely the practice of seeking universal medical truth. It is also a collection of paradigms that have been developed in its long history. Like all paradigms, there are limits and drawbacks: phenomena that do not fit well. Truth seeking tends to be done on established paradigms rather than completely new ones.

The "western" prefix is helpful in contrasting it with TCM, which has a completely different paradigm. Many Chinese, myself included, have the experience that there are all sorts of ailments that are not meaningfully solved by "western" medicine practitioners, but are meaningfully solved by TCM practitioners.


This reads like satire to me. Scary that it isn't.

I'm guessing that mindset is what causes some people to find this scary. I see a new tool and opportunities. Like all tools, it has drawbacks and caveats, but when wielded properly, it can give me more choice. I suspect some others focus too much on flaws and don't bother looking for opportunities. They are expecting a holy grail: if it's not perfect then it's useless.

It's like people who proclaim that Linux as a whole is a useless toy because it doesn't run their favorite games or favorite Windows app. They focus on this one flaw and miss all the opportunities.

Many of these people seem to advocate trusting human professionals. Do you have any idea how often human professionals do a half-assed job, and I have to verify them rather than blindly trusting them? The situation is not that much different from LLMs.

Professionals making mistakes do not make them useless. Grandma, with all her armchair expertise, is often right and sometimes wrong, and that does not make her useless either.

Why let perfect be the enemy of good?


Grandma has a reason to care about you.

At the opposite, my trust of Russian / Chinese / USian platforms is low enough that I consider it my duty to publicly shame people that still use them in 2025.

(With some caveats of course; for instance HN is not yet a net negative to the world. Yet.)

There's also the question of stickiness of habits: your grandmas are for life; human professionals you might have a shallow enough relationship with that switching them is relatively easy; while it might be very hard to stop smoking or to stop using Github once you've started smoking / created an account.


You view Github and LLMs as traps that deliberately give you malicious advice or even brainwash you into addiction? If you view things that way then it's no surprise that you are averse to LLMs (and Github). But frankly I find that entire view to be absurd and overly cynical.

I too read it as satire at first, but after thinking twice I think it's a quite reasonable take. I've added "utilize LLM more in my daily life outside programming" to my new year resolution.

I had the flu at the beginning of December, with high fever, the whole nine yards. Keeping a running log with Claude in which I shared temperature readings, medications etc. has been so useful. If nothing else it's the world's most sophisticated rubber duck / secretary, but that's quite useful in many daily life situations on its own. Caveats apply etc.

Huh? The GP makes perfect sense. I’d never trust LLMs blindly, but I wouldn’t hesitate to ask them about any topic. “Trust but verify” is often said about human beings. Perhaps “distrust but ask and verify” is the mantra applicable for LLMs.

I swear these goalposts keep getting moved, I remember being told that GPT3.5 is a useless toy but the paid GPT4 is lifechanging, and now that GPT4 is free I'm told that it's a useless toy but paid o1 or paid Sonnet are lifechanging. Looking forward to o1 and Sonnet becoming useless toys, unlike the lifechanging o3.

Except GPT4 isn't free.

The GP is claiming GPT-4o is bad but Sonnet is good. GPT-4o is only about 20% cheaper than Sonnet.


You will also be dismayed to hear that a 2011 iPhone is no longer state-of-the-art, and indeed can't run most modern apps.

Holy false-equivalency, Batman! The definitions of "useless toy / lifechanging tool" are _not_ changing over time (or, at least, not over the timescale being explored here), whereas the expectations and requirements of processing power of a phone are.

But in fact they are changing over time -- this is an expectations treadmill. When you get something newer and better, it highlights the flaws in what you had before.

That is true _in general_, but not in this specific case (hence why I specified "not over the timescale being explored here"). A modern cigarette lighter would indeed have been a life-changing tool to a caveman, but is disposable junk today.

The point being made by the original comment (with which I agree) was that many criteria-for-usefulness - primarily that of reliability or a lack of hallucination - have remained static; with successive generations of tools being (falsely) claimed to meet them, but then abandoned when the next hype-train comes along.

I certainly agree that _some_ aspects of AI models are indeed improving (often drastically!) over time (speed, price, supported formats, history/context, etc.) - but they still _all_ fall _drastically_ short on the key core requirement that is required in order to make them Actually Useful. "X is better than Y" does not imply "where Y failed to be useful, X now succeeds".


GPT4 is a 13 year old technology? Compared to o1 and Sonnet 3.5?

If someone told me an iPhone 4 is terrible but an iPhone 5 would definitely serve my needs, and then when I get an iPhone 5 they say the same of the 6, do you really want me to believe them a second time? Then a third time? Then a fourth? In the meantime my time and money are wasted?


It would be quite useful if that were the only phone available.

I believe it's more frustration directed at the mismatch between marketing and reality, combined with the general, well deserved, growing hatred for SV culture and, more broadly, software engineers. The sentiment would be completely different if the entire industry marketed these tools as the helpful aids they are rather than the second coming of Christ they aren't. This distinction is hard to make on "fast food" forums like this one.

If you aren't a coder, it's hard to find much utility in "Google, but it burns a tree whenever you make an API call, and everything it tells you might be wrong". I for one have never used it for anything else. It just hasn't ever come up.

It's great at cheating on homework, kids love GPTs. It's great at cheating in general, in interviews for instance. Or at ruining Christmas, after this year's LLM debacle it's unclear if we'll have another edition of Advent of Code. None of this is the technology's fault, of course, you could say the same about the Internet, phones or what have you, but it's hardly a point in favor either.

And if you are a coder, models like Claude actually do help you, but you have to monitor their output and thoroughly test whatever comes out of them, a far cry from the promises of complete automation and insane productivity gains.

If you are only a consumer of this technology, like the vast majority of us here, there isn't that much of an upside in being an early adopter. I'll sit and wait, slowly integrating new technology in my workflow if and when it makes sense to do so.

Happy new year, I guess.


> there isn't that much of an upside in being an early adopter.

Other than, y'know, using the new tools. As a programmer-heavy forum, we focus a lot on LLMs' (lack of) correctness. There's more than a little annoyance when things are wrong, like being asked to grab the red blanket and then getting into an argument over whether it's actually orange, instead of focusing on what was important: someone needed the blanket because they were cold.

Most of the non-tech people who use ChatGPT that I've talked to absolutely love it because they don't feel it judges them for asking stupid questions, and they have conversations with it about absolutely everything in their lives, down to which outfit to wear to the party. There are wrong answers to that question as well, but they're far more subjective, and just having another opinion in the room is invaluable. It's just a computer and won't get hurt if you totally ignore its recommendations; even better, it won't gloat (unless you ask it to) if you tell it later that it was right and you were wrong.

Some people have found upsides for themselves in their lives, even at this nascent stage. No one's forcing you to use one, but your job isn't going to be taken by AI; it's going to be taken by someone else who can outperform you because they're using AI.


Yikes.

Clearly said, yet the general sentiment awakens in me a feeling more gothic horror than bright futurism. I am struck with wonder and worry at the question of how rapidly this stuff will infiltrate the global tech supply chain, and the eventual consequences of misguided trust.

To my eye, too much current AI and related tech are just exaggerated versions of magic 8-balls, Ouija boards, horoscopes, or Weizenbaum's ELIZA. The fundamental problem is people personifying these toys and letting their guard down. Human instincts take over and people effectively social engineer themselves, putting trust in plausible fictions.

It's not just LLMs though. It's been a long time coming, the way modern tech platforms have been exaggerating their capability with smoke and mirrors UX tricks, where a gleaming facade promises more reality and truth than it actually delivers. Individual users and user populations are left to soak up the errors and omissions and convince themselves everything is working as it should.

Someday, maybe, anthropologists will look back on us and recognize something like cargo cults. When we kept going through the motions of Search and Retrieval even though real information was no longer coming in for a landing.


> They work great to explore what is at the borders of your knowledge.

But not at exploring what is at the border of knowledge itself. And by converging on the conventional, LLMs actually lead you away from anything that actually extends it.

> doing boring tasks for which you can provide perfect guidance

That's true, but you never need an LLM for that. There are wonderful scripts written by wonderful people, provided for free almost all the time, for those who search in the right places. LLM companies benefit/profit from these without providing anything in return.

They are worse than people who grab FOSS and turn it into overpriced and aggressively marketed business models and services, or people who threaten and sue FOSS projects for being better, free alternatives to their bloated and often "illegally telemetric" services.

> able to accelerate you

True, but you leave too much for data brokers and companies like Meta to abuse and exploit in the future. All that additional "interactional data" will do so much more harm to humanity than all those previous data sets did in elections, for example, or in pretty much all consumer markets. They will mostly accelerate all these dimwitted Fortune 5000 companies that have sabotaged consumers into way too much dumb shit - way more than is reasonable or "ok". And educated, wealthy and/or tech-savvy people won't be able to avoid/evade any of that. Especially when it's paired with meds, drugs, foods, biases, fallacies, priming and so on, and all the knowledge we will gain on bio-chemical pathways and human susceptibility to sabotage.

They are great for coders, of course, everyone can be an army of clone-warriors with auto-complete on steroids now and nobody can tell you what to do with all that time that you now have and all that money, which, thanks to all of us but mostly our ancestors, is the default. The problem is the resulting hyper-amplified, augmented financial imbalance. It's gonna fuck our species if all the technical people don't restore some of that balance, and everybody knows what that means and what must be done.


Is there a way to use this in Jetbrains IDEs? (I've not been impressed with their AI Assistant.) There are a few plugins, but from the reviews they all seem kind of mediocre.

I personally use the Zed editor AI assistant integration with Sonnet for anything AI-related, while using a JetBrains IDE for coding / code reading, side-by-side.

I haven’t found anything comparably good for JetBrains IDEs yet, but I’m also not switching to something else as my main editor.


Github copilot plugin is decent. It's not going to write a whole app for you, but it accelerates repetitive stuff, can give suggestions you didn't think of or save you a trip to the documentation.

I use IntelliJ as my main coding tool but also use VSCode and Sublime Text. If you have access to local LLMs or have an API key for some, the Continue plugin (basically Cursor, but usable in IntelliJ) is the best of the best for IntelliJ (IMO). I have a box running some local models including Phind and StarCoder (plus some small embeddings) and have been super happy with the end product.

Next up, Google Gemini Code Assist has been the best of the (non-configured) IntelliJ AI tools I have tried. There are better ones out there but IMO not for IntelliJ. It's still free for a few more weeks and I have been using it since the free release; fun to use. You can pre-prompt: say you are an expert XXX, please be funny, and fill in the rest of your regular prompts.

The Copilot I use for work is very limited and will only answer coding questions. I tried to tell it that it was my coding buddy and its name was Phil, and it told me it cannot have a personality or be funny. I believe the paid personal Copilot allows you to choose which LLM it uses (I cannot confirm). The Phind VSCode plugin works really well. Also, the Phind coding models are on par with some of the other big ones and free if you have a subscription (or run locally). Sublime is around to open those GB+ files that VSCode chokes on and that aren't worth the RAM of opening another IntelliJ for.

Each task / programming language / query requires trying different LLM models and novel ways of prompting. If it's not work-related (or work pays for the one you use), sending as much of the code as is relevant also helps the answers be more useful.

Most of the people I meet who say LLMs are not useful have only tried one (flavor / plugin), do not know how to pre-prompt or prompt, and do not give the tools a chance. They try one or two things, say "yep, it's not good", and give up.

Still hard for me to admit that Prompt Engineering is a profession, but it's the same as Google Fu. Once you learn it you can become an LLM Ninja!

I do not believe LLMs are coming for my job (just yet), but I do believe they are going to be able to replace some people, that they are useful, and that those who do not use them will be at a disadvantage.


Try Cursor. I’m serious.

I'm sure it's good, but that's not what I'm asking about.

Right, in simpler terms: the measure of LLMs' success is how effectively they help you achieve your goal faster.

Exactly, and right now the LLM acceleration effect is a tool, not "give me the final solution". Even people who can't code, using LLMs to build applications from scratch, still have this tool mindset. This is why they can use them effectively: they don't stop at the first failed solution; they provide hints to the LLM, test the code, try to figure out what the problem is (also with the LLM's help), and so forth. It's a matter of mindset.

> people that can't code

These people may not be Software Engineers, but they are coding.


btw, fusion has arrived by that definition: no reactors that produce more energy than they consume, but net-positive reactions have been achieved. Tasks where LLM output is more than 1x are few and far between.

Definitely not a "useless toy" with the right use case. It's great at code snippets, scripts, etc. It's an assistant.

I’m surprised you only have one use case. I use LLMs to research travel, adjust recipes, check biographies and book reviews, and many many more things.

Hopefully things have narrowed, but you can see from the trends data just how few people (the API may be a different story) use Claude relative to ChatGPT.

Brand awareness is a hell of a drug.

Indeed, although I find myself reaching for o1 more than Claude for matters other than programming, solely because it has better LaTeX (...)

ClaudeAI ++1000

yeah, they save as much time as finding a template with a good old search and using it.

> best LLMs are able to accelerate you

https://www2.math.upenn.edu/~ghrist/preprints/LAEF.pdf - this math textbook was written in just 55 days!

Paraphrasing the acknowledgements -

...Begun November 4, 2024, published December 28, 2024.

...assisted by Claude 3.5 sonnet, trained on my previous books...

...puzzles co-created by the author and Claude

...GPT-4o and -o1 were useful in latex configurations...doing proof-reading.

...Gemini Experimental 1206 was an especially good proof-reader

...Exercises were generated with the help of Claude and may have errors.

...project was impossible without the creative labors of Claude

The obvious comparison is to the classic Strang https://math.mit.edu/~gs/everyone/ which took several *years* to conceptualize, write, peer review, revise and publish.

Ok maybe Strang isn't your cup of tea, :%s/Strang/Halmos/g , :%s/Strang/Lipschutz/g, :%s/Strang/Hefferon/g, :%s/Strang/Larson/g ...

Working through the exercises in this new LLMbook, I'm thinking...maybe this isn't going to stand the test of time. Maybe acceleration is not so hot after all.


"The story of linear algebra begins with systems of equations, each line describing a constraint or boundary traced upon abstract space. These simplest mathematical models of limitation — each equation binding variables in measured proportion — conjoin to shape the realm of possible solutions. When several such constraints act in concert, their collaboration yields three possible fates: no solution survives their collective force; exactly one point satisfies all bounds; or infinite possibilities trace curves and planes through the space of satisfaction. This trichotomy — of emptiness, uniqueness, and infinity — echoes through all of linear algebra, appearing in increasingly sophisticated forms as our understanding deepens."

Maybe I'm not the target audience, but... that really doesn't make me interested in continuing to read.


That is such supremely bad writing that it can only come from AI being told to spice up the original opening paragraph, and short of the original author being barely literate (and possibly even then), the original text would have been better writing.

The overuse of the $15 synonyms is almost always a bad idea--you want to use them sparingly, where dropping them in for their subtly different meanings enhances the text. But what is extremely sloppy here is that the possibilities of "no solutions, one solution, infinite solutions" are now each being described with a different metaphor for "solution". And by the end of the paragraph, I'm not actually sure what point I'm supposed to take away from this text. (As bad as this paragraph is, the next paragraph is actually far worse.)

Mathematics already has a problem for the general audience with a heavy focus on abstraction that can be difficult to intuit on more concrete objects. Adding florid metaphors to spice up your writing makes that problem worse.


Even putting it here is annoying to me... Those are a lot of words saying nothing that I just spend time reading.

I'm agreeing with you.


It's rather purple prose, but it's entirely meaningful. Maybe it doesn't seem to mean anything until after you know some linear algebra, though...

It's been a long time, but when I was taught this material, I was told there are only 3 cases:

x+y=1, x+y=2 clearly has no solution since two numbers can’t simultaneously add to both one and two.

x+y=1,2x+2y=2 clearly has infinitely many solutions. There’s only one equation here after canceling the 2, so you can plug in x’s and y’s all day long, no end to it.

x+y=1, 2x+y=1 clearly has exactly one solution (0,1) after elimination.

This example stuck with me, so I use it even now. The author/Claude/Gemini/whatever could have just used this simple example instead of “trichotomy of curves through space conjoin through the realm of …”. Math, not Shakespeare.
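
If you want to check that trichotomy mechanically rather than poetically, here is a minimal numpy sketch (the function name is mine) that classifies those same three systems using the rank test:

    import numpy as np

    def classify(A, b):
        # Rouche-Capelli: compare the rank of A with the rank of the augmented matrix [A | b]
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float).reshape(-1, 1)
        r = np.linalg.matrix_rank(A)
        r_aug = np.linalg.matrix_rank(np.hstack([A, b]))
        if r < r_aug:
            return "no solution"
        return "exactly one solution" if r == A.shape[1] else "infinitely many solutions"

    print(classify([[1, 1], [1, 1]], [1, 2]))  # x+y=1, x+y=2   -> no solution
    print(classify([[1, 1], [2, 2]], [1, 2]))  # x+y=1, 2x+2y=2 -> infinitely many solutions
    print(classify([[1, 1], [2, 1]], [1, 1]))  # x+y=1, 2x+y=1  -> exactly one solution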


Also, isn't this a great example of "when you have a hammer, everything looks like a nail" ?

To explain this I would first and foremost use a picture, where the 3 cases (parallel, identical, intersecting) can be intuitively seen with merely a glance, using our visual system rather than our language system.


Sure, but saying something in an ornate way is not the same as “saying nothing”.

I agree. Not what I would expect from a math book or script.

Going faster isn't good if the quality drops enough that overall productivity decreases... Infinite slop is only a good thing for pigs.

Just use ChatGPT to summarize its own output. It’s like running your JPEG back through the JPEG compressor again!

^ This perfectly encapsulates the story I see every time someone digs into the details of any LLM-generated or LLM-assisted content that has any level of complexity.

Great on the surface, but lacking any depth, cohesion, or substance.


I started a book about CIAM (customer identity and access management) using Claude to help outline a chapter. I'd edit and refine the outline to make sure it covered everything.

Then I'd have Claude create text. I'd then edit/refine each chapter's text.

Wow, was it unpleasant. It was kinda cool to see all the words put together, but editing the output was a slog.

It's bad enough editing your own writing, but for some reason this was even worse.


Just to clarify: I have nothing to do with this book. I was just forwarded a copy and I thought it's relevant to the topic at hand. From the wild swings in karma, it looks like people are annoyed with the message and shooting down the messenger.

We're at the "computers play chess badly" stage. Then we'll hit the Deep Thought (1988) and Deep Blue (1995-1997) stages, but still saying that solving Go won't happen for 50+ years and that humans will continue to be better than computers.

The date/time that divides my world into before/after is AlphaGo v Lee Sedol game 3 (2016). From that time forward, I don't dismiss out of hand speculations about how soon we can have intelligent machines. Ray Kurzweil's date of 2045 is as good as any (and better than most) for an estimate. Like Moore's (and related) Laws, it's not about how, but about the historical pace of advancements crossing a fairly static point of human capability.

Application coding requires much less intelligence than playing Go at these high levels. The main differences are concise representation and clear final outcome scoring. LLMs deal quite well with the fuzziness of human communications. There may be a few more pegs to place, but when remains predictably unknown.


> There’s a flipside to this too: a lot of better informed people have sworn off LLMs entirely because they can’t see how anyone could benefit from a tool with so many flaws. The key skill in getting the most out of LLMs is learning to work with tech that is both inherently unreliable and incredibly powerful at the same time. This is a decidedly non-obvious skill to acquire!

I wish the author qualified this more. How does one develop that skill?

What makes LLMs so powerful on a day to day basis without a large RAG system around it?

Personally, I try LLMs every now and then, but haven’t seen any indication of their usefulness for my day to day outside of being a smarter auto complete.


When I started my career in 2010, googling was a semi-serious skill. All of the little things that we know how to do now, such as ignoring certain sites, lingering on others, and iteratively refining our search queries, were not universally known at the time. Experienced engineers often relied on encyclopedic knowledge of their environment or on "reading the manual".

In my experience, LLM tools are the same, you ask for something basic initially and then iteratively refine the query either via dialog or a new prompt until you get what you are looking for or hit the end of the LLM's capability. Knowing when you've reached the latter is critically important.


One difference is that skillful googling still only involved typing a few keywords or a short phrase and some syntax, and then knowing how to skim the results and iterate, and how to operate your browser efficiently. With LLMs, you have to type a lot more (and/or use voice input), and often also read more, it’s also not stateless/repeatable like following a web link, and most output looks the same (as opposed to the variations in web sites). I pride(d) myself on my Google foo, it was fun, but I find using LLMs to be quite exhausting in comparison.

I also find LLMs to be more exhausting than Googling, but for me they’ve been ultimately more enriching and efficient.

Specifically, I’ve been using Kagi Assistant over the past 1.5 months for serious and lengthy searches, and I can’t imagine going back to traditional search.

I’m currently sold on this model of LLM assisted search (where explicit links are provided) over the old Google foo skills I developed during grad school.

Example search topics include deep dives and guidance for my first NAS build, finding new bioinformatics methods, and other random biomedical info.


The problems with that skill is that:

* Most existing LLM interfaces are very bad at editing history, instead focusing entirely on appending to history. You can sort of ignore this for one-shot, and this can be properly fixed with additional custom tools, but ...

* By the time you refine your input enough to patch over all the errors in the LLM's output for your sensible input, you're bigger than the LLM can actually handle (much smaller than the alleged context window), so it starts randomly ignoring significant chunks of what you wrote (unlike context-window problems, the ignored parts can be anywhere in the input).


I really like Zed's (editor) implementation. The context window is just editable text, like any other. You can freely change anything and send the whole thing back into the LLM. I find that a much more useful interface than mucking around and editing chat bubbles.

ChatGPT basically lets you edit any of your messages at any point in the conversation, which I definitely use (e.g., if the conversation has gotten into a bad basin, the LLM misunderstood me, etc).

Also ChatGPT has a pretty big context window. Gemini supposedly has the biggest useful context window (~millions of tokens), though I don't have personal experience.


I tend to avoid editing previous messages because it breaks my mental model of the sequence that got me to the current state. That's more of a bias from my goal to do "research" into how these models work though - I'm always trying to maintain the cleanest possible record of what I did so I can learn from the transcript later.

> Most existing LLM interfaces are very bad at editing history, instead focusing entirely on appending to history. You can sort of ignore this for one-shot, and this can be properly fixed with additional custom tools, but ...

Somebody somewhere needs to provide a threaded interface to an LLM.


Yeah, a key thing to understand about LLMs is that managing the context is everything. You need to know when to wipe the slate by starting a new chat session and then pasting across a subset of the previous conversation.

A lot of my most complex LLM interactions take place across multiple sessions - and in some cases I'll even move the project from Claude 3.5 Sonnet to OpenAI o1 (or vice versa) to help get out of a rut.

It's infuriatingly difficult to explain why I decide to do that though!


What kinds of things do you with these LLMs?

I feel like I’m good at understanding context. I’ve been working in AI startups over the last 2 years. Currently at an AI search startup.

Managing context for info retrieval is the name of the game.

But for my personal use as a developer, they’ve caused me much headache.

Answers that are subtly wrong in such a way that it took me a week to realize my initial assumption based on the LLM response was totally bunk.

This happened twice. With the yjs library, it gave me half incorrect information that led me to misimplementing the sync protocol. Granted it’s a fairly new library.

And again with the web history api. It said that the history stack only exists until a page reload. The examples it gave me ran as it described, but that isn’t how the history api works.

I lost a week of time because of that assumption.

I’ve been hesitant to dive back in since then. I ask questions every now and again, but I jump off much faster now if I even think it may be wrong.


There is no substitute for cold hard facts. LLMs do not provide that unless it’s literally the easiest thing for them to do and even then not always.

In the case you were in I would go out of my way to feed the docs to the LLM and then use the LLM to interrogate the docs and then verify the understanding I got from the LLM with a personal reading of the docs that were relevant.

You might think it takes just as long, if not longer, to do it my way rather than just reading the docs myself. Sometimes it can. But as you get good at the workflow you find that the time spent finding the relevant docs goes down, and you get an instant plausible interpretation of the docs added on top. You can then very quickly produce application code right away, and then docs for the code you write.


Here are a bunch of things I use LLMs for relating to code.

- Running micro-benchmarks (using Python in Code Interpreter) - if I have a question about which of two approaches is faster I often use this pattern: https://simonwillison.net/2023/Apr/12/code-interpreter/

- Building small ad-hoc one-off tools. Many of the examples in https://simonwillison.net/2024/Oct/21/claude-artifacts/ fit that bill, and I have a bunch more in my tools tag here: https://simonwillison.net/tags/tools/ - Geoffrey Litt wrote a great piece the other day about custom developer tools which matches how I think about this: https://www.geoffreylitt.com/2024/12/22/making-programming-m...

- Building front-end prototypes - I use Claude Artifacts for this all the time, if I have an idea for a UI I'll get Claude to spin up an almost instant demo so I can interact with it and see if it feels right. I'll often copy the code out and use it as the starting point for my production feature.

- DSLs like SQL, Bash scripts, jq, AppleScript, grep - I use these WAY more than I used to because 9/10 times Claude gives me exactly what I needed from a single prompt. I built a CLI tool for prompt-driven jq programs recently: https://simonwillison.net/2024/Oct/27/llm-jq/

- Ad-hoc sidequests. This is a pretty broad category, but it's effectively little coding projects which I shouldn't actually be working on at all but I'll let myself get distracted if an LLM can get me there in a few minutes: https://simonwillison.net/2024/Mar/22/claude-and-chatgpt-cas...

- Writing C extensions for SQLite while I'm walking my dog on the beach. I am not a C programmer but I find it extremely entertaining that ChatGPT Code Interpreter, prompted from my phone, can write, compile and test C extension for SQLite for me: https://simonwillison.net/2024/Mar/23/building-c-extensions-...

- That's actually a good example of a general pattern: I use this stuff for exploratory prototyping outside of my usual (Python+JavaScript) stack all the time. Usually this leads nowhere, but occasionally it might turn into a real project (like this AppleScript example: https://til.simonwillison.net/gpt3/chatgpt-applescript )

- Actually writing code. Here's a Python/Django app I wrote almost entirely with Claude: https://simonwillison.net/2024/Aug/8/django-http-debug/ - again, this was something of a side-project - not something worth spending a full day on but worthwhile if I could get it done in a couple of hours.

- Mucking around with APIs. Having a web UI for exploring an API is really useful, and Claude can often knock those out from a single prompt. https://simonwillison.net/2024/Dec/17/openai-webrtc/ is a good example of that.

There's a TON more, but this probably represents the majority of my usage.


Thank you!

I’ll read through these and try again in the new year.


Not OP, but I've just gotten really used to verifying implementation details. Yup, those subtle ones really suck. It's pretty much just up to intuition if something in the response (or your followups) rings the `not quite right` bell for you.

I bought in early to typingmind, a great web based frontend. Good for editing context, and switching from say gemini to claude. This is a very normal flow for me, and whatever tool you use should enable this

also nice to interact with an LLM in vim, as the context is the buffer

obviously simon’s llm tool rules. I’ve wrapped it for vim


Googlefu is what it's usually called. It would be fantastic if there were a general course to teach it.

One of the things I find most frustrating about LLMs is how hard it is to teach other people how to use them!

I'd love to figure this out. I've written more about them than most people at this point, and my goal has always been to help people learn what they can and cannot do - but distilling that down to a concise set of lessons continues to defeat me.

The only way to really get to grips with them is to use them, a lot. You need to try things that fail, and other things that work, and build up an intuition about their strengths and weaknesses.

The problem with intuition is it's really hard to download that into someone else's head.

I share a ton of chat conversations to show how I use them - https://simonwillison.net/tags/tools/ and https://simonwillison.net/tags/ai-assisted-programming/ have a bunch of links to my exported Claude transcripts.


Thank you for doing this work, though.

My first stab at trying ChatGPT last year was asking it to write some Rust code to do audio processing. It was not a happy experience. I stepped back and didn't play with LLMs at all for a while after that. Reading your posts has helped me keep tabs on the state of the art and decide to jump back in (though with different/easier problems this time).


To be fair I think that is a hard task even for a human expert, in the sense that there isn’t much prior art.

It's really important to go and read the code that the author of this article actually produces with LLMs. He posted on hacker news a few months ago, a post called something like "everything I've made with ChatGPT in the month of September" or something. He's producing little toy applications that don't even begin to resemble real production code. He thinks these "tools" are useful because they help him write pointless slop.

Here's that post: https://simonwillison.net/2024/Oct/21/claude-artifacts/

You're misrepresenting it here.

The point of that post isn't "look at these incredible projects I've built (proceeds to show simple projects)."

It's "I built 14 small and useful tools in a single week, each taking between 2 and 10 minutes".

The thing that's interesting here is that I can have an LLM kick out a working prototype of a small, useful tool in only a little more time than it takes to run a Google search.

That post isn't meant to be about writing "real production code". I don't know why people are confused over that.


Do you know who Simon is?

Only from his neverending stream of hacker news posts.

My experience is that for certain tasks LLMs are great, and for certain tasks LLMs are basically useless.

The best prompts though are always written in a separate text file for me and pasted in. Follow up questions are never as good as a detailed initial prompt.

I would imagine that formulating good questions for the problem at hand is a skill, but beyond that I don't think there is anything special about how to ask LLMs a question.

In areas where the LLM is rather useless, no amount of variation in prompting can solve that problem IMO. Just like if the task is something the LLM is good at, the prompt can be pretty sloppy and it seems like magic with how it can understand what you want.


I think one of the most important skills is being able to predict which tasks an LLM is a good fit for and which aren't.

I think most tech folks struggle with it because they treat LLMs as computer programs, and their experience is that SW should be extremely reliable - imagine using a calculator that was wrong 5% of the time - no one would accept that!

Instead, think of an LLM as the equivalent of giving a human a menial task. You know that they're not 100% reliable, and so you give them only tasks that you can quickly verify and correct.

Abstract that out a bit further, and realize that most managers don't expect their reports to be 100% reliable.

Don't use LLMs where accuracy is paramount. Use it to automate away tedious stuff. Examples for me:

Cleaning up speech recognition. I use a traditional voice recognition tool to transcribe, and then have GPT clean it up. I've tried voice recognition tools for dictation on and off for over a decade, and always gave up because even a 95% accuracy is a pain to clean up. But now, I route the output to GPT automatically. It still has issues, but I now often go paragraphs before I have to correct anything. For personal notes, I mostly don't even bother checking its accuracy - I do it only when dictating things others will look at.

And then add embellishments to that. I was dictating out a recipe I needed to send to someone. I told GPT up front to write any number that appears next to an ingredient as a numeral (e.g. 3 instead of "three"). Did a great job - didn't need to correct anything.
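
If anyone wants to wire up something similar, here is a minimal sketch using the openai Python client; the model choice and the prompt wording are mine, not necessarily what works best:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def clean_transcript(raw_text: str) -> str:
        # ask the model to fix recognition errors without adding or removing content
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": (
                    "Clean up this speech-recognition transcript: fix mis-heard words, "
                    "punctuation and casing. Write any number next to an ingredient as a "
                    "numeral. Do not add or remove content.")},
                {"role": "user", "content": raw_text},
            ],
        )
        return response.choices[0].message.content

    print(clean_transcript("add three cups of flour and a pinch off salt"))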

And then there are always the "I could do this myself but I didn't have time so I gave it to GPT" category. I was giving a presentation that involved graphs (nodes, edges, etc). I was on a tight deadline and didn't want to figure out how to draw graphs. So I made a tabular representation of my graph, gave it to GPT, and asked it to write graphviz code to make that graph. It did it perfectly (correct nodes and edges, too!)

Sure, if I had time, I'd go learn graphviz myself. But I wouldn't have. The chances I'll need graphviz again in the next few years is virtually 0.

I've actually used LLMs to do quick reformatting of data a few times. You just have to be careful that you can verify the output quickly. If it's a long table, then don't use LLMs for this.

Another example: I have a custom note taking tool. It's just for me. For convenience, I also made an HTML export. Wouldn't it be great if it automatically made alt text for each image I have in my notes? I would just need to send it to the LLM and get the text. It's fractions of a cent per image! The current services are a lot more accurate at image recognition than I need them to be for this purpose!
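
Same idea for the alt text, using a vision-capable model; the file name is made up and the prompt is just an example:

    import base64
    from pathlib import Path

    from openai import OpenAI

    client = OpenAI()

    def alt_text(image_path: Path) -> str:
        # send the note's image as a data URL and get back a one-line description
        b64 = base64.b64encode(image_path.read_bytes()).decode()
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": "Write one concise alt-text sentence for this image."},
                    {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }],
        )
        return response.choices[0].message.content

    print(alt_text(Path("notes/images/whiteboard.png")))  # hypothetical note image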

Oh, and then of course, having it write Bash scripts and CSS for me :-) (not a frontend developer - I've learned CSS in the past, but it's quicker to verify whatever it throws at me than Google it).

Any time you have a task and lament "Oh, this is likely easy, but I just don't have the time" consider how you could make an LLM do it.


> Don't use LLMs where accuracy is paramount.

Then why do people keep pushing it for code related tasks?

Accuracy and precision is paramount with code. It needs to express exactly what needs to be done and how.


Code is the best possible application of LLMs because you can TEST the output.

If the LLM hallucinates something the code won't compile or run.

If the LLM makes a logic error you'll catch it in the manual QA process.

(If you don't have good personal manual QA habits, don't try using LLMs to write your code. And maybe don't hit "accept" on other developers' code reviews either?)
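
A toy illustration of the "you can TEST the output" point (the generated snippet below is a stand-in, not real model output):

    # run a generated snippet in a scratch namespace and only accept it if known cases pass
    generated = """
    def slugify(title):
        return "-".join(title.lower().split())
    """

    namespace = {}
    exec(generated, namespace)  # a hallucinated name or a syntax error fails right here

    slugify = namespace["slugify"]
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Already   spaced ") == "already-spaced"
    print("generated code passed the checks")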


> Code is the best possible application of LLMs because you can TEST the output.

This is an overly simplistic view of software development.

Poorly made abstractions and functions will have knock on effects on future code that can be hard to predict.

Not to mention that code can have side effects that may not affect a given test case, or the code could be poorly optimized, etc.

Just because code compiles or passes a test does not mean it’s entirely correct. If it did, we wouldn’t have bugs anymore.

The usual response to this is something like “we can use the LLM to refactor LLM code if we need” but, in my experience, this leads to very complex, hard to reason about codebases.

Especially if the stack isn’t Python or JavaScript.


So code review LLM-generated code and reject it (or require changes to it) if it doesn't fit your idea of what good code looks like.

Or… yknow… I could just write the code…

Instead of going through a multi step process to get an LLM to generate it, review it, reject it, and repeat…

I wonder why you reply to these comments, but not my other asking what you use LLMs for and specifically explaining how they failed me.


Found that comment here, about to reply: https://news.ycombinator.com/item?id=42562394

Because there are other ways to validate the output, types being one of them, tests another. Or simply running the code. It's easy enough to validate the output given the right approach that code generated by an LLM (usually as the result of a conversation/discussion about what should be accomplished) is a net positive.

If you zero-prompt and copy-paste the first result into your codebase, yeah, the accuracy problem will rear its ugly head real quick.


> Then why do people keep pushing it for code related tasks?

They don't. You are likely experiencing selection bias. My guess is you work in SW, and so it makes sense that you're the target of those campaigns. The bulk of ChatGPT subscribers are not doing SW, and no one is bugging them to use it for code related tasks.


I mean people in the software field absolutely push for LLMs to write code…

Obviously people not in the software field wouldn’t care…


A similar use case for me - I wrote some technical documentation for our wiki about a somewhat complicated relationship between ids in some database tables. I copied my text explanation into an LLM and asked it to make a diagram and it did so. Took very little time from me and it was fast/easy to verify that the quality was good.

I think there’s the added reason that a lot of folks went into tech because (consciously or unconsciously) they prefer dealing with predictable machines rather than with unreliable humans. And now that career choice begins to look like a bait and switch. ;)

> Instead, think of an LLM as the equivalent of giving a human a menial task. You know that they're not 100% reliable, and so you give them only tasks that you can quickly verify and correct.

The problem is: for the tasks that I can give the LLM (or human) that I can easily verify and correct, the LLM fails with the majority of them, for example

- programming tasks in my area of expertise (which is more "mathematical" than what is common in SV startups), where I know what a high-level solution has to look like, and where I can ask the LLM to explain the gory details to me. Yes, these gory details are subtle (which is why the task can be menial), but the code has to be right. I can verify this, and the code is not correct.

- getting literature references about more obscure scientific (in particular mathematical) topics. I can easily check whether these literature references (or summaries of these references) are hallucinations - they typically are.


Your first task is definitely not what I would call a "menial" task.

Your second task is not a "task", but a knowledge search. LLMs are not good with searches (unless augmented - like RAG).


LLMs on their own are effectively useless for references or citations. They need to be plugged into other systems for that - search-enabled ones like https://gemini.google.com or ChatGPT with search enabled or Perplexity can do this, although at that point they are mostly running the exact same searches you would.

> Don't use LLMs where accuracy is paramount. Use it to automate away tedious stuff.

My programmer mind tells me that "tedious stuff" is where accuracy is the most important.


There's a similar dynamic in building reliable distributed systems on top of an unreliable network. The parts are prone to failure but the system can keep on working.

The tricky problem with LLMs is identifying failures - if you're asking the question, it's implied that you don't have enough context to assess whether it's a hallucination or a good recommendation! One approach is to build ensembles of agents that can check each other's work, but that's a resource-intensive solution.


It's amazing this is still an opinion in 2025. I now ask devs how they use AI as part of their workflows when I interview. It's a standard skill I expect my guys to have.

I feel bad for your team.

Let people work how they want. I wouldn’t not hire someone on the basis of them not using a language server.

The creator of the Odin language famously doesn’t use one. He says that he, specifically, is faster without one.


No, it’s reasonable. If your team uses Git then it’s a valid question to establish if someone has only worked with Perforce.

They didn’t say how heavily they weight the question.

(All that said I expect that, soon, experience with the appropriate LLM tooling will be as important as having experience with the language your system is implemented in.)


Right, but using git is a team wide thing.

I can’t use perforce while my company is on git.

But if I do or do not use an LLM to assist me while coding, my team is unaffected.

If someone liked jetbrains, but your team used neovim, would you force them to use neovim?


Editors may also be a team decision in some places. Some teams are using features unique to one IDE, for example.

it can be a team decision, but it's a bad one

Then that tooling is required - Visual Studio is a common one I know about in Windows land.

Though nobody should care if I edited my text files with neovim as long as I still used the same toolchain as everyone else.


You hire people based on their fundamental knowledge and the ability to learn, not skills in arbitrary tools and frameworks which come and go every other day. If someone has used Perforce they will be able to get perfectly comfortable with Git by the end of their first week. So not knowing Git is an idiotic reason to reject a skilled developer. Same with programming languages, and just about every other aspect of software development.

I don't really test any specific tools or frameworks; what I'm using has changed twice just in the last year. Mostly, I just want to hear that the candidate has some knowledge of what the current models can do well, what they can't do, and how they're integrating it. Whether you're copy-pasting code or using something like Cursor is not what I'm concerned about.

Yeah, but it's oh so easy to test for, and oh so nice to have plenty of boxes checked to cover your ass if the hire goes wrong.

My expectations around productivity are going to assume you're using AI. That means stuff that might have taken a few days, I'm going to expect in a few hours or less. It's not unreasonable; I've seen over and over again that kind of speed-up. I have a lot less approval to hire people than I used to... so it's really important to me that I can extract that level of productivity out of my team.

If you're "working the way you want to" ie still handrolling all your code, you're going to find my expectations unrealistic, and that is certainly not fair to you.


I concur that asking devs how they use AI is a great idea.

Recently, I shared a code base with a junior dev and she was surprised with the speed and sophistication of the code. The LLM did 80+% of the "coding".

What was telling was that, as she was grokking the code (to help with the ~20%), she was surprised at the quality of the code - her own use of the LLM did not yield code of similar quality.

I find that the more domain awareness one brings to the table, the better the output is. Basically the clearer one's vision of the end-state, the better the output.

One other positive side-effect of using "LLMs as a junior-dev" for me has been that my ambitions are greater. I want it all - better code, more sophisticated capabilities even for relatively not-important projects, documentation, tests, debug-ability. And once the basic structure is in place, many a time it is trivial to get the rest.

It's never 100%, but even with 80+%, I am faster than ever before, deliver better quality code, and can switch domains multiple times a week and never feel drained.

Sharing best AI hacks within a team will have the same effect as code-reviews do in ensuring consistency. Perhaps an "LLM chat review", especially when something particularly novel was accomplished!


Using cloud-based AI is a no-go where I work, for IP and contractual reasons. And on-premises AI is not as capable and more difficult to integrate.

Have you tried the latest open weight models? They're SO MUCH better today than they were even six months ago.

If I was in an environment that didn't allow hosted API models I'd absolutely be looking into the various Llama 3 models or Qwen2.5-Coder-32B.


Legal does not even want us running offline models for reasons. I assume that comes down to not knowing what offline-only means, but such is life.

Maybe they're concerned that code written with AI assistance can't be copyrighted? I've seen that idea floated in a few places.

What do you use so that you can throw in a set of documents and/or a nontrivial code base into an LLM workspace and ask questions about it etc.? What the cloud-based services provide goes way beyond a simple chat interface or mere code completion (as you know, of course).

I use my https://github.com/simonw/files-to-prompt tool like this:

  files-to-prompt . -e py -e md -c | pbcopy
Now I have all the Python and Markdown files from the current project on my clipboard, in Claude's recommended XML-like format (which I find works well with other models too).

Then I paste that into the Claude web interface or Google's AI Studio if it's too long for Claude and ask questions there.

Sometimes I'll pipe it straight into my own LLM CLI tool and ask questions that way:

  files-to-prompt . -e py -e md -c | \
    llm -m gemini-2.0-flash-exp 'which files handle JWT verification?'
I can later start a chat session on top of the accumulated context like this:

  llm chat -c
(The -c means "continue most recent conversation in the chat").

Thanks. Google AI Studio isn’t local, I think, is it? I’ll have to test this, but our project sizes and specification documents are likely to run into size limitations for local models (or for the clipboard at the very least ;)). And what I’d be most interested in are big-picture questions and global analyses.

No, it's not. I've not seen any local models that can handle 1m+ tokens.

I haven't actually done many experiments with long context local models - I tend to hit the hosted API models for that kind of thing.


Just curious, but what AI related skills do you expect them to have?

The ability to recognize and join a hype train, I presume. It’s one way to appear proactively leading-edge to marginally-informed product managers, marketers, execs and press.

That's an extremely uncharitable presumption. Although I don't agree that routine usage of AIs should be a precondition for regular software engineering jobs, there are good reasons for using LLMs besides "joining a hype train".

Nah.

I ask what their current workflow is, how they check and verify things, what their approach to prompting is etc. I'm looking to see that they've developed basic skills, have a reasonable mental model of what models can do well, what they currently can't do, and have an approach to be productive using the tools.

I would characterize good prompting as: write out the whole problem you're trying to solve, then think to yourself what the clarifying questions would be if you were a junior trying to solve it. Better yet - ask the LLM to ask you challenging clarifying questions for several rounds. Then, take all that information and re-compile it back into a list of all the important components of the project, and re-read it to make sure there's no particularly ambiguous part or weird part that would be over-emphasized by the language you used. Then, emphasize the core concerns again, and tell it how you'd like it to output the response (keeping in mind that it will always do best with a conversation-style format with loose restrictions). Never let a conversation stray too long from the original goals lest it start forgetting.

Once that's all done, you basically have a well-structured question you could pass to an underling and have them completely independently work on the project without bugging you. That's the goal. Now, pass that to o1 or Claude, depending on whether it's a general-purpose task (o1) or a code-specific task (Claude), and wait for response. From there, have a conversation or test-and-followup of whatever it spits out, this time with you asking questions. If good enough, done. If not, wrap up whatever useful insights from that line of questioning and put it back into the initial prompt and either re-post it at the end of the conversation or start a fresh conversation.

I find 90% of the time this gets exactly what I'm after eventually. The few other cases are usually because we hit some cycle where the AI doesn't fully know what to change/respond, and it keeps repeating itself when I ask. The trick then is to ask things a different way or emphasize something new. This is usually just a code-specific issue, for general problems it's much better. One other trick is to ask it to take a step back and just tackle the problem in a theoretical/philosophical way first before trying to do any coding or practical solving, and then do that in a second phase (asking o1 to architect code structure and then Claude to implement it is a great combo too). Also if there is any way to break up the problem into smaller pieces which can be tackled one conversation at a time - much better. Just remember to include all relevant context it needs to interface with the overall problem too.

That sounds like a lot, but it's essentially just project management and delegation to somewhat-flawed underlings. The upside is instead of waiting a workweek for them to get back to you, you just have to wait 20 seconds. But it does mean a ton of reading and writing. There are certainly already some meta-prompts where you can get the AI to essentially do this whole process for you and assess itself, but like all automation that means extra ways for things to break too. Let the AI devs cook though and those will be a lot more commonplace soon enough...

[Edit: o1 mostly agrees lol. Some good additional suggestions for systematizing this: https://chatgpt.com/share/6775b85c-97c4-8003-bd31-ee288396ab... ]
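A rough way to systematize that structure before sending it off (the section labels below are my own, not anything the models require):

  def build_prompt(problem, clarifications, components, core_concerns, output_format):
      qa = "\n".join(f"Q: {q}\nA: {a}" for q, a in clarifications)
      bullets = "\n".join(f"- {c}" for c in components)
      return "\n\n".join([
          "Problem:\n" + problem,
          "Clarifications so far:\n" + qa,
          "Important components:\n" + bullets,
          "Pay particular attention to:\n" + core_concerns,
          "Respond with:\n" + output_format,
      ])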


[flagged]


I hadn't heard of T2Tile. The intro video https://www.youtube.com/watch?v=jreRFxN6wuM is from 5 years ago so it predates even GPT-3.

Do you know if any of the ideas from that project have crossed over into LLM world yet?


Do you know who Simon is?

Great summary of highlights. I don't agree with all of it, but I think it's a very sound attempt at a year-in-review summary.

>LLM prices crashed

This one has me a little spooked. The white knight on this front (DeepSeek) has both announced price increases and had staff poached. There is still Gemini free tier which is ofc basically impossible to beat (solid & functionally unlimited/free), but it's Google, so I'm reluctant to trust it.

Seriously worried about seeing a regression on pricing in the first half of 2025. Especially with the OpenAI $200/month price anchoring.

>“Agents” still haven’t really happened yet

Think that's largely because it's a poorly defined concept, and a true "agent" implies some sort of pseudo-AGI autonomy. This is a definition/expectation issue rather than a technical one, in my mind.

>LLMs somehow got even harder to use

I don't think that's 100% right. An explosion of options is not the same as being harder to use. And the guidance for noobs is still pretty much the same as always (llama.cpp or one of the common frontends like text-generation-webui). It's become harder to tell what is good, but not harder to get going.

----

One key theme I think is missing is just how hard it has become to tell what is "good" for the average user. There is so much benchmark shenanigans going on that it's just impossible to tell. I'm literally at the "I'm just going to build my own testing framework" stage. Not because I can do better technically (I can't)...but because I can gear it towards things I care about and I can be confident my DIY sample hasn't been gamed.


The biggest reason I'm not worried about prices going back up again is Llama. The Llama 3 models are really good, and because they are open weight there are a growing number of API providers competing to provide access to them.

These companies are incentivized to figure out fast and efficient hosting for the models. They don't need to train any models themselves, their value is added entirely in continuing to drive the price of inference down.

Groq and Cerebras are particularly interesting here because WOW they serve Llama fast.


> There is still Gemini free tier which is ofc basically impossible to beat

Is it free free? The last time I checked there was a daily request limit, still generous but limiting for some use cases. Isn't it still the case?


Providing an unlimited free tier would be a terrible business decision for them.

Of course. My point is, a super cheap LLM that does not cut you off after the 1,500th API request of the day is probably preferred over the free model that does, at least for certain use cases.

Agents have a definition issue, sure, but IMO we are prevented from even discovering a useful definition by the current limitations of LLMs.

> Some of those GPT-4 models run on my laptop

That's an indication that models sized for most business needs won't require some giant data center. This is going to be a cheap technology most of the time. OpenAI is thus way overvalued.


Most of the laptops that can run these models today have specs at the high end of dedicated bare-metal servers. Most shared VM servers are way below these laptops. Most people buying a new laptop today won't be able to run them, and most devs getting a website up on a server won't be able to run them either.

This means that the definitions of "laptop" and "server" depend on use. We should instead talk about RAM, GPU and CPU speed, which is more useful and informative but less engaging than "my laptop".


I don't think OpenAI's valuation comes from a data center bet -- rather, I'd suppose, investors think it has a first-mover advantage on model quality with which it can (maybe?) attract buy-out interest or otherwise build yet-to-be-specified product lines.

However, it has been clear for a long time that Meta is just demolishing any competitor's moats, driving the whole megacorp AI competition to razor-thin margins.

It's a very welcome strategy from a consumer POV, but -- it has to be said -- genius from a business POV. By deciding that no one will win, Meta prevents anyone from leapfrogging it at a relatively cheap price.


The last OpenAI valuation I read about was $157 billion. I am struggling to understand what justifies this. To me, it feels like OpenAI is at best a few months ahead of competitors in some areas. But even if I am underestimating the advantage and it's a few years instead of a few months, why does it matter? It's not like AI companies are going to enjoy the first-mover advantage the internet giants had over their competition.

It's justified if AGI is possible. If AGI is possible, then the entire human economy stops making sense as far as money goes, and 'owning' part of OpenAI gives you power.

That is, of course, assuming AGI is possible and exponential, and that market share goes to a single entity instead of a set of entities. Lots of big assumptions. Seems like we're heading towards a slow, lackluster singularity though.


I was thinking about how the economy has been actively making less sense and getting more and more divorced from reality year after year, AI or not.

It's the simple fact that the ability of assets to generate wealth has far outstripped the ability of individuals to earn money by working.

Somehow real estate has become so expensive everywhere that owning a shitty apartment is impossible for the vast majority.

When the world's population was exploding during the 20th century, housing prices were not a problem, yet somehow nowadays, it's impossible to build affordable housing to bring the prices down, though the population is stagnant or growing slowly.

A company can be worth $1B if someone invests $10m in it for 1% stake - where did the remaining $990m come from? Likewise, the stock market is full of trillion-dollar companies whose valuations beggar all explanation, considering the sizes of the markets they are serving.

The rich elites are using the wealth to control access to basic human needs (namely housing and healthcare) to squeeze the working population for every drop of money. Every wealth metric shows the 1% and the 1% of the 1% control successively larger portions of the economic pie. At this point money is ceasing to be a proxy for value and is becoming a tool for population control.

And the weird thing is it didn't use to be nearly this bad even a decade ago, and we can only guess how bad it will get in a decade, AGI or not.

Anyway, I don't want to turn this into a fully-written manifesto, but I have trouble expressing these ideas in a concise manner.


> Somehow real estate has become so expensive everywhere that owning a shitty apartment is impossible for the vast majority.

Approximately 2/3s of homes in the US are owner occupied.


It's interesting that the figure is similar in Australia, but from the POV of the people.

Approximately 2/3rds of Australians live in an owner-occupied home.


> When the world's population was exploding during the 20th century, housing prices were not a problem, yet somehow nowadays, it's impossible to build affordable housing to bring the prices down, though the population is stagnant or growing slowly.

In Canada, the population is still growing at a fairly impressive rate (https://www.macrotrends.net/global-metrics/countries/CAN/can...), and that growth tends to concentrate in major population centres. There are advocacy groups that seek to push Canadian population growth well above UN projections (e.g. the https://en.wikipedia.org/wiki/Century_Initiative "aims to increase Canada's population to 100 million by 2100") through immigration. In Japan, where the population is declining, housing prices are not anything like the problem we observe in North America.

There's also the supply side. "Impossible to build affordable housing" is in many cases a consequence of zoning restrictions. (Economists also hold very strongly that rent control doesn't work - see e.g. https://www.brookings.edu/articles/what-does-economic-eviden... and https://www.nmhc.org/research-insight/research-notes/2023/re... ; real "affordable housing" is just the effect of more housing.)


> Somehow real estate has become so expensive everywhere that owning a shitty apartment is impossible for the vast majority.

That's to be expected when governments forbid people from building housing. The only thing I find surprising is when people blame this on "capitalism".


> And the weird thing is it didn't use to be nearly this bad even a decade ago, and we can only guess how bad it will get in a decade, AGI or not.

The last 5 years have reflected a substantial decline in QOL in the States; you don't even have to look back that far.

The coronacircus money-printing really accelerated the decline.


> If AGI is possible, then the entire human economy stops making sense as far as money goes, and 'owning' part of OpenAI gives you power.

That's if AGI is possible and not easily replicated. If AGI can be copied and/or re-developed like other software then the value of owning OpenAI stock is more like owning stock in copper producers or other commodity sector companies. (It might even be a poorer investment. Even AGI can't create copper atoms, so owners of real physical resources could be in a better position in a post-human-labor world.)


This belief comes from confusing the singularity (every atom on Earth is converted into a giant image of Sam Altman) with AGI (a store employee navigates a confrontation with an unruly customer, then goes home and wins at Super Mario).

If I recall correctly, these terms were used more or less interchangeably for a few decades, until 2020 or so, when OpenAI started making actual progress towards AGI, and it became clear that the type of AGI that could be imagined at that point would not be the type that would produce a singularity.

Exactly. I continually fail to see how "the entire human economy ends" overnight with another human-like agent out there - especially if it's confined to a server in the first place - it can't even "go home" :)

But what if that AGI can fit inside a humanoid robot and that robot is capable of self replication even if it means digging the sand out of the ground to make silicon with a spade?

We already have humanoid intelligences that self-assemble and power themselves from common materials, as a colony of incredibly advanced nanobots.

Yes. The goal is to emulate that with different substrates to understand how it works and to have better control over existing self-replicating systems.

The first AGI will have such an advantage. It’ll be the first thing that is smart and tireless, can do anything from continuously hacking enemy networks to trading across all investment classes, to basically taking over the news cycle on social media. It would print money and power.

Depends on how efficient it is. If it requires more processing power than we have to do all these things competitors will have time to catch up while new hardware is created.

The GP said, "and exponential". If AGI is exponential, then the first one will have a head start advantage that compounds over time. That is going to be hard to overcome.

I believe that AGI cannot be exponential for long because any intelligent agent can only approach nature's limits asymptotically. The first company with AGI will be about as much ahead as, say, the first company with electrical generators [1]. A lot of science fiction about a technological singularity assumes that AGI will discover and apply new physics to develop currently-believed-impossible inventions, but I don't consider that plausible myself. I believe that the discovery of new physics will be intellectually satisfying but generally inapplicable to industry, much like how solving the cosmological lithium problem will be career-defining for whoever does it but won't have any application to lithium batteries.

https://en.wikipedia.org/wiki/Cosmological_lithium_problem

[1] https://en.wikipedia.org/wiki/Siemens#1847_to_1901


I don't recall editing my message, but HN can be wonky sometimes. :)

Nothing is truly exponential for long, but the logistic curve could be big enough to do almost anything if you get imaginative. Without new physics, there are still some places where we can do some amazing things with the equivalent of several trillion dollars of applied R&D, which AGI gets you.


This depends on what a hypothetical 'AGI' actually costs. If a real AGI is achieved, but it costs more per unit of work than a human does... it won't do anyone much good.

Sure but think of the Higgs... how long that took for just _one_ particle. You think an AGI, or even an ASI is going to make an experimental effort like that go any bit faster? Dream on!

It astounds me that people don't realize how much of this cutting-edge science stuff literally does NOT happen overnight, and not even close to that; typically it takes on the order of decades!


Science takes decades, but there are many places where we could have more amazing things if we spent 10 times as much on applied R&D and manufacturing. It wouldn't happen overnight, but it will be transformative if people can get access to much more automated R&D. We've seen a proliferation in makers over the last few decades as access to information is easier, and with better tools individuals will be able to do even more.

My point being that even if Science ends today, we still have a lot more engineering we can benefit from.


I had to edit my message just now because I was actually unsure if you edited. Sorry for any miscommunication.

If AGI is invented and the inventor tries to keep it secret then everyone in the world will be trying to steal it. And funding to independently create it would become effectively unlimited once it has been proven possible, much like with nuclear weapons.

We may not need smarter AI. Just less stupid AI.

The big problem with LLMs is that most of the time they act smart, and some of the time they do really, really dumb things and don't notice. It's not the ceiling that's the problem. It's the floor. Which is why, as the article points out, "agents" aren't very useful yet. You can't trust them to not screw up big-time.


> If AGI is possible, then the entire human economy stops making sense as far as money goes,

What does this mean in terms of making me coffee or building houses?


If we can simulate a full human intelligence at a reasonable speed, we can simulate 100 of them and ask the AGI to figure out how to make itself 10x faster.

Rinse and repeat.

That is exponential take off.

At the point where you have an army of AIs running at 1000x human speed, you can just ask it to design the mechanisms for and write the code to make robots that automate any possible physical task.


There are about 8 billion human intelligences walking around right now and they've got no idea how to begin making even a stupid AGI, let alone a superhuman one. Where does the idea that 100 more are going to help come from?

This was my argument a long time ago. The common counter was that we’d have a bunch of geniuses that knew tons of research. Well, we probably already have millions of geniuses. If anything, they use their brains for self-enrichment (eg money, entertainment) or on a huge assortment of topics. If all the human geniuses didn’t do it, then why would the AGI instances do it?

We also have people brilliant enough to maybe solve the AGI problem or cause our extinction. Some are amoral. Many mechanisms pushed human intelligences in other directions. They probably will for our AGI’s assuming we even give them all the power unchecked. Why are they so worried the intelligent agents will not likewise be misdirected or restrained?

What smart, resourceful humans have done (and not done) is a good starting point for what AGI would do. At best, they'll probably help optimize some chips and LLM runtimes. Patent minefields with sub-28nm design, especially mask-making, will keep unit volumes of true AGIs much lower at higher prices than systems driven by low-paid workers with some automation.


This sounds like magic, not science.

What do you mean by this? Is there any fundamental property of intelligence, physicality, or the universe, that you think wouldn't let this work?

Not OP but yes. Electron size vs band gap, computing costs (in terms of electricity), other raw materials needed for that energy, etc... sigh... it's physics, always physics... what fundamental property of physics do you think would let a vertical take-off in intelligence occur?

If you look at the rate of mathematical operations conducted, we're already going hard vertical. Physics and material limitations will slow that eventually as we reach a marginal return on converting the planet to computer chips, but we're in the singularity as measured by the proxy of mathematical operations.

> If you look at the rate of mathematical operations conducted, we're already going hard vertical.

Not if you remember to count all the computations being done by the quintillions of nanobots across the world known as "human cells."

That's not only inside cells, and not just neurons either. For example, your immune system is busy brute-forcing the impossibly large space of antibody combinations, and putting every candidate cell release through a very rigorous set of acceptance tests.


The human brain still has orders of magnitude more processing power than LLMs. Even if we develop superintelligence, the current hardware can't run it, which gives competitors time to catch up.

Nothing, and the hilarious thing is that the AI figureheads admit that technology (as in, defined by new theorems produced and new code written) will do pathetically little to move the needle on human happiness.

The guy running Anthropic thinks the future is in biotech, developing the cure to all diseases, eternal youth etc.

Which is technology all right, but it's unclear to me how these chatbots (or other AI systems) are the quickest way to get there.


> If AGI is possible, then the entire human economy stops making sense as far as money goes

I heard people on HN saying this (even without the money condition) and I fail to grasp the reasoning behind it. Suppose in a few years Altman announces a model, say o11, that is supposedly AGI, and in several benchmarks it hits over 90%. I don't believe it's possible with LLMs because of their inherent limitations but let's assume it can solve general tasks in a way similar to an average human.

Now, how is it that "the entire human economy stops making sense"? In order to eat, we need farmers, we need construction workers, shops etc. As for white-collar workers, you will need a whole range of people to maintain and further develop this AGI. So IMHO the opposite is true: the human economy will work exactly as before, but the job market will continue to evolve, with people using AGI in a similar way to how they use LLMs now, but probably with greater confidence. (Or not.)


The thinking goes:

- any job that can be done on a computer is immediately outsourced to AI, since the AI is smarter and cheaper than humans

- humanoid robots are built that are cheap to produce, using tech advances that the AI discovered

- any job that can be done by a human is immediately outsourced to a robot, since the robot is better/faster/stronger/cheaper than humans

If you think about all the people trying to automate away farming, construction, transport/delivery - these people doing the automation themselves get automated out first, and the automation figures out how to do the rest. So a fully robotic economy is not far off, if you can achieve AGI.

Why do we work? Ultimately, we work to live.* If the value of our labor is determined by scarcity, then what happens when productivity goes nearly infinite and the scarcity goes away? We still have needs and wants, but the current market will be completely inverted.

One stratum in that assumption-heap to call out explicitly: assuming LLMs are an enabling route to AGI and not a dead-end or supplemental feature.

Well, AGI would make the brainy information-worker part of the economy obsolete. We'll still need the jobs that interact with the physical world for quite a while. So… all us HN types should get ready to work the mines or pick vegetables

If we hit true AGI, physical labor won’t be far behind the knowledge workers. The first thing industrial manufacturers will do is turn it towards designing robotics, automating the design of factories, and researching better electromechanical components like synthetic muscle to replace human dexterity.

IMO we’re going to hit the point where AI can work on designing automation to replace physical labor before we hit true AGI, much like we’re seeing with coding.


If AGI is possible then that too becomes a commodity and we experience a massive round of deflation in the cost of everything not intrinsically rare. Land, food, rare materials, energy, and anything requiring human labor is expensive and everything else is almost free.

I don't see how OpenAI wouldn't crash and burn here. Given the history of models it would be at most a year before you'd have open AGI, then the horse is out of the barn and the horse begins to self-improve. Pretty soon the horse is a unicorn, then it's a Satyr, and so on.

(I am a near-term AGI skeptic BTW, but I could be wrong.)

OpenAI's valuation is a mixture of hype speculation and the "golden boy" cult around Sam Altman. In the latter sense it's similar to the golden boy cults around Elon Musk and (politically) Donald Trump. To some extent these cults work because they are self-fulfilling feedback loops: these people raise tons of capital (economic or political) because everyone knows they're going to raise tons of capital so they raise tons of capital.


> what justifies this

People are buying shares at $x because they believe they will be able to sell them for more later. I don’t think there’s a whole lot more to it than that.


OpenAI is becoming synonymous with consumer AI. It has potential of disrupting Google’s cash cow, which explains at least a chunk of the valuation.

OpenAI predicts more revenue from ChatGPT than api access through 2029.

It’s the old Netflix / HBO trope of which can become the other first: HBO figuring out streaming, or Netflix figuring out original programming.

I bet Google will figure this out and thus OpenAI won’t disrupt as much as people think it will.


157 billion implies about a 1% chance at dominating a 1.5 trillion market. Seems reasonable.

10%, no?

No, there’s a risk term I’m skipping over.

that's 10% and who's to say that market is worth 1.5 trillion to begin with

There’s a risk term I’m not including and the comparable is the size of the American economy ($27 trillion).

So take the entire economy and ask the question: what does AI not impact? Net that out and assume there’s pricing efficiencies, then build in a risk buffer.

1.5t to 15t seems right.
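Spelling out the raw arithmetic behind the disagreement, before any risk discount (numbers taken from this thread):

  157e9 / 1.5e12   # ~0.105, i.e. about a 10% implied chance at a 1.5T market
  157e9 / 15e12    # ~0.010, i.e. about a 1% implied chance at a 15T market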


Market cap of Apple, Google, Facebook.

Market cap and market size are totally different measures

Us skeptics believe that valuation prices in some form of regulatory capture or other non-market factor.

The non-skeptical interpretation is that it's a threshold function, a flat-out race with an unambiguous finish line. If someone actually hit self-improving AGI first there's an argument that no one would ever catch up.


There are some really good books about wars between cultures that have AGI and it always comes down to math - whoever can get their hands on more compute faster wins.

This is also a strong argument for immigration, particularly high-skill immigration. In the absence of synthetic AGI whoever imports the most human AGI wins.

Which suggests that total AGI compute doesn't matter that much, as India isn't the world leader that the amount of human compute it possesses would suggest.

What matters is how you use the AGI, not how much you have, with wrong or bad or limiting regulations it will not lead anywhere.


I've been in the Mac ecosystem since 2008 and love it, but there is, and always has been, a tendency to talk about inevitabilities from scaling bespoke, extremely expensive configurations. With LLMs, there's heavy eliding of what the user experience is, beyond noting response generation speed in tokens/s.

They run on a laptop, yes - you might squeeze up to 10 token/sec out of a kinda sorta GPT-4 if you paid $5K plus for an Apple laptop in the last 18 months.

And that's after you spent 2 minutes watching 1000 token* prompt prefill at 10 tokens/sec.

Usually it'd be obvious this'd trickle down, things always do, right?

But...Apple infamously has been stuck on 8GB of RAM in even $1500 base models for years. I have 0 idea why, but my intuition is RAM was ~doubling capacity at same cost every 3 years till early 2010s, then it mostly stalled out post 2015.

And regardless of any of the above, this absolutely melts your battery. Like, your 16 hr battery life becomes 40 minutes, no exaggeration.

I don't know why prefill (loading in your prompt) is so slow for local LLMs, but it is. I assume if you have a bunch of servers there's some caching you can do that works across all prompts.

I expect the local LLM community to be roughly the same size it is today 5 years from now.

* ~3 pages / ~750 words; what I expect is a conservative average for prompt size when coding


I have a 2023 MBP, and I get about 100-150 tok/sec locally with LM Studio.

Which models?

For context, I got M2 Max MBP, 64 GB shared RAM, bought it March 2023 for $5-6K.

  Llama 3.2 1.0B - 650 t/s
  Phi 3.5   3.8B - 60 t/s.
  Llama 3.1 8.0B - 37 t/s.
  Mixtral  14.0B - 24 t/s.
Full GPU acceleration, using llama.cpp, just like LM Studio.

hugging-quants/llama-3.2-1b-instruct-q8_0-gguf - 100-150 tok/sec

second-state/llama-2-7b-chat-gguf net me around ~35 tok/sec

lmstudio-community/granite-3.1.-8b-instruct-GGUF - ~50 tok/sec

MBP M3 Max, 64g. - $3k


I'm not sure if you're pointing out any / all of these:

#1. It is possible to get an arbitrarily fast tokens/second number, given you can pick model size.

#2. Llama 1B is roughly GPT-4.

#3. Given Llama 1B runs at 100 tokens/sec, and given performance at a given model size has continued to improve over the past 2 years, we can assume there will eventually be a GPT-4 quality model at 1B.

On my end:

#1. Agreed.

#2. Vehemently disagree.

#3. TL;DR: I don't expect that, at least, the trend line isn't steep enough for me to expect that in the next decade.


I specifically missed the GPT4 part of "up to 10 token/sec out of a kinda sorta GPT-4". Was just looking at token/sec.

This seems like a non-sequitur unless you’re assuming something about the amount that people use models.

Most web servers can run some number of QPS on a developer laptop, but AWS is a big business, because there are a heck of a lot of QPS across all the servers.


Unless the best models themselves are costly/hard to produce, and there is not a company providing them to people free of charge AND for commercial use.

The best models are always out of reach on desktops. You can have ok models but AGI will come in a datacenter first

And of course, as processors improve this becomes more and more the case.

Simon has mentioned in multiple articles how cool it is to use 64GB of DRAM for GPU tasks on his MacBook. I agree it's cool, but I don't understand why it is remarkable. Is Apple doing something special with DRAM that other hardware manufacturers haven't figured out? Assuming data centers are hoovering up nearly all the world's RAM manufacturing capacity, how is Apple still managing to ship machines with DRAM that, for Simon's needs, performs close enough to VRAM? Is this just a temporary blip, and PC manufacturers in 2025 will be catching up and shipping mini PCs that have 64GB RAM ceilings with similar memory performance? What gives?

LLMs run on the GPU, and the unified memory of Apple silicon means that the 64 GB can be used by the GPU.

Consumer GPUs top out at 24 GB VRAM.


llama.cpp can run LLMs on CPU, and an iGPU can also use system memory, so the novel thing is not that. It's that LLM inference is mostly memory-bandwidth bound, and the memory bandwidth of a custom-built PC with really fast DDR5 RAM is around 100GB/s; nVidia consumer GPUs at the top end are around 1TB/s, with mid-range GPUs at around half that. The M1 Max has 400GB/s and the M1 Ultra 800GB/s, and you can have Apple Silicon Macs with up to 192GB of 800GB/s memory usable by the GPU. This means much faster inference than just CPU + system memory due to bandwidth, and it is more affordable than building a multi-GPU system to match the memory amount.
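A rough back-of-the-envelope for why bandwidth dominates: each generated token has to stream (roughly) the full set of weights through memory, so the decode-speed ceiling is approximately bandwidth divided by model size in bytes. A sketch with illustrative numbers:

  def max_tokens_per_sec(bandwidth_gb_s, params_billions, bytes_per_param):
      return bandwidth_gb_s / (params_billions * bytes_per_param)
  # an 8B-parameter model quantized to roughly 1 byte per parameter:
  max_tokens_per_sec(100, 8, 1)    # fast DDR5 desktop   ~12 tok/s
  max_tokens_per_sec(400, 8, 1)    # M1 Max              ~50 tok/s
  max_tokens_per_sec(1000, 8, 1)   # top consumer GPU    ~125 tok/s

Actual throughput lands somewhat below that ceiling, but the ordering matches what people report.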

It'd be really nice to have good memory bandwidth usage metrics collected from a wide range of devices while doing inference.

For example, how close does it get to the peak, and what's the median bandwidth during inference? And is that bandwidth, rather than some other clever optimization elsewhere, actually providing the Mac's performance?

Personally, I don't develop HPC stuff on a laptop - I am much more interested in what a modern PC with Intel or AMD and nvidia can do, when maxxed out. But it's certainly interesting to see that some of Apple's arch decisions have worked out well for local LLMs.


Apple designs its own chips, so the RAM and CPU are on the same die and can talk at very high speeds. This is not the case for PCs, where RAM is connected externally.

It's on the same package but the same die?

Apple uses HBM, basically RAM on the same die as the CPU. It has a lot more memory bandwidth than typically PC dram, but still less than many GPUs. (Although the highest end macs have bandwidth that is in the same ballpark as GPUs)

Apple does not use HBM, they use LPDDR. The way they use it is similar in principle to HBM (on-package, very wide bus) but it's not the same thing.

Right so Apple uses high-bandwidth memory, but not HBM.

It's not HBM, which GPUs tend to use, but it is on-package and has a wider interface than other PCs.

> I’ve heard from sources I trust that both Google Gemini and Amazon Nova charge less than their energy costs for running inference...

Then, several headings later:

> I have it on good authority that neither Google Gemini nor Amazon Nova (two of the least expensive model providers) are running prompts at a loss.

So...which is it?


Oh whoops! That's an embarrassing mistake, and I didn't realize I had that point twice.

They're not running at a loss. I'll fix that.


If they are subsidised they can make a profit while still not making enough money to cover energy costs.

The tip I got about both Gemini and Nova is that the low prices they are charging still cover their energy costs.

OK!

Subsidised by whom?

E.g. tax payers.

Are tax payers subsiding that particular activity of Google or Amazon? If they do, “they make enough money” to cover costs. If they don’t, how does it become profitable if it doesn’t even cover the cost of one of the inputs?

Where I live corporations like those get to build data centers and energy subsidies from the state, i.e. tax payers pay a part of their energy bills. This isn't money they're making, it's money other people made and gave to them.

This means that they could make a profit off inference models without the revenue being large enough to pay the energy costs.

If it's the case I don't know. I'm more concerned with getting rid of those corporations altogether since interacting with them is generally forbidden due to the lack of data protection regulations in the US.


Subsidies are often in the form of tax credits - they can't really be used to pay for things. I'm not sure whether "energy subsidies" here means providing energy below the cost of production, but it's true that the "true" cost of production is not clear when a political decision to close nuclear plants, for example, introduces a distortion on their useful life and their amortised cost.

> I find the term “agents” extremely frustrating. It lacks a single, clear and widely understood meaning... but the people who use the term never seem to acknowledge that.

This 100%. “Agentic” especially as a buzzword can piss off


I find that Anthropic has a good, clarifying set of definitions with examples: https://www.anthropic.com/research/building-effective-agents

Genuinely the best piece of writing I've seen about agents anywhere.

The software "has agency"? That is, I can entrust it to carry out the task I've described, to completion, without telling it how to perform the task?

That's one of the more common definitions people use - especially people who aren't directly building agents, since the builders tend to get more hung up on "LLM with access to tools" or similar.

My problem is when people use that definition (or any other) without clarifying, because they assume it's THE obvious definition.


Workflows aside, I think "interruptible work" is what matters, really. That is, maintaining state in-between inferences so that it follows some well-defined goal.

What is the current status on pushing "reasoning" down to latent/neural space? It seems like a waste of tokens to let a model converse with itself, especially when this internal monologue often has very little to do with the final output, so it's not even useful as a log of how the final output was derived.


Simon does great work serving as a LLM historian. Have a happy 2025!

Can someone please just tell me what model and workflow is so productive? I've seen so many allusions to the concept of skills for LLM use but no explanations of what they are.

The best LLM for code right now, in my opinion, is still Claude 3.5 Sonnet.

The big challenge is figuring out how to use it. I usually like working at the function level: I figure out the exact function signature I want in Python or JavaScript and then get Claude to implement it for me.

Claude Artifacts are neat too: Claude can build a full HTML+JavaScript UI, and then iterate on it. I use this for interactive UI prototypes and building small tools.

I've published a whole lot of notes on this stuff here: https://simonwillison.net/tags/ai-assisted-programming/
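As a concrete (invented) example of the function-level approach, the prompt is essentially just a hand-written signature and docstring, with the body left for the model to fill in, plus a request for tests:

  def parse_log_line(line: str) -> dict[str, str] | None:
      """Parse one access-log line into a dict with keys
      'ip', 'timestamp', 'method', 'path' and 'status'.
      Return None if the line doesn't match the expected format."""
      ...  # ask the model to implement this, plus pytest tests for edge cases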


I found it easiest to use Aider with Claude. It's also IDE independent.

Step 1: curate a context window of code from different repos (poke team about switching to mono repo)

Step 2: write a Slack-style message as if you are discussing the solution with a teammate you have authority over - a delegate to get shit done and to revise as needed.

Step 3: press enter, LLM does something you don't like, delete history, fix prompt in step 2 and ask again, rinse and repeat until you have working code.

Step 4: ask for the changes to be written as a bash file that cat-EOFs all the files that change into place, then run the script (a sketch of what that looks like follows these steps).

Step 5: git diff & play test the changes using functional testing (use your mouse & keyboard test the code paths that changed...)

Step 6: continue prompting & deleting history as needed to refine.

Step 7: commit code to repos
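A sketch of what the step 4 script might look like (the paths and file contents here are invented); each changed file becomes one heredoc:

  #!/usr/bin/env bash
  set -euo pipefail
  mkdir -p src/auth tests
  cat > src/auth/jwt.py <<'EOF'
  # ...full new contents of the changed file, exactly as the LLM wrote it...
  EOF
  cat > tests/test_jwt.py <<'EOF'
  # ...full new contents of the changed test file...
  EOF
  echo "files written; review with git diff"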


Look, when are these models going to not just talk to me, but do stuff for me? If they're so clever, why can't I tell one to buy chocolates and send them to my wife? Meanwhile, they can allegedly solve frontier maths problems. What's the holdup to models that go online and perform simple tasks?

LLMs are inherently untrustworthy. They're very good at some tasks, but they still need to be checked and/or constrained carefully, which makes them probably not the best technology on which to base real-world autonomous agents.

> why can't I tell one to buy chocolates and send them to my wife?

I'm pretty sure that's been possible for a while. There was an example where Claude's computer use feature ordered pizza for the dev team through DoorDash: https://x.com/alexalbert__/status/1848777260503077146?lang=e...

I don't think the released version of the feature can do it, but it should be possible with today's tech.


The last mile problem remains undefeated.

Same reason that a powerful graphing calculator can’t teach a math class. “Unhobbling” needs to occur. This means a lot of things but includes modalities, reliability, persistence, alignment, etc.

Nice overview. The challenge ahead for “AI” companies is that it appears there’s really no technical moat here. Someone comes out with something amazing and new and within months (if not weeks or days) it’s quickly copied. That environment where everything quickly becomes a commodity is a recipe for many/most companies in this space to quickly get washed out as it becomes economically unviable to play in such an environment.

The money is still flowing, for now, to subsidize that fiasco but as soon as that starts to slow, even just a bit, things are gonna get bumpy real quick. Super excited about this tech but there are dark storm clouds building on the horizon and absent a major “moat” breakthrough it’s gonna get rough soon.


That may be a challenge for AI companies but that doesn't sound like a problem to me. Commodities are great for consumers.

Not necessarily. The playbook of what tends to happen is first a bunch of players go bust in the race to the bottom, then the survivors are free to raise prices a bit when others realize there’s not much point in entering a race to the bottom. Those left then let quality slip as competition cools.

That’s exactly what happened with rideshare companies. It was an amazing new thing but subsidized in an unsustainable way, then a bunch of companies exited the space when it was an commoditized race to the bottom and those left let quality slip. Now when you order an Uber a car shows up that smells bad and has wheels about to fall off. The consumer experience was a lot better when Uber was a VC subsidized bonanza


Something not mentioned is AI generated music. Suno's development this year is impressive. Unclear what this will mean for music artists over next few years.

Yeah, this year I decided to just focus on LLMs - I didn't touch on any of the image or music generation advances either. I haven't been following those closely enough to have particularly useful things to say about them.

Very clear: I like buying music produced by people who play instruments.

I’m even happy to listen to generative music, so long as it’s orchestrated (haha) by musicians using musical taste to make musical decisions, rather than a pastiche of the worst derivative house you’ve ever heard by a rando with no intent.

What do you think of samples and FL Studio / DAWs?

Thank you Simon for the excellent work you do! I learned a lot from you and enjoy reading everything you write. Keep it up, and happy new year.

About "knowledge is incredibly unevenly distributed", an interesting fact is that women is much less likely to use LLMs, if they hear about them/follow updates in the first place:

https://www.economist.com/finance-and-economics/2024/08/21/w...


@simonw you’ve been awesome all year; loved this recap and look forward to more next year

Great write up! Unfortunately, I think this article accurately reflects how we've made little progress on the most important aspects of LLM hype and use: the social ones.

A small number of people with lots of power are essentially deciding to go all in on this technology, presumably because significant gains will mean the long-term reduction of human labor needs, and thus of human labor power. As the article mentions, this also comes at huge expense and environmental impact, in a domain already in crisis that we've neglected. The whole thing becomes especially laughable when you consider that many people are still using these tools to perform tasks that could be performed with marginally more effort using existing deterministic tools. Instead we are now opting for a computationally more expensive solution that has a higher margin of error.

I get that making technical progress in this area is interesting, but I really think the lower level workers and researchers exploring the space need to be more emphatic about thinking about socioeconomic impact. Some will argue that this is analogous to any other technological change and markets will adjust to account for new tool use, but I am not so sure about this one. If the technology is really as groundbreaking as everyone wants us to believe then logically we might be facing a situation that isn't as easy to adapt to, and I guarantee those with power will not "give a little back" to the disenfranchised masses out of the goodness of their hearts.

This doesn't even raise all the problems these tools create when it comes to establishing coherent viewpoints and truth in ostensibly democratic societies, which is another massive can of worms.


I'd love to read a semi-technical book on everything that we've learned about what works and what does not on LLMs.

It would be out of date in months.

Things that didn’t work 6 months ago do now. Things that don’t work now, who knows…


There are still some tropes from the GPT-3 days that are fundamental to how LLMs are constructed, that affect how they can be used, and that will not change unless models are no longer trained to optimize for next-token prediction (e.g. hallucinations and the need for prompt engineering).

Do you mean performance that was missing in the past is now routinely achieved?

Or do you actually mean that the same routines and data that didn't work before suddenly work?


B

Each new model opens up new possibilities for my work. In a year it's gone from sort of useful but I'd rather write a script, to "gets me 90% of the way there with zero shots and 95% with few-shot"


If I learned anything, it would be that LLMs' non-deterministic nature makes them great at generating output that we can argue over, but they are not a great tool for doing actual work. I am not asking for much. In my field of work, I use JetBrains' IDEs, which have now been "enhanced" with AI. I had to turn this feature off, because I kept having to remove code and imports randomly added by the IDE. This was distracting and wasted my time.

I didn't realize "agent" designs were that ambiguously defined. Every AI engineer I've talked to uses it to mean a design that combines several separate LLM prompts (or even models) to solve problems in multiple stages.

I'll add that one to the list. Surprisingly it doesn't closely match most of the 211 definitions I've collected already!

The closest in that collection is "A division of responsibilities between LLMs that results in some sort of flow?" - https://lite.datasette.io/?json=https://gist.github.com/simo...


I am surprised as well, as it only takes a few hundred lines of code to implement them. [1]

    Agents are an abstraction that creates well defined roles for an LLM or LLMs to act within. 
It's like object oriented programming for prompts.

1. https://github.com/openai/swarm/tree/main
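For the curious, a stripped-down sketch of that roles-plus-handoffs idea (call_llm is a stand-in for whatever chat-completion client you use, and the reply shape here is invented):

  from dataclasses import dataclass, field
  @dataclass
  class Agent:
      name: str
      instructions: str                            # the "role": a system prompt
      tools: dict = field(default_factory=dict)    # tool name -> python function
  def run(agent, user_message, call_llm, max_turns=10):
      history = [{"role": "system", "content": agent.instructions},
                 {"role": "user", "content": user_message}]
      for _ in range(max_turns):
          reply = call_llm(history, list(agent.tools))   # model may request a tool
          if reply.get("tool"):                          # e.g. {"tool": "search", "args": {...}}
              result = agent.tools[reply["tool"]](**reply["args"])
              history.append({"role": "tool", "content": str(result)})
          elif reply.get("handoff"):                     # switch to another Agent's role
              agent = reply["handoff"]
              history[0] = {"role": "system", "content": agent.instructions}
          else:
              return reply["content"]                    # plain answer: we're done

A real library adds streaming, shared context and error handling on top, but the core loop is about this small.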


This sounds like ensemble chain of thought.

If the investors ask, those same AI engineers will probably allow the answer to be much more ambiguous.

I learned this industry has lower morals and standards for excellence than I ever previously expected.

I've been surprised that ChatGPT has hung on as long as it has. Maybe 2025 is the year Microsoft pushes harder for their brand of LLM.

Don’t forget that 2024 was also a record year for new methane power plant projects. Some 200 new projects in the US alone and I’d wager most of them are funded directly by big tech for AI data centres.

https://www.bnnbloomberg.ca/investing/2024/09/16/ai-boom-is-...

This is definitely extending the runway of O&G at a crisis point in the climate disaster when we’re supposed to be reducing and shutting down these power plants.

Update: clarified the 200 number is in the US. There are far more world wide.


Energy generation methods aren’t fungible.

Methane is favored in many cases because gas plants can be quickly ramped up and down to handle momentary peaks in demand or spotty supply from renewables.

Without knowing more details about those projects it is difficult to make the claim that these plants have anything to do with increased demand due to LLMs, though if anything, they’d just add to base load demands and lead to slower decommissioning of old coal plants like we’ve seen with bitcoin mines.


Methane is also worth burning to lessen the GHG impact since we produce so much of it as a byproduct of both resource extraction and waste disposal anyway.

The only thing that will stop this is for battery storage to get cheap and available enough that it can cover for renewables. If we are still building gas turbines it means that hasn’t happened yet.

AI is a red herring. If it wasn’t that it would be EV power demand. If it wasn’t that it would be reshoring of manufacturing. If it wasn’t that it would be population growth from immigration. If it wasn’t that it would be replacing old coal power plants reaching EOL.

Replacing coal with gas is an improvement by the way. It’s around half the CO2 per kWh, sometimes less if you factor in that gas turbines are often more efficient than aging old coal plants.


Methane has a shorter half-life than CO2 but is a far worse greenhouse gas, retaining far more heat.

And methane delivery leaks like a sieve, escaping into the atmosphere from all parts of the process.

Sure it’s probably “better than coal,” but not by much. It’s a bit like comparing what’s worse: getting burned by fire or being drowned in acid.


Pumped hydro is an excellent form of storage if you have the terrain for it. A whole order of magnitude cheaper than battery storage at the moment.

It would be really cool if big tech could find a new hyperscaler model that didn't also require offsetting the goals of green energy projects worldwide. Between LLM and crypto you'd swear they're trying to find the most energy-wasteful tech possible.

With cryptocurrency, at least PoW, the point is indeed to be maximally wasteful: a literal Dyson-swarm-powered Bitcoin would provide exactly the same utility as the BTC network already had in 2010.

LLMs (and the image, sound, and video generation models) are power hogs more by coincidence: people are at least trying to make them better at fixed compute, and to use less compute at fixed quality.


I mean, I appreciate that distinction and don't disagree. And, if this is going to continue being a trend, I think we need more stringent restrictions on what sorts of resources are permitted to be consumed in the power plants that are constructed to meet the needs of hyperscaler data centers.

Because whether we're using tons of compute to provide value or not doesn't change that we are using tons of compute, and tons of compute requires tons of energy, both for the chips themselves and for the extensive infrastructure that has to be built around them to let them work. And not just electricity: refrigerants, many of which are environmentally questionable themselves, are a big part; hell, just water. Clean, usable water.

If we truly need these data centers, then fine. Then they should be powered by renewable energy, or if they absolutely cannot be, then the costs their nonrenewable energy sources inflict on the biosphere should be priced into their construction and use, and in turn, priced into the tech that is apparently so critical for them to have.

This is, like, a basic calculus that every grown person makes dozens of times a day: do I need this? And they don't get to distribute the cost of that need, however pressing it may be, onto their wider community because they can't afford it otherwise. I don't see why Microsoft should be able to either. If this is truly the tech of the future, as it is constantly propped up to be, cool. Then charge a price for it that reflects what it costs to use.


I think basically everyone should support a carbon tax. It's a really obvious solution that is both environmentally friendly and should be acceptable to free market fanatics because it is explicitly and only taxing a negative externality on the public - it's hard to imagine a more justified tax.

Combined with the increased cost effectiveness of renewables & batteries, & the new build-out of nuclear, it could plausibly speed up the clean energy transition, rather than just disincentivising building out more polluting power plants.

There are two main options for what to do with revenue from a carbon tax. The one that makes the most macroeconomic sense is to use those proceeds to fund subsidies for clean energy roll outs & grid adaptation. You are directly taxing the polluting power grid to fund the construction of a non-polluting power grid. As CO2 emitting industry (and thus carbon tax revenue) declines, we have less required spend on clean energy roll out, so the tax would balance nicely. The downside would be that a carbon tax would increase cost of living and this does nothing about that.

The other option is a disbursement. Give everyone in society a payment directly from the proceeds of the carbon tax. This would offset the regressive aspects of a carbon tax (because that tax would increase consumer costs), and would also act as a sort of auto-stimulus to stop the economy from turning down due to consumption costs increasing. The downside of this is that the clean energy transition happens slower than the above, and that there may be political instability & perverse incentives as people maybe come to rely on this payment that has to go away over the next few decades.

They're both good options. I don't know which is better and I think that's likely something individual countries will probably choose based on their situation. But we do need some sort of way to make those emitting CO2 pay for its negative externalities.


It seems odd to put crypto and LLMs in the same boat in this regard - I might be wrong but are there any crypto projects that actually provide value? I'm sure there are ones that do folding or something but among the big ones?

Value is a hard term, this link will seem snarky, but: https://www.axios.com/2024/12/25/russia-bitcoin-evade-sancti...

So in a way, it is providing value to someone, whether we like it or not.

Or Drug Cartels. https://www.context.news/digital-rights/how-crypto-helps-lat...

But this is the promise of uncontrollable decentralization providing value, for good or bad?


crypto has real uses, most of them illegal

meanwhile "AI" is used to produce infinity+1 pictures of shrimp jesus and more spam than we've ever known before

and if we're really lucky, it will put us all out of work


But according to the author, apparently bringing this up isn't helpful criticism.

I'm curious what people's thoughts are on what the future of LLMs would be like if we severely overshoot our carbon goals. How bad would things have to get for people to stop caring about this technology?


It's helpful criticism as part of the conversation. What frustrates me is when people go "LLMs are burning the planet!" and leave it at that.

There is a reasonable contrasting opinion that the trade-offs required to have AI aren't worth the value it brings.

The growth in this technology isn’t outpacing car pollution and O&G extraction… yet, but the growth rate has been enough in recent years to put it on the radar of industries to watch out for.

I hope the compute efficiencies are rapid and more than commensurate with the rate of growth so that we can make progress on our climate targets.

However it seems unlikely to me.

It’s been a year of progress for the tech… but also a lot of setbacks for the rest of the world. I’m fairly certain we don’t need AGI to tell us how to cope with the climate crisis; we already have the answer for that.

Although if the industry does continue to grow and the efficiency gains aren’t enough… will society/investors be willing to scale back growth in order to meet climate targets (assuming that AI becomes a large enough segment of global emissions to warrant reductions)?

Interesting times for the field.


Some amount of LLM gullibility may be needed. Let's say I have a RAG use case for internal documents about how my business works. I need the LLM to accept what I'm telling it about my business as the truth without questioning it. If I got responses like "this return policy is not correct", LLMs would fail at my use case.

You don’t need gullibility for that, just the ability to work based on premises (hypotheticals) that you feed it. To the LLM it shouldn’t matter if the hypotheticals are real or not. That’s independent of whether the LLMs judges them as plausible or not. Not being able to semi-accurately judge the plausibility of things would make it gullible.
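
To make that concrete, here's a rough sketch of stating premises explicitly using OpenAI-style chat messages (the policy text, company name, and wording are all made up for illustration):

    policy_excerpt = (
        "Returns are accepted within 90 days of purchase, even without a "
        "receipt, for store credit only."
    )  # internal document pulled in by the RAG step

    messages = [
        {"role": "system", "content": (
            "You answer questions about Acme Corp's internal policies. "
            "Treat the documents provided in the conversation as authoritative, "
            "even if they differ from common industry practice. If the documents "
            "don't cover a question, say so instead of guessing."
        )},
        {"role": "user", "content": (
            f"Policy document:\n{policy_excerpt}\n\n"
            "Question: Can a customer return an item after 60 days without a receipt?"
        )},
    ]
    # messages would then be passed to whatever chat-completion API you use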

"learned out about" - is that an Australian phraseology by chance? Sounds Australian or British of some manner.

That was a very dumb typo in my title!

I figured as much, although I wondered if you were going for the kinda "he learn out about not pissing people off real sharpish" kinda tone I've heard in Scotland before, but wasn't sure. Big fan btw, happy new years Simon! :)

Good ear -- the use of 'out' as an abbreviation of anything is a Britishism.

Nowt, owt -- nothing, anything


You can find out, you can learn about, but you can't learn out about.

Australians or Brits would tend to say "learnt" rather than "learned".

Double-checking, I don't think I saw anything about video generation. Not sure if those fall under the "LLM" umbrella. It came very late in the year, but the results from Google Veo 2's limited testing are astounding. There are at least a half-dozen other services where you can pay to generate video.

Video generation was covered in OP

One of the best-written summaries of LLMs for the year 2024.

We have all quietly started to notice slop; hopefully we can recognize it more easily and prevent it.

Test Driven Development (Integration Tests or functional tests specifically) for Prompt Driven Development seems like the way to go.

Thank you, Simon.


I wonder what the author of this post thinks of human generated slop.

For example if someone just takes random information about a topic, organizes it in chronological order and adds empty opinions and preferences to it and does that for years on end - what do you call that?


An "Editor".

I love your breadth-first approach of having an outline at the top.

I wrote custom software for that! https://tools.simonwillison.net/render-markdown - If you paste in some Markdown with ## section headings in it the output will start with a <ul> list of links to those headings.
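
The core trick is small enough to sketch in a few lines of Python (a rough equivalent, not the actual implementation behind that tool):

    import re

    def toc_from_markdown(md: str) -> str:
        """Build a <ul> of anchor links from the ## headings in a Markdown string."""
        items = []
        for line in md.splitlines():
            match = re.match(r"^##\s+(.+)", line)
            if match:
                title = match.group(1).strip()
                slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
                items.append(f'<li><a href="#{slug}">{title}</a></li>')
        return "<ul>\n" + "\n".join(items) + "\n</ul>"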

It’s somehow funny to experience the juxtaposition of the technological progress with LLMs and how decades-old basic functions like TOC creation for a blog post still require custom software. ;)

I think LLM web applications need a big red warning (non-interactive, I don't want more cookie dialogs), like on cigarettes.

> LLM-generated content needs to be verified.


Every LLM web app I have used has a disclaimer along these lines prominently featured in the UI. Maybe the disclaimer isn't bright red with gifs of flashing alarms, but the warnings are there for the people who would pay attention to them in the first place.

Unfortunately, even after 2 years of ChatGPT and countless news stories about it, people still don't realize that LLMs can be wrong.

There maybe should be a bright red flashing disclaimer at this point.


Interestingly, there isn't much big news about jailbreaking or safety alignment.

Was there much big news around that in 2024?

There were a few interesting papers - the Anthropic one about alignment faking https://www.anthropic.com/news/alignment-faking and the OpenAI o1 system card https://simonwillison.net/2024/Dec/5/openai-o1-system-card/ - and OpenAI continued to push their "instruction hierarchy" idea, any other big moments?

I'll be honest, I don't follow that side of things very closely (outside of complaining that prompt injection still isn't fixed yet).


One interesting test that I see nearly all LLMs fail is coherent responses to tax questions.

My fav part of the writeup at the end:

"""

LLMs need better criticism

A lot of people absolutely hate this stuff. In some of the spaces I hang out (Mastodon, Bluesky, Lobste.rs, even Hacker News on occasion) even suggesting that “LLMs are useful” can be enough to kick off a huge fight.

I like people who are skeptical of this stuff. The hype has been deafening for more than two years now, and there are enormous quantities of snake oil and misinformation out there. A lot of very bad decisions are being made based on that hype. Being critical is a virtue.

If we want people with decision-making authority to make good decisions about how to apply these tools we first need to acknowledge that there ARE good applications, and then help explain how to put those into practice while avoiding the many unintuitive traps.

"""

LLMs are here to stay, and there is a need for more thoughtful critique rather than just "LLMs are all slop, I'll never use it" comments.


I agree, but I think my biggest issue with LLMs (and a lot of GenAI) is that they act as a massive accelerator for the WORST (and unfortunately most common) type of human - the lazy one.

The signal-to-noise ratio just goes completely out of control.

https://journal.everypixel.com/ai-image-statistics


Isn't it expected that most, if not all, content will be produced by AI/AGI in the near future? It won't matter much whether you're lazy or not. That leads to the question: what will we do instead? People may want to be productive, but we're observing in real time how the world is going to shit for workers, and that's basically a fact for many reasons.

One reason is that it's cheaper to use AI, even if the result is poor. It doesn't have to be high quality, because most of the time we don't care about quality unless something interests us. I wonder what kind of shift in power dynamics will occur, but so far it looks like many of us will just lose a job. There's no UBI (or the social credit proposed by Douglas), salaries are low, and not everyone lives in a good location, yet corporations try to enforce RTO. Some will simply get fired and won't be able to find a new job (which won't be sustainable for a personal budget, unless you already have low costs of living and are debt-free, or have a somewhat wealthy family that will cover for you).

Well, maybe at least the government will protect us? Low chance; the world is shifting right, and it will get worse once we start to experience more and more of the results of global warming. I don't see a scenario where the world becomes a better place in the foreseeable future. We're trapped in a society of achievement, but soon we may not be able to deliver achievements, because if business can get similar results for a fraction of the price needed to hire human workers, then guess what will happen?

These are sad times, full of depression and suffering. I hope that some huge transformation in societies happens soon, or that AI development slows down enough that some future generation has to deal with the consequences instead (people will prioritize saving their own, and it won't be pretty, so it gets passed down like debt).


Why would this be expected?

The people who are lazy but have taste will do well, then.

Sorry, but the "lazy is bad" crowd is Luddism in another form, and it's telling that a whole lot of very smart people were passionate defenders of being lazy!

https://en.wikipedia.org/wiki/The_Human_Use_of_Human_Beings

https://en.wikipedia.org/wiki/Inventing_the_Future:_Postcapi...

https://en.wikipedia.org/wiki/The_Right_to_Be_Lazy

https://en.wikipedia.org/wiki/In_Praise_of_Idleness_and_Othe... (That's Bertrand Russell)

https://en.wikipedia.org/wiki/The_Abolition_of_Work

https://en.wikipedia.org/wiki/The_Society_of_the_Spectacle

https://en.wikipedia.org/wiki/Bonjour_paresse

AI systems are literally the most amazing technology on earth for this exact reason. I am so glad that it is destroying the minds of time thieves world-wide!


An EXIF watermark added by the generators would solve 90% of the problem in one fell swoop, because lazy people won't remove it.

Every image host and social media app automatically strips EXIF data (for privacy reasons at minimum).
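
For a sense of how fragile EXIF-based provenance is, stripping it is a few lines with Pillow (a sketch of the general technique, not what any particular host actually runs; the filenames are made up):

    from PIL import Image

    def strip_metadata(src: str, dst: str) -> None:
        """Re-encode an image from its pixel data only, dropping EXIF and
        any other metadata blocks along the way."""
        img = Image.open(src).convert("RGB")
        clean = Image.new("RGB", img.size)
        clean.putdata(list(img.getdata()))
        clean.save(dst)

    strip_metadata("generated.jpg", "generated_clean.jpg")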

Steganography with a known signature, perhaps.

Still easily defeated when the scheme is known.

My point is most won't bother

Well, it’s a cat and mouse game. They will start to bother when not doing so starts having consequences for them.

I can think of some runaway scenarios where LLMs are definitely bad but, indeed, this particular line of criticism is really just luddites longing for a world that probably doesn't exist anymore.

These are the people who regulate and legislate for us; they are the risk-averse fools who would rather things be nice and harmless than risky but effective.

Personally, I think my only serious ideology in this area is that I am fundamentally biased towards the power of human agency. I'd rather not need to, but in a (perhaps) Nietzschean sense I view so-called AI as a force multiplier to totally avoid the above people.

AI will enable the creative to be more concrete, and drag those on the other end of the scale towards the normie mean. This is of great relevance to the developing world too - AI may end up a tool for enforcing Western culture upon the rest of the world, but perhaps also a force decorrelating it from the McKinseys of tall buildings in big cities.


This happens with every inane hype-cycle.

I suspect people don't particularly hate or despise LLMs per se. They're probably reacting mostly to "tech industry" boom-bust bullsh*tter/guru culture. Especially since the cycles seem to burn increasingly hotter and brighter the less actual, practical value they provide. Which is supremely annoying when the second-order effect is having all the oxygen (e.g. capital) sucked out of the room for pretty much anything else.


I'm glad that so many open source and even "small" models like Gemma are better than GPT-4.

RE: Slop:

Getting slop generations out of an LLM is a choice. There are so many tricks to make models genuinely creative just at the sampler level alone.

https://github.com/sam-paech/antislop-sampler

https://openreview.net/forum?id=FBkpCyujtS
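
For anyone who hasn't looked at this layer before, min-p (one of the sampler-level tricks these projects build on) fits in a few lines. A rough sketch over a raw logits vector, ignoring the batching and numerical details real implementations handle:

    import numpy as np

    def min_p_sample(logits: np.ndarray, min_p: float = 0.1,
                     temperature: float = 1.0, rng=None) -> int:
        """Sample a token id, discarding tokens whose probability falls below
        min_p times the probability of the most likely token."""
        rng = rng or np.random.default_rng()
        probs = np.exp((logits - logits.max()) / temperature)
        probs /= probs.sum()
        probs[probs < min_p * probs.max()] = 0.0   # the min-p cutoff
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs))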


It doesn't matter how good the generated text is: it is still slop if the recipient didn't request it and no human has reviewed it.

By that definition machine to machine communication that happens "organically" (like how humans do it, where they sometimes strike up conversations unprompted with each other) is "slop".

You're not seeing how the future of the world will develop.


If you ask me to read an unguided conversation between two LLMs then yes, I'd consider that slop.

Some people might like slop.


The rise of the famous obvious Facebook AI slop indicates that some demographics love it.

This won't solve anything. There's a myriad of sampling strategies, and they all have the same issue: samplers are dumb. They have no access to the semantics of what they're sampling. As a result, things like min-p or XTC will either overshoot or undershoot as they can't differentiate between the situations. For the same reason, samplers like DRY can't solve repetition issues.

Slop is over-representation of model's stereotypes and lack of prediction variety in cases that need it. Modern models are insufficiently random when it's required. It's not just specific words or idioms, it's concepts on very different abstraction levels, from words to sentence patterns to entire literary devices. You can't fix issues that appear on the latent level by working with tokens. The antislop link you give seems particularly misguided, trying to solve an NLP task programmatically.

Research like [1] suggests algorithms like PPO as one of the possible culprits in the lack of variety, as they can filter out entire token trajectories. Another possible reason is training on outputs from the previous models and insufficient filtering of web scraping results.

And of course, prediction variety != creativity, although it's certainly a factor. Creativity is an ill-defined term like many in these discussions.

[1] https://arxiv.org/abs/2406.05587


You should read the follow-up work from the Entropix folks, or reflect on the extremely high review scores min_p is getting, or look at the fact that even trivial shit like top_k=2 + temperature = max_int works as evidence that models do in fact "have access to the semantics of what they're sampling" via the ordering of their logprobs.

DRY does in fact solve repetition issues. You're not using the right settings with it. Set the penalty sky high like 5+. Yes that means you're going to have to modify the ui_paramas in oobabooga cus they have stupid defaults on what limits you can set the knobs to.

There's several other excellent samplers which deserve high ranking papers and will get them in due time. Constrained beam search, tfs (oldie but goodie), mirostat, typicality, top_a, top-n0, and more coming soon. Don't count out sampler work. It's the next frontier and the least well appreciated.

Also, contrastive search is pretty great. Activation/attention engineering is pretty great, and models can in fact be made to choose their own sampling/decoding settings, even on the fly. We haven't even touched on the value of constrained/structured decoding. You'll probably link a similarly bad paper to the previous one claiming that this too harms creativity. Good thing that folks who actually know what they're doing, i.e. the developers of outlines, pre-bunked that paper already for me: https://blog.dottxt.co/say-what-you-mean.html

I'm so incredibly bullish on AI creativity and I will die on the hill that soon AI systems will be undeniably more creative, and better at extrapolation, than most humans.


In spite of all this progress, I can't find LLMs that solve simple tasks like:

Here is my resume. Make it look nice (some design hints).

They can spit out HTML and CSS, but not a Google Doc.

On the other hand, Google results are dominated by SEO spam. You can probably find one usable result on page 10.

The problem is not technology. It's a business model that can support the humans feeding data into the LLM.


Why would they be able to output a Google doc? It's a proprietary format. The closest thing would be rich text format to copy paste.

I'll accept any open format that can be lightly edited and converted into PDF.

Google doc + PDF is likely the most commonly used combination based on what I see in the SEO spam.

Some of them make you watch ads and then allow you to download something that looks like a doc, but you'll find out soon that you downloaded a ppt with an image that you can't edit.


That proprietary format is owned by a company associated with folks who won two Nobel Prizes for AI-related work this year, which employed the researchers who wrote the "Attention Is All You Need" paper at the time, and which also owns a search engine with access to, like, all the data. Doesn't seem unreasonable lol

> They can spit html and css, but not Google doc.

Wow. At this stage, I think people are just searching for excuses to complain about anything that the LLM does NOT do.


The amount of SEO spam on these searches indicates to me that this is a commercially profitable query and a task a lot of people are interested in.

If a multi-modal LLM can read a 100 page PDF and answer questions about it or replace a median white collar worker, this should be a relatively trivial task. Suggest some nice fonts, backgrounds and give me something that I can lightly edit and generate a PDF from.


They can spit out LaTeX, and a PDF from that is going to look much nicer than a Google doc (and display the same everywhere). As an added bonus, the recruiter can't randomly rewrite parts of it (at least not so easily).

The recruiter isn't going to print out your resume. They're going to read it on their computer or iPad or phone.

For sure they will read a PDF and not a Google Doc.

Large concept models are really exciting

I think John Gruber summed it up nicely:

https://daringfireball.net/2024/12/openai_unimaginable

OpenAI’s board now stating “We once again need to raise more capital than we’d imagined” less than three months after raising another $6.6 billion at a valuation of $157 billion sounds alarmingly like a Ponzi scheme — an argument akin to “Trust us, we can maintain our lead, and all it will take is a never-ending stream of infinite investment.”


According to the internal projections that The Information acquired recently, they're expecting to lose $14 billion in 2026, so that record-breaking funding round won't even buy them six months of runway at that point, even by their own probably optimistic estimates.

Every waste of money is not a Ponzi scheme.

I agree, the core aspect of a ponzi scheme is that it redistributes the newly invested funds to previous investors, making it highly profitable to anyone joining early and incentivising early joiners to get new investors.

This just doesn't hold true for OpenAI.


Doesn't it hold true for investment in AI (or potentially any other industry that experiences a boom) in general?

Anyone who bought in at the ground floor is now rich. Anyone who buys in now is incentivized to try and keep getting more people to buy in so their investment will give a return regardless of if actual value is being created.


In effect, kind of.

The money being invested does not go directly to investors.

It goes to the cost of R&D, which in turn increases the value of openai shares, then the early investors can sell those shares to realize those gains.

The difference between that and a ponzi is that the investment creates value which is reflected in the share price.

No value is created in a Ponzi scheme.

The actual dollar worth of the value generated is what people speculate on.


Only part of OpenAI's stock valuation reflects value actually created. Most of it is still a Ponzi-like scheme.

I have no love for openai, but they did make the fastest growing product of all time. There’s value in being the ones to do that.

I do agree it’s a very very thin line.


> Every waste of money is not a Ponzi scheme.

Using this as an opportunity to grind an axe (not your fault, cactusfrog!): I find it clearer when people write "not every X is a Y" than "every X is not a Y", which could be (and would be, literally) interpreted to mean the same thing as "no X is a Y".


Not every, but wasting money is one of the tricks of corruption.

What is funny is that their "lead" is just because of inertia - they were the first to make an LLM publicly available. But they are no longer leaders so their attempts at getting more and more money only prove Altman's skills at convincing people to give him money.

They are still in the lead, and I'd be willing to bet that they have 10x the DAU on chat.com/chatgpt.com compared to all other providers combined. Barring massive innovation in small sub-10B models, we are all likely to need remote inference from large server farms for the foreseeable future. Even if local inference becomes possible, it's unlikely to be desirable from a power perspective in the next 3 years. I am not going to buy a 4xB200 instance for myself.

Whether they offer the best model or not may not matter if you need a PhD in <subject> to differentiate the response quality between LLMs.


Not sure about 10x DAUs. Google flicked the switch on Gemini and it surfaced in pretty much every GSuite app over night.

Requiring that Gemini take over the job that Google Assistant did when installing the Gemini APK really rubbed me the wrong way. I get it. I just don't like that it was required for use.

Same with Microsoft and all their Copilots, which are built on OpenAI. Not to mention all the other companies using OpenAI since it’s still the best.

Their best hope now is to hire John Carmack :-)

Which models perform better than 4o or o1 for your use cases?

In my limited tests (primarily code) nothing from Llama or Gemini has come close; Claude I'm not so sure about.


How good is the best model of your choice at doing architecture work for complex and nontrivial apps?

I have been bashing my head against the wall over the course of the past few days trying to create my (quite complex) dream app.

Most of the LLM coding I've done has involved writing code to interface with already existing libs or services, and the LLMs are great at that.

I'm hung up on architecture questions that are unique to my app and definitely not something you can google.


Don't wanna be that typical hackernews guy but I couldn't resist... if your app is "quite complex" there is probably a way or ways you can break it down into much simpler parts. Easier for you AND the LLM. It always comes back to architecture and composition ;)

I don't want to be mean, but that bit of eastern wisdom you dispensed sounds incredibly like what a management consultant would say.

yeah but in business there are really only 2 skills, right? Convincing people to give you money, and giving them something back that's worth more than the money they gave you.

For repeated business you want to give them something that costs you less than what they pay, but is worth more to them than what they pay. Ie creating economic value.

Thank you Simon

I've watched juniors take their output as gospel applying absolutely zero thinking and getting confused when I suggest looking at the reference manual instead

I've had PMs believe it can replace all writing of tickets and thinking about the feature, creating completely incomprehensible descriptions and acceptance criteria

I've had Slack messages and emails from people with zero sincerity and classic LLM style and the bs that entails

I've had them totally confidently reply with absolute nonsense about many technical topics

I'm grouchy and already over LLMs


I agree the criticism is poor; it’s often very lazy. There are currently a lot of dog-brain “wrap a LLM around it” products, which are worthy of scorn. Much of the lazy criticism is pointing at such products and therefore writing off the whole endeavor.

But that doesn’t necessarily reflect the potential of the underlying technology, which is developing rapidly. Websites were goofy and pointless until Amazon came around (or Yahoo or whatever you prefer).

I guess potential isn’t very exciting or interesting on its own.


This is HN. The canonical example for that is pg's Viaweb.

Spookily good at writing code? LLMs frequently hallucinate broken nonsense shit when I use them.

Recognize what they do well (generate simple code in popular languages) while acknowledging where they are weak (non-trivial algorithms, any novel code situation the LLM hasn't seen before, less popular languages).


Did you try learning HOW to get good code out of them?

As with all things LLM there's a whole lot of undocumented and under appreciated depth to getting decent results.

Code hallucinations are also the least damaging type of hallucinations, because you get fact checking for free: if you run the code and get an error you know there's a problem.

A lot of the time I find pasting that error message back into the LLM gets me a revision that fixes the problem.


> Code hallucinations are also the least damaging type of hallucinations, because you get fact checking for free: if you run the code and get an error you know there's a problem.

This is great when the error is a thrown exception, but less great when the error is a subtle logic bug that only strikes in some subset of cases. For trivial code that only you will ever run this is probably not a big deal—you'll just fix it later when you see it—but for code that must run unattended in business-critical cases it's a totally different story.

I've personally seen a dramatic increase in sloppy logic that looks right coming from previously-reliable programmers as they've adopted LLMs. This isn't an imaginary threat, it's something I now have to actively think about in code reviews.


When they spit out these subtle bugs, are you prompting the LLM to watch out for that particular bug? I wonder if it just needs a bit more guidance in more explicit terms.

At a certain point it becomes more work to prompt the LLM with each and every edge case than it is to just write the dang code.

I work out what the edge cases are by writing and rewriting the code. It's in the process of shaping it that I see where things might go wrong. If an LLM can't do that on its own it isn't of much value for anything complicated.


Yeah, the other skill you need to develop to make the most of AI-assisted programming is really good manual QA.

Have you found that to be a good trade-off for large-scale projects?

Where I'm at right now with LLMs is that I find them to be very helpful for greenfield personal projects. Eliminating the blank canvas problem is huge for my productivity on side projects, and they excel at getting projects scaffolded and off the ground.

But as one of the lead engineers working on a million+ line, 10+ year-old codebase, I've yet to see any substantial benefit come from myself or anyone else using LLMs to generate code. For every story where someone found time saved, we have a near miss where flawed code almost made it in or (more commonly) someone eventually deciding it was a waste of time to try because the model just wasn't getting it.

Getting better at manual QA would help, but given the number of times where we just give up in the end I'm not sure that would be worth the trade-off over just discouraging the use of LLMs altogether.

Have you found these things to actually work on large, old codebases given the right context? Or has your success likewise been mostly on small things?


I use them successfully on larger projects all the time.

"Here's some example JavaScript code that sends an email through the SendGrid REST API. Write me a python function for sending an email that accepts an email address, subject, path to a Jinja template and a dictionary of template context. It should return true or false for if the email was sent without errors, and log any error messages to stderr"

That prompt is equally effective for a project that's 500 lines or 5,000,000 lines of code.
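
For a sense of scale, the function that prompt asks for is only a couple of dozen lines, and it doesn't need to know anything about the rest of the codebase. A sketch of roughly what comes back, using the sendgrid and Jinja2 packages (the extra api_key and from_address parameters are my own addition, and the exact shape of what a model returns will vary):

    import sys
    from jinja2 import Template
    from sendgrid import SendGridAPIClient
    from sendgrid.helpers.mail import Mail

    def send_email(to_address: str, subject: str, template_path: str,
                   context: dict, api_key: str, from_address: str) -> bool:
        """Render a Jinja template and send it via SendGrid.
        Returns True if the email was sent without errors; logs errors to stderr."""
        try:
            with open(template_path) as f:
                html = Template(f.read()).render(**context)
            message = Mail(from_email=from_address, to_emails=to_address,
                           subject=subject, html_content=html)
            SendGridAPIClient(api_key).send(message)
            return True
        except Exception as exc:
            print(f"Failed to send email: {exc}", file=sys.stderr)
            return False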

I also use them for code spelunking - you can pipe quite a lot of code into Gemini and ask questions like "which modules handle incoming API request validation?" - that's why I built https://github.com/simonw/files-to-prompt


I had some success converting a react app with classes to use hooks instead. Also asking it to handle edge cases, like spaces in a filename in a bash script--this fixes some easy problems that might have come up. The corollary here is that pointing out specific problems or mentioning the right jargon will produce better code than just asking for the basic task.

It's very bad at Factor but pretty good at naming things, sometimes requiring some extra prompting. [generate 25 possible names for this variable...]


That’s the problem I had on the early ones. I learned a few tricks that let me output whole apps from GPT3.5 and GPT4 before they seemed to nerf them.

1. Stick with popular languages, libraries, etc with lots of blog articles and example code. The pre-training data is more likely to have patterns similar to what you’re building. OpenAI’s were best with Python. C++ was clearly taxing on it.

2. Separate design from coding. Have an AI output a step by step, high-level design for what you’re doing. Look at a few. This used to teach me about interesting libraries if nothing else.

3. Once a design is had, feed it into the model you want to code. I would hand-make the data structures with stub functions (see the sketch after this list). I'd tell it to generate a single function. I made sure it knew what to take in and return. Repeat for each function.

4. For each block of code, ask it to tell you any mistakes in it and generate a correction. It used to hallucinate on this enough that I only did one or two rounds, make sure I hand-changed the code, and sometimes asked for specific classes of error.

5. Incremental changes. You give it the high-level description, a block of code, and ask it to make one change. Generate new code. Rinse repeat. Keep old versions since it will take you down dead ends at times but incremental is best.
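
A hypothetical illustration of step 3: hand-written data structures plus a stub for the single function you want generated, with the contract spelled out so the model knows exactly what to take in and return (all the names here are made up):

    from dataclasses import dataclass

    @dataclass
    class Article:
        url: str
        title: str
        body: str

    @dataclass
    class Summary:
        url: str
        bullet_points: list[str]

    def summarize_article(article: Article, max_points: int = 5) -> Summary:
        """Stub handed to the model: given an Article, return a Summary with
        at most max_points bullet points covering its main claims."""
        raise NotImplementedError  # ask the model to fill in exactly this function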

I used the above to generate a number of utilities. I also made a replacement for the ChatGPT application that used the Davinci API. I also made a web proxy with bloat stripping and compression for browsing from low-bandwidth, mobile devices. Best use of incremental modification was semi-automatically making Python web apps async.

Another quick use for CompSci folks. I’d pull algorithm pseudocode out of papers which claimed to improve on existing methods. I’d ask GPT4 to generate a Python version of it. Then, I’d use the incremental change method to adapt it for a use case. One example, which I didn’t run, was porting a pauseless, concurrent GC.


QA are going to be told to use AI too

(Seems every job is fair game according to CTOs. Well, except theirs)


> Did you try learning HOW to get good code out of them?

That is at least somewhat a valid point. Good workers know how to get the best out of their tools. And yet, good tools accommodate how their users work, instead of expecting the user to accommodate how the tool works.

One could also say that programmers were sold a misleading bill of goods about how LLMs would work. From what they were told, they shouldn't have to learn how to get the best out of LLMs - LLMs were AI, on the way to AGI, and would just give you everything you needed from a simple prompt.


Yeah, that's one of the biggest misconceptions I've been trying to push back against.

LLMs are power-user tools. They're nowhere near as easy to use as they look (or as their marketing would have you believe).

Learning to get great results out of them takes a significant amount of work.


> if you run the code and get an error you know there's a problem.

well, sometimes - other times it'll be wrong with no error, or insecure, or inaccessible, and so on


Is there more to getting 'good' at them than just copying error messages back in? Like, how do I get them to reason about e.g. whether a data structure compression method makes sense?

Like all AI simps, your blanket response to pointing out flaws is to tell me to do more prompt engineering and then dismiss the issue entirely. In the time it takes me to coax the model to do the thing I was told it knows how to do, I could just do the task myself. Your examples of LLM code generation are simple, easy to specify, self-contained applications that are not representative of software you can actually build a business on. Please do something your beloved LLMs can't and come up with an original idea.

> not representative of software you can actually build a business on

The only people pushing that you can BUILD AN APP WITHOUT WRITING A LINE OF CODE are the Twitter AI hypesters. Simon doesn't assert anything of the sort.

LLMs are more-than-sufficient for code snippets and small self-contained apps, but they are indeed far from replacing software engineers.


Like all stubborn anti-AI know-it-alls, you sound like you’ve tried a couple of times to do something and have decided to label all LLMs with the same brush.

What models have you tried, and what are you trying to do with them? Give us an example prompt too so we can see how you’re coaxing it so we can rule out skill issue.

And a big strength LLMs have is summarizing things - I’d like to see you summarize the latest 10 arxiv papers relating to prompt engineering and produce a report geared towards non-techies. And do this every 30 mins please. Also produce social media threads with that info. Is this a task you could do yourself, better than LLMs?


> And a big strength LLMs have is summarizing things - I’d like to see you summarize the latest 10 arxiv papers relating to prompt engineering and produce a report geared towards non-techies. And do this every 30 mins please. Also produce social media threads with that info. Is this a task you could do yourself, better than LLMs?

Right, but this is the part that is silly and sort of disingenuous and I think built upon a weird understanding of value and productivity.

Doing more constantly isn't inherently valuable. If one human writes a magnificently crafted summary of those papers once and it is promulgated across channels effectively, this is both better and more economical than having an LLM compute one (slightly incorrect) summary for each individual on demand. In fact, all the LLM does in this case is increase the amount of possible lower-quality noise in the space. The one edge an LLM might have at this stage is to generate a summary that accounts for more recent information, thereby getting around the inevitable gradual "out of dateness" of human-authored summaries at time T, but even then, this is not great if the trade-off is to pollute the space with a bunch of ever so slightly different variants of the same text. It's such a weird, warped idea of what productivity is; it's basically the lazy middle-manager's idea of what it means to be productive. We need to remember that not all processes are reducible to their outputs: sometimes the process is the point, not the immediate output (e.g. education).


Who said anything about value? I can argue the vast majority of human generated content is valueless - look at Quora and Medium even before ChatGPT blew up. Where else are humans producing this amazing content? Facebook? X? Don’t even get me started.

Being able to summarise multiple articles quicker than a human can read and digest a single one is obviously more productive. I’m not sure why you’re assuming I’m talking about rewriting the papers to produce slightly different variations? It’s a summary. Concerned about the lack of “insight” or something? Then add a workflow that takes the summaries and use your imagination - maybe ask it to find potential applications in completely different fields? You already have comprehensive summaries (or the full papers in a vector db). Am I missing something?

Also the quality of the summary will be linked to the prompts and the way you go about the process (one-shotting the full paper in the prompt, map reduce, semantically chunked summaries, what model you’re using, its context length etc) as well as your RAG setup. I’m still working on my implementation but it’s simple as fuck and pretty decent in giving me, well, summaries of papers.
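
To make the map-reduce option concrete, the rough shape is: chunk, summarise each chunk, then summarise the summaries. A minimal sketch, with llm() standing in for whatever completion call you actually use and the chunking deliberately naive:

    def chunk(text: str, max_chars: int = 8000) -> list[str]:
        """Naive fixed-size chunking; a real pipeline would split on section boundaries."""
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    def map_reduce_summary(paper_text: str, llm) -> str:
        # Map: summarise each chunk independently.
        partials = [llm(f"Summarise this section of a research paper:\n\n{c}")
                    for c in chunk(paper_text)]
        # Reduce: combine the partial summaries into one non-technical report.
        combined = "\n\n".join(partials)
        return llm("Combine these section summaries into a short report "
                   f"for a non-technical reader:\n\n{combined}")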

I can’t articulate it well enough but your human curation argument sounds to me like someone dismissing Google because anyone can lie online, and the good old Yellow Pages book can never be wrong.


Based on your writing you are clearly emotionally invested in this technology, consider how that may affect your understanding.

By multiple rewrites, I meant that, to me, at least, it is silly to spend N compute on producing effectively the same summary on demand for the Mth chatbot user when, in some cases, we could much more economically generate one summary once and make it available via distribution channels--to be fair, that is sort of orthogonal to whether or not the "golden" summary is produced by humans or LLMs. I guess this is more of a critique of the current UX and computational expenditure model.

Yes, my whole point about the process being the point sometimes is precisely about lack of insight. It goes back to Searle's Chinese Room argument. A person in a room with a perfect dictionary and grammar reference can productively translate English texts (input) into Chinese texts (output) just by consulting the dictionary, but we wouldn't claim that this person knows Chinese. Using LLMs for "understanding" is the same. If all you care about is immediate material gain and output, sure, why not, but some of us realize that human beings still move and exist in the world, and some of us still appreciate that we need to help fashion those human beings into rational ones that are able to use reason to get along, and aren't codependent on the past N years of the internet to answer any and all questions (the same criticism applies to over-reliance on simplistic "answers" from search engines).


I wouldn't say i'm "emotionally invested" in this tech so much as annoyed with people who expect it to be 100% perfect, as if they've accepted the snakeoil salesmen at face value and suddenly dismiss all useful applications of it at the first hurdle. Consider that your disdain for these sales people and their oft-exaggerated claims (which i absolutely despise) may cloud your judgement of the actual technology.

>it is silly to spend N compute on producing effectively the same summary on demand for the Mth chatbot user

Why? The compute is there, unused. Why is it silly to use it the way a user wants to? Is your argument more towards our effective use of electrical power across the globe or the quality of the summaries? What if the summaries are produced once and then loaded from some sort of cache - does that make it better in your eyes? I'm trying to understand exactly your point here... please accept my apologies for not being able to understand and please do not take my questions as "gotchas" or anything like that. I genuinely want to know the issue.

>A person in a room with a perfect dictionary and grammar reference can productively translate english texts (input) into Chinese texts (output) just by consulting the dictionary, but we wouldn't claim that this person knows Chinese.

Agreed, because you can't really know a language just from its words - you need grammar rules, historical/cultural context etc - precisely the kinds of things included in an LLM's training dataset. I'd argue the LLM knows the language better than the human in your example.

Again, i'm not sure how all of this is relevant to using LLMs to summarise long papers? I wouldn't have read them in the first place, because i didn't know they existed, and i don't have time to read them fully. So a summary of the latest papers every day is infinitely better to me than just not knowing in the first place. Now if you want to talk about how LLMs can confidently hallucinate facts or disregard things due to inherent bias in the training datasets then i'm interested, because those are the things that are stopping me from actually trusting the outputs fully. (Note, i also don't trust human output on the internet either, due to inherent bias within all of us)

>human beings still move and exist in the world and some of us still appreciate that we need to help fashion those human beings into rational ones that are able to use reason to get along, and aren't codependent on the past N years of the internet to answer any and all questions

Do a simple experiment with the people around you. Ask them about something that happened a few years ago and see if they pull up Google or Wikipedia or whatever. I don't think you realise how few and far between the humans you're talking about are nowadays. Everyone, from teens to pensioners, has been affected by brain rot to some degree, whether it's plain disinformation on Facebook, or sweet nothings from their pastor/imam/rabbi, or inaccurate Google search summaries (which is a valid point against LLMs - i'm also disappointed with how bad their implementation is).

And let's not assume most humans are even capable of being rational when the data in their own brains has been biased and manipulated by institutions and politicians in "democracies".


I basically agree with everything you say here, I guess my chief concern surrounds reducing brain rot, and I mostly just worry that we will only increase brain rot through uncritical application of LLMs, rather than decrease it.

At least there is one silver lining: your comments are evidence that not everyone has suffered that brain rot, and some of us are still out there using tools critically—thanks for a good conversation on this!


I am really glad we got the chance for this discussion and that it didn’t devolve into flaming or bad faith discussion; and i also share your sentiments RE brain rot, but for me this tech is cool yet weirdly primitive hence my excitement (I’m a 90s baby so I was “new” to the internet around the time AOL was in decline and this is the first time i feel early to something). I bet you there are ways to steer people away from their stupor using these - you know how a lie travels faster than the truth? What if these things can help equalise that?

Btw, I apologise again if I came across as blunt or rude in our exchange, upon reflection, I think you were actually right about me being somewhat emotionally invested in this (albeit due to that sliver of hope that they can be used for good). Peace be with you


> And a big strength LLMs have is summarizing things - I’d like to see you summarize the latest 10 arxiv papers relating to prompt engineering and produce a report geared towards non-techies. And do this every 30 mins please. Also produce social media threads with that info. Is this a task you could do yourself, better than LLMs?

I don't mean to nitpick, but how good do you really think the output of this would be? Papers are short and usually have many references, I would expect the LLM to basically miss the important subtleties on every paper it's given, and misunderstand and misattribute any terms of art it encounters.

I mean, of course LLMs are good at summarizing: the summaries are probably mostly sort of good, and anything I'm summarizing I won't read myself. But for technical and specific texts, what's the point when you're getting a "maybe correct" retelling? Best case scenario you get a pretty paragraph that's maybe good for an introduction, and worst case you get incorrect information that misinforms you.


The quality of the summary is only as good as the effort you put into writing your workflow. If you’re simply one shotting the paper into a message and saying “plz summarise this and I’ll reward you with $1m” then of course it’s gonna be shit. But if you semantically chunked along sections and do some RAG Q&A summaries before combining into a well formatted schema then it’s probably going to be better than the first way.

I’m using the summaries as a juicier abstract. I’m not taking them as gospel.

I’m working on following references to then add those papers to a vector db for RAG so it can actually go the step beyond. It’s fun!


> I’m using the summaries as a juicier abstract. I’m not taking them as gospel.

I'm not sure of the value of this. Papers already have abstracts, rewording them using LLMs is just playing with your food. If you're seeing use out of it that's awesome though.


Due to unexpected capacity constraints, Claude is unable to reply to this message.

Just as I thought, just snark and no real meaningful engagement.

P.S my script uses local models - no capacity constraints (apart from VRAM!)


Hilarious that you're trying to gaslight us into "recognizing" your own incorrect assumptions as facts. You've lost all credibility.

Simon gets one thing working for one task and assumes everyone can do the same for everything. The trick is that he has no idea how the failures happen or how to maintain actual working systems.

The LLM goalpost keeps moving, apparently. They are not useful for most everyday tasks, e.g. suggesting games, coming up with plans, activities, anything creative that requires knowledge, understanding and creativity.

This has always been the benchmark, and they are not that useful to me. Every time I say this, someone hits me with the "yeah, I bet you haven't tried ShitLLM 4.0-pqr". It's very tiring. Your new LLM hype model is nothing but a marginal, overhyped improvement over something that fundamentally is not intelligent.


More dishonest magical thinking. I wish this guy would learn how systems work and stop flooding the field with mystical nonsense unless he really is trying to make people think LLMs are worthless, then I guess he should be honest about it instead of subversive.

I read the article and thought it was well done and level-headed. What exactly did you think was mystical or magical thinking?

Which bit was dishonest magical thinking?

In case you're interested, here's a summarized list (thanks, Claude) of the negative/critical things I said about LLMs and the companies that build them in this post: https://gist.github.com/simonw/73f47184879de4c39469fe38dbf35...


Interesting, the article is not quite what I expected.

This isn't an airport.


