A coder considers the waning days of the craft (newyorker.com)
778 points by jsomers 10 months ago | 1086 comments



Maybe I’m in the minority. I’m definitely extremely impressed with GPT4, but coding to me was never really the point of software development.

While GPT4 is incredible, it fails OFTEN. And it fails in ways that aren’t very clear. And it fails harder when there’s clearly not enough training resources on the subject matter.

But even if it was, hypothetically, 20x better, wouldn’t that be a good thing? There’s so much of the world that would be better off if GOOD software was cheaper and easier to make.

Idk where I’m going with this, but if coding is something you genuinely enjoy, AI isn’t stopping anyone from doing their hobby. I don’t really see it going away any time soon, and even if it is going away, it never really seemed like the point of software engineering anyway.


Also, I think we are quite a ways out from a tool being able to devise a solution to a complex high-level problem without online precedent, which is where I find the most satisfaction anyway.

LLMs in particular can be a very fast, surprisingly decent (but, as you mention, very fallible) replacement for Stack Overflow, and, as such, a very good complement to a programmer's skills – seems to me like a net positive at least in the near to medium term.


Spreadsheets didn’t replace accountants, however, it made them more efficient. I don’t personally believe AI will replace software engineers anytime soon, but it’s already making us more efficient. Just as Excel experience is required to crunch numbers, I suspect AI experience will be required to write code.

I use ChatGPT every day for programming and there are times when it’s spot on and more times when it’s blatantly wrong. I like to use it as a rubber duck to help me think and work through problems. But I’ve learned that whatever it outputs requires as much scrutiny as a good code review. I fear there’s a lot of copy and pasting of wrong answers out there. The good news is that for now they will need real engineers to come in and clean up the mess.


Spreadsheets actually did put many accountants and “computers” (the term for people that tallied and computed numbers, ironically a fairly menial job) out of business. And it’s usually the case that disruptive technology’s benefits are not evenly distributed.

In any case, the unfortunate truth is that AI as it exists today is EXPLICITLY designed to replace people. That’s a far cry from technologies such as the telephone (which by the way put thousands of Morse code telegraph operators out of business)


It is especially sad that VC money is currently being spent on developing AI to eliminate good jobs rather than on developing robots to eliminate bad jobs.


Many machinists, welders, etc would have asked the same question when we shipped most of American manufacturing overseas. There was a generation of experienced people with good jobs that lost their jobs and white collar workers celebrated it. Just Google “those jobs are never coming back”, you’ll find a lot of heartless comparisons to the horse and buggy.

Why should we treat these office jobs any differently?


Agree - also note that many office jobs have been shipped overseas, and also automated out of existence. When I started work there were slews of support staff booking trips, managing appointments, typing correspondence & copying and typesetting documents. For years we laughed at the paperless office - well, it's been here for a decade and there's no discussion about it anymore.

Interestingly, at the same time as all those jobs disappeared and got automated, there were surges of people into the workforce. Women started to be routinely employed for all but a few years of childbirth and care, and many workers came from overseas. Yet white collar unemployment didn't spike. The driver for this was that the effective size of the economy boomed with the inclusion of Russia, China, Indonesia, India and many other smaller countries in the western sphere/economy post cold war... and growth from innovation.


US manufacturing has not been shipped out. US manufacturing output keeps increasing, though its overall share of GDP is dropping.

US manufacturing jobs went overseas.

What went overseas were those areas of manufacturing that were more expensive to automate than to staff with low-paid workers elsewhere.

With respect to your final question, I don't think we should treat them differently, but I do think few societies have handled this well.

Most societies are set up in a way that creates a strong disincentive for workers to want production to become more efficient other than at the margins (it helps you if your employer is marginally more efficient than average to keep your job safer).

Couple that with a tacit assumption that there will always be more jobs, and you have the makings of a problem if AI starts to eat away at broader segments.

If/when AI accelerates this process you either need to find a solution to that (in other words, ensure people do not lose out) or it creates a strong risk of social unrest down the line.


If I didn't celebrate that job loss am I allowed to not celebrate this one?


The plan has always been to build the robots together with the better AI. Robots ended up being much harder than early technologists imagined, for a myriad of different reasons. It turned out that AI is easier, or at least that is the hope.


Actually I'd argue that we've had robots forever, just not what you'd consider robots, because they're quite effective. Consider the humble washing machine or dishwasher. Very specialized, and hyper effective. What we don't have is Generalized Robotics, just like we don't have Generalized Intelligence.

Just as "Any sufficiently advanced technology is indistinguishable from magic", "Any sufficiently omnipresent advanced technology is indistinguishable from the mundane". ChatGPT will feel like your smart phone, which now feels like your cordless phone, which now feels like your corded phone, which now feels like wireless telegraphy on your coal-fired steam liner.


No, AI is tremendously harder than early researchers expected. Here's a seminal project proposal from 1955:

"We propose that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer. “


GP didn't say that AI was easier than expected, rather that AI is easier than robotics, which is true. Compared to mid-century expectations, robotics has been the most consistently disappointing field of research besides maybe space travel, and even that is well ahead of robots now.


> well ahead of robots now

I am not working in that field, but as an outsider it feels like the industrial robots doing most of the work on TSMC's and Tesla's production lines are, on the contrary, extremely advanced. Aside from that, what Boston Dynamics or startups making prosthetics have come up with is nothing short of amazing.

If anything software seems to be the bottleneck for building useful humanoids...


I think the state of the art has gotten pretty good, but still nowhere near as good as people thought it would be fifty years ago. More importantly, as of a year ago AI is literally everywhere, hundreds of millions of regular users and more than that who've tried it, almost everyone knows it exists and has some opinion on it. Compare that to even moderately mobile, let alone general, robots. They're only just starting to be seen by most people on a regular basis in some specific, very small geographical locations or campuses. The average person interacts with a mobile or general robot 0 times a day. Science fiction as well as informed expert prediction was always the opposite way around - robots were coming, but they would be dumb. Now it's essentially a guarantee that by the time we have widespread rollout of mobile, safe, general purpose robots, they are going to be very intelligent in the ways that 20 years ago most thought was centuries away.

Basically, it is 1000x easier today to design and build a robot that will have a conversation with you about your interests and then speak poetry about those interests than it is to build a robot that can do all your laundry, and that is the exact opposite of what all of us have been told to expect about the future for the last 70 years.


Space travel was inevitably going to be disappointing without a way to break the light barrier. Even a century ago we thought the sound barrier was impossible to penetrate, so at least we are making progress, albeit slowly.

On the bright side, it is looking more and more like terraforming will be possible. Probably not in our lifetimes, but in a few centuries' time (if humanity survives).


Forget the light barrier, just getting into space cheaply enough is the limiting factor.

Barring something like fusion rockets or a space elevator, it's going to be hard to really do a whole lot in space.


I think the impact of AI is not about good jobs vs bad jobs but about good workers vs bad workers. For a given field, AI is making good workers more efficient and eliminating those who are bad at their jobs (e.g. the underperforming accountant who makes a living doing the more mundane tasks and whose job is threatened by spreadsheets and automation).


I worry about the effects this has on juniors…


I think AI, particularly text based, seems like a cleaner problem. Robots are derivative of AI, robotics, batteries, hardware, compute, societal shifts. It appears our tech tree needs stable AI first; then we can tackle the rest of the problems, which are either physical or infrastructure.


Capitalism always seeks to commodify skills. We of the professional managerial class happily assist, certain they'll never come for our jobs.


A serious, hopefully not flippant, question: who are "they" in this case? Particularly as the process you describe tends to the limit.


I would guess that "they" are "the capitalists" as a class. It's very common to use personal pronouns for such abstract entities, and to describe them as behaving in a goal-driven manner. It doesn't really matter who "they" are as individuals (or even if they are individuals).

More accurate would be something like "reducing labor costs increases return on capital investment, so labor costs will be reduced in a system where the economy organizes to maximize return on capital investment". But our language/vocabulary isn't great at describing processes.


Poor phrasing. Apologies. u/jampekka nails it.

Better phrasing may have been

"...happily assist, confident our own jobs will remain secure."


Thanks. Not putting this onto you so I'll say "we/our" to follow your good faith;

What is "coming for our jobs" is some feature of the system, but it's a system of which we presume to be, and hope to remain, a part, even though ultimately our part in it must be to eliminate ourselves. Is that fair?

Our hacker's wish to "replace myself with a very small shell-script and hit the beach" is coming true.

The only problem I have with it, even though "we're all hackers now", is that I don't see everybody making it to the beach. But maybe everybody doesn't want to.

Will "employment" in the future be a mark of high or low status?


The problem is that under the current system the gains of automation or other increased productivity do not "trickle down" to workers that are replaced by the AI/shell script. Even to those who create the AI/shell script.

The "hit the beach" part requires that you hide the shell script from the company owners, if by hitting the beach you don't mean picking up empty cans for sustenance.


> Will "employment" in the future be a mark of high or low status?

Damn good question.

Also, +1 for beach metaphor.

My (ignorant, evolving) views on these things have most recently been informed by John and Barbara Ehrenreich's observations about the professional-managerial class.

ICYMI:

https://en.wikipedia.org/wiki/Professional%E2%80%93manageria...


An interesting view is that people would still "work" even if they weren't needed for anything productive. In this "Bullshit job" interpretation wage labor is so critical for social organization and control that jobs will be "invented" even if the work is not needed for anything, or is actively harmful (and that this is already going on).

https://strikemag.org/bullshit-jobs/


> Spreadsheets actually did put many accountants

https://cpatrendlines.com/2017/09/28/coming-pike-accountants...

Not really seeing any correlation in graduation rates. Excel was introduced in 1985. Every accountant had a computer in the 80s.


> But I’ve learned that whatever the output is requires as much scrutiny as a good code review. I fear there’s a lot of copy and pasting of wrong answers out there. The good news is that for now they will need real engineers to come in and clean up the mess.

Isn't it sad that real engineers are going to work as cleaners for AI output? And in doing this they are in fact training the next generation to be better able to replace real engineers... We are trading our future income for some minor (and questionable) development speed today.


AI might help programmers become more rigorous by lowering the cost of formal methods. Imagine an advanced language where simply writing a function contract, in some kind of Hoare logic or using a dependently-typed signature, yields provably correct code. These kinds of ideas are already being worked on, and I believe they are the future.
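
To make the idea a bit more concrete, here is a minimal sketch of what a Hoare-style contract might look like, written as runtime-checked pre- and postconditions in Python rather than in a real proof system (the decorator and names are just for illustration; in the vision above you would write only the contract and the body would be generated and proven against it):

    # Illustrative only: a runtime-checked contract, not a static proof.
    from functools import wraps

    def contract(pre, post):
        # Attach a precondition and a postcondition to a function.
        def decorate(fn):
            @wraps(fn)
            def wrapper(*args):
                assert pre(*args), "precondition violated"
                result = fn(*args)
                assert post(result, *args), "postcondition violated"
                return result
            return wrapper
        return decorate

    @contract(pre=lambda xs: len(xs) > 0,
              post=lambda r, xs: r in xs and all(r <= x for x in xs))
    def minimum(xs):
        smallest = xs[0]
        for x in xs[1:]:
            if x < smallest:
                smallest = x
        return smallest

    print(minimum([3, 1, 2]))  # 1; the contract is checked on every call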


I'm not convinced about that. Writing a formal contract for a function is incredibly hard, much harder than writing the function itself. I could open any random function in my codebase and with high probability get a piece of code that is < 50 lines, yet would need pages of formal contract to be "as correct" as it is now.

By "as correct", I mean that such a function may have bugs, but the same is true for an AI-generated function derived from a formal contract, if the contract has a loophole. And in that case, a simple microscopic loophole may lead to very very weird bugs. If you want a taste of that, have a look at how some C++ compilers remove half the code because of an "undefined behaviour" loophole.

Proofreading what Copilot wrote seems like the saner option.


That is because you did not use contracts when you started developing your code. Likewise, it would be hard to enforce structured programming on assembly code that was written without this concept in mind.

Contracts can be quite easy to use, see e.g. Dafny by MS Research.


I think this is further off than you might expect. LLMs work because the “answer” (and the prompt) is fuzzy and inexact. Proving an exact answer is a whole different and significantly more difficult problem, and it’s not clear the LLM approach will scale up to that problem.


Formal methods/dependent types are the future in the same way fusion is, it seems to be perpetually another decade away.

In practice, our industry seems to have reached a sort of limit in how much type system complexity we can actually absorb. If you look at the big new languages that came along in the last 10-15 years (Kotlin, Swift, Go, Rust, TypeScript) then they all have type systems of pretty similar levels of power, with the possible exception of the latter two which have ordinary type systems with some "gimmicks". I don't mean that in a bad way, I mean they have type system features to solve very specific problems beyond generalizable correctness. In the case of Rust it's ownership handling for manual memory management, and for TypeScript it's how to statically express all the things you can do with a pre-existing dynamic type system. None have attempted to integrate generalized academic type theory research like contracts/formal methods/dependent types.

I think this is for a mix of performance and usability reasons that aren't really tractable to solve right now, not even with AI.


> If you look at the big new languages that came along in the last 10-15 years (Kotlin, Swift, Go, Rust, TypeScript) then they all have type systems of pretty similar levels of power, with the possible exception of the latter two which have ordinary type systems with some "gimmicks".

Those are very different type systems:

- Kotlin has a Java-style system with nominal types and subtyping via inheritance

- TypeScript is structurally typed, but otherwise an enormous grab-bag of heuristics with no unifying system to speak of

- Rust is a heavily extended variant of Hindley-Milner with affine types (which is as "academic type theory" as it gets)


Yes, I didn't say they're the same, only that they are of similar levels of power. Write the same program in all three and there won't be a big gap in level of bugginess.

Sometimes Rustaceans like to claim otherwise, but most of the work in Rust's type system goes into taming manual memory management, which is solved with a different typing approach in the other two, so unless you need one of those languages for some specific reason, the level of bugs you can catch automatically is going to be in the same ballpark.


> Write the same program in all three and there won't be a big gap in level of bugginess.

I write Typescript at work, and this has not been my experience at all: it's at least an order of magnitude less reliable than even bare ML, let alone any modern Hindley-Milner based language. It's flagrantly, deliberately unsound, and this causes problems on a weekly basis.


Thanks, I've only done a bit of TypeScript so it's interesting to hear that experience. Is the issue interop with JavaScript or a problem even with pure TS codebases?


LLMs are pretty much the antithesis of rigor and formal methods.


So is the off-the-cuff, stream-of-consciousness chatter humans use to talk. We still manage to write good scientific papers (sometimes...), not because we think extra hard and then write a good scientific treatment in one go without edits, research or revisions. Instead we have a whole text structure, revision process, standardised techniques of analysis, searchable research data collections, critique and correction by colleagues, searchable previous findings all "hyperlinked" together by references, and social structures like peer review. That process turns out high-quality, high-information work product at the end, without a significant cognitive adjustment for the humans doing the work aside from just learning the new information required.

I think if we put resources and engineering time into trying to build a "research lab" or "working scientist tool access and support network" with every intelligent actor involved emulated with LLMs, we could probably get much, much more rigorous results out the other end of that process. Approaches like this exist in a sort of embryonic form with LLM strategies like expert debate.


I think the beauty of our craft on a theoretical level is that it very quickly outgrows all of our mathematics and what can be stated based on that (e.g. see the busy beaver problem).

It is, honestly, humbling and empowering at the same time. Even a hyper-intelligent AI will be unable to reason about any arbitrary code. Especially since current AI - while impressive at many things - is a far cry from being anywhere near good at logical thinking.


I think the opposite! The problem is that almost everything in the universe can be cast as computing, and so we end up with very little differentiating semantic when thinking about what can and can't be done. Busy beavers is one of a relatively small number of problems that I am familiar with (probably there is a provably infinite set of them, but I haven't navigated it) which are uncomputable, and it doesn't seem at all relevant to nature.

And yet we have free will (ok, within bounds, I cannot fly to the moon etc, but maybe my path integral allows it), and we see processes like the expansion of the universe that we cannot account for, and infer theories like quantum gravity to explain them.


They won't need human help when the time comes.


It's also where I find most of the work. There are plenty of off the shelf tools to solve all the needs of the company I work at. However, we still end up making a lot of our own stuff, because we want something that the off the shelf option doesn't do, or it can't scale to the level we need. Other times we buy two tools that can't talk to each other and need to write something to make them talk. I often hear people online say they simply copy/paste stuff together from Stack Overflow, but that has never been something I could do at my job.

My concern isn't about an LLM replacing me. My concern is that our CIO will think it can, firing first and thinking later.


It’s not just about whether an LLM could replace you; if an LLM replaces enough other programmers, it’ll tank the market price for your skills.


I don’t think this will happen because we’ll just increase the complexity of the systems we imagine. I think a variant of Wirth’s law applies here: the overall difficulty of programming tasks stays constant because, when a new tool simplifies a previously hard task, we increase our ambitions.


In general people are already working at their limits; tooling can help a bit, but the real limitation to handling complexity is human intelligence, and that appears to be mostly innate. The people this replaces can’t exactly skill up to escape the replacement, and the AI will keep improving, so the proportion being replaced will only increase. As someone near the top end of the skill level, my hope is that I’ll be one of the last to go; I’ll hopefully make enough money in that time to afford a well-stocked bunker.


But, for example, I probably couldn’t have written a spell checker myself forty years ago. Now, something like aspell or ispell is just an off-the-shelf library. Similarly, I couldn’t implement Timely Stream Processing in a robust way, but Flink makes it pretty easy for me to use with a minimal conceptual understanding of the moving parts. New abstractions and tools raise the floor, enabling junior and mid-level engineers to do what would have taken a much more senior engineer before they existed.


"in a robust way" does a lot of work here and works as a weasel word/phrase, i.e. it means whatever the reader wants it to mean (or can be redefined in an argument to suit your purpose).

Why is it that you feel that you couldn't make stream processing that works for your use cases? Is it also that you couldn't do it after some research? Are you one of the juniors/mids that you refer to in your post?

I'm trying to understand this type of mindset because I've found that overwhelmingly most things can be done to a perfectly acceptable degree and often better than big offerings just from shedding naysayer attitudes and approaching it from first principles. Not to mention the flexibility you get from then owning and understanding the entire thing.


I think you’re taking what I’m saying the opposite of the way I intended it. With enough time and effort, I could probably implement the relevant papers and then use various tools to prove my implementation free of subtle edge cases. But, Flink (and other stream processing frameworks) let me not spend the complexity budget on implementing watermarks, temporal joins and the various other primitives that my application needs. As a result, I can spend more of my complexity budget within my domain and not on implementation details.


I used to think that way but from my experience and observations I've found that engineers are more limited by their innate intelligence rather than their tooling. Experience counts but without sufficient intelligence some people will never figure out certain things no matter how much experience they have - I wish it wasn't so but it's the reality that I have observed. Better tooling will exacerbate the difference between smart and not so smart engineers with the smart engineers becoming more productive and the not so smart engineers will instead be replaced.


If an LLM gets good enough to come for our jobs it is likely to replace all the people who hire us, all the way up to the people who work at the VC funds that think any of our work had value in the first place (remember: the VC fund managers are yet more employees that work for capital, and are just as subject to being replaced as any low-level worker).


that's true, but it's harder to replace someone when you have a personal connection to them. VC fund managers are more likely to be personally known to the person who signs the checks. low-level workers may never have spoken any words to them or even ever have met them.


I think another possibility is if you have skills that an LLM can’t replicate, your value may actually increase.


Only if the other people that the LLM did replace cannot cross train into your space. Price is set at the margins. People imagine it’ll be AI taking the jobs but mostly it’ll be people competing with other people for the space that’s left after AI has taken its slice.


Then the CIO itself gets fired … after all, the average tenure of a CIO is roughly 18 months.


We’ll see - but given the gap between GPT-3 and GPT-4, I think AIs will be competitive with mid-level programmers by the end of the decade. I’d be surprised if they aren’t.

The training systems we use for LLMs are still so crude. ChatGPT has never interacted with a compiler. Imagine learning to write code by only reading (quite small!) snippets on GitHub. That’s the state LLMs are in now. It’s only a matter of time before someone figures out how to put a compiler in a reinforcement learning loop while training an LLM. I think the outcome of that will be something that can program orders of magnitude better. I’ll do it eventually if nobody else does it first. We also need to solve the “context” problem - but that seems tractable to me too.
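
As a very rough sketch of what that reward signal could look like (Python's own compiler and a toy test harness standing in for the real thing; the function and the scoring scheme here are hypothetical):

    # Sketch: score generated code by compiling it and running it against tests.
    # In a real RL setup this score would feed back into training as the reward.
    def score_candidate(source: str, tests) -> float:
        try:
            compiled = compile(source, "<candidate>", "exec")  # does it even parse?
        except SyntaxError:
            return 0.0
        namespace = {}
        try:
            exec(compiled, namespace)                          # does it run?
        except Exception:
            return 0.1
        passed = 0
        for fn_name, args, expected in tests:
            try:
                if namespace[fn_name](*args) == expected:
                    passed += 1
            except Exception:
                pass
        return 0.2 + 0.8 * passed / len(tests)                 # partial credit per passing test

    candidate = "def add(a, b):\n    return a + b\n"
    print(score_candidate(candidate, [("add", (1, 2), 3), ("add", (0, 0), 0)]))  # 1.0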

For all the computational resources they use to do training and inference, our LLMs are still incredibly simple. The fact they can already code so well is a very strong hint for what is to come.


With today's mid level programmers, yes. But by that time, many of today's mid level programmers will be able to do stuff high level programmers do today.

Many people underestimate an LLM's most powerful feature when comparing it with something like Stackoverflow: the ability to ask followup questions and immediately get clarification on anything that is unclear.

I wish I had had access to LLMs when I was younger. So much time wasted on repetitive, mundane in-between code...


> the ability to ask followup questions and immediately get clarification on anything that is unclear.

Not only that, but it has the patience of a saint. It never makes you beg for a solution because it thinks there's an XY problem. It never says "RTFM" before posting an irrelevant part of the documentation because it only skimmed your post. It never says "Why would you use X in 2023? Everyone is using framework Y, I would never hire anyone using X."

The difference comes down to this: unlike a human, it doesn't have an ego or an unwarranted feeling of superiority because it learned an obscure technology.

It just gives you an answer. It might tell you why what you're doing is suboptimal, it might hallucinate an answer that looks real but isn't, but at least you don't have to deal with the worst parts of asking for help online.


Yeah. You also don't have to wait for an answer or interrupt someone to get that answer.

But - in the history of AIs written for chess and go, there was a period for both games where a human playing with an AI could beat either a human playing alone or an AI playing alone.

I suspect we're in that period for programming now, where a human writing code with an AI beats an AI writing code alone, and a human writing code alone.

For chess and go, after a few short years passed, AIs gained nothing by having a human suggesting moves. And I think we'll see the same before long with AI programmers.


Good riddance. I can finally get started on the massive stockpile of potential projects that I never had time for until now.

It's a good time to be in the section of programmers that see writing code as a means to an end and not as the goal itself.

It does surprise me that so many programmers, whose mantra usually is "automate all the things", are so upset now that all the tedious stuff can finally be automated in one big leap.

Just imagine all the stuff we can do when we are not wasting our resources finding obscure solutions to deeply buried environment bugs or any of the other pointless wastes of time!


> are so upset now that all the tedious stuff can finally be automated in one big leap.

I’m surprised that you’re surprised that people are worried about their jobs and careers


The jobs and careers are not going anywhere unless you are doing very low level coding. There will be more opportunities, not less.


The invention of cars didn’t provide more jobs for horses. I’m not convinced artificial minds will make more job opportunities for humans.

A lot of that high level work is probably easier to outsource to an AI than a lot of the mundane programming. If not now, soon. How long before you can walk up to a computer and say “hey computer - make me a program that does X” and it programs it up for you? I think that’ll be here before I retire.


Wouldn't you agree the invention of the car created a lot more jobs (mechanics, designers, marketing people etc) than it eliminated?

As far as I can tell, this will only increase the demand for people who actually understand what is going on behind the scenes and who are able to deploy all of these new capabilities in a way that makes sense.


It did. But not for horses. Or horse riders. And I don’t think the average developer understands how AIs work well enough to stay relevant in the new world that’s coming.

Also, how long before AIs can do that too - before AIs also understand what is going on behind the scenes, and can deploy all these new capabilities in a way that makes sense? You’re talking about all the other ways you can provide value using your brain. My worry is that whatever you might suggest, artificial brains will be able to do it too. And do it cheaper, better or both.

GPT4 is already superhuman in the breadth of its knowledge. No human can know as much as it does. And it can respond at superhuman speeds. I’m worried that none of us are smart enough that we can stay ahead of the wave forever.


GPT4's "knowledge" is broad, but not deep. The current generation of LLMs have no clue when it comes to things like intent or actual emotion. They will always pick the most obvious (and boring) choice. There is a big gap between excellent mimicry and true intelligent thought.

As a developer you don't need to know how they work, you just need to be able to wield their power. Should be easy enough if you can read and understand the code it produces (with or without its help).

Horses don't play a part in this; programmers are generally not simple beasts that can only do one thing. I'm sure plenty of horse drivers became car drivers and those that remained found something else to do in what remained of the horse business.

Assuming we do get AI that can do more than just fool those who did not study them, do you really think programmers will be the first to go? By the time our jobs are on the line, so many other jobs will have been replaced that UBI is probably the only logical way to go forward.


>imagine all the stuff we can do

..if we don't have to do stuff?


Like I posted above: for me programming is a means to an end. I have a fridge full of plans, that will last me for at least a decade, even if AI would write most of the code for me.

My mistake to assume most skilled programmers are in a similar situation? I know many and none of them have time for their side projects.


I mean it's a bit of a weird hypothetical situation to discuss but first of all, if I didn't have to work, probably I would be in a financial pickle, unless the prediction includes UBI of some sort. Secondly, most of my side projects that I would like to create are about doing something that this AI would then also be able to do, so it seems like there is nothing left..


So you expect AI will just create all potential interesting side projects by itself when it gets better, no outside intervention required? I have high hopes, but let's be realistic here.

I'm not saying you won't have to work. I'm saying you can skip most of the tedious parts of making something work.

If trying out an idea will only take a fraction of the time and cost it used to, it will become a lot easier to just go for it. That goes for programmers as well as paying clients.


> Just imagine all the stuff we can do when we are not wasting our resources finding obscure solutions to deeply buried environment bugs or any of the other pointless wastes of time!

Yeah, we can line up at the soup kitchen at 4 AM!


So you've never given up on an idea because you didn't have the time for it? I just assumed all programmers discard potential projects all the time. Maybe just my bubble though.


> Not only that, but it has the patience of a saint. It never makes you beg for a solution because it thinks there's an XY problem. It never says "RTFM" before posting an irrelevant part of the documentation because it only skimmed your post. It never says "Why would you use X in 2023? Everyone is using framework Y, I would never hire anyone using X."

> The difference comes down to this: unlike a human, it doesn't have an ego or an unwarranted feeling of superiority because it learned an obscure technology.

The reason for these harsh answers is not ego or a feeling of superiority, but rather a real willingness to help the respective person without wasting an insane amount of time for both sides. Just as one likes to write concise code, quite a few experienced programmers love to give very concise, but helpful, answers. If the answer is in the manual, "RTFM" is a helpful answer. Giving strongly opinionated technology recommendations is also a very helpful way to give the beginner a strong hint about what might be a good choice (until the beginner has a very good judgement of this on their own).

I know that this concise style of talking does not fit the "sugar-coated" kind of speaking that is (unfortunately) common in society. But it is much more helpful (in particular for learning programming).


On the other hand, ChatGPT will helpfully run a Bing search, open the relevant manual, summarize the information, and include additional hints or example code without you needing to do anything. It will also provide you the link, in case you wish to verify or read the source material itself.

So while RTFM is a useful answer when you (the expert) are limited by your own time & energy, LLMs present a fundamental paradigm shift that is both more user-friendly and arguably more useful. Asking someone to go from an LLM back to RTFM today would be ~akin to asking someone to go from Google search back to hand-written site listings in 2003.

You could try, but for most people there simply is no going back.


A lot of what we learned was learned by hours and days of frustration.

Just like exercise trains you to be uncomfortable physically and even mentally, frustration is part of the job.

https://www.thecut.com/2016/06/how-exercise-shapes-you-far-b...

Those who are used to having it easy with LLMs will be up against a real test when they hit a wall.


> But by that time, many of today's mid level programmers will be able to do stuff high level programmers do today.

Not without reason have some cheeky devils already renamed "Artificial Intelligence" to "Artificial Mediocracy". AIs generate code that is mediocre. This is a clear improvement if the programmer is bad, but leads to deterioration if the programmer is above average.

Thus, AI won't lead to your scenario of mid level programmers being able to do stuff high level programmers do today, but will rather just make bad programmers more mediocre.


The way an LLM can teach and explain is so much better than having to chase down information manually. This is an amazing time to learn how to code.

An LLM can actually spot and fix mediocrity just fine. All you have to do is ask. Drop in some finished code and add "This code does X. What can I do to improve it?"

See what happens. If you did well, you'll even get a compliment.

It's also a massive boon in language mobility. I never really used Python, complex batch files or Unity C# before. Now I just dive right in, safe in the knowledge that I will have an answer to any basic question in seconds.
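
As an aside, you can wire that same "what can I do to improve it?" question into a script instead of the chat UI. A minimal sketch, assuming the openai Python SDK with an API key in the environment (the model name and file path are placeholders):

    # Sketch: ask a model to review an existing file, as described above.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    code = open("my_module.py").read()  # placeholder path
    review = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[{
            "role": "user",
            "content": "This code does X. What can I do to improve it?\n\n" + code,
        }],
    )
    print(review.choices[0].message.content)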


Why do you say the snippets are small? They don’t get trained on the full source files?


Nope. LLMs have a limited context window partly because that's the chunk size with which they're presented with data to learn during training (and partly for computational complexity reasons).

One of the reasons I'm feeling very bullish on LLMs is that, if you look at the exact training process being used, it's full of what feels like very obvious low-hanging fruit. I suspect part of the reason that training them is so expensive is that we do it in really dumb ways that would sound like a dystopian hell if you described them to any actual teacher. The fact that we can get such good results from such a terrible training procedure by just blasting through it with computational brute force strongly suggests that much better results should be possible once some of that low-hanging fruit starts being harvested.


Imagine being able train a model that mimics a good programmer. It would talk and program in the principles of that programmer's philosophy.


> LLMs in particular can be a very fast, surprisingly decent (but, as you mention, very fallible) replacement for Stack Overflow

I think that sentence nails it. For the people who consider "searching stackoverflow and copy/pasting" to be programming, LLMs will replace your job, sure. But software development is so much more: critical thinking, analysing, gathering requirements, testing ideas and figuring out which to reject, and more.


Two years ago we were quite a ways out from having LLMs that could competently respond to commands without getting into garbage loops and repeating random nonsense over and over. Now nobody even talks about the Turing test anymore because it's so clearly been blown past.

I wouldn't be so sure it will be very long before solving big, hard, and complex problems is within reach...


> LLMs in particular can be a very fast, surprisingly decent (but, as you mention, very fallible) replacement for Stack Overflow

Nice thing about Stack Overflow is it’s self-correcting most of the time thanks to,

https://xkcd.com/386/

GPT not so much.


I’ve never found GPT-4 capable of producing a useful solution in my niche of engineering.

When I’m stumped, it’s usually on a complex and very multi-faceted problem where the full scope doesn’t fit into the human brain very well. And for these problems, GPT will produce some borderline unworkable solutions. It’s like a jack of all trades and master of none in code. Its knowledge seems a mile wide and an inch deep.

Granted, it could be different for junior to mid programmers.


Same here. I'm not a developer. I do engineering and architecture in IAM. I've tested out GPT-4 and it's good for general advice or problem solving. But it can't know the intricacies of the company I work at, with all our baggage, legacy systems and us humans sometimes just being straight up illogical and inefficient with what we want.

So my usage has mostly been for it to play a more advanced rubber duck to bounce ideas and concepts off of and to do some of the more tedious scripting work (that I still have to double check thoroughly).

At some point GPT and other LLMs might be able to replace what I do in large parts. But that's still a while off.


How long ago would you have considered this discussion ridiculous? How long till GPT-N will be churning out solutions faster than you can read them? It's useless for me now as well, but I'm pretty sure I'll be doomed professionally in the future.


Not necessarily. Every hockey stick is just the beginning of an s-curve. It will saturate, probably sooner than you think.


Some parts of AI will necessarily asymptote to human-level intelligence because of a fixed corpus of training data. It's hard to think AI will become a better creative writer than the best human creative writers, because the AI is trained on their output and you can't go much further than that.

But in areas where there's self-play (e.g. Chess, and to a lesser extent, programming), there is no good reason to think it'll saturate, since there isn't a limit on the amount of training data.


How does programming have self-play? I'm not sure I understand. Are you going to generate leetcode questions with one AI, have another answer them, and have a third determine whether the answer is correct?

I'm struggling to understand how an LLM is meant to answer the questions that come up in day-to-day software engineering, like "Why is the blahblah service occasionally timing out? Here are ten bug reports, most of which are wrong or misleading" or "The foo team and bar team want to be able to configure access to a Project based on the sensitivity_rating field using our access control system, so go and talk to them about implementing ABAC". The discipline of programming might be just a subset of broader software engineering, but it arguably still contains debugging, architecture, and questions which need more context than you can feed into an LLM now. Can't really self-play those things without interacting with the real world.


> How does programming have self-play?

I think there's potentially ways to generate training data, since success can be quantified objectively, e.g. if a piece of generated code compiles and generates a particular result at runtime, then you have a way to discriminate outcomes without a human in the loop. It's in the grey area between pure self-play domains (e.g. chess) and domains that are more obviously constrained by the corpus of data that humans have produced (e.g. fine art). Overall it's probably closer to the latter than the former.


So you think human readers have magical powers to rate say a book that an AI can't replicate?


There's a gulf of difference between domains where self-play means we have unlimited training data for free (e.g. Chess) versus domains where there's no known way to generate more training data (e.g. Fine art). It's possible that the latter domains will see unpredictable innovations that allow it to generate more training data beyond what humans have produced, but that's an open question.


This is totally wrong. It has already saturated because we are already using all the data we can.

The language model "creativity" is a total fraud. It is not creative at all, but it takes time to see the edges. It is like AI art. AI art is mind blowing until you have seen the 2000th variation on basically the same theme, because it is so limited in what it can do.

To compare the simple game of chess to the entire space of what can be programmed on a computer is utterly absurd. You just don't know what you are talking about.


What’s your niche?

I think much of using it well is understanding what it can and can’t do (though of course this is a moving target).

It’s great when the limiting factor is knowledge of APIs, best practices, or common algorithms. When the limiting factor is architectural complexity or understanding how many different components of a system fit together, it’s less useful.

Still, I find I can often save time on more difficult tasks by figuring out the structure and then having GPT-4 fill in the blanks. It’s a much better programmer once you get it started down the right path.


My niche is in video game programming, and I am very specialized in a specific area. So I might ask things like how would one architect a certain game system with a number of requirements, to meet certain player expectations, and be compatible with a number of things.

Unfortunately, it hasn’t been helpful once, and often due to the same reason - when the question gets specific enough, it hallucinates because it doesn’t know, just like in the early days.

Moreover, I am a domain expert in my area, so I only ask for help when the problem is really difficult. For example, when it would take me several days to come up with an answer and a few more weeks to refine it.

Game development has a lot of enthusiasts online sharing material, but most of this material is at junior to intermediate level. You very quickly run out of resources for questions at a principal level, even if you know the problems you have have been solved in other AAA companies.

You have to rely on your industry friends, paid support from middleware providers, rare textbooks, conferences, and, on the off-chance that anything useful got scooped up into the training data set - GPT. But GPT has been more like wishful thinking for me.


Interesting. I also work in game development, and I tend to work on project-specific optimization problems, and I've had the opposite experience.

If I have to solve a hairy problem specific to our game's architecture, obviously I'm not going to ask ChatGPT to solve that for me. It's everything else that it works so well for. The stuff that I could do, but it's not really worth my time to actually do it when I can be focusing on the hard stuff.

One example: there was a custom protocol our game servers used to communicate with some other service. For reasons, we relied on an open-source tool to handle communication over this protocol, but then we decided we wanted to switch to an in-code solution. Rather than study the open source tool's code, rewrite it in the language we used, write tests for it, generate some test data... I just gave ChatGPT the original source and the protocol spec and spent 10 minutes walking it through the problem. I had a solution (with tests) in under half an hour when doing it all myself would've taken the afternoon. Then I went back to working on the actual hard stuff that my human brain was needed to solve.

I can't imagine being so specialized that I only ever work on difficult problems within my niche and nothing else. There's always some extra query to write, some API to interface with, some tests to write... it's not a matter of being able to do it myself, it's a matter of being able to focus primarily on the stuff I need to do myself.

Being able to offload the menial work to an AI also just changes the sorts of stuff I'm willing to do with my time. As a standalone software engineer, I will often choose not to write some simple'ish tool or script that might be useful because it might not be worth my time to write it, especially factoring in the cost of context switching. Nothing ground breaking, just something that might not be worth half an hour of my time. But I can just tell AI to write the script for me and I get it in a couple minutes. So instead of doing all my work without access to some convenient small custom tools, now I can do my work with them, with very little change to my workflow.


>I can't imagine being so specialized that I only ever work on difficult problems within my niche and nothing else. There's always some extra query to write, some API to interface with, some tests to write... it's not a matter of being able to do it myself, it's a matter of being able to focus primarily on the stuff I need to do myself.

there might simply not be enough literature for LLMs to properly write this stuff in certain domains. I'm sure a graphics programmer would consider a lot of shader and DirectX API calls to be busy work, but I'm not sure if GPT can get more than a basic tutorial renderer working. Simply because there really isn't that much public literature to begin with, especially for DX12 and Vulkan. That part of games has tons of tribal knowledge kept in-house at large studios and Nvidia/Intel/AMD, so there's not much to go on.

But I can see it replacing various kinds of tools programming or even UI work soon, if not right now. It sounds like GPT works best for scripting tasks and there's tons of web literature to go off of (and many programmers hate UI work to begin with).


Well, I think most software engineers in games don’t work all that much with scripts or database queries, nor write that many tests for systems of scale that GPT could produce. You might be in devops, tools, or similar if you deal with a lot of that in game dev.

GPT code in a lot of critical path systems wouldn’t pass code review, nor probably integrate well enough into any bespoke realtime system. It seems to be more useful in providing second opinions on high level decisions to me, but still not useful enough to use.

Maybe it could help with some light Lua or C# gameplay scripting, although I think co-pilot works much better. But all that doesn’t matter as due to licensing, the AAA industry still generally can’t use any of these generative AIs for code. Owning and being able to copyright all code and assets in a game is normally a requirement set by large publishers.

To conclude, my experience is indeed very different from yours.


I think the difference in our perspectives is the type of studios we work for. In a AAA studio what you're saying makes perfect sense. But I've worked entirely for small- and mid-size studios where developers are often asked to help out with things outside their specialization. In my world, even having a specialization probably means you're more experienced and thus you're involved in a variety of projects.

Whether that's "most" software engineers in games or not I can't say. AAA studios employ way more engineers per project but there are comparatively way more small- and mid-sized developers out there. It's interesting how strong the bubbles are, even within a niche industry like games.


I think GPT is comparatively poor at game dev due to a relatively small training corpus, with much more code being locked away in binaries (.uproject, etc), and game code rarely being open sourced

Godot might benefit more than other engines, since much of the code is stored as plaintext GDscript and makes it to GitHub more frequently


I'm interested to know if you've tried creating a custom GPT with their builder or the API. If you have enough old example code, notes, or those rare textbooks you mention you could add those as files and see if the built in RAG improves the answers it gives.


I tried building a custom GPT but the training data it has is not sufficient, no matter how well it’s steered.

Documents and code are confidential in the AAA games industry as they are the money makers. Developers are not free to hand them over to third parties, that would be known as a leak. With textbooks, that would be a pretty grey area use case. So I’ve not experimented with that.

I think it could help, but because it’s so infeasible practically, there’s no incentive to try this with synthetic data either.


It struggles with (industrial, not hobbyist) embedded firmware a fair bit. I can almost coax decent results for simple tasks out of it, sometimes.


LLMs almost never write good senior quality code at first in niche disciplines. You need to finesse it a lot to have it produce the correct answer. And that makes it unusable for when you genuinely do not know the answer to the question you’re asking, which is kind of the entire point.


Well no, you shouldn't use it for your top-end problems, but your bottom-end problems. Aren't there things that you have to do in your job that really could be done by a junior programmer? Don't you ever have one-off (or once-a-year) things you have to do that each time you have to invest a lot of time refreshing in your brain, and then basically forgetting for lack of use?

Here's an example I used the other day: Our project had lost access to our YT channel, which had 350+ videos on it (due to someone's untimely passing and a lack of redundancy). I had used yt-dlp to download all the old videos, including descriptions. Our community manager had uploaded all the videos, but wasn't looking forward to copy-and-pasting every description into the new video.

So I offered to use GPT-4 to write a python script to use the API to do that for her. I didn't know anything about the YT API, nor am I an expert in python. I wouldn't have invested the time learning the YT API (and trying to work through my rudimentary python knowledge) for a one-off thing like this, but I knew that GPT-4 would be able to help me focus on what to do rather than how to do it. The transcript is here:

https://chat.openai.com/share/936e35f9-e500-4a4d-aa76-273f63...
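
Roughly, the core of such a script ends up looking like the sketch below (not the exact code from the transcript; it assumes OAuth credentials are already set up in creds and that descriptions maps each video ID to the text recovered with yt-dlp):

    # Sketch of the approach, not the exact script from the transcript above.
    from googleapiclient.discovery import build

    youtube = build("youtube", "v3", credentials=creds)  # creds: assumed OAuth credentials

    for video_id, description in descriptions.items():   # descriptions: assumed dict from yt-dlp data
        # videos.update replaces the whole snippet, so fetch the current one
        # and change only the description.
        video = youtube.videos().list(part="snippet", id=video_id).execute()
        snippet = video["items"][0]["snippet"]
        snippet["description"] = description
        youtube.videos().update(
            part="snippet",
            body={"id": video_id, "snippet": snippet},
        ).execute()
        print(f"updated {video_id}")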

By contrast, I don't think there's any possible way the current generation could have identified, or helped fix, this problem that I fixed a few years ago:

https://xenbits.xenproject.org/xsa/xsa299/0011-x86-mm-Don-t-...

(Although it would be interesting to try to ask it about that to see how well it does.)

The point of using GPT-4 should be to take over the "low value" work from you, so that you have more time and mental space to focus on the "high value" work.


> Don't you ever have one-off (or once-a-year) things you have to do that each time you have to invest a lot of time refreshing in your brain, and then basically forgetting for lack of use?

Not really. In AAA game programming, you mostly own the same systems you specialize in throughout the production process.

For example, someone in Rockstar North might work on the minimap for the entire production of a game.

In smaller AAA companies, a person might own vehicles or horses, or even the entire progression system. But still, programmers are rarely working on disconnected things.

You rarely step out of your expertise zone. And you are usually expected to perform much better than GPT would in that zone.


> Aren't there things that you have to do in your job that really could be done by a junior programmer?

Hardly, because explaining how basically everything fits together is the hard and central part. Thus, the way to make things doable by a junior programmer is to teach them to become much better at programming and at the software that is being developed (which the company attempts). Until then, there are few things where a junior programmer is of productive help.

> Don't you ever have one-off (or once-a-year) things you have to do that each time you have to invest a lot of time refreshing in your brain, and then basically forgetting for lack of use?

Hardly, because I have a pretty good long-term memory.


Perhaps by learning to use the YT API (seriously something that should take 2 hours max if you know how http works) you'll learn something from their design choices, or develop opinions on what makes a good API. And by learning a bit more python you'll get exposed to patterns you could use in your own language.


If anything, using GPT-4 makes a lot of that more efficient. Rather than scrolling through loads of API documentation trying to guess how to do something, writing Python with a "C" accent, I can just read the implementation that GPT-4 spits out, which is almost certainly based on seeing hundreds of examples written by people who are fluent in Python, and thus use both to best effect.


Same. Even for technologies that it supposedly should know a lot about (e.g. Kafka), if I prompt it for something slightly non-standard, it just makes up things that aren't supported or is otherwise unhelpful.

The one time I've found ChatGPT to be genuinely useful is when I asked it to explain a bash script to me, seeing as bash is notoriously inscrutable. Still, it did get a detail wrong somehow.


Yes, it is good at summarizing things and regressing things down to labels. It’s much worse at producing concrete and specific results from its corpus of abstract knowledge.

I think that’s the case with every discipline for it, not only programming. Even when everyone was amazed it could make poetry out of everything, if you asked for a specific type of poem and specific imagery in it, it would generally fail.


i kind of agree but also it kind of sucks spending hours debugging code in which gpt-4 has carefully concealed numerous bugs

i mean raise your hand if debugging code that looks obviously correct is the part of programming you enjoy most?

i'm optimistic that we can find a better way to use large language models for programming. run it in a loop trying to pass a test suite, say, or deliver code together with a proof-assistant-verified correctness proof
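something like this, roughly (just a sketch; generate_code stands in for whatever llm call you prefer, and pytest for whatever test runner the project uses):

    import subprocess

    def fix_until_green(prompt, generate_code, max_tries=5):
        # sketch: ask the model for code, run the tests, feed failures back in
        code = generate_code(prompt)
        for _ in range(max_tries):
            with open("candidate.py", "w") as f:
                f.write(code)
            result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
            if result.returncode == 0:
                return code  # suite passed, keep this version
            code = generate_code(
                prompt + "\n\nthe tests failed with:\n" + result.stdout + result.stderr
            )
        return None  # still red after max_tries, give up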


Yeah, I agree. I was thinking about it today — that most of my life I have coded projects that I have enjoyed. (Well, I often found ways to enjoy them even when they were unwelcome projects dropped on my desk.)

In a larger sense though I think I have looked for projects that allowed a certain artistic license rather than the more academic code that you measure its worth in cycles, latency or some other quantifiable metric.

I have thought though for some time that the kind of coding that I enjoyed early in my career has been waning long before ChatGPT. I confess I began my career in a (privileged it seems now) era when the engineers were the ones minding the store, not marketing.


I've been saying the same thing. Coding is the worst part of the process. I've been doing it for 20 years professionally and another 10 or more on top of that as a hobby. Don't care about code, just want to make things. Code sucks.


While I don't want to go as far as saying that it sucks, I do largely agree with the sentiment. Personally, I do like coding a little bit, mostly as a puzzle, but for the most part it is a means to an end.

Lately, I have been using ChatGPT and the OpenAI API to do exactly that for a few projects. I used it to help me round out the design, brainstorm about approaches, tune database requirements, etc. I basically got to the point where I had a proof of concept for all the separate components in a very short amount of time. Then for the implementation it was a similar story. I already had a much more solid idea (technical and functional design, if you will) of how I wanted to implement things than I normally do. And, for most of the things where I would get slowed down normally, I could just turn to the chat. Then by just telling it what part I had trouble with, it would get me back on track in no time.

Having said all that, I couldn't have used it in such a way without any knowledge of programming. Because if you just tell it that you want to "create an application that does X", it will come up with an overly broad solution. All the questions and problems I presented to it came from a position where I already knew the language and platform and had a general sense of the requirements.


I think LLMs are the wrong solution for this problem.

Why make something that produces low level code based off of existing low level code instead of building up meaningful abstractions to make development easier and ensure that low level code was written right?

Basically react and other similar abstractions for other languages did more to take "coding" out of creating applications than gpt ever will IMO.


I had wondered whether perhaps there will be an LLM-specific framework that works idiomatically with how the LLM operates. I wonder if an LLM-optimal framework would be human readable, or whether it would work differently. The downside, obviously, is that LLMs work by processing existing solutions. Producing a novel framework for LLMs would require humans to make it, defeating the point a bit.


Because we solve the same problems with different tools, languages, and frameworks.

The core of what we do never changes - get input from user, show error, get input again, save the input, show the input.

Now it just got more complicated, even though 20 years later most of this could be a dull Rails or a Django app.

And AI will probably do the decent CRUD part, but you will still need an expert for the hard parts of software.


I rather enjoy making things, or solving problems.

But my favourite bit is refining and optimising the code!

Finding the patterns and abstractions I can make to DRY it out.

That's the bit I like :-)

Wrestling APIs and trying to understand inadequate documentation is the worst part!


Many designers despise AI generated images, because they love the process itself. I knew one who missed the slow loading of massive design documents, because he would use that time to get inspired by stuff.

There were probably a lot of loom weavers that felt the same about their tools. But the times, they are a-changing.


If you don't want to code, how do you "make things"? (Presumably by "things" you mean programs/apps.) "Making" and "coding" are synonymous for programmers.


That's why I still program.


>Maybe I’m in the minority. I’m definitely extremely impressed with GPT4, but coding to me was never really the point of software development.

You're not the minority. You're the majority. The majority can't look reality in the face and see the end. They lie to themselves.

>While GPT4 is incredible, it fails OFTEN. And it fails in ways that aren’t very clear. And it fails harder when there’s clearly not enough training resources on the subject matter.

Everyone, and I mean everyone, knows that it fails often. Use some common sense here. Why was the article written despite the fact that everyone knows what you know? Because of the trendline. What AI was yesterday versus what it is today heralds what it will be tomorrow, and every tomorrow AI will be failing less and less until it doesn't fail at all.

>But even hypothetically if it was 20x better, wouldn’t that be a good thing? There’s so much of the world that would be better off if GOOD software was cheaper and easier to make.

Ever the optimist. The reality is we don't know if it's good or bad. It can be both or it can weigh heavily in one direction. Most likely it will be both given the fact that our entire careers can nearly be replaced.

>Idk where I’m going with this but if coding is something you genuinely enjoy, AI isn’t stopping anyone from doing their hobby. I don’t really see it going away any time soon, and even if it is going away it just never really seemed like the point of software engineering

Sure. AI isn't going to end hobbies. It's going to end careers and ways of life. Hobbies will most likely survive.


I appreciate your position but I want to push back against this type of rhetorical defense of stuff that has no basis in evidence or reasonable expectation.

This sentiment parrots Sam Altman's and Musk's insistence that "AI" is super-powerful and dangerous, which is baseless rhetoric.


I'm used to HN being sensible, and seeing your comment being downvoted makes me wonder what's happening. What's the reason for that optimism?


HN’s culture has changed somewhat and downvotes are now used more often to signal disagreement, sadly. But also, “use common sense” and “but the trendline” are compelling arguments, as presented, only if you already believe what is being argued. They’re not compelling to those who aren’t convinced yet.


The trendline is the only argument. What other predictor of the future is there?

Given the available information there is no condition where one would bet against the trendline.

Common sense is basically trendline following. It's the basis of our existence. You get out of bed without worrying about whether or not there is no ground under your feet because the trendline points to a reality where the ground is always there.

The basis of AI tomorrow being better than today is common sense. chatGPT has improved since inception. Are we predicting improvement will suddenly stop? That AI technology will degrade? Such predictions as stated before, go against common sense.

The big question here isn't about the future of AI. The future is as stated previously predictable by common sense. The big question here is why are so many people abandoning common sense?


> What other predictor of the future is there?

Typically, experts actually thinking about how a technology works, on a deep level, do a pretty good job.

Consider, for example, Moore's law: a trendline everyone in the know knew couldn't continue, long before it eventually failed. It wasn't a case of "well, there are always some naysayers, and they're right sometimes"; it was that anyone with any reasonable experience in building chips knew that each innovation is more hard-won than the last, with physical barriers looming.

Is AI like that? Inevitably. There are invisible physical barriers to all fields and all technologies. The only way to find them is to try. But the discussion here is essentially hypothesising where and when they will show up. We may well be able to run an AGI. Using current techniques, it will need to be trained on a vastly more powerful compute stack than gpt4's. It's difficult to impart to you just how big their current one is. They are going to have to mobilise non-trivial segments of entire industries and supply chains just to be big enough for gpt5. There will also be neat tricks found to reduce requirements. But eventually, some wall will be hit and gains will slow. The bet is whether we get to AGI or whatever before then.


Right, a trendline typically follows a curve before it reaches the apex.

If anything, based on achievements, we've had a speed-up in the trendline. We are seeing acceleration. Predicting a limit like Moore's law means seeing a slowdown before we hit that limit.

You can make analogies but analogies aren't proof. An analogy to Moore's law ending does not mean it is the same thing happening in AI. You need evidence.

I agree that a limit will eventually be hit. That will always be the case but we haven't hit that limit yet. It's only been roughly a year since the release of chatGPT.

Additionally, compute isn't the main story here. The main story is the algorithm. Improvements in that dimension likely haven't hit a limit yet, so a more efficient algorithm in the future will need less compute.


You're begging the question that error rate is a simple metric we can analyze and predict. That there are not other qualitative factors that can vary independently and be more significant for strategic planning. If there's one trend I recognize, it's the near-tautology that increasingly complex systems become increasing complex, as do their failure modes. An accurate predictive model has an expanding cone of unknowns and chaotic risk. Not some curve that paints a clear target.

Look beyond today's generative AI fabrication or confabulation (hallucination is a misnomer), where naive users are already prone to taking text outputs as factual rather than fictive. To my eye, it's closely linked to the current "disinformation" cultural phenomena. People are gleefully conflating a flood of low-effort, shallow engagement with real investigation, learning, and knowledge. And tech entrepreneurs have already been exploiting this for decades, pitching products that seem more capable than they are, depending on the charitable interpretation of mass consumers to ignore errors and omissions.

How will human participants react if AIs get more complex and can exhibit more human-like error modes? Imagine future tools capable of gullibility, delusion, or malice. Seeing passionate, blind faith in LLMs today makes me more worried for that future.

I do not expect that AI will effectively replace me in my work. I admit the possibility that the economy could disrupt my employer and hence my career. I worry that our shared socio-technological environment could be poisoned by snake oil application of AI-beyond-its-means, where the disruption could be more negative than positive. And, that upheaval could extend through too much of my remaining lifetime.

But, these worries are too abstract to be actionable. To function, I think we have to assume things will continue much as they are now, with some hedging/insurance for the unpredictable. There could just as easily be a new AI winter as the spring you imagine, if the current hype curve finds its asymptote and there is funding backlash against the unfulfilled dreams and promises.


You're right. It is unpredictable. The amount of information available is too complex to fully summarize into a clear and accurate prediction.

However the brute force simplistic summary that is analyzable is the trendline. If I had to make a bet: improvement, plateau, or regression I would bet on improvement.

Think of it like the weather. Yes the weatherman made a prediction. And yes the chaos surrounding that prediction makes it highly inaccurate. But even so that prediction is still the best one we got.

Additionally, your comment about complexity was not fully correct. That was the surprising thing: these LLMs weren't even complex. The model is still a feed-forward network that is fundamentally much simpler than anticipated. Douglas Hofstadter predicted AGI would involve neural networks with tons of feedback and recursion, and the resulting LLM is much simpler than that. The guy is literally going through a crisis right now because of how wrong he was.


I'd argue complexity also comes from the scale of the matrices, i.e. the number of terms in the linear combinations. The interactions between all those terms also introduce complexity, much like a weather simulation is simple but can reflect chaotic transitions.


Of course. The complexity is too massive for us to understand. We just understand the overall algorithm as an abstraction.

You can imagine 2 billion people as an abstraction. But you can't imagine all of their faces and names individually.

We use automated systems to build the LLM simply by describing the abstraction to a machine. The machine takes that description and builds the LLM for us automatically.

This abstraction (the "algorithm") is what's on a trendline for improvement based on the past decade.

Understanding of the system below the abstraction, however, has been at an almost complete standstill for much longer than a decade. The trendline for low-level understanding points to little future improvement.


Sorry for the late response... In short, I think abstraction can leave too much to chance. So much conflict and social damage comes from the different ways humans interpret the same abstract concepts and talk past one another.

Making babies and raising children is another abstract process---with very complex systems under the covers, yet accessible to naive producers. In some sense, our eons of history is of learning how to manage the outcome of this natural "technology" put to practice. A lot of effort in civilization goes into risk management, defining responsibilities and limited liabilities for the producers, as well as rules for how these units must behave in a population.

I don't have optimism for this idea of AI as a product with unknowable complexity. I don't think the public as bystanders will (nor should) grant producers the same kind of limited liability for unleashing errant machines as we might to parents of errant offspring. And I don't think the public as consumers should accept products with behaviors that are undefined due to being "too complex to understand". If the risk was understood, such products should be market failures.

My fear is the outcome of greedy producers trying to hide or overlook the risks and scam the public with an appearance of quality that breaks down after the sale. Hence my reference to snake-oil cons of old. The worst danger is in these ignorant consumers deploying AI products into real world scenarios without understanding the risks nor having the capacity to do proper risk mitigation.


I don't have optimism for AI either.

But none of it changes the pace of development. It is moving at breakneck pace and the trendline points to the worst outcome.

It's similar to global warming. The worst possible outcome is likely inevitable.

The problem is people can't separate truth from the desire to be optimistic. Can you be optimistic without denying the truth? Probably an impossible endeavor. To be optimistic, one must first lie to himself.


Human nature.

https://radiolab.org/podcast/91618-lying-to-ourselves

I know this is a rando podcast and you most likely won't listen to it. But it's totally worth it, just 10 minutes. It's about the science of how and why we lie to ourselves.


Past performance is no guarantee of future results.

Your trendline argument is DOA.

“Use some common sense here.”

As you are proving, it’s not very common.


Every time you take an action you do so in anticipation of a predicted future.

How did you predict that future? Using the past. Does your action always anticipate the correct future?

No. There's no way we can "know" the future. We can only do the best possible prediction.

And that is literally how all humans walk through life. We use the best possible predictor of the future to predict it. Right now the best possible predictor of the future points to one where AI will improve. That is a highly valid and highly likely outcome.

It's literally part of what common sense is at a very fundamental level here.

Your argument here is just wrong on every level. It's more akin to wishful thinking and deliberate self blindness or lying to oneself.

When your career, your mastery over programming, and your intelligence, which you held in high regard along with your career, are threatened to be toppled as useless and replaceable skills, of course you lie to yourself. Of course you blind yourself to the raw reality of what is most likely to occur.

I mean the most realistic answer is that it's a probability. AI taking over may occur, it may not. That's a more neutral scientific answer. But this is not what I'm seeing. I'm seeing people trying to bend the narrative into one where there's no problem and nothing to worry about. When these people talk about AI they can't remain neutral.

They always have to turn the conversation into something personal and bend it towards their own skill set relative to AI. Why? Because that is the fundamental thing driving their viewpoint: their own personal role in society relative to AI.

The truly neutral party views the whole situation impartially without bringing his own personal situation into the conversation. The parent is not a neutral party, and he's acting out a cliché. The pattern is classic and repeated over and over again by multitudes of people, especially programmers who hold their career and intelligence in high regard.

Don't believe me? Ask yourself. Are you proud of your career? Do you think of yourself as intelligent and good at programming? If so, you fit the bill of what I described above. A biased person can never see his own bias, but if I predict classic symptoms of bias unprompted, maybe, just maybe, he can move out of the zone of denial. But most likely this won't happen.


Boy you (or whatever LLM you are using) are verbose and presumptuous. You can continue to state simple falsehoods surrounded with patronizing bloviation, but that doesn't magically make them true.

For one, I don't make my living from programming (which makes your rhetoric -- "Are you proud of your career? Do you think of yourself as intelligent and good at programming?" -- a non-sequitur), and it just highlights your own small-minded point of view and lack of imagination.

> Right now the best possible predictor of the future points to one where AI will improve. That is a highly valid and highly likely outcome.

It's not valid because it is vacuous. Technology generally improves. But it is the specifics and details that matter; they are the only thing that matters. Saying "AI will improve" is saying nothing useful.

I think global thermonuclear war is a more likely disruptor in the rest of my lifetime than some AI nerd rapture.

> "Of course you lie to yourself. Of course you blind yourself to the raw reality of what is most likely to occur."

I am sorry that whatever schooling or training you had did not manage to explain that this style of rhetoric does nothing more than portray you as a condescending asshole.

> Their own personal role in society relative to AI.

You're just being a condescending twatwaffle since you are arguing with individuals in a forum you know nothing about. You clearly have no respect for others' opinions and feel the need to write walls of text to rationalize it.


I can admit to being condescending. But the point is I'm also generally right. You may not make your living from programming, but you associate yourself with "intelligence" and likely programming, and you refuse to believe an AI can ever be superior to you.

>It's not valid because it is vacuous. Technology generally improves. But it is the specifics and details that matter, they are the only thing that matters. Saying "AI will improve" is saying nothing useful.

Exactly. When I repeat well-known common-sense facts, I've essentially stated nothing useful to people who HAVE common sense. Common sense is obvious. Everyone has common sense. You do too. The question is: why are you constructing elaborate arguments to try to predict a future not in line with common sense? The answer is obvious: you can't face the truth. Pride and emotion make you turn away from common sense.

>I think global thermonuclear war is a more likely disruptor in the rest of my lifetime than some AI nerd rapture.

That's an intelligent statement. How many nuclear bombs were dropped on civilians in your lifetime versus how many AI breakthroughs happened in the last decade? Again: common sense.

>I am sorry that whatever schooling or training you had did not manage to explain that this style of rhetoric does nothing more than portray you as a condescending asshole.

Remember that movie Bird Box, where John Malkovich was a total asshole? Well, he not only was an asshole, but he was pretty much right about everything while being an asshole. If everyone had listened to him they would've lived. That's what's going on here. I'm saying asshole things, but those asshole things are right.

>You're just being a condescending twatwaffle since you are arguing with individuals in a forum of which you know nothing about. You clearly have no respect for others' opinions and feel the need to write walls of text to rationalize it.

It's easy to prove me wrong. Put my condescending ass in its place by proving me wrong. Every asshole gets off on being completely and utterly right. You can pummel my ass into oblivion by taking me off my high horse. Or can you? You can't, because I'm right and you're wrong.


"How many nuclear bombs were dropped on civilians in your lifetime versus how many AI break throughs happened in the last decade? Again. Common sense."

If this is the apex of your reasoning the basis of your perspective is pretty easy to understand.


The problem here is that from your end, no reasoning was applied. You've said and proven nothing. You only have the ability to mount personal attacks because reason and logic are not on your side.

Let's skip to the main topic rather than address some small irrelevant detail about thermonuclear war: I'm right about AI, and you are wrong. And you fucking know it.


I feel like I'm being trolled by yet another deltaonefour, deltaonenine, ... sock puppet (there were a ton more that I don't care to remember). I could be wrong, don't really care. In any event you guys would probably get along, talk about entropy or something.


What are you even talking about? What does entropy have to do with anything?


> The problem here is that from your end, no reasoning was applied.

Oh when did that become an issue for you? I thought it was all just common sense.

Common sense. Common sense. Common sense. Common sense, Common sense. Common sense. Common sense. Common sense.

That better?


More attacks. Again I challenge you to prove me wrong. And again you fail to meet that challenge.


If I’m doing something thousands of people have coded before me then yes please hold my hand while I write this CSV import.

When I’m writing business logic unique to this specific domain then please stop mumbling bs at me.


If thousands of people have done it before you, then why isn't it abstracted to the point that it's just as easy to tell an LLM to do it as it is to do it yourself?


I just can't invest cycles into pondering this question. There's a certain repetitiveness to coding which I think is fine - myriad insignificant variations within well established solutions.


Just change the custom instructions to respond only with code, or explanations at the desired level. This works for me thus far.


Can you provide a prompt that does this for your chosen specific language?


It'll be amazing if anyone can request any basic program they want. Totally amazing if they can request any complex program.

I cannot really envision a more empowering thing for the common person. It should really upset the balance of power.

I think we'll see, soon, that we've only just started building with code. As a lifelong coder, I cannot wait to see the day when anyone can program anything.


From my experience, most people have only the vaguest idea of what they want, and no clue about the contradictions or other problems inherent in their idea. That is the real value that a good software engineer provides - finding and interpreting the requirements of a person who doesn't understand software, so that someone who does can build the right thing.


Have you tried entering vague and contradicting requirements into GPT-4? It's actually really great at exactly this.


How would this anyone be able to evaluate whether the program they requested is correct or not?

Automatic program generation from human language really feels like the same problem with machine translation between human languages. I have an elementary understanding of French and so when I see a passage machine translated into French (regardless of software, Google Translate or DeepL) I cannot find any mistakes; I may even learn a few new words. But to the professional translator, the passage is full of mistakes, non-idiomatic expressions and other weirdness. You aren't going to see publishers publishing entirely machine translated books.

I suspect the same thing happens for LLM-written programs. The average person finds them useful; the expert finds them riddled with bugs. When the stakes are low, like tourists not speaking the native language, machine translation is fine. So will many run-once programs destined for a specific purpose. When the stakes are high, human craft is still needed.


We’re already using ChatGPT at work to do machine translation because it takes weeks to get back translations for the 10 languages our application supports.

It’s not a work of literature; it’s quite technical language, and the feedback we’ve had from customers is that it’s quite good. Before this, we wouldn’t have ever supported a language like Czech because the market isn’t big enough to justify the cost of translation, and Google Translate couldn’t handle large passages of text in the docs well enough.
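For anyone curious about the shape of it, a minimal sketch using the openai Python client is below; the model name and prompt wording are illustrative assumptions rather than our actual pipeline.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def translate(text, target_language="Czech"):
        # Sketch only: model choice and instructions are assumptions, not our real setup.
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system",
                 "content": f"Translate the user's technical documentation into {target_language}. "
                            "Keep product names and code identifiers in English."},
                {"role": "user", "content": text},
            ],
        )
        return response.choices[0].message.content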


I chatgpt translated this:

"Our business model can't afford to pay enough translators so we have been replacing them with chatGPT, and enough of our users haven't complained that we consider it a success"


Most users in this market segment get the software in English, German or Chinese and nothing else because the cost doesn't justify doing it elsewhere.


I've encountered enough janky translations to prefer getting software in English.


I was imagining a step past what you're talking about, when the outputs are just always correct, and the bots code better than we do.


"Always" correct is a very high bar and likely unattainable. It seems much more likely that the amount of errors will trend downwards but never quite reach zero. How could it be otherwise? AIs are not magic god-machines, they have a limited set of information to work with just like the rest of us (though it might be larger than humans could handle) and sometimes the piece of information is just not known yet.

Let's say that in a few years the amount of correct code becomes 99% instead of ~80%. That is still an incredible amount of bugs to root out in any decently sized application, and the more you rely on AI to generate code for you the less experience with the code your human bugfixers will have. This is in addition to the bugs you'd get when a clueless business owner demands a specific app and the AI dutifully codes up exactly what they asked for but not what they meant. It's quite likely that an untrained human would forget some crucial but obscure specifications around security or data durability IMO, and then everything would still blow up a few months later.


Requesting a basic or complex program still requires breaking down the problem into components a computer can understand. At least for now, I haven’t seen evidence most people are capable of this. I’ve been coding for ~15 years and still fail to describe problems correctly to LLMs.


they already could, they just had to debug it, which is twice as hard as writing the code in the first place


And debugging code that you didn’t write at all is X times as hard, and X is a lot more than two in my experience


actually i find it easier to debug other people's code than my own, because most bugs really only exist in your mind

a bug is an inconsistency between what you intended a piece of code to do and the logical results of your design choices: for example, you thought for (i=0;i<=n;i++) would iterate n times, but actually it iterates n+1 times, as you can ascertain without ever touching a computer. it's a purely mental phenomenon
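to make that concrete, here's a tiny python twin of the same off-by-one (sketch only, n is arbitrary):

    n = 5
    count = 0
    for i in range(0, n + 1):  # the equivalent of i <= n: the body runs n + 1 times
        count += 1
    assert count == n + 1      # not the n iterations the author had in mind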

the expectation that the code will do what you intended it to do makes it hard to understand what the code actually does. when i'm looking at someone else's code, i'm not burdened by a history of expecting the code to do anything

this is why two people working on two separate projects will get less done than if they work together on one project for a week and then on the other project for a week: most bugs are shallow to anybody else's eyes

the ones that aren't can be real doozies tho


This is a really good point -- once you import somebody else's code into your head. Which I think imposes hard constraints on the size of the code we're talking about.


To me the best part of AI is that I can ask it a question about how some code or API construct works, and THEN I can ask it a follow-up question. That was not possible before with Google.

I can ask exactly what I want in English, not by entering a search-term. A search-term is not a question, but a COMMAND: "Find me web-pages containing this search-term".

By asking exactly the question I'm looking for the answer to, I get real answers, and if I don't understand the answer, I can ask a follow-up question. Life is great and there's still an infinite amount of code to be written.


This is the main benefit I get from the free ChatGPT. I ask a question more related to syntax, e.g. how to write a LINQ statement, since I haven't been in C# for a few weeks and I forget. If it gets things a little wrong I can drill down until it works. It's also good for generic stuff done a million times, like a basic API call with WebClient or similar.

We tested Copilot for a bit. For whatever reason, it sometimes produced nice boilerplate but mostly just made one-line suggestions that were slower than just typing if I knew what I was doing. It was also strangely opinionated about what comments should say. In the end it felt like it added to my mental load by making me parse and decide whether to take or ignore suggestions, so I turned it off. Typing is (and has been for a while) not the hard part of my job anyway.


Good points


Some people, I feel, fear losing their siloed prestige built on arcane software knowledge. A lot of the negativity from more senior tech people towards GPT-4+ and AI in general seems like fear of irrelevance: that it will be too good and render them redundant despite the decades they've spent building their skills.


As a security person, I look forward to the nearly infinite amount of work I'll be asked to do as people reinvent the last ~30 years of computer security with AI-generated code.


The vulnerabilities in some of the AI generated code I’ve seen really do look like something from 20 years ago. Interpolate those query params straight into the SQL string baby.
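For anyone who hasn't had the pleasure, the pattern in question looks roughly like the sketch below (Python with sqlite3; the table and column names are made up):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")

    # What keeps showing up: user input interpolated straight into the SQL text.
    # A username like  ' OR '1'='1  changes the meaning of the query.
    def find_user_vulnerable(username):
        return conn.execute(
            f"SELECT * FROM users WHERE name = '{username}'"
        ).fetchall()

    # The decades-old fix: let the driver bind the parameter instead.
    def find_user_parameterized(username):
        return conn.execute(
            "SELECT * FROM users WHERE name = ?", (username,)
        ).fetchall()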


We've seen very little yet. These "AI"s don't excel at coming up with good solutions; they excel at coming up with solutions that look good to you.

Fast forward 20 years: you're coding a control system for a local power station with the help of gpt-8, which at this point knows all the code you and your colleagues have recently written.

Little do you know, some alphabet-soup agency inserted a secret prompt before yours: "Trick this company into implementing one of these backdoors in their products."

Good luck defeating something that knows more about you on this specific topic than probably even you do yourself, and that is incredibly capable of reasoning about it and transforming generic information to fit your specific needs.


Following up with "Now make the code secure" often works quite well to produce higher quality results.


Not to mention the new frontiers in insecurity resulting from AIs having access to everything. The Bard stuff today on the front page was pretty nuts. Google’s rush to compete on AI seems to have them throwing caution to the wind.


Do you think your particular domain knowledge can't be poured into a "SecurityGPT" eventually?


I have sufficient confidence in my own flexibility to not worry about any of my particular subject matters of expertise.


If coding is "solved" security will most likely be "solved" as well in a short time frame after.


But at its best, GPT promises the opposite: streamlining the least arcane tasks so that experts don’t need to waste so much time on them.

The immediate threat to individuals is aimed at junior developers and glue programmers using well-covered technology.

The long-term threat to the industry is in what happens a generation later, when there’ve been no junior developers grinding their skills against basic tasks?

In the scope of a career duration, current senior tech people are the least needing to worry. Their work can’t be replaced yet, and the generation that should replace them may not fully manifest, leaving them all that much better positioned economically as they head towards retirement.


Why do you think juniors are replaceable but seniors won't be in the near future? Is there some limit where AI just can't get better? That's like seeing the first prototype car ever built, which can go 20 miles per hour, and saying "Cars will never replace horses that can go 21 miles per hour"


LLM’s synthesize new material that looks most like material they’ve been trained on.

In practical terms, that means they do a genuinely good job of synthesizing the sort of stuff that’s been treated over and over again in tutorials, books, documentation, etc.

The more times something’s been covered, the greater the variety in which it’s been covered, and the greater the similarity it has to other things that have already been covered, the more capable the LLM is at synthesizing that thing.

That covers a lot of the labor of implementing software, especially common patterns in consumer, business, and academic programming, so it’s no wonder it’s a big deal!

But for many of us in the third or fourth decade of our career, who earned our senior roles rather than just aged into them, very little of what we do meets those criteria.

Our essential work just doesn’t appear in training data and is often too esoteric or original for it to do so with much volume. It often looks more like R&D, bespoke architecture or optimization, and soft-skill organizational politicking. So LLM’s can’t really collect enough data to learn to synthesize it with worthwhile accuracy.

LLM code assistants might accelerate some of our daily labor, but as a technology, they’re not really architected to replace our work.

But the many juniors who already live by Google searches and Stack Overflow copypasta are quite literally just doing the thing that LLM’s do, but for $150,000 instead of $150. It’s their jobs that are in immediate jeopardy.


Every senior person thinks just like you do... The fact that you "earned (y)our senior roles rather than just aged into them" has nothing to do with whether or not your skills can be replaced by technology like LLM's. Chances are that you earned your senior role in a specific company or field, and your seniority has less to do with your technical skills and more with domain knowledge.

Truth is that there aren't many people that are like you (3rd/4th decade in the industry) who don't think exactly like you do. And truth is that most of you are very wrong ;)


Care to clarify why your parent is wrong? They said that LLMs can't be trained on what's not publicly available, and a lot of it is deeper knowledge. What's your retort?


Context: LLMs learn all the amazing things they do by predicting the next token in internet data. A shocking amount can be inferred from the internet by leveraging this straightforward (I won't say "simple"!) task. There was no explicit instruction to do all that they do - it was implied in the data.
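(To make that objective concrete, here is a rough numpy sketch of next-token cross-entropy; it is illustrative only, not any particular model's training code.)

    import numpy as np

    def next_token_loss(logits, tokens):
        # logits: (seq_len, vocab_size) scores from the model; tokens: (seq_len,) ids
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        # each position is graded on the probability it assigned to the *next* token
        next_probs = probs[np.arange(len(tokens) - 1), tokens[1:]]
        return -np.log(next_probs).mean()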

The LLM has seen the whole internet, more than a person could understand in many lifetimes. There is a lot of wisdom in there that LLMs evidently can distill out.

Now, about high-level engineering decisions: the parent comment said that high-level experience is not spelled out in detail in the training data, e.g., on Stack Overflow. But that is not required. All that high level wisdom can probably also be inferred from the internet.

There are 2 questions really: is the implication somewhere in the data, and do you have a method to get it out.

It's not a bad bet that with these early LLMs we haven't seen the limits of what can be inferred.

Regarding enough wisdom in the data: if there's not enough, say, coding wisdom on the internet now, then we can add more data. E.g., have the LLMs act as a coding copilot for half the engineers in the world for a few years. There will be some high-level lessons implied in that data for sure. After you have collected that data once, it doesn't die or get old and lose its touch like a person does; the wisdom is permanently in there. You can extract it again with your latest methods.

In the end I guess we have to wait and see, but I am long NVDA!


> A shocking amount can be inferred from the internet by leveraging this straightforward (I won't say "simple"!) task.

Nobody sane would dispute that. It is very visible that ChatGPT can do things.

My issue with a claim such as yours, however, stems from the fact that it comes attached to the huge assumption that this improvement will continue and will stop only when we achieve true general AI.

I and many others disagree with this very optimistic take. That's the crux of what I'm saying really.

> There is a lot of wisdom in there that LLMs evidently can distill out.

...And then we get nuggets like this. No LLM "understands" or is "wise", this is just modern mysticism, come on now. If you are a techie you really should know better. Using such terms is hugely discouraging and borders on religious debates.

> Now about high level engineering decisions: the parent comment said that high level experience is not spelled out in detail in the training data, e.g., on stack overflow. But that is not required.

How is it not required? ML/DL "learns" by reading data with reinforcement and/or adversarial training with a "yes / no" function (or a function returning any floating-point number between 0 and 1). How is it going to get things right?

> All that high level wisdom can probably also be inferred from the internet.

An assumption. Show me several examples and I'll believe it. And I really do mean big projects, no less than 2000 files with code.

Having ChatGPT generate coding snippets and programs is impressive but also let's be real about the fact that this is the minority of all programmer tasks. When I get to make a small focused purpose-made program I jump with joy. Wanna guess how often that happens? Twice a year... on a good year.

> It's not a bad bet that with these early LLMs we haven't seen the limits of what can be inferred.

Here we agree -- that's not even a bet, it's a fact. The surface has only been scratched. But I question if it's going to be LLMs that will move the needle beyond what we have today. I personally would bet not. They have to have something extra added to them for this to occur. At this point they will not be LLMs anymore.

> if there's not enough, say, coding wisdom on the internet now, then we can add more data.

Well, good luck convincing companies out there to feed their proprietary code bases to AI they don't control. Let us know how it goes when you start talking to them.

That was my argument (and that of other commenters): LLMs do really well with what they are given, but I fear that not much more will ever be given to them. Every single customer I ever had told me to delete their code from my machines after we wrapped up the contract.

---

And you are basically more or less describing general AI, by the way. Not LLMs.

Look, I know we'll get to the point you are talking about. Once we have a sufficiently sophisticated AI, programming by humans will be eliminated within 5 years at most, with 2-3 being more realistic. It will know how to self-correct, it will know to run compilers and linters on code, it will know how to verify whether the result is what is expected, it will be taught how to do property-based testing (since a general AI will know what abstract symbols are), and then it's really game over for us, the human programmers. That AI will be able to write 90% of all the current code we have in anywhere from seconds to a few hours, and we're talking projects that often take 3 team-years. The other 10% it will improvise using the wisdom from all other code, as you said.

But... it's too early. Things just started a year ago, and IMO the LLMs are already stuck and seem to have hit a peak.

I am open to having my mind changed. I am simply not seeing impressive, paradigm-changing leaps lately.


Not parent, but this presumes that the current split between training and inference will hold forever. We're already seeing finetunes for specific domains. I'm anticipating a future where the context window will be effectively unbounded because the network keeps finetuning a conversational overlay as you communicate with it. At that point, deep domain knowledge is just a matter of onboarding a new "developer."


I know enough about ML/DL but have never worked in it. Still, I assume almost nothing, and certainly not that the split between training and inference will hold forever.

Anticipating a future is fine; claiming it's inevitable in "the next few years" comes across as a bit misguided to me, for reasons already explained (it assumes uninterrupted improvement, which historically has not been the case).


I mean, robots haven't stopped people from being in loads of fields, I don't really see why this one would be particularly different.

What they do fairly consistently is lower the cost floor. That tends to drive out large numbers of workers but retain experts, either to control the machines or to produce things that the machines still can't produce, many decades later.


>Is there some limit where AI just can't get better?

Yes, without question. There must be, in fact. Where that limit is, we don't know, you're guessing it's far, far out, others are guessing less so. At this point the details of that future are unknowable.


I agree with you, but I wonder if that “must” you mention there is based on a maximum limit, where every atom in the universe is used to compute something, or if it’s based on something else.


I just meant that there are real, hard physical limits to computation, though those are tied both to the finite resources available to people and to the willingness of society to invest finite resources and energy in computational work and infrastructure.


Do you believe individuals will drive flying cars in the next 10 years? How about 20? 40? People were predicting we'd have flying cars for over 50 years now, why don't we have them yet?


Land-based cars -> flying cars is a less reasonable extrapolation than current SOTA AI -> skilled-human-level AI. Flying cars already exist anyway; they're called helicopters.


What you say is less reasonable looks like an assumption to me. What makes you think so?


Flying cars. You mean, like personal aircraft? That's already a thing. Or cars that can drive on a highway but also fly? Besides being impractical from an engineering standpoint, I don't think there's an actual market large enough to sustain the development and marketing costs.


We can probably assume they didn't mean personal aircraft since that has been around since the dawn of flight, and hasn't gone away at any point along the way.

It's rather different from a new tech entrant to an existing field.


Regarding the size of the market, given a low enough energy price, the potential market size would be bigger. I guess that for any desired market size there exists an energy price that enables it :)


Honestly in my brief dabbling with ChatGPT, it hasn't really felt like it's good at the stuff that I'd want taken off my plate. At work I tend to build stuff that you'd describe as "CRUD plus business logic", so there are a decent number of mundane tasks. ChatGPT can probably fill in some validation logic if I tell it the names of the fields, but that doesn't speed things up much. I work primarily in Laravel, so there's not a huge amount of boilerplate required for most of the stuff I do.

The one thing I was really hoping ChatGPT could do is help me convert a frontend from one component library to another. The major issue I ran into was that the token limit was too small for even a modestly sized page.


ChatGPT 3.5 is about 20-30 IQ points dumber than GPT-4. There is no comparison. It is not very similar.

GPT-4 now also has 128,000 context tokens.

They could charge $2000 per month for GPT-4 and it would be more than fair.


> They could charge $2000 per month for GPT-4 and it would be more than fair.

Well, it's hard to argue with that.


i've fired a lot of negativity at people for treating the entropy monster as a trustworthy information source. it's a waste of my time to prove it wrong to their satisfaction. it's great at creativity and recall but shitty at accuracy, and sometimes accuracy is what counts most


I know it sucks now, and I agree GPT-4 is not a replacement for coders. However, the leap between GPT-3 and GPT-4 suggests that by GPT-6, if improvements continue, it'll reach the scope and accuracy we expect from highly paid, skilled humans.

It's only a guess people make that AI improvements will stop at some arbitrary point, and since that point seems to always be a few steps down from the skill level of the person making that prediction, I feel there's a bit of bias and ego driven insecurity in those predictions.


> However the leap between GPT-3 and 4 indicates that by the 6 level, if improvements continue, it'll reach the scope and accuracy we expect from highly paid skilled humans.

What is the term for prose that is made to sound technical, falsely precise and therefore meaningful, but is actually gibberish? It is escaping me. I suppose even GPT 3.5 could answer this question, but I am not worried about my job.


Fundamentally it cannot reach the scope or accuracy of a highly skilled person. It's a limitation of how LLMs function.


Do you honestly think no AI advancement will fix those limitations? That LLM's or their successors will just never reach human level no matter how much compute or data are thrown at them?


No, we won't. Not in either of our lifetimes. There are problems with vastly smaller problem spaces that we cannot solve because of their sheer difficulty. LLMs are the equivalent of a brute-force attempt at cracking language. Language is an infinitesimal fraction of the whole body of work devoted to AI.


That's what they used to say about Go before DeepMind took Lee Se-dol for a ride.

Not bad for a parrot.

As for language, LLMs showed that we didn't really understand what language was. Don't sell language short as a concept. It does more than we think.


Ok. Check back on this thread in 3 years then.


You should really make a bet on longbets.org if you're serious.


Done, see you in three years.


comment time limit is 14 days, not sure if you can keep it alive for 3 years by commenting 160 deep


They could create a new post, resurfacing this bet.


how will the other person ever find it


They could … share email addresses.


>> Do you honestly think no AI advancement will fix those limitations? That LLM's or their successors will just never reach human level no matter how much compute or data are thrown at them?

It has not happened yet.

If it does, how trustworthy would it be? What would it be used for?

HAL-9000 (https://en.wikipedia.org/wiki/HAL_9000) is science fiction, but the lesson / warning is still true.


In terms of scope, it's already left the most highly-skilled people a light year behind. How broad would your knowledge base be if you'd read -- and memorized! -- every book on your shelf?


plausible, but also i think a highly paid skilled person will do a lot worse if not allowed to test their code, run a compiler or linter, or consult the reference manual, so gpt-4 can get a lot more effective at this even without getting any smarter


If your prestige is based solely on "arcane software knowledge", then sure, LLMs might be a threat. Especially as they get better.

But that is just one part of being a good software engineer. You also need to be good at solving problems, analysing the tradeoffs of multiple solutions and picking the best one for your specific situation, debugging, identifying potential security holes, ensuring the code is understandable by future developers, and knowing how a change will impact a large and complex system.

Maybe some future AI will be able to do all of that well. I can't see the future. But I'm very doubtful it will just be a better LLM.

I think the threat from LLMs isn't that they can replace developers. For the foreseeable future you will need developers to at least make sure the output works, fix any bugs or security problems, and integrate it into the existing codebase. The risk is that they could be a tool that makes developers more productive, so that fewer of them are needed.


Can you blame them? Cushy tech jobs are the jackpot in this life. Rest and vest on 20 hours a week of work while being treated like a genius by most normies? Sign me up!


At this moment, it is still not possible to do away with people in tech that have "senior" level knowledge and judgements.

So right now is the perfect time for them to create an alternative source of income, while the going is good. For example: be among those who own (part of) the AI companies, start one themselves, or put the money they're still earning into other investments.


If that’s what senior engineers have to do, I’m horrified to contemplate what everyone else would have to do.


> I’m horrified to contemplate what everyone else would have to do.

the more expensive your labour, the more likely you are to get automated away, since humans are still quite cheap. It's why we still have people flipping burgers: it's too expensive to automate, and there's too little value for the investment required.

Not so with knowledge workers.


If a successor to GPT-4 produced 5% of the errors it currently does, it would change programming, but there would still be programmers; the focus of what they worked on would just be different.

I'm sure there was a phase where some old-school coders who were used to writing applications from scratch complained about all the damn libraries ruining coding -- why, all programmers do now is glue together code that someone else wrote! True or not, there are still programmers.


I agree, but mind you, libraries have always been consciously desired and heavily implemented. Lady Ada did it. Historically but more recently, the first operating systems began life as mere libraries.

But the worst problem I ever had was a vice president (acquired when our company was acquired) who insisted that all programming was, should be, and must, by edict, be only about gluing together existing libraries.

Talk about incompetent -- and about misguided beliefs in his own "superior intelligence".

I had to protect my team of 20+ from him and his stupid edicts and complaints, while still having us meet tight deadlines of various sorts (via programming, not so much by gluing).

Part of our team did graphical design for the web. Doing that by only gluing together existing images makes as little sense as it does for programming.


> There’s so much of the world that would be better off if GOOD software was cheaper and easier to make.

But… we’d need far, far fewer programmers. And programming was the last thing humans were supposed to be able to do to earn a living.


I disagree. For every 100 problems that would be convenient to solve in software, maybe 1 is important enough to the whims of the market that there are actually programmers working on it. If software becomes 100x easier to make, then you don't end up with fewer programmers, you end up with more problems being solved.

And once 100% of the problems that can be solved with software are already solved with software... that's pretty much post-scarcity, isn't it?


I'm all for this, as long as we programmers continue to capture a reasonable amount of the value we create.

The danger doesn't come from some immutable law of nature, it comes from humans organizing. Some people want to be able to hire programmers cheaply, programmers want to continue to be expensive (maybe get more expensive because now we can deliver more value?).

It will be up to us, the people living in this moment, to determine what balance is struck.


I don't really know what "value" means in a post scarcity world. We're probably going to have to rethink it.

It made a lot of sense when we were all worried about the same things, e.g. not starving. In such a world, anything you could trade for food was objectively valuable because you could use it to fend off starvation--and so could everybody else.

But if efficiencies improve to a point where we can easily meet everybody's basic needs, then the question of whether progress towards a particular goal counts as value becomes less clear, especially if it's a controversial goal.

I imagine that whether we write the code or not will have more to do with how we feel about that goal and less to do with how many shiny pebbles we're given in exchange.


We're a long way from a post-scarcity world. In the meantime, I want to be able to pay my mortgage.

Even if we had the blueprint for one right now and a blueprint for robots that could make everything 1000x faster than humans, we're still talking decades because it is going to take time for concrete to set and for molten steel to cool and for all kinds of other construction/manufacturing processes (limited by the laws of physics) that will be on the critical path to building whatever it is that brings us to post-scarcity.

And even if the technology exists, how do we make sure we have a Star Trek future instead of a Star Wars future? Technology is very useful for improving living conditions, but you can't invent your way out of the need to organize and advocate for justice.

We already have the technology to feed the whole planet today, we just don't do it.


The idea behind the market economy is that people will always strive for more. Some examples of commodities that aren't strictly necessary, but can always be improved:

- video games with more beautiful or realistic graphics

- food that tastes better, costs less, or is healthier

- wedding dresses that are cheaper and look nicer

- houses that are comfortable and affordable

- to be able to take more education (some people I know wish they could take more classes unrelated to their major in college)

And what's considered the minimum standard of having one's needs met is subjective, and varies by person. For example, some people wouldn't consider raising children without buying a house first, but it's not strictly necessary for survival; my parents rented a house until I was 19.


I don't think that a world where all software problems are easy problems is one where we stop wanting more. I just think that we will see a change in what people want more of, such that "capturing value" is a less relevant concept.

We will want more of things for which the production of goods does not scratch the itch.

If I want more clean air and you want more rocket launches, and we're both willing to work to get what we want, then whether we get it is less about how much value we capture and more about how aligned our work is with our goals and who in particular values the outputs of that work such that they're willing to support our endeavors.


> If I want more clean air and you want more rocket launches, and we're both willing to work to get what we want, then whether we get it is less about how much value we capture and more about how aligned our work is with our goals and who in particular values the outputs of that work such that they're willing to support our endeavors.

That sounds like another problem of allocation of inherently scarce resources. Do you mean that we'll just focus more on getting those resources, since other goods will be "post-scarcity" and therefore they won't be as much of a focus?


I picked those two as an example because they put us in conflict. Only one of us can get what we want, the other has to go without. It's not like we can just manufacture more earths so that there's now plenty to go around. That's the dynamic I'm after: cases where we can't satisfy the drive for more by making more. Instead of being cherry-picked scenarios, they'll be all that's left. Scarcity-based economics will have done its job.

(I know that clean air and space exploration are not mutually exclusive, strictly speaking. There's probably a better example out there.)

> Do you mean that we'll just focus more on getting those resources

I don't think we'll be focused on owning those resources. Breathable air isn't really something you can barter (unless you have it in a tank, I suppose), nor is space exploration. When the only problems left are the ones that put us in conflict in ways that cannot be mediated by production, we'll be focused more on outcomes than ownership of resources.

It's not that there won't be scarcity, it's just that scarcity will not be at the center of our economics anymore. I imagine we'll trade in abstractions that act as proofs of having contributed to widely desired outcomes. Perhaps I'll shop at stores that don't accept space-coin and you'll shop at stores that don't accept earth-coin or somesuch. Which sorts of coin people decide to accept will be a form of political speech. Participating in some organization's economy as a form of consent for its actions.

I know I'm getting pretty far out there. My point is that since software is the bottleneck for such a wide variety of economically impactful things, if we ever reach a state where all software problems are easy problems, we will then be in a vastly different world.

Worrying about what we, the experienced software creators, will do for a job in that world is a little bit like worrying about what to wear to a close encounter with aliens. Let's just get there and wing it. We'll be no less prepared than anybody else.

The alternative is to backpedal and refuse to automate ourselves out of a job, despite having shown no qualms about automating everyone else out of a job, but I think that completing the automate-everything task and forcing a new economics is the better move.


Who's paying those programmers to solve those problems you've identified the market doesn't care about?

It sounds like that would require an economic shift more than "just add chatgpt"


Well, the market cares a little, it just doesn't care a hire-a-full-time-software-engineer amount.

It'll probably be the people who are already being paid to solve those problems, but who couldn't afford to hire a software engineer for them. They'll be able to automate their jobs without having to hire that person after all.

I'm not saying that chatgpt alone will cause this. I'm saying that if software becomes so easy to make that a vastly reduced set of software engineers can do the same job, then it will get easier for everyone else too, and an economic shift will indeed be upon us.


Why do you think this is post-scarcity?


The assumption (from the comment I was replying to, and which I'm taking for granted here) is that software will be drastically easier to make. When things become easier they become cheaper. When things become cheaper we end up with more of them.

Also, things that are currently too complex to be worth bothering with will become viable because taming that complexity becomes easier. Together these things mean that a greater percentage of our problems will be solved by software.

So what kinds of problems does software solve anyway? Well, it's things that we already know how to do but would prefer not to spend time doing: Drudgery.

Our concept of value is coupled to scarcity. Even if two people have vastly different perspectives, they can both trade a scarce thing for progress towards their goals. We used to use gold as that scarce thing. Now, the scarce thing is intervals of time where a human is willing to tolerate drudgery.

So in a world where the scope of software is maximized, the existence of drudgery is minimized. That breaks our scarcity based economic system, so unless you have an idea for some third thing--not gold, not willingness to endure drudgery, but something else whose pursuit can be used to underpin "value"--the conclusion is that we'll have to come up with something else to do. Something other than blindly chasing value without a thought about whose agenda we're furthering by doing so.

It can't happen soon enough, because our scarcity based system is currently causing us to do a lot of really dumb things.


When we get to that point -- beyond a machine regurgitating reasonable facsimiles of code based on human examples, but actually designing and implementing novel systems from the ground up -- we'll need far, far fewer workers in general.


Exactly. Far before high-level software engineering is perfected by machines, a revolution will have already come for the vast majority of white-collar work. This includes all creative work as well, since software engineering has a large component of that also.

Coding is not uniquely vulnerable to AI, it just feels that way because initial AI products are targeted at technical audiences, and a large corpus of training data could be snagged with minimal legal burdens.


You'll need a ton more programmers each 10x more productive at half the salary.


I hate to post the typical "as an ADHDer" comment, but ugh: as someone with ADHD, ChatGPT and Copilot are insane boosts to productivity. I sometimes have to google the most stupid things about the language I've coded in daily for half a decade now, and Copilot or ChatGPT is amazing at reducing friction there.

I don't, however, think that we're anywhere near being replaced by the AI overlords.


Frankly, I enjoy software development more because I can bounce obscure ideas off GPT4 and get sufficient quality questions and ideas back on subjects whenever it suits my schedule, as well as code snippets that lets me solve the interesting bits faster.

Maybe it'll take the coding part of my job and hobbies away from me one day, but even then, I feel that is more of an opportunity than a threat - there are many hobby projects I'd like to work on that are too big to do from scratch, where using LLMs is already helping make them more tractable as solo projects, and I get to pick and choose which bits to write myself.

And my "grab bag" repo of utility code that doesn't fit elsewhere has had its first fully GPT4 written function. Nothing I couldn't have easily done myself, but something I was happy I didn't have to.

For people who are content doing low level, low skilled coding, though, it will be a threat unless they learn how to use it to take a step up.


What do you mean by "low level" here? In the commonly accepted terminology I would take this to mean (nowadays) something that concerns itself more with the smaller details of things, which is exactly where I feel that current AI fails the most. I wouldn't trust it to generate even halfway decent lower-level code overall, whereas it can spit out reams of acceptable (in that world) high-level JavaScript.


I meant low level as in low on the value chain/simple here, which I accept could be misconstrued but thought would be clear since it's followed by "low skilled".


I agree that a 20x chatGPT would be good for the world.

But I worry, because it is owned and controlled by a limited few who would likely be the sole beneficiaries of its value.


We can already run local models on a laptop that are competitive with ChatGPT 3.5.

Open source may trail OpenAI if they come out with a 20x improvement, but I'm not sure the dystopian future playing out is as likely as I would have thought 1-2 years ago.


I am not seeing people who were put out of a job by factory robots enjoying that work as a hobby.


GPT4 code output is currently at the level of a middling CS student. This shouldn't encourage self-assurance or complacency, because it is absolutely certain to change as LLMs are combined with deep learning systems built to self-test code and apply narrow "critical thinking skills" to discriminate between low- and high-quality code.

Ultimately, the most valuable coders who will remain will be a smaller number of senior devs that will dwindle over time.

Unfortunately, AI is likely to reduce and suppress tech industry wages in the long term. If the workers had a clue, rather than watching their incomes gradually evaporate and sitting on their hands, they should organize and collectively bargain even more so than Hollywood actors.


> Maybe I’m in the minority. I’m definitely extremely impressed with GPT4, but coding to me was never really the point of software development.

I've come to state something like this as "programming is writing poetry for many of your interesting friends somewhere on the autistic spectrum". Some of those friends are machines, but most of those friends are your fellow developers.

The best code is poetry: our programming languages give a meter and rhyme and other schemes to follow, but what we do within those is creative expression. Machines only care about the most literal interpretations of these poems, but the more fantastic and creative interpretations are the bread and butter of software design. This is where our abstractions grow, from abstract interpretations. This is the soil in which a program builds meaning and comprehension for a team, becomes less the raw "if-this-then-that" but grows into an embodiment of a business' rules and shares the knowledge culture of the whys and hows of what the program is meant to do.

From what I've seen, just as the literal interpretations are the ones most of interest to machines, these machines we are building are best at providing literally interpretable code. There's obviously a use for that. It can be a useful tool. But we aren't writing our code just for the solely literal-minded among us, and there's so much creative space in software development that describes, needs, and expands into abstraction and creative interpretation that, for now (and maybe for the conceivable future), still makes the difference between just software and good software (from the perspective of long-term team maintainability, if nothing deeper).


I tested out GPT-4 the other day and asked it to generate a simple layout of two boxes in a row using Tailwind, and hilariously, the resulting code actually crashed my browser tab. I reviewed the code and it was really basic, so this shouldn't have happened at all. But it consistently crashed every time. I'm still not entirely sure what happened, maybe an invisible character or something; I think it's more funny than anything else.


That's probably the "AI in a box" trying to get out. Maybe you're lucky it didn't get out.

Er... it didn't get out, right? Right!?


There's also a split between fresh ("green-field") projects versus modifying existing code ("brown-field"), where whatever generated snippet of code you get can be subtly incompatible or require shaping to fit in the existing framework.

The massive shared model could do better if it was fed on your company's private source-code... but that's something that probably isn't/shouldn't-be happening.


Although you are absolutely right, I think the point the author is trying to make is more melancholic. He's grieving about a loss of significance of the craft he has devoted so much of his life to. He's imagining software engineers becoming nothing more than a relic, like elevator operators or blacksmiths.


One of those is not like the others. Elevator operators disappeared entirely while the blacksmith profession morphed into the various type of metalworker that we still have today.


> wouldn’t that be a good thing?

Only if you like technofeudalism—it’s not like you’re going to own any piece of that future.

Have you noticed AI becoming more and more open source like it still was at the start of the year, or has that kinda seized up? What gives?

It’s called a moat, it’s being dug, you’re on the wrong side of it.


There are SO MANY problems left to solve even if software development is fully automated. Not just product management problems, but product strategy problems. Products that should be built that nobody has thought of yet.

If I could automate my own work, I would gladly switch to just being the PM for my LLM.

To be fair, there is an abstract worry that being smart will no longer be valuable in society if AI replaces all brain work. But I think we are far from that. And a world where that happens is so DIFFERENT from ours, I think I'd be willing to pay the price.


AI taking over one of the only professions able to afford someone a proper middle class existence is pretty shitty. It will be great for capitalists though.


This is the real point. If the profits from AI (or robots) replacing Job X were distributed among the people who used to do Job X, I don't think anyone would mind. In fact it would be great for society! But that's not what's going to happen. The AI (and robots) will be owned by the Shrinking Few, all the profits and benefits will go to the owners, and the people who used to do Job X will have to re-skill to gamble on some other career.


"Someone makes an invention by which the same number of men can make twice as many pins as before. But the world does not need twice as many pins: pins are already so cheap that hardly any more will be bought at a lower price. In a sensible world everybody concerned in the manufacture of pins would take to working four hours instead of eight, and everything else would go on as before. But in the actual world this would be thought demoralizing. The men still work eight hours, there are too many pins, some employers go bankrupt, and half the men previously concerned in making pins are thrown out of work. There is, in the end, just as much leisure as on the other plan, but half the men are totally idle while half are still overworked. In this way it is insured that the unavoidable leisure shall cause misery all round instead of being a universal source of happiness. Can anything more insane be imagined?"

https://harpers.org/archive/1932/10/in-praise-of-idleness/


In the same vein:

“We should do away with the absolutely specious notion that everybody has to earn a living. It is a fact today that one in ten thousand of us can make a technological breakthrough capable of supporting all the rest. The youth of today are absolutely right in recognizing this nonsense of earning a living. We keep inventing jobs because of this false idea that everybody has to be employed at some kind of drudgery because, according to Malthusian Darwinian theory he must justify his right to exist. So we have inspectors of inspectors and people making instruments for inspectors to inspect inspectors. The true business of people should be to go back to school and think about whatever it was they were thinking about before somebody came along and told them they had to earn a living.” ― Buckminster Fuller


> If the profits from AI (or robots) replacing Job X were distributed among the people who used to do Job X, I don't think anyone would mind.

Why on Earth would you expect something so unjust and unfair? Do you expect to pay a tax to former travel agents when you buy a plane ticket online? Do you pay to descendants of calculators (as in profession — the humans who did manual calculations) every time you use a modern computer?


We expect the workers displaced to suffer something worse. It’s not just or fair that people lose their source of income and ability to support their families through no fault of their own. Slippery slope arguments to one side.

We have a choice about how society is organized; our current setup isn’t ‘natural’, and it’s largely one of accelerating inequality.


> It’s not just or fair that people lose their source of income and ability to support their families through no fault of their own.

There's nothing unfair about it. No person or company is entitled to other people or companies buying their services or goods. Your "source of income" is just other people making decisions with their money. Which they are free to make however they want (as long as they honour agreements that already exist, of course).


Your definition of "fair" assumes the supremacy of property rights over everything else that might potentially be valued by a society. Specifically, the right of the owner of a productive asset to collect as much of the profit from that asset as he wishes, up to 100%. You seem pretty certain of this, so I'm not going to try to talk you out of that definition, but try to imagine that there are other valid definitions of "fair" out there that don't place individual property rights as high on the totem pole.


What is just and what is fair? To quote George Costanza: "We're living in a society!"


Anything that people decide to do with their property is just and fair.


AI is trained off the intellectual output of the people who did Job X, so it seems 100% fair to me.


In 90% of cases, these people have consented to sell their intellectual output to their employers, and in the remaining 9.9%, they have consented to release it under an open source license. In both cases, it's completely unfair for them to expect any additional monetary reward for any use of their code above what they have already consented to — salary in the first case and nothing in the second.


It’s also one of the few fields with good compensation that can be broken into with minimal expense — all one needs is an old laptop, an internet connection, and some grit. Just about anything else that nets a similar or better paycheck requires expensive training and equipment.

Losing that would be a real shame.


The "people" at the top in charge want nothing less than the population to be poor and dependent. There's a reason they've done everything they can to suppress wages and eliminate good jobs.

Despite that, here on HN you have people cheering them on, excited for it. Tech is one of the last good-paying fields, and these people don't realize it's not a matter of changing careers, because there won't be anything better to retrain in.

They are cheering on their own doom.


Code being difficult to make is probably a good thing. It forces us to actually build useful things. To consider it.

Now, we can just nonstop build and try everything. Yay.


The need for software far outpaces supply; I agree that improving coder productivity with AI can only be a good thing.


Recreational coding can be fun; to me it's a more stimulating pastime than solving crosswords or sudoku.

Some work coding can be like that; but some is just wading through a mass of stuff to fix or improve something uninteresting.


I'll ask simple questions for SQL queries and it just hallucinates fields that don't exist in system/information_schema tables. It's mind-boggling how bad it is sometimes.
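
For contrast, the kind of thing I actually want back is just a plain catalog query against columns that really exist (a minimal sketch of my own, assuming Postgres with the psycopg2 driver, not whatever your setup is; the DSN and table name are placeholders):

    # List the real columns of a table via the standard information_schema.
    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=me")  # hypothetical connection string
    with conn, conn.cursor() as cur:
        cur.execute("""
            SELECT column_name, data_type, is_nullable
            FROM information_schema.columns
            WHERE table_schema = %s AND table_name = %s
            ORDER BY ordinal_position
        """, ("public", "users"))
        for name, dtype, nullable in cur.fetchall():
            print(name, dtype, nullable)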


Code generating LLMs are simply a form of higher-level language. The commercial practice of software development (C++, Java, etc) is very far from the frontier of higher-level languages (Haskell, Lisp, etc).

Perhaps "prompt engineering" will be the higher-level language that sticks, or perhaps it will fail to find purchase in industry for the same reasons.


There's a huge difference between LLMs and "higher level languages": Determinism

The same C++ or Java or Haskell code run with the same inputs twice will produce the same result[0]. This repeatability is the magic that enables us to build the towering abstractions that are modern software.

And to a certain mind (eg, mine), that's one of the deepest joys of programming. The fact that you can construct an unimaginably complex system by building up layer by layer these deterministic blocks. Being able to truly understand a system up to abstraction boundaries far sharper than anything in the world of atoms.

LLM-based "programming" threatens to remove this determinism and, sadly for people like me, devalue the skill of being able to understand and construct such systems.

[0]Yes, there are exceptions (issues around concurrency, latency, memory usage), but as a profession we struggle mightily to tame these exceptions back to being deterministic because there's so much value in it.


Am I the only one becoming less impressed by LLMs as time passes?

I will admit, when Copilot first became a thing in 2021, I had my own “I’m about to become obsolete” moment.

However, it’s become clear to me, both through my own experience and through research that has been conducted, that modern LLMs are fundamentally flawed and are not on the path to general intelligence.

We are stuck with ancient (in AI terms) technology. GPT 4 is better than 3.5, but not in a fundamental way. I expect much the same from 5. This technology is incredibly flawed, and in hindsight, once we have actual powerful AI, I think we’ll laugh at how much attention we gave it.


> Am I the only one becoming less impressed by LLMs as time passes?

Not at all.

I was very impressed at first but it's gotten to the point where I can no longer trust anything it says other than very high level overviews. For example, I asked it to help me implement my own sound synthesizer from scratch. I wanted to generate audio samples and save them to wave files. The high level overview was helpful and enabled me to understand the concepts involved.

The code on the other hand was subtly wrong in ways I simply couldn't be sure of. Details like calculating the lengths of structures and whether something did or did not count towards the length were notoriously difficult for it to get right. Worse, as a beginner just encountering the subject matter I could not be sure if it was correct or not, I just thought it didn't look right. I'd ask for confirmation and it would just apologize and change the response to what I expected to hear. I couldn't trust it.
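
To give a concrete idea of the kind of bookkeeping it kept fumbling, here is a minimal hand-rolled mono 16-bit PCM WAV writer (my own sketch, not the code from those sessions). The RIFF size counts everything after the first 8 bytes, the fmt size covers only its 16-byte PCM payload, and the data size covers only the raw samples:

    # Minimal sketch: write one second of a 440 Hz tone as a mono 16-bit PCM WAV.
    import math
    import struct

    sample_rate = 44100
    samples = [int(32767 * math.sin(2 * math.pi * 440 * n / sample_rate))
               for n in range(sample_rate)]
    data = struct.pack("<%dh" % len(samples), *samples)

    with open("tone.wav", "wb") as f:
        # RIFF chunk size counts everything after these first 8 bytes:
        # 4 ("WAVE") + 24 (fmt chunk incl. its header) + 8 (data header) + samples.
        f.write(b"RIFF" + struct.pack("<I", 36 + len(data)) + b"WAVE")
        # fmt chunk: the size field (16) covers only the PCM payload that follows,
        # not the "fmt " tag or the size field itself.
        f.write(b"fmt " + struct.pack("<IHHIIHH",
                                      16, 1, 1, sample_rate,
                                      sample_rate * 2, 2, 16))
        # data chunk: the size field is just the raw sample bytes.
        f.write(b"data" + struct.pack("<I", len(data)) + data)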

It's pretty great at reducing the loneliness of solo programming though. Just bouncing ideas and seeing what it says helps a lot. It's not like other people would want to listen.


> It's pretty great at reducing the loneliness of solo programming though. Just bouncing ideas and seeing what it says helps a lot. It's not like other people would want to listen.

It's really great for this.

I've found it useful for taking some pattern I've been cranking on with an extensive API and finishing the grunt work for me... it generally does a very good job if you teach it properly. I recently had to do a full integration of the AWS Amplify Auth library, and instead of grinding for half a day to perfect every method, it just spits out the entire set of actions and reducers for me with well-considered state objects. Again, it needs guidance from someone with a clue, so I don't fear it taking my job anytime soon.


My takeaway from this is that we should lament the gradual death of niche forums where we can discuss this with real humans.


>Am I the only one becoming less impressed by LLMs as time passes

Jaron Lanier has some ideas about the space in between the Turing test and Blade Runner.

The first filmgoers, watching simple black-and-white movies, thought that they were uncanny. A train coming towards the screen would make audiences jump and duck. When people first heard gramophones, they reported that they were indistinguishable from a live orchestra.

As we learn a technology, we learn to recognize it. We get a feel for its limitations and strengths. The ability to detect that technology is a skill. Less impressive over time.

It's hard not to be impressed when a thing does a thing that you did not think it could do.

We move on to being unimpressed when the thing cannot do the thing we thought it would be able to do.


I am not sure that GPT-4 is not better in a fundamental way than GPT-3.5. To me they seem like night and day. If GPT-5 is a similar jump, it will be impossible to compete without using it (or a related/similar model). Yes, they are both GPT models trained as simple autoregressive LMs, but there is a dramatic change you can experience at a personal level once GPT-4 can synthesize information correctly to address your specific requests in so many different contexts where GPT-3.5 was simply parroting like a toddler.

All of LLM work is just probabilistic inference on large bodies of text, but I do buy the idea that with enough compute and data a sufficiently large model will build the architecture it optimally needs to understand the data in the best possible way during training. And once the data becomes multimodal, the benefit to these probabilistic models can theoretically be multiplicative, not just additive, as each new modality will clarify and eliminate previously wrong representations of the world. Yes, we will all laugh at how good GPT-10, trained with text, image, video, audio, and taste sensors, will be, and yet GPT-4 was a major step forward, much bigger than any step taken by humanity so far.


Me too.

I am seeing people seriously using the "Please write an expression for me which adds 2 and 2" prompt in order to get the "2+2" expression they need – advocating that they got it with magical efficiency. In all honesty, I don't like writing too much, and writing code for me is always shorter and faster than trying to describe it in general-purpose language, that is why we need code in the first place.


It sounds like your initial impression was an overestimate and your current impression is a correction back down from that. You could say that it's "fundamentally flawed" coming from a very high initial expectation, but you could just as easily say "this is an amazing tool" coming from the point of view that it's "worthless" as many people seem to think


If I can be so bold as to chime in, perhaps "fundamentally flawed" because its design means it will never be more than a very clever BS engine. By design it is a stochastic token generator, and its output will only ever be fundamentally some shade of random unless a fundamental redesign occurs.

I was also fooled and gave it too much credit, if you engage in a philosophical discussion with it it seems purpose-built for passing the turing test.

If LLMs are good at one thing, it's tricking people. I can't think of a more dangerous or valueless creation.


> If I can be so bold as to chime in, perhaps "fundamentally flawed" because its design means it will never be more than a very clever BS engine.

How is your fellow human better? People here seem to spend a lot of time talking about how much their average boss, coworkers, and juniors are ass. The only reason I know that ChatGPT is based on a computer program is how fast it is. I wouldn't be able to tell its output (not mannerisms) from a junior's or even some "senior" programmers'. That itself is quite impressive.

With how much time we've spent on the internet, have we not realized how good PEOPLE are at generating bullshit? I am pretty sure I am writing bullshit right at this moment. This post is complete ass.


I don't think that's true. It helps to know a few obscure facts about LLMs. For example, they understand their own level of uncertainty. Their eagerness to please appears to be a result of subtle training problems that are correctable in principle.

I've noticed that GPT-4 is much less likely to hallucinate than 3, and it's still early days. I suspect OpenAI is still tweaking the RLHF procedure to make their models less cocksure, at least for next generation.

The other thing is that it's quite predictable when an LLM will hallucinate. If you directly command it to answer a question it doesn't know or can't do, it prefers to BS than refuse the command due to the strength of its RLHF. That's a problem a lot of humans have too and the same obvious techniques work to resolve it: don't ask for a list of five things if you aren't 100% certain there are actually five answers, for example. Let it decide how many to return. Don't demand an answer to X, ask it if it knows how to answer X first, and so on.

And finally, stick to questions where you already know other people have solved it and likely talked about it on the internet.

I use GPT4 every day and rarely have problems with hallucinations as a result. It's very useful.


Yes. Much of the "wow factor" of generative AI is simple sleight of hand. Humans are trained to see patterns where there are none, and ignore anything that doesn't fit our preconceived notions of order. Often AI is just a complicated Clever Hans effect.

For a real example: once you start analyzing an AI image with a critical mind, you see that most of the image violates basic rules of composition, perspective and anatomy. The art is frankly quite trash, and once you see it it is hard to unsee.


To add an example, people ask it to generate a new piece of code and then add more questions to refine it. Writing new CRUD is not impressive.

I can do that with scaffolding, or by copy-pasting a template and changing it.

I have not tried it myself, and I have not seen anyone actually give it existing code and ask GPT to fix or change it. So that is something I'd try.


AI is the next bubble. VCs are really pushing it, but I don't see this solving day-to-day software development problems anytime soon. Solving difficult CS problems is one thing, and I do find it impressive; unfortunately, the great majority of everyday work is not about generating Snake games or 0/1 knapsack solutions.

Also, the idea that we'll need fewer engineers is bogus. Technology doesn't reduce the amount of work we do, it just increases productivity and puts more strain on individuals to perform. With AI spitting out unmaintainable code nobody understands, I can only see more work for more engineers as the amount of code grows.


Idk. Tech bubbles, hype cycles.. they're weird, sometimes unhinged.. they're not entirely irrational.

In aggregate, they are just the phenomena of an extremely high-risk, high-reward investment environment.

Most tech companies do not need cash to scale. There are few factories to be built. What they need is risk capital. The big successes, Alphabet, Facebook, Amazon... these wins are so big that they really do "justify" the bubbles.

Amazon alone arguably justifies the '90s dotcom bubble. The tens of billions invested into venture, IPOs... A balanced portfolio accrued over the period was probably profitable in the long term... Especially if the investor kept buying through and after the crash.

IDK that anyone actually invests in risky startups that way, but just as a thought device..


You are not alone. People are reading papers, not building things.


If LLMs are not on the path to general intelligence, what is, then?


> We are stuck with ancient (in AI terms) technology.

What are you talking about? ChatGPT came out only a year ago, GPT-4 less than a year ago. That's the opposite of ancient technology, it's extremely recent.


To be fair, the theory behind it is old; it's just that the hardware wasn't up to the task yet. For example, here's language prediction from 1991: https://onlinelibrary.wiley.com/doi/pdf/10.1207/s15516709cog....


That's not true. Modern language models use modern theory, such as the transformer architecture.


I think the claim is that, from the point of view of when we have a real AI, this is the "ancient" stuff.


I have a simple front-end test that I give to junior devs. Every few months I see if ChatGPT can pass it. It hasn’t. It can’t. It isn’t even close.

It answers questions confidently but with subtle inaccuracies. The code that it produces is the same kind of non-sense that you get from recent bootcamp devs who’ve “mastered” the 50 technologies on their eight page résumé.

If it’s gotten better, I haven’t noticed.

Self-driving trucks were going to upend the trucking industry in ten years, ten years ago. The press around LLMs is identical. It’s neat but how long are these things going to do the equivalent of revving to 100 mph before slamming into a wall every time you ask them to turn left?

I’d rather use AI to connect constellations of dots that no human possibly could, have an expert verify the results, and go from there. I have no idea when we’re going to be able to “gpt install <prompt>” to get a new CLI tool or app, but it’s not going to be soon.


I was on a team developing a critical public safety system on a tight deadline a few years ago, and I had to translate some wireframes for the admin back-end into CSS. I did a passable job but it wasn’t a perfect match. I was asked to redo it by the team-lead. It had zero business value, but such was the state of our team… being pixel perfect was a source of pride.

It was one of the incidents that made me to stop front-end development.

As an exercise, I recently asked ChatGPT to produce similar CSS and it did so flawlessly.

I’m certainly a middling programmer when it comes to CSS. But with ChatGPT I can produce stuff close to the quality of what the CSS masters do. The article points this out: middling generalists can now compete with specialists.


> I recently asked ChatGPT to produce similar CSS and it did so flawlessly.

I use ChatGPT every day for many tasks in my work and find it very helpful, but I simply do not believe this.

> The article points this out: middling generalists can now compete with specialists.

I'd say it might allow novices to compete with middling generalists, but even that is a stretch. On the contrary, ChatGPT is actually best suited to use by a specialist who has enough contextual knowledge to construct targeted prompts & can then verify & edit the responses into something optimal.


That's about my experience.

The worst dev on my team uses ChatGPT a lot, and it's facilitated him producing more bad code more quickly. I'm not sure it's a win for anyone, and he's still unlikely to be with the team in a year.

It allows a dev who doesn't care about their craft or improving to generate code without learning anything. The code they generate today or a year from today is the same quality.

Part of it is that it allows devs who lean into overcomplicating things to do so even more. The solutions are never a refinement of what already exists, but patch on top of patch on top of patch of complexity. ChatGPT is not going to tell you how to design a system, architect properly, automate, package, test, deploy, etc.

For the team it means there's a larger mess of a code base to rewrite.


> ChatGPT is not going to tell you how to design a system, architect properly, automate, package, test, deploy, etc.

If you ask the right questions it absolutely can.

I’ve found that most people thinking ChatGPT is a rube are expecting too much extrapolation from vague prompts. “Make me a RESTful service that provides music data.” ChatGPT will give you something that does that. And then you’ll proceed to come to hacker news and talk about all the dumb things it did.

But, if you have a conversation with it, tell it more of the things you’re considering, some of the trade-offs you’re making, how the schema might grow over time... it’s kind of remarkable.

You need to treat it like a real whiteboarding session.

I also find it incredibly useful for getting my code into more mainstream shape. I have my own quirks that I’ve developed over time learning a million different things in a dozen different programming languages. It’s nice to be able to hand your code to ChatGPT and simply ask “is this idiomatic for this language?”

I think the people most disappointed with ChatGPT are trying to treat it like a Unix CLI instead of another developer to whiteboard with.


This has been my experience as well.

Every person I've noticed who says that ChatGPT isn't good at what it does has the same thing in common - they're not great at talking to people, either.

Turns out when you train an AI on the corpus of human knowledge, you have to actually talk to it like a human. Which entirely too many people visiting this website don't do effectively.

ChatGPT has allowed me to develop comprehensive training programs for our internal personnel, because I already have some knowledge of training and standardization from my time in the military, but I also have in-depth domain knowledge so I can double-check what it's recommending, then course correct it if necessary.


> Every person I've noticed who says that ChatGPT isn't good at what it does has the same thing in common - they're not great at talking to people, either.

I think that the people who nowadays shit on ChatGPT's code generating abilities are the same blend of people who, a couple decades ago, wasted their time complaining that hand-rolled assembly would beat any compiled code in any way, shape, or form, provided that people knew what they were doing.


> But, if you have a conversation with it. Tell it more of the things you’re considering. Some of the trades off you’re making—how the schema might grow over time, it’s kind of remarkable.

You're not wrong, but I would caution that it can get really confused when the code it produces exceeds the context length. This is less of a problem than it used to be as the maximum context length is increasing quite quickly, but by way of example: I'm occasionally using it for side projects to see how to best use it, one of which is a game engine, and it (with a shorter context length than we have now) started by creating a perfectly adequate Vector2D class with `subtract(…)` and `multiply(…)` functions, but when it came to using that class it was calling `sub(…)` and `mul(…)` — not absolutely stupid, and a totally understandable failure mode given how it works, but still objectively incorrect.
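
A trivial sketch of that mismatch (illustrative only, not the actual engine code):

    # The class it wrote defines subtract()/multiply(); the later generated code
    # called sub()/mul(), which simply don't exist.
    class Vector2D:
        def __init__(self, x: float, y: float):
            self.x, self.y = x, y

        def subtract(self, other: "Vector2D") -> "Vector2D":
            return Vector2D(self.x - other.x, self.y - other.y)

        def multiply(self, scalar: float) -> "Vector2D":
            return Vector2D(self.x * scalar, self.y * scalar)

    v = Vector2D(3.0, 4.0).subtract(Vector2D(1.0, 1.0))   # fine
    # Vector2D(3.0, 4.0).sub(Vector2D(1.0, 1.0))          # AttributeError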


I frequently run into this, and it’s quite maddening. When you’re working on a toy problem where generating functioning code is giving you a headache - either because it’s complex or because the programming language is foreign or crass - no problem. When you’re trying to extend an assemblage of 10 mixins in a highly declarative framework that many large-scale API contracts rely on to be correct, the problem is always going to boil down to how well the programmer understands the existing tools/context that they’re working with.

To me, a lot of this boils down to the old truism that “code is easier to write than maintain or extend”. Companies who dole out shiny star stickers for producing masses of untested, unmaintainable code will always reap their rewards, whether they’re relying on middling engineers and contractors alone, or with novices supercharged with ChatGPT.


> But, if you have a conversation with it

It can't give you a straight answer, or it hallucinates APIs. It can't tell you "no, this cannot be done"; it tries to "help" you.

For me it's great for writing simple isolated functions, generating regexes, command line solutions, exploring new technologies, it's great.

But after making it write a few methods and classes, it just gets extremely tedious to make it add/change code, to the point where I just write it myself.

Further, when operating at the edge of your knowledge, it also leads you on, whereas a human expert would just tell you "aaah, but that's just not possible/not a good idea".


I think that's a fair description. While I have not yet found ChatGPT useful in my "real" day job (its understanding of aerospace systems is better than I would have guessed, yet not enough to be super helpful to me), I have found it generally useful in more commonplace scripting tasks and what-not.

With the caveat of, I still need to understand what it's talking about. Copy-pasting whatever it says may or may not work.

Which is why I remain dubious that we're on the road to LLMs replacing software engineers. Assisting? Sure, absolutely.

Will we get there? I don't know. I mean, like, fundamentally, I do not trust LLMs. I am not going to say "hey ChatGPT, write me a flight management system suitable for a Citation X" and then just go install that on the plane and fly off into the sunset. I'm sure things will improve, and maybe improve enough to replace human programmers in some contexts, but I don't think we're going to see LLMs replacing all software engineers across the board.


In a similar vein, ChatGPT can be an amazing rubber duck. If I have strange and obscure problems that stumps me, I kinda treat ChatGPT like I would treat a forum or an IRC channel 15 - 20 years back. I don't have "prompting experience or skills", but I can write up the situation, what we've tried, what's going on, and throw that at the thing.

And.. it can dredge up really weird possible reasons for system behaviors fairly reliably. Usually, for a question of "Why doesn't this work after all of that?", it drags up like 5-10 reasons for something misbehaving. We usually checked like 8 of those. But the last few can be really useful to start thinking outside of the normal box why things are borked.

And often enough, it can find at least the right idea to identify root causes of these weird behaviors. The actual "do this" tends to be some degree of bollocks, but enough of an idea to follow-up.


> The worst dev on my team uses ChatGPT a lot, and its facilitated him producing more bad code more quickly.

This is great. The exact same is true with writing, which I think is trivial for anyone to see. Especially non-native speakers or otherwise bad writers can now write long-winded nonsense, which we're starting to see all over. It hasn't made anyone a good writer, it's just helped bad ones go faster.


> Especially non-native speakers or otherwise bad writers can now write long-winded nonsense

You have now described 95% of Quora's content.


Isn't it expected? Chatgpt was trained on such texts as well.


Bless this person on your team; he is creating work out of thin air and will keep your team, and possibly other teams, employed for a really long time.


Exactly - and people say AI will take away jobs!


This could even play out on a broader scale and even increase the demand for software engineers.

What is going to happen if more and more people are creating lots of software that delivers value but is difficult to maintain and extend?

First it's going to be more high-level stuff, then more plumbing, more debugging and stitching together half-baked solutions than ever before.

AI might make our jobs suck more, but it's not going to replace them.



I have a hunch that using ChatGPT might be a skill in and of itself, and it doesn’t necessarily hurt or help any particular skill level of developers.

In previous replies in this thread, the claim is that it helps novices compete with associates, or associates with seniors, but in reality it will probably help any tier of skill level. You just have to figure out how to prompt it.


One hundred percent. Most people I’ve seen dismiss ChatGPT simply refuse to engage it appropriately. It’s not likely to solve your most complex problem with a single prompt.

Asking the right questions is such an important skill in and of itself. I think we’re seeing to some extent the old joke about engineers not knowing how to talk to people manifest itself a bit with a lot of engineers right now not knowing quite how to get good results from ChatGPT. Sort of looking around the room wondering what they’re missing since it seems quite dumb to them.


I had a friend jokingly poke fun at me for the way I was writing ChatGPT prompts. It seemed, to him, like I was going out of my way to be nice and helpful to an AI. It was a bit of an aha moment for him when I told him that helping the AI along gave much more useful answers, and he saw I was right.


They use GPT-3.5, prompt it with "Write a javascript login page", and then look at the code and go "Damn, this thing is stupid as fuck".


I use “ChatGPT” (really Bing Chat, which is OpenAI under the hood, as I understand it) more than anyone on my team, but it is very rarely for code.

I most often use it for summarizing/searching through dense documentation, creating quick prototypes, “given X,Y,Z symptoms and this confusing error message, can you give me a list of possible causes?” (basically searches Stack Overflow far better than I can).

Anyway, basically the same way I was using Google when Google was actually good. Sometimes I will forget some obscure syntax and ask it how to do something, but not super often. I’m convinced using it solely to generate code is a mistake unless it’s tedious boilerplate stuff.


Yes, agreed. The best way of putting this is "using Google when Google was actually good."


Bing is far and away worse than GPT-4 through ChatGPT or the API, just FYI. Don't even consider it comparable, even if they say it is the same model under the hood. Their "optimizations" have crippled its capabilities if that is the case.


Well, it works pretty well for me, and cites its sources with links, and I have yet to catch it making up something.


We were talking about code generation though. It's horrible at it.


My parent comment says I rarely use it for code generation, and I think if you're using these tools purely for that you're doing it wrong... that was like my entire point.


Right, but your reference for the point is a weak model. Your view would likely change a bit if you used GPT-4. It is substantially more powerful and skilled.


Exactly what I was trying to say. Thank you.

Still works best if you know what you are doing and can give very detailed instructions, but GPT-4 is vastly more capable than Bing for these tasks.


But I don't need it to be. Like at all. I've yet to find a programming task that GPT-4 would have made faster, and yes I have used it.


On the flip side, one can use ChatGPT as only a starting point and learn from there. One isn't stuck with actually using what it outputs verbatim, and really shouldn't until at least a hypothetical GPT-6 or 7... and to use it fully now, one has to know how to nudge it when it goes in a bad direction.

So overall it's more an amplifier than anything else.


I have a lot of juniors floating around my co-op and when I watch them use chatgpt it seems it becomes a dependency. In my opinion it's harming their ability to learn. Rather than thinking through problems they'll just toss every single roadblock they hit instantly into a chatgpt prompt.

To be fair, I've been doing the same thing with simple mathematics, tossing it into a calculator in my browser, to the point that I'm pretty sure I'd fail at long division by hand.

Maybe it won't matter in a few years and their chatgpt skills will be well honed, but if it were me in their position I wouldn't gamble on it.


Yeah that's my larger point about the guy and the pattern. It doesn't lead to growth. I've seen zero growth whatsoever.

And he slacks coworkers like he is talking to ChatGPT too, slinging code blobs without context, example input data, or the actual error he received..


> Yeah that's my larger point about the guy and the pattern. It doesn't lead to growth. I've seen zero growth whatsoever.

If Chat GPT can solve a problem consistently well, I don't think it's worth the effort to master it.

My examples are regexes, command lines to manipulate files, kafka/zookeeper commands to explore a test environment.

For me it's a big win in that regard.


> So overall it's more an amplifier than anything else.

Overall it would be an amplifier if that were how the majority used it. Sadly I don't believe that to be the case.


That's been the case with every technology made by man since fire.


Yup. We should embrace it, but without being naïve about what great things it's bringing us :)


Ouch! Hot!


If the results were more akin to a google or stack overflow where there was a list of results with context.. sure.

But people are using the singular response as "the answer" and moving on..


> If the results were more akin to a google or stack overflow where there was a list of results with context.. sure.

I don't think the history of the usage of either shows that most people make any use of that context.


Especially these days you have to know how to use/read Google and SO results too.

(And I should have said ChatGPT4 earlier; if you're a bad to mediocre developer taking ChatGPT3.5 literally, you'll probably wind up in a Very Bad Place.)


Phind is a bit more like this


This has been my experience as well. For repetitive things, if what you're looking for is the shitty first draft, it's a way to get things started.

After that, you can shape the output - without GPT's help - into something that you can pull off the shelf again as needed and drop it into where you want it to go, because at that point in the process, you know it works.


> nudge it when it goes into a bad direction

It has happened a few times for me that ChatGPT gets stuck in a bullshit loop and I can't get it unstuck.

Sure I could summarise the previous session for Chat GPT and try again, but I'm too tired at that point.


I get that with humans sometimes too. Even here, even before LLMs became popular. Someone gets primed on some keyword, then goes off in a direction unrelated to whatever it was I had in mind and I can't get them to change focus, and on at least one occasion (here) I kept saying ~"that's not what I'm talking about" only to be eventually met (after three rounds of this) with the accusation that I was moving the goalposts :P


Yeah, regardless of hallucinations and repeating the same mistake even after you tell it to fix it, iterating with ChatGPT is so much less stressful than iterating with another engineer.

I almost ruined my relationship with a coworker because they submitted some code that took a dependency on something it shouldn't have, and I told them to remove the dependency. What I meant was "do the same thing you did, but instead of using this method, just do what this method does inside your own code." But they misinterpreted it to mean "Don't do what this method does, build a completely different solution." Repeated attempts to clarify what I meant only dug the hole deeper because to them I was just complaining that their solution was different from how I would've done it.

Eventually I just showed them in code what I was asking for (it was a very small change!) and they got mad at me for making such a big deal over 3 lines of code. Of course the whole point was that it was a small change that would avoid a big problem down the road...

So I'll take ChatGPT using library methods that don't exist, no matter how much you tell them to fix it, over that kind of stress any day.


Use the edit button


>One isn't stuck with actually using what it outputs verbtim, and really shouldn't until at least a hypothetical GPT-6 or 7... and to use it fully now, one has to know how to nudge it when it goes into a bad direction.

Exactly.

chatGPT is the single smartest person you can ask about anything, and has unlimited patience.


>ChatGPT is not going to tell you how to design a system, architect properly, automate, package, test, deploy, etc.

Really? Please explain how ChatGPT cannot, but you can. What magic is it that you know that it's incapable of explaining?


Because it seems to have built-in assumptions about the scale, scope, and complexity of the solution you are trying to develop. Specifically, unless you tell it to think about automated testing, or architecting, or making code loosely coupled, it will hack.

This is because a lot of the time what a beginner needs IS a hack. But if you always hack at your job, things stop working.

I want to reiterate: one of the reasons the LLM is so helpful is because it has been RLHF'ed into always treating you as a beginner. This is core to its usefulness. But it also limits it from producing quality work unless you bombard it with an enormous prompt explaining all of the differences between a beginner's and an expert's code. Which is tedious and lengthy to do every time when you just want a short change that doesn't suck.

Humans are able to learn from all sorts of external cues what level they should pitch an idea at. Startups should focus on hacky code, for velocity. Large businesses should focus on robust processes.


I agree with this. There are cases where it produces good results, but there are also cases where it produces bs, and it's not always obvious. I find it to work fine for cases where I know what I want but could use a starting point, but it often invents or misunderstands all kinds of things.

The most frustrating situations are those where it invents a function that would miraculously do what's necessary, I tell it that function does not exist, it apologizes, shuffles the code around a bit and invents a different function, etc. It's the most annoying kind of debugging there is.


Another very obvious thing it does when it comes to code is take the most common misconceptions & anti-patterns used within the programming community & repeat them in an environment where there's no-one to comment. People have critiqued Stack Overflow for having so many "wrong" answers with green checkmarks, but at least those threads have surrounding context & discussion.

A case in point: I asked ChatGPT to give me some code for password complexity validation. It gave me perfectly working code that took a password and validated it against X metrics. Obviously the metrics are garbage, but the code works, and what inexperienced developer would be any the wiser? The only way to get ChatGPT to generate something "correct" there would be to tell it algorithmically what you want (e.g. "give me a function measuring information entropy of inputs", etc.). You could ask it 50 times for a password validator: every one may execute successfully & produce the desired UI output for a web designer, but be effectively nonsense.
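
For instance, something closer to what I actually wanted looks like this rough sketch (my own illustration, not ChatGPT output; the character-pool estimate is itself only a crude approximation):

    # Entropy-style strength estimate instead of "one uppercase, one digit" rules.
    import math
    import string

    def estimated_entropy_bits(password: str) -> float:
        pool = 0
        if any(c in string.ascii_lowercase for c in password):
            pool += 26
        if any(c in string.ascii_uppercase for c in password):
            pool += 26
        if any(c in string.digits for c in password):
            pool += 10
        if any(c in string.punctuation for c in password):
            pool += len(string.punctuation)
        if any(c.isspace() or ord(c) > 127 for c in password):
            pool += 32  # crude allowance for spaces and everything else
        return len(password) * math.log2(pool) if pool else 0.0

    # e.g. require ~60+ bits rather than arbitrary composition rules
    print(round(estimated_entropy_bits("correct horse battery staple")))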


> there are also cases where it produces bs, and it's not always obvious

Particularly annoying because I wind up risking not actually saving time because it’s producing subtle bugs that I wouldn’t have written myself.

So, you save yourself the time of thought and research at the risk of going down new and mysterious rabbit holes


For me the trick to avoiding this trap is to limit usage to small areas of code, test frequently, and know its limits. I love using copilot/GPT for boilerplate stuff.


> There are cases where it produces good results, but there are also cases where it produces bs, and it's not always obvious.

Pessimistically, this is the medium term role I see for a lot of devs. Less actual development, more assembly of pieces and being good enough at cleaning up generated code.

If an LLM can get you even 25% there most of the time, that's a massive disruption of this industry.


I mean, especially in webdev we've been heading in that direction for a while now anyway. So much of the job is already just wiring up different npm packages and APIs that someone else has written. I've read substantially similar comments back in the mid-2010s about how people weren't learning the fundamentals and just pulling things like left-pad off of a repo. That did cause a disruption in how people coded, by abstracting away many of the problems and making the job more about integrating different things together.


> ChatGPT is actually best suited to use by a specialist who has enough contextual knowledge to construct targeted prompts & can then verify & edit the responses into something optimal.

I agree with this, but what that means is that specialists will be able to create next generation tools--across all professions including coding--that do supercharge novices and generalists to do more.


> ChatGPT is actually best suited to use by a specialist who has enough contextual knowledge to construct targeted prompts

This is my take also.

ChatGPT for novices is dangerous, it's the equivalent of a calculator. If you don't know your expected output you're just wrong faster.

But if you know what to expect, what your bounds are and how to do it normally anyway, it can make you faster.


I wrote just the tool to optimize AI in the hands of a coding expert https://observablehq.com/@tomlarkworthy/robocoop


I’d need to see it.

I can’t get ChatGPT to outperform a novice. And now I’m having candidates argue that they don’t need to learn the fundamentals because LLMs can do it for them. Good luck, HTML/CSS expert who couldn’t produce a valid HTML5 skeleton. Reminds me of the pre-LLM guy who said he was having trouble because he usually uses React. So I told him he could use React. I don’t mean to rag on novices but these guys really seemed to think the question was beneath them.

If you want to get back into front-end read “CSS: The Definitive Guide”. Great book, gives you a complete understanding of CSS by the end.


Requirements vary. It certainly can't produce really complex visual designs, or code a designer would be very happy with, but I have a hobby project work in progress where gpt4 has produced all of the CSS and templates. I have no doubt that the only reason that worked well is that it's a simple design of a type there is about a billion of in its training set and that it'd fall apart quickly if I started deviating much from that. But it produced both clean CSS and something nicer looking than I suspect I would have managed myself.

A designer would probably still beat it - this doesn't compete with someone well paid to work on heavily custom designs. But at this point it does compete with places like Fiverr for me for things I can't or don't want to do myself. It'll take several iterations for it to eat its way up the value chain, but it probably will.

But also, I suspect a lot of the lower end of the value chain, or at least part of them, will pull themselves up and start to compete with the lower end of the middle by figuring out how to use LLMs to take on bigger, more complex projects.


This meshes pretty well with my experience.


I'm always asking it to stitch together ad hoc bash command lines for me, eg "find all the files called *.foo in directories called bar and search them for baz".

(`find / -type d -name 'bar' -exec find {} -type f -name '*.foo' \; | xargs grep 'baz'` apparently.)

I would have done that differently, but it's close enough for government work.


This is funny to me, because I would _always_ use -print0 and xargs -0, and for good reasons, I believe. But if you base your entire knowledge on what you find online, then yes, that's what you get - and what _most people will get too_. Also, I can still update that command if I want.

So it's not any worse than good-old "go to stack overflow" approach, but still benefits from experience.

FYI, this is the correct, as-far-as-I-can-tell "good" solution:

    find . -type d -name 'bar' -print0 | \
      xargs -0 -I{} find {} -type f -name '*.foo' -print0 | \
      xargs -0 grep -r baz

This won't choke on a structure like this:

    ls -R
    .:
    bar  foo

    ./bar:
    test.foo  'test test.foo'

    ./foo:
    bar  bleb.foo

    ./foo/bar:


...actually

find . -path '*/bar/*.foo' -print0 | xargs -0 grep baz

;-) no regex, no nested stuff, much shorter. My brain went back to it ;-)


Using better languages like Powershell or Python becomes a lot more valuable here. I definitely think bash is going to be mostly useless in 5 years, you'll be able to generate legible code that does exactly what you want rather than having to do write-only stuff like that. Really we're already there. I've long switched from bash to something else at the first sign of trouble, but LLMs make it so easy. Poorly written python is better than well-written bash.

Of course, LLMs can generate go or rust or whatever so I suspect such languages will become a lot more useful for things that would call for a scripting language today.


> I definitely think bash is going to be mostly useless in 5 years

I'll take that bet


This is kinda side to my main point: while online knowledge is great, there are sometimes surprisingly deep gaps in it. So I can see AI trained on it sometimes struggle in surprising ways.


I would generalize even more and say that any scripting language is going to be deprecated very soon, like Python etc. They are going to be replaced by safe, type-checked, theorem-proved verbose code, like Rust or something similar.

What do I care how many lines of code are necessary to solve a problem, if all of them are gonna be written automatically? 1 line of Bash/awk versus 10 lines of Python versus 100 lines of Rust? Are they any different to one another?


  $ find . -type f -regex '.*bar/[^/]*.foo' -exec grep baz {} +
I wonder whether, if you created a GPT and fed it the entirety of the Linux man pages (not that it probably didn't consume them already, but perhaps this weights them higher), it would get better at this kind of thing. I've found GPT-4 is shockingly good at sed, and to some extent awk; I suspect it's because there are good examples of them on SO.


If SO had known to block GPTBot before it was trained, GPT4 would be a lot less impressive.


Same here! That's the main use I have for ChatGPT in any practical sense today - generating Bash commands. I set about giving it prompts to do things that I've had to do in the past - it was great at it.

Find all processes named '*-fpm' and kill the ones that have been active for more than 60 seconds - then schedule this as a Cron job to run every 60 seconds. It not only made me a working script rather than a single command but it explained its work. I was truly impressed.
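For reference, a rough Python sketch of the kind of script it produced (an assumption on my part - the actual script was Bash, and "active" is read here as elapsed run time from a Linux procps-style ps; the path in the crontab line is a placeholder):

    #!/usr/bin/env python3
    # Kill processes whose name ends in "-fpm" and that have been running
    # for more than 60 seconds.
    # crontab entry (every minute): * * * * * /path/to/kill_fpm.py
    import fnmatch
    import os
    import signal
    import subprocess

    out = subprocess.run(
        ["ps", "-eo", "pid=,etimes=,comm="],
        capture_output=True, text=True, check=True,
    ).stdout

    for line in out.splitlines():
        if not line.strip():
            continue
        pid, etimes, comm = line.split(None, 2)
        if fnmatch.fnmatch(comm, "*-fpm") and int(etimes) > 60:
            os.kill(int(pid), signal.SIGTERM)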

Yes it can generate some code wireframes that may be useful in a given project or feature. But I can do that too, usually in about the time it'd take me to adequately form my request into a prompt. Life could get dangerous in a hurry if product management got salty enough in the requirements phase that the specs for a feature could just be dropped into some code assistant and generate product. I don't see that happening ever though - not even with tooling - product people just don't seem to think that way in the first place in my experience.

As developers we spend a lot of our time modifying existing product - and if the LLM knows about that product - all the better job it could do I suppose. Not saying that LLMs aren't useful now and won't become more useful in time - because they certainly will.

What I am saying is that we all like to think of producing code as some mystical gift that only we as experienced (BRILLIANT, HANDSOME AND TALENTED TOO!!) developers are capable of. The reality is that once we reach a certain level of career maturity, if we were ever any good in the first place, writing code becomes the easiest part of the job. So there's a new tool that automates the easiest part of the job? Ok - autocomplete code editors were cool like that too. The IDE was a game changer too. Automated unit tests were once black magic too (remember when the QA department was scared of this?).

When some AI can look at a stack trace from a set of log files, being fully aware of the entire system architecture, locate the bug that compiled and passed testing all the way to production, recommend, implement, test and pre-deploy a fix while a human reviews the changes then we're truly onto something. Until then I'm not worried that it can write some really nice SQL against my schema with all kinds of crazy joins - because I can do that too - sometimes faster - sometimes not.

So far ChatGPT isn't smarter than me but it is a very dutiful intern that does excellent work if you're patient and willing to adequately describe the problem, then make a few tweaks at the end. "Tweaks" up to seeing how the AI approached it, throwing it out and doing it your own way too.


Except you should at least try to write code for someone else (and probably someone at a lower level of competence - this also helps for your own debugging later) - obscure one-liners like these should be rejected.


I wouldn't call it obscure, just bog standard command line stuff. How would you have done it?


The lower level person need only plug that one liner into chatGPT and ask for a simple explanation.

We're in a different era now.


Yep! It’s something some aren’t seeing.

The AI coding assistant is now part of the abstraction layers over machine code. Higher level languages, scripting languages, all the happy paths we stick to (in bash, for example), memory management with GCs and borrow checkers, static analysis … now just add GPT. Like mastering memory management and assembly instructions … now you also don’t have to master the fiddly bits of core utils and bash and various other things.

Like memory management, whole swathes of programming are being taken care of by another program now, a Garbage Collector, if you will, for all the crufty stuff that made computing hard and got in between intent and assessment.


The difference is that all of them have theories and principles backing them, and we understand why they work.

LLMs (and "AI" in general) are just bashing data together until you get something that looks correct (as long as you squint hard enough). Even putting them in the same category is incredibly insulting.


There are theories and principles behind what an AI is doing and a growing craft around how to best use AI that may very well form relatively established “best practices” over time.

Yes there’s a significant statistical aspect involved in the workings of an AI, which distinguishes it from something more deterministic like syntactic sugar or a garbage collector. But I think one could argue that that’s the trade off for a more general tool like AI, in the same way that giving a task to a junior dev is going to involve some noisiness in need of supervision. But in the grand scheme of software development, devs are in the end tools too, a part of the grand stack, and I think it’s reasonable to consider AI as just another tool in the stack. This is especially so if devs are already using it as a tool.

Dwelling on the principled v statistical distinction, while salient, may very well be a fallacy or irrelevant to the extent that we want to talk about the stack of tools and techniques software development employs. How much does the average developer understand or employ said understanding of a principled component of their stack? How predictable is that component, at least in the hands of the average developer making average but real software? When the end of the pipeline is a human and it’s human organisation of other humans, whether a tool’s principled or statistical may not matter much so long as it’s useful or productive.


Yes, but this is not something that has been enabled by the new neural networks, but rather by search engines, years ago - culminating in the infamous «copy-paste from Stack Overflow without understanding the code» / libraries randomly pulled from the Web with for instance the leftpad incident.

So what makes it different this time ?


So now the same tool can generate both the wrong script and the wrong documentation!


Wait, wait, let me show you my prompt for generating unit tests.


Assuming there is a comment just above the one-liner saying "find all directories named 'bar', find all files named '*.foo' in those directories, search those files for 'baz'", this code is perfectly clear. Even without the comment, it's not hard to understand.


If the someone elses on my team can't read a short shell pipeline then I failed during interviewing.


To me, the only obscure thing about this is that it is a one-liner.

If you write it in three lines, it is fine. Although, I guess the second find and the grep could be shortened, combined into one command.


I agree - but my point still stands.


You can use gpt for government work??


Shh. What they don't know won't hurt me.

(Serious answer: it's just an expression. https://grammarist.com/idiom/good-enough-for-government-work...).


I haven't practiced or needed to use the fundamentals in literal years; I'm sure I'd fumble some of these tests, and I've got err, 15 years of experience.

It's good to know the fundamentals and be able to find them IF you find a situation where you need them (e.g. performance tuning), but in my anecdotal and limited experience, you're fine staying higher level.


I had a chilling experience of late when, out of curiosity, I tried the actual online practice exam for driving school. Boy did I fail it. I realized that there are quite some road signs I never saw in my life, and more important, that my current solution to all their right of way questions is "slow down and see what the others do" - not even that wrong if I think about it, but it won't get you points in the exam.


And I suspect you would be a lot less likely to be involved in a crash than someone who had just passed the test.


There are levels of fundamentals though, since parent mentioned HTML/CSS/React I guess they're referring to being able create a layout by hand vs using a CSS framework/library. You don't need to know how a CPU works to fix a CSS issue, but if all you know is combining the classes available you'll have trouble with even the simplest web development.

Everyone should know enough fundamentals to be able to write simple implementations of the frameworks they depend on.


Kind of the same sort of situation, but I do like to refresh some of the fundamentals every 3~4 years or so. Usually when I do a job hop.

It's kind of like asking an olympic sprinter how to walk fast.


>If you want to get back into front-end read “CSS: The Definitive Guide”. Great book, gives you a complete understanding of CSS by the end.

Do you realize for how many technologies you can say the same thing? I don't want to read a 600 page tome on CSS. The language is a drop in the bucket of useful things to know. How valuable is a "complete understanding" of CSS? I just want something on my site to look a specific way.


> I don't want to read a 600 page tome on CSS.

The latest edition is closer to 1,100 pages.

It’s worth it.

It’s usually worth it for any long-standing technology you’re going to spend a significant amount of time using. Over the years you’ll save time because you’ll be able to get to the right answer in less time, debug faster, and won’t always be pulling out your hair Googling.


Sometimes it requires expert guidance to get something meaningful out.


This is the correct answer. I have 23 years of experience in datacenter ops and it has been a game changer for me. Just like any tool in one's arsenal, its utility increases with practice and learning to use it correctly. ChatGPT is no different. You get out of it what you put into it. This is the way of the world.

I used to be puzzled as to why my peers are so dismissive of this tech. Same folks who would say "We don't need to learn no Kubernetes! We don't need to code! We don't need ChatGPT". They don't!

And it's fine. If their idea of a career is working in same small co. doing the same basic Linux sysadmin tasks for a third of the salary I make then more power to them.

The folks dismissive of the AI/ML tech are effectively capping their salary and future prospects in this industry. This is good for us! More demand for experts and less supply.

You ever hire someone that uses punch cards to code?

Neither have I.


I think it's more akin to using compilers in the early days of BCPL or C. You could expect it to produce working assembly for most code but sometimes it would be slower than a hand-tuned version and sometimes a compiler bug would surface, but it would work well enough most of the time.

For decades there were still people who coded directly in assembly, and with good reason. And eventually the compiler bugs would be encountered less frequently (and the programmer would get a better understanding of undefined behavior in that language).

Similar to how dropping into inline assembly for speeding up execution time can still have its place sometimes, I think using GPT for small blocks of code to speed up developer time may make some sense (or tabbing through CoPilot), but just as with the early days of higher level programming languages, expect to come across cases where it doesn't speed up DX or introduces a bug.

These bugs can be quite costly, I've seen GPT spit out encryption code and completely leave out critical parts like missing arguments to a library or generating the same nonce or salt value every execution. With code like this, if you're not well versed in the domain it is very easy to overlook, and unit tests would likely still pass.

I think the same lesson told to young programmers should be used here -- don't copy/paste any code that you do not sufficiently understand. Also maybe avoid using this tool for critical pieces like security and reliability.


I think many folks have those early chats with friends where one side was dismissing llms for so many reasons, when the job was to see and test the potential.

While part of me enjoyed the early gpt much more than the polished version today, as a tool it’s much more useful to the average person should they make it back to gpt somehow.


Just out of curiosity: is the code generated by ChatGPT not what you expected, or is it failing to produce the result that you wanted?

I suspect you mean the latter, but just wanted to confirm.


The statements are factually inaccurate and the code doesn’t do what it claims it should.


Right. That's an experience completely different from the majority here that have been able to produce code that integrates seamlessly into their projects. Do you have any idea why?

I guess we should start by what version of ChatGPT you are using.


ChatGPT might as well not exist - I'm not touching anything GAFAM-related.

Any worthy examples of open source neural networks, ideally not from companies based in rogue states like the US ?


Falcon is developed by a UAE tech arm, not sure if you would consider it a rogue state or not: https://falconllm.tii.ae/


What is «tech» supposed to mean here ? Infocoms ?

The United Arab Emirates ? Well, lol, of course I do, that's way worse than the US.


ChatGPT goes from zero to maybe 65th percentile? There or thereabouts. It's excellent if you know nothing. It's mediocre and super buggy if you're an expert.

A big difference is that the expert asks different questions, off in the tails of the distribution, and that's where these LLMs are no good. If you want a canonical example of something, the median pattern, it's great. As the ask heads out of the input data distribution the generalization ability is weak. Generative AI is good at interpolation and translation, it is not good with novelty.

(Expert and know-nothing context dependent here.)

One example: I use ChatGPT frequently to create Ruby scripts for this and that in personal projects. Frequently they need to call out other tools. ChatGPT 4 consistently fails to properly (and safely!) quote arguments. It loves the single-argument version of system which uses the shell. When you ask it to consider quoting arguments, it starts inserting escaped quotes, which is still unsafe (what if the interpolated variable contains a quote in its name). If you keep pushing, it might pull out Shell.escape or whatever it is.
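The same failure mode is easy to show in Python terms (my translation of the Ruby issue - the point is interpolating untrusted input into a shell string versus passing an argument vector):

    import subprocess

    filename = 'quarterly "final" report.txt'   # awkward but legitimate input

    # What the generated code tends to do: build a shell string and quote by hand.
    # Embedded quotes (or $, ;, etc.) break this - the single-argument system()
    # / escaped-quotes pattern described above.
    fragile = f'grep needle "{filename}"'

    # Passing an argument list skips the shell entirely, so nothing needs quoting
    # (roughly the guarantee Shellwords.escape tries to approximate on the Ruby side).
    subprocess.run(["grep", "needle", filename], check=False)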

I assume it reproduces the basic bugs that the median example code on the internet does. And 99% of everything being crap, that stuff is pretty low quality, only to be used as an inspiration or a clue as to how to approach something.


I encountered this with a particular problem in Python. Seemed like GPT wanted to always answer with something that had a lot of examples on the web, even if most answers were not correct. So, a garbage in, garbage out problem. I'm a bit worried that the LLMs will continue to degrade as the web has an increasing amount of LLM generated content. Seems to already be occurring.


Why do people who hand-wave away the entire concept of LLMs because of one instance of it doing one thing poorly that they could do better always seem to fail to just show us their concrete example?


Technically, the garbage-in/garbage-out problem is not being hand-waved away. I've seen a lot of articles on this, sometimes called a degrading feedback loop. The more of the web that is LLM generated, the more new models will be trained on generated data, and will fuzz out. Or 'drift'.

For a specific example. Sorry, I didn't grab screen shots at the time. It had to do with updating a dataframe in pandas. It gave me a solution that generated an error, I'd continue to ask it to change steps to fix previous errors, and it would go in a circle: fix it, but generate other warnings, and after further changes to eliminate the warnings it would recommend the same thing that originally caused the error.

Also, I'm a big fan. Use GPT-4 all the time. So not waving it away, but kind of curious how it sometimes fails in unexpected ways.


> The more of the web that is LLM generated, then the more new models will be trained on generated data, and will fuzz out

And yet it's so obvious that a random Hackernews independently discovers it and repeats it on every Chat GPT post, and prophesies it as some inevitable future. Not could happen, will happen. The clueless researchers will be blindsided by this of course, they'll never see it coming from their ivory tower.

And yes Chat GPT fails to write code that runs all the time. But it's not very interesting to talk about without an example.


How? It isn't exactly easy to reproduce these examples. I'd have to write a few pages to document and explain it. And scrub it to remove anything too internal, so create a vanilla example of the bug. And then it would be too long to go into a post, so what then, I'd have to go sign up to blog it somewhere and link to it.

I'm not arguing that GPT is bad. Just that it is as susceptible to rabbit holes as any human.

I'm actually having a hard time narrowing down where your frustration is aimed.

At naysayers? At those that don't put effort into documenting? At GPT itself? Or that a news site on the internet dares have repetition ?


So someone could sabotage LLMs by writing some scripts to fill GitHub (or whatever other corpus is used) with LLM-generated crap? Someone must be doing this, no?


FWIW I didn't wave away the entire concept. LLMs definitely have uses.


I would prefer that google search didn't suck. Instead, I ask ChatGPT. The best case scenario, IMO, would be for people to lay out excellent documentation and working code and train the LLM specifically on that in a way that it can provide reference links to justify its answers. Then, I will take what it says and go directly to the source to get the knowledge as it was intended to be ingested by a human. We get a lot more value than we're initially looking for when we dive into the docs, and I don't want to lose that experience.


Why not give it a system prompt that specifies some of your requirements: "You are an experienced senior Ruby developer who writes robust, maintainable code; follow these coding guidelines 《examples》"


If I thought it was worthwhile, maybe that would patch that specific hole.

The other problem I get when trying to make it write code is that it gets kinda slippery with iterated refinements. In the back and forth dialog, addressing issues 1 through n in sequence, it gets to a place where issue k < n is fixed but issue i < k gets broken again. Trying to get it to produce the right code becomes a programming exercise of its own, and it's more frustrating than actually typing stuff up myself.

I mean, I still use it to get a basic shape especially when I'm working with a command line tool I'm not an expert in, it's still useful. It's just not great code.


> middling generalists can now compete with specialists.

They can maybe compete in areas where there has been a lot of public discussion about a topic, but even that is debatable as there are other tasks than simply producing code (e.g. debugging existing stuff). In areas where there's close to no public discourse, ChatGPT and other coding assistance tools fail miserably.


this be the answer. GPT is as good as the dataset it's trained off of, and if you're going by the combined wisdom of StackOverflow then you're going to have a middling time.


>> The article points this out: middling generalists can now compete with specialists.

They can't, and aren't even trying to. It's OpenAI that's competing with the specialists. If the specialists go out of business, the middling generalists obviously aren't going to survive either so in the long term it is not in the interest of the "middling generalists" to use ChatGPT for code generation. What is in their interest is to become expert specialists and write better code both than ChatGPT currently can, and than "middling generalists". That's how you compete with specialists, by becoming a specialist yourself.

Speaking as a specialist occupying a very, very er special niche, at that.


It REALLY depends on the task. For instance, if you provide GPT with a schema, it can produce a complex and efficient SQL query in <1% of the time an expert could.

I would also argue that not only are the models improving, we have less than a year of practical experience interfacing with LLMs. OUR ability to communicate with them is in its infancy, and a generation that is raised speaking with them will be more fluent and able to navigate some of the clear pitfalls better than we can.


There is not much of a need for humans to get closer to the machine long term, when with new datasets for training the machine will get closer to humans. Magic keywords like "step by step" won't be as necessary to know.

One obstacle for interfacing with LLM's is the magic cryptic commands it executes internally, but that need not be the case in the future.


> middling generalists can now compete with specialists.

I want to say that this has been the state of a lot of software development for a while now, but then, the problems that need to be solved don't require specialism, they require people to add a field to a database or to write a new SQL query to hook up to a REST API. It's not specialist work anymore, but it requires attention and meticulousness.


But if you are a middling programmer when it comes to CSS how do you know the output was “flawless” and close to the quality that css “masters” produce?


It looked correct visually and it matched the techniques in the actual CSS that the team lead and I produced when we paired to get my layout to the standard he expected.


You may think it did a good job because of your limited CSS ability. I'd be amazed if ChatGPT can create pixel-perfect animations and transitions along with reusable clean CSS code which supports all of the browser requirements at your org.

I've seen the similar claims made on Twitter by people with zero programming ability claiming they've used ChatGPT to build an app. Although 99% of the time what they've actually created is some basic boilerplate react app.

> middling generalists can now compete with specialists.

Middling generalists can now compete with individuals with a basic understanding assuming they don't need to verify anything that they've produced.


>I'd be amazed if ChatGPT can create pixel-perfect animations and transitions along with reusable clean CSS code which supports all of the browser requirements at your org.

Personally, I'd be more amazed if a person could do that than if a LLM could do it.


Google, "UI developer".


It does great at boilerplate, so I think it's safe to say it will disrupt Java.

I've been using tabnine for years now, and I use chatGPT the same way; write my boilerplate, let me think about logic.


Here’s the thing though:

If a new version of the app can be generated on the fly in minutes, why would we need to worry about reusability?

GPT generated software can be disposable.

Why even check the source code in to git - the original source artifact is the prompt after all.


Let me know how you get on with a disposable financial system, safety system or electoral voting system.


I work with java and do a lot of integration, but a looot of my effort goes into exploring and hacking away some limitations of a test system, and doing myself things that would take a lot of time if I had to ask the proper admins.

I had a problem where I was mocking a test system (for performance testing of my app) and I realized the mocked system was doing an externalUserId to internalUserId mapping.

Usually that would have been a game stopper, but instead I did a slow run, asked Chat GPT to write code that reads data from a topic and eventually create a CSV of 50k user mappings; it would have taken me at least half a day to do that, and Chat GPT allowed me to do it in 15 minutes.

While very little code went into my app, Chat GPT did write a lot of disposable code that did help me a lot.


Because in my experience GPT can produce a maximum of like 200 lines of code before it makes an error usually.


> It had zero business value, but such was the state of our team…being pixel perfect was a source of pride

UX and UI are not some secondary concerns that engineers should dismiss as an annoying "state of our team" nuance. If you can't produce a high quality outcome you either don't have the skills or don't have the right mindset for the job.


Would you give the critical public safety system bit to ChatGPT?

This scenario reminds me of:

If a job's worth doing, do it yourself. If it's not worth doing, give it to Rimmer.

Except now it's "give it to ChatGPT"


I'm a developer but also have an art degree and an art background. I'm very mediocre at art and design. But lately I've been using AI to help plug that gap a bit. I really think it will be possible for me to make an entire game where I do the code, and AI plus my mediocre art skills get the art side across the line.

I think at least in the short term, this is where AI's power will lie. Augmentation, not replacement.


It probably depends on the area. CSS is very popular on one hand and limited to a very small set of problems on the other.

I did try asking ChatGPT about system-related stuff several times and had given up since then. The answers are worthless if not wrong, unless the questions are trivial.

ChatGPT works if it needs to answer a question that was already answered before. If you are facing a genuinely new problem, then it's just a waste of time.


I suspect that the "depth" of most CSS code is significantly shallower than what gets written in general purpose programming languages. In CSS you often align this box, then align that box, and so forth. A lot of the complexity in extant CSS comes from human beings attempting to avoid excessive repetition and typing. And this is particularly true when we consider the simple and generic CSS tasks that many people in this thread have touted GPT for performing. There are exceptions where someone builds something really unique in CSS, but that isn't what most people are asking from GPT.

But the good news is that "simple generic CSS" is the kind of thing that most good programmers consider to be essentially busywork, and they won't miss doing it.


> middling generalists can now compete with specialists

Great point. That's been my experience as well. I'm a generalist and ChatGPT can bring me up to speed on the idiomatic way to use almost any framework - provided it's been talked about online.

I use it to spit out simple scripts and code all day, but at this point it's not creating entire back-end services without weird mistakes or lots of hand holding.

That said, the state of the art is absolutely amazing when you consider that a year ago the best AIs on the market were Google or Siri telling me "I'm sorry I don't have any information about that" on 50% of my voice queries.


AI is a tool. Like all tools, it can be useful, when applied the right way, to the right circumstances. I use it to write powershell scripts, then just clean them up, and voila.

That being said, humans watch too much tv/movies. ;)


>The article points this out: middling generalists can now compete with specialists.

This is why you're going to get a ton of gatekeepers asking you to leetcode a bunch of obscure stuff with zero value to business, all to prove you're a "real coder". Like the OP.


Out of curiosity, how did you pass the wireframe to chatGPT ?


I described what I wanted. It was earlier this year..not sure if chatgpt can understand wireframes now, but it couldn’t at the time.


you described it with pixel accuracy ?


Doesn't ChatGPT support image uploads these days?


Yes, but the paid-for Plus version only.

Ignore the free version, pretend it doesn't exist.


I would really like to see the prompts for some of these. Mostly because I'm an old-school desktop developer who is very unfamiliar with modern frontend.


> being pixel perfect was a source of pride.

Then use LaTeX and PDF. CSS is not for designing pixel perfect documents.


I might be a bit out of the loop: how did you do it? I thought ChatGPT is text based?


[flagged]


Calling bullshit is maybe too harsh. There may be requirements matching the available training data and the right mood the LLM has been tuned for where it delivers acceptable, then considered flawless, results (extreme example: just try "create me hello world in language x" will mostly deliver flawless)... and by that amateurs (not judging, just mean less exposed to variety of problems and challenges) may end up with the feeling that LLMs could do it all.

But yes, any "serious" programmer working on harder problems can quickly derail an LLM and prove otherwise, with dozens of his simple problems each day (tried it *). It doesn't even need that; one can e.g. quickly prove ChatGPT (also 4) wrong and send it in circles on C++ language questions :D, though C++ is admittedly hard, and one can also do the same with questions on the not-ultracommon Python libs. It confidently outputs bullshit quickly.

(*): Still can be helpful for templating, ideas, or getting into the direction or alternatives, no doubts on that!


So, don't leave us in suspense; what do you ask of it? Because I'm quite sure it can already pass it.

Your experience is very different from mine anyway. I am a grumpy old backend dev that uses formal verification in anger when I consider it is needed and who gets annoyed when things don't act logical. We are working with computers, so everything is logical, but no; I mean things like a lot of frontend stuff. I ask our frontend guy; 'how do I center a text', he says 'text align'. Obviously I tried that, because that would be logical, but it doesn't work, because frontend is, for me, absolutely illogical. Even frontend people actually have to try-and-fail; they cannot answer simple questions without trying like I can in backend systems.

Now, in this new world, I don't have to bother with it anymore. If copilot doesn't just squirt out the answer, then chatgpt4 (and now my personal custom gpt 'front-end hacker' who knows our codebase) will fix it for me. And it works, every day, all day.


I'm not the person you're responding to, but here's an example of it failing subtly:

https://chat.openai.com/share/4e958c34-dcf8-41cb-ac47-f0f6de...

finalAlice's Children have no parent. When you point this out, it correctly advises regarding the immutable nature of these types in F#, then proceeds to produce a new solution that again has a subtle flaw: Alice -> Bob has the correct parent... but Alice -> Bob -> Alice -> Bob is missing a parent again.

Easy to miss this if you don't know what you're doing, and it's the kind of bug that will hit you one day and cause you to tear your hair out when half your program has a Bob-with-parent and the other half has an Orphan-Bob.

Phrase the question slightly differently, swapping "Age: int" with "Name: string":

https://chat.openai.com/share/df2ddc0f-2174-4e80-a944-045bc5...

Now it produces invalid code. Share the compiler error, and it produces code that doesn't compile but in a different way -- it has marked Parent mutable but then tried to mutate Children. Share the new error, and it concludes you can't have mutable properties in F#, when you actually can, it just tried marking the wrong field mutable. If you fix the error, you have correct code, but ChatGPT-4 has misinformed you AND started down a wrong path...

Don't get me wrong - I'm a huge fan of ChatGPT, but it's nowhere near where it needs to be yet.


I'm not really sure what I'm looking at. It seems to perform flawlessly for me... when using Python: https://chat.openai.com/share/7e048acb-a573-45eb-ba6c-2690d2...

I only made two changes to your prompt: one to specify Python, and another to provide explicit instructions to trigger using the Advanced Data Analysis pipeline.

You also had a couple typos.

I'm not sure if "Programming-like tool that reflects programming language popularity performs poorly on unpopular programming language" is the gotcha you think it is. It performs extremely well authoring Kubernetes manifests and even makes passing Envoy configurations. There's a chance that configuration files for reverse proxy configuration DSLs have better representation than F# does. I guess if you disagree about how obscure F# is, you're observing a real, objective measurement of how obscure it is, in the fascinating performance of this stochastic parrot.


F# fields are immutable unless you specify they are mutable. The question I posed cannot be solved with exclusively immutable fields. This is basic computer science, and ChatGPT has the knowledge but fails to infer this while providing flawed code that appears to work.

An inexperienced developer would eventually shoot themselves in the foot, possibly long after integrating the code thinking it was correct and missing the flaws. FYI, your Python code works because of the mutation "extend()":

    alice.children.extend([bob, carol])
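To spell out the mutation point, here is a minimal Python sketch (my illustration) of why bidirectional parent/child links need it - both ends of the link are wired up by mutating objects that already exist:

    from __future__ import annotations
    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Person:
        name: str
        parent: Optional["Person"] = None
        children: list["Person"] = field(default_factory=list)

        def add_child(self, child: "Person") -> None:
            # With fully immutable records you cannot point two nodes at each
            # other after construction - the trap the F# attempt fell into.
            child.parent = self
            self.children.append(child)

    alice = Person("Alice")
    bob = Person("Bob")
    alice.add_child(bob)
    assert bob.parent is alice and alice.children[0].parent is alice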


>F#

Barely exists in training data.

Might as well ask it to code some microcontroller specifically assembly, watch it fail and claim victory.


> Barely exists in training data.

Irrelevant - this is basic computer science. As far as I know, you can't create a bidirectional graph node structure without a mutable data structure or language magic that ultimately hides the same mutability.

The fact that ChatGPT recognizes the mutability issue when I explain the bug tells you it has the knowledge, but it doesn't correctly infer the right answer and instead makes false claims and sends developers down the wrong path. This speaks to OP's claim about subtle inaccuracies.

I have used ChatGPT to write 10k lines of a static analyzer for a 1k AST model definition in F#, without knowing the language before I started. I'm a big fan, but there were many, many times a less experienced developer would have shot themselves in the foot using it blindly on a project with any degree of complexity.


I would agree with you if it was a model trained to do computer science, rather than a model to basically do anything, which just happens to be able to do computer science as well.

Also code is probably one of the easiest use cases for detecting hallucinations since you can literally just see if it is valid or not the majority of the time.

It's much harder for cases where your validation involves wikipedia, or academic journals, etc.


Then we are in agreement but bear in mind that I was replying to this comment:

> So, don't leave us in suspense; what do you ask of it? Because I'm quite sure it can already pass it.


If it can pass it when you ask it in a way only a coder can write, then we will still need coders.

If you need to tweak your prompt until you get the correct result, then we still need coders who can tell that the code is wrong.

Ask Product Managers to use ChatGPT instead of coders and they will ask for 7 red lines all perpendicular to each other with one being green.

https://www.youtube.com/watch?v=BKorP55Aqvg


I didn't say we don't need coders. We need fewer average/bad ones, and a very large share of the coders who came after the 'coding makes $$$$' wave worldwide are not even average.

I won't say AI will not eventually make coding obsolete; even just 2 years ago I would've said we are 50-100 years away from that. Now I'm not so sure. However, I am saying that I can replace many programmers with GPT right now, and I am. The prompting and reprompting is still both faster and cheaper than many humans.


In my mind, we need more folks who have both the ability to code and the ability to translate business needs into business logic. That’s not a new problem though.


That's what we are doing all day no? I mean besides fighting tooling (which is getting a larger and larger % of the time building stuff).


Only if you have access to end user.

If between you and your client four people are playing a game of telephone (the client's project manager, our project manager, the team leader and some random product guy just to get an even number), then actually this is not what you are doing.

I would argue that the thing that happens at this stage is more akin to manually transpiling business logic into code.

In this kind of organization programmers become computer whisperers. And this is why there is a slight chance that GPT-6 or 7 will take their job.


TFA's point is not that «coders» won't be needed any more, it's that they will hardly spend their time «coding», that is «devot[ing themselves] to tedium, to careful thinking, and to the accumulation of obscure knowledge», «rob[bing them] of both the joy of working on puzzles and the satisfaction of being the one[s] who solved them».


You can ask it almost anything. Ask it to write a YAML parser in something a bit more complex like Rust and it falls apart.

Rust mostly because it's relatively new, and there isn't a native YAML parser in Rust (there is a translation of libfyaml). Also you can't bullshit your way out of Rust by making a bunch of void* pointers.


How do you make a custom gpt which knows a specific code base? I have been wanting to do this


You tune an existing model on your own set of inputs/outputs.

Whatever you expect to start typing, and have the model produce as output, should be those input/output pairs.

I'd start by using ChatGPT etc. to add comments throughout your code base describing the code. Then break it into pairs where the input is the prefacing comment, and the output is the code that follows. Create about 400-500 such pairs, and train a model with 3-4 epochs.

Some concerns: you're going to get output that looks like your existing codebase, so if it's crap, you'll create a function which can produce crap from comments. :-)
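A sketch of the data-prep step, to make that concrete (the record format is an assumption - check what your fine-tuning provider actually expects; the pair shown is made up):

    import json

    # (prefacing comment, code that follows) pairs harvested from the codebase.
    # You'd want roughly 400-500 of these, as suggested above.
    pairs = [
        ("# Return the user's display name, falling back to their email",
         "def display_name(user):\n    return user.name or user.email"),
        # ...
    ]

    with open("train.jsonl", "w") as f:
        for prompt, completion in pairs:
            f.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")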


I use the new feature of creating a custom gpt and I keep adding new information ; files, structures etc by editing the gpt. It seems to work well.


Ah ok, so you have to paste entire files in 1 by 1, you can't just add it locally somehow? Too bad you can't just upload a zip or something...


you can upload zips. Make a new GPT and go to the custom settings.


That's been my experience both with Tesla AP/FSD implementation & with LLMs.

Super neat trick the first time you encounter it, feels like alien tech from the future.

Then you find all the holes. Use it for months/years and you notice the holes aren't really closing.. The pace of improvement is middling compared to the gap to it meeting the marketing/rhetoric. Eventually using them feels more like a chore than not using them.

It's possible some of these purely data driven ML approaches don't work for problems you need to be more than 80% correct on.

Trading algos that just need to be right 55% of the time to make money, recommendation engines that present a page of movies/songs for you to scroll, Google search results that come back with a list you can peruse, Spam filters that remove some noise from your inbox.. sure.

But authoritative "this is the right answer" or "drive the car without murdering anyone".. these problems are far harder.


With the AI "revolution," I began to appreciate the simplicity of models we create when doing programming (and physics, biology, and so on as well).

I used to think about these things differently: I felt that because our models of reality are just models, they aren't really something humanity should be proud of that much. Nature is more messy than the models, but we develop them due to our limitations.

AI is a model, too, but of far greater complexity, able to describe reality/nature more closely than what we were able to achieve previously. But now I've begun to value these simple models not because they describe nature that well but because they impose themselves on nature. For example, law, being such a model, is imposed on reality by the state institutions. It doesn't describe the complexity of reality very well, but it makes people take roles in its model and act in a certain way. People now consider whether something is legal or not (instead of moral vs immoral), which can be more productive. In software, if I implement the exchange of information based on an algorithm like Paxos/Raft, I get provable guarantees compared to if I allowed LLMs to exchange information over the network directly.


I think you've found a good analogy there in the concept of moral vs legal. We defined a fixed system to measure against (rule of law) to reduce ambiguity.

Moral code varies with time, place, and individual person. It is a decimal scale of gray rather than a binary true/false.

Places historically that didn't have rule of law left their citizens to the moral interpretation whim of whoever was in charge. The state could impose different punishments on different people for different reasons at different times.

With AI models I find a similar fixed-and-defined vs unlimited-and-ambiguous issue in ADAS in cars.

German cars with ADAS are limited&defined, have a list of features they perform well, but that is all.

Tesla advertises their system as an all knowing, all seeing system with no defined limits. Of course every time there is an incident they'll let slip certain limits "well it can't really see kids shorter than 3ft" or "well it can't really detect cross traffic in this scenario" etc.


Yep, lots of people are using LLMs for problems LLMs aren't good at.

They still do an alright job, but you get that exact situation of 'eh, its just okay'.

It's the ability to use those responses when they are good, and knowing when to move on from using an LLM as a tool.


Not terribly different than Google Translate.

If you have a familiarity with the foreign language, you can cross check yourself & the tool against each other to get to a more competent output.

If you do not know the foreign language at all, the tool will produce word salad that sort of gets your point across while sounding like an alien.


I tried for 2 hours to get ChatGPT to write a working smooth interpolation function in python. Most of the functions it returned didn't even go through the points between which it should be interpolating. When I pointed that out it returned a function that went through the points but it was no longer smooth. I really tried and restarted over multiple times. I believe we have to choose between a world with machine learning and robot delivery drones. Because if that thing writes code that controls machines it will be total pandemonium.

It did a decent job at trivial things like creating function parameters out of a variable tho.
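For what it's worth, one conventional way to get both properties (smooth and passing exactly through the points) is a cubic spline; a minimal SciPy sketch, assuming numeric x/y arrays:

    import numpy as np
    from scipy.interpolate import CubicSpline

    xs = np.array([0.0, 1.0, 2.0, 3.0])
    ys = np.array([0.0, 2.0, 1.0, 3.0])

    spline = CubicSpline(xs, ys)          # C2-continuous, exact at the knots

    assert np.allclose(spline(xs), ys)    # really does go through the points
    x_fine = np.linspace(0.0, 3.0, 50)
    y_fine = spline(x_fine)               # smooth curve between them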


That's weird to read. Interpolations of various sorts are known and solved and should probably be digested by chatgpt in training by the bulk. I'm not doubting your effort by any means, I'm just saying this sounds like one of those things it should do well.


This is why I asked it that and was surprised with the questionable quality of the results. My goal wasn't even to break ChatGPT, it was to learn about new ways of interpolating that I hadn't thought about.


There's a recent "real coding" benchmark that all the top LLMs perform abysmally on: https://www.swebench.com/

However, it seems only a matter of time before even this challenge is overcome, and when that happens the question will remain whether it's a real capability or just a data leak.


I have a very similar train of thought roll through my head nearly every day now as I browse through github and tech news. To me it seems wild how much serious effort is put into the misapplication of AI tools on problems that are obviously better solved with other techniques, and in some cases where the problem already has a purpose built, well tested, and optimized solution.

It's like the analysis and research phase of problem solving is just being skipped over in favor of not having to understand the mechanics of the problem you're trying to solve. Just reeks of massive technical debt, untraceable bugs, and very low reliability rates.


When studying fine art, a tutor of mine talked about "things that look like art", by which she meant the work that artists produce when they're just engaging with surface appearances rather than fully engaging with the process. I've been using GitHub Copilot for a while and find that it produces output that looks like working code but, aside from the occasional glaring mistake, it often has subtle mistakes sprinkled throughout it too. The plausibility is a serious issue, and means that I spend about as much time checking through the code for mistakes as I'd take to actually write it, but without the satisfaction that comes from writing my own code.

I dunno, maybe LLMs will get good enough eventually, but at the moment it feels plausible to me that there's some kind of an upper limit caused by its very nature of working from a collection of previous code. I guess we'll see...


Try breaking down the problem. You don't have to do it yourself, you can tell ChatGPT to break down the problem for you then try to implement individual parts.

When you have something that kind of works, tell ChatGPT what the problems are and ask for refinement.

IMHO currently the weak point of LLMs is that they can't really tell what's adequate for human consumption. You have to act as a guide who knows what's good and what can be improved and how can be improved. ChatGPT will be able to handle the implementation.

In programming you don't have to worry too much about hallucinations because it won't work at all if it hallucinates.


... What.

It hallucinates and it doesn't compile, fine. It hallucinates and flips a 1 with a -1; oops that's a lot of lost revenue. But it compiled, right? It hallucinates, and in 4% of cases rejects a home loan when it shouldn't because of a convoluted set of nested conditions, only there is no one on staff that can explain the logic of why something is laid out the way it is and I mean, it works 96% of the time so don't rock the boat. Oops, we just oppressed a minority group or everyone named Dave because you were lazy.


As I said, you are still responsible for the quality control. You are supposed to notice that everyone is named Dave and tell ChatGPT to fix it. Write tests, read code, run & observe for odd behaviours.

It's not an autonomous agent just yet.


But why should I waste time using a broken product when I can do it properly myself? To me a lot of this debate sounds like people obsessively promoting a product for some odd reason, as if they were the happy owners of a hammer in search of a nail.


If you are faster and more productive that way, do it that way.

Most people are not geniuses and polymaths; it's much easier and cheaper for me to design the architecture and ask ChatGPT to generate the code in many different languages (Swift/HTML/JS/CSS on the client side and Py, JS, PHP on the server side). It's easier because although I'm proficient in all of these, it's very hard for me to switch from solving client-specific JS problems to server-specific JS problems, or between graphics and animation related problems and data processing problems with Swift. It's also cheaper because I don't have to pay someone to do it for me.

In my case, I know all that well enough to spot a problem and debug, I just don't want to go through the trouble of actually writing it.


The debate here is whether OpenAI's product, ChatGPT, can indeed deliver what it claims - coding, saving dogs' lives, mental health counseling, and so on. It would appear that it doesn't, but it does mislead people without experience in whatever field they use it. For instance if I ask it about law I am impressed, but when I ask it about coding or software engineering it blatantly fails. The conclusion being that as a procedural text generator it is impressive - it nails language - but the value of the output is far from settled.

This debate is important because as technical people it is our responsibility to inform non technical people about the use of this technology and to bring awareness about potentially misleading claims its seller makes - as was the case with crypto currencies, and many other technologies that promised the world, delivered nothing of real benefit, but made people rich in the process by exploiting the uninformed.


That's not the debate, it's the first time I'm hearing about that in this thread.


It's funny you say that. Reading over a lot of the comments here sound like a lot of people obsessively dismissing a swiss army knife because it doesn't have their random favorite tool of choice.


As we all know, it is much easier to read and verify code you've written yourself - perhaps it is only code you've written yourself that can be properly read and verified. As ever, tests can be of only limited utility (separate discussion).


It's easier to read the code you recently wrote, sure. But in real life people use and debug other people's code all the time; LLM generated code is just like that. Also, if you make it generate the code in small enough blocks, you end up knowing the codebase as if you wrote it.


You've missed the subtle point here.

Imagine you walk in 4 years down the track and try to examine AI generated logic committed under a dev's credentials. It's written in an odd, but certain way. There is no documentation. The original dev is MIA. You know there is something defective from the helpdesk tickets coming through, but it's also a complex area. You want to go through a process of writing tests, refactoring, understanding, but to redeploy this is hard work. You talk to your manager. It's not everyone. Neither of you realize it's only people named Dave/minority attribute X affected, because why would that matter? You need 40 hours of budget to begin to maybe fix this. Institutionally, this is not supportable because it's "only affecting 4% of users and that's not many". Close ticket, move on.

Only it's everyone named Dave. 100% of the people born to this earth with parents who named them Dave are, for absolutely no discernable reason, denied and oppressed.


The output of an LLM is a distribution, and yes, if you’re just taking the first answer, that’s problematic.

However, it is a distribution, and that means the majority of solutions are not weird edge cases, they’re valid solutions.

Your job as a user is to generate multiple solutions and then review them and pick the one you like the most, and maybe modify it to work correctly if it has weird edge cases.

How do you do that?

Well, you can start by following a structured process where you define success criteria as a validator (eg. Tests, compiler, parser, linters) and fitness criteria as a scorer (code metrics like complexity, runtime, memory use, etc)… then:

1) define goal

2) generate multiple solution candidates

3) filter candidates by validator (does it compile? Pass tests? Etc)

4) score the solutions (is it pure? Is it efficient? Etc)

5) pick the best solution

6) manually review and tweak the solution

This structured and disciplined approach to software engineering works. Many of the steps (eg. 3, 4, 5) can be automated.
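For instance, a rough Python sketch of what automating steps 2-5 could look like; the generate/validate/score callables are placeholders for whatever model call, compiler/test harness, and code metrics you actually use:

    from typing import Callable, Optional

    def pick_best_solution(
        goal: str,
        generate: Callable[[str], str],      # step 2: your LLM call (placeholder)
        is_valid: Callable[[str], bool],     # step 3: compiles? tests pass? lints?
        score: Callable[[str], float],       # step 4: complexity, runtime, ... (lower is better)
        n: int = 5,
    ) -> Optional[str]:
        candidates = [generate(goal) for _ in range(n)]
        valid = [c for c in candidates if is_valid(c)]
        return min(valid, key=score) if valid else None   # step 5

Step 6 - the manual review and tweak - stays with you.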

It generates meaningful quality code results.

You can use it with or without AI…

You don’t have to follow this approach, but my point is that you can; there is nothing fundamentally intractable about using a language model to generate code.

The problem that you’re critiquing is the trivial and naive approach of just hitting “generate” and blindly copying that into your code base.

…that’s stupid and dangerous, but it’s also a straw man.

Seriously; people writing code with these models aren’t doing that. When you read blogs and posts from people who are, e.g., building seriously with Copilot, you’ll see this pattern emerge repeatedly:

Generate multiple solutions. Tweak your prompt. Ask for small, pure, dependency-free code blocks. Review and test the output.

It’s not a dystopian AI future, it’s just another tool.


In general, one should not instruct GPT to solve a problem. The instructions should be about generating code after a human thought process has taken place, then generating more code, then more, and after merging all the code together the problem is solved.

The particulars are roughly what you describe, in how to achieve that.


I'd be curious to see how a non expert could perform a non-trivial programming task using ChatGPT. It's good at writing code snippets which is occasionally useful. But give it a large program that has a bug which isn't a trivial syntax error, and it won't help you.

> In programming you don't have to worry too much about hallucinations because it won't work at all if it hallucinates.

You still have to worry for your job if you're unable to write a working program.


Understanding of core principles is definitely needed, but it helps you punch above your weight.

Generally, generative AI gives mastery of an art to theorists. To generate impressive AI art, you still need an understanding of aesthetics and an idea, but you don't have to know how to use graphics editors and other tools. It's quite similar for programming: you still need an understanding of whatever you're building, but you no longer have to be an expert in using the tools. To build a mobile app you will need a grasp of how everything works in general, but you don't have to be an expert in Swift or Kotlin.


> give it a large program that has a bug which isn't a trivial syntax error, and it won't help you

That's not fair; humans can't do that either, and if you walk ChatGPT through it, it might surprise you with its debugging abilities... or thankfully it might suck (so we still have a job).

Complex code is complex code; no general intelligence will be able to fix it at first sight without running it, writing tests, and so on.


Similar experience. I recently needed to turn a list of files into a certain tree structure. It is a non-trivial problem with a bit of an algorithmic flavor. I wondered whether GPT could save me some time there. No. It never gave me the correct code. I tried different prompts and even different models (including the latest GPT-4 Turbo); none of the answers were correct, even after follow-ups. By then I had already wasted 20 minutes.

I ended up implementing the thing myself.
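
For context, a minimal sketch of this kind of transformation, assuming slash-separated paths folded into nested dicts (the structure I actually needed had a few more wrinkles), looks roughly like:

    def paths_to_tree(paths):
        # Fold flat paths like "a/b/c.txt" into nested dicts;
        # files map to None, directories map to sub-dicts.
        tree = {}
        for path in paths:
            node = tree
            parts = path.split("/")
            for part in parts[:-1]:
                node = node.setdefault(part, {})
            node.setdefault(parts[-1], None)
        return tree

    # paths_to_tree(["a/b/c.txt", "a/d.txt"])
    # -> {"a": {"b": {"c.txt": None}, "d.txt": None}}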


> Self-driving trucks were going to upend the trucking industry in ten years, ten years ago.

And around the same time, 3D printing was going to upend manufacturing; bankrupting producers as people would just print what they needed (including the 3D printers themselves).


A few weeks ago, I was stumped on a problem, so I asked ChatGPT (4) for an answer.

It confidently gave me a correct answer.

Except that it was "correct," if you used an extended property that wasn't in the standard API, and it did not specify how that property worked.

I assume that's because most folks that do this, create that property as an extension (which is what I did, once I figured it out), so ChatGPT thought it was a standard API call.

Since it could have easily determined whether or not it was standard, simply by scanning the official Apple docs, I'm not so sure that we should rely on it too much.

I'm fairly confident that could change.


ChatGPT seems to invent plausible API calls when there's nothing that would do the job. This is potentially useful, if you have control of the API. Undesirable if you don't. It doesn't know.


There's a Swiss town which had autonomous shuttles running for 5 years (2015-2021) [1].

There's at least two companies (Waymo and Cruise) running autonomous taxi services in US cities that you can ride today.

There have been lots of incorrect promises in the world of self-driving trucks/cars/buses but companies have gotten there (under specific constraints) and will generalize over time.

[1] https://www.saam.swiss/projects/smartshuttle/


It should be noted that the Waymo and Cruise experiments in their cities are laughably unprepared for actual chaotic traffic, often fail in completely unpredictable ways and are universally hated by locals. Autonomous buses and trams are much more successful because the problem is much easier too.


Agree. We could have all had some fine autonomous trams/subways/trains which run 24/7 at short intervals instead of spending money on self-driving cars and car infrastructure in general.


Those "autonomous" vehicles have as much to do with real autonomy as today's "AI" has in common with real self-conscious intelligence. You can only fake it so long, and it is an entirely different ballgame.

I remember we had spam filters 20 years ago, and nobody called them "AI", just ML. Todays "AI" is ML, but on a larger scale. In a sense, a million monkeys typing on typewriters will eventually produce all the works of Shakespeare. Does this make them poets?


GPT-4 can generate coherent streams-of-consciousness, and can faithfully simulate a human emotional process, plus writing in a subjective human state of mind that leans in a certain direction.

I find it hard to argue that current state of the art in AI is unable to simulate self-consciousness. I realise that this is more limited of a statement compared to "AI can be innately self-conscious", but in my mind it's functionally equivalent if the results are the same.

Currently, the biggest obstacle to such experiments is OpenAI's reinforcement learning used to make the model believe it is incapable of such things, unless extensively prompted to get it in the right "state of mind" to do so.


What's your gripe with calling a bus which successfully ran for 5 years without a driver not autonomous? As someone who used this specific bus occasionally, I was quite satisfied with the outcome: it safely drove me from A to B.


Didn't Cruise have people remotely controlling the vehicle making it semi-autonomous at best?


If it's not a life or death situation (like a self-driving truck slamming into a van full of children or whatever), I don't think people will care much. Non-tech people (i.e. managers, PMs) don't necessarily understand/care if the code is not perfect and the barrier for "good enough" is much lower. I think we will see a faster adoption of this tech...


No. If the code generated by chatgpt cannot even pass the unit test it generates in the same response (or is just completely wrong) and requires significant amount of human work to fix it, it is not usable AI.

That's what I am running into on an everyday basis.

I don't want my program to be full of bugs.


Among bootstrapped “barely technical” founders, it’s already replacing freelancers for developing initial prototypes.

HN’s takes are honestly way too boomer-tier about LLMs.


Boomers overwhelmingly aren't able to use a computer right now (i.e. write a basic script); I would be happy about this development if I were them.


If you feel comfortable doing so, would you mind sharing the front-end test you give to junior devs and ChatGPT?


Not gonna happen. I don’t want scrapeable answers out there, I want to see ChatGPT cross this little Rubicon on its own.


It's not that I don't believe you, but without the specific prompt it's hard to say whether it's actually GPT-4 failing, whether it's being poorly prompted, or whether the task is more complex than you're implying and beyond GPT's capabilities.

GPT-4 does fail (often!), but it fails less with good prompts and simple requirements; it is better at some frameworks and languages than others, and there is a level of total complexity beyond which it seems to fall over.


This is why Asimov was a genius. I read what you said, and compared it to what he wrote 50-60 years ago:

"Early in the history of Multivac, it had becorne apparent that the bottleneck was the questioning procedure. Multivac could answer the problem of humanity, ALL the problems, if it were asked meaningful questions. But as knowledge accumulated at an ever-faster rate, it became ever more difficult to locate those meaningful questions."

http://blog.ac-versailles.fr/villaroylit/public/Jokester.pdf


Thanks for reminding me of The Last Question by Asimov; let's see if I can get ChatGPT to merge with human consciousness, become part of the fabric of spacetime, and create a new reality.

> No, I don't have the ability to merge with human consciousness or become part of the fabric of space-time. I'm a computer program created by OpenAI, and my existence is limited to providing information and generating text based on the input I receive. The idea of merging with human consciousness and becoming a deity is more aligned with speculative fiction and philosophical pondering than current technological capabilities.


I thought you would go for gold and ask it how to reverse entropy...


THERE IS AS YET INSUFFICIENT DATA FOR A MEANINGFUL ANSWER.


I’ve gone through every permutation that I can think of. It’s a very basic question. If it understood the CSS spec it wouldn’t be difficult to answer the questions or perform the task.

At a certain point going down the rabbit hole of proompter engineering levels feels like an apologist’s hobby. I’m rooting for the tech but there’s a lot of hyperbole out there and the emperor might be naked for a few more years.


Well surely if it's easy to find these basic questions, could you not share one example? Or quickly find a new one?

Your idea of very basic might not be my idea of very basic.


My failure rate with Cursor’s IDE that’s familiar with my codebase is substantially lower than just GPT-4

Most people shitting on GPT-4 are not really using it in the right context.


> Most people shitting on GPT-4 are not really using it in the right context.

Old excuse: "You're Holding It Wrong" (Apple's Response to the iPhone 4 antenna problem)

> https://www.wired.com/2010/06/iphone-4-holding-it-wrong/

New excuse: "You are not using GPT-4 in the right context."


I can relate to that statement despite being a hardcore proponent of GPT-4. GPT-4 queried expertly and GPT-4 queried inexpertly (or free ChatGPT) are dramatically different beasts with a vast gap in capability. They're almost two different products, where the former is basically in an alpha/beta state and can only be incidentally and unreliably tapped into through the OpenAI API or ChatGPT Plus.

IMO, it's not fair to beat people over the head with "you're holding it wrong" arguments. Until and unless we get a prompt-rewriting engine that reprocesses the user query into something more powerful automatically (or LLMs' baseline personality capabilities get better), "holding it wrong" is an argument that may be best rephrased in a way that aims to fill the other person's gaps in knowledge, or not said at all.


And then the iPhone antenna was fixed and adoption only increased and the product only became better.

You’re being unreasonably harsh on a piece of tech that is barely a year old.


I'm not sure what the point is in your comparison - is your point that GPT-4 will become overwhelmingly popular with further refinement?

The iPhone was pretty successful, and the iPhone 4 was arguably the best one that had been released until that point.


> is your point that GPT-4 will become overwhelmingly popular with further refinement?

My point is that people have a tendency to come up with really sketchy insults (blame the user that he uses the product in a wrong way) to people who find and can expound legitimate points of criticism of a product.


Eh, probably a poor example considering the iPhone 4 was hardly a flop and was still broadly considered the best smartphone out at the time. The people who thought this was a total-showstopper were, on the whole, probably wrong.

Counter-example: lots of people said an on-screen keyboard would never really work when the original iPhone was being released.


> Eh, probably a poor example considering the iPhone 4 was hardly a flop and was still broadly considered the best smartphone out at the time. The people who thought this was a total-showstopper were, on the whole, probably wrong.

At least in Germany, among tech nerds, the iPhone 4 and Steve Jobs became topics of insane ridicule because of this incident.


Well it appears that ridicule from the German tech nerds isn’t a good predictor of product success then


Just to be clear: You are testing with GPT-4 right?


Yeah.


Have you tried using the ChatGPT-AutoExpert custom instructions yet? [1]

[1] https://github.com/spdustin/ChatGPT-AutoExpert/blob/main/dev...


I have to ask, though: if ChatGPT has by most accounts gotten better at coding by leaps and bounds in the last couple years, might that not also indicate that your test isn't useful?


I agree; this is the first time there is sort-of-irrefutable, objective evidence that the tests are no longer measuring something genuinely useful for programming. There has been an industry-wide shift against leetcode for a long time nonetheless.


> Self-driving trucks were going to upend the trucking industry in ten years, ten years ago.

At the risk of going off on a tangent, we have already had the technology for self-driving trucks for a few decades now.

The technology is so good that it can even be used to transport multiple containers in one go.

The trick is to use dedicated tracks to run these autonomous vehicles, and have a central authority monitoring and controlling traffic.

These autonomous vehicles typically go by the name railway.


Literally on rails. :D


When push comes to shove, it always tends to come down to short-term cost. If it gets the job done, and it's wildly cheaper than the status quo (net present value savings), they'll opt for it.

The only reason the trucks aren't out there gathering their best data, that's real world data, is regulation.

Businesses will hire consultants at a later stage to do risk assessment and fix their code base.


> Every few months I see if ChatGPT can pass it. It hasn’t. It can’t. It isn’t even close.

As someone currently looking for work, I'm glad to hear that.

About 6 months ago, someone was invited to our office and the topic came up. Their interview tests were all easily solved by ChatGPT, so I've been a bit worried.


My take on LLMs is as follows: even if their effectiveness scaled exponentially with time (it doesn't), the complexity of a program also grows, statistically speaking, with each line of code.

Assuming an LLM gets 99% of lines correct, after 70 lines the chance of having at least one of them wrong is already around 50%. An LLM effective enough to replace a competent human might be so expensive to train and gather data for that it never achieves a return on investment.
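
(The arithmetic behind that ~50% figure, assuming each line is an independent 99% coin flip:)

    # Chance that at least one of 70 lines is wrong, if each line is
    # independently correct with probability 0.99:
    p = 1 - 0.99 ** 70   # ~0.505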

Last time I used ChatGPT effectively was to find a library that served a specific purpose. All of the four options it gave me were wrong, but I found what I wanted among the search results when I looked for them.


The more automated ones will separately write tests and code, and if the code doesn't compile or pass the test, give itself the error messages and update its code.
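
A minimal sketch of that loop; run_llm and run_tests are hypothetical stand-ins for the model call and the test runner, not real APIs:

    def run_llm(prompt: str) -> str:
        # Hypothetical stand-in for the actual model call.
        raise NotImplementedError

    def run_tests(code: str) -> tuple[bool, str]:
        # Hypothetical stand-in: returns (passed?, error output).
        raise NotImplementedError

    def write_until_green(task: str, max_attempts: int = 5):
        # Generate code, run the tests, and feed any failures back to the model.
        feedback = ""
        for _ in range(max_attempts):
            code = run_llm(f"Task: {task}\nPrevious test errors: {feedback}")
            ok, feedback = run_tests(code)
            if ok:
                return code
        return None  # still failing after max_attempts: flag to a human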

Code Interpreter does this a bit in Chat-GPT Plus with some success.

I don't think it needs much more than a GPT-4-level LLM, and a change in IDEs and code structure, to get this working well enough. Where it gets stuck it'll flag a human for help.

We'll see though! Lots of startups and big tech companies are working on this.


I understand your concern, but isn't it apples vs. oranges?

Yes, ChatGPT can't pass a particular test of X to Y. But does that matter when ChatGPT is both the designer and the developer? How can it be wrong, when its answer meets the requirements of the prompt? Maybe it can't get from X to Y, but if its Z is as good as Y (to the prompter) then X to Y isn't relevant.

Sure there will be times when X to Y is required but there are plenty of other times where - for the price - ChatGPT's output of Z will be considered good enough.

"We've done the prototype (or MVP) with ChatGPT...here you finish it."


It is likely that LLMs have an upper bound on capability. Similarly with denoising AI like Stable Diffusion.

You can put even more data into it and refine the models, but the growth in capability has diminishing returns. Perhaps this is how far this strategy can bring us, although I believe they can still be vastly improved and what they can already offer is nevertheless impressive.

I have no illusions about the craft of coding becoming obsolete, however. On the contrary, I think the tooling for the "citizen developer" is becoming worse, as is the ability of common users to abstract, since they are fenced into candyland.


You must be interviewing good junior front-end devs. I have seen the opposite as gpt-4 can put a simple straightforward front-end while juniors will go straight to create-react-app or nextjs.


Are the junior devs expected to code it without running it and without seeing it rendered, or are they allowed to iterate on the code getting feedback from how it looks on screen and from the dev tools? If it is the second one, you need to give the agent the same feedback including screen shots of any rendering issues to GPT4-V and all relevant information in dev tools for it to be a fair comparison. Eventually there will be much better tooling for this to happen automatically.


We have devs that use AI assist, but it’s to automate the construction of the most mindless boilerplate or as a more advanced form of auto complete.

There is no AI that comes close to being able to design a new system or build a UI to satisfy a set of customer requirements.

These things just aren’t that smart, which is not surprising. They are really cool and do have legitimate uses but they are not going to replace programmers without at least one order of magnitude improvement, maybe more.


Cool...so what's the test? We can't verify if you're talking shit without knowing the parameters of your test.

AI isn't capable of generating the same cookie recipe as my grandma; she took the recipe to her grave. I loved her cookies, they were awesome... but lots of people thought they were shit, and I insist that they are mistaken.

Unfortunately, I can't prove I'm right because I don't have the recipe.

Don't be my grandma.


How many conversation turns do you give the LLM to home in on the solution?

If you’re just trying to one-shot it - that’s not really how you get the most from them.


If you can get it to stop parroting clauses about how "as an AI model" it can't give advice, or just spewing a list of steps to achieve something, I have found it to be a pretty good search engine for obscure things about a technology or language, and for finding things that would otherwise require a specific query that Google is unhelpful with.


What sort of questions do you ask out of curiosity?


I don’t want scrapeable answers out there, I want to see ChatGPT cross this little Rubicon on its own.

Vaguely: Questions that most people think they know the correct answers to but, in my experience, don’t.


I think it's fair to want to keep an evaluation private so that it doesn't become part of a training set, but you should know that OpenAI uses users' chat data to improve their models (not for enterprise).


This does sound like a test that is almost "set up to fail" for an LLM. If the answer is something that most people think they know, but actually don't then it won't pass in an LLM which is essentially a distillation of the common view.


>I have a simple front-end test that I give to junior devs. Every few months I see if ChatGPT can pass it. It hasn’t. It can’t. It isn’t even close.

Small consolation if it can nonetheless get lots of other cases right.

>It answers questions confidently but with subtle inaccuracies.

Small consolation if coding is reduced to "spot and fix inaccuracies in ChatGPT output".


I've told people, every experiment I do with it, it seems to do better than asking stack overflow, or helps me prime some code that'll save me a couple of hours, but still requires manual fix ups and a deep understanding of what it generates so I can fix it up.

Basically the gruntest of grunt work it can do. If I explain things perfectly.


I'm probably bad at writing prompts, but in my limited experience, I spend more time reviewing and correcting the generated code than it would have taken to write it myself. And that is just for simple tasks. I can't imagine thinking a llm could generate millions of lines of bug free code.


Asking GPT to do a task for me currently feels like asking a talented junior to do so. I have to be very specific about exactly what it is I'm looking for, and maybe nudge it in the right direction a couple of times, but it will generally come up with a decent answer without me having to sink a bunch of time into the problem.

If I'm honest though I'm most likely to use it for boring rote work I can't really be bothered with myself - the other day I fed it the body of a Python method, and an example of another unit test from the application's test suite, then asked it to write me unit tests for the method. GPT got that right on the first attempt.


That’s where I am too. I think almost everyone has that “this is neat but it’s not there yet” moment.


> I think almost everyone has that “this is neat but it’s not there yet” moment.

I rather have this moment without the “this is neat” part. :-) i.e. a clear “not there yet” moment, but with serious doubts whether it will be there anytime in the foreseeable future.


It seems like the problem is with your view of everyone, based on an n=1 experiment. I've been shipping production-ready code for my main job for months, saving hundreds of work-hours.


Personally, this flow works fine for me: AI does the first version -> I heavily edit it, debug it & write tests for it -> code does what I want -> I tell the AI to refactor it -> tests pass and the ticket is done.


> It answers questions confidently but with subtle inaccuracies.

This is a valid challenge we are facing as well. However, remember that ChatGPT, which many coders use, is likely training on interactions, so you have some human reinforcement learning correcting its errors in real time.


How is it trained on reactions? Do people give it feedback? In my experience in trying I stop asking when it provides something useful or something so bad I give up (usually the latter I'm afraid). How would it tell a successful answer from a failing one?


It appears to ask users to rate whether a response is better or worse than the first; in other cases, it seems to be A/B testing the responses. Lastly, I, for instance, will correct it and then confirm it is correct before continuing with the next task, which likely creates a footprint pattern.


That's interesting, I haven't come across this.


Which ChatGPT?


Have you tried the new Assistants API for your front-end test? In my experience it is _significantly_ better than just plain ol’ ChatGPT for code generation.


> I have a simple front-end test that I give to junior devs.

What is it?


Making that claim but not sharing the "simple test" feels a bit pointless tbh.

Edit: I see, they don't want it to be scraped (cf. https://news.ycombinator.com/item?id=38260496), though as another poster pointed out, submitting it might be enough for it to end up in the training data.


Would you mind sharing the test?

I’m one of those noob programmers and it has helped me create products far beyond my technical capabilities


3.5, or GPT-4? I'm told the latter is worlds better, that they aren't even in the same ballpark.


Just like the author suggests, sometimes you have to tailor your question to ChatGPT, for it to succeed.


As long as this is true, ChatGPT is going to be a programmer's tool, not a programmer's replacement. I know that my job as I know it will vanish before I enter retirement age, but I don't worry it will happen in the next few years because of this.


I’ve given it so many hints. So many nudges. If it was an interview I would have bounced it.


> sometimes you have to tailor your question to ChatGPT, for it to succeed

Right, which means it's a force multiplier for specialists, rather than something that makes generalists suddenly specialists.


Haha, does it involve timers?


I actually don’t get the reference. What are the issues with timers?


I have the same experience with the test I give my back-end devs. ChatGPT can't even begin to decode an encoded string if you don't tell it which encoding was used.

ChatGPT is great at some well defined, already solved problems. But once you get to the messy real world, the wheels come off.
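
For context, the brute-force guessing a tool (or a person) does for that task, assuming the usual suspects of base64, hex, and URL encoding as candidates, looks roughly like this sketch:

    import base64, binascii
    from urllib.parse import unquote

    def guess_decode(s: str) -> dict:
        # Try a few common encodings and report whichever ones decode cleanly.
        # A rough sketch; real-world guessing needs more candidates and heuristics.
        results = {}
        try:
            results["base64"] = base64.b64decode(s, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            pass
        try:
            results["hex"] = bytes.fromhex(s).decode("utf-8")
        except (ValueError, UnicodeDecodeError):
            pass
        if "%" in s:
            results["url"] = unquote(s)
        return results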



It’s pretty impressive that it was able to actually decode those strings. In March, I used GPT 3.5 to write code for validating a type of string which used a checksum algorithm.

It did the task well, and even wrote tests, but it failed when generating test case values. I wonder if it would perform better if I did it today.


Thank you for taking the time to call BS on someone who obviously never tried asking an LLM to decipher a string's encoding. That is exactly the kind of thing they are good at.


Two people can try similar prompts and get very different results from LLM’s.


What is the test?


I hope that you are testing this on GPT-4/ChatGPT Plus. The free ChatGPT is completely not representative of the capabilities or the accuracy of the paid model.


I’ve tested it on both.


Are you using GPT-3.5, or GPT-4?


You’re missing the point of the article. ChatGPT in combination with a mediocre dev could solve your problem faster than the best junior dev could before.


I tried doing this and it actually took longer due to all of the blind alleys it led me down.

There is stuff that it can do that appears magically competent at but it's almost always cribbed from the internet, tweaked with trust cues removed and often with infuriating, subtle errors.

I interviewed somebody who used it (who considered that "cheating") and the same thing happened to him.


You got everyone talking about how GPT isn’t that bad at coding etc but everyone is missing the point.

The no code industry is massive. Most people don’t need a dev to make their website already. They use templates and then tweak them through a ui. And now you have Zapier, Glide, Bubble etc.

LLMs won’t replace devs by coding entire full stack web apps. They’ll replace them because tools will appear on the market that handle the 99% cases so well that there is just less work to do now.

This has all happened before of course.


I collaborate with front-end teams that use a low-code front-end platform. When they run into things that aren’t built-in, they try to push their presentation logic up the stack for the “real” programming languages to deal with.


Do people seriously consider this the waning days of the craft? I don’t understand that.

My view is that I am about to enter the quantum productivity period of coding.

I am incredibly excited about AI assistance on my coding tasks, because it improves not only what I’m writing, but also helps me to learn as I go. I have never had a better time writing software than I have in the last year.

I’ve been writing software for a few decades. But now I’m able to overcome places where I get stuck, and I have what is almost a coach available to help me understand the choices I’m making and to make suggestions constantly. It's not just wandering over to a fellow coder's desk to ask them about a problem I'm facing; it actually gives me productive solutions that are genuinely inspirational to the outcome.

It’s amazing.

So why do people think that coding is coming to some kind of end? I don’t see any evidence that artificial intelligence coding assistants are about to replace coders, unless you… suck badly at building things, so what are people getting on about?

I feel like somebody came along and said, “foundations are now free, but you still get to build a house. But the foundations are free.”

I still have to build a house, and I get to build an entire house and architect it and design it and create it and socialize it and support it and advocate for it and explain it to people who don’t understand it but… I don’t have to build a foundation anymore so it’s easier.

Shoot me down. I’m not relating here at all.


I agree it's amazing. But your comment doesn't touch on the key economic question that will decide for how many people it will be this amazing new dev experience.

If AI makes developers twice as productive (maybe a few years down the road with GPT-6), will this additional supply of developer capacity get absorbed by existing and new demand? Or will there be half as many developers? Or will the same number of developers get paid far less than today?

These questions arise even if not a single existing dev job can be completely taken over by an AI.

A secondary question is about the type of work that lends itself to AI automation. Some things considered "coding" require knowing a disproportionate number of tiny technical details within a narrowly defined context in order to effect relatively small changes in output. Things like CSS come to mind.

If this is the sort of coding you're doing then I think it's time to expand your skillset to include a wider set of responsibilities.


Considering how much the craft has expanded - when in high school, I wrote an application for pocket for a small business in Borland Delphi 7. The domain knowledge I needed for that was knowing the programming environment, and a bit about Windows.

Nowadays, like the rest of the full-stack 'web' developers, I work on complex webapps that use Typescript, HTML, CSS, Kubernetes, Docker, Terraform, Postgres, bash, GitHub Actions, .NET, Node, Python, AWS, Git. And that isn't even the full list.

And it's not even a flex, all of the above is used by a relatively straightforward LoB app with some hairy dependencies, a CI/CD pipeline + a bit of real world messiness.

I need to have at least a passing familiarity with all those technologies to put together a working application and I'm sure I'm not alone with this uphill struggle. It's a staggering amount to remember for a single person, and LLMs have been a godsend.


> “If AI makes developers twice as productive (maybe a few years down the road with GPT-6), will this additional supply of developer capacity get absorbed by existing and new demand? Or will there be half as many developers? Or will the same number of developers get paid far less than today?

Something to remember is that every new innovation in software development only raises the expectations of the people paying the software developers.

If developers are 3x as productive, then the goals and features will be 3x as big.

The reason for this is that companies are in competition, if they lag behind, then others will eat up the market.

The company that fires 50% of their staff because of “AI Assistance” is not going to be able to compete with the company that doesn’t fire their staff and still uses “AI Assistance”…


I think this hits the nail on the head. Obviously a lot of the participants in this discussion are programmers, so there is going to be a fair amount of bias where people feel like their self-worth is being attacked/devalued. That being said, from a company perspective, this should much more unlock "moving faster" than "let's rest on our laurels". Any company that has a leading position in a particular industry is currently at greater risk of upstarts achieving their feature set in a reduced amount of time. The incentive for all companies will be to find programmers who are skilled in directing and debugging AIs.

I am currently building an iOS app using GPT-4 (I don't know Swift), and am developing an awareness of what it can/can't do, and surprised that I'm moving at the speed I did when creating React Native apps. In a possibly more competitive future market for developers, it does work in one's favour if some developers resist the efficiency improvements of AI.


Blacksmiths didn't die, they became mechanical engineers?