Software Complexity Is Why AI Won't Replace Software Engineers (softwarecomplexity.com)
52 points by greybeard228 on March 21, 2023 | 49 comments



I work on very complex software, and I'm basically using ChatGPT all day. I've somewhat stopped figuring out how to do things like read from a file, serialize data, etc. I focus on the high-level logic and have ChatGPT give me the code snippets I assemble together. It's still difficult, but it's pretty clear software development is going to change.
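To give a concrete idea, the kind of boilerplate I mean is snippets along these lines (a rough example of what it hands back, not an exact transcript):

  import json

  # Load a JSON file into a Python dict
  def load_config(path):
      with open(path, "r") as f:
          return json.load(f)

  # Serialize a dict back out to disk
  def save_config(path, data):
      with open(path, "w") as f:
          json.dump(data, f, indent=2)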


My flow with ChatGPT lately is like this:

1. I get stumped and ask it “can I do thing X with technology Y?”

2. It always says “yes, here is how”

3. A majority of the time, it's lying and I can't actually do whatever I asked about.

It’s incredibly frustrating - it’s like it doesn’t know how to say “no”.

Granted, I am usually asking it pretty niche stuff about iOS development, but so far it’s proving to be a waste of time. I guess it does feel good to feel like you have a lead on some problem you’ve been beating your head against for hours, but then the harsh reality of actual reality comes quick.


> it's like it doesn't know how to say "no"

ChatGPT is basically a junior dev: does grunt work, gets it wrong from time to time, and doesn't ever say no even when it's appropriate.


Or like an intern who can google snippets of code from previously known solutions that are sometimes out of context or outright wrong.


This might be an issue of phrasing though: ChatGPT isn't good if it has too many options; in my experience it helps a lot to narrow down the tech and specs as much as possible. In addition, if you know how to do X in Z, you can usually quite effectively ask it how to do X in Y in line with Z.


For chuckles I asked ChatGPT to write me a sort function on integers. It dutifully served up code to call a library sort function. Technically correct, but I am and remain unimpressed.


This got me curious and I asked ChatGPT/GPT-4 to write a sorting function for integers.

It came up with the following along with a verbal explanation of how it works:

  def quick_sort(arr):
      if len(arr) <= 1:
          return arr
      pivot = arr[len(arr) // 2]
      left = [x for x in arr if x < pivot]
      middle = [x for x in arr if x == pivot]
      right = [x for x in arr if x > pivot]
      return quick_sort(left) + middle + quick_sort(right)

  # Example usage:
  arr = [3, 6, 8, 10, 1, 2, 1] 
  print("Unsorted array:", arr)
  sorted_arr = quick_sort(arr)
  print("Sorted array:", sorted_arr)

I then asked it to make the code more efficient. To this, it offered "a more efficient in-place implementation [...] using the Lomuto partition scheme". The code was reasonable but had an off-by-one error. It took me about thirty seconds to find and fix the error.
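For reference, the fixed in-place Lomuto version looked roughly like this (reconstructed from memory, not the verbatim output):

  def quick_sort_lomuto(arr, low=0, high=None):
      if high is None:
          high = len(arr) - 1
      if low < high:
          # Lomuto partition: pivot is the last element of the range
          pivot = arr[high]
          i = low - 1
          for j in range(low, high):
              if arr[j] <= pivot:
                  i += 1
                  arr[i], arr[j] = arr[j], arr[i]
          arr[i + 1], arr[high] = arr[high], arr[i + 1]
          p = i + 1
          # Recurse on the two sides, excluding the pivot itself
          quick_sort_lomuto(arr, low, p - 1)
          quick_sort_lomuto(arr, p + 1, high)
      return arr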

Since ChatGPT said that "the Lomuto partition scheme is generally less efficient than the Hoare partition scheme, but it is easier to implement and understand", I asked it to show the code for the Hoare scheme.

The code was correct and was accompanied by an explanation that "this version of quick sort is more efficient because it performs fewer swaps on average compared to the Lomuto partition scheme."
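The Hoare version was essentially the textbook one, something like this (again a rough reconstruction rather than ChatGPT's exact output):

  def quick_sort_hoare(arr, low=0, high=None):
      if high is None:
          high = len(arr) - 1
      if low < high:
          # Hoare partition: two indices converge from both ends
          pivot = arr[(low + high) // 2]
          i, j = low - 1, high + 1
          while True:
              i += 1
              while arr[i] < pivot:
                  i += 1
              j -= 1
              while arr[j] > pivot:
                  j -= 1
              if i >= j:
                  break
              arr[i], arr[j] = arr[j], arr[i]
          # Note: the pivot does not necessarily end up at index j
          quick_sort_hoare(arr, low, j)
          quick_sort_hoare(arr, j + 1, high)
      return arr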

I asked it about the average number of swaps for both schemes. It gave some answers. However, since I didn't know the correct answers off the top of my head and couldn't be bothered to work them out, I couldn't just take ChatGPT's answers at face value (but I suspect that one was correct while the other wasn't).

Finally I asked "What's the worst-case input for these three implementations?"

Here things went completely off the rails. It repeated the same answer for all three snippets. The answer happened to be correct for the last two schemes, but for the first it read:

> Basic Quick Sort (with middle element as pivot): The worst-case input for this implementation is a sorted or reverse-sorted array. Since the pivot is chosen as the middle element, the partitioning will be unbalanced, and the algorithm will have a time complexity of O(n^2).

This is clearly nonsense: with the middle element as pivot, a sorted or reverse-sorted array splits evenly and runs in O(n log n); the actual worst case would have to be constructed adversarially.


I think that’s more of a prompt problem. It answered in a realistic, and I would even claim “correct” way. You should not be writing a sort by hand in 2023. If you don’t want it to use a library, ask it to not use any library calls. And, of course, ChatGPT 4 gives much better results.


This is where I think the problems will start: people will stop using libraries and start leaning on this instead, unleashing spaghetti wars.


I use ChatGPT-4 daily for coding. I'm not seeing this. More often than not, it will hallucinate libraries and functions within them. I've never seen it write out anything tedious. It's based on reality, after all; it's hard to find a hand-rolled sort in code these days.


I can see myself using ChatGPT as a search assistant to help me with proof of concepts and snippets, but I would still manually write all of the code that I'm putting into production. So far I've not found a reliable predictable way to incorporate ChatGPT or Copilot into my daily workflows as a pair programmer or code generator.

But for generating templates or snippets, or helping me with documentation for proofs of concept... or being an amazing search friend with great abilities to format data for me, it's absolutely an amazing friend to have on board.


> I have somewhat stopped figuring how to do things like read from a file, serialize data, etc.

Shouldn’t that stuff be in a library or API call with examples a quick google search away?


Yes. And in my experience Bing Chat/Copilot are simply faster.


I find it wild that people say this. I've really tried using ChatGPT for a bunch of different problems, and 9 times out of 10 it was a waste of time; I would get to a better answer faster by googling or reading the docs.

This stuff will be amazing in the coming years, but I'm shocked people say they use it all day.


I keep seeing these types of replies and often ask myself what industry they work in, because the embedded software engineering and R&D I am currently a part of is NDA on top of NDA. And I suspect that is true for a lot of people working in other software fields. ChatGPT is only a curiosity, and I doubt it will ever make an impact in the hardware/software world where IP rules the roost.


I’m not sure I agree.

Over the last few days I’ve been cobbling together a little service which asks GPT to identify issues with code, then if it can generate solutions, the service attempts to apply the solution and run the code.

I’ve been learning to dial in prompts and finding this truly is a major factor of the job that needs to be just right. I went from unreliable response formats, hallucinated solutions, naive solutions, and so on all the way to fairly conservative, pragmatic, consistent responses.

My success rate isn’t as high as I’d like, but I find it absolutely remarkable.

My key takeaway so far has been that a lot of perceived limitations are in fact a lack of exposure to what effective prompting makes possible. Another feature that’s powerful yet overlooked is what recursive prompting can accomplish. This is what makes me think complexity might be overcome in the not-so-distant future.

For example, if I apply a solution and it breaks something else, I can then reform a prompt based on the previous prompt along with a new one informing it of what went wrong. These little recursive exchanges have actually yielded useful results in multiple tests. That’s a huge deal in my opinion; this is totally automated by a smooth brain like me.
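Roughly, the loop looks like this (a simplified sketch of what I mean; ask_model and run_code are placeholders for the actual API call and sandboxed execution, which are messier in practice):

  def repair_loop(code, max_attempts=3):
      """Ask the model for a fix, run it, and feed any failure back into the next prompt."""
      prompt = f"Identify and fix any bugs in this code:\n{code}"
      for _ in range(max_attempts):
          candidate = ask_model(prompt)      # placeholder for the real API wrapper
          ok, error = run_code(candidate)    # placeholder for sandboxed execution
          if ok:
              return candidate
          # Recursive prompting: carry the previous attempt plus the new failure forward
          prompt = (
              f"Your previous fix:\n{candidate}\n"
              f"failed with this error:\n{error}\n"
              "Please correct it."
          )
      return None  # give up and leave it for a human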

As token limits increase and the models become more capable, I suspect this tool will actually improve without me changing much. I suspect it could still work quite a bit better and more efficiently if I could more succinctly provide context about the code in my prompts, too. That would overcome token limitations for the time being and likely improve accuracy quite a bit.

So, if things like this are possible already… I have a feeling we’re going to overcome the complexity issue sooner than it currently seems.


> dial in prompts

> effective prompting... recursive prompting

It feels like you are massaging a giant library of macros :)

The way you describe interacting with GPT by iterative attempts, constructing working primitives, is pretty much programming, right?

AI might get more folks into computer science, or even programming; more hackers, more inquisitive minds, and maybe--just maybe--this "assault on complexity" earns some real victories.


So you are converting your code into English and then asking GPT to convert it back to code? Might as well write the code at that point.


No, I send the code causing the error as part of the prompt, and it responds with that code with corrections (if it can). The only English is the prompt describing the role and response I need from GPT.

I could write the code myself, but the thing is, if this program works then in theory it could address trivial errors at any time of day, at very high speed. The better it gets at doing this, the more useful it could be.

Most automations start out very inefficient, and many stay that way. I mostly do it to learn but I also see a real opportunity to remove a certain type of task from my plate if this service works well.

Of course, other companies (like GitHub) have likely already prototyped this and whoever runs with it will do a far better job than I'm doing. That's totally fine though, I'm just doing this to learn.

At first I was thinking I would have this inspect errors from something like Sentry, try to solve the issue, then spin up a PR or something if it succeeded (I have a version which also writes tests to verify its changes succeeded, but it's flaky when the intent in the code isn't crystal clear).

I realized though that this would probably be better as an extension in an IDE which can intercept error stacks in a terminal or over a port. That way there's full control over applying patches and no need to arbitrarily run code, which requires a whole virtual-machine approach. Anyway, fun to learn, might be useful somehow, but totally cool if ultimately I do just write the code myself.


The first lesson in CS101 is divide and conquer. Once the LLMs start doing this from the needs assessment onward, iteratively breaking each problem down until the pieces are small enough to describe functionally, I could see them begin to create large working prototypes or solutions. Perhaps genetic algorithms will fine-tune the code produced (and quickly).


Interesting ... I wonder if a good analogy would be to the music "industry". Without even diving into the potential impacts of AI, the building blocks of music that are available to folks wanting a music product, along with the tools to piece them together into that product, have become so sophisticated that what it means to "make music" is fundamentally different today than it was 5, 10, 15 years ago (let alone 20, 30, or 40 years ago).

Today there are musicians who play instruments or compose music in "traditional" ways, and there are other composers who have maybe never touched a piano, but are very adept at assembling high-level building blocks.

Both are musicians, though their skill sets might overlap in only the most basic of ways. Both can write/deliver music. But the one whose expertise is centered around assembling high-level blocks can only make certain kinds of edits, changes, and fixes.

The challenge in education and implementation becomes: as these creation toolkits become more integrated into the way we learn how to make music (or code), do we remove opportunity for folks to develop skills at a deeper level? Do we end up with a big but shallow pool of talent, as opposed to a smaller but deeper pool? (I fully recognize here that this is a question that might apply to any "shortcuts" introduced into any skilled field of work)


Writing code is the easy part. Requirements are the hard part.


Requirements are easy. 'Requirements elicitation' is the hard part.


A magical form of artificial intelligence was invented that eliminates all this busy work.

End users can now type whatever they can imagine into a computer and it will program itself.

This miracle technology is called a "compiler".

If this upstart AI is going to compete, people had better start testing it on the Linux kernel source rather than the SAT or bar exam.


Right. You can see it as yet another level of compiling, for languages (natural ones) that are not constructed for this purpose and suffer from a lot of ambiguity and fuzziness.


I can't see it as compiling. It has nothing to do with compiling that I can see. If it did, then wouldn't it have been released as a compiler update?


What about 'transpiling'?


> Requirements are easy. 'Requirements elicitation' is the hard part.

Rephrased as:

Requirements in your head are easy (or at least seem so); requirements expressed in AI-grokkable form are less easy.


The whole premise seems off. The article claims that AI could only ever help with code editing/typing, and not with navigating and understanding the codebase, etc., which current AIs can already do to some extent.

And therefore, according to a major-citation-needed pie chart, editing code is only 5% of the job, thus AI will always and forever only be able to save us 5% or less of our time.

Seems pretty wooly.


Let's not pretend that there isn't some truth to the software complexity argument, though. The faster we can copy-pasta solutions into codebases using "AI", the more complexity we'll likely be introducing. A 10x productivity increase could introduce a 10x+ complexity increase.

One reason I think ChatGPT is so impressive is that it's remixing and suggesting code from already well-structured codebases and functions, calling on well-designed libraries, etc. My gut feeling is that MS have allowed all of GitHub to be "sampled", which is why the code ChatGPT is serving up seems so "familiar".

Unless these models start to actually improve the code, i.e. suggest better, less complicated paths and architectures, there is a chance software will go backwards to the point where just getting LLMs to pattern-match a change into a 500,000-line codebase of auto-generated, randomized solutions stops working. Then, of course, we'll need a "bigger" model to grok it all, explain what the code is doing, etc.

We could call it "drift" or similar, but I think the fact we write code that is easy to understand for others, including LLMs is actually a feature and not a bug. Anything "cognitive" would have to agree at some stage, unless we have unlimited cognition, for every code change available, which is honestly a fantasy at this stage.

I might be wrong, and I'm happy to be wrong, but this is probably the best time in history to be using auto-completion for coding. There's plenty of well-thought-out, nice training data available; I don't think that will hold if we just get copy-pasting and neglect using nice libraries, etc.

One interesting example I saw recently: someone showed me how they were using ChatGPT-4 to check their code for vulnerabilities, and it actually did quite well, albeit the things that needed fixing were very basic, best-practices stuff everyone should know. Then I watched someone else, in a different context, use it to generate code that was vulnerable and susceptible to the same problems.

We could automate the vulnerability checking (many people already do), but then it becomes an interesting back and forth between systems writing and fixing code; at some stage it probably becomes inefficient in its own right. Brute forcing, essentially.


I believe people are coming to these with two world views: to some, code is just something you feed into the computer to get a result. This mental model lends itself to getting impressed by these new chatbots and seeing them as a helping hand.

Some other people see code as a way to communicate about systems specifications. This is the foundation of collaboration on complex software products. In this arena, lowering entry barriers and the cost of writing code will reduce quality and introduce complex, hard to debug failure modes.


Your first four paragraphs make me wonder if A.I. might nudge developers into converging onto "best algorithms" - and to a possibly lesser extent, onto "best practices".

Or is real-world code an impossibly complex algorithm soup ?


It's an interesting problem to consider I guess.


@QuadrupleA The paragraph below the chart you referenced has a link to a post I did on that subject, which links to the study it's from, if you're interested.


Good point about hallucinations - low accuracy, high confidence. I wonder if AI will develop the ability to qualify its own confidence. It would be a more useful tool if it could provide a reasonable confidence level along with its output, much like a human would say, "not sure about this, but..."


I'm not an AI expert so I could be wrong, but it's my understanding that there is a confidence score behind the scenes. It's just not shown in the current UI.

An automated AI system should be able to ask a human for help whenever the confidence score is below a certain threshold or even spit out a backlog of all the tasks it can't confidently handle itself.
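Conceptually something like this (just a sketch; model_answer is a stand-in for however the model's internal score actually gets surfaced):

  CONFIDENCE_THRESHOLD = 0.8
  human_backlog = []

  def handle_task(task):
      # model_answer is a placeholder returning (answer, internal confidence score)
      answer, confidence = model_answer(task)
      if confidence >= CONFIDENCE_THRESHOLD:
          return answer
      # Below the threshold: don't guess, queue it up for a human instead
      human_backlog.append((task, answer, confidence))
      return None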


FWIW, Watson used its internal confidence score when playing Jeopardy.


It needs to be able to evaluate its own output. We humans do a quick sanity check most of the time before we speak: "On what do I base this assertion?", etc.


I wonder if multiple, independently trained LLMs could be used in a voting system to determine confidence, or simply call out each other's bulls**.
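Something like majority agreement across models, maybe (a toy sketch that assumes answers can be compared exactly; each entry in models is a placeholder query function):

  from collections import Counter

  def ensemble_answer(question, models, min_agreement=2):
      """Query several independently trained models and vote on the result."""
      answers = [ask(question) for ask in models]
      best, votes = Counter(answers).most_common(1)[0]
      if votes >= min_agreement:
          return best, votes / len(answers)   # answer plus a crude confidence estimate
      return None, 0.0                        # models disagree: flag for human review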


Two wrong systems won't make a right though, especially when the wrong systems are getting more convincing at being right.


Two wrong systems can help determine if your answer is wrong if they don’t agree. That’s pretty useful, even if neither is actually correct.


These are exactly the titles I expect to laugh about in years to come.


On the other hand, a lot of jobs that were supposed to go away still exist; people just get paid less, and no one really cares about them because they're not promising to eradicate hunger in Africa, which is also a job that shouldn't exist but still does, mind you.

So laugh all you like.


Prompt OS: prompts so huge and chained in convoluted contexts, accreted over a generation of ChatGPTs, backwards-compatible to other models, the text file is megabytes in size.

ChatGPT gets an import statement.

Prompt Engineers will have to maintain that. It will be written in a nightmare of English, without comments, and heaven help what it means for the tool to "break production."

The reference manual contains a chapter on Forbidden Phrases, but it's marked up on paper, scrawled and crossed out by various predecessors.

Initialization is invoking special phrases, runtime feature flags, to disable whatever behaviors were previously tagged.

Yep: we humans will need to tag behaviors of the AI to describe what is happening, without the benefit of breakpoints.

Where are the breakpoints, anyway? What's our `gdb` of ChatGPT, I wonder.


Yes, it's like when people were saying blacksmiths would continue to exist. They do, albeit in very reduced numbers. In their place we have mechanics. It will be the same in other industries.


mechanics, welders, CNC operators, CAD operators, industrial designers, materials engineers, metallurgists... really it was an explosion of roles once the technology was there.


This is the positive take I hope to see with AI. Because yes, blacksmiths no longer exist, but all the professions you named above have their roots in smithing.


Let a thousand varieties of codesmithing bloom!


Eh, scapegoating is why software engineers will still exist.



