AI coding assistants are changing how we work, but they're not replacing the core skills that make a developer. What matters most is our ability to hold complex systems in our minds and think through their implications.
A big problem with current AI is a simple one - we don't document the thought processes that go into building solutions, at least not in enough detail.
Without the thousands of micro decisions that go into building even the simplest of solutions, it doesn't matter how large your context window is. It's not about holding a system in your mind - it's about what you do with it, what decisions you make to move towards your goal.
At least that's my take on current LLMs and their limitations.
Easily. I have a somewhat working understanding of the SAP version we run in my head, yet LLMs love hallucinating columns or endpoints that do not exist. I'm sure the full SAP documentation easily clears 2 million tokens, and that's not even touching our own codebase.
Enterprise software gets quite complex, has a ton of dependencies that need to be understood together, etc.
Just the data model can take up hundreds of thousands of tokens describing the tables and relationships in some of the code bases I've worked on.
These models degrade like crazy at those long token counts though. I have not found them useful if I need to just stuff everything in a giant context window. I'm mostly using Claude though, so slightly different context scale.
> Has your mind ever held a system so complex, it wouldn't fit in two million tokens? Mine hasn't.
This is one of those things which superficially seems like a slam-dunk gotcha, but isn't.
Yes, correct, I can't do that.
Unfortunately, my experience with LLMs is that they can't really pay attention to all the things in the context window either.
Even a mere 5,613 tokens[0] had it getting confused.
If any AI could really do two million tokens with perfect recall of the problem, that would indeed be wildly super-human. Even just having 6k tokens' worth of custom instructions that are applied consistently to an ongoing data stream — which I bet could be done with the right scaffolding on the API of better models, even if not with naïve use of chat UIs — is superhuman. That kind of ongoing focus and persistence would still be superhuman even when the quality of the result is "ok, not great, just ok", owing to how "human doing same thing for 4 hours" is much worse than "freshly rested human begins work for the day".
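The scaffolding I have in mind is roughly this: re-send the same fixed instructions with every chunk of the stream, so nothing depends on the model "remembering" them across a long context. A minimal sketch, where the model name, file names, and the stream itself are placeholders purely for illustration:

    # Re-apply the same fixed instructions to every chunk of an ongoing
    # stream, instead of hoping the model keeps attending to them.
    # Model name, file names, and the stream are placeholders.
    from openai import OpenAI

    client = OpenAI()
    CUSTOM_INSTRUCTIONS = open("instructions.txt").read()  # the ~6k-token ruleset

    def process_chunk(chunk: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": CUSTOM_INSTRUCTIONS},
                {"role": "user", "content": chunk},
            ],
        )
        return resp.choices[0].message.content

    # for chunk in incoming_stream():  # whatever produces the data stream
    #     handle(process_chunk(chunk))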
I don't know where the boundary really is, though, where it becomes superhuman on any axis[1]. The failure mode I'm describing here reminds me of art lessons towards the end of my time at school, where the teacher had to remind people that accurate still-life studies required you to keep looking again and again at the material, not just once when you started and then filling in the details entirely from your imagination.
[0] I tried using it to translate this Wikipedia page to English, and it was hallucinating plausible but false things by the time it got to the timeline: https://de.wikipedia.org/wiki/Döberitzer_Heide
Tried it again while writing this to see if current models are any better; this time the same prompt went to the canvas editor. It didn't complete the translation, and when I replied "continue" it replaced the attempted first half with a non-translated German Wikipedia article that was essentially unrelated: https://chatgpt.com/canvas/shared/67b091d021088191bd9e0ca7c3...
[1] and there are many different aspects of intelligence.
Consider the converse: if it were a fundamental requirement of the nature of intelligence that all aspects of human intelligence correlate well with each other, then chess AI could only have beaten world champions like Kasparov in the same year that Go AI beat those like Lee Sedol.
Obviously not going to reply to everybody, but many people here got confused, thinking that the LLM's two million input tokens are analogous to their brain's long-term memory. This is not true. The LLM's two million input tokens are more like your brain's working memory, and your brain's long-term memory is more like the LLM's training data.
I do agree with you that LLMs get "confused" and fail to follow constraints. This is also my experience. But the reason for this phenomenon is a lack of emphasis on your constraints.
For example, when working with Stable Diffusion, you can manipulate the weight of parts of your prompt. Say you wanted to generate an image and you really wanted there to be a dog: you could prompt "a clear sky under the moonlight, (dog:1.5)", and in this case the "dog" part would be 1.5x more important to the model than the rest of the prompt. Not sure why there is no such feature for LLMs (there may be; I'm just not aware of it).
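As I understand it, what that weighting does under the hood is scale the token embeddings before they condition the image model. A rough sketch (the renormalization details vary per front end, so treat this as illustrative only):

    # Sketch of "(dog:1.5)"-style weighting: scale each token's embedding by
    # its weight before it is used as conditioning. Real front ends also
    # renormalize afterwards; details vary per tool.
    import numpy as np

    def apply_prompt_weights(token_embeddings, weights):
        emb = np.asarray(token_embeddings, dtype=np.float32)   # (tokens, dim)
        w = np.asarray(weights, dtype=np.float32)[:, None]     # (tokens, 1)
        return emb * w

    # Example: 5 tokens, 4-dim embeddings, the "dog" token (index 3) at 1.5x.
    emb = np.random.randn(5, 4)
    weighted = apply_prompt_weights(emb, [1.0, 1.0, 1.0, 1.5, 1.0])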
I looked at your prompt history with the German article, and I can see that the reason it fails is that you prompt incorrectly. When you want to give certain information or context to the LLM, say, your codebase, or some documentation, or some article, you gotta put it first in your prompt, and at the very bottom you should put your instructions. This apparently makes it easier for the LLM to parse your request.
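Something like this ordering, for example (the file name is just a placeholder; the point is reference material first, instruction at the very bottom):

    # Context first, instruction last. Names are placeholders.
    article = open("doeberitzer_heide.de.txt").read()   # the reference material

    prompt = (
        "Here is a German Wikipedia article:\n\n"
        f"{article}\n\n"
        "---\n"
        "Translate the article above into English. Do not add or omit any facts."
    )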
Also, generally, LLMs will not give you a response longer than a few thousand tokens, so what you should do is ask it to translate the article section by section, and keep asking "Translate the next section" until it has translated all of them. I was able to translate your article this way using Gemini; not sure how accurate it is, though.
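Shown here with the openai Python client rather than Gemini, purely for illustration (the model name, file name, and DONE stop marker are assumptions), the loop looks something like this:

    # Keep one conversation going and ask for one section at a time.
    from openai import OpenAI

    client = OpenAI()
    article = open("doeberitzer_heide.de.txt").read()

    messages = [{
        "role": "user",
        "content": f"{article}\n\n---\n"
                   "Translate this article into English one section at a time. "
                   "Start with the first section. When every section is done, "
                   "reply with only the word DONE.",
    }]

    translated = []
    for _ in range(50):                            # hard cap on rounds
        resp = client.chat.completions.create(model="gpt-4o", messages=messages)
        chunk = resp.choices[0].message.content
        if chunk.strip() == "DONE":
            break
        translated.append(chunk)
        messages.append({"role": "assistant", "content": chunk})
        messages.append({"role": "user", "content": "Translate the next section."})

    english = "\n\n".join(translated)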
> I looked at your prompt history with the German article, and I can see that the reason it fails is that you prompt incorrectly. When you want to give certain information or context to the LLM, say, your codebase, or some documentation, or some article, you gotta put it first in your prompt, and at the very bottom you should put your instructions. This apparently makes it easier for the LLM to parse your request.
Good to know — I somehow failed to be aware of this before now despite playing with these models since the days of AI Dungeon (for open source models) and text-davinci-003 (from OpenAI).
> I was able to translate your article this way using Gemini, not sure how accurate it is though.
The second part is important — checking the translation was how I knew it was making things up in the "translated" timeline.