The first demo was pretty impressive. While nothing revolutionary, that's a good progress. I can only hope there's a real value in gpt pro to justified the (rumored) $200 price tag
I rarely ever use o1-preview, almost all usage goes to 4o nowadays. I don't see the point in a model without web search, it's closed off. And the wait time is not worth the result unless you're doing math or code.
Personally I can’t get it to stop using deprecated apis. So it might give me a perfectly good solution that’s x years stale at this point. I’ve tried various prompts of course with the most recent docs as markdown etc.
(Reading your comment and code reminds me that I might have confused user with the terms of the plugin I proposed and the plugin in popular LLM backends. I will make it clear in ell documents)
What kind of plugins are you going to integrate? I implemented the hook system but actually don't have many ideas to add. Currently I only added paginator and syntax highlight plugins and both of them are applied after getting response from LLM backends.
Pretty sure, LLMs even at current stage would be better teachers than at least 20% bottom shitty teachers out there. But I like your analogy! Please tell me where to get shirts that last longer then 3 washing cycles
While I don't think AI will replace anyone in the near term, I find it interesting how many people react in IT field to any attempts to automate code development to a certain degree using AI.
This reaction is quite harsh and emotional. "If you think you can be replaced by AI, it means you are a shitty developer" is quite popular). This talks more not about LLMs, but about our own insecurities.
Yes, you CAN be replaced by either AI, or any other technology shift, or by younger more productive developers, or simply by market forces rulling your skills out of favour. It happened before, it'll happen again.
I wish we had more articles not about what good practices are, or how things should be designed in theory, but rather about how to enforce those practices in a large organization, when you are not necessary a guy who makes decisions.
I was recently working on a UI system that attempts to let AI build websites autonomously. When you start working on it you quickly realize how many tradeoffs you have to think through. Most of the questions arise from the limitedness of the context window. For example, Claude 3 supports 200k tokens.
You also have to take into consideration, that since chat history is sent with every new message, the price of the conversion growths ~ n^2 of the number of messages. So do you send the whole codebase? Or do you let AI run commands like ls and cat to read the files it needs? Do you want a file in the directory with quick history of what's already done, and what needs to be done?
Another thing I find interesting is how microservices became a natural choose vs. monolith apps when building with AI, again due to limits of the context window. So you focus on thinking through all the components and their APIs, and then let AI build each of the component. If it can be done in isolation without any knowledge of other components, that's better.
Also, it quickly becomes obvious that fully-autonoumous builder does not make any practical sense. Real person still needs to look at the progress, and give guidance. Not even because AI can't do this, it probably can. But because your own understand of what you are building changes over time. So it should be semi-automatical, with real users being able to change the course any moment.
How do you build the autonomous loop?
One thing I find useful is to let AI write tests first, and then run those automatically on each new chat message. TypeScript types also helps catching broken code early. In those case automatical message is sent "Hey, you broke the tests. Here are the error messages. Go ahead and fix those." Operator doesn't have to bother, until it's fixed.
Another loop can be build with the ability to send screenshots. So at any moment system can send a screenshot to AI, and ask if it's good enough, and if it wants to make any changes. That also improves the quality.
Well, you get the ides. It's an interesting task to ponder.