Auto-GPT Unmasked: The Hype and Hard Truths of Its Production Pitfalls (jina.ai)
90 points by artex_xh on April 13, 2023 | hide | past | favorite | 55 comments



This is a very strange article. It's also very strange to read some of the GitHub issues, which are basically "this doesn't work!" It's so clear from the code and the discussions that this was basically the equivalent of a random rocket to fame for a toy project that is even described as a toy.

Why should it even work? Cool, now there are thousands of people playing with the same code. Glad I'm not getting the github issues or trying to manage the PRs, but lots of smart people who want to play around are playing with the same code, forking it, and suggesting things, and we have def not reached a point where we need a 3-month argument about formatting guidelines. So how can it be a bad thing that a bunch of people are playing with some code and a new tool?

Did we really go from "oh my god, deep mind can make things have weird eyeballs" to "WTF, this hyper-realistic photograph I created with a paragraph has some shadows that aren't quite in the right place," and "autogpt gets stuck in loops, and thousands of API calls hitting some of the most computationally expensive things humans have ever done cost some money and sometimes go into loops, in this very immature codebase"?

humans man....


even my "humans, man" is sarcastic and resigned - arguably it's our ability to be discontented, to think "but this could or should work", that drives these things in the first place...


> As we celebrate Auto-GPT's rapid ascent, it's crucial to take a step back and scrutinize its potential shortcomings.

Why is it crucial? This project has existed for barely a month, and describes itself as an experiment. Who is using this for production?


I have never, ever seen a hype wave like this. Cryptocurrency doesn't even come close.

We have people saying that unfinished toy projects on GitHub are going to revolutionize entire industries on the basis of one contrived demo. We have other people calling for air strikes on data centers and nuclear war to stop... text generators.

What's wrong with people?


>I have never, ever seen a hype wave like this. Cryptocurrency doesn't even come close.

The internet was a hype wave too. Remember the dotcom bubble? And then also remember how eventually the internet actually did revolutionize communication, business and entertainment?

Yes, there's hype. There's also a tangible sense that we're at this 'internet circa 1995' moment. I feel like the hype is warranted.


I wasn't arguing that there's nothing there, just that this hype wave has hit incredibly quickly and hard.

There's far more here than cryptocurrency.



There are some YT videos making really wild claims about how everything is about to be revolutionized, based on some examples people have tried, or just ideas of using LLMs for everything. The cynical part of me thinks it's half fundraising.


People are already losing their careers to AI systems. As in being fired and literally told it was because of the AI being cheaper.

This is not the nothingburger of crypto


Lots of people who can't be bothered to learn programming are excited about doing things without having to learn programming.


It's something regular people can see and touch, and it does things; it's probably the first time most people have played with new technology.

But yea, I'm seriously concerned about the hype and the stupid things people immediately try to do with it, and about other people's safety and intentions.

“I’ll just hook it up to my credit card and put it on the Internet…”


now we don't need to research whether AGI exists, or how it would escape.


That's hyperbole. When a sports stadium gets named after an AI product then we'll have hit the levels of crypto hype.


Roko's Basilisk Stadium!


Yes, we should build it immediately!


We've been calling for nuclear war for years, you just noticed it recently.


On the LLMs? Is that a first strike against SkyNet?


>What's wrong with people?

Boredom?


Speaking of $16,000 Bitcoins... I think this might be the wave that pushes you over $100k each. I have a github repo that says so, with math.


I invoke Poe's law with this one.


Didn't know but yeah, was pointing out how many of my friends had "predicted" the sudden climb of Bitcoin because of "inverted curves" or some other magic.


It's "crucial" because the model that generated most of the text in this article likes saying that word. https://twitter.com/topynate/status/1641468708928249857


One quickly gets that x-ray vision when interacting enough with chatgpt


It's a new weird human spider sense! I'm getting it more in recruiter emails these days for sure


Exactly this. And the article seems to assume the whole world will remain in stasis forever more. ChatGPT won't get cheaper, or faster; AutoGPT won't be iterated on, extended, or improved; GPT-4 will be the last GPT ever released.

Or maybe, just maybe, all variables will continue trending in a positive direction in the time it would take to build a production product. And then I wonder what point, exactly, this article is trying to make.


I think people are just failing to see the progress as exponential and to extrapolate that way. I see a lot of 'pooh pooh' claims that assume that because the current state is flawed or doesn't measure up, people can rest easy and have nothing to worry about. I contrast that with my own experience: I go to bed, wake up the next day, check what new tools, models and papers have been released, and usually have one "hot damn!" moment per day.


We’re on the zero downside, only upside technology curve to utopia ?


>We’re on the zero downside, only upside technology curve to utopia ?

Not how technology works in my experience. My mental model of technology is that it acts as a multiplier on life's peaks and valleys, so the highs get higher and the lows get lower. To my mind that means AGI will bring the ultimate of highs along with the ultimate of lows. Doesn't sound like utopia to me. Sounds like a good time, until it's not.


Your hypotheticals would be nice.


Sounds like GPT-speak to me...


there are a couple of efforts to turn the autogpt project into a more professional service: - https://www.cognosys.ai/ - https://agentgpt.reworkd.ai/

though I don't see anyone putting up a paywall yet


The problem with most of these GPT-4 experiments is that, because of the buzz around AI, it's much more advantageous for a company to build its userbase on hype rather than efficacy. Everyone has FOMO and wants to be part of the infinite-passive-income game, so whoever promises that with the greatest apparent clarity will win out in the end. Fake it until you make it (and by "make it" I mean that GPT-5 is released and automatically eases all the bottlenecks by just being a lot smarter).


agree. i do believe the authors of those experimental projects did not intend to create such buzz, it is just too many shitinfluencers pushing the projects in that way.


Strikes me that some sort of inflection point has been reached in terms of buzz, hype, desperateness to make money and get out of the rat race, snake oil, etc.

We literally watched crypto/web3 do the entire speed run from weird nerd project to spam/ponzi/get-rich-quick/celebrity/vaporware shell game in a couple of years. All The Way to stadiums being bought, international manhunts, average people losing lots of money on something they didn't understand, most people being annoyed by bots and influencers, and virtually 0% of projects having any purpose other than to generate hype

(with actual ongoing interesting tech being built underneath it)

the fact that autogpt in One Week went from AI experiment to "why doesn't this ship money to my bank account" or "what is github, I'm here for the free money" based on youtube videos and spam articles etc really does not bode well for the hype phase of AI


Nice counter to all the AGI/ASI claims and rampant speculation going on. I'm seeing videos and podcasts making all sorts of huge claims about what can be done right now or in the very near future.


Well, it's certainly more than could be done last year at this time, so that's saying something, yeah?


Yes, but it’s starting to sound like the previous breakthroughs with self-driving cars, 3D printing, VR/AR and blockchain.


You forgot the internet.


It is a strange article, speaking as if AutoGPT were not completely nascent, which it is. So the critiques aren't even really wrong. The most valuable observation is that vector DBs are overkill (does LangChain have a stake in Pinecone?)


It’s been 7 days. Time to do an extended deep dive critical retrospective!


It seems like the people who are massively over-hyping AutoGPT have never used it. It's fascinating how the entire hype cycle can be based on an idea of what's been done as opposed to what's actually been done.

As to the tool itself, I played around with it and it has some cool ideas that I think are valuable, even if it's not AGI


In my (few hours of) testing, Auto-GPT was quite unreliable. If you'll pardon the expression, it suffers from severe ADHD: procrastinates, overthinks, gets distracted.

I think this is due to the main loop being a GPT feedback loop. Each loop it has a small chance of something going wrong (or a large chance, depending on the query), so as it loops repeatedly, the chance of failure approaches 100%.
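The compounding-failure intuition is easy to quantify: if each iteration succeeds independently with probability p, the whole run succeeds with probability p**n after n iterations. A quick sketch (the 95%-per-step figure is just an illustrative assumption, not a measured rate):

```python
# If each loop iteration succeeds with probability p, the chance the whole
# run stays on track after n iterations is p ** n (assuming independence).
def run_success_probability(p: float, n: int) -> float:
    return p ** n

# Even a fairly reliable 95%-per-step agent derails more often than not
# once it loops 14 or more times.
print(round(run_success_probability(0.95, 10), 3))  # ~0.599
print(round(run_success_probability(0.95, 20), 3))  # ~0.358
```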

My idea was to replace the core loop, instead of a GPT feedback loop just make it a few lines of Python.

Now the thing actually does what it says it's going to do, "thinking lag" is eliminated, and API usage is reduced by 80%.

I turned (parts of) Auto-GPT into a tiny Python library, specialized for internet research.

GPT-3 and GPT-4 are able to use this library to write Python programs that do useful work.

This way they can "crystallize" their plans in code, to ensure that they will run.

Here is the interface:

    from typing import Dict, List

    def search(query, max_results=8) -> List[Dict]: pass # uses duckduckgo
    def load(url) -> str | None: pass                    # uses requests and beautifulsoup
    def summarize(text, task) -> str | None: pass        # uses gpt-3
    def save(filename, text) -> bool: pass               # writes text to a local file
See the comments below the gist for the GPT-3 and 4 versions of main.py (20-30 lines for an internet research agent!).

https://gist.github.com/avelican/2d4e718954593e3df9e0e5ee675...

Note: It's currently optimized for my main use-case, which is internet research. So it's not an Auto-GPT in any sense. But it does one thing, and does it fairly well.
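For a sense of the shape (not the gist's actual code), a main.py on top of that interface might look like the sketch below. The four function bodies here are trivial hypothetical stand-ins; the real ones hit DuckDuckGo, requests/BeautifulSoup, and the GPT API:

```python
# Hypothetical stand-ins for the four library functions above; the real
# versions call duckduckgo, requests/beautifulsoup, and the GPT API.
def search(query, max_results=8):
    return [{"title": f"result {i}", "href": f"https://example.com/{i}"}
            for i in range(max_results)]

def load(url):
    return f"page text of {url}"

def summarize(text, task):
    return f"summary of {text!r} for task: {task}"

def save(filename, text):
    with open(filename, "w") as f:
        f.write(text)
    return True

# The core loop is plain Python rather than a GPT feedback loop: the plan
# is "crystallized" in code, so it always runs the same way.
def research(task, query, outfile="notes.txt"):
    notes = []
    for result in search(query, max_results=3):
        page = load(result["href"])
        if page is None:
            continue  # dead or unparseable link, move on
        summary = summarize(page, task)
        if summary:
            notes.append(summary)
    report = "\n\n".join(notes)
    save(outfile, report)
    return report
```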

P.S. The Holy Grail would be a system where the user enters a query, and the system translates it into Python on top of Auto-GPT's library, and runs that. I haven't tried that yet, and I'm a little afraid to...


In using it, I found that my summaries / queries would often result in "The text did not include (thing you asked for)." because the 2nd half of the page text would be unrelated to the main content (and the text is split into small chunks for processing to fit in the context window).

My solution to this (not yet integrated) is to use a faster / cheaper model to process all the text first and see if it's actually relevant or not, before running the summarization / task prompt with the main model.

I realized that this is essentially a semantic search engine for text. Using the same principle you can feed a very large text file in, ask it a question, and it finds the page that answers that question.

This would be useful as a layer "beneath" the internet research agent, that it could use to sort through all the noise and answer the question.

https://gist.github.com/avelican/58958a2cf2b7e9f9f555ab94549...

It gives a lot of false positives, I'm not sure if this is a limitation of GPT-3 (have not tried GPT-4 for this yet, a bit too expensive for searching books) or a limitation of my implementation.

It's probably slower and more expensive than just using a vector db, I haven't tried those yet.
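The two-stage idea boils down to: split the text into context-window-sized chunks, ask a fast/cheap model "is this chunk relevant to the question?", and only run the expensive summarization prompt on the survivors. A sketch, with a naive keyword check standing in for the cheap-model call:

```python
def chunk(text, size=1000):
    """Split text into roughly context-window-sized pieces."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def is_relevant(chunk_text, question):
    # Stand-in for a fast/cheap model call ("does this text help answer
    # the question? yes/no"); here just a naive keyword-overlap check.
    words = {w.lower().strip(".,?") for w in question.split()}
    return any(w in chunk_text.lower() for w in words if len(w) > 3)

def filter_chunks(text, question, size=1000):
    # Only the relevant chunks get passed to the expensive summarize step.
    return [c for c in chunk(text, size) if is_relevant(c, question)]
```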


You could give this a try: https://www.aomni.com/


Update: Using the embeddings API is an order of magnitude cheaper for filtering text.
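The embeddings approach amounts to: embed the question and every chunk once, then rank chunks by cosine similarity. The tiny 3-d vectors below are made-up stand-ins (real ones would come from an embeddings endpoint such as OpenAI's text-embedding-ada-002), but the ranking math is the same:

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_chunks(query_vec, chunk_vecs, k=2):
    # Rank named chunks by similarity to the query; keep the top k.
    ranked = sorted(chunk_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

# Toy 3-d vectors standing in for real high-dimensional embeddings.
chunks = {
    "pricing section": [0.9, 0.1, 0.0],
    "history section": [0.1, 0.9, 0.1],
    "contact section": [0.0, 0.2, 0.9],
}
print(top_chunks([1.0, 0.0, 0.1], chunks, k=1))  # → ['pricing section']
```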


Yeah, tried to have it describe an image that it downloaded from a Dropbox link, and it spat out junk. Ah well. Guess the GPT4 in ChatGPT isn't the multi-modal model.


You've got to upload a photo, it can't read it from a link, and it's only GPT4 if you've got ChatGPT Plus.


I thought that feature of GPT-4 was only available via waitlist?


It's available over ChatGPT, but I did not see any image upload options.


This brings some much-needed balance to the unchecked hype around Auto-GPT. Most people don't have access to GPT-4, and from Twitter it seems to most people like these agents are already "ready", which is definitely not the case.

All the breathless evangelizing out there is for incredibly simple problems that break down the second we try something complex. Yes a GPT-4 agent can go far in the coming months, but there's a ceiling to how good it can be because there's a ceiling to how well the model can reason. Newer LLMs will be the ultimate answer.


This is just a state of play issue. It's not good enough YET.

On cost: 3.5 is 1/15th the cost, and a bunch faster though supports less context, so it's worth experimenting with which parts of tasks need GPT-4 and which need 3.5 (perhaps a feature where GPT-4 manages the main tasking / verification and 3.5 handles the individual small tasks - even if GPT has to be called 15 times to get the same result, this approach wins).
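Back-of-the-envelope on that ratio (using OpenAI's list prices at the time, roughly $0.002 per 1K tokens for gpt-3.5-turbo versus $0.03 prompt / $0.06 completion per 1K for GPT-4-8k; treat the numbers as assumptions): 15 gpt-3.5 calls cost about the same as a single GPT-4 prompt, which is what makes the "GPT-4 supervises, 3.5 executes" split attractive.

```python
# Approximate April-2023 list prices, per 1K tokens (assumptions).
GPT35_PER_1K = 0.002
GPT4_PROMPT_PER_1K = 0.03
GPT4_COMPLETION_PER_1K = 0.06

def gpt35_cost(tokens):
    return tokens / 1000 * GPT35_PER_1K

def gpt4_cost(prompt_tokens, completion_tokens):
    return (prompt_tokens / 1000 * GPT4_PROMPT_PER_1K
            + completion_tokens / 1000 * GPT4_COMPLETION_PER_1K)

# Fifteen 1K-token retries on 3.5 vs one 1K-in/1K-out shot on GPT-4:
print(round(15 * gpt35_cost(1000), 4))   # 0.03
print(round(gpt4_cost(1000, 1000), 4))   # 0.09
```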

On functionality: yeah, this is just some person's plaything. Reading the code, it doesn't seem particularly production grade (like many a python passion program). Something built similarly but with a decent architecture (multi-agent, plus an approach that automatically optimizes tasks) I can see getting cheaper and better easily.

Just like in software dev, where we write a spec, write a plan, write acceptance criteria, write code, write tests, run tests, iterate, Auto-GPT type software needs a similar framework to work within that is not just defined by the code, but by generalizing that to an architecture. https://github.com/daveshap/raven is an interesting project exploring some of this.


Current GPT is knowledgeable, but not smart enough to work without humans. It can reach great results fast when aided by humans. So adding microwork (mTurk or similar) APIs to its arsenal could make this vastly better and resolve the pitfalls until the AI gets better. If it ever does.

It saves tons and tons of work though; now that we've integrated it into our pipeline after months of tweaking, 3.5 is really starting to remove the need for many of my colleagues. Some of us are needed to say "yes, no, or redo", as it were, but everything is far more efficient and much cheaper (100x or more less per day).


autoGPT mTurk integration on the way? Would be a great thing for human-in-the-loop systems.


Hugely disappointing to see so many people take this argument at face value instead of seeing it for what it is: silly sensationalist content written in bad faith.


"This is like the robot's ability to learn from its mistakes. Auto-GPT can review its work, build on its previous efforts, and use its history to produce more accurate results."

Can it? I've played with it a bit and haven't seen that. If someone has some excellent examples, I would love to see.



