Hacker News
IBM's new SWE agents for developers (ibm.com)
77 points by sandwichsphinx 51 days ago | 72 comments



I wanted to find the actual change performed by these agents, so I watched the embedded video. I cannot believe what I saw.

The video shows a private fork of a public repository. The bug is real, but it was resolved in February 2023, and it doesn't seem like the fix was automated [1].

The bug has a stack trace attached with a big arrow pointing to line 223 of a backend_compat.py file. A quick glance at this stack trace tells you what happened, why, and how to fix it, but…

not for the agent. It seems to analyze the repository in multiple steps and tries to locate the class. Why did they even release this video?

[1] https://github.com/Qiskit/qiskit/issues/9562


Mgmt at every company is being asked: what are you doing to be agentic?

so they organize hackathons where devs build a hypothetical agentic framework nobody will dare use, so that mgmt can claim: look what i have done to be agentic.

you should ask: would you dogfood your agent? the answer is no way. these are meant purely for marketing purposes, as they don't meet an end-user need.


what's hilarious in this farce is how these are being rebranded from "co-pilots" to "agents"

just goes to show, it is all a big song-and-dance. much ado about nothing.


The term "co-pilot" implies a company has to hire a software engineer to guide the AI.

The term "agent" implies you can give the AI full access to your repos and fire the software engineers you're grudgingly paying six figures to.

The second is much more valuable to executives not wanting to pay the software people that demand higher salaries than virtually everyone else in the organization.


There was no rebrand. They're different concepts. Copilot and similar solutions give hints as you develop. Agents are systems that receive a goal and iterate through actions and queries for more information until they achieve that goal.


you are quoting the party-line.

i am saying, the thing is snake-oil - a solution looking for a problem.


I'm explaining what words mean. Agentic approach has been a thing for years https://en.wikipedia.org/wiki/Intelligent_agent You can just say you don't like AI in programming, without saying incorrect things on top of that.


Right. Woe is the startup that doesn't have an AI story right now.


The companies that have a data moat and no AI are in a much better position than those who’ve got it the other way around.


Depends on what you are optimizing for.

Long term value, I agree.

Fundraising, hard disagree.


Classic machine learning researcher trick: just select your test example from the training set! It certainly saves a lot of effort.


That’s true, but this repo has thousands of bugs. They could at least have picked one that was in the training set but did not have the location spelled out in the bug description.

That way it would at least look like it might work.


Decision makers and those writing the check aren’t sophisticated enough to know the difference, in my experience with orgs that buy from IBM.


every hype cycle runs through a predictable course.

we are at a phase where the early adopters have seen the writing on the wall, i.e. that llms are useful for a limited set of use cases. but there are lots of late adopters who are still awestruck and not yet disillusioned.


Indeed. It's also amusing how it produces a multi-page essay on the bug instead of submitting a pull request with an actionable fix.


The demo is not supposed to wow the technical people. The business people whose budgets will pay for this are less likely to notice.


I think the process could be better, but if you want good quality you really shouldn't expect it to just jump at the "obvious" thing, just like you wouldn't want a developer to simply make the error go away in the quickest way. Getting more context is always going to be a good idea, even if it wastes some time in the "trivial" cases.


it takes more time to watch the video than fix the bug


you can't expect everything at once. just one step forward. note how fast everything has moved since 2020, and it's accelerating. finally 'it's' coming...


> But with the SWE localization agent, a [ibm-swe-agent-2.0] could open a bug report they’ve received on GitHub, tag it with “ibm-swe-agent-1.0” and the agent will quickly work in the background to find the troublesome code. Once it’s found the location, it’ll suggest a fix that [ibm-swe-agent-2.0] could implement to resolve the issue. [ibm-swe-agent-2.0] could then review the proposed fix using other agents.

I made a few minor edits, but I think we all know this is coming. This calls itself "for developers" for now, but really it's also "instead of developers", and at some point the mask will come off.


It will suck to babysit LLMs as a job. In one sense perhaps it will be nice to have models do the chores. But I fear we’ll be 90% babysitting. Today I was in an hour-long chat with ChatGPT about a problem when it circled back to its initial (wrong) solution.

I have very little fear for my own job no matter how good models get. What happens is that software gets cheaper and more of it is bought. It’s what happened in every industry with automation.

Those who can’t operate a machine though (in this case an AI) should maybe worry. But chances are their jobs weren’t very secure to begin with.


Babysitting LLMs is already my job and has been for a year. It's kind of boring, but honestly, after nearly 20 years in the game, I felt like I was approaching the endgame for programming anyway.


one more thing - you won't get a "job" .. on-demand temps can fill the roles, and are much cheaper for the company. It is happening already.


All the project/product managers that think they are the ones responsible for team success are going to get a rude awakening. When they try to do the job of an entire team, it's going to come apart pretty quickly. LLMs are a tool, nothing more, they don't magically imbue the user with competency.


They're not going to try to do the job, they're going to hire cheaper, worse SWEs to manipulate AI... and then things will come apart pretty quickly :) But they'll still have someone else to blame.

> LLMs are a tool, nothing more, they don't magically imbue the user with competency.

Not a good take though, IMO. They're literally a tool that can teach you how to use them, or anything else.


> > LLMs are a tool, nothing more, they don't magically imbue the user with competency.

> Not a good take though, IMO. They're literally a tool that can teach you how to use them, or anything else.

I disagree. In their current incarnation, LLMs require a human subject matter expert to determine whether the output is valid. In the example of a project manager acting as team lead, the LLM won't tell you whether the database is sized correctly, or whether you even need a database.


>they're going to hire cheaper, worse SWEs to manipulate AI

This is 100% the play.

Right now you can hire 5 devs in India to do the job of 1 competent US dev and save 30-40% on total cost.

Add in AI and it will only take 3 devs in India to do the same work, and can now save 50-60% on total cost.


They will make sure that doesn't happen to them before it happens to anyone else; I'm sure they will cover their bases. AI is great for PMs/Product/C-suite types (i.e. the decision makers). Bad for the doers/builders long term, IMO.


I don't care. I swore to myself that if the time comes my skills will no longer be needed, I'd gracefully ride into the sunset and do some other thing.


Sounds nice until you actually have to find some other thing, especially with the bar for entry being high for most interesting and well compensated jobs. It will be even worse when you have huge numbers of other devs also looking for a new job.


This is really the only answer. Be water my friend.


Incompressible, freeze around 0°C, corrosive to metal, got it.


side-step flamebait like winnie the pooh


Oh, bother


Hopefully that some other thing puts bread on your table.


I've taken up a new career as an AI influencer.


Give IBM a trillion dollars and they couldn't threaten a 7-year-old's lemonade stand business. I think we'll be safe lol


That’s their goal, no doubt. And I’m sure a lot of zombie projects will be blindly turned over to this type of agent and left to rot. But in practice, these agents will never replace humans, because someone will have to oversee them, and that human will probably just be the “developer” that was “replaced” by them. The work will suffer, the quality will suffer, the enjoyment of the human will suffer, the costs will increase, but some salesperson and some mid level exec will be able to claim they sold and deployed AI and get a bonus.


Developers are not going to go away, but the cushy high salaries likely will. Skill development follows a logarithmic curve, so the AI boost to junior devs will be much larger than the boost given to senior devs. This discrepancy will pull down the value of devs, as you will get "more bang for your buck" from lower-tier devs, since the AI is comparatively free.

I also wonder about the development of new languages optimized for transformers, as it seems clumsy and wasteful to have transformers juggle all the tokens needed to make code readable by humans. It would be really interesting to have a model that outputs code that functions incredibly well but is indecipherable to humans.


Junior devs don't always understand enough to know why something should or shouldn't be done.

I don't think junior devs are going to benefit; if anything, the whole role of 'junior' has been made obsolete. The rote, repetitive work a junior would traditionally do can now be delegated wholesale to an LLM.

I figure productivity is going to increase a lot, and we'll need fewer developers as a result. The duties associated with developers are going to morph and become more solutions/architecture oriented.


What you say could be true too (or a combo), the outcome will still be the same though as more devs compete for fewer positions.


at some point, this will explode in a giant mess when your codebase is littered with AI-generated trash.


There’s still a huge gulf to cross to get to “instead of”.


Easy fix, start publishing public repos on github with incorrect code so the AI is trained on it.


bring it on lol


time to start a consultancy that specializes in unfucking the mess made by generative AI


I run a startup accelerator with a law firm partner (but not a legal accelerator) - and some of the stuff I hear in the lunchroom is wild. No doubt the firm is going to do extremely well un-fucking gen AI legal mess.


not only AI, we have one 'guru' who sounds like he is reading copilot on remote audio only meetings.


Thank you for a great career idea.


great minds think alike. remote consulting looks within the reach now.


AI is the new bottom-of-the-barrel outsourced contractor.


Reminds me of fixing all the half-baked vendor's work my company pays good money for.

Let the AI write all the code and programmers will do the fixes.


yeah - alongside other in-demand services. like apartment building management, corporate janitorial services, and public transportation bus drivers.


Wondering if anyone running mainly Python applications is willing to be a Design Partner for a true AI agent that can fix a few issues, but with very high accuracy and no oversight.

I am from LogicStar.ai, and we are trying to get rid of the BS and show true value in real-world applications with deep program analysis, reproduction, verification, etc., while, like everyone else, sourcing fix suggestions from LLMs.

Follow our page at https://www.linkedin.com/company/logicstar-ai/, as I plan to post updates and frank comments there on what other agents are or are NOT. Your input will drive which direction we focus the technology and use cases.


>That score places the IBM SWE agent high up the SWE-bench leaderboard, well above many other agents relying on massive frontier models, like GPT-4o and Claude 3.

They're not even in the top half of the leaderboard. Almost half the score of the first place agent.


Which block in the flowchart is the one which will try to sell me db2?


"It made sense for IBM to build agentic tools like these, argues Ruchir Puri, chief scientist at IBM Research, not just for its own developers, but for all the enterprise developers IBM strives to assist."

What a weird sentence. Mx. Puri does not argue anything, this is just an unfounded claim. So far it just looks like snake oil that is to be sold to other companies.

This would actually be a good business strategy: Sell software that diminishes productivity to your competition and watch them disintegrate.



How many millions were spent on building this "agent" that can "fix" a null pointer exception by wrapping it in a null check?


I would have liked to see a giant ppt of an agentic framework or architecture. Call it Enterprise Agentic Framework or something like that. The architecture diagram would fill an entire ppt slide and bedazzle its customers.

All i got instead are lame tools for developers.


I wonder what kinds of errors it can actually detect. I’d love to throw it at my support queue: find the reason this thing got stuck in the interaction between three state machines which are not defined as state machines.

Or is this the next iteration of static analysis?


A few of the steps that will need to be involved: static analysis, trying to reproduce the problem first, then trying various fix suggestions and verifying that they work and do not cause other regressions. Only then can an experienced developer review this "very-junior dev"/AI work before it is pushed to main in any commercial application. The design partnership offer at https://www.linkedin.com/company/logicstar-ai/ stands if you have a core Python codebase, for now. More languages will come once we get real-world feedback as a proof of concept.


What worries me most is that, because there is no way to prove the negative value of these agentic scams and because SWE teams are (sadly) compressible to some extent, some companies will simply let go 10% of their workforce, while the remaining 90% will have no choice but to keep grinding, with the additional “benefit” of having to demonstrate the positive value of this scam to their hierarchy (unless they want to join the 10%). So much waste and sadness all around.


Anyone who has ever used an IBM product will keep this as far away from their code as possible.


Maybe they can also create agents to replace the “business analysis” where they check and define business logic requirements.


I have to say that I'm very impressed with the PDL project; a lot can be done with it.

https://github.com/IBM/prompt-declaration-language


What's up with the amateur hour graphs with squashed/pixelated logos?


I said this before I left IBM, and I will say it again.

These and other models IBM is working on can do basic tasks that anyone else could. But it will all fall apart the moment you add complexity to it.

It's hilarious to see how IBM struggles to stay relevant, and what did that lead to? A bot that summarizes stack traces. Why is this even on the front page of HN?


The video is really sad. No music, no sounds. I remember better videos on youtube in 2006.


Beep beep!


Can it debug RPG? /s


I do see the /s, but I find that an interesting thought experiment, since (a) I'd guess the number of humans who can actually debug RPG is pretty small, and (b) where are these magical agents that are "gonna take muh job" going to get training data for the code, or for the fixes to any such bugs?



