IBM's new SWE agents for developers

wmal · 2024-10-22T21:20:30.000000Z

I wanted to find the actual change performed by these agents so I watched the embedded video. I can not believe what I saw.

The video shows a private fork of a public repository. The bug is real, but it was resolved in February 2023, and it doesn’t seem like the solution was automated [1].

The bug has a stack trace attached with a big arrow pointing to line 223 of a backend_compat.py file. A quick glance at this stack trace and you already know what happened, why, and how to fix it, but…

not for the agent. It seems to analyze the repository in multiple steps and tries to locate the class. Why did they even release this video?

[1] https://github.com/Qiskit/qiskit/issues/9562
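To illustrate why a stack trace already gives away the location, here is a minimal sketch that pulls the innermost frame out of a Python traceback. The sample traceback is made up: only the file name and line number (backend_compat.py, line 223) come from the comment above, and the function name and error message are hypothetical placeholders, not the real Qiskit trace.

```python
import re

# Matches the frame lines CPython prints in a traceback, e.g.:
#   File "path/to/file.py", line 123, in function_name
FRAME_RE = re.compile(
    r'^\s*File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>.+)$'
)


def innermost_frame(traceback_text):
    """Return (path, line, func) for the last frame in a traceback, or None."""
    frames = []
    for raw in traceback_text.splitlines():
        match = FRAME_RE.match(raw)
        if match:
            frames.append(
                (match.group("path"), int(match.group("line")), match.group("func"))
            )
    # The last frame listed is where the exception was raised.
    return frames[-1] if frames else None


# Hypothetical traceback for illustration only.
SAMPLE = '''Traceback (most recent call last):
  File "example_script.py", line 10, in <module>
    run()
  File "backend_compat.py", line 223, in hypothetical_function
    raise ValueError("hypothetical error")
ValueError: hypothetical error
'''

location = innermost_frame(SAMPLE)
print(location)
```

A few lines of pattern matching are enough to recover the file and line the arrow in the video points at, which is the sense in which the localization step looks trivial for this particular bug.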

colonwqbang · 2024-10-22T21:40:49.000000Z

Classic machine learning researcher trick: just select your test example from the training set! It certainly saves a lot of effort.

wmal · 2024-10-22T21:47:49.000000Z

That’s true, but this repo has thousands of bugs. They could at least find one that was in the training set, but also did not contain the location in the bug description.

This way it would at least look like it may work

toomuchtodo · 2024-10-22T21:51:05.000000Z

Decision makers and those writing the check aren’t sophisticated enough to know the difference, in my experience with orgs that buy from IBM.

negoutputeng · 2024-10-22T22:17:22.000000Z

every hype cycle runs through a predictable course.

we are at a phase where the early adopters have seen the writing on the wall, i.e. that LLMs are useful for a limited set of use cases. but there are lots of late adopters who are still awestruck and not disillusioned yet.

colonwqbang · 2024-10-22T22:01:48.000000Z

Indeed. It's also amusing how it produces a multi-page essay on the bug instead of submitting a pull request with an actionable fix.

negoutputeng · 2024-10-22T21:24:00.000000Z

Mgmt at every company is asked - what are you doing to be agentic ?

so, they organize hackathons where devs build a hypothetical agentic framework nobody will dare use. So, mgmt can claim, look here what i have done to be agentic.

you should ask: would you dogfood your agent? and the answer is no way. these are meant purely for marketing purposes, as they don't meet an end user need.

negoutputeng · 2024-10-22T21:28:38.000000Z

what's hilarious in this farce is how these are being rebranded from "co-pilots" to "agents"

just goes to show, it is all a big song-and-dance. much ado about nothing.

jjmarr · 2024-10-22T21:36:41.000000Z

The term "co-pilot" implies a company has to hire a software engineer to guide the AI.

The term "agent" implies you can give the AI full access to your repos and fire the software engineers you're grudgingly paying six figures to.

The second is much more valuable to executives not wanting to pay the software people that demand higher salaries than virtually everyone else in the organization.

viraptor · 2024-10-22T22:02:45.000000Z

There was no rebrand. They're different concepts. Copilot and similar solutions give hints as you do the development. Agents are systems that receive a goal and will iterate actions and queries for more information until they achieve the goal.
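The "receive a goal and iterate until it is achieved" description can be sketched as a loop. This is a toy under stated assumptions, not any vendor's implementation: `run_agent`, `toy_policy`, and `increment` are hypothetical names, and the policy is a stand-in for whatever model would pick the next action.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Action = Callable[[State], State]


def run_agent(
    goal_reached: Callable[[State], bool],
    choose_action: Callable[[State], Tuple[str, Action]],
    state: State,
    max_steps: int = 10,
) -> Tuple[State, List[str]]:
    """Pick and apply actions until the goal test passes or steps run out."""
    history: List[str] = []
    for _ in range(max_steps):
        if goal_reached(state):
            break
        name, action = choose_action(state)  # a real agent would query a model here
        state = action(state)
        history.append(name)
    return state, history


def increment(state: State) -> State:
    """Toy action: bump a counter by one."""
    return {**state, "counter": state["counter"] + 1}


def toy_policy(state: State) -> Tuple[str, Action]:
    """Hypothetical policy that always chooses the same action."""
    return ("increment", increment)


final_state, steps = run_agent(
    lambda s: s["counter"] >= 3, toy_policy, {"counter": 0}
)
print(final_state, steps)
```

The `max_steps` cap is a design choice in this sketch: without it, a policy that never reaches the goal would loop forever.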

negoutputeng · 2024-10-22T22:09:20.000000Z

you are quoting the party-line.

i am saying, the thing is snake-oil - a solution looking for a problem.

viraptor · 2024-10-22T22:20:02.000000Z

I'm explaining what words mean. The agentic approach has been a thing for years (https://en.wikipedia.org/wiki/Intelligent_agent). You can just say you don't like AI in programming, without saying incorrect things on top of that.

mooreds · 2024-10-22T21:29:36.000000Z

Right. Woe is the startup that doesn't have an AI story right now.

whiplash451 · 2024-10-22T21:53:15.000000Z

The companies that have a data moat and no AI are in a much better position than those who’ve got it the other way around.

viraptor · 2024-10-22T21:44:19.000000Z

I think the process could be better, but if you want good quality you really shouldn't expect it to just jump at the "obvious" thing. Just like you wouldn't want the developer to just make the error go away in the quickest way. Getting more context is always going to be a good idea, even if it wastes some time in the "trivial" cases.

bubaumba · 2024-10-22T21:27:16.000000Z

you can't expect all at once. just one step forward. note how fast everything moves since 2020, and accelerating. finally 'it's' coming...

BugsJustFindMe · 2024-10-22T20:48:26.000000Z

> But with the SWE localization agent, a [ibm-swe-agent-2.0] could open a bug report they’ve received on GitHub, tag it with “ibm-swe-agent-1.0” and the agent will quickly work in the background to find the troublesome code. Once it’s found the location, it’ll suggest a fix that [ibm-swe-agent-2.0] could implement to resolve the issue. [ibm-swe-agent-2.0] could then review the proposed fix using other agents.

I made a few minor edits, but I think we all know this is coming. This calls itself "for developers" for now, but really also it's "instead of developers", and at some point the mask will come off.

RealityVoid · 2024-10-22T21:15:56.000000Z

I don't care. I swore to myself that if the time comes my skills will no longer be needed, I'd gracefully ride into the sunset and do some other thing.

giantg2 · 2024-10-22T21:33:56.000000Z

Sounds nice until you actually have to find some other thing, especially with the bar for entry being high for most interesting and well compensated jobs. It will be even worse when you have huge numbers of other devs also looking for a new job.

soco · 2024-10-22T21:22:16.000000Z

Hopefully that some other thing puts bread on your table.

mycall · 2024-10-22T21:18:32.000000Z

This is really the only answer. Be water my friend.

rzzzt · 2024-10-22T21:26:58.000000Z

Incompressible, freeze around 0°C, corrosive to metal, got it.

bun_at_work · 2024-10-22T21:35:40.000000Z

side-step flamebait like Winnie the Pooh

sesteel · 2024-10-22T21:45:56.000000Z

I've taken up a new career as an AI influencer.

Workaccount2 · 2024-10-22T21:52:35.000000Z

Developers are not going to go away, but the cushy high salaries likely will. Skill development follows a logarithmic curve where an AI boost to junior devs will be much more than the boost given to senior devs. This discrepancy will pull down the value of devs as you will get "more bang for your buck" from lower tier devs, since the AI is comparatively free.
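As a toy illustration of the logarithmic-curve argument, assume skill grows as ln(1 + years of experience) and that an AI tool acts like a fixed number of extra years. Both the functional form and the numbers are assumptions made for this sketch, not measurements.

```python
import math


def skill(experience_years):
    """Assumed skill curve: logarithmic in years of experience."""
    return math.log(1.0 + experience_years)


def boost(experience_years, ai_equivalent_years):
    """Skill gained if the AI acts like a fixed number of extra years."""
    return skill(experience_years + ai_equivalent_years) - skill(experience_years)


# Hypothetical numbers: a junior with 1 year, a senior with 10 years,
# and an AI tool assumed to be worth 2 extra years to either of them.
junior_gain = boost(1.0, 2.0)   # ln(4) - ln(2)
senior_gain = boost(10.0, 2.0)  # ln(13) - ln(11)

print(f"junior gain: {junior_gain:.3f}, senior gain: {senior_gain:.3f}")
```

Under these assumptions the same tool moves the junior much further along the curve than the senior, which is the discrepancy the comment describes.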

Although I also wonder about the development of new languages that may be optimized for transformers, as it seems clumsy and wasteful to have transformers juggle all the tokens needed to make code readable by humans. That would be really interesting to have a model that outputs code that functions incredibly but is indecipherable by humans.

lwhi · 2024-10-22T22:01:22.000000Z

Junior devs don't always understand enough to know why something should or shouldn't be done.

I don't think junior devs are going to benefit; if anything, the whole role of 'junior' has been made obsolete. The rote / repetitive work a junior would traditionally do can now be delegated wholesale to an LLM.

I figure, productivity is going to be increased a lot. We'll need fewer developers as a result. The duties associated with developers are going to morph and become more solutions / architecture orientated.

Workaccount2 · 2024-10-22T22:05:40.000000Z

What you say could be true too (or a combo), the outcome will still be the same though as more devs compete for fewer positions.

j-krieger · 2024-10-22T22:05:06.000000Z

at some point, this will explode in a giant mess when your codebase is littered with AI-generated trash.

bloopernova · 2024-10-22T21:45:05.000000Z

All the project/product managers that think they are the ones responsible for team success are going to get a rude awakening. When they try to do the job of an entire team, it's going to come apart pretty quickly. LLMs are a tool, nothing more, they don't magically imbue the user with competency.

digging · 2024-10-22T22:06:26.000000Z

They're not going to try to do the job, they're going to hire cheaper, worse SWEs to manipulate AI... and then things will come apart pretty quickly :) But they'll still have someone else to blame.

> LLMs are a tool, nothing more, they don't magically imbue the user with competency.

Not a good take though, IMO. They're literally a tool that can teach you how to use them, or anything else.

alkonaut · 2024-10-22T21:50:07.000000Z

It will suck to babysit LLMs as a job. In one sense perhaps it will be nice to have models do the chores. But I fear we’ll be 90% babysitting. Today I was in an hour-long chat with ChatGPT about a problem when it circled back to its initial (wrong) solution.

I have very little fear for my own job no matter how good models get. What happens is that software gets cheaper and more of it is bought. It’s what happened in every industry with automation.

Those who can’t operate a machine though (in this case an AI) should maybe worry. But chances are their jobs weren’t very secure to begin with.

zeroonetwothree · 2024-10-22T21:00:40.000000Z

There’s still a huge gulf to cross to get to “instead of”.

lyu07282 · 2024-10-22T21:29:44.000000Z

Give IBM a trillion dollars and they couldn't threaten a 7-year-old's lemonade stand business, I think we'll be safe lol

skywhopper · 2024-10-22T21:48:26.000000Z

That’s their goal, no doubt. And I’m sure a lot of zombie projects will be blindly turned over to this type of agent and left to rot. But in practice, these agents will never replace humans, because someone will have to oversee them, and that human will probably just be the “developer” that was “replaced” by them. The work will suffer, the quality will suffer, the enjoyment of the human will suffer, the costs will increase, but some salesperson and some mid level exec will be able to claim they sold and deployed AI and get a bonus.

invalidOrTaken · 2024-10-22T20:55:48.000000Z

bring it on lol

dingnuts · 2024-10-22T21:06:22.000000Z

time to start a consultancy that specializes in unfucking the mess made by generative AI

neom · 2024-10-22T21:43:17.000000Z

I run a startup accelerator with a law firm partner (but not a legal accelerator) - and some of the stuff I hear in the lunchroom is wild. No doubt the firm is going to do extremely well un-fucking gen AI legal mess.

alfalfasprout · 2024-10-22T21:31:02.000000Z

AI is the new bottom-of-the-barrel outsourced contractor.

bubaumba · 2024-10-22T21:22:29.000000Z

not only AI, we have one 'guru' who sounds like he is reading copilot on remote audio only meetings.

giantg2 · 2024-10-22T21:35:20.000000Z

Thank you for a great career idea.

mycall · 2024-10-22T21:19:47.000000Z

Reminds me of fixing all the half-baked vendor's work my company pays good money for.

Let the AI write all the code and programmers will do the fixes.

mistrial9 · 2024-10-22T21:14:58.000000Z

yeah - alongside other in-demand services. like apartment building management, corporate janitorial services, and public transportation bus drivers.

jcgrillo · 2024-10-22T21:12:16.000000Z

Which block in the flowchart is the one which will try to sell me db2?

negoutputeng · 2024-10-22T22:25:29.000000Z

I would have liked to see a giant ppt of an agentic framework or architecture. Call it Enterprise Agentic Framework or something like that. The architecture diagram would fill an entire ppt slide and bedazzle its customers.

All i got instead are lame tools for developers.

TeslaCoils · 2024-10-22T21:09:33.000000Z

Sure... https://www.cnbc.com/2024/06/17/mcdonalds-to-end-ibm-ai-driv...

kayodelycaon · 2024-10-22T20:40:56.000000Z

I wonder what kinds of errors it can actually detect. I’d love to throw it at my support queue: find the reason this thing got stuck in the interaction between three state machines which are not defined as state machines.

Or is this the next iteration of static analysis?

hrmacb · 2024-10-22T21:16:40.000000Z

"It made sense for IBM to build agentic tools like these, argues Ruchir Puri, chief scientist at IBM Research, not just for its own developers, but for all the enterprise developers IBM strives to assist."

What a weird sentence. Mx. Puri does not argue anything, this is just an unfounded claim. So far it just looks like snake oil that is to be sold to other companies.

This would actually be a good business strategy: Sell software that diminishes productivity to your competition and watch them disintegrate.

whiplash451 · 2024-10-22T21:58:27.000000Z

What worries me most is that, because there is no way to prove the negative value of these agentic scams and because SWE teams are (sadly) compressible to some extent, some companies will simply let go of 10% of their workforce, while the remaining 90% will have no choice but to keep grudging with the additional “benefit” of having to show the positive value of this scam to their hierarchy (unless they want to apply to the 10%). So much waste and sadness all around.

d3m0t3p · 2024-10-22T22:03:21.000000Z

The video is really sad. No music, no sounds. I remember better videos on youtube in 2006.

RealCodingOtaku · 2024-10-22T22:17:55.000000Z

I said this before I left IBM, and I will say it again.

These and other models IBM is working on can do basic tasks that anyone else could. But it will all fall apart the moment you add complexity to it.

It's hilarious to see how IBM struggles to stay relevant. What did that lead to? A bot that summarizes a stack trace. Why is this even on the front page of HN?

eloycoto · 2024-10-22T21:07:56.000000Z

I need to say that I'm very impressed with the PDL project, a lot of things can be done in there.

https://github.com/IBM/prompt-declaration-language

constantlm · 2024-10-22T21:39:15.000000Z

What's up with the amateur hour graphs with squashed/pixelated logos?

m3kw9 · 2024-10-22T20:51:36.000000Z

Maybe they can also create agents to replace the “business analysts” who check and define business logic requirements.

FL410 · 2024-10-22T21:05:55.000000Z

Can it debug RPG? /s