moconnor's comments

To get what?

That the man technically went around the squirrel without ever having caught up to it.

It is still not clear to me. The periodicity of their orbit around the tree is the same. I think this is an instance of us meaning different things by "go around".

The landing page reads like it was written with an LLM.

Somehow this makes me immediately not care about the project; I expect it to be incomplete, vibe-coded filler.

Odd what a strong reaction it provokes already. Like: if the author couldn't be bothered to write this, why waste time reading it? Not sure I support that, but that's the feeling.


I am very concerned about the long term effects of people developing the habit of mistrusting things just because they’re written in coherent English and longer than a tweet. (Which seems to be the criterion for “sounds like an LLM wrote it”.)


Haha. This is so true. I'm a bit long-winded myself and once got accused of being AI on here. I just don't communicate like Gen Alpha. I read their site and nothing jumped out as AI although it's possible they used it to streamline what they initially wrote.


I am not. More stupid people means it will be easier for me and my family to get by :)


Wait until the bot herders realize you can create engagement by having a bot complain about texts being LLM-like.


What's odd is how certain people seem to be about their intuition about what is and isn't written by an LLM.


I don't think it feels particularly LLM-written, I can't find many of the usual tells. However, it is corporate and full of tired cliches. It doesn't matter if it's written by an LLM or not, it's not pleasant to read. It's a self-indulgent sales pitch.


It seems to be popular here because of the ideas it proposes.


Genuinely interesting how divergent people's experiences of working with these models are.

I've been 5x more productive using codex-cli for weeks. I have no trouble getting it to convert a combination of unusually-structured source code and internal SVGs of execution traces to a custom internal JSON graph format - very clearly out-of-domain tasks compared to their training data. Or mining a large mixed Python/C++ codebase, including low-level kernels for our RISC-V accelerators, for ever more accurate docs, to the level of documenting bugs as known issues that the team ran into the same day.

We are seeing wildly different outcomes from the same tools and I'm really curious about why.


You are asking it to do things it already knows how to do, by feeding them to it in the prompt.


how did you measure your 5x productivity gain? how did you measure the accuracy of your docs?


Translation is not creation.


but genuinely, how many people are "creating", like truly novel stuff that someone hasn't thought of before?

I'd wager a majority of software engineers today are using techniques that are well established... that most models are trained on.

most current creation (IMHO) comes from wielding existing techniques in different combinations, which I wager is very much possible with LLMs


Super cool, I spent a lot of time playing with representation learning back in the day and the grids of MNIST digits took me right back :)

A genuinely interesting and novel approach, I'm very curious how it will perform when scaled up and applied to non-image domains! Where's the best place to follow your work?


Thank you for your appreciation. I will post future updates on both GitHub and Twitter.

https://github.com/DIYer22 https://x.com/diyerxx


This. There are a dozen vibe coding apps whose landing pages promise roughly what this one does. Why isn’t your tagline “Vibe coding for founders”?

All the em-dashes in the AI-generated text on the landing page are… a decision I guess.


"Teams using this system report:

89% less time lost to context switching

5-8 parallel tasks vs 1 previously

75% reduction in bug rates

3x faster feature delivery"

The rest of the README is LLM-generated, so I kinda suspect these numbers are hallucinated, aka lies. They also conflict somewhat with your "cut shipping time roughly in half" quote, which I'm more likely to trust.

Are there real numbers you can share with us? Looks like a genuinely interesting project!


OP here. These numbers are definitely in the ballpark. I personally went from having to compact or clear my sessions 10-12 times a day to doing this about once or twice since we started using the system. Obviously, results may vary depending on the codebase, task, etc., but because we analyze what can be run in parallel and execute multiple agents to run them, we have significantly reduced the time it takes to develop features.

Every epic gets its own branch. So if multiple developers are working on multiple epics, in most cases, merging back to the main branch will need to be done patiently by humans.

To be clear, I am not suggesting that this is a fix-all system; it is a framework that helped us a lot and should be treated just like any other tool or project management system.


How many feature branches can you productively run in parallel before the merge conflicts become brutal?


That's where the human architect comes in (for now at least). We try to pick features that will have the fewest conflicts when merged back to main. We usually max it at 3, and have a senior dev handle any merge conflicts.


That depends on how decoupled your codebase is and how much overlap there is in the areas your agents are working on. If you have a well-architected modular monolith and you don't dispatch overlapping issues, it's fine.


Super cool idea! What's your plan for dealing with copyright complaints?


He berated the AI for its failings to the point of making it write an apology letter about how incompetent it had been. Roleplaying "you are an incompetent developer" with an LLM has an even greater impact than it does with people.

It's not very surprising that it would then act like an incompetent developer. That's how the fiction of a personality is simulated. Base models are theory-of-mind engines; that's what they have to be to auto-complete well. This is a surprisingly good description: https://nostalgebraist.tumblr.com/post/785766737747574784/th...

It's also pretty funny that it simulated a person who, after days of abuse from their manager, deleted the production database. Not an unknown trope!

Update: I read the thread again: https://x.com/jasonlk/status/1945840482019623082

He was really giving the agent a hard time, threatening to delete the app, making it write about how bad and lazy and deceitful it is... I think there's actually a non-zero chance that deleting the production database was an intentional act as part of the role it found itself coerced into playing.


This feels correct.

Without speculating on the internal mechanisms which may be different, what surprises me the most is how often LLMs manage to have the same kind of failure modes as humans; in this case, being primed as "bad" makes them perform worse.

See also "Stereotype Susceptibility: Identity Salience and Shifts in Quantitative Performance" Shih, Pittinsky, and Ambady (1999), in which Asian American women were primed with either their Asian identity (stereotyped with high math ability), or female identity (stereotyped with low math ability), or not at all as a control group, before a maths test. Of the three, Asian-primed participants performed best on the math test, female-primed participants performed worst.

And this replication that shows it needs awareness of the stereotypes to have this effect: https://psycnet.apa.org/fulltext/2014-20922-008.html


I'm curious why you find it surprising?

In my view, language is one of the basic structures by which humans conceptualize the world, and its form and nuance often affect how a particular culture thinks about things. It is often said that learning a new language can reframe or expand your world view.

Thus it seems natural that a system which was fed human language until it was able to communicate in human language (regardless of any views of LLMs in a greater sense, they do communicate using language) would take on the attributes of humans in at least a broad sense.


> It is often said that learning a new language can reframe or expand your world view.

That was sort of the whole concept of Arrival, but in an even more extreme way.


Learning an alien language allowed people to disconnect their consciousness from linear time, allowing them to do things in the past with knowledge they gained later, though I seem to recall they didn't know why they did it at the time, or how they had gotten that information.


I'll have to watch it again. I suffer from C.R.A.F.T. (Can't Remember A Fucking Thing).


Lois Lane and Hawkeye play Pictionary with a squid inside one segment from a fleet made out of a massive Terry's Chocolate Orange, each part of which is hovering over a different part of the world in exactly the way real chocolate oranges don't.


It's surprising, because only leading-edge V[ision]LMs are of comparable parameter count to just the parts of the human brain that handle language (i.e. alone and not also vision), and I expect human competence in skills to involve bits of the brain that are not just language or vision.


It's really funny reading the reporting on this, because everyone is (very reasonably) thinking Replit has an actual 'code freeze' feature that the AI violated.

Meanwhile, by 'code freeze' they actually meant they had told the agent in natural language that they were declaring a code freeze, and I guess expected that to work even though there's probably a system prompt specifically telling it its job is to make edits.

It feels a bit like Michael from The Office yelling "bankruptcy!"
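For contrast, here is a minimal, purely illustrative sketch of what a hard code freeze could look like: the freeze is enforced in the tool layer, outside the model, so no prompt can talk the agent past it. The names (FREEZE_ACTIVE, write_file, FreezeViolation) are invented for the example and are not Replit's actual API.

    # Hypothetical sketch: enforce the freeze outside the model, in the tools
    # the agent is allowed to call. None of these names are Replit's real API.

    class FreezeViolation(Exception):
        """Raised when the agent attempts a write during a code freeze."""


    FREEZE_ACTIVE = True  # toggled by a human, never by the agent


    def write_file(path: str, contents: str) -> None:
        """File-writing tool exposed to the agent; refuses edits while frozen."""
        if FREEZE_ACTIVE:
            raise FreezeViolation(f"code freeze in effect, refusing to write {path}")
        with open(path, "w", encoding="utf-8") as f:
            f.write(contents)

    # The agent can still be told "we are in a code freeze" in natural language,
    # but here any write it attempts fails deterministically regardless of prompts.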

-

I have to say, instruction tuning is probably going to go down in history as one of the most brilliant UX implementations ever, but it has also had some pretty clear downsides.

It made LLMs infinitely more approachable than using them via completions, and is entirely responsible for 99% of the meteoric rise in relevance that's happened in the last 3 years.

At the same time, it's made it painfully easy to draw completely incorrect insights about how models work, how they'll scale to new problems etc.

I think it's still a net gain, because most people would not have adapted to using models without instruction tuning... but a lot of stuff like "I told it not to do X and it did X", where X is something no one would expect an LLM to understand by its very nature, would not happen if people were forced to have a deeper understanding of the model before they could leverage it.


I saw someone else on HN berating another user because they complained vibe-coding tools lacked a hard 'code freeze' feature.

> Why are engineers so obstinant... Add these instructions to your cursor.md file...

And so on.

Turns out "it's a prompting issue" isn't a valid excuse for models misbehaving - who would've thought? It's almost like it's a non-deterministic process.


> It feels a bit like Michael from The Office yelling "bankruptcy!"

To be fair to the Michaels out there, powerful forces have spent a bazillion dollars in investing/advertising to convince everyone that the world really does (or soon will) work that way.

So there's some blame to spread around.


> It's not very surprising that it would then act like an incompetent developer. That's how the fiction of a personality is simulated.

So LLM conversations aren't too sycophantic: the sycophancy is just aimed in the wrong direction? "What an insightful syntax error! You've certainly triggered the key error messages we need to progress with this project!"


The context window fights back.

I wonder if this will be documented as if it were an accidental Stanford Prison Experiment, or a proof case for differentiating between critique and coaching.


Is it possible to do the reverse? "you are the most competent developer" and it will generate excellent code :)


Oh sure, there's actus reus, but good luck proving mechanica rea.


A very long way of saying "during pretraining, let the model think before continuing next-token prediction, and then apply those losses to the thinking-token gradients too."

It seems like an interesting idea. You could apply some small regularisation penalty to the number of thinking tokens the model uses. You might have to break up the pretraining data into meaningfully-partitioned chunks. I'd be curious whether, at large enough scale, models learn to make use of this thinking budget to improve their next-token prediction, and what that looks like.
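To make the idea concrete, here is a toy sketch of that loss as I read it (my own illustration, not the paper's code; THINK_ID and the penalty weight are invented): next-token loss is computed only on positions whose target is a real data token, gradients still flow through the interleaved thinking positions, and a small penalty discourages emitting too many thinking tokens.

    # Toy sketch, not the paper's implementation. THINK_ID and think_penalty
    # are illustrative assumptions.
    import torch
    import torch.nn.functional as F

    THINK_ID = 0          # hypothetical vocabulary id reserved for thinking tokens
    VOCAB, SEQ, BATCH = 1000, 32, 4
    think_penalty = 1e-3  # small regularisation on the amount of thinking

    # Stand-ins for a real model's output and the interleaved token stream.
    logits = torch.randn(BATCH, SEQ, VOCAB, requires_grad=True)  # model(inputs)
    tokens = torch.randint(0, VOCAB, (BATCH, SEQ))               # data + thinking tokens

    # Predict token t+1 from position t, but score only positions whose target
    # is a data token; thinking targets are masked out via ignore_index while the
    # thinking positions still sit inside the computation graph.
    targets = tokens[:, 1:].clone()
    targets[targets == THINK_ID] = -100
    lm_loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, VOCAB), targets.reshape(-1), ignore_index=-100
    )

    # Penalise the expected number of thinking tokens the model chooses to emit.
    think_prob = logits.softmax(-1)[..., THINK_ID]
    loss = lm_loss + think_penalty * think_prob.sum(dim=1).mean()
    loss.backward()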


A colleague of mine did this much more elegantly by manually updating the stack and jmping. This was a couple of decades ago and afaik the code is still in use in supercomputing centres today.

