
The moat is how much money they have to throw at the problem. Corporations with deep pockets and those that secure investments based on fairy tales will "win".

Benchmark scores are marketing fluff, just like the rest of this article, with its alleged praise from early adopters and highly scripted and edited videos.

AI companies are grasping at straws by selling us minor improvements to stale technology so they can pump up whatever valuation they have left.


The fact that people like you are still posting like this after Veo 3 is wild. Nothing could possibly be forcing you to hold onto that opinion, yet you come out in droves in every AI thread to repost it.

I concede that my last sentence was partly hyperbolic, particularly around "stale technology". But the rest of what I wrote is an accurate description of the state of the AI industry, from the perspective of an unbiased outsider, anyway.

What we've seen from Veo 3 is impressive, and the technology is indisputably advancing. But at the same time we're flooded with inflated announcements from companies that create their own benchmarks or optimize their models specifically to look good on benchmarks. Yet when faced with real world tasks the same models still produce garbage, they need continuous hand-holding to be useful, and they often simply waste my time. At least, this has been my experience with Sonnet 3.5, 3.7, Gemini, o1, o3, and all of the SOTA models I've tried so far. So there's this dissonance between marketing and reality that's making it really difficult to trust what any of these companies say anymore.

Meanwhile, little thought is put into the harmful effects of these tools, and any alleged focus on "safety" is as fake as the hallucinations that plague them.

So, yes, I'm jaded by the state of the tech industry and where it's taking us, and I wish this bubble would burst already.


From my experience as a SWE over the past ~15 years working in teams ranging from 10+ engineers in large companies doing daily in-person standups, to ~5 engineers in small remote-first companies doing daily videocall or async standups, and everything in between: the standup ceremony is a waste of everyone's time. Its only purpose is social/political, as with any meeting, and to give micromanaging managers something to do and something to report to their higher-ups (I've been part of teams where the "Scrum Master" and even "Product Owner" attends the standup...). Which is fine if that's the way the company works and the participants enjoy the ceremony for personal reasons, but there are no tweaks to the standup formula that make teams more productive or functional.

If I'm working on a solo project, nobody cares about the details of my progress besides the users I'm building it for. Whether this is an API for someone who is part of the standup, or a feature for someone in the company, I would communicate with them directly when needed. They would already be aware of my progress, and they usually don't need to be informed of it on a daily basis. If I'm stuck on something, then I would also communicate directly with the person who can help me.

If I'm working on a team project, the team members would already be aware of the project status, whether that's via pull requests, issues, or direct communication (in-person, chat, email, etc.). The users of the project would be notified of the progress as needed in this case as well.

So the standup ceremony is redundant for the few people familiar with the context, and completely useless for those who are not. The standup assumes that teams aren't communicating already, which is ludicrous.

It's about time that the industry accepts that the Agile manifesto, while noble in principle, has been completely misunderstood and corrupted in all its implementations, and that the modern software development process is not an improvement over the Waterfall one it aimed to replace.

To me the only process that makes sense is giving engineers autonomy, assuming they're self-sufficient and capable of making good decisions, and providing them with a very lightweight set of tools for collaboration (issue/task tracker, code review, chat, email, meeting, etc.). A process that works best for that team will emerge by consensus. Anything that's imposed from the top down or by cargo culting some process that worked for some other team will inevitably cause friction and be of little practical value.


As with many things, it depends on the team. Some teams and people really do seem to need some amount of daily direction to have confidence in the work they are doing, and that does have a meaningful impact on productivity.

My bias is always to only participate in frequent recurring meetings as a last resort, but sometimes they seem to be necessary.


One of my basic philosophies, when I ran a team, was “No regularly-scheduled meetings.” Every meeting needed a specific goal and need.

But I worked for a Japanese company, and they take meetings very seriously.

One of my employees suggested daily standups. I tried to support my employees, when they suggested new stuff, so I said “let’s give it a try.”

The Japanese Liaison really liked the idea, but it needed just a little tweak…

In a short time, we were having hour-long meetings every Friday at lunchtime.


The trick for greenfield projects is to use it to help you design detailed specs and a tentative implementation plan. Just bounce some ideas off of it, as with a somewhat smarter rubber duck, and hone the design until you arrive at something you're happy with. Then feed the detailed implementation plan step by step to another model or session.

This is a popular workflow I first read about here[1].

This has been the most useful use case for LLMs for me. Actually getting them to implement the spec correctly is the hard part, and you'll have to take the reins and course correct often.

[1]: https://harper.blog/2025/02/16/my-llm-codegen-workflow-atm/
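
For the "feed the plan step by step" part, here's a minimal sketch of what that loop can look like, assuming the plan lives in a plain-text plan.md with one step per line and using the OpenAI Python client (the model name, file name, and prompt wording are placeholders, not something prescribed by the linked post):

    # Rough sketch: drive an implementation session from a plan file, one step at a time.
    # Assumes a plan.md with one step per line; model name and prompts are placeholders.
    from pathlib import Path

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    history = [{"role": "system",
                "content": "You are implementing a pre-agreed spec. Only address the current step."}]

    for step in Path("plan.md").read_text().splitlines():
        step = step.strip()
        if not step:
            continue
        history.append({"role": "user", "content": f"Implement this step:\n{step}"})
        resp = client.chat.completions.create(model="gpt-4.1", messages=history)
        answer = resp.choices[0].message.content
        history.append({"role": "assistant", "content": answer})
        print(f"=== {step} ===\n{answer}\n")
        # In practice this is where you review and course correct before moving on.

Each iteration is where the hand-holding happens: review the output, fix what's wrong, and only then feed the next step.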


Here’s my workflow, it takes that a few steps further: https://taoofmac.com/space/blog/2025/05/13/2230

This seems like a good flow! I end up adding a "spec" and "todo" file for each feature[1]. This allows me to flesh out some of the architectural/technical decisions in advance and keep the LLM on the rails when the context gets very long.

[1] https://notes.jessmart.in/My+Writings/Pair+Programming+with+...


Yeah, I limit context by regularly trimming the TODOs. I like having 5-6 in one file because it sometimes informs the LLM as to how to complete the first in a way that makes sense for the follow-ups.

READMEs per module also help, but it really depends a lot on the model. Gemini will happily traipse all over your codebase at random, gpt-4.1 will do inline imports inside functions because it seems to lack any sort of situational awareness, Claude so far gets things mostly right.


> The output of artists has copyright.

Copyright is a very messy and divisive topic. How exactly can an artist claim ownership of a thought or an image? It is often difficult to ascertain whether a piece of art infringes on the copyright of another. There are grey areas like "fair use", which complicate this further. In many cases copyright is also abused by holders to censor art that they don't like for a myriad of unrelated reasons. And there's the argument that copyright stunts innovation. There are entire art movements and music genres that wouldn't exist if copyright was strictly enforced on art.

> Artists shape the space in which they’re generating output.

Art created by humans is not entirely original. Artists are inspired by each other, they follow trends and movements, and often tiptoe the line between copyright infringement and inspiration. Groundbreaking artists are rare, and if we consider that machines can create a practically infinite number of permutations based on their source data, it's not unthinkable that they could also create art that humans consider unique and novel, if nothing else because we're not able to trace the output to all of its source inputs. Then again, those human groundbreaking artists are also inspired by others in ways we often can't perceive. Art is never created in a vacuum. "Good artists copy; great artists steal", etc.

So I guess my point is: it doesn't make sense to apply copyright to art, but there's nothing stopping us from doing the same for machine-generated art, if we wanted to make our laws even more insane. And machine-generated art can also set trends and shape the space they're generated in.

The thing is that technology advances far more rapidly than laws do. AI is raising many questions that we'll have to answer eventually, but it will take a long time to get there. And on that path it's worth rethinking traditional laws like copyright, and considering whether we can implement a new framework that's fair towards creators without the drawbacks of the current system.


Ambiguities are not a good argument against laws that still have positive outcomes.

There are very few laws that are not giant ambiguities. Where is the line between murder, self-defense and accident? There are no lines in reality.

(A law about spectrum use, or registered real estate borders, etc. can be clear. But a large amount of law isn’t.)

Something must change regarding copyright and AI model training.

But it doesn’t have to be the law, it could be technological. Perhaps some of both, but I wouldn’t rule out a technical way to avoid the implicit or explicit incorporation of copyrighted material into models yet.


> There are very few laws that are not giant ambiguities. Where is the line between murder, self-defense and accident? There are no lines in reality.

These things are very well and precisely defined in just about every jurisdiction. The "ambiguities" arise from ascertaining the facts of the matter, and whether a given set of facts fits within a specific set of rules.

> Something must change regarding copyright and AI model training.

Yes, but this problem is not specific to AI, it is the question of what constitutes a derivative, and that is a rather subjective matter in the light of the good ol' axiom of "nothing is new under the sun".


> These things are very well and precisely defined in just about every jurisdiction.

Yes, we have lots of wording attempting to be precise. And legal uses of terms are certainly more precise by definition and precedent than normal language.

But ambiguities about facts are only half of it. Even when all the facts appear to be clear, human juries have to use their subjective human judgement to match what the law says, which may be clear in theory but is often subjective at the borders, against the facts. And reasonable people often differ on how they match the two up in many borderline cases.

We resolve both types of ambiguities case-by-case by having a jury decide, which is not going to be consistent from jury to jury but it is the best system we have. Attorneys vetting prospective jurors are very much aware that the law comes down to humans interpreting human language and concepts, none of which are truly precise, unless we are talking about objective measures (like frequency band use).

---

> it is the question of what constitutes a derivative

Yes, the legal side can adapt.

And the technical side can adapt too.

The problem isn't that material was trained on, but that the resulting model facilitates reproducing individual works (or close variations), and repurposing individuals' unique styles.

I.e. they violate fair use by using what they learn in a way that devalues others' creative efforts. Being exposed to copyrighted works available to the public is not the violation. (Even though it is the way training currently happens that produces models that violate fair use.)

We need models that one way or another, stay within fair use once trained. Either by not training on copyrighted material, or by training on copyrighted material in a way that doesn't create models that facilitate specific reproduction and repurposing of creative works and styles.

This has already been solved for simple data problems, where memorization of particular samples can be precluded by adding noise to a dataset. Important generalities are learned, but specific samples don't leave their mark.
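
As a toy illustration of that simple-data case (just a sketch, not anyone's production method): adding Laplace noise to numeric records keeps aggregate statistics usable while ensuring no original sample survives verbatim. The noise scale below is arbitrary, not a calibrated differential-privacy guarantee.

    # Toy sketch: perturb a numeric dataset with noise before training,
    # so a model fit on it cannot memorize any exact original record.
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical dataset: 1000 samples, 3 numeric features.
    data = rng.normal(loc=0.0, scale=1.0, size=(1000, 3))

    noise_scale = 0.5  # larger scale -> more privacy, less fidelity
    noisy_data = data + rng.laplace(loc=0.0, scale=noise_scale, size=data.shape)

    # Aggregate statistics (the "important generalities") survive the noise...
    print("original means:", data.mean(axis=0))
    print("noisy means:   ", noisy_data.mean(axis=0))

    # ...but no row of noisy_data reproduces an original sample exactly.
    print("any exact matches?", np.isclose(noisy_data, data).all(axis=1).any())

Anything trained on noisy_data can still pick up the distribution-level patterns, but it has no access to any exact original row.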

Obviously something more sophisticated would need to be done to preclude memorization of rich creative works and styles, but a lot of people are motivated to solve this problem.


It seems like your concern is about how easy it is going to be to create derivative and similar work, rather than a genuine concern for copyright. Do I understand correctly?

No, I am just narrowing down the problem definition to the actual damage.

Which is a very fair use and copyright respecting approach.

Taking/obtaining value from works is ok, up until the point where damage to the value of original works happens. And that is not ok. Because copyright protects that value to incentivize the creation and sharing of works.

The problem is that models are being shipped that inherently make it easy to reproduce copyrighted works, and to apply specific styles lifted from a single author's copyrighted body of work.

I am very strongly against this.

Note that prohibiting copying of a recognizable specific single author's style is even more strict than fair use limits on humans. Stricter makes sense to me, because unlike humans, models are mass producers.

So I am extremely respectful of protecting copyright value.

But it is not the same thing as not training on something. It is worth exploring training algorithms that can learn useful generalities about bodies of work, without retaining biases toward the specifics of any one work or any single author's style. That would be in the spirit of fair use. You can learn from any art, if it's publicly displayed or you have paid for a copy, but you can't create mass copiers of it.

Maybe that is impossible, but I doubt it. There are many ways to train that steer important properties of the resulting models.

Models that make it trivial to create new art deco works, consistent with the total body of art deco works: ok. Models that make it trivial to recreate Erte works, or works in an accurately Erte style specifically: not ok.


> The problem is that models are being shipped that inherently make it easy to reproduce copyrighted works, and to apply specific styles lifted from a single author's copyrighted body of work.

> I am very strongly against this.

> Note that prohibiting copying of a recognizable specific single author's style is even more strict than fair use limits on humans. Stricter makes sense to me, because unlike humans, models are mass producers.

This sounds like gate-keeping rather than genuine copyright concerns.

> Models that make it trivial to create new art deco works, consistent with the total body of art deco works: ok. Models that make it trivial to recreate Erte works, or works in an accurately Erte style specifically: not ok.

Yeah, again, this sounds like gate-keeping more than an argument about economics and incentives, which are, in my opinion, the only legitimate concerns underpinning copyright's moral ground.

Every step of progress has made doing things easier and easier, to the point that now arguing with some stranger across the world seems trivial, almost natural. Surely there are some arguments to curtail this dangerous machinery that undermines the control of information flow and corrupts the minds of the naive! We must shut it down!

Jokes aside, "making things easier/trivial" is the name of the game of progress. You can't stop progress. Everything will be easier and easier as time goes on.


>Art created by humans is not entirely original.

The catch here is that a human can use a single sample as input, but AI needs a torrent of training data. Also, when AI generates permutations of samples, do their statistics match the training data?


No human could use a single sample if it was literally the first piece of art they had ever seen.

Humans have that torrent of training data baked in from years of lived experience. That’s why people who go to art school or otherwise study art are generally (not always of course) better artists.


I don't think the claim that the value of art school is simply more exposure to art holds water.

Not without a torrent of pre-training data. The qualitative differences are rapidly becoming intangible ‘soul’ type things.

A skilled artist can imitate a single art style or draw a specific object from a single reference. But becoming a skilled artist takes years of training. As a society we like to pretend some humans are randomly gifted with the ability to draw, but in reality it's 5% talent and 95% spending countless hours practising the craft. And if you count the years' worth of visual data the average human has experienced by the time they can recreate a van Gogh, then humans take orders of magnitude more training data than state of the art ML models.

In the case of an ML model, either a very good description or that single reference could be added to the context.

A machine learning algorithm that summarizes and hallucinates information is arguably worse than a machine learning algorithm that decides which social media posts you see. They're both controlled by corporations, but at least on social media you (still) have the option to read content written by humans.

Finally, someone making sense. The fact that this project works by applying patches instead of forking the original project and committing changes should alone be reason for concern.

But OP's entire GitHub presence is suspicious. On May 12th they fired off LLM slop PRs to a bunch of popular projects, and only the JAX ones were rejected. Nevertheless, this allowed them to pin these popular projects to their profile, as if they were a contributor.

I can't put into words how despicable this all is. Anyone working in the AI field is complicit in the corruption of information, the ramifications of which we can't even predict yet. Dead internet and the flood of AI slop is just the beginning.


> And if it is high entropy – where are you storing that in turn?

A password manager.

Neat project!


Agreed. The only advertising I can stand is what I personally seek out. To me the pinnacle of this was product catalogs. I want to buy a computer, so I subscribe to "Computer Shopper Monthly", and get a magazine with nothing but computer ads. Those were always fun to browse, since I was interested in the product in the first place. E-commerce started as a digital implementation of product catalogs, but as companies got greedier, that just wasn't enough.

The key culprit is that user data is used not just for advertising products that the user might be interested in _today_, but to create a profile of their interests so that companies can predict what they might be interested in at any point in the future, which can then be used to design more effective advertising campaigns tailored to the type of products they're most susceptible to being manipulated into buying.

Furthermore, this profile is also generally useful to anyone who wishes to psychologically manipulate a group of people into thinking or acting a certain way. Since advertising is a branch of propaganda, governments and political agencies are particularly interested in this use case. It's pretty obvious that the current global sociopolitical instability is largely a product of this type of manipulation.

So considering that both governments and companies have an interest in user data, this genie is never going back in the bottle. The best we can hope for is for the exploitation to be contained via regulation by governments that haven't been fully corrupted yet.


> IMO data should be radioactive for companies, especially if it approaches PII.

That's an idealistic, but highly unrealistic, thought.

As long as a market exists that can profit from exploiting PII, and is so large that it can support other industries, data will never be radioactive. The only way to make it so is with regulation, either to force companies to adopt fair business models, or by _heavily_ regulating the source of the problem—the advertising industry. Since the advertising industry has its tentacles deeply embedded everywhere, regulating it is much more difficult than regulating companies that depend on it.

So this is a good step by the EU, and even though it's still too conservative IMO, I'm glad that there are governments that still want to protect their citizens from the insane overreach by Big Tech.


> As long as a market exists that can profit from exploiting PII, and is so large that it can support other industries, data will never be radioactive.

The EU bureaucracy machine can be slow moving, but has the potential to fix this. The stricter the rules, the simpler the implementation. You could cut a LOT of the administrative burden by specifying what data is allowed to be stored at all, instead of what isn't.

Big tech needs to be put in their place, and as others have commented; if this kills your business model, your business model doesn't deserve to exist.

