Hacker News | maplethorpe's comments

> Toby Pohlen, a former DeepMind researcher, was put in charge of the “Macrohard” project to build digital agents that Musk said could replicate entire software companies. Musk said it was the “most important” drive at the company. The name is a “funny” reference to Microsoft, the billionaire added. Pohlen left 16 days later.

When I was 9 years old, my uncle asked me what I was going to do for work when I got older. I told him I was going to start a company called "MacroHard", and become the richest man alive. He told me that's not how the world works. Turns out it is.


I suppose I see the split a little bit differently. To me it's more that one camp of developers can still get a hit of satisfaction as if they built something themselves even if it was entirely generated by AI.

Would they get the same satisfaction from cloning a public repo? Probably not. It's too clear to their brain that they didn't have anything to do with it. What about building the project with cmake? That requires more effort, yes, but the underlying process is still very obviously something that someone else architected, and so the feeling of satisfaction remains elusive.

AI, however, adds a layer of obfuscation. For some, that's enough to mask the underlying process and make it feel as if they're still wielding a tool. For others, not so much.


I don't follow your analogy at all. Suppose I want to build an application with xyz features. My research shows that there are no such applications that include xyz features. However, there are plenty of applications that might have x feature, y feature, or z feature, or a combination of two, but not all three.

If there are no such applications, I don't have a choice but to write it myself. This could take some time, even if an MVP is all I'm interested in. LLMs are a novel tool for building an MVP. If time is a constraint, I can use an LLM, which should excel since xyz features are in its training set.

I suppose your analogy follows for developers who write applications that support abc features even though there are already applications out there that support abc features. Yes, I don't think that is very interesting. Your umpteenth clone of Snake is not interesting.


Further, I don't argue that 100% prompting an application together isn't building something themselves. Built on the shoulders of leviathans, as libraries were built on the shoulders of giants of yore.

But an application that combines xyz features is novel in this scenario. There is inherent value in that.


I'm not arguing whether AI has value as a code-generating application. I'm more interested in whether you, as a developer, still get satisfaction from building with AI, the same way you would get satisfaction if you built it yourself.

How can HN be so pro-AI for the rest of the world, but anti-AI on HN?

Do we not think that other people want to see words, pictures, software, and videos created by humans too?


HN is not a single entity, but many people with varying views.

"A flock of sheep is not a single entity, but a group made up of distinct individuals", the sheep yells to onlookers, as it runs, with the rest off the flock in tow, off the edge of the cliff, and into the sea below.

"You can give someone the answer to their question, but you cannot make them understand it"

A group of people with varying views can still exhibit bias towards one particular direction. The fact that the individuals within the group have distinct personalities does not eliminate this effect.

One of Dang's comments mentions that he removed some of the other rules because they are already embedded within the HN culture. Other prevailing views exist within the HN culture too. Maybe you just haven't noticed yet.


Astroturfing with AI-generated comments about AI feeds itself. By definition, the intent is to make real people think there's a consensus formed around an issue by other humans.

It's interesting you interpret the consumer's response as a desire for the expansion of IP laws. As an artist whose work exists in many of these training sets, I'm of a different opinion: IP laws can stay the same, but they should have purchased a license to use my art before including it in their training data.

Since they didn't, they should go to jail. The same way I would have gone to jail if I had built Sora in my basement and sold it to the public.


As an artist your license didn't ban learning from your work. Unless your content was acquired without a license at all - you absolutely gave them permission to use it in training sets.

That is the gap in the legal landscape.


No, I didn't. It was used in a software product without my permission. That's never been allowed.

Just because you obfuscate what's happening by calling it "learning" and pretending your model is actually just looking at pictures the same as a human, doesn't make it true.


Unfortunately you did grant that permission. Once you granted the permission for someone to hold a copy, they have the permission to process it.

I can assure you that you didn't grant a license with an exclusive list of operations that can be performed on your work of art. At best you may have had something like a "no commercial use" clause and general broad terms.


I thought it was at most a monetary fine; do people go to jail for copyright infringement? But you seem to want to own all the air around your work, and the ground beneath it too. Nothing can exist around it, so a creative person would do better to avert their eyes rather than loading up on useless ideas. Why should I install your "furniture" in my brain when I am not allowed to sit on it? In these cases I think authors provide a net negative to society by creating more works that further forbid others from creating in the same space.

Here, for example, any comment is open to read and respond to. On arXiv any paper can be downloaded, read, and cited. Wikipedia contains text from many thousands of editors, building on each other. We like collaboration more than asserting our exclusivity rights. That is why these places provide better quality than work for direct profit or, God forbid, ad revenue, which is where the slop starts flowing.


>IP laws can stay the same, but they should have purchased a license to use my art before including it in their training data.

But including your art in the training data is fair use (or otherwise exempt) by most standards, as no reproduction occurs. You are advocating for a change to IP law to make it more restrictive.


> But including your art in the training data is fair use

The four factors of fair use in the US:

> the purpose and character of your use

Commercial, for-profit. Not scholarship, not research, not commentary, not parody, etc.

> the nature of the copyrighted work

Absolutely everything. Artistic, creative, not purely factual.

> the amount and substantiality of the portion taken, and

All of it, from everyone.

> the effect of the use upon the potential market.

Directly competing with those whose data was copied.


3 and 4 are what that argument is based on, I believe. 3) on the basis that the output is not _reproduced_, and 4) on similar grounds that output that's just not at all the same as the input data isn't affecting the market for the original image (I think this is the more debatable one, but in general the existing cases have struggled at the early stages because the plaintiffs have not been able to actually point to output that is a copy of their part of the input, and this does actually matter).

>Directly competing with those whose data was copied.

An LLM doesn't compete with Art the same way that Photoshop doesn't compete with Art.

>All of it, from everyone.

With the result that anything produced by the LLM does not reproduce any single source in its entirety (and where it can be compelled to do so, that is a bug, not a feature)

Fair use is too specific tbh, rather than ruling it fair use (which seems to be where things are going) it should just be ruled "use". There's nothing wrong with building a mathematical model using available data.


> An LLM doesnt compete with Art the same way that Photoshop doesnt compete with Art.

Yes, it does. Many people are using AI-generated works in places where they originally would have either paid an artist, programmer, or other creative professional, or done without. Many companies are claiming to reduce staff because of AI (whether that's true or an excuse). There is plenty of evidence that AI is directly competing with various individuals, businesses, and industries.

> With the result that anything produced by the LLM does not reproduce any single source in its entirety

You do not have to reproduce sources in their entirety to produce derivative works.


>Yes, it does.

Tools compete with Tools. Operators of tools compete with other tool operators. The tool doesn't compete in the same market as the operator. Lowering the barrier of entry for being a tool operator is cool and good actually.

>You do not have to reproduce sources in their entirety to produce derivative works.

True, but if there's no great % of the original in the derivative it doesn't matter. Like you need to actually make the positive case clearly demonstrating the wounded party or it's just noise. This actually happened one time, where a legal firm loaded another party's data into an LLM and had it regenerate the data. The judge found that the result infringed despite the LLM use, which makes sense. But pointing at some weird AI-generated boomer comic, you can't identify any wounded party. It's slop made from enough unique sources that there's no victim, much like most derivative art forms. Making something that's 0.1% like 1000 different sources * random noise is unable to cause injury. It's not recognizably derivative in any sense except for style, which isn't protected.


> the amount and substantiality of the portion taken, and

> All of it, from everyone.

Yea I'd like to see how drawing two circles violates the copyright of drawing one circle!


Fair use by most standards? Which standards are those? I don't think a standard about training an AI on billions of images exists.

By the same 'transformative' standards that allow satire, reaction and commentary videos to exist. And those take 100% from the source and add context, whereas good generated AI images that aren't wholesale copying take like less than 10% from the original source.

In addition, the idea that you need to pay rent on *your observation* of someone else's work is absurd. No one pays Newton's descendants for making lifts or hosting bungee jump sport activities.


> good generated AI images that aren't wholesale copying take like less than 10% from the original source.

So would the model work if it only trained on the top 10% of pixels in every image? Or do they in fact need the entire image before they begin processing it, and therefore use the entire image?

> In addition, the idea that you need to pay rent on your observation of someone else's work is absurd.

I agree that's absurd. But training a model is no more "observing images" than an F1 car is "walking" down a race track. Just because a race car uses kinetic energy, gravity, and friction to propel itself, the same way a human does, doesn't mean it's doing the same thing as a human. That comparison you're making is the real absurdity.


> So would the model work if it only trained on the top 10% of pixels in every image? Or do they in fact need the entire image before they begin processing it, and therefore use the entire image?

The model works by training on the features humans can make sense of in the image they're presented with, provided the image and the observations of its features are clear/observable enough. Then the generation makes use of those observations. I'm just using 10% as an arbitrary number to describe proportions. If the generation were 100% of the observations from the same image, the model would be overfitting, and many would have deemed it to have produced a copy.

> Just because a race car uses kinetic energy, gravity, and friction to propel itself, the same way a human does, doesn't mean it's doing the same thing as a human.

WTF does this even mean? A race car uses concepts from Newton, just as a human uses gravity to train its muscles to move, be it knowingly or unknowingly. But you don't see them (car makers/humans) paying rent to Newton after he discovered gravity. Come on!


Is it transformative if I take all the pages in Hanya Yanagihara's A Little Life and use a thesaurus to change every second word?

Or a more realistic scenario: what if I translate it to Spanish without license from the author? That's not allowed, and yet I have "transformed" the work in the same way that an LLM does.


If I buy a book entitled "How to make a table" and then make a table, the author does not own the table I made.

If I buy a book and use it to prop up a table, the author likewise does not own the table, or any works I undertake on that table.

If I buy a book and rip out the pages to make a collage, the US is the only legal jurisdiction where I run even slight risk of civil penalties.

An LLM is downstream of a book. Using a book to make an LLM does not confer any rights or privileges over the LLM on the original author, just as using a hammer or nails doesn't grant the hammer or nail manufacturers any royalties on what I make, even if I build a hammer-making machine with them. There's no right to the works of people who build on your work without reproducing your work, at least outside of strict copyleft.

It's like demanding a cut from people who learned how to use Photoshop by watching your Photoshop tutorial YouTube videos.

This is why the most successful cases against LLMs have been on the "Did they purchase the book" side of the fence, and not on the "What did they do with it" side, outside of the one case where the legal company tried to use the LLM to 1:1 reproduce the content they had a limited license to. But that's obviously a no-go and they should have known better.


These are my opinions ofc.

> Is it transformative if I take all the pages in Hanya Yanagiharas A Little Life and use a thesaurus to change every second word?

If you meant it literally, I'd think that such a version would be a sort of parody. It'd be up to lawyers doing their cross-examinations to prove the work was intended for such a purpose, though.

> Or a more realistic scenario: what if I translate it to Spanish without license from the author? That's not allowed, and yet I have "transformed" the work in the same way that an LLM does.

Probably a lawyer would answer this better than me, but the 'content' is the same and would violate copyright. There's also other factors, like if it was translated/distributed for free.

Besides that, I regard LLMs as holding mathematical observations, in contrast to a translated work. So long as the user ensures the output isn't close to what's already available, imo it fits the transformative criteria.


You cannot claim that a formulaic thesaurusing of a text is parody, not unless the process is related to the message of the original text itself. Even then, that's a dubious claim. Especially if it was done automatically.

I can just as well say that a translated work contains "linguistic observations". In fact a translator has to do a lot of transformative work in order to translate a text.

An LLM just takes a set of texts, looks at n-gram distributions, and generates similar text. It is quite literally a fuzzy way of copying. There aren't any mathematical observations in the output. Any math (statistics) is done in the copying process.
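For what it's worth, the "n-gram distribution" picture this comment invokes can be sketched in a few lines. Below is a toy bigram (Markov-chain) text generator illustrating that claim only; modern LLMs are neural networks, not raw n-gram tables, and the corpus and function names here are invented for the example.

```python
import random
from collections import defaultdict

def train_bigrams(text):
    # Record, for each word, every word observed to follow it.
    words = text.split()
    model = defaultdict(list)
    for a, b in zip(words, words[1:]):
        model[a].append(b)
    return model

def generate(model, start, length=8, seed=0):
    # Walk the table, sampling each next word from the observed followers.
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = model.get(out[-1])
        if not followers:
            break  # dead end: no word ever followed this one
        out.append(rng.choice(followers))
    return " ".join(out)

corpus = "the cat sat on the mat and the cat ran after the dog"
model = train_bigrams(corpus)
print(generate(model, "the"))
```

Every adjacent word pair the generator emits was seen verbatim in the training text, which is the sense in which this kind of model "fuzzily copies" its corpus; whether that analogy extends to transformer LLMs is exactly what the thread is disputing.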


> You cannot claim that a formulaic thesaurusing of a text is parody, not unless the process is related to the message of the original text itself. Even then, that's a dubious claim. Especially if it was done automatically.

Oh, even if it's not a parody, it would look transformed enough that a first-time reader would be getting a completely different interpretation of the story compared to the original source. And that's all that matters.

> There aren't any mathematical observations in the output. Any math (statistics) is done in the copying process.

Wrong. Weights, which these models are composed of, are literally numbers in an extensive mathematical equation.

> It is quite literally a fuzzy way of copying.

And no one knows / there is no consensus on what a "fuzzy way of copying" is. It is either copying or it is not. You could say that training an LLM abstracts and integrates various texts into its weights, thereby transforming the source material, and transforming it a second time via integrating it into its weights.


>It is quite literally a fuzzy way of copying.

Even if it involved copying, that isn't immediately an issue. It's the distribution of a copy that's an issue. And if you look at the data side by side, you can see that while copying might be part of the process of creating an LLM, the LLM is not a copy of its source material.


Google scrapes the entire internet to generate a searchable index of the internet. But the resulting search engine is only infringing where it reproduces entire copies of scraped news articles and images, both areas where Google has been put back in its place through legal means.

Like LLMs, it retains the produced index but not the original data.

The big concern is whether producing an LLM competes with artists directly, but as artists don't make LLMs, this seems to be consistently ruled as non-competing.


I don't quite follow. People don't go on Google and search for medieval history and pretend they wrote the Wikipedia article on it because they found it on Google.

People _do_ use LLMs to make art in someone else's style (knowingly or unknowingly) and claim it as their own creation.

Also, I wouldn't say the creators of LLMs are competing with artists. The users of LLMs are. Artists don't make LLMs, they make art, and people who use Midjourney and such make art.

But I'd argue that creators of LLMs are still liable for the harm people cause using their tools. Perhaps not legally, but certainly ethically.


No precedent has been set when it comes to training and fair use

Which case decided that?

> But including your art in the training data is fair use

It shouldn't be!


Why? I don't understand this take at all.

I mean, they've made the argument that their computer learns like a human, so should be able to get away with ingesting all the data it sees, the same way a human does.

Why shouldn't it also go to jail, the same way a human does?


What? How? By putting the computer or robot that made the mistake in a prison cell?

Yes. Claude exists on physical media somewhere. Put that media in a cell with no access to the internet. No one must access Claude outside of visitation hours.

Just because it's difficult doesn't mean it can't be done. If you're claiming your machine should be treated like a human, then let's treat it like a human.


I can’t tell if you’re being serious or not…

It's a funny way of imposing a very large fine. Make the service only available during predefined "visitation hours", prevent updated learning except from resources available in the prison, restrict speech and actions according to prison rules etc.

I'd personally prefer to see the version with worse grammar, because I know it was written by a real person.

Do you find people respond better to your LLM-corrected posts, or is it mainly for your own comfort?


> I'd personally prefer to see the version with worse grammar, because I know it was written by a real person.

I've been thinking about this too; perhaps the authenticity (or "voice", even a poor one) should -- or will -- matter more than grammar etc.


I do this as a teacher lol. I say just spell it how it sounds so I don't have to grade AI papers.

hahaha, i guess it just the nature that wanted to get things done correctly and don't want to look bad? but I kind get it now and you prob can tell from how I am writing this comment. lol

If those coworkers are still writing in their own voice at work, you should be thankful.

I'm kind of going through the opposite. I was alone until 38, and then suddenly I wasn't alone anymore. I'm now realising that I did indeed develop an effective strategy to combat loneliness over the years.

What you need is a personal project. A lot of people make the mistake of thinking this needs to be something with significant utility, but that's a lot of pressure to put on yourself. My recommendation is to make your project as specific to your life as you want, and as dumb as you want. It could be a radio station localised only to your house, or a chat app for you and three other people, or a mechanical toy for a friend. Then find some friends or a community you can share that thing with when you're done.

Think about what would be fun for you, specifically, to do, or to make, or to achieve; go do that thing, and then share it. This technique got me through 38 years of loneliness (mostly) unscathed, so I'm confident it can work for others too.


In my experience, AGI always seemed to be the stand-in phrase for "human like" intelligence, after AI was co-opted to mean simpler things like markov-chain chat bots and state machines that control agent behaviour in video games.

If the definition has shifted once again to mean "a computer program that does a task pretty well for us", then what's the new term we're using to define human-level artificial intelligence?


Probably the best drummer there ever was though.

I doubt anyone in Metallica (even Lars) believes that.

The venn diagram is closer to a circle than you think

At the top there's Neil Peart, then a huge gap, then anyone else.

And then, even if we grant that Peart is the best of the best, Lars is nowhere to be seen among the 100 names that would follow on any sane list. I doubt he'd crack the top 250.
