
I missed the part where OpenAI got library cards for all the libraries in the world.

Is having a library card a requirement for being hired over there?




I missed the part where we throw away rational logic skills.

Have you never been to a public library and read a book while sitting there without checking it out? Clearly, age is a factor here, and us olds are confused by this lack of understanding of how libraries function. I did my entire term paper without ever checking out books from the library. I just showed up with my stack of blank index cards, then left with the necessary info written on them. I did an entire project on tracking stocks by visiting the library and viewing all of the papers for those days in one sitting, rather than being a schmuck and tracking it daily. It took me about an hour in one day. No library card required.

Also, a library card is ridiculously cheap even if you did decide to have one.


> Have you never been to a public library and read a book while sitting there without checking it out?

See my comment here: https://news.ycombinator.com/item?id=43355723. If OpenAI built a robot that physically went into libraries, pulled books off shelves by itself, and read them...that's so cool I wouldn't even be mad.


What about checking out eBooks? If you had an app that checked those out and scanned them at robot speed vs human speed, that would be the same thing. The idea that reading something that does not belong to you directly means stealing is just weird and very strained.

theGoogs essentially did that by having a robot that turned each page and scanned it. That's no different than having the librarian pull material for you so that you don't have to pull the book from the shelf yourself.

There are better arguments to make on why ClosedAI is bad. Reading text it doesn't own isn't one of them. How they acquired the text would be a better thing to critique. There are laws in place for that now, so no new laws need to be enacted.


> If you had an app that checked those out and scanned it

You mean...made a copy? Do you really not see the problem?

> How they acquired the text would be a better thing to critique

Well...yeah that's what I said in the comment that started this discussion branch: https://news.ycombinator.com/item?id=43355147

This isn't about humans or robots reading books. It's that robots are allowed to violate copyright law to read the books, and us humans are not.


> You mean...made a copy? Do you really not see the problem?

In precisely the same way as a robot scanning a physical book is.

If this is turned into a PDF and distributed, it's exactly the legal problem Google had[0] and that Facebook is currently fighting due to torrenting some of their training material[1].

[0] https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,...

[1] https://news.ycombinator.com/item?id=43125840

If the tokens go directly into training an AI and no copies are retained, that's like how you as a human learn — except current AI models are not even remotely as able to absorb that information as you, and they only make up for being as thick as a plank by being stupid very very quickly.

> It's that robots are allowed to violate copyright law to read the books, and us humans are not.

More that the copyright laws are not suited to what's going on. Under the copyright laws, statute and case law, that existed at the time GPT-3.5 was created, bots were understood as the kind of thing Google had and used to make web indexes — essentially legal, with some caveats about quoting too much verbatim from news articles.

(Google PageRank being a big pile of linear algebra and all, and the Transformer architecture from which ChatGPT gets the "T" being originally a Google effort to improve Google Translate).

Society is currently arguing amongst itself about whether this is still OK when the bot is a conversational entity, or perhaps even something that can be given agency.

You get to set those rules via your government representative, and could make it illegal for AI crawlers to read the internet like that, but it's hard to change the laws if you mistake what you want the law to be for what the law currently is.


But you keep saying "to read the books". There is no copyright violation in reading a book. Making copies starts to get into murky ground, but does not immediately mean breaking the law.


You might be thinking of someone else.


If I spent every last second of my life in a public library, I couldn't even view a fraction of the information that OpenAI has ingested. The comparison is irrelevant. To make the comparison somehow valid, I'd have to back up my truck to a public library, steal the entire contents, then start selling copies out of my garage.


Look, even I'm not a fan of ClosedAI, but this is ridiculous. ClosedAI isn't giving out copies of anything. It is giving you a response it infers based on things it has "read" and/or "learned" by reading content. Does ClosedAI store a copy of the content it scrapes, or does it immediately start tokenizing it or whatever is involved in training? If they store it, that's a lot of data, and we should be able to prove that sites were scraped through the lawsuit discovery process. Are you then also suggesting that ClosedAI will sell you copies of that raw data if you prompt it correctly?

I'm in no way justifying anything about GPT/LLM training. I'm just calling out that these comparisons are extremely strained.


Let's say OpenAI developers use an illegal copy of Windows on their laptops to save on buying a license. Is it OK to run a business this way?

Also, I think it is a different thing when someone uses copyrighted works for research and publishing a paper than when someone uses copyrighted works to earn money.


I don't need a card to read in the library, nor to use the photocopiers there, but it's merely one example anyway. (If it wasn't, you'd only need one library, any of the deposit libraries will do: https://en.wikipedia.org/wiki/Legal_deposit).

You also don't need permission, as a human, to read (and learn from) the internet in general. Machines by standard practice require such permission, hence robots.txt, and OpenAI's GPTBot complies with the robots.txt file and the company gives advice to web operators about how to disallow their bot.
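For anyone who hasn't looked at how the robots.txt permission model works in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the example.com URL are made up for illustration, and the `can_fetch` helper is my own naming. "GPTBot" is the user-agent token as I understand OpenAI documents it, so check their current docs before relying on it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block GPTBot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some/page"
    print("GPTBot allowed:", can_fetch("GPTBot", page))
    print("SomeOtherBot allowed:", can_fetch("SomeOtherBot", page))
```

Note that this only tells a crawler what the site operator asked for. Nothing technically enforces it, so it works only for bots that choose to comply.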

How should AI be treated: more like a search index, or more like a mind that can learn by reading? Not my call. It's a new thing, and laws can be driven by economics or by moral outrage, and in this case those two driving forces are at odds.


We started with libraries and books, now you're moving the goalposts to websites.

Sidenote: I wouldn't even be mad if OpenAI built robots to go into all of the libraries and read all of the books. That would be amazing!


I started with libraries. OpenAI started with the internet.

The argument for both is identical; your objection is specific to libraries.

IIRC, Google already did your sidenote. Or started to, may have had legal issues.


> The argument for both is identical

How so? I don't have to pay to read most websites. To read most books I have to pay (or a library has to pay and I have to wait to get the book).

> IIRC, Google already did your sidenote

Not quite. They had to chop the spines off books and have humans feed them into scanners. I'm talking about a robot that can walk (or roll) into a library, use arms to take books off the shelves, turn the pages and read them without putting them into a scanner.


They had humans turn the pages of intact books in scanning machines. The books mostly came from the shelves of academic libraries and were returned to the shelves after scanning. You can see some incidental captures of hands/fingers in the scans on Google Books or HathiTrust (the academic home of the Google Books scans). There are some examples collected here:

https://theartofgooglebooks.tumblr.com/


> How so? I don't have to pay to read most websites. To read most books I have to pay (or a library has to pay and I have to wait to get the book).

"or" does a lot of work, even ignoring that I'd already linked you to a page about deposit libraries: https://en.wikipedia.org/wiki/Legal_deposit

Fact is, you can read books for free, just as you can read (many but not all) websites for free. And in both cases you're allowed to use what you learned without paying ongoing licensing fees for having learned anything from either, and even to make money from what you learn.

> Not quite. They had to chop the spines off books and have humans feed them into scanners.

Your statement is over 20 years out of date: https://patents.google.com/patent/US7508978B1/en



