> This is just a hit piece which tries to frame scraping whole internet as-is as...

bayindirh · on Dec 5, 2023

But Google didn't train an LLM that stripped out all license, context, and author information and served up a mishmash of information that can't be guaranteed to be true.

They just mapped the connections between sites and provided summaries of them, derived verbatim from the sites themselves.

They didn't create new articles from what they had seen, or generate new code from code they had harvested while disregarding its license, etc.

geraldhh · on Dec 5, 2023

mostly untrue, but irrelevant

bayindirh · on Dec 5, 2023

Pretty relevant, actually. Care to elaborate on untrue parts and/or your take?

geraldhh · on Dec 5, 2023

if you must ...

> But Google didn't train an LLM

they sure did

> They just mapped the connections between sites and provided summaries of them, derived verbatim from the sites themselves.

summary is an out-of-context, altered version of the source material. alteration of intent is pretty much a given. see quick answers

> They didn't create new articles from what they had seen, or generate new code from code they had harvested while disregarding its license, etc.

see above. for the code side, the oracle lawsuit comes to mind. gpl-violations notwithstanding

if something is on the internet, it's in the public domain. whether you like it or not, it will be copied, altered, remixed, shared. that's why the internet is so great.

anyways, the original point was the indiscriminate scraping, which again, is common practice.

bayindirh · on Dec 5, 2023

> they sure did

Not 20 years ago. From my perspective, Bard is on par with the GPT series. Equally unethical. I use neither.

> see quick answers

They are copied verbatim from the source material, and only from a single source.

> gpl-violations notwithstanding

Google is not a saint. GPL violations are egregious, too. But apart from Bard, Google (the search engine) doesn't hand you source code stripped of its license.

> if something is on the internet, it's in the public domain. whether you like it or not, it will be copied, altered, remixed, shared. that's why the internet is so great

Tell this to publishers, the RIAA, Hollywood and, oh, Disney. I'm sure they will agree with you wholeheartedly. Authors of source-available and xGPL-licensed software will gleefully join you, too.

> anyways, the original point was the indiscriminate scraping, which again, is common practice.

Something being common practice doesn't make it legal. Jaywalking, downloading ripped music and movies from torrent trackers, and cracking licensed software come to mind.

geraldhh · on Dec 5, 2023

i get the feeling that we agree in principle but not in effect, i can live with that.