
I think it is worthwhile to explore cryptography that is not based on the discrete logarithm problem. Currently, we're keeping all our eggs in one conjectured basket. Even if quantum computing never becomes viable, there is a non-zero chance that the discrete logarithm problem will be solved some other way.

What are you talking about? PQC is based on learning-with-errors and lattice problems:

https://en.wikipedia.org/wiki/Learning_with_errors
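
For a rough sense of the LWE problem, here's a toy sketch in Julia (my own illustration; the parameters and names are made up and nowhere near secure):

    using LinearAlgebra

    # An LWE sample is a pair (a, b) with b = <a, s> + e (mod q) for small
    # noise e; recovering the secret s from many such samples is the
    # conjectured-hard problem.
    q, n = 97, 8                  # toy modulus and dimension
    s = rand(0:q-1, n)            # secret vector
    a = rand(0:q-1, n)            # public random vector
    e = rand(-2:2)                # small error term
    b = mod(dot(a, s) + e, q)     # noisy inner product; (a, b) is one sample

Note that without the error term e, recovering s would just be linear algebra; the noise is what makes the problem hard.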


There is a non-zero chance of pretty much anything. I don't think Gutmann is opposed to exploration.

I think the parent's point is that if there is a non-zero chance of anything then it is good to have a backup plan

There is a non-zero chance of a black hole forming in the LHC and swallowing the Earth, yet we don't have a backup plan for that. There are infinitely many more examples like that. A non-zero chance is not an argument for anything unless you have a good basis for actually quantifying it.

Libgen is a civilizational project that should be endorsed, not prosecuted. I hope one day people will look back at it and think how stupid we were to shun the largest collection of literary works in human history.


Anna's Archive encourages (and monetizes!!) the use of their shadow library for LLM training. They have a page dedicated to it on their site. You pay them, and they give you high-speed downloads of entire datasets.


I wonder how much more libgen traffic can be attributed to the lawsuit.

When Metallica sued Napster, for many people the reaction was, "wait I can download music for free?"


Libgen turns into a problem when a company uses it to develop generative AI, funneling money either to GPU manufacturers or to themselves through paid services (see OpenAI)


What are we actually worried about happening?

Are AI-written books getting published?

If they start out-competing humans, is that bad? According to most naysayers, they can't do anything original.

Are people asking the AI for books? And then hoping it will spit out a human-written book word for word?


> Are AI-written books getting published?

Yes, online bookstores are full of them:

https://www.nytimes.com/2023/08/05/travel/amazon-guidebooks-...

The issue is that there's an asymmetry between buyer and seller for books, because a buyer doesn't know the contents until they buy the book. Reviews can help, but not if the reviews are fake or AI-generated. And these books are profitable even if only a few people buy them, as the marginal cost of creating such a book is close to zero.


This really has fuck-all to do with copyright though, correct?

If you can't tell what the content is like before you read it, it could be written by a monkey.


This is starting to get pretty circular. The AI was trained on copyrighted data, so we can hypothesize that it would not exist, or would exist in a diminished state, without the copyright infringement. Now the AI is being used to flood bookstores with cheaply produced books, many of which are bad, but which still compete against human authors.


the problem with how circular the argument is, is that the existence of an actual problem is being taken for granted

it's not clear that detriments actually exist, and the benefits are clear


The benefits are not clear: why should an "author" who doesn't want to bother writing a book of their own get to steal the words of people who aren't lazy slackers?


It's as much stealing as piracy is stealing, i.e. not at all. If you disagree, you and I (along with probably many others in this thread) have a fundamental axiomatic incompatibility that no amount of discussion can resolve.


It is not theft in the property sense, but it is theft of labor.

If a company interviewed me, had me solve a problem, didn't hire me or pay me in any way and then used the code I wrote in their production software, that would be theft.

That is the equivalent of what authors claiming they wrote AI books are doing. That they've fooled themselves into thinking the computer "wrote" the book, erasing all the humans whose labor they've appropriated, in my opinion makes it worse, not better. They are lying to the reader and themselves, and both are bad.


Stealing is not the right word perhaps, but it is bad, and this should be obvious. Because if you take the limit of these arguments as they approach infinity, it all falls apart.

For piracy, take Switch games. Okay, pirating Mario isn't stealing. Now suppose everyone pirates Mario. Then there's no reason to buy Mario. Then Nintendo files for bankruptcy. Then some people go hungry; maybe a few die. Then you don't have a Switch anymore. Then there are no more Mario games left to pirate.

If something is OK if only very, very few people do it, then it's probably not good at all. Everyone recycling? Good! Everyone reducing their beef consumption? Good! ... everyone pirating...? Society collapses and we all die, and I'm only being a tad hyperbolic.

In a vacuum, making an AI book is whatever. In the context of humanity, and of pushing this to its limits, we can't even begin to comprehend the consequences. I'm talking crimes against humanity beyond your wildest dreams. If you don't know what I'm talking about, you haven't thought long enough and creatively enough.


> Because if you take the limit of these arguments as they approach infinity, it all falls apart.

Not everyone is a Kantian with the moral philosophy you are invoking, the categorical imperative. See this [0] for a list of criticisms of said philosophy.

> In a vacuum, making an AI book is whatever. In the context of humanity, and of pushing this to its limits, we can't even begin to comprehend the consequences. I'm talking crimes against humanity beyond your wildest dreams. If you don't know what I'm talking about, you haven't thought long enough and creatively enough.

Not really a valid argument; again, it's circular reasoning with a lot of empty claims and no actual justification. Why exactly is it bad? Just saying "you haven't thought long enough and creatively enough" does not cut it in any serious discussion. The burden of substantiating your own claim is on you, not the reader, because (to turn your own Kantian argument around) anyone you're debating could simply terminate the conversation by accusing you of not thinking about the problem deeply enough, meaning that no one actually learns anything when everyone shifts the burden of proof onto everyone else.

[0] https://en.wikipedia.org/wiki/Categorical_imperative#Critici...


> Not really a valid argument

It is, because the quote you quoted is in reference to what I said above.

I explained real consequences of pirating. Companies have gone under, individuals have been driven to suicide. This HAS happened.

It follows logically that if we do the same thing at a larger scale, the harm will increase proportionally.

You might disagree. Personally, I don't understand how. Really, I don't. My fundamental understanding of humanity is that each innovation will be pushed to its limits: to make the most money, to do it as fast as possible, and in turn, if it is harmful, to harm the most people. It is not in the nature of humanity to do something halfway when there's no friction to doing more.

This reality of humanity permeates our culture and societies. That's why the US government has checks and balances. Could the US government remain a democracy without them? Of course. We may have an infinite stream of benevolent leaders.

From my perspective, that is naive. And, certainly, the founding fathers agreed with me. That is one example - but look around you, and you will see this mentality permeates everything we do as a species.


> Stealing is not the right word perhaps, but it is bad, and this should be obvious.

Many people say things they don't like are "obviously" bad. If you can't say why, that's almost always because it actually isn't.

Have a look at almost any human rights push for examples.

.

> For piracy, take Switch games.

It's a bad metaphor.

With piracy, someone is taking a thing that was on the market for money, and using it without paying for it. They are selling something that belongs to other people. The creator loses potential income.

Here, nobody is actually doing that. The correct metaphor is a library. A creator is going and using content to learn to do other creation, then creating and selling novel things. The original creators aren't out money at all.

Every time this has gone to court, the courts have calmly explained that for this to be theft, first something has to get stolen.

.

> If something is OK if only very, very few people do it

This is okay no matter how many people do it.

The reason that people feel the need to set up these complex explanatory metaphors based on "well under these circumstances" is that they can't give a straight answer what's bad here. Just talk about who actually gets harmed, in clear unambiguous detail.

Watch how easy it is with real crimes.

Murder is bad because someone dies without wanting to.

Burglary is bad because objects someone owns are taken, because someone loses home safety, and because there's a risk of violence

Fraud is bad because someone gets cheated after being lied to.

Then you try that here. AI is bad because some rich people I don't like got a bunch of content together and trained a piece of software to make new content and even though nobody is having anything taken away from them it's theft, and even though nobody's IP is being abused it's copyright infringement, and even though nobody's losing any money or opportunities this is bad somehow and that should be obvious, and ignore the 60 million people who can now be artists because I saw this guy on twitter who yelled a lot

Like. Be serious

This has been through international courts almost 200 times at this point. This has been through American courts more than 70 times, but we're also bound by all the rest thanks to the Berne Convention.

Every. Single. Court. Case. Has. Said. This. Is. Fine. In. Every. Single. Country.

Zero exceptions. On the entire planet for five years and counting, every single court has said "well no, this is explicitly fine."

Matthew Butterick, the lawyer that got a bunch of Hollywood people led by Sarah Silverman to try to sue over this? The judge didn't just throw out his lawsuit. He threatened to end Butterick's career for lying to the celebrities.

That's the position you're taking right now.

We've had these laws in place since the 1700s, thanks to collage. They've been hard ratified in the United States for 150 years thanks to libraries.

.

> Everyone recycling? Good! Everyone reducing their beef consumption? Good! ... everyone pirating...?

This is just silly. "Recycling is good and eating other things is good, but let's try piracy, and by the way, I'm just sort of asserting this, there's nothing to support any of this."

For the record, the courts have been clear: there is no piracy occurring here. Piracy would be if Meta gave you the book collection.

.

> In the context of humanity, and of pushing this to its limits, we can't even begin to comprehend the consequences.

That's nice. This same non-statement is used to push back against medicine, gender theory, nuclear power, yadda yadda.

The human race is not going to stop doing things because you choose to declare it incomprehensible.

.

> I'm talking crimes against humanity beyond your wildest dreams.

Yeah, we're actually discussing Midjourney, here.

You can't put a description to any of these crimes against humanity. This is just melodrama.

.

> If you don't know what I'm talking about,

I don't, and neither do you.

"I'm talking really big stuff! If you don't know what it is, you didn't think hard enough."

Yeah, sure. Can you give even one credible example of Midjourney committing, and I quote, "crimes against humanity beyond your wildest dreams?"

Like. You're seriously trying to say that a picture making robot is about to get dragged in front of the Hague?

Sometimes I wonder if anti-AI people even realize how silly they sound to others


> The creator loses potential income

Okay. AI books make books 1 million times faster, let's say. Arbitrary, pick any number.

If I, a consumer, want a book, I am therefore 1 million times more likely to pick an AI book. Finding a "real" book takes insurmountable effort. This is the "needle in a haystack" I mentioned earlier.

The result is obvious: creators lose potential money. And yes, it is actually obvious. If it isn't, reread it a few times.

To be perfectly and abundantly clear because I think you're purposefully misunderstanding me - I know AI is not piracy. I know that. It's, like, the second sentence I wrote. I said those words explicitly.

I am arguing that while it is not piracy, the harm it creates is identical in form to piracy. In your words, "creators lose potential income". If that is the standard, you must agree with me.

> how silly they sound to others

I'm not silly, you're just naive and fundamentally misunderstand how our societies work.

Capitalism is founded on one very big assumption. It is the Jenga block keeping everything together.

Everyone must work. You don't work, you die. Nobody works, everyone dies.

Up until now, this assumption has been sound. The "edge cases", like children and disabled people, we've been able to bandaid with money we pool from everyone - what you know as taxes.

But consider what happens if this fundamental assumption no longer holds true. Products need consumers as much as consumers need products; it's a circular relationship. To make things you need money, to make money you must sell things, to buy things you must have money, and to have money you must make things. If you outsource the making of things, there's no money, period. For anyone. Everyone dies. Or, more likely, the country collapses into a socialist revolution. Depending on the country, the level of bloodiness varies.

This has happened in the past already, with much more primitive technologies. FDR, in his capitalist genius, very narrowly prevented the US from falling into the socialist revolution with some aforementioned bandaid solutions - what we call "The New Deal". The scale at which we're talking about now is much larger, and the consequences more drastic. I am not confident another "New Deal" can be constructed, let alone implemented. And, I'm not confident it would prevent the death spiral. Again, we cut it very, very close last time.


shop at a real bookstore, they don't have this problem.


> Are AI-written books getting published?

actually i think they are. lots of e-book slop

> If they start out-competing humans, is that bad?

Not inherently, but it depends on what you mean by out-competing. Social media outcompeted books and now everyone's addicted and mental illness is more rampant than ever. IMO, a net negative for society. AI books may very well win out through sheer spam but is that good for us?


Nobody has responded to me with anything about how authors are harmed, so I don't really get who we're protecting here.

It feels more like we just want to punish people, particularly rich people, particularly if they get away with stuff we're afraid to try.


> Nobody has responded to me with anything about how authors are harmed

i imagine if books can be published to some e-book provider through an API to extract a few dollars per book generated (multiplied by hundreds), then eventually it'll be borderline impossible to discover an actual author's book. breaking through for newbie writers will be even harder because of all of the noise. it'll be up to providers like Amazon to limit it, but then we're reliant on the benevolence of a corporation, and most act in self-interest. if that means AI slop pervading every corner of the e-book market, then that's what we'll have.

kind of reminds me of solana memecoins and how there are hundreds generated everyday because it's a simple script to launch one. memecoins/slop has certainly lowered the trust in crypto. can definitely draw some parallels here.


> Nobody has responded to me with anything about how authors are harmed

The same way good law-abiding folk are harmed when heroin is introduced to the community. Those people then won't be able to lend you a cup of sugar, and may well start causing problems.

AI books take off and are easy to digest, and before long your user base is quite literally too stupid to buy and read your book even if they wanted to.

And, for the record, it's trivial to "out-compete" books or anything else. You just cheat. For AI, that means making 1000 books that lie for every one real book. Can you find a needle in a haystack? You can cheat by making things addictive, by overwhelming with supply, by straight up lying, by forcing people to use it... there are really a lot of ways to "outcompete".

> It feels more like we just want to punish people, particularly rich people, particularly if they get away with stuff we're afraid to try.

If by "afraid to try" you mean "know to be morally reprehensible" and if by "punish people" you mean "punish people (who do things that we know to be morally reprehensible)", then sure.

But... you might just be describing the backbone of human society since, I don't know, ever? Hm, maybe there's a reason we have that perspective. No, it must just be silly :P


> know to be morally reprehensible

In your opinion, not to everyone. There has been no actual argument as to why it's supposedly "morally reprehensible."


I just explained how it's morally reprehensible. The argument is right there, above the quote you chose to quote. Neat trick, but I'm sorry, a retort that does not make.

You didn't explain anything about why it is so; you just said it is, hence why I said it's your opinion. If you can't explain why in more concrete terms, then there is no reason to believe your argument.

I just explained how AI books are able to cheat - they make more, faster, cheaper, and win based not on quality, never on quality, but rather by overwhelming. Such a strategy is morally reprehensible. It's like selling water by putting extra salt in everything.

Consumers are limited by humanity. We are all just meat sacks at the end of the day. We cannot, and will not, sift through 1 billion books to find the one singular one written by a person. We will die before then. But, even on a smaller scale - we have other problems. We have to work, we have family. Consumers cannot dedicate perfect intelligence to every single decision. This, by the way, is why free market analogies fall apart in practice, and why consumers buy what is essentially rat poison and ingest it when healthier, cheaper options are available. We are flawed by our blood.

We can run a business banking on the stupidity of consumers, sure. We can use deceit, yes. To me, this is morally reprehensible. You may disagree, but I expect an argument as to why.


> I just explained how AI books are able to cheat - they make more, faster, cheaper, and win based not on quality, never on quality, but rather by overwhelming. Such a strategy is morally reprehensible.

Okay. I fundamentally disagree with your premises, your analogies to water and banking (and even your other comment about piracy [0]; I have not seen any evidence of piracy leading directly to "suicides," as you say, and it has instead actually benefited many companies [1]), and therefore your conclusions. So I don't think we can have a productive conversation without me spending a lot of time explaining why I don't equate AI production with morality at all, and why I don't see AI writing billions of books as having anything to do with morals.

That is why I said it is your opinion, versus mine which is different. Therefore I will save both our time by not spending more of it on this discussion.

[0] https://news.ycombinator.com/item?id=42971446#43054300

[1] https://www.wfyi.org/news/articles/research-finds-digital-pi...


I think the concern goes to the point of copyright to begin with, which is to incentivize people to create things. Will the inclusion of copyrighted works in LLM training (further) erode that incentive? Maybe, and I think that's a shame if so. But I also don't really think it's the primary threat to the incentive structure in publishing.


> the point of copyright to begin with, which is to incentivize people to create things

Is it?

(I don't agree)


Yes, it is. It's not actually an opinion thing. It's a "what did the people who came up with the idea of copyright think it was for?" thing.

I haven't read the primary source material, so you could teach me something here, but my understanding is that the idea was to incentivize creators.


That is not actually the goal of it.

Copyright was invented by publishers (the printing guild) to ensure that the capitalists who own the printing presses could profit from artificial monopolies. It decreases the works produced, on purpose, in order to subsidize publishing.

If society decides we no longer want to subsidize publishers with artificial monopolies, we should start with legalizing human creativity. Instead we're letting computers break the law with mediocre output while continuing to keep humans from doing the same thing.

LLMs are serving as intellectual property laundering machines, funneling all the value of human creativity to a couple of capitalists. This infringement of intellectual property is just the more pure manifestation of copyright, keeping any of us from benefitting from our labor.


Interesting! I'd love a citation on this...

People have created for millennia before the modern institution of copyright, so I'm not sure how that's a cogent argument.


Yeah it's an interesting point, but it was also hard to physically copy things for all of those millennia.

i wrote a book and copyright was not once on my mind. having created something is the incentive to create for most artists


I don't think we can infer the motives of most artists from your personal motives.

fine, because that means your claim that artists create for the copyright incentive is also false

> What are we actually worried about happening?

Few companies can amass such quantities of knowledge and leverage it all for their own, very private profits. This is unprecedented centralization of power for a select few. Do we actually want that? If not, why not block this until we're sure it's a net positive for most people?


Meta open-sourced it, my guy


Because they expect not to have to open-source future models. It's easy to open stuff as long as it strengthens your position and prevents competition from emerging.

Ask Google about Android and what they now choose to release as part of AOSP vs Play Services.


…why? Will people buy fewer books because we have intuitive algorithms trained on old books?

Personally, I strongly believe that the aesthetic skills of humanity are one of our most advanced faculties — we are nowhere close to replacing them with fully-automated output, AGI or no.


old books? i can imagine the shitty, hallucination-prone generative AI we would have if the training data were restricted to public domain stuff...

i think when chatGPT was around version 2 or 3, i extracted almost 2 pages (without any alteration from the original), using questions that referenced the author, from this book here: https://www.amazon.com/Loneliness-Human-Nature-Social-Connec...

now it's up to you to think this is okay... but i bet you are no author


I find this such a strange remark on this front.

You got less than 1% of a book... from an author who has passed away... who wrote on a research topic that was funded by an institution that takes in hundreds of millions of dollars in federal grants each year...

I'm not an author (although I do generate almost exclusively IP for a living), and I think this is about as weak a form of this argument as you can possibly make.

So right back at ya... who was hurt in your example?


I think the key is to think through the incentives for future authors.

As a thought experiment, say that the idea someday becomes mainstream that there is no reason to read any book or research publication, because you can just ask an AI to describe and quote at length from the contents of anything you might want to read. In such a future, I think it's reasonable to predict that there would be less incentive to publish and thus fewer people publishing things.

In that case, I would argue the "hurt" is primarily to society as a whole, and also to people who might have otherwise enjoyed a career in writing.

Having said that, I don't think we're particularly close to living in that future. For one thing I'd say that the ability to receive compensation from holding a copyright doesn't seem to be the most important incentive for people to create things (written or otherwise), though it is for some people. But mostly, I just don't think this idea of chatting with an AI instead of reading things is very mainstream, maybe at least in part because it isn't very easy to get them to quote at length. What I don't know is whether this is likely to change or how quickly.


  there is no reason to read any book or research publication because you can just ask an AI to describe and quote at length from the contents of anything you might want to read
I think this is the fundamental misunderstanding at the heart of a lot of the anger over this, beyond the basic "corporations in general are out of control and living authors should earn a fair wage" points that existed before this.

You summarize well how we aren't there yet, but I'd say the answer to your final implied question is "not likely to change at all". Even when my fellow traitors-to-humanity are done with our cognitive AGI systems that employ intuitive algorithms in symphony with deliberative symbolic ones, at the end of the day, information theory holds for them just as much as it does for us. LLMs are not built to memorize knowledge, they're built to intuitively transform text -- the only way to get verbatim copies of "anything you might want to read" is fundamentally to store a copy of it. Full stop, end of story, will never not be true.

In that light, such a future seems as easy to avoid today as it was 5 years ago: not trivial, but well within the bounds of our legal and social systems. If someone makes a bot with copies of recent literature, and the authors wrote that lit under a social contract that promised them royalties, then the obvious move is to stop them.

Until then, as you say: only extremists and laymen who don't know better are using LLMs to replace published literature altogether. Everyone else knows that the UX isn't there, and the chance for confident error way too high.


that was just a metaphor; you can ask your AI what that is, or use way less energy and just use Wikipedia's search engine... or do you think OpenAI first evaluates whether the author is an independent developer, and/or has died, and/or was funded by a public university before adding the content to the training database? /s

and publishing a paper full of jargon for academics is one thing; simplifying the results for the masses is another. there's a huge difference between finishing a paper and finishing a book


It isn't that someone was hurt. We have one private entity gaining power by centralizing knowledge (which they never contributed to) and making people pay for regurgitating the distilled knowledge, for profit.

Few entities can do that (I can't).

Most people are forced to work for companies that sell their work to the highest bidder (which are the very entities mentioned above), or that ask them to use AI (under the condition that such work is accessible to the AI entities).

It's obviously a vicious circle, if people can't oppose their work to be ingested and repackaged by a few AI giants.


Are you talking about Meta? They released the model. It's free to use.


Then you should be in support of OSS models over private entity ones like OpenAI's.


Like supporting the Android Open Source Project… until Google decides to move the critical parts into Google Play Services? I run GrapheneOS (love it), but almost no banks will allow non-Google-sponsored Android ROMs and variants to do NFC transactions because… AOSP is designed to leave out what Google actually needs.

Idem with ML Kit loaded by Play Services, which makes Android apps fail in many cases.

And I'm not even talking about biases introduced by private entities that open-source their models but pursue their own goals (e.g. geopolitical).

As long as AI is designed and led by huge private entities, you'll have a hard time benefiting from it without supporting those entities' very own goals.


Something is better than nothing; better to have AOSP than a fully closed-source OS like iOS.

The answer is to censor the model output, not the training input. A dumb filter using 20-year-old technology can easily stop LLMs from emitting verbatim copyrighted output.
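
A minimal sketch of what such a filter might look like, in Julia (my own illustration, assuming a hashed n-gram index; "corpus.txt" and the threshold are made up):

    # Hash every K-word shingle of the training corpus into a set, then
    # flag model output whose shingles match the corpus too often.
    const K = 8

    function shingles(text::AbstractString)
        words = split(text)
        [join(words[i:i+K-1], ' ') for i in 1:max(0, length(words) - K + 1)]
    end

    corpus_hashes = Set(hash.(shingles(read("corpus.txt", String))))

    # Flag output if a large fraction of its shingles appear verbatim.
    function looks_verbatim(output; threshold = 0.5)
        hs = hash.(shingles(output))
        isempty(hs) && return false
        count(in(corpus_hashes), hs) / length(hs) > threshold
    end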


I know that this seems likely from a theoretical perspective (in other words, I would way underestimate it at the sprint planning meeting!), but

A) checking each output against a regex representing a hundred years of literature would be expensive AF no matter how streamlined you make it, and

B) latent space allows for small deviations that would still get you in trouble but are very hard to catch without a truly latent wrapper (i.e. another LLM call). A good visual example of this is the coverage early on in the Disney v. ChatGPT lawsuit:

[1] IEEE: https://spectrum.ieee.org/midjourney-copyright

[2] reliable ol' Gary Marcus: https://garymarcus.substack.com/p/things-are-about-to-get-a-...


What if the model simply substitutes synonyms here and there without changing the spirit of the material? (This might not work for poetry, obviously.) It is not such a simple matter.


It's pretty simple, you are absolutely allowed to do that, and it's been done forever.

Imagine having the copyright claim to "Person's family member is killed so they go and get revenge".


So I can duplicate a book, change a word or two, and sell it? That does not sound right.


I think you’re overstating its importance. The internet already makes it possible to order almost any book in existence and have it arrive at your doorstep within a week or so, or often on your ebook reader instantly. And your local library probably participates in an interlibrary loan system that lets you request any book held by any library in the country for free.

LibGen gives you access to a much smaller body of works than either of those. It’s a little more convenient. But the big difference is that it doesn’t compensate the author at all.

Just go to a real library.


And what about the other billions of people on the planet who don't even have a library, let alone a doorstep to receive a first-world delivery service?


1. We are not talking about physical books.

2. DRM is built into most purchased ebooks, which means you can't read the book on any device you like. "Illegal" tools exist to circumvent this.

3. Large ebook stores - like other digital stores - essentially lend you a copy of the book. So when they are forced to pull a book, they’ll pull your access too.

Of course, now that the big players have consumed/archived the entire book dump, they can go ahead and kill it to prevent others from doing the same thing.


It is *much* more convenient. When a research path takes me to an article or book, I could buy it, order it, or go to a physical library, which would take hours or days; or I could open it as a PDF in seconds. If you need to read a chapter from a book, or an article, or skim one to see if it's worthwhile, 20-30 times to figure something out, then libgen is the difference between finishing in a day and finishing in a month.


There are a whole lot of books that are out of print, and if a book went out of print before ebooks were a thing, it probably doesn't have a legal digital edition either.


This. Few people here would remember ebooksclub/gigapedia/smiley/library.nu [1], which predated LibGen by several years. But that online library had a lot of books that are not available nowadays. There were lots of scanned books (djvu) that people uploaded. So much lost knowledge.

[1] https://en.wikipedia.org/wiki/Library.nu


No one sells scans of older books, which are often sparsely available in obscure (often private) libraries.


Sure, but I have a strong feeling that scans of out-of-print books only constitute a small portion of LibGen’s traffic.

It’s like the idea that most BitTorrent users are just using it to share free software and Creative Commons media. (See the screenshots on every BitTorrent client’s website.) It would definitely be helpful if it were true, but everyone knows it’s just wishful thinking.


Why does the proportion matter?

Academics are huge users of LibGen for academic books from the entire past century and beyond. It's infinitely more convenient to instantly get a PDF you can highlight, than wait weeks for some interlibrary loan from an institution three states away.

Just because the majority of people might be downloading Harry Potter is irrelevant.


Libraries can burn down (see Library of Alexandria), civilizations end (see various). LibGen makes it possible for an individual to backup a snapshot of cumulative human knowledge, and I think that's commendable.


> LibGen gives you access to a much smaller body of works than either of those.

> Just go to a real library.

The thrill of waiting a week for a book to arrive or navigating the labyrinthine interlibrary loan system is truly a privilege that many can afford. And who needs instant access to knowledge when you can have the pleasure of paying for shipping or commuting to a physical library?

It's also fascinating that you mention compensating authors, as if the current publishing model is a paragon of fairness and equity. I'm sure the authors are just thrilled to receive their meager royalties while the rest of the industry reaps the benefits.

LibGen, on the other hand, is a quaint little website that only offers access to a vast, sprawling library of texts, completely free of charge and accessible to anyone with an internet connection. I'm sure it's totally insignificant compared to the robust and equitable systems you mentioned.

Your suggestion to "just go to a real library" is also a brilliant solution, assuming that everyone has the luxury of living near a well-stocked library, having the time and resources to visit it, and not having any other obligations or responsibilities. I'm sure it's not at all a tone-deaf, out-of-touch recommendation.


Yes, publishers don’t pay authors as much as they deserve, but LibGen pays them literally nothing. Authors tend to love libraries but hate piracy. Why? Because earning something is better than earning nothing.

Have you ever submitted an ILL request? It’s extremely simple. Many library systems even integrate with WorldCat, so submitting a request for any book just takes a few clicks.

I’m mostly speaking about people in the US. Every single county in the entire country has a public library. Almost all of them have ILL.

I think equity is a fair argument for the existence of services like LibGen in many parts of the world, but the reality is that almost everyone using a book piracy site in a first-world country is using it to pirate an in-print book that they just don't want to go to the trouble of borrowing or buying.


Your library almost definitely offers digital loans as well.


Seeing the high prices they are charged for a digital licence which expires after a fairly small number of loans, I feel it'd be better for my library if I pirate when possible. Save those limited loans for someone who prefers/needs them.


[flagged]


I definitely do know what I think.


Do you think developing countries are just peppered with libraries, and their inhabitants order books from Amazon?

Libgen originated in Russia, and its users are global. This is not a purely American issue.


I was responding to a comment arguing that LibGen is the largest collection of knowledge in human history, which I think is an overly romantic and totally incorrect take. It may be a very useful collection of knowledge to people in developing countries, but it simply is not larger than the collection of knowledge accessible via any first-world public library. Obviously not everyone has access to that, but again, that’s not what the comment I replied to was about, and not relevant to what I’m saying.


> but it simply is not larger than the collection of knowledge accessible via any first-world public library

How do you know or can quantify this? At a first approximation, libraries are finite in space while the internet is (for the purposes of this discussion) infinite. I'd agree with you if you had said something like Wikipedia was bigger than libgen (and probably not even then, as Wikipedia is merely a summary of primary sources, which would be theoretically contained in libgen).


There has never been one.


So, what do we do with actual people who have a very similar voice to some "more famous" person? It's quite silly when voices are far away from being unique to a person.


All I can read from this article is "the end of mathematics is on the horizon". Hopefully nothing he says actually materializes.


    for i in 0:7
        c += (r >> i) & 1
    end
This is just popcnt; surely Julia has a built-in for that.


There is; it's called count_ones. I wouldn't be surprised if LLVM could optimize some of these loops into a popcnt, but I'm sure it would be brittle.
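
For example:

    julia> count_ones(0b10110001)
    4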


author here. I thought there might be a machine instruction for this but wasn't sure; I also didn't know Julia had a count_ones that counts the 1s.

Thanks! With this the timings are even faster. I'll update the post.


    julia> @code_typed hamming_distance(Int8(33), Int8(125))
    CodeInfo(
    1 ─ %1 = Base.xor_int(x1, x2)::Int8
    │   %2 = Base.ctpop_int(%1)::Int8
    │   %3 = Base.sext_int(Int64, %2)::Int64
    │   nothing::Nothing
    └── return %3
    ) => Int64

    julia> @code_llvm hamming_distance(Int8(33), Int8(125))
    ; Function Signature: hamming_distance(Int8, Int8)
    ;  @ /Users/lunaticd/code/tiny-binary-rag/rag.jl:13 within `hamming_distance`
    define i64 @julia_hamming_distance_16366(i8 signext %"x1::Int8", i8 signext %"x2::Int8") #0 {
    top:
    ;  @ /Users/lunaticd/code/tiny-binary-rag/rag.jl:14 within `hamming_distance`
    ; ┌ @ int.jl:373 within `xor`
       %0 = xor i8 %"x2::Int8", %"x1::Int8"
    ; └
    ; ┌ @ int.jl:415 within `count_ones`
       %1 = call i8 @llvm.ctpop.i8(i8 %0)
    ; │┌ @ int.jl:549 within `rem`
       %2 = zext i8 %1 to i64
    ; └└
      ret i64 %2
    }

it lowers to the machine instruction now.

I also tried 8 Int64s vs 64 Int8s and it doesn't seem to make a difference when doing the search.



I think you may need to update the figures in the rest of the article. At some point you mention it should take around 128ns but with the new benchmark that's probably closer to 64*1.25=80ns.


I had Opus translate your code to Rust:

    fn hamming_distance_u8(x1: u8, x2: u8) -> usize {
        (x1 ^ x2).count_ones() as usize
    }


From what I've heard, it's actually faster to create a 256-byte lookup table than to use popcnt.
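
Something like this, presumably (a sketch; here count_ones is only used once, offline, to build the table):

    # 256-entry table: the popcount of every possible byte, computed up front.
    const POPCOUNT_LUT = UInt8[count_ones(b) for b in 0x00:0xff]

    # Table lookup instead of a popcnt instruction (Julia arrays are 1-based).
    lut_popcount(x::UInt8) = POPCOUNT_LUT[Int(x) + 1]

    hamming_distance_lut(x1::UInt8, x2::UInt8) = Int(lut_popcount(x1 ⊻ x2))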


It used to be pretty bad on old Intel processors, but nowadays it should be faster than an L1 fetch.


Some kind of Casio watch?


Mathematica.


"Europe" is not USA. I don't know why the author wants it to be. Germany and Spain are different countries, even if we have some degree of a unified market.

> From consumer markets, languages, laws, education systems, taxes, to funding – Europe acts like a network of small countries instead of one unified market.

"Europe" does not ACT AS, it IS a network of "small" (not quite) countries.


I think these guys are doing themselves a disservice by donning the mantle of Defenders of California-Style Capitalism®, a thing that nobody likes outside of oligarchs and aspiring oligarchs.

Their first project, smoothing out and equalizing the process of creating a new legal entity, seems like an unqualified good thing. Nobody likes notaries, outside of notaries and aspiring notaries. Having Estonia's level of simplicity in business creation would be a good thing for reckless 'disruptive' startups, but it would be a good thing for small brick-and-mortar artisans and shops too.

If one of their subsequent projects is going to be 'let's make tax laws nicer for billionaires', then yeah, screw those guys. But the first one is already a tall order and I frankly think they're unlikely to get political momentum (see: first paragraph.)


Really only a thing in the USA, due to the ridiculous American rule regarding attorney fees.

