Hacker News: bawolff's comments

> While Russia’s invasion of Ukraine “has complicated things,” Elliott said, the collaboration between the U.S. and Russia on BEST is still ongoing, for now.

Nice to hear in these uncertain times.

> How is it any different when a machine does the same thing?

I think the argument is that the machine is not doing that, or at least there isn't evidence that it is doing that.

Specifically, there's no evidence that GitHub is doing both 1 and 2 at the same time. There might be cases where it makes trivial changes to code (point 2), but for code that does not meet the threshold of originality. Similarly, there might be cases with copyrighted code where the idea is taken but expressed so differently that it is not a straightforward derivative of the expression. (Keep in mind that you cannot copyright an idea, only its expression; using a similar approach or algorithm is not copyright infringement.)

And finally, someone has to demonstrate that it is actually happening, not just that it could happen in theory. Generally, courts don't punish people for future crimes they haven't committed yet (sometimes you can get in trouble for being reckless even if nothing bad happens, but I don't think that applies to copyright infringement).

The DMCA (at least the takedown-request part) is not really about suing someone, and not really about making money. It's about getting certain works off the internet.

You are probably more likely to be on the wrong end of a DMCA takedown request as a poor person, since you don't have the resources to fight it, and it's not about recovering damages, just censorship.

We are really losing the plot of what this thread is about here, but: DMCA takedown requests that are ignored, or where the site does not comply with the process, are subject to private civil action. Obviously, a takedown request is distinct from suing someone. And the way the rights holder forces the site to remove the content is under threat of monetary penalties.

I would think it is pretty obviously not.

Is taking away a drunk driver's keys (before they get in the car) destruction of the evidence of their drunk driving?

This is not what I meant. By placing a copyright filter and claiming it never happened (please read the line I was replying to) before the system can be audited, they're indeed taking away the drunk driver's keys, which is a good thing, but also removing the offending car before the police arrive.

In this metaphor, removing the car of someone who was going to drink and drive but didn't is certainly not a crime. Presumably, though, you mean removing the car after the drunk driving actually took place, which might be, but that probably depends a lot on whether the person knew and what the intent of the action was.

In the current case, it's unclear whether any crime took place at all, and it seems clear that the primary intent was to prevent future crimes, not to hide evidence of past ones. Most importantly, the past version of the app is (presumably) not destroyed. GitHub still has the version of the software without the copyright filter. If relevant and appropriate, the court could order them to produce the original version. It can't be destroying evidence if the evidence was not destroyed.

Yes, sort of. We're talking about software, so a piece of code that does something programmatically isn't like a drunk driver who may cause more accidents, where we aren't sure but prevent them from driving anyway just to be safe. The software would most certainly repeat its routine, because it has been written to do so. That's why I wondered about destruction of evidence: by removing or modifying it, or placing filters on it, they would prevent it from repeating the wrongdoing, but also take away any means of auditing the software to find out what happened and why.

> The copilot team rushed to slap a copyright filter on top to keep these verbatim examples from showing up, and now claims they never happen.

Well, if the copyright filter is working, they indeed aren't happening. Putting in safeguards to prevent something from happening doesn't mean you're guilty of it. Putting a railing on a balcony doesn't imply that the balcony with the railing is unsafe.

> LLMs are prone to paraphrasing. Just because you filter out verbatim copies doesn't mean there isn't still copyright infringement/plagiarism/whatever you want to call it

Copyright infringement and plagiarism are different things. Something can be copyright infringement without being plagiarized, and can be plagiarized without being copyright infringement. The two concepts are similar but should not be conflated, especially in a legal context.

Courts decide based on laws, not on gut feeling about what is "fair".

> They clearly know the problem is real

They know the risk is real. That is not the same thing as saying that they actually committed copyright infringement.

A risk of something happening is not the same as actually doing the thing.

> "Ner ner ner ner ner, you can't prove it to a boomer judge".

It's always a cop-out to assume that they lost the argument because the judge didn't understand. I suspect the judge understood just fine, but the law and the evidence simply weren't on their side.

> Well if the copyright filter is working they indeed aren't happening. Putting in safeguards to prevent something from happening doesn't mean you're guilty of it. Putting a railing on a balcony doesn't imply the balcony with railing is unsafe.

Doesn't mean you weren't, at some point, guilty of it, either. It doesn't retcon things.

Sure, which is why we require evidence of wrongdoing. Otherwise it's just a witch hunt.

After all, you yourself probably cannot prove that you didn't commit the same offense at some point in the past. Like Russell's teapot, it's almost always impossible to disprove something like that.

Yeah but I think the main concern in this situation is copilot moving forward, not their past mistakes.

The intent of the work can matter when determining if de minimis applies as well as fair use.

Part of my point is that fair use doesn't apply.

Training a model doesn't involve reproducing a copyrighted work, preparing a derivative work, distributing that work, or performing that work.

Fair use isn't required because none of the exclusive rights afforded by copyright apply.

> 24-bit audio is obviously a noticeable improvement

Audio quality is famous for having an extremely strong placebo effect. Unless you did the test double-blind, there's a good chance your anecdote is wrong.
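One common way to run such a test is an ABX setup, where the listener repeatedly identifies which of two sources a blind sample matches. Whether the result means anything comes down to how likely that score is from pure guessing. A minimal sketch, assuming each trial is an independent 50/50 guess for someone who can't hear a difference; the function name `p_value_at_least` is hypothetical, and the 0.05 cutoff is only a common convention:

```python
from math import comb


def p_value_at_least(correct, trials, p=0.5):
    # Probability of scoring `correct` or more out of `trials` purely by
    # guessing (one-sided binomial tail with per-trial success probability p).
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(correct, trials + 1)
    )


print(f"{p_value_at_least(12, 16):.3f}")  # 0.038: unlikely to be luck
print(f"{p_value_at_least(9, 16):.3f}")   # 0.402: easily explained by guessing
```

So "I got 9 of 16 right" is no evidence of an audible difference, while 12 of 16 would clear the usual 5% bar.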

Organizing data is valuable.

Not as valuable as the actual data, but it's not nothing either.

> Is web scraping now considered a cyberattack? Was it eating their bandwidth even if it was served through Cloudflare? LOL.

We've been tobogganing down the slippery slope of "cyber" damages for a long time now.

Dang, surely this will put an end to AI training by scraping the web! Unless... perhaps such a standard might not get applied evenly?

Is it just the subsampling?

Naively, I would assume that even if you have the unnecessary C_b and C_r channels in a greyscale image, they are going to compress really well, since they carry very little information in a greyscale image.
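To make that intuition concrete: with the full-range JFIF (BT.601) conversion, any pixel with R == G == B maps to Cb = Cr = 128, so the chroma planes of a greyscale image are constant. A minimal sketch, assuming a hypothetical 64x64 greyscale image filled with seeded random values and using `zlib` as a stand-in for a real image codec (JPEG itself works on DCT blocks, not raw planes); `rgb_to_ycbcr` is a hypothetical helper:

```python
import random
import zlib


def rgb_to_ycbcr(r, g, b):
    # Full-range JFIF (BT.601) conversion, rounded and clamped to 0..255.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return tuple(max(0, min(255, round(c))) for c in (y, cb, cr))


# Hypothetical 64x64 greyscale image stored as RGB: R == G == B per pixel.
rng = random.Random(0)
grey = [rng.randrange(256) for _ in range(64 * 64)]
ycc = [rgb_to_ycbcr(v, v, v) for v in grey]

# Split into separate Y, Cb and Cr planes (no subsampling) and compress each.
planes = {name: bytes(p[i] for p in ycc) for i, name in enumerate(("Y", "Cb", "Cr"))}
for name, data in planes.items():
    print(name, len(data), "->", len(zlib.compress(data, 9)), "bytes")
```

The Y plane carries all the information and barely compresses (the data is random), while each chroma plane is 4096 copies of the byte 128 and shrinks to a few dozen bytes.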

Storing photos and sufficiently photo-like images as YCbCr rather than RGB does indeed tend to make them more compressible, even without subsampling.
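The reason is decorrelation: in photo-like images R, G and B tend to move together, so most of the variation ends up in Y while Cb and Cr cluster near 128. A minimal sketch, assuming synthetic data (a shared per-pixel brightness plus small independent Gaussian noise per channel, not a real photo) and using first-order Shannon entropy as a rough stand-in for what a simple entropy coder could achieve; `rgb_to_ycbcr` and `entropy_bits` are hypothetical helpers:

```python
import math
import random
from collections import Counter


def rgb_to_ycbcr(r, g, b):
    # Full-range JFIF (BT.601) conversion, rounded and clamped to 0..255.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return tuple(max(0, min(255, round(c))) for c in (y, cb, cr))


def entropy_bits(values):
    # First-order Shannon entropy in bits per sample: a rough lower bound
    # for a coder that treats each sample independently.
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# Hypothetical "photo-like" pixels: shared brightness + small per-channel noise,
# so R, G and B are strongly correlated.
rng = random.Random(42)
rgb = []
for _ in range(20_000):
    base = rng.uniform(0, 255)
    rgb.append(tuple(max(0, min(255, round(base + rng.gauss(0, 6)))) for _ in range(3)))

ycc = [rgb_to_ycbcr(*p) for p in rgb]

rgb_bits = sum(entropy_bits([p[i] for p in rgb]) for i in range(3))
ycbcr_bits = sum(entropy_bits([p[i] for p in ycc]) for i in range(3))
print(f"RGB:   {rgb_bits:.2f} bits/pixel")
print(f"YCbCr: {ycbcr_bits:.2f} bits/pixel")
```

In RGB all three planes need close to 8 bits per sample; after conversion only Y does, and the narrow Cb/Cr distributions need roughly half that.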

JPEG/JFIF isn't really smart enough to take advantage of that. It's a file format from 1992, when CPU speed was limited.

JPEG does entropy encoding.
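As a rough illustration of what entropy encoding buys: after quantization most coefficients are zero, and a Huffman code gives frequent symbols shorter codes. A minimal sketch that simplifies JPEG's actual scheme (baseline JPEG Huffman-codes run-length/size symbols derived from quantized DCT coefficients; arithmetic coding is also defined in the standard but rarely used). The coefficient data and the function name `huffman_code_lengths` are hypothetical:

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(symbols):
    # Return {symbol: code length in bits} for a Huffman code built from
    # the symbol frequencies. Frequent symbols end up with shorter codes.
    freqs = Counter(symbols)
    if len(freqs) == 1:
        # Degenerate case: a single distinct symbol still needs 1 bit.
        return {next(iter(freqs)): 1}
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), {s: 0}) for s, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical quantized coefficients: mostly zeros, a few small values.
coeffs = [0] * 50 + [1] * 10 + [2] * 5 + [3] * 5
lengths = huffman_code_lengths(coeffs)
bits = sum(lengths[c] for c in coeffs)
print(bits, "bits vs", 2 * len(coeffs), "with a fixed 2-bit code")  # 100 bits vs 140 ...
```

The common zero gets a 1-bit code, so this made-up run costs 100 bits instead of 140.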
