> If you commit or post content to this repository that violates our Terms of Service, we will delete that content and may suspend access to your account as well
My reading of this is that github wants to stem the flood of garbage pull requests to their [dmca](https://github.com/github/dmca/pulls?q=is%3Apr+is%3Aclosed) repo, which I guess is probably reasonable.
I don't quite see where github is banning certain commit hashes or tree hashes on sight.
I certainly respect the author's desire to not publish their repos to github (and indeed one of the nice things about git is how easy it is to self-host). And I can understand not wanting others to fork the repos to github (due to objections about the company itself), but I'm not sure how I feel about adding deliberate booby traps to repos that seem to be purely punitive.
When you make that PR, you do not transfer any of youtube-dl to github. Their repo already has those objects.
So all you have done is pushed a repo containing a specific hash, which is now a TOS violation, in that specific case at least. I wrote "Certain commit hashes are rapidly heading toward being illegal" because of that specificity.
(I wonder when illegal transcendental numbers will become a thing.)
tl;dr yes. An illegal number multiplied by two is illegal. Randomly generating that number is not. The illegality is not strictly a property of the number itself, but of its color.
Another idea is to generate a random number and multiply it by an illegal prime. If the illegal prime is a sufficiently large number, we can extract the original illegal prime with very high confidence by just finding the product's prime factors and picking the largest one.
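A rough Python sketch of that idea, with two loudly stated assumptions: the "illegal" prime here is just the Mersenne prime 2^61 - 1 as a stand-in, and the random multiplier is small enough that trial division can strip all of its factors. Factoring a product of two large primes is not feasible this way; the trick only works because the multiplier is small, so the big prime is simply the cofactor left over. The name `recover_large_factor` is made up for illustration.

```python
import random


def recover_large_factor(n: int, bound: int) -> int:
    """Divide out every factor below `bound` by trial division and return
    what is left. If n = p * r with r < bound < p and p prime, this is p."""
    d = 2
    while d < bound and d * d <= n:
        while n % d == 0:
            n //= d
        d += 1
    return n


# Stand-in for the "illegal" prime (assumption): the Mersenne prime 2**61 - 1.
SECRET_PRIME = 2**61 - 1
BOUND = 2**16

multiplier = random.randrange(2, BOUND)   # the "random number"
published = SECRET_PRIME * multiplier     # looks like an unrelated integer
assert recover_large_factor(published, BOUND) == SECRET_PRIME
```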
Therefore all numbers are illegal.
You can't just dodge color by mutating the number via mathematical operations, color doesn't work like that. You can only get rid of the taint by using a fresh number without taint.
If there's an illegal number X and I give you a number Y with the intention that you obtain X from it by some transform then Y is also illegal because I "tainted" it when I did the original transformation from X.
If I just need Y (or X) for some operation that's not related to the "illegality" then those numbers are not illegal.
The essay that droffel linked to explains this very well.
1. This appears to show a complete disregard for basic mathematics...
2. Do lawyers ever wonder why people make jokes about them?
The number itself isn't illegal, because it can come from any number of contexts. What's illegal is wilfully working towards generating that number while knowing its purpose as a key. It's just the principle of Mens Rea at work.
Used to be illegal to have it in US and other countries, don't know status now
It's actually even cooler: store this prime in base-2 and interpret it as a gzip file, and it'll gunzip to a full working DeCSS (DVD decrypter) source code in C :-)
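The "store it in base-2 and gunzip it" step is just reading the same bits two ways. A minimal Python sketch, using a made-up C snippet as the payload (not the actual DeCSS source), showing that a gzip file and a big integer are two views of one bit string:

```python
import gzip


def bytes_to_int(data: bytes) -> int:
    """Read a byte string as one big-endian binary number."""
    return int.from_bytes(data, "big")


def int_to_bytes(n: int) -> bytes:
    """Inverse of bytes_to_int. Leading zero bytes would be lost, but gzip
    output always starts with the magic bytes 0x1f 0x8b, so that is safe here."""
    return n.to_bytes((n.bit_length() + 7) // 8, "big")


source = b"int main(void) { return 0; }\n"   # placeholder payload
number = bytes_to_int(gzip.compress(source))  # just a (large) integer

# Turning the integer back into bytes and gunzipping recovers the source.
assert gzip.decompress(int_to_bytes(number)) == source
```

Making that integer also happen to be prime is a separate search step on top of this.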
When framed this way, so many laws become ridiculous. Copyright is about establishing a monopoly on numbers. It's illegal for normal people to know certain numbers because they represent classified information. People go to jail because of numbers stored in their hard drives.
Numbers that are huge and precise are data, information, something quite different.
The thing with illegal primes is that they are small enough to really feel like numbers rather than a concatenation of things forced to look like a number (“the shape of my bedroom is 201809!”)
I think there is a biggest number though, and it should be higher than 2^64 - 1, because otherwise your processor would occasionally handle illegal numbers, thereby (probably) making it illegal as well.
I think the better point to make is that the law will never actually produce a concrete N, preferring to leave itself vague so that it can criminalise things at a whim. This is what makes it a farce and why such notions need to be purged from the law.
For natural numbers, the lawyers seem unable to do even that. Sure, they might want to name some of the "illegal primes" mentioned in this thread as examples, but what will they say about those numbers +1? They're gonna paint themselves into a corner of ridiculousness no matter what.
What does "in real life" mean? You and your brain presumably exist in real life. If your brain can construct any natural number, then do they not exist just as much as the ones that happen to appear as counts of rice or the price of beer?
One might stretch the definition to allow numbers that you could read on a postcard in one go to still count as numbers, but it’s a big stretch. If there’s a more understandable representation of the number, then it’s data and not a number.
Beyond a certain amount of information content numbers just become data. Certainly at the point where counting the number of digits is non trivial. (There are notations to make magnitude easier to see, but these purposefully delete information content.)
But this will, in time, generate every copyrighted material.
Except it's not.
You can post the numbers alone all you want, as long as you don't include information (directly or by reference) about the application for which that information is banned.
You need to somehow include information on how to use the number, which is arguably easier to pull off, but still doesn't make it legal, just harder to find out.
1. Someone somewhere creates a numbered list of primes (so 1=2, 2=3, 3=5, 4=7, 5=11, 6=13, 7=17, etc.)
2. You refer to "the [N]th prime" with a vastly smaller number, and people could fetch it from the list of primes.
3. They could theoretically ban the number N?
Fun fact, there are a _LOT_ of prime numbers. There are about 1.8 × 10^22 prime numbers smaller than 10^24, so roughly one number in 55 is prime. The ratio decreases the higher you go, of course. The vastly smaller number is actually just a few bits smaller :)
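A quick way to check the "just a few bits smaller" claim: by the prime number theorem the index of a prime p is about p / ln p, so sending the index instead of p saves only about log2(ln p) bits. A Python sketch, with a small sieve as a sanity check (function names are illustrative):

```python
import math


def primes_below(limit: int) -> list[int]:
    """All primes < limit (limit >= 2), via the sieve of Eratosthenes."""
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit - 1) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, limit, i)))
    return [i for i, is_p in enumerate(sieve) if is_p]


def bits_saved_estimate(bits: int) -> float:
    """Approximate bits saved by sending the index of a `bits`-bit prime p
    instead of p itself: log2(p) - log2(p / ln p) = log2(ln p)."""
    return math.log2(bits * math.log(2))


primes = primes_below(10**6)
largest = primes[-1]       # 999983 is the 78498th prime
index = len(primes)        # 1-based index of `largest` in the list
saved = largest.bit_length() - index.bit_length()   # 20 - 17 = 3 bits
```

For a 1024-bit prime, `bits_saved_estimate(1024)` comes out a little under 10 bits.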
Edit: except for this one, I guess. This one wasn't about DRM, but rather just a locked down device. TI sent bogus DMCA notices for that one, but as far as I can tell there is no solid legal theory under which those primes were illegal, and nobody ever got sued, so we can file this one under "manufacturer upset they no longer control their users' devices throws a legally meaningless fit".
But the first popular 'illegal prime' was a gzipped code that would defeat the DRM of DVDs. See https://en.wikipedia.org/wiki/Illegal_prime
There is a great classic essay describing this called "What color are your bits?"
So finding a variant of one offending digital file that's also a prime is a rather straightforward process.
What fascinates me more is all the illegal bits people put into the bitcoin block chain; and how bitcoin people deal with that.
Especially since you can make random looking strings illegal after the fact:
I have an illegal file A. I produce a random binary file R of the same size as A. I publish (R xor A) and R in two different venues, both mathematically-perfectly random files on their own. But together, they produce illegal file A.
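That XOR split is essentially a one-time pad. A minimal Python sketch (function names are illustrative), using the standard `secrets` module to produce the random file R:

```python
import secrets


def split(secret: bytes) -> tuple[bytes, bytes]:
    """Return (R, R xor A). Each half on its own is uniformly random."""
    pad = secrets.token_bytes(len(secret))
    masked = bytes(a ^ b for a, b in zip(secret, pad))
    return pad, masked


def combine(pad: bytes, masked: bytes) -> bytes:
    """XOR the two halves back together to recover A."""
    return bytes(a ^ b for a, b in zip(pad, masked))


a = b"pretend this is the illegal file"
r, r_xor_a = split(a)
assert combine(r, r_xor_a) == a
```

Since XOR is symmetric, which half you call "the key" and which "the ciphertext" is arbitrary.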
Git commit hashes are indeed deterministic, though it depends on the inputs.
From Stack Overflow:
Git uses the following information to generate the sha-1:
* The source tree of the commit (which unravels to all the subtrees and blobs)
* The parent commit sha1
* The author info (with timestamp)
* The committer info (right, those are different!, also with timestamp)
* The commit message
$ git rev-parse HEAD
$ git log HEAD
commit 78d66a4f169fec9cb9b3252f7a22bb020e967cd2 (HEAD -> master)
Date: Mon Nov 2 20:44:04 2020 -0500
$ git reset --soft HEAD\^
$ GIT_COMMITTER_DATE='Mon Nov 2 20:44:04 2020 -0500' GIT_AUTHOR_DATE='Mon Nov 2 20:44:04 2020 -0500' git commit -m temp2
[master 78d66a4] temp2
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 b
$ git rev-parse HEAD
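Because a commit ID is just a SHA-1 over a small text object, the determinism in that transcript can be reproduced without git at all. A Python sketch of how git hashes an object (a `<type> <size>` header, a NUL byte, then the body); the tree, names, and dates in the example are placeholders, not the ones behind the transcript's hashes:

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 object ID as git computes it: "<type> <size>", NUL, then body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_id(tree: str, parents: list[str], author: str, author_date: str,
              committer: str, committer_date: str, message: str) -> str:
    """Build an (unsigned) commit object from the inputs listed above and hash it.

    Dates use git's raw format: "<unix seconds> <tz offset>".
    """
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author} {author_date}")
    lines.append(f"committer {committer} {committer_date}")
    body = "\n".join(lines) + "\n\n" + message
    return git_object_id("commit", body.encode())


EMPTY_TREE = git_object_id("tree", b"")   # git's well-known empty tree ID
who = "A U Thor <author@example.com>"     # placeholder identity
when = "1604367844 -0500"                 # Mon Nov 2 20:44:04 2020 -0500
first = commit_id(EMPTY_TREE, [], who, when, who, when, "temp2\n")
again = commit_id(EMPTY_TREE, [], who, when, who, when, "temp2\n")
later = commit_id(EMPTY_TREE, [], who, when, who, "1604367845 -0500", "temp2\n")
assert first == again   # same inputs, same hash
assert first != later   # committer date one second later, different hash
```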
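Given an arbitrary stopping point x, there are approximately x / ln x primes between 1 and x, so a random number near x is prime with probability about 1 / ln x.
In other words, for a file of n bytes (read as a number below 256^n, so ln x is at most about 5.5n), you have to try about O(n) slight variations to find a prime number.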
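A sketch of that search in Python: append a few trailing bytes and count up through odd values until the whole thing, read as a big-endian integer, passes a Miller-Rabin probable-prime test. The suffix length and function names are assumptions for illustration; whether a given file format tolerates extra trailing bytes is a separate question.

```python
import random

_SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_probable_prime(n: int, rounds: int = 32) -> bool:
    """Miller-Rabin with random witnesses; composites slip through with
    probability at most 4**-rounds."""
    if n < 2:
        return False
    for p in _SMALL_PRIMES:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:          # write n - 1 = d * 2**s with d odd
        d //= 2
        s += 1
    for _ in range(rounds):
        x = pow(random.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False       # this witness proves n composite
    return True


def find_prime_variant(data: bytes, suffix_len: int = 4) -> tuple[int, int]:
    """Append suffix_len bytes and try odd suffixes 1, 3, 5, ... until the
    big-endian integer is a probable prime. Returns (prime, attempts)."""
    base = int.from_bytes(data, "big") << (8 * suffix_len)
    for attempts, suffix in enumerate(range(1, 256 ** suffix_len, 2), start=1):
        candidate = base | suffix
        if is_probable_prime(candidate):
            return candidate, attempts
    raise ValueError("no probable prime found; try a longer suffix")
```

For a ~100-byte input the expected number of attempts is on the order of ln(x) / 2, a few hundred, matching the O(n) estimate.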
"The Trinity Hall Prime - Numberphile" https://www.youtube.com/watch?v=fQQ8IiTWHhg
"The Emerging Art of Drawing in Prime Numbers" https://www.popularmechanics.com/science/math/a28649996/the-...
If you’re going to restrict “publishing” to binary fixed point, arguably 1/3 is unpublishable.
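To make the 1/3 point concrete: a reduced fraction has a finite binary expansion exactly when its denominator is a power of two, so 1/3 comes out as 0.010101... forever. A small Python sketch using `fractions.Fraction` (function names are illustrative):

```python
from fractions import Fraction


def binary_digits(x: Fraction, count: int) -> str:
    """First `count` binary digits of 0 <= x < 1, by repeated doubling."""
    if not 0 <= x < 1:
        raise ValueError("expected 0 <= x < 1")
    digits = []
    for _ in range(count):
        x *= 2
        if x >= 1:
            digits.append("1")
            x -= 1
        else:
            digits.append("0")
    return "0." + "".join(digits)


def terminates_in_binary(x: Fraction) -> bool:
    """True iff the reduced denominator is a power of two."""
    d = x.denominator
    return (d & (d - 1)) == 0
```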
The integers are a subset of the rationals, which are a subset of the algebraic numbers, which are a subset of the computable numbers, which (most mathematicians would accept) are a subset of the reals.
There are lots of subtleties about this. In fact Scott Aaronson just started a series that is in some sense about some of those subtleties, which I hope to be able to understand some day!
Hmm... are you saying that, for example, pi is the one-step sequence of operations of dividing a circle's circumference by its diameter? But first, you have to get both those values...
In practice, speaking informally, people talk about programs that may run forever all the time.
People have developed lots of theory around those as well. Of specific interest here would be the class of programs that provably only runs for a finite time before it produces the next bit of output.
You need that latter class of programs to talk about producing digits of pi, or running the main loop of an OS.
See also https://en.wikipedia.org/wiki/Total_functional_programming
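Producing the digits of pi is a good example of that latter class: the generator below never terminates, but only finitely many steps separate one digit from the next. It is a Python transcription of Jeremy Gibbons' unbounded spigot algorithm, offered as a sketch rather than anything from the linked page:

```python
from itertools import islice
from typing import Iterator


def pi_digits() -> Iterator[int]:
    """Yield the decimal digits of pi forever (Gibbons' unbounded spigot)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit n is now certain: emit it and rescale the state.
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not enough precision yet: fold in another term of the series.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)


def first_pi_digits(count: int) -> list[int]:
    """Take a finite prefix of the infinite stream."""
    return list(islice(pi_digits(), count))
```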
That captures all of the transcendentals we care about. The remainder (to be fair, almost all of them) aren't computable at all, even in principle.
Are you aware of any specific instances where the presence of youtube-dl hashes in other repos have been found by github and used as the grounds for disciplinary action? That would be noteworthy indeed.
DMCAs would be much better served OwO'ified.
It also feels like it would violate the license of the repository. I see that one of the author's repos is LGPL, which allows redistribution of the source. Restricting how that source is distributed seems to me like a violation of that license, if not in letter, at least in spirit.
(Of course, it goes without saying that I'm not a lawyer, which is why I said it feels like a violation.)
(The license is ~14 years older than git itself. There’s no way distribution using a specific git hosting service is required under the license.)
GitHub is free to change their minds about the implementation and start accepting those commits again. That is, you, as a recipient of the code, are free to ask GitHub to host it, and to offer them a license under the terms of the GPL; GitHub is free to accept or to decline, same as if you sent me the code, I am free to host it on my personal website but I'm also free to not host it.
(It would feel more like violating the spirit if it, say, prevented you from uploading it to a GitLab instance you self-host... but since you self-host it, you could just patch out any suppression code, so that case can't come up.)
There are all sorts of compatibility issues that can arise from platforms doing dumb things. For example, a repo with the files README.md and readme.md in the same directory would break on most Windows machines, because their filesystems are case-insensitive by default.
"Hey, GitHub has stupid practices which are just terribly harmful for the community. Also, here's my source code."
Someone comes along and pushes that same repo to GitHub. GitHub bans them. This is exactly what you'd been warned of!
If only it was! In that case, it is github who is allegedly making the repo impossible to host. It is, thus, github who is acting in a childish way. The repo author is only adding some silly hashes.
By the way, the real "childish" behavior is believing that speech can be censored by technical means alone. This is impossible, there are always technical means to circumvent it. Speech can only be effectively censored by means of armed police action.
We’re not talking about bugs here, we’re talking about the deliberate inclusion of dangerous logic traps based on assumptions that could be wrong.
In engineering terms, this is trying to solve the problem at the wrong level of abstraction. Maybe you don’t like the thought of your code being used in weapons; that’s perfectly reasonable. Maybe you want to rid the world of such weapons; also reasonable. My advice would be that open source is not the right model in those circumstances, and that to really solve the problem you’ll have to be politically engaged.
Littering code with booby traps and publishing it is an absolutely awful idea, one that should not be acceptable to us as a professional community.
1) Publish arbitrary commits under your https://github.com/my/project URL, e.g. a fake https://github.com/my/project/blob/<faked_commit>/README.md in your project describing how to install it that actually describes installing malware.
2) Publish those commits under your name, with your email address, and GitHub will prominently display it as if you made the commit (most do not use GPG signatures, and most do not know to look for "Verified" anyway)
It seemed only a matter of time before this behavior got abused for something (anti-DMCA action is perhaps the best outcome of this situation I can imagine..)
If so it seems like that's a relatively easy fix for Github, just check if the commit is actually contained in userB's fork of the repo.
These are "known issues" I believe GitHub doesn't intend to fix:
1. With youtube-dl (and probably long before?) we know you can push a commit and view it under the web UI under another user, GitHub hasn't indicated this is a vulnerability at all.
2. Many people know you can impersonate another user through Git email addresses (GPG signing is supposed to solve this, but a lot of people don't use it and even when they do others don't really know they should look for "Verified" in GitHub's web UI.)
Combine these two known-issues and you get a really convincing phishing attack.
I really hope this doesn't become a commonplace thing. Hopefully this raises some awareness that, basically, you should not trust any GitHub URL with a commit SHA in it - only trust ones with branch names - because it could be a phishing attack otherwise.
Though I probably wouldn't use a thing from a random hash? I'd go to the repo's main GitHub page and look for branches/tags that interest me. Doesn't everyone do that?
Very close to that, yes. There's actually just one more step you need to take: you have to also open a pull request from userA/project to userB/project. As a convenience feature, GitHub automatically makes PR commits available under a special namespace. You can try it yourself with any pull request that comes from another repo: just `git fetch origin pull/<PR #>/head`.
Since the foreign commit is there in this special 'pull' namespace, it can be viewed in the web UI by its commit ID. That's how people are adding youtube-dl to github's DMCA notice repo; they're just opening a PR containing youtube-dl's commit history.
It's not that easy. First of all, you'd need to define "contained in". The naive choice would be anything reachable by a named ref. However, all pull requests also create a ref in your repository, so you'd need to exclude those. But that would mean that if you make a PR and then delete the branch the PR is made from, the PR contents aren't visible anymore.
But even if you have the set of refs that you consider to be in a repository, that means every time an object is requested you'll have to walk the whole graph backwards to find if an object is reachable from a ref. That's an expensive operation, and Git is usually pretty fast because it tries really hard to avoid this.
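To make the cost concrete, here is a toy version of that "contained in" check in Python: a breadth-first walk over parent links, starting from every ref except the pull-request refs (GitHub exposes those as `refs/pull/<n>/head`). The graph, ref names, and function name are made up for illustration, and a real check would also have to walk trees and blobs, not just commits:

```python
from collections import deque


def reachable_commits(refs: dict[str, str],
                      parents: dict[str, list[str]],
                      excluded_prefix: str = "refs/pull/") -> set[str]:
    """Every commit reachable from a ref not starting with excluded_prefix."""
    queue = deque(sha for name, sha in refs.items()
                  if not name.startswith(excluded_prefix))
    seen: set[str] = set()
    while queue:
        sha = queue.popleft()
        if sha in seen:
            continue
        seen.add(sha)
        queue.extend(parents.get(sha, []))
    return seen


# Toy history: main is A <- B <- C; a PR from another fork adds D on top of B.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
refs = {"refs/heads/main": "C", "refs/pull/1/head": "D"}

visible = reachable_commits(refs, parents)
assert visible == {"A", "B", "C"}   # D is only reachable through the PR ref
```

Each such lookup is proportional to the size of the history unless the result is cached and updated on every push, which is the expense being described.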
Am I missing something?
Smaller hosts could get away with not honoring DMCAs since the RIAA likely isn't going to waste resources actually filing a lawsuit, but this yt-dl situation seems like the perfect setup for the RIAA to set a precedent outlawing video/music downloaders if someone were to actually fight them on it (and until then, they can continue to take down video/music downloaders until someone does counter it).
Even if your content is legal under European law, for example (right to repair, right to interoperability, right to reverse engineer), you are going to have a hard time hosting it. And even if you get it hosted at a European provider (remember, we don't have anything that competes with any of the three US cloud giants in terms of functionality!), you will have issues with accepting donations easily: PayPal, Stripe and all credit cards are under US regulation.
And it's not just theoretical, just look at what happened to Kim Dotcom/Megaupload (or, tangentially related, Julian Assange). If the US deems you a danger to their business interests, you are going to get hunted down, no matter where in the world you are and if what you are doing is legal under the jurisdiction of that country.
As a counterexample, I'd like to offer you sci-hub, which doesn't seem to have significant hosting problems. Remember, we are not trying to replace an entire industry and all possible use cases at once. We're simply discussing the hosting of a few git repositories which some US entity might consider unsavoury due to a borked and unfair law.
They have to change domains all too often because the copyright mafia has a "blanket" seizure grant (https://torrentfreak.com/publisher-gets-carte-blanche-to-sei...), and Cloudflare won't touch them as a result. Their founder has (at least!) one court judgement of US$15 million won by Elsevier in New York and another of US$4.8M won by ACS, and I bet there is some sort of secret indictment floating around that gets unsealed if Elbakyan ever travels out of Russia, so that an extradition warrant can be put out. The relatively unique advantage they have is that their founder is possibly linked to the Russian secret service GRU: https://www.washingtonpost.com/national-security/justice-dep...
Effectively, Elbakyan's right to free movement is restricted to those nations that don't extradite to the US and have friendly relations to Russia. And for what we learned from the Snowden and Assange cases, it's safe to assume that even flying over a country that has extradition agreements with the US in a passenger flight is grounds enough for intervention.
- a lot of big tech companies are based in the US
- a lot of companies want to do business in the US
- DMCA can become part of a trade agreement with the US; I don't know if the E.U. will save us at this point.
I do appreciate that these are factors which make DMCA more relevant than if these factors did not exist. But your last point is not even a current fact but a hypothetical future.
I always invite people not to be defeatist but proactive in materializing the future they want to see. Citing unfortunate potential futures is not the right way to solve our problems.
So he is jokingly showing you a way to create a Github poison pill. The joke is that if you include this 'illegal' hash you subject the repo to being DMCA'd out (not really, but it's sort of an 'imagine if'). Which means his repo is now unpublishable on Github (for long anyway).
It's kind of hacker humour in the sort of vein that you'd expect from an old-schooler like him. You sort of have to get the joke to get the joke.
Since as far as I can tell, Github isn't actually banning specific hashes (because they care about the content, not the hashes), and there's no sign that they're going to do so in the future, I can't say I understand the humor...
So, the hacker humor point is that you have a "poison" repo which, if you push it to GitHub, might get your account deleted in the future. You have no guarantee that it won't, for sure.
EDIT: I read the article again and couldn't find any of that, so I must have read that in a comment here on HN instead.
A git repo is fundamentally composed of 3 object types:
* Blob (effectively a file)
* Tree (which refers to collection of Blobs and Trees, effectively a dir)
* Commit (comment/author info and various other metadata, a Tree, and zero or more parent Commits)
This is what the git tools are used to dealing with. However, git submodules changed things. With submodules, a Tree object can also contain an entry pointing to a commit! Together with the .gitmodules metadata file, this allows you to include another git repo inside your repo.
Joey leveraged that ability to add the youtube-dl latest master commit into his repo as a submodule, but deleted the .gitmodules file so that git wouldn't be verbose about it.
And that's how you sneak a commit into a repo.
This additionally leverages GitHub's data model (which is really the git data model), where they just have this huge DB of all the objects of all the repos on GitHub. So effectively by including this commit in a repo on GH, you're making it refer to the entire (buried) youtube-dl repo, but you need to be sneaky to be able to see it.
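A Python sketch of what such a tree looks like at the byte level. Each tree entry is `<mode> <name>`, a NUL byte, then the 20-byte binary object ID; a submodule ("gitlink") entry uses mode 160000 and points at a commit ID. The commit ID and entry names below are hypothetical placeholders, and the sketch only handles files and gitlinks (subdirectory entries sort slightly differently in real git):

```python
import hashlib

GITLINK_MODE = "160000"   # submodule entry pointing at a commit
FILE_MODE = "100644"      # regular, non-executable file


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over "<type> <size>", a NUL byte, then the body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: list[tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex_id) entries, sorted by name bytes."""
    out = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode()):
        out += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_id)
    return out


EMPTY_BLOB = git_object_id("blob", b"")
SOME_COMMIT = "0123456789abcdef0123456789abcdef01234567"   # hypothetical commit ID

body = tree_body([(FILE_MODE, "README", EMPTY_BLOB),
                  (GITLINK_MODE, "vendored", SOME_COMMIT)])
tree_id = git_object_id("tree", body)
# The whole "included" project costs this tree one entry and 20 bytes of ID.
```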
Beyond that, it seems like a very community-hostile move to intentionally embed commit hashes that would even potentially cause a user to get banned from another platform if they push a copy of the code there. This doesn’t stick it to GitHub, it sticks it to the users of your repo.
Honestly if a user did get banned for sharing that poisoned repo, I would still put 95% responsibility for that on GitHub. But it's not happening yet as far as I know.
If they figure it out, then ROT14 it.
If they figure that out, encrypt it and post the key somewhere that blocks the offices of Github.
If they figure that out, we'll still find a way to build a better mouse.
Far better would be to spend your efforts finding which RIAA members pressed the RIAA to make this takedown. Follow the money, cui bono, etc.
Rosa Parks was asked to leave, but she didn't leave.
Github is extremely close to having a monopoly, leaving is the only way to prevent it from happening. If it becomes one, it has no incentives to listen to a few protests.
The problem lies with whichever member organizations inside the RIAA (or nonmember orgs friendly with the RIAA) convinced them to file the takedown. Most of us aren't positioned to find that out, but that's where the most can be gained. Maybe the EFF or similar orgs can help. Then we'll know what the true motives were, and how to address them.
Right now they're issuing what, 1 takedown notice to GitHub? When we make that 100,000 takedown notices, some in ROT13, some in ROT24 on an Israeli server, some in RSA encryption on a Chinese server with keys published on Russian servers, maybe they'll just be too overwhelmed with paperwork at that point.
Edit: On second thought, I just visited riaa.com. How about we lobby jQuery, WordPress and Bootstrap to add a specific exclusion to their licensing? Now, this wouldn't affect the RIAA immediately, only the next time they update.
Git has a mechanism called "submodules" which allows one repo to reference another repo. It doesn't actually include any of the content from the second repo: all that's included is the URL (in the file .gitmodules) and a commit hash (in Git's view of the directory structure).
When you clone a Git repository with submodules and pass --recurse-submodules to git clone, or run git submodule update --init later, Git will make a nested clone at whatever subdirectory path contains the submodule and check out that commit. If you make commits in the submodule (and, hopefully, remember to push them), the outer repo will appear to be modified, and you can git add the changes, which will record the new commit hash, but only the hash.
The purpose is to deal with vendoring or sometimes making changes to large repositories owned by someone else without copying the whole source code into your own repository. (It's also occasionally used as a mechanism to split up very large repositories that are owned by the same group/organization, e.g., if you have a test suite or large data files or something, or a shared library used by multiple projects, you might find it helpful to use submodules.) https://github.com/ceph/ceph is an example repo that uses them - "ceph-object-corpus" and "ceph-erasure-code-corpus" at the top level are submodules, and if you click on the .gitmodules file, you'll find that there are other submodules inside src/, too. This is also an example of how GitHub handles submodules also hosted on GitHub - it will link you to the submodule at that commit.
So, GitHub does at least some parsing of submodules. The author is claiming that if you include a banned commit hash as a submodule in the history of your own project, and then promptly delete it, it won't noticeably increase the size of your Git repo, nor will you actually include any of the banned content yourself, but GitHub will potentially prevent that history from being pushed. As a result, you have a Git repo that GitHub would not accept.
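A hypothetical sketch of how a host could spot such a submodule without fetching anything: parse each tree object and flag gitlink entries (mode 160000) whose commit ID is on a blocklist. Nothing in the thread says GitHub actually does this; the blocklist and function names are assumptions for illustration:

```python
def parse_tree(body: bytes) -> list[tuple[str, str, str]]:
    """Split a raw tree object body into (mode, name, hex_id) entries."""
    entries = []
    i = 0
    while i < len(body):
        space = body.index(b" ", i)
        nul = body.index(b"\0", space)
        mode = body[i:space].decode()
        name = body[space + 1:nul].decode("utf-8", "surrogateescape")
        hex_id = body[nul + 1:nul + 21].hex()
        entries.append((mode, name, hex_id))
        i = nul + 21   # skip the 20-byte binary ID to the next entry
    return entries


def blocked_gitlinks(body: bytes, blocklist: set[str]) -> list[tuple[str, str]]:
    """Return (path, commit_id) for every submodule entry pointing at a blocked commit."""
    return [(name, hex_id) for mode, name, hex_id in parse_tree(body)
            if mode == "160000" and hex_id in blocklist]


BLOCKED = "0123456789abcdef0123456789abcdef01234567"   # hypothetical banned commit
tree = (b"100644 README\0" + bytes.fromhex("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")
        + b"160000 vendored\0" + bytes.fromhex(BLOCKED))
assert blocked_gitlinks(tree, {BLOCKED}) == [("vendored", BLOCKED)]
```

Deleting the submodule in a later commit would not get past a check like this, since the old tree object is still in the history.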
Since the author both does not use GitHub and wants to encourage others to do the same, this would be a fairly effective way of forcing that.
(The article doesn't show GitHub actually refusing to accept the history containing this submodule, though - it only shows the submodule having been created, locally. Although maybe the author's argument is simply that pushing it violates the Terms of Service even if it's not blocked by automated means.)
Yes, github is disabling accounts to people who post youtube-dl. No, this is not in compliance with github's ToS.
github says: “Please note that re-posting the exact same content that was the subject of a takedown notice without following the proper process is a violation of GitHub’s DMCA Policy and Terms of Service." Ref: https://torrentfreak.com/github-warns-users-reposting-youtub...
github's DMCA policy says: "One of the best features of GitHub is the ability for users to "fork" one another's repositories. What does that mean? In essence, it means that users can make a copy of a project on GitHub into their own repositories. As the license or the law allows, users can then make changes to that fork to either push back to the main project or just keep as their own variation of a project. Each of these copies is a "fork" of the original repository, which in turn may also be called the "parent" of the fork.
GitHub will not automatically disable forks when disabling a parent repository. This is because forks belong to different users, may have been altered in significant ways, and may be licensed or used in a different way that is protected by the fair-use doctrine. GitHub does not conduct any independent investigation into forks. We expect copyright owners to conduct that investigation and, if they believe that the forks are also infringing, expressly include forks in their takedown notice." Ref: https://docs.github.com/en/free-pro-team@latest/github/site-...
youtube-dl doesn't fall under any of the restrictions in github's ToS either. ref: https://docs.github.com/en/free-pro-team@latest/github/site-...
I have very little sympathy for github here. They ought to follow the DMCA process. Per the DMCA process, I can't dispute the takedown of the main Youtube-dl repo, since I'm not a party to the process. If I post youtube-dl to github, it should stay up, pending a DMCA notice from RIAA. If such a notice is sent, I should then have the right to then dispute that.
Create an empty repo elsewhere.
In your local clone, run: git remote rm origin; git remote add origin <new repo URL>; git push -u origin master
You've filled up the new repo with what you have, history and all.
Therein lies the problem. Encouraging people to put their resume on centralized platforms (be it Facebook, LinkedIn, or Github) is putting your entire career at risk. It's better to have your own domain and self-host; that way you aren't locked in and at risk of losing your entire professional history if you get banned or a service goes offline.
That said, this is a very clever use of / exposure of the current systems in place. Unintended consequences of bad laws should be highlighted like this more regularly.
But for public repositories, OSS development and to a certain degree even job applications, Github does have a monopoly.
Yeah, well, that's just, like, your opinion, man.
"If you use one of my open-source repository, you can get in trouble, mouhahaha"
It feels like a dick move to me.
But then, I realize that the RIAA bans can now be used as an anti RIAA partnership against GitHub and others, which is... pretty cool.
Definitely a feature that we need \o/!
Microsoft has refused to challenge a highly dubious DMCA notice when they have more than enough resources to do so. Isn't Microsoft the party at fault, given that their lawyers know the DMCA notice doesn't have a leg to stand on?
then boom one day it's called azure officeserver or some similar microsoft style name and monetised like crazy with github being a deprecated sub-project left to wither and die.
it is NOT in their interest to defend github in any way. any negative press attached to the product literally doesn't matter to them since it's not associated with microsoft
I mean really. what a stupid article
my last job was through an automated email i got about my github resume hashtags. iirc all they look for is certain tags and a regular commit history, after which the recruiter takes a brief look at the repo and fires off the email if the criteria are met.
seems enough to call github a resume to me.
This is why aliens don't visit us.
Also been using Gitlab off-prem, on-prem, for more than 6 years. When someone says Gitlab problems all I can think of is resource issues with Sidekiq when I was self-hosting it.
All this drama is only bringing to light how unsafe it is for projects to host their code on a closed platform.
Better be on the safe side and use something you can easily migrate away, like Gitlab.
The git-annex project (and I'm sure many of his other ones) has been going strong without Github and I appreciate him for that. He's finally being vindicated, I believe (and many of those who warned against these free-as-in-beer but not free-as-in-software services have eventually been proven right).