
GitHub Turned into an Enterprise Under Microsoft? - aliostad
I was asked to remove a training file from my Deep Learning language detection repo (only 64 stars, but still). The repo used Deep Learning to detect the programming language of a file or snippet. The files and snippets were harvested from public files and snippets on GitHub and Stack Overflow. The repo was taken down even after I removed the file from the git history. More info and screenshots here: https://twitter.com/aliostad/status/1222440190821781506?s=20
======
zegerjan
The reason you couldn't delete the blob is that someone forked your
repository, and GitHub uses git alternates to deduplicate objects across fork
networks.

I think you could ask GitHub if you can recreate your repository without the
offending blob, and you should be good again.
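
For anyone unfamiliar with alternates, here is a rough local sketch of the
mechanism. It uses `git clone --shared` as a stand-in; how GitHub actually
lays out a fork network internally is not shown in this thread, so treat the
setup as an assumption. The directory names `upstream` and `fork` and the
file contents are made up for the demo.

```shell
#!/bin/sh
# Sketch: a clone made with --shared borrows objects via an alternates file.
# All directory and file names below are invented for this demo.
set -eu

work=$(mktemp -d)
cd "$work"

# "upstream" plays the role of the original repository.
git init -q upstream
printf 'pretend this is the offending file\n' > upstream/1703
git -C upstream add 1703
git -C upstream -c user.name=demo -c user.email=demo@example.com commit -q -m 'add file'
blob=$(git -C upstream rev-parse HEAD:1703)

# --shared writes .git/objects/info/alternates instead of copying objects.
git clone -q --shared upstream fork
alternates=$(cat fork/.git/objects/info/alternates)

# Count loose objects stored in the fork's own object directory.
own_objects=$(find fork/.git/objects -type f -path '*/[0-9a-f][0-9a-f]/*' | wc -l | tr -d ' ')

# Check whether the fork can still read the blob through the alternate store.
if git -C fork cat-file -e "$blob"; then
    fork_can_read=yes
else
    fork_can_read=no
fi

echo "alternates: $alternates"
echo "objects stored in fork: $own_objects"
echo "fork can read blob: $fork_can_read"
```

The point of the sketch is that an object can stay readable from one
repository while physically living in another object store, which is why
rewriting one repository's history does not by itself delete the shared blob.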

~~~
aliostad
This is essentially what I did using the BFG tool. They still took it down.

[https://help.github.com/en/github/authenticating-to-github/r...](https://help.github.com/en/github/authenticating-to-github/removing-sensitive-data-from-a-repository)

~~~
zegerjan
I'm saying the blob was in a new repository that you had no control over. You
couldn't have removed it; you could only make sure it isn't referenced in
_your_ repository, which is what you did.

~~~
aliostad
Sure, but I imagine they have already removed those forks too.

Have a look at this list:
[https://github.com/github/dmca/blob/master/2020/01/2020-01-2...](https://github.com/github/dmca/blob/master/2020/01/2020-01-27-ibm.md)

------
tastroder
github.com was always a corporate entity and subject to DMCA takedowns.

The notice link so others don't have to hand-type it:
[https://github.com/github/dmca/blob/master/2020/01/2020-01-2...](https://github.com/github/dmca/blob/master/2020/01/2020-01-27-ibm.md)

I feel like that other Twitter user, BSA, and IBM, as the originators of that
takedown notice, are more useful targets of animosity here.

~~~
aliostad
My complaint is the lack of communication and courtesy, which shows disrespect
for the public good. I am happy to remove the mention per his request.

~~~
tastroder
Fair enough, communication could indeed be improved. NAL but I was under the
impression that, while they could surely be more helpful here, once they
received that official DMCA takedown notice they don't really have a choice in
the matter of taking it down or not.

Edit: disabling the repository even though you notified them within 24 hours
seems to be against their own policy at
[https://help.github.com/en/github/site-policy/dmca-takedown-...](https://help.github.com/en/github/site-policy/dmca-takedown-policy).
Have you tried contacting their support again?

~~~
aliostad
Well, they said I had 24 hours to remove the offending item by following the
"remove sensitive data" link, which I did within a few minutes. They still
took down the repo. That is the problem, not the sending of the notice.

"We're giving you 24 hours to make the changes identified in the following
notice:

[https://github.zendesk.com/attachments/token/BqByLyvvRzOAmVy...](https://github.zendesk.com/attachments/token/BqByLyvvRzOAmVyL9TW8FxZ0L/?name=2020-01-27-ibm.rtf)

If you need to remove specific content from your repository, simply making the
repository private or deleting it via a commit won't resolve the alleged
infringement. Instead, you must follow these instructions to remove the
content from your repository's history, even if you don't think it's
sensitive:

[https://help.github.com/articles/remove-sensitive-data](https://help.github.com/articles/remove-sensitive-data)"
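
To illustrate why the notice says that deleting via a commit is not enough,
here is a minimal sketch using a throwaway local repository. The file name
and contents are placeholders, and the reachability check with
`git rev-list --objects --all` is just one way to look for a blob, not
necessarily what GitHub does.

```shell
#!/bin/sh
# Sketch: removing a file in a new commit does not remove its blob from history.
# The repository and file contents are invented for this demo.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Helper that commits with a throwaway identity.
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

printf 'content named in the notice\n' > 1703
git add 1703
commit 'add file'
blob=$(git rev-parse HEAD:1703)

# "Delete it via a commit": the file disappears from the latest tree only.
git rm -q 1703
commit 'remove file'

# Is the file present in the latest commit?
if git cat-file -e HEAD:1703 2>/dev/null; then in_head=yes; else in_head=no; fi

# Is the blob still reachable from any ref through older commits?
if git rev-list --objects --all | grep -q "^$blob"; then in_history=yes; else in_history=no; fi

echo "in latest commit: $in_head"
echo "reachable from history: $in_history"
```

The file is gone from the latest commit but the blob is still reachable
through the first commit, which is the situation that history-rewriting tools
such as the BFG are meant to address.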

------
aliostad
Here is the terminal output of what I did to remove the file from the git
history:

    ~/g/aliostad bfg --delete-files 1703 deep-learning-lang-detection.git

    Using repo : /Users/alikheyrollahi/github/aliostad/deep-learning-lang-detection.git

    Found 72811 objects to protect
    Found 2 commit-pointing refs : HEAD, refs/heads/master

    Protected commits
    -----------------

    These are your protected commits, and so their contents will NOT be altered:

     * commit ac12aa68 (protected by 'HEAD') - contains 8 dirty files :
        - data/stackoverflow-snippets/cpp/1703 (3.0 KB)
        - data/stackoverflow-snippets/csharp/1703 (835 B)
        - ...

    WARNING: The dirty content above may be removed from other commits, but as the
    _protected_ commits still use it, it will STILL exist in your repository.

    Details of protected dirty content have been recorded here :

    /Users/alikheyrollahi/github/aliostad/deep-learning-lang-detection.git.bfg-report/2020-01-27/22-24-03/protected-dirt/

    If you _really_ want this content gone, make a manual commit that removes it,
    and then run the BFG on a fresh copy of your repo.

    Cleaning
    --------

    Found 69 commits
    Cleaning commits: 100% (69/69)
    Cleaning commits completed in 304 ms.

    Updating 1 Ref
    --------------

    Ref                 Before     After
    ---------------------------------------
    refs/heads/master | ac12aa68 | c51406cc

    Updating references: 100% (1/1)
    ...Ref update completed in 13 ms.

    Commit Tree-Dirt History
    ------------------------

    Earliest                                              Latest
    |                                                          |
    .................................................DDDDDDDDDDm

    D = dirty commits (file tree fixed)
    m = modified commits (commit message or parents changed)
    . = clean commits (no changes to file tree)

                            Before     After
    -------------------------------------------
    First modified commit | a4a1bbac | cb32cfbf
    Last dirty commit     | 45322921 | 6b9e8d5d

    Deleted files
    -------------

    Filename   Git id
    ---------------------------------------------------
    1703     | 530293d7 (614 B), 98c9b646 (3.0 KB), ...

    In total, 47 object ids were changed. Full details are logged here:

    /Users/alikheyrollahi/github/aliostad/deep-learning-lang-detection.git.bfg-report/2020-01-27/22-24-03

    BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive

    --
    You can rewrite history in Git - don't let Trump do it for real!
    Trump's administration has lied consistently, to make people give up on ever
    being told the truth. Don't give up: https://www.aclu.org/
    --

    ~/g/aliostad cd deep-learning-lang-detection.git
    ~/g/a/deep-learning-lang-detection.git git reflog expire --expire=now --all && git gc --prune=now --aggressive
    Enumerating objects: 89539, done.
    Counting objects: 100% (89539/89539), done.
    Delta compression using up to 8 threads
    Compressing objects: 100% (89537/89537), done.
    Writing objects: 100% (89539/89539), done.
    Total 89539 (delta 28336), reused 61123 (delta 0)
    ~/g/a/deep-learning-lang-detection.git git push
    Enter passphrase for key '/Users/alikheyrollahi/.ssh/id_rsa':
    Enumerating objects: 89539, done.
    Counting objects: 100% (89539/89539), done.
    Delta compression using up to 8 threads
    Compressing objects: 100% (61201/61201), done.
    Writing objects: 100% (89539/89539), 40.83 MiB | 1.01 MiB/s, done.
    Total 89539 (delta 28336), reused 89539 (delta 28336)
    remote: Resolving deltas: 100% (28336/28336), done.
    To github.com:aliostad/deep-learning-lang-detection.git
     + ac12aa680...c51406cc8 master -> master (forced update)
    ~/g/a/deep-learning-lang-detection.git cd ..

