
GitHub Thinks I'm a Robot - vonnieda
http://vonnieda.org/archives/1905
======
hlieberman
In all honesty -- and I say this as someone who tends to advocate GitLab over
GitHub -- I think they did everything right here.

They implemented spam detection in a way that minimizes the potential impact,
prominently notified the affected user, provided a mechanism for solving the
problem and did so quickly.

I understand the author's frustration and sympathize, but I'm not sure how
much better GitHub could have done here.

~~~
vonnieda
Author here. I agree that they handled it well and quickly. Props for that. I
have two main concerns:

1. Why did it happen in the first place? I have a very active account with
lots of "human" evidence. If I'd posted a single spammy link or something in a
Gist (which I didn't) how does that override the fact that I'd just committed
and updated my Wiki 12 hours ago?

2. Why not notify me and let me respond? If the account is flagged, send an
email and require me to log in, answer a question, perform a captcha, etc. I'd
even be okay with it if they flagged the account first and then sent me an
email, but the only way I found out about this is by logging in this morning.

I love Github, but I think this system / policy could use some work.

~~~
Mithaldu
As someone who deals with spam in an even more difficult situation, that being
a forum where anyone can post just by accessing it in a browser with no
account needed or even possible: Often nicer ways are not usefully scalable or
even feasible.

I get a lot of child porn link spam and am currently looking into OCR software
to identify images spam bots post, because the spammers have been hard at work
at circumventing any other measure i put in place. With that kind of thing
being "spam", i can't be nice about it. I need to be zero-tolerance on any
detection, even if it may be a false positive.

As for why it's not reviewed by a human: That may not scale. My forum is
rather small, 30000 unique visitors per month, but even so, with a team of
three people, there is simply no way to look at everything identified as spam.
And going by the contacts we get from users about false positives, even if i
assume 10 times as many things are false positives as people ask us about,
we're still below 1%.

Also, at a wild guess: With an account as big as yours, maybe you just went
past a total size limit for all the repos in your account, looking like a bot
trying to treat github like a storage system.

Lastly, something meant to be friendly advice: I understand that you're a
little upset, but going to the effort of writing such a big blog post, and
using that strong language for something that did no actual damage is over the
top. This may not be part of your daily dealings, but before you attack github
over this, i invite you to try and spend some time to think about how you
could write bots to abuse github, which is pretty much a publicly writable
storage system, and how you would counteract such abuse.

~~~
vonnieda
I understand that fighting spam is difficult; I ran public email servers for
years, and I think that Github does a great job at it.

My gripe here is that they didn't notify me before or after hiding my account.
I don't expect a human to review the flag and I'm not suggesting they do. I
agree, that doesn't scale. But it doesn't take a human to send me an email
that says "Your account has been flagged for spam. Please contact support to
resolve this issue."

The issue is that if I hadn't happened to log in this morning I would not have
known that my account was hidden until I started getting complaints about
broken links. Github was returning 404 errors for my repos. That means no one
could download code or releases, read Wikis, file Issues, etc. It was like the
repos had just been deleted.

My projects are pretty small, comparatively. But imagine if you went to read
some docs on Angular or Express or any other large, open source project and
you just got a 404 for the entire project? It's pretty scary.

My only ask here is that they notify users that they flag. Nothing more. I
think it's a pretty reasonable request. I have since exchanged a couple more
emails with them and they have said they'll raise the issue internally.

~~~
Mithaldu
Ah, i did not realize they did not email you.

That is something you at least deserve an answer to, though i expect
they don't do it because sending out that email would alert the owner of the
bot, who would otherwise be none the wiser. It's a cousin of hellbanning,
something which even HN does.

Though i think it's also fair to point out that your post only raises that
issue as an aside, even though it is the most salient thing in it.

Also, a bit of github background: Personal accounts are meant for small
personal things. They expect anything big and important to happen in
organizations, which i suppose are treated slightly differently.

------
gkoberger
Spam detection is far from perfect. The "error" message seemed incredibly
friendly (it made me smile), the support guy was nice and as helpful as he
could be, and they fixed it right away. Something triggered their alerts, and
they temporarily locked down your account. The alternative is thousands of
spam bots running amok.

It's not ideal, and maybe a warning would have been nice... however, I feel
like GitHub handled it way better than most other sites would. If this
happened to me, I would have had no complaints.

~~~
chao-
Same. The author's "completely lost my trust" feels unwarranted. With all data
still accessible and a resolution protocol clearly presented, I'd tout this as
an example of a good policy. I wish other sites/organizations would learn from
Github and handle cases like this so clearly and promptly.

Now I'm sure I would still feel inconvenienced for this having happened at
all, but again, with things presented so clearly, it falls into my "well, shit
happens" bucket rather than the "I am morally outraged and cannot conceive how
this would ever occur!" bucket.

Maybe it's because I _can_ conceive how this would occur? Having only
tangentially worked with a few such systems (e.g. identifying bot behavior),
false positives come with the territory and so I've got some empathy for the
Customer Support reps who have to deal with the fallout from edge cases in
such a system.

Feels a lot like having a credit card get locked while traveling (which I have
had happen _despite notifying the issuer_ of impending travel). That is, a
system which helps in most cases wound up classifying me as a different-from-
most case. That's going to keep happening throughout my life. It's part of the
automation-assisted world we live in.

The question is how Github handles these cases, and I think they did fine.

------
PuffinBlue
I'm always surprised when other people are surprised that these centrally
controlled third parties have the ability to control access to data that you
give them.

Why is it _so_ surprising and upsetting that something like this occurred?
I've always taken the approach that data shared to these types of services
are, fundamentally, out of my ultimate control - is this not a common
viewpoint?

By using GitHub, or any other service for that matter, you're literally giving
someone else your data to manage and are at their mercy. Yes, you'd hope that
that service would continue to operate and provide value to you but you do
have to take steps to make sure that you have alternatives in place should the
worst happen.

The real core of it is: what is it that makes events like these so intensely
surprising and _personal_ to those who experience them?

Finally, I'd have to say that GitHub did everything correctly here, they
responded quickly, rectified the fault and gave a reasonable answer.

~~~
dzek69
github is somehow advertised as the ideal place for the open source
community.

instant-banning a long-term active user and not telling him what was SO
suspicious in his activity that they had to ban him doesn't seem to match
being "open"

in my opinion - the active account of a long-term user should earn enough
"positive score" in the banning rules that it isn't instantly banned. this
couldn't have been a scenario like "your account has been taken over" - they
told him he got caught in the spam filters. he should have been given at
least 24 hours to react before all his stuff was hidden
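
A rough sketch of the rule described above, in Python. Everything here is hypothetical: the fields, weights and thresholds are made up for illustration and say nothing about how GitHub's filters actually work. The only number taken from the comment is the 24-hour grace period.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import timedelta

GRACE_PERIOD = timedelta(hours=24)  # the "at least 24 hours" asked for above


@dataclass
class Account:
    age_days: int
    recent_commits: int  # hypothetical activity signal
    spam_score: float    # hypothetical classifier output in [0, 1]


def reputation(account: Account) -> float:
    """Hypothetical 'positive score' in [0, 1] built from age and activity."""
    age_part = min(account.age_days / 365, 1.0)
    activity_part = min(account.recent_commits / 50, 1.0)
    return 0.5 * age_part + 0.5 * activity_part


def decide(account: Account, threshold: float = 0.8) -> tuple[str, timedelta]:
    """Return (action, delay before the account would be hidden)."""
    if account.spam_score < threshold:
        return ("none", timedelta(0))
    if reputation(account) >= 0.5:
        # Established account: tell the owner first, hide only after the delay.
        return ("notify_then_hide", GRACE_PERIOD)
    # New or idle account: hide immediately, as happens today.
    return ("hide_now", timedelta(0))
```

With these made-up weights, a three-year-old account with 100 recent commits that trips the filter gets a notice and 24 hours, while a two-day-old account with no commits is hidden at once.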

but yeah.

nobody should treat github seriously. not anymore.

------
CaliforniaKarl
Ouch. Yeah, it's not good when suddenly everything is disappeared from public
view.

So, two comments.

First, GitHub, if you believe an account is not associated with a human, put a
banner up on all of their pages, one that _everyone_ can see. For the banner
visible to everyone, say something like "We [GitHub] think this account's
owner is not human. Please have the account owner log in ASAP, or this page
will disappear soon!"

Next, don't forget that this is Git. If you don't have 100% trust in GitHub,
then just mirror your repo somewhere else. Spend the $20 a month to get an
account at a web host that supports cron jobs, and run a mirror script every
five minutes.
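
A minimal sketch of such a mirror script, in Python. It assumes `git` is on the PATH and that the source URL needs no interactive login; the function names and the file name `mirror.py` are made up for this example.

```python
#!/usr/bin/env python3
"""Keep a bare mirror of a Git repository up to date (meant to run from cron)."""
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def mirror_commands(source_url: str, mirror_dir: Path, already_cloned: bool) -> list[list[str]]:
    """Return the git commands that create or refresh the mirror."""
    if already_cloned:
        # Existing mirror: fetch every ref and prune refs deleted upstream.
        return [["git", "-C", str(mirror_dir), "remote", "update", "--prune"]]
    # First run: a mirror clone copies all refs (branches, tags, notes).
    return [["git", "clone", "--mirror", source_url, str(mirror_dir)]]


def sync_mirror(source_url: str, mirror_dir: Path) -> None:
    # A bare repository keeps its HEAD file at the top level.
    already_cloned = (mirror_dir / "HEAD").exists()
    for cmd in mirror_commands(source_url, mirror_dir, already_cloned):
        subprocess.run(cmd, check=True)  # raises if git exits non-zero


if __name__ == "__main__" and len(sys.argv) == 3:
    sync_mirror(sys.argv[1], Path(sys.argv[2]))
```

A crontab entry along the lines of `*/5 * * * * python3 /path/to/mirror.py SOURCE_URL MIRROR_DIR` would run it every five minutes. Note that this only mirrors the Git data; wikis are separate repos, and issues and releases are not covered at all.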

~~~
ultramancool
$20 a month for a web host? What year did I get teleported back to? How about
a $5/mo digitalocean box or a $15/mo kimsufi dedicated server?

Better yet, just host your code on your own stuff and sync it over to GitHub
using a hook. Deploying gogs or gitlab using docker is dead easy.
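
A sketch of what such a hook could look like for a plain bare repository on your own server, written as a Python `post-receive` hook. It assumes a remote named `github` has already been added to that repository with credentials that work non-interactively; the remote name and the helper functions are made up for this example, and gogs or gitlab may offer their own mirroring settings that make a hand-written hook unnecessary.

```python
#!/usr/bin/env python3
"""post-receive hook sketch: forward every accepted push to a second remote."""
from __future__ import annotations

import subprocess

MIRROR_REMOTE = "github"  # assumed remote name, set up with `git remote add`


def updated_refs(hook_input: str) -> list[str]:
    """Parse post-receive stdin lines of the form '<old> <new> <refname>'."""
    refs = []
    for line in hook_input.splitlines():
        parts = line.split()
        if len(parts) == 3:
            refs.append(parts[2])
    return refs


def push_command(remote: str) -> list[str]:
    # --mirror pushes all refs and deletes remote refs that are gone locally.
    return ["git", "push", "--mirror", remote]


def run_hook(hook_input: str) -> int:
    if not updated_refs(hook_input):
        return 0  # nothing was updated, so there is nothing to forward
    return subprocess.run(push_command(MIRROR_REMOTE)).returncode
```

To use it, the file would go in `hooks/post-receive` inside the bare repository, be marked executable, and end with `import sys; sys.exit(run_hook(sys.stdin.read()))`. Because `--mirror` also deletes refs on the receiving side, it is only appropriate when the self-hosted repository is the single source of truth.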

~~~
chao-
Seconding setting up your own. I have a low-power just-in-case "server" (a
term I use loosely, it's a little $200 Zotac zbox) paired with a NAS just for
this purpose.

------
sillysaurus3
I once triggered this by pasting a very, very long URL into a gist. It's
unsettling realizing that your entire identity can be turned off at the flick
of a switch.

It's hard to think of a better way to handle this. Github's current behavior
is to remove your account from public view until it can be determined your
account wasn't compromised. During this time, you can still push to your repos
and do any other write operation. Another way to handle this would be to make
your account read-only without hiding anything. The latter is objectively
worse: imagine suddenly not being able to do anything.

Gitlab is quite pleasant nowadays, so that's an option.

~~~
jbg_
The whole point of distributed version control is that if Github shuts down
(or locks you out), you can still do your work.

Them making your account read-only should not lead to you not being able to do
anything.

~~~
sillysaurus3
Certainly, but many companies have centralized around Github to the point that
it's integrated into their workflows (e.g. deployment).

~~~
chao-
Github has a clear resolution policy, and by the author's account they were
also quite responsive, even on a weekend. If a temporary user account issue
completely destroys the ability of a professional team at a company to
interact with their production system, is it Github's fault, or is it the
company's for not having a backup method of interacting with something so
critical?

I like Continuous Delivery systems as much as anyone, but if something in that
pipeline broke _and it perfectly lined up with_ an everything-is-on-fire
moment where I needed to deploy a hotfix? I'd just run my tests manually and
deploy directly.

------
nercury
This worries me a bit: I have another github account that is definitely a
robot: it publishes documentation on github pages automatically on successful
travis build.

There are robots everywhere: for example, Rust's github repos wouldn't work
the same without bors or highfive.

If I remember correctly, github itself definitely used to allow robots for
such use cases.

~~~
gkoberger
They aren't worried about robots, they're worried about spam. The "we don't
think you're human" error message was just to sound cute. Automation is fine;
it's likely the poster pushed a spammy link or something that set off their
filters.

------
x1798DE
>Sorry, but we have to keep our spam-detecting tactics hush-hush. If I were to
share that information and word got out, it would be like releasing access to
some of our security protocols into the big, wide world. I hope you
understand.

I don't know much about spam-fighting, so I don't know to what extent this
"obscurity" strategy is viable there, but this is at best a bad analogy since
the consensus seems to be that security protocols that you have to hide are
not good security protocols.

~~~
PuffinBlue
Keeping a capability secret is a perfectly valid approach to providing
security. If you publicise a capability then an attacker can work to defeat
that capability very easily and directly. By keeping it secret you force the
attacker to first find out you have it and unpick that capability in order to
defeat it.

Security by obscurity cannot be the _only_ approach to security, but it's
certainly a valid tool. Militaries and security services the world over have
relied on it throughout history, for instance.

------
voaie
The sudden disappearance is not only annoying to the victim. The last time I
saw an account disappear in front of me, I thought that person might be a
joker.

