Hacker News new | past | comments | ask | show | jobs | submit login
GitHub Enterprise SQL Injection (blog.orange.tw)
346 points by urig on Jan 7, 2017 | hide | past | web | favorite | 51 comments

Although I am impressed by the work, I dislike these writeups.

In similar bug bounty posts, a user will post all this brilliant setup code, these tools I never heard of to decompile binaries, view source, get through obfuscation, etc.

Then, when you get to the meat and bones, they just go "Ok and vulnurability is here, and you exploit it like this. Ta data."

It's very anti-climatic, and I don't learn much in the process. I would like to learn more about what your thought process is like after you get the code. How did you narrow your focus on ActiveRecord? Did you just know from seeing it? I would love to learn more. Do expert pen-testers have this checklist of exploits they try?

Sorry if it sounds like whiny, I would just love it if these had more meat in the actual finding of the exploit as they did the setup.

It's a bit of both. It's not so much a checklist as a classes of bugs, but close enough. That includes vulnerabilities like SQL injection, command injection, stored and reflected XSS, et cetera. The WAHH (Web Application Hacker's Handbook) is a good place to start here; it walks through a lot of these standard classes of web app vulns.

Something WAHH touches on, and experienced pentesters get a feel for, is more complex interactions. Often that means chaining several not-so-bad bugs together to build something really exciting. Sometimes they're just obscure consequences of say, a rather complex authorization system. These are typically a lot harder to find, and may involve simultaneous code audits. Ideally, your long-term in-house security people find these, because any attacker that does is in the 95th cleverness percentile.

So, TL;DR: start with WAHH; then go find some vulns, and start thinking about what the app does in terms of alternative things a program could do that a reasonable programmer who was just trying to get it to work might not have thought about.

If you're looking for tools to play with, you can do worse than downloading Kali and running it on a VM. If nothing else, it'll give you a pretty big catalog of tools to start looking at. I think WAHH covers basic Burp Suite usage, but Kali has some other tools like sqlmap and BeEF that aren't too tricky to get started with.

WAHH: http://mdsec.net/wahh/ Kali: https://www.kali.org/

Also want to mention Web Hacking 101 [0], which is written specifically for newbies in vulnerability searching, with real cases and explanation of how certain bugs are or could be found.

[0] https://leanpub.com/web-hacking-101

So there are various ways to learn about the basics (aside from experience); look into SQL injection, XSS and CSRF as a starter then once you understand and can attempt to perform those, you can work your way through this comprehensive if dry guide:


It literally contains almost everything for Webapps.

The next step is finding your own exploits or doing bug bounties.

If you are more interested in the other side of hacking (i.e. everything but webapps, usually OS level stuff) someone more informed than me can probably give you advice, but starting with microcorruption or similar won't hurt.

Regarding the tools, I recognize two-third of the tools from my web security class. So there are places where you can learn this.

It still boggles my mind that sql injection is a thing in 2016, let alone in a modern codebase from a great software company with amazing engineers.

Somehow we (the programming profession) are doing this whole thing wrong, and I'm not sure why.

I find it hilarious how people think there are all these "amazing engineers" around. Sometimes it even seems like hero worship. The reality is that the vast majority of engineers are mediocre even at high-profile tech companies and hot startups. That's not a bad thing and I don't mean it as a criticism; modern civilization has largely been built by mediocre engineers. And even the best engineers make frequent mistakes.

This sentiment is simultaneously terrifying and comforting.

While there are many ways to catch this sort of thing (code review, static analysis, education) I have to place the "blame" here on ActiveRecord. The 'order' API takes an arbitrary string by default. 99.999% of the time, this value can be restricted to a column name or can safely be escaped. For the 0.0001% of the time that something more complex (e.g. Sorting by a calculation like sum(column)) should require a separate, more cautious or complex API call. The common assumption is that all ActiveRecord APIs are "safe" and that simply isn't the case. Brakeman couldn't catch this since it's in Sinatra code and not rails code. Code review didn't catch this because the API was assumed safe by reviewers. Education didn't catch this because not everyone has seen rails-sqli.org. Mind boggling, but still very possible.

We have been enjoying 2017 for quite some time now

Basically any language that has easy string interpolation and takes SQL via a string has this as a potential issue.

The implementations across languages are remarkably similar, I suppose due to the lineage of the original C apis from the various DB vendors.

I don't see any easy way to fix it. If it's possible, and easy to do, it will keep happening.

Perl has a tainted mode where any outside data is marked as tainted. You can only detained it in specific ways. Copying the data from one variable to another keeps the tainted flag.

Pretty much only explicit checks are able to remove the tainted flag from the data. As such you limit the security problems to faulty checks. It is impossible to forget to check something.

Above is used in Bugzilla. Despite all of that, it's still pretty easy to make mistakes. IMO it's rather unfortunate that not more languages have such a feature. This as Perl code if often pretty difficult to read (it doesn't have to be!).

wow, that's a really great idea. I'm surprised that other languages or even frameworks haven't picked that up.

Maybe it can be pitched to Rust.

A lot of frameworks have picked up the notion of "html safe" strings. But this is generally limited html escaping and display rather than SQL.

Props to OP for finding the vulnerability without knowing Ruby on Rails.

Interesting that this is a standard use-case for ActiveRecord [0] and does not include any built-in protection against SQL injection. You can pass a hash of symbols into the method, but there is no safe way to pass string arguments with sanitizing them yourself.

[0] http://guides.rubyonrails.org/active_record_querying.html#or...

God, IMO this should be reported as a bug in rails; in my opinion it ActiveRecord should never accept strings for this without parsing them.

This is probably incredibly common for things like order params in many web apps and also relies on silly conversion rules in MySQL. Any time you see order=name in params you should be suspicious. Better to use numeric enums. There are similar problems with column names, this site has some great examples which apply to many ohms, not just rails:


The lesson is you should always strongly assert the type of user input (where you can, convert to known good values which is even better), and never magically convert types as rails/ MySQL does here. It's less convenient but safer.

Django for example will throw an error if an ordering parameter does not correspond to a database column.

If you have to specify all the columns of a table in Rails, then ActiveRecord ought to be able to restrict the order params to only column names.

I imagine some ORMs don't have this so people can do order by computed columns like "order by assets - liabilities"

To your point it probably makes sense to have an order method that verifies column names and an order_raw method that takes a string so the caller understands the risk, while protecting the usual case.

It's hidden way down at the bottom, but I found the Timeline to be my favourite part :)

Edit: Can't see to format it correctly, just check the original source.

This one?

  2016/12/26 05:48 Report vulnerability to GitHub via HackerOne
  2016/12/26 08:39 GitHub response that have validated issue and are working on a fix.
  2016/12/26 15:48 Provide more vulneraiblity detail.
  2016/12/28 02:44 GitHub response that the fix will included with next release of GitHub Enterprise.
  2017/01/04 06:41 GitHub response that offer $5,000 USD reward.
  2017/01/05 02:37 Asked Is there anything I should concern about if I want to post a blog?
  2017/01/05 03:06 GitHub is very open mind and response that it’s OK!
  2017/01/05 07:06 GitHub Enterprise 2.8.5 released!
Or this one?

  Day 1 - Setting VM
  Day 2 - Setting VM
  Day 3 - Learning Rails by code reviewing
  Day 4 - Learning Rails by code reviewing
  Day 5 - Learning Rails by code reviewing
  Day 6 - Yeah, I found a SQL Injection!

For those that this website is marked as dangerous http://webcache.googleusercontent.com/search?q=cache:z7yyaxI...

Is there a specified reason for it being marked as dangerous? It comes up fine for me. Granted it doesn't support SSL it seems (get that fixed, Orange!) so maybe you're using a plugin that marks non-SSL as dangerous?

The website seems to be a blogspot blog on a custom subdomain...

> For those that this website is marked as dangerous

What's causing that, your ISP?

F-Secure VPN

Probably has to do with the .tw TLD.

I'm curious if the author decompiled binaries generated from C code. In that case, I understand how one could "dis assemble" binaries, but generating readable C code, is that possible?

It's possible to an extent. The disassembler being used is a commercial product called IDA from Hex-Rays [1]. They also sell a decompiler plugin for IDA. While it can generate C-like output, it being 'readable' is debatable. Because of the lack debug symbols, compiler optimizations, and other things in release binaries, the decompiler output has some confusing characteristics. Just to name a few things: improperly guessed calling convention, odd code flow (lots of goto statements, labels, and inverted loops), messy dynamic string creation [2], bit shifts instead of multiplication/division for powers of 2, pointer arithmetic to access struct fields, stack variables reused for multiple purposes, etc. With all that being said, the Hex-Rays decompiler is the best I've seen so far. It's great to have when you want a quick overview of a function you are reverse engineering, if you or your company is willing to pay the exorbitant cost that is.

Additionally, there is open source decompiler called Snowman [2] that is pretty neat also.

[1] https://www.hex-rays.com/index.shtml [2] http://reverseengineering.stackexchange.com/questions/3546/e... [3] https://github.com/yegord/snowman

https://hopperapp.com is just 89 €, I've found its decompiler to be sufficient in most cases.

The application is written in Ruby and the code was just slightly[1] obfuscated so there was no need to decompile the actual application he was analyzing.

[1] "This obfuscation is intended to discourage GitHub Enterprise customers from making modifications to the VM. We know this 'encryption' is easily broken."

Decompilers exist (like Boomerang), but it is more likely the author used something like IDA Pro, which has a code flow graph feature.


Hex-Rays is a pretty good decompiler plugin for IDA Pro: https://www.hex-rays.com/products/decompiler/index.shtml

It's IDA Pro. FTA:

> So, let’s open IDA Pro!

Glad to see the communication between Orange and GitHub went great. Both did the right thing after discovering the flaw. Orange reported it and GitHub fixed it and allowed Orange to publish a blog post. I have to admit I'm a bit of a fanboy of HackerOne.

The custom SQL query is something that is hard to prevent. It is still hard to use ActiveRecord/Arel for everything. I'm sure GitLab is not immune to it.

GitHub chooses to encrypt their source code to prevent modifications. Our experience at GitLab is that customer modifications don't cause a lot of extra load on our support team. But of course that might be caused by having different architectures and customers.

> GitHub chooses to encrypt their source code to prevent modifications. Our experience at GitLab is that customer modifications don't cause a lot of extra load on our support team. But of course that might be caused by having different architectures and customers.

Most of the companies I have worked at that require code obfuscation and / or encryption do so because they believe someone will steal their source and re-use it or at least discover how it works and re-implement it in an attempt to steal customers (typically they suggest China in all of those). I'm not sure what GitHub's reasoning is but I hadn't heard of someone doing it to prevent additional support load.

That's an interesting an angle I hadn't thought of.

Github use a key for their obfuscation :

"This obfuscation is intended to discourage GitHub Enterprise customers from making modifications to the VM. We know this 'encryption' is easily broken."

So that's probably why he assumed their reasoning was to do with reducing support by avoiding modifications by customers.

Interesting, from our point of view at RhodeCode, allowing modification of code often benefited our support team. When testing some real edge cases, we were able to inject additional debug code to pinpoint the problems. It actually helps IMHO. Furthermore ability to add a patch to the code to fix a bug within minutes could be a selling point for some types of customers.

After doing support for folks who used code I developed where they modified it without telling me, I could see the merit of both ways. I've spent days remotely debugging an issue before realizing a customers actually did modify our product when they kept insisting it was never modified (he had no idea).

So I could see it working out both ways. I know one of the companies I dealt with who obfuscated their code could send us a debug version for testing / debugging a specific issue then we'd revert back to the release version. It's a little more complex but I kinda like that model.

But as you said it works great for you :)

It's amazingly easy to prevent for high value target type web apps use stored procedures only. The web app db user should only have permissions to execute appropriate stored procedures.

Stored procedures aren't a panacea for SQL injection. The real answer is to use parameters for any user input.

Otherwise if you're concatenating strings to invoke your stored proecedures you're in the same boat.

As a side note, it's not always possible to use parameter binding for all parts of a SQL query. :(

For example, when querying user uploaded databases (not a typical use case), the SQL query needs to include the table name. At least with SQLite (unsure about PostgreSQL), table names can't be bound in a parameter. They need to be part of the non parameterised query string, thus string smashing.

Field/column names are probably similar too, though I haven't checked (yet).

If you're doing anything that esoteric hopefully you understand proper identifier escaping. If not, there's bound to be much worse throughout your app.

That's what libraries like sequel, SQLAlchemy or Jooq are for. I'm looking at implementing some crazy dynamic queries in a CRM system I'm prototyping right now and SQLAlchemy is my saving grace to avoid shooting myself in the foot.

That's true, but an ORM like ActiveRecord ought to protect you from this because you are required to specify valid column names and table names before using it.

It should be able to check if an invalid column or table name is used.

It's extremely easy to validate in automatic way that only proper call's are used to call the sprocs also if you limit web app db user to only have permission to execute appropriate stored procedures you mitigate a good number of attacks even if someone did call the sproc in unsafe manner that allows sql injection now the attacker needs to guess sproc names and signatures and still limited in what he/she can do. This assumes that sprocs are themselves not written to concat sql and then execute it :)

The sad part is many libs force you to accept prepared statements with your parameters or you get no parameters at all.

Care to elaborate?

I see Dynamic SQL being used the most for custom queries that require dynamic columns, or especially, dynamic ordering. Using sprocs won't help you there as you still need Dynamic SQL under the hood. I think the best solution is something like Linq-to-Entities with manually defined order-by logic for each query (e.g. an enum defining the sort-order possibilities) or reflection-based matching, neither of which is pretty.

Dynamic SQL is not the problem, you can safely construct queries on runtime in most RDBMS-s, the key thing is to let the DB treat user input as parameters. You can concat strings to create dynamic queries all you want, just don't blindly shove user input in there, construct the query dynamicaly with parameter placeholders and then bind the user-passed input as parameters.

Of course, there's still the matter of those queries not nessarily being syntactically or semantically correct which will only bite you on runtime, but that's a different story.

If you're doing dynamic SQL you either know what you're doing, or don't know that you don't know what you're doing. Although that probably goes for most of things in life.

Registration is open for Startup School 2019. Classes start July 22nd.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact