RCE via GitHub import (gitlab.com/gitlab-org)
184 points by louislang on Oct 10, 2022 | hide | past | favorite | 54 comments


> Sawyer is a crazy class that converts a hash to an object whose methods are based on the hash's key:

For those not familiar with Ruby terminology, "hash" in this context means what other languages would call a hash map, associative array, or dictionary. At first I thought it was referring to the hash of a value, and couldn't grok it.
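For instance, a minimal Ruby Hash (the values here are made up for illustration):

```ruby
# A Ruby Hash maps keys to values, like a dict in Python or a HashMap in Java.
user = { "login" => "alice", "id" => 1234 }
user["login"]     # => "alice"
user.fetch("id")  # => 1234
```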


For those on the Python side of things, https://box.readthedocs.io/en/development/ does something similar and is a great utility to reach for when you're handling trusted JSON-like data and need an API-like interface.

(The Django templating language does this too, albeit in a more magical way: https://docs.djangoproject.com/en/4.1/ref/templates/language...)

Just... make sure you're aware of the issue that the OP's RCE brings up!


> "hash" in this context means what other languages would call a hash map

Interesting. The only other time I've seen this term used for a hash map is in the context of Redis hashes [0], which confused me to no end when I first encountered them for the same reason. Given that Redis itself is written in C, I wonder where the term originated.

[0] https://redis.io/docs/data-types/hashes/


This got me curious, so I went and looked up what it's called in a few languages.

  Perl: Hash
  Ruby: Hash
  PHP: HashMap
  Java: HashMap
  Haxe: Map
  Golang: map
  Swift: Dict
  Python: Dict
  C#: Dictionary
  Javascript: Object
  Redis: Hash
  Postgresql: hstore, json


Does PHP have a HashMap class now? I haven’t written PHP for a few years now but associative array used to be how they were referred to.


It doesn't, at least according to the current documentation.

Not sure why they would need a class for that when they already have a primitive type that does the job. I guess maybe a new OO interface will have its parameters ordered more consistently?


Yeah, I'm pretty sure Perl also called them associative arrays, but like you I haven't written much Perl since Perl 4/5.


In all my time as a Perl dev (since 1996) I don't think I've ever heard the term "associative array" except at the start of `perldoc perldata` - which says they're known as hashes and then never uses the term again.

    DESCRIPTION
      Variable names
        Perl has three built-in data types: scalars, arrays of scalars, and
        associative arrays of scalars, known as "hashes". A scalar is a single
        string (of any size, limited only by the available memory), number, or a
        reference to something (which will be discussed in perlref). Normal
        arrays are ordered lists of scalars indexed by number, starting with 0.
        Hashes are unordered collections of scalar values indexed by their
        associated string key.

Hash is ubiquitous.


Yes, they used to be called Associative Arrays. I think the change occurred between Perl 4 & Perl 5.

In Programming Perl it says "Associative Arrays" in 1st edition and "Hashes (Associative Arrays)" in 2nd edition.

NB. 2nd edition was the first to cover Perl5


JavaScript also has Map. Which is more general since the keys don't have to be strings.


Keys do not need to be strings in Java either; you do need to properly implement the equals() and hashCode() methods for the key type, though.

Also Java has like ten different implementations of Map<K,V> depending on various use cases.


I wasn't comparing to Java. I was commenting on

> Javascript: Object

And in javascript, if you use an Object as a map, the keys do have to be strings (if they aren't already, they are converted into strings).


Associative array in PHP. As opposed to indexed array.


C++: unordered_map


The oldest example I can think of is Perl.


Perl, probably.


> Normally, id should be a number. However when id is {"to_s": {"bytesize": 2, "to_s": "1234REDIS_COMMANDS" }}, we can inject additional redis commands by using bytesize to limit the previous command when it is constructed....

This class of bug cannot happen in a statically-typed language. When I deserialize something in Java, I know what it turned into. If it wasn't what I expected, I get a stacktrace, not an RCE vulnerability.

I could write an interpreter to do weird stuff with my custom class and that could lead to an RCE, or I might have a library that does so. But barring severe malfeasance in a trusted library (like log4j) I'll never think I have a plain number and actually have untrusted code.


Java had a long list of this class of vulnerability, because it lets you build frameworks that look up classes by name.

It's no longer common to do that in Java, because after the CVEs were reported, framework authors changed the API. This process also happened in ruby. Not familiar enough with other ecosystems to be sure but I'd bet money it's hit a fair few of them.


I'd file that under "weird stuff you can do with the deserialized object", and static types certainly don't protect you from that (edit: or from dangerous serializer libraries).

This bug isn't in that category, though: this happened because GitLab assumed they had a specific data type but didn't guarantee that this was always true. The build_command method, in turn, does something different depending on the type of data passed into it.

This line in the Ruby code:

    id = id_for_already_imported_cache(object)

Would translate to this in the Java code:

    Object id = idForAlreadyImportedCache(object);

Which would immediately set off alarm bells in the mind of the author that they should probably double-check they've got the right type.
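The Ruby-side equivalent of that alarm bell would be an explicit runtime check. A hypothetical guard (not GitLab's actual code; the method name is borrowed from the snippet above purely for illustration):

```ruby
# Hypothetical guard: reject anything that deserialized to a non-Integer id.
def id_for_already_imported_cache(object)
  id = object.id
  raise TypeError, "expected Integer id, got #{id.class}" unless id.is_a?(Integer)
  id
end

# Illustrative stand-in for a deserialized GitHub object:
Issue = Struct.new(:id)
id_for_already_imported_cache(Issue.new(1234))  # => 1234
```

With a guard like this, a hash masquerading as an id fails loudly instead of flowing into the Redis command builder.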


There's a similar set of problems with .NET and BinaryFormatter and a few other things.

MS now have a security guide on using them: https://learn.microsoft.com/en-us/dotnet/standard/serializat...


I love and prefer Java & static languages too, but saying this cannot happen in a static language is a bit of a stretch. The recent Log4j vulnerability and countless Jackson vulnerabilities are examples of how a simple deserialization can go wrong.

Jackson can accept "@class": "com.package.something.Class", which might be a class that executes commands in its constructor; there is a huge denylist to keep you safe when you use that feature. Quite similar to what GitLab has here.


Log4J was not a case of "simple" deserialization. The feature that caused the vulnerability was designed for remote execution of code. For this reason I called it out explicitly as an exception.

Vulnerabilities in the serialization library are a separate class of bug. This gitlab bug wasn't a serialization vulnerability, it was a flawed assumption about the structure of the data. The serializer did its job just fine.

In the case of Jackson's @class tagging, the key thing is that Java will still force you to think about the type. If you expect the `id` field to be an integer, you'll have to eventually cast Object to the class that you think you have, and that cast will fail if you got something else instead. There's no way for a redis command to make its way into an int field.


Have you heard about this: https://www.zdnet.com/article/equifax-confirms-apache-struts...

Pretty much everyone's private credit data in the US was leaked because of code written in... Java. Should we still go on with this discussion about how a statically-typed language saves you from whatever, or can we bury this finally?


This is what I said:

> This class of bug cannot happen in a statically-typed language.

I never claimed statically typed languages prevented all RCE bugs, I claimed they prevent RCE bugs that result from making bad assumptions about the shape of your data, which is what happened here with gitlab.

The RCE you linked to was the result of including an interpreter for a scripting language and incorrectly executing it during error handling. This is exactly the kind of malfeasance I identified in my original comment as potentially breaking the safety static typing can bring. Even this could have been prevented by better use of the type system: the OGNL interpreter shouldn't be able to accept just any String, it should require wrapping the String in an OGNLScript class first. As long as library code doesn't construct that class, this would force all layers to keep track of whether a given string is actually safe to interpret. The user must at some point opt in.

Back to the gitlab exploit: It's a fact that a statically-typed language does save you from bugs whose explanation is "id is supposed to be a number, but..." We can quibble about whether it's worth the hassle to have that guarantee, but this particular bug would not have been written in Java.


> Sawyer is a crazy class that converts a hash to an object whose methods are based on the hash's key:

People keep adding hidden interpreters and 'exec' commands to their projects where no one expects them. As far as I can tell, the library is just supposed to make HTTP requests; it's just a fancy client library?

This needs to be the kind of thing that developers are trained on, like "don't hash passwords" and "don't build up SQL strings from untrusted input". "Don't build your own 'exec' and hide it in your library, don't reinvent object serialization".

There is almost always a better way to solve a problem than by transmitting arbitrary code. Sometimes there isn't, but it's rare, and in those cases it's very explicit, i.e. "I'm sending this database a query, I expect it to execute it".


The RCE here doesn't come from Sawyer `exec`ing anything. Sawyer builds objects from a hashmap, where the hashmap's properties can be accessed as method calls, recursively.

The vulnerability here arises from the fact that you can override built-in methods like `to_s`, which in combination with the way that the Redis gem builds raw commands, can be used to send arbitrary commands to Redis.
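A minimal sketch of both halves, assuming nothing about Sawyer's or the redis gem's actual internals (`HashObject` and `bulk_string` are illustrative stand-ins):

```ruby
# Stand-in for Sawyer's behaviour: each hash key becomes a method, recursively.
class HashObject
  def initialize(hash)
    hash.each do |key, value|
      value = HashObject.new(value) if value.is_a?(Hash)
      define_singleton_method(key) { value }  # "to_s" shadows the built-in
    end
  end
end

# Stand-in for building a RESP bulk string: the length prefix and the
# payload both come from the same (attacker-controlled) object.
def bulk_string(value)
  "$#{value.bytesize}\r\n#{value}\r\n"
end

issue = HashObject.new("id" => { "bytesize" => 4, "to_s" => "1234\r\nGET secret" })
bulk_string(issue.id)  # => "$4\r\n1234\r\nGET secret\r\n"
# Redis consumes the declared 4 bytes ("1234"), then treats "GET secret"
# as the start of a brand-new command.
```

Because `bytesize` and `to_s` are just methods looked up on the object, an attacker who controls the hash controls both the declared length and the bytes that follow it.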


Yeah... although monkey-patching, as this bug demonstrates, can get uncomfortably close to `exec`. I have a strong distaste for monkey-patching as a result, since I'm only confident in its security when it's used purely statically (i.e. no user input can change how the monkey-patching happens), but removing its dynamism rather defeats the point of monkey-patching in the first place.


Not sure if everyone will share my definition, but I wouldn't call this monkey-patching. Yes, it is dynamic addition of methods at runtime, but almost everything in Ruby happens at runtime, including method definition in the common case. So I reserve the term for when one library intrusively (and often globally) modifies the behaviour of another, i.e. some clear sense of a control boundary being crossed, which is only slightly less technical a criterion because we can still denote clear boundaries. And in this case it looks like intentional (albeit naive) behaviour within the library in question.


... And the fact that Ruby doesn’t really have fields, only methods. For example, the Python equivalent to this (morally, (obj := object()).__dict__.update(untrusted data)) would not be vulnerable. Which is not a point against Ruby, only against using this specific technique in Ruby.


I didn't mean to say it was 'exec'ing anything directly, I meant that people are reinventing 'exec' and hidden interpreters. Building up methods from runtime data is the problem I'm referring to.


What’s wrong with hashing passwords? Hash with a slow algorithm is the best method, as far as I know


As one random point: If you just hash the password, you're vulnerable to rainbow table attacks. So you want to salt the password, at the very least.

But really, what you want to do is use a framework developed by domain experts that deals with all that mess for you. Because there's a lot of surprising complexity to storing password hashes securely. So it's better to use a well-vetted library that has eyeballs and mindshare checking that it is correct.
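As a sketch of "salt it, and make it slow" using only Ruby's standard library (the function name and parameter choices here are illustrative, not from any particular framework):

```ruby
require "openssl"
require "securerandom"

# Minimal sketch of salted, slow password hashing via PBKDF2 from Ruby's
# stdlib. In production, prefer a vetted library (bcrypt, argon2) as the
# comment above advises.
def hash_password(password, salt: SecureRandom.bytes(16), iterations: 200_000)
  digest = OpenSSL::KDF.pbkdf2_hmac(password,
                                    salt: salt,
                                    iterations: iterations,  # the "slow" knob
                                    length: 32,
                                    hash: "SHA256")
  [salt, iterations, digest]  # store all three alongside the user record
end
```

A per-user random salt defeats precomputed rainbow tables, and a high iteration count makes each candidate guess expensive for an attacker with the hash database.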


Rainbow table attacks are significantly harder with properly hashed passwords, e.g. with bcrypt.


I think that's what they are saying, bcrypt is secure because it uses a salt and multiple rounds of hashing.


I think all bcrypt implementations implement salting per default. Same for any modern password hashing implementation.


That wasn't the point I was making. I was contrasting it with the (mis)use of e.g. SHA1 or worse, MD5.


Avoid passwords. "A secret is something you tell one other person, so I'm telling you". If possible adjust APIs to not rely on knowledge of secrets for their functioning, and then this entire problem evaporates. e.g. WebAuthn.

If it's crucial that a human-memorable secret (a password) is used, choose an asymmetric Password-Authenticated Key Exchange, in which the relying party learns a value with which it can confirm that the other party knows the password, but never learns the password itself. This is really difficult to do properly; OPAQUE is the current recommendation of the IETF for this purpose.

Only fall back to hashing passwords because you're obliged to for legacy reasons.


PAKEs bring little to the table when looked at in context of an actual threat model[1], meanwhile they add a fair degree of complexity and their implementations are much less battle-tested. Using them correctly won't really make anything more secure, but using them incorrectly might blow everything up.

Don't avoid passwords with WebAuthn, either. "Fingerprints are usernames, not passwords".

1. https://palant.info/2018/10/25/should-your-next-web-based-lo...


> Don't avoid passwords with WebAuthn, either. "Fingerprints are usernames, not passwords".

It seems like you've no idea how WebAuthn works, or why it's such an important improvement. That's fine, but probably better to just tell people "I know nothing about this" rather than pretend to know enough to give a recommendation.

> PAKEs bring little to the table when looked at in context of an actual threat model

The "threat model" your link proposes is "It's a web site". For which WebAuthn is more appropriate and covers all the issues listed.


WebAuthn can be used with or without a password. You said "avoid passwords", and mentioned WebAuthn. I simply clarified "don't avoid passwords with WebAuthn". By all means use WebAuthn, but ideally keep the password, too.

> The "threat model" your link proposes is "It's a web site". For which WebAuthn is more appropriate and covers all the issues listed.

You have correctly deduced that my comment on PAKEs was a response to your comment on PAKEs and not a criticism of WebAuthn. I believe I can say with confidence that we both think WebAuthn is great.


> By all means use WebAuthn, but ideally keep the password, too.

WebAuthn is perfectly capable of delivering two factor authentication in a single, easier to use, more secure solution. Passwords are awful, it's yet another indictment of our industry that somehow yet again a temporary hack was enshrined as a great idea. See also: people are still making new programming languages with Hoare's Billion Dollar Mistake.

If you've got WebAuthn plus a password it's like you used a padlock to secure your gate, but for some reason you also tied it shut with a shoelace. I am not going to be intimidated by the need to untie the shoelace if I've somehow broken the padlock, yet even if I can't break the padlock I can easily steal the shoelace anyway. Sites which have password + WebAuthn are secured by WebAuthn, but the user's passwords are as vulnerable as ever (to phishing, to database exposure, logging screw-ups, and so on).

The reason you might do that is legacy again, but I'm not talking about "What people might be obliged to do for legacy reasons" rather what they should strive for.


> Using them correctly won't really make anything more secure, but using them incorrectly might blow everything up.

The main benefit of PAKE is avoiding accidental logging of passwords, which absolutely happens at organizations all the time. ZKP is a fine approach, although you can avoid much of the issue with something simpler (addressing a narrower threat, but the most relevant piece).


Nothing, it's an example of good advice we give developers. I made a mistake in my first post.


It's an easy slippery slope.

We need flexibility because we can't predict all future needs, hence an interpreter.

An interpreter can at worst produce a denial of service. But we also want it to access our data to be useful. Giving it access to only the exact data items it may require is hard or impossible (see flexibility), so we give it a lump of access. Hello exfiltration.

But crafting and supporting a limited interpreter is a chore. Why can't we use the perfectly good interpreter which runs our software? It's almost an RCE!

For the win, we note that input sanitization is hard and costs us CPU, and skip it. Now we've finally done enough to get our system pwned.


Sorry, I meant "don't store unhashed passwords".



It is interesting how far this discloser chose to go when investigating this vulnerability. They reproduced on their local Gitlab install, then proceeded to exploit the gitlab.com install itself and break one of their own gitlab repos.

I am curious what the norms are here.


It would depend on the rules for the program. GitLab's HackerOne Rules of Engagement do not explicitly prohibit testing on gitlab.com except for attacks that are "disruptive" or access the private information of other users.

I'd personally put running an RCE into that category, but it's subjective.

https://hackerone.com/gitlab?type=team


> I'd personally put running an RCE into that category, but it's subjective.

I think that would depend on what code is being run. If it is something innocuous that just reports back that it was run, without any data, that doesn't seem disruptive or like it would access private information.


It does also seem like the reporter was extremely careful not to accidentally access user data

> After seeing the pings, I immediately turned off the replication by executing the redis command REPLICAOF no one\n\n. No information from your redis server has been replicated to mine as I used nc and I got only the ping messages.


I mean they caused an internal server error on the Gitlab end. I agree they definitely weren't trying to access any user data other than their own.


Kudos on the comment log in demonstrating the transparency of how this was dealt with in the intervening month.


I'm curious about the comment regarding a customer asking about the fix status. That comment was made before the thread was public. How did the customer know? Was it independently discovered, or are they disclosing some known bugs to high-value customers?


Look at the timing:

https://about.gitlab.com/releases/2022/08/22/critical-securi...

Blog post published this was patched on 8/22. Support Engineer commented on 8/25.

This is on HN today because of the "make issue public 30 days after the H1 issue is closed" workflow.



