This is a pretty thorough and high profile hack on a major tech company - this isn't something I'd expect from an Amazon owned property. The hack (allegedly, I haven't downloaded it) includes
* Entire git histories
* Internal/Private AWS SDKs
* Encrypted Password dumps and payout reports
It's so comprehensive that I'm very curious how an attacker got that level of access. I can't think of another large, corporate web 2.0 startup that's gotten owned in a similar fashion. Could the same attack work on Amazon? YouTube?
It's also strange that someone who has this level of access to what is presumably a multi-billion dollar company decided to just leak the data? Maybe they did try to ransom it, but I'd imagine someone with this kind of access inside Twitch must have had some creative way of making money.
There were no encrypted password dumps. No production secrets were leaked (according to the article). What's here is no more than what your average Twitch engineer has access to.
Yes, that included payout data. Anyone with "staff" access to the site (which any employee can have) has access to any streamer's dashboard, which includes payout data.
I don't think this was an attack. Based on the data so far I think it was a disgruntled engineer. Obviously if more gets leaked later I may revise that opinion.
I also worked for Twitch and can confirm what you're saying is true. Any staff member had access to these repos, including non-engineering staff.
Revenue for the longest time was as simple as navigating to a streamer's dashboard as staff, but they did finally gate that away from staff who don't need to see that info. However, I am sure there are other ways to obtain revenue reporting info.
I am assuming all data - including personal - has been compromised but so far, the data leaked is data that most staff would have access to in some way or another. Some may find that shocking, but this was not a "high level hack"
I'm actually very happy to hear they finally added a flag for payout access. It's been years since I was there and my eyes bugged out when I saw what I had access to without needing it.
The better question is, why did random engineers have access to the financials of the streamers on the platform, without having to go through a break-glass, audited, emergency access escalation.
No, but doing least privilege, separation of privileges, and RBAC correctly is tedious and difficult and slows development velocity, so companies rarely do it well if they even bother trying unless some outside force compels them.
I highly doubt it would be possible to do something like this at AWS, just because hosting multitenant infrastructure and working with the government forces you to implement security since you're being audited and awarded contracts on that basis. Twitch users don't give a crap about the security of the platform. They just want to monetize as quickly as they can, too.
So I'm not hugely surprised that practices and culture would be different even if they have the same parent company, especially since Twitch was an acquisition. Even if not, though, I'd expect security at Prime to be better than Twitch but worse than Marketplace, Marketplace to be worse than AWS, etc. All speculation since I've never worked at any Amazon product, but that's what I would expect.
The tradeoffs for any individual piece of data are different from the tradeoffs of a company-wide policy. Siloing off one little thing (e.g. credit card info) usually doesn't inconvenience very many people, but at the same time it only provides marginal security. No front page headline has ever read "At Least The Credit Card Info Was Safe". On the other hand, a company-wide policy of siloing everything can have more of a security impact, but it also inconveniences everyone frequently. That's the tradeoff that many tech companies don't want to make.
I don't see how this precludes just-in-time access. Even if people can re-up on their own, you can still observe the data access patterns and manage the risk. Further, when you see someone is getting blocked a lot you can improve the experience for them so they are unblocked, or have more efficient access to the data. This is just mature data and security management.
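As a rough illustration of the just-in-time idea above, here is a minimal sketch in Python. The `AccessBroker` class, its method names, and the one-hour default are all hypothetical, not anything Twitch or Amazon is known to use. The point is only that self-service grants can still expire and leave an audit trail you can mine for people who keep getting blocked.

```python
import time
from dataclasses import dataclass, field


@dataclass
class AccessBroker:
    """Hypothetical just-in-time broker: grants expire and every event is logged."""
    ttl_seconds: float = 3600.0
    grants: dict = field(default_factory=dict)      # (user, resource) -> expiry timestamp
    audit_log: list = field(default_factory=list)   # (timestamp, user, resource, event, detail)

    def request(self, user, resource, reason, now=None):
        # Self-service "re-up": always granted here, but recorded with a reason.
        now = time.time() if now is None else now
        self.grants[(user, resource)] = now + self.ttl_seconds
        self.audit_log.append((now, user, resource, "granted", reason))

    def check(self, user, resource, now=None):
        # Access is allowed only while an unexpired grant exists.
        now = time.time() if now is None else now
        allowed = self.grants.get((user, resource), 0.0) > now
        self.audit_log.append((now, user, resource, "allowed" if allowed else "denied", ""))
        return allowed

    def denial_counts(self):
        # Denials per user, to spot people who are blocked a lot.
        counts = {}
        for _, user, _, event, _ in self.audit_log:
            if event == "denied":
                counts[user] = counts.get(user, 0) + 1
        return counts
```

Calling `request()` before `check()` is the whole workflow. The log is what lets you observe access patterns even when nobody is ever refused a grant.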
Quality of life and developer experience are important topics in many ways, but should they really trump security consistently? It's always going to be dependent on people's risk assessment and comfort, but frequently it skews the wrong way because the people making the decisions know that they'll be gone.
Implementing just-in-time access on legacy systems that pre-date just-in-time architectures is extremely expensive. It's cheaper to either give all info or no info, which is what every legacy company does instead.
My company can shut off my access to all the databases when they stop asking me to troubleshoot any and all data issues. Which will never happen.
Part of the appeal of working at a place like Amazon is having a voice in decision making in the product you’re building. Hard to make informed decisions or opinions without the data. Engineers in Amazon retail definitely have broad access to sales data.
In general the broad access was to code repos early on. Some were gated. There’s lots of collaboration and the need to study other code bases for learning and collaboration, read only. It’s micro services galore there so one didn’t tend to have access to production databases for services or systems you didn’t work on. You were opted in there. Teams did their own devops for the most part.
The payout data likely wasn’t ripped from a DB but rather dashboards which customer service or partnerships likely had access to. Tier1 or Tier2 support kinda stuff.
This smells like a stolen backup or maybe network access and http scanning, finding the internal GitHub and maybe a support admin cred that allowed dashboard view.
The perk is the wrench UX denoting that you are an employee to the community. Reddit/Twitch allow employees to communicate with the users. It is a social media platform; being able to indicate that you are special is street cred.
The other access rights that come from staff access are either incidental or a miss/debt in architecture.
From prior (user) experience, typically staff have non-wrenched alt accounts for just chilling out in streams (with less concern about conduct, but generally tame by twitch standards). But will wrench up for higher profile streams or folks they're otherwise pretty close with personally.
I suspect that's a lot more controlled these days, but it wasn't very uncommon for signified staff to be trolling along with everyone else.
No one in IT should have access to business data. That's simply best practice. Worst case would be a database engineer who has access to backups or some prod data for troubleshooting, and even that should be under tight control with good access accounting.
Welcome to devops. Ask Mike down the hall to add you to the “admin” group. Tell him you’re a new dev so you need everything.
(This is a joke but also, at many companies, it’s not. Twitch was once small and grew. Who knows what ancient all-access switches are still critical to running the systems, marked “tech debt” in someone’s backlog)
The whole point of devops is to automate everything according to best practices, so fuckups are a thing of the past! The only fuckups, of course, will be Terraform state issues.
Until the business raises a priority one incident that their monthly reports are not looking right and you need to dive into the data to find out why some other API back end decided to present its numbers this month divided by 1000 for ease of display to their own users.
I know, I know, service contracts, but my point is that engineers sometimes need at least temporary access to provide support.
> It's also strange that someone who has this level of access to what is presumably a multi-billion dollar company decided to just leak the data? Maybe they did try to ransom it, but I'd imagine someone with this kind of access inside Twitch must have had some creative way of making money.
Notably, the initial leak didn't actually include the password data which the leaker claims to have, just source code and payment data which has been verified by several affected streamers. It's possible that this first leak was just to establish trust so they can ransom or auction password hashes later.
Given the torrent is labeled "twitch-leaks-part-one" I'm curious too as to what they have. The torrent breaks out into a lot of compressed volumes, so it's clear this wasn't just a backup file, but a curated collection of files. I'm very curious if we will see any other Amazon-related leaks come from it.
Either way, I can only imagine the chaos inside as they try to figure out what has transpired here.
>It's possible that this first leak was just to establish trust so they can ransom or auction password hashes later.
Password hashes are relatively useless though? Once the leak is announced I imagine most of the big targets will rotate their credentials. Then the next thing you need to do is spend possibly thousands in CPU time bruteforcing bcrypt hashes. Then I'm not sure what you can even do with those.
I'm not criminally creative but I imagine you could make more by abusing trust with payment processors or fraudulent invoices.
>Then I'm not sure what you can even do with those
Assume some end users used the same passwords on other, non-twitch accounts. That's what makes hacked passwords valuable, no matter where they came from.
That's something I've wondered - do password hashes tend to be the same across platforms? Is everyone using the same hashing algorithm? Isn't this also what salting is for?
Yes, the hashes are (usually) different due to different algorithms and/or salts. But, if you've brute forced one by using good guesses, and know the email/userid for other sites, and the user used the same or a similar password...that doesn't matter.
If everyone did things the way they're supposed to then no, hashes should never be the same between platforms. Using the same algorithm is likely, but as you said, salting solves that.
But mistakes such as salting with just the username are sometimes made even by very large companies and in that case, hashes could be the same.
If they are the same everywhere, you can precompute a huge database of hashes (called a rainbow table) and simply lookup the hash in the table when breaches occur to find the password. By salting, every provider who stores credentials has different hashes for the same inputs which makes the approach far less attractive at a large scale.
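A toy sketch of the precomputation idea, assuming unsalted SHA-1 and a made-up four-word list. Note this is a plain lookup table; a real rainbow table stores hash chains to trade lookup time for space, but the effect of a salt on it is the same.

```python
import hashlib
import os


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


# Hypothetical tiny wordlist standing in for a huge precomputed table.
COMMON_PASSWORDS = ["123456", "password", "bobrules", "qwerty"]

# Built once, reusable against every unsalted SHA-1 dump.
LOOKUP = {sha1_hex(p.encode()): p for p in COMMON_PASSWORDS}


def crack_unsalted(digest: str):
    # Returns the plaintext if the digest was precomputed, else None.
    return LOOKUP.get(digest)


# Unsalted: a single dictionary lookup recovers the password.
print(crack_unsalted(sha1_hex(b"password")))

# Salted: same password, but the digest is no longer in the precomputed table.
salt = os.urandom(16)
print(crack_unsalted(sha1_hex(salt + b"password")))
```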
> If they are the same everywhere, you can precompute a huge database of hashes (called a rainbow table) and simply lookup the hash in the table when breaches occur to find the password.
You can do this anyway. But the space requirements of a rainbow table are so large that including an account's username in the password would make a rainbow table completely unfeasible.
It doesn't matter at all if one person's hashed password is identical across two of that person's accounts on two different websites. The identical hash will instantly let an attacker (with access to both hashes) know that this person shares the same password across two accounts. But that is of no value; the attacker is going to start by assuming that it's true anyway.
Salts are there to ensure that two accounts on the same website which have identical passwords nevertheless have different password hashes.
If I understand this right, the problem is "twitchsalt" has to be known so that you can generate the same hash for future logins. So it's just one iteration of hashing more for a brute force attempt (modern hashing algorithms already use multiple iterations of hashing to make brute forcing harder)
Password salting has nothing to do with password reuse.
Imagine two people have accounts on each of two websites:
            eBay       YouTube
    Alice   sunlight   bobrules
    Bob     bobrules   bobrules
A password reuse attack dumps the YouTube database, cracks Bob's password, and then accesses Bob's eBay account. The fix for this is that Bob should use different passwords on his different accounts. Hashing helps by making step 2 ("crack Bob's password") more difficult. Salting does not affect this attack in any way. Note that the attacker didn't bother to dump the eBay database.
The attack that salting protects against dumps the YouTube database, cracks Bob's password, and then accesses Alice's YouTube account.
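To make the Alice/Bob case concrete, here is a small sketch. Plain SHA-256 is used only to keep it short, not as a recommendation, and the 16-byte random salt is an assumption for illustration.

```python
import hashlib
import os


def unsalted(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()


def salted(password: str, salt: bytes) -> str:
    return hashlib.sha256(salt + password.encode()).hexdigest()


# The YouTube column from the example: both users picked the same password.
youtube = {"Alice": "bobrules", "Bob": "bobrules"}

# Without salts, identical passwords produce identical rows in the dump,
# so cracking Bob's hash also gives you Alice's password.
unsalted_db = {user: unsalted(pw) for user, pw in youtube.items()}
print(unsalted_db["Alice"] == unsalted_db["Bob"])

# With a random per-user salt, the two rows no longer match.
salts = {user: os.urandom(16) for user in youtube}
salted_db = {user: salted(pw, salts[user]) for user, pw in youtube.items()}
print(salted_db["Alice"] == salted_db["Bob"])
```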
And that is not affected by salting. You can use a rainbow table to look passwords up whether or not those passwords are salted. There is zero conceptual connection between the two ideas.
Now, realistically, you can't use a rainbow table on passwords of any noticeable length, and a salt may push the password over the edge of that threshold. If that's really what you want... enforce a minimum password length.
Salts do nothing for people with predictable passwords though. The salt is in the dump, so I can hash known plaintext with the algorithm and the dumped data.
Even if I can only hash a million a day, if your password is one of the top million most popular, and I have a good list, I'll have your password in a day. And if you re-used it...
Salts do make naïve brute-force, all-possible-strings approaches useless, yes.
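A sketch of that dictionary attack against a single dumped row. The salt value, the wordlist, and the use of SHA-256 are all made up for illustration; the only point is that a known salt does not slow down guessing common passwords.

```python
import hashlib


def hash_password(salt: bytes, password: str) -> str:
    return hashlib.sha256(salt + password.encode()).hexdigest()


def dictionary_attack(salt: bytes, target: str, wordlist):
    # The salt comes straight from the dump, so each guess costs one hash.
    for guess in wordlist:
        if hash_password(salt, guess) == target:
            return guess
    return None


# Hypothetical dump row: the salt is stored next to the hash.
salt = bytes.fromhex("00112233445566778899aabbccddeeff")
row_hash = hash_password(salt, "letmein")

wordlist = ["123456", "password", "letmein", "qwerty"]
print(dictionary_attack(salt, row_hash, wordlist))

# A password that is not on the list survives this particular attack.
print(dictionary_attack(salt, hash_password(salt, "uwv&6qu_brusb618"), wordlist))
```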
I would be very deeply concerned if Twitch, a multi-billion dollar company owned by Amazon, does not properly hash and salt the passwords of its users.
A few things here. If you're the sort of person who runs a crypto mine, which I assume many of the people interested in breaking hashes are, you have enough firepower at your disposal to at least perform a targeted attack on a few hashes with relative ease.
Ideally that would be useless because things are properly salted and you don't know the salt, however with access to all of the source code as we have here I think it isn't as clear cut, as it may be possible to reverse out the salts as well.
I'm not a cybersec guy so please take my speculation with a grain of salt.
The salt is usually stored next to the password. The point of a salt is just to make the hash unique to prevent the use of rainbow tables, it's not a separate secret.
I think it is pretty common to store the salts alongside the password hashes. They are used by the same pieces of code so it is generally unrealistic to think that your salts will be secure if your hashes are obtained.
Salting isn't really supposed to make a hashing algorithm secure by being secret but by being unique. Unique salts make hashing more secure because an attacker can't re-use a single rainbow table for multiple hashed passwords. That, combined with a sufficiently computationally difficult hashing algorithm, makes it prohibitively expensive to reverse the hashes of all your users.
This may not be enough to protect high value users or those who use fairly common or easily guessable passwords. This is part of why it is so important that you don't reuse passwords. It's also why your application should reject all known passwords using something like https://haveibeenpwned.com/Passwords or any of the "common password" lists you can find online.
Edit: If you do include a secret that is stored separately and added to the password and salt when hashing, this is called "peppering", and these peppers are generally not unique per user.
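One possible way to combine a per-user salt with a pepper, as a sketch only. The pepper value, the HMAC-then-PBKDF2 construction, and the iteration count are assumptions for illustration, not a description of what any particular site does.

```python
import hashlib
import hmac
import os

# Hypothetical pepper; in practice it would live outside the database,
# so a database dump alone does not reveal it.
PEPPER = b"example-pepper-not-a-real-secret"


def hash_password(password: str, salt: bytes) -> bytes:
    # Mix in the secret pepper with HMAC, then stretch with the per-user salt.
    peppered = hmac.new(PEPPER, password.encode(), hashlib.sha256).digest()
    return hashlib.pbkdf2_hmac("sha256", peppered, salt, 100_000)


def new_record(password: str):
    # The salt is unique per user and stored next to the hash; it is not secret.
    salt = os.urandom(16)
    return salt, hash_password(password, salt)


def verify(password: str, salt: bytes, stored: bytes) -> bool:
    return hmac.compare_digest(hash_password(password, salt), stored)
```

The salt keeps rows unique; the pepper only helps if the attacker got the database but not the place where the pepper is kept.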
I've heard this before, and queried how feasible an attack would be, as people always talk about just how bad this is but yet I've _never_ heard of someone having an account compromised through this vector, and I'd like to know how feasible it really is. Here's the sha1 of an unsalted password b85ffa7dae2cbed04e7d3335f6ebc43c8a5764dd
How long does it actually take in practice to break something like this? I would love it if someone could prove it to me.
It is! I guess using a password from Google isn't the best idea, and kind of defeated the point of what I wanted to ask (if your password isn't already hashed online how long does it actually take to break a sha1 hash), but definitely proves the point.
Can I try again? Sha1 e7b7cdf949007abe7e8a190ba8eae56c60018c1f
Thanks for trying! This somewhat supports what I'm suggesting - because that password hasn't been leaked by being posted in plaintext as a verified password, it's not available as a lookup, therefore it doesn't matter whether they used bcrypt, sha1 or md5, or even just pgp encrypted it, the password is likely "secure".
The reason I didn't give any more information on the password above is because you don't have any extra information on a dump of hashes from a twitch database either. If a password is only feasibly brute forceable for a specific algorithm by reducing the search space by many orders of magnitude, it kind of shows that there's not really any risk even if the passwords are unsalted for a person who hasn't reused a password.
> it kind of shows that there's not really any risk even if the passwords are unsalted for a person who hasn't reused a password.
No, it doesn't. You could reuse uwv&6qu_brusb618_$@618jg everywhere and it wouldn't get cracked. If the plaintext password leaked, then you'd be in more trouble.
What matters is whether your password is easy to guess, not whether you've reused it. If you have all unique passwords, they can still all be trivial to crack.
The point of the salt isn't that it makes it take longer to break any one password. What it does is prevent you from re-using the rainbow table you generate breaking one password when you break the next one.
Sha1 is not a very secure/expensive hashing algorithm and thus does make it significantly cheaper to break even with a unique salt.
> What [a password salt] does is prevent you from re-using the rainbow table you generate breaking one password when you break the next one.
Your idea of what a rainbow table is appears to be unrelated to what a rainbow table actually is. A rainbow table is prepared in advance, not generated in the process of cracking an individual password.
It's not so much "how long does it take" as it is "how much does it cost" and the answer to that really depends on what sort of compute infrastructure you have access to. Using a more appropriate hashing algorithm with a sufficient cost factor can massively increase the amount of compute needed. Preventing the re-use of that computational effort on additional users is why unique salts are important.
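To get a feel for what a cost factor does, here is a rough timing sketch comparing one SHA-1 call with PBKDF2 at 200,000 iterations. The iteration count, password, and salt are arbitrary assumptions, and these are CPU timings in Python, so only the ratio is meaningful, not the absolute numbers.

```python
import hashlib
import time


def time_it(fn, repeats=1):
    # Average wall-clock seconds per call.
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


password, salt = b"correct horse", b"per-user-salt-16"

fast = time_it(lambda: hashlib.sha1(salt + password).digest(), repeats=1000)
slow = time_it(lambda: hashlib.pbkdf2_hmac("sha256", password, salt, 200_000))

print(f"SHA-1: {fast * 1e6:.2f} microseconds per guess")
print(f"PBKDF2 (200,000 iterations): {slow * 1e3:.1f} ms per guess")
print(f"slowdown for the attacker: roughly {slow / fast:,.0f}x")
```

Every guess the attacker makes pays the same multiplier, and a unique salt means that cost has to be paid again for each user.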
> It's not so much "how long does it take" as it is "how much does it cost"
So the answer is "It's too expensive to figure out in practice, unless you're being explicitly targeted by someone with nation state level credentials?", i.e. it's pretty much fine?
> Using a more appropriate hashing algorithm with a sufficient cost factor can massively increase the amount of compute needed.
But by the sounds of it, SHA1 is more than enough (given that nobody here is willing to brute force the hash I shared above?)
> Preventing the re-use of that computational effort on additional users is why unique salts are important.
The person who "cracked" my first hash found it in a list of passwords which was actually gotten from a plain text dump 15 years ago. That wasn't found by reversing a hash, so the compute wasn't reused. You are right that once it's cracked, it's cracked and that's that, but if your password _isn't_ cracked it's moot whether it's hashed with SHA1 or something more secure, as per above?
>But by the sounds of it, SHA1 is more than enough (given that nobody here is willing to brute force the hash I shared above?)
SHA1 is "more than enough" for this specific interaction in which you chose a complex password and/or your only opponents are unmotivated/non-incentivized HN commenters that don't have a password cracker at their immediate disposal. That doesn't mean anything outside of this context.
If your opponent was a motivated hacker with dedicated password cracking machines (which do not require anything even close to a nation-state budget, btw), your SHA1 hash would be much more likely to be cracked. If you were a specific target of a hacker group, such as an employee of a company that is being targeted by an attack or someone known to have a BTC wallet with $10 million in it, your SHA1 hash would be much more likely to be cracked. If your password was a relatively simple phrase like "dog$aregreat2019", like the vast majority of user passwords are, it would almost certainly be cracked.
SHA1 is not even anywhere close to "enough" for general password hashing use. Don't think otherwise just because a couple of random HNers failed your little game.
edit: The premise of your "challenge" is also not equivalent to the goals of most hackers. Unless you are a specifically known and prioritized target (because you're a celeb, VIP, wealthy person or something like that), the goal of a hacker is not to take one specific hash and crack it, because the success of that will depend a lot on the complexity of your password. The goal of most hackers in a breach like this Twitch one is more like "just throw it all at the wall and see what sticks". They take a massive database of thousands of hashes and spend a few hours to see what can be cracked, taking advantage of the fact that while some people may have complex passwords, most do not. After a few hours, maybe they crack 90% of the SHA1 hashes in a leak. Maybe your password was complex enough that it was in the 10% that wasn't cracked; good for you, but just because your password remained uncracked doesn't mean SHA1 is "enough". The hackers still got the other 90%.
But you shared a hash of an uncommon password. We probably have the salt (probably somewhere in the code) and people don't use password managers. So rainbow tables are enough.
Oh, I thought the first sentence was you and not quoted. Agreed with the above
Many hashes are trivial to target, until you start getting to password hashers that force you to use lots of RAM or CPU (or ideally both) to check a single password. As long as you know what hashing algorithm was used (often inferred by the hash length or other details), you can shove it into hashcat or some alternatives and wait, either using a good dictionary or bruteforce. If you've configured hashcat to work well with a decent GPU, you're good to go.
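As a sketch of the "inferred by the hash length" step: the mapping below covers only a few common unsalted hex digests, and several other algorithms share these lengths, so it narrows the candidates rather than identifying the algorithm.

```python
import hashlib

# Hex digest lengths of a few common hashes. Other algorithms share these
# lengths, so treat the result as a list of candidates, not a verdict.
CANDIDATES_BY_HEX_LENGTH = {
    32: ["MD5"],
    40: ["SHA-1"],
    64: ["SHA-256"],
    128: ["SHA-512"],
}


def guess_algorithms(digest: str):
    digest = digest.strip().lower()
    # Anything that is not pure hex is out of scope for this sketch.
    if any(c not in "0123456789abcdef" for c in digest):
        return []
    return CANDIDATES_BY_HEX_LENGTH.get(len(digest), [])


# The 40-character hash posted earlier in the thread.
print(guess_algorithms("e7b7cdf949007abe7e8a190ba8eae56c60018c1f"))
```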
Even a bcrypt hash is not that hard to crack if it didn't use enough rounds.
I learned a bunch of this when a company I worked for was breached and wanted to see just how easy it was to solve out weaker passwords in our db.
As I said, I've heard the claim, but still question it. Here's a sha1 e7b7cdf949007abe7e8a190ba8eae56c60018c1f, how long does it take hashcat to break it?
I don't really follow your argument. You've never heard of a hash being brute forced? I've done it myself multiple times, both for pen testing purposes and for password recovery on systems I control myself.
The LinkedIn password leak contained hashed (but not salted) passwords, and some of those were cracked and exploited in the wild.
My old gaming PC with a 1060 can apparently do ≈ 6300 * 10^6 hashes per second. Assuming your password above is a-z, A-Z, 0-9 = 62 possibilities (with no salt) it would take me 10 seconds to test all combinations for 6 characters and 30 days for 9 characters. And it's a trivially parallel problem, making it easy to throw money at to make it wall-clock quicker.
It's just a simple brute force problem, I don't see what there is to question (beside the choice of SHA1 for password hashing...).
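The arithmetic behind those figures, as a small sketch. The hash rate and the 62-character alphabet are the ones quoted above; everything else is plain exponentiation. The exact results come out slightly under the rounded "10 seconds" and "30 days".

```python
HASHES_PER_SECOND = 6300 * 10**6   # SHA-1 rate quoted above for the 1060
ALPHABET = 26 + 26 + 10            # a-z, A-Z, 0-9


def seconds_to_exhaust(length: int, rate: float = HASHES_PER_SECOND) -> float:
    """Worst-case time to try every password of exactly `length` characters."""
    return ALPHABET ** length / rate


for length in (6, 8, 9, 10):
    secs = seconds_to_exhaust(length)
    print(f"{length} chars: {secs:,.0f} s = {secs / 86400:,.2f} days")
```

Each extra character multiplies the work by 62, which is why 9 characters is feasible and a few more quickly is not.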
> The LinkedIn password leak contained hashed (but not salted) passwords, and some of those were cracked and exploited in the wild.
The hashes of previously unused passwords were brute forced, or passwords were reused across sites from a previous plain text dump and exploited? Because there's a big difference between those two things. If your password is reused and originally compromised, you're screwed regardless, and having the leaked hashed passwords doesn't leave you in any worse a situation than before.
> My old gaming PC with a 1060 can apparently do ≈ 6300 * 10^6 hashes per second. Assuming your password above is az-AZ, 0-9 = 62 possibilities (with no salt) it would take me 10 seconds to test all combinations for 6 characters and 30 days for 9 characters. And it's a trivially parallel problem, making it easy to throw money on to make it wall-clock quicker.
So practically infeasible to exploit? The claims that are being made (even in this thread) are that having a mining rig would let you brute force a SHA1 hash, but based on the numbers
> It's just a simple brute force problem, I don't see what there is to question
If it's "just a simple brute force problem", and SHA1 is the only issue, then my question is what's the password in the hash above? You (and others here, on reddit, online) are telling us that this is a trivial problem.
> The hashes of previously unused passwords were brute forced, or passwords were reused across sites from a previous plain text dump and exploited?
I believe there are documented instances where previously not leaked passwords were cracked. Of course not 128 bit random strings, but still passwords more "complex" than what you previously posted. If you have 100 million hashes to try, you will crack some. People generally have bad passwords, and especially did in 2012, even if the plaintext wasn't available anywhere...
> So practically infeasible to exploit?
It depends on how strong the password is and how much money you have to spend. For 32 USD I get an hour with a p4d.24xlarge that has 8 graphics cards, which in total can do about 175 * 10^9 hashes per second. 20 hours (and 640 USD) of machine time (not wall clock time) on that machine can do what 30 days on my old PC does.
> If it's "just a simple brute force problem" […]
If you can give me a bound on the number of combinations, and an AWS account to bill, I and many others would gladly attempt to crack your hash :-). But if your second hash is >9 alphanumerical characters we will probably just burn electricity to no avail.
I don't even know what you are arguing?
EDIT: Now that you have some numbers on hashing rates and cost, you can figure out how expensive different passwords are to crack with different approaches. Two common dictionary words with two numbers appended? 6 random alphanumeric characters? Then think about how cheap the cheapest non-leaked password in a database of 100 million users is to crack...
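A sketch of that exercise using the rate and hourly price quoted above. The 10,000-word dictionary size is an assumption picked purely for illustration, and the whole thing only applies to fast unsalted hashes like SHA-1.

```python
RATE = 175 * 10**9     # SHA-1 hashes per second quoted above for the 8-GPU instance
USD_PER_HOUR = 32      # hourly price quoted above


def cost_usd(keyspace: int, rate: float = RATE, usd_per_hour: float = USD_PER_HOUR) -> float:
    """Worst-case cost of trying every candidate in the keyspace."""
    hours = keyspace / rate / 3600
    return hours * usd_per_hour


WORDS = 10_000  # assumed dictionary size, purely illustrative

keyspaces = {
    "6 random alphanumeric characters": 62 ** 6,
    "9 random alphanumeric characters": 62 ** 9,
    "two dictionary words + two digits": WORDS ** 2 * 100,
}
for label, size in keyspaces.items():
    print(f"{label}: {size:.3g} candidates, about {cost_usd(size):,.4f} USD")
```

Under these assumptions the word-based pattern costs a fraction of a cent to exhaust, while 9 random alphanumeric characters lands in the hundreds of dollars.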
Is it bad to store plaintext passwords? Yes, obviously. Is some hashing better than none? Yes, obviously. Is salting your hashes much better than not? Yes, because with a salt, your first password wouldn't have turned up on Google / in rainbow tables. Is it even better to use a proper PBKDF? Yes, with a pretty aggressive PBKDF, brute forcing even low-complexity passwords becomes expensive very quickly, and we get the benefits of salting "built in".
Can SHA1 / MD5 hashes be cracked even if not the _exact_ password-hash pair have been leaked previously? Yes, very much so.
Right? "It's just a simple brute force problem", but sometimes that still takes a lot of force. Sometimes far more force than breaking a single account password.
I managed to lock myself out of a dogecoin wallet. I have the hash of the passphrase, so I figured I'd give it a go cracking it. After a few weeks (and a larger than usual power bill) I sent it to some friends with good mining rigs to try and take a stab at it, willing to split the amount 50/50. It's only the passphrase, not the full wallet, so I'm not worried about someone stealing the doge.
The passphrase is probably 15-25 characters, mostly not dictionary words or simple letter/number/symbol substitution, only symbols easy to type on a US keyboard. I'm now about 6 months into trying to crack that password with probably a few hundred dollars of electricity used overall between myself and friends (I don't know their power bill), excluding hardware cost as it was already owned, and I'm not even halfway through the search space.
Can it be done? Sure. Will I be able to crack that password with a cost that's less than the value of the DOGE in the wallet? Probably not. Right now it's really more of a gamble that I'll get lucky with the rigs running. I had to tone down some of my rigs as it was getting quite hot over the summer, but over the winter I'll be chugging away as the waste heat is just additional home heat. I'll probably need to rent a considerable amount of GPU power on a cloud provider to crack it, at which point maybe it'll take me days to crack it but ultimately cost me many, many thousands of dollars in GPU-time.
Salts being exposed is not a massive risk in and of itself, as the purpose of the salt is to prevent the use of pre-computed tables to reverse a hash into plaintext, forcing an attacker to bruteforce each individual hash+salt instead of being able to reuse work.
With regards to crypto mines being used for breaking hashes, if you have one based on GPUs, yes, you could reuse GPU mining hardware for cracking hashes, albeit with relatively low hashrates for current best practice hashing algorithms.
If you're looking at something like Bitcoin's hashrate and thinking that it could be used to break SHA2 hashes, as far as I understand ASIC miners, this is not possible, as ASIC miners are designed only for mining, and they don't really accept non-mining related inputs (ie, no arbitrary inputs to be hashed, unless it matches Bitcoin's specific steps for iterating over nonces).
> Ideally that would be useless because things are properly salted and you don't know the salt
I'm really curious where people get their ideas about salting. It's not just a word. It doesn't make one password any more difficult to crack. It makes cracking every password in a given database more difficult to do. A password's salt is public information.
Relatively useless...but if even a few percent of people recycle passwords used for banking or crypto platforms it could be a profitable cache of data.
Maybe Twitch is competent in the password department, so they decided against it? But thinking about it, although it's unclear whether two-factor secrets are included in the leak, those secrets may be usable by someone who already has a victim's password. Unless it's the dongle type (WebAuthn/FIDO), the secret is shared between the server and the user, so a two-factor bypass is almost certain in that case.
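To show why a shared two-factor secret matters, here is a minimal TOTP sketch following RFC 6238 with HMAC-SHA1. The secret is the RFC's published test value, not anything from the leak, and whether Twitch's two-factor scheme actually works this way is an assumption.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP using HMAC-SHA1 and dynamic truncation (RFC 4226)."""
    counter = struct.pack(">Q", unix_time // step)
    mac = hmac.new(secret, counter, hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


shared_secret = b"12345678901234567890"  # RFC 6238 test secret, not a real one

# The server and anyone holding a copy of the secret compute the same code.
server_code = totp(shared_secret, unix_time=59)
attacker_code = totp(shared_secret, unix_time=59)
print(server_code, attacker_code, server_code == attacker_code)
```

Nothing in the computation depends on the user's device, only on the secret and the clock, which is why a leaked secret defeats this kind of second factor.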
Doesn't seem likely to me. If the attacker has password hashes then they would want to keep this attack quiet so that the buyer of the hashes would have time to compute the passwords. If Twitch gets wind of this happening then a simple password reset would foil any efforts.
I'm hoping we will get to see a transparent report (from hacker or Twitch) on how this happened.
I think anyone would be excited to hack Twitch as the site alone - or any big platform for that matter - but this is quite literally someone just downloading the entire Twitch ecosystem and publishing it online.
It's something I would expect security hardware to have automatically stopped. Even an employee shouldn't be able to download 125GB of stuff without flipping a safety switch somewhere.
Gosh - I've worked at shops where we handled multi-terabyte images and we'd regularly stream large chunks of that while debugging tools. I've also worked at places where data was king and 125GB of stuff might be a reasonable dispatch of data to help someone debug.
The volume of data is irrelevant - source code is usually teensy tiny and of far more value to companies than, say, three months of livestream chat logs.
I'm not certain what security hardware you're thinking of - but I'm pretty sure I hate it already since it doesn't effectively guard anything while making everyone's lives difficult. For effective corporate security you need 1) data use policies and 2) access control lists - both of those are generally more effectively implemented at an entirely software level.
Yeah volume is a terrible metric to go by. I work as a data engineer and a lot of the time, if I am working between environments or migrating between data centers, I will have a copy of the data locally that I can write tests against or move to somewhere I can compare it to a running output. This would be possible to do entirely remotely I guess but not nearly as easy. (note I never do this with anything that contains PII)
It is still fraught with problems. That you wouldn't (knowingly) do it with PII is not all that reassuring: others could, or a compromised system could be used to exfiltrate this data, if the only control is trusting users to behave well with their access.
The fact that, across the industry, internal access to PII is so lightly controlled should worry everyone.
Trying to protect against leaking developers/employees is like trying to protect against lone gunman terrorists: useless.
And, if you try anyway, it is likely to cause more annoyance to everyone involved than actual protection (think TSA).
I disagree. Locking down and logging access to raw data like password hashes or payout information to only those who absolutely need it doesn't cause much annoyance and is very useful.
It protects the company against rogue employees (not even strictly malicious, but also curious employees who want to see more than they should). It limits exposure if an employee's account gets hacked (my pet theory for this Twitch hack). And if something does go wrong, logs help track down the issue/leak.
And at the end of the day, there should be a lightweight way to request access. Many times I've seen people request access that they didn't actually need. And most other times they have access pretty quickly.
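As a rough sketch of what "lock down and log" can look like in code (every name here is hypothetical, not anything from Twitch): access to a sensitive resource goes through one check that both enforces an allow-list and records the attempt, so denied and granted requests alike leave a trail.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessPolicy:
    """Toy allow-list with an audit trail. All names are illustrative."""
    grants: dict = field(default_factory=dict)    # resource -> set of users
    audit_log: list = field(default_factory=list)  # (timestamp, user, resource, allowed)

    def grant(self, user: str, resource: str) -> None:
        # Stand-in for the "lightweight way to request access".
        self.grants.setdefault(resource, set()).add(user)

    def check(self, user: str, resource: str) -> bool:
        allowed = user in self.grants.get(resource, set())
        # Log every attempt, including denials, so a leak can be traced later.
        self.audit_log.append((datetime.now(timezone.utc), user, resource, allowed))
        return allowed


policy = AccessPolicy()
policy.grant("alice", "payout_reports")
print(policy.check("alice", "payout_reports"))  # True
print(policy.check("bob", "payout_reports"))    # False
print(len(policy.audit_log))                    # 2
```

A real system would put this behind the data store rather than in application code, but the shape is the same: default deny, explicit grants, and a log entry for every lookup.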
Note that it was code that was leaked. Preventing developers from leaking the codebase they are working with is outright impossible. Now combine that with a "monorepo" and even the most junior developer has access to practically the entire company codebase and version control history.
And you can try to prevent them from accessing live/real customer data, but the cost is that they will never be able to debug issues in production. Most companies, even very large ones, are just not able to pay that cost. Not to mention that once you have access to the codebase there are a million ways to leak customer data anyway -- it is a lost battle.
Of course, some stuff you can't avoid, especially code leaking. Luckily code isn't usually that interesting or useful to external parties which is the only reason it isn't leaked more.
For the rest of the stuff, there's a sliding scale. In no universe does your average twitch developer need raw access to password hashes, for example.
What with security as it is at these companies, the code is literally the most sensitive information they can hold, especially in terms of value to the company. With the code out, expect lots more high-profile cracks in the coming months...
"your average twitch developer" needs access to the password hashes or at least the code that checks these hashes the moment they need to debug an issue which involves logging in, and from then its all downwards.
Adding to your pet theory I think that WFH has led to a lot of people being casual about their workplace security. For example, leaving a laptop unattended at a Starbucks.
This is just a guess but I wouldn't be surprised if companies have to start taking stricter precautions with their security in a WFH world.
This isn't accurate. There are certainly companies that have extremely in-depth Data Loss Prevention toolsets and teams - everything anyone downloads or moves is logged and alerts fire if things look out of the ordinary. Google clearly had tons of data about how Anthony Levandowski was able to exfiltrate lots of info when he left.
The issue is that building these systems accurately so they are NOT a constant annoyance is difficult, expensive, and takes a large team to support well.
There are ways to look for anomalous behavior without creeping too hard (even though it's a business's right to view and monitor all network traffic on their system).
If someone who doesn't have a business need to upload lots of traffic begins uploading large amounts of data, you may ask questions. Maybe you kick off a scripted playbook that then checks for increased logins to other privileged systems, or for large transfers of data from internal sources to the user's desktop.
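A minimal sketch of that kind of check, assuming you already have per-user daily upload totals from somewhere (the function name, the 10x factor and the 1 GB floor are all made-up illustration values): compare today's upload against each user's own history instead of a single global threshold.

```python
from statistics import median


def flag_unusual_uploads(history_gb, today_gb, factor=10.0, floor_gb=1.0):
    """Return users whose upload today far exceeds their own typical volume.

    history_gb: user -> list of past daily upload totals in GB
    today_gb:   user -> today's upload total in GB
    """
    flagged = []
    for user, uploaded in sorted(today_gb.items()):
        # Users with no history get a baseline of zero, so only the floor applies.
        baseline = median(history_gb.get(user) or [0.0])
        threshold = max(floor_gb, factor * baseline)
        if uploaded > threshold:
            flagged.append(user)
    return flagged


history = {
    "video_eng": [40.0, 55.0, 60.0],  # routinely moves large files
    "designer": [0.1, 0.2, 0.1],      # normally uploads almost nothing
}
today = {"video_eng": 70.0, "designer": 25.0}
print(flag_unusual_uploads(history, today))  # ['designer']
```

Because the baseline is per user, the person who moves tens of GB every day is not flagged for doing their job, while a sudden 25 GB from someone who normally uploads megabytes is. A flag like this would only be the trigger for the playbook, not proof of anything.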
I don't know dude, I work in an enormous company that you've heard of, and it's impossible for me to imagine how to extract code out. I can't do it, except if I get remote access and film my screen while scrolling.
Anything else is found quickly. I certainly wouldn't even dream of someone extracting the repo.
Really, you can't simply copy files from a code repo you're working on? You work on an isolated workstation, not connected to any external network, where you are not allowed to bring anything other than plain clothes (TSA-style)? With a sizable army of developers all working this way?
And if it's a remote FB/VNC connection, what is preventing you from just recording the screen? Not really hard...
Most companies I've seen could see all their code extracted with one malformed NFS packet. These are "air gapped" systems holding the type of industrial secrets that we don't want to leak to China. Practically the only real line of defense they have is employee screening, which does not really stop the lone gunman.
There are much better cases than this; in this case a monorepo makes it slightly more likely to be caught rather than less. (A monorepo can get to Google size and then you can't check it all out at once and it needs bespoke tooling, which can make it harder to pull this off.)
On the flip side while many smaller repos _can_ have independent ACLs, you are very unlikely to set those up until you reach a certain scale -- and then when you reach that scale it gets hard to implement ACLs across everything at once. So your engineers probably all have access to all your repos until you reach a very large size anyway. So the question becomes just "can someone write a for-loop over all of the repo names and check them all out," and it's like, yeah, that's not terribly hard, I as a programmer can do that pretty easily in bash.
Ideal repo size should not in my view be directed at "how do I prevent compromise to the external world," because VCS is not designed to give you the superpower of being resilient around being compromised. Rather VCS is trying to give you the superpower of time travel. So you should probably scope your repo to "what is the unit that makes sense to time travel with?" -- in other words if you are adamant that you have these independent services which operate decoupled and running this one backwards by a year should not affect that one, then those services should be in separate repos. If on the other hand they have some moderate coupling and rewinding this service by 1 year would break the APIs that that service uses to communicate... then those should ideally be in the same repo so that you can coordinate changes between them to their shared protocol.
> So your engineers probably all have access to all your repos until you reach a very large size anyway.
Happens at my company. We have rudimentary ACLs, but I'm not sure how they're implemented, because you can find things via explicit searching or via "organic finding" through links from repo to repo, but they won't be surfaced if you just search for code.
You can still have a monorepo and restrict who has access to certain parts of it. You just have to build the tools to do it.
Google, for example, has a small number of subdirectories in the tree that only certain engineers can view (the really sensitive stuff, like the actual ranking algorithms for search and ads) but the build system is set up to allow you to still link against it.
Not particularly - unless different teams are highly focused on certain subsections of the repository. If everyone might have to look anywhere then you'll need to download all the repos - whether that's one or five hundred.
Clean OS install or new hardware should both be daily events at even mid sized companies. Because even if it’s once every 2-4 years per developer that still becomes extremely common in aggregate.
I think the tech giants have warped some people's expectations of what a "mid-sized" company is. I work for a mid-sized company where we roll our own ERP system and we probably average about two clean OS installs per year across the entire development team.
I'm working on a project and just had to repull my workspace after some local corruption. I pulled 1.2TB out of the office and never got an email. I think it's pretty common for places not to monitor egress that closely.
There was a fad for tools that accomplished this in enterprise networks, with much clearer rules for who needs to access what (it was called "data loss prevention", or DLP) and those tools for the most part don't work. This is a harder problem than it looks like.
DLP products tend to be more about scanning the contents of data for sensitive patterns, at least in my observation of the market. There are other products (typically built into SIEM) that do correlation on login events, network traffic and whatnot to detect anomalous behavior.
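For anyone who hasn't seen one, the content-scanning side boils down to something like the sketch below. The three patterns are illustrative assumptions (an AWS-style access key ID, a PEM private key header, and a US-SSN-shaped number); real products ship far larger rule sets, and `scan_text` is a made-up name.

```python
import re

# Illustrative rules only; a real DLP rule set is much larger and noisier.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "us_ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_text(text: str):
    """Return (line_number, rule_name) for every line matching a rule."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


sample = "key = AKIAIOSFODNN7EXAMPLE\nnothing here\nssn: 123-45-6789"
print(scan_text(sample))  # [(1, 'aws_access_key_id'), (3, 'us_ssn_like')]
```

It also shows the weakness: this only catches data that still looks like the pattern. Compress, encrypt or otherwise re-encode the file before it leaves and a scanner like this sees nothing, which is why the anomaly-detection products look at behavior instead of content.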
I’ve worked on a lot of DLP projects in big enterprise, and I have a very dim view of the entire category of product. A lot of their functionality is just magic black boxes, that unsurprisingly achieve very little. The primary motive for deploying them is not that they’re particularly effective, it’s so that you can tell auditors and other scrutineers that you’ve got a “DLP solution”. The idea that you can grant people access to huge quantities of information, but then very strictly control what they do with it is fundamentally flawed. Especially on networks that require large amounts of in and outflow for BAU. Even the most tightly controlled data in the world cannot be protected from an inside leaker (or adversary who has taken control of an insider’s access), because it runs into the same “analog hole” issue that DRM products have.
My company has this. It encrypts any file touched on USB. And other software logs every app run. Prevents casual copying but easily circumvented. But somewhere logs may have enough info to trace the source of leak I guess.
> It something I would expect security hardware to have automatically stopped. Even an employee shouldn't be able to download 125GB of stuff without flipping a safety switch somewhere.
Remember that Twitch handles streams. Good luck implementing this without having all sorts of false alarms everywhere.
Plus, you don't have to exfiltrate 125GB in one go.
I feel like once you have it pulled down, it would be as simple as an upload to S3 (which wouldn't trigger any flags), then making the bucket public whenever you want. Hell, S3 used to (still does?) support being part of a torrent swarm...
> And it might be disguised as a video stream coming out of the video streaming servers.
Just log in to FB Messenger or Discord and egress it as small data chunks that way. Lots of people have private chats on their work computers for practical purposes.
Discord allows for bots, so you could easily write a script to chunk data and egress, and another to re-assemble.
Cue "Don't check payment receipts into git" discussion - although I strongly suspect this hack wasn't just about acquiring appropriate credentials and then running `git clone`. It sounds to me like a backup service was compromised.
There are so many discreet USB pentesting devices easily purchasable by anyone today, I'm actually surprised this sort of thing doesn't happen more often.
ITT: people shocked that something like this could happen at a company the size and profile of Twitch.
Running security at scale in a hypergrowth B2C company is very difficult. It's also completely different from running security at a startup, in a B2B company, or a slower-growth situation. _Every_ security executive and manager I've met has given up in frustration after 12-24 months and gone to take a cushy FAANG job instead.
I'm not surprised at all. My experience in security at a larger SV unicorn was that changes only happened in the immediate aftermath of a security crisis. Otherwise, there was incredible inertia and you just wouldn't be able to get the institutional support you needed to make progress.
How much of this is a holdover of lax security practices from before they were acquired? I can’t imagine AWS being managed in a way where local network access gives you keys to the kingdom. Then again, EC2 instance profiles do let you do quite a bit.
But does Twitch not share the same Amazon-wide git service? Could most of Amazon's code be leaked or compromised? It seems like all Amazon internals that share security measures are at risk...
I've heard (but don't have any actual evidence more than hearsay) that Twitch generally operates independently of Amazon/AWS. I'm sure that they share some things, but I wouldn't be surprised if their source was separate from the "main repo"
Remember that Amazon runs one of the biggest multi-tenant service platforms in the industry! A separate business unit like Twitch is likely to be set up a lot like any other random AWS customer, and you wouldn't expect compromising servers used by one AWS customer to automatically compromise the underlying infrastructure.
(I would also expect that the Amazon retail systems are in most senses "just another tenant" on AWS, albeit with much more liberal quotas!)
I always had the impression that Twitch were operating in a largely independent fashion. For instance, it had been an open secret for years that one of their executives had been sexually harassing female streamers. Only a year ago he was finally fired. If Amazon had a firmer grip on Twitch, I'm sure they would have stepped in much earlier.
If you go back to the Adobe software breach circa 2013, a large part of their issues were the bolt on connections between acquisitions. It's honestly the most common thing I see in the startup world.
> It's also strange that someone who has this level of access to what is presumably a multi-billion dollar company decided to just leak the data?
From what I heard about Twitch-interns over the years, it seems the company is more a third-rate-s**hole that grew too big too fast and accumulated a huge amount of technical debt and fatal security flaws. Making billions doesn't mean anything if you don't invest them back into the important corners of the company. It's considered a miracle that the platform is still working that well in that state. And what comes from the leaks so far supports this view.
Though, that said, it seems they did start to improve one or two years ago, just too late to prevent this critical hit. But considering this was also a strike that avoided the deadly parts (yet), maybe there is a different aim here and the company can grow from this? It will be interesting to see how Amazon will react to this.
> From what I heard about Twitch-interns over the years, it seems the company is more a third-rate-s*hole that grew too big too fast and accumulated a huge amount of technical debt and fatal security flaws.
I mean this as a genuine question, but is there any company that didn't end up like this after an exponential growth phase? I'm not saying it's okay, but this feels par for the course. I've now been at two start ups during that hockey stick growth time and both went through this as well.
I'd be curious if anyone here has worked at a large, fast growing tech company where they didn't accumulate a ton of technical debt during growth. If so, what did the company do to prevent that?
Generally yes, but Twitch is not your average startup. It's now 10 years old, and for 7 of those years it has been owned by Amazon, which should have enough competence and manpower to bring it onto a good course. But from what I heard, Amazon neglected Twitch for a long time and focused too much on making it a profitable business at all costs. That's why they had all those scandals and problems in recent years. It's a business platform, where technology is just an afterthought.
Does anyone know if Twitch employees have two factor auth? Having access to an employee's account would be the easiest way to pull this off.
It'd be strange if they don't have two factor auth, of course, but it's just as strange to have this large of a hack.
I think if it is a simple case of an employee account takeover, then the attack would "work" to some extent at any company. Larger companies typically have strict data access requirements, though. Good luck finding the few employees who have raw access to Google password hashes, for example. And even more luck knowing how to get that data if you do.
It's still more secure. Rubber hose cryptanalysis applies to both equally, but that doesn't mean there aren't other attacks that apply to TOTP which don't apply to YubiKeys.
With a phone you need my passcode to accept the 2FA request (assuming lock screen notifications are disabled). I think YubiKeys can work without a passcode as long as you plug it in, right?
Right, but presumably the site is already asking for a password, and if the attacker can bypass one password, I'm not sure it's a safe assumption that they can't bypass two. However, fair enough. Some YubiKeys do involve fingerprint scans too, though.
The main security benefit is unphishability. With YubiKey/WebAuthn, crypto is used so you can't give the code to the wrong website. Phishing is a pretty major cause of account hacks generally, so pragmatically that is a very big win.
With a Yubikey, you need to use your password to log in to your computer, and then need to auth using Yubikey.
With OTP app, you need to use your password to log into your computer, passcode for phone, and then auth.
In both cases, it's something you know, and something you have. You could argue that the app-based one is a bit more secure in that you need two passwords. On the flip side, if your phone gets pwned, someone can get access completely remotely.
I don't know which protocols they use (obviously), but if they use WebAuthn, everything is public-key signatures. Even if you leak everything from the server, public keys buy you nothing.
Every Twitch developer has 2FA; even third-party developers are required to have 2FA. I also think, but don't know, that this applies to Twitch broadcaster partners as well, in order to have their tax information in the system.
Luckily iirc from a conversation with a senior Twitch engineer the Tax information backend has been migrated to Amazon. So hopefully that did not leak... Because that would be full legal name and addresses of a ton of streamers that likely have stalkers.
Twitch partners also have forced 2FA for quite some time now, should be a couple of years now - at least more than a year though. Covid killed my sense of time.
From an ethical standpoint, any code that amplifies and profits from radical speech should be fair game for release. If employees or hackers feel the need to release info in that regard, so be it. This is the risk defined in such models and should be mitigated accordingly.
>Because you expect Amazon to put security priority over new features and profit?
I don't know what you think Amazon stands for, but Amazon runs the largest cloud hosting service in the world - AWS, which not only runs a large number of other large companies but governments as well. I know, first hand, that their datacenter security protocols are state of the art.
Amazon has a much larger surface attack area so if they were playing fast and loose with security, chances are we would know already.
> Amazon has a much larger surface attack area so if they were playing fast and loose with security, chances are we would know already.
I get your point, and I am not talking about AWS but about Twitch. Each part of the company has its own incentives. Amazon is well known for not caring about quality or its employees. In my experience with corporations there is little to no technical sharing between different parts of the company. AWS could have the best SecOps in the world and Twitch could have no security at all.
Is your experience different?
I'm not sure what point you are trying to make. If you look at most of the high profile hacks and leaks in the past 20 years, very few of them are from web 2.0 tech companies (e.g. Google, Facebook) rather than dinosaurs (ex. Target). Those that have (like Google) have only been successfully breached by nation actors (e.g. China, NSA).
As far as I can tell, there's no data to back up the assertion that these large tech companies are disregarding security in favor of profits, except for Twitch now, which is why this leak is interesting to me.
> In my experience with corporations there is little to no technical sharing between different parts of the company.
Amazon is all about sharing efforts within the company. That's the whole point of AWS - it's a monetization of these efforts. Most older AWS services started out as internal services that someone realized were generally useful.
EC2, Amazon's cash cow, competes with nearly identical offerings from Microsoft and Google, and is not a place where additional features are often all that valuable to customers. Any sort of breach like this on EC2 would seriously hurt Amazon's bottom line and they know it.
The streaming part or the downloading/looking at code?
You can look at leaked source code for educational purposes in most places (not legal advice). As far as I understand leaks are commonly used in vulnerability research for example (if the bad guys can use it so can bug hunters).
Streaming copyrighted material is a separate issue - but using it for "criticism, comment, news reporting, teaching" should fall under fair use, no?
What's wrong with looking at public code? The code is public, regardless of how it became public - this isn't someone's personal life being exposed. If twitch is damaged by streaming this, it's only because their poor code quality is being examined publicly.
I can certainly understand why twitch banned this and don't blame them (although I think it's stupid), but I see nothing unethical about openly talking about this code in the public now that it's already there.
> What's wrong with looking at public code? The code is public, regardless of how it became public
Copyright would disagree with you, and I would say that ethically it is basically the same as stealing it yourself. You're profiting off of someone else having done the dirty work for you.
> this isn't someone's personal life being exposed.
Apparently a lot of payment information, telephone numbers, etc. was also in the leak. I don't think we should be downloading or encouraging people to download and peruse that stuff.
> You're profiting off of someone else having done the dirty work for you.
I don't think anybody is streaming this stuff on twitch with the intention to make money, anymore than someone sharing it on a blog is trying to make money. Sure, in that edge case I'd agree with you, but it seems like the exception to the rule (after all people can just go look at the code themselves for free). I'm not talking about the guy who stole the code and is likely ransoming Amazon with it - I'm talking about people that just like to talk about code because it's something they like to do (there's an entire category for it on twitch already).
> Apparently a lot of payment information, telephone numbers, etc. was also in the leak. I don't think we should downloading or encouraging people to download and peruse that stuff.
My limited understanding is none of this information actually has been leaked yet, and is likely part of a future ransom (I could be wrong, I haven't looked because I don't care). I don't condone sharing that either, but that's not what the guy streaming was sharing. I'm talking about discussing the source code which is already publicly available.
> Copyright would disagree with you
I know very little about copyright so I'll just assume you're right. I still see no ethical problem with openly discussing this code publicly though. Anyway, agree to disagree.
It's likely lots of bubble gum and chicken wire. I'm sure in the video ingest and transcode side of things there are some really interesting bits though. When you're owned by Amazon you don't need to optimize too much to achieve web scale... just leverage AWS services. It's not like you're going to get a bill.
> When you're owned by Amazon you don't need to optimize too much to achieve web scale... just leverage AWS services. It's not like you're going to get a bill.
Oh, you'd be surprised. Divisions get billed constantly for the AWS resources they consume, and this bill gets taken out of their annual budget. From what I hear, this is a common practice in most large organizations.
Also, the AWS services you can access from within Amazon are almost identical to the AWS services you can access as an external customer. It's equally easy/hard for a random company to achieve web scale, compared to Twitch.
Oops, didn't mean to be too too negative.
I say embarrassing in the sense of, I've definitely shoved out awful code because something needed to get out(tm). And with large companies, deadlines that cause that situation are inevitable.
But I also say it like that because, well, I've seen code that causes (objectively easy-to-fix) crashes but still ships because of one reason or another: laziness, politics, inexperience. It's a part of software engineering I'm still trying to accept.
Yep, there are lots of small services that don't seem production ready in the source code. Though admittedly we don't know which of those are deprecated.
It is really fun to go through the source code. You'll find interesting architecture diagrams, documentation etc. It's like joining a new job and being amazed how a service you actually use was built.
I would love to see someone look deep into Twitch's recommendation system - last time I tested, the thing they call "Feedback" is a rolling buffer and won't let you exclude more than ~100 things; adding more simply removes the oldest entries, and it starts spamming you with things you already excluded in the past. This looked like a performance optimization (fewer things to track per user).
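The behavior described is what you'd get from a fixed-size rolling buffer. Here is a sketch of that idea with `collections.deque`; it is a guess at the observed behavior, not Twitch's actual code, and the class name and the capacity of 100 are assumptions taken from the "~100" above.

```python
from collections import deque


class ExclusionList:
    """Bounded 'not interested' list: the oldest entry is silently dropped when full."""

    def __init__(self, capacity: int = 100):
        # A deque with maxlen discards items from the opposite end on append.
        self._items = deque(maxlen=capacity)

    def exclude(self, channel: str) -> None:
        if channel not in self._items:
            self._items.append(channel)

    def is_excluded(self, channel: str) -> bool:
        return channel in self._items


feedback = ExclusionList(capacity=3)
for name in ["a", "b", "c", "d"]:
    feedback.exclude(name)

print(feedback.is_excluded("a"))  # False: pushed out when "d" was added
print(feedback.is_excluded("d"))  # True
```

That keeps per-user storage bounded, which fits the performance-optimization theory, at the cost of old exclusions quietly coming back.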
This won't help with preroll ads because the video segments themselves are replaced in the stream data. They're not ads, but it's not the stream either.
You get a "twitch commercial break in progress" video for the time the ads are playing.
I watch all of my twitch using mplayer. "magic incantations" when generating access token is what produces ad free .m3u8. For example early methods involved setting origin and/or referrer headers to internal Amazon systems.
I'd be interested if someone could get their own instance of Twitch up and running from this leak. Someone mentioned internal API's, which would have to be reworked to avoid detection, but it'd be interesting to host it on AWS just to see how long it takes to get shut down.
How would current AWS policies hold up? Obviously the code would be illegally acquired, but do they have detection mechanisms in place?
Even with source code it is hard to run a service if not impossible. You would need well written documentation that explains various options and error codes you could potentially get.
Many times there is some magic command only one guy knows and he will share with you on slack.
Running a service of any complexity takes years of institutional knowledge.
100s of services and databases to work out and sort through. Good luck building a global real-time video CDN too. You could build your own faster. Microservice architectures mirror the org that built them. You wouldn’t do it the same way for yourself.
Hang on, is this just a repo dump or not? Because it looks like a repo dump, in which case I would be very surprised if any passwords or other personal information is included, at least at a reasonable scale.
Is this the first time actual Amazon infrastructure has been hacked?
Does anyone know if Amazon has been hacked previous to this?
(Not talking about insecure AWS accounts)
Since the main leaked files are from GitHub, I'm assuming they got it from one of the many reported GitHub auth flaws which don't get fixed and allow access to private repositories. Or more unlikely, via someone getting sloppy with their laptop.
Now I wonder if the commit history has database dumps or sensitive information, which is a common practice, or if any twitch servers have been accessed through a breach or privileged information found in some of their source code.
Change your password obviously, maybe even reset your 2FA if those codes are in the leak.
And if you want to be perfectly safe, don't visit twitch. Because if that source code has any vulnerabilities they might be exploited against twitch visitors as we speak.
Also change any account with a password that's the same as your twitch account. Once they know your twitch password they will try it on your related accounts.
I figure you could "build a Steam" in a couple of years, with the right engineers hitting the main features. There's very little magic at the technology level, and you can make life simpler and forget about minor things like the hardware survey or the pretty graphs. I'm not saying this is trivial, but it's definitely doable.
This is a far different statement than "You can build something and compete with Steam in a couple of years". Most of the really hard problems are not technical. Success ain't gonna happen without a bunch of pain, sweat, and strategic stumbles on the part of the competition.
Steam has been around since I was in FUCKING high school. I'm old now, well over 30.
Apples, and blueberries.
Bluebarry, Drewbarry, tomato, ToMaHtoH.
Fuck their stupid ass streaming code, it’s a giant crud app, only their devops team can take credit for scaling, everyone else is not worth a shit, sorry, thats life, I gotta Leetcode too, and ur code isn’t worth me reading it, leaked or not).
A lot of the secret sauce of such things are not that secret but just take a lot of work.
Building and maintaining infrastructure simply takes a lot of people, time, relationships and whatnot.
They get good at it over time, which I guess you could consider secret sauce, but there isn't some secret code that makes the whole thing way better, such that you'll now see tons of competitors.
"Building apps is easy as long as you don't have millions of users. For that you have to actually think about bottlenecks, the larger architecture etc."
(I agree with that)
What I wanted to express is that lots of engineers I personally know instead say
"Building apps involves thinking about every bottleneck in advance and optimizing for every possible user scenario and a global user base, regardless if the number of users is only ~100."
"Building apps involves thinking about every bottleneck in advance and optimizing for every possible user scenario and a global user base, regardless if the number of users is only ~100."
I would advocate the exact opposite. If you need to scale to X users focus on making a great platform for X users, even if it’s only 100. If you try to over-engineer instead you’ll prematurely optimize and will make poor decisions that’ll come back to bite you when you actually DO need the scale and the requirements change.
I’m so misread. Twitch is a lot of luck, and so are all of these companies. Show me the source code for luck. I don’t give a fuck if you leaked a video streaming crud app code lol.
You're missing the _hard work_ part. Sure there's always an element of "luck" in any story of success, but that mostly has to do with timing, and is much less weighted than the perseverance and hard work of the people building it.
Twitch is a full-featured, very mature application with many moving parts outside of just the video streaming, and building all those parts took an incredible amount of time and effort.
It’s just luck. I mean, if I was a storyteller, what story would I have to tell if there was no story.
They hit.
It’s sort of like we all hold Golden dice, so we marveled, by our own eyes, at the gold.
Dealer: You rolling those?
Us: no, it’s gold.
They fucking risked it. It’s not an engineering feat, we’re all a bunch of pussies.
Twitch is the easiest site to build, you might as well show me a todo app (which will be sieged and dismantled), scale is solved, we will eat your applications, the barbarians.
You don't need a genius. You need a few good people, and a lot of hands. I think the best way to look at things like Twitch is to compare them to cathedrals, bridges, things like that. You might be able to have the idea and sketch the plans by yourself, but it's physically impossible to build it yourself.
Like all things web, the problem is scaling the platform and moderation/security. It wouldn't be hard to build a toy Twitch clone, no. But it takes tons of people and money to scale it / secure it. And even with all the security, they still got hacked...
This reminds me of the Albertsons guy on Blind who inadvertently created a meme when he said that Facebook could be rewritten with a small cluster of Oracle dbs. The meme is that Albertsons people are so elite, they work and think in a higher level of existence, way above the scalability bs us commoners are accustomed to.