Twitch is hacked, and its source code leaked (kotaku.com)
556 points by goldenzun on Oct 6, 2021 | 298 comments



This is a pretty thorough and high-profile hack on a major tech company - this isn't something I'd expect from an Amazon-owned property. The hack (allegedly, I haven't downloaded it) includes:

* Entire git histories

* Internal/Private AWS SDKs

* Encrypted Password dumps and payout reports

It's so comprehensive that I'm very curious about how an attacker got that level of access. I can't think of another large, corporate web 2.0 startup that's gotten owned in a similar fashion. Could the same attack work on Amazon? YouTube?

It's also strange that someone who has this level of access to what is presumably a multi-billion dollar company decided to just leak the data? Maybe they did try to ransom it, but I'd imagine someone with this kind of access inside Twitch must have had some creative way of making money.


There were no encrypted password dumps. No production secrets were leaked (according to the article). What's here is no more than what your average Twitch engineer has access to.

Yes, that included payout data. Anyone with "staff" access to the site (which any employee can have) has access to any streamer's dashboard, which includes payout data.

I don't think this was an attack. Based on the data so far I think it was a disgruntled engineer. Obviously if more gets leaked later I may revise that opinion.


I also worked for Twitch and can confirm what you're saying is true. Any staff member had access to these repos, including non-engineering staff.

For the longest time, seeing revenue was as simple as navigating to a streamer's dashboard as staff. They did finally gate that away from staff who don't need to see that info, though I am sure there are other ways to obtain revenue reporting info.

I am assuming all data - including personal - has been compromised, but so far the data leaked is data that most staff would have access to in some way or another. Some may find that shocking, but this was not a "high level hack".


I'm actually very happy to hear they finally added a flag for payout access. It's been years since I was there and my eyes bugged out when I saw what I had access to without needing it.


Parent company was no different.


Why did non engineers have access to repos?


The better question is, why did random engineers have access to the financials of the streamers on the platform, without having to go through a break-glass, audited, emergency access escalation.


Allegedly it also contains AWS access keys. I feel bad for the engineers who will have to answer for this.


So much for information compartmentalization. Does the typical engineer need access to payment details for their daily work?


No, but doing least privilege, separation of privileges, and RBAC correctly is tedious and difficult and slows development velocity, so companies rarely do it well if they even bother trying unless some outside force compels them.

I highly doubt it would be possible to do something like this at AWS, just because hosting multitenant infrastructure and working with the government forces you to implement security since you're being audited and awarded contracts on that basis. Twitch users don't give a crap about the security of the platform. They just want to monetize as quickly as they can, too.

So I'm not hugely surprised that practices and culture would be different even if they have the same parent company, especially since Twitch was an acquisition. Even if not, though, I'd expect security at Prime to be better than Twitch but worse than Marketplace, Marketplace to be worse than AWS, etc. All speculation since I've never worked at any Amazon product, but that's what I would expect.


The tradeoffs for any individual piece of data are different from the tradeoffs of a company-wide policy. Siloing off one little thing (e.g. credit card info) usually doesn't inconvenience very many people, but at the same time it only provides marginal security. No front page headline has ever read "At Least The Credit Card Info Was Safe". On the other hand, a company-wide policy of siloing everything can have more of a security impact, but it also inconveniences everyone frequently. That's the tradeoff that many tech companies don't want to make.


I don't see how this precludes just-in-time access. Even if people can re-up on their own, you can still observe the data access patterns and manage the risk. Further, when you see someone is getting blocked a lot you can improve the experience for them so they are unblocked, or have more efficient access to the data. This is just mature data and security management.

Quality of life and developer experience are important topics in many ways, but should they really trump security consistently? It's always going to be dependent on people's risk assessment and comfort, but frequently it skews the wrong way because the people making the decisions know that they'll be gone.


Implementing just-in-time access on legacy systems that pre-date just-in-time architectures is extremely expensive. It's cheaper to either give all info or no info, which is what every legacy company does instead.

My company can shut off my access to all the databases when they stop asking me to troubleshoot any and all data issues. Which will never happen.


Part of the appeal of working at a place like Amazon is having a voice in decision making in the product you’re building. Hard to make informed decisions or opinions without the data. Engineers in Amazon retail definitely have broad access to sales data.


Why would an intern at Twitch have access to data in production?

Saying that no 'secrets' were leaked is effectively burying the lede.


In general the broad access was to code repos early on. Some were gated. There’s lots of collaboration and the need to study other code bases for learning and collaboration, read only. It’s micro services galore there so one didn’t tend to have access to production databases for services or systems you didn’t work on. You were opted in there. Teams did their own devops for the most part.

The payout data likely wasn’t ripped from a DB but rather dashboards which customer service or partnerships likely had access to. Tier1 or Tier2 support kinda stuff.

This smells like a stolen backup or maybe network access and http scanning, finding the internal GitHub and maybe a support admin cred that allowed dashboard view.


By secrets, I mean salts, password hashes, etc.


> You also get access to every streamer's dashboard and their analytics

I would classify that as access to production systems.


Why is getting access to prod, or prod data, considered a perk, exactly?


The perk is the wrench UX denoting to the community that you are an employee. Reddit/Twitch allow employees to communicate with the users. It is a social media platform, and being able to indicate that you are special is street cred.

The other access rights that come with staff access are either incidental or a miss/debt in architecture.


It's understandable why this is a neat perk, but it also seems absurd when you look at Twitch as an entity owned by a global corporation.


Man oh man, big "No thanks" to a perk like that from me

"Hey, pick through everything I say with a fine-toothed comb and treat it as the official company stance!"


From prior (user) experience, typically staff have non-wrenched alt accounts for just chilling out in streams (with less concern about conduct, but generally tame by twitch standards). But will wrench up for higher profile streams or folks they're otherwise pretty close with personally.

I suspect that's a lot more controlled these days, but it wasn't very uncommon for signified staff to be trolling along with everyone else.


This statement makes no sense.

The leak includes source code of multiple active websites and applications that are operated under the umbrella of Twitch/Amazon.

Why would an intern have access to this data?


In many companies, source code for all products is available to each and every developer.


In what world would all the data in this leak be stored together in a unified ecosystem? It makes absolutely no sense.

If you're saying that Twitch runs their developer environment in a lousy manner (and you have proof of this), then please go ahead.

But to imply that an intern/average developer would be given access to all this branching information is ignorant.


I think many people are trying to say that in this world at many companies all that data is indeed stored and accessed together.

Maybe the super secure siloed world doesn't really exist outside of military/government organizations.


I have first hand info, and this is how it’s done. Don’t call somebody ignorant if you don’t have first hand info. Leave that for somebody who does.


>Why would an intern have access to this data?

monorepos are a thing at several companies (e.g. Google).


Monorepos can and often do have ACLs per directory.


I worked for a multi billion company and even 6 month contractors had access to basically everything with little effort.


No one in IT should have access to business data. That's simply best practice. Worst case would be a database engineer who has access to backups or some prod data for troubleshooting, and even that should be under tight control with good access accounting.


Welcome to devops. Ask Mike down the hall to add you to the “admin” group. Tell him you’re a new dev so you need everything.

(This is a joke but also, at many companies, it’s not. Twitch was once small and grew. Who knows what ancient all-access switches are still critical to running the systems, marked “tech debt” in someone’s backlog)


The whole point of devops is to automate everything according to best practices, so fuckups are a thing of the past! The only fuckups, of course, will be Terraform state issues.


No the whole point of devops is to get rid of those terrible sysadmins always keeping the devs from doing anything...

Once you have "DevOps" the devs are ops, your head count drops, and all that pesky security and other things those dirty sysadmins wanted are gone

kinda /s, sometimes I think that is really what managers think about devops


As it turns out, the entire industry doesn't quite agree on "the whole point of devops."


Until the business raises a priority one incident that their monthly reports are not looking right and you need to dive into the data to find out why some other API back end decided to present its numbers this month divided by 1000 for ease of display to their own users.

I know, I know, service contracts, but my point is that engineers sometimes need at least temporary access to provide support.


Could have been a hack of a twitch engineer's laptop or something like that.


This is what I thought of as well. Maybe just an engineer was hacked.


Sounds like someone in Twitch Security needs to take a course on Least Privileged Access then


> It's also strange that someone who has this level of access to what is presumably a multi-billion dollar company decided to just leak the data? Maybe they did try to ransom it, but I'd imagine someone with this kind of access inside Twitch must have had some creative way of making money.

Notably, the initial leak didn't actually include the password data which the leaker claims to have, just source code and payment data which has been verified by several affected streamers. It's possible that this first leak was just to establish trust so they can ransom or auction password hashes later.


Given the torrent is labeled "twitch-leaks-part-one" I'm curious too as to what they have. The torrent breaks out into a lot of compressed volumes, so it's clear this wasn't just a backup file, but a curated collection of files. I'm very curious if we will see any other amazon related leaks come from it.

Either way, I can only imagine the chaos inside as they try to figure out what has transpired here.


>It's possible that this first leak was just to establish trust so they can ransom or auction password hashes later.

Password hashes are relatively useless though? Once the leak is announced I imagine most of the big targets will rotate their credentials. Then the next thing you need to do is spend possibly thousands in CPU time bruteforcing bcrypt hashes. Then I'm not sure what you can even do with those.

I'm not criminally creative but I imagine you could make more by abusing trust with payment processors or fraudulent invoices.


>Then I'm not sure what you can even do with those

Assume some end users used the same passwords on other, non-twitch accounts. That's what makes hacked passwords valuable, no matter where they came from.


That's something I've wondered - do password hashes tend to be the same across platforms? Is everyone using the same hashing algorithm? Isn't this also what salting is for?

Never implemented auth myself.


Yes, the hashes are (usually) different due to different algorithms and/or salts. But, if you've brute forced one by using good guesses, and know the email/userid for other sites, and the user used the same or a similar password...that doesn't matter.


If everyone did things the way they're supposed to then no, hashes should never be the same between platforms. Using the same algorithm is likely, but as you said, salting solves that.

But mistakes such as salting with just the username are sometimes made even by very large companies and in that case, hashes could be the same.


Why does it matter if hashes are the same?

That only tells you the passwords are the same.


If they are the same everywhere, you can precompute a huge database of hashes (called a rainbow table) and simply lookup the hash in the table when breaches occur to find the password. By salting, every provider who stores credentials has different hashes for the same inputs which makes the approach far less attractive at a large scale.
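
To make the precomputation point concrete, here is a minimal sketch. Note that it uses a plain lookup dictionary, which is simpler than a real rainbow table (those use hash chains to trade time for space), and the candidate list and function names are made up for illustration.

```python
import hashlib
import os


def sha1_hex(data: bytes) -> str:
    """Hex SHA-1 digest of raw bytes."""
    return hashlib.sha1(data).hexdigest()


# Hypothetical candidate list; real tables cover millions of passwords.
CANDIDATES = ["123456", "password", "sunlight", "bobrules"]

# Built once, then reusable against every unsalted SHA-1 dump.
LOOKUP = {sha1_hex(p.encode()): p for p in CANDIDATES}


def crack_unsalted(digest: str):
    """Return the plaintext if the digest was precomputed, else None."""
    return LOOKUP.get(digest)


def hash_with_salt(password: str, salt: bytes) -> str:
    """Salted variant: the salt is prepended before hashing."""
    return sha1_hex(salt + password.encode())


site_a_salt = os.urandom(16)  # two providers, two unrelated random salts
site_b_salt = os.urandom(16)

unsalted = sha1_hex(b"bobrules")
salted_a = hash_with_salt("bobrules", site_a_salt)
salted_b = hash_with_salt("bobrules", site_b_salt)

print(crack_unsalted(unsalted))  # hit: the digest was precomputed
print(crack_unsalted(salted_a))  # miss: the table knows nothing about the salt
print(salted_a == salted_b)      # same password, different stored digests
```

The table only pays off when many dumps share the exact same hash function with no extra input, which is the situation salting removes.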


> If they are the same everywhere, you can precompute a huge database of hashes (called a rainbow table) and simply lookup the hash in the table when breaches occur to find the password.

You can do this anyway. But the space requirements of a rainbow table are so large that including an account's username in the password would make a rainbow table completely unfeasible.


It doesn't matter at all if one person's hashed password is identical across two of that person's accounts on two different websites. The identical hash will instantly let an attacker (with access to both hashes) know that this person shares the same password across two accounts. But that is of no value; the attacker is going to start by assuming that it's true anyway.

Salts are there to ensure that two accounts on the same website which have identical passwords nevertheless have different password hashes.


In a perfect world, no, but lazily someone could skip salting and/or use common hashing functions. IIRC this was a problem at Sony not too long ago.


Pretty much this. If they gain one email/username password combination - they can use it elsewhere.


If they are properly hashed and salted, they cannot.


The point here is that once you brute force the plaintext password, the same password might be used elsewhere.


What if you did something like hash(plaintext_pw+"twitchsalt") <browser> ---> <server> hash(browser_hash + db_salt)


If I understand this right, the problem is that "twitchsalt" has to be known so that you can generate the same hash for future logins. So it's just one more iteration of hashing for a brute force attempt (modern hashing algorithms already use multiple iterations of hashing to make brute forcing harder).


Well, bear in mind, the hacker also has the exact code Twitch uses to salt its hashes.


The browser_hash is now the password.
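
A rough sketch of why, assuming the scheme proposed above (SHA-256 and all function names here are my own stand-ins, not anything Twitch is known to use): the server only ever checks the browser hash, so anyone who captures that value can log in without learning the plaintext.

```python
import hashlib
import hmac
import os

CLIENT_SALT = b"twitchsalt"  # public: it ships to every browser


def client_hash(plaintext: str) -> str:
    """What the browser would send instead of the plaintext password."""
    return hashlib.sha256(plaintext.encode() + CLIENT_SALT).hexdigest()


def server_store(browser_hash: str, db_salt: bytes) -> str:
    """What the server would keep in its database."""
    return hashlib.sha256(browser_hash.encode() + db_salt).hexdigest()


def server_verify(submitted: str, db_salt: bytes, stored: str) -> bool:
    """The server can only compare against whatever the client submits."""
    return hmac.compare_digest(server_store(submitted, db_salt), stored)


db_salt = os.urandom(16)
stored = server_store(client_hash("bobrules"), db_salt)

# Normal login: the browser hashes the typed password and sends the result.
honest_login = server_verify(client_hash("bobrules"), db_salt, stored)

# Replay: an attacker who captured the browser hash submits it directly
# and never needs to know that the plaintext was "bobrules".
captured = client_hash("bobrules")
replayed_login = server_verify(captured, db_salt, stored)

print(honest_login, replayed_login)
```

What the client-side step does buy is that the captured value is specific to this site, so it can't be typed into some other site's login form as-is.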


Password salting has nothing to do with password reuse.

Imagine two people have accounts on each of two websites:

             eBay           YouTube
   Alice     sunlight       bobrules
   Bob       bobrules       bobrules

A password reuse attack dumps the YouTube database, cracks Bob's password, and then accesses Bob's eBay account. The fix for this is that Bob should use different passwords on his different accounts. Hashing helps by making step 2 ("crack Bob's password") more difficult. Salting does not affect this attack in any way. Note that the attacker didn't bother to dump the eBay database.

The attack that salting protects against dumps the YouTube database, cracks Bob's password, and then accesses Alice's YouTube account.
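
A small sketch of that last case, using the YouTube column from the table (SHA-256 and the helper names are assumptions for illustration): in an unsalted dump, Alice's and Bob's rows are visibly identical, so cracking one cracks both; with per-user salts the rows no longer match.

```python
import hashlib
import os
from collections import defaultdict

# YouTube column of the table above.
youtube_users = {"alice": "bobrules", "bob": "bobrules"}


def unsalted(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()


def salted(password: str, salt: bytes) -> str:
    return hashlib.sha256(salt + password.encode()).hexdigest()


# Dump without salts: user -> digest.
unsalted_dump = {user: unsalted(pw) for user, pw in youtube_users.items()}

# Dump with a fresh random salt per user: user -> (salt, digest).
salted_dump = {}
for user, pw in youtube_users.items():
    salt = os.urandom(16)
    salted_dump[user] = (salt, salted(pw, salt))


def users_sharing_digest(dump, digest_of):
    """Group users whose stored digests are byte-for-byte identical."""
    groups = defaultdict(list)
    for user, row in dump.items():
        groups[digest_of(row)].append(user)
    return [sorted(users) for users in groups.values() if len(users) > 1]


shared_unsalted = users_sharing_digest(unsalted_dump, lambda row: row)
shared_salted = users_sharing_digest(salted_dump, lambda row: row[1])

print(shared_unsalted)  # Alice and Bob are grouped together
print(shared_salted)    # no groups: each row has to be attacked separately
```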


"Salting does not affect this attack in any way." Yes it does. If you have unsalted passwords you can just use a rainbow table to look passwords up.


And that is not affected by salting. You can use a rainbow table to look passwords up whether or not those passwords are salted. There is zero conceptual connection between the two ideas.

Now, realistically, you can't use a rainbow table on passwords of any noticeable length, and a salt may push the password over the edge of that threshold. If that's really what you want... enforce a minimum password length.


"Use of a key derivation that employs a salt makes this attack infeasible." https://en.wikipedia.org/wiki/Rainbow_table

"Salts defend against attacks that use precomputed tables (e.g. rainbow tables)" https://en.wikipedia.org/wiki/Salt_(cryptography)


Salts do nothing for people with predictable passwords though. The salt is in the dump, so I can hash known plaintext with the algorithm and the dumped data.

Even if I can only hash a million a day, if your password is one of the top million most popular, and I have a good list, I'll have your password in a day. And if you re-used it...

Salts do make naïve brute-force, all-possible-strings approaches useless, yes.
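
Roughly what that looks like, as a sketch with a made-up three-row dump and a tiny word list (salted SHA-1 is used only because it came up in this thread): since each row carries its own salt, the attacker just re-hashes every guess per row.

```python
import hashlib
import os


def salted_sha1(password: str, salt: bytes) -> str:
    return hashlib.sha1(salt + password.encode()).hexdigest()


def make_row(user: str, password: str):
    """Build one hypothetical dump row: (user, salt, digest)."""
    salt = os.urandom(8)
    return (user, salt, salted_sha1(password, salt))


# Hypothetical leaked table; the salts sit right next to the digests.
dump = [
    make_row("alice", "sunlight"),
    make_row("bob", "bobrules"),
    make_row("carol", "x9!Lq#vT2m@e"),
]

# Stand-in for a "most popular passwords" list.
wordlist = ["123456", "sunlight", "bobrules"]


def crack(dump, wordlist):
    """Try every guess against every row, using that row's own salt."""
    cracked = {}
    for user, salt, digest in dump:
        for guess in wordlist:
            if salted_sha1(guess, salt) == digest:
                cracked[user] = guess
                break
    return cracked


print(crack(dump, wordlist))
```

The work grows as rows times guesses instead of just guesses, which is the only thing the salt changes here. Carol's password survives because it isn't on the list, not because of the salt.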


Yes, but nothing will make predictable passwords safe (at least when you have the hash). Enforcing password guidelines helps a bit.


I would be very deeply concerned if Twitch, a multi-billion dollar company owned by Amazon, does not properly hash and salt the passwords of its users.


You don't brute force it, you find the password for accounts with the same e-mail in leaks from other sites and try only those.


You can still run those "top billion popular password" lists against properly salted/hashed passwords.


A few things here. If you're the sort of person who runs a crypto mine, which I assume many of the people interested in breaking hashes are, you have enough firepower at your disposal to at least perform a targeted attack on a few hashes with relative ease.

Ideally that would be useless because things are properly salted and you don't know the salt, however with access to all of the source code as we have here I think it isn't as clear cut, as it may be possible to reverse out the salts as well.

I'm not a cybersec guy so please take my speculation with a grain of salt.


The salt is usually stored next to the password. The point of a salt is just to make the hash unique to prevent the use of rainbow tables, it's not a separate secret.


I think it is pretty common to store the salts alongside the password hashes. They are used by the same pieces of code so it is generally unrealistic to think that your salts will be secure if your hashes are obtained.

Salting isn't really supposed to make a hashing algorithm secure by being secret but by being unique. Unique salts make hashing more secure because an attacker can't re-use a single rainbow table for multiple hashed passwords. That, combined with a sufficiently computationally difficult hashing algorithm, makes it prohibitively expensive to reverse the hashes of all your users.

This may not be enough to protect high value users or those who use fairly common or easily guessable passwords. This is part of why it is so important that you don't reuse passwords. It's also why your application should reject all known passwords using something like https://haveibeenpwned.com/Passwords or any of the "common password" lists you can find online.

Edit: If you do include a secret that is stored separately and added to the password and salt when hashing, this is called "peppering", and these peppers are generally not unique per user.
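
One possible shape for a pepper, as a sketch only: the pepper value, the HMAC-then-PBKDF2 construction, and the iteration count are all assumptions for illustration, not a recommendation of specific parameters.

```python
import hashlib
import hmac
import os

# Hypothetical application-wide secret, kept outside the database
# (for example in a secrets manager), so a database dump alone lacks it.
PEPPER = b"hypothetical-secret-stored-outside-the-database"


def hash_password(password: str, salt: bytes, pepper: bytes = PEPPER,
                  iterations: int = 100_000) -> bytes:
    """Mix in the pepper with HMAC, then run a salted, iterated hash."""
    peppered = hmac.new(pepper, password.encode(), hashlib.sha256).digest()
    return hashlib.pbkdf2_hmac("sha256", peppered, salt, iterations)


def verify_password(password: str, salt: bytes, stored: bytes,
                    pepper: bytes = PEPPER) -> bool:
    return hmac.compare_digest(hash_password(password, salt, pepper), stored)


salt = os.urandom(16)  # unique per user, stored next to the hash
stored = hash_password("bobrules", salt)

print(verify_password("bobrules", salt, stored))                     # right pepper
print(verify_password("bobrules", salt, stored, pepper=b"guessed"))  # wrong pepper
```

In a leak like this one, where source code and config may be included, the pepper only helps if it really did live somewhere the attacker didn't reach.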


I've heard this before, and queried how feasible an attack would be, as people always talk about just how bad this is but yet I've _never_ heard of someone having an account compromised through this vector, and I'd like to know how feasible it really is. Here's the sha1 of an unsalted password b85ffa7dae2cbed04e7d3335f6ebc43c8a5764dd

How long does it actually take in practice to break something like this? I would love it if someone could prove it to me.


Is the password ncc1701e?

I just googled it and found https://hashtoolkit.com/decrypt-sha1-hash/b85ffa7dae2cbed04e... along with other results.


It is! I guess using a password from Google isn't the best idea, and kind of defeated the point of what I wanted to ask (if your password isn't already hashed online how long does it actually take to break a sha1 hash), but definitely proves the point.

Can I try again? Sha1 e7b7cdf949007abe7e8a190ba8eae56c60018c1f


Couldn’t find it in 1.4 Trillion combinations. Used rockyou.txt with dive.rule.

Took me 6 minutes to try all 1.4 trillion passwords. So either you have a strong password or I messed something up. What is it?

In theory if your password was weak enough to be on this list it would take on average 3 minutes to break it on a GTX 1080.


Thanks for trying! This somewhat supports what I'm suggesting - because that password hasn't been leaked by being posted in plaintext as a verified password, it's not available as a lookup, therefore it doesn't matter whether they used bcrypt, sha1 or md5, or even just pgp encrypted it, the password is likely "secure".


It depends. It doesn’t have to strictly be a leaked password. If it’s similar to a leaked password then the permutation rule-set will catch it.

Anything under 9 characters I can brute force in minutes. 9 character passwords would take me 9 hours.

Obviously if someone has a nest of the latest GPUs then they could go a lot faster.

But yes if your password is uwv&6qu_brusb618_$@618jg then it doesn’t really matter how you hash it.
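
For anyone wondering what a "permutation rule-set" does, here is a toy version. This is not hashcat's rule engine or dive.rule, just a handful of made-up mangling rules applied to a made-up word list, to show how a password that was never leaked verbatim can still be caught.

```python
import hashlib
from itertools import product


def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode()).hexdigest()


def mangle(word: str):
    """Yield toy variants: case changes, simple leetspeak, common suffixes."""
    bases = {
        word,
        word.capitalize(),
        word.upper(),
        word.replace("a", "@").replace("o", "0"),
    }
    suffixes = ["", "1", "123", "!", "2019", "2021"]
    for base, suffix in product(sorted(bases), suffixes):
        yield base + suffix


def crack(target_digest: str, wordlist):
    """Return the first mangled candidate whose SHA-1 matches, else None."""
    for word in wordlist:
        for candidate in mangle(word):
            if sha1_hex(candidate) == target_digest:
                return candidate
    return None


leaked_words = ["password", "dragon", "monkey"]  # stand-in for rockyou.txt

similar = sha1_hex("Dragon2019")          # not in the list, but close to an entry
unrelated = sha1_hex("uwv&6qu_brusb618")  # nothing like any entry

print(crack(similar, leaked_words))
print(crack(unrelated, leaked_words))
```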


The reason I didn't give any more information on the password above is because you don't have any extra information on a dump of hashes from a twitch database either. If a password is only feasibly brute forceable for a specific algorithm by reducing the search space by many orders of magnitude, it kind of shows that there's not really any risk even if the passwords are unsalted for a person who hasn't reused a password.


> it kind of shows that there's not really any risk even if the passwords are unsalted for a person who hasn't reused a password.

No, it doesn't. You could reuse uwv&6qu_brusb618_$@618jg everywhere and it wouldn't get cracked. If the plaintext password leaked, then you'd be in more trouble.

What matters is whether your password is easy to guess, not whether you've reused it. If you have all unique passwords, they can still all be trivial to crack.


Well. Sha1 is not _that_ hard to break. It's a solved algorithm


That's for generating collisions, not preimage resistance. It's not particularly easy to reverse.


The point of the salt isn't that it makes it take longer to break any one password. What it does is prevent you from re-using the rainbow table you generate breaking one password when you break the next one.

Sha1 is not a very secure/expensive hashing algorithm and thus does make it significantly cheaper to break even with a unique salt.


> What [a password salt] does is prevent you from re-using the rainbow table you generate breaking one password when you break the next one.

Your idea of what a rainbow table is appears to be unrelated to what a rainbow table actually is. A rainbow table is prepared in advance, not generated in the process of cracking an individual password.


> Sha1 is not a very secure/expensive hashing algorithm and thus does make it significantly cheaper to break even with a unique salt.

Ok, so how long does it take to break the hash I've provided if it's not very secure?


It's not so much "how long does it take" as it is "how much does it cost" and the answer to that really depends on what sort of compute infrastructure you have access to. Using a more appropriate hashing algorithm with a sufficient cost factor can massively increase the amount of compute needed. Preventing the re-use of that computational effort on additional users is why unique salts are important.
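
A quick way to get a feel for the cost factor, sketched with PBKDF2 from Python's standard library (the iteration counts and the helper names are arbitrary choices for illustration): time one guess at a low and a high setting, then scale by guesses and users.

```python
import hashlib
import os
import time


def slow_hash(password: str, salt: bytes, iterations: int) -> bytes:
    """Salted, iterated hash; iterations is the cost factor."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)


def seconds_per_guess(iterations: int, trials: int = 3) -> float:
    """Measure the average wall-clock time of one guess on this machine."""
    salt = os.urandom(16)
    start = time.perf_counter()
    for _ in range(trials):
        slow_hash("guess", salt, iterations)
    return (time.perf_counter() - start) / trials


def hours_for_wordlist(wordlist_size: int, users: int,
                       per_guess_seconds: float) -> float:
    """With unique salts every guess is recomputed for every user."""
    return wordlist_size * users * per_guess_seconds / 3600


for iterations in (1_000, 100_000):
    t = seconds_per_guess(iterations)
    total = hours_for_wordlist(1_000_000, 1_000, t)
    print(f"{iterations:>7} iterations: {t:.6f} s/guess, "
          f"{total:,.0f} CPU-hours for 1M guesses x 1k users")
```

The absolute numbers depend entirely on the hardware, and a GPU attacker will be far faster than this single-threaded loop, but the ratio between the two settings is the point.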


> It's not so much "how long does it take" as it is "how much does it cost"

So the answer is "It's too expensive to figure out in practice, unless you're being explicitly targeted by someone with nation state level credentials?", i.e. it's pretty much fine?

> Using a more appropriate hashing algorithm with a sufficient cost factor can massively increase the amount of compute needed.

But by the sounds of it, SHA1 is more than enough (given that nobody here is willing to brute force the hash I shared above?)

> Preventing the re-use of that computational effort on additional users is why unique salts are important.

The person who "cracked" my first hash found it in a list of passwords which was actually gotten from a plain text dump 15 years ago. That wasn't found by reversing a hash, so the compute wasn't reused. You are right that once it's cracked, it's cracked and that's that, but if your password _isn't_ cracked it's moot whether it's hashed with SHA1 or something more secure, as per above?


>But by the sounds of it, SHA1 is more than enough (given that nobody here is willing to brute force the hash I shared above?)

SHA1 is "more than enough" for this specific interaction in which you chose a complex password and/or your only opponents are unmotivated/non-incentivized HN commenters that don't have a password cracker at their immediate disposal. That doesn't mean anything outside of this context.

If your opponent was a motivated hacker with dedicated password cracking machines (which do not require anything even close to a nation-state budget, btw), your SHA1 hash would be much more likely to be cracked. If you were a specific target of a hacker group, such as an employee of a company that is being targeted by an attack or someone known to have a BTC wallet with $10 million in it, your SHA1 hash would be much more likely to be cracked. If your password was a relatively simple phrase like "dog$aregreat2019", like the vast majority of user passwords are, it would almost certainly be cracked.

SHA1 is not even anywhere close to "enough" for general password hashing use. Don't think otherwise just because a couple of random HNers failed your little game.

edit: The premise of your "challenge" is also not equivalent to the goals of most hackers. Unless you are a specifically known and prioritized target (because you're a celeb, VIP, wealthy person or something like that), the goal of a hacker is not to take one specific hash and crack it, because the success of that will depend a lot on the complexity of your password. The goal of most hackers in a breach like this Twitch one is more like "just throw it all at the wall and see what sticks". They take a massive database of thousands of hashes and spend a few hours to see what can be cracked, taking advantage of the fact that while some people may have complex passwords, most do not. After a few hours, maybe they crack 90% of the SHA1 hashes in a leak. Maybe your password was complex enough that it was in the 10% that wasn't cracked; good for you, but just because your password remained uncracked doesn't mean SHA1 is "enough". The hackers still got the other 90%.


But you shared a hash of an uncommon password. We probably have the salt (probably somewhere in the code) and people don't use password managers. So rainbow tables are enough. Oh, I thought the first sentence was you and not quoted. Agreed with the above.


> But by the sounds of it, SHA1 is more than enough (given that nobody here is willing to brute force the hash I shared above?)

Absolutely not, and that is a ridiculous conclusion to draw. State-level resources are absolutely not required to break sha1.

> but if your password _isn't_ cracked it's moot whether it's hashed with SHA1 or something more secure, as per above?

Again, absolutely not. The algorithm and cost setting have a huge impact on the practical likelihood that an attacker will crack your password.


Many hashes are trivial to target, until you start getting to password hashers that force you to use lots of RAM or CPU (or ideally both) to check a single password. As long as you know what hashing algorithm was used (often inferred by the hash length or other details), you can shove it into hashcat or some alternatives and wait, either using a good dictionary or bruteforce. If you've configured hashcat to work well with a decent GPU, you're good to go.

Even a bcrypt hash is not that hard to crack if it didn't use enough rounds.

I learned a bunch of this when a company I worked for was breached and wanted to see just how easy it was to solve out weaker passwords in our db.
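
As an example of a hasher that forces RAM use per guess, here is a sketch with scrypt from Python's standard library (needs a Python built against OpenSSL 1.1 or newer; the n, r, p values are common illustrative settings, not a tuned recommendation).

```python
import hashlib
import os


def scrypt_hash(password: str, salt: bytes, n: int = 2**14,
                r: int = 8, p: int = 1) -> bytes:
    """Memory-hard hash: n and r control how much RAM each guess needs."""
    return hashlib.scrypt(
        password.encode(),
        salt=salt,
        n=n,
        r=r,
        p=p,
        maxmem=64 * 1024 * 1024,  # raise OpenSSL's memory ceiling explicitly
        dklen=32,
    )


def approx_memory_bytes(n: int, r: int) -> int:
    """Rough size of scrypt's main working buffer: 128 * n * r bytes."""
    return 128 * n * r


salt = os.urandom(16)
digest = scrypt_hash("bobrules", salt)

print(len(digest), "byte digest")
print(approx_memory_bytes(2**14, 8) // (1024 * 1024), "MiB per guess (approx.)")
```

A GPU that can run billions of SHA-1 guesses per second can't keep that many multi-megabyte buffers in flight at once, which is the whole idea.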


As I said, I've heard the claim, but still question it. Here's a sha1 e7b7cdf949007abe7e8a190ba8eae56c60018c1f, how long does it take hashcat to break it?


I don't really follow your argument. You've never heard of a hash being brute forced? I've done it myself multiple times, both for pen testing purposes and for password recovery on systems I control myself.

The LinkedIn password leak contained hashed (but not salted) passwords, and some of those were cracked and exploited in the wild.

My old gaming PC with a 1060 can apparently do ≈ 6300 * 10^6 hashes per second. Assuming your password above is a-z, A-Z, 0-9 = 62 possibilities (with no salt) it would take me 10 seconds to test all combinations for 6 characters and 30 days for 9 characters. And it's a trivially parallel problem, making it easy to throw money at it to make it wall-clock quicker.

It's just a simple brute force problem, I don't see what there is to question (beside the choice of SHA1 for password hashing...).
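
The arithmetic behind those figures, as a small sketch (the rate and alphabet size are the ones quoted above; the function name is mine). Worked exactly, it comes out to roughly 9 seconds for 6 characters and roughly 25 days for 9, so the "10 seconds" and "30 days" above are rounded up.

```python
RATE = 6300 * 10**6  # SHA-1 hashes per second, figure quoted above
ALPHABET = 62        # a-z, A-Z, 0-9


def seconds_to_exhaust(length: int, rate: int = RATE,
                       alphabet: int = ALPHABET) -> float:
    """Worst-case time to try every password of exactly this length."""
    return alphabet ** length / rate


for length in (6, 9):
    secs = seconds_to_exhaust(length)
    print(f"{length} chars: {ALPHABET ** length:.3e} candidates, "
          f"{secs:,.0f} s = {secs / 86400:.2f} days")
```

Each extra character multiplies the time by 62, which is why 9 characters is on the edge of practical for one old GPU and 6 is instant.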


> The LinkedIn password leak contained hashed (but not salted) passwords, and some of those were cracked and exploited in the wild.

The hashes of previously unused passwords were brute forced, or passwords were reused across sites from a previous plain text dump and exploited? Because there's a big difference between those two things. If your password is reused and originally compromised, you're screwed regardless, and having the leaked hashed passwords doesn't leave you in any worse a situation than before.

> My old gaming PC with a 1060 can apparently do ≈ 6300 * 10^6 hashes per second. Assuming your password above is a-z, A-Z, 0-9 = 62 possibilities (with no salt) it would take me 10 seconds to test all combinations for 6 characters and 30 days for 9 characters. And it's a trivially parallel problem, making it easy to throw money at it to make it wall-clock quicker.

So practically infeasible to exploit? The claims that are being made (even in this thread) are that having a mining rig would let you brute force a SHA1 hash, but based on the numbers

> It's just a simple brute force problem, I don't see what there is to question

If it's "just a simple brute force problem", and SHA1 is the only issue, then my question is what's the password in the hash above? You (and others here, on reddit, online) are telling us that this is a trivial problem.


> The hashes of previously unused passwords were brute forced, or passwords were reused across sites from a previous plain text dump and exploited?

I believe there are documented instances where previously not leaked passwords were cracked. Of course not 128 bit random strings, but still passwords more "complex" than what you previously posted. If you have 100 million hashes to try, you will crack some. People generally have bad passwords, and especially did in 2012, even if the plaintext wasn't available anywhere...

> So practically infeasible to exploit?

It depends on how strong the password is and how much money you have to spend. For 32 USD I get an hour with a p4d.24xlarge that has 8 graphics cards, which in total can do about 175 * 10^9 hashes per second. 20 hours (and 640 USD) of machine time (not wall clock time) on that machine can do what 30 days on my old PC does.

> If it's "just a simple brute force problem" […]

If you can give me a bound on the number of combinations, and an AWS account to bill, I and many others would gladly attempt to crack your hash :-). But if your second hash is >9 alphanumerical characters we will probably just burn electricity to no avail.

I don't even know what you are arguing?

EDIT: Now that you have some numbers for hashing rates and cost, you can figure out how expensive different passwords are to crack with different approaches. Two common dictionary words with two numbers appended? 6 random alphanumeric characters? Then think about how expensive the cheapest non-leaked password in a database of 100 million users is...
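That arithmetic can be scripted. A rough sketch, assuming the hash rates and hourly price quoted in this thread (they are the commenter's figures, not benchmarks) and a worst-case exhaustive search where every candidate is tried; the function names are made up for illustration.

```python
# Figures quoted in the thread; treat them as assumptions, not measurements.
ALPHANUMERIC = 62              # a-z, A-Z, 0-9
OLD_GPU_RATE = 6300e6          # SHA1 hashes per second on the old gaming PC
CLOUD_RATE = 175e9             # SHA1 hashes per second on the rented 8-GPU machine
CLOUD_USD_PER_HOUR = 32.0      # quoted hourly price for that machine


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of candidate passwords of exactly `length` characters."""
    return alphabet_size ** length


def seconds_to_exhaust(alphabet_size: int, length: int, hashes_per_second: float) -> float:
    """Worst-case time to try every candidate of the given length."""
    return keyspace(alphabet_size, length) / hashes_per_second


def cloud_cost_usd(alphabet_size: int, length: int) -> float:
    """Worst-case rental cost to exhaust the keyspace at the quoted cloud rate."""
    hours = seconds_to_exhaust(alphabet_size, length, CLOUD_RATE) / 3600
    return hours * CLOUD_USD_PER_HOUR


if __name__ == "__main__":
    for length in (6, 9, 12):
        days = seconds_to_exhaust(ALPHANUMERIC, length, OLD_GPU_RATE) / 86400
        print(f"{length} chars: {days:.4g} days on the old GPU, "
              f"~{cloud_cost_usd(ALPHANUMERIC, length):,.0f} USD rented")
```

On average a hit comes after about half the keyspace, so expected cost is roughly half the worst case. Each extra character multiplies everything by 62, which is why 9 characters is days and 12 characters is out of reach at these rates.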

Is it bad to store plaintext passwords? Yes, obviously. Is some hashing better than none? Yes, obviously. Is salting your hashes much better than not? Yes, because with a salt, your first password wouldn't have turned up on Google / in rainbow tables. Is it even better to use a proper PBKDF? Yes, with a pretty aggressive PBKDF, brute forcing even low-complexity passwords becomes expensive very quickly, and we get the benefits of salting "built in".
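As a minimal sketch of the "proper PBKDF" point, Python's standard library exposes PBKDF2 through `hashlib.pbkdf2_hmac`. The iteration count and the record layout below are illustrative assumptions, not anything Twitch or LinkedIn is known to use.

```python
import hashlib
import hmac
import os


def hash_password(password: str, iterations: int = 600_000) -> dict:
    """Derive a salted, deliberately slow hash for storage."""
    salt = os.urandom(16)  # random per-user salt, stored next to the digest
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return {"salt": salt, "iterations": iterations, "digest": digest}


def verify_password(password: str, record: dict) -> bool:
    """Recompute the derivation with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), record["salt"], record["iterations"]
    )
    return hmac.compare_digest(candidate, record["digest"])
```

Every guess an attacker makes costs the full iteration count, so a GPU that manages billions of plain SHA1 hashes per second gets through far fewer candidates, and the per-user salt comes along for free.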

Can SHA1 / MD5 hashes be cracked even if the _exact_ password-hash pair has not been leaked previously? Yes, very much so.


Right? "It's just a simple brute force problem", but sometimes that still takes a lot of force. Sometimes far more force than breaking a single account password.

I managed to lock myself out of a dogecoin wallet. I have the hash of the passphrase, so I figured I'd give it a go cracking it. After a few weeks (and a larger than usual power bill) I sent it to some friends with good mining rigs to try and take a stab at it, willing to split the amount 50/50. It's only the passphrase, not the full wallet, so I'm not worried about someone stealing the doge.

The passphrase is probably 15-25 characters, mostly not dictionary words or simple letter/number/symbol substitution, only symbols easy to type on a US keyboard. I'm now about 6 months into trying to crack that password with probably a few hundred dollars of electricity used overall between myself and friends (I don't know their power bill), excluding hardware cost as it was already owned, and I'm not even halfway through the search space.

Can it be done? Sure. Will I be able to crack that password with a cost that's less than the value of the DOGE in the wallet? Probably not. Right now it's really more of a gamble that I'll get lucky with the rigs running. I had to tone down some of my rigs as it was getting quite hot over the summer, but over the winter I'll be chugging away as the waste heat is just additional home heat. I'll probably need to rent a considerable amount of GPU power on a cloud provider to crack it, at which point maybe it'll take me days to crack it but ultimately cost me many, many thousands of dollars in GPU-time.


Salts being exposed is not a massive risk in and of itself, as the purpose of the salt is to prevent the use of pre-computed tables to reverse a hash into plaintext, forcing an attacker to bruteforce each individual hash+salt instead of being able to reuse work.

With regards to crypto mines being used for breaking hashes, if you have one based on GPUs, yes, you could reuse GPU mining hardware for cracking hashes, albeit with relatively low hashrates for current best practice hashing algorithms.

If you're looking at something like Bitcoin's hashrate and thinking that it could be used to break SHA2 hashes, as far as I understand ASIC miners, this is not possible, as ASIC miners are designed only for mining, and they don't really accept non-mining related inputs (ie, no arbitrary inputs to be hashed, unless it matches Bitcoin's specific steps for iterating over nonces).


> Ideally that would be useless because things are properly salted and you don't know the salt

I'm really curious where people get their ideas about salting. It's not just a word. It doesn't make one password any more difficult to crack. It makes cracking every password in a given database more difficult to do. A password's salt is public information.
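A toy illustration of that point, using plain SHA1 only because it is the hash under discussion; the word list and helper names are made up for the example. The salt is stored in the clear next to the hash, yet it still stops one precomputed table from being reused across users.

```python
import hashlib
import os


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


# Hypothetical tiny word list standing in for a real dictionary.
COMMON_PASSWORDS = ["123456", "password", "hunter2"]

# Built once, then reusable against every unsalted hash in every dump.
PRECOMPUTED = {sha1_hex(p.encode()): p for p in COMMON_PASSWORDS}


def crack_unsalted(stored_hash: str):
    """A single dictionary lookup: no per-user work at all."""
    return PRECOMPUTED.get(stored_hash)


def crack_salted(stored_hash: str, salt: bytes):
    """The salt is known, but the guessing must be redone for each user."""
    for guess in COMMON_PASSWORDS:
        if sha1_hex(salt + guess.encode()) == stored_hash:
            return guess
    return None
```

Two users with the same password and different salts get different stored hashes, and the precomputed table misses both. A weak password still falls to `crack_salted`, which is the commenter's point: the salt raises the cost of attacking the whole database, not of attacking one account.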


Relatively useless...but if even a few percent of people recycle passwords used for banking or crypto platforms it could be a profitable cache of data.


Maybe Twitch is competent in the password department, so they decided against it? But thinking about it, although it's unclear whether two-factor secrets are included in the leak, they may be usable by someone who already has a victim's password. Unless it's the dongle type (WebAuthn/FIDO), the secret is shared between the server and the user, so two-factor bypass is almost certain in that case.
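To make the shared-secret point concrete, here is a sketch of how a time-based one-time code is derived under RFC 6238 (HMAC-SHA1, 30-second steps, the common app defaults). Whether Twitch's two-factor setup works this way is an assumption; the sketch only shows that the code is a pure function of the secret and the clock, so whoever holds the secret can compute it.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code from a shared secret and a timestamp."""
    counter = unix_time // step                      # number of elapsed time steps
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                          # dynamic truncation (RFC 4226)
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)
```

With the RFC's published test secret `b"12345678901234567890"` and time 59, this yields the RFC's test value (`94287082` at 8 digits). A WebAuthn/FIDO key differs because the server only stores a public key, so there is no equivalent secret to leak from the server side.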


Doesn't seem likely to me. If the attacker has password hashes then they would want to keep this attack quiet so that the buyer of the hashes would have time to compute the passwords. If Twitch gets wind of this happening then a simple password reset would foil any efforts.


I'm hoping we will get to see a transparent report (from hacker or Twitch) on how this happened.

I think anyone would be excited to hack Twitch as the site alone - or any big platform for that matter - but this is quite literally someone just downloading the entire Twitch ecosystem and publishing it online.


Twitch has not been known to be transparent about anything.


It's something I would expect security hardware to have automatically stopped. Even an employee shouldn't be able to download 125GB of stuff without flipping a safety switch somewhere.


Gosh - I've worked at shops where we handled multi-terabyte images and we'd regularly stream large chunks of that while debugging tools. I've also worked at places where data was king and 125GB of stuff might be a reasonable dispatch of data to help someone debug.

The volume of data is irrelevant - source code is usually teensy tiny and of far more value to companies than, say, three months of livestream chat logs.

I'm not certain what security hardware you're thinking of - but I'm pretty sure I hate it already since it doesn't effectively guard anything while making everyone's lives difficult. For effective corporate security you need 1) data use policies and 2) access control lists - both of those are generally more effectively implemented at an entirely software level.


Yeah, volume is a terrible metric to go by. I work as a data engineer, and a lot of the time, if I am working between environments or migrating between data centers, I will have a copy of the data locally that I can write tests against or move to somewhere I can compare it to a running output. This would be possible to do entirely remotely I guess but not nearly as easy. (note I never do this with anything that contains PII)


It is still fraught with problems. That you wouldn't (knowingly) do it with PII is not all that reassuring: others could, or a compromised system could be used to exfiltrate this data, if the only control is trusting users to behave well with their access.

The fact that, across the industry, controls on how PII is accessed internally are so lightly managed should worry everyone.


Trying to protect against leaking developers/employees is like trying to protect against lone gunman terrorists: useless. And, if you try anyway, it is likely to cause more annoyance to everyone involved than actual protection (think TSA).


I disagree. Locking down and logging access to raw data like password hashes or payout information to only those who absolutely need it doesn't cause much annoyance and is very useful.

It protects the company against rogue employees (not even strictly malicious, but also curious employees who want to see more than they should). It limits exposure if an employee's account gets hacked (my pet theory for this Twitch hack). And if something does go wrong, logs help track down the issue/leak.

And at the end of the day, there should be a lightweight way to request access. Many times I've seen people request access that they didn't actually need. And most other times they have access pretty quickly.


Note that it was code that was leaked. Preventing developers from leaking the codebase they are working with is outright impossible. Now combine that with a "monorepo" and even the most junior developer has access to practically the entire company codebase and version control history.

And you can try to prevent them from accessing live/real customer data, but the cost is that they will never be able to debug issues in production. Most companies, even very large ones, are just not able to pay that cost. Not to mention that once you have access to the codebase there are a million ways to leak customer data anyway -- it is a lost battle.


Of course, some stuff you can't avoid, especially code leaking. Luckily code isn't usually that interesting or useful to external parties which is the only reason it isn't leaked more.

For the rest of the stuff, there's a sliding scale. In no universe does your average twitch developer need raw access to password hashes, for example.


What with security as it is at these companies, the code is literally the most sensitive information they can hold, especially in terms of value to the company. With the code out, expect lots more high-profile cracks in the coming months...

"your average twitch developer" needs access to the password hashes, or at least the code that checks these hashes, the moment they need to debug an issue which involves logging in, and from then it's all downhill.


Nope, it was code AND data, including the sensitive type (e.g. user payouts).


Adding to your pet theory I think that WFH has led to a lot of people being casual about their workplace security. For example, leaving a laptop unattended at a Starbucks.

This is just a guess but I wouldn't be surprised if companies have to start taking stricter precautions with their security in a WFH world.


This isn't accurate. There are certainly companies that have extremely in-depth Data Loss Prevention toolsets and teams - everything anyone downloads or moves is logged and alerts fire if things look out of the ordinary. Google clearly had tons of data about how Anthony Levandowski was able to exfiltrate lots of info when he left.

The issue is that building these systems accurately so they are NOT a constant annoyance is difficult, expensive, and takes a large team to support well.


There are ways to look for anomalous behavior without creeping too hard (even though it's a business's right to view and monitor all network traffic on their system).

If someone who doesn't have a business need to upload lots of traffic begins uploading large amounts of data, you may ask questions. Maybe you kick off a scripted playbook that then checks for increased logins to other privileged systems, or for large transfers of data from internal sources to the user's desktop.
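A very rough sketch of that kind of check, assuming you already collect per-user daily upload totals; the z-score rule, the threshold, and the floor on the spread are illustrative assumptions, not a description of any real DLP or SIEM product.

```python
from statistics import mean, pstdev


def is_anomalous_upload(history_gb, today_gb, z_threshold=3.0, min_stdev_gb=0.1):
    """Flag today's upload volume if it is far above the user's own baseline.

    history_gb: this user's previous daily upload totals, in GB.
    min_stdev_gb: floor on the spread so a perfectly flat history
                  does not cause a division by zero.
    """
    baseline = mean(history_gb)
    spread = max(pstdev(history_gb), min_stdev_gb)
    z_score = (today_gb - baseline) / spread
    return z_score > z_threshold
```

A hit would only be the trigger for the playbook described above (check logins to privileged systems, look for large internal transfers), not a verdict. Comparing a user against their own history is what lets a data engineer who moves terabytes daily stay quiet while a sudden spike from someone else stands out.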


I don't know dude, I work in an enormous company that you've heard of, and it's impossible for me to imagine how to extract code. I can't do it, unless I get remote access and film my screen while scrolling.

Anything else is found quickly. I certainly wouldn't even dream of someone extracting the repo.


Really, you can't simply copy files from a code repo you're working on? You work on an isolated workstation, not connected to any external network, where you are not allowed to bring anything other than plain clothes (TSA-style)? With a sizable army of developers all working this way?

And if it's a remote FB/VNC connection, what is preventing you from just recording the screen? Not really hard...

Most companies I've seen could see all their code extracted with one malformed NFS packet. These are "air gapped" systems holding the type of industrial secrets that we don't want to leak to China. Practically the only real line of defense they have is employee screening, which does not really stop the lone man guy.


If the bulk of it is a git repo, it's probably expected that every engineer will download it regularly.


Case against monorepos?


There are much better cases than this; in this case a monorepo makes it slightly more likely to be caught rather than less. (A monorepo can get to Google size and then you can't check it all out at once and it needs bespoke tooling, which can make it harder to pull this off.)

On the flip side while many smaller repos _can_ have independent ACLs, you are very unlikely to set those up until you reach a certain scale -- and then when you reach that scale it gets hard to implement ACLs across everything at once. So your engineers probably all have access to all your repos until you reach a very large size anyway. So the question becomes just "can someone write a for-loop over all of the repo names and check them all out," and it's like, yeah, that's not terribly hard, I as a programmer can do that pretty easily in bash.

Ideal repo size should not in my view be directed at "how do I prevent compromise to the external world," because VCS is not designed to give you the superpower of being resilient around being compromised. Rather VCS is trying to give you the superpower of time travel. So you should probably scope your repo to "what is the unit that makes sense to time travel with?" -- in other words if you are adamant that you have these independent services which operate decoupled and running this one backwards by a year should not affect that one, then those services should be in separate repos. If on the other hand they have some moderate coupling and rewinding this service by 1 year would break the APIs that that service uses to communicate... then those should ideally be in the same repo so that you can coordinate changes between them to their shared protocol.


> So your engineers probably all have access to all your repos until you reach a very large size anyway.

Happens at my company. We have rudimentary ACLs, but I'm not sure how they're implemented, because you can find things via explicit searching, or via "organic finding" via links from repo->repo, but they won't be surfaced if you just search for code.


You can still have a monorepo and restrict who has access to certain parts of it. You just have to build the tools to do it.

Google, for example, has a small number of subdirectories in the tree that only certain engineers can view (the really sensitive stuff, like the actual ranking algorithms for search and ads) but the build system is set up to allow you to still link against it.


Not particularly - unless different teams are highly focused on certain subsections of the repository. If everyone might have to look anywhere then you'll need to download all the repos - whether that's one or five hundred.


How often do devs delete and re-clone?


Clean OS install or new hardware should both be daily events at even mid sized companies. Because even if it’s once every 2-4 years per developer that still becomes extremely common in aggregate.


I think the tech giants have warped some people's expectations of what a "mid-sized" company is. I work for a mid-sized company where we roll our own ERP system and we probably average about two clean OS installs per year across the entire development team.


Yes, but GP said "every engineer, regularly" which seems odd.


I suspect lots of junior devs will clone fresh, push changes, nuke repo and repeat. I did when I was young instead of syncing state and rebasing.


> Even an employee shouldn't be able to download 125GB of stuff without flipping a safety switch somewhere.

I am trying to recall, but I am pretty sure when I worked in Microsoft Office that a build would pull down many tens of gigabytes of data.

125GB in one day from the build system wouldn't be uncommon!


That's ingress though. Companies should be monitoring and worrying about egress.

Edit: This won't help against a thumbdrive, but that type of thing should be also tracked.


I'm working on a project and just had to repull my workspace after some local corruption. I pulled 1.2TB out of the office and never got an email. I think it's pretty common for places not to monitor egress that closely.


There was a fad for tools that accomplished this in enterprise networks, with much clearer rules for who needs to access what (it was called "data loss prevention", or DLP) and those tools for the most part don't work. This is a harder problem than it looks like.


DLP products tend to be more about scanning the contents of data for sensitive patterns, at least in my observation of the market. There are other products (typically built into SIEM) that do correlation on login events, network traffic and whatnot to detect anomalous behavior.


I’ve worked on a lot of DLP projects in big enterprise, and I have a very dim view of the entire category of product. A lot of their functionality is just magic black boxes that unsurprisingly achieve very little. The primary motive for deploying them is not that they’re particularly effective, it’s so that you can tell auditors and other scrutineers that you’ve got a “DLP solution”. The idea that you can grant people access to huge quantities of information, but then very strictly control what they do with it, is fundamentally flawed. Especially on networks that require large amounts of in and outflow for BAU. Even the most tightly controlled data in the world cannot be protected from an inside leaker (or an adversary who has taken control of an insider’s access), because it runs into the same “analog hole” issue that DRM products have.


My company has this. It encrypts any file touched on USB. And other software logs every app run. Prevents casual copying but easily circumvented. But somewhere logs may have enough info to trace the source of leak I guess.


These tools (DLP) have gotten better with app migration to K8s, since traffic can be watched prior to encryption in a standardized way. Just an FYI….


The enterprise DLP tools were deployed fleetwide as agents and at network choke points; getting access to the raw data wasn't the problem.


Thank you for mentioning this. I always had a gut feeling that it seems like an extremely hard problem to solve in a sensible way.


> It's something I would expect security hardware to have automatically stopped. Even an employee shouldn't be able to download 125GB of stuff without flipping a safety switch somewhere.

Remember that Twitch handles streams. Good luck implementing this without having all sorts of false alarms everywhere.

Plus, you don't have to exfiltrate 125GB in one go.


I feel like once you have it pulled down, it would be as simple as an upload to S3 (which wouldn't trigger any flags), then making the bucket public whenever you want. Hell, S3 used to (still does?) support being part of a torrent swarm...


Why would that help? They just have to accumulate work over a period of time and then 'lose' their laptop.


That's 6.25GB/day over a 20 day working month. More time, less data per work day, harder to detect.
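The same arithmetic as a tiny sketch, mainly to show why a fixed per-day volume alarm is easy to stay under; the 50 GB/day threshold is a made-up number for illustration.

```python
def daily_rate_gb(total_gb: float, working_days: int) -> float:
    """Average volume per day if a transfer is spread evenly over working days."""
    return total_gb / working_days


def trips_fixed_threshold(total_gb: float, working_days: int,
                          threshold_gb_per_day: float) -> bool:
    """True if the evenly spread transfer would exceed a fixed daily alarm level."""
    return daily_rate_gb(total_gb, working_days) > threshold_gb_per_day


if __name__ == "__main__":
    for days in (1, 5, 20, 60):
        rate = daily_rate_gb(125, days)
        flagged = trips_fixed_threshold(125, days, threshold_gb_per_day=50)
        print(f"{days:>2} days: {rate:.2f} GB/day, flagged={flagged}")
```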


And it might be disguised as a video stream coming out of the video streaming servers.

But it could also be a 128GB thumb drive plugged into the system somewhere.


> And it might be disguised as a video stream coming out of the video streaming servers.

Just log in to FB messenger or Discord and egress it as small data chunks that way. Lots of people have private chats on their work computers for practical purposes.

Discord allows for bots, so you could easily write a script to chunk data and egress, and another to re-assemble.


ML engineers / data scientists are regularly moving terabytes of data around at Amazon.


Indeed, how could this happen? Really curious.

So let's say someone with access to all GitHub repos gave the password to someone else, maybe then it was downloaded from another machine?

Or someone stole the credentials and downloaded from another machine?

Or someone got access to such a machine?

Is it not possible to prevent these cases?

How long does such a download take?


Cue monorepo discussion


Cue "Don't check payment receipts into git" discussion - although I strongly suspect this hack wasn't just about acquiring appropriate credentials and then running `git clone`. It sounds to me like a backup service was compromised.


There are so many indiscreet USB pentesting devices easily purchasable by anyone today, I'm actually surprised this sort of thing doesn't happen more often.


Shouldn't that be discreet devices? Or do they make a really high pitched whine with a big flashing light when they start transferring data?


"Hey, Jeff, what's that weird thumb-drive over there that keeps texting me `I'm in your datacenter downloading your datas'?"


ITT: people shocked that something like this could happen at a company the size and profile of Twitch.

Running security at scale in a hypergrowth B2C company is very difficult. It's also completely different from running security at a startup, in a B2B company, or a slower-growth situation. _Every_ security executive and manager I've met has given up in frustration after 12-24 months and gone to take a cushy FAANG job instead.

I'm not surprised at all. My experience in security at a larger SV unicorn was that changes only happened in the immediate aftermath of a security crisis. Otherwise, there was incredible inertia and you just wouldn't be able to get the institutional support you needed to make progress.


It's funny because for me each letter of FAANG is a hypergrowth B2C company...


All of them have very significant B2B products.


How much of this is a holdover of lax security practices from before they were acquired? I can’t imagine AWS being managed in a way where local network access gives you keys to the kingdom. Then again, EC2 instance profiles do let you do quite a bit.


Conflating AWS security with twitch security is probably the wrong way to think about it.

Within Amazon those are almost going to be two entirely separate companies, with very different security focuses.

The idea that Amazon is monolithic and uniform wasn't true when I left there in 2006, and I'm certain it is less so now.

And that isn't just that it's related to the merger, but that fundamentally it's different business orgs with different focus.


But does Twitch not share the same Amazon-wide git service? Could most of Amazon's code be leaked or compromised? Seems like all of Amazon's internals that share security measures are at risk...


I've heard (but don't have any actual evidence more than hearsay) that Twitch generally operates independently of Amazon/AWS. I'm sure that they share some things, but I wouldn't be surprised if their source was separate from the "main repo"


Remember that Amazon runs one of the biggest multi-tenant service platforms in the industry! A separate business unit like Twitch is likely to be set up a lot like any other random AWS customer, and you wouldn't expect compromising servers used by one AWS customer to automatically compromise the underlying infrastructure.

(I would also expect that the Amazon retail systems are in most senses "just another tenant" on AWS, albeit with much more liberal quotas!)


I always had the impression that Twitch were operating in a largely independent fashion. For instance, it had been an open secret for years that one of their executives had been sexually harassing female streamers. Only a year ago he was finally fired. If Amazon had a firmer grip on Twitch, I'm sure they would have stepped in much earlier.


If you go back to the Adobe software breach circa 2013, a large part of their issues were the bolt on connections between acquisitions. It's honestly the most common thing I see in the startup world.


> It's also strange that someone who has this level of access to what is presumably a multi-billion dollar company decided to just leak the data?

From what I heard about Twitch-interns over the years, it seems the company is more a third-rate-s**hole that grew too big too fast and accumulated a huge amount of technical debt and fatal security flaws. Making billions doesn't mean anything if you don't invest them back into the important corners of the company. It's considered a miracle that the platform is still working that well in that state. And what comes from the leaks so far supports this view.

That said, it seems they did start to improve one or two years ago, just too late to prevent this critical hit. But considering this was also a strike that avoided the deadly parts (yet), maybe there is a different aim here and the company can grow from this? It will be interesting to see how Amazon will react to this.


> From what I heard about Twitch-interns over the years, it seems the company is more a third-rate-s*hole that grew too big too fast and accumulated a huge amount of technical debt and fatal security flaws.

I mean this as a genuine question, but is there any company that didn't end up like this after an exponential growth phase? I'm not saying it's okay, but this feels par for the course. I've now been at two start ups during that hockey stick growth time and both went through this as well.

I'd be curious if anyone here has worked at a large, fast growing tech company where they didn't accumulate a ton of technical debt during growth. If so, what did the company do to prevent that?


Generally yes, but Twitch is not your average startup. It's now 10 years old, and 7 of those years it was owned by Amazon, which should have enough competence and manpower for bringing it onto a good course. But from what I heard, Amazon did neglect Twitch for a long time and focused too much on making it a profitable business at all costs. Because of which they had all those scandals and problems in the last years. It's a business-platform, where technology is just an afterthought.


Does anyone know if Twitch employees have two factor auth? Having access to an employee's account would be the easiest way to pull this off.

It'd be strange if they don't have two factor auth, of course, but it's just as strange to have this large of a hack.

I think if it is a simple case of an employee account takeover, then the attack would "work" to some extent at any company. Larger companies typically have strict data access requirements, though. Good luck finding the few employees who have raw access to Google password hashes, for example. And even more luck knowing how to get that data if you do.


> Does anyone know if Twitch employees have two factor auth?

Yes, IIRC everyone at Amazon has a hardware security key (which is more secure than the standard mobile app TOTP most of us use everywhere online).


>(which is more secure than the standard mobile app TOTP most of us use everywhere online).

Is it though? The "wrench theory" applies here. It's not unthinkable that an employee was stalked on social media and had their key stolen.


It's still more secure. Rubber hose cryptanalysis applies to both equally, but that doesn't mean there aren't other attacks that apply to TOTP which don't apply to YubiKeys.

More secure != perfectly secure.


With a phone you need my passcode to accept the 2FA request (assuming lock screen notifications are disabled). I think YubiKeys can work without a passcode as long as you plug it in, right?


Right, but presumably the site is already asking for a password, and if the attacker can bypass one password, I'm not sure it's a safe assumption that they can't bypass two. However, fair enough. Some YubiKeys do involve fingerprint scans too, though.

The main security benefit is unphishability. With YubiKey/WebAuthn, crypto is used so you can't give the code to the wrong website. Phishing is a pretty major cause of account hacks generally, so pragmatically that is a very big win.


It's still the same, 2fa.

With a Yubikey, you need to use your password to log in to your computer, and then need to auth using Yubikey.

With OTP app, you need to use your password to log into your computer, passcode for phone, and then auth.

In both cases, it's something you know, and something you have. You could argue that the app-based one is a bit more secure in that you need two passwords. On the flipside, if your phone gets pwned, someone can get access completely remotely.

Everything is a tradeoff.


Why would you need to log into your computer with a yubikey? Wouldn't any computer (including the attacker's computer) work?


Amazon still has a passkey requirement, it's not just a touch of the key, and these passwords are different to your user passwords at login.


They require a physical touch.


Yes.

I don't know which protocols they use (obviously), but if they use WebAuthn, everything is public-key signatures. Even if you leak everything from the server, public keys buy you nothing.

https://webauthn.guide/


Every Twitch developer has 2FA; even 3rd-party developers are required to have 2FA. I also think, but don't know, that this applies to Twitch Broadcaster Partners as well in order to have their tax information in the system.

Luckily iirc from a conversation with a senior Twitch engineer the Tax information backend has been migrated to Amazon. So hopefully that did not leak... Because that would be full legal name and addresses of a ton of streamers that likely have stalkers.


Twitch partners also have forced 2FA for quite some time now, should be a couple of years now - at least more than a year though. Covid killed my sense of time.


Facebook [2011] was pretty bad…

https://www.theguardian.com/technology/2012/feb/17/facebook-...

…except Mangham didn’t ever get to release his spoils to The Internet?


> I can't think of another, large, corporate web 2.0 startup who's gotten owned in a similar fashion

Linkedin, Microsoft, Yahoo, Google


I mean, it did work on Amazon (a division with poorer security probably, but still). 4chan is a truly special place


From an ethical standpoint, any code that amplifies and profits from radical speech should be fair game for release. If employees or hackers feel the need to release info in that regard, so be it. This is the risk defined in such models and should be mitigated accordingly.


Who decides what speech is radical enough to compromise the privacy of users?

And if speech is "radical" meaning to the point of illegality, shouldn't the legal system decide, rather than the court of public opinion?


Radical as in pushed to the extremes, not radical in thought to the general population. See https://www.youtube.com/watch?v=rE3j_RHkqJc&t=1s

That I've been DONT MENTION ARROWS ON HN on this post is a good indication we're not close to solving this by a long shot.


> this isn't something I'd expect from an Amazon owned property

Because you expect Amazon to put security priority over new features and profit? We have very different understandings of what Amazon stands for.


>Because you expect Amazon to put security priority over new features and profit?

I don't know what you think Amazon stands for, but Amazon runs the largest cloud hosting service in the world - AWS, which not only runs a large number of other large companies but governments as well. I know, first hand, that their datacenter security protocols are state of the art.

Amazon has a much larger surface attack area so if they were playing fast and loose with security, chances are we would know already.


> Amazon has a much larger surface attack area so if they were playing fast and loose with security, chances are we would know already.

I get your point, and I am not talking about AWS but about Twitch. Each part of the company has its own incentives. Amazon is well known for not caring about quality nor its employees. In my experience with corporations there is little to no technical sharing between different parts of the company. AWS could have the best SecOps in the world and Twitch could have no security at all. Is your experience different?


I'm not sure what point you are trying to make. If you look at most of the high profile hacks and leaks in the past 20 years, very few of them are from web 2.0 tech companies (e.g. Google, Facebook) rather than dinosaurs (ex. Target). Those that have (like Google) have only been successfully breached by nation actors (e.g. China, NSA).

As far as I can tell, there's no data to back up the assertion that these large tech companies are disregarding security in favor of profits, except for Twitch now, which is why this leak is interesting to me.


> In my experience with corporations there is little to no technical sharing between different parts of the company.

Amazon is all about sharing efforts within the company. That's the whole point of AWS - it's a monetization of these efforts. Most older AWS services started out as internal services that someone realized were generally useful.


EC2, Amazon's cash cow, competes with nearly identical offerings from Microsoft and Google, and is not a place where additional features are often all that valuable to customers. Any sort of breach like this on EC2 would seriously hurt Amazon's bottom line and they know it.


Someone actually started streaming going through the code ... on twitch.

https://www.twitch.tv/deepfrieddev



On one hand I understand why you'd ban that kind of content, on the other it's essentially public information now... what's the point.


Because everyone else doing it still doesn't make it right.


The streaming part or the downloading/looking at code?

You can look at leaked source code for educational purposes in most places (not legal advice). As far as I understand leaks are commonly used in vulnerability research for example (if the bad guys can use it so can bug hunters).

Streaming copyrighted material is a separate issue - but using it for "criticism, comment, news reporting, teaching" should fall under fair use, no?


What's wrong with looking at public code? The code is public, regardless of how it became public - this isn't someone's personal life being exposed. If twitch is damaged by streaming this, it's only because their poor code quality is being examined publicly.

I can certainly understand why twitch banned this and don't blame them (although I think it's stupid), but I see nothing unethical about openly talking about this code in the public now that it's already there.


> What's wrong with looking at public code? The code is public, regardless of how it became public

Copyright would disagree with you, and I would say that ethically it is basically the same as stealing it yourself. You're profiting off of someone else having done the dirty work for you.

> this isn't someone's personal life being exposed.

Apparently a lot of payment information, telephone numbers, etc. was also in the leak. I don't think we should be downloading or encouraging people to download and peruse that stuff.


> You're profiting off of someone else having done the dirty work for you.

I don't think anybody is streaming this stuff on twitch with the intention to make money, anymore than someone sharing it on a blog is trying to make money. Sure, in that edge case I'd agree with you, but it seems like the exception to the rule (after all people can just go look at the code themselves for free). I'm not talking about the guy who stole the code and is likely ransoming Amazon with it - I'm talking about people that just like to talk about code because it's something they like to do (there's an entire category for it on twitch already).

> Apparently a lot of payment information, telephone numbers, etc. was also in the leak. I don't think we should downloading or encouraging people to download and peruse that stuff.

My limited understanding is none of this information actually has been leaked yet, and is likely part of a future ransom (I could be wrong, I haven't looked because I don't care). I don't condone sharing that either, but that's not what the guy streaming was sharing. I'm talking about discussing the source code which is already publicly available.

> Copyright would disagree with you

I know very little about copyright so I'll just assume you're right. I still see no ethical problem with openly discussing this code publicly though. Anyway, agree to disagree.


They = you. It's fine to be honest, you're not exactly making it unobvious.


"Sorry. Unless you’ve got a time machine, that content is unavailable."

Too bad, it would be nice to see someone go through and document how Twitch works. I've never worked at "web scale" so I'd probably learn a lot.


> I've never worked at "web scale" so I'd probably learn a lot.

As someone who has worked at both large and small companies, you'd probably be disappointed.


It's likely lots of bubble gum and chicken wire. I'm sure in the video ingest and transcode side of things there are some really interesting bits though. When you're owned by Amazon you don't need to optimize too much to achieve web scale... just leverage AWS services. It's not like you're going to get a bill.


> When you're owned by Amazon you don't need to optimize too much to achieve web scale... just leverage AWS services. It's not like you're going to get a bill.

Oh, you'd be surprised. Divisions get billed constantly for the AWS resources they consume, and this bill gets taken out of their annual budget. From what I hear, this is a common practice in most large organizations.

Also, the AWS services you can access from within Amazon are almost identical to the AWS services you can access as an external customer. It's equally easy/hard for a random company to achieve web scale, compared to Twitch.


A lot of it is probably hacked together -- like, embarrassingly hacked together lol


This is true about almost any company. Closed source generally means you can have lower standards.


You’re being downvoted for being overly negative, but the ops code is of (literally) shockingly poor quality.

This leak has made me understand clearly that code quality is not what makes a product great.

I guess that’s something.

The jenkinsfiles are mostly nice and clean though. I’ve definitely seen worse of those.


Oops, didn't mean to be too too negative. I say embarrassing in the sense of, I've definitely shoved out awful code because something needed to get out(tm). And with large companies, deadlines that cause that situation are inevitable.

But I also say it like that because, well, I've seen code that causes (objectively easy-to-fix) crashes but still ships because of one reason or another: laziness, politics, inexperience. It's a part of software engineering I'm still trying to accept.


Yep, there are lots of small services that don't seem production ready in the source code. Though admittedly we don't know which of those are deprecated.


Well, you know what they say, "Self help is the best help."


I hear Netflix has a good tech blog ;)


Hah. This is like when reddit does something people don't like and there is a huge thread about it ... on reddit.


It is really fun to go through the source code. You'll find interesting architecture diagrams, documentation etc. It's like joining a new job and being amazed how a service you actually use was built.

Everyone interested, just download the code :)


Channel is gone, banned?


Yep, we saw it happen live.


It just got disconnected.

The chat had a few Amazon insiders, which was interesting to read their perspectives.


Any bits you recall from the chat?


This no longer works. Guy got banned I think.


And banned


got banned


aaaand it's gone


There's something about this sentence that I find hilarious:

The download was posted to 4chan today, described by its unidentified source as “part one” of “an extremely poggers leak,”


I find it extremely ironic that they whine about Twitch being a "disgusting cesspool"... on 4chan.

> Calling Twitch a “disgusting toxic cesspool,”


Ironic? Why?


Because as far as cesspools go, 4chan is the most toxic and disgusting


That's a bold claim.


This hack was not very xqcL of them.



> including its source code

This will help with ad preroll blockers.

I would love to see someone look deep into Twitch's recommendation system. Last time I tested, the thing they call "Feedback" was a rolling buffer that wouldn't let you exclude more than ~100 things: adding more simply removed the oldest entries, and it started spamming you with things you had already excluded in the past. This looked like a performance optimization (fewer things to track per user).
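
For anyone unsure what a rolling buffer means here, a minimal sketch, assuming the exclusion list really is a fixed-size FIFO. The class name `FeedbackBuffer` and the limit of 100 are hypothetical, based only on the behavior observed from the outside, not on anything in Twitch's code.

```python
from collections import deque


class FeedbackBuffer:
    """Hypothetical fixed-size exclusion list (not Twitch's actual code)."""

    def __init__(self, limit=100):
        # A deque with maxlen silently drops the oldest entry once it is full.
        self._items = deque(maxlen=limit)

    def exclude(self, channel):
        self._items.append(channel)

    def is_excluded(self, channel):
        return channel in self._items


buf = FeedbackBuffer(limit=100)
for i in range(101):
    buf.exclude(f"channel_{i}")

print(buf.is_excluded("channel_0"))    # False: evicted by the 101st entry
print(buf.is_excluded("channel_100"))  # True
```

With a buffer like this, the 101st exclusion quietly undoes the first one, which would match the "already excluded things come back" behavior.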


This won't help with preroll ads because the video segments themselves are replaced in the stream data. They're not ads, but it's not the stream either.

You get a "twitch commercial break in progress" video for the time the ads are playing.

You can check this by loading a stream with MPV.


aaand new ad bypass dropped 4 hours ago :)

>You can check this by loading a stream with MPV

I watch all of my twitch using mplayer. The "magic incantations" used when generating the access token are what produce an ad-free .m3u8. For example, early methods involved setting origin and/or referrer headers to internal Amazon systems.


I'd be interested if someone could get their own instance of Twitch up and running from this leak. Someone mentioned internal API's, which would have to be reworked to avoid detection, but it'd be interesting to host it on AWS just to see how long it takes to get shut down.

How would current AWS policies hold up? Obviously the code would be illegally acquired, but do they have detection mechanisms in place?


Even with source code it is hard to run a service if not impossible. You would need well written documentation that explains various options and error codes you could potentially get.

Many times there is some magic command only one guy knows and he will share with you on slack.

Rubbing a service of any complexity takes years of institutional knowledge.


Please don't rub the services, it causes unnecessary friction, and wear & tear.


100s of services and databases to work out and sort through. Good luck building a global real-time video CDN too. You could build your own faster. Microservice architectures mirror the org that built them. You wouldn’t do it the same way for yourself.


The top streamers' earnings were also leaked: https://www.twitchearnings.com/


lots of discussion and speculation from a few hours ago here:

https://news.ycombinator.com/item?id=28770590


We're just walking into a future where these kind of leaks happen every other day, aren't we ?


does it matter? social networks arent some obscure technology, but making them successful is


We are already there it seems


I wonder how often these "hacks" are just an engineer leaking the info.


Hang on, is this just a repo dump or not? Because it looks like a repo dump, in which case I would be very surprised if any passwords or other personal information is included, at least at a reasonable scale.


Anybody took a peek?

What language, and framework if they use one, do they use?


Here are a few screenshots of go and php: https://sizeof.cat/post/twitch-leaks/.

WARNING: do not click the link, copy it and paste it in new tab.


?


They check the referrer, see it's from HN and redirect to an image instead.

So by copy + paste into a new tab, it will lose the HN referrer.
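
Roughly, the server-side check could look like the sketch below. This is only an assumption about how such a redirect might be implemented: `BLOCKED_HOST`, `DECOY_URL` and `route` are made-up names for illustration, not anything taken from that site.

```python
from urllib.parse import urlparse

# Hypothetical values for illustration only.
BLOCKED_HOST = "news.ycombinator.com"
DECOY_URL = "/decoy.png"


def route(headers, requested_path):
    """Return (status, location) for a request, given its headers."""
    referer = headers.get("Referer", "")
    host = urlparse(referer).hostname or ""
    if host == BLOCKED_HOST:
        # Visitor clicked through from the blocked site: redirect to the decoy.
        return 302, DECOY_URL
    # No Referer (e.g. pasted into a new tab) or another site: serve normally.
    return 200, requested_path


print(route({"Referer": "https://news.ycombinator.com/item?id=1"}, "/post/"))
print(route({}, "/post/"))
```

A URL pasted into a fresh tab carries no Referer header at all, so it falls through to the normal branch.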


on firefox, disabling referrer means you won't see the image

    network.http.sendRefererHeader = 0


I know the original frontend used ember.js but then they switched to react... that's about all I know :D

(twitch used to sponsor and attend local ember.js meetups)


A mix of Go, Ruby, Python, Elixir from what I saw.


Archive of the original 4chan post from this morning: https://archive.is/8rQNK


Is this the first time actual Amazon infrastructure has been hacked? Has Amazon been hacked previous to this? (Not talking about insecure AWS accounts)


Since the main leaked files are from github, I'm assuming they got it from one of the many reported github auth flaws which don't get fixed and allow access to private repositories. Or, less likely, via someone getting sloppy with their laptop.

Now I wonder if the commit history has database dumps or sensitive information, which is a common practice, or if any twitch servers have been accessed through a breach or privileged information found in some of their source code.


I'm pretty sure a company of Twitch's size uses on-premise GitHub.


Yup, and AWS Code*


Which Github auth issues are you referring to?


As an avid Twitch streamer, what do I need to do to protect myself?


Change your password obviously, maybe even reset your 2FA if those codes are in the leak.

And if you want to be perfectly safe, don't visit twitch. Because if that source code has any vulnerabilities they might be exploited against twitch visitors as we speak.


Also change any account with a password that's the same as your twitch account. Once they know your twitch password they will try it on your related accounts.


Report your earnings on your tax.


What language is the main website written in?


Typescript/React and Go. Ruby and Ember once upon a time.


A lot of Go and Ruby


at least we know their backups were 'complete'! This hack seems to include everything and the kitchen sink!


[deleted]


I thought it was pretty obvious that that was a joke.


From banned usernames, "Jesus".

Yep. From Mexico to Patagonia and Iberia, let's screw a few million users.


Does it take a genius to figure out how to build twitch? It’s a modern crud app with video streaming.


I figure you could "build a Steam" in a couple of years, with the right engineers hitting the main features. There's very little magic at the technology level, and you can make life simpler and forget about minor things like the hardware survey or the pretty graphs. I'm not saying this is trivial, but it's definitely doable.

This is a far different statement than "You can build something and compete with Steam in a couple of years". Most of the really hard problems are not technical. Success ain't gonna happen without a bunch of pain, sweat, and strategic stumbles on the part of the competition.


Sir (Madame?), I ask you one simple question:

Was Twitch built in 10 years, or over just a few?

Steam was built since I was in FUCKING high school. Im old now, well over 30.

Apples, and blueberries.

Bluebarry, Drewbarry, tomato, ToMaHtoH.

Fuck their stupid ass streaming code, it’s a giant crud app, only their devops team can take credit for scaling, everyone else is not worth a shit, sorry, thats life, I gotta Leetcode too, and ur code isn’t worth me reading it, leaked or not).


Based on what I read of the ops code… don’t give them too much credit.

The thing I learned most from this leak is that the technology side plays very little part in the business being successful or not.


This is such a Hacker News comment.

It's just a crud app - why do they need more than 10 employees?


inefficiency.


A lot of the secret sauce of such things are not that secret but just take a lot of work.

Building and maintaining infrastructure simply takes a lot of people, time, relationships and whatnot.

They get good at it over time, which I guess you could consider some secret sauce, but there isn't some secret code that makes the whole thing way better, such that you'll now see tons of competitors.


Everything is easy to build until a small nation state’s worth of people want to use it at once.


I work in a small nation state.

That doesn't stop CV-hungry engineers from finding ways to overcomplicate it.

(I do agree with you on this topic in general)


You completely misread me.


Oh I'm sorry, here's what I read:

"Building apps is easy as long as you don't have millions of users. For that you have to actually think about bottlenecks, the larger architecture etc."

(I agree with that)

What I wanted to express is that lots of engineers I personally know instead say

"Building apps involves thinking about every bottleneck in advance and optimizing for every possible user scenario and a global user base, regardless if the number of users is only ~100."


"Building apps involves thinking about every bottleneck in advance and optimizing for every possible user scenario and a global user base, regardless if the number of users is only ~100."

I would advocate the exact opposite. If you need to scale to X users focus on making a great platform for X users, even if it’s only 100. If you try to over-engineer instead you’ll prematurely optimize and will make poor decisions that’ll come back to bite you when you actually DO need the scale and the requirements change.


Just stream the video, it’s easy!


Netflix & Youtube in shambles.


I found out how to destroy reddit. Just fork their repos! https://github.com/reddit


Sadly, Reddit stopped updating the public repo for their main application in 2017:

https://github.com/reddit-archive/reddit


It's for the best. New reddit is awful.


so it won't even have new reddit, even better!


I mean doing Youtube is even easier; it's just a wrapper around HTML5 video.


Everything is just a crud app with a few extra steps.... yet you're not Zuckerberg or Dorsey


One shouldn't aspire to be a Zuckerberg/Dorsey.


I personally don't, many people do though. I used them because Facebook and Twitter could be easily summed up as "crud app"


I’m so misread, Twitch is a lot of luck, so are all of these companies. Show me the source code for luck. I don’t give a fuck if you leaked a video streaming crud app code lol.


This perspective is immature at best, but genuinely ignorant of how tech works at twitch scale.


I’ll grant you immature, nothing else.


You're missing the _hard work_ part. Sure there's always an element of "luck" in any story of success, but that mostly has to do with timing, and is much less weighted than the perseverance and hard work of the people building it.

Twitch is a full-featured, very mature application with many moving parts outside of just the video streaming, and building all those parts took an incredible amount of time and effort.


It’s just luck. I mean, if I was a storyteller, what story would I have to tell if there was no story.

They hit.

It’s sort of like we all hold Golden dice, so we marveled, by our own eyes, at the gold.

Dealer: You rolling those?

Us: no, it’s gold.

They fucking risked it. It’s not a engineering feat, we’re all a bunch of pussies.

Twitch is easiest site to build, you might as well show me a todo app (which will be sieged and dismantled), scale is solved, we will eat your applications, the barbarians.

Rome falls.


You'll be rolling dice for a long time if that's all you need to build a twitch clone ;)


It's a very polished, state of the art crud app serving millions of people, it's always interesting to see how it's made.

I personally don't give a single fuck but I can see the appeal for some people.

It's a bit like the great pyramids, it's just a big pile of rocks but we'd be really interested in knowing exactly how they made these big piles


You don't need a genius. You need a few good people, and a lot of hands. I think the best way to look at things like Twitch is to compare them to cathedrals, bridges, things like that. You might be able to have the idea and sketch the plans by yourself, but it's physically impossible to build it yourself.


Like all things web, the problem is scaling the platform and moderation/security. It wouldn't be hard to build a toy Twitch clone no. But it takes tons of people and money to scale it / secure it. And even with all the security, they still got hacked...



This reminds me of the Albertsons guy on Blind who inadvertently created a meme when he said that Facebook could be rewritten with a small cluster of Oracle dbs. The meme is that Albertsons people are so elite, they work and think in a higher level of existence, way above the scalability bs us commoners are accustomed to.


Right, just like a plane is a car with wings.



