The 111M Record Pemiblanc Credential Stuffing List (troyhunt.com)
87 points by johns 8 months ago | 71 comments

So in the past I've advocated password algorithms (sometimes called password formulas):


I felt like they could bridge the gap for a regular person who is weary of having to look up every password in a password manager (a lot of them make it easier with browser plugins and phone apps, but it's still an extra step).

However, in light of the recent Gentoo vandalism, it seems like a user had their password formula figured out. Algorithms do guard against credential stuffing; that particular person was most likely specifically attacked. If you have a strong formula, it should take at least 7 or 8 passwords to begin to figure it out.

At a minimum, if you have non-tech friends who use a single password for everything, start them off easy. They should use a manager; it's the only way to guard everything. But if they don't want to go that route, at a bare minimum, recommend that they use three passwords: one that's highly secure for banks, employment and government; one insecure one for everything else; and finally one for their e-mail, which should be shared with nothing!

Password algorithms are a step up. It's a trade off of course: you are protected against credential stuffing and you don't need a manager; you can have a different password for every site without having to memorize a hundred passwords, only the exceptions forced by stupid password rules. The trade off: your algorithm probably sucks, and if you're targeted specifically, someone can get to everything.

Every aspect of security involves trade offs. The various password management choices, along with their advantages and disadvantages, should be taught in high school.

A variation on the password algorithm: Generate half of your password by using the algorithm. Create the other half, one per site, using a random algorithm, and write it on a piece of paper (if the site has stupid "security" requirements for the password, you can usually fit these into your random string).

To regenerate your passwords, an adversary would need both to figure out your algorithm and obtain your piece of paper.
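A minimal sketch of the split scheme above, assuming Python. The names `random_half` and `full_password`, the alphabet, and the sample memorized half "Ex4mple" are all made up for illustration; the memorized half stands in for whatever formula you carry in your head.

```python
import secrets
import string

# Characters for the written-down half. Including a few symbols helps with
# sites that insist on one (an assumption; adjust per site).
ALPHABET = string.ascii_letters + string.digits + "!#%+"


def random_half(length: int = 8) -> str:
    """Generate the per-site random half that goes on the piece of paper."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def full_password(memorized_half: str, paper_half: str) -> str:
    """Join the half produced by your formula with the half read off the paper."""
    return memorized_half + paper_half


# What you would write down, one entry per site.
paper = {"example.com": random_half()}

# "Ex4mple" is a placeholder for the output of your own formula.
password = full_password("Ex4mple", paper["example.com"])
```

The paper only ever holds `paper["example.com"]`, so someone who finds it still has to work out the formula for the other half.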

If you're already writing half the password on a piece of paper wouldn't it be safer to generate the whole password randomly and write that down?

If only half the password is written down, anyone who obtains that paper without knowing the algorithm only knows half the password!

Oh, that's right, thanks!

Or, you know, use a password manager.

Which password manager is the most secure / robust / potentially long lasting company out of: LastPass, 1Password, Dashlane, Keeper (or others?)?

I use LP, I also used 1Password professionally and I found it cumbersome but your mileage might vary. I disliked Dashlane and never used Keeper. They all do roughly the same thing, the difference is in UI mostly, just test it out and see what you like best.

Password algorithms, aka deterministic password managers, are usually pretty strong, specifically as strong as the master password(s) you use. Meaning that it’s usually easier to guess the mp than to reverse the algorithm.

Personally, I have different mp for different “security domains” (google/fb, banks, other socials, ...), and I’m using just a sha256 plus encoding — a trade off between requiring a stronger mp, and being able to easily remember everything, including the algorithm.

I wrote more about it here: https://hackernoon.com/mempa-a-modern-deterministic-password...
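For readers who want to see roughly what "a sha256 plus encoding" could look like, here is a minimal sketch in Python. The separator, the base64 encoding and the 16-character truncation are assumptions for illustration, not the actual scheme from the linked post, and `derive_password` is a hypothetical name.

```python
import base64
import hashlib


def derive_password(master: str, site: str, length: int = 16) -> str:
    """Derive a per-site password from a master password and a site name.

    Assumed construction: SHA-256 over "master:site", then URL-safe base64,
    truncated to `length` characters.
    """
    digest = hashlib.sha256(f"{master}:{site}".encode("utf-8")).digest()
    encoded = base64.urlsafe_b64encode(digest).decode("ascii")
    return encoded[:length]


# Same inputs always give the same password, so nothing has to be stored.
bank_password = derive_password("master for banks", "examplebank.com")
```

A single SHA-256 is fast to compute, which is why the strength rests almost entirely on the master password; swapping in a deliberately slow function such as `hashlib.pbkdf2_hmac` or `hashlib.scrypt` would make guessing the master password more expensive.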

Ew. Friends don't let friends use low-entropy passwords.

"Password Strength" https://www.xkcd.com/936/

Diceware: http://world.std.com/%7Ereinhold/diceware.html "This page offers a better way to create a strong, yet easy to remember passphrase for use with encryption and security programs. Weak passwords and passphrases are one of the most common flaws in computer security. Take a few minutes and learn how to do it right."
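A rough sketch of the Diceware procedure in Python, assuming the usual setup of five dice rolls per word and a 7776-entry list keyed by the rolled digits. The placeholder list below is synthetic so the example runs on its own; a real word list would be loaded in its place, and the function names are made up for this example.

```python
import itertools
import math
import secrets

DICE_PER_WORD = 5
LIST_SIZE = 6 ** DICE_PER_WORD  # 7776 possible five-dice outcomes


def roll_key() -> str:
    """Simulate five dice rolls, e.g. '41526', using a CSPRNG."""
    return "".join(str(secrets.randbelow(6) + 1) for _ in range(DICE_PER_WORD))


def passphrase(wordlist: dict, words: int = 6) -> str:
    """Pick `words` entries from a list keyed by five-digit dice results."""
    return " ".join(wordlist[roll_key()] for _ in range(words))


def entropy_bits(words: int) -> float:
    """Entropy of a passphrase whose words are chosen uniformly from the list."""
    return words * math.log2(LIST_SIZE)


# Synthetic stand-in for a real list: maps '11111' -> 'word11111', and so on.
PLACEHOLDER_LIST = {
    "".join(key): "word" + "".join(key)
    for key in itertools.product("123456", repeat=DICE_PER_WORD)
}

example = passphrase(PLACEHOLDER_LIST, words=6)
```

Each word contributes about 12.9 bits, so six words come to roughly 77.5 bits, which is the point the xkcd strip makes about length beating character substitutions.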

As someone who is technically literate but doesn't use a password manager: I sign up for a lot of services on one device (home laptop) and then need to use them on another device (work laptop, phone). How does a password manager work for this?

I currently have about 15 different passwords I use. I know which to use based on how long I've been using the service. Why is this strategy ineffective? At most a hacker could get 3-4 of the services I use, and even then they'd need to find each of those services out of the hundreds I use. I also have 4 different emails I use for logins.

Most password managers provide apps and syncing, and are often integrated into your browser, so everything is a click away. Having to juggle 4 emails, remember which of 5 passwords a site uses, and figure out your exposure in case of a breach seems a lot harder than the above.

I happen to use the open source password manager from Keepass.info. It works on a local password file protected by strong crypto. Then, I use Dropbox to sync that file from device to device.

The problem with using the same password on multiple sites is this: if any one site gets pwnd, it gets a lot easier for the cybercreeps to pwn your account on other sites (says Obvious Man).

It doesn't take much technical skill to credential-stuff--to hammer a lot of sites with a list of credentials. So, keeping the list of sites you actually use a secret is not effective.

This whole deal sucks. But it's real.

Personally, I use LastPass across browsers in multiple desktop accounts and my mobile browser, with no issue. You create a LastPass account, presumably with a very secure password, that you can log into elsewhere.

As for the password strategy, I imagine you could be vulnerable if any two important accounts - say, email and bank - both used the same password. Are you confident this is not the case?

I assume certain emails are associated with certain types of accounts, which could undermine your strategy. If you're able to remember ~15 different passwords with random emails, congrats on your stellar memory!

I use Google Smartlock, and it functions across all my (android) devices quite well. It does sort of rely on your being all-in on the Google ecosystem. At work we use LastPass, but since I only use it on the desktop in a browser I can't speak to how it works across devices.

LastPass works for multiple devices, including mobile - you can sync to 1 LastPass account, too.

Does Smartlock work with native mobile apps (which may or may not use a magic webview for auth), or is there a way to manually transfer a password to log into an app?

Password managers across devices:

Either use a service that syncs up to a server, or use a standalone app and save its encrypted database to a shared filesystem such as Dropbox or pCloud.

As long as you don't care if at most 4 of your accounts get hacked, then you have an effective strategy, I guess.

>The trade off: your algorithm probably sucks

I suspect most people will end up having weak algorithms the same way they have weak passwords.

These data breaches where the source isn't known can be frustrating. As someone who already uses unique passwords for everything, there's not much I can do (change 500+ passwords?). And I can understand Troy's argument[1] for not sharing the leaked passwords, so that doesn't leave many other options.

I guess I'll just start going through my saved passwords and use them to delete all of the old accounts I rarely use, maybe with a little help from the GDPR.

[1] https://www.troyhunt.com/here-are-all-the-reasons-i-dont-mak...

My solution for this is to use a unique email address for each site/service. That way if I see that hn@mydomain.com has appeared in a breach, I know both where the leak came from and which password to change. Also helps identify the source of any spam emails...

You can also do this with Gmail by adding a . or two randomly in your email.

Gmail and other MTAs support +something in the e-mail address user part too. If you forget your password, you do have to dig through your e-mail and figure out which one you used, but this method does let you track down when someone sells/shares your e-mail address to 3rd parties.

You just have to remember the exact username/email you used in case you forget it. That can include the sitename itself, or some simple transform, but sometimes services change names... so make sure to keep records of exactly the email used for each service (or don't delete your email from them). Forgetting that is worse than losing the password, since there's often no helpful recovery service they offer.

The bigger problem is MANY MANY sites don't accept the (+) in an email address.

Yes. More and more sites are using common frameworks and/or validation libraries where a + is not considered to be an acceptable part of the recipient name.

this method does let you track down when someone sells/shares your e-mail address or 3rd parties.

Unless they strip out the +something part.

Based on my experience this unfortunately does occur, as does removal of dots in the local part.
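To make the plus-tag discussion concrete, here is a small Python sketch of both sides: building a tagged address, and the kind of normalization that defeats it. The function names are hypothetical, and the dot removal models the Gmail-style behaviour mentioned above rather than any particular site's code.

```python
def tagged_address(address: str, tag: str) -> str:
    """Insert +tag into the local part: jane@example.com -> jane+hn@example.com."""
    local, domain = address.rsplit("@", 1)
    return f"{local}+{tag}@{domain}"


def strip_tag(address: str, drop_dots: bool = False) -> str:
    """What a list seller might do: drop the +tag (and optionally the dots)."""
    local, domain = address.rsplit("@", 1)
    local = local.split("+", 1)[0]
    if drop_dots:
        local = local.replace(".", "")
    return f"{local}@{domain}"


signup = tagged_address("jane.doe@example.com", "hn")
cleaned = strip_tag(signup)
```

Since stripping takes one line, a tag only identifies the leak source when the other party doesn't bother; a truly separate address per site (as with hn@mydomain.com above) doesn't have that weakness.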

spamgourmet.com is this idea as a free (and awesome) service.

Good idea

I really don't think HIBP should even be publishing or notifying people about these. It's almost always existing breaches just merged together in a different way. If I went and grabbed the raw torrents and combined them in various ways I could make hundreds of different "credential stuffing" lists. Would HIBP list and notify people about all of them?

This post doesn’t mention it, but on past credential stuffing lists that got loaded, Troy mentioned how many were new to the HIBP dataset. I’d assume that there were enough new emails on here to make it worth loading.

94% of the email addresses were already in the database according to the Twitter account. 6% still represents many millions in this case, but perhaps it's unnecessary to notify the ones already known.

Post does mention it.

Once the huge password torrent is updated with Pemiblanc (9 GB, last updated March 1, 2018), you can download it and scan it for all your passwords locally. Then you can determine which are pwned. You'll have to SHA-1 them all, since the list is distributed as SHA-1 hashes, but that shouldn't be too hard.

Thanks for pointing that link out, I hadn't come across that API before.

I've never tried to follow up on which accounts/passwords have been used from haveibeenpwned. Can you describe this further? I'd like to try it.

The "huge pw torrent" is something I can just search for on torrent trackers? Once I have the list, is it just a list of passwords, or does it include the emails? Then they're SHA-1 hashed and I need to... unhash them?

I think he is referring to Troy Hunt’s Pwned Passwords list (which is about 9 GB, afaik). Presumably it will be updated with these new plaintext passwords.

Once it’s updated, you can check all your passwords against the list. It’s a list of SHA-1-hashed passwords (so he isn’t sharing tons of plaintext passwords, as a hash can’t simply be reversed). You would SHA-1 your own passwords and check them all against the list.
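A minimal sketch of the offline check in Python, assuming the downloaded file has one `HASH:COUNT` entry per line with uppercase SHA-1 hashes, as in the published Pwned Passwords downloads. The function names and the file name in the usage comment are placeholders.

```python
import hashlib
from typing import Dict, Iterable


def sha1_hex(password: str) -> str:
    """Uppercase hex SHA-1 of a password, matching the list's format."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def find_pwned(passwords: Iterable[str], lines: Iterable[str]) -> Dict[str, int]:
    """Stream through HASH:COUNT lines and report which passwords appear."""
    wanted = {sha1_hex(p): p for p in passwords}
    found: Dict[str, int] = {}
    for line in lines:
        hash_hex, _, count = line.strip().partition(":")
        password = wanted.get(hash_hex.upper())
        if password is not None:
            # A line without a count is treated as "seen", count unknown.
            found[password] = int(count) if count else 0
    return found


# Usage against the real download (file name is a placeholder):
# with open("pwned-passwords.txt", encoding="ascii") as fh:
#     print(find_pwned(["hunter2", "correct horse"], fh))
```

Nothing is "unhashed" here: you hash your own passwords and look for matching hashes, and the file is read line by line so the whole list never has to fit in memory.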

Edit: to clarify, I think there are tools to help check against the offline list pretty easily. Or you could also query Troy Hunt’s Pwned Passwords page (or its API) once it’s updated, instead of downloading 9 GB. The k-anonymity model is pretty clever, and querying the site should be secure.
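For the curious, the k-anonymity query works like this: you send only the first five hex characters of the password's SHA-1 hash to the range endpoint, get back every known hash suffix under that prefix, and do the final comparison locally. A sketch in Python using only the standard library; the function names and the User-Agent string are made up for this example.

```python
import hashlib
import urllib.request
from typing import Tuple

RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_hash(password: str) -> Tuple[str, str]:
    """Return (5-char prefix, 35-char suffix) of the uppercase SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(suffix: str, body: str) -> int:
    """Find our suffix among the SUFFIX:COUNT lines of a range response."""
    for line in body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0


def pwned_count(password: str) -> int:
    """Query the range endpoint; only the 5-character prefix leaves the machine."""
    prefix, suffix = split_hash(password)
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "pwned-check-sketch"},  # placeholder identifier
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8")
    return count_in_range(suffix, body)
```

Calling `pwned_count("password")` makes one HTTPS request; the server never sees the full hash, let alone the password.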

I think Firefox is going to use Troy's list to check passwords used when browsing. That would actually make Firefox more functionally useful than other browsers.

I also want to see this.

Well, you can see who leaked the password by checking for your password if it’s unique, right?

If the term Credential Stuffing is new to anyone, we’ve done a deep dive into what it is and the tools that are used here: https://breachinsider.com/blog/2017/credential-stuffing-how-...

We saw this pretty regularly at my old job, with attacks almost daily. They range from ‘script kiddies’ who just use the default tool settings and do it all from one IP, making it easy to spot, to persistent attackers who would play cat and mouse with our live defences. They’d switch IPs using huge proxy lists found online every few minutes, as well as learn our alerting thresholds and attempt to fly just under the radar. For some reason though, they always seemed to use User-Agents that were ancient, or weren’t real, allowing us to identify attack traffic compared to our normal user activity.
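A toy illustration in Python of the two signals described above: many failed logins from one IP, and User-Agent strings from long-obsolete browsers. The threshold, the marker list and all names are invented for the example; this is not the detection logic that was actually deployed.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class LoginAttempt:
    ip: str
    user_agent: str
    success: bool


# Hypothetical substrings that only show up in very old browsers.
STALE_UA_MARKERS = ("MSIE 6.0", "MSIE 7.0", "Firefox/3.")


def suspicious_ips(attempts: Iterable[LoginAttempt], max_failures: int = 20) -> Set[str]:
    """Flag IPs with too many failed logins or with an ancient User-Agent."""
    attempts = list(attempts)  # allow generators; we iterate twice
    failures = Counter(a.ip for a in attempts if not a.success)
    noisy = {ip for ip, n in failures.items() if n > max_failures}
    stale = {
        a.ip
        for a in attempts
        if any(marker in a.user_agent for marker in STALE_UA_MARKERS)
    }
    return noisy | stale
```

The per-IP counter only catches the single-IP script kiddie case; an attacker rotating through proxy lists stays under `max_failures` on every address, which is why the User-Agent signal was the more useful one in the story above.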

UAs are often hardcoded into compiled malware binaries that get shared/leaked amongst actors and groups. Later users don't have access to the source, so at best all they can do is dick around with hex editors and maybe change a character or two instead of the whole string.

Did you try to find attackers in the set of inconspicuous UAs? If you did not try hard to look for more skilled adversaries, expect some to be hiding from your analysis. Once you don't see anything in a large range of skill/sophistication, you can assume there to be no adversaries that don't have the ability to pull a Stuxnet off. And if you need to guard against those, and have the resources to do so, you already know this.

Agreed, based on other thresholds and alerts, we certainly saw some more advanced actors - using in-country home broadband lines to conduct the attacks. This made tracking and blocking them much harder, as there was a risk of blocking genuine customers who simply didn’t conform to our idea of ‘normal’. We ended up finding another way to fingerprint them, but thank you for calling that out, as you are entirely right that there is almost always someone trying to be truly covert.

If anyone is suffering with these types of attacks (or isn’t and you think you’re missing something) feel free to reach out, more than happy to help - email is in my profile

I hope you have more than one distinct way to identify these more sophisticated attacks, as you would want to be able to ensure there are no others that are only a few steps better than them.

As said, you need to vet a range of sophistication above the most sophisticated example you actually encountered, to assume there are no others that you could reasonably detect with the techniques you could deploy. Always make sure to know you'd see anyone who is only one level better than the best you encountered, where the size of such a level should be estimated from the density you see in the distribution of attacks.

You are also good if you don't automate defense with the best detection you have, so that you prevent an attacker from automatically judging the quality of your detection capabilities with you then believing the attacker got stopped when he just deployed a technique you can no longer see.

I.e., make sure you don't alert an attacker that you can still see him when you are just barely still able to do so, as you would not want him to up his camouflage to the point where you won't see him anymore.

Is anyone else annoyed by the native advertising for 1Password there, without any disclosures that they are affiliate links?

I've lost pretty much all of my respect for Troy Hunt as he went from maintaining a useful service to just being another ad for 1Password.

His website has always read as thinly-veiled content marketing for Cloudflare and Azure. If you can look past that at the actual informational content, and disregard the specific products mentioned, it's worth reading.

Not at all annoyed about that. It is the principle he pushes and I have no concern at all that this one password manager he explicitly names is one that he gets some benefit from. That site doesn't run for free, you know. If you read info on his site, you should have enough sense to think to look for free alternatives if you wish.

>the rapid rise of the rapid rise of credential stuffing attacks

Maybe I was wrong but I always thought that was the "point"...

At least in the sense that as far as profitability goes the point of hacking or gaining access to a list of hacked passwords from say a boring site like some image sharing site was that you then take that and use it to do more nefarious things like access banking stuff, more sensitive identity related things, spying, etc.

Obviously there are folks out there hacking away for their own enlightenment or fun, but ultimately anyone looking to do more than that, I always thought the point was credential stuffing all along, otherwise who cares what someone's Flickr username and password is?

Where can I download the list? I want to see what password was shared.

He loaded them into this site to check: https://haveibeenpwned.com/

I'm not sure Troy shares the lists - for obvious reasons.

It wouldn't be a hack if it wasn't available. I can run through the hashes. I just want the list without having to jump through a bunch of hoops and pay someone to get it.

His site is basically one big advertisement for 1password now. I would not trust it.

What do you distrust?

Do you believe he's lying about the existence of certain breaches? Returning false results for whether a password is compromised? Be specific: what untrustworthy things do you suspect him of?

Since his website is now an advertisement, it's wise to take everything he says with skepticism.. is he pushing the product, or is he providing good advice? What's his motivation for helping users when his main focus is on pushing ads?

As others have pointed out in response to me, he's incredibly dishonest in claiming that his website is funded by him on one page, but clearly he's being paid by a company selling something. He injects ads into email notifications without identifying them as such.

If someone is acting untrustworthy 50% of the time, would you still trust them 100% of the time?

So, what untrustworthy things do you suspect him of?

FUD is not useful.

I'm not sure why you're being downvoted when you're exactly right. I have lost a lot of respect for Troy Hunt when he pretty much turned his blog and HIBP into a native advertisement for 1Password; without any disclosure that he is being paid by 1Password.

I'm looking at: https://haveibeenpwned.com/

I see a link below the search box, which when I click explains he has "partnered" with 1Password, why, and why he liked it prior to the partnership. It also links to this:


which has a lot more detail.

That's not what I call "without any disclosure". And makes me wonder what your idea of "disclosure" would be.

"Without any disclosure" is clearly false, but it would be nice if the About page didn't still say "[s]hort of the odd donation, all costs for building, running and keeping the service currently come directly out of my own pocket."

Personally, I don't like that it starts with the reasons to not take money ("I've had many offers to sponsor HIBP, to monetarily reward me for product placement and indeed to buy the service outright. I've rejected every single one of them because I didn't want my motives to be questioned"), then it is a _long_ time before he explicitly says he is taking money ("Clearly, this is a commercial relationship - 1Password pays to get their product in front of people via HIBP").

If you received one of the breach emails, there is a 1Password ad in the middle with no obvious disclosure that it's an ad / affiliate link.


Worth noting that the same emails also solicit donations, and there is no disclosure on the donations page:

> If you loved this free service and want to know what goes into making it possible, have a read of the donations page. Buy me a coffee or a beer or just some time with the kids at a movie.


Does IIIm mean 111m or 3m (roman number)?

Where do you see "IIIm"?

In the title right here on HN, it says 111M. Guess the HN font doesn't distinguish between 1 and I clearly enough.

It's worse on the site, they use https://fonts.google.com/specimen/Vollkorn

what about an email-as-login authentication? click url link on your phone to log in.
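One way such a login link could work, sketched in Python with the standard library: the server emails a URL containing a signed token that names the address and an expiry time, and accepts the click only if the signature checks out and the token hasn't expired. The token layout, lifetime and function names are assumptions for illustration.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

# Server-side signing key; assumed to be kept private and stable across requests.
SECRET_KEY = secrets.token_bytes(32)


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def make_login_token(email: str, ttl_seconds: int = 600, now: Optional[float] = None) -> str:
    """Build 'email|expiry|signature' to embed in the emailed link."""
    issued = time.time() if now is None else now
    payload = f"{email}|{int(issued + ttl_seconds)}"
    return f"{payload}|{_sign(payload)}"


def verify_login_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the email if the token is authentic and unexpired, else None."""
    try:
        email, expires, signature = token.rsplit("|", 2)
        expiry = int(expires)
    except ValueError:
        return None
    expected = _sign(f"{email}|{expires}")
    # Constant-time comparison on bytes to avoid leaking how much matched.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    current = time.time() if now is None else now
    return email if current <= expiry else None
```

This sketch does not make tokens single-use, and the whole scheme is only as safe as the mailbox, which is another reason the e-mail password should be shared with nothing.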
