
Scrypt is Maximally Memory-Hard - cperciva
http://eprint.iacr.org/2016/989
======
loup-vaillant
Wait a minute, there's a time memory tradeoff attack for scrypt. Argon2 was
developed precisely to remedy that problem. And now there's a proof Scrypt is
the best anyone can do? This would mean a similar attack is possible on Argon2
as well, even if we haven't found it yet.

This sounds like a _significant_ result, so I'm a bit skeptical right now.
Perhaps the paper doesn't mean what I think it means?

~~~
cperciva
Time-memory tradeoffs don't reduce the cost, which is determined by the time-
area product (or technically the integral of area over time).
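To make that cost metric concrete (my formalization, not quoted from the paper): with A(t) the silicon area in use at time t over an attack of duration T,

```latex
\mathrm{cost} \;=\; \int_0^T A(t)\,dt \;\stackrel{A(t)=A}{=}\; A \cdot T
```

A time-memory tradeoff that halves the memory (area) but doubles the recomputation time yields (A/2)(2T) = AT, leaving the cost unchanged.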

~~~
loup-vaillant
After you perform the tradeoff, you can still use ASIC to reduce your costs.
We regular users only have an x86_64 box. Ideally we want ASIC to be just as
expensive.

And by the way, I'm not sure time-memory tradeoffs don't reduce the time-area
product. Some don't, but I bet some do (which is why they're called "attacks").

That said, my knowledge is quite sketchy here. Any recommended reading?

~~~
loup-vaillant
Perhaps I was not clear. While memory costs about the same in all settings,
processing power does not. Whatever part of an attack can be ASICified will be
cheaper than it would have been if executed on stock hardware.

Time memory tradeoffs, even if they don't reduce the time-area product, _do_
increase the amount of silicon that can be ASICified. The additional
computations might very well cost less than the equivalent memory lookups.

I'd like to close the gap between stock hardware and ASIC. This paper's
abstract suggests that we cannot. That's a bummer.

~~~
loup-vaillant
Ah, found it. I just had to read the damn paper. Turns out I was confusing
memory _hard_ functions, which allow for time/memory tradeoffs, and memory
_bound_ functions, which do not.

Scrypt is hard, but not bound. I believe Argon2d attempts to be bound as well.
Argon2i however might just be hard, at least in theory, because its memory
accesses do not depend on the secrets, making them predictable.

------
zaroth
The point of scrypt is to utilize the commodity hardware to the max. Commodity
hardware gives you a lot of memory IO for the buck. You can't spend the same
dollars and get 100x more IO on an ASIC. But you can spend the same dollars
and get 100x (actually even more) faster at typical hashes like SHA256.

The point is to cut down on an attacker's possible advantage by using
specialized hardware. Scrypt uses memory IO so that you can "get your money's
worth" by running it for 100ms on an Intel CPU.

This is old-school thinking and a defensive play at trying to prevent offline
attacks. The idea is to impose a run-time cost on each attempt to verify a
password. The defender spends it every time they try to log someone in. The
attacker spends it for every guess at trying to crack a password.

The new approach (well I'm quite biased since I founded a company that does
this) is to add a one-time cost which is unbounded and does not impose run-
time latency.

We do this with a large amount of random data. Many TBs or perhaps as much as
a PB. The salted hashes are entangled with the data pool with a simple API, in
such a way that an attacker would need to steal the entire data pool to run an
offline attack. Stealing only part of the data pool does not let you crack any
passwords, even if they are trivially simple.

The bigger the data pool, the easier it is to defend, and the faster the
system runs and the more hashes/second it supports. So, in essence, it's high
performance scalable security that actually stops offline attacks even after
the site is breached.

You use this new approach as an additive layer on top of the existing scrypt
hashing that you are already doing.

~~~
rraval
I'm intrigued.

Do you have a link or something that I can read more about the "new approach"
as you say?

~~~
danbruc
https://taplink.co/technology/

~~~
valarauca1
This is snake oil. You are just doing 4 additional HMAC rounds with different
salts and pretending the data is more secure. This doesn't give you any
additional security.

    
    
         H1 = HMAC(Salt1, Password)
         H2 = HMAC(Salt2, H1)
         ---send to Company---
         H3 = HMAC(Salt2, H2)
         H4 = HMAC(Salt3, H3)
         ---send back to customer---
         Compare H4 with known value
    

Ref to technical white paper: https://taplink.co/wp-content/uploads/2016/10/TapLink_Blind_Hashing_Whitepaper.pdf
(I'm treating the AppID as a hash, as I'm ignoring the lookup stage for the
_massive_ salt).

Effectively your process can be described as:

    
    
         Hash = HMAC( Salt3, HMAC( Salt2, HMAC( Salt2, HMAC( Salt1, Password)))) 
    

But with network transmission?!?!

Multiple salts != more security

https://stackoverflow.com/questions/12753062/multiple-salts-to-protect-passwords

https://programmers.stackexchange.com/questions/115406/is-it-more-secure-to-hash-a-password-multiple-times
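The composed hash above is easy to write out as runnable code. A minimal sketch (my reconstruction of the chain; salt names and values are illustrative) shows why the extra rounds add almost nothing to the per-guess cost:

```python
import hashlib
import hmac

def chained_hash(password: bytes, salts: list[bytes]) -> bytes:
    """Apply the HMAC chain described above, one salt per round."""
    digest = password
    for salt in salts:
        digest = hmac.new(salt, digest, hashlib.sha256).digest()
    return digest

# Four fast HMAC-SHA256 calls per guess instead of one: a rounding
# error compared to the per-guess cost of a real KDF like scrypt.
h4 = chained_hash(b"hunter2", [b"salt1", b"salt2", b"salt2", b"salt3"])
```

An attacker who obtains all the salts simply runs the same chain per candidate password, so nesting HMACs does not slow brute force the way a memory-hard KDF does.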

Also the step of shipping the final hash back to the customer is SO dangerous.
TLS/SSH is _secure_, but one misconfiguration/bug/0-day and you leak dozens
of credentials.

This is a really stupid model. Furthermore, salts larger than the final output
size offer no additional security. If you have 256 bits of output, you only
need 256 bits of input. Larger inputs technically risk reducing the entropy of
the final output.

This whole security model assumes taplink.co will never be MITM'd. That can't
be guaranteed. It also means your whole system goes down when taplink.co is
down.

~~~
zaroth
That's a lot to respond to at once. :-)

I will say it's important to understand _how_ the Salt we return is generated.
We use a massive pool of data as effectively the internal state of a hashing
function. This means you need all the data in order to perform the same
calculation.

The value sent to us is just the hash of a salted hash. By itself it is
useless. The value we return is Salt2 -- also useless on its own. This is very
purposeful. If the TLS channel is owned, nothing of value is lost. The Salt1
is still private on the site's server, and no passwords can be cracked without
it.

Availability is a legitimate concern. Similar to SMS multifactor, this is an
added 3rd party dependency. We direct peer to many of our customers and route
over private IP space to avoid a potential DDoS.

~~~
valarauca1

          I will say it's important to understand how the Salt we return is generated. 
    

Not in the slightest. It is the HASH of a HASH of a HASH. There is no academic
rigor supporting the claim that this gives you any additional security, and a
lot of academic rigor stating it is a moot operation.

    
    
          The Salt1 is still private on the site's server, and no passwords can be cracked without it.
    

1) If you argue any SALT is private... then WHY EVEN USE SALTS?!?! If
something in the database is private/secure, why not use cleartext passwords?
The whole idea of using SALT+HMAC is that NOTHING stored is private/secure.
You cannot EVER assume SALT(s) are private.

2) Cracking the password is irrelevant. All that is needed is:

    
    
          SALT2 + Web transferred hash
    

Then the credentials are gained. Yes, learning somebody's password, or several
million people's passwords, is a fun experiment, but gaining access to the
website is also an attack vector, one your system is making easier.

For a traditional HMAC/scrypt/bcrypt setup, recovering the password is a
critical part of this, as the authentication mechanism is a single black-box
program. Input is salt+password+hash. Output is an auth token.

The system you describe breaks this dichotomy. No longer must the password be
broken to gain access.
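The single-black-box verifier described above is a few lines of code. A minimal sketch (parameters are illustrative, using Python's stdlib scrypt binding rather than any particular site's stack):

```python
import hashlib
import hmac

def verify(salt: bytes, password: bytes, stored: bytes) -> bool:
    """Recompute the scrypt hash and compare in constant time."""
    candidate = hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(candidate, stored)

# Enrollment: store only the salted scrypt output.
stored = hashlib.scrypt(b"hunter2", salt=b"per-user-salt", n=2**14, r=8, p=1)
```

Everything needed to check a guess lives in one place, which is exactly why recovering the password (or the stored hash plus salt) is the whole game for the attacker.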

:.:.:

    
    
           Availability is a legitimate concern
    

An extreme one

    
    
           Similar to SMS multifactor
    

SMS multi-factor is deprecated. Faking SMS from an arbitrary phone number is
trivial in practice. Look into Dave Kennedy's work on social engineering.

Multi-Factor auth is best done via Tokens or TOTP.

    
    
           We direct peer to many of our customers and route over private IP space to avoid a potential DDoS.
    

So instead you want to just install your own box on the customer's network?
This opens up even more headaches about managing and authenticating your
access to their private LAN. Then you get into patching agreements, possible
OS limitations, security audits. This is a horrible solution for both parties.

~~~
zaroth
I'm sorry if I'm not explaining it right, there's definitely something lost in
transmission here.

We are keying the hash (HMAC), keyed by a value derived from the data pool.
Please don't take the 4 line pseudo-code too literally.

The value stored in the database (Hash2) can only be used to verify a password
if you can also complete the blind hash which is blinded by the data pool.
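As I understand the description, "keyed by a value derived from the data pool" could look something like the sketch below. This is my guess at a construction, not TapLink's actual algorithm; the pool size, lookup count, and all names are made up:

```python
import hashlib
import hmac
import os

POOL = os.urandom(1 << 20)  # small stand-in for the multi-TB data pool

def blind_hash(h2: bytes, lookups: int = 4) -> bytes:
    """Key an HMAC with bytes read at data-dependent pool offsets."""
    key, seed = b"", h2
    for _ in range(lookups):
        seed = hashlib.sha256(seed).digest()
        offset = int.from_bytes(seed[:8], "big") % (len(POOL) - 32)
        chunk = POOL[offset:offset + 32]
        key += chunk
        seed = chunk  # the next offset depends on pool contents
    return hmac.new(key, h2, hashlib.sha256).digest()
```

Because each offset after the first depends on the bytes just read, an attacker cannot predict which regions of the pool are needed without reading the pool itself, which is the "steal the whole pool" property being claimed.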

I can say this approach has been vetted by both well-known cryptographers
(Solar Designer, Sc00bz) at Passwords^15 as well as industry crypto wonks at
MITRE and elsewhere. It's certainly not snake oil.

The salt kept on the site's server means that the site must be breached in
order for a successful attack to be mounted. It's why we call this a fully
additive layer of security. You need the Salt from the site, and the entire
data pool from us, to mount an offline attack.

2) Why would you say that? It's absolutely not the case that you can login
with Hash1 and Salt2.

If you intercept Hash1 and Salt2 then you may know Hash2 but you still cannot
login, and you cannot try to crack it without Salt1. Again, this is all
assuming TLS is broken in which case you can just inject your own JavaScript
onto the page and steal passwords in clear text.

By direct peering I mean programs such as AWS DirectConnect which gives us
10Gbps on their network and private IP access to our peers. Nothing to get too
excited about.

I'll be back online in about an hour if this still doesn't answer the basic
questions around the security of the construct.

EDIT: I do not mean to say solardiz or sc00bz have personally endorsed our
product. Only that we have all worked on and published write ups on the same
general approach (using large data pools with bounded network links) to solve
the password security problem.

~~~
aruss
While security does not rely on simplicity, a simple system is much easier to
reason about. What you add is a whole stack of complexity through multiple
hashes, entropy generation (on your end), and network transport (TLS).

There's no way you could convince me to use your system over a basic KDF
implementation. The only people you're going to convince to use this protocol
are people who don't have experience in the field, which is why I'd consider
your solution snake oil.

~~~
zaroth
I've talked to many cryptographers in the field. Universally they _appreciate_
the solution for its simplicity. We don't use any new crypto - the whole
construct is based on a CS-PRNG, hashing and HMAC.

You talk to this service just like you talk to any service over the LAN or
WAN: through an encrypted channel. Gone are the days when you could just
dispatch a request over the LAN and assume you're good. We are happy to set up
dedicated machines with spiped as well as TLS.

But certainly you look at solutions like CloudFlare or even dare I say
Firebase, and the industry has moved far beyond your level of paranoia.

I don't want you to use it instead of your basic KDF, but in addition to /
after your KDF.

13 million Americans had over $15 Billion stolen last year in cyber-heists and
almost 70% of those attacks were using a stolen password. The basic KDF is not
working, and it's time to stop blaming the user for not having 69 bits of
entropy on their password and start giving companies the tech they need to
actually secure their passwords.

------
murbard2
It's worth noting that an issue with scrypt for proof-of-work cryptocurrencies
is that it is expensive to check. The ideal problem is memory-hard to solve
but can be checked cheaply. This is what Equihash (used by Zcash,
https://www.internetsociety.org/sites/default/files/blogs-media/equihash-asymmetric-proof-of-work-based-generalized-birthday-problem.pdf)
attempts to do, though proving memory hardness seems more arduous than in the
case of scrypt.

~~~
zaroth
Is this really a major pain point? You have to run billions of units of the
work to find the proof. You run exactly one unit of the work to validate the
proof.

The units can be set somewhat arbitrarily. Although it is important for the
unit to be memory-hard-enough to not accelerate well on a GPU.

I would think that's a rather wide "target" which can be hit by scrypt at a
certain difficulty.

Needless to say, it's still possible to get it wrong, e.g. Litecoin.

~~~
drostie
There's something to be said for both. It's true that there is a scaling
intrinsic to the Bitcoin protocol (which it had to have, as it was based on
fast functions) to allow arbitrary slowdown at the brute-force level. However,
ideally it would be nice to have an asymmetry there.

Just think of it purely in the manner of "why run exactly one unit of the
work? why not run a hundredth of that unit to validate the proof?" An
asymmetry allows you to more quickly verify the ledger, which is both a
startup cost and a lesser ongoing cost for everyone.

I'm wondering, though, if there is a connection with asymmetric encryption.
Imagine a one-way function h and two functions f(x), g(x) such that
f(g(x)) = x. Suppose the task is "find M such that g(h(M)) lies within some
narrow subset." Then if g is much slower than f, you can verify a pair (M,
g_h_M) not by recomputing g(h(M)) = g_h_M (slow) but rather by checking that
f(g_h_M) = h(M) (much faster). The importance of h is just that for the most
obvious instantiation (f(x) = x^E modulo N, g(x) = x^D modulo N), you also
have that g(f(x)) = x, and this is a relatively common property of inverse
functions.
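The f/g pair described above can be instantiated directly with textbook RSA exponents. A toy sketch using the classic teaching parameters p = 61, q = 53 (purely illustrative; far too small for any real use):

```python
# f(x) = x^e mod n is cheap (small exponent), g(x) = x^d mod n is
# expensive (large exponent), and each inverts the other.
n, e, d = 61 * 53, 17, 2753  # 17 * 2753 = 46801 = 15 * 3120 + 1

def f(x: int) -> int:
    return pow(x, e, n)  # fast "verify" direction

def g(x: int) -> int:
    return pow(x, d, n)  # slow "prove" direction
```

With real-sized moduli the gap between the small public exponent and the full-size private exponent is what would supply the prover/verifier asymmetry.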

~~~
murbard2
You get the same properties with a basic proof of work scheme. The difficulty
would be to make g memory hard and f not memory hard.

------
todd-davies
When we talk about the term 'computationally hard', we usually mean an NP-hard
problem. I assume that here, 'memory-hard' means that no other hashing
algorithm can have a greater lower bound on its memory complexity than Scrypt.
Is that correct?

Edit: After a re-read, I realised that the answer is in the text:

"Memory-hard functions (MHFs) are hash algorithms whose evaluation cost is
dominated by memory cost."

------
infruset
I was under the impression that there were scrypt ASICS. Can someone explain
how, if true, this is compatible with this claim?

~~~
panic
For some reason Dogecoin, Litecoin, and all the other scrypt-based
cryptocurrencies use extremely low N and r parameters, such that computing a
hash only needs 128K of memory.
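The 128K figure follows from scrypt's memory formula: the core ROMix loop holds N blocks of 128·r bytes, so peak memory is roughly 128·r·N bytes. With Litecoin's widely reported parameters (N = 1024, r = 1):

```python
def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate peak memory of scrypt's ROMix loop: N blocks of 128*r bytes."""
    return 128 * r * n

print(scrypt_memory_bytes(1024, 1))   # 131072 bytes = 128 KiB (Litecoin)
print(scrypt_memory_bytes(2**20, 8))  # 1 GiB (scrypt paper's file-encryption setting)
```

At 128 KiB the whole state fits comfortably in on-chip SRAM, which is why ASICs for these coins were straightforward to build.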

~~~
raverbashing
Maybe using a harder hash would not incentivize people to mine it, reducing
its growth and liquidity?

~~~
erik
I'm under the impression that the Litecoin creators thought that scrypt would
prevent ASIC miners, but they still wanted to be able to GPU mine, so they
used low difficulty parameters. Which of course led to ASIC miners being
implemented. Then Dogecoin just copied Litecoin directly.

~~~
cloudjacker
> thought that scrypt would prevent ASIC miners,

I'm still amazed that people believed that.

We've been pushing general purpose CPUs for decades because there is nothing
novel or hard about making single-purpose processors.

I didn't realize you could just say random computing terms and people would
believe you. "This is memory hard so therefore no ASIC (single purpose
processor) could be made because memory is expensive"

I like cryptocurrency but 2012-2014 was seriously like being on crazy pills
listening to the fans.

~~~
wmf
To be fair, memory hardness is a legitimate concept with proofs and
everything; Litecoin just made a mistake in their design. ZCash is giving it a
second try, taking into account improvements that have been made over the last
few years.

------
hippich
At https://hashcash.io/ I am using the Litecoin hashing algo, which in turn
uses scrypt.

