Hacker News new | past | comments | ask | show | jobs | submit | byronvickers's comments login

Thanks for this - I need to solve exactly this problem on a toy project I've been working on; you've saved me the time to hunt down the appropriate algorithm!


I hope it helps, but if you are just looking for text strings, you might want to benchmark your local regex libraries first. Among other optimizations, they might use that algorithm under the hood for long (foo|bar|baz|...) expressions.


Weird to see this show up on HN - I was just looking at this page a couple of days ago.

I have a little puzzle I've been trying to solve in my spare time; maybe someone here can point me in the right direction.

The puzzle is: given a float from Math.random(), suppose you know only whether it is greater than it is then 0.5 (i.e you only see the result of a coin flip which depends on Math.random()). What is a practical method to reverse out the state of the xorshift+ generator, given multiple such successive observations of the output?

Any input appreciated!


In V8 engine, Math.random is implemented using xorshift128, which is a completely linear PRNG with 128 bits of state. Since you can recover 1 bit from each output, 128 outputs should be enoguh for you to setup a GF(2) linear system and solve for the state. Once you have the state, all you need is to simulate how V8 output future random values from state.

There are a series of challenges called "fastrology" from PlaidCTF 2023, which is about predicting Math.random() given some partial output (e.g. Math.floor(Math.random() * k)) with several variants. You can try to find writeups for that challenges, those solvers should be easy to adapt to predicting Math.random()>0.5 .


I've been studying this myself just a few days ago!

tl;dr For a "Math.random() > 0.5" function, you need ~128 successive outputs, each of which leaks exactly 1 bit of the state at that particular iteration, and then you "just" solve a system of linear equations over gf(2) to recover the full state for any given iteration.

You can use z3 to solve it automagically, or you can write your own gaussian elimination solver (which works out much faster than z3 in practice).

Math isn't exactly my strong suit, so it took me quite a bit of bashing my head against the wall to make sense of things and turn it into code, so I've been planning on writing a blog post that explains this all in-depth (or at least, as much depth as I understand it).

For Chrome, there's an added catch in that it generates random numbers in batches, and returns those batches in reverse-order, so you'd need to do a small brute-force to identify where the batch boundaries are in your output sequence.

For Firefox and Safari, there's an added catch in that they add together the two halves of the state before returning it. This can also be worked around with a small brute force.


It's fun it's exactly 1:1 and you don't need any more output than the state!

It's sort of sad it isn't all AES-CTR or ChaCha12. Fast enough not to be the bottleneck in your webpage, and if you find any hint of a pattern in the output, good news, you get to publish a paper on it!


If you need secure random values you should be using https://developer.mozilla.org/en-US/docs/Web/API/Crypto/getR.... Absolutely do not use `Math.random` for anything requiring security.


Sorry, to clarify, I know CSPRNG APIs are exposed and what you use for safe randomness.

What I'm saying is that outside rare, extreme situations--maybe some HPC Monte Carlo simulation where you need gobs of randomness--you might as well use strong primitives for all your randomness, because they work well and are now plenty cheap for the use case with hardware support, etc.

The games to find a fast non-cryptographic function that passes enough statistical tests, etc., don't make as much sense to me when cryptographic randomness costs a few cycles a byte. There is not much upside to every standard library exposing a bad random generator that we have to warn people not to use. The handful of people that really need xorshift128+ can figure something out.

Maybe still a position you disagree with, but at least clearer without the sarcastic gloss.


The intention is that web.crypto is used when reversing should be difficult.

The goal for Math.random is to have much lower memory and speed impact. You can certainly disagree with their priorities but given those priorities, it makes sense to go with a much simpler algorithm that can be reversed easily.


Sorry, I know you should be using a CSPRNG API for cryptography, and probably my glib phrasing there obscured that.

I'm saying the cost of running actual cryptographic primitives has dropped a ton over time: on computers from decades ago a cheaper flavor of pseudorandomness was clearly necessary, now hardware AES is very cheap. And webpages aren't typically massive doing HPC simulations or other things that will be bound by the PRNG taking a few cycles per byte.

So the memory/CPU benefit of keeping the bad PRNG around is not obviously still worth it to me. In your words, I think I disagree with their priorities, particularly because the cost savings are not what they used to be.


This is absolutely my experience as well. I used to encounter this regularly in a professional context and ended up just replying with a confirmation that "option A" was what he'd agreed to. I was never mistaken.


And it's OUR fault these people are allowed to continue existing unimpeded in society.

Just so we're all on the same page.


I'm very much a lurker in online spaces (this is the first comment I've ever made in HN!) but I've found the atmosphere on tiles very appealing.

I'd love an invite if you'd be willing to share. You can contact me at <my username> at gmail.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: