There are only ~400 place names in the generated data. 160,000 = 400*400
The state machine generation would need to know all 400, but it's easy enough to scan the first few hundred thousand rows to find them all, and to have a fallback in case a name appears that has never been seen before. (The fallback would be done by having the state machine jump to an 'invalid' state; at the end you check that the invalid state's counter is zero, and if it isn't, you redo everything with slow code.)
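A minimal sketch of that fallback idea, assuming a byte-at-a-time trie walk stands in for the real state machine (the names and helper functions here are invented for illustration, not taken from any actual submission):

```python
def build_trie(names):
    """Build a character-wise trie; each node is a dict of child nodes."""
    root = {}
    for name in names:
        node = root
        for ch in name:
            node = node.setdefault(ch, {})
        node["$"] = True  # accepting state: a full known name ends here
    return root

def scan(rows, trie):
    """Fast path: walk the trie for each row's place name.

    Returns (counts, invalid). invalid > 0 means an unknown name was
    seen, so the caller must redo everything with the slow code.
    """
    counts = {}
    invalid = 0
    for name in rows:
        node = trie
        for ch in name:
            node = node.get(ch)
            if node is None:          # jumped to the 'invalid' state
                invalid += 1
                break
        else:
            if "$" in node:
                counts[name] = counts.get(name, 0) + 1
            else:
                invalid += 1          # only a prefix of a known name

    return counts, invalid

# Hypothetical stand-ins for the ~400 names found by the initial scan.
known = ["Hamburg", "Hamilton", "Oslo"]
trie = build_trie(known)

counts, invalid = scan(["Oslo", "Hamburg", "Oslo"], trie)
# invalid == 0 here, so the fast-path counts can be trusted.

counts2, invalid2 = scan(["Paris"], trie)
# An unseen name trips the invalid counter; redo with the slow path.
```

The point of the counter is that the fast loop never branches on "is this name valid?" in a way that needs careful handling mid-stream; correctness is checked once, at the end.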
> There are only ~400 place names in the generated data
The README says the maximum number of unique place names is 10,000, so you should probably design the fast path to handle that many, rather than going by whatever happens to be in the particular database it is being measured against at the moment.
This code is only fast in the 'happy' case (i.e. ~400 unique places, no odd distributions, temperatures with a standard deviation of 10 around a mean between 0 and 40). It is still correct in the unusual cases, but slow, because it reverts to the fallback code.