When to Write a Simulator (2021) (sirupsen.com)
52 points by todsacerdoti 10 months ago | 18 comments



Monte Carlo simulations have been a fantastic way for me to evaluate students' assignments in my AI course. Since some of the traditional AI algorithms rely on some degree of randomness, I run the simulation many times, check whether each run achieves some threshold, and tally how often their agents successfully complete the assignment. If they complete the task 70% of the time, they receive credit for that test case.
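
A minimal sketch of that kind of grading loop in Python (the agent interface, trial count, and 70% threshold here are illustrative placeholders, not the actual course harness):

    import random

    def run_trial(agent):
        # Placeholder for one randomized episode of the assignment;
        # returns True if the agent completed the task.
        return agent()

    def passes(agent, trials=1000, threshold=0.70):
        successes = sum(run_trial(agent) for _ in range(trials))
        return successes / trials >= threshold

    # Example: an "agent" that succeeds ~75% of the time earns the test case.
    print(passes(lambda: random.random() < 0.75))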

At the start of the semester, it's just this weird method for their test cases, but once we hit our Uncertainty section, I bring up Monte Carlo, show them the Monty Hall scene from Brooklyn 99 [1], and explain how it can provide students with approximations.

If possible, I recommend including a GUI to visualize a single trial of the simulation. My students tell me it helps them "see" how their agents are behaving. Plus, Computer Science does not have many tangible examples you can show the general public (i.e., something they can show parents or potential employers). The GUI helps bridge that gap of "here's something anyone can see and understand."

[1] https://www.youtube.com/watch?v=AD6eJlbFa2I


> “How many times do you have to roll an n-sided die to ensure you’ve seen each side at least m times?”

This is an extension of the coupon collector's problem (which addresses the m=1 case). Per the Wikipedia page:

> Donald J. Newman and Lawrence Shepp gave a generalization of the coupon collector's problem when m copies of each coupon need to be collected. Let T_m be the first time m copies of each coupon are collected. They showed that the expectation in this case satisfies:

> E[T_m] = n log n + (m-1) * n * log log n + O(n), as n -> infinity

https://en.wikipedia.org/wiki/Coupon_collector%27s_problem#E...

edit: If you're familiar with that problem, then you can use the tail estimate for 1 copy per shard with 99.99% probability for 128 shards (per the linked gist) to arrive at ~1800 -- it's a matter of computing beta such that n^{-beta + 1} = 0.01%, so beta ~ 2.9; I then calculated beta * n log n to arrive at 1801.

Given the scaling of the expectation above, you can observe the difference between "get at least 1" and "get at least 2 of each" is fairly small in the grand scheme of things. And, incidentally, the second term 128 * log(log(128)) is just about 200 (for a total of about 2000), so not too far off from the simulation result in the blog post.
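
A quick back-of-the-envelope check of those numbers in Python, assuming natural logarithms and the standard tail bound P(T > beta * n * ln n) <= n^(1 - beta):

    import math

    n = 128
    p_fail = 1e-4  # want at least one replica per shard with 99.99% probability

    # Solve n^(1 - beta) = p_fail for beta.
    beta = 1 - math.log(p_fail) / math.log(n)

    tail_estimate = beta * n * math.log(n)    # ~1 copy of each shard, w.h.p.
    second_term = n * math.log(math.log(n))   # (m - 1) * n * ln ln n for m = 2

    print(round(beta, 2), round(tail_estimate), round(second_term))
    # -> roughly 2.9, ~1800, ~200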


When you look at it the right way, the solution to the Monty Hall problem is actually straightforward and doesn't require a simulation.

If you choose the "no switching" strategy, your probability of winning is 1/3. On the other hand, if you use the "switching" strategy, you win if you pick a goat at first and you lose if you pick the car at first. Thus, with the "switching" strategy, your probability of winning is 2/3.

So once you recognize that with the "switching" strategy you win if and only if you start on a goat, the only thing you're really doing with a simulation is to contribute to the carbon footprint of the world.

Without that realization, a simulation can be helpful, of course, since finding the right way to look at a problem is often so hard.


Everything is straightforward when you look at it through the right abstractions. That's the point of abstraction.

Simulation can be useful when you don't know what the right abstraction is, which is the problem most people have with Monty Hall -- this is the point I got from TFA.

I disagree with the TFA that you need "a PhD in queueing theory" to know what the right abstractions are, but it's still beyond most people because we are still woefully statistically illiterate as a species.

I could probably also solve that shard replica problem analytically -- or at least recursively (it sounds like a variant of coupon collection) -- but I'd still want to confirm my calculations by writing a simulation.


The tricky part to explain about the Monty Hall problem is that there is extra information after the host opens the door.

Because you picked a door, the choice the host had of which door to open is sometimes constrained and that gives you extra information as well.

Everybody tries to do it solely with probabilities, which is fine, but it becomes far more intuitive when you point out the extra information.


The Monty Hall problem is confusing because the three-door case is the only time that the host opens just one door. It looks like a coin flip. If you ask the question with 100 doors, where the host goes and opens 98 other doors, the answer is obvious!


There are other secondary advantages of simulation too. The big advantage computers have is ... doing lots of computations (who knew?). Once a simulation is involved, suddenly Monte Carlo methods can be employed to take advantage of that. A bunch of mathematically intractable optimisation problems are, in practice, easily solved by guess-check-refine strategies like evolutionary algorithms. Having been involved in optimisation problems that used the simplex algorithm, as a mere mortal I'd much, much rather be debugging simulations than pages of awkward linear equations. In most cases I argue accuracy is worth more than precision.
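
A toy illustration of that guess-check-refine loop (the objective function here is a stand-in for "run the simulation and score the result", not any particular problem):

    import random

    def objective(x):
        # Stand-in for scoring one simulated configuration.
        return -(x - 3.2) ** 2

    best = random.uniform(-10, 10)
    for _ in range(10_000):
        candidate = best + random.gauss(0, 0.5)     # guess
        if objective(candidate) > objective(best):  # check
            best = candidate                        # refine
    print(best)  # converges near 3.2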

Plus, simulations demo well.


It would be interesting to study whether programmers reading a simulation of Monty Hall would predict the same outcomes as they do from the English-language question.

My view of many of these "non-intuitive" probability questions is the terrible phrasing favoured by the sorts who pose these problems. I'm fundamentally sceptical that animals have, in general, a "poor" understanding of probability.

I think it vastly more likely that natural language is a highly lossy communication system ill-fit for posing abstract problems that are outside of our ecological intuitions. It isn't probability at issue, but correctly recovering exactly what problem we're supposed to be considering.

Writing a program forces all assumptions into the open, and I'd bet you'd see far fewer mistakes -- yet our so-called intuitions about probability would be the same.


> I'm fundamentally sceptical that animals have, in general, a "poor" understanding of probability.

I take the opposite view. Casinos with all of their negative-expected-value (for players) games would not exist, nor would many lotteries, if not for the allure of "oh, but if I win big..." overpowering any rational decision-making.

Monty Hall is not challenging as a result of information loss due to translating to natural language. Reading the simulation code is like telling ChatGPT to think about the problem step-by-step: if you picked the door with the car, the host opens one of the goat doors arbitrarily; if you pick a door with a goat, the host opens the other goat door.

Further, if you flip the sequence of events, such that the host opens a door containing a goat, and then you pick, you end up with a different probability of success entirely -- 50%, instead of 66.7% (always switch) or 33.3% (never switch)!
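
Both outcomes are easy to check; a minimal sketch of the standard game in Python (not the code from the post):

    import random

    def trial(switch):
        doors = [0, 1, 2]
        car = random.choice(doors)
        pick = random.choice(doors)
        # Host opens a door that is neither the contestant's pick nor the car.
        opened = random.choice([d for d in doors if d not in (pick, car)])
        if switch:
            pick = next(d for d in doors if d not in (pick, opened))
        return pick == car

    n = 100_000
    print(sum(trial(True) for _ in range(n)) / n)   # ~0.667
    print(sum(trial(False) for _ in range(n)) / n)  # ~0.333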

This played out in the actual game show in real time, where people could observe the actual events directly. Per Monty Hall himself:

> He said he was not surprised at the experts' insistence that the probability was 1 out of 2. "That's the same assumption contestants would make on the show after I showed them there was nothing behind one door," he said. "They'd think the odds on their door had now gone up to 1 in 2, so they hated to give up the door no matter how much money I offered."

https://en.wikipedia.org/wiki/Monty_Hall_problem#History


I'm not sure you've said anything that disagrees with me. My claim is that the information in the environment that people are sensitive to causes them to fail to accurately model the scenario. The problem they are solving, in their heads, fails to correspond to reality. My claim is: if it did, then they would solve it correctly.

This is often phrased, typically by self-important mathematicians, that people's understanding of probability is wrong. This is a very bizarre view, and rather self-aggrandizing as far as probability goes.

Probability isn't understanding or formulating the problem. Probability is merely the machinery, very basic, which runs upon a formulation.

Do this instead: run one round of a game with 10 doors, and have people observe Monty opening all the relevant doors.

Why should it be that if people observe this it becomes obvious, and yet if put in natural language it isn't?

Natural language applied to non-intuitive ecologies (i.e., environments outside those we usually operate in) causes people to fail to model the situation correctly.

Take casinos: when animals go out foraging they are playing a slot machine. Their addiction to foraging is perfectly ecologically rational. Casinos exploit an information asymmetry: people cannot see the long-run house advantage.

This has nothing to do with people's "probability" skills, and everything to do with their ability to accurately understand the situation.

The underlying foolishness behind those who equivocate between "probability" and "modelling scenarios" is the assumption that reality itself has probabilities, and that scenarios are easy to model.

It is incredibly difficult to come up with an accurate cognitive model of wtf is going on in a Monty Hall game. So much so that famous mathematicians, with arbitrarily good probability skills, get the answer wrong. Teaching people "probability" here is beside the point. Animals are acutely good at it when they are presented with an ecologically intuitive problem, of arbitrary probabilistic complexity.


Ah. I had focused on the lossiness of natural language much more than the rest of your statement.

> Why should it be that if people observe this it becomes obvious, and yet if put in natural language it isn't?

It's not clear to me that the general public would come to that conclusion, as opposed to the obvious extension of Hall's observation, where contestants thought their odds had gone up upon being shown additional information.

To you and me, yes, it might be even more obvious because we're familiar with the core principles.

> Take casinos: when animals go out foraging they are playing a slot machine. Their addiction to foraging is perfectly ecologically rational.

I'm inclined to disagree here. Surely animals don't just engage on a random walk; they know where they've found berries before, or can smell prey from a distance. And describing it as an addiction feels wrong. Once the animals are full, they don't keep hunting for food at the expense of other life-sustaining activities.

> Casinos exploit an information asymmetry: people cannot see the long-run house advantage.

Sure. I have also encountered hubris in the form of people who are aware of the house's edge but are confident they can do better than everyone else. There are also betting strategies that appear to guarantee a profit (e.g. lose a coin flip? double your bet; win a coin flip? reset to $1) but fail to take into consideration that you have probability = 1 of going bankrupt in the long run, since you don't have infinite money.
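
A quick sketch of that doubling strategy against a finite bankroll (the fair coin, the $1,000 bankroll, and the round count are assumptions for illustration):

    import random

    def martingale_survives(bankroll=1000, rounds=10_000):
        bet = 1
        for _ in range(rounds):
            if bet > bankroll:
                return False               # can't cover the doubled bet: busted
            if random.random() < 0.5:      # win the flip
                bankroll += bet
                bet = 1
            else:                          # lose the flip
                bankroll -= bet
                bet *= 2
        return True

    trials = 1_000
    print(sum(martingale_survives() for _ in range(trials)) / trials)
    # the survival rate drops toward zero as `rounds` grows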

> The underlying foolishness behind those who equivocate between "probability" and "modelling scenarios" is the assumption that reality itself has probabilities, and that scenarios are easy to model.

In many cases, we use probabilities to approximate unknowns. Efficacy of a vaccine? Prevents infection X% of the time, or severe illness Y% of the time. Why not 100%? We don't necessarily know (or maybe it's due to a bunch of complex factors). But we've observed these outcomes for these samples of our population, and reducing it to a single number is "good enough".


Here's another way of phrasing my point. I expect to be able to find an ecologically intuitive example, for any given "non-intuitive" probability problem, where the underlying reasoning/model/etc. is the same, and people will solve it exactly correctly.

Try this:

You're at a dinner with two others and you know the host has poisoned the food for you and your friend. He lets you choose your plate first. He then hands another plate to your friend. Do you swap your plate with the one on the table?

I think in this case it "feels" more obvious that the plate on the table, being his, is much less likely to be poisoned, since he will necessarily have given a poisoned one to your friend. This makes the necessity in the host's action more intuitive. I think a lot of people here would swap, even if they couldn't fully explain why: they know the host knows something, and this affects what you took and what they were given.

I'm sure a much better example could be created which is even more obvious.


This sounds like a more complicated (and simultaneously under-analyzed) version of the Princess Bride iocaine powder scene.

Assigning probabilities doesn't really make any sense in this context, since you're relying on the trustworthiness of your information as well as the assumption that the game is fair and/or winnable. Besides that, I would expect most people to make their decisions based on their interactions with the host (body language, tone, etc), although there's potential for trying to analyze psychology of the host.


Yes, indeed. And I think exactly the same problems are at work in understanding any scenario. We don't turn off our scepticism, or all the various heuristics that keep us alive, just because a mathematician comes along with a powerpoint slide.


> Casinos with all of their negative-expected-value (for players) games would not exist, nor would many lotteries, if not for the allure of "oh, but if I win big..." overpowering any rational decision-making.

This seems like a separate problem. To the best of my knowledge, most people going to casinos know full well that the odds are not in their favour – i.e. they do understand the probabilities/expectations involved – they just don't care because it still entertains them.[1]

Not caring and not understanding are two very different things!

[1]: See e.g. A World of Chance, by Brenner, Brenner, and Brown.


> I'm fundamentally sceptical that animals have, in general, a "poor" understanding of probability.

I agree with this skepticism, and I think it boils down more to insufficient statistical literacy. Statistical statements (what things are actually like) often stand in contrast to logical reasoning (what things ought to be like) – popularly observed in quips such as "in theory there is no difference between theory and practice."

This difference is uncomfortable and so many people (especially those that are better educated and well versed in logical reasoning – e.g. teachers!) take refuge in believing the logical (the X ought to) over the statistical (the huh so apparently not X), and thus pass on a bad understanding of statistics to the next generation.

To be clear, nothing about this is surprising because statistics is an incredibly young field and in a few hundred years I'm sure we'll all be more statistically literate and comfortable with such reasoning. But we're not there yet. And I don't think this has anything to do with human limitations but just a shift in mentality that needs to take place across a broad fraction of the population to really take hold.


I'd phrase it as a difference between decision theory and statistics. In almost all cases we're formulating advice, not predictions.

In the Monty Hall case, we're rarely presented with a person who is going to help us in those circumstances, so we are motivated to reason to an understanding of the situation in which that help is absent. I.e., a better model of the scenario is harder to arrive at, because it involves making ecologically irrational assumptions about the motivations of others in a context (i.e., a game show) where suspicion is generally required.

It's a perennial annoyance of mine that the weirdest people with the least fit social instincts (relevant academics) go around inveighing on supposed 'ordinary people' prescribing, most often, patterns of thinking that would get everyone killed.

What the practice of formulating problems shows is that mathematicians (et al.) are extremely bad at phrasing problems to be intuitive. Indeed, I think they deliberately do the opposite, wishing to 'expose' the idiocy of others and their own superiority.

Physicists do not practice physics this way. It would be insane to deliver a lecture where clear physical intuitions can be built, by instead choosing cases where they break down and saying, "ah! so you need the mathematics!". Physicists would, certainly, hold such an activity in contempt.

Here we should be formulating these problems to be maximally intuitive, holding their structure constant. Then showing how the formalism helps.


"Monty Hall Problem ... After writing my simulation, however, I finally feel like I get it. "

It was the same for me. I actually set out to write the simulation to prove the conclusion wrong (I was a bit younger and more arrogant). But already while writing it, I suddenly slowed down and thought about it in a different way. And guess what, the simulation confirmed it. But it was only while writing it that I understood.

And yes, simulations are fun in general.

I finally got back to it and am now working a bit on a WebGPU physics simulation with simple chemistry. It is fun, and GPUs are so much more powerful, if you manage to tell them to do what you want.

(shader debugging is no fun)



