
Introducing Guesstimate, a Spreadsheet for Things That Aren’t Certain - freefrancisco
https://medium.com/guesstimate-blog/introducing-guesstimate-a-spreadsheet-for-things-that-aren-t-certain-2fa54aa9340#.um4q5txph
======
ozgooen
I wasn't expecting this to go on hnews yet, but happy to take any questions!

~~~
cpitman
Any plans to add different probability distributions? I do this kind of
analysis by hand for things like project timelines and cost estimates, but I
model each variable with a double triangular distribution
([http://www.mhnederlof.nl/doubletriangular.html](http://www.mhnederlof.nl/doubletriangular.html)).
You provide a best case, worst case, and then the most likely case. The
distribution approximates the long tails on many kinds of distributions.
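
A minimal sketch of sampling such a three-point estimate in Python. The helper name `sample_double_triangular` and the `p_below_mode` parameter are illustrative assumptions, not part of Guesstimate or the linked page. The construction used here (two right triangles joined at the most likely value, with a chosen share of probability on the low side) is one common reading of "double triangular".

```python
import random

def sample_double_triangular(best, likely, worst, p_below_mode=0.5, rng=random):
    """Draw one sample from two right triangles joined at `likely`.

    Hypothetical helper: `p_below_mode` is the assumed probability that the
    outcome lands between `best` and `likely`.
    """
    if not best <= likely <= worst:
        raise ValueError("expected best <= likely <= worst")
    u = rng.random()
    if rng.random() < p_below_mode:
        # Left triangle: density rises linearly from `best` to `likely`.
        return best + (likely - best) * u ** 0.5
    # Right triangle: density falls linearly from `likely` to `worst`.
    return worst - (worst - likely) * u ** 0.5

# Made-up project estimate in days: best case 10, most likely 15, worst case 40.
rng = random.Random(0)
durations = [sample_double_triangular(10, 15, 40, 0.5, rng) for _ in range(5000)]
print(min(durations), max(durations))
```

With `p_below_mode` set to `(likely - best) / (worst - best)` this should reduce to the ordinary triangular distribution, which Python's standard library already offers as `random.triangular(low, high, mode)`.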

~~~
ozgooen
Definitely. I haven't yet had a request to do a double triangular
distribution, that looks really interesting.

It will be a bit of a challenge to make it intuitive, but I'm sure I'll figure
out something.

Just so you know, right now there is a second distribution, uniform. You can
access it by clicking the 'normal' icon in a normal distribution, where a
short list is shown below.

~~~
masonium
I just made an issue on github, but the uniform distribution looks wonky at
the right endpoint.

~~~
ozgooen
Much of that is because of the React histogram library I used. There are a bunch
of issues with it. I definitely intend to replace it with something much
better. If any of you recommend tools or libraries for that, I'd be very
appreciative.

------
aj7
I would be much more interested if this product were "choose probability
density function centric." Then, the Monte Carlo engine would gain much more
interest. Being able to choose or specify arbitrary distributions, and then
run simulations, would be valuable.

Of special interest are non-continuous distributions. How often has normal
distribution reasoning failed in finance? Put another way, a user should be
able to model a distribution himself.

~~~
ozgooen
Very good to know. Right now you can choose between normal, uniform, and a few
very simple discrete distributions, but not others.

When I built this, my first goal was to make any distribution run quickly. At
this point I believe adding other distribution types will be quite doable;
expect them shortly.

~~~
thanatropism
Two words: beta distribution. It has finite range and an arbitrary mode, and
has the uniform as a special case.
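
A rough sketch of those three properties in Python, using the standard library's `random.betavariate`. The helper names `sample_scaled_beta` and `beta_mode` are made up for illustration.

```python
import random

def sample_scaled_beta(low, high, alpha, beta, rng=random):
    """Beta(alpha, beta) sample stretched from [0, 1] onto [low, high]."""
    return low + (high - low) * rng.betavariate(alpha, beta)

def beta_mode(low, high, alpha, beta):
    """Mode of the scaled distribution; this formula holds for alpha, beta > 1."""
    return low + (high - low) * (alpha - 1) / (alpha + beta - 2)

rng = random.Random(6)
# Finite range [0, 10] with the peak pulled toward the low end.
skewed = [sample_scaled_beta(0, 10, 2, 4, rng) for _ in range(5000)]
# alpha = beta = 1 is the uniform special case.
flat = [sample_scaled_beta(0, 10, 1, 1, rng) for _ in range(5000)]
print(beta_mode(0, 10, 2, 4))
```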

------
hardmath123
Relevant—Uncertain<T>: A First-Order Type for Uncertain Data (Microsoft
Research)

[http://research.microsoft.com/pubs/208236/asplos077-bornholt...](http://research.microsoft.com/pubs/208236/asplos077-bornholtA.pdf)

------
imh
I'm convinced that an excel sort of lay-person's computing platform is where
probabilistic programming will really take off. This seems really cool!

------
p4bl0
It's not really related but it made me think of a friend's PhD thesis on
uncertain data. If the subject interests you, be sure to check out the summary
of his (impressive) work:
[http://a3nm.net/blog/phd_summary.html](http://a3nm.net/blog/phd_summary.html).

------
krmmalik
I like it. I had to do a strategy session with a client a couple of weeks ago
and we needed to estimate how much the strategy was likely to cost over the
next few months. We had quite a few variables to work with though. This
would have been handy in such a scenario, I presume? We knew what our
components and the ranges were.

~~~
ozgooen
I believe it would be handy, but it depends on the size. Right now I think
it's reasonably fast and intuitive for models of around 3 to 40
metrics (variables). If you have more it could get slower, especially if many
of them have to be recalculated at once.

I suggest trying it out. If nothing else, you may be able to begin with very
simple models of the most important variables.

~~~
krmmalik
It was around 6-8 metrics so I guess it would be fine?

~~~
ozgooen
Definitely. Here's one in that range as an example.
[http://getguesstimate.com/models/163](http://getguesstimate.com/models/163)

Feel free to play with it. You can edit it, just not save it. (I recently
realized this was not obvious to most people)

------
kadder
This is very similar to the paper
[http://www.isi.edu/~szekely/contents/papers/2012/szekely2012...](http://www.isi.edu/~szekely/contents/papers/2012/szekely2012-iui.pdf)

As per the paper, you can choose arbitrary distributions, construct a fluent
graph, run Monte Carlo simulation and get the result - |via
[http://bit.ly/hnbuzz01](http://bit.ly/hnbuzz01) |

------
sundarurfriend
'Fuzzy logic' seems to be an ex-buzzphrase nowadays, but this seems pretty
close to that territory. A variable/cell/logical-unit containing not a single
value, but a distribution (often between bounds), and getting combined with
other similar variables/cells/logical-units in ways that understand and
respect the probability distributions.

Perhaps that field can provide a potential source of new names, when you
decide to market this as a company.

------
jkaptur
Very much like Crystal Ball - an Excel add-on that's popular in the finance
and energy fields.

~~~
darcyparker
And @Risk [http://www.palisade.com/risk/](http://www.palisade.com/risk/)

------
brudgers
Direct link to GitHub: [https://github.com/getguesstimate/guesstimate-app](https://github.com/getguesstimate/guesstimate-app)

~~~
vive-la-liberte
Does the app include everything so it can run offline?

~~~
ozgooen
No. It needs to be online at this point.

~~~
vive-la-liberte
Thanks for the prompt reply. Are you planning to open source all of it in the
future or will it remain SaaS?

~~~
ozgooen
Right now the vast majority of it is open source. There is a component that is
not: the rails server, but that's pretty tiny. The client can be developed on
without that. If there's some interest I'm happy to make what's existing
available.

I can't make guarantees about the distant future. There's a ton of work I
would love to see happen with Guesstimate, and my guess is that much of it
would only be possible if it becomes a company. This can still mean that it
can be mostly open source, but I really have little idea what the situation
would be at that time.

~~~
vive-la-liberte
I see. Thank you for having open sourced so much of it already and good luck
with the project :)

------
evanb
I was watching the "Total time spent watching this video" video, and had a basic
question.

How does one tell Guesstimate that there's a hard lower bound on a quantity,
i.e. that Video Length is at least 0, because negative watch times are unphysical? I
know the specified distribution in this case is very narrow (the video lasting
between -1 and 0 minutes has probability ~0.000032). But the answer does come
out to be 26±32, which includes a substantial unphysical region.

And, if I give a hard lower bound on Video Length, can it propagate that
knowledge into an asymmetric error on Total time?

~~~
ozgooen
Good catch!

Right now the main distribution types are normal and uniform. In the video, I
showed normal distributions, which have long tails in both directions.

In this case, a normal distribution isn't really correct, because, as you
noted, being less than 0 is exceedingly unlikely.

I believe the correct way to deal with this is to use a lognormal distribution
or something that has 0 chance of being less than 0. I don't yet have a simple
way of doing this, but it's definitely on the agenda.
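
A minimal sketch of the lognormal idea in Python. It assumes the entered [low, high] range is read as a central 90% interval, which is an assumption made for illustration and not a statement of how Guesstimate interprets ranges. The helper name `lognormal_from_interval` and the numbers are hypothetical.

```python
import math
import random

Z90 = 1.645  # standard normal z-score bounding a central 90% interval

def lognormal_from_interval(low, high):
    """Return (mu, sigma) of a lognormal whose central 90% interval is [low, high]."""
    if not 0 < low < high:
        raise ValueError("expected 0 < low < high")
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z90)
    return mu, sigma

rng = random.Random(1)
mu, sigma = lognormal_from_interval(1, 10)  # made-up video length in minutes
samples = [rng.lognormvariate(mu, sigma) for _ in range(5000)]

# Every sample is exp(something), so none can fall below zero.
print(min(samples) > 0)
```

Because the samples stay positive, anything computed by multiplying them with other positive quantities stays positive too, and the resulting spread comes out asymmetric rather than as a symmetric ± band.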

~~~
evanb
You should be careful about committing too much to particular distributions
being the "right" ones. For example, what if I know some variable is an odd
integer between 1 and 11 inclusive with no support on any other real number?

Just something to keep in mind when abstracting what a distribution is.

edit: though as a short-hand for entry, Gaussian is usually a pretty good
guess. Is there support for µ±σ instead of [low,high] in the works? Or support
for numerical distributions?

~~~
ozgooen
"Is there support for µ±σ instead of [low,high] in the works?" - There used
to be. I'll be considering ways of adding it back.

"Or support for numerical distributions?" - By numerical distributions do you
mean discrete distributions, like a 40% chance of being '8' and a 60% chance of
being '6'? If so, the answer is no. However, if you use the ternary operator
it is possible to do very simple versions of this now. We do support totally
random picks of different numbers though, using the pickRandom([3,5,3])
function.
[http://mathjs.org/docs/reference/functions/pickRandom.html](http://mathjs.org/docs/reference/functions/pickRandom.html)
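
For comparison, a small Python sketch of weighted and unweighted discrete picks using only the standard library (`random.choices` and `random.choice`). This is illustrative and says nothing about how Guesstimate implements them.

```python
import random

rng = random.Random(2)

# 40% chance of 8, 60% chance of 6.
weighted = rng.choices([8, 6], weights=[0.4, 0.6], k=5000)

# Equal-probability pick from a list, similar in spirit to pickRandom([3, 5, 3]);
# 3 appears twice in the list, so it is drawn about twice as often as 5.
uniform_pick = [rng.choice([3, 5, 3]) for _ in range(5000)]

# Odd integers from 1 to 11 inclusive, each equally likely.
odd = [rng.choice(range(1, 12, 2)) for _ in range(5000)]

print(weighted.count(8) / len(weighted))
```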

~~~
evanb
I meant more along the lines of a user-entered histogram. But that's roughly
the same as what you're talking about. It does seem that such a thing must
roughly correspond to some internal portion of Guesstimate, anyway. So for an
advanced user to punch a distribution in would be handy. May be out-of-scope
for this project? I guess really I'm looking for a way to error propagate my
home-grown datasets :)
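
A rough sketch of what sampling from a user-entered histogram could look like in Python. The function name `sample_histogram`, the bin edges, and the counts are all made up for illustration.

```python
import random

def sample_histogram(edges, counts, rng=random):
    """Pick a bin in proportion to its count, then a uniform point inside it."""
    if len(edges) != len(counts) + 1:
        raise ValueError("need one more edge than counts")
    i = rng.choices(range(len(counts)), weights=counts, k=1)[0]
    return rng.uniform(edges[i], edges[i + 1])

rng = random.Random(5)
edges = [0, 1, 2, 3]   # hypothetical bin edges
counts = [5, 0, 15]    # hypothetical counts per bin

# Propagate the histogram through a simple formula: value * 2 + noise.
outputs = [sample_histogram(edges, counts, rng) * 2 + rng.gauss(0, 0.1)
           for _ in range(5000)]
print(sum(outputs) / len(outputs))
```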

------
jakespencer
Awesome! I use Crystal Ball
([http://www.oracle.com/us/products/applications/crystalball/o...](http://www.oracle.com/us/products/applications/crystalball/overview/index.html))
with triangular distributions and Monte Carlo for software project cost
estimation. Crystal Ball costs thousands of dollars so I will be following
this with interest.

~~~
netghost
You might also find LiquidPlanner
([http://www.liquidplanner.com/features/#why-liquidplanner](http://www.liquidplanner.com/features/#why-liquidplanner))
interesting for software estimation. The scheduling engine is built entirely
around the notion of ranged estimates driving probabilistic schedules.

------
jasonshen
I think this is super cool! We're so bad at estimating probabilities (think
Han Solo's "never tell me the odds") that this helps visualize the
distribution of outcomes

------
jeffehobbs
This is really cool. Can anyone recommend any particularly good/cogent Simple
Caveman explanations of how Bayesian theory/Monte Carlo simulation work?

~~~
ozgooen
Honestly, my favorite resource for much of this is the book How to Measure
Anything by Douglas Hubbard. He goes into detail in understanding the value of
information and how and why to use Monte Carlo Simulations.

Video:
[https://www.youtube.com/watch?v=w4fHGTsZZD8](https://www.youtube.com/watch?v=w4fHGTsZZD8)
Book: [http://www.amazon.com/How-Measure-Anything-Intangibles-Busin...](http://www.amazon.com/How-Measure-Anything-Intangibles-Business/dp/1452654204)

------
Mauricio_
It seems that using Monte Carlo in Excel is apparently pretty common, but I'm
not sure if it can be done without addons.

~~~
ozim
I think DeMarco and Lister did some Monte Carlo in a spreadsheet without addons:
[http://www.systemsguild.com/riskology](http://www.systemsguild.com/riskology)

------
marcusgarvey
A "Rumsfeldian" visualizer. Neat!

[http://www.theatlantic.com/politics/archive/2014/03/rumsfeld...](http://www.theatlantic.com/politics/archive/2014/03/rumsfelds-knowns-and-unknowns-the-intellectual-history-of-a-quip/359719/)

~~~
coldtea
I always find it strange that people found that thing Rumsfeld said dumb,
bizarre or (worse) his own invention [1].

Those are standard epistemological distinctions, known (and written about)
since at least the times of Aristotle.

[1] (I mean the philosophical essence of what he said -- not that he didn't
try to use it as an excuse for BS).

~~~
gojomo
I also find it interesting that the aerospace/defense world (and
co-influencing cultures in engineering and government) have sometimes used
'unk-unk' as shorthand for 'unknown unknowns'.

[http://www.waywordradio.org/unk_unk/](http://www.waywordradio.org/unk_unk/)

------
tunesmith
What fun - I did a monte carlo estimate a few years back when trying to
determine what purchase price of house my girlfriend and I could afford. It
depended on probable interest rate, how much my old house would sell for, etc.
It'd be interesting to see how simply it could be modeled in this.

~~~
ozgooen
That seems like a great fit.

A few weeks back I used the tool to help a friend decide which mortgage option
to take for his house. One option had a slightly lower APR than the other, but
had a higher assistance fee.

After fiddling with it, it looked like the one with the lower assistance fee
was the better option. But perhaps more important, it didn't seem like it made
a big difference; perhaps around $200 after 10 years. This was a good
indication that the choice didn't really matter; that it wasn't something to
spend over a few hours worrying about.

[http://getguesstimate.com/models/100](http://getguesstimate.com/models/100)

~~~
aj7
When refinancing houses, I always performed detailed analyses. With a good
model, you can compare every deal offered. The key input is your expected life
of the loan. You'd pay very high fees for an interest rate reduction if you
really thought you'd keep a loan 30 years. But that is not very realistic.

I found, remarkably to me at the time, that at a single particular loan life,
the APRs of a single lender all converged very accurately to a single interest
rate. This told me two things: 1. Choosing a realistic loan life was key. 2.
Many of the choices (points + fees vs. interest rate), were for most people
illusory offerings to give the illusion of choice.
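
A minimal Python sketch of why the expected loan life drives the comparison. All loan figures are made up, and the two offers "A" and "B" are hypothetical. The net cost of holding a fixed-rate loan for a given number of months is taken as upfront fees plus interest paid over that period.

```python
def interest_paid(principal, annual_rate, term_months, months_held):
    """Total interest paid over the first `months_held` months of a fixed-rate loan."""
    r = annual_rate / 12
    payment = principal * r / (1 - (1 + r) ** -term_months)
    balance, interest = principal, 0.0
    for _ in range(min(months_held, term_months)):
        month_interest = balance * r
        interest += month_interest
        balance -= payment - month_interest  # remainder of the payment repays principal
    return interest

def cost(fee, principal, annual_rate, term_months, months_held):
    """Upfront fee plus interest: the net cost of holding the loan that long."""
    return fee + interest_paid(principal, annual_rate, term_months, months_held)

LOAN, TERM = 300_000, 360  # hypothetical 30-year loan
for years in (1, 3, 5, 10, 30):
    months = years * 12
    a = cost(0, LOAN, 0.0400, TERM, months)       # offer A: no fee, higher rate
    b = cost(3_000, LOAN, 0.0375, TERM, months)   # offer B: fee, lower rate
    print(years, round(a), round(b), "B cheaper" if b < a else "A cheaper")
```

Under these made-up numbers the fee-heavy offer only wins if the loan is kept long enough, so the ranking of offers depends on the assumed loan life.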

------
miguelrochefort
Isn't this how all software should be written? Expressions that represent a set of
all possible values, effectively replacing the need for types.

Surely, such a platform would make building an app 100 times easier. Not that
building apps is a good use of our resources.

~~~
borplk
I can kinda imagine it but can you elaborate on the part about getting rid of
types?

~~~
miguelrochefort
By replacing types with arbitrary expressions, you could go beyond just having
a list of integers. You could have a list of odd numbers, a list of words
starting with "A", a list of ages over 18, a list of prime numbers, a list of
at least 5 items, a list of all integers (infinite), etc.

You can only access index N of a list whose type expression proves that it has
at least 5 items. You can only access the fields of objects that are proven
not to be null. You can only give as input to a function that expects an odd
number a value whose type expression proves it to be an odd number. Basically,
you can never have runtime exceptions.

~~~
infinite8s
You are talking about dependent types
([https://en.wikipedia.org/wiki/Dependent_type](https://en.wikipedia.org/wiki/Dependent_type))
and only a few theorem-proving languages support them.

------
conservajerk
Interesting idea - but you could certainly do this in any spreadsheet
application with multiple cells to represent ranges, etc. I think the issue of
estimating probabilities can be considered to be more of a practices
issue than a tools issue.

------
netghost
This is a really great interface, and cool idea.

You might consider upping the run count, or maybe narrowing your bins for the
visualization. Either way, it's great to see more tools embracing probability
and uncertainty like this.

~~~
ozgooen
In practice, 5000 was basically the number that wouldn't slow it down; this
represented around 20-30% of the rendering time (react components were the
main bottleneck, surprisingly, though I could still optimize them more).

I think that this works fine for small models, which is much of what exists
now. As there are larger models, I'd eventually like to offload calculations
to AWS Lambda or something similar, so we can do far more.
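
To illustrate the trade-off behind the run count, here is a small Python sketch with a made-up two-input model (not Guesstimate's engine). It repeats a simulation at two sample sizes and compares how much the estimated mean wobbles between runs.

```python
import random
import statistics

def estimate_mean(n, rng):
    """Monte Carlo mean of a hypothetical model: visitors * minutes per visit."""
    total = 0.0
    for _ in range(n):
        visitors = rng.gauss(1000, 200)
        minutes = rng.gauss(3, 1)
        total += visitors * minutes
    return total / n

rng = random.Random(3)
spreads = {}
for n in (500, 5000):
    # Repeat the whole simulation 30 times and measure run-to-run variation.
    estimates = [estimate_mean(n, rng) for _ in range(30)]
    spreads[n] = statistics.stdev(estimates)
    print(n, round(spreads[n], 1))
```

The run-to-run spread of a Monte Carlo mean shrinks roughly with the square root of the sample count, so ten times as many samples buys only about a threefold reduction in noise.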

------
ashish161
Hey, nice tool. On a side note, what tool did you use to create the animated
tutorials on your GitHub page: [https://github.com/getguesstimate/guesstimate-app](https://github.com/getguesstimate/guesstimate-app)

image link
[https://camo.githubusercontent.com/8fd97a97fa656a1eb92294f0f...](https://camo.githubusercontent.com/8fd97a97fa656a1eb92294f0fc436885b5d8dbb3/687474703a2f2f672e7265636f726469742e636f2f6c636b496670416b69412e676966)

~~~
IshKebab
I don't know what he used, but I would recommend ShareX.

------
evanb
It'd be nice to be able to show / enter n-dimensional histograms too, so that
one can get an idea of / control the correlation between two outputs / inputs.

------
desireco42
This is awesome. I am aware of the math behind it, but this really made it much
simpler to use. And also to make complex models.

------
Beltiras
Cool idea but doesn't render into anything useful on my S5....

~~~
ozgooen
Thanks for letting me know.

It's definitely not mobile compatible yet. I think making it viewable on
mobile devices soon is doable. Making them easily editable will be much harder
/ require a different interface.

~~~
djhn
I wouldn't worry about that just yet, even the Google spreadsheets dedicated
app is a slow nightmare to use on the newest iOS and Android devices.

And to comment on the project, incredibly interesting. In my experience
non-quantitative people (non STE-grads) tend not to be able to, at all, model
decisions, decision trees and event trees as distributions or probabilities.
This kind of decision analysis can be sold at very high margins, because its
value can be very high for the right audience. Intuitively, if you can form a
solid business team and do enterprise/policy/strategy level sales, a company
around this has a market (none of the established competing software pointed
out by others is very cheap, and none of it is very modern, accessible or
intuitive). Another option, more suitable for a freelancer mindset (and with a
wider distribution of non-bust outcomes as well as arguably higher EV...), is
starting and maintaining this as an OSS project, hopefully with a wide and
growing contributor base; and establishing a career as a consultant in the
modelling/predictions space with Guesstimate as your resume and business card.
This of course depends as well on your personal interest - software developer
vs. data scientist vs. fledgling capitalist.

------
retube
How are correlations between variables accounted for?

~~~
ozgooen
Guesstimate doesn't yet support correlations. Right now we assume everything
is independent, though of course variables are often correlated with their
outputs.
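
A small Python sketch of why the independence assumption matters, using made-up inputs. It compares the spread of a sum when the two inputs are drawn independently against the case where they move together perfectly.

```python
import random
import statistics

rng = random.Random(4)
n = 5000

# Independent: each input gets its own random draw.
independent = [rng.gauss(10, 2) + rng.gauss(10, 2) for _ in range(n)]

# Fully correlated: both inputs are driven by one shared draw.
shared = [rng.gauss(0, 1) for _ in range(n)]
correlated = [(10 + 2 * z) + (10 + 2 * z) for z in shared]

spread_independent = statistics.stdev(independent)
spread_correlated = statistics.stdev(correlated)
print(round(spread_independent, 2), round(spread_correlated, 2))
```

In theory the independent sum has a standard deviation of about 2.83 (the square root of 8) while the fully correlated sum has 4, so treating correlated inputs as independent would understate the uncertainty of the total.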

------
rbanffy
Finally!

------
Chris2048
Nice, but I can't help thinking of spreadsheets as something of a crutch.

Also check out:

[http://probcomp.csail.mit.edu/bayesdb/](http://probcomp.csail.mit.edu/bayesdb/)

[https://github.com/taschini/pyinterval](https://github.com/taschini/pyinterval)
[http://mavrinac.com/index.cgi?page=fuzzpy](http://mavrinac.com/index.cgi?page=fuzzpy)

~~~
netghost
In this case it looks like the spreadsheet is mostly for laying out the
variables in a nice orderly table/grid.

~~~
bbcbasic
I'd say in all cases that is what the spreadsheet is for. Otherwise use R.

~~~
Chris2048
Seems people will always find ways to turn their data into grids :-/

Also, use pandas ;-)

