Introducing Guesstimate, a Spreadsheet for Things That Aren’t Certain (medium.com/guesstimate-blog)
559 points by freefrancisco on Dec 31, 2015 | hide | past | favorite | 94 comments



I wasn't expecting this to go on hnews yet, but happy to take any questions!


Hey, I tried to send you something through Intercom on the site, but something was broken and it wouldn't send. Here's what I tried to write:

Cool idea. Found this from a @worrydream tweet. Some comments after playing with it for a few minutes:

* A bit disappointing that I can only have uniform or Gaussian distributions. At minimum I'd like a binary distribution (coin flip, probably biased coin flip). A lot of things I would want to model need this (e.g., will we close this sale? will we close this investor? that kind of thing.)

* I'm really confused by the arbitrary two-letter codes assigned to things for formulas. Makes the formulas impossible to read. Why not just use the names I give to the cells, or something derived from the names?

Really nice start though! I'm co-founder & CEO of fieldbook.com, another spreadsheet-like tool, so I love information tools and anything that expands the mind's capacity. Best of luck and let me know how I can help!


I just looked at Fieldbook. The user-interface work you've done is fantastic. I'll save you from shamelessness. Here's a link to the promotional video for anyone who's interested: https://www.youtube.com/watch?v=Qw5f6Qptufc

Give me beta access! I want to play with this thing.

Wow! The developer tools are amazing too: https://www.youtube.com/watch?v=stwlaJGeLoM


Thanks! We were on Show HN not long ago, if you use the link there you can get beta access now: https://news.ycombinator.com/item?id=10752570


Sorry to hear about Intercom, will investigate shortly. I heard someone else had a similar issue.

In response to your points:

- Other distribution types are the #1 most requested feature at this time. Binary distributions are possible using functions instead of the built-in distributions, though obviously not intuitive. It's built using math.js, which has several random functions of different kinds. (In this case, you could use '=randomInt(0,1)' to produce a coin flip.)

http://getguesstimate.com/models/365

- The arbitrary two-letter codes were simply the easiest thing to begin with. I started with something derived from the names, but this presented problems with cells with empty names, especially ones that started empty and later became non-empty. Excel has a pretty sophisticated model for referencing cells. For the sake of getting something shipped, I started with a very simple one. Definitely an area to improve.

- I'd love to talk sometime. I'm also in SF, will send you an email. Thanks for the advice!
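For anyone curious, the '=randomInt(0,1)' coin-flip workaround above can be sketched outside Guesstimate too. Here's a minimal Monte Carlo version in Python (the sale-closing probability and deal-size range are invented for illustration):

```python
import random

random.seed(0)

# Hypothetical model: "will we close this sale?"
P_CLOSE = 0.3                          # subjective probability the deal closes
DEAL_LOW, DEAL_HIGH = 40_000, 60_000   # rough range for the deal size

def sample_outcome():
    # The coin flip: 1 with probability P_CLOSE, else 0
    closed = 1 if random.random() < P_CLOSE else 0
    return closed * random.uniform(DEAL_LOW, DEAL_HIGH)

samples = [sample_outcome() for _ in range(5000)]
expected_value = sum(samples) / len(samples)  # roughly P_CLOSE * 50,000 = 15,000
```

The point is just that a binary variable multiplied into a continuous one gives you the "did the deal happen, and if so how big was it" shape that the grandparent comment asks for.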


Cool, thanks! Yeah, it was not obvious you could create new distribution types with random functions. I didn't look at functions too closely because I didn't expect distributions to be there. Makes sense, though, thanks!

Re the two-letter codes, I totally get doing the simple thing just to launch and get it out there. We face a similar problem in Fieldbook, by the way. Would be happy to explain how our solution works sometime.


Any plans to add different probability distributions? I do this kind of analysis by hand for things like project timelines and cost estimates, but I model each variable with a double triangular distribution (http://www.mhnederlof.nl/doubletriangular.html). You provide a best case, worst case, and then the most likely case. The distribution approximates the long tails on many kinds of distributions.
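For reference, the double triangular from the linked page can be sampled with a simple inverse transform: flip a fair coin for which side of the mode you're on, then sample a right triangle on that side. A sketch in Python (the low/mode/high numbers are made up):

```python
import math
import random

random.seed(1)

def double_triangular(low, mode, high):
    """Sample a 'double triangular': 50% chance below the mode, 50% above,
    each side a right triangle peaking at the mode (per mhnederlof.nl)."""
    u = random.random()
    if random.random() < 0.5:
        # rising triangle on [low, mode]: F(x) = ((x-low)/(mode-low))^2
        return low + (mode - low) * math.sqrt(u)
    else:
        # falling triangle on [mode, high], mirrored
        return high - (high - mode) * math.sqrt(u)

samples = [double_triangular(10, 14, 30) for _ in range(10_000)]
below_mode = sum(s < 14 for s in samples) / len(samples)  # should be ~0.5
```

Unlike the ordinary triangular distribution, this puts equal probability on each side of the most-likely value regardless of how lopsided the range is, which is the long-tail property the parent comment is after.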


Definitely. I haven't yet had a request to do a double triangular distribution; that looks really interesting.

It will be a bit of a challenge to make it intuitive, but I'm sure I'll figure out something.

Just so you know, right now there is a second distribution, uniform. You can access it by clicking the 'normal' icon in a normal distribution; a short list of alternatives is shown below it.


I just made an issue on github, but the uniform distribution looks wonky at the right endpoint.


Much of that is because of the React histogram library I used. There are a bunch of issues with it. I definitely intend to replace it with something much better. If any of you can recommend tools or libraries for that, I'd be very appreciative.


I do the same thing as cpitman: try to figure project times and costs using the same best, worst, and most likely cases. It would be nice to have it automated so that when the PHB looks at the numbers they don't just pick the "best case" because we have the "best team". 50 nodes would be plenty; I can roll up tasks into a bigger task set.


I worked for a while on a product to do this. It foundered on the stony shores of my unfamiliarity with Angular, which was the sexy new thing at the time.


+1 for more distribution shapes, specifically power law, combinations of power law and normal (which you might want to consider for your soft-attack long-tail variables over triangular, if the edge between the triangles causes you artifacts), and distributions based on empirical collections.


Have you seen or were you at all inspired by Analytica from Lumina Decision Systems (http://www.lumina.com/why-analytica)? It's been around for about twenty years and is often used in the field of policy analysis when you need to quantify messy things like human lives in dollars in the face of massive uncertainties.


Also, Oracle Crystal Ball.


This is awesome! Have you heard of Augur? It's a decentralized prediction market: http://augur.net I wonder if the guys there would be interested in this.


Definitely, I'm keeping a close eye on it. I've long been interested in prediction markets. It would be nice to tie data from them and the Good Judgment Project into tools like Guesstimate so people could make forecasts using other strong forecasts.


Pulling data from prediction markets into Guesstimate is an exciting idea. A few thoughts:

* Prediction markets are usually for binary outcomes. I imagine the most useful role of binary variables in Guesstimate would be to mix two different distributions. "If Clinton wins, student debt in 2018 will look like distribution A; if Sanders wins, student debt in 2018 will look like distribution B".

* I'm not sure how Augur (or any other market) reports likelihoods, but it's good to keep in mind that market prices do NOT generally reflect any sort of average belief. See https://www.aeaweb.org/assa/2006/0106_1015_0703.pdf.
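The mixing idea in the first bullet, a binary outcome selecting between two conditional distributions, can be sketched with a few lines of Monte Carlo in Python (the probability and the two conditional normals are invented placeholders, not real forecasts):

```python
import random
import statistics

random.seed(2)

# Hypothetical market-implied probability that outcome A happens
p_a = 0.6

def metric_if_a():      # belief about the metric conditional on A
    return random.gauss(1.2, 0.2)

def metric_if_not_a():  # belief conditional on not-A
    return random.gauss(1.6, 0.3)

def sample():
    # The binary variable picks which conditional distribution to draw from
    return metric_if_a() if random.random() < p_a else metric_if_not_a()

samples = [sample() for _ in range(20_000)]
mean = statistics.fmean(samples)  # ~0.6*1.2 + 0.4*1.6 = 1.36
```

The resulting unconditional distribution is a mixture, often bimodal, which is exactly the shape a single normal or uniform can't express.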


That makes sense. It would be very useful to see estimates of how well Presidential candidates would do if they got elected.

In the future, one idea would be to keep track of people's metric estimates in Guesstimate, and later score and rank them on how well they do. So if Charles always reports a 90% confidence interval that's far too optimistic, we could help adjust it automatically next time. This would also allow us to aggregate different opinions directly, essentially being like a mini prediction challenge. This would be a ways off though, and it really depends on what direction the product goes.


The guy whose work was originally the basis for Augur has some interesting things to say about that project (and is working on his own implementation of a Bitcoin-based prediction market): http://www.truthcoin.info/faq/


I love the tool and tweeted about it yesterday. It's brilliant, I love it, excited to try some more complex probabilistic distributions.

Only request would be to allow for private spreadsheets. I can download and run the code locally but this would help many people who are less tech savvy.

Great product - looking forward to seeing how it evolves!


It's nice to see a web app with keyboard support. Just please don't hijack the keys if any modifier keys are held down; those are usually for the browser (Alt+Left to go back, for example).


<3 this! Are there any plans for some type of export utility, e.g. some type of JSON serialization of a finished model?


That's definitely on the agenda. Curious, what are you interested in a JSON serialization for?


Oh, it wouldn't have to be that format in particular, I was just guessing how you might represent the worksheet in some useful, repurpose-able manner.


I love the simplicity. Overall, a fantastic product.


I would be much more interested if this product were "choose probability density function centric." Then, the Monte Carlo engine would gain much more interest. Being able to choose or specify arbitrary distributions, and then run simulations, would be valuable.

Of special interest are non-continuous distributions. How often has normal-distribution reasoning failed in finance? Put another way, a user should be able to model a distribution himself.


Very good to know. Right now you can choose between normal, uniform, and a few very simple discrete distributions, but not others.

When I built this, my first goal was to make any distribution run quickly. At this point I believe adding other distribution types will be quite doable, expect them shortly.


Two words: beta distribution. It has finite range and an arbitrary mode, and has the uniform as a special case.


Statistical reasoning in general has failed in finance, especially 2008+... I don't think statistics or normal distributions are to blame. Rather the mindset that the risk scenarios are something that is avoidable with certainty. That's almost religious...


Guesstimate is napkin math. It makes no sense to spend an inordinate amount of time fine-tuning the distribution when both the distribution and its parameters are just best guesses. The important part is the propagation of uncertainty across many dependent variables, and the normal distribution is often good enough for that purpose. Whenever it isn't, for me that'd be a sign to use a proper statistical model instead. MCMC was invented to do inference on models of arbitrary complexity, with however much or little data you might have.
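The "propagation of uncertainty across many dependent variables" is the whole trick, and it's small enough to sketch directly. A back-of-the-napkin model in Python (every input here is an invented rough guess, which is rather the point):

```python
import random
import statistics

random.seed(3)

# Napkin model: profit = price * units - fixed_costs,
# each input a loose normal guess (all numbers invented).
def sample_profit():
    price = random.gauss(10, 1)
    units = random.gauss(1000, 200)
    fixed_costs = random.gauss(4000, 500)
    return price * units - fixed_costs

profits = [sample_profit() for _ in range(5000)]
median = statistics.median(profits)  # ~6000
spread = statistics.stdev(profits)   # the combined uncertainty
```

Each input's uncertainty is modest, but the product `price * units` widens the output far beyond any single input's spread, which is what a point-estimate spreadsheet silently hides.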


Why would you need Monte Carlo? Can't you combine probability density functions through convolutions (or other tricks with integrals, like Fourier or Laplace transforming and then using straight arithmetic)?
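You can, for sums of independent variables. A minimal discrete example in Python, convolving two fair-die PMFs to get the exact distribution of their sum:

```python
# PMF of one fair die
die = {k: 1 / 6 for k in range(1, 7)}

def convolve(p, q):
    """Exact PMF of X + Y for independent X ~ p, Y ~ q."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0.0) + px * qy
    return out

two_dice = convolve(die, die)
# two_dice[7] == 6/36, the familiar peak of the two-dice distribution
```

The catch is that convolution only handles sums. Once a spreadsheet formula involves products, ratios, max(), or conditionals, the integrals rarely stay closed-form, and Monte Carlo handles arbitrary formulas with the same simple machinery.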


Relevant: Uncertain<T>: A First-Order Type for Uncertain Data (Microsoft Research)

http://research.microsoft.com/pubs/208236/asplos077-bornholt...


I'm convinced that an excel sort of lay-person's computing platform is where probabilistic programming will really take off. This seems really cool!


It's not really related, but it made me think of a friend's PhD thesis on uncertain data. If the subject interests you, be sure to check out the summary of his (impressive) work: http://a3nm.net/blog/phd_summary.html.


I like it. I had to do a strategy session with a client a couple of weeks ago and we needed to estimate how much the strategy was likely to cost over the next few months. We had quite a few variables to work with, though. This would have been handy in such a scenario, I presume? We knew what our components and their ranges were.


I believe it would be handy, but it depends on the size. Right now I think it's reasonably fast and intuitive for models of around 3 to 40 metrics (variables). If you have more it could get slower, especially if many of them have to be recalculated at once.

I suggest trying it out. If nothing else, you may be able to begin with very simple models of the most important variables.


It was around 6-8 metrics, so I guess it would be fine?


Definitely. Here's one in that range as an example. http://getguesstimate.com/models/163

Feel free to play with it. You can edit it, just not save it. (I recently realized this was not obvious to most people)


This is very similar to the paper http://www.isi.edu/~szekely/contents/papers/2012/szekely2012...

As per the paper, you can choose arbitrary distributions, construct a fluent graph, run a Monte Carlo simulation, and get the result.


'Fuzzy logic' seems to be an ex-buzzphrase nowadays, but this seems pretty close to that territory. A variable/cell/logical-unit containing not a single value, but a distribution (often between bounds), and getting combined with other similar variables/cells/logical-units in ways that understand and respect the probability distributions.

Perhaps that field can provide a potential source of new names, when you decide to market this as a company.


Very much like Crystal Ball - an Excel add-on that's popular in the finance and energy fields.




Does the app include everything so it can run offline?


No. It needs to be online at this point.


Thanks for the prompt reply. Are you planning to open source all of it in the future or will it remain SaaS?


Right now the vast majority of it is open source. There is one component that is not, the Rails server, but that's pretty tiny. The client can be developed without it. If there's some interest I'm happy to make what exists available.

I can't make guarantees about the distant future. There's a ton of work I would love to see happen with Guesstimate, and my guess is that much of it would only be possible if it becomes a company. This can still mean that it can be mostly open source, but I really have little idea what the situation would be at that time.


I see. Thank you for having open sourced so much of it already and good luck with the project :)


I was watching "Total time spent watching this video" video, and had a basic question.

How does one tell Guesstimate that there's a hard lower bound on a quantity? I.e., Video Length is at least 0, because negative watch times are unphysical. I know the specified distribution in this case is very narrow (the video lasting between -1 and 0 minutes has probability ~0.000032). But the answer does come out to be 26±32, which includes a substantial unphysical region.

And, if I give a hard lower bound on Video Length, can it propagate that knowledge into an asymmetric error on Total time?


Good catch!

Right now the main distribution types are normal and uniform. In the video, I showed normal distributions, which have long tails in both directions.

In this case, a normal distribution isn't really correct, because, as you noted, being less than 0 is exceedingly unlikely.

I believe the correct way to deal with this is to use a lognormal distribution or something that has 0 chance of being less than 0. I don't yet have a simple way of doing this, but it's definitely on the agenda.
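The lognormal workaround can be done by hand today: back out the parameters from a 90% interval and sample. A sketch in Python (the [5, 50] interval is an invented stand-in for a video length in minutes; 1.645 is the z-score of the 90% central interval):

```python
import math
import random

random.seed(4)

def lognormal_from_ci(low, high, z=1.645):
    """Back out lognormal parameters from a 90% interval [low, high], low > 0.
    ln(X) is normal, so the interval maps to mu +/- z*sigma in log space."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * z)
    return mu, sigma

mu, sigma = lognormal_from_ci(5, 50)   # e.g., video length in minutes
samples = [random.lognormvariate(mu, sigma) for _ in range(5000)]
# Every sample is strictly positive: no unphysical negative lengths,
# and the tail is asymmetric (long on the high side).
```

About 90% of the samples land inside [5, 50] by construction, and none are negative, which is exactly the constraint the normal distribution can't honor.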


You should be careful about committing too much to particular distributions being the "right" ones. For example, what if I know some variable is an odd integer between 1 and 11 inclusive with no support on any other real number?

Just something to keep in mind when abstracting what a distribution is.

edit: though as a short-hand for entry, Gaussian is usually a pretty good guess. Is there support for µ±σ instead of [low,high] in the works? Or support for numerical distributions?


"Is there support for µ±σ instead of [low,high] in the works? " - There used to be. I'll be considering ways of adding it back.

"Or support for numerical distributions?" - By numerical distributions do you mean discrete distributions: like, a 40% chance of being '8' and a 60% chance of being '6'? If so, the answer is no. However, if you use the ternary operator it is possible to do very simple versions of this now. We do support totally random picks of different numbers though, using the pickRandom([3,5,3]) function. http://mathjs.org/docs/reference/functions/pickRandom.html
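For comparison, both tricks, the ternary and an explicit weighted pick, are one-liners in Python's stdlib (the 40/60 split over 8 and 6 is the example from the comment above):

```python
import random

random.seed(5)

# 40% chance of 8, 60% chance of 6 -- the ternary trick...
def via_ternary():
    return 8 if random.random() < 0.4 else 6

# ...and the same distribution with explicit weights
def via_weights():
    return random.choices([8, 6], weights=[0.4, 0.6])[0]

samples = [via_weights() for _ in range(10_000)]
share_of_8 = samples.count(8) / len(samples)  # ~0.4
```

`pickRandom` in math.js corresponds to the unweighted case; the weighted pick is what a first-class discrete distribution would need.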


I meant more along the lines of a user-entered histogram. But that's roughly the same as what you're talking about. It does seem that such a thing must roughly correspond to some internal portion of Guesstimate, anyway. So for an advanced user to punch a distribution in would be handy. May be out-of-scope for this project? I guess really I'm looking for a way to error propagate my home-grown datasets :)


I should add that the negative portion of the answer may not be from sampling the small tail of the Video Length distribution, but is much more likely to be an artifact of how you calculate uncertainty. It might be better to find the median and go out some % in each direction asymmetrically. You can see that the Total Time is roughly flat and then peters out.
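The median-plus-percentiles suggestion is easy to illustrate on a skewed distribution, where symmetric mean ± 2σ goes wrong. A sketch in Python (the lognormal parameters are arbitrary, chosen only to be visibly right-skewed):

```python
import random
import statistics

random.seed(6)

# A skewed, strictly positive quantity (parameters invented)
samples = sorted(random.lognormvariate(3, 0.8) for _ in range(10_000))

def percentile(sorted_xs, p):
    # crude empirical percentile: index into the sorted samples
    return sorted_xs[int(p * (len(sorted_xs) - 1))]

lo, med, hi = (percentile(samples, p) for p in (0.05, 0.5, 0.95))

mean = statistics.fmean(samples)
sd = statistics.stdev(samples)
# mean - 2*sd dips below zero here, even though every sample is positive;
# the asymmetric [lo, hi] interval stays inside the data's actual support.
```

The percentile interval is also asymmetric around the median (much longer on the high side), which is the honest summary of a skewed result; ± one number can't express that.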


Actually, the negative portion could be from sampling the much more substantial negative tail of the Viewers distribution. Either way, constraints seem to be important!


Awesome! I use Crystal Ball (http://www.oracle.com/us/products/applications/crystalball/o...) with triangular distributions and Monte Carlo for software project cost estimation. Crystal Ball costs thousands of dollars so I will be following this with interest.


You might also find LiquidPlanner (http://www.liquidplanner.com/features/#why-liquidplanner) interesting for software estimation. The scheduling engine is built entirely around the notion of ranged estimates driving probabilistic schedules.


I think this is super cool! We're so bad at estimating probabilities (think Han Solo's "never tell me the odds") that it helps a lot to visualize the distribution of outcomes.


This is really cool. Can anyone recommend any particularly good/cogent Simple Caveman explanations of how Bayesian theory/Monte Carlo simulation work?


Honestly, my favorite resource for much of this is the book How to Measure Anything by Douglas Hubbard. He goes into detail in understanding the value of information and how and why to use Monte Carlo Simulations.

Video: https://www.youtube.com/watch?v=w4fHGTsZZD8 Book: http://www.amazon.com/How-Measure-Anything-Intangibles-Busin...


I wrote a tutorial about statistical bootstrap resampling w/ animations, which is similar/related to the Monte Carlo method: http://minimaxir.com/2015/09/bootstrap-resample/
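For anyone who wants the gist of bootstrap resampling without leaving the terminal, it fits in a few lines of stdlib Python (the dataset is invented, e.g., task durations in hours):

```python
import random
import statistics

random.seed(7)

# A small made-up dataset: task durations in hours
data = [3.1, 4.5, 2.2, 6.8, 3.9, 5.0, 4.1, 2.8, 7.3, 3.5]

def bootstrap_means(data, n_resamples=5000):
    """Resample with replacement; the spread of the resampled means
    approximates the sampling uncertainty of the original mean."""
    return sorted(
        statistics.fmean(random.choices(data, k=len(data)))
        for _ in range(n_resamples)
    )

means = bootstrap_means(data)
lo = means[int(0.025 * len(means))]   # 2.5th percentile
hi = means[int(0.975 * len(means))]   # 97.5th percentile
# [lo, hi] is a rough 95% confidence interval for the mean
```

The appeal is the same as Monte Carlo generally: no distributional assumptions, just repeated sampling and counting.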


Using Monte Carlo in Excel is apparently pretty common, but I'm not sure if it can be done without add-ons.


I think DeMarco and Lister did some Monte Carlo in a spreadsheet without add-ons: http://www.systemsguild.com/riskology



I always find it strange that people found that thing Rumsfeld said dumb, bizarre or (worse) his own invention [1].

Those are standard epistemological distinctions, known (and written about) since at least the times of Aristotle.

[1] (I mean the philosophical essence of what he said -- not that he didn't try to use it as an excuse for BS).


I also find it interesting that the aerospace/defense world (and co-influencing cultures in engineering and government) has sometimes used 'unk-unk' as shorthand for 'unknown unknowns'.

http://www.waywordradio.org/unk_unk/


What fun - I did a Monte Carlo estimate a few years back when trying to determine what purchase price of a house my girlfriend and I could afford. It depended on the probable interest rate, how much my old house would sell for, etc. It'd be interesting to see how simply it could be modeled in this.


That seems like a great fit.

A few weeks back I used the tool to help a friend decide which mortgage option to take for his house. One option had a slightly lower APR than the other, but had a higher assistance fee.

After fiddling with it, it looked like the one with the lower assistance fee was the better option. But perhaps more importantly, it didn't seem like it made a big difference; perhaps around $200 after 10 years. This was a good indication that the choice didn't really matter; that it wasn't something to spend more than a few hours worrying about.

http://getguesstimate.com/models/100


When refinancing houses, I always performed detailed analyses. With a good model, you can compare every deal offered. The key input is your expected life of the loan. You'd pay very high fees for an interest rate reduction if you really thought you'd keep a loan 30 years. But that is not very realistic.

I found, remarkably to me at the time, that at a single particular loan life, the APRs of a single lender all converged very accurately to a single interest rate. This told me two things: 1. Choosing a realistic loan life was key. 2. Many of the choices (points + fees vs. interest rate), were for most people illusory offerings to give the illusion of choice.
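The "compare every deal at a realistic loan life" analysis is a short script once you have the standard amortization formulas. A sketch in Python (the two offers, fees, and the 7-year horizon are all invented for illustration):

```python
def monthly_payment(principal, annual_rate, years=30):
    """Standard fixed-rate amortization payment."""
    r, n = annual_rate / 12, years * 12
    return principal * r / (1 - (1 + r) ** -n)

def balance_after(principal, annual_rate, months, years=30):
    """Remaining balance after `months` of payments."""
    r = annual_rate / 12
    m = monthly_payment(principal, annual_rate, years)
    return principal * (1 + r) ** months - m * ((1 + r) ** months - 1) / r

def total_cost(principal, annual_rate, upfront_fees, horizon_months):
    """Net cash out the door if you sell/refinance at the horizon:
    fees + payments made + payoff of the remaining balance - principal."""
    m = monthly_payment(principal, annual_rate)
    payoff = balance_after(principal, annual_rate, horizon_months)
    return upfront_fees + m * horizon_months + payoff - principal

# Two invented offers on a $300k loan, judged at a 7-year expected life
offer_a = total_cost(300_000, 0.0600, upfront_fees=2_000, horizon_months=84)
offer_b = total_cost(300_000, 0.0575, upfront_fees=8_000, horizon_months=84)
```

Rerunning `total_cost` across a range of `horizon_months` is exactly the exercise the parent describes: the ranking of offers flips as the assumed loan life grows, so the expected life is the decisive input.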


Isn't this how all software should be written? Expressions that represent a set of all possible values, effectively replacing the need for types.

Surely, such a platform would make building an app 100 times easier. Not that building apps is a good use of our resources.


A type is an expression that represents the set of all possible values.

If you don't like making types explicit, you could use type inference like in Haskell, or have values carry their types like in Ruby.


I can kinda imagine it but can you elaborate on the part about getting rid of types?


By replacing types with arbitrary expressions, you could go beyond just having a list of integers. You could have a list of odd numbers, a list of words starting with "A", a list of ages over 18, a list of prime numbers, a list of at least 5 items, a list of all integers (infinite), etc.

You can only access index N of a list whose type expression proves that it has at least 5 items. You can only access the fields of objects that are proven not to be null. You can only give as input to a function that expects an odd number a value whose type expression proves it to be an odd number. Basically, you can never have runtime exceptions.


You are talking about dependent types (https://en.wikipedia.org/wiki/Dependent_type) and only a few theorem-proving languages support them.


Interesting idea - but you could certainly do this in any spreadsheet application with multiple cells to represent ranges, etc. I think the issue of estimating probabilities is more a practices issue than a tools issue.


This is a really great interface, and cool idea.

You might consider upping the run count, or maybe narrowing your bins for the visualization. Either way, it's great to see more tools embracing probability and uncertainty like this.


In practice, 5000 was basically the number that wouldn't slow it down; this represented around 20-30% of the rendering time (React components were the main bottleneck, surprisingly, though I could still optimize them more).

I think that this works fine for small models, which is much of what exists now. As there are larger models, I'd eventually like to offload calculations to AWS Lambda or something similar, so we can do far more.


5,000 tests is more than enough for most general use cases. (i.e. data that would be able to fit into browser memory, anyways)
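For intuition on why 5,000 is plenty: Monte Carlo noise shrinks like 1/sqrt(n), so at n = 5,000 the standard error of a mean is about 1.4% of one sample's standard deviation. A quick empirical check in Python (the unit normal is just a convenient stand-in):

```python
import random
import statistics

random.seed(8)

def estimate_mean(n):
    """Monte Carlo estimate of the mean of a unit normal (true mean = 0)."""
    return statistics.fmean(random.gauss(0, 1) for _ in range(n))

# Repeat the 5,000-sample estimate a few times and look at the typical error;
# theory says E|error| ~= sqrt(2/pi) / sqrt(5000) ~= 0.011
errors_5000 = [abs(estimate_mean(5000)) for _ in range(20)]
typical_error = statistics.fmean(errors_5000)
```

A percent-or-so of error is far below the precision of napkin-math inputs, so more samples mostly buy smoother-looking histograms, not better answers.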


Hey, nice tool. On a side note, what tool did you use to create the animated tutorials on your GitHub page? https://github.com/getguesstimate/guesstimate-app

image link https://camo.githubusercontent.com/8fd97a97fa656a1eb92294f0f...


I don't know what he used, but I would recommend ShareX.


It'd be nice to be able to show / enter n-dimensional histograms too, so that one can get an idea of / control the correlation between two outputs / inputs.


This is awesome. I am aware of the math behind it, but this really made it much simpler to use. And also to make complex models.


Cool idea but doesn't render into anything useful on my S5....


Thanks for letting me know.

It's definitely not mobile compatible yet. I think making it viewable on mobile devices soon is doable. Making them easily editable will be much harder / require a different interface.


I wouldn't worry about that just yet; even Google's dedicated Sheets app is a slow nightmare to use on the newest iOS and Android devices.

And to comment on the project: incredibly interesting. In my experience, non-quantitative people (non-STEM grads) tend not to be able to model decisions, decision trees, and event trees as distributions or probabilities at all. This kind of decision analysis can be sold at very high margins, because its value can be very high for the right audience. Intuitively, if you can form a solid business team and do enterprise/policy/strategy-level sales, a company around this has a market (none of the established competing software pointed out by others is very cheap, and none of it is very modern, accessible, or intuitive).

Another option, more suitable for a freelancer mindset (and with a wider distribution of non-bust outcomes, as well as arguably higher EV), is starting and maintaining this as an OSS project, hopefully with a wide and growing contributor base, and establishing a career as a consultant in the modelling/predictions space with Guesstimate as your resume and business card. This of course depends on your personal interest: software developer vs. data scientist vs. fledgling capitalist.


How are correlations between variables accounted for?


Guesstimate doesn't yet support correlations. Right now we assume everything is independent, though of course variables are often correlated with their outputs.


Finally!


Nice, but I can't help thinking of spreadsheets as something of a crutch.

Also check out:

http://probcomp.csail.mit.edu/bayesdb/

https://github.com/taschini/pyinterval http://mavrinac.com/index.cgi?page=fuzzpy


Spreadsheets definitely have a lot of downsides compared to full programming languages, but there are some strong upsides. I think Guesstimate is good for relatively simple models (compared to large programs) of specific things; these are intended to be very visual (so you can see how uncertainty propagates through a system) and collaborative (so people can disagree on specific estimates).


In this case it looks like the spreadsheet is mostly for laying out the variables in a nice orderly table/grid.


I'd say in all cases that is what the spreadsheet is for. Otherwise use R.


Seems people will always find ways to turn their data into grids :-/

Also, use pandas ;-)



