

An Ode to Little Data - hackerews
http://katsenblog.com/post/67763100374/an-ode-to-little-data

======
11001
Little innovation in "little data"? What about the whole academic field of
Statistics? Most of the innovation there is about how to make maximal use of
limited/censored/missing data. I don't hear "big data" folks talking much
about multiple imputation, structural equation modeling, mixed-effects
regressions, etc.

~~~
glutamate
Michael Jordan (the other Michael Jordan) wrote a nice article about how the
long tail of big data is little data and hierarchical models etc. should work
really well there.

[http://bayesian.org/sites/default/files/fm/bulletins/1106.pd...](http://bayesian.org/sites/default/files/fm/bulletins/1106.pdf)

------
pskomoroch
"90% of the world is using Excel and similar tools to analyze and visualize
their data."

These types of statements always prompt me to ask "what is the specific
problem you are trying to solve?". Replacing Excel? What is the unique
advantage beyond a different UI? I think the real product challenge here is
identifying why using a different UI or tool would have ROI for the average
business user the author is describing. If it doesn't, why would they switch?
What is your startup's tool going to do to increase my profits and justify the
investment and switching cost?

I've often found it is the analyst and approach, not the tools, that make a
huge difference on these problems, and for those who care deeply about tools
there are many open source options (R, SciPy, etc). Creating more generic
"small data" tools runs the risk of solving a problem customers don't have.

------
christopheraden
False dichotomy about Excel vs. DSLs like R. "90% of the world is using Excel
and similar tools to analyze and visualize their data. [...] What’s the next
step-up from the spreadsheet? Learning how to code. Learning stats. Leaving
the comfort of a spreadsheet’s visual display for R, maybe?" This statement
ignores that numerous non-free statistical packages have
offered GUI front-ends for years. SAS/EG, JMP, Minitab, and SPSS instantly
come to mind. Two out of those four are even marketed directly to people as
statistical extensions of spreadsheets. Granted, the "long tail" will still
need to learn a little bit of stats, but I fail to see how this is a problem
that Excel solves (unless we're talking only about plots).

I don't think it's a hard leap to consider that if the big-box statistical
package companies realized how much of their money came from industry, they'd
do what they could to make their software seem like an alluring proposition.
Statistical software costs an order of magnitude more than Excel, so they'd
need pretty good arguments to convince upper management that the business
team actually needed an 8000 dollar piece of software.

I'm not sold that there's nothing in between Excel and R. From my experience,
the GUI packages above have a slight learning curve (nowhere near the learning
curve of going from Excel to R), but not an insurmountable one. What these
solutions lack is the name recognition that Excel has, or decent integration
into an MS Office stack (exceptions, of course--I remember seeing a statistics
toolbox for Excel once), or they cost too much.

I think part of the real problem is that for a lot of companies, Excel is
"good enough". There's plenty of stuff it can't do well. It chokes on larger
data sets, has limited statistical functionality, poor scripting capabilities,
and shaky random number generators. But it's good enough for people who don't
want to do much with their data.

If they wanted to do harder-core analysis, they'd outsource it to an analytics
team. This perspective comes from a viewpoint inside BigCo. The prohibitive
cost of some of these solutions might be a harder pill for a small firm to
swallow.

------
zrail
I posted some thoughts about "little data" from a more personal angle earlier
this month[1] (hn discussion [2]). The solution that the OP is hinting at
might very well be something that helps normal people organize and ask
questions about the data they have on hand and scattered around the Internet.
It'd be cool if I could mix and mash up little applications and visualizations
with my data without actually pushing that data anywhere I don't have control
over, though.

[1]: [https://www.petekeen.net/little-data](https://www.petekeen.net/little-data)

[2]: [https://news.ycombinator.com/item?id=6718422](https://news.ycombinator.com/item?id=6718422)

------
glamp
I've never understood why people like "big data". Small/medium data is way
more fun!

------
glaugh
Definitely a lot of different angles on little data. Our aim at Statwing is to
take the power of statistical analysis and embed it in a UI that's easier to
use than Excel's PivotTables (via automated selection and interpretation of
statistical tests). And hopefully in doing so we're filling in the gap between
Excel and tools like R.

[https://www.statwing.com/demo](https://www.statwing.com/demo)

That said, I certainly think Paul's approach makes a lot of sense, and I hope
someone builds that. Closest thing I can think of is
[http://anapsis.com/](http://anapsis.com/)

</pitch>

~~~
glutamate
On bayeshive.com, we have this cool little feature where if people want to
build a more complicated statistical model that involves entering an equation
- think non-linear regression or a dynamical system - then you can save that
equation and share it with the other users, or run it against other datasets.

We haven't figured out how to extend that to visualisations, which I think
is what the OP is really looking for as well.

------
hax0rsehat
Interesting

