

Startup Pivot: Small (data) Is The New Big (data) - Mercutionario
http://insights.qunb.com/a-techstars-startup-pivot-why-we-pivoted-from-visualising-big-data-to-smarter-small-data

======
jandrewrogers
A point that a lot of nominal Big Data startups miss is that genuinely large-
scale data management and analytics are not driven by visualizations at all,
nor do they fit in a web-driven SaaS-like environment. The purpose is to answer a
complex question from unimaginably large volumes of data, not to draw charts
and graphs. It is often too I/O intensive for virtualized clouds and the
visualization component is almost superfluous to the purpose. Most of the
problems that need to be solved in Big Data are low level, down at the
computer science and infrastructure level. Many of the use cases are
intrinsically poorly suited to web-based SaaS-type offerings.

To make matters worse, many high-value Big Data analytical problems are
(literally) not meaningfully visualizable except for marketing purposes. It is
rather tricky to visualize an analytic product when there are a hundred
critical values that need to be rendered in some fashion for every pixel your
monitor can display. A lot of high-value analytics have this characteristic
but most of the nominal Big Data visualization tools ignore this case even
though it is arguably the most important one.

Consequently, while labeling your startup "Big Data" is trendy and
fashionable, there are very few genuine Big Data startups. Adding value in
this market requires a combination of serious theoretical computer science
chops plus very creative interface design. Few startups are actually
addressing the needs of this market and are instead assuming the market wants
the web app they have the skills to produce.

~~~
mwetzler
Agree! My personal experience: our customers at Keen IO constantly demand
better visualizations and many expect a traditional analytics frontend like
GA. However... our highest value customers pay for our API capabilities, not
our line charts. Storage and querying at massive scale is the hard part and
that is worth thousands of dollars a month, not tens or hundreds.

Stunning visualizations and a better web experience are definitely something
we want to do when timing allows, but so far our true customer value is in the
backend and our APIs.

------
agibsonccc
There's a lot of money to be made in the smaller verticals. I think for a
startup, there's a lot of opportunity to allow people to just manipulate and
sanitize data in simpler spreadsheets.

For example, one thing I'm being forced to implement myself has been a lot of
string manipulation operations to sanitize different kinds of data I'm playing
with in spreadsheets.

Even just having something misimported wastes a lot of time.

OpenRefine isn't bad, but can only get you so far. That being said, if I can
come up with a complete solution myself, I wouldn't mind just adding it to the
suite of tools I'm already offering :)

I'm also wondering about different kinds of tools already out there though.

~~~
bliti
I once made a simple program that would create Excel spreadsheets from the
input of a barcode scanner. Took me about a day. It made a big impact on the
company. Before, they would spend 2 days creating the spreadsheet. My program
had it done in minutes. They were drop shippers, and went from about 25
pallets a week to 50 (and kept growing), all in less than a month. The
program was less than 200 LOC.

~~~
agibsonccc
Exactly! It's a simple concept for a programmer to think about (a 2D array?
How hard could this be?), but really it's crazy how annoying it is to do
certain things. Excel has math down pretty well, but encoding other kinds of
data or even compiling the data to a spreadsheet is a significant problem for
most people.

~~~
bliti
The biggest problem that company had was fetching the image for each product
(it fed from one API). They had to manually insert the image for each one.
Imagine a pallet with 200 different products in it! All I did was create a web
GUI where they would upload their orders, then the program would fetch all the
data, insert the images in the correct row, and generate the file for them.
The file would then be sent off to a workstation that operated as a print
server/queue station. After that, the spreadsheet was automatically printed
and the shipping department would get the document in a nicely formatted
manner. It's weird, but I get excited talking about it. It's the little wins
that really count.

------
mathattack
Excel is good enough for most problems that small and medium sized firms have.
Big Data tools and techniques are only worth the effort when the problems to
solve are big. That's not small business.

~~~
Mercutionario
Exactly my point, I'd like to post something on Excel as the real and ultimate
data tool soon.

~~~
mathattack
It's amazing how much of the world financial system is built on the back of
Excel too. Probably the other extreme there - a little too much VBA where
bigger and better tools would work.

------
mwetzler
I think there is definitely an opportunity to help companies understand their
GA data, but our company Keen IO is proof that Big Data is not too fat for
startups. We're finding a big opportunity in helping developers build custom
analytics backends and white-labeled analytics. We've found the big bucks are
made supporting customers with truly big data challenges. We can provide the
scalability and reliability that would be arduous and expensive for them to
build in-house.

~~~
bliti
Let's say I have a data analysis startup. My market is dentists with three or
_fewer_ offices. How would your Big Data solutions work for me?

(Real question. I am developing this business as we speak).

~~~
mwetzler
That's awesome! This isn't really a "Big Data" problem - dental transactions
don't happen at the scale of billions per month. However, the transactions
themselves are very valuable for the dentists --- that's great for you! You
can provide insight without having to crunch a zillion data points.

This is an example of an industry vertical analytics solution. Our hypothesis
is that analytics products will be created to support all kinds of use cases
like this (insurance analytics, manufacturing analytics, distribution
analytics, etc). We want Keen IO to be the platform on which people build
those solutions.

The way we can help you is to first make it easy for you to reliably collect
data from all your dentists (with client libraries). Then we expose all that
data, and all of our analytics capabilities (e.g. queries), by REST API. You can
log into Keen IO and create a line chart, then copy and paste the JavaScript
right into your site. Now you can build a completely white-labeled website for
your dentists, while we take care of storing and querying your event data. Our
scoped API keys will allow you to secure the data so a given dentist can only
see data for her offices. You can also create an internal dashboard for your
team so you'll be able to do analysis across all the dentists. Hopefully you'll
discover industry trends and benchmarks that you can use in marketing reports
or to resell to the dental industry (assuming your dentists allow this!). I
bet they would like to know how they compare to other dentists...
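
To make the scoped-key idea concrete, here is a minimal sketch of one way such a key could work: the key carries a set of forced filters and an HMAC signature, and the server appends those filters to every query made with it. This is an illustration only, not Keen IO's actual mechanism; `createScopedKey`, `readScopedKey`, `scopeQuery`, and the filter shape (`property`, `operator`, `value`) are all hypothetical names.

```javascript
const crypto = require("crypto");

// Hypothetical scoped key: hex-encoded JSON payload + "." + HMAC signature.
function sign(masterSecret, payload) {
  return crypto.createHmac("sha256", masterSecret).update(payload).digest("hex");
}

function createScopedKey(masterSecret, filters) {
  const payload = Buffer.from(JSON.stringify({ filters }), "utf8").toString("hex");
  return `${payload}.${sign(masterSecret, payload)}`;
}

// Verify the signature and return the forced filters; throw on any mismatch.
function readScopedKey(masterSecret, key) {
  const [payload = "", signature = ""] = key.split(".");
  const given = Buffer.from(signature, "utf8");
  const expected = Buffer.from(sign(masterSecret, payload), "utf8");
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) {
    throw new Error("Invalid scoped key");
  }
  return JSON.parse(Buffer.from(payload, "hex").toString("utf8")).filters;
}

// Forced filters are always appended, so a caller cannot widen the scope.
function scopeQuery(masterSecret, key, query) {
  const forced = readScopedKey(masterSecret, key);
  return { ...query, filters: [...(query.filters || []), ...forced] };
}
```

A key created with a filter such as `{ property: "office_id", operator: "in", value: [1, 2] }` would then restrict every query made with it to those two offices, whatever other filters the dashboard adds.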

Anyway, this is too much fun. Let me know if you want to brainstorm more!

~~~
bliti
You mention collecting data, which has been a big issue in this market. The
local dentists (this is an offline startup) use a closed-source program that
does not export data. The only way to retrieve any data is to use automation
scripts that "read" from the screen and dump it into CSV files (they use Excel
a lot). Once that is done, the script calls another one that sanitizes the
data and inserts it into the database (PostgreSQL, in this case).
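
The sanitize step described here might look roughly like the following sketch. The two-column layout (name, last visit), the `DD/MM/YYYY` input date format, and the function names are assumptions for illustration; the thread does not show the real export format.

```javascript
// Hypothetical sanitizer for rows scraped into CSV before a database insert.
// The column layout and the DD/MM/YYYY date format are assumptions.

// Split one CSV line, honouring double-quoted fields with "" escapes.
function parseCsvLine(line) {
  const fields = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"';
        i++; // skip the second quote of the escaped pair
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Collapse whitespace and convert DD/MM/YYYY to ISO YYYY-MM-DD.
// Returns null for rows that cannot be cleaned.
function sanitizeRow([name, lastVisit]) {
  const cleanName = (name || "").replace(/\s+/g, " ").trim();
  const match = /^\s*(\d{1,2})\/(\d{1,2})\/(\d{4})\s*$/.exec(lastVisit || "");
  if (!cleanName || !match) return null;
  const [, day, month, year] = match;
  return {
    name: cleanName,
    last_visit: `${year}-${month.padStart(2, "0")}-${day.padStart(2, "0")}`,
  };
}

// Parse every non-empty line and keep only the rows that could be cleaned.
function sanitizeCsv(text) {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map(parseCsvLine)
    .map(sanitizeRow)
    .filter((row) => row !== null);
}
```

Rows that fail to parse are dropped here for brevity; in a pipeline fed by screen scraping it would probably be worth logging them instead, since a misread screen is otherwise silent.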

I currently have a flexible API that allows me to make complicated queries by
sending commands through an HTTP call. Say "GET /clients/where/__last-
visit/20132009" and so on.
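
As an illustration of this kind of URL-driven query API, here is a minimal sketch that turns such a path into a parameterized SQL statement. The route grammar, the field whitelist, and the mapping of `__last-visit` to a `last_visit` column are assumptions; the thread does not show the real implementation, and the value is passed through as an opaque string because its format is not specified.

```javascript
// Hypothetical translation of "/clients/where/__last-visit/20132009" into a
// parameterized query. Table names, field names and columns are assumptions.
const ALLOWED = new Map([
  ["clients", new Map([["__last-visit", "last_visit"], ["__name", "name"]])],
]);

function pathToQuery(path) {
  const [table, keyword, ...rest] = path.split("/").filter(Boolean);
  const columns = ALLOWED.get(table);
  const wellFormed = keyword === "where" && rest.length > 0 && rest.length % 2 === 0;
  if (!columns || !wellFormed) {
    throw new Error(`Unsupported path: ${path}`);
  }
  const conditions = [];
  const values = [];
  // Segments after "where" come in field/value pairs.
  for (let i = 0; i < rest.length; i += 2) {
    const column = columns.get(rest[i]);
    if (!column) throw new Error(`Unknown field: ${rest[i]}`);
    values.push(decodeURIComponent(rest[i + 1]));
    // Only whitelisted identifiers reach the SQL text; values use placeholders.
    conditions.push(`${column} = $${values.length}`);
  }
  return {
    text: `SELECT * FROM ${table} WHERE ${conditions.join(" AND ")}`,
    values,
  };
}
```

The `$1`, `$2` placeholders follow PostgreSQL's positional parameter syntax, so user-supplied values never get concatenated into the SQL string.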

On the front end, I have a fairly custom dashboard for each dentist (they all
have specific needs due to different business sub-specialties). The dashboard
calls the API and creates the reports automatically. They are updated _n_
times a day, depending on the metric being observed.

Given this information, and what you told me above, can you give me your
insight into how different your offering would be?

My aim is to reduce complexity and business costs without eroding the
service or client experience. I'm interested in how my "small data system" would
compare to yours in those terms. It may sound like I'm being pedantic, but I
am genuinely interested in learning more. You could be the solution that saves
me hundreds of hours of development time.

~~~
mwetzler
Hmm... that does make your data collection tricky. Ideally you have something
that captures events as they happen, in a nice format, rather than waiting on
summary report views.

Looks like you already built an API similar to ours, but if you ever reach
scale/maintenance/availability bottlenecks then check us out.

Most of our customers use charts that query in realtime when the page loads or
the user adds a filter or whatever, but batched reporting like you describe
is a way to optimize the page load time.

Generally we want to help developers before they have already built the solution
themselves. I think your homegrown solution is probably exactly what you need
right now to validate your product in the market. When you want to invest more
in your architecture for improved performance, scalability, or reliability,
then you can consider building v2.0 on Keen IO :)

~~~
bliti
I appreciate the response. The system is still fairly new, and in development.
My options are still pretty much open. Just waiting on more paying customers
(but aren't we all?).

------
pnachbaur
I can certainly relate to the challenges of working on a big data platform
intended to immediately satisfy varied customers...

But mainly I want to say I'm super impressed with the Google Analytics
'storification'. I can imagine the difficulties in bringing that level of
quality to myriad data sources, but I'm excited to see you succeed!

~~~
Mercutionario
Thanks that's awesome feedback!

------
j_s
Heads up: when JavaScript is enabled but analytics is not (e.g. Ghostery
blocking HubSpot), the site falls apart badly.

~~~
Mercutionario
Crap! Thanks for the heads up. Do you have any idea how we could fix that?

~~~
j_s
Did not see your interest in fixing this until today.

Just verify the expected variables/methods exist prior to using them (lines
99, 197, 365 right now):

       if(window.hbspt) hbspt.cta.load(
       ^    added     ^
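
A note on why the form of the check matters: when the blocked script never ran, the global was never declared, and testing an undeclared identifier directly throws a `ReferenceError`. A `typeof` test (or a property lookup on the global object) does not. A minimal sketch, where `loadCta` is a hypothetical wrapper name and whatever arguments the page already passes to `hbspt.cta.load` are forwarded unchanged:

```javascript
// Hypothetical wrapper: call hbspt.cta.load only when the HubSpot script
// actually loaded. Returns true if the call was made, false if it was skipped.
function loadCta(...args) {
  // typeof never throws, even when `hbspt` was never declared.
  const available =
    typeof hbspt !== "undefined" &&
    hbspt !== null &&
    typeof hbspt.cta === "object" &&
    hbspt.cta !== null &&
    typeof hbspt.cta.load === "function";
  if (!available) {
    return false; // analytics blocked: skip quietly instead of crashing the page
  }
  hbspt.cta.load(...args);
  return true;
}
```

Each of the three call sites could then go through the wrapper, so a blocked analytics script degrades to a missing call-to-action rather than a broken page.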

------
tsax
Maybe a few more buzz words and I may have clicked the link.

~~~
Mercutionario
Touché. Instantly new big data FREE for YOUR pivot might have done the trick,
though.

