
MapR may shut down as investor pulls out after ‘extremely poor results’ - heyyyouu
https://siliconangle.com/2019/05/30/mapr-may-shut-investor-pulls-following-extremely-poor-results/
======
xkgt
As many others pointed out, these product vendors are getting killed by the
cloud (I am not saying whether that is good or bad). The on-demand compute and
storage scalability of the cloud makes perfect sense for big data infrastructure.

These companies are failing because they can't establish a use case for running
their product in the cloud. It is ironic because when AWS started EMR, MapR was
one of the distributions they offered via EMR [1]. Over time, they
weaned off MapR and stopped offering it as a deployment option. So it is
another case of AWS cannibalizing its own third-party ecosystem.

Once this was gone, MapR was reduced to just another marketplace partner [2], and
their additional licensing cost didn't make sense when native EMR was
sufficient for most use cases.

[1] https://aws.amazon.com/emr/mapr/pricing/

[2] https://mapr.com/partners/partner/amazon-elastic-mapreduce-and-mapr/

~~~
jcims
Which has to be a gut punch because EMR feels like a bag of bolts rather than
any kind of polished system.

~~~
sitkack
And MapR is so much more efficient than vanilla Hadoop, AWS definitely makes
more money off the less polished yet mainstream version.

~~~
xkgt
IMO it is not a fair comparison between EMR and the MapR flavor of Hadoop, as they
are designed for different use cases. MapR and other flavors of Hadoop are
engineered to make the task of administering a long-running cluster easy. In
comparison, EMR has a rudimentary set of tools for peeking into a running
cluster.

The AWS offering, on the other hand, is ideally suited for transient clusters. It relies on
other solutions like Athena and Redshift Spectrum to cater to ad-hoc use cases
such as querying and reporting. In this regard, EMR has much better support
for programmability and elastic resource provisioning, which is really
important for transient clusters.

~~~
sitkack
I think we are talking past each other. MapR is a highly tuned system compiled
to native binaries. It has nothing to do with the ephemerality of the
clusters; it has to do with job runtimes.

------
notyourday
To me, and a few people that I know, the most interesting part of MapR was
their NFS server spanning multiple nodes. It solved a real problem that real
customers (i.e. enterprises with money, for whom tech made stuff work and was not
work in itself) were willing to pay a lot of money for (see Isilon, EMC,
NetApp, etc.), but that was the non-sexy part of the MapR business -- the Big Data
part was much sexier.

Over the last 5 years, the real money-making enterprises solved their "it
should look like a big file system" problem, so the "the rest of our stuff works"
issue stopped being an issue (in the process, Isilon got bought by EMC, and EMC
got bought by Dell). They did it either by buying those specialized solutions,
moving to object stores, or building home-grown systems that worked with their
specific applications, and the big data people went with newer, shinier big data
solutions, leaving MapR with no market.

~~~
notacoward
The increasing adoption of object stores and other not-quite-filesystems, even
when a real filesystem would be the more appropriate choice, is definitely
part of this. For good or ill, people would rather work on or with one of the
simplified alternatives. I'm in that camp myself now, working on a system with
lame HDFS-like semantics after two decades of proving the "POSIX can't scale"
folks wrong. It's just not a battle worth fighting any more, at either the
technical or business level.

~~~
notyourday
That's why I think MapR missed the mark. They had a software-only solution one
could deploy on top of generic hardware in a data center to solve a hundreds-of-TB
to small-number-of-PB storage problem that was _real_ for enterprises, and since
those nodes would also be able to run compute jobs, that was the way
to get enterprises to adopt the technology. Going to an object store was
possible, but it required dev time, and in the game of build vs. buy, buying
the non-core tech is nearly always preferable. And in the case of MapR, it could
be done immediately, not after a massive refactoring to move to an object store.

But the MapR sales people were impossible to deal with. They wanted to talk
about Big Data. And all the wonderful things that we could do with it.

Me: I have lots of files. Think billions. Let's talk about what your
stuff does and how it works to solve this problem, because I'm hearing some
people have successfully used it at the million-file scale.

Them: That's great. Let me tell you about big data that you can do using it.

Me: That's ok. I really want to solve my files problem.

Them: Big data is the future! You can change your application to do big data.

Needless to say it went nowhere fast.

~~~
noir-york
Interesting post. Out of curiosity: aren't there others who have developed
cluster storage exposed as NFS / FUSE?

~~~
notyourday
They all sucked. MapR sucked the least. Considering a comparable EMC solution
was ~20x more expensive and less flexible, MapR happened to be in the right
place at the right time with the right software but blew it because they
decided they knew better what the customers needed to buy vs what the
customers wanted to buy.

Fundamentally, file system clusters are a very difficult business to get into
and make enough money in to justify being in that business. On the low end there's
open source stuff that kind of, sort of works. Selling consulting on top of
it is at best a ramen-profitable business. No one is buying Ceph consulting
for a million dollars if they run a business that needs that size of storage
-- the data is too valuable to risk on a semi-custom solution and be a test case
for Ceph, Gluster, etc.

On the top end there are the EMCs of the world with $350K/node pricing plus a yearly
10% support contract. Finally, there's the "rewrite the app" alternative (call
it 1 year and a million-dollar price tag to move to an object store) that a customer
company is always considering.

So the sweet spot is an enterprise solution with a license cost of about
$12k-$24k per year per node that has EMC/Isilon/NetApp-like functionality
and unbundles software from hardware. The tricky part is that it needs to be
a proven solution that works on deployments the size of comparable
EMC/Isilon/NetApp ones. To do that, one needs a lot of excellent engineers who
cost a lot of money, a lot of sales engineers who intrinsically understand
the commonality between customers' requirements and can explain it to
those engineers, and a lot of very expensive test beds.

~~~
mbreese
I remember being in a pitch meeting with MapR a few years ago. It was for a
large academic institution’s HPC group. We were all very interested in their
NFS/HDFS tools, which would have met a very critical need. They just kept
talking about the Hadoop workflows, which wouldn’t have flown at all for this
group. Most HPC workflows/tools don’t map very well to Hadoop or at the
minimum would have required tools to have been rewritten.

We really wanted to spend the money, but as far as I remember, we didn’t end
up doing anything more than a test install.

I’d still love a middle option that could handle petabyte scale storage
without the Isilon price tag.

~~~
hc91
CERN?

~~~
mbreese
No, this was at Stanford a few years ago, so not close to CERN level HPC.

Our lab has modest compute requirements but needs a ton of storage, so we were
very interested in a mid-range storage option.

------
joehandzik
We're seeing a lot of regret around sprawling Hadoop deployments, so this
doesn't surprise me. Other Hadoop vendors (vendor?) pivoting to machine
learning is a band-aid as compute capabilities scale beyond HDFS's
performance limitations. Look to the new generation of startups around NVMe/NVMe-oF
(WekaIO, Excelero, E8, etc.) to fill the void.

The question is going to be: will anyone provide an intelligent way to
maintain compatibility with applications that expect to interface with HDFS
rather than POSIX? It's a bit of a gap right now from what we see.
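
At its core, that compatibility gap is a path-and-API translation problem. As a toy illustration only (not any vendor's actual shim; the mount point, function name, and layout are invented), a minimal sketch might map `hdfs://`-style paths from legacy applications onto a POSIX mount:

```python
# Toy sketch of an HDFS-to-POSIX path shim: accept hdfs:// paths from
# legacy applications but serve them from a local POSIX filesystem.
# Purely illustrative -- not any real product's API.
import os
from urllib.parse import urlparse

POSIX_ROOT = "/mnt/fastpool"   # hypothetical NVMe-backed mount point

def to_posix(hdfs_path: str) -> str:
    """Map hdfs://namenode/user/x/file to a path under POSIX_ROOT."""
    parsed = urlparse(hdfs_path)
    if parsed.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// path, got {hdfs_path!r}")
    return os.path.join(POSIX_ROOT, parsed.path.lstrip("/"))

print(to_posix("hdfs://namenode/user/alice/part-0000"))
# /mnt/fastpool/user/alice/part-0000
```

A real shim would also have to emulate HDFS's block, replication, and metadata RPCs, which is where the actual engineering effort lies.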

~~~
capkutay
Hadoop's whole appeal was that it was cheap and scalable. Did it actually work
without serious engineering teams maintaining each distribution? Absolutely
not. The hype was purely VC-funded.

Fast forward 10 years, and Hadoop has basically been killed off by hosted
storage services that are more expensive but 10,000x easier to manage.

~~~
dmix
I'm not really familiar with this industry. Are these Hadoop clusters hosted
locally by these businesses? And MapR et al. were providing the
software/consulting to help manage them?

Then I'm guessing better cloud hosted options came out offering similar
capabilities.

If that's the case was it really a big surprise that "cloud" hosting would eat
any self-hosted platform's lunch?

~~~
threeseed
So in the very recent past (i.e. a few years ago), businesses wanting to do data
science ran a distro of Hadoop, e.g. MapR, that they bought from the vendor.
Vendors charged an exorbitant price per node (e.g. $10k) because they figured
they would get people to switch from Teradata or Oracle.

What the cloud offered was so much more compelling. You had per-hour
pricing on the order of $20 for a minimal cluster. You had unlimited
autoscaling, so you didn't have to do capacity planning and go through
procurement processes to pre-order hardware/licenses. And of course you had
unlimited, ultra-cheap storage courtesy of S3. It also allowed each team
to have its own mini-cluster instead of everyone relying on some giant one.
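
For a rough sense of scale, the two pricing models can be compared directly. All figures below are the illustrative numbers from this comment (a $10k/node/year license, a ~$20/hour minimal cloud cluster) plus an assumed 10-node cluster size, not real vendor pricing:

```python
# Back-of-the-envelope comparison of the two pricing models above.
# All numbers are illustrative assumptions, not real pricing.

license_per_node_per_year = 10_000   # on-prem vendor license, per node
nodes = 10                           # assumed cluster size
cloud_cluster_per_hour = 20          # assumed minimal transient cluster

annual_license = license_per_node_per_year * nodes   # $100,000/year

# Hours of transient-cluster time the license fee alone would buy:
break_even_hours = annual_license / cloud_cluster_per_hour
print(break_even_hours)   # 5000.0 -- about 57% of a year (8760 hours)
```

Under these assumed numbers, the transient cluster only loses to the per-node license if it runs more than roughly half the time, and that ignores the cost of the hardware under the on-prem license.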

I don't think the recent explosion in Data Science would've happened without
the cloud.

~~~
disgruntledphd2
Which is hilarious, given how expensive it is to run analytics on AWS. My
current company looked into recreating our internal analytics environment on
the cloud, and realised that it wouldn't be cost effective.

------
atombender
Lots of interesting discussion here about billions of files and whatnot. I
don't really work with big data -- the largest data sets I work with are
in the range of tens of millions of records at most. My modest "ETL" is about
reading from APIs and cleaning up incoming data into consistent schemas, then
refining the data (e.g. geocoding addresses, or correlating against public
demographic records/statistics) -- a process that is super fast and mostly
doesn't even require multiple nodes -- so I never encounter anything I can't
solve with some Postgres or Redis. What I'd love to know is what everyone is
actually _doing_ with their huge clusters. I can understand the need for big
data in the hard sciences (CERN, or genetics, or similar), and of course machine
learning (e.g. image feature extraction), and then there are real-time auction
use cases like ad servers that probably have some big data component. What
else are people doing out there?
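
For contrast with the big-cluster stories elsewhere in the thread, the kind of single-node "ETL" described above can be sketched in a few lines of stdlib Python. The sources and field names here are invented purely for illustration:

```python
# Minimal sketch of the single-node "ETL" described above: normalize
# records from two hypothetical API sources into one consistent schema.
# Source names and fields are invented for illustration.

def normalize(record: dict, source: str) -> dict:
    if source == "api_a":
        return {"name": record["full_name"].strip().title(),
                "zip": record["postal_code"]}
    if source == "api_b":
        return {"name": record["name"].strip().title(),
                "zip": record["zip"]}
    raise ValueError(f"unknown source: {source}")

rows = [
    normalize({"full_name": "  ada lovelace ", "postal_code": "94107"}, "api_a"),
    normalize({"name": "alan turing", "zip": "02139"}, "api_b"),
]
print(rows[0])   # {'name': 'Ada Lovelace', 'zip': '94107'}
```

At tens of millions of records, a loop like this feeding Postgres really is the whole pipeline; no cluster required.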

~~~
heyyyouu
This is a great question -- I'd love to know this too. I watch this space from
the enterprise perspective and I see a lot of big data talk/prep but not so
much execution, if that makes sense.

------
kwillets
MapR has been reduced.

Hadoop was the floppy disk of big data: ubiquitous, but always beaten by other
solutions.

~~~
threeseed
This is simply not true. Hadoop is growing massively.

All that is happening is that the vendors are being killed by the cloud.

------
moonjoAWS
Any impacted engineers looking for a new home in the Bay Area or Seattle, feel
free to reach out to moonjo@amazon.com to learn more about teams with Redshift.

------
Joeri
So, that would mean the Hadoop distro market would be down to just Cloudera? A
single-vendor market is not a market; it's a legacy product.

~~~
russtrotter
AFAIK, AWS's EMR and Glue leverage Hadoop under the covers. Does that count
toward that market, and/or at least keep Hadoop development active?

~~~
derefr
Hadoop ≠ Hadoop distro. Hadoop is fine and healthy; what is in question is
whether larger repackagings of it with other vendor-specific components are an
interesting product anymore.

------
bitL
Weren't they killed the moment Spark appeared, having heavily invested in
accelerating Hadoop? I am surprised that Cloudera is still around...

~~~
SEJeff
And Spark was sort of killed by Storm / Heron, as even Spark Streaming does
micro-batches and isn't truly real-time.

~~~
Dunedan
Wasn't it the other way around? Wasn't Storm killed by Spark? At least once
Spark was out and gained traction, nobody in my bubble talked about Storm
anymore. But I guess it depends on the bubble you're in.

~~~
brokensegue
yeah their timeline seems backwards

------
Communitivity
This is sad. I remember the MapR demo at the Nvidia GTC a couple of years ago,
and it was amazing. I hope they pull a rabbit out of the hat and are
able to continue, but if they don't, then I hope they open source as much as
they can.

------
purplezooey
I think they'll be fine. At my last company we had a large MapR cluster and it
was hands down more reliable and user-friendly than anything else. Maybe
Cloudera will pick them up.

~~~
jrg
We've used them as a more reliable/resilient/replicated (with a lower
operational effort) provider of HBase tables and Kafka streams, and a bit of
file storage.

------
kod
Hope they're able to open source some of their tech if this does come to pass.

