
1.1 Billion Taxi Rides with MapD & 8 Nvidia Pascal Titan Xs - walterbell
http://tech.marksblogg.com/billion-nyc-taxi-rides-nvidia-pascal-titan-x-mapd.html
======
Hydraulix989
MapD is doing some fascinating stuff, but I'm really skeptical about the
business need for speed.

I was working on a similar concept (GPU-accelerated Hadoop MapReduce) at the
time, and everyone I talked to in the Valley (and believe me, I knocked on a
lot of doors) was perfectly happy with their Hadoop jobs running overnight or
in an hour instead of instantaneously. It also amazed me how many people were
using Hadoop only for the distributed storage.

Maybe data scientists love taking coffee breaks...

In the end, I scrapped the project because nobody wanted it -- everybody said
they were only doing I/O bound work, which my solution didn't really
accelerate.

Interestingly enough, I talked to people from Fusion.io (data center PCI-E
SSDs) who mentioned that everybody told them that they didn't need Fusion
because they were only doing compute bound work.

A YC company, BigCalc, was trying to do a similar thing by accelerating R
(rewriting library routines in highly optimized pure C), targeting hedge
funds, but it suffered the same fate as my MapReduce project.

Shockingly, people just don't need speed.

~~~
semi-extrinsic
I think your conclusion can be amended to "the people who are willing to pay
for more speed get the most value for money by creating or buying an
application-specific solution rather than a generic one".

Like your hedge fund example: why pay for accelerated R, when you can spend
just a little more money on hiring a low-level C/assembly programmer instead
and get _50x more_ speedup?

It's not just startups; big companies suffer the exact same problem. Look at
Portland Group's OpenACC initiative, for instance, which intends to make GPU
acceleration as easy to use as OpenMP, with pragmas etc. But everyone
interested in GPUs just goes and runs straight CUDA or OpenCL code instead;
there are essentially no "casual GPU acceleration users". Also see Cray Chapel.

~~~
Hydraulix989
This is especially true for Wall Street. I know some hedge funds are even
working with FPGAs and ASICs.

~~~
seanp2k2
[http://algo-logic.com](http://algo-logic.com) FPGA network cards that run
business logic on order books before they even get into the CPU. Pretty hard
to beat an approach like that. By the time you've executed a single
instruction, you've already lost the race :)

Edit: spelling

------
tmostak
Here's a picture of the server the benchmarks were run on!

[http://imgur.com/a/dMzU9](http://imgur.com/a/dMzU9)

------
antisthenes
I was expecting a visualization of the taxi routes overlaid over a map,
perhaps helping the city optimize transportation or show the most popular
routes during times of day.

Instead it reads like a marketing blog from Nvidia with some fairly
meaningless benchmarks in fractions of a second.

~~~
marklit
Todd Schneider already did a bunch of analysis on the data. I simply wanted to
use the dataset to compare various OLAP engines.

[http://toddwschneider.com/posts/analyzing-1-1-billion-nyc-ta...](http://toddwschneider.com/posts/analyzing-1-1-billion-nyc-taxi-and-uber-trips-with-a-vengeance/)

------
paulasmuth
Also check out this blog post about the internals of massively parallel
databases such as MapD (it talks about _why_ they are so fast) [0]

[0]
[https://news.ycombinator.com/item?id=12291767](https://news.ycombinator.com/item?id=12291767)

