
GPU Open Analytics Initiative - jonbaer
http://gpuopenanalytics.com/
======
blueyes
Let me get this straight: H2O's contribution to this project is a bunch of
deep-learning libraries that they didn't create? How will they maintain and
extend that? Do they know how credibility in open-source works?

H2O wrote its random forests and GBMs, neither of which show large performance
gains on GPUs compared to neural nets. And the neural nets in this initiative
were created by Google, Berkeley, Microsoft... This looks like the OpenStack
of machine learning, an unmaintainable project that will crumble under its own
complexity.

~~~
aub3bhat
I really don't understand your arguments, let alone the overt negative tone
here.

>> 1. How will they maintain and extend that?

Umm just like how everyone else does it.

>> 2. H2O wrote its random forests and GBMs, neither of which show large
performance gains on GPUs compared to neural nets.

What does your statement even mean? [1]

>> And the neural nets in this initiative were created by Google, Berkeley,
Microsoft...

You mean libraries?

Note: NOT an employee of H2O or in any way affiliated.

[1]
[https://devblogs.nvidia.com/parallelforall/bidmach-machine-learning-limit-gpus/](https://devblogs.nvidia.com/parallelforall/bidmach-machine-learning-limit-gpus/)

~~~
blueyes
I'm negative on marketing fluff. You're free to enjoy it, of course.

>> Umm just like how everyone else does it.

Wrong. Their ability to maintain and extend any of those libraries is limited.
TensorFlow is hundreds of thousands of lines of code. Google can maintain and
extend it just fine; I doubt that H2O has the chops. That matters to their
clients, because TensorFlow breaks things with new releases, and those clients
will want someone to fix it. Do you call H2O? Do they patch Google into the
conference line? I'm saying they're not in a position to offer commercial
support. Anyone who has worked in enterprise will see the risks here.

>> What does your statement even mean?

The algorithms that H2O specializes in do not benefit that much from the
massively parallel computation that GPUs offer. If you don't understand the
differences in the computation required by neural nets as opposed to random
forests, you will not understand my critique. Neural nets offer superior
accuracy on many problem sets and often need GPUs. Random forests do neither
of those things.
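
To make the difference concrete, here is a toy sketch (plain numpy; the
dict-based tree is purely illustrative, not anyone's actual implementation).
A neural-net layer is one big dense matrix multiply, exactly the uniform,
massively parallel work GPUs are built for; a decision tree walks each sample
down a data-dependent chain of branches, which GPUs handle poorly:

    import numpy as np

    # One NN layer over a whole batch: a single large GEMM plus an
    # elementwise ReLU -- ideal GPU work.
    X = np.random.randn(1024, 512)   # batch of inputs
    W = np.random.randn(512, 256)    # layer weights
    hidden = np.maximum(X @ W, 0)

    # A decision tree: little arithmetic, lots of divergent control
    # flow as each sample takes its own path through the tree.
    def predict(node, x):
        while node["value"] is None:  # descend until we reach a leaf
            branch = "left" if x[node["feature"]] < node["threshold"] else "right"
            node = node[branch]
        return node["value"]

    leaf = lambda v: {"value": v}
    tree = {"value": None, "feature": 0, "threshold": 0.0,
            "left": leaf(-1.0), "right": leaf(1.0)}
    print(predict(tree, X[0]))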

H2O, for most of its long existence, has operated purely on CPUs just fine.
But GPUs and deep learning are hot, so they are associating themselves with
the buzzwords, even though they are not really in a position to a) build deep
learning solutions or b) make their real product perform well on GPUs.
Marketing fluff.

>> You mean libraries?

Why yes, I mean libraries. Thanks for clearing that up.

There's a lot of noise in the deep learning and AI space. Watson is one
egregious example. H2O is another. While the technology itself is promising,
some of the vendors are dubious.

~~~
arnon
In the end, most AI/DL implementations in products are fluff. It feels like a
lot of products are about "HEY LOOK AT ME, I CAN DO IT TOO!", which speaks to
the immaturity of the concepts.

From personal experience, many customers ask about our capabilities - but when
it actually comes to using them, only a handful of ML algorithms turn out to
be relevant and useful, and those used to be called 'statistics' until fairly
recently.

------
jordigh
I was hoping this was going to be about replacing CUDA or fostering the
development of clover, pocl or other OpenCL implementations. Not sure how open
your analytics can be when it's all overwhelmingly dependent on binary blobs.

------
aub3bhat
Is there any documentation on how this allocates/deallocates GPU memory? While
developing Deep Video Analytics [1], one of the biggest issues I have faced is
TensorFlow refusing to deallocate GPU memory [2] (freeing it up for other
processes to use) unless the process is killed or exits. This, in combination
with the allow-growth allocation strategy used by TF (see the snippet below),
makes it unpredictable to run mixed loads, e.g. running a detector and indexer
along with some computation in a GPU data frame at the same time on a single
GPU.

[1]
[https://github.com/AKSHAYUBHAT/DeepVideoAnalytics/](https://github.com/AKSHAYUBHAT/DeepVideoAnalytics/)

[2]
[https://github.com/tensorflow/tensorflow/issues/1578](https://github.com/tensorflow/tensorflow/issues/1578)
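
For reference, the allow-growth behavior referred to above is enabled roughly
like this in the TF 1.x API; a minimal sketch:

    import tensorflow as tf

    # Start small and grow allocations on demand instead of grabbing
    # (nearly) all GPU memory up front. Note that TF still never hands
    # memory back to the driver until the process exits.
    gpu_options = tf.GPUOptions(allow_growth=True)
    session = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))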

~~~
jacquesm
Funny, I was reading about that yesterday. Here is a snippet that might help
you if you do not need all the memory in one process:

    
    
    import tensorflow as tf

    # Cap this process at half the GPU's memory up front so other
    # processes can use the rest.
    options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
    session = tf.Session(config=tf.ConfigProto(gpu_options=options))
    

That way you limit up front how much GPU memory the process can use, and other
applications can use the rest.

Of course if you need all of the memory during one phase of the computation,
want to free it and then re-allocate it at a later stage you are probably out
of luck.

But if your pipeline would normally allow for all tasks to run with limited
memory you might get away with this strategy at the expense of some speed.

Good luck!

~~~
aub3bhat
I use per_process_gpu_memory_fraction extensively [1], but when you are
writing code that runs on different GPUs (Titan vs 1080 vs 1070 vs K80), the
total memory available changes, so you run the risk of under-allocating or
over-allocating GPU memory.

[1]
[https://github.com/AKSHAYUBHAT/DeepVideoAnalytics/blob/master/dvalib/indexer.py#L123](https://github.com/AKSHAYUBHAT/DeepVideoAnalytics/blob/master/dvalib/indexer.py#L123)

~~~
jacquesm
Ah yes, I get your problem now: you have to deal with changing environments,
not just your own setup for a particular problem. That makes it a lot more
complicated. Is there a way to fold the pipeline into a single process?

Not pretty from a modularity point of view, but it just might get the job
done.

~~~
aub3bhat
I am using Celery workers, each running a single TensorFlow model in solo
concurrency mode. The current workaround is to keep track of the per-process
GPU fraction and incorporate the total GPU memory available when assigning
Celery workers to individual GPUs.
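
A minimal sketch of that bookkeeping, assuming the pynvml bindings are
available (the 4 GB budget is just an example figure):

    import pynvml

    def fraction_for(gpu_index, wanted_bytes):
        # Translate an absolute memory budget into the fraction that
        # per_process_gpu_memory_fraction expects, based on the card
        # actually present (Titan vs 1080 vs 1070 vs K80).
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
        total = pynvml.nvmlDeviceGetMemoryInfo(handle).total
        pynvml.nvmlShutdown()
        return min(wanted_bytes / total, 1.0)

    # e.g. give this worker 4 GB regardless of which GPU it landed on
    frac = fraction_for(0, 4 * 1024**3)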

~~~
jacquesm
Right, so essentially you are doing per-process memory management because the
layers lower down aren't smart enough to handle this transparently. I'm fairly
sure the nvidia libraries do support releasing memory, and from what I see on
my own machine, GPU memory usage does go up and down when running longer jobs.
So maybe it isn't so much a structural problem as a problem in one particular
module that you exercise a lot?

------
tmostak
For more context, Timothy Prickett Morgan of The Next Platform wrote about the
initiative a few days ago:
[https://www.nextplatform.com/2017/05/09/goai-keeping-databases-analytics-machine-learning-gpu/](https://www.nextplatform.com/2017/05/09/goai-keeping-databases-analytics-machine-learning-gpu/)

------
En_gr_Student
Internal work filter kills the link. Is the website on a blackhole list
somewhere?

~~~
glenneroo
You're not missing much, just some introduction text and a link to their
GitHub here:
[https://github.com/gpuopenanalytics](https://github.com/gpuopenanalytics)

Here's the full website text:

GPU Open Analytics Initiative

Continuum Analytics, H2O.ai, and MapD Technologies have announced the
formation of the GPU Open Analytics Initiative (GOAI) to create common data
frameworks enabling developers and statistical researchers to accelerate data
science on GPUs. GOAI will foster the development of a data science ecosystem
on GPUs by allowing resident applications to interchange data seamlessly and
efficiently.

GPU Data Frame

Architecture

Our first project: an open source GPU Data Frame with a corresponding Python
API. The GPU Data Frame is a common API that enables efficient interchange of
data between processes running on the GPU. End-to-end computation on the GPU
avoids transfers back to the CPU or copying of in-memory data, reducing compute
time and cost for high-performance analytics common in artificial intelligence
workloads. Users of the MapD Core database can output the results of a SQL
query into the GPU Data Frame, which then can be manipulated by the Continuum
Analytics’ Anaconda NumPy-like Python API or used as input into the H2O suite
of machine learning algorithms without additional data manipulation.

Plus a list of founders: Anaconda, H2O.ai and MAPD.
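
Purely as an illustration of the workflow described above, a pipeline might
look something like this (sketched against the early pymapd/pygdf GOAI
libraries; the table and credentials are made up, and the exact API names
should be treated as assumptions):

    import pymapd

    # Run a SQL query in MapD and receive the result as a GPU Data
    # Frame, leaving the data resident in GPU memory.
    con = pymapd.connect(user="mapd", password="...", dbname="mapd",
                         host="localhost")
    gdf = con.select_ipc_gpu("SELECT fare, tip FROM trips LIMIT 1000000")

    # The frame can now feed GPU-side analytics or ML directly,
    # with no copy back to host memory.
    print(gdf)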

------
lmeyerov
Our team at Graphistry is involved in bringing this to the world of event &
graph analytics, and to full-stack interactive visual analytics, so I'm happy
to answer questions about that!

------
crudbug
Does the GPU have full access to DRAM? Or is it a 2-level access - DRAM, then
GPU RAM?

~~~
sgt101
Hello - interesting to think this through! The GPU has its own bus and, I
think, sends data to its cores according to the CUDA instructions it is given.

I'm hazy about how instructions get to the GPU... I guess the CPU is executing
the OS, which then gets instructions to load the program from disk into the
execution space in the GPU's memory. I think the CPU then pushes data into the
GPU's memory space for execution and reads results out of GPU memory according
to synchronisation instructions in its program?
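
A tiny sketch of that two-step flow using numba's CUDA bindings (illustrative
only; the kernel just doubles each element):

    import numpy as np
    from numba import cuda

    @cuda.jit
    def double(arr):
        i = cuda.grid(1)
        if i < arr.size:
            arr[i] = arr[i] * 2

    host = np.arange(1024, dtype=np.float32)   # lives in DRAM
    dev = cuda.to_device(host)                 # CPU pushes it over PCIe into GPU RAM
    double[4, 256](dev)                        # kernel executes out of GPU memory
    result = dev.copy_to_host()                # CPU reads the results back out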

------
pplonski86
What data sizes (how many GB) call for such frameworks?

~~~
anilshanbhag
Anything from 10GB to 1TB works well with these systems. They can scale beyond
1TB if you use multiple machines.

The catch is that GPU memory is more expensive than CPU memory. You only want
to use it if: 1) your workload is compute-heavy - think machine learning - or
2) you care about speed, i.e. you want queries to be interactive.

~~~
arnon
With these in-memory DBs, scaling beyond 1TB becomes prohibitively expensive
for most companies.

~~~
tmostak
Note that MapD is not fully in-memory. It can pull from CPU Memory and SSD as
necessary. So its essentially as fast as the fastest disk-based database at
its worst and likely two orders-of-magnitude or more faster in many common
scenarios where certain subsets of the data are repeatedly queries (for
example, BI workloads or when repeatedly pulling out subsets of a dataset to
train a model)

