1. How does BlazingDB compare to MapD?
2. How do you skip ingest - are you using Apache Arrow like Dremio as an efficient data representation format?
3. Do you have any benchmarks, or maybe where would you see BlazingDB on this list (using the same hardware as MapD)? http://tech.marksblogg.com/benchmarks.html
4. Can you run your solution in a cluster? :)
2) We read Parquet files into our own caching system. We often use the Arrow APIs, though we do not rely on Arrow for our data representation.
3) We have made client-side benchmarks but have not yet performed a standardized, replicable benchmark for people to validate. We have been a VERY small team to date and are going to make that available as soon as we can. You CAN launch BlazingDB instances from the AWS Marketplace to see how it performs.
4) You sure can. A large part of BlazingDB's focus is on distribution. You can add nodes during runtime.
Like us, you're kind of straddling the line between the Hadoop ecosystem and GPUs. Mesos does this as well.
It seems like most folks in the GPU space still don't support Hadoop or S3 as a data source yet, though (despite these being the dominant storage for data warehousing, S3 or otherwise).
How are you guys coping with this? Have you found a different experience than me?
It seems like a lot of the big data companies are at least adding some sort of GPU management as a checkbox now, so the spaces will likely converge with or without the MPI crowd.
The other trend I'm seeing is that you have folks like MapD and Kinetica trying to run the whole stack themselves. I'm not sure how well that will go overall (especially given that you typically can't have everything in one warehouse). Could you comment on this? Are you going to try this as well after integrating?
I hope to see more companies in this space actually exploring this intersection.
Many make the mistake of boiling the ocean and doing nothing well, though. Because of that, it ends up being consulting. How will you guys overcome that? Reworded: what is the initial focus vs. the long-term strategy?
Other players taking the whole stack:
When you say other players are trying to run the whole stack themselves, it depends on what you mean. From my understanding, Kinetica is an in-memory solution, for example. How do you query a 100 TB dataset? Or a 1 PB dataset? They are instead going after a specific kind of problem, so I wouldn't consider any of us to be trying to run the whole stack. That is usually left to behemoths, and none of us in the GPGPU database space are that yet.
As far as where our focus is: we want you to be able to use Blazing to accelerate more and more query workloads on data where it lies, how it lies. You can ingest into Blazing for maximum performance, but the big difference between us and some of the other GPU players is that our focus is on working with datasets that span multiple underlying stores (e.g. HDFS, S3, Azure File Share) and use multiple file formats. Right now the only two file formats we operate on natively are Simpatico (our own file format) and Parquet. Others have to be ingested, but we are working on adding more file formats that we can interact with efficiently in place, without the need for ingestion.
As far as trying to take over the whole stack: Blazing wants to be a lean but powerful development shop. We want to focus on our core competencies and leverage other brilliant technologies when possible, to avoid having to manage everything. We don't want to build new file systems. And we think there is a really big value proposition in helping people analyze data where it already lies.
As for your last question: this is not our first pajama party when it comes to startups, and something we have seen is that early on it feels very similar to consulting while you are establishing product-market fit. You adjust your product to the immediate needs of the people you are interacting with. That road can be a troublesome one if you end up spending all of your time configuring other tools and products and not developing your own. Keeping a tight handle on the funnel early on will help ensure that you have enough resources to work on your main value proposition, and also increase the likelihood that the few engagements you do take lead to long-term revenue and licenses (if that is how you make your money).
In the long run, my main strategy is to try to do as little as possible, and to do it really well.
Re: Kinetica. They are doing visualization as well as things like machine learning.
I like your guys' approach a lot better. I see a clear bridge from the mass market commodity storage to something useful actually leveraging GPUs. That's why I commented. Best of luck to you!