

Ask HN: How to achieve reporting from the RDF data stored in HBASE? - jrphn

We have an application that stores all of its information as RDF triples in HBase. Now we need to analyze that data and build reports from it.

Our current approach is to extract the part of the data each report needs with MapReduce, dump it to a .tsv file, and build the report from that file. The problems with this approach are:

1. The extracts run for a long time, because each one has to traverse all customers in HBase to produce the output for reporting. As a result, we wait a long time for every single requirement. Performance can be improved with a better hardware configuration, but it still takes ~10 hrs to pass through all the data, and we have time constraints.

2. Sometimes the data is so large that we can't use MS Excel to create charts. We split the data with shell scripts, but we need a tool that can handle data of this size and create charts from it.

3. The final objective is to present the reports, or to create an interactive dashboard (charts with filters).

Can you please suggest a solution for the problems above?
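
For point 2, one option that avoids loading the whole extract into a spreadsheet is to stream the .tsv once and keep only per-group totals, which are small enough to chart in any tool. A minimal sketch, assuming a hypothetical extract with a header row and columns named `region` and `amount` (neither name comes from the actual data):

```python
import csv
import io
from collections import defaultdict


def summarize_tsv(lines, group_col="region", value_col="amount"):
    """Stream TSV rows and return {group: (row_count, value_sum)}.

    The column names are hypothetical placeholders. Only one row is
    held in memory at a time, so the input can be larger than RAM.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        key = row[group_col]
        counts[key] += 1
        totals[key] += float(row[value_col])
    return {key: (counts[key], totals[key]) for key in counts}


# Tiny in-memory sample standing in for the real extract file.
sample = io.StringIO(
    "customer_id\tregion\tamount\n"
    "c1\tnorth\t10.5\n"
    "c2\tsouth\t4.0\n"
    "c3\tnorth\t2.5\n"
)
summary = summarize_tsv(sample)
for region, (n, total) in sorted(summary.items()):
    print(region, n, total)
```

With a real extract, an open file object (for example `open("extract.tsv", newline="")`, where the file name is a placeholder) can be passed in place of the sample.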
======
jerven
How many triples?

~~~
jrphn
I think it depends on the number of customers in the database; currently there are 2.7M customers.

~~~
jerven
If you are under 10,000 triples per customer, you are using the wrong
technology stack.

Either use RDF and SPARQL and leverage your graph potential, or go relational
on HBase. The current solution does not seem to work for you.
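
To illustrate what using RDF with SPARQL could look like for a report, the sketch below builds a grouped count that the store would compute itself, instead of a full MapReduce extract. Everything specific here is an assumption: the endpoint URL, the `ex:` vocabulary, and the `ex:Customer` and `ex:region` terms are hypothetical. The request is only constructed, not sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

# Hypothetical endpoint and vocabulary; replace with the real ones.
ENDPOINT = "http://localhost:8890/sparql"

QUERY = """
PREFIX ex: <http://example.org/schema#>
SELECT ?region (COUNT(DISTINCT ?customer) AS ?customers)
WHERE {
  ?customer a ex:Customer ;
            ex:region ?region .
}
GROUP BY ?region
ORDER BY DESC(?customers)
"""


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL 1.1 Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method(), urlsplit(request.full_url).path)
```

Sending it with `urllib.request.urlopen(request)` against a running endpoint would return one row per region, which is already the size a chart or dashboard filter needs.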

On the graphing side, learn to use R. It will serve you a lot better than
trying to make graphs with Excel.

27 billion triples can easily be handled with e.g. Virtuoso 7.2 on commodity
hardware, e.g. 256 GB RAM + 2 TB of consumer SSDs.
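
The 27 billion figure presumably comes from multiplying the customer count given above by the 10,000 triples per customer threshold; that reading is an assumption, but the arithmetic is easy to check:

```python
customers = 2_700_000          # customer count stated earlier in the thread
triples_per_customer = 10_000  # per-customer threshold mentioned above

total_triples = customers * triples_per_customer
print(f"{total_triples:,}")  # prints 27,000,000,000
```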

Your current setup does not leverage the strengths of either RDF or HBase.
Moving off HBase and onto a SPARQL/RDF store will be less work, and I expect
it to have a higher ROI.

Remember, SPARQL stores have improved by leaps and bounds in the last few
years, so the homegrown HBase solution is no longer state of the art.

~~~
jrphn
Thanks for your insight. I will surely keep it in mind.

