
Ask HN: What to use for real time reporting? - ninjamayo
I have this problem and was wondering if anyone can help. I have a number of data feeds (CSV files, web APIs) that I would like to report on, but I can't find anything, open source or paid, that can do the job. Basically I want to be able to attach any data feed to this system and build a simple report (select fields, a couple of aggregations) which is then distributed to my team. The system could be either a desktop app or web based, I don't mind. Each member of my team would have a different level of access, and the report would present the data in tabular formats, charts and pivots. I looked at Tableau and QlikView but they are not ideal for real time reporting.

Any ideas if there is anything like that out there?
======
lsiunsuex
I wrote something along these lines at work to monitor servers and stuff.

A few cron jobs query Google Analytics, server cpu, ram and hd free space, a
specific mysql table total row count, etc... every minute and curl the results
to Firebase.com

A separate web app in AngularJS 1.x stays open on one of my monitors so I can
monitor the results all day.

Firebase can take in data a number of ways (curl, PHP, JS, etc.) and with
its real-time push it keeps the web app side fresh and current without
having to refresh the page every minute via some other means.

You could also access the firebase data via iOS / Android code if you wanted
to get fancy.

Works for us.
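The cron-job side of this could be sketched roughly as below. This is a minimal, hypothetical version: the Firebase project URL, the `/metrics` path, and the choice of metric are all made up, and the actual POST to the Firebase REST API is left commented out so the script runs without credentials.

```shell
# Minimal sketch of one cron job: sample a metric, build a JSON
# payload, and (optionally) push it to a Firebase Realtime Database
# via its REST API. URL and field names are hypothetical.

FIREBASE_URL="https://example-project.firebaseio.com/metrics.json"  # made up

# Free disk space on / in kilobytes.
DISK_FREE=$(df -k / | awk 'NR==2 {print $4}')

# One JSON object per sample, with a Unix timestamp.
PAYLOAD=$(printf '{"ts": %s, "disk_free_kb": %s}' "$(date +%s)" "$DISK_FREE")
echo "$PAYLOAD"

# POST appends the payload as a new child node under /metrics:
# curl -s -X POST -d "$PAYLOAD" "$FIREBASE_URL"
```

A dashboard subscribed to `/metrics` (e.g. the AngularJS app mentioned above) then gets each new sample pushed to it without polling.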

~~~
ninjamayo
Sounds like a neat solution, but I guess you haven't heard of anything out
there that can do all that which we could use straight out of the box. The
problem is we have a lot of data feeds that will keep changing over time, so
the tool needs a generic infrastructure to accommodate new ones.

I can't believe there is nothing out there. Oh well live and learn.

------
brudgers
I think there is some ambiguity of terminology around 'real time'. By which I
mean that aggregation implies a temporal window: data is batched and
processed as a batch. Truly 'real time' implies right now, a single datum,
and stream processing. The historical concepts are OLAP [online analytic
processing] and OLTP [online transaction processing], respectively. More
recent thinking is that batch is a special case of stream (or vice versa,
depending on the task).

Architecturally, it is common to use pub/sub: ingest all the input streams
into a single log-type data structure and allow services to make calls
against an API.

Boiled down to the simplest thing that might work:

1. Write each event in the input streams as text to a log file [OLTP]

2. Grep [maybe using tail] to find the interesting bits. [OLAP]
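The two steps above can be sketched in a few lines of shell; the log file name, feed names, and event format here are made up for illustration:

```shell
# 1. Ingest [OLTP]: each event from any feed becomes one appended
#    line of text in a single log file.
LOG=events.log
: > "$LOG"   # start with an empty log for the example

echo "$(date +%s) orders csv-feed amount=42"  >> "$LOG"
echo "$(date +%s) signups web-api user=alice" >> "$LOG"

# 2. Report [OLAP]: filter for the interesting bits. The live
#    version is: tail -f "$LOG" | grep ' orders '
grep ' orders ' "$LOG"
```

The point is that the append-only log is the integration surface: new feeds just mean new writers, and reports are filters over the same file.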

Where I am going with this: the way a specific business solves this class of
problems tends toward integrating several tools rather than buying something
off the shelf, because of issues like data integrity [i.e. it's bad for a
bank to drop data on the floor; it's ok for the comments section of a blog].

The second reason that integrating tools makes sense is that issues like data
access restrictions are more robustly handled at the operating system and
network levels [this avoids duplicating core OS functionality and specific
security/access configurations]. I'd also mention that maintaining data silos
is often more habit than necessity, and providing full access to the data, or
only ingesting data appropriate for full access, probably simplifies the
design.
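As a concrete sketch of "handle access at the OS level": restrict the log file itself with ordinary Unix permissions instead of building access control into the reporting tool. The file name and mode below are illustrative.

```shell
# Restrict who can read the event log using file permissions alone.
LOG=events.log
touch "$LOG"
chmod 640 "$LOG"   # owner: read/write, group: read-only, others: none
ls -l "$LOG"       # members of the file's group can read, nobody else
```

In practice you would put each team member's account into the appropriate Unix group (via chgrp on the file and group membership for the users); the network-level analogue is firewalling the reporting API so only the right hosts can reach it.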

Anyway, to me this looks more like software development than swiping a credit
card if the set of features is non-negotiable...although, I suppose the
consultant route might open up that possibility.

Good luck.

