
Realtime funnel analysis using Solr and Cassandra - BlueHotDog2
http://blog.getjaco.com/analyzing-funnels/
======
ktamura
>For funnel analysis, it’s not feasible to use this data model for getting
back a summary of the funnel steps and the sessions matching it, since there’s
no option in Solr to run a recursive query, which would allow to go over each
session and check if it’s a match for the funnel.

I don't think this approach scales, even in an environment that supports
recursive queries like PostgreSQL.

The more scalable approach would be either to use a commercial database
system with explicit support for pattern matching, or to encode the conversion
path as a string (e.g. "top page -> product page with SKU=1337 -> Purchase"
becomes "T_SKU1337_P") and use REGEX/GROUP BY.

Either way, this sounds like a suboptimal use case for both Solr and
Elasticsearch.
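
A minimal sketch of the string-encoding idea above, in Python rather than SQL. The one-token-per-event codes, the `_` separator, the sample sessions, and the helper names `encode_path` and `funnel_regex` are all assumptions made for illustration. It also assumes event codes never contain `_`, and that other events may occur between funnel steps.

```python
import re
from collections import Counter

def encode_path(events):
    """Join per-event codes into one string, e.g. ['T', 'SKU1337', 'P'] -> 'T_SKU1337_P'."""
    return "_".join(events)

def funnel_regex(steps):
    """Build a regex matching paths that contain `steps` in order.

    Any number of other events may sit between two consecutive steps.
    Assumes event codes never contain the '_' separator.
    """
    gap = r"(?:_[^_]+)*_"  # zero or more intermediate tokens, then a separator
    body = gap.join(re.escape(step) for step in steps)
    return re.compile(r"(?:^|_)" + body + r"(?:_|$)")

# Made-up sessions: (group key, ordered event codes).
sessions = [
    ("day1", ["T", "SKU1337", "P"]),
    ("day1", ["T", "SKU42", "SKU1337", "C", "P"]),
    ("day1", ["T", "SKU1337"]),        # never purchased
    ("day2", ["T", "SKU13370", "P"]),  # different SKU, must not match
    ("day2", ["SKU1337", "T", "P"]),   # steps out of order
]

pattern = funnel_regex(["T", "SKU1337", "P"])

# The Python equivalent of "WHERE path matches regex GROUP BY day".
conversions = Counter(
    day for day, events in sessions if pattern.search(encode_path(events))
)
print(conversions)
```

In a SQL database the same idea would store the encoded path in a column and apply that database's regex operator together with GROUP BY. Note that a regex with flexible gaps can itself backtrack on long paths, so this is only an illustration of the encoding, not a performance claim.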

~~~
itayadler
Why do you think this approach isn't scalable? I would love to hear your input
on that. Also, which commercial database systems do you think would be good for
this?

~~~
ktamura
The suggested approach most likely requires a lot of recursive backtracking.
Of course, there's an efficient way to implement this, and that's what most
commercial databases' path analytics features do. Here's one example by
Oracle:
[https://docs.oracle.com/database/121/DWHSG/pattern.htm](https://docs.oracle.com/database/121/DWHSG/pattern.htm)
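
To illustrate that a simple ordered funnel does not need backtracking, here is a hedged sketch of a single forward pass per session. This is not how Oracle's pattern matching feature is implemented. It only covers the plain case of "steps in order, anything in between", with no time windows or other constraints. The function names, event names, and sample sessions are hypothetical.

```python
def steps_reached(events, funnel):
    """Return how many funnel steps a session completed, in order.

    Greedy single pass: advance to the next funnel step whenever the
    current event equals it. Runs in O(len(events)) with no backtracking.
    """
    reached = 0
    for event in events:
        if reached < len(funnel) and event == funnel[reached]:
            reached += 1
    return reached

def funnel_summary(sessions, funnel):
    """Count, for each funnel step, how many sessions got at least that far."""
    counts = [0] * len(funnel)
    for events in sessions:
        for step_index in range(steps_reached(events, funnel)):
            counts[step_index] += 1
    return counts

# Made-up example data.
funnel = ["top", "product", "purchase"]
sessions = [
    ["top", "product", "purchase"],
    ["top", "product"],
    ["product", "top"],
    ["top", "cart", "product", "cart", "purchase"],
]

summary = funnel_summary(sessions, funnel)
print(summary)
```

Taking the earliest possible match for each step is enough here, because matching a step sooner never rules out a later step that a later match would have allowed.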

I've always found it befuddling that so many developers want to use
Solr/Elasticsearch for the heavy lifting in analytics. It's probably because:

1\. SQL is not the most intuitive (although it is the most pervasive) API for
data analysis


2\. Much of the data is already in Solr/Elasticsearch to make it searchable and
to perform simple roll-ups, filtering, etc., so it'd be great if you could do
more complex analytics against it as well

As to why Solr/Elasticsearch is not ideal: superior alternatives exist, namely
OLAP databases.

------
graffitici
Why do people use C* in addition to ES? It seems like in this case most of the
data could directly be piped into ES?

I understand that ES can lose data, or have some data storage problems, but
one could just as well store all the incoming data on Hadoop or so, without
having to bother with C*, no?

~~~
itayadler
C* makes it much easier to manage a Solr cluster as the data grows
(specifically with DSE), since with the tight integration you get all the
benefits of C* (HA, eventual consistency, multi-DC replication, etc.).

