
Communication Logs: Handling 250M SMS messages a day - Parseco
http://www.infobip.com/engineering/communication_logs/
======
Cieplak
They mention using Neo4j instead of a relational database. Recently I needed
to implement a friendship data model, and started off using Neo4j because it
seemed like the best tool for the task, but ended up using essentially the
following model in Postgres:

    
    
        CREATE TABLE friendship (
         person1 INT NOT NULL,
         person2 INT NOT NULL,
         PRIMARY KEY (person1, person2),
         CHECK (person1 < person2)
        );
        CREATE INDEX friendship_person2 ON friendship(person2);
    
    
        CREATE VIEW friendship_view AS
        SELECT person1 AS person, person2 AS friend 
        FROM friendship
        UNION
        SELECT person2 AS person, person1 AS friend FROM friendship;
    

taken from here: [http://www.postgresql.org/message-id/20141111201127.d80b6bc4c2a6970a6b941dec@potentialtech.com](http://www.postgresql.org/message-id/20141111201127.d80b6bc4c2a6970a6b941dec@potentialtech.com)

What do you folks think? It seems like this would be the densest way to store
the relationship, using the CHECK constraint trick with the IDs so you don't
need duplicate records to store Alice and Bob's friendship as {A -> B, B -> A}
if the friendship is inherently bidirectional (vs a followed/follower model),
and it doesn't preclude storing the data in another isomorphic structure
that might be optimized for specific queries. I'm sure it's possible to
replicate this with Neo4j, but it felt cumbersome programming that in Java
versus expressing it in SQL. I'd really like to use Neo4j in an OLTP
environment, but I still haven't had enough of an impetus, because WITH
RECURSIVE in Postgres works well as long as the recursion depth is capped.
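
For reference, here's a rough sketch of the kind of depth-capped traversal I
mean, assuming the friendship_view above (the starting ID 42 and the 3-hop cap
are just placeholders):

    
        -- Everyone reachable within 3 hops of person 42, via the
        -- bidirectional friendship_view defined above.
        WITH RECURSIVE reachable(person, depth) AS (
            SELECT friend, 1
            FROM friendship_view
            WHERE person = 42
          UNION
            SELECT fv.friend, r.depth + 1
            FROM friendship_view fv
            JOIN reachable r ON fv.person = r.person
            WHERE r.depth < 3
        )
        SELECT DISTINCT person
        FROM reachable
        WHERE person <> 42;  -- back-and-forth paths can reach the start node again
    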

~~~
yid
Your `friendship` table is a simple adjacency-list representation of an
undirected graph. As you point out, the main difficulty in using a relational
database for a true graph model is processing graph traversals, which you can
do with nested subqueries, joins, or Postgres' WITH RECURSIVE. A graph
database, for that reason, is usually optimized as a very efficient sequential
join machine, where you can imagine successive joins as connecting the
endpoints of edges along a graph path.
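
For instance, with a bidirectional view like your friendship_view, a two-hop
"friends of friends" lookup is just the view joined to itself (a rough sketch;
the ID 42 is only a placeholder):

    
        -- Self-join: f1 is the first hop, f2 the second.
        -- Friends of friends of person 42, excluding 42 and direct friends.
        SELECT DISTINCT f2.friend
        FROM friendship_view f1
        JOIN friendship_view f2 ON f2.person = f1.friend
        WHERE f1.person = 42
          AND f2.friend <> 42
          AND f2.friend NOT IN (
              SELECT friend FROM friendship_view WHERE person = 42
          );
    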

By the way, if you're interested in graph representations for efficient
querying, check out triplestores [1] and the Wikipedia list of
subject-predicate-object databases [2].

[1] [https://en.wikipedia.org/wiki/Triplestore](https://en.wikipedia.org/wiki/Triplestore)
[2] [https://en.wikipedia.org/wiki/List_of_subject-predicate-object_databases](https://en.wikipedia.org/wiki/List_of_subject-predicate-object_databases)
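
Relationally, a triplestore is roughly just one wide three-column table; a
minimal sketch (table and column names are only illustrative):

    
        -- Subject-predicate-object store, e.g. ('alice', 'friend_of', 'bob').
        CREATE TABLE triples (
            subject   TEXT NOT NULL,
            predicate TEXT NOT NULL,
            object    TEXT NOT NULL,
            PRIMARY KEY (subject, predicate, object)
        );
        -- Dedicated triplestores typically index several column orderings
        -- (SPO, POS, OSP, ...) so that any lookup pattern stays fast.
    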

~~~
Cieplak
Thanks! I was just reading the Neo4j source and noticed the bitmap indexes,
which apparently have several advantages over B-tree indexes for indexing
graphs:

[https://github.com/neo4j/neo4j/tree/3.0/community/lucene-index/src/main/java/org/neo4j/kernel/api/impl/index/bitmaps](https://github.com/neo4j/neo4j/tree/3.0/community/lucene-index/src/main/java/org/neo4j/kernel/api/impl/index/bitmaps)

ps:

[http://stackoverflow.com/questions/9541541/b-tree-vs-bitmap-database-indexes](http://stackoverflow.com/questions/9541541/b-tree-vs-bitmap-database-indexes)

[https://en.wikipedia.org/wiki/Bitmap_index](https://en.wikipedia.org/wiki/Bitmap_index)

Thanks for showing me triplestores! Good to know that it's better to use a
database engine optimized for triples when working with graphs, versus rolling
your own in SQL.

------
gue5t
Worst-case, you'd have to process input at a sustained rate of
(140 bytes per message × 250,000,000 messages) / (24 × 3600 s) ≈ 405,093 B/s,
taking 140 bytes as the maximum SMS payload.

I certainly hope your system can manage 400 kB/s of throughput.

~~~
jzwinck
You're assuming a constant arrival rate and zero overhead, neither of which is
accurate.

As a wild guess, you're off by 1000x. You may say 400 MB/s is still not too
crazy, but it's beyond the real capacity of many systems out there.

~~~
gue5t
Of course those aren't sane assumptions, but if the peak rate were the
significant information, why would they title the article with the per-day
rate? It smells like someone decided "250 million is a large enough number to
scare people". Fuck that.

Even if peak "instantaneous" flow is 400MBps, considering that SMS already has
seconds to minutes of latency, a bit of buffering shouldn't be a big deal.

They start the article by saying they make six (not necessarily serial)
network roundtrips for each message, which indicates that something is horribly
wrong with the design of their system. Their code needs to bill each SMS to
the sender, send the SMS to the telco, and log successes and failures in a way
that's searchable later. The rest of the article describes their terribly
overwrought approach to these tasks. With so much enterprise software and so
little actual discussion of the problem domain, there's no way they're butting
up against anything interesting enough to merit an HN submission.

------
elcct
Nice work, but to be fair, 250M msgs per day is not at all impressive.

------
pbreit
What's Infobip's pricing for numbers and SMS in the US?

