
Using Graph Databases to Stop E-commerce Fraud in Real Time - bwmerkl
http://neo4j.com/blog/graph-database-ecommerce-fraud/
======
maxdemarzi
If anybody wants a little more code with their blog post, here is another
example:
[http://maxdemarzi.com/2014/02/12/online-payment-risk-managem...](http://maxdemarzi.com/2014/02/12/online-payment-risk-management-with-neo4j/)

A proper fraud detection system has lots of pieces. The one the customer was
having trouble with was Cross Reference in real time. Imagine 10k requests per
second coming in: you can use cross reference to flag transactions for further
analysis, or as a partial weight feeding into other fraud detection algorithms.
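
As a rough illustration of what cross reference means here, below is a minimal
in-memory sketch in Python. It is not Neo4j and not the code from the linked
post; the class name, the identifier kinds ("ip", "card") and the threshold of
one user per identifier are all assumptions made up for the example.

```python
from collections import defaultdict


class CrossReference:
    """Toy cross reference: which users have been seen with each identifier."""

    def __init__(self, max_users_per_identifier=1):
        # Hypothetical threshold: distinct users allowed to share one identifier.
        self.max_users = max_users_per_identifier
        self.users_by_identifier = defaultdict(set)

    def check(self, user_id, identifiers):
        """Record a transaction and return the kinds of identifier that are
        now shared by more users than the threshold allows.

        `identifiers` maps a kind (e.g. "ip", "card") to its value.
        """
        shared = []
        for kind, value in identifiers.items():
            users = self.users_by_identifier[(kind, value)]
            users.add(user_id)
            if len(users) > self.max_users:
                shared.append(kind)
        return shared


xref = CrossReference()
xref.check("alice", {"ip": "203.0.113.7", "card": "4111-0001"})
flags = xref.check("bob", {"ip": "203.0.113.7", "card": "4111-0002"})
print(flags)  # kinds of identifier that bob shares with another user
```

The returned list could be used either as a flag for further analysis or as
one input weight among several.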

------
ryanlol
Despite what the article seems to claim, it appears to be describing the most
basic anti-fraud measures which are easily defeated by basic fraudster tools
and techniques.

>User ID

1 account per card per IP, basic stuff.

>IP address

Vip72 is a pretty basic example of a commercial solution; more advanced people
either source fresh IPs from bots or scan for SSH tunnels.

>Geo location

While not as trivial to defeat, most "serious" fraudsters either operate their
own mule networks or pay for access to one. There's no lack of stupid people.

>A tracking cookie

lol?

>Credit card number

There's no lack of credit cards. Even if you're paying $30 for a no-AVS card
with full VBV info, that still costs less than a TV.

~~~
chatmasta
This is a blog post intended to demonstrate the usefulness of applying neo4j
to a specific problem. All the challenges you cite are valid, but they do not
negate the value of using neo4j in this scenario.

You can solve these problems by adding to the list of identifiers. Off the top
of my head: browser fingerprinting, request latency, IP subnets, IP ASNs, HSTS
super cookies...
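
Of these, IP subnets are the easiest to sketch. The Python below collapses an
address to its enclosing network so that neighbouring IPs share one
identifier. The function name and the /24 and /64 prefix lengths are
assumptions chosen for illustration, not a recommendation.

```python
import ipaddress


def subnet_key(ip, v4_prefix=24, v6_prefix=64):
    """Return the enclosing network of `ip` as a string.

    The prefix lengths are hypothetical defaults; pick them to match how
    your traffic is actually distributed.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False lets us pass a host address and get its network back.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network)


print(subnet_key("203.0.113.77"))  # 203.0.113.0/24
print(subnet_key("203.0.113.12"))  # 203.0.113.0/24
```

Two requests that rotate IPs inside the same block then map to the same
identifier and can be cross-referenced like any other.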

Stopping fraud is always going to be a cat and mouse game. The goal is not to
eliminate fraud completely, but to make it so difficult for fraudsters to use
your service successfully that they would rather move to the next target. If
a fraudster is specifically targeting your service, they _will_ get around any
blockades you put in front of them. But you can still make it as hard as
possible to circumvent those blockades.

If you deter a high enough percentage of fraudsters, you mitigate your risk of
chargebacks. That risk will never be zero. But you should certainly do all you
can to minimize it.

~~~
happywolf
But the tone of the blog post implies this graph thingie is _the_ solution:

"How graph databases stop fraud e-commerce frauds in real-time" - note the
use of 'stop' and 'real-time'.

"Fortunately, graph database technology is able to detect the patterns that
arise around these e-commerce fraud scenarios and put an end to them in real
time"

Again, look at the terms used: 'put an end' and 'real time'.

I am not trying to nit-pick, but what has been discussed is far from being
able to put an end to fraudsters. Maybe this could weed out some of the newbie
fraudsters, but that is it.

Disclaimer: I work in the payments industry.

------
assface
Why can't you do this in a relational DBMS? Recent research says that it would
probably be faster.

The Case Against Specialized Graph Analytics Engines

[http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper20.pdf](http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper20.pdf)

~~~
ryguyrg
(disclosure: i work for neo4j)

This paper is a bit unrelated to Neo4j. Neo4j is an ACID-compliant native
graph database. It is not a "specialized graph analytic engine."

Neo4j stores the graph data on disk (and caches in memory) as nodes and
relationships. After index lookups to find the start points in the graph, each
relationship hop is followed in constant time, so the cost of a query grows
linearly with the number of relationships it traverses, regardless of the
total size of the graph.
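
A conceptual sketch of that idea in Python: each node holds direct references
to its neighbours, so a traversal only follows pointers and never consults a
global index. This is an illustration only and says nothing about Neo4j's
actual storage format; the `Node` class, `reachable_within` and the sample
data are made up for the example.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.neighbors = []  # direct references to adjacent nodes

    def connect(self, other):
        # Store the relationship on both endpoints.
        self.neighbors.append(other)
        other.neighbors.append(self)


def reachable_within(start, max_hops):
    """Names of nodes within `max_hops` of `start`, excluding `start`.

    Work done is proportional to the relationships actually visited,
    not to the number of nodes in the whole graph.
    """
    seen = {start}
    frontier = [start]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for neighbor in node.neighbors:
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return {node.name for node in seen if node is not start}


user_a, user_b = Node("user:a"), Node("user:b")
ip, card = Node("ip:203.0.113.7"), Node("card:4111-0002")
user_a.connect(ip)
user_b.connect(ip)
user_b.connect(card)

print(reachable_within(user_a, 2))  # the shared IP and the other user
```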

The referenced paper cites two main reasons that RDBMS would be better for the
analytics use cases: (1) ability to express graph queries in SQL and (2)
performance of executing those queries.

(1) Cypher is, like SQL, a declarative language. However, it represents graph
constructs in a much more natural way: "ASCII art for graphs." There's
significant praise from developers on the web for the benefits of Cypher when
traversing graphs, which is why we decided to open up the language:
[http://www.opencypher.org/](http://www.opencypher.org/)

(2) As Neo4j isn't really intended as an analytics engine, its performance
characteristics are not included in this paper. However, (expensive) indexes
do not need to be created and maintained for every relationship in Neo4j.
Similarly, these indexes do not need to be accessed for traversal (also
expensive).

------
lisper
I don't get how fraudsters can create all of these fake identities. Every cc
and bank acct application I've ever filled out has asked me for my SSN. Surely
the banks check on that before issuing a card? How can n fraudsters get n^2
valid SSNs?

~~~
ryanlol
While synthetic identity fraud happens (you can use fake SSNs!), this isn't
intended to stop that kind of fraud.

This is intended to stop people from using other people's credit cards, not
their own cards they just applied for as someone else.

~~~
lisper
That still doesn't make any sense. The modus operandi here (I thought) was
that N fraudsters mix-and-match their actual addresses and phone numbers with
fake names in order to open N^2 fraudulent accounts, which they then operate
legitimately for a while before pulling the scam. So they can't be using other
people's cards because the fake identities don't really exist.
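
To make the N^2 arithmetic concrete, here is a small Python sketch with
made-up placeholder data (three fraudsters, each contributing one address and
one phone number):

```python
from itertools import product

# Hypothetical placeholders: one real address and phone per fraudster.
addresses = ["addr_1", "addr_2", "addr_3"]
phones = ["phone_1", "phone_2", "phone_3"]

# Every (address, phone) pairing is a distinct synthetic identity.
synthetic_identities = list(product(addresses, phones))

print(len(synthetic_identities))  # 9, i.e. 3^2
```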

