
E-commerce Fraud Facts - jasontan
http://blog.siftscience.com/five-ecommerce-fraud-facts/
======
ekanes
This is awesome stuff. Theoretically. But Sift doesn't actually make these
functional/actionable right away through their service (even though they
could). We signed up, love (LOVE!) the idea, but they keep asking for more
data before returning meaningful results.

Their home page says, "Get going in minutes: Integrate in just three steps:
paste a Javascript snippet onto your site, log transactions from your servers
to our REST API, and send examples of banned users." But it isn't so. They
require a long-term, in-depth model of your site/usage, including multiple
fraudulent examples (what if you've mostly solved fraud?), before returning
meaningful results.

According to this post and previous posts, they should be able to return
meaningful results with very basic things: time of transaction, email address,
etc. They should start off with: "Here's our recommendation, but it's based on
limited information so we feel X strongly about it." Instead they say: "Give
us more information, we can't help you yet."

~~~
jasontan
Hey Aaron, I'm the OP, and CEO of Sift Science. Sorry that your experience has
been subpar. I'll follow up offline.

I think there's some confusion -- we actually do have a global machine
learning model, which is what customers will start with if we haven't received
any labels (training examples to learn from). But, your mileage may vary with
the global model. We think it's a starting point, but that fraud differs from
site to site in subtle ways, and to be truly effective you need to have your
own model. This is why training examples are so critical. Once we receive
enough training examples, we're able to build a model specific to your site.

That said, I'll be the first to agree with you -- we haven't done a great job
setting the right expectations and messaging, especially on the point above
(that your mileage may vary with the global model). We've been working hard to
improve this, and there are now several reminders in the console, and an
introductory tour, that emphasize the importance of a thorough integration
with labels to achieve best results. With machine learning especially, bad
data in = bad results out. Also, our first UI/UX designer started a month ago,
and improving this experience is a top priority.

With regard to making fraud facts more functional and actionable -- we
actually do provide reasons we think a user is suspicious in our web console
and API (see [https://siftscience.com/docs/getting-scores](https://siftscience.com/docs/getting-scores)).
But there isn't an "aggregate learnings" page like what's presented in the
blog post -- that's on the 3-month roadmap.

Again, sorry for the subpar experience, and I'll follow up with you
separately. We have a very technical product -- we're working hard to abstract
the complexity, while making sure we set customers up for success. We've also
been overwhelmed with customer demand, and have been scaling the team to keep
up (7 FT employees two months ago, 15 now). That said, no excuses. Just know
that we're not happy to hear about experiences like yours, and are working
hard to make it right.

Thanks, Jason

(EDIT) Aaron and I just had a good phone call -- we're going to make things
right.

~~~
ekanes
Thanks for the call, for listening, and keep up the great work!

------
larrys
With domain name registrations, the factors we have noticed that almost
certainly indicate a fraudulent order are (in various combinations):

1) credit card payment details entered all in lower case and/or an obvious
lack of understanding of how US addresses are formatted

2) domain name contains "hack" or some foreign-sounding word, or is anything
related to Vietnam (we get plenty from Vietnam)

3) IP location doesn't match the customer's location

4) Multiple attempts in a row with different credit cards

5) Registrant name doesn't match the name on the credit card and/or address

6) Customer name doesn't relate in any way to the email address used.

Once again, no one factor is usually definitive, but a combination of several
together almost always indicates a fraud order.

Those are off the top of my head; there are more. Bottom line: when you simply
look at the orders visually, you can tell with near 100% certainty that an
order is fraudulent.

On the other hand, here is a fictional example of an order that wouldn't
appear fraudulent at all:

domain: bobspartycity.com

Registrant: Bob Wagner
Address: 76 Walnut St., Williamette IL
Email: bobspartycity@gmail.com
IP: in that vicinity, etc.

...etc. It could be fraudulent, of course, but we've never had a case where a
fraudster put much effort into faking an order using knowledge of what we look
for.
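A minimal sketch of the "several factors together" rule, assuming a hypothetical `Order` record. The field names and the threshold of 3 flags are illustrative assumptions, not something the commenter specified. Each check is a rough stand-in for one listed factor; the domain-name and address-format checks are left out because they need site-specific word lists and address parsing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Order:
    # Hypothetical order record; field names are illustrative only.
    cardholder_name: str
    billing_address: str
    registrant_name: str
    email: str
    ip_country: str            # from an IP geolocation lookup (not shown)
    billing_country: str
    distinct_cards_tried: int  # cards attempted in this session


def red_flags(order: Order) -> List[str]:
    """Return the names of the heuristic checks this order trips."""
    flags = []
    # Payment details typed entirely in lower case.
    if order.cardholder_name.islower() and order.billing_address.islower():
        flags.append("all_lowercase")
    # IP location doesn't match the customer's location.
    if order.ip_country != order.billing_country:
        flags.append("ip_location_mismatch")
    # Several different cards tried in a row.
    if order.distinct_cards_tried > 1:
        flags.append("multiple_cards")
    # Registrant name differs from the name on the card.
    if order.registrant_name.strip().lower() != order.cardholder_name.strip().lower():
        flags.append("registrant_card_mismatch")
    # No part of the customer's name appears in the email's local part.
    local_part = order.email.split("@", 1)[0].lower()
    name_parts = [p for p in order.cardholder_name.lower().split() if len(p) > 2]
    if not any(p in local_part for p in name_parts):
        flags.append("name_email_unrelated")
    return flags


def looks_fraudulent(order: Order, threshold: int = 3) -> bool:
    # No single flag is decisive; several together are treated as fraud.
    return len(red_flags(order)) >= threshold


bob = Order("Bob Wagner", "76 Walnut St., Williamette IL", "Bob Wagner",
            "bobspartycity@gmail.com", "US", "US", 1)
print(red_flags(bob), looks_fraudulent(bob))  # [] False
```

The fictional Bob order trips none of the checks, which matches the commenter's point that such an order wouldn't look fraudulent at all.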

~~~
jusben1369
Agree on Vietnam. In our experience, I'd say 70% of fraud comes from Vietnam,
and there are few if any legitimate orders from Vietnam.

~~~
_mulder_
Interesting, any reason for 70% exactly? Or is that just from your personal
experience? I'd have thought Nigeria and Eastern Europe would have been higher.

------
svmegatron
The patterns that emerge for fraudulent orders are amazing. And, as the
article notes, often specific to a particular merchant. Fraudulent orders
often come in waves lasting up to several months, and pattern recognition can
be particularly helpful in identifying parts of those longer waves.

I'm also working on a project in this space -
[http://www.merchantprotector.net](http://www.merchantprotector.net)

~~~
larrys
(I upvoted you but I might have clicked the downarrow instead).

On your site this phrase:

"We used to have a problem with fraudulent orders."

My suggestion is that the following is much clearer:

"We no longer have a problem with fraudulent orders."

I would also present the other info in a less negative way. In my opinion and
experience, people tend to respond better to a message presented positively
than to one presented negatively.

So for example rather than:

"Fraud can kill your business" I would say:

"Reduce Fraud and Make more money".

Instead of "Stop the cycle of worry" I would say:

"Sleep at night and make more money"

etc.

Obviously there are many twists on this; those are just two examples.

~~~
svmegatron
Thanks for the feedback! Those are great ideas, I will test them out!

------
joshuahedlund
We've found that good indicators include: a large distance between billing and
shipping addresses, a large distance between the estimated IP location and the
billing address, a large order size, and the use of a free email provider like
gmail/yahoo/hotmail (that's the weakest of the factors, but virtually all of
our fraud orders use one). Even combining these and others with a threshold,
it's still hard to detect fraud reliably without too many false positives.
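One way to turn indicators like these into a single number is a weighted score over great-circle distances, sketched below. The weights, the 500 km and 500-currency-unit cutoffs, and the free-email domain list are all assumptions for illustration. Coordinates would come from geocoding the addresses and from an IP geolocation lookup, neither of which is shown.

```python
import math

FREE_EMAIL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com"}


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(min(1.0, math.sqrt(a)))  # clamp guards rounding


def risk_score(billing, shipping, ip_loc, order_total, email,
               far_km=500.0, big_order=500.0):
    """Weighted sum of weak indicators; the weights are illustrative guesses."""
    score = 0.0
    if haversine_km(*billing, *shipping) > far_km:
        score += 2.0  # billing far from shipping
    if haversine_km(*ip_loc, *billing) > far_km:
        score += 2.0  # IP location far from billing address
    if order_total > big_order:
        score += 1.5  # large order
    if email.rsplit("@", 1)[-1].lower() in FREE_EMAIL_DOMAINS:
        score += 0.5  # free webmail: the weakest signal
    return score
```

A store would then compare the score against a threshold (for example `risk_score(...) >= 3.0`), and choosing that threshold is exactly where the false-positive trade-off described above shows up.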

~~~
ecopoesis
At a previous gig we found the same basic factors. I wrote a quick script to
iterate through all the available Weka[1] classifiers using our manually
flagged data as a training set. Then I took the top 20 performing ones and
used them on incoming orders in production. If more than half the classifiers
agreed a transaction was fraud, we denied it. Though this seems a very blunt
hammer (I'm not a machine learning expert by any stretch) it worked remarkably
well.

[1]
[http://www.cs.waikato.ac.nz/ml/weka/](http://www.cs.waikato.ac.nz/ml/weka/)
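The committee approach can be sketched without Weka by treating each trained classifier as a plain Python callable that returns `True` for "fraud". The helper names are hypothetical. Ranking by accuracy on a labeled validation set and the strict-majority denial rule follow the description above; how the individual classifiers are trained is left out.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Dict[str, float]
Classifier = Callable[[Features], bool]  # True means "predicts fraud"


def accuracy(clf: Classifier, labeled: Sequence[Tuple[Features, bool]]) -> float:
    """Fraction of labeled examples the classifier gets right."""
    return sum(1 for x, y in labeled if clf(x) == y) / len(labeled)


def select_top_k(classifiers: List[Classifier],
                 validation: Sequence[Tuple[Features, bool]],
                 k: int = 20) -> List[Classifier]:
    """Keep the k classifiers with the best validation accuracy."""
    ranked = sorted(classifiers, key=lambda c: accuracy(c, validation), reverse=True)
    return ranked[:k]


def committee_says_fraud(committee: List[Classifier], x: Features) -> bool:
    """Deny only if strictly more than half the committee votes fraud."""
    votes = sum(1 for clf in committee if clf(x))
    return votes > len(committee) / 2
```

Requiring a strict majority means a single overconfident classifier can't deny an order on its own, which is part of why such a blunt ensemble can still behave reasonably.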

~~~
rogerbinns
What happens to the false positives, i.e. the ones you denied that were real?
Do people get a way to prove they really are legit?

(Matt Cutts's blog claimed I was a spammer when I made a comment about
two-factor authentication - I have no idea why. It told me to email the
administrator to get it accepted but of course provided no clues on what
address to use etc so I didn't bother. I'm betting the anti-spam software is
claiming a victory when it was actually a failure.)

~~~
ecopoesis
For any denial we popped a "technical issues, please call customer support"
message. We found that fraudsters were far less likely to call than real
customers.

------
twilightfog
Fraud "facts" like that, applied in a blanket fashion, would frequently flag
international customers; 3 of the 5 rules listed apply to me.

~~~
svmegatron
A US company selling internationally will have to be very careful, especially
at first, applying any "fraud best practices."

One of the (apparent) advantages of the OP's service is that there is a
built-in learning component, presumably tailored to your particular store.
That should help quite a lot with recognizing patterns unique to an individual
situation.

------
pdog
Are you familiar with ensemble methods and boosting algorithms?

How does Sift Science combine multiple signals like these (which individually
are pretty weak) into one fraud detection system with a high level of
predictive accuracy?
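For readers wondering what boosting looks like concretely, here is a small AdaBoost sketch over binary signals, using one-feature decision stumps as the weak learners. It is a generic textbook illustration, not a description of how Sift Science actually combines its signals. Labels are +1 for fraud and -1 for legitimate; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[float, int, int]  # (alpha, feature index, polarity)


def stump_predict(x: Sequence[int], j: int, pol: int) -> int:
    """Weak learner: predict `pol` if signal j fires, else -pol."""
    return pol if x[j] else -pol


def train_adaboost(X: List[Sequence[int]], y: List[int], rounds: int = 10) -> List[Stump]:
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n  # start with uniform example weights
    model: List[Stump] = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error.
        best = None
        for j in range(d):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, j, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, pol)
        err, j, pol = best
        if err >= 0.5:
            break  # no weak learner better than chance
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, pol))
        # Up-weight the examples this stump got wrong, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, j, pol))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model: List[Stump], x: Sequence[int]) -> int:
    """Sign of the alpha-weighted vote: +1 = fraud, -1 = legitimate."""
    score = sum(alpha * stump_predict(x, j, pol) for alpha, j, pol in model)
    return 1 if score > 0 else -1
```

On a toy dataset where an order counts as fraudulent when at least two of three signals fire, the best single stump still misclassifies 2 of the 8 signal combinations, while three boosting rounds classify all 8 correctly. This is the "many weak signals, one strong predictor" effect the question asks about.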

------
pytrin
We really wanted to like Sift, as we suffer from a substantial number of fraud
attempts (as do most businesses that sell digital products). However, their model is
not a good fit for eCommerce sites, and that's a shame - it seems to be built
specifically for services marketplaces like AirBnb, where there is typically a
time delay between payment and service provision.

I exchanged a few emails with their support team previously, and there is no
way to get real-time fraud scoring. I would expect to receive the risk score
for a transaction in the response to that transaction event - something
similar to what the minFraud service (which we use) does, but taking into
account more factors, since they collect more data via their Javascript API.

One can only hope they'll offer this capability in the future, and we'd be
glad to try it out again.
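To make the request concrete: the idea is that the transaction call itself returns a score, so checkout can act on it in the same round trip. Everything below (the response shape, `handle_transaction_event`, the 0.5/0.9 thresholds, the pluggable `scorer`) is a hypothetical sketch, not Sift's or MaxMind's API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Event = Dict[str, Any]


@dataclass
class TransactionResponse:
    # Hypothetical response shape: the score comes back in the same
    # round trip as the transaction event, so checkout can act on it.
    status: str
    risk_score: float  # 0.0 = benign ... 1.0 = almost certainly fraud
    action: str        # "accept", "review" or "block"


def handle_transaction_event(event: Event,
                             scorer: Callable[[Event], float],
                             review_at: float = 0.5,
                             block_at: float = 0.9) -> TransactionResponse:
    """Score the event synchronously instead of queueing it for later."""
    score = scorer(event)
    if score >= block_at:
        action = "block"
    elif score >= review_at:
        action = "review"
    else:
        action = "accept"
    return TransactionResponse(status="ok", risk_score=score, action=action)
```

For digital goods this matters because there is no shipping delay in which to act on a score that arrives later.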

~~~
wallywax
E-commerce that deals with physical goods does have a time delay, so I think
you're really trying to say that it's not a good fit for digital goods sales
rather than that it's not a good fit for e-commerce...

~~~
aylons
Thanks for pointing this out. I was really confused by the parent post.

------
C1D
This doesn't seem really smart, since some of those apply to me. I am not from
America and I sometimes order from there, meaning it would seem like I'm
ordering at 4am when it's 2pm my time. Another thing is the email: I know a lot
of people with their birth year in their email address. What about them?

~~~
stephenlambe
Sift intern here.

We use the time zone of the customer rather than of the website for scoring
riskiness. Sorry if that wasn't clear.

As for customers including birth years in their emails, you're correct that
this would be counter to the general trend. However, a fraud detection system
like Sift has many data points on which to score a customer. Hopefully, the
person would otherwise look benign and thus not have a very high overall
score.

~~~
C1D
Thanks for clarifying that. I thought it used the server time.

------
smoyer
"It might turn out that size 10 shoes are more fraudulent than size 15 shoes."

There should be a pretty limited list of mailing addresses that would order
size 15 shoes (Shaq's house?) and the black-market for reselling them must be
a lot tighter.

------
dminor
So for something like "fraudsters don't use capital letters," does your system
discover a fact like this automatically, or do I have to think up these
indicators myself and hope they are relevant?

~~~
stephenlambe
This was discovered automatically. That's one benefit of a machine-learning-based
system: you feed it a lot of data, tell it when fraud actually occurred, and it
adapts its rules to predict fraud accordingly.
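Here is a much simplified illustration of how a pattern like "fraudsters don't use capital letters" can surface from labeled data without anyone hand-writing the rule. The approach is to screen candidate features and rank them by lift, meaning the fraud rate when the feature holds divided by the overall fraud rate. A real machine learning system learns weights for many features jointly. The feature functions and sample data here are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Order = Dict[str, str]
Labeled = List[Tuple[Order, bool]]  # (order, was it fraud?)


def lift(orders: Labeled, feature: Callable[[Order], bool]) -> float:
    """Fraud rate among orders with the feature, relative to the overall rate."""
    overall = sum(1 for _, is_fraud in orders if is_fraud) / len(orders)
    matching = [is_fraud for order, is_fraud in orders if feature(order)]
    if not matching or overall == 0:
        return 0.0
    return (sum(matching) / len(matching)) / overall


def rank_features(orders: Labeled,
                  candidates: Dict[str, Callable[[Order], bool]]) -> List[Tuple[str, float]]:
    """Score every candidate feature and sort by lift, highest first."""
    scores = {name: lift(orders, f) for name, f in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical candidate features; a real system would generate many more.
CANDIDATES = {
    "name_all_lowercase": lambda o: o["name"].islower(),
    "name_has_capitals": lambda o: any(c.isupper() for c in o["name"]),
}
```

Given labeled orders in which the fraudulent ones tend to have all-lowercase names, `name_all_lowercase` rises to the top of the ranking without being specified as a rule in advance.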

------
jusben1369
Did anyone else find it odd that the 2am and 4am times weren't qualified? (US?
East Coast/West Coast?)

~~~
ekanes
I think they're actually quite sophisticated about it - they look at the time
zone according to the shopper's browser. So they're looking at local-to-the-
shopper.

~~~
jasontan
Yep! We try to localize the time.

~~~
jusben1369
So the fraudsters in Vietnam are trying to time it for 2am to 4am in the local
market of the site they're defrauding?

~~~
stephenlambe
Not quite...an order is more likely fraudulent when it was placed at 2-4am
local time. Local time for the fraudster (in Vietnam or wherever else), not
local time for the site they're defrauding.
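A minimal sketch of the "local time" check, assuming the shopper's UTC offset is captured in the browser and sent along with the order. The function names and the [2, 4) hour window are illustrative assumptions. Note that JavaScript's `Date.prototype.getTimezoneOffset()` returns UTC minus local time in minutes, so its value has to be negated to get the offset used here.

```python
from datetime import datetime, timedelta, timezone


def local_hour(order_time_utc: datetime, utc_offset_minutes: int) -> int:
    """Hour of day on the shopper's own clock.

    utc_offset_minutes is local time minus UTC (e.g. +420 for UTC+7).
    JavaScript's Date.prototype.getTimezoneOffset() uses the opposite sign.
    """
    shopper_tz = timezone(timedelta(minutes=utc_offset_minutes))
    return order_time_utc.astimezone(shopper_tz).hour


def placed_in_small_hours(order_time_utc: datetime, utc_offset_minutes: int,
                          start: int = 2, end: int = 4) -> bool:
    """True if the order was placed between start:00 and end:00 local time."""
    return start <= local_hour(order_time_utc, utc_offset_minutes) < end


t = datetime(2013, 10, 1, 20, 30, tzinfo=timezone.utc)
print(placed_in_small_hours(t, 420))   # True  (03:30 at UTC+7)
print(placed_in_small_hours(t, -240))  # False (16:30 at UTC-4)
```

The same server-side timestamp is flagged for a shopper at UTC+7 but not for one at UTC-4, which is the distinction between the fraudster's local time and the merchant's.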

------
AsymetricCom
What exactly is a "fraudulent order" anyway? Someone has a credit card, pays
you and you send the product. Where is the fraud? Isn't it external to the
company or service? If someone steals a credit card and makes an online
purchase, isn't that the responsibility of the card company in securing its
account proxy more fully?

~~~
stephenlambe
E-commerce fraud takes many forms. Three main types of fraud impact merchants:
payment fraud, new account fraud and account takeover. We described all three
recently at Sift in a blog post: [http://ow.ly/oPrS2](http://ow.ly/oPrS2).

In the stolen card scenario you describe (a type of payment fraud), the
e-commerce merchant is actually liable. In other words, if that card is
reported stolen by the cardholder after the goods are shipped out, the
merchant loses the revenue. This is because online credit card transactions
are categorized as "card not present" transactions, since the merchant can't
be as certain that the actual cardholder made the purchase. If the
transaction had occurred in a physical store, the card company would be
liable.

