
Seven habits of highly fraudulent users - necubi
http://blog.siftscience.com/seven-habits-of-highly-fraudulent-users/?
======
IgorPartola
The night owl thing is misinterpreting the data. I am going to guess that the
more likely scenario is that at 3am there are simply fewer total transactions,
while fraudulent transactions stay at more or less the same level. Looking at
a total volume of fraudulent transactions vs the hour or the day would be more
helpful.

Another prediction: "fraudsters work on weekends" would yield the same graph
if displayed as a percentage of total transactions.

Edit:
[http://en.wikipedia.org/wiki/Lies,_damned_lies,_and_statisti...](http://en.wikipedia.org/wiki/Lies,_damned_lies,_and_statistics)
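
To make the rate-versus-volume point concrete, here is a minimal sketch. The numbers are made up for illustration (they are not from the article), and `hourly_fraud_stats` is a hypothetical helper. It reports both the absolute fraud count and the fraud rate per hour, so a constant count on a shrinking denominator is visible:

```python
from collections import Counter

def hourly_fraud_stats(transactions):
    """transactions: iterable of (hour, is_fraud) pairs.
    Returns {hour: (fraud_count, total_count, fraud_rate)}."""
    totals = Counter()
    frauds = Counter()
    for hour, is_fraud in transactions:
        totals[hour] += 1
        if is_fraud:
            frauds[hour] += 1
    return {h: (frauds[h], totals[h], frauds[h] / totals[h]) for h in sorted(totals)}

# Made-up toy data: 5 fraudulent transactions in each hour, but far fewer
# legitimate transactions at 3am than at 3pm.
toy = ([(3, True)] * 5 + [(3, False)] * 45 +
       [(15, True)] * 5 + [(15, False)] * 495)

stats = hourly_fraud_stats(toy)
for hour, (count, total, rate) in stats.items():
    print(f"{hour:02d}:00  fraud={count}  total={total}  rate={rate:.1%}")
```

In this toy data the fraud count is identical in both hours, yet the rate at 3am is ten times higher purely because the total is smaller.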

~~~
pyduan
It's also partly due to them tending to be more international, meaning they
are more likely to live in a different timezone.

The "outside of business hours" explanation is still a valid explanation but
definitely not the whole story. Overall, I'd caution that when building fraud
detection models understanding the stories behind the data is extremely
important, or you risk having an algorithm that works for the wrong reasons.

For example, if suddenly your user base becomes more international (for
example if you start allowing non-US users to use your website), you'll
suddenly have a lot of false positives if you're not cautious because to your
system it'll look like they're "operating outside of business hours".
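
A rough sketch of that failure mode, assuming you can estimate each user's UTC offset (for example from a billing country or browser setting; that input is an assumption here, and a real system would more likely use IANA zone names). The function names and the 9-to-18 window are hypothetical:

```python
from datetime import datetime, timedelta, timezone

def local_hour(ts_utc, utc_offset_hours):
    """Hour of day on a clock that is utc_offset_hours away from UTC."""
    local = ts_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return local.hour

def outside_business_hours(ts_utc, utc_offset_hours, start=9, end=18):
    """True if the timestamp falls outside [start, end) on that clock."""
    return not (start <= local_hour(ts_utc, utc_offset_hours) < end)

# One transaction at 07:00 UTC on a summer day.
ts = datetime(2014, 7, 1, 7, 0, tzinfo=timezone.utc)

# Judged on the site's own clock (UTC-7, US West Coast in summer): midnight.
naive_flag = outside_business_hours(ts, -7)
# Judged on the user's clock (UTC+2, Central Europe in summer): 09:00.
local_flag = outside_business_hours(ts, +2)

print(naive_flag, local_flag)
```

The same transaction looks like a night owl on the merchant's clock and like an ordinary morning purchase on the user's clock, which is exactly where the false positives would come from.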

~~~
grier
It's also only a good signal as long as you aren't acting on it.

As soon as it becomes advantageous enough to fit in with the normal user
patterns, the attackers will modify their behavior accordingly.

The time-of-day, internationality, and many other signals mentioned in this
post are easily evaded when it becomes profitable to do so.

------
wdewind
I love these. Building fraud detection systems is so funny because the signals
themselves end up usually being really simple, it's just a matter of:

a) Having the tools to look at all your data to see where the patterns are

b) Having the tools to track instances of patterns once you've identified
them.

But as I said, the patterns themselves are usually pretty simple, as you see
in this article.

This one is a bit specific to Justworks since we use bank account numbers
instead of credit card numbers, but one we've picked up is the source of the
bank account is a great signal. If a company has bank accounts exclusively
from certain banks then they are almost certainly fraudulent. We've even seen
people try to sign up with consecutive bank account numbers from the same
banks!

~~~
aestra
This goes for other industries as well.

For example every insurance claim from any insurance company is run through
fraud detection software. Turns out there are some characteristics of
fraudulent insurance claims that have been identified over the years. The
software can flag a claim as potentially fraudulent for further human review.

Discover Card's fraud detection worked extremely well in my case. My CC number
was stolen and after only two fraudulent transactions (total about $500) it
tripped the software to freeze the card. I have no idea how they identified
those as fraud so quickly, they didn't seem out of place to my eyes
considering my transaction history.

~~~
superuser2
Sometimes the pattern only exists in data writ large.

For example, a particular merchant is breached, and then the fraudsters try to
run most of the cards stolen in that breach at a few locations.

It could be obvious on your bank's end that a few dozen customers made a
charge at Store A and then reported a charge at Store B to be fraudulent, so
then they decided to freeze the cards of everyone with the same Store A ->
Store B pattern.
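
A minimal sketch of that "common point of purchase" idea, not any bank's actual method. The card IDs, merchants, and the 80% threshold are all made up, and both function names are hypothetical:

```python
from collections import Counter

def common_purchase_points(history, compromised_cards, min_share=0.8):
    """history: {card_id: set of merchants the card was used at}.
    Returns merchants visited by at least min_share of the compromised cards."""
    counts = Counter()
    for card in compromised_cards:
        counts.update(history.get(card, set()))
    threshold = min_share * len(compromised_cards)
    return {merchant for merchant, n in counts.items() if n >= threshold}

def cards_at_risk(history, merchants, compromised_cards):
    """Cards not yet reported as compromised that also shopped at a suspect merchant."""
    return {card for card, seen in history.items()
            if card not in compromised_cards and seen & merchants}

# Made-up toy data.
history = {
    "c1": {"Store A", "Cafe"},
    "c2": {"Store A", "Gas"},
    "c3": {"Store A"},
    "c4": {"Cafe"},
    "c5": {"Store A", "Books"},
}
compromised = {"c1", "c2", "c3"}  # cards whose owners reported fraud

suspects = common_purchase_points(history, compromised)
at_risk = cards_at_risk(history, suspects, compromised)
print(suspects, at_risk)
```

No single cardholder could see this pattern from their own statement; it only shows up once the histories are pooled.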

~~~
aestra
Great point.

------
mootothemax
_Fraudsters tend to make multiple accounts on their laptop or phone to commit
fraud._

Any idea how this can be tracked? Normal cookies, or something more in-depth?

A site I'm working on tracks page visits independently of logged-in user
sessions.

I'm wondering if it's worth considering explicitly looking for multiple
logged-in users sharing the same page visit session.

There are various tricks - e.g. Flash cookies, Etags etc - that make me feel a
bit uneasy, despite how much I like the idea of tracking multiple accounts per
device.
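
For what it's worth, the "multiple logged-in users per visit session" check can be sketched in a few lines. This assumes you already log `(visit_session_id, user_id)` pairs; the event shape and the function name are hypothetical:

```python
from collections import defaultdict

def sessions_with_multiple_users(login_events, max_users=1):
    """login_events: iterable of (visit_session_id, user_id).
    Returns {session_id: set of user_ids} for sessions exceeding max_users."""
    users_by_session = defaultdict(set)
    for session_id, user_id in login_events:
        users_by_session[session_id].add(user_id)
    return {s: users for s, users in users_by_session.items()
            if len(users) > max_users}

# Made-up toy data.
events = [
    ("s1", "alice"), ("s1", "alice"),              # same user logging in twice
    ("s2", "bob"), ("s2", "carol"), ("s2", "dave"),
]
flagged = sessions_with_multiple_users(events)
print(flagged)
```

Shared family or library computers will trip this too, so it is probably better treated as one input among many than as a verdict.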

~~~
x0x0
a relatively privacy friendly way is to compare accounts to ip addresses. You
obviously expect there to be some nat-ing going on but it should be relatively
constant over time. That is, if in your entire network you've seen, say, 10
accounts from a given ip and all of a sudden 5 accounts interact with a given
shop, it's probably fraudulent.
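
A rough sketch of that comparison, with hypothetical names and thresholds: count the distinct accounts from one IP hitting one shop in a recent window, and compare that with how many accounts you have ever seen behind that IP. A large NAT has a large baseline, so the same burst matters less there.

```python
from collections import defaultdict

def suspicious_ip_shop_pairs(events, known_accounts_per_ip,
                             max_share=0.3, min_accounts=3):
    """events: iterable of (ip, shop, account) from a recent time window.
    known_accounts_per_ip: {ip: distinct accounts ever seen on that ip}.
    Flags (ip, shop) pairs where an unusually large share of the ip's
    accounts touched the same shop."""
    accounts = defaultdict(set)
    for ip, shop, account in events:
        accounts[(ip, shop)].add(account)

    flagged = []
    for (ip, shop), accts in accounts.items():
        # The baseline can never be smaller than what we just observed.
        baseline = max(known_accounts_per_ip.get(ip, 0), len(accts))
        if len(accts) >= min_accounts and len(accts) / baseline > max_share:
            flagged.append((ip, shop, len(accts)))
    return flagged

# Made-up toy data: a small household IP and a big carrier NAT,
# each with five accounts hitting the same shop.
events = ([("203.0.113.7", "shop-42", f"acct{i}") for i in range(5)] +
          [("198.51.100.9", "shop-42", f"user{i}") for i in range(5)])
baseline = {"203.0.113.7": 10, "198.51.100.9": 400}

result = suspicious_ip_shop_pairs(events, baseline)
print(result)
```

Only the small IP is flagged here; the thresholds are arbitrary and would need tuning against real traffic.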

You can monitor characteristics the browser reports, including user agent
strings.

You can use techniques like differences in rendering of canvas drawings to
images to fingerprint browsers. In fact, I'd bet good money this is a _great_
signal: what you're trying to do is not fingerprint, but detect when the
reported user-agent has been overridden. Few people override user-agents.

Then you can go on to ways to bury identifiers in browsers. For example, etags
on cached objects may be ok if you aren't using it for advertising and clear
it in your privacy guidelines.

You can also fingerprint with time deltas, though this may be patented.
Briefly: computers synchronize to milliseconds, I think. If you are careful,
you can probably detect sub-millisecond clock skew between a client and your
server. This should not be constant across devices.

etc etc etc

~~~
TheLoneWolfling
...And then you start flagging businesses, ISPs, even entire countries as
fraudulent.

The number of NATs out there makes that sort of correlation... difficult.

For that matter, there are people that use UA "spoofing" for non-nefarious
purposes. Me, for one.

~~~
thaumasiotes
Speaking as someone who's spent a lot of time in China... flagging entire
countries as fraudulent isn't the end of some ridiculous slippery slope --
it's a thriving practice today. I have to hide my IP address to interact with
the internet in almost any interesting way.

~~~
x0x0
yeah, for a while I blackholed china and india with ipfw. Hacking attempts
against my server fell like a rock. I felt a tiny bit bad, but if these
countries / isps can't police their subscribers, what do they expect.

Eventually I got rid of wordpress/php and just use nginx to serve static files
so I felt secure enough to drop the firewall rules.

------
ChuckMcM
This was a great read. You know what fraudsters search for? Vulnerable PHP
web sites and sites with anonymous logins. It's kind of amazing. I would
imagine that if you could take all of these signals and geo-track them back to
the originating IP you would be able to illuminate fraud-tolerant ISPs[1], and
fraud schemes and targets. Sure, it's a big data problem, but it seems eminently
tractable if local vendors provide fraud detection data to a central source.

[1] You could notify ISPs of their hosting fraudulent traffic and if they
continue to host it ...

~~~
devicenull
Getting consumer ISPs to respond to obvious abuse cases (DDOS attacks) is very
difficult. Getting them to respond to fraud (which they'd have to investigate
more than looking at a bandwidth graph) seems impossible.

Now throw a language barrier on top, and it's even more difficult.

Hell, getting accurate abuse contact information is a project all by itself.

------
koyote
So Microsoft domains account for the highest amount of fraud?

Maybe a third-party should seize hotmail and outlook.com in order to clean it
up for them...

~~~
aytekin
Those domains are more popular in countries where fraud is higher.

~~~
wldcordeiro
The OP was referring to how Microsoft seized domains from another company
under the guise of security.

------
contingencies
The Four Habits of Highly Silicon-Valley Startups.

1\. Put something that doesn't belong there _in the cloud_.

2\. Make _undue generalizations_ about its applicability to third party
businesses of which you have limited understanding.

3\. _Fake growth_ by dubious means, such as ramping up 'customers' (even if
none of them actually use your service on an ongoing basis), hiring
extensively, and waylaying all business processes to cater toward visible
progress at investment rounds.

4\. Spend almost as much on _marketing_ as development.

------
josu
>Habit #2: Fraudsters Are Night Owls

Are they using local time? Or is there a chance that they are not accounting
for the fact that most of the fraudsters are foreign and in a different time
zone?

~~~
wdewind
I believe they mean the Sift Science customer's time, not the hacker's time.
They are proposing that fraudsters tend to hit in the middle of the night
based on where the target is hosted, not where they are.

~~~
AnimalMuppet
If the targets were US-based, this could fit with the "fraudsters are
international" finding.

------
johnchristopher
> outlook

> Some of the most fraudulent email domains are operated by Microsoft. Why
> could this be? Two possible reasons are that 1) Microsoft has been around
> for a lot longer and 2) email addresses were easier to create back in the
> day. Today, websites use challenge responses such as image verification or
> two-factor authentication to verify your legitimate identity.

But outlook.com is the most recent Microsoft web mail domain. Why is it
already much more used than other Microsoft web mail domains (hotmail, live,
etc.) ?

~~~
josh2600
Go to hotmail.com and look at the login portal.

~~~
phaemon
So what? The mail server name has practically nothing to do with the email
address. The address is still @hotmail.com.

He makes a valid point: why did @outlook.com addresses suddenly become used
for scamming?

~~~
chatmasta
Here's a possibility:

\- - - - @outlook.com is a relatively new email domain (< 2 years)

\- - - - Most people buying online are over age 18

\- - - - Most people do not change their email address

\- - - - Most people over the age of 18 have had their email address longer
than two years

So, by that logic, if someone has a @outlook.com email address, there are a
few possibilities:

\- - - - They had an old email address, but switched/forwarded it to
@outlook.com sometime in the last 2 years (unlikely - generally people don't
suddenly change their email)

\- - - - They made an @outlook.com address for ecommerce signups (unlikely -
why not use your current provider e.g. Gmail?)

\- - - - This is their first email account (unlikely)

\- - - - They registered it to commit fraud (hmmmm)

Obviously this is all speculation and there are exceptions to all those
assumptions, but it seems logical that the last option is more likely than the
others, especially when weighted by the fact that fraudsters almost always
create more than one email account.

------
gordon_freeman
Pretty interesting insights. Though in Habit #6: "Fraudsters Are Really
Boring": digits in email addresses seem pretty ordinary (non-fraudster) to me.
We need to remember that there are 600 million+ email accounts registered with
Gmail, for example, so it is really difficult to get a new email ID without
using any digits when registering your address.

I myself use an email address with 2 digits, and many of my friends use 4
digits or so. I personally don't think having more digits in your email ID is
directly proportional to being more fraudulent.

~~~
skeoh
I would say that while there can be legitimate email addresses with multiple
digits, and fraudulent email addresses with no digits, neither of these facts
precludes the possibility of a correlation.

~~~
gordon_freeman
That's exactly my point. Looking at this habit of "email address with multiple
digits" alone seems like correlation rather than causation. Making an
anti-fraud algorithm by including all or most of these habits might point to
causation and help solve the problem though.
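
To illustrate what a correlation of this kind looks like (and nothing more), here is a small sketch that buckets labelled addresses by the number of digits in the local part and reports the fraud rate per bucket. The addresses and labels are invented for the example, and the helper names are hypothetical:

```python
import re
from collections import defaultdict

def digit_count(email):
    """Number of digits in the part before the '@'."""
    local_part = email.split("@", 1)[0]
    return len(re.findall(r"\d", local_part))

def fraud_rate_by_digits(labelled, cap=4):
    """labelled: iterable of (email, is_fraud).
    Buckets are 0, 1, ..., cap, where cap means 'cap or more digits'."""
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for email, is_fraud in labelled:
        bucket = min(digit_count(email), cap)
        totals[bucket] += 1
        frauds[bucket] += int(is_fraud)
    return {b: frauds[b] / totals[b] for b in sorted(totals)}

# Invented toy data, not real accounts.
labelled = [
    ("alice@example.com", False),
    ("bob@example.com", False),
    ("carol12@example.com", False),
    ("dave12@example.com", False),
    ("erin1987@example.com", False),
    ("frank2014@example.com", True),
    ("x83920174@example.com", True),
    ("y10293847@example.com", True),
]
rates = fraud_rate_by_digits(labelled)
print(rates)
```

Note that the highest bucket still contains a legitimate user, which is both commenters' point: a higher rate in a bucket is a reason to weigh the signal, not to block on it.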

------
jfasi
Is it just me, or did they really just fit a quintic curve to their
"Fraudsters are Sneaky" plot? There had better be a good reason for their
using such a high-degree polynomial.

~~~
sks
It may be a non parametric estimate. My guess is that they used loess
([https://en.wikipedia.org/wiki/Local_regression](https://en.wikipedia.org/wiki/Local_regression)).
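
For readers who haven't met loess: it fits a small weighted straight line around each point instead of one global polynomial. Below is a minimal pure-Python sketch of a local linear fit with tricube weights. It is a simplified illustration (no robustness iterations), not a claim about what the article's authors actually ran, and the sample data is made up:

```python
def loess(xs, ys, x0, frac=0.5):
    """Estimate y at x0 from a weighted linear fit to the nearest
    `frac` of the points, using tricube weights."""
    n = len(xs)
    k = max(2, int(round(frac * n)))
    dists = sorted(abs(x - x0) for x in xs)
    # Bandwidth: slightly beyond the k-th nearest point so it keeps a
    # small positive weight and the weights never all vanish.
    h = dists[k - 1] * 1.001 + 1e-12
    w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]

    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

# Made-up, slightly noisy data.
xs = list(range(10))
ys = [1.0, 1.8, 3.2, 3.9, 5.1, 6.2, 6.8, 8.1, 9.0, 9.9]
smoothed = [loess(xs, ys, x) for x in xs]
print([round(v, 2) for v in smoothed])
```

A curve produced this way can easily look like a wiggly polynomial on a plot, which would explain the resemblance.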

------
jacquesm
I've built a system like this recently and most of these are supported by the
analysis I made. Even so, there are _many_ more signals possible than the ones
listed here and only in the aggregate can you use any of this, a single signal
is never strong enough to distinguish between fraud and friend.

False positives will always happen, no matter how many signals you throw into
the mix, there will always be exceptions. Even so the difference between
running with a system like this and being wide open is like day and night.
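
One simple way to see why the aggregate matters is a naive Bayes style combination: multiply the prior odds of fraud by a likelihood ratio for each signal that fires. This is only a sketch of the general idea, not the system described above. The 1% prior and the ratio of 3 are made-up numbers, and treating signals as independent is a strong simplifying assumption:

```python
import math

def fraud_probability(prior, likelihood_ratios):
    """Posterior odds = prior odds * product of likelihood ratios,
    computed in log space and converted back to a probability.
    Assumes the signals are independent, which real signals rarely are."""
    log_odds = math.log(prior / (1 - prior))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    return 1 / (1 + math.exp(-log_odds))

prior = 0.01   # assumed base rate of fraud
weak = 3.0     # assumed: signal is 3x more common among fraudsters

one_signal = fraud_probability(prior, [weak])
five_signals = fraud_probability(prior, [weak] * 5)
print(f"one signal: {one_signal:.1%}, five signals: {five_signals:.1%}")
```

With these assumed numbers a single weak signal leaves the probability at a few percent, while five of them together push it well past one half.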

------
woodchuck64
Fascinating. I can't get over why Fraudsters Go Hungry. Rampant speculation:
they're used to eating in front of a computer screen, they're doing what they
love so no reason to do anything differently during lunch hour, they're shut-
ins and socially inept. International nerds, male, unmarried, above average IQ
but didn't get into the top Indian/Russian/Chinese universities so making a
living and finding meaning hacking into America.

~~~
Zikes
The "mechanical turk" style fraudster setups probably rely more on getting
desperate, unemployed people to work in regions so scarcely regulated that
lunch breaks aren't even expected.

------
romaniv
These articles always seem to imply that you can use those traits for
detection. Can you? I mean, the numbers and wording imply that you can, but
some of the stats are not very clear or intuitive.

For example, just because group X usually doesn't eat lunch doesn't mean that
not eating lunch is a good trait to detect them in the general population.
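
A quick Bayes' rule sketch of that point. All three inputs are invented for illustration; `precision` is a hypothetical helper that returns the probability that a user showing the trait is actually a fraudster:

```python
def precision(prevalence, p_trait_given_fraud, p_trait_given_legit):
    """P(fraud | trait) via Bayes' rule."""
    hit = prevalence * p_trait_given_fraud
    false_alarm = (1 - prevalence) * p_trait_given_legit
    return hit / (hit + false_alarm)

# Assumed numbers: 1% of users are fraudsters, 90% of fraudsters are
# active during lunch, and so are 40% of legitimate users.
p = precision(prevalence=0.01, p_trait_given_fraud=0.9, p_trait_given_legit=0.4)
print(f"P(fraud | active at lunch) = {p:.1%}")
```

Even though nearly every fraudster in this toy setup has the trait, only a couple of percent of the users who have it are fraudsters, because legitimate users vastly outnumber them.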

Also, 6% of outlook.com is used for fraud? This is a huge percentage.

How does this company detect multiple accounts on the device?

~~~
roel_v
Not only 'can you', it is being done by thousands of companies across the
world every second. (not these properties directly, just building a
statistical model of fraudulent users and comparing transactions to that model
to flag potentially fraudulent transactions - in wire transfers, online
purchases, credit card use, ...)

------
joering2
Outlook is seriously broken! 10 minutes after creating an account, my spam
mailbox had already 20 messages. I don't believe that spammers are mailing
every possible combination of a username; there must be some leak or other
way.

Also it seems that Microsoft gave up on verifying whether your message is spam
or not. I had government emails (USPS, for example) as well as emails from my
gmail and yahoo friends landing straight in junk.

------
minusSeven
I doubt how much the data can be used as an overall generalization. They should
analyse the pattern and go a little deeper.

And what is the exact meaning of fraudulent user here?

------
alexsmolen
This is interesting and well-informed, but it's important to remember that
fraud is an adversarial problem. The bad guys will change their behavior to
evade detection. The habits described here may exist when there is no defense
in place, but if you use them to detect fraud, you'll likely see shifts in
behavior to appear more "normal" and evade detection.

------
gojomo
Seems they should be keeping these signals secret.

~~~
dublinben
These traits aren't unknown to anyone who is processing credit card
transactions and keeping an eye out for fraud. It's pretty clear that they
don't consider them trade secrets either, or they wouldn't be sharing.

------
bitL
LOL, I am hitting most of these for completely different reasons and never
attempted nor plan to attempt a fraud. I guess another overreaching
application of statistics (it must be because confidence intervals say so and
our prediction model agrees!). It reminds me of the saying that all
murderers eat bread, so bread is dangerous!!!

I really hate oversimplifications in these serious matters.

It happened to me that my bank was using a similar silly algorithm to
consistently block my credit card during my world travel every time I arrived
in a new country/airport, even if I told them about it in advance. A way to
lose a customer for life for sure, especially when their emergency line operates
only during working days between 9am-6pm in Germany...

~~~
Zikes
The article clearly says that these factors are not to be taken individually,
but in addition to hundreds or thousands of others.

------
RevRal
I'm pretty impressed with how the header and side bar operate on this site.
Enlarge/zoom the page and the sidebar becomes the header. I'd like to know how
this is achieved.

~~~
skeoh
The website has a responsive design [0], which adapts to the available
viewport. It looks like this one was implemented with Bootstrap [1].

[0]:
[http://en.wikipedia.org/wiki/Responsive_web_design](http://en.wikipedia.org/wiki/Responsive_web_design)

[1]: [http://getbootstrap.com/](http://getbootstrap.com/)

~~~
joshmlewis
I want to point out that this commenting style is very good. You provided
helpful answers, looked into the original question, and then provided sources.
More people should comment like this.

------
dm2
It's interesting that gmail is the least likely used for fraud, why is that?
Can't anybody create multiple gmail accounts?

VPN traffic would also be an interesting metric.

~~~
codygman
IIRC you have to do text message validation. If not, I believe the number of
messages you can send is under 50. However these things change over time, and
I believe at one point (maybe now) you couldn't make a gmail account without
text verification.

Feel free to correct me if my memory is wrong because it very well could be.

~~~
NoMoreNicksLeft
At one point I was thinking about setting up a tv channel for VLC... you can
write a lua script to let VLC extract video urls from a webpage. So I'd use
Tor/bitcoin to get hosting somewhere, put up a simple page for that purpose,
and use Youtube to host the videos. You need Google accounts though, lots of
them (Google would suspend them quickly, after all).

The solution I considered was paying people in Africa to sign up for gmail for
me, and I'd pay them per account. I figured I'd only need 50-100 per month, so
the low volume might make it possible. They often have smartphones, and
amounts that are too low for you to bother with might be a decent payday for
them for 5 minutes work.

Now, I know what you're going to say... Youtube detects copyrighted works,
won't let you upload them. That part was easy.

Just invert the video color, and flip it upside down. Then the lua script for
VLC would de-invert and unflip it. And I could even bring in the audio from
another site (VLC allows muxing), since Youtube uses audio signatures more
than they do video signatures for that stuff.

I had a prototype going for a while. Called it "Space Potato Channel". It just
played videos others had uploaded (wrote a little backend to schedule movies).
If you tuned in 5 minutes late, it'd show the video 5 minutes in, etc. Then I
learned about how the NSA was giving tips to law enforcement and doing the
parallel reconstruction thing, and I reconsidered my scheme to become a
bitcoin millionaire.

Long story short, gmail accounts were never something I thought would be much
of a problem.

~~~
chatmasta
Or you can go to "account brokers" who sell accounts for something like
$20/1000. Reliability of those accounts varies per broker but some I
understand to be quite good (never bought any myself).

Hang out on any blackhat SEO forum (or more illegal carding shops, etc. I
would imagine) and you'll see plenty of guys peddling this service.

Incidentally, the youtube method you're describing has been automated many
times. My first real PHP project was a script that found popular videos on
non-youtube sites, downloaded them, watermarked them with my blog URL, and
uploaded them to youtube. That resulted in a fair amount of direct traffic.

If you trawl around youtube these days you'll see plenty of watermarked videos
that are clearly not original content. But as long as nobody is claiming
copyright -- which nobody is doing for cat videos -- Google doesn't give a
shit. Honestly, uploading non-original videos to Youtube only helps their
numbers.

I think a common misconception is that companies care about fake/"spam" user
accounts on their services. But what incentive do they actually have to ban
them? In the world of venture capital, user numbers are an incredibly
important metric, so as long as they aren't actively diluting the service for
other users, companies have an incentive to allow them to propagate and pad
their stats.

Take Snapchat for example. Looking at my friend request page, I have dozens of
obviously spam accounts asking to be my friends. Is Snapchat including these
accounts in their user numbers? Almost definitely. In fact, they probably even
count as "active users" because they are "sharing photos" so often!

One has to wonder how many popular services have been built on VC money given
to them on the presumption of accurate user statistics, when in reality 20-30%
of accounts could be shills. Snapchat, Twitter, Facebook... There are tons of
fake users on all of them, and yet these companies make relatively little
effort to exclude them from stats (except, of course, when reporting
monetization per user).

~~~
NoMoreNicksLeft
I was going to upload A list movies and tv shows. It was going to be a Syfy
channel alternative. Just saying.

------
mynewwork
Ghostery blocks access to the content due to how they're doing a redirect.

I wonder if that's intentional. Was one of the seven habits "users who block
trackers"?

------
excitom
The drawback to services like this is that they are great at hindsight (Aha!
Based on the signals we should have known!) but bad at prediction. Take the
example of Doral, FL that they offer; it has 8X higher fraud. But, should you
avoid Doral customers? No. Should you avoid people who use forwarding
services? No. But if you're scammed, you can look back and say "I should have
known!"

~~~
x0x0
If a scam works once, it will probably be tried again. Even if you are
correct, there is probably value in simply detecting similar scams. Plus, if
sift's network is big enough, a given business will be protected against scams
that hit other businesses.

------
owenversteeg
Argh, that's not how you use an <abbr> element! (in "Fraudsters are really
boring")

------
jdong
This makes the ridiculous assumption that fraudsters don't use proxies, they
do.

------
valarauca1
A while ago I really wanted to build a bot farm, not to do anything particularly
malicious (farm reddit upvotes and push content to the front page).

Now this post feels like it's encouraging me to.

~~~
ary
> A while ago I really wanted to build a bot farm, not to do anything
> particularly malicious (farm reddit upvotes and push content to the front
> page).

You and I have very different definitions of the word "malicious".

~~~
newaccountfool
Hmmm, I wouldn't say it was malicious as long as there was no legal activity
created by it.

