
Expensify sent images with personal data to Mechanical Turkers - yread
https://arstechnica.com/information-technology/2017/11/expensify-acknowledges-potential-privacy-problem-by-calling-it-a-feature/
======
afandian
Having happily used AirBNB in the past, I recently tried to book a stay in
Toronto. They wanted a full photo of my passport or driving license to
complete the booking. Looking a little deeper, not only did they outsource the
processing, but when others had questioned them about it, the response seemed a
little noncommittal.

With leaks and lapses of judgment occurring daily, I feel like using AirBNB and
other services like it would be an identity-fraud ticking time bomb,
especially given the attitude they seem to take toward security. I know others
on HN feel differently.

I took my business elsewhere.

~~~
CaptainZapp
Playing devil's advocate:

Where is the problem? Or, what does a passport or id card contain, which
should be kept secret?

Your name? They have that anyway. A passport number? What could be done with
it, except verifying that the passport is legit and not stolen? My height? Who
cares?

While I really do think there are far too many shenanigans that can be done
with a (US) Social Security number and personal information, I just don't see
the same issue with identification.

I'm happy, of course, to be educated otherwise.

Disclaimer: I did upload a photo of my passport in order to rent an AirBnb in
Sapporo.

~~~
Spooky23
The use of passport is an example of an attempt to do identity proofing. In
the practice of identity management there's a concept of "Identity Assurance
Levels" where you have varying levels of validation of a person's identity. If
you are interested in this, check out NIST Special Publication 800-63A.

AirBNB is trying to use the passport photo as a way to link you to a real-
world identity, cheaply. They are in a sticky situation because their bonkers
business model will suffer if customers need to go through intrusive
processes. Simultaneously, they need to do _something_ to avoid being held
negligent when a fake AirBNB host/guest hurts someone. The problem is that it
doesn't really provide assurance of anything other than possession of an image
of a passport.

The other problem is that it spews a lot of information that, if handled
improperly, is at high risk of fraudulent use. For example, knowing your
citizenship, date of birth, and place of birth makes it trivial to fraudulently
obtain your birth certificate. From there, it's pretty easy to obtain a
fraudulent driver's license.

There are many ways to do this more effectively and at much lower risk to the
_customer_. For example, you could verify ownership of a bank account with
trivial deposits. Or you could mail the customer a token. Or require a
notarized document. Or some combination. But the risk to AirBnb of a negative
outcome is low, so they push a risk that you may not understand onto you.
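A minimal sketch of the "trivial deposits" idea above, in Python, assuming a hypothetical flow where the service sends two random amounts under $1 to the customer's bank account and asks the customer to read them back from their statement. No real banking API is involved; `make_micro_deposits` and `MicroDepositChallenge` are illustrative names, and the actual transfers are out of scope.

```python
from __future__ import annotations

import secrets


def make_micro_deposits() -> tuple[int, int]:
    """Pick two random deposit amounts, in whole cents, between 1 and 99."""
    return (secrets.randbelow(99) + 1, secrets.randbelow(99) + 1)


class MicroDepositChallenge:
    """Hypothetical helper: tracks one pending bank-account ownership check."""

    def __init__(self, max_attempts: int = 3) -> None:
        # In a real system these amounts would be sent via the bank and stored server-side.
        self.amounts = make_micro_deposits()
        self.attempts_left = max_attempts
        self.verified = False

    def check(self, claimed: tuple[int, int]) -> bool:
        """Return True if the customer reported both amounts correctly, in any order."""
        if self.verified:
            return True
        if self.attempts_left <= 0:
            # Locked out after too many wrong guesses.
            return False
        self.attempts_left -= 1
        if sorted(claimed) == sorted(self.amounts):
            self.verified = True
        return self.verified
```

Capping attempts is the important design choice: two amounts of 1 to 99 cents leave only a few thousand possible answers, so unlimited guessing would make the check worthless. Unlike a passport photo, nothing the customer hands over here is reusable for fraud elsewhere.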

~~~
CaptainZapp
I just stumbled over your reply right now (a dollar short and a day late) and, well:

Thank you very much.

That was instructive, insightful and taught me a lot; namely what the actual
problem is.

Sheesh! Sometimes it's really worthwhile to be a bit prissy, but seriously, I
learned something from your reply and I really appreciate it.

------
dbirulia
There is too much fake marketing around companies adopting ML and AI today. We
call this the "man behind the curtain" (Wizard of Oz). The rule of thumb is:
if it takes minutes or even hours to OCR and extract the data, then it's human
labor.

Lots of companies like Expensify, Bills.com, ReceiptHog and others use MTurk
or services like MTurk to extract data from financial documents. The
accuracy is still not guaranteed to be 100%, and the categorization is usually
off, since the person categorizing your receipts doesn't have a history of your
previous purchases or know how they should be categorized in your business.
This also means that if you handle PII or are a healthcare company,
watch out. These companies are NOT HIPAA compliant. They leak PII. It
takes one social hack to steal someone's identity.

How do I know all this? Because we ([https://iqboxy.com](https://iqboxy.com),
YC W17) built a 100% automated solution for bookkeeping and expense
management. You scan a receipt/bill/invoice and you get results in a few
seconds. We have been offered those human-labor services for "automated data
extraction" many times, but we believe that's not how this problem should be
solved in 2018.

~~~
akavel
@dbirulia I see you have a "free plan" (the "paid plan" is currently too much
for me given the expected value; in my country it's ~10 coffees, not 2), but it
has "Advanced OCR" grayed out. Does that mean it has no OCR at all (it just
stores image scans?), or rather some kind of "primitive OCR" (whatever that
means)?

~~~
dbirulia
Hey @akavel the difference is that the system will not build ML models for
your account and will not learn from your edits and categorizations. Give it a
try and maybe it’s enough for your case.

------
Spooky23
“On November 25, Expensify's founder and CEO, David Barrett, announced a new
"feature" the company was working on, called Private SmartScan, in which
customers would be offered the option of recruiting their own backup
transcription workforce through Mechanical Turk.”

That is a pretty sorry attempt at deflection.

~~~
saas_co_de
About what you would expect from a company with an $800 million valuation that
pays its employees $2/hr.

------
ghshephard
Expensify does pretty horrible categorization. It does a really good job at
determining what the amount is, but approx 70% of the time it manages to screw
up the categorization unless it's really, really obvious (ground transportation
for Uber). It will even make rookie mistakes, like seeing I classified a WiFi
charge with United, and then classifying my next airplane tickets as "WiFi".

But - I've always assumed that Concur and Expensify had large groups of people
manually correcting/categorizing receipts, so not too surprised here.

------
madaxe_again
Never mind them “still running until Sept” - my wife occasionally does MTurk
in front of the TV, and maybe five days ago (Nov 20th) lots of the tasks were
from Expensify.

Some of the receipts had enough info to identify people - store card numbers,
buyer phone number, etc.

------
seattle_spring
Did anyone honestly think it was anything other than humans manually entering
data from the submitted images?

~~~
SilasX
lol so this is like the modern version of the _actual_, original namesake
mechanical "Turk" from the 18th century?[1]

"Um, you pretty clearly have some human wearing a costume and playing chess."

'No, no, I insist, it's 100% machine that happens to be good at this task!'

Now it's:

"Um, you pretty clearly must be using humans to interpret the receipts."

'No, no, I insist, it's all done by our proprietary algorithms!'

[1]
[https://en.wikipedia.org/wiki/The_Turk](https://en.wikipedia.org/wiki/The_Turk)

~~~
mulmen
Can't wait for the Show HN of a mechanical turk backed chess bot.

~~~
SilasX
That would actually be interesting! Humans are good at certain kinds of
computational problems like approximate Traveling Salesman, so if you had
a good algorithm that assumed access to such an oracle, you could indeed power
it with Mechanical Turk workers in a nontrivial way.

~~~
thethirdone
> Humans are good at certain kinds of computational problems like approximate
> Traveling Salesman

Are humans any better at it than computers? I wouldn't think humans have a
heuristic good enough to outweigh a computer's 1000x processing-speed
advantage.

------
williamscales
I mean, we all knew it was humans doing the "SmartScan" transcription but one
would suppose that there's some level of vetting and control in place...

~~~
goialoq
How would you vet that someone is not going to sell your private information
on the illicit market?

~~~
eps
What "illicit market" could possibly be interested in a pile of random
receipts?

Receipts would only have value if all of them are from the same person and
could be used to reconstruct that person's activity. And even then it'd be
pretty damn tricky to sell that "file", as it still has no exploitable value.

~~~
semerda
Ever looked at your health insurance bills? Medical records are a gold mine
for social engineering. Standard restaurant receipts have everything from
payment type, last 4 digits of your credit card, transaction history... great
for social engineering. Oh and Uber receipts have your home address too ;-)
Whoops.

------
mrgordon
This is why Crowdflower has the option to only show data to people who have
signed a non-disclosure agreement. You don't just show private data to
everyone on the Internet...

~~~
semerda
You're joking, right? I'm sure Equifax used that one too. NDAs in 2017 are like
toilet paper. How do you trust that none of your employees will go rogue? If
you deal with a healthcare company in the US, you need to provide a BAA and
comply with HIPAA. So how do you handle that?

~~~
mrgordon
How does anyone trust their employees won’t go rogue? If they are under NDA
and are longtime employees with high ratings then there is usually no reason
to think they are trying to “go rogue” by stealing some usually very boring
but private data.

HIPAA has additional requirements.

------
semerda
Just like their AI/Bot/Robot/Skynet Concierge service that's actually
human-powered. See tweet =>
[https://twitter.com/kylelloydsf/status/738789931791699968](https://twitter.com/kylelloydsf/status/738789931791699968)

"A heads up, our team usually responds within 24 business hours (7am - 6pm)"
lol. I guess machines do dream of electric sheep after all ;-)

------
jesperlang
I am imagining "self-driving" cars driven by mechanical turks! Big halls with
what looks like arcade car games, workers specialized in all aspects of
driving, controlling cars on the road at the other side of the planet...

Too crazy?

~~~
throwaway41123
You might be interested in
[https://en.wikipedia.org/wiki/Sleep_Dealer](https://en.wikipedia.org/wiki/Sleep_Dealer)

~~~
jesperlang
thanks, I'll check it out

------
gruez
AI: fake it till you make it!

~~~
sjg007
It would be better to just kick it back to the user to resolve. I used
Expensify and it took something like 20 minutes to process a few receipt
images, so I ended up doing it myself anyway.

------
pakopak
We will start seeing such news more often. Privacy is a real concern in the AI
world: whether by machines or humans, plain sensitive data is being processed
in most cases. Usually humans need access to the raw data in order to label it
to train or tune the algorithms. The data ends up being accessible, often to
cheap labour, since labelling is a very expensive process in the current state
of AI.

------
ayw
For anyone interested in solving this problem in their own business—we take
privacy extremely seriously at Scale API
([http://www.scaleapi.com](http://www.scaleapi.com)) and implement numerous
safeguards operationally and technically to ensure this doesn't happen.

~~~
mbesto
> we take privacy extremely seriously...implement numerous safeguards
> operationally and technically to ensure this doesn't happen

I hate that this is going to come off as snark, but a lot of tech startups
make claims about running "rigorous background checks" (Uber et al.) that get
proven time and time again not to hold up at scale.

So what specifically do you do to ensure this? How do I trust your
organization?

~~~
goialoq
Just look at the reputation they've built up over the years.

~~~
mbesto
The company is a year old...

~~~
narsil
Probably forgot the /s

------
1PM9xYsw5kzz
TL;DR: Expensify’s deceptive mechanical turk army may have resulted in me
coming within seconds of losing $30k, and almost certainly leaves them exposed
to massive liabilities as they wantonly give away personally identifiable
information to low-paid contract workers who are not bound by
confidentiality.

Throwaway account for obvious reasons. I have had my identity stolen twice, in
both cases with the intent to steal access to accounts I use in connection
with my business. Two relevant pieces of information: my employer pays for my
phone, and I get reimbursements by submitting the PDF bill via Expensify. My
employer is also a bitcoin company, and occasionally I have to buy bitcoin (a
few hundred dollars worth) to top up our service’s wallet for paying
transaction fees.

When the second incident happened a few months ago I had just come back from a
long international trip. I work in the bitcoin industry and know not to keep
coins on exchanges, but I also know not to travel through customs with >$10k
or with bitcoin private keys on my person. So I had 8btc, about $30k at the
time, loaded on my personal account on a certain exchange, in case (1) I
needed emergency funds and was locked out of my bank account for ordinary
fraud prevention snafu reasons; and/or (2) I needed to top-up company accounts
again. It turns out I did need to do the latter.

Immediately on coming home I filed my expense reports for reimbursement.
Expensify, as far as I can tell, does not offer a way to opt out of “SmartScan”
from the mobile app. In any case I thought this was some harmless excuse for
the CTO to have a machine learning project. Little did I know. This report
included both my phone bill for the months prior, and the bitcoin transfer, a
print-out from the logged in view of my account on the exchange.

Very shortly thereafter (days?) my phone was remotely SIM-ported to another
device. I got extremely lucky in that I was using the phone at the very moment
it happened, and saw it go from full bars to “No service” while sitting in my
chair. I knew what was going on because this was the 2nd time it had happened
to me. I ran to the coffee shop down the street to get wifi, and found I could
no longer login to my email account. Yup, hack in process. Attacker used SMS
authentication on my cell phone to reset the password on the email.

Lacking a phone, I rushed to the nearest retail store for my carrier, and
after a short conversation I had my phone service back. I had to answer some
questions to regain my account, questions like “from which country did you
recently make international calls, and to whom?” On getting full bars back on
the device, I sat down and did a recovery of my email account, using the same
process as the attacker now that I had regained control of my phone. I opened
the “sessions” tab and logged out the other device, an IP I didn’t recognize
from the other side of the country. 42 minutes elapsed from loss of phone
service to account recovery.

Now here’s where things took a scary twist: I go back to my inbox, and as I
watch I see an “Account reset requested” email from the company that provides
2FA services for my preferred bitcoin exchange. Inside the email is a link to
confirm the request and receive a code to download the credentials (they save
them to the cloud?! wtf!?) to a new device. Right below that in the inbox:
“Password reset” from that bitcoin exchange, containing a clicked link (I
presume, the email was read) that takes you to their new-password form that
requires 2FA. The 2FA input box has a link next to it that loads the 2FA app
reset procedure that authenticates by email & SMS and then sends the root seed
information.

As quickly as I could get back home to my secure, non-travel laptop with a
read-only USB boot drive, I logged into my exchange account using the old
credentials, and swept the $30k of coins (thankfully still there) to my cold
storage. The paranoia was because at the time I didn’t know whether it was
malware or what that had leaked enough information for the attacker to
impersonate me. I spent the next few days reinstalling and virus scanning,
resetting all passwords, and redoing my operational security to make sure this
doesn’t happen again. (General advice: draw a flowchart showing what
information you need to have to reset each of your credentials. Make sure it
progresses from most trusted to least trusted, don’t have any loops, and keep
your most trusted root credentials/passwords offline. Dollar-bill sized paper
is excellent. Maybe keep it in an envelope with a $100 bill: unless you are
richer than I am, your instincts will _make_ you secure that.)
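One way to check the flowchart advice above mechanically, sketched in Python: treat each credential as a node and record which other credentials are needed to reset it. A topological sort (Kahn's algorithm) then either yields a recovery order running from the roots outward or reports that some credentials are stuck behind a loop. The graph below is a hypothetical example, loosely modeled on the chain in this story (SMS, then email, then the 2FA app, then the exchange); the function names are illustrative.

```python
from __future__ import annotations

from collections import deque

# Hypothetical example: each key lists what is needed to reset that credential.
EXAMPLE_GRAPH: dict[str, list[str]] = {
    "email": ["phone_sms"],
    "2fa_app": ["email", "phone_sms"],
    "exchange": ["email", "2fa_app"],
}


def _all_nodes(resets_from: dict[str, list[str]]) -> set[str]:
    """Every credential mentioned, either as a target or as a requirement."""
    nodes = set(resets_from)
    for deps in resets_from.values():
        nodes.update(deps)
    return nodes


def roots(resets_from: dict[str, list[str]]) -> list[str]:
    """Credentials that need nothing else to reset: your actual root of trust."""
    return sorted(n for n in _all_nodes(resets_from) if not resets_from.get(n))


def recovery_order(resets_from: dict[str, list[str]]) -> list[str]:
    """Order credentials so each comes after everything needed to reset it.
    Raises ValueError if a loop leaves some credentials unreachable."""
    nodes = _all_nodes(resets_from)
    pending = {n: 0 for n in nodes}  # unmet requirements per credential
    unlocks: dict[str, list[str]] = {n: [] for n in nodes}
    for target, deps in resets_from.items():
        for dep in deps:
            pending[target] += 1
            unlocks[dep].append(target)

    queue = deque(sorted(n for n in nodes if pending[n] == 0))
    order: list[str] = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for target in sorted(unlocks[n]):
            pending[target] -= 1
            if pending[target] == 0:
                queue.append(target)

    if len(order) != len(nodes):
        stuck = sorted(n for n in nodes if pending[n] > 0)
        raise ValueError(f"credentials blocked by a recovery loop: {stuck}")
    return order
```

With the example graph, `recovery_order` returns `['phone_sms', 'email', '2fa_app', 'exchange']` and `roots` returns `['phone_sms']`. Everything hangs off a phone number that a carrier can be talked into porting, which is exactly the weakness the advice is meant to remove by making the root an offline secret instead.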

But here’s what bugged me. Unlike the previous time this happened to me, this
was a hyper-focused, targeted attack. A few seconds later and those coins
would have been _gone_. Unlike last time, the attacker didn’t try to get
into anything else: they went straight to the exchange, and that particular
exchange. Additionally, as this had happened before I had instructed my
telephone company in no uncertain terms to never, ever do a SIM reset without
authorization in person using the best available methods.

Now it comes together, and I am pissed about it. I was able to recover my
phone by answering questions about its usage only I would know… like who I
called, and what country I called from. Only those are listed right on the
phone bill! A form of ID might be required, but seriously what do they check
on that? The name and photo? Faked. The home address? Printed on the bill! And
why did they go straight to the exchange? Because I submitted a screenshot of
a transaction that included exactly how much hard currency I was stupid enough
to leave on it. My email was easily googleable (the perils of having a truly
unique name). Those mechanical Turk contractors who processed my receipts had
access to it all.

So Expensify, you’ve got some answering to do. It’s pretty clear to me now that
I came within seconds of losing a rather large portion of my nest egg and
having to deal with identity theft resolution because you shared confidential
and private information with hourly (minute-ly?) sub-contractors not subject to
review, oversight, or background checks, and certainly not subject to your
commercial and third-party confidentiality agreements. You may try to argue
they were, but any competent judge would reject the proposition that a worker
being paid mere cents to fill out a text field from a picture as quickly as
possible, before moving on to another task from another company, was able to
read and understand your agreement, let alone seek independent legal advice on
it, before clicking through, assuming it was shown to them at all.

“Oh but your company signed our terms of service” you say. Yes, well the
relevant stuff is in the privacy policy, referenced by the terms of service.
Let’s look at that: Section 5, Disclosure of Personal Data. You say you only
allow third parties (and further transfers) to access personal data in
compliance with this same privacy policy, which forbids usage, like identity
theft, not necessary to the performance of services.

Maybe this is what slipped past your legal team: RECEIPTS ARE PERSONAL AND
COMPANY CONFIDENTIAL INFORMATION. That phone bill? Personally identifiable
information. That receipt for hosting expenses? That’s our raw cost, and
company confidential information. Privacy of this information matters, and
even if you tried to indemnify yourself in your policy, which you failed to
do, you still run afoul of user protection laws, even in the company-friendly
USA. There are stiff fines and civil liabilities for not taking sufficient
safeguards to secure protected user data.

You’re damn lucky that I didn’t lose that $30k, or you better believe I’d be
lawyering up. But I sincerely doubt I was the only one targeted either, and
some probably successfully. Can you cover that liability? Are your investors
ready to take that risk? I almost lost $30k. What about the assistant to the
oil company CEO who expenses a coffee down the road from the headquarters of
the competing oil company the rumor mill says they’re going to acquire? What
happens when insider trading props up the price, killing the deal and sending
the company into bankruptcy, and the shareholders file suit over the leak? You
ready to carry that liability?

This is inexcusable. This is reckless. This mismanagement of confidential
company and personal data exposes you to knowable liabilities and reflects
monumentally poor decision making. You’d better take steps NOW to disable this
“feature”, issue apologies, and provide tools for users to know which receipts
were “SmartScanned” by your totally and completely not automated or in-house
service.

And you’d best do that before you start facing lawsuits.

~~~
redhawks2015
Well, there's not really any way someone could execute this attack. Receipts
aren't logically grouped by reports based on someone's identity... They said
it was part of an experiment and only included a small group of free users (if
you're not paying, then YOU'RE the product) plus their employees.

------
chinathrow
That image is racist.

~~~
bipson
If you are referring to the image at the top of the article, I'm afraid it is
not. It is a reference to the origin of the name for "Mechanical Turk" [1].

[1]
[https://en.wikipedia.org/wiki/The_Turk](https://en.wikipedia.org/wiki/The_Turk)

~~~
chinathrow
Sorry, I meant the caption.

> Would you let this guy handle your benefit and business expenses?

