Expensify sent images with personal data to Mechanical Turkers (arstechnica.com)
253 points by yread on Nov 29, 2017 | 109 comments

Having happily used AirBNB in the past, I recently tried to book a stay in Toronto. They wanted a full photo of my passport or driving license to complete the booking. Looking a little deeper, not only did they outsource the processing, but when others had questioned them about it the response seemed a little noncommittal.

With leaks and lapses of judgment occurring daily, I feel like it would be an identity-fraud ticking time bomb to use AirBNB and other services like it, especially given the attitude they seem to attach to security. I know others on HN feel differently.

I took my business elsewhere.

> I know others on HN feel differently.

Remember, a lot of the people on HN work at these firms and therefore defend them. For example, I've noticed that if I say anything negative about Facebook, the up/down votes on my comments swing wildly: starting fairly positive (say +30), then dropping back down to slightly negative (say -2). The voting pattern is certainly intriguing and unusual. Basically, what I'm trying to say is: if you feel something about a particular tech company, HN may not always be the best place to validate that opinion, as bias does exist. Stick with your gut (as you have done)!

A prime example of why we need laws forbidding companies from doing verification using your passport and other documents containing personal information.

Identity verification using passports is not the problem. Identity verification by exclusively looking at pictures of passports is the problem.

The point of a passport is to bind your legal identity (name, place of residence, date of birth) to characteristics of your body (signature, face), so that people can use it to verify that the body standing in front of them is, for legal purposes, the legal entity described in the passport. To check the identity, you compare the picture to the face and the signature on the passport to a signature written while you watch, so you can then be reasonably sure they are the person described in the passport.

What happens nowadays is that people check whether someone can send them a picture of a passport, and if they can, that is considered proof that the entity that sent the picture must be the person described by the passport depicted in it. That's just completely broken logic and should simply not carry any legal weight at all.

It's as if your web browser checked that the server sent a certificate matching the requested host name, but didn't verify any signatures to ensure it is indeed talking to the owner of that certificate and its corresponding secret key.
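The difference between the two checks can be sketched in a few lines. This is a hypothetical illustration, not Airbnb's or any passport's actual protocol: real e-passports use public-key signatures (ICAO "Active Authentication"), and the shared-secret HMAC below only stands in to show the protocol shape, where a static photo upload is a replayable artifact but a fresh challenge is not.

```python
import hashlib
import hmac
import os

def make_challenge() -> bytes:
    # The verifier picks a fresh random nonce so old responses can't be replayed.
    return os.urandom(16)

def sign_challenge(secret_key: bytes, challenge: bytes) -> bytes:
    # The chip answers the nonce with a key that never leaves it.
    # (Stand-in: real passports answer with an asymmetric signature.)
    return hmac.new(secret_key, challenge, hashlib.sha256).digest()

def verify(secret_key: bytes, challenge: bytes, response: bytes) -> bool:
    expected = hmac.new(secret_key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)
```

A photocopied or photographed passport can reproduce everything printed on the page, but it cannot answer a nonce it has never seen, which is exactly the check the picture-upload flow skips.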

> Identity verification using passports is not the problem. Identity verification by exclusively looking at pictures of passports is the problem.

I think another significant part of the problem is the fact that sending someone a picture of your passport means that they, and anyone with whom they choose to share it, thereafter permanently have a picture of your passport. For example, I would have no problem if I were somehow able to audit these companies' use of the passport data and to make sure it was used responsibly and deleted after use. (I would compare to the way that I see everything a customs officer does with my passport, but of course that's not really true, since I have no idea what processing is going on on their computer. However, whatever they've got on the computer comes from access to government databases, to which (hopefully!) private companies don't have access.)

Most modern passports have the ability to sign things with a secret key.

Most phones have NFC to talk to passports.

If only websites said 'sign this text with your passport to prove who you are', that would be excellent security.

Sadly, I think the barrier to this today is the iPhone's lackluster NFC support and Android browsers, which don't let you talk NFC directly (hence requiring a special app).

> Most modern passports have the ability to sign things with a secret key.

Really? How? I have a fairly recent US passport, and the ability to do it is not clear. (Genuinely curious—this sounds like a neat thing to be able to do.)

The NFC chip in the passport (the wire-loop antenna on the page with your photo) has a bunch of commands you can send it. Only passports made in the last 5 years or so have this; old passports are paper only and have no such smartness.

Hook up an NFC debugger app on an android phone and put your passport next to the phone and you can send it commands.

Different countries' passports vary and accept different commands, etc.
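For a concrete taste of those commands: the first step with any e-passport is the standard ISO 7816-4 SELECT of the ICAO eMRTD application, and its byte layout can be built by hand. Actually sending it requires an NFC reader library, and reading any personal data afterwards additionally requires authentication (BAC/PACE) derived from the machine-readable zone, so this sketch only shows the command bytes.

```python
# AID of the ICAO eMRTD (e-passport) application, per ICAO Doc 9303.
EMRTD_AID = bytes.fromhex("A0000002471001")

def select_apdu(aid: bytes) -> bytes:
    # ISO 7816-4 SELECT: CLA=00, INS=A4, P1=04 (select by DF name),
    # P2=0C (no response data requested), Lc = AID length, then the AID.
    return bytes([0x00, 0xA4, 0x04, 0x0C, len(aid)]) + aid

print(select_apdu(EMRTD_AID).hex())  # 00a4040c07a0000002471001
```

An NFC debugger app is essentially a convenient way to send raw APDUs like this one and inspect the status words that come back.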

Except it's not, because it makes a black box the arbiter of legal obligations, and judges get the role of signing off on whatever the black box says.

“Prove you are who you say you are by sending us your bank password. Oh, and we’ll keep it on file indefinitely.”

It is rather easy to Photoshop and scan a passport. All you would have to do is get a random scan of a passport from your desired country, import the desired photo, print it, and scan it a couple of times, and you have a new ID as far as Airbnb is concerned. I highly doubt the outsourced workers checking it would care. There were quite a few driver's license and passport templates on Demonoid a few years back...

It's going the other direction.

We're accumulating laws that (usually implicitly) require companies to do verification using your passport and other documents containing personal information: online versions of the kinds of rules whereby hotels need a photocopy of your passport.

We are tangentially getting privacy initiatives from a different set of regulators/legislators, but... well... they're not doing a great job.

"Know your customer" anti money-laundering laws are growing more stringent.

In the U.S. the Patriot Act made it even more mandatory:


But then how would a company that needs to verify your identity before providing you a service do so? I'm talking about things such as transferwise.com (or even Airbnb; if I were a host, I'd want to know who I'm hosting).

And the best way of verification is obtaining a copy of my passport?

Yup. The surprise request for a passport/ID is exactly what caused me to delete my Airbnb account and cancel the booking that I had already made without any request for a passport/ID.

Airbnb you don’t get to pull this kind of unethical shit.

Airbnb employees not happy with my comment I guess, but it's true. I don't want to have to give you my passport before I can go stay somewhere that I already booked like 6 months ago.

Playing devil's advocate:

Where is the problem? Or, what does a passport or id card contain, which should be kept secret?

Your name? They have that anyway? A passport number? What could be done with it, except verifying that the passport is legit and not stolen? My height? Who cares?

While I really think there are far too many shenanigans that can be done with a (US) Social Security number and personal information, I just don't see the same issue with identification.

I'm happy, of course, to be educated otherwise.

Disclaimer: I did upload a photo of my passport in order to rent an Airbnb in Sapporo.

Either sending someone a picture of a passport is accepted as proof that you are the owner of the depicted passport, or having a picture of a passport is worthless. Choose one, and then explain why sending a picture of a passport to airbnb makes sense and security on their part doesn't matter.

The use of passport is an example of an attempt to do identity proofing. In the practice of identity management there's a concept of "Identity Assurance Levels" where you have varying levels of validation of a person's identity. If you are interested in this, check out NIST Special Publication 800-63A.

AirBNB is trying to use the passport photo as a way to link you to a real-world identity, cheaply. They are in a sticky situation because their bonkers business model will suffer if customers need to go through intrusive processes. Simultaneously, they need to do something to avoid being held negligent when a fake AirBNB host/guest hurts someone. The problem is that it doesn't really provide assurance of anything other than possession of an image of a passport.

The problem is that it's spewing a lot of information that, if handled improperly, carries a high risk of fraudulent use. For example, knowing your citizenship, date, and place of birth makes it trivial to fraudulently obtain your birth certificate. That in turn makes it pretty trivial to do something like obtain a fraudulent driver's license.

There are many ways to do this more effectively and at much lower risk to the customer. For example, you could verify ownership of a bank account with trivial deposits. Or you could mail the customer a token. Or require a notarized document. Or some combination. But the risk to AirBnb of a negative outcome is low, so they push a risk that you may not understand to you.
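The micro-deposit idea mentioned there can be sketched in a few lines. The function names and amounts are made up for illustration; a real system would also cap retries and expire unverified attempts.

```python
import random

def start_verification(rng: random.Random):
    # The service sends two random sub-dollar amounts (in cents)
    # to the claimed bank account; only the real owner can read them back.
    return rng.randint(1, 99), rng.randint(1, 99)

def check(sent, claimed) -> bool:
    # Order-insensitive: the user may report the deposits in either order.
    return sorted(sent) == sorted(claimed)
```

Note what this proves: control of a bank account that a regulated institution already identity-checked, without the verifying service ever holding a copy of your ID.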

I just stumbled over your reply right now (a $ short, a day late) and, well:

Thank you very much.

That was instructive, insightful and taught me a lot; namely what the actual problem is.

Sheesh! Sometimes it's really worthwhile to be a bit prissy, but seriously, I learned something from your reply and I really appreciate it.

Well, given an image of an ID uploaded to service A you could obviously use it as credentials to service B; someone else could reupload your ID in order to rent from AirBNB as you.

Thanks & fair enough.

I really prefer a legitimate answer, like yours, as opposed to be voted down for posting a legitimate question.

>"Or, what does a passport or id card contain, which should be kept secret?"

Well, for one, the passport as a whole can then be used to commit identity fraud or be cloned on the black market.

Combined with some other personal info, it can get you registered for the UK online tax portal, etc.

Most Bitcoin exchanges also require id/passport, I assume these checks are also outsourced. Not exactly comfortable to share my passport with a random stranger.

Ultimately, there are competing requirements here: person verification and privacy. The requirements are partly AirBnB solving its own problems (like Expensify in this case). Partially (substantially), these are "the man's" requirements.

If a photo of your passport or whatnot is private information that must be kept secret, or else identity theft... well then it can't be sent to AirBnB en masse. Those two do not (easily) go together. Certainly not if it's AirBnB, Uber, and twelve other things every year. Mechanical Turking the review is bad, but doing it internally is probably just as bad: lots of people will have access to the docs, and copies will be emailed around...

Over the last few years (and ongoing) there have been persistent attempts to regulate and change business norms so that transactions are de-anonymized, a lot of it under the heading of "AML," even though most of it has nothing to do with money laundering. Validating customers for fraud prevention or other reasons is becoming a general business/regulatory norm, a boilerplate practice. The specific implementations are haphazard and often negligent.

I just had a case where I was locked out of an account because their backend couldn't send 2FA auth code to my phone. In order to unlock my account, they wanted me to provide "scanned copy of official ID", I refused.

The basis of my refusal was: If your backend systems cannot send 2FA to my phone because of some unknown reason, why should I trust it to securely store my ID info?

I told them I would rather remain locked out of the account and simply post the entire conversation on Twitter for all to see. Within 5 minutes she capitulated and suddenly "discovered another way she could unlock it". There are numerous things that are sort of scary about that encounter. (Was my ID actually required? Threats to post on Twitter change security requirements, etc.)

I think we’re on the same page.

Well... this is inevitable if, for internal or external reasons, something on the internet needs to verify your identity. Either (A) they live without this verification, (B) we find some way of verifying, or (C) every Tom, Dick, and Harry has a Dropbox folder full of people's "scanned copy of official ID".

To me, it seems like “verifying people” in this manner is stupid. The whole premise is that there are documents that only I have. By sending the document to you, I show that only I could be me. Using this process widely breaks the premise. I know for a fact that several parties’ employees, contractors and such have scanned copies of my official ID …I sent it to them.

Agree. There is actually a pre-existing, low-tech solution to 90% of these issues: use a notary public to verify identity.

Company sends a document with a unique barcode to the person they are trying to verify. The person takes it to the local UPS store and has a notary public verify that the person signing the document really is "John Q Smith". The person sends back the doc with the notary stamp and recordation number, showing that a licensed person has authenticated them.

This way your actual ID never leaves your possession.

That's how public student loan applications in Ontario work. They send you a document with a unique barcode, you go to the post office and prove your identity there. They scan the document and that sends a message to the loan centre that this person was positively identified.

> If a photo of your passport or whatnot is private information that must be kept secret, or else identity theft... well then it can't be sent to AirBnB en masse.

I think the main issue is that it's not really you, but service providers, who get to decide what information needs to be kept secret.

For instance if most banks decide to exclusively use bits of your fingernails to verify you, your fingernails become something you need to keep secret, your personal position be damned.

And the more agencies that decide to use your fingernail bits, the more valuable they get.

For Airbnb, if the practice propagates enough, your photo with your passport in it will become the new way to fake you.

There is too much fake marketing around companies adopting ML and AI today. We call this the "man behind the curtain" (Wizard of Oz). The rule of thumb is: if it takes minutes or even hours to OCR and extract the data, then it's human labor.

Lots of companies like Expensify, Bills.com, ReceiptHog, and others use MTurk or services like it to extract data from financial documents. The accuracy is still not 100% guaranteed, and the categorization is usually off, since the person categorizing your receipts doesn't have a history of your previous purchases and how they should be categorized in your business. This also means that if you are doing anything with PII, or are a healthcare company, watch out. These companies are NOT HIPAA compliant. They leak PII data. It takes one social hack to steal someone's identity.

How do I know all this? Because we (https://iqboxy.com YC W17) built a 100% automated solution for bookkeeping and expense management. You scan a receipt/bill/invoice and you get results in a few seconds. We have been offered those human-labor services for "automated data extraction" many times, but we believe that's not how this problem should be solved in 2018.

@dbirulia I see you have a "free plan" (the "paid plan" is currently too much for me for expected value, in my country it's ~10 coffees, not 2), but it has "Advanced OCR" grayed out. Does it mean it has no OCR at all (just stores image scans?), or rather some kind of "primitive OCR" (whatever this means)?

Hey @akavel the difference is that the system will not build ML models for your account and will not learn from your edits and categorizations. Give it a try and maybe it’s enough for your case.

> NOT HIPAA compliant

While you're likely correct, there's really no way of telling if they are compliant or not without a published 3rd party audit (ideally several). If a company puts the proper policies and controls in place, and then proves implementation and adherence to a 3rd party, then they are technically compliant (be that HIPAA, PCI, etc.). It is possible to define and implement data access policies for offshore workers. It's extremely hard to prove adherence, but it's possible.

I doubt you could sign a BAA with offshore workers who don't have to comply to such US standards. Furthermore, this space will get shaken up in 2018 in Europe when EU General Data Protection Regulation (GDPR) goes into effect.

Re: 3rd-party audit -- yes, a pen test by a 3rd party and a BAA should be the standard for healthcare companies dealing with service providers. If Expensify has any healthcare companies using their service, those companies are either too small to employ such due diligence, or Expensify is headed toward a disaster, aka Equifax #2.

Either way, tech companies should take privacy more seriously.

Saw that you were super active in this thread, so I googled your username. It looks like you're their competitor and you're acting like you're an unbiased/concerned person. Pretty dishonest - I'm sure it's great, but you should disclose that you're shilling for your company.

Not once did I "shill" for my company here. And yes I am concerned about this and people affected. Should I not be? Calling me dishonest is just poor form mate.

I don't see how a 100% automated solution is going to detect errors. Unless you have humans reviewing and verifying at least a sample of the results you can't train. So do you expect your customers to review everything they scan line-by-line to verify accuracy?

Is this supposed to completely replace your bookkeeper?

Yes, our long term goal is to automate all the bookkeeping duties and when I say automate I mean automate with machines, not outsource :)

But at the current stage - how much stuff is "missed"?

“On November 25, Expensify's founder and CEO, David Barrett, announced a new "feature" the company was working on, called Private SmartScan, in which customers would be offered the option of recruiting their own backup transcription workforce through Mechanical Turk.”

That is a pretty sorry attempt at deflection.

About what you would expect from a company with an $800 million valuation that pays its employees $2/hr.

And I can only imagine shareholders and capital encouraging this deception instead of admonishing it. We must build mechanisms to correct this behavior.

Expensify does pretty horrible categorization. It does a really good job of determining what the amount is, but approximately 70% of the time it manages to screw up the categorization unless it's really, really obvious (ground transportation for Uber). It will even make rookie mistakes like seeing that I classified a WiFi charge with United, and then classifying my next airplane tickets as "WiFi".

But - I've always assumed that Concur and Expensify had large groups of people manually correcting/categorizing receipts, so not too surprised here.

Never mind their "still running until Sept." claim: my wife occasionally does MTurk in front of the TV, and maybe five days ago (Nov 20th) lots of the tasks were Expensify.

Some of the receipts had enough info to identify people - store card numbers, buyer phone number, etc.

Did anyone honestly think it was anything other than humans manually entering data from the submitted images?

> Did anyone honestly think it was anything other than humans manually entering data from the submitted images?

They can have humans entering data manually but have safeguards in place to protect data - usually done by either hiring people in-house or having a contract with a firm that employs a regular set of people to do this work.

Sending the data to Mechanical Turk implies basically none of that safeguarding. Yeah, it's possible to do it, but from the sounds of it - especially given Expensify's appalling response - it's pretty apparent that they didn't.

lol, so this is like the modern version of the actual, original namesake mechanical "Turk" from the 18th century? [1]

"Um, you pretty clearly have some human wearing a costume and playing chess."

'No, no, I insist, it's 100% machine that happens to be good at this task!'

Now it's:

"Um, you pretty clearly must be using humans to interpret the receipts."

'No, no, I insist, it's all done by our proprietary algorithms!'

[1] https://en.wikipedia.org/wiki/The_Turk

Can't wait for the Show HN of a mechanical turk backed chess bot.

That would actually be interesting! Humans are good at certain kinds of computational problems, like approximate Traveling Salesman, so if you had a good algorithm that assumed access to such an oracle, you could indeed power it with Mechanical Turk workers in a nontrivial way.

> Humans are good at certain kinds of computational problems like approximate Traveling Salesman

Are humans any better at it than computers? I wouldn't think that humans have a good enough heuristic that the extra 1000x processing speed isn't more of an advantage.

The modern version is more depressing. Companies let humans do the work, while using the proceeds to develop AI solutions which eventually put those people out of work.

Yes, that's why the system is called Mechanical Turk.

Nope. I remember when MTurk came out; I used the API and tried some of the work units. For the most part it really is moot now for most tasks, except for insanely complex ones or new business processes that haven't been automated yet. This decline in usefulness is due to the fact that recent AI/ML/DL, after a little training, can complete mundane, repetitive tasks faster and with far better accuracy and precision than any human.

I disagree. There are still lots of tasks out there that fit MTurk. Take something like "look at a picture and determine the person's race". Costs five cents for a Turk to do that. You can certainly train an RNN plus a Bayesian network to do this task, but a) there's no public corpus of training data for this so you'll need to generate that anyway and b) this solution still isn't thaaaat accessible to startups that have 1000 other things to worry about. How much time would the engineering team have to spend training, testing, tuning, validating, and deploying the model? If you only ever need to tag 1M images, that $50k on Turk is probably well-spent over the ML solution. If you need to tag 1B images, that's a different story, but if you're looking at that scale you probably already have the resources to do it.
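The break-even arithmetic in that comment works out as follows. The $50k figure for 1M Turk tasks comes from the comment itself; treating $50k as the one-time engineering cost of an in-house model, and its marginal inference cost as near zero, are added assumptions for the sketch (integer cents avoid float rounding):

```python
TURK_COST_CENTS = 5              # $0.05 per Mechanical Turk labeling task
ML_FIXED_COST_CENTS = 5_000_000  # hypothetical $50k one-time engineering cost
ML_MARGINAL_CENTS = 0            # assumed near-zero per-image inference cost

def turk_cost_cents(n_images: int) -> int:
    return n_images * TURK_COST_CENTS

def ml_cost_cents(n_images: int) -> int:
    return ML_FIXED_COST_CENTS + n_images * ML_MARGINAL_CENTS

# At 1M images both paths cost about $50k; at 1B images Turk costs $50M
# while the model's cost stays roughly flat.
print(turk_cost_cents(1_000_000) // 100)      # 50000 (dollars)
print(turk_cost_cents(1_000_000_000) // 100)  # 50000000 (dollars)
```

Under these assumptions, 1M images is roughly the break-even point, which is the comment's argument: below it, Turk wins on total cost and engineering attention.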

There's already a great variety of cheap demographic recognition API's readily available... https://deepai.org/machine-learning-model/demographic-recogn... is just one.

Race is a cultural construct.

Quick, what's Barack Obama's race?

(Hint: your answer will depend on your culture and what you know of his ancestry and upbringing.)

This literally has nothing to do with what OP was saying.

You took one word out of his entire point and decided to make a statement about that.

The problem is not having humans in the mix, it's that access controls were basically zero. Anyone could go to the site and get personal data in moments, and all the entry was being done in insecure environments by unknown random people.

I assumed it was similar to Evernote's OCR...unless...Evernote was/still is also using Mechanical Turk..

I mean, we all knew it was humans doing the "SmartScan" transcription but one would suppose that there's some level of vetting and control in place...

How would you vet that someone is not going to sell your private information on the illicit market?

You have a controlled environment for the workers, who do their work on locked down software and hardware. Basically, employees don't get to bring any personal possessions into the room where they do the work, the software through which they do the work does not allow them to copy or export any data, the OS that hosts the software doesn't allow them to do anything but run the software.

sounds expensive

Well, I guess I was assuming they were tackling that problem or had some way of addressing it. That assumption was based on trust, evidently misplaced.

It does seem like a hard problem to truly solve. The difficulty of doing the crime is so low compared to the difficulty of excluding anyone who might feel slightly inclined to commit it that it seems inevitable that removing humans is the only viable solution at scale going forward.

What "illicit market" could possibly be interested in a pile of random receipts?

Receipts would only have value if all of them are from the same person and can be used to reconstruct that person's activity. And even then it'd be pretty damn tricky to sell that "file", as it still has no exploitable value.

Ever looked at your health insurance bills? Medical records are a gold mine for social engineering. Standard restaurant receipts have everything from payment type, last 4 digits of your credit card, transaction history... great for social engineering. Oh and Uber receipts have your home address too ;-) Whoops.

Hey bank I forgot my password.. no sorry I don't have that.. no I'm not at home I don't have that either.. I can tell you a few of my recent purchases and the amounts though would that do it?

Human error is always the easiest way in

You figure out a way to remove the name / identifying info and just have someone classify the transaction.

That sounds suspiciously like a strategy that could, eventually, lead to automation of the whole process...but it will be costly in the short term so it's better to just turk it until you make it for now!


This is why Crowdflower has the option to only show data to people who have signed a non-disclosure agreement. You don't just show private data to everyone on the Internet...

You're joking, right? I'm sure Equifax used that one too. NDAs in 2017 are like toilet paper. How do you trust that one of your employees won't go rogue? If you deal with a healthcare company in the US, you need to provide a BAA and comply with HIPAA. So how do you handle that?

How does anyone trust their employees won’t go rogue? If they are under NDA and are longtime employees with high ratings then there is usually no reason to think they are trying to “go rogue” by stealing some usually very boring but private data.

HIPAA has additional requirements.

Just like their AI/Bot/Robot/Skynet Concierge service that's actually human powered. See tweet => https://twitter.com/kylelloydsf/status/738789931791699968

"A heads up, our team usually responds within 24 business hours (7am - 6pm)" lol.. I guess machines do dream of electric sheep after all ;-)

I am imagining "self-driving" cars driven by mechanical turks! Big halls with what looks like arcade car games, workers specialized in all aspects of driving, controlling cars on the road at the other side of the planet...

Too crazy?

You might be interested in https://en.wikipedia.org/wiki/Sleep_Dealer

thanks, I'll check it out

AI: fake it till you make it!

It would be better to just kick it back to the user to resolve. I used Expensify and it took like 20 minutes to process a few receipt images... I ended up doing it myself anyway.

Expensify was founded in 2008. A decade of deception.

We will start seeing such news more often. Privacy is a real concern in the AI world; whether by machines or humans, plain sensitive data is being processed in most cases. Usually humans need to access the raw data in order to label it to train or tune the algorithms. The data ends up being accessible, often to cheap labour, as this is a very expensive process in the current state of AI.

For anyone interested in solving this problem in their own business—we take privacy extremely seriously at Scale API (http://www.scaleapi.com) and implement numerous safeguards operationally and technically to ensure this doesn't happen.

> we take privacy extremely seriously...implement numerous safeguards operationally and technically to ensure this doesn't happen

I hate that this is going to come off as snark, but a lot of tech startups make claims about executing "rigorous background checks" (Uber et al.), which time and time again are proven not to hold true at scale.

So what specifically do you do to ensure this? How do I trust your organization?

Just look at the reputation they've built up over the years.

The company is a year old...

Probably forgot the /s

I'll be interested in understanding the technical safeguards as well, since I am racking my brain on how to do this. One of my projects involved doing OCR on some confidential satellite images, and we didn't want to outsource the OCR process solely because we couldn't figure out a scalable way to obfuscate data selectively in images. One early attempt was to recognize numbers using MNIST, get the boundary, and cover that boundary with a similar-sized black square. This doesn't work for all forms of text, though.
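A minimal sketch of that clip-with-a-black-square step, with plain nested lists standing in for an image array (the bounding boxes would come from whatever detector you use, and as the comment notes, anything the detector misses still leaks):

```python
def redact(image, boxes):
    """image: 2D list of pixel values; boxes: (row0, col0, row1, col1),
    half-open, i.e. row1/col1 are excluded."""
    for r0, c0, r1, c1 in boxes:
        for r in range(r0, r1):
            for c in range(c0, c1):
                image[r][c] = 0  # paint the detected region black
    return image

# A 4x6 all-white image with one detected text region blanked out.
img = [[255] * 6 for _ in range(4)]
redact(img, [(1, 2, 3, 5)])  # zeroes rows 1-2, cols 2-4
```

The hard part is not the blanking but the recall of the detector: redaction fails open, so one missed digit string defeats the whole scheme, which is presumably why they abandoned it.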

What text were you reading from a satellite?

Licence plates, credit card numbers, text messages, your page number in Game of Thrones, normal spy satellite stuff.


Seriously though, maybe the text wasn't in the image so much as metadata attached to it? Something like the lat/long and time of the image? I don't really know; it's a good question.

OK, these were regular land images (I think at 10m resolution) which had some text descriptions on them. We deemed the text confidential enough not to outsource. Def not spy satellite stuff :)

Are you still interested in this? I might be able to help.

I should clarify, we didn't use satellites directly, we just used the images given by a third party.

You had text that computers couldn't OCR?

On your homepage: "Scale is the easiest way to build human-powered features. It's 100x better than Mechanical Turk."

What do you mean exactly by "100x better than Mechanical Turk"?

LOL. They must also have a patent on quantifying qualitative BS while badmouthing their competition. So professional too.

Interesting. Can you describe these safeguards in any level of detail? Letting a human see clients' personal info seems fundamental to the OCR transcription process whether it's MTurk or Scale.

I'm also interested in this. Looking at Scale and their jobs page, I don't see that they're hiring any transcribers or people to turk their other services, so I assume the humans are outsourced.

Traditionally, this type of work is outsourced to the Philippines sometimes India and China (depending on language skills).

The "Scale" in your name ScaleAPI seems somewhat disingenuous given the fact that you're company seems to be based purely on using humans to do things.

Or self-host it and never send out any data. (I'm offering a self-hosted solution for OCR)

Curious, how do you meet HIPAA compliance when dealing with PII? And don't tell me you don't have enough PII to be an issue. Think Social engineering.

TL;DR: Expensify’s deceptive mechanical turk army may have resulted in me coming within seconds of losing $30k, and almost certainly leaves them exposed to massive liabilities as they wantonly give away personally identifiable information to low-paid contract workers who are not bound to confidentiality.

Throwaway account for obvious reasons. I have had my identity stolen twice, in both cases with the intent to steal access to accounts I use in connection with my business. Two relevant pieces of information: my employer pays for my phone, and I get reimbursements by submitting the PDF bill via Expensify. My employer is also a bitcoin company, and occasionally I have to buy bitcoin (a few hundred dollars worth) to top up our service’s wallet for paying transaction fees.

When the second incident happened a few months ago I had just come back from a long International trip. I work in the bitcoin industry and know not to keep coins on exchanges, but I also know not to travel through customs with >$10k or with bitcoin private keys on my person. So I had 8btc, about $30k at the time, loaded on my personal account on a certain exchange, in case (1) I needed emergency funds and was locked out of my bank account for ordinary fraud prevention snafu reasons; and/or (2) I needed to top-up company accounts again. It turns out I did need to do the latter.

Immediately on coming home I filed my expense reports for reimbursement. Expensify, as far as I can tell, does not allow a way to opt out of “SmartScan” from the mobile app. In any case I thought this was some harmless excuse for the CTO to have a machine learning project. Little did I know. This report included both my phone bill for the months prior, and the bitcoin transfer, a print-out from the logged-in view of my account on the exchange.

Very shortly thereafter (days?) my phone was remotely SIM-ported to another device. I got extremely lucky in that I was using the phone at the very moment it happened, and saw it go from full bars to “No service” while sitting in my chair. I knew what was going on because this was the 2nd time it had happened to me. I ran to the coffee shop down the street to get wifi, and found I could no longer log in to my email account. Yup, hack in process. The attacker used SMS authentication on my cell phone to reset the password on the email.

Lacking a phone, I rushed to the nearest retail store for my carrier, and after a short conversation I had my phone service back. I had to answer some questions to regain my account, questions like “from which country did you recently make international calls, and to whom?” On getting full bars back on the device, I sat down and did a recovery of my email account, using the same process as the attacker now that I had regained control of my phone. I opened the “sessions” tab and logged out the other device, an IP I didn’t recognize from the other side of the country. 42 minutes elapsed from loss of phone service to account recovery.

Now here’s where things took a scary twist: I go back to my inbox, and as I watch I see an “Account reset requested” email from the company that provides 2FA services for my preferred bitcoin exchange. Inside the email is a link to confirm the request and receive a code to download the credentials (they save them to the cloud?! wtf!?) to a new device. Right below that in the inbox: “Password reset” from that bitcoin exchange, containing a link that had already been clicked (I presume, since the email was read) that takes you to their new-password form that requires 2FA. The 2FA input box has a link next to it that loads the 2FA app reset procedure that authenticates by email & SMS and then sends the root seed information.

As quickly as I could get back home to my secure, non-travel laptop with a read-only USB boot drive, I logged into my exchange account using the old credentials, and swept the $30k of coins (thankfully still there) to my cold storage. The paranoia was because at the time I didn’t know whether it was malware or what that had leaked enough information for the attacker to impersonate me. I spent the next few days reinstalling and virus scanning, resetting all passwords, and redoing my operational security to make sure this doesn’t happen again. (General advice: draw a flowchart showing what information you need to have to reset each of your credentials. Make sure it progresses from most trusted to least trusted, don’t have any loops, and keep your most trusted root credentials/passwords offline. Dollar-bill sized paper is excellent. Maybe an envelope with a $100 bill. Unless you are richer than I am, your instincts will make you secure that.)
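The flowchart advice above amounts to a graph problem: model each credential as a node, draw an edge from A to B if access to A can reset B, and make sure the graph has no cycles (a loop like "SMS resets email, email resets carrier account, carrier account resets SMS" is exactly what the attacker exploited). Here's a minimal sketch of that check; all the account names are hypothetical examples, not a real setup.

```python
# Model credential recovery as a directed graph: resets[a] lists the
# credentials that access to `a` can reset. A cycle means an attacker who
# compromises any node on the loop can walk around it and own everything.

def find_cycle(resets):
    """Return a list of nodes forming a cycle, or None if the graph is acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on current path / done
    color = {node: WHITE for node in resets}

    def visit(node, path):
        color[node] = GRAY
        for nxt in resets.get(node, []):
            if color.get(nxt, WHITE) == GRAY:      # back-edge: found a loop
                return path[path.index(nxt):] + [nxt]
            if color.get(nxt, WHITE) == WHITE:
                cycle = visit(nxt, path + [nxt])
                if cycle:
                    return cycle
        color[node] = BLACK
        return None

    for node in list(resets):
        if color[node] == WHITE:
            cycle = visit(node, [node])
            if cycle:
                return cycle
    return None

# Dangerous setup: email can reset the carrier account (the bill has the
# answers to the security questions), and SMS can reset email -- a loop.
bad = {
    "paper_backup": ["email"],
    "email": ["exchange", "carrier"],
    "carrier": ["sms"],
    "sms": ["email"],
}
print(find_cycle(bad))   # ['email', 'carrier', 'sms', 'email']

# Safer setup: everything flows one way from an offline root credential.
good = {
    "paper_backup": ["email"],
    "email": ["exchange"],
    "hardware_2fa": ["exchange"],
}
print(find_cycle(good))  # None
```

The "most trusted to least trusted" part of the advice is just a topological ordering of this graph, with the offline paper backup as the root that nothing online can reset.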

But here’s what bugged me. Unlike the time before, this was a hyper-focused, targeted attack. Even a few seconds later and those coins would have been gone. Unlike the last time, the attacker didn’t try to get into anything else — just went straight to the exchange, and that particular exchange. Additionally, as this had happened before, I had instructed my telephone company in no uncertain terms to never, ever do a SIM reset without authorization in person using the best available methods.

Now it comes together, and I am pissed about it. I was able to recover my phone by answering questions about its usage only I would know… like who I called, and what country I called from. Only those are listed right on the phone bill! A form of ID might be required, but seriously what do they check on that? The name and photo? Faked. The home address? Printed on the bill! And why did they go straight to the exchange? Because I submitted a screenshot of a transaction that included exactly how much hard currency I was stupid enough to leave on it. My email was easily googable (the perils of having a truly unique name). Those mechanical Turk contractors who processed my receipts had access to it all.

So Expensify, you’ve got some answering to do. It's pretty clear now that I came within seconds of losing a rather large portion of my nest egg and having to deal with identity theft resolution, because you shared confidential and private information with hourly (minute-ly?) sub-contractors not subject to background checks or oversight, and certainly not subject to your commercial and third-party confidentiality agreements. You may try to argue they were, but any competent judge would reject the proposition that a worker being paid mere cents to fill out a text field from a picture as quickly as possible, before moving on to another task from another company, was able to read, understand, and seek independent legal advice on your agreement before clicking through, assuming it was shown to them at all.

“Oh but your company signed our terms of service” you say. Yes, well the relevant stuff is in the privacy policy, referenced by the terms of service. Let’s look at that: Section 5, Disclosure of Personal Data. You say you only allow third parties (and further transfers) to access personal data in compliance with this same privacy policy, which forbids usage, like identity theft, not necessary to performance of services.

Maybe this is what slipped past your legal team: RECEIPTS ARE PERSONAL AND COMPANY CONFIDENTIAL INFORMATION. That phone bill? Personally identifiable information. That receipt for hosting expenses? That’s our raw cost, and company confidential information. Privacy of this information matters, and even if you tried to indemnify yourself in your policy, which you failed to do, you still run afoul of user protection laws, even in the company-friendly USA. There are stiff fines and civil liabilities for not taking sufficient safeguards to secure protected user data.

You're damn lucky that I didn’t lose that $30k, or you'd better believe I’d be lawyering up. But I sincerely doubt I was the only one targeted, and some attacks probably succeeded. Can you cover that liability? Are your investors ready to take that risk? I almost lost $30k. What about the assistant to the oil company CEO who expenses a coffee down the road from the headquarters of the competing oil company the rumor mill says they’re going to acquire? What happens when insider trading props up the price, killing the deal, causing the company to enter bankruptcy and the shareholders to file suit over the leak? You ready to carry that liability?

This is inexcusable. This is reckless. This mismanagement of confidential company and personal data exposes you to knowable liabilities and reflects monumentally poor decision making. You’d better take steps NOW to disable this “feature”, issue apologies, and provide tools for users to know which receipts were “SmartScanned” by your totally and completely not automated or in-house service.

And you’d best do that before you start facing lawsuits.

Well, there's not really any way someone could execute this attack. Receipts aren't logically grouped by reports based on someone's identity... They said it was part of an experiment and only included a small group of free users (if you're not paying, then YOU'RE the product) plus their employees.

That image is racist.

If you are referring to the image at the top of the article, I'm afraid it is not. It is a reference to the origin of the name for "Mechanical Turk" [1].

[1] https://en.wikipedia.org/wiki/The_Turk

Sorry, I meant the caption.

> Would you let this guy handle your benefit and business expenses?

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact