Stripe cofounder here. The question raised ("Is Stripe collecting this data for advertising?") can be readily answered in the negative. This data has never been, would never be, and will never be sold/rented/etc. to advertisers.
Stripe.js collects this data only for fraud prevention -- it helps us detect bots who try to defraud businesses that use Stripe. (CAPTCHAs use similar techniques but result in more UI friction.) Stripe.js is part of the ML stack that helps us stop literally millions of fraudulent payments per day and techniques like this help us block fraud more effectively than almost anything else on the market. Businesses that use Stripe would lose a lot more money if it didn't exist. We see this directly: some businesses don't use Stripe.js and they are often suddenly and unpleasantly surprised when attacked by sophisticated fraud rings.
If you don't want to use Stripe.js, you definitely don't have to (or you can include it only on a minimal checkout page) -- it just depends how much PCI burden and fraud risk you'd like to take on.
We will immediately clarify the ToS language that makes this ambiguous. We'll also put up a clearer page about Stripe.js's fraud prevention.
(Updated to add: further down in this thread, fillskills writes[1]: "As someone who saw this first hand, Stripe’s fraud detection really works. Fraudulent transactions went down from ~2% to under 0.5% on hundreds of thousands of transactions per month. And it very likely saved our business at a very critical phase." This is what we're aiming for (and up against) with Stripe Radar and Stripe.js, and why we work on these technologies.)
Stripe customer here. The question raised is, more broadly, "Is Stripe collecting this data in a legal and ethical way?" This too can be readily answered in the negative.
It doesn't matter whether "Stripe.js collects this data only for fraud prevention" or if it works in practice. Under CalOPPA [1], Stripe still has to disclose the collection of the data, and (among other things) allow customers to opt out of collection of this data, and allow customers to inspect the data collected. Stripe's privacy policy refers to opt-out and inspection rights about certain data, but AFAICT not this.
Based on a plain reading of the law, several things about CalOPPA stand out to me. For one, it's not clear to me that the mouse movements in question qualify as "personally identifiable information". Mouse movements are not a first or last name, physical or email address, SSN, telephone number, or any contact method I am familiar with (maybe you know a way?).
Second, it seems to me that opt-out, right to inspect and update, and more are all contingent upon the data being PII within the scope of CalOPPA. Perhaps you can help me with something I've overlooked that would show me where I've erred?
Further, what do you think the correct legal and ethical way for Stripe to use mouse movement data would be? From your comment I can guess that you believe it should be treated as PII. Is that correct?
> Mouse movements are not a first or last name, physical or email address, [or one of a dozen other obvious examples]
You misunderstand what personally identifiable information is. Each individual letter of my name is not identifiable on its own, and the letters of the alphabet are not PII, but when stored in the same database row, the separate letters do form PII, regardless of whether you stored them separately or even hashed or encrypted them. My phone number is also not something that just anyone could trace to my name, but since my carrier stores my personal data together with the number (not to mention the CIOT database, where law enforcement can look it up at will), there exists a way to link the number to my person, making it PII. Everything about me is PII, unless you make it no longer about me.
Mouse movements may not be PII if you don't link them to a session ID, but then they would be useless in fraud detection because you don't know whose transaction you should be blocking or allowing, since they're no longer traceable to a person.
Another example[1] mentioned on a website that the Dutch DPA links to (from [2]) is location data. Coordinates that point to somewhere in a forest aren't personal data, but if you store them with a user ID...
> You misunderstand what personally identifiable information is.
Not to belabor a point discussed elsewhere, but those were not arbitrarily chosen types of PII. They are how PII is defined in the specific law that was cited - CalOPPA. The comment to which I responded contains a link. The text of the law contains its definition of PII.
Please accept my apologies. I can see I failed to communicate clearly and readers interpreted my statements as broad comment about what is or isn't PII across a union of all potentially relevant laws and jurisdictions. This was in no way, shape, form, or manner my intended meaning. Again, please accept my apologies for failing to be clear.
> Mouse movements may not be PII if you don't link it to a session ID, but then it would be useless in fraud detection because you don't know whose transaction you should be blocking or allowing since it's no longer traceable to a person.
Maybe it's just me, but I was under the distinct impression that some patterns of input are characteristic of humans and others of inhuman actors. Is it possible that a user could be identifiable as human or inhuman without having to know which specific human an input pattern corresponds to? Have I misunderstood something?
> [could one distinguish] human or inhuman without having to know which specific human an input pattern corresponds to?
You can't rely on the client asking the server anonymously and adhering to the response. If you want to avoid a connection to a "specific human", it would go like this:
Fraudulent client: POST /are/these/mouse_movements/human HTTP/1.0 \r\n Content-Type: application/json \r\n [{"x":13,"y":148},...]
Server: that's a robot
Fraudulent client: discards server response and submits transaction anyway
To make sure the server knows to block the transaction, it has to tie the mouse movements to the transaction, and thereby to a credit card number (afaik Stripe does only credit cards as payment option), at least during the processing of the submission before discarding the mouse movement data.
I'm not arguing this is evil or mistrusting Stripe or anything, just that this is considered PII in my part of the world.
Huh? Client sends data to bot-detection server, server sends back a signed response with a nonce and an expiration date saying "Yep, this is a human". Server stores the nonce to prevent replays. Client attaches the signed validation when submitting the transaction. The server that receives that verifies the signature and expiration date, then checks and invalidates the nonce. No association between the transaction and the mouse data necessary.
I don't know if that's how Stripe is doing it, but you could do it that way.
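For what it's worth, the signed-nonce scheme described above fits in a few lines. This is a toy sketch, not Stripe's implementation: the signing key, the TTL, and the in-memory nonce set are all invented for illustration (a real deployment would share the nonce store across servers).

```python
import hashlib
import hmac
import json
import secrets
import time

SECRET = b"bot-detector-signing-key"  # hypothetical shared signing key
seen_nonces = set()                   # replay protection (in-memory for the sketch)

def issue_token(looks_human, ttl=300):
    """Bot-detection side: sign a short-lived 'this is a human' attestation."""
    if not looks_human:
        return None
    payload = json.dumps({"nonce": secrets.token_hex(16),
                          "expires": int(time.time()) + ttl})
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return payload + "." + sig

def verify_token(token):
    """Transaction side: check signature and expiry, then burn the nonce."""
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(payload)
    if claims["expires"] < time.time() or claims["nonce"] in seen_nonces:
        return False
    seen_nonces.add(claims["nonce"])  # invalidate: one transaction per attestation
    return True
```

In this sketch the transaction path only ever sees the token, so nothing on that path links the mouse data to the card number (modulo the caveats raised elsewhere in the thread about what the bot-detection server itself retains).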
1. Client sends mouse-data + card info to a server, server checks the mouse data, turns it into a fraudPercent, and only stores that percent. That seems to be what they're doing now.
2. Client sends mouse data, gets back a unique nonce, and then sends that nonce to the server with card info. The server could have either stored or discarded the mouse info. It's perfectly possible the nonce was stored with the mouse info.
Those two things seem totally identical. The nonce by necessity must be unique (or else one person could wiggle their mouse, and then use that one nonce to try 1000 cards at once), and you can't know that they don't store the full mouse movement info with the nonce.
You gain nothing by adding that extra step other than some illusion of security.
Note, cloudflare + tor has a similar problem that they tried to solve with blind signatures (see https://blog.cloudflare.com/the-trouble-with-tor/), but that hasn't gone anywhere and requires a browser plugin anyway. It's not a viable solution yet.
If you're going to go as far as "it's perfectly possible that the nonce was stored with the mouse info", then your example following:
> If you want to avoid a connection to a "specific human", it would go like this:
doesn't work either. It's perfectly possible that the server stored that info with the IP address and session information, since it also has access to those, and that could then be connected up with the transaction. I don't understand at this point what standard you're trying to meet, because it sounds like by what you're saying, literally any data sent to a server is "PII" if at some point that server also can, in principle, know your name.
I don't think it's PII. My point is just that your scheme of signed tokens doesn't avoid an association. There isn't a way to.
And that's fine because it's not PII and it's the only way to implement this (in my mind). What you're proposing is just shuffling around deck chairs, not actually sinking the ship.
Oh, I mistook you for the previous commenter. Yeah, I agree that what I proposed doesn't really buy you anything unless you for some reason need the mouse data not to touch the server that's processing the transaction, which seemed to be what they were saying was required. There are multiple layers to why what they're saying doesn't make sense.
> You can't rely on the client asking the server anonymously and adhering to the response. If you want to avoid a connection to a "specific human", it would go like this:
I'm afraid I don't understand. Maybe you can help me? Seems to me you could not store things, you could require a signed and expiring token from the /are/these/mouse_movements/human service, or you could treat the request as super risky without that signed token. I'm sure there are others, I am known to suffer failures of imagination at times.
> To make sure the server knows to block the transaction, it has to tie the mouse movements to the transaction, and thereby to a credit card number (afaik Stripe does only credit cards as payment option), at least during the processing of the submission before discarding the mouse movement data.
I'm clearly wrong, but doesn't the logic here only work if the mouse movements are identifiable in the same sort of way that a phone number is? What happens if that's not accurate and mouse movements from a session are not so personally identifiable? What have I failed to understand? Wouldn't this logic also make transaction timestamps PII?
You keep using that ridiculously apologetic tone that really rubs me the wrong way while making constructive remarks. If you could lose the former without the latter, I might actually appreciate your replies. But then, I'm reasonably sure that it's meant to annoy.
> Seems to me you could not store things, you could require a signed and expiring token
You didn't read the law I was talking about that was specifically and clearly linked in the initial comment to which I responded. The comment in question made a specific claim about a specific law in a specific jurisdiction to which I responded narrowly and specifically. My comment referred clearly to the law in question and summarized points from it.
All points about other laws in other locations are irrelevant to the specific points I was offering discussion of.
> That's actually a good idea.
It is... provided that a handful of mouse movements actually qualify as PII. Which, as claimed here under CalOPPA, seems like it might be doubtful. As others have pointed out, there's room to doubt that a few mouse movements would be considered PII under any current regulatory regime (there are multiple notable ones, they don't agree on all points).
As an approach, it's useful for things like SAML and OAuth protocols when you're dealing with different systems controlled by different parties and need to delegate trust through an untrusted party. It's rarely the best way to move data around inside a system, though, unless you have some compelling reason to introduce this level of blinding.
Your feigned "maybe you can help me?" reads more like sealioning than like a genuine lack of understanding.
However, sure, I'll humour you. A "signed and expiring token" is not sufficient because then a single attacker could use that token to try 1000s of cards before it expires.
Thus, you need a unique token, and wherever you store that unique token (to invalidate it, akin to a database session), you can optionally store the mouse movements or not. The association still exists. A unique token isn't functionally different from just sending the data along in the first place.
Really, you read that as being patient? To me it seems to be an obvious attempt to rub the person they're replying to entirely the wrong way while feigning ignorance.
I would flag it as attempting to trigger others if each reply did not also contain one or two constructive sentences.
> with people who don't seem to have a good understanding of the law
"People" had a fine understanding of applicable PII law, but the person clarified (in between a bunch of bullshit about how godforsaken sorry they are) that they were talking about some USA thing specifically and not the broader definition.
> To make sure the server knows to block the transaction, it has to tie the mouse movements to the transaction, and thereby to a credit card number (afaik Stripe does only credit cards as payment option), at least during the processing of the submission before discarding the mouse movement data.
Which is absolutely fine by the law if it isn't stored tied to PII.
GDPR doesn't apply only to storage, though? Maybe I'm confusing it with the previous data protection directive but I'm pretty sure the new GDPR also defines PII processing to include things like transmitting and operating on it.
But if there is some source (e.g. case law, data protection authority) that confirms that you can process two pieces of data and keep one as non-PII if you promise not to connect them in storage or forward them to another place in an identifiable manner, that would be interesting.
> But if there is some source (e.g. case law, data protection authority) that confirms that you can process two pieces of data and keep one as non-PII if you promise not to connect them in storage or forward them to another place in an identifiable manner, that would be interesting.
It would be impossible to follow the GDPR otherwise, all data would implicitly be PII, since all data is associated with an IP address and GDPR defines IP as PII.
> GDPR doesn't apply only to storage, though?
This doesn't matter, because you can always collect data for business critical purposes, which fraud protection reasonably is.
But you are giving Stripe PII when you buy something directly; at that point the mouse movement is nothing. And if you don't buy something, the mouse movement is not PII.
> But you are giving Stripe PII when you buy something directly; at that point the mouse movement is nothing
1) but that's not how the law works
2) law aside, I'm also not sure it holds up ethically to say "you're giving them <some info necessary to fulfill your payment>, what's wrong with giving them <unnecessary other data>". Now, if you say "but it's not unnecessary, it's for anti-fraud!" then sure, that's a different argument: then the argument is not that you might as well give it because of something else you gave, but that it's necessary for fighting fraud. They could still do the courtesy of telling users before tracking them (which might bring us back to the legal argument, which tells us that it is indeed necessary to do so).
> Mouse movements may not be PII if you don't link them to a session ID, but then they would be useless in fraud detection because you don't know whose transaction you should be blocking or allowing, since they're no longer traceable to a person.
Surely the point of mouse movement detection for anti-fraud is more "did the mouse move in an exact straight line to the exact center of an element and therefore isn't a human" or "the last 3 orders on this site used exactly the same pattern of mouse movements therefore is a recording" rather than some sort of "gait detection" to tell who someone is.
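As a toy illustration of the first of those signals (nothing Stripe is known to use; real systems presumably combine many richer features), a perfectly straight path is trivial to flag without knowing anything about who moved the mouse:

```python
def looks_scripted(points, tolerance=1.0):
    """Flag a path whose samples all lie on the line from start to end --
    characteristic of synthesized mouse movement, since human motion
    has jitter and curvature. `points` is a list of (x, y) samples."""
    if len(points) < 3:
        return True  # too few samples to look organic
    (x0, y0), (x1, y1) = points[0], points[-1]
    for x, y in points[1:-1]:
        # Perpendicular distance from the line through the endpoints
        num = abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0)
        den = ((y1 - y0) ** 2 + (x1 - x0) ** 2) ** 0.5 or 1.0
        if num / den > tolerance:
            return False  # deviates from the line: plausibly human
    return True
```

Note the function never identifies anyone; it classifies the path itself, which is the point being made here.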
The purpose of processing the individual mouse positions over time may be exactly that, but I'm not sure that the intent matters. For example, a website asking for my social security number for the sole purpose of verifying whether it matches the checksum (Dutch SSNs contain a checksum) would still be processing my SSN, no? I'd be interested if I'm wrong, though.
> Each individual letter of my name is also not identifiable, the letters of the alphabet are not PII, but when stored in in the same database row, the separate letters do form PII no matter that you stored them separately or even hashed or encrypted them.
This is a correct statement, but its implied suggestion that Stripe is doing this is incorrect. There are lots of ways around this: not storing specific keys and hashing input would be my initial approaches.
My guess is Stripe is more concerned with the action patterns than the specific keys that are being pressed.
> Mouse movements may not be PII if you don't link it to a session ID, but then it would be useless in fraud detection because you don't know whose transaction you should be blocking or allowing since it's no longer traceable to a person.
This is an opinion and not a fact.
I don't need to know the identity of the guy wearing a balaclava and carrying a pillow case to know if that guy is in a bank and reaching into his jacket pocket, there's a high likelihood he's robbing the place.
When he shows up at the next place to rob, I don't have to have any PII on him to identify him as a robber. Might not be the same robber at both banks, but they both exhibit similar patterns. If they both limp or talk with a slur, I can reasonably connect the two without knowing the underlying identity.
Yeah? It is clearly personally identifiable. In fact it is psychologically identifiable when Stripe can associate it with your name, credit card, IP address, time of the purchase, the vendor, type of the item, how you got to the store, the items you are paying for, how much time you spent on the item or the store, which links you clicked, the browser you are using, the device you are on, your location, etc. Do you want me to list all the possibilities they are recording? You are out of touch with reality here.
Counter-point: if your business is selling digital or physical product into New Zealand the NZ tax department requires you to collect two different types of data for the transaction that prove the customer is located in NZ. This can include IP address, phone number, address.
So, in some instances, Stripe is legally required to collect some of this data.
How does an IP address (in a world with VPN) or a phone number (which is likely to be mobile and could be located anywhere in the world) "prove that the customer is located in NZ"?
You mean on a touchscreen device, or because of a physical disability? The latter case seems exceptional enough that I'm not sure how it would legally work (do you have to think of all possible edge cases? What if someone uses dictation because they can't type: does that mean you'd potentially capture social security numbers if you used the microphone for gunshot detection and processed the sound server-side?). In the former case, I'm pretty sure taps on a keyboard are not registered as mouse movements in JavaScript.
> or because of a physical disability? Because the latter case seems exceptional enough that I'm not sure how that would legally work
There have been a number of accessibility-based lawsuits recently. Generally speaking, yes, you absolutely have to allow for them to use an alternative system without locking them out.
Because if your particular methodology breaks things for a group of people that way, all kinds of discrimination laws become a hammer that someone can toss your way.
> allow for them to use an alternative system without locking them out
That's not what I'm arguing against, though. I was not saying: forbid screen readers. I said:
> do you have to think of all possible edge cases? What if someone uses dictation because they can't type, does that mean you'd potentially capture social security numbers if you use the microphone for gunshot detection and process the sound server-side?
They are a minority, so it's likely easy to account for them with things like learning their IP and transaction history to mark them with a certain degree of trustability; on the other hand, tracking mouse movements and other techniques are essential for users you have no record of (new IP, new user, new CC, etc.).
Mouse movements will contain personally identifiable information if the user has any kind of writing-to-text system turned on. You definitely can't rule it out. (Not a lawyer.) I think what Stripe is doing is illegal.
> first or last name, physical or email address, SSN, telephone number, or any contact method I am familiar with (maybe you know a way?)
What about a face? Fingerprints? Voice? Aren't those identifiable information even though it didn't make your (common sensical) short list? Mouse movements are on the same order of specificity.
It's less my short list and more the one in the text of the law being cited. Other things, such as finger-, voice-, and face-prints were probably not contemplated by lawmakers in 2003 and thus go unmentioned. They may fall under the "maintains in personally identifiable form in combination with an identifier" clause, though.
Of course, that also provides an easy way to comply. Don't store mouse movements in a way that ties them to PII under CalOPPA, and you don't meet any criteria.
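One way to sketch that compliance approach: reduce the raw trace to a single score in memory and persist only the score, so the stored record never contains the movements themselves. The scoring feature here (uniformity of step sizes) is purely illustrative, not a claim about what Stripe computes.

```python
import statistics

def fraud_score(movements):
    """Collapse raw (x, y) samples into one risk score in [0, 1]."""
    if len(movements) < 2:
        return 1.0  # no organic movement at all: maximum suspicion
    # Distances between consecutive samples
    steps = [((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
             for (x1, y1), (x2, y2) in zip(movements, movements[1:])]
    # Perfectly uniform steps look machine-generated; jitter lowers the score
    variability = statistics.pstdev(steps)
    return 1.0 if variability == 0 else min(1.0, 1.0 / (1.0 + variability))

def record_transaction(txn_id, movements, store):
    """Persist only the derived score; the raw trace never leaves this call."""
    store[txn_id] = fraud_score(movements)
```

Whether a regulator would accept "we only retained the aggregate" is exactly the legal question debated above; the sketch just shows that the engineering side is straightforward.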
That's definitely a question of implementation, policy, compliance, and liability. You are absolutely correct.
The law in question also requires data to be maintained in personally identifiable form. I am uncertain if a small number of mouse movements is likely to reach this. I do not see how, but that's not a reason why it cannot be so.
Not a lawyer, but not that surprised that the laws you refer to are growing technical loopholes. Here are a couple things that mouse movements can identify in case no one knows what I'm talking about:
Thank you for bringing hard research to this discussion!
I find it interesting that the one that contemplates authentication requires supervised machine learning and goes on to explicitly state that "analyzing mouse movements alone is not sufficient for a stand-alone user re-authentication system". Taken together, this suggests that a sizable corpus of mouse movement data known to be associated with one user may qualify as PII under some definitions.
Again, thank you for sharing this timely information.
This is how mouse movements can lead to privacy violations: mouse movements as such don't contain PII like name, zip code, or gender. But when mouse movements are run through a machine learning algorithm, they can not only help identify the person (mouse dynamics are behavioral factors that you can map across different sites; by mapping across sites, you learn that basically the same person is surfing these three sites, which is valuable information for the advertising world, as one example), but they can also be analyzed to identify health issues. Now you take this information and link it to other publicly available databases to identify the person! So, overall, if Stripe doesn't sell this data or analyze it for other patterns like identity or health issues, it's fine... but guaranteeing that is hard.
At Unknot.id, we learn similar patterns to detect fraud, but using smartphones. We make sure that only the needed result (fraud or not) can be derived, and not health or other privacy-related attributes.
https://stripe.com/privacy describes what we do in some detail (including disclosing that we use this kind of browsing data).
More broadly, I assure you that Stripe.js and our fraud prevention technologies are very carefully designed with full compliance with the relevant California (and other) statutes in mind. I’d be happy to connect you with our legal team if you’d like to discuss this in more detail. (I'm patrick@stripe.com.)
For your European customers you should likely make it clearer what stripe.js does before urging them to install it on every page of their website. Using it as soon as a user has a probable interest in purchasing a product (e.g. when he/she clicks on "Register" and chooses a plan) would very likely be acceptable as a legitimate interest under the GDPR. Tracking all users, even those who have no clear intent of purchasing something and e.g. only want information about the product, will definitely not be acceptable as a legitimate interest and would therefore require clear consent first.
Oh, offer was made in case GP wants to have a deeper discussion/back-and-forth than is readily achievable with an online forum. Timing constraints notwithstanding, we work hard to answer questions on HN too.
From a legal perspective, isn't the burden of communicating privacy to the customer on the website/content provider, not Stripe?
Stripe.js is an API -- developers use this API to build something used by their customers. The customer is the one whose data is being collected, and the developers are the ones facilitating that collection via their service. The fact that it got sent to Stripe is not really relevant to who bears responsibility for clarifying data rights to the customer.
It’s specifically different in this case: a big part of Stripe's value to a web vendor is that Stripe can collect credit-card info directly from the buyer (thereby exempting the vendor from PCI compliance and other issues related to storing and processing CCs).
"The simplest way for you to be PCI compliant is to never see (or have access to) card data at all. Stripe makes this easy for you as we can do the heavy lifting to protect your customers’ card information." [1]
Interesting question whether Stripe incurs statutory privacy duties to the web vendor and the buyer separately. I would imagine so, because given the "triangular" nature of this kind of Stripe transaction, Stripe ends up collecting data from two parties.
The data is collected by Stripe, though. The content provider doesn't have access to the mouse movement data, and might not be even aware of that the data is collected.
All this existing legislation is about protecting personal data, not any data.
If what is collected is not linked to an individual and does not allow an individual to be identified, then it is not personal data and the legal point is moot.
You mentioned legal and ethical but only addressed the legal side. Of course legality and ethicality are not the same, can you also address the ethics side?
If Stripe are indeed tracking mouse movements to detect bot traffic (which is plausible) then that seems broadly ethical and reasonable from an ethical perspective.
In the same way that if the government is tracking our private conversations to detect human traffickers then it's broadly ethical and reasonable from an ethical perspective?
Can you explain how the privacy violation of tracking mouse movements on a subset of online markets is similar in scope or substance to tracking all conversation?
I see a number of obvious differences: I can opt out of purchasing from stripe-managed sites, while I cannot opt out of dragnet government monitoring. I can imagine less invasive ways of stopping human trafficking, while I cannot for fraud prevention, etc.
It still seems unethical if the collected data is stored, whether raw or in processed form. The same technology can be used to uniquely identify users across devices.
> The question raised is, more broadly, "Is Stripe collecting this data in a legal and ethical way?" This too can be readily answered in the negative.
I think the real issue is that in the U.S. most people's views on things like basic analytics software aren't shaped by reading books on digital privacy or taking classes on privacy law, but rather by some vague cultural memory of the holocaust. It makes it very difficult to have a rational discussion with people.
In my opinion, there's no moral issue with doing this. Fighting fraud and other kinds of cybercrime is an endless cat-and-mouse game. Although there are very bad associations with it, one simply does need to use fingerprinting and supercookies/"zombie cookies"/"evercookies" if they want even a fighting chance.
I think if it's being solely used for such security purposes, isn't shared with or sold to anyone else, and is carefully safeguarded, then it's okay. The main risk I see from it is mission creep leading to it eventually being used for other purposes, like advertising or tracking for "market research" reasons. I don't personally think it's likely Stripe would do this, though.
> I think if it's being solely used for such security purposes, isn't shared with or sold to anyone else, and is carefully safeguarded, then it's okay. The main risk I see from it is mission creep leading to it eventually being used for other purposes, like advertising or tracking for "market research" reasons. I don't personally think it's likely Stripe would do this, though.
Is this view conditional on the type of data Stripe is currently collecting or would it apply to any data Stripe collects? Would this be true if Stripe began recording every keystroke in the app and hooked every XHR request to my backend server and sent copies to Stripe?
I agree that Stripe has a sensible reason for using this data. If I started seeing a high rate of chargebacks, I'd consider enabling Stripe on more parts of my site so that Stripe could consume user behavior earlier on to detect fraud.
My issue is that if there's no agreement about what data Stripe is allowed to collect and what their retention policies are, then the implicit agreement is that Stripe can just collect anything it has access to and hold it forever.
As JavaScript running on my page, Stripe.js has access to basically all user data in my app. There are certain types of user data I would not be comfortable sharing with Stripe, even if it improved fraud detection, so I'd like there to be clear limits on what they're gathering.
Yes, I would say it's conditional. They should be more explicit about what data they're collecting from users. Opaque enough to not reveal all of the exact techniques, but clear enough so site owners can make an informed decision. (Someone dedicated and experienced enough could probably reverse engineer Stripe.js and figure out everything it's doing if they really wanted, but they're also probably updating it regularly.)
There's no need to rely on "security by obscurity". Stripe.js is just a thin client-side library everybody can analyse, so they might as well be fully transparent when it comes to data collection. I don't see the point of trying to obfuscate things, especially since the actual fraud-detection model works on the backend anyway.
With these kinds of adversarial things, I think it's a mix of frontend and backend.
It's a library everyone can technically analyze, yes, but by 1) using ever-changing obfuscation that requires a lot of work to RE, and 2) constantly changing the client-side logic itself, it makes the work of the adversaries a lot harder and more tedious, and means either fewer of them will consistently succeed, or more of them will be forced to become more centralized around solutions/services that've successfully solved it, which means Stripe can focus-fire their efforts a bit more.
Of course there's also a lot going on on the backend that'll never be seen, but the adversary is trying to mimic a legitimate user as much as they can, so if the JavaScript is totally unobfuscated and stays the same for a while, it's a lot easier for them to consistently trace exactly what data is being sent and compare it against what their system or altered browser is sending.
It's cat-and-mouse across many dimensions. In such adversarial games, obscurity actually can and often does add some security. "Security by obscurity is no security at all" isn't exactly a fallacy, but it is a fallacy to apply it universally and with a very liberal definition of "security". It's generally meant for things that are more formal or provable, like an encryption or hashing algorithm or other cryptography. It's still totally reasonable to use obscurity as a minor practical measure. I'd agree with this part of https://en.wikipedia.org/wiki/Security_through_obscurity: "Knowledge of how the system is built differs from concealment and camouflage. The efficacy of obscurity in operations security depends on whether the obscurity lives on top of other good security practices, or if it is being used alone. When used as an independent layer, obscurity is considered a valid security tool."
For example, configuring your web server to not display its version on headers or pages is "security by obscurity", and certainly will not save you if you're running a vulnerable version, but may buy you some time if a 0-day comes out for your version and people search Shodan for the vulnerable version numbers - your site won't appear in the list. These kinds of obscurity measures of course never guarantee security and should be the very last line of defense in front of true security measures, but they can still potentially help you a little.
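To make that concrete (assuming nginx; other servers have equivalent settings), hiding the version banner is a single directive:

```nginx
# Hide the nginx version number in the Server header and on error pages.
# This fixes nothing by itself; it only keeps the version string out of
# scans like Shodan searches for known-vulnerable releases.
server_tokens off;
```

It is a pure obscurity measure, which is exactly why it belongs behind, not instead of, actual patching.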
In the "malware vs. anti-virus" and "game cheat vs. game cheat detection software" fights that play out every day, both sides of each heavily obfuscate their code and the actions they perform. No, this never ensures it won't be fully reverse engineered. And the developers all know that. Given enough time and dedication, it'll eventually happen. But it requires more time and effort, and each time it's altered, it requires a re-investment of that time and effort.
Obfuscation and obscurity is arguably the defining feature and "value proposition" of each of those four types of software. A lot of that remains totally hidden on the backend (e.g. a botnet C2 web server only responding with malware binaries if they analyze the connection and believe it really is a regular infected computer and not a security researcher or sandbox), but a lot is also present in the client.
Thanks for a thoughtful reply (upvoted), but have you looked at the library in question? The code is minified, but there is not much obfuscation going on: https://js.stripe.com/v3/
Most of your examples are quite low-level, but it's much harder to keep things hidden within the constraints of the browser sandbox when you have to interface with standard APIs which can be easily instrumented.
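That instrumentation really is easy; here's a hedged sketch of the wrapping pattern (in a real page you'd apply it to `window.fetch` or `XMLHttpRequest.prototype.send`; the stand-in function and payload below are made up for illustration):

```javascript
// Generic wrapper: record every call's arguments, then forward to the
// real implementation. Applied to fetch/XHR in a browser, this reveals
// exactly what a third-party script transmits, obfuscated or not.
function instrument(fn, log) {
  return function (...args) {
    log.push(args);              // capture what was about to be sent
    return fn.apply(this, args); // forward unchanged
  };
}

// Stand-in for window.fetch, purely for illustration:
const sent = [];
let send = (url, opts) => 'ok';
send = instrument(send, sent);
send('https://m.stripe.com/6', { body: '{"muid":"..."}' });
// `sent` now holds the URL and payload the script tried to transmit.
```

Because the browser sandbox forces everything through these standard APIs, obfuscating the library's internals doesn't hide what actually goes over the wire.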
Yeah, theirs is far less obfuscated than most fraud/bot detection libraries I've seen. I believe almost all of the JS code I've seen from companies that primarily do fraud detection and web security is pretty heavily obfuscated. Here, it looks like Stripe.js is doing much more than just the fraud stuff - this is their client library for everything, including payment handling.
I haven't analyzed it and can't say this with any certainty, but my guess is that you're probably right: they're focusing primarily on backend analysis and ML comparing activity across a massive array of customers. This is different from smaller security firms who have a lot less data due to fewer customers, and a kind of sampling bias of customers who are particularly worried about or inundated by fraud.
They may be less interested in suspicious activity or fingerprinting at the device level and more interested in it at the payment and personal information level (which is suggested by articles like https://stripe.com/radar/guide).
Pure, uninformed speculation, but it's possible that if they get deeper into anti-fraud in the future (perhaps if fraudsters get smarter about this higher layer of evasion), they might supplement the data science / finance / payment oriented stuff with more lower-level device and browser analysis, in which case I wouldn't be surprised if they eventually separate out some of the anti-fraud/security parts into an obfuscated portion. (Or, more likely, have Stripe.js load that portion dynamically. Maybe they're already doing this, even? Dunno.)
I’ve implemented my own Stripe checkout for a native application in just a couple of hours, using their REST API. There’s nothing stopping everyone else from doing the same: it’s literally how you used to integrate with payment gateways before Stripe came along. No one gave you a JS library to use on your website.
In my opinion, there _is_ a moral issue. Not in that they collect this information for fraud prevention; that seems like a reasonable use for that data. It's in not having informed consent, in not having a clear document describing what is collected and when it is purged. And that document would need to be consumer-facing (since it's not the vendor's behaviour being tracked).
Responding after being caught is… good, but not as good as not needing to be caught.
This is a fair call-out. We have actually worked pretty hard to ensure that our Privacy[1] and Cookies[2] policies are clear and easy-to-read, rather than filled with endless boilerplate jargon. But we still made a mistake by not having a uniquely clear document covering Stripe.js fraud prevention in particular.
Could you explain in plain language how this is different or the same as what a credit card company does?
My outsider understanding was that credit card companies happily sell your purchase history or at least aggregate it for marketing, in addition to using your purchase history model to predict if a purchase is fraudulent.
Stripe’s very readable privacy policy makes a clear statement on this:
Stripe does not sell or rent Personal Data to marketers or unaffiliated third parties. We share your Personal Data with trusted entities, as outlined below.
From that and my reading of the rest, I think the answer is clearly no. Also I doubt the data of consumer purchases on Stripe integrated websites is even that valuable to begin with. At least compared to Stripe’s margins.
That's true. They should give clearer and more explicit information so site owners can make an informed decision, including the difference in what's collected if the script is included on just the checkout page(s) vs. on every page.
I am so sick of informed consent and cookie and GDPR etc. popups and banners and forms and checkboxes. I could not care less and neither could most people out there. This crap is ruining the internet for no tangible benefit to the inexplicable thunderous applause of people on tech websites. It didn't hurt anyone when Sears collected rewards data for advertising and it never hurt anyone when web companies used data from user interaction. A simple static webpage is going to end up impossible for anyone but a megacorp to run legally if we keep going down this nonsensical path.
Imagine I mailed you an unsolicited letter and you were legally required to burn it and never say or benefit from what was inside just because I said so. That's the insanity of these "privacy" laws.
I agree with you in general, but this is a big step up. This is essentially the most invasive, intrusive technology that can possibly be deployed on the web - because fraudsters (and other cybercriminals) use the most tricky, dynamic evasion techniques.
And this is regarding website owners adding a script that may run on every page of their site; the consent is for the website owners who are using Stripe and deciding how/if to add their script to their pages.
Or you could just not collect information you don't need? You don't have to ask consent if you just don't do it, you know. The pop-ups are annoying because the website owners want you to just click through. Ever seen one of those where you have to uncheck every single box? Yep, those violate the GDPR. The default setting should be no advertising or other bullshit data, and opt-in if you want it. Which no one ever does. Hence the violations. Get mad at the manipulative ad companies, not the people who for once produced an OK piece of regulation.
He's not the guy who is collecting data. He's the guy whose data is being collected. And I agree with him. True choice is not imposing this cost on everyone. Let me set it in my browser. Then I'll consent to practically everything and you can consent to nothing. And since it's set at your user agent you can synchronize that across devices easily.
If I never see another damned cookie popup I'd be thrilled.
The cookie law is just insane to me. GDPR, or at least the parts that are commonly talked about, seems a lot more reasonable: a user should be able to request what data is being collected about them, and should be able to request a full account deletion, including deletion of all data collected from or about them (perhaps minus technical things that are very difficult to purge, like raw web server access logs).
> a user should be able to request what data is being collected about them, and should be able to request a full account deletion, including deletion of all data collected from or about them (perhaps minus technical things that are very difficult to purge, like raw web server access logs)
I think I'd find it very easy to like this. Honestly, these aspects of GDPR are great. Things I don't like:
* Not allowed to do "no service without data"
* Consent must be opt-in
Bloody exasperating as a user. It would at least be bearable if I could set it once in my user agent. But the browser guys just sit there like fools pontificating on third-party cookies instead of innovating for once and placing the opt-in / opt-out in the browser.
Is this not what the DNT (Do Not Track) header was attempting to achieve before it was essentially abandoned (after being implemented in all major browsers)? Genuinely curious what sort of user agent approach you're looking for.
Actually, I've changed my mind. I think people fall into one of two camps: the advertiser+publisher camp, who don't want this in the browser chrome because it would make a full opt-out too easy, and the browser vendors, who don't want it there because they actually just want the advertisers to die out. What I'm asking for is not a stable equilibrium in any way, so it's a pointless thought experiment.
Pretty sure I'd get an option of "free X more articles if you give us your data to sell". Not getting that is annoying because I was fine with giving away my data for articles.
The problem is that imposing the unsafe choice (aka tracking being on by default) puts people who'd rather opt out at risk (because their data is being leaked), while the current situation merely puts an annoyance to people who are happy to opt-in.
As far as the cookie popups go the majority of them are not actually GDPR compliant. Tracking should be off by default and consent should be freely given, which means it should be just as easy to opt-in as it is to opt-out. If it's more difficult to say no than yes then the consent is invalid and they might as well just do away with the prompt completely since they're breaking the regulation either way.
Interesting question, since the web used to exist and work just fine before online advertising. I'm not saying that we should go back in time, but claiming that ads are a requirement for the web to exist is a slight overstatement.
> Interesting question, since the web used to exist and work just fine before online advertising.
For what definition of "work"? There were static informational pages and... not much else. Content that requires upkeep requires revenue, which usually means either ads or access fees.
Reddit for example has nothing to sell me directly unless it was subscription-based which is a nonstarter. There's no other model for sites like that besides maybe browser-based crypto mining.
I wouldn't be surprised if a high percentage of reddit users use something like uBlock. I think universal ad blockers are going to slowly become more ubiquitous over time, too.
People have been trying to find ways to skip TV commercials for decades. It's going to be the same with ads. When it comes to our own personal devices, advertisers can't really win in the end. They're going to have to stick to things like billboards and other things put up in cities, but even those are being protested and banned in many places.
In theory, what about reddit can't be decentralized? All it stores is text and URLs to other content. There isn't all that much actual processing or computation going on, as far as I know, besides some rank calculation stuff. Am I wrong about this?
In that case, it comes down to figuring out how to pay the developers and some kind of election process for admins. But with a site with hundreds of millions of monthly active users, surely they'd be able to figure something out. Like each user who donates $10 or more gets a little perk.
And even without decentralization, micropayments and premium perks are already a much more promising model. Lots of people are buying reddit's silver/gold/platinum/a bunch of others awards. Tinder is free by default and manages to make loads of money without showing any ads. I don't think ads are going to be a sustainable model in 10, 20, 50 years from now. I think service providers are just going to have to figure out ways to provide value to users in exchange for money, like most "meatspace" companies do.
It wouldn't be a non-starter if no other site could do the same thing without also charging for a subscription. Services like Facebook, Reddit, and Instagram all provide a service that many people find valuable. Let people pay for it.
Media businesses have been funded by advertising for hundreds of years (since the start of regular newspapers in the 1600s at least)[1]. Many internet businesses are more like media businesses than shops.
That's not going to work for plenty of services. Most people (if not everyone) are not going to pay for search, social network, instant messaging, maps, mail etc.
That seems like quite a big assumption. Younger generations today think nothing of spending $xx/month on their phone/data plans and another $x/month on each of Netflix/Spotify/etc. It's not hard to imagine the same people paying real money for social networking sites they value. Search could obviously still do advertising even without any personal data mining, since it knows exactly what you're interested in at that particular moment. Useful informational sites could run ads without the privacy invasion and tracking as well, since they also are aimed at specific target audiences. Plenty more sites would continue to run without a (direct) goal of revenue generation anyway; I see no ads on the free-to-use discussion forum that we're all reading right now.
This idea that the only viable business model on the web is spyware-backed advertising is baloney, and it always has been. There is little reason to assume the Web is a better place because the likes of Google and Facebook have led us down this path, nor that anything of value would be lost if they were prohibited from continuing in the same way.
Why would they not? If someone wants to be able to use a social network, do you really think they wouldn't pay $5/month for something they use as much if not more than Netflix? You can't do it now because other services can undercut you and rely on advertising but there is no reason it couldn't be the standard.
My browser is set up to record no history or cookies.
It can be annoying to always have to dismiss the same popups you've dismissed before, but I've never had any issues with online payments or unnecessary captchas, including with Stripe.
Because fraudsters' browsers/clients/scripts are also set up to record no history or cookies, and otherwise evade detection/categorization as much as possible. Somewhat ironically, in order for them to accurately distinguish between privacy-conscious users like yourself and actual criminals, and to block criminals from making a purchase while not incorrectly blocking you, they need to collect additional data.
If you don't commit fraud, the only two issues you'll see are that:
1) a small subset of sites will refuse to complete the transaction, as their anti-fraud thresholds are set to deny likely-fraudulent browsers such as yours; and,
2) you will be much more easily fingerprinted and tracked online due to your specific combination of extremely uncommon non-default settings in your browser (which may well mitigate #1 if you're originating from a residential IP address).
If you purchase high-value audio gear or clothing or gift cards (basically, high-value things that can be resold on eBay immediately), you may find your transaction held for review while someone phones you to prove that you're you, but for everyday Amazon-type purchasing it's unlikely to matter at all.
> Because fraudsters' browsers/clients/scripts are also set up to record no history or cookies, and otherwise evade detection/categorization as much as possible
Ah, right, bad guys use privacy-enhancing tech, so we'd better undermine it, even if it screws over legitimate users. You know what fraudsters also tend to use? Chrome. Let's block that, shall we?
Fastmail account recovery keeps an "evercookie" which is "first time account X successfully logged in from this device" which allows us to identify that you're using a device with a long history with the account when trying to recover your account after it was stolen. Obviously we don't want to re-authenticate somebody who first logged in yesterday, because that's probably the thief - but if your computer has been used successfully to log in for the past few years, then it's more likely that the recovery attempt is coming from you (obviously, that's still just one of many things we're checking for).
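The core of that signal is simple enough to sketch (hypothetical function names; as noted above, in reality it's just one input among many checks):

```javascript
const DAY_MS = 86400000; // milliseconds per day

// How long ago this device first successfully logged in to the account.
function deviceAgeDays(firstLoginMs, nowMs) {
  return (nowMs - firstLoginMs) / DAY_MS;
}

// A recovery attempt from a device with years of history is lower-risk
// than one from a device first seen yesterday (likely the thief).
function recoverySignal(firstLoginMs, nowMs) {
  return deviceAgeDays(firstLoginMs, nowMs) >= 365 ? 'lower-risk' : 'higher-risk';
}
```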
Do they really? evercookie generally has a specific definition, where the application attempts to persist that chunk of data in heavy-handed, abusive, malware-like ways and repopulating it on removal with the same token when possible; usually used in fingerprinting concepts - it isn't just a normal http cookie with the expiration date set years out.
A blanket statement that "we need privacy invasion to have any chance against fraud, it cannot be done without, period", made without any argument for why fraud prevention requires it, isn't very constructive.
For example, in my experience: user pays, website gets money, website releases product. It's the user that could be defrauded, not the website. I never heard of fraud issues from the website owners' perspective in the Netherlands where credit cards are just not the main method to pay online. Fraudulent Paypal chargebacks, sure, but iDeal.nl or a regular SEPA transfer just doesn't have chargebacks. It would appear that there is a way to solve this without tracking.
Hey pc, good to see you here on HN. Things like this bother me as a Stripe customer who advocates strongly for privacy. I've asked repeatedly for options which let me have more control over what exactly is happening on my page - or to have a JavaScript-free flow on Stripe.com that I can redirect users to in order to complete their card details. Another easy option would be to use subresource integrity so that I can audit each release of Stripe.js, but your team has turned this down, too. Of course, I could go full PCI, but PCI compliance is a big burden for small businesses. Do you have any plans for making Stripe more accommodating of users with privacy concerns?
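(For reference, the subresource-integrity approach would pin each audited release with a hash. The digest below is a placeholder, not a real one, and this doesn't work against js.stripe.com today precisely because the file updates in place under the same URL:)

```html
<!-- SRI pins the script to one audited build: if the served file ever
     differs from the hash, the browser refuses to execute it.
     The integrity value here is a placeholder for illustration. -->
<script src="https://js.stripe.com/v3/"
        integrity="sha384-PLACEHOLDER_HASH_OF_AUDITED_BUILD"
        crossorigin="anonymous"></script>
```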
Hi ddevault -- we would like to do this, and I'm very supportive in principle (and, per GP, we are perfectly fine with anyone not using Stripe.js), but our current product/engineering focus is on trying to build better tools for the businesses who are losing tens or hundreds of thousands of dollars to fraud. We think we have to first help the businesses who need help immediately. We'll probably then circle back to build products that explore more points on the [efficacy of fraud prevention] - [PCI burden] continuum.
Thanks for the info, pc! I'm worried that this is a dismissive answer, though. Stripe has been in business for 10 years, and fraud has been and will continue to be a constant battle for you. When can I expect to start seeing other problems like this prioritized?
Definitely no dismissiveness intended -- apologies. While Stripe has been in business for 10 years, Radar (our fraud prevention tool) has only existed for 3.5. We've made a good deal of progress in that time and I would guess that it's 1-2 years away from being sufficiently complete that we can start to seriously focus on things other than fraud. (As it happens, I just had a conversation about this with the guy who leads it.)
I'm in the same boat with my communication based startup. I'm being very cautious about any third-party interaction and the heavy activity from the stripe JS wasn't compatible with that.
I had to take some goofy steps to ensure that the Stripe components were limited to just the payments page and didn't bleed over into anything else.
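One hedged sketch of that kind of limiting (the `/checkout` path is an assumption about the app's routes; the trade-off, per the thread above, is that Stripe then sees less cross-page signal for fraud scoring):

```javascript
// Decide whether the current page should carry Stripe.js at all,
// so its telemetry never runs outside the payment flow.
function shouldLoadStripe(pathname) {
  return pathname.startsWith('/checkout');
}

// In the page itself you'd gate the script injection on this check, e.g.:
//   if (shouldLoadStripe(location.pathname)) {
//     const s = document.createElement('script');
//     s.src = 'https://js.stripe.com/v3/';
//     document.head.appendChild(s);
//   }
```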
FWIW, reading this I thought "Isn't that for fraud detection?" and then I got to the part where your agent says "it's for fraud detection"...
I'm normally against this sort of thing (even though everybody does it, it seems like) but in this case, it's clear that this really is "works as intended" at least to me, FWIW.
My thoughts exactly as I was reading. Analytics gathering that’s not related to UI improvements wouldn’t bother to collect information on pointer movements.
But it turns out to be a pretty good tool for bot detection. That’s why you can now just check a box to verify you’re human (Something about that sentence feels quite dystopian).
I just don’t see the issue with Stripe’s practice here. They have a clear business model, and selling user data could severely undermine that model.
Many developers would likely opt in once the business sees the 5% surcharge from extra fraud.
And regardless, Stripe explicitly says in their documentation that you should include Stripe.js on every page of your app, so they can track pointer movements for fraud detection. This has not been hidden from devs in any way.
Glad to hear you're going to clarify that language in the ToS.
I'm interested to know if you're open to implementing mechanisms that limit what data Stripe collects within my app. I'm happy to help Stripe prevent chargebacks against my app, but I'd like to be in control of what parts of my app's data I hand over to help achieve that rather the current situation which basically grants the library carte blanche to vacuum up whatever data it wants.
This should be a case study in how to properly handle bad press. Of course, it helps when you’re Doing The Right Thing™ to begin with, as it limits your recovery to simple reassurance of the user base. Still, it appears this is being handled exceptionally well.
> This data has never been, would never be, and will never be sold/rented/etc. to advertisers.
This is kind of a straw man. These valuable data sets are typically kept by tech companies to keep a competitive edge. For example, not even Google sells or rents user data.
The more relevant question is "is Stripe's valuation significantly predicated on revenue it can extract from the surveillance data it's collecting?"
My guess is that the answer to this is likely yes. Fraud prevention is the current product built on this data. But it would be shocking if the company never put the data set to additional uses.
> "Is Stripe's valuation significantly predicated on revenue it can extract from the surveillance data it's collecting?"
No, it's not. This telemetry is useful for helping businesses avoid crippling fraud losses and we don't use or plan to use it for anything else. I don't think investors even know about it.
We're perfectly happy with the business model we currently have!
It's not super fun per se but Stripe is an important infrastructure service and scrutiny comes with the territory. I'm always happy to answer questions.
Well, you are just one person, though. Can you speak for the (potentially secret, never voiced until the time comes) minds of the other founders, and for the ideas of investors, now and forever into the future? That seems rather unlikely.
The problem is also that the damage done, if the data is ever sold or hacked or whatever, cannot be properly compensated. What would your personal skin in the game be if that happened, besides moving on to a new, probably highly paid job (thanks to the impressive CV of being a Stripe founder)?
To be honest, Stripe does not need this data. To provide functionality, they collect all data on card payments, who's paying, who's the seller, seller volume. They can see which cards are used across which services and the like.
Tracking mouse clicks and URLs is auxiliary, imo, and would not move the needle for them right now, unless they move into advertising (not impossible long term, if they go public).
The cofounder says that the data they're collecting is part of the algorithm that has reduced their fraud numbers, and gives concrete data. Either they do need the data, or he's just outright lying, over and over, all across this thread.
> (CAPTCHAs use similar techniques but result in more UI friction.)
Not only are they inconvenient, but they're often inaccessible for some users. So I just wanted to say thanks for not going that way, even if the cost we must pay is some theoretical compromise in privacy.
Edit: On thinking about this some more, it occurred to me that by using a user's activities on a web page to determine whether they're a bot committing fraud, you might be inadvertently penalizing users that use assistive technologies, such as a screen reader, voice input, eye-tracking, etc. I haven't had a problem with this when doing a Stripe checkout with a screen reader, but I just wonder if this possible pitfall is something that your team has kept in mind.
If they are really in full compliance with the relevant California (and other) statutes as the co-founder claimed they must have also taken the accessibility aspect into account.
We’re big-time Stripe users (ACH mostly to the tune of $300M+ annually) but soon will be branching into debit/credit & in the research so far have found the Radar product impressive.
Having this information up-front and center makes it easier to pass to our Infosec folks + another check in the transparency box.
As far as not using the JS, I was under the impression as long as you’re not storing the account or card numbers & utilizing the tokens properly you’re still at the base level of PCI compliance - meaning you’re securing your website, endpoints, data store etc in the same manner you should be already.
The JS package is really nifty and helpful though. We will be able to stand up a one-off late-payment page utilizing their checkout flow & one-time payments (server/client; couldn't expose all the SKUs to folks, so had to go that route instead of just client). The fact that we could use Stripe to send the emails with our branding, and that all we really have to do is pull the payment they owe & create a checkout session to hand off to Stripe, is pretty awesome.
As far as PCI compliance, I was just saying your comment about more work is true depending on your setup w/o using Stripe.js.
In our business we already have our own proprietary fraud models and other PII we secure, so the level of effort to keep PCI compliance for the additional Stripe components is a wash whether or not we use Stripe.js
I totally agree if you're going to use Stripe and you don't have to deal with PCI already in your normal course of business it's a complex area to navigate & using stripe.js is a much smoother path to take.
Basically goes back to basic principles, don't add more stress or work for something outside your core competencies unless necessary & in many cases, I can see where companies should just leverage stripe.js plus the UI utilities as they're well done & save a lot of time.
Big fans of Stripe, even if your ACH rates are significantly more than Wells Fargo or others :P
> You're right that, if you don't use Stripe.js, PCI compliance will be more work.
That is not what @brogrammernot wrote. What @brogrammernot wrote was:
> As far as not using the JS, I was under the impression as long as you’re not storing the account or card numbers & utilizing the tokens properly you’re still at the base level of PCI compliance - meaning you’re securing your website, endpoints, data store etc in the same manner you should be already.
Note the conclusion of "... in the same manner you should be already."
I think that you will be on the hook for PCI compliance if card data touches your server, while with Stripe.js your server never sees the card data. Of course, it's extremely stupid, because your server is still the one serving the original page and can change it to silently exfiltrate the card details if it gets compromised.
I believe the point was, if your server is compromised but you're using stripe.js, you're not legally on the hook for exposing CC details, even though they definitely could have been exposed.
(I have no idea if this is even true, this was just my reading.)
> Wouldn't it be just a server side to stripe call on a form submit? Even easier than using js (probably not as good user experience wise)
Sure! You just have to also handle PCI-DSS.
One of the nasty things I've had to accept about PCI-DSS is that if you think you have a clever hack for getting around it, you probably don't. It's really a remarkable work of standards authoring.
We more or less do this today, but if you need to setup a new workflow to take payments (one-time or recurring) there's a lot of work already done for you in the Stripe.js ecosystem.
So in our case, to take one-time payments it would've been more work to stand-up the checkout page itself and all of that work behind the scenes. It is much easier to just create a checkout session (basically just hitting the DB to pull the outstanding payment record and creating a stripe customer if one doesn't already exist) and redirect to Stripe's checkout.
The PCI part isn't overstated either, that checkout session lives on Stripe's domain not ours and that's where payment method is collected & stored within Stripe so you're not having to worry about it.
Indeed -- although our business model is, I think, more closely aligned with our users. We serve only one kind of customer: businesses receiving payments with Stripe. Our revenue is a roughly linear function of that of our customers.
I appreciate the security and the clarity on this issue. I only wish you didn't sneak in a pricing increase for long-standing users a few months ago, and I wish Stripe was more honest about its enterprise pricing.
I apologize that anything about the pricing change felt sneaky. (We tried to do the opposite: we emailed every single impacted customer!) I posted a few thoughts about the refund change here: https://news.ycombinator.com/item?id=22893388.
We're not transparent about enterprise pricing since our costs on any given user are so country/business model/implementation-dependent. It's less that our sales team isn't willing to share the details and more that the models themselves are very complicated and change frequently. (Visa and Mastercard are both making big changes to their pricing this year, for example, and that will change almost all of them.)
I appreciate that. My particular beef with the enterprise negotiation experience was that Stripe lists a specific number after which they're open to negotiating and when we'd far exceeded that number (with minimal fraud risk due to the nature of our business), their answer was "You have too many Amex customers, but aren't you happy you're grandfathered into x, y, and z feature we now charge extra for [which we don't even use]".
Then shortly after, Stripe raised pricing on a model I'd just been told was grandfathered in.
Stripe is cheaper than most other processors and charges a flat rate for the transaction, regardless of the upstream cost. Amex is more expensive than Visa for example. A fact of doing business is that things will go up in price, as I'm sure your company also raises prices from time to time.
I think what is being mentioned is that Stripe started keeping the original transaction fees on refunds. In my opinion this is borderline fraudulent since visa/mc/amex do not keep these charges and refund them back to Stripe.
You do realize that this is a free market? If visa/mc/amex are so much better, people can use them. Charging a flat fee for a service doesn't seem that fraudulent to me.
Based on your comment you're clearly not a Stripe user, so I'm not sure why you felt the need to post this.
> If visa/mc/amex are so much better, people can use them.
Stripe uses visa/mc/amex, it is not a competitor. You completely missed my point. Stripe uses visa/mc/amex to process credit card transactions, then when a refund is issued the CC companies return the charged amount to Stripe, but Stripe does not return the full amount back to the customer. They keep a percentage. This is what I consider "borderline fraudulent".
> Charging a flat fee for a service doesn't seem that fraudulent to me.
But it is not a flat fee. They keep a percentage of the refunded amount. So if a customer bought a $1000 item, then changed their mind and cancelled the order 5 min later, Stripe would still keep $40 just for the fun of it. A small flat fee to cover network expenses would be more appropriate, not a percentage of the amount.
So you have to charge $1000 + ($40 * % of users who return + cushion) for the product. That means non-Stripe businesses can start to out-compete you on cost.
What makes it so that Stripe has such a unique position and can impact your costs and competitiveness to such a large degree?
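The arithmetic in the parent comment can be sketched in a few lines of Python. The fee and refund rates here are illustrative placeholders, not Stripe's actual pricing:

```python
def price_with_refund_cushion(base_price, fee_rate=0.04, refund_rate=0.10, cushion=0.0):
    """If the processor keeps its percentage fee on refunds, every
    refunded sale costs `base_price * fee_rate` with zero revenue.
    Spread that expected loss across all sales to find the price
    that keeps margins intact."""
    expected_fee_loss = base_price * fee_rate * refund_rate
    return base_price + expected_fee_loss + cushion
```

With a $1000 item, a 4% fee, and a 10% refund rate, every sale has to carry an extra $4 of expected non-refunded fees before any cushion.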
> A small flat fee to cover network expenses would be more appropriate
That sure seems like the solution a free market in processing would settle on. Something is up.
> So you have to charge $1000 + ($40 * % of users who return + cushion) for the product. That means non-Stripe businesses can start to out-compete you on cost.
If you charge your customers more you will still end up paying more. The $40 was based on a 4% fee. (I'd like to make a correction, as in my case it is actually 3.5%)
> What makes it so that Stripe has such a unique position and can impact your costs and competitiveness to such a large degree?
Stripe and PayPal are the biggest players in this space. There are others, but they are either built on top of these two or do not have the easy APIs and/or integration with other 3rd party services. PayPal was the first to start keeping the fees for refunds, and then Stripe followed.
Stripe is a great company otherwise, and I will continue being a customer, but that doesn't mean that I can't get upset over such a blatant money grab.
Google never promised that it would not use data for advertising. That was their whole business model from the beginning. It'd be more effective to point out specific instances where Google made an agreement not to use data for a certain purpose and later reneged on that commitment.
I get that one can object to any form of monetization of user data on principle, but pointing to Google as some kind of precedent doesn't seem sound.
"This data has never been, would never be, and will never be sold/rented/etc. to advertisers."
"Stripe.js collects this data only for fraud prevention -- it helps us detect bots who try to defraud businesses that use Stripe."
The language of the revised ToS could go something like "Stripe shall only use the data for fraud prevention. Stripe shall not permit the data to be used for any other purpose, including, without limitation, any use that aims to increase customer acquisition or sales of products or services."
The problem with statements like "We only use the data for X" is that this is not a limitation. It is perhaps a representation of what Stripe is doing as of the date of the ToS, however it does not mean Stripe does not have permission to use the data for any other purpose. Further, it only applies to Stripe. Another party could be using the data for some other purpose besides fraud prevention and the statement would still be true. Nothing requires that there be a sale or "rental" for another party to make use of the data.
The problem with statements like "We will never sell/rent/etc. the data to Y" is that it does not prevent Stripe from using the data to help Stripe or other parties to sell products and services. Stripe does not need to sell or rent the data to provide that assistance.
To recap, a ToS should limit how the data can be used. Stating how a company currently uses the data is not a limitation. Stating that a company will not sell or rent the data does not necessarily limit how the data can be used by that company or anyone else.
Facebook does not sell or rent data, but their collection of data ultimately results in more advertising on the web, and on Facebook-owned websites. How does that happen? The first problem is the collection of data above and beyond what is needed to fulfill a user's request, i.e., the purpose for which it was collected. Ideally we could stop the unnecessary collection of user data, e.g., through law and regulation, and this would reduce the amount of data we need to worry about. The second problem is that after users "agree" to the collection of data, there are no contractual obligations on the collector over how the data can be used, other than not sharing it.
Are there ways for transparently communicating (verifiable) stats for this claim?
To be clear, I am not saying that your claim is not true, but if one thing HN has taught me, it is to always ask for data backing up tall claims.
As someone who saw this first hand, Stripe’s fraud detection really works. Fraudulent transactions went down from ~2% to under 0.5% on hundreds of thousands of transactions per month. And it very likely saved our business at a very critical phase.
Having worked with payments on a number of products it's really not a tall claim at all. On a small product that's an offshoot of a large media company we had the luxury of firewalling off a lot of countries, prior to that we'd see thousands of fraudulent attempts / payments a week. A lot of them are people iterating through lists of stolen card numbers looking for ones that are still working, so while the actual number of people / bots doing it might be lowish the volume of attempted charges can be huge.
I used to work on fraud detection on a product with transactions totaling billions of dollars a year, and for a period of time we could have stopped something like 90% of our fraud attempts (with like a 99% accuracy rate) by simply blacklisting IPs from Turkey, Vietnam, Ghana, and Nigeria.
For one, some definitions would be nice. How do you define "fraudulent payments"? If I tried to check out while on a VPN and Firefox with resistFingerprinting enabled, and your antifraud system stopped me, did that count toward your "millions per day"?
We build models that predict P(payment charged back as 'fraudulent') and then let small random samples through in order to test the accuracy of our predictions. This calibration means that we can compute a pretty accurate "true" total from those we have blocked.
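A toy sketch of that calibration idea (nothing here is Stripe's actual pipeline): block what the model flags, let a small uniform random sample through anyway, and scale the fraud observed in that holdout by the inverse sampling rate to estimate the fraud total among everything blocked:

```python
import random

def estimate_blocked_fraud(flagged, sample_rate=0.01, seed=0):
    """Horvitz-Thompson-style estimate: each flagged transaction is
    let through with probability `sample_rate`; fraud observed in
    that holdout, scaled by 1/sample_rate, estimates the total fraud
    among everything the model flagged."""
    rng = random.Random(seed)
    holdout = [t for t in flagged if rng.random() < sample_rate]
    observed = sum(1 for t in holdout if t["charged_back"])
    return observed / sample_rate
```

The design trade-off is explicit: the holdout transactions are fraud you knowingly let through, which is the price of knowing how accurate your blocking actually is.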
Out of curiosity, when a transaction is part of one of those random samples and is flagged as fraudulent, are the costs/impacts to the merchant the same as for any other fraud chargeback/dispute (particularly for those that don't use Stripe Chargeback Protection)?
Thanks for responding Patrick, as I said I actually do believe that the claim you're making is not false.
I am always curious about/collecting patterns successful teams leverage for solving problems that I consider important.
Being able to communicate fraudulent payments that Stripe blocks is definitely one of them.
I was being a bit selfish when I asked that; my thought process was: "Going forward, data collection is going to be scrutinized much more than now, and rightfully so. If I ever run a business where we collect data for a very important use case, I would want to make sure that we are able to communicate what, why, and how with the utmost level of transparency."
Hope that puts some context to my question, it was a good-faith question. :)
I didn't even read the article. I immediately clicked through to the comments because I knew Patrick would clarify it. Stripe is one of a handful of companies I actually trust.
I recently was denied a credit application because I used Linux when I filled out the application. Not only that, but they still ran my credit. They even told me on the phone that the reason they denied my application was that I was using a suspicious browser (Chromium).
Let's be honest here. Stripe.js may be about fraud prevention, but what that means is that you'll use every method available to gather data about that individual to build an identity. Then in 3 years when someone changes positions, that system will end up getting compromised, sold to the highest bidder, shared with some government agency, or used for nefarious purposes.
This will be tied to all their IP, OS information, transactions, etc. Just like they are using facial recognition at several retail stores. They make you use a chip reader now for credit/debit transactions, but it has no use online. I'd rather see an open-source universal effort towards decentralized currency, voting, identity, etc.
Having worked at an ML fraud prevention company, can confirm that this is a pretty standard part of a fraud prevention stack. It's essentially used to block credential stuffing. There's a high probability your bank also does this.
There's a difference between collecting data on your own site and doing it on third-party sites, so banks are a bad example (unless your banks work differently than mine).
Banks don't do it on third-party sites only because they very rarely have third-party endpoints. If they did, they would do it there too. If you want examples of other payment processors that do this: Authorize.net, PayPal, and WorldPay all do this too.
While I believe this, you might want to add a way for the developer to prevent certain GET variables or pages from being sent. I doubt sending everything helps Stripe detect fraud.
Some websites send sensitive or identifying data in their GET variables. I personally have analytics data on my pages that I don't want to give to Stripe for no reason, and it definitely does not help them track fraud.
Here's hoping it also reduces the exorbitant amount of false positives we've been seeing with Stripe's fraud prevention services, which cost us a lot in lost legitimate sales.
I'm sorry to hear that! Feel free to email me (patrick@stripe.com) and I'll connect you with the team if you'd like us to do a deeper dive.
But, yes, part of the intent here is to enable us to achieve better ROC[1] in our models and to block more fraud while also encumbering fewer false positives. From our testing, it's very clear that these bot-detection techniques do substantially improve the accuracy when compared to other, coarser heuristics.
A user shouldn't have to email a cofounder to get in touch with a team member. The last time I integrated Stripe on a site, as a final test before it went live I had my cousin make a purchase, and the site got flagged for potential money laundering because we had the same last name. At the time there was literally zero customer service. It took 8 years before Stripe started doing any customer support. Cool launch pages, but personally I'll never use Stripe again.
I've always launched with fraud turned down. The bots don't know you're there yet and Stripe can figure out your traffic. Then, you can turn it up once you get a few dozen or so sales under your belt.
We have the opposite problem. People with 50 carding attempts and radar scores of 30 or so. There is no value in Radar if so many of these cases pop up because you can’t really tell the truth from the false.
We use Sift as a backup, and that makes it easier, while at the same time really showing how poorly Radar does in some cases.
Truth be told, it is really good with heavy “dumb” carders, but not when it gets complex. Hope this gets addressed at some point.
There are many instances where someone from $company shows up in one of these "$company is doing something nefarious" threads to do damage control... and I think this is the first one that doesn't reflect poorly on the $company in question.
The reason for this is that Stripe did do something wrong, and it wasn't in the script - it was in the disclosures and communication surrounding the script. So rather than PR spin, this is actually addressing the real problem.
To all other companies (and Stripe on some other day), this one thread is the exception that proves the rule that damage control in internet comments is a bad look.
Updating the TOS on the Stripe site doesn't really apply when the site executing the script is the site that consumes the API. The user of the consumer site never sees the Stripe TOS.
> Stripe.js is part of the ML stack that helps us stop literally millions of fraudulent payments per day and techniques like this help us block fraud more effectively than almost anything else on the market. Businesses that use Stripe would lose a lot more money if it didn't exist.
Can someone give an example of the kind of fraud schemes involving bots that this would stop? What are the bots programmed to do, how does it benefit the owner of the bots and how do you detect it?
My startup's website was recently facilitating card testing attacks via its Stripe checkout. They weren't small purchases, just started-and-cancelled transactions.
Unfortunately, I was not alerted by Stripe, but by a "customer" whose credit card number had been stolen somewhere and who saw on his statement our company name (I'm not sure how, since the attackers don't complete the transaction).
The startup is dormant, so checking the Stripe dashboard isn't part of my daily routine. Or even my monthly routine. Even when it was active, we had only a handful of transactions - it's a niche market.
I contacted Stripe customer support only because I thought the email from the "customer" could be a phishing attempt. Stripe customer support saw the logs and helped me roll a new public key. When I asked why I wasn't informed of such an impossibly high rate of token creation, the answer was that alerting isn't a feature. When I checked the dashboard logs, I found that 75k tokens had been created in the last 28 days (100% card testing). That's 75k stolen credit cards that my website (and Stripe) helped to validate - and just in the last 4 weeks.
For all the promise of AI, I'd be happy just to get an alert that 75k tokens were created in four weeks, while exactly 0 (zero) completed transactions in the same period.
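The alert being asked for is a simple anomaly rule. A hedged sketch (the thresholds are made up, and the counts would have to come from your own logs or dashboard, since this isn't an existing Stripe feature):

```python
def card_testing_alert(tokens_created, charges_completed, token_threshold=1000):
    """Heuristic: a flood of tokenizations with zero (or almost zero)
    completed charges is a classic card-testing signature. Returns
    True if an alert should fire. Thresholds are illustrative."""
    if charges_completed == 0 and tokens_created >= token_threshold:
        return True
    # Also flag a very lopsided token-to-charge ratio on active accounts.
    return charges_completed > 0 and tokens_created / charges_completed > 100
```

With the numbers above (75k tokens, 0 completed transactions in four weeks), this fires immediately.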
> but by a "customer" whose credit card number had been stolen somewhere and who saw on his statement our company name (I'm not sure how, since the attackers don't complete the transaction)
I detected a hacked database this way. My credit card (a burner from privacy.com) notifies me of any transaction, including pre-authorizations.
Most likely your customer saw the pre-auth show up.
> (I'm not sure how, since the attackers don't complete the transaction).
Its called "pre-auth" or "pre-authorization". It will show up on your statement for up to 48 hours but then will disappear since transaction is not "settled". During this period you would see a descriptor of transaction like NIKE NEW YORK 310XXXX, or if its dynamic/soft descriptor and merchant is utilizing it, it may say your order number and store like NIKE.COM 11-3939329.
> Credit card testing, a tactic used by fraudsters to test stolen credit card numbers with small incremental purchases before making large-dollar purchases on the card
Testing for what? That the card hasn't been cancelled or has zero money on it? Can they test for how much money is on the card or anything else useful?
And they need bots because they might have say 1000s of cards from a database hack and most of the cards won't be useful?
What kind of large dollar purchases would someone try to make once a card has been confirmed? Why not let bots attempt lots of large dollar purchases?
Lots of ways to monetize a working stolen card, although as GP says, the people stealing the credit card information are generally selling those cards to other people who'll actually try to turn them into cash. Gift cards and any sort of virtual currency are always big and easy. Buying advertising for affiliate marketing works. If you're willing to take on more risk, ordering actual physical goods to be delivered to empty houses and picking them up there - back in the day, we caught a guy who was ordering a bunch of iPods online, having them delivered to a bunch of houses in a development that wasn't finished yet, and then just following the UPS truck around when they got delivered and picking them up off the front doorsteps.
Given that gift cards and virtual currencies are obvious and common approaches, could banks or payment processors not somehow put up big restrictions on cards being used for sudden large purchases on products like this?
At the end of the day, there's a lot more legitimate transactions than fraudulent ones. It's very difficult to restrict a particular type of purchase en masse without having a huge false positive rate. That's exactly why companies like Stripe work so hard on developing signals to feed into their fraud models.
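The base-rate problem behind that can be made concrete with one line of Bayes arithmetic (the numbers here are illustrative, not Stripe's):

```python
def precision_at(base_rate, tpr, fpr):
    """Among flagged transactions, what fraction are actually
    fraudulent? With rare fraud, even a detector with a good true
    positive rate (tpr) and low false positive rate (fpr) flags
    mostly legitimate payments."""
    flagged_fraud = base_rate * tpr
    flagged_legit = (1 - base_rate) * fpr
    return flagged_fraud / (flagged_fraud + flagged_legit)
```

At a 0.5% fraud base rate, a detector that catches 95% of fraud while flagging only 1% of legitimate traffic still has roughly two false positives for every true fraud it flags, which is why better input signals matter so much.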
Thanks for posting this explanation and making an effort to update your terms of service to call out this behavior.
We appreciate that you are intent on fixing this; it helps us know that there are honest people working at Stripe and helps clear the fog for the many of us who believed you were doing malicious things.
I hope you and your family stay healthy during this pandemic.
> This data has never been, would never be, and will never be sold/rented/etc. to advertisers.
I don't think first-order data access is really the problem here, but rather where that data is going in general and who is authorized to access it, for how long, etc., and the volume of data any one person has access to at one time.
> As someone who saw this first hand, Stripe’s fraud detection really works. Fraudulent transactions went down from ~2% to under 0.5% on hundreds of thousands of transactions per month
So, in order for Stripe to make more money (1.5% less fraud), you chose to track everybody.
>> Stripe records the full URL, including query parameters and URL fragments (e.g., /account?id=12345#name=michael), which some websites use to store sensitive information.
Can you comment on this part as well? If you collect sensitive data unbeknownst to website owners and users you are most likely in for some trouble (i.e. gdpr)
>This data has never been, would never be, and will never be sold/rented/etc. to advertisers.
I chortled a bit. Everything beyond "has" is questionable, though I believe there is sincerity about past actions and maybe even ethical business considerations.
If location data can be used for supplemental revenue in a fashion that won't hurt revenue more than help revenue, it will, it's only a question of when. It may or may not be advertising, but it absolutely will be used for all sorts of functions beyond fraud detection (if it's not already), especially once Stripe is publicly traded and/or gobbled up by some other massive business.
It's nice when you go into the comments expecting the worst, that Stripe is now one of the bad guys after all, only to find a perfectly reasonable explanation and a clarification from someone talking plain English. Nice.
As an end user, can you provide me with all of the event data you have collected about me? Or does Stripe help sites that integrate stripe.js to respond to data requests?
Ie, CCPA / GDPR says I have the right to see and correct it if I live in one of those jurisdictions.
You could send a subject access request to dpo@stripe.com - it would be interesting to see how they respond if you share the unique identifier mentioned in the article.
CCPA has a Right to Access like GDPR does, but it does not have a Right to Rectification.
Under both laws, if Stripe claims to be a Processor/Service Provider, then they have an obligation to facilitate sites using Stripe to respond to access requests. But they have no obligation to process those requests themselves. I think CCPA requires they direct you to the actual Controller, but that's one aspect of CCPA that has changed since December.
This seems like a pretty good reason to phone home with this data, but... sending back URLs WITH query params? It's pretty common for sensitive data to be in query params, sometimes even things like bearer tokens. I can't see how query params would be very useful for fraud detection, and sensitive data like this is something you really want to avoid collecting; IMO that's low-hanging fruit to remove.
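One mitigation a site can apply itself, independent of anything Stripe changes, is to scrub sensitive query parameters and the fragment from a URL before it ever reaches a third-party script (in the browser you'd pair this with `history.replaceState`). A sketch of the redaction logic, with a hypothetical deny-list:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

SENSITIVE = {"token", "session_id", "email", "id"}  # hypothetical deny-list

def redact_url(url, sensitive=SENSITIVE):
    """Strip deny-listed query parameters and the whole fragment from
    a URL, keeping everything else intact."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in sensitive]
    # Drop the fragment entirely: it is never needed server-side.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

An allow-list of known-safe parameters would be even safer than a deny-list, since it fails closed when a new sensitive parameter appears.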
This isn't Stripe's fault here, but if to make a payment online you need to track everyone just to make sure the transaction isn't fraud, then the system is broken.
Why can't online payments use two-factor authentication by default?
Because, "business". That is, sales will drop for anyone who does this.
That's friction that will reduce sales, and online sellers will move to a provider that does not do this. Stripe would go out of business if it made it more difficult to buy things.
Put your money where your mouth is, and put in the terms that this data will never be sold/rented to advertisers. If it is, pay a penalty of $25 per datapoint.
Also, if not advertisers, who will get this information?
Make it any “third party”.
It's like installing an anti-cheating rootkit into the Windows kernel. It's like recording your voice, taking a retina scan, requiring you to take off your clothes, and planting a GPS bug on you before swiping a card in a physical shop. How can it possibly be OK? For fraud prevention? Yes, if every customer were under 24/7 surveillance, there wouldn't be fraud (nor terrorists). Unless they were too smart to hack all the devices and fake the data. It is very wrong, and there is no way it can be justified to monitor all URLs and mouse movements before a payment. Having JavaScript available doesn't mean you are allowed to use it for spying on customers! Black Mirror in real life. It will get worse if people accept this and agree with it. Quite sad. :(
How can you say "...will never be sold/rented/etc. to advertisers"? Are you a fortune teller or a prophet? In the case of a new CEO or changed ownership, everything can change...
Be careful with that. We have seen recently (see social distancing apps) that even public health does not justify invasion of privacy, so why should online fraud?
Can you please go into more detail about the exact method that “sophisticated fraud rings” use to attack? Like what specifically do they do, please ELI5
I don’t think recital 47 allows carte blanche data collection in the name of fraud detection, and at the very least I would think there is an obligation for disclosure of the data collection, a mechanism to access it (DSAR) and the ability to correct inaccuracies.
You're correct about all of these points. GDPR still means that the principles of transparency, purpose limitation, data minimization, etc. are in play, as are data subject rights like access, rectification, and erasure. I was only addressing the specific issue of consent from your previous comment. Consent wouldn't be necessary if there's a different legal basis, and fraud detection qualifies as a Legitimate Interest.
Note that collecting consent still doesn't give you carte blanche to collect all the data. The principle of data minimization still restricts you to only the data you need for the purpose you state when gathering consent.
For the avoidance of doubt, the main point of my comment was the not insignificant risk (a maximum fine of 20 million euro or 4% of turnover if that is greater) if a data controller does not meet the obligations of the GDPR.
Consent, as you point out, is only one aspect of this.
> was the not insignificant risk (a maximum fine of 20 million euro or 4% of turnover if that is greater)
Facebook and Google are still around. There is absolutely zero risk of any significant GDPR fine as long as the biggest offenders are allowed to run freely.
Facebook and Google have very deep pockets and are taking lots of steps to comply with the letter, but arguably not the full spirit of GDPR.
I think it would be unsafe to assume that there is zero risk of significant GDPR fines on the basis that the regulatory bodies have not picked a battle with google and Facebook.
Smaller organisations that seem to be doing less to respect GDPR are probably an easier starting point for regulators to begin enforcing the law.
There's absolutely more than zero risk. In Denmark a medium sized taxi company was fined $200,000 for keeping their customer data longer than necessary.
Also: How are Google and Facebook offenders of GDPR?
I think this is exactly the point. Smaller companies (like stripe), that play fast and loose (maybe not stripe) with European customers’ data are a good target for regulators to make a point.
From my naive perspective, if the CC or other provider validate the customer's credentials all is good and you get paid. Integration of a payment method seems fairly simple and straightforward (I'm not a web dev).
I don't really know about the types of fraud that are possible and what a Saas should be prepared for.
Question: what are the typical fraudulent activities and their symptoms?
I worked for a company that made tools like this for diagnosing UI flow issues with machine learning. It's possible and likely that some companies use this for tracking their users, but it has some meritable uses as well.
Don't get me wrong, but a lot of CTOs said so in the past... including the one responsible for the infamous Google PREF cookie that helped the NSA spy on literally the whole planet due to how the Google Safe Browsing service has to be implemented.
As long as there's no control, your statement is worth nothing. Even if you are acting morally correct with the data, your future replacement might not.
PayPal were the good guys, too, until they were not.
In my opinion these tracking mechanisms are not GDPR compliant, as there's no opt-out possibility.
I'll ask our legal team if we can somehow contractually preclude ourselves from sharing this data in the case of liquidation or otherwise bind ourselves in a useful fashion...
To your question about what the data actually includes and what the retention policies are -- we'll put together a summary of this on the page I mentioned in GP.
If you get clarity on liquidation, please consider open sourcing it à la YC's startup documents. It's something I've long wanted to include in my projects.
I hope the result has some teeth to it, and I'd like to see follow up once this item is complete.
If I were negotiating for a vendor to collect such an invasive level of personal data about me or my customers, I would insist on accordingly strong protections.
At a minimum there should be clause in your ToS making our consent expressly contingent on you upholding your protection commitments, particularly around what data is collected, who it's shared with, and when it is destroyed or 100% anonymized. It should insist you have contracts containing terms of equal strength in place with any of your vendors, subcontractors, partners, etc. who might conceivably gain access to the sensitive data.
The clause should be written such that the liability follows along to any assigns, heirs, successors, etc, and it should be excluded from any loopholes in other portions of the contract (particularly any blanket ones which allow you to change the ToS without gaining fresh, explicit consent) and preferably free from any limitations of liability.
I'm glad Stripe is taking a responsive approach to the matter and I hope you'll consider this feedback when you revisit your legal agreements.
Thanks for answering questions about this and being so open! In your top-voted comment, you claimed that this data will never be sold to advertisers. Are you now saying that you're unsure? If so, it would be helpful to update the comment, since it seems it's not accurate.
> This data has never been, would never be, and will never be sold/rented/etc. to advertisers
> I'll ask our legal team if we can somehow contractually preclude ourselves from sharing this data in the case of liquidation or otherwise bind ourselves in a useful fashion...
Attack real problems on all flanks, but I don't think you can get an affirmative from Legal.
Do you have cryptographers on staff? The "technology as a contract" approach is to implement a homomorphic encryption technique to do your cross-site correlation without being able to unmask the individual who is using multiple sites.
That way you don't have to trust your users, customers, sysadmins, big-data people, LEO, OR creditors. Keep it as secret sauce, or even better, drop an open-source library on github to advance the state of privacy. I would like to be able to ask vendors, "why AREN'T you protecting users' privacy this way?".
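Short of full homomorphic encryption, a much simpler building block illustrates the weaker goal of correlating activity across sites without storing raw identifiers: a keyed hash. To be clear, this is pseudonymization, not the cryptographic approach the parent proposes, and the key below is a placeholder:

```python
import hashlib
import hmac

SECRET_KEY = b"rotate-me-regularly"  # hypothetical; keep in an HSM in practice

def pseudonymize(identifier, key=SECRET_KEY):
    """Replace a raw identifier (IP, device fingerprint, ...) with a
    keyed hash. The same identifier always maps to the same pseudonym,
    so cross-site correlation still works for fraud scoring, but
    without the key the pseudonym cannot be matched back to guesses
    of the original identifier."""
    return hmac.new(key, identifier.encode(), hashlib.sha256).hexdigest()
```

The key is exactly the trust boundary the parent is worried about, though: whoever holds it (sysadmins, acquirers, creditors) can re-link pseudonyms, which is what the homomorphic approach aims to avoid.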
Yep. Also: sale, acquisition, merger, as well as government requests for data, and third party access. Speaking from experience selling a company, it’s difficult to plan for unknown eventualities, and even more difficult to keep any promises about what happens to data you have. The only effective way I know of to guarantee data you have doesn’t get shared is to delete it.
IANAL and I don't claim to understand any of this well, but I would naively assume that if Company A collected data under a binding legal agreement that they can only use it for X, then they go bankrupt, that shouldn't give Company B the ability to buy the data as a "liquidation asset" then do anything they feel like with it. Shouldn't the binding restrictions "move" with the data?
This depends on how the company is liquidated/sold. In the cases I mentioned of sale or acquisition, often the corporate entity remains in existence through the transition, so the effect is that nothing changes wrt the binding legal agreement, but a large group of new people gain access to the data. Also while legal agreements are binding, they can usually be changed, it takes some careful planning to prevent a contract from being changeable by the new owner of the contract. Think about the question of who owns the collected data in the first place. If the company owns it, and the investors own the company, the company might have a tough time getting investors to agree to waive their right to sell what they consider to be a valuable asset in the case of bankruptcy. If the company doesn't ask the investors, or can't get them to agree, then whatever they do has grounds for future legal challenge. It's all around better to delete any such data before anything changes hands.
I honestly don't think it's that simple, and in fact I suspect it gets harder the bigger the company. They could have every intention, even a plan and a working implementation today to keep any data they collect out of the hands of a buyer or the government, and still have a very hard time ensuring it when the time comes. It does sound like @pc is actively committed to it though.
I don't think anyone's playing dumb. I am completely speculating, yet absolutely certain, that they have actually considered future scenarios for collected data, and I believe that there are legitimate reasons to still need to discuss this and other scenarios that come up with a legal team every time. If it's not clear why this can happen, it will become clear if/when you run a company.
It's true that not collecting any data is a foolproof way to guarantee it doesn't get into the wrong hands, but that's tying both arms behind your back in the online world, and in this case it would mean choosing not to train any fraud-detecting neural networks. There could be an even bigger mob if Stripe knew how to prevent certain kinds of fraud and chose not to for ambiguous privacy reasons.
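To make the tradeoff concrete: the kind of behavioral signals discussed in this thread typically feed a scoring model that flags likely bots before a charge is attempted. The sketch below is purely illustrative; the signal names and weights are invented for this comment and have nothing to do with Stripe's actual model, which (per the cofounder's comment) is a full ML stack rather than a hand-weighted rule list.

```python
# Hypothetical sketch of signal-based fraud scoring.
# All signal names and weights here are made up for illustration;
# they are NOT Stripe's (or anyone's) real features or model.

def fraud_score(signals: dict) -> float:
    """Sum the weights of whichever risk signals fired."""
    weights = {
        "headless_browser": 0.6,     # automation fingerprint detected
        "ip_velocity": 0.3,          # many card attempts from one IP
        "mismatched_timezone": 0.1,  # browser vs. billing-address timezone
    }
    return sum(w for name, w in weights.items() if signals.get(name))

def should_block(signals: dict, threshold: float = 0.5) -> bool:
    """Block the attempt when the combined score crosses a threshold."""
    return fraud_score(signals) >= threshold
```

A real system would learn these weights from labeled transaction outcomes instead of hard-coding them, which is exactly why the training data in question gets collected in the first place.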
You got caught; spin it as much as you want. This has nothing to do with fraud. We know you got caught because of the speed of your response. This is called damage control; there's also a more scientific name for it: plausible deniability.
I am disgusted with this form of surveillance. I will make sure that I won't use Stripe in the future and force vendors to use something else.