Harvesting credit card numbers and passwords from websites (medium.com/david.gilbertson)
720 points by swyx on Jan 6, 2018 | 121 comments

One of the biggest problems here is that there is no “chain of custody” from Github source to uploaded NPM module; otherwise one of the developers using the malicious package could have audited the source code before including it in their own code. ‘npm publish’ would ideally insist on reproducible builds, enforce this by minifying or compiling packages itself, and finally encourage the community to always audit the code associated with a module. Of course, people are lazy, NPM has no incentive to incur that server and engineering overhead, and someone could sneak in code anyways with a minor version update... There’s no clear solution here, and I think the only thing keeping up this house of cards is that there are much easier ways for black hats to make money.

Even if you had that, no one is going to inspect all that code. npm is a cluster-farkle of insane amounts of packages.

The whole point of the article is you should implement CSP.

Sure but why not implement CSP and also get your packages from somewhere trustworthy and audit the code you actually run? Just because "most people are lazy" doesn't mean you have to be. "Audit the code you run" is still good advice right?

I have a fairly simple Node project at work; it pulls in nine runtime dependencies, plus 13 development-time dependencies (most of those are babel or eslint-related).

Assuming none of those are pulling shenanigans like mentioned in the article (distributing different code than in their source repositories, or deliberately obfuscating malicious code), it's not completely unreasonable for me to go through and audit my direct dependencies. But, since the Javascript standard lib is crap, all of my direct dependencies have their own large pile of dependencies, which themselves depend on a bunch of stuff, and so on.

By the time it's all said and done, my "simple" Node project pulls in several hundred dependencies (I didn't go through and count, but my 'yarn.lock' on that project has ~4200 lines). I can't audit all of that code.
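For anyone who does want to go through and count, here is a rough sketch. It assumes the yarn v1 lockfile layout, where each resolved package starts with an unindented header line ending in a colon; the `SAMPLE` text is made up for illustration, and other lockfile formats would need a different parser.

```python
def count_lockfile_entries(lock_text: str) -> int:
    """Count resolved package entries in a yarn v1 style lockfile (assumed format)."""
    count = 0
    for line in lock_text.splitlines():
        if not line or line[0].isspace() or line.startswith("#"):
            # Skip blank lines, indented detail lines, and comments.
            continue
        if line.rstrip().endswith(":"):
            # Unindented line ending in ":" is an entry header.
            count += 1
    return count


# Made-up sample in the assumed format: two entries, one with a nested block.
SAMPLE = '''# yarn lockfile v1

left-pad@^1.3.0:
  version "1.3.0"

"@babel/code-frame@^7.0.0", "@babel/code-frame@^7.8.3":
  version "7.8.3"
  dependencies:
    "@babel/highlight" "^7.8.3"
'''

print(count_lockfile_entries(SAMPLE))
```

Running it over a real `yarn.lock` (read with `open(...).read()`) gives the number of distinct resolved packages, which is usually far smaller than the line count but still well beyond what one person can audit.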

(This is particularly bad in Node and Javascript, but applies to other languages too. I don't think anyone's ever fully audited all of our Nuget dependencies, or Python dependencies... fortunately, those both tend to be more self-contained, so at least we know what we're getting there.)

> I have a fairly simple Node project at work; it pulls in nine runtime dependencies, plus 13 development-time dependencies (most of those are babel or eslint-related).

A well known German blog just claimed that creating a new skeleton project using @angular/cli results in 31 direct dependencies, almost a thousand dependencies in total and 300 MB of code.

That's just wow.

It's often impractical to audit the source code for all your 3rd party dependencies even if they are open source. When was the last time you or anyone you know reviewed every line of your web framework or DI framework? How about the dependencies of your dependencies? Many organizations don't even review all the code their developers write internally. Does your organization code review 100% of releases? Since inception? Like it or not, we're all placing a lot of trust in our dependencies.

He actually has a work-around to CSP. He updated his post.

> Our penetration testers would see it in their HTTP request monitoring tools!

> What hours do they work? My code doesn’t send anything between 7am and 7pm.

Which Time Zone? Hah!

(Not that this one nit pick takes away from the general very well made point of the article, I just love how TimeZone problems infect everything)

The browser’s? People do tend to keep their PC in the local timezone.

I wonder how much pen testing is done by hand and how much by automatic tools?

Anyway, I'm being pedantic. There are a lot of great points in this article.

Sure, the night build fails, then in the morning the engineer comes in, looks at it, can't replicate, marks as fixed.

That makes sense but then the article's claim of losing half the credit cards is only true if we assume even distribution of user activity which seems far fetched at best.

I originally wrote "about half" but there was one too many words in the sentence. Also I figure for an eCommerce site max-traffic might be 7-9pm or something so even though there's more sleeping happening outside 7am-7pm, there might be equal traffic. Anyway ... close enough.
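The "about half" estimate depends entirely on how traffic is spread over the day. A small sketch of the arithmetic, using invented hourly numbers (the evening peak below is a hypothetical shape, not data from any real site):

```python
def fraction_outside_window(hourly_traffic, start_hour=7, end_hour=19):
    """Share of traffic that falls outside [start_hour, end_hour) local time."""
    if len(hourly_traffic) != 24:
        raise ValueError("expected 24 hourly values")
    total = sum(hourly_traffic)
    if total == 0:
        return 0.0
    outside = sum(
        volume
        for hour, volume in enumerate(hourly_traffic)
        if not (start_hour <= hour < end_hour)
    )
    return outside / total


# Perfectly even traffic: exactly half falls outside 7am-7pm.
uniform = [1] * 24

# Hypothetical e-commerce shape: flat, with a peak from 7pm to 9pm.
evening_peak = [1] * 24
evening_peak[19] = 6
evening_peak[20] = 6

print(fraction_outside_window(uniform))
print(round(fraction_outside_window(evening_peak), 3))
```

With even traffic the answer is one half; with an evening peak the share captured outside working hours goes up, so "close enough" seems like a fair summary.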

Yeah it’s a total nitpick, the point is still made.

It's client-side code. So the browser's time zone.

I would think a good tool for pen testers could be one that runs all day, refreshing the same page occasionally, after clearing everything, and reports any requests that differ from one run to the next.

One could run the site's selenium tests every hour or so and then look at the network log (use a proxy while testing). It'd then be easy to catch any request that is not white listed by you.
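A minimal sketch of the allowlist check described above, assuming the proxy log has already been reduced to a plain list of request URLs. The host names are placeholders, and how the URLs get captured (proxy, browser automation, and so on) is left out.

```python
from urllib.parse import urlsplit


def unexpected_requests(request_urls, allowed_hosts):
    """Return the logged URLs whose host is not on the allowlist."""
    allowed = {host.lower() for host in allowed_hosts}
    flagged = []
    for url in request_urls:
        host = (urlsplit(url).hostname or "").lower()
        if host not in allowed:
            flagged.append(url)
    return flagged


# Placeholder data: two expected requests and one that should be flagged.
logged = [
    "https://shop.example.com/checkout",
    "https://cdn.example.com/app.js",
    "https://js-metrics.example.net/collect?d=abc",
]
print(unexpected_requests(logged, ["shop.example.com", "cdn.example.com"]))
```

Running this on a schedule around the clock matters more than the check itself, since the article's script stays quiet during working hours.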

Of course, we all do this. /sarcasm

I'm more of a back-end dev who doesn't know all the ins and outs of the actual software used - can someone explain to me why this is an npm problem, and not an excessive dependencies problem?

I thought npm was simply a package manager - I don't see anything in the article that is specific to npm, except he happens to say that word.

It's kind of an excessive dependencies problem, except exacerbated by two things. One is Javascript's poor stdlib. This means that not only are you tempted to include lots of little packages to do basic things, but so are all of the big packages that you include to do big things for you, and all of the packages they include, etc. Often there are a bunch of different packages for doing the same basic things, and nobody agrees on which one, so you may end up with 5 different packages that do the same thing required by various packages you use.

Two is that much of it is expected to be served to the browser, so it's minified. Who audits that the minified code is actually the same as the published Github code?

At least in Ruby and Python, the code from Rubygems/Pip should exactly match that version on Github. Not that anyone necessarily audits that either, but at least it's easier.

In Ruby, the exact same problem applies since you can publish gems in one version and tag them in another - in the very same fashion as the article describes.

What makes npm particularly bad is that JS has such a teribad stdlib. When you need to write your own lpad() at some point, it's easier to include a stupid little package that has dealt with all the edge cases you don't want to care about. So you end up with way more third party deps than a kitchen sink type language.

Otherwise, yes this is a fundamental dependency issue.

could we actually take a snapshot of npm’s top used but relatively inert libraries every 6 months or so and freeze them? (call it UserlandJS 2018a, 2018b, etc) and then have a separate dependency manager that only downloads those frozen libraries we include. Userland Package Manager or something. this would approach a stdlib without much effort and we would have more of a chance to catch up on malicious security stuff since there are commonly agreed upon frozen versions that everyone can pore over.

im a total noob at security, please attack/modify this idea if it has any value?

but that has nothing to do with npm.

that's the same for any javascript package manager - yarn or bower would be the same.

npmjs.com is the repository both yarn and npm use—this is about npmjs.com the repository, not npm the package manager per se.

that makes it a lot clearer, thanks!

You could arguably come up with something similar for backends. It's just a story of a malicious and sneaky dependency.


I question all these comments by people singling out npm as the root of the problem. doesn't sound to me like they fully understand the issue.

Nobody is singling out NPM. The issue is the current JavaScript ecosystem and that happens to be provided through NPM, so you can use "NPM" as a shorthand for "the way JavaScript developers these days use NPM to depend on a gazillion trivial packages with totally unknown provenance".

why not just say "excessive dependencies" and be correct, succinct and perfectly clear?

The first problem is excessive dependencies. Just one word: Left-pad. That is a library on its own, for that one simple problem. The JS ecosystem encourages that for any problem, no matter how small it is, you import some existing lib. So at the end of the day you have 100+ dependencies on a project that is only slightly larger than a "hello world" page.

The next problem is that npmjs is "free for all". Imagine anyone could easily add new packages to the debian repo. Sure, I don't expect the debian folks to audit every new addition or update to existing packages, but there is at least some chain of trust, plus there is some incentive to not have malware ridden packages in your distro because it hurts your credibility. A distro is both, an infrastructure and content provider. npmjs is just the infrastructure, so it takes some more messups for people to consider moving away from it.

Because that is the symptom. The problem is that the ecosystem encourages excessive dependencies. You can say "the way the ecosystem encourages huge dependency trees" if you like.

npm isn't the root of the problem, it's just a particularly popular example of it. Sort of like how Intel isn't really the root of that other problem, just the implementer of speculative execution with the biggest market share.

It's not specific to npm. It's a problem that generally applies to all of these package managers that introduce code that you have not personally reviewed into your site.

I think the important part is that it’s a package manager commonly used for front end development, so the malicious code runs directly in the browser. It could have been bower or something else but npm is probably the most popular choice these days.

The root of the problem is that a dependency can be downloaded and installed from only its minified/obfuscated form, and without any verification that the code matches what is in the non-minified/obfuscated codebase. This is exploited because people are dependency-happy and no one really verifies that a package isn't doing more than what is advertised.

This same problem would exist if any server-side dependency repositories allow for code to be delivered in a pre-compiled form without any verification, similar to npmjs.
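As a rough illustration of what such a verification step could look like, the sketch below hashes every file in two directories and reports the differences. It assumes you have already unpacked the published package and checked out (and, if needed, built) the matching source revision yourself; the function names are made up for this example, and it only helps when the build is reproducible.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's relative path to the SHA-256 of its contents."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_trees(source_dir, package_dir):
    """Compare a source checkout against an unpacked published package."""
    src = tree_digests(source_dir)
    pkg = tree_digests(package_dir)
    return {
        "only_in_package": sorted(set(pkg) - set(src)),
        "only_in_source": sorted(set(src) - set(pkg)),
        "changed": sorted(name for name in set(src) & set(pkg) if src[name] != pkg[name]),
    }
```

Anything listed under `only_in_package` or `changed` is code that ships to users but cannot be explained by the public repository, which is exactly the gap the article relies on.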

I guess it's not npm per se, but the JS crowdsourcing factor multiplies the attack surface by large powers of ten. In other languages, having the security libs centralized and vetted "might" (you'd need a decent amount of work to ensure that it is safe) avoid getting MITM'd so easily.

The best part is when you get caught, you can just play dumb and say your npm credentials were compromised (assuming precautions were taken in not using a collection domain tied to you).

Worse than that, I think. You can claim the npm account that published the package has zero relation to your github page that hosts the untainted source.

The amount of times I've gone to npmjs.com, looked for the "official" version of a package only to get lost in a web of conflicting version and ownership "signals" is staggering. And yet... if it's on npmjs.com, then it's going into your package.json. No one audits that thing. Unless there is an obvious bloat issue. And dependencies of dependencies? NEVER. We have something like 400MB of shit in our node_modules folder. There is no human on earth that could go through that. This is a few times the size of the entire Linux kernel, for context. And it's minified.

In this example situation, the author states:

> I’ve now made several hundred PRs (various user accounts, no, none of them as “David Gilbertson”).


> I go through all the passwords and credit card numbers I’ve collected and bundle them up to be sold on the dark web.

So if you get caught, you don't actually exist and you've already sold it all without any traces.

Couldn't one circumvent CSP by sending the data to a legitimate analytics service that everyone uses like Google Analytics?

This is why, when I recently built a credit card form, we didn't include any trackers or third-party code (beyond our vendor's). No dependencies at all—our vendor also has no dependencies in the JS we use to implement the CC form.

That did mean no jQuery, no Google Analytics, no NPM modules and we had to build it as a standalone page outside our React setup, but it's worth it to be able to definitively inspect every line of code and provide a CSP that locks that page down tight to just our subdomain and our vendor's.
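A locked-down policy of that kind might be assembled roughly like this. The host names are placeholders for "our subdomain" and "our vendor", and the exact set of directives a real payment page needs is an assumption; the helper just joins directives into a header value.

```python
def build_csp(directives):
    """Join a {directive: [sources]} mapping into a Content-Security-Policy value."""
    parts = []
    for name, sources in directives.items():
        parts.append(" ".join([name, *sources]))
    return "; ".join(parts)


# Placeholder hosts; a real page would list its own origin and its vendor's.
policy = build_csp({
    "default-src": ["'none'"],
    "script-src": ["'self'", "https://js.vendor.example"],
    "connect-src": ["'self'", "https://api.vendor.example"],
    # form-action does not fall back to default-src, so it is set explicitly.
    "form-action": ["'self'"],
})
print(policy)
```

The resulting string would be sent as the `Content-Security-Policy` response header for the credit card page only, leaving the rest of the site free to use a looser policy.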

That's the same as protecting just the login page with https. What prevents code outside the form page from replacing the link which goes to the form page?

Exactly my thought. I guess that would mean it would need some intimate knowledge of the target site, thus reducing the scalability of the attack. But then again, there are some clever people out there who could automate it.

Why not host a validated version of jQuery on your server instead of relying on a CDN?

Thanks for this. I've added a note to the post. "...consider having dedicated, lightweight pages for login and credit card collection that don’t ship any third party code (npm packages, advertising, analytics, GTM, etc.)"

Here's hoping you do the same for your login form! :D


very interesting!

How would you exfiltrate that data?

Google Analytics has a query parameter that can be used to extract arbitrary data. So make a request to that with an ID that you control I guess? GitHub mentions it in their CSP post: https://githubengineering.com/githubs-post-csp-journey/

Related: this Analytics attack has actually happened


So it seems developers are in fact responsible for dependencies that they use... Who would've thought...

I was going to say the following:

    Except that being responsible for your dependencies (and the dependencies of your dependencies...) is impossibly hard. You would need to build everything yourself after auditing the code.

But then I thought about it some more and it's likely that you don’t need to audit the code, since the malware probably isn’t in the public git repo. Yes, it's still a risk, but the probability of malware is much lower (and at least you CAN audit the code if you wanted to).

You still need to get the source for all your dependencies and all of their dependencies and so on and build it yourself, but you should probably do that anyway and host the artifacts in your own private repo. That’s good practice and avoids issues like the left-pad thing.
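If you do mirror artifacts privately, one cheap safeguard is to pin a hash of each artifact and refuse anything that does not match. The sketch below uses the Subresource Integrity string form (`sha512-` followed by base64), which is the shape npm lockfiles use for their `integrity` field; the helper names and the sample bytes are made up for illustration.

```python
import base64
import hashlib


def sri_sha512(data: bytes) -> str:
    """Return an SRI-style integrity string for the given bytes."""
    digest = hashlib.sha512(data).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


def matches_pinned(data: bytes, pinned_integrity: str) -> bool:
    """Check a downloaded artifact against the integrity value pinned earlier."""
    return sri_sha512(data) == pinned_integrity


# Stand-in for a tarball's bytes; a real check would read the file from disk.
artifact = b"contents of some-package-1.0.0.tgz"
pinned = sri_sha512(artifact)

print(matches_pinned(artifact, pinned))
print(matches_pinned(artifact + b"tampered", pinned))
```

This only proves the artifact has not changed since it was pinned. It says nothing about whether the pinned version was trustworthy in the first place, so it complements building from source rather than replacing it.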

It's only impossibly hard to audit this kind of thing if you have an insanely large and deep tree of transitive dependencies in the first place. This seems to be a particularly bad problem in the JS world, for this and many other reasons, but most programming languages and their communities don't work that way. Auditing a small number of larger dependencies, when most of them are probably widely used and from reasonably trustworthy sources, is much more achievable.

Ha, excellent timing, with another related story at the top of HN: https://news.ycombinator.com/item?id=16087024

I also think that with tree shaking and last-mile minification being more common these days, auditing might not be as daunting as it seems.

The other languages have the same problem. It's only spread over different places.

For example, JS doesn't have a standard library. Many languages do. So when JS downloads loads of transitive dependencies, many are there because there's no stdlib.

Have you audited your C++, or Java, or Ruby, or... stdlib lately? Especially those that come preinstalled with your OS of choice?

Same goes for anything that you download via Maven, or gems, or easy_install, or include as direct GitHub references in your Go code.

Those standard libraries are typically widely used. In many cases, the source can be examined. In most cases, there's a large professional team at a reasonably reputable organisation responsible for maintaining them. In almost all cases, there are no transitive dependencies not managed by the same people. Once installed on your system, they generally don't change unless you actively change them. While there is some risk in any dependency (cf. Reflections on Trusting Trust) the level of risk is on an entirely different scale in this sort of situation compared to what much of the JS world does every day.

> In many cases, the source can be examined.

Have you examined them, though? ;) And yes, I was thinking about Reflections on Trusting Trust as well :)

> In almost all cases, there are no transitive dependencies not managed by the same people.

That's why I said the risk is spread very differently compared to JS :)

And anyway, what dependencies does a webpage for processing credit cards need?

Very little or none. But it’s easy to get carried away in the client with things like card number validation, detecting the card type based on number, date drop down picker for the expiry date, animation library for showing when it’s processing, etc. Let alone leaving in the standard includes like Google Analytics, tracking URLs, A/B testing tools, etc.

What if the author snuck his code into frontend modules, and also snuck his code into backend modules?

If CSP is enabled, the frontend checks to see if the backend code has opened up the particular port or route on the backend.

The back-end code could sniff through require.cache to see if he could hook into the existing server instance ( same port ), or open a new port ( depending on CSP ).

I suppose the CSP equivalent on the backend is some sort of firewall. I also suppose servers have better monitoring of requests. Still, this method would circumvent CSP!!!

Also, hooking into the existing server instance would throw red flags if the instance was ever console.logged. You might also do an audit of your ports and see a suspicious one opened in that method. And a firewall likely would block other ports. But, still, it's feasible even with a CSP, until discovered.

Author here. Someone in a comment pointed out that if you could get your code into express middleware (or something depended on by express middleware or similar) you could potentially alter any CSP header in the responses (if it's set in middleware before yours in the chain).

Indeed. Then the middleware could also inject exfiltration JavaScript in `text/html` or `application/javascript` responses, which would work even if the app doesn’t use npm modules on the frontend.

This applies to almost any backend web framework and package manager, but the culture of micro packages in npm lends itself well to this attack.

Clearly what we need is cryptographically-signed JavaScript and CSP pinning.

(I’m only half joking)

EDIT: oh, CSP pinning is actually a thing that’s been proposed https://www.w3.org/TR/csp-pinning/

Actually, a main question is this: can npm modules read require.cache? If so, why? Can they require code outside of the npm modules folder? Again, why? Couldn't any npm module steal your credentials, and suffer from the same source minification/github custody stuff?

Take an npm module. It does setInterval and every minute, it checks require.cache for any file with 'config' in it, and sends it to the crook. Is this possible currently?

I would think npm modules should be require-sandboxed to their own directory. Even then, there is compiled code, etc. Oy vey.

You don’t need to do such complex things. You can just `require('child_process').execSync('any shell command')` with all the user credentials.

He'd have to somehow make outbound requests from the server. IIRC, the default AWS VPC config would prevent this. Not sure about other cloud environments.

Where I work, outbound requests must be made through proxy servers which have a whitelisted set of allowed domains, which is only allowed after a security review.

The AWS default VPC config does not block outbound traffic. Neither the security group nor the ACL does. And of the half dozen AWS hosted startups I’ve worked for, nobody restricts outbound yet. “Security nice to have, not prioritized yet”

You can exfiltrate data a number of ways though. DNS requests might be one obvious method.

AWS has GuardDuty now which can monitor for this type of traffic for pretty cheap.

Well-written, educational, incites action. Thank you.

I'm sometimes frustrated by uMatrix blocking every request to third-party websites and forcing me to accept each of them when needed. But when I see exploits like this I'm happy that I haven't uninstalled it :D. You still have to be vigilant when whitelisting requests in the uMatrix panel though.

Note that uMatrix doesn't protect you (by default) from sending data out.

It also depends on the page you are visiting. If it's just some random post on net I actually don't care much if they do manage to post data elsewhere (the risk in that case is actually higher if I allow ajax.googleapis.com than some random non-google page), but with banking sites and credit card forms you need to be careful.

The trouble with the approach in the article is that by adding an npm dependency, it may as well be the script served by 1st party, buried in their react monster.

Seems particularly prescient given https://news.ycombinator.com/item?id=16087024

Exactly, most especially after seeing this comment https://news.ycombinator.com/item?id=16087079

If this prompts you to action, and you need a quick and efficient way to build a CSP policy for the various services you use: https://www.npmjs.com/package/csp-by-api

A bit ironic, given the article is about all the ways you could get screwed by NPM packages if you don't have a CSP.

The source code is thankfully pretty tiny and you're welcome to inspect what yarn pulls down to make sure I haven't trojaned it!

    Amazon has no CSP at all, nor does eBay.

This is a problem for the PHP composer ecosystem as well. I took a look at how many vendor packages our core has and it's about 20...

The difference with composer is that in most cases you're getting the exact version from github, including the full .git directory. No minification, no "the tarball doesn't match the source". And most composer packages have 0-5 dependencies, whereas most npm packages seem to have 10-20.

But too-many-dependencies is a problem there too.

not exactly, actually. composer allows you to pull "dist" version - which is an archive that can be whatever really. you can of course set "prefer-source", but I think theoretically you can have a package without source provided at all.

A perfect example of this is the Hot Pockets deal.

Some guy made a cookie session middleware called node-yummy, which eventually became a dependency of Express. Express has a bajillion downloads, so yummy up and brokered a deal by which on every install they tweeted a like for Hot Pockets. So, Hot Pockets really started soaring, with no one having any idea what they were tweeting by installing and updating Express, until someone ruined all the fun by posting it on Medium, and getting picked up by HN [2]

IMO the really unacceptable part is that the open source projects are not being pulled from Github. When something claims to be open source, we should have a guarantee from NPM that we can see the source. The current setup implies the opposite.

[1] https://github.com/defunctzombie/node-yummy/issues/7

[2] https://medium.com/friendship-dot-js/i-peeked-into-my-node-m...

Reminds me of Writing Worms for Fun and Profit by lcamtuf:


I recently had a problem with a package that doesn't have its npm releases tagged in git. It surprised me how hard it was to figure out which git revisions corresponded to specific npm releases. I had tracked down the npm version of the package that introduced the bug I was trying to fix, but without getting in and doing some real diffing, I couldn't figure out which commit introduced the bug.

At the time, I'd never added anything to npm, but since then, I have. It dawns on me that npm versions aren't tied to git revisions at all!

Yep, only authors who are good about using `npm version` and pushing the resulting git tag to origin are going to have the proper linkage.
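A quick way to see how good a given author has been about this is to compare the published version list with the repository's tags. The sketch assumes tags follow the `v1.2.3` form that `npm version` creates by default, and that both lists have already been fetched (for example from the registry and from `git tag --list`); the sample data is invented.

```python
def untagged_versions(published_versions, git_tags, prefix="v"):
    """Return published versions that have no matching git tag."""
    tags = set(git_tags)
    return [version for version in published_versions if prefix + version not in tags]


# Invented example: 1.1.0 was published but never tagged.
published = ["1.0.0", "1.0.1", "1.1.0"]
tags = ["v1.0.0", "v1.0.1"]
print(untagged_versions(published, tags))
```

A version that shows up here is not necessarily malicious, but it is one where the link between the registry and the repository has to be reconstructed by diffing.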

There is this trade-off between usability and security. For example, being able to load data from other domains: now, with the same-origin policy, we have to fetch the data server side. One nice thing about web apps is that they do not require a server to work. But due to XSS injections we can't have nice things.

It's almost like the browser is being abused to do something it fundamentally should not be doing. If you need to do that much client side heavy lifting maybe a web browser is not the place for it?

The ship has probably sailed on that line of thinking but in my opinion a lot of the pain we experience in web security today comes from people trying to do things they really should not be doing.

There's this vision for the web that we should be able to access public "data", not only web sites "guarding" it.

Maybe you and I have different definitions of Internet and web then. You can still access data without javascript or webapps.

Accessing data either requires identification to a database or you access the data over the web/http. Do you have any examples where you can access public data without logging in? I can only think of FTP servers that let you access anonymously ...

Good point. I’ll talk to some guys and we’ll turn off the World Wide Web tomorrow. Fun experiment but there were a few security bugs (fixable, but why spend the time?) and, most importantly, @mulmen didn’t like it too much.

That's really not my suggestion.

If you need to write a full application why does it have to happen in a browser? Why can't we use the model so successfully employed on mobile devices?

A security hole in a web app is only confined to that app/domain, whereas a security hole in a native app can be much more detrimental. It's also less work to make a web app if you want it to work on more than one platform. Making a web app is also a better dev experience, at least for GUI apps.

I know it's just trendy to hate on node/npm, but these are really turning into bad products and services. You need usability before there's a trade-off. Reputation is still important.

while this post is about the more general attack vector, it's worth pointing out that if you use a modern credit card tool like Stripe Elements it is impossible to steal credit card numbers as they are embedded in an iframe and your page just gets a token that is meaningless to anyone else

The attacker could inject code that replaces the Stripe integration with something that looks identical to the end-user but harvests the CCs.

True, but this is easily detected. It’d have to provide tokens to the page, which would fail 100% of the time when sent to the Stripe API. These types of attacks are really only effective when difficult to detect, and therefore allowed to run for more than a very short time

The article addressed that objection, though.

1. Only activate on sessions where it’s never activated before.

2. Only activate one in, say, seven times.

3. Sniff form once, fail to submit, show user "unexpected error, try again", allow normal behavior, success, user moves on

4. pull the numbers

It’s nice to imagine we follow up every single JS or user reported “it fails sometimes?” error, but a clever malicious script author knows we don’t check 100% of the time.

We are required to expect transient and potential errors from third party services, not even Stripe has 100% uptime. This kind of theft looks like a transient and unexplainable error.
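To put a rough number on why sampling makes this hard to catch: if the script fires independently on a fraction of page loads, the chance that a batch of test runs never sees it is simple arithmetic. This is a simplified model that ignores the other conditions in the list above (such as only firing once per session).

```python
def chance_all_runs_miss(activation_rate: float, runs: int) -> float:
    """Probability that none of `runs` independent checks observes the script."""
    if not 0.0 <= activation_rate <= 1.0:
        raise ValueError("activation_rate must be between 0 and 1")
    return (1.0 - activation_rate) ** runs


# With a one-in-seven activation rate, show how the miss chance shrinks.
for runs in (1, 10, 30):
    print(runs, round(chance_all_runs_miss(1 / 7, runs), 3))
```

Under this model ten manual checks still miss it roughly one time in five, which fits the point that a single "can't reproduce" is not much evidence of anything.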

The Stripe frames are set up by JS. What’s stopping an attacker from wrapping it in their own form, collecting, and having it still submit the details to Stripe?

The attacker's code isn't hosted on Stripe's domain. It could render the real Stripe iframe, but could not interact with it (ie. populate form values and submit).

An attacker could try to POST the values directly to Stripe instead, but Stripe presumably uses CSRF-prevention techniques (eg. a token in the form) to stop this as well.

Easy Solution: on login and payment forms, go to a plain HTML page, use a standard form to submit without JS (or only JS you write). Forget client-side validation and/or write it yourself.

Half the internet could go static HTML with CSS animations, without any JS, and do basic analytics from server logs. The rest could do WebAssembly, compiled from another language, without the npm dependency hell.

I've gotta post the obligatory Ken Thompson hacking the compiler story


Wow, that's a rough read. Could something similar be done to Django sites?

Yes. There are basically three steps here...

1. Use social engineering to get your package included as a dependency;

2. Use obfuscation techniques to hide the real intent of the package;

3. Capture data and send it off to a remote server.

First can be pulled off, but really depends on the community / maintainers doing their job right. In case of Django, I imagine that would be pretty hard - while the codebase is giant, by itself it has no other dependencies (except stdlib and pytz). Then 3 could be either much easier or much harder on the backend, depending on how well the box is secured (e.g. outgoing firewall rules).

However the impact could be much more severe, since you're executing code on the server - at the very least you can inject any malicious JS you like, rewrite the CSP headers (if present), just dump the entire DB right away, and a lot of other bad things.

This basic recipe has a chance to work regardless of the target language/package manager. It's all up to your dev process and security regime to catch it.

One advantage Python might have is that the dependencies you use are human readable, they're not compiled, minified or obfuscated. This in theory would make auditing dependencies easier, but I imagine 99% of the time people aren't thoroughly auditing.


Yeah, and Call Manager has a bug that lets incoming callers transfer themselves to intercom. Keep an eye on that Cisco phone on your doctor's desk while he discusses your HIV test results with you. I could be recording it.

What could be the npm package the author of this article is referring to?

"This post is entirely fictional, but altogether plausible, and I hope at least a little educational.

Although this is all made up, it worries me that none of this is hard."

The article is fictional - but designed to sound plausible in order to make web developers think about a problem.

Literally any of them, that's the point.

The author mentions that Chrome Extensions are a bad distribution method. I think he is wrong. First, there are more users of Chrome Extensions than the users of npm, second, most of them don't care what those extensions send over the network. And I guess CSP doesn't apply to browser extensions.

So if you want to steal passwords, make some extension like "Mp3 Youtube Downloader" or "Ad Blocker" and get access to millions of happy users' browsers.

CSP looks like an ugly hack rather than a good solution. Why would you need to specify allowed sources for scripts if you control your HTML code? I don't understand why the author praises it. What a stupid time-wasting technology.

I think you misunderstand the role of NPM in this concept. Using NPM ensures that web sites distribute the code to their users. So it doesn't matter if there's only a few hits using NPM, if they are good hits. As the article mentions, neither Amazon nor eBay are protecting against this. If they somehow added the package to their app, that provides access to every credit card added to those sites. That is much broader than users installing Chrome extensions.

> CSP looks like an ugly hack rather than a good solution ... I don’t understand why the author praises it.

Because it appears to be the ONLY solution to the problem described, unless you never use any third party code that you didn’t build from source and audit.

you don't always control your HTML, due to XSS and CSRF.
