How to take over the computer of a Maven Central user (ontoillogical.com)
405 points by akerl_ on July 28, 2014 | 127 comments



At Open Whisper Systems, we wrote a small open source gradle plugin called "gradle-witness" for this reason. Not just because dependencies could be transported over an insecure channel, but also because dependencies could be compromised if the gradle/maven repository were compromised:

https://github.com/whispersystems/gradle-witness

It allows you to "pin" dependencies by specifying the sha256sum of the jar you're expecting.
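The pinning idea can be sketched in a few lines. This is not gradle-witness's actual code, only a minimal illustration of the concept in Python; the names `sha256_hex`, `verify_pinned` and the `pins` mapping are made up for the example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of the artifact bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_pinned(name: str, data: bytes, pins: dict) -> None:
    """Raise ValueError unless `data` matches the checksum pinned for `name`.

    `pins` is a hypothetical mapping of artifact name -> expected hex digest,
    kept in the project's own source tree rather than fetched from the
    repository that serves the artifact.
    """
    expected = pins.get(name)
    if expected is None:
        # An unpinned artifact is treated as a failure, not silently accepted.
        raise ValueError(f"no pinned checksum for {name}")
    actual = sha256_hex(data)
    if actual != expected:
        raise ValueError(f"checksum mismatch for {name}: got {actual}")
```

Because the expected digest lives alongside the build script, a replaced jar fails the check no matter whether the transport or the repository itself was compromised.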


This!

I am totally happy donating $10 to Whisper Systems for this work instead of being forced to donate $10 to the Apache Foundation (although a worthy cause) to be able to get https access to Maven Central.


Thanks! You can support ongoing development by sending donations to our BitHub. Anyone who commits to an Open Whisper Systems repository will get 2% of your donation:

https://whispersystems.org/blog/bithub/

The dashboard:

http://bithub.whispersystems.org/


That's interesting. Doesn't Maven offer a checksum option to begin with? I feel like just the other day, someone was saying "who ever uses checksums anyway," but a package manager run over insecure methods seems like the perfect time to do so.


From the article:

> When JARs are downloaded from Maven Central, they go over HTTP, so a man in the middle proxy can replace them at will. It’s possible to sign jars, but in my experimentation with standard tools, these signatures aren’t checked. The only other verification is a SHA1 sum, which is also sent over HTTP.


I agree that they aren't checked by default; you'd need to implement it (as the parent commenter seemed to be doing with their gradle plugin). Regarding the sums being sent over HTTP as well, it seems that you'd need a checksum for your checksum. Ultimately, though, it just seems that it'd be best to avoid this while it's over HTTP.


For bitcoinj we developed a similar plugin for Maven:

https://github.com/gary-rowe/BitcoinjEnforcerRules


Hi moxie, might I ask if you've considered SHA-384 instead?

If I understand correctly, SHA-256 is part of the SHA-2 family of hash algorithms, and like SHA-1, when used alone it is subject to length extension attacks.

SHA-384 is also a member of the SHA-2 algorithm family, but is immune to length extension attacks because it runs with an internal state size of 512 bits -- by emitting fewer bits than its total internal state, length extensions are ruled out. (Wikipedia has a great table for clarifying all these confusing names and families of hashes: [5].) Other hashes like BLAKE-2 [1], though young, also promise built-in immunity to length-extension attacks. mdm [2] is immune to this because the relevant git datastructures all include either explicit field lengths as a prefix, or are sorted lists with null terminators, both of which defuse length extension attacks by virtue of breaking their data format if extended.

Not that it's by any means easy to find a SHA-256 collision at present; but should collisions be found in the future, a length extension attack will increase the leverage for using those collisions to produce binaries that slip past this verification. An md5 Collision Demo[3] by Peter Selinger is my favourite site for concretely demonstrating what this looks like (though I think this[4] publication by CITS mentions the relationship to length extension more explicitly).

(I probably don't need to lecture to you of all people about length extensions :) but it's a subject I just recently refreshed myself on, and I wanted to try to leave a decent explanation here for unfamiliar readers.)

--

I'm also curious how you handled management of checksums for transitive dependencies. I recall we talked about this subject in private back in April, and one of the concerns you had with mdm was the challenge of integrating it with existing concepts of "artifacts" from the maven/gradle/etc world -- though there is an automatic importer from maven now, mdm still requires explicitly specifying every dependency.

Have you found ways to insulate gradle downloading updates to plugins or components of itself?

What happens when a dependency adds new transitive dependencies? I guess that's not a threat during normal rebuilds, since specifying hashes ahead of time already essentially forbids loosely semver-ish resolution of dependencies at every rebuild, but if it does happen during an upgrade, does gradle-witness hook into gradle deeply enough that it can generate warnings for new dependencies that aren't watched?

This plugin looks like a great hybrid approach that keeps what you like from gradle while starting to layer on "pinning" integrity checks. I'll recommend it to colleagues building their software with gradle.

P.S. is the license on gradle-witness such that I can fork or use the code as inspiration for writing an mdm+gradle binding plugin? I'm not sure if it makes more sense to produce a gradle plugin, or to just add transitive resolution tools to mdm so it can do first-time setup like gradle-witness does on its own, but I'm looking at it!

--

Edited: to also link the wikipedia chart of hash families.

--

[1] https://blake2.net/

[2] https://github.com/polydawn/mdm/

[3] http://www.mscs.dal.ca/~selinger/md5collision/

[4] http://web.archive.org/web/20071226014140/http://www.cits.ru...

[5] https://en.wikipedia.org/wiki/SHA-512#Comparison_of_SHA_func...


Re: Length extensions

Tldr: The length extension property of the Sha2 family has nothing to do with collisions. If you are afraid of future cryptanalytic breakthroughs regarding the collision resistance of Sha2, use the concatenation of SHA-256 and SHA3-256.

You are mightily confused about the connection between length extensions and collisions. (Bear with me, I know you are already familiar with length extensions, but I need to introduce the notation to explain your misunderstanding. Also it is nice to introduce the other readers to the topic.) All cryptographic hash functions in actual use are iterative hash functions. That is, the message is first padded to a multiple of the block size and then cut into blocks b_1,...,b_n. The core of the hash function is a compression function c (or maybe we should better call it a mix-in function if we also have the Sha3 winner Keccak in mind) that takes the internal state i_k and a block b_(k+1) and outputs the new internal state i_(k+1): i_(k+1)=c(i_k,b_(k+1)). The hash function has a pre-defined initial state i_0. Finally, there is a finalization function f that takes the last internal state i_n (the state after processing the last message block) and outputs the result of the hash function. MD5, SHA1 and the untruncated members of the SHA2 family (SHA-256 and SHA-512) have the problem that the finalization step is just the identity function, i.e. i_n is directly used as output of the hash function. Thus if you take the output you can continue the hash chain directly and calculate the hash of a longer message b_1,...,b_n,b_(n+1),...,b_(n+m).

Normally this is not that interesting (you could have just calculated the hash of the long message directly and avoided problems with padding that I'm ignoring to keep it simple). But in some situations you might not actually know b_1,...,b_n. For example, suppose you calculate an authentication tag as t=h(k||m), where k is a secret key you share with the API and m is the message you want to authenticate. As t is published, an attacker can pick up the internal state i_n=t and calculate the authentication tag of an extension: continuing the hash chain from i_n over the blocks of m2 yields t2=h(k||m||m2), which authenticates m||m2. Flickr had this problem with its API: http://vnhacker.blogspot.de/2009/09/flickrs-api-signature-fo... . If you want to do authentication tags right, use HMAC: https://en.wikipedia.org/wiki/Hash-based_message_authenticat...
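To make the extension concrete, here is a toy sketch in Python. It is not real SHA-256: `compress`, `toy_hash` and `extend` are made-up names, the compression function is simply built from SHA-256 for convenience, and padding is ignored just as in the explanation above. The only point is that an identity finalization lets anyone resume the chain from a published tag.

```python
import hashlib
import hmac

INITIAL_STATE = bytes(32)  # i_0: a fixed, public starting state


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function c(i_k, b_(k+1)) -> i_(k+1)."""
    return hashlib.sha256(state + block).digest()


def toy_hash(blocks, state: bytes = INITIAL_STATE) -> bytes:
    """Iterate c over the blocks and emit the final state unchanged
    (identity finalization, which is what enables the attack)."""
    for block in blocks:
        state = compress(state, block)
    return state


def extend(known_tag: bytes, extra_blocks) -> bytes:
    """Attacker's view: resume the chain from a published tag without the key."""
    return toy_hash(extra_blocks, state=known_tag)


def hmac_tag(key: bytes, message: bytes) -> bytes:
    """The standard fix: HMAC, here via Python's hmac module with SHA-256."""
    return hmac.new(key, message, hashlib.sha256).digest()


secret_key = b"k" * 16            # known only to the API and its client
message = b"user=alice"
extension = b"&admin=true"

tag = toy_hash([secret_key, message])   # t = h(k || m), published
forged = extend(tag, [extension])       # computed without secret_key
assert forged == toy_hash([secret_key, message, extension])
```

The forged value equals the tag the server would compute for the extended message, even though the attacker never saw `secret_key`. With `hmac_tag` the published tag is not a resumable internal state, so the same trick does not apply.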

Now a collision describes the case where two messages b_1,...,b_n and d_1,...,d_m produce the same hash, i.e. the output of the finalization function is identical. SHA2 does not have a finalization function, and I would not believe that collisions in SHA3 would only be possible in the finalization step (the squeezing step in the terminology of the Keccak sponge function), if they are possible at all, so I'll ignore finalization as a source of collisions for the moment. Thus a collision means that after processing the k blocks b_1,...,b_k and the l blocks d_1,...,d_l the internal state is identical. As the hash function is iterative, we can continue both block lists with identical blocks e_1,...,e_j and still have the same internal state. As the state is identical, it does not matter which finalization function we apply: the output is identical, and b_1,...,b_k,e_1,...,e_j and d_1,...,d_l,e_1,...,e_j hash to the same value. Thus whenever you have one collision in the internal state you can always produce infinitely many colliding messages. This property is independent of the finalization function, and blake2, Sha-384, Skein as well as Sha3 allow such attacks (once one collision is found). If anything, a fancy finalization function can make things worse by mapping two different internal states to the same output. But because of the problems with length extension attacks (see the API signature example above), a one-way finalization function is a standard requirement for hash functions nowadays.
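The suffix argument can be checked on a toy iterated hash. The sketch below is an illustration under stated assumptions, not a real hash function: the internal state is deliberately shrunk to 16 bits so that a collision can be found by brute force, and `compress`, `final_state`, `finalize` and `find_colliding_blocks` are hypothetical names.

```python
import hashlib
from itertools import count

STATE_BYTES = 2                      # tiny 16-bit state so collisions are cheap
INITIAL_STATE = bytes(STATE_BYTES)   # i_0


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function with a truncated (16-bit) output state."""
    return hashlib.sha256(state + block).digest()[:STATE_BYTES]


def final_state(blocks, state: bytes = INITIAL_STATE) -> bytes:
    """Internal state after processing all blocks."""
    for block in blocks:
        state = compress(state, block)
    return state


def finalize(state: bytes) -> bytes:
    """An arbitrary one-way finalization step; any function would do here."""
    return hashlib.sha256(b"final" + state).digest()


def find_colliding_blocks():
    """Brute-force two distinct single blocks that reach the same state."""
    seen = {}
    for n in count():
        block = n.to_bytes(4, "big")
        state = compress(INITIAL_STATE, block)
        if state in seen:
            return seen[state], block
        seen[state] = block


b_block, d_block = find_colliding_blocks()
suffix = [b"same", b"tail"]          # identical blocks e_1, ..., e_j

# One internal-state collision yields a collision for every common suffix,
# regardless of which finalization function is applied afterwards.
assert b_block != d_block
assert final_state([b_block] + suffix) == final_state([d_block] + suffix)
assert finalize(final_state([b_block] + suffix)) == finalize(final_state([d_block] + suffix))
```

With only 65,536 possible states, the search is guaranteed to terminate after at most 65,537 candidate blocks; real hash functions rely on a state large enough to make that search infeasible.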

If you are really concerned about future cryptanalysis of SHA2 you can spread the risk across several hash functions, where each has to be broken to break the security of the overall construction. These are called hash combiners. The simplest one is the concatenation of two hashes: Sha256(m)||Sha3-256(m). This one will be collision resistant if either Sha256 or Sha3-256 is collision resistant. There are also combiners for other properties, like pseudo randomness, and also multi-property combiners. See for example http://tuprints.ulb.tu-darmstadt.de/2094/1/thesis.lehmann.pd... .
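A minimal sketch of the concatenation combiner in Python, using the standard library's `hashlib` (which provides both `sha256` and `sha3_256`). The function name `concat_combiner` is made up for this example.

```python
import hashlib


def concat_combiner(message: bytes) -> bytes:
    """Return Sha256(m) || Sha3-256(m) as a single 64-byte digest.

    Two messages collide under this combiner only if they collide under
    both underlying hash functions at the same time.
    """
    return hashlib.sha256(message).digest() + hashlib.sha3_256(message).digest()
```

The price is a digest twice as long (64 bytes instead of 32) and roughly twice the hashing work per artifact.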


I was recalling that "Multicollisions in iterated hash functions", published at CRYPTO by Antoine Joux in 2004, talks about this subject.

https://www.iacr.org/archive/crypto2004/31520306/multicollis...

It amplifies collision finding by finding an r-collision in time log2(r) * 2^(n/2), if r = 2^t for some t. Figure 1 in section 3 is a very intuitive picture of how.

The third paragraph of section 5 states that the attack is inapplicable on hashes with truncated output.

Although, now that you mention it, my dusty crypto knowledge is not enough for me to explain why this technique would be inapplicable on the internal states of a decomposed hash even if it's inapplicable on the final output. I shall have to read more.

I agree that if one can find a way to apply an HMAC it should also obviate concerns around length extension, and also that combining hashes (carefully) should be able to break only when both are broken.


> The third paragraph of section 5 states that the attack is inapplicable on hashes with truncated output.

They actually refer to the large internal state size that makes the generic attack infeasible (for a state size of n bits you need 2^(n/2) tries on average to find a collision).

> in the second, the attacker needs collisions in the full internal state of the hash function, rather than on the truncated states.

But as both sha256 and sha3-256 have internal state sizes >= 256 bits, these are definitely enough for the foreseeable future to protect against generic attacks. More interesting is the question whether you can combine specialized cryptanalysis of two different hashes to build multicollisions. Apparently you can, at least for MD5 and SHA1: http://www.iacr.org/archive/asiacrypt2009/59120136/59120136....


Cool, but am I right that it requires that you trust the jar dependency you have initially on your local filesystem?


For Leiningen at least the goal is eventually to be able to flip a switch that will make it refuse to operate in the presence of unsigned dependencies. We're still a ways away from that becoming a reality, but the default is already to refuse to deploy new libraries without an accompanying signature.

Edit: of course, the question of how to determine which keys to trust is still pretty difficult, especially in the larger Java world. The community of Clojure authors is still small enough that a web of trust could still be established face-to-face at conferences that could cover a majority of authors.

The situation around Central is quite regrettable though.


Leiningen also uses Clojars over HTTPS by default, I believe, so even without a web of trust, Clojars is still more secure than Central.


Technically true, but practically this doesn't mean anything.

No one uses Clojars on its own, so if an attacker were able to perform a MITM attack, they could inject a spoofed library into the connection to Central even if the library should be fetched from Clojars.


It's not specifically named in the article, but the software shown with the firewall popup is Little Snitch, and it's great:

http://www.obdev.at/products/littlesnitch/index.html


The project to offer ssl free to every user of Maven Central is already underway. Stay tuned for details.


Author here.

Brian are you speaking as a representative of Sonatype, or are you a 3rd party?


As a representative of Sonatype.

The reality of cross build injection has been discussed for many years, I even linked to an XBI talk in my blog post announcing the availability of SSL.

The reality is that prior to moving to a CDN, it was going to be pretty intensive to offer SSL on the scale of traffic we were seeing. The priority at that time was ensuring higher availability and providing multiple data centers with worldwide loadbalancing.

On our first CDN provider, they could not perform SSL certificate validation and thus were themselves susceptible to a MITM attack. So the decision at that point was to run SSL off of the origin server. We wanted to make it essentially free but wanted to ensure that the bandwidth was available for those that cared to use it, hence the small donation.

The situation is different today with our new CDN, they can validate the certificates all the way through and that's how we intend to deploy it.

We won't be able to enable full https redirections for all traffic since this would cause havoc in organizations that are firewall locked and for tools that don't follow redirects. Each tool would need to adopt the new url. I've already suggested this change occur in Maven once we launch.


I am not familiar with Sonatype and what relation it has with maven, but have you considered adding BitTorrent protocol to maven? This might help reduce traffic considerably.


He was speaking as a rep of Sonatype, as am I (head of engineering for same). We'd be happy to speak to you more. There is a lot going on from a community perspective and this project has been in the backlog for a while and is now rapidly working its way to the top given the apparent sea change in attitude associated with security.


I'm pretty surprised that this article is news. Sonatype has been open about SSL for Maven Central since there has been Nexus or maybe even longer. I remember Jason van Zyl talking about this seven or more years ago.


I would assume this is what we should stay tuned to? http://www.sonatype.com/clm/secure-access-to-central


Yawn. Let me know when you're ready to announce a project to competently sign and verify artifacts.


Signatures have been required on Central for years and there are tools to verify them, including repository managers.

We strongly do not believe that you should entrust your private key to anyone else for signing, which is what others have done to make it easy....yet less secure.


Maybe you could assist them?


SSL would have partially mitigated this attack, but it's not a full solution either. SSL is transport layer security -- you still fully trust the remote server not to give you cat memes. What if this wasn't necessary? Why can't we embed the hash of the dependencies we need in our projects directly? That would give us end-to-end confidence that we've got the right stuff.

This is exactly why I built mdm[1]: it's a dependency manager that's immune to cat memes getting in ur http.

Anyone using a system like git submodules to track source dependencies is immune to this entire category of attack. mdm does the same thing, plus works for binary payloads.

Build injection attacks have been known for a while now. There's actually a great publication by Fortify[2] where they even gave it a name: XBI, for Cross Build Injection attack. Among the high-profile targets even several years ago (the report is from 2007): Sendmail, IRSSI, and OpenSSH! It's great to see more attention to these issues, and practical implementations to double-underline both the seriousness of the threat and the ease of carrying out the attack.

Related note: signatures are good too, but still actually less useful than embedding the hash of the desired content. Signing keys can be captured; revocations require infrastructure and online verification to be useful. Embedding hashes in your version control can give all the integrity guarantees needed, without any of the fuss -- you should just verify the signature at the time you first commit a link to a dependency.

[1] https://github.com/polydawn/mdm/

[2] https://www.fortify.com/downloads2/public/fortify_attacking_...


  Why can't we embed the hash of the dependencies we need 
  in our projects directly?
There's a lot of stuff in Maven, like the versions plugin and the release plugin, to update dependencies to the latest version. This stuff is useful for continuous integration and automated deployment, especially when your project is split into lots of modules to allow code reuse.

With code signing, you can (or hypothetically could, I don't know if anyone does this) check the latest version is signed by the same key as the previous version - whereas just pinning the hash wouldn't allow that.

I agree pinning the hash is useful if the signing key is captured.


I typically find that in practice, strong links between a project and its dependencies are not a drawback.

If large amounts of your code change across different projects at the same time, those projects don't have a very stable API, nor are they evidently actually developing separately, so there's no major reason to pretend they are isolated or introduce a binary release protocol between them. Projects like this will probably find the greatest ease of operation by just sticking to one source repository -- otherwise, commits for a single "feature" are already getting smeared across multiple repositories, and stitching them back together with some hazy concept of "latest" at a particular datetime isn't helping anyone.

The biggest indicator of over-coupled projects that are going to face friction is when ProjectCore depends on ProjectWhirrlyjig, but the Whirrlyjig test code still lives in ProjectCore. This tends to make it very difficult to make releases of ProjectWhirrlyjig with confidence, since they won't be tested until getting to ProjectCore. If projects are actually maintaining stable features on their own in isolation, this shouldn't be what your flow looks like.

Projects that are well isolated generally don't seem to have a hard time committing to stable (if frequent) release cycles. Furthermore, it actually encourages good organizational habits, because it actively exerts pressure against making changes that would cross-cut libraries or make it difficult to create tests isolated to a single project.

In contrast, tools that regularly update to "the latest" version invariably seem to bring headaches down the road.

Getting "the latest" is ambiguous. It means that your build will not be reproducible in any automated way, whether it's one week or one hour from now. It's a moving target. Can you do a `git bisect` if something goes wrong and you need to track down a change?

Getting "the latest" also doesn't take into account branching. This is something a team I'm currently on is poignantly aware of: feature branches are used extensively, and when this concept spans projects, we found "latest" ceases to mean anything contextually useful.

If you're working on projects where a CI server is actually part of the core feedback loop (say your test suite has gotten too unwieldy for any single developer to run before pushing), then a fetch "latest" can be helpful to enable this during development. But even if jenkins informs you the build is green, it's important to remember this won't be reproducible in the future; you should make an effort to get back to precisely tracked dependencies as soon as possible.

mdm deals with this by letting you use untracked versions of your dependency files... but it will consistently show that in `git status` commands, so that you A) know that your current work isn't reproducible by anyone else and B) everyone is encouraged to make an effort to get back on track ASAP.


Perhaps as a stopgap Maven Central (or a concerned third party?) could publish all of the SHA1 hashes on a page that is served via HTTPS. This would at least allow tools to detect the sort of attack described in the article.
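As a sketch of how a tool might consume such a page: the index format below (one `<sha1-hex>  <path>` pair per line, as `sha1sum` prints) is purely an assumption, and `parse_checksum_index` and `artifact_matches_index` are hypothetical names. The index text would be fetched over HTTPS, while the artifact bytes may have arrived over plain HTTP.

```python
import hashlib


def parse_checksum_index(text: str) -> dict:
    """Parse lines of '<sha1-hex>  <path>' into a path -> digest mapping.

    Blank lines and lines starting with '#' are skipped. The format is an
    assumption for illustration, not something Maven Central publishes.
    """
    index = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, path = line.split(None, 1)
        index[path.strip()] = digest.lower()
    return index


def artifact_matches_index(path: str, data: bytes, index: dict) -> bool:
    """True only if `path` is listed and the SHA1 of `data` matches the entry."""
    expected = index.get(path)
    return expected is not None and hashlib.sha1(data).hexdigest() == expected
```

SHA1 appears here only because that is the checksum the thread says Central already publishes; the same shape works with a stronger hash.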


This is a horrible policy made by Sonatype. A better alternative to Maven Central should be created...


It is not the fault of Sonatype. We should not trust Central right away. If I worry about malicious artifacts from Central, I must host my own repository and manage artifacts myself. And that way, I can just map other repositories I trust more.


Trusting Central > Trusting everyone on the network path between you and central.


Evilgrade (https://github.com/infobyte/evilgrade) is a similar tool that works on a wider variety of insecure updaters. Perhaps a module could be written? Maybe one already exists, I haven't played with it in a while


I'm torn on how I feel about security being a paid feature in this case. Here the onus is being placed on the user, yet many won't be conscious of the choice they're making.

The tiff mentioned in the article was interesting to read: https://twitter.com/mveytsman/status/491298846673473536


There are two problems with this that I see. The first is, as you said, that a lot of people won't even realize they need to pay for secure downloads.

I also feel like in the case of something like a package manager, this potentially harms the wider community in ways that charging for features in a specific piece of software doesn't.


Technically paying for an auth token is not sufficient. The token cannot be used directly by Maven or Leiningen, but only by a Nexus proxying repository manager. Not a big deal if you work at a professional Clojure shop, but a huge hassle if you're freelance or just getting started.


It is like having a population partially vaccinated, almost pointless on a large scale.


Exposing your users to MITM attacks in order to encourage donations? Pure evil.


If you aren't paying money, you aren't the user, you are a product.

Freemium models often suck because of stuff like this[1]. But if the "users" would just consider it normal to pay money then we wouldn't have crazy things going on where people providing critical infrastructure services need to figure out how to "convert" their "users." Instead, say, every professional Java shop would pay $100 a year or so for managed access. Projects that want to use it like a CDN so their users could download would pay a fee to host it.

They have bills to pay. They'll cover them one way or the other. If we pay directly at least we know what the game is.

[1] They could be inserting advertising into the jars. Hey, at least it would still be a "free" service, right?


That's a little paranoid. Let's see, we'll completely ruin our rep and our core business activity just so you're forced to donate--not to us, but to this open source group over here. Dude, put down the pipe.


Isn't it exactly what they are doing, though? The only reason you would want HTTPS is to prevent MITM attacks. By making it a premium feature, they are making MITM attack mitigation a premium feature.


My main experience with Maven has been downloading some source code, and having to use Gradle to compile it. It went and downloaded a bunch of binaries, insecurely. There were no actual unsatisfied dependencies; it was just downloading pieces of Gradle itself.

I would've much rather had a Makefile. Build scripts and package managers need to be separate.


> Build scripts and package managers need to be separate.

This. Especially when there's broken links, you're gonna have a bad (and long) time.


I will join the small chorus agreeing that build scripts and package managers should be separate. Most folks I work with disagree.

Curious if anyone knows of any well done takes on this. In either way. (If I'm actually wrong, I'd like to know.) (I fully suspect there really is no "right" answer.)


Finally someone who sees the real problem.


jCenter is the new default repository used with Android's gradle plugin, I haven't used it myself yet but it looks like the site defaults to HTTPS for everything: https://bintray.com/bintray/jcenter


Full disclosure - I am a Developer Advocate with JFrog, the company behind Bintray.

So, JCenter is a Java repository in Bintray (https://bintray.com/bintray/jcenter), which is the largest repo in the world for Java and Android OSS libraries, packages and components. All the content is served over a CDN, with a secure https connection. JCenter is the default repository in Groovy Grape (http://groovy.codehaus.org/Grape), built into Gradle (the jcenter() repository) and very easy to configure in every other build tool (except maybe Maven), and it will become even easier very soon.

Bintray has a different approach to package identification than the legacy Maven Central. We don't rely on self-issued key-pairs (which can be generated to represent anyone, and are never actually verified in Maven Central). Instead, similar to GitHub, Bintray gives a strong personal identity to any contributed library.

If you really need to get your package to Maven Central (for supporting legacy tools) you can do it from Bintray as well, in a click of a button or even automatically.

Hope that helps!


You mention both Bintray and Groovy. Look at the Bintray download stats for Groovy [1] and it reports 170,000 downloads in the past month. But 100,000 of them happen on just 6 days, 40,000 of those on just 1 day (18 July). Click on country and see that 120,000 of them came from China. Comparing the numbers suggests 100,000 downloads of Groovy from Bintray during July were faked. Another 900,000 downloads of Groovy were faked during April and May. I'm not sure I trust JCenter when the 2 technologies you recommend for it have together been used to fake one million downloads.

[1] https://bintray.com/groovy/maven/groovy/view/statistics

[2] http://groovy.codeplex.com/wikipage?title=Blog07#2


I am not sure how the fact that Bintray is DDOSed from China (and still fully operational without any interruption) dismisses your trust in Bintray.

I am also not sure how you figured out those are fake downloads. For sure the script that DDOSes Bintray from China won't use Groovy, but it's still a valid download. Not for showcasing how popular Groovy is (they factor out those things when talking about the numbers), but for the raw statistics - for sure. The file was downloaded, wasn't it?

Please elaborate?


Thanks for pointing this out, this is REALLY new, perhaps added in the most recent build. I didn't see notes for this either.


Yeah, my new projects from Android Studio Beta (released at I/O) use it as the default which was super confusing at first since I didn't see any release notes regarding it.


Nice, I was going to ask if maybe Google or someone invested heavily in Android could step up and provide a secure source of dependencies for everyone.


The biggest problem with this policy is that new users, or even experienced ones, are likely not aware of it. This is a very serious problem that should be addressed quickly.

edit: and with websites everywhere routinely providing SSL, it seems crazy that it has to be a paid feature for such a critical service.


Funny thing is that CERT doesn't have a problem with shenanigans like this. They are more concerned with buffer overflows than by-design stupidity.


So in principle, it's doing the same thing as:

    $ curl http://get.example.io | sh
which we all know is bad. But in this case, it's hidden deep enough that most people don't even know it's happening.


Sort-of. That has the additional non-malicious risks. A broken connection turns "rm -r /var/lib/cool/place" into "rm -r /var/" and the shell processes that.


It's also different in that the artifact signatures are there for you to check if you want to; it's just that most people don't bother, and even if you do, you probably can't tell which keys you should trust to begin with.


Downloading over HTTP is not an issue (as far as integrity is concerned) if maven were to validate the downloads against some chain of trust. But apparently it does not.

Now I am wondering what tool actually uses those .asc files that I have to generate using mvn gpg:sign-and-deploy-file when I upload new packages to sonatype...


Nexus and Artifactory can be configured to check the signatures, but you're into Web of Trust territory.

I wrote an article about mitigating this attack vector a while back which might be useful: http://gary-rowe.com/agilestack/2013/07/03/preventing-depend...


Running `lein deps :verify` does this.


How does it know what the correct signing key is?

edit: Looked up answer myself. Lein downloads whatever key the signature claims to be made with from public keyservers. How does this provide any additional security over not bothering to verify signatures?


The difference is that you could track down the keys either directly from the author or from someone who has already personally verified and signed the author's key. In practice this is very difficult, and using a key that you haven't gotten your friends and co-workers to sign is not any better than skipping the signing altogether.


Even if you have carefully installed the correct key from the author, if your download is intercepted and an attacker sends you a bogus artifact and signature, it looks like Lein will just retrieve the attacker's key from the keyserver and validate the signature.


This is true; at the time of implementation so few Clojure libraries were signed that taking it the rest of the way was not a clear win.

But clearly the job isn't finished; even if Clojure developers do a good job of signing packages and signing each other's keys (which is not generally true today), it still needs to distinguish between signed packages and trusted packages. Hopefully the next version can add this. But as with anything that requires extra steps from the developer community, a thorough solution is going to take time.


Exactly and sadly this is all too common.

Did you know that Xine, the media-player, has a similar thing behind the scenes? I didn't

http://blog.steve.org.uk/did_you_know_xine_will_download_and...


All of Maven central is only 180gb, according to https://maven.apache.org/guides/mini/guide-mirror-settings.h...

How hard would it be to just mirror it to S3 and use it from there via HTTPS?


That text was last updated in 2010; currently the repository is about 1.1TB.

http://search.maven.org/#stats


Ack! This does not do anything to increase my confidence in the Maven project's guardianship of our collective security.


> How hard would it be to just mirror it to S3 and use it from there via HTTPS?

Trivial.

Now ask: how hard would it be to pay the bandwidth charges, assuming it were a public bucket. I don't know the answer, but it's a much more interesting question.


Yes, a very interesting question. It'd be very expensive.

It occurs to me that BitTorrent technically solved the problem of high bandwidth costs long ago; millions of people transfer 1.1TB files around every day without worrying about bandwidth costs at all.

Can we come up with a similar system for jars? Why are we still relying on central servers for this at all?


I am guessing that Amazon, even though they run Java, wouldn't do it out of the goodness of their hearts.


If I understand this correctly, Maven-based builds can contain dependencies on libraries hosted on remote servers. The golang build system has (or had) something similar too. Witnessing this trend take hold is astonishing and horrifying in equal parts. Not just as a security problem (which is obvious) but also as a huge hole in software engineering practices. How can anyone run a production build where parts of your build are being downloaded from untrusted third party sources in real time? How do you ensure repeatable, reliable builds? How do you debug production issues with limited knowledge of what versions of various libraries are actually running in production?


Java developers kind of laugh when I explain to them that Linux distros struggle to bootstrap Maven from source, because it is a non-trivial tool that depends on hundreds of artifacts to build.

The point is, why do you care that your repo is local, or that your jars are secured, if you got the tool, Maven itself, in binary form from a server you don't control?

That is the whole point of Linux distros' package managers. It is not only about dependencies. It is about securing the whole chain and ensuring repeatability.

Maven's design, unlike ant's, forces you to bootstrap it from binaries. Even worse, Maven itself can't handle building a project _AND_ its dependencies from source. Why would the rest of the infrastructure matter then?

Yes, Linux distros build gcc and ant using a binary gcc and a binary ant. But it is always the previous build, so at some point in the chain it ends with sources and not with binaries.

And this is not about Maven's idea and concept. If it had depended on a few libraries and a simple way of building itself instead of needing the binaries of half of the stuff it is supposed to build in the first place (hundreds), just to build itself.


> Yes, Linux distros build gcc and ant using a binary gcc and a binary ant. But it is always the previous build, so at some point in the chain it ends with sources and not with binaries.

I don't think so. The first versions of GCC were built with the C compilers from commercial UNIX from AT&T or the like. The first Linux systems were cross-built under Minix. At some point you'll go back to a program someone toggled in on the front panel, but we don't have the source for all the intermediate steps, nor any kind of chain of signatures stretching back that far.

> And this is not about Maven's idea and concept. If it had depended on a few libraries and a simple way of building itself instead of needing the binaries of half of the stuff it is supposed to build in the first place (hundreds), just to build itself.

Any nontrivial program should be written modularly, using existing (and, where necessary, new) libraries. Having a dependency manager to help keep track of those is a good thing. I don't see that it makes the bootstrap any more "binary"; gcc is built with a binary gcc for which source is available. Maven is built with a binary maven and a bunch of binary libraries, source for all of which is available.


It's fairly easy to set up a local server containing all your jars and still use maven or ivy. I do that at my current employer.


We use a local repo as well (it's easy to set up), and so this type of security is not something we even think about. If we are adding dependencies or changing their versions, we just have to put a little more work into making sure the jar that goes to our local repo is good, but that doesn't happen every day. Of course, when prototyping or just playing around this could become an issue...


But in that case why maintain two separate repositories? One for "our code" and one for external. I'm assuming the code in these repositories is open source... right? Why not simply check in the version to be used right in your local SCM?


There are a bunch of different SCMs. It's nice to decouple "hold released builds at specific versions" from your general development repository.


Hosting your own makes sense for multiple reasons: you can be assured what code you are getting, you aren't limited by bandwidth rates of remote providers, and you get to control up/down time. The first is a must; the second and third make life more tolerable.


I'm honestly struggling to understand this line of reasoning. What you seem to be describing amounts to maintaining two different code repositories. Both of them have to be versioned and coordinated with each other. Why not just check in all the code you're using in your real SCM?


Because the word "repository" has two different meanings, and you're confusing them.

For external libraries, a "distribution repository" is a file store for a bunch of different projects. It typically stores released binaries for distribution (libfoo.1.5.pkg, libfoo.1.6.pkg, libbar.2.3.pkg, etc...), but could also contain clones of external source repos (libfoo.git/, libbar.hg/, etc...).

Which brings us to the other meaning - a "source repository" is the version-controlled store for the source of a single project.

The repo for external libraries is a distro repo, whereas the repo for your project is a source repo.

If you're checking the code for multiple projects into a single SCM, why bother maintaining separate source repos at all? Why don't we all use one giant source repo for all projects everywhere? Just check out "totality.git" and use the /libfoo/ and /libbar/ subdirectories. And in your internal company branch, add an /ourproject toplevel for your own code?

When you have answered that question, you will realise why we keep separate projects in separate source repos/SCMs.

Note that you will probably want to publish different "releases" of your own project to an internal distro repo for your internal clients to use, e.g. ourproject.1.2.pkg, ourproject.1.3.pkg, etc...


Thank you for your answer. I may have been a bit imprecise with my point there. I'm well aware of the reasons to maintain multiple repositories for different projects. The weak part of the story above is that we're not talking about separate projects. We're talking about a single project, where the build system is now effectively responsible for source control for some part of the project using totally different protocols for communication, authentication and versioning. Where's the win?


"the build system is now effectively responsible for source control for some part of the project"

Well, I think I've found your problem. :-/

I had a close call with nearly installing/building some Java packages a couple of weeks ago, and due to reasons I eventually decided to try and find a different solution. Looks like the bullet I dodged was bigger than I thought.


"How can anyone run a production build where parts of your build are being downloaded from untrusted third party sources in real time? How do you ensure repeatable, reliable builds?"

By not downloading everything from maven central in real time. Companies usually run their own repository and builds query that one. Central is queried only if the company run repository is missing some artifact or they want to update libraries. How much bureaucracy stands between you and company run repository upgrades depends on company and project needs.

As for production, does anyone compile stuff on production? I thought everyone sends compiled jars there. You know exactly which libs are contained in that jar; no information is missing.
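
The lookup order described here (company repository first, Central only on a miss) can be sketched roughly as below. This is a toy model, assuming repositories are plain dictionaries keyed by a "group:artifact:version" string; `fetch`, `company_repo`, and `central` are made-up names, not anything from Maven itself.

```python
def fetch(coordinate, company_repo, central):
    """Return (artifact, source) for a 'group:artifact:version' coordinate.

    The company-run repository is consulted first. Central is only
    queried on a miss, and the result is then stored in the company
    repository so later builds never leave the local network.
    """
    if coordinate in company_repo:
        return company_repo[coordinate], "company"
    if coordinate not in central:
        raise LookupError(f"{coordinate} not found in any repository")
    artifact = central[coordinate]
    # In a real setup this is the point where vetting or approval
    # (the "bureaucracy") would happen before the artifact is accepted.
    company_repo[coordinate] = artifact
    return artifact, "central"
```

Once an artifact is in the company repository, the copy on Central no longer matters for that version, which is what makes the builds repeatable.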


In golang-land it is popular to deal with this by vendoring all the packages you depend on. There are several tools to manage this like godep. This is my preferred method as it allows for the reliable, repeatable build you are talking about.

There are other schools of thought, like pinning the remote repos to a specific commit-id. This is better than nothing, but it still depends on 3rd party repos, which I think is too risky for production code. It is great for earlier stages of a project when you are trying to work out the libraries you will use and also need to collaborate.


A couple of years ago we were trying to use BigCouch in a product. The erlang build tool was happy to have transitive dependencies that were just pointing at github:branch/HEAD. It got to the point where we'd build it on a test machine, and then just copy the binary around.


With Maven you specify the versions of libraries, jars are cached locally, and you can run your own local Maven server if you need to.


Well, doesn't git or any other sane source control system do that for you? You are maintaining your own repositories of these external dependencies anyway. So why split pieces of your project across two different repos, each with their own versioning, network protocol, authentication etc.? What exactly are you getting in return for the added complexity?


git doesn't necessarily handle large binaries well (less true in recent versions). Also, when using Maven (etc) you can upgrade libraries just by bumping up the version # rather than hunting down the new jar manually. You can also use the same cached jar in multiple apps/modules rather than committing it in all of them - good if you've got a modularised application.

There are also other benefits like automatically downloading and linking sources and documentation (more relevant if using an IDE).


npm has the same problem of sending packages over http, but it's even worse since on average each node package uses about a billion other packages and because injecting malicious code in JavaScript is incredibly easy.

And to be clear, just http here is not the issue. It's http combined with lack of package signing. apt runs over http, but it's a pretty secure system because of its efficient package signing. Package signing is even better than https alone since it prevents both MITM attacks and compromise of the apt repository.

In fact, apt and yum were pretty ahead of their time with package signing. It's a shame others haven't followed their path.


npm by default uses HTTPS, and has for more than 3 years. It's a little confusing because the loglines all say "http" in green, but if you actually look at the URLs being downloaded they are all to https://registry.npmjs.org/


I wrote a Maven plugin to avoid this.

It's available under MIT licence: https://github.com/gary-rowe/BitcoinjEnforcerRules


luarocks has the same problem. You don't need SSL, you need the packages to be signed.


I wonder how many enterprise apps have been backdoored through this flaw over the years by now.


I'd immediately backdoor the rt.jar and the compiler so that future binaries have the backdoor. Trusting trust ...


Sorry to nitpick, but you might wanna fix this typo: s/pubic/public :)


Yeah, have you ever written code on the Play platform? There is your proof of concept, at least on earlier versions: static injection using annotations. It's also how Spring works, and almost everything dynamic. Hell, you can JIT your code; you don't even need to compile it into a class, the runtime can do it for you. That's why I always compile my jar files so they can't be read as a compressed file. Anyway, pretty cool; sounds like you could have a lot of fun with someone doing this. You could turn their computer into anything you want using Java command-line functionality, i.e. System.getProperty("os.name"): if Windows do this, if OS X do this, if Linux do this, using java.lang.Runtime.exec. Then after you open the back door to their computer, it's time for socket connections and getOutputStream etc. Anyway, the point being Java is a cross-platform language, so there is a world of possibilities, and most of the time they are running this from an IDE, so if you inject a sudo call who knows what could happen.


The vulnerability even has a name: Cross-build injection attacks. I wrote about it some time ago [1], [2]. The complete answer includes verifying the (now mandatory) PGP signatures [3] of artifacts in Maven Central. But you need a web-of-trust for that and the whole process is rather impractical currently.

[1] http://branchandbound.net/blog/security/2012/03/crossbuild-i... [2] http://branchandbound.net/blog/security/2012/10/cross-build-... [3] http://branchandbound.net/blog/security/2012/08/verify-depen...


Instead of going WOT, why not just get a specific SHA hash, instead of a specific version?


Sure, you could obtain a hash out-of-band and pin to that. Not much more convenient given you have to do it for all transitive dependencies and Maven-plugin dependencies as well.
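
For what it's worth, the check itself is short; the work is in maintaining the pin list. A minimal sketch, assuming the jars have already been downloaded into one directory and that `pins` (a hypothetical mapping of file name to SHA-256 hex digest) was obtained out-of-band:

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pins(directory, pins):
    """Check every .jar in `directory` against its pinned digest.

    Returns a list of problem descriptions; an empty list means every
    jar present had a pin and matched it.
    """
    problems = []
    for jar in sorted(Path(directory).glob("*.jar")):
        expected = pins.get(jar.name)
        if expected is None:
            # Typically a transitive or plugin dependency nobody pinned.
            problems.append(f"{jar.name}: no pin (transitive dependency?)")
        elif sha256_of(jar) != expected:
            problems.append(f"{jar.name}: hash mismatch")
    return problems
```

The "no pin" branch is where the inconvenience shows up: every transitive and plugin dependency has to be listed too, or the build should fail.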


Thanks to whoever changed the title; I didn't like the original title, but couldn't come up with a better accurate one.


This is indeed a problem that needs to be addressed at some point. The MITM possibility has been mentioned before at SE http://stackoverflow.com/questions/7094035/how-secure-is-usi...


So if they need some money, what is a better revenue model for them?

- charge some token amount of money to projects (harms the ecosystem, probably not a good idea)

- charge some amount for projects to host old versions, or for users to access old versions (same idea as the first, just less so)

- charge for access to source jars

- paid javadoc hosting

- rate-limiting for free users (the "file locker" model; particularly effective at convincing people sharing an office IP into paying up)

Any others?


They claim the money is a donation to an open source software project, so they don't even need the money.


For those of you in the Python world concerned about such a thing, check out Peep: https://pypi.python.org/pypi/peep

It's a pip wrapper that expects you to provide hashes for your dependencies in requirements.txt.

There was a lightning talk at PyCon this year; it seems super easy to use (though admittedly I'm not using it regularly yet).


The problem goes deeper. That firewall (Little Snitch) updates itself over port 80, so most likely unencrypted.


> When can this happen? If you ever use a public wifi network in a coffee shop

Just don't do this. There is no such thing as a free lunch (or wifi).


I do it. I always thought, "What's the harm? Everything important is over SSL these days". Apparently not!


The NSA will proxy anyone they feel like.


What firewall is that? Looks nice.



We store jars in a git repo...


I understand the need to raise money for projects, but the attitude[1] that security is an optional "premium" feature needs to end.

It should be no different from shipping broken code. You can't just say, "oh, well we offer a premium build that actually works, for users that want that." Everybody needs it.

Evernote made this mistake initially when SSL was originally a premium feature. They fixed it.

Granted, there are degrees of security but protection from MITM attacks is fundamental. (Especially for executable code!)

[1] https://twitter.com/mveytsman/status/491298846673473536

UPDATE: @weekstweets just deleted the tweet I was referencing where he described security as a premium feature "for users who desire it" or words to that effect.


But will users in aggregate pay more when extra effort and resources are expended on security? Will you?

If the answer is no, then the smart developer has no financial incentive to do so, and every reason to segment security out as a premium feature.

Maybe MITM vulnerability counts as broken code. But as always, markets win. I don't think the status quo will change until users consider security assurances worth hard dollars.


Good luck getting your users to come back and trust you after a major security incident that compromises their systems. I would turn it around and think of it this way, is it worth giving up some profit to help ensure you don't have embarrassing and damaging security issues? It's like insurance, you pay the premium and hope you never need it.


If they aren't paying you, they aren't your users.

This reinforces my priors that there is very little "free-as-in-beer" software.


The software that runs the repository is nearly irrelevant for the purpose of this discussion. The question is solely about services, which are a lot more difficult to fund at scale.


By that logic, smart developers have no financial incentive to fix bugs unless it's for a paid upgrade.

Think through that a little more and I think you'll find there is long-term ROI in the form of customer trust and goodwill. You'll buy the product because it works and won't hurt you, and basic security should be part of "won't hurt you".


>> By that logic, smart developers have no financial incentive to fix bugs unless it's for a paid upgrade.

Right! And that's actually a valid model. Ask HP about what it'll cost to upgrade the firmware on your enterprise server..


It's valid, but the claim was that it's the only valid model for smart developers, which is false.


Think through a little more of what the parent poster said. If companies truly did gain that much more trust and good will from secure code, they would all be doing it.


But they are, with increasing intensity. Companies really suffer from security blowups and customers are becoming more aware of its importance. This is why the attitude I cited is so dated and needs to finally come to an end.


I've heard the "companies are finally taking security seriously" mantra for almost 20 years.[1] Maybe it's true this time. But often the customers don't give a hoot whatsoever, and so the company doesn't either. Admonishing them might feel good, but unless you are paying them money, your opinion is not really an issue to them.

[1] I had a boss who insisted that TJ Maxx was going to collapse because of their security holes. Nope.


When someone bakes kerfuffle muffins, you need to screenshot that shit.


Screenshots can be easily faked.


It's Java, who cares?



