Ask HN: Why are Apache projects so Java-centric?
97 points by hoodoof on Mar 23, 2015 | 52 comments
It seems most Apache projects are written in Java. Why such a heavy focus on Java at Apache?


Apache offers stewardship for projects of a certain size. That is, they have to be big enough to accept the overhead of having, for example, a vice president of the project.

Big companies create big projects. Big companies create projects written in Java.

Big companies want a steward for their projects to welcome outside contributors.

  Project - Company - Commercial Product
  Geronimo - IBM - Websphere (parts)
  JackRabbit - Adobe - Adobe AEM
  Felix - Adobe - Adobe AEM
  Apache CXF, Apache Camel - Red hat - Fuse
  Apache Sling - Adobe - Adobe AEM
I am sure there are other examples.

Also, Apache has a pretty good framework for "crossing the t's and dotting the i's" in legal terms for code contributions, so that big companies know that all the code is accounted for and there are no nasty legal surprises lurking - in theory.

For a while, I was part of one small non-Java bit of the ASF: https://tcl.apache.org/rivet/ but I haven't been active for a few years, and indeed, as of last weekend, decided to resign from active membership in the ASF and 'go emeritus'.

It's a good group doing good work.

  Apache has a pretty good framework for "crossing the t's and dotting the i's" in legal terms for code contributions, so that big companies know that all the code is accounted for and there are no nasty legal surprises lurking - in theory.
That sentence exactly summed up the point I was trying to make.

Not "exactly". You also expressed tired cliches such as "Big companies create projects written in Java."

Just because its tired, doesn't mean it isn't true.

I have actually audited systems by "big companies" written in other, e.g. C#, languages.

(p.s. Try "it's")

I have actually authored systems at big and small companies in: C#, VB, Pascal, Java, Python, Ruby, JavaScript.

My point was not that big companies don't use languages other than Java. My point was that the tired cliche of Java dominating at big companies is entirely warranted. If I implied that Java penetration at large enterprises is 100%, then there is a lot more wrong with my writing than a missing apostrophe.

C# is Microsoft's Java. How many Python, Go, Ruby or Haskell projects have you audited at "big companies"?

Ruby and Python are everywhere in the enterprise. Haskell is currently "useless" [1] for the enterprise for the obvious human resource reasons.

[1]: https://youtu.be/iSmkqocn0oQ

> the obvious human resource reasons

Can you elaborate on these?

Why bother? Count the down votes in this thread and then note the number of thoughtful rebuttals. (But for the record, imo Haskell is a glorious language.)

Big companies create a lot of projects, some of them are bound to be in Java. I in no way intended to state Java was the only language for business applications.

Everyone always considers Apache as being synonymous with old school Java technologies. But they have a set of newer big data tools that are integral components at almost every major corporation or serious startup these days:

Accumulo, Ambari, Avro, Cassandra, CouchDB, Falcon, Flume, Hadoop, HBase, Hive, Kafka, Knox, Oozie, Phoenix, Pig, Samza, Spark, Sqoop, Storm, Tez, Zookeeper.

I can imagine many database companies, in particular Oracle and Teradata, wishing Apache wasn't as fantastically competent as it is.

But to the OP's question, many, if not the majority, of those technologies are written in Java. Definitely the majority are on the JVM, since a few of the non-Java ones are Scala and Clojure.

I think the simple answer is that while lots of people do not love programming in Java, it can be attractive for projects that want relatively good performance without trying to implement a system in C++. There are also a lot of tools and companies that have investments in deploying JVM applications.

This is pretty much exactly what I expected to see, once I figured out the potential for most of the hardest portability concerns to be isolated into the JVM. Building the future has always been about stable platforms.

Add Lucene (and by extension, Elasticsearch) to that list. In retrospect, I don't know how we did things before some of these projects came out.

Don't forget about Solr!

Avro is awesome, and I miss working with it.

At my last company, our main method of IPC was sending Avro messages over AMQP (using Apache Qpid). It was the best IPC I've worked with.

To add a little bit more to this, IBM promotes Java pretty heavily and has invested a fair bit optimising Java on POWER. So for them it's an "easy" path to get projects to migrate from x86 to POWER.

> Geronimo - IBM - Websphere

I think you mean WebSphere Application Server Community Edition which has nothing to do with WebSphere Application Server (what is commonly known as Websphere). Quite frankly I don't understand why Apache didn't kill off Geronimo over half a decade ago (other than IBM $$$): it is essentially dead, has no users and only IBM committers. But at least it will end soon: IBM has already announced that they will kill WASCE in favor of Liberty Profile.

> Apache CXF, Apache Camel - Red hat - Fuse

CXF is also the basis for the EAP SOAP stack. I don't know how "big" Fuse is compared to EAP.

> Felix - Adobe - Adobe AEM

Not so sure here. It goes back to Oscar from ObjectWeb. While it is certainly used by Adobe AEM it's also used by several other products even Eclipse Equinox these days.

I think you mean Adobe Flex not Felix XD

Incredibly no, he really meant Felix, as in Apache Felix the OSGi Container.

It seems that Adobe has a product built on that, AEM[1], and, afaik, employs at least one of the main committers of the Felix project.

[1] https://www.adobe.com/marketing-cloud/enterprise-content-man...


Adobe is the main contributor to Apache Sling, which uses Apache Felix and Apache Jackrabbit (and Oak since AEM 6). And Lucene is the default index for AEM (formerly known as Day Communique). There are other Apache projects used, too, like Tika.

And not unrelatedly, Roy Fielding, who started Apache the webserver and was chair of the Apache foundation, is now at Adobe (formerly at Day).

It's not so much that Apache writes projects in Java, as that companies with Java projects find that Apache is a good place to give them to.

In particular, the Apache License 2.0 is the most appealing license for "commercial-friendly" open source, whereas the GPL with its "developer-friendly" license correlates highly with FSF-hosted, Gnome/KDE/... or independent projects.

The Java/C(++)? divide also seems to correlate with "commercial-friendly" (Hadoop, Lucene) versus "developer-friendly" (Blender, Linux) software.

What's "at Apache"?

Apache is not a company, it's a foundation, non profit style, founded around the Apache Web Server at first but adding lots of projects under the Apache licence. Some were donated to it by commercial companies, others started there.

One of the major sub-projects of the Apache Foundation was a Java based server, called Tomcat, started when Java and Servlets were all the rage, more than a decade ago (1999-2000). It was based on code donated by Sun, and quickly grew very popular.

From that, and under the "Jakarta" umbrella, it grew lots of other Java related projects, and this attracted new Java projects in turn.

I think part of it is a network effect; Apache has been successful with Java projects so Java projects tend to go there. There are other homes for other kinds of projects.

I think there is also a kind of pragmatism involved. For instance, even though OpenJDK is now open source, people like Richard Stallman think Java is an oppressive island, so Java people are less likely to be part of the GNU world.

> people like Richard Stallman think Java is an oppressive island

I was surprised to hear this claim, so I looked into it. The only thing it's turning up in this vein is the description of it being "shackled", or comparable adjectives, all before the JDK was GPLed a decade ago. Do you have a specific source on this?

Many reasons:

* Initial projects started in Java by various contributors

* Large open community of Java which contributes back

* Wide acceptance of Java being cross-platform

* Today Apache is the de facto home for any code donated by a corporation

My impression is that Apache favors application level networking and infrastructural projects and Java affords a fairly large tech sweet spot for creating such projects.

Heh, not a completely well defined question.

Java-centric relative to C#?

Possibly because of a less FOSS-friendly vibe among C#ers previously.

Possibly because the relative stagnation language-feature-wise of Java fostered a culture of frameworks and libraries.


What do you mean by "dynamically typed garbage"?

From what I read Java has one of the least good implementations of static typing. Most effort for the least gain. (From when I have asked about the benefits of static typing on here).

That's a sentiment I would disagree with. Surely, there are better static type checkers out there, but all of them have compromises.

Many users of the language Java have trouble understanding the complications of proxies, generics and reflection. Unless one is well-versed in these technologies, one might be tempted to blame the type-checker.
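
As one hedged illustration of the generics-plus-reflection confusion (my own example, with made-up class and method names, not something the parent posted): generic type arguments are erased at runtime, so reflection cannot tell a List<String> from a List<Integer>, and a raw reference can slip the wrong element in with nothing more than an unchecked warning.

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureSketch {
    // Both lists share one runtime class because type arguments are erased.
    public static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        return strings.getClass() == ints.getClass();
    }

    // A raw reference bypasses the element type check; the error only
    // surfaces later, when the element is read back as a String.
    @SuppressWarnings({"rawtypes", "unchecked"})
    public static boolean pollutionFailsOnRead() {
        List<String> strings = new ArrayList<>();
        List raw = strings;
        raw.add(42); // compiles, with an unchecked warning only
        try {
            String s = strings.get(0); // compiler-inserted cast fails here
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
        System.out.println(pollutionFailsOnRead());
    }
}
```

The failure shows up far from the line that caused it, which is probably why it feels like the type checker's fault.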

Frameworks such as Spring or Hibernate perturb the purity of the type system. This creates complexity in larger projects, and again the users would blame the language instead of the framework.

There are many valid complaints about Java's type system, such as its verbosity or type inference. Scala seems to solve the most pressing issues with these.

I'd say Objective-C's type-system is much less powerful.

I think a more interesting comparison is modern languages like Haskell and OCaml, which have much more powerful type systems than Java.

The point that Spring and Hibernate undermine the type system is valid, but why do they undermine the type system? Because it's inadequate for cleanly expressing as common a problem as ORM. If you need to resort to dependency injection and more-or-less invisible abstraction layers to write readable code, your type system has failed you.

Consider by contrast the ActiveRecord model from Rails. It's exquisitely simple and understandable, and does an enormous amount of work for you in very little code, while frankly being no more problematic or dangerous than Spring and Hibernate. Of course, this is in large part due to Ruby metaprogramming power that springs from dynamic typing, but I don't think that means dynamic typing is superior. Clearly, it's not protecting you from a lot of basic problems that a static type check can catch.

So how do you get a type system that provides the benefits of static typing, without getting in the way of clean code and leading to ugly workarounds? How do you make strong typing help rather than hurt on the small scale as well as the large?

I don't know yet, because I'm no expert on more modern languages. But Java isn't managing it.

Hibernate doesn't require dependency injection. I'm not a big fan of DI either, but I'm not sure what it's got to do with type systems either.

The Java type system is quite old. It's doing some interesting new things anyway like the pluggable type systems via the Checker framework, but nobody is going to claim Java has a cooler type system than Haskell.

If you look at Kotlin, it makes some small but much needed upgrades to the Java type system, like real properties, traits, delegates, function types, integrated nullability and flow sensitive typing, better generics etc. It doesn't go as far as Haskell though.

I don't think it's fair to compare Java with Haskell or OCaml since they don't share the same objectives. But even if you did compare them, you should try to match Java's subtyping with Haskell's more-or-less dependent types. Java's type system is apparently strong enough to express co- and contra-variance, which is quite an accomplishment.
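
For anyone who hasn't run into it, here is a minimal sketch of what co- and contravariance look like with Java's wildcards. The class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VarianceSketch {
    // Covariant use: only reads Numbers, so a list of any Number subtype works.
    public static double sum(List<? extends Number> source) {
        double total = 0;
        for (Number n : source) {
            total += n.doubleValue();
        }
        return total;
    }

    // Contravariant use: only writes Integers, so a list of any Integer supertype works.
    public static void fill(List<? super Integer> sink, int count) {
        for (int i = 1; i <= count; i++) {
            sink.add(i);
        }
    }

    public static void main(String[] args) {
        List<Integer> ints = Arrays.asList(1, 2, 3);
        System.out.println(sum(ints)); // List<Integer> accepted as List<? extends Number>

        List<Number> numbers = new ArrayList<>();
        fill(numbers, 3); // List<Number> accepted as List<? super Integer>
        System.out.println(numbers);
    }
}
```

Note that the variance is declared at each use site rather than on the type itself, which is part of where the verbosity complaints come from.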

Talking about ActiveRecord: I've come to understand that this system has been the focal point of insecure-by-design issues (http://railspikes.com/2008/9/22/is-your-rails-application-sa...)

To answer your question, I'd strongly suggest having a look at Scala. It's clean code, easy to read, fast and as expressive as ActiveRecord (plus the type-safety).

Sorry - bad mood.

I thought your comment was hilarious and a good explanation at the same time :)

http://projects.apache.org/indexes/language.html for anyone curious about the actual language split.

The numbers themselves:

    Java 213
    C 19
    Python 16
    C++ 12
    JavaScript 10
    C# 9
    Perl 9
    Scala 9
    PHP 7
I think the crucial statistic is that there are more than 10 times as many Java projects as projects in the next-most-popular language.
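
For the record, that ratio can be checked from the counts listed above. This is a throwaway sketch with an arbitrary class name, and it only covers the languages shown in the list.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class LanguageSplit {
    // Project counts copied from the list above.
    public static Map<String, Integer> counts() {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("Java", 213);
        counts.put("C", 19);
        counts.put("Python", 16);
        counts.put("C++", 12);
        counts.put("JavaScript", 10);
        counts.put("C#", 9);
        counts.put("Perl", 9);
        counts.put("Scala", 9);
        counts.put("PHP", 7);
        return counts;
    }

    // Ratio of the largest count to the second-largest count.
    public static double leaderToRunnerUpRatio(Map<String, Integer> counts) {
        int largest = 0;
        int second = 0;
        for (int value : counts.values()) {
            if (value > largest) {
                second = largest;
                largest = value;
            } else if (value > second) {
                second = value;
            }
        }
        return (double) largest / second;
    }

    public static void main(String[] args) {
        double ratio = leaderToRunnerUpRatio(counts());
        // 213 / 19 is roughly 11.2
        System.out.println(String.format(Locale.ROOT, "Java vs. runner-up: %.1fx", ratio));
    }
}
```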

Do things like Hadoop count as one or several projects?

Also some Apache Java libraries (several of which are used in Android)

I just counted the projects on the page provided. That said, the Java culture does tend to encourage hypermodularisation so what would be one project in another language could turn into several in Java.

Some of them are a bit odd:


Others have really, really buzzwordy descriptions:


> the Java culture does tend to encourage hypermodularisation so what would be one project in another language could turn into several in Java.

I guess it depends what you're comparing to, but I think of Java projects as more monolithic. Spring does dozens of different things compared to something like, say, Flask. And particularly in Haskell or JavaScript you seem to need 100 tiny libraries to do anything.

What's odd about an implementation of the Chain of Command?

Design patterns are usually integrated into software projects, not themselves a software project.

It depends on what's been spun off as a separate top-level project.

Does it have its own subdomain on apache.org? Then it's a top-level project.

It's not uncommon for projects to start as subprojects and then get upgraded to the top level, and some of them came out of Hadoop. For example, Avro started out as a Hadoop subproject before becoming its own top-level project.


The question was: "Why are so many Apache projects Java-centric?"

Because Apache is where code goes to die, and Java code is dying ;)

You just go ahead and keep thinking that. There might even be an award at the end of that rainbow you are chasing. A Darwin one, but still.

You on crack?
