
Ask HN: Why not Java? - OmegaHN
I am building a web crawler to access data to be processed. All the code is fairly high level, so I am drawn to Python, but there are certain bits of it that require data manipulation that is much easier in a C-like language (arrays are a big part of it).

Java seems to fit this role very well. It is statically typed, object-oriented, and doesn't delve into memory. However, it seems to get a lot of hate (or, at least, dismissal) from many programming communities, so I am asking, why not Java? Why is it so horrible as a systems language above C? Is there any other language that fits this role in a better way?

I am in particular asking this because I have been banging my head against the Python syntax for a while, but I am trying to expand what languages I can program in.
======
strlen
It's perfectly fine to use Java for this kind of software.

The hate against Java comes from using Java for application development: this
is largely due to the kinds of applications that are typically written in Java
(line of business software) and (this is the most important reason) accidental
complexity and low quality of APIs like Spring or J2EE.

The recipe for programming happiness is to use the right tool for the job:

* Python (or Ruby) for web application development, development tools, and "devops" scripting.

* C (or C++) for pieces that need deterministic performance[1], provide a "native" feeling user interface, or require control over memory layout.

Note: performance and efficiency are relative to what your throughput and
latency requirements are. Google's crawlers and indexers will remain in C++
for the foreseeable future, but (for example) crawlers for an intranet can get
away with being in Java (or Python for that matter).

* Java (or Scala, Haskell, OCaml, Go, Erlang, or one of the many Lisps) for "userland" systems programming. If the majority of the system fits under the previous bullet point, use C++.

* Avoid JNI or Swig if you can. Use JSON + REST for cross-language RPC. If you need performance guarantees of a tight binary protocol use Thrift or Protocol Buffers. If you have to use JNI, consider using JNA first.

* No matter what language you use, stick to high quality libraries and tools. For Java, you'll absolutely want to use Guava, Guice, and either Netty (or NIO.2 if you are using Java 7) or Jetty + Jersey + Jackson (for REST APIs).

Pick up one of Emacs with cscope, NetBeans, Eclipse, or IntelliJ for
navigating a large Java codebase.

All Java build tools suck. Maven sucks less and is the de-facto standard in
the open source community. Twitter's "pants" is also worth looking at.

* Don't touch Spring with a 60-foot pole: in the mildest terms it's unequivocal and absolute garbage. Ditto for any other buzzword you may see in a job listing for an "enterprise" Java development job (with 20 years of experience required, naturally).

[1] Java performance can be quite high, but a JIT-ted and garbage collected
runtime implies a lack of determinism.

~~~
DVassallo
Can you give us some evidence why Spring is "unequivocal and absolute
garbage"?

~~~
strlen
Well, for starters the whole idea of programming in XML. That and gems such as
[http://static.springsource.org/spring/docs/2.5.x/api/org/spr...](http://static.springsource.org/spring/docs/2.5.x/api/org/springframework/aop/config/SimpleBeanFactoryAwareAspectInstanceFactory.html)
or
[http://static.springsource.org/spring/docs/2.5.x/api/org/spr...](http://static.springsource.org/spring/docs/2.5.x/api/org/springframework/aop/framework/AbstractSingletonProxyFactoryBean.html)

Of course I can't a priori prove that Spring is garbage, much like I can't a
priori prove that it's better to be healthy and rich than to be poor and sick.
It is a judgement call, but a judgement call that I believe I'm qualified to
make, having worked with a large Spring codebase for 2.5 years.

~~~
samspot
You can ditch the XML almost entirely in Spring 3, which I've been using for 2
years now. All you need is 50-100 boilerplate lines and the rest is
annotations. I agree that 2.x was XML hell, but it is worlds better now.

~~~
ricardobeat
> All you need is 50-100 boilerplate lines

That answers "Why not Java?" perfectly ;)

~~~
prpatel
Java is a verbose language, but it's still awesome for a large category of
projects. Are you a Rails guy? How about those 50-500 lines of boilerplate
code in the config files? Same thing as the Spring config files. Does that
lead me to say "Why not Rails?" No, it doesn't!

Saying it as politely as possible: don't be a hater.

------
gojomo
Nothing's wrong with Java. Commercial and research-quality crawlers of tens of
billions of web resources have been written in Java for over a decade. Its
threading/concurrency support and extensive well-optimized libraries make it
easier for you to make your code fast over large datasets... if you're good at
Java. (If you're not, there are plenty of ways to sabotage yourself.)

But, Java's a bit verbose, has gaps in concise support for higher-level
constructs, and sometimes the static typing gets in the way. So if you don't
find those parts helpful -- some do -- and think your performance targets can
be met with other later optimizations/design-choices/selective-
reimplementations, stick with whatever more concise language you're good at.

Or, use any of the more concise languages available on the JVM allowing
intermixing of the occasional Java facility, like Jython, JRuby, Groovy,
Javascript, Scala, Clojure, and others.

(If efficiently handling massive numbers of concurrent net/IO streams is a
priority, the recent JVM-based project vert.x may be of interest. I haven't
used it for anything but toy tests, but it seems to combine some of the best-
practices for maximum JVM IO throughput with a somewhat higher-level-language-
agnostic top layer well-suited for servers/proxies/crawlers.)

~~~
marbu
I agree, Java is completely OK for implementing a crawler. For example, the
well-known Mercator crawler was written in Java, and its authors stated:

Although our use of Java as an implementation language was somewhat
controversial when we began the project, we have not regretted the choice.
Java’s combination of features — including threads, garbage collection,
objects, and exceptions — made our implementation easier and more elegant.
Moreover, when run under a high-quality Java runtime, Mercator’s performance
compares well to other web crawlers for which performance numbers have been
published.

source: [Mercator: A scalable, extensible web crawler
(1999)](http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.151.5202)

------
Derbasti
In my experience many Java programmers don't really "program" Java. They are
more like "expert Eclipse users" and Eclipse happens to output Java. This
style of development makes heavy use of wizards and those Eclipse refactoring
tools.

This probably is a consequence of the verbosity of Java-the-language, which
made heavy tooling support a necessity. And then came Eclipse, which provides
one of the tightest integrations with Java of any IDE ever.

The sad thing is that this is not really the fault of Java-the-language or
Eclipse. It did spawn a whole caste of very mediocre programmers and libraries
though, which can make for a very unpleasant culture.

Used correctly, Java can be a great tool, though.

~~~
beothorn
The verbosity of Java allows an IDE like Eclipse to exist. I agree that it's
very hard to program Java without a powerful IDE, but there are a lot of
things that you can do only with a statically typed language and an IDE like
Eclipse. It's a fair trade. Could you elaborate on the link between IDE
dependency and the spawning of a whole caste of very mediocre programmers?
(English is not my first language, so sorry if anything I wrote sounds rude.)

~~~
Derbasti
Sure, I can elaborate.

The idea is that as a programmer, you have to have an intimate understanding
of what is going on in order to make the machine do your bidding quickly and
correctly.

But that mediocre Eclipse user I caricatured does not have that
understanding. He certainly knows how to get the job done for a certain set of
tasks, but he does not know the details of how this is happening. Thus, he
creates programs that follow "best practices", "conventions", "design
patterns" and lots of automatically created wizard-boilerplate.

That might not be "bad code" mind you, but it almost certainly is not "great
code", either. Thus, mediocre. And then these people create libraries that are
mediocre and try to use only libraries that they can understand and that are
hence mediocre. A culture emerges that is very consistent, but also very
mediocre.

~~~
beothorn
From what I see, knowledge comes from experience and study. An IDE doesn't
magically separate you from the need to know how stuff works. What you
describe is an inexperienced programmer, but those exist in any area, whether
or not they use an IDE. From what I understood, you see a problem with code
generated by a wizard or by an automated process, and if that's your point I
agree, but that's not how Eclipse is used. Besides that, I don't think there
is a link between bad code and IDEs. You can't look at bad code and say "hmm,
this code was probably written in Eclipse", or at good code and say that it
was written with vi, because this connection does not exist.

~~~
Derbasti
I agree that an IDE does not separate you from the need to know how stuff works.
But I certainly know a few programmers who do not dare to think beyond what
their IDE allows them to do. They use built-in wizards and refactorings, but
they do not seek solutions that are not easily expressible with those.

Hence, their code is factored just the way the IDE would, even if there are
better alternatives available. This is the kind of mediocrity I am talking
about. (And by the way, you could do much worse than Eclipse at that.)

I actually had quite a few discussions where the main misunderstanding was
that we used different tools and thus thought of different things as "easy" or
"natural". In one case, one developer argued that it would be a good idea to
create a whole bunch of classes to encapsulate a problem space. However,
creating all these classes really would not have been necessary at all, he
could have accomplished the same goal with a much simpler list of functions.
Thus, we ended up with a HUGE file containing some several dozen classes that
no-one but him could navigate, because it was factored just right for his
programming editor but was all but unusable for anyone else.

One developer even told me that he did not know how to write a correct if
statement in C because his editor had a template for that. What my argument
boils down to is that this kind of behavior is bad, and I have seen it
becoming a sad kind of sub-culture in Eclipse/Java land.

That said, I completely agree that if used correctly, there is nothing
inherently wrong with Eclipse or Java per se.

~~~
Evbn
Refactoring can't create functionality. What you are saying is not compatible
with the theory of entropy. Eclipse is a high level language running on the
JVM, and not a bad one.

------
slurgfest
I am puzzled at how arrays are hard to use in Python? I cannot understand how
you could be 'banging your head' against Python's array syntax unless you are
just new to Python.

If you want to use Java (e.g.: you know it already and don't like learning
other things), who cares? Why is this an issue where you have to challenge
other people's opinions of Java? Use it if you want to.

------
rbanffy
You can always do the speed-critical parts in C and link that from your Python
code. Or, if your analysis is something already done, use a library already
written in C (such as NumPy).
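
As a rough illustration of leaning on C-implemented code from Python, here is a minimal sketch. It uses the standard library's `array` module and `itertools.accumulate` (both implemented in C in CPython) rather than NumPy, purely so the example has no third-party dependency; the function name `running_total` and the sample values are made up.

```python
from array import array
from itertools import accumulate


def running_total(values):
    """Return the cumulative sums of values as a typed array of C doubles."""
    # array("d") stores raw doubles contiguously instead of boxed float objects.
    return array("d", accumulate(values))


samples = array("d", [1.5, 2.5, 4.0])
totals = running_total(samples)
print(list(totals))  # prints [1.5, 4.0, 8.0]
```

The per-element work happens inside the C implementations, so the Python code only describes what to compute.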

Another approach could be Jython (or any other JVM language closer to the
desired level of abstraction) and Java.

I don't have much love for Java the language. It's not much easier to program
in than C, isn't faster, and is very verbose. Still, what you are doing looks
like a good match for it. And all the respect I don't have for the language, I
have for the JVM.

I wouldn't use it for web app development as there are much more productive
options around.

~~~
samspot
I noticed a lot of criticism of Java's verbosity in the comments and I'm a bit
curious what people are referring to? I work primarily in Java, but also do
quite a bit of Javascript and Perl, and I don't notice Java being especially
verbose. Maybe internally I'm giving it a break because it isn't a scripting
language? I'm honestly curious to see what you guys think.

~~~
orangecat
The two areas that most frequently annoy me are processing collections and
(lack of) first-class functions.

Java:

    
    
      List<String> firstNames = new ArrayList<String>();
      for(Person p : people) {
        firstNames.add(p.getFirstName());
      }
    
      addCallback(new Runnable() {
        public void run() {
          doSomething();
        }
      });
    

Python:

    
    
      first_names = [p.first_name for p in people]
    
      add_callback(do_something)
    

Scala:

    
    
      val firstNames = people.map((p) => p.firstName);
    
      addCallback(() => doSomething());
      
    

The Python and Scala versions do exactly what they say, while the Java code
has a bunch of boilerplate that you have to mentally filter out before you can
understand what it's doing. And the Scala code is fully typesafe; the compiler
infers types rather than making you continually repeat them.

~~~
exelib
That is because you don't know Java. How will you find out where you use

    
    
       first_names = [p.first_name for p in people]
    

? The more verbose but right way in Java is something like this (you can do
it better with enums and/or Guava; it's just an example):

    
    
        class PersonTransformer implements Transformer {
            public Object transform(Object o) {
                return ((Person)o).getFirstName();
            }
        }
    

and then:

    
    
        Collection<String> firstNames = CollectionUtils.collect(people, new PersonTransformer());
    

You can reuse it and, more importantly, you can search for it. The same goes
for the second example.

~~~
rbanffy
Why would you want to reuse a construction so trivial? Why would you want to
search for it?

A couple of days back I commented that Java, by making some things harder than needed,
induces programmers to over-engineer and build things for needs they don't
have and to think that's perfectly normal. Think about what you just wrote.

~~~
exelib
Because of the DRY principle. It doesn't matter whether it's trivial or not.
If you scatter your 'trivial' constructions over the whole application, you
don't have any chance of consistent, robust processing. If you collect/group
things logically, you can easily find out where you (re)use constructions and
what you can break if you change them.

It's not over-engineering, it's just how you deal with projects of >300 people
and >6 years without ending up with "design dead software".

~~~
rbanffy
It's just a list comprehension. Do you imply I should use a function instead?
Why not use a very nice syntax feature every Python developer can understand?

Because if you do, I'd advise you not to add integers with the "+" operator,
but, instead, build a class with various add methods for different types of
arguments or, better yet, build add methods into every class you define so
that you can better search for them. This approach would allow you to add
things that aren't integers or even not the same type on both sides of the
operator.

~~~
exelib
I give up. You're like a little kid. Code size or verbosity doesn't matter.
Quality is the key word, and it results in a sustainable pace. If you don't
understand this, nobody can help you. Your arguments are like "Why should I
use TDD if I can write code directly?"

~~~
rbanffy
You call me a kid because you claim using a single line syntactic feature of a
language isn't maintainable. It's like claiming for loops are unmaintainable
and that we should wrap them inside methods.

It's not people hating Java: it's you being afraid of looking at other tools.

------
rockyj
Java is good, make no mistake about it. It offers you fine-grained control in
almost every aspect of programming (e.g. concurrency). However, it is the same
freedom that allows developers to make mistakes. For example:

One can write concurrent systems in Java without understanding concurrency.
Languages like Scala and Clojure will give you some freedom but will also
enforce certain design principles which will save you.

Similarly for web development, there are scores of frameworks in the Java
world, and you can mess it up easily. Rails / Django on the other hand will
provide one good, solid way to do web programming.

Finally, Java is showing its age. The need to write large files of XML to
configure things and the lack of ability to treat functions as objects put
developers off. Some things are being addressed by Oracle but will take time.

------
btilly
I am curious about what, specifically, you find easier to do in Java syntax
than Python syntax.

Seriously, there is a fairly direct translation from any Java you might want
to write to completely equivalent Python. Sure, Python offers more complex
techniques such as list comprehensions and iterators. But you don't need to
use them. You can just write Java-like Python.
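
To make that concrete, here is a small sketch of the same array manipulation written twice: first in a Java-like style with explicit indices, then with a list comprehension. The function names and the sample data are invented for illustration.

```python
def scale_java_style(values, factor):
    """Index-based loop, written the way one would write it in Java."""
    result = [0] * len(values)  # preallocate, like `new int[n]`
    for i in range(len(values)):
        result[i] = values[i] * factor
    return result


def scale_pythonic(values, factor):
    """The same computation as a list comprehension."""
    return [v * factor for v in values]


data = [3, 1, 4, 1, 5]
print(scale_java_style(data, 2) == scale_pythonic(data, 2))  # prints True
```

Both functions return a new list and leave the input untouched; the first form only spells out the indexing that the second leaves implicit.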

~~~
fusiongyro
Interfaces.

~~~
pmiller2
Abstract base classes. See <http://www.doughellmann.com/PyMOTW/abc/>
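
For readers who have not used them, here is a minimal sketch of an abstract base class standing in for a Java interface. `Fetcher`, `DummyFetcher` and `BrokenFetcher` are hypothetical names, and the sketch uses the Python 3 `abc.ABC` helper. Note that the missing method is only detected when the class is instantiated at run time, not at compile time as in Java.

```python
from abc import ABC, abstractmethod


class Fetcher(ABC):
    """Plays the role of a Java interface with one method."""

    @abstractmethod
    def fetch(self, url):
        """Return the body for url."""


class DummyFetcher(Fetcher):
    def fetch(self, url):
        return "<html>" + url + "</html>"


class BrokenFetcher(Fetcher):
    """Forgets to implement fetch()."""


print(DummyFetcher().fetch("example"))  # prints <html>example</html>

try:
    BrokenFetcher()
    instantiated = True
except TypeError:
    # Raised because the abstract method fetch() is not overridden.
    instantiated = False
print(instantiated)  # prints False
```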

~~~
fusiongyro
Absent static type checking and manifest typing, abstract base classes do much
less than Java interfaces.

~~~
btilly
Ah yes, the age-old static typing vs duck typing argument.

Sure, in theory static typing can catch bugs. But doing it like Java does it
is a lot of work per _real_ bug actually caught.

~~~
fusiongyro
Whether or not that's true, it is a concrete difference between Python and
Java. Python is not a superset of Java.

~~~
btilly
I never claimed that the two were the same language. My claim was that you can
take anything written in Java and pretty much directly translate it to Python.

The fact that there are things Java will flag as errors that a Python
translation does not, does not change this fact.

~~~
fusiongyro
As I said, you seem to be claiming that Python is a superset of Java, not
equivalent to it, and that claim is manifestly false. Errors that are detected
in one place and not in another are a manifest difference.

Also, Java threads and anonymous classes do not translate directly to Python.

~~~
btilly
Python has threads.

In Python classes are first class objects, and you can easily do anything you
could do with Java anonymous classes on the fly. Furthermore Java anonymous
classes are usually used as a verbose replacement for a lack of closures. But
in Python you can create closures and pass them around. (You do have to do
some juggling to mutate variables, but 1 element arrays are only a slight pain
to work with.)
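
A small sketch of both points, with made-up names: a closure passed around as a callback, and the one-element-list trick for mutating a captured variable. (Python 3 also offers the `nonlocal` statement for this, which is not the technique described above.)

```python
def make_counter():
    """Return a closure that counts how often it has been called."""
    count = [0]  # one-element list: the closure mutates its content, not the name

    def increment():
        count[0] += 1
        return count[0]

    return increment


def run_callback(callback):
    """Stand-in for any API that accepts a callable, like addCallback in Java."""
    return callback()


counter = make_counter()
run_callback(counter)
run_callback(counter)
print(counter())  # prints 3
```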

~~~
fusiongyro
Yes, but again, there is no Python syntax for anonymous classes. That you can
"easily do" something a different way is not proof that two things are the
same, it's proof that they are in fact different. Which again, is my point.
That Python has other, different features that can be used for other different
effects just makes it that much more different.

Python has threads, but in CPython the global interpreter lock keeps them
from running Python code in parallel, so if you want real parallelism, you
have to resort to the multiprocessing module. This is not a limitation Java's
threading shares.

I'm not arguing that Python is bad or inadequate, just that it is a
fundamentally different thing and should not be viewed as a superset of Java.

~~~
btilly
My claim is that it is fairly easy to translate any Java program into an
equivalent Python one. That some things need different constructs is not
evidence against this claim.

~~~
fusiongyro
Your claim is untrue, and that different constructs are needed is evidence. It
is often easy, but that isn't guaranteed.

------
pacala
> Why is it so horrible as a systems language above C?

* First class functions (interfaces with one method) plus garbage collector eventually encourage a functional programming style, with lots of little objects created on the heap. Alas, the per-object memory overhead of popular Java implementations is horrendous.

* Strong emphasis on using threads for concurrency. Alas, in practice, threads are incredibly large memory hogs.

* Verbosity. While it is possible to write clean composable code in Java, it is also remarkably verbose. After a while, this gets old and people take all shortcuts they can to limit verbosity. Which is a very bad idea. To quote an esteemed colleague, "I never took a shortcut I didn't regret later". Can we have our lambdas yet, pretty please?

~~~
spartango
* If you are using the JVM's generational + concurrent garbage collectors, generally the hordes of little objects disappear without hiccups or leaving much of a footprint.

* My beef with threads for concurrency revolves not around memory footprint (can you substantiate threads as "memory hogs"?), but instead around the necessity to be mindful of resource sharing. Yes, the JDK gives you lots of useful tools in this quest, but it's still not all that difficult to end up with a deadlocked app.
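
One common way to stay out of that particular hole is to always acquire locks in a single global order. The following is a minimal sketch of that idea in Python rather than Java; the `Account` class, the `transfer` function and the numbers are hypothetical, and `transfer` assumes the two accounts are distinct.

```python
import threading


class Account:
    def __init__(self, ident, balance):
        self.ident = ident
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    """Move amount from src to dst; src and dst must be different accounts."""
    # Always lock the account with the smaller ident first, so two opposite
    # transfers can never each hold one lock while waiting for the other.
    first, second = sorted((src, dst), key=lambda account: account.ident)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


a, b = Account(1, 100), Account(2, 100)
threads = [threading.Thread(target=transfer, args=(a, b, 1)) for _ in range(50)]
threads += [threading.Thread(target=transfer, args=(b, a, 1)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(a.balance, b.balance)  # prints 100 100
```

If each transfer instead locked `src` first and `dst` second, the two groups of threads could each grab one lock and wait forever for the other.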

------
phao
There is hate against Java, and also against C#, C++, PHP (which I hate), C, and
pretty much any other mainstream language.

Notice, though, that competent people have done great jobs using these
languages. So you have some choices. Two of them are: wonder why people bash
Java or go do something useful with it. I suggest you do the second.

The key to using programming languages is in trying to use the one which will
help you the most, or get in your way the least. Sort of "the right tool for
the job". I don't know what jobs Java is good at. If you found out that it's good for
your project, then use it.

Take a look at this article: <http://prog21.dadgum.com/143.html>

~~~
tikhonj
Just because you _can_ do something with a language does not mean you should.
Sure, lots of people make stuff with poorer languages. Lots of people also
never write tests, skimp on documentation and copy and paste half their code.
It works for them! The best choice would be to think about the problems people
have with Java and find a good solution for them (e.g. Scala).

People use all sorts of distinctly sub-optimal tools and technologies for
various reasons unrelated to those technologies' merits. One of the biggest
reasons is familiarity--many people do not like learning radically new things
and so stick to what they know. Popularity does not imbue any sort of quality
to programming languages any more than it does to anything else like music.
There's a reason that trained musicians respect classical music--even if
they're making pop--and there's a reason programming language people respect
ML.

In short: just because many people manage to use Java does not mean it is in
any sense optimal or even good.

Also, I think the oft-repeated "right tool for the right job" bromide about
programming languages is deeply flawed. Programming languages overlap _far_
more than most tools--they are all _general-purpose_ programming languages,
after all. The difference between a hammer and a screwdriver is far greater
than even the difference between Java and Haskell. Choosing a programming
language is more like choosing the best power drill--they overlap almost
completely and can do the same jobs. It's quite plausible that some are almost
always better than others, but that you could ultimately do the job with
either. It will just be more difficult with one than the other.

Also, even if languages did differ significantly, there is no guarantee that
any particular language has anything it's best at--it can be strictly worse
than other languages for every conceivable use.

Finally, I think that surrendering to familiarity and choosing something you
know over something you need to learn is rarely a good choice. Sure, if you
have a hard deadline, it might be a reasonable compromise. But learning a
language is essentially a constant expense, whereas its effect on your
productivity is linear in how much you program. Just because it might take
more effort to get started with Scala does not mean you should immediately
consign yourself to the drag on productivity that is Java.

You should be learning something new all the time, and programming languages
are some of the most important things to learn in CS--they affect not only
what you write but how you think. So strive to find the best one you can
rather than settling for something that works--in this day and age, expecting
your language to be somewhat usable is too low a bar to set.

~~~
phao
> In short: just because many people manage to use Java does not mean it is in
> any sense optimal or even good.

Nobody is saying the contrary. I suggested one of the many options he has (of
which I mentioned two), namely using Java, and said that using Java is better
than going around looking for why people bash Java. Which happens to be true.

Consider how he would choose a better language though.

It's surprising how cumbersome it may get to write a simple loop in a more
"elaborate" language.

> Also, I think the oft-repeated "right tool for the right job" bromide about
> programming languages is deeply flawed. Programming languages overlap far
> more than most tools--they are all general-purpose programming languages,
> after all. The difference between a hammer and a screwdriver is far greater
> than even the difference between Java and Haskell. Choosing a programming
> language is more like choosing the best power drill--they overlap almost
> completely and can do the same jobs. It's quite plausible that some are
> almost always better than others, but that you could ultimately do the job
> with either. It will just be more difficult with one than the other.

There is more to choosing a language than "the language". Here are some
reasons why he may wanna use java:

1) He has books about java, but not about anything else.

2) His co-workers use java.

3) He needs to work with the JVM.

4) He really wants to use java.

5) He has a bunch of minor reasons to use java.

6) Libraries, Libraries, Libraries.

7) Development tools.

8) Good implementations.

9) There is a standard for the language.

10) There is a huge community around the language.

11) He doesn't really have a choice. His boss wants him to use Java.

12) He's in academia and most people in his institute use Java.

(there are more!)

The "right tool for the job" sure is true. And very much true. It is
complicated to select it though. That's why I didn't try to tell him HOW to
select that tool. I really think he'll be better off with java if he saw java
is good enough (another point is that a "known good enough" is usually better
than an "unknown perfect").

We choose "a language" but rarely because of "the language". As you said
yourself, a lot of times, these languages are general purpose languages and
overlap a lot.

Have you ever noticed that a lot of the languages used today are really tied
up to their implementations/running systems? C and UNIX, Java and JVM, C# and
CLR, Python and CPython, PHP, Ruby, Objective-C (this one is a really good
example), JavaScript, etc. This was true in the past too, think of delphi, vb6
and windows, lisp and the lisp machines, fortran and cobol and the IBM
systems.

A lot of what was "language design" in the past, is libraries and
implementation design today. Choosing a language is more than looking at "the
language", its syntax, semantics, idioms, patterns, etc.

Other things matter too: building a GUI, communicating over a network,
accessing files, dealing with the database, doing graphics programming. You
can either use, for example, C# and get lots of these from the .NET CLR with
little effort, or pick OCaml, for example, and get them too, but with a lot
of work that you would not have to do in C#. Even if you port OCaml to run on
the CLR, it's unlikely to be as CLR-friendly as C#.

But, also, there are "local" reasons for choosing a language. These are things
you or I don't know because they're specific to him or his group of people.

> You should be learning something new all the time, and programming languages
> are some of the most important things to learn in CS--they affect not only
> what you write but how you think. So strive to find the best one you can
> rather than settling for something that works--in this day and age,
> expecting your language to be somewhat usable is too low a bar to set.

Learning new stuff is good advice, generally. But programming languages are
one of the most irrelevant things in CS. In the long run, they are irrelevant.

"We" already teach/learn a lot of programming concepts and techniques without
specific programming languages, but with concepts shared by a class of
programming languages (as you even said it, lots of them overlap). Concepts
and techniques that, in the past, were highly specific of particular
programming languages.

Languages get obsolete. Those which remain do so usually because of practical
matters (like C, or Java, or C++).

The ideas, sometimes new, that a programming language may bring are important
though. The language itself is not. For example, closures are really catching
on now, but they were invented long ago and first implemented in languages
that people do not use much (I guess it was Scheme, but I am not asserting it).

And, so I can end this reply...

I'm sorry for the arrogance, but you should not lose the ability to separate
the interesting from the practical, which I got the impression you cannot do
very well. Some things are both (I guess Haskell is one of these), but it's
not usually the case.

What some people (fortunately, it doesn't seem that it's most of them) don't
understand is that lots of programming does not require the sort of elaborate
constructs and idioms that, for example, Scheme allows you to use. I once
talked to a guy who did lots of "business" software. I mentioned Scheme to
him. Told him lots of cool stuff about recursion, closures and macros, told
him a little about lambda calculus, showing how you could do it in Scheme. And
he told me "It's cool, but it's also like people don't know what is useful
anymore." Well, that got me thinking back then.

You can argue all you want if he's right or not; if the software he writes is
difficult or not, but it's not that he didn't see the advantage of those
things. But turns out that most repetitions do not require and are not good
with recursion (they just loop over a collection or a range of values; a
for+iterators or numbers would usually do), most functions do not return other
functions (not that you couldn't do it that way, but it's usually the case
that your program is simpler if you don't), and minimalism is not really that
convenient in writing software for $$ (many people seem to reach this
conclusion). Lots of applications are still single threaded, and run in only
one process. Immutability is a lot more interesting in theory than in practice
for a large class of programs. Static typing still catches a lot of problems,
and people usually do not bother that much about having to write down the
types. Beautiful techniques for managing large programs are very interesting
... for large programs. It turns out that lots of programs are not that large.
And the list goes on and on.

------
orangecat
_Why is it so horrible as a systems language above C?_

It's not "horrible", it just has many slight-to-moderate deficiencies and
annoyances that make development more work than it should be.

 _Is there any other language that fits this role in a better way?_

Scala is strictly superior when used as a "better Java". (If you go deep into
its functional capabilities you get a different set of tradeoffs). C# is
better as a language, but then you're tied to .NET.

Really we'd need to know more details of what you're doing and why you believe
Python may not work. Are you concerned about performance, or do you need to do
things that Python doesn't have convenient APIs for?

------
nostromo
I've written two crawlers in Java and found it quite well-suited.

I think most people on HN who hate Java are talking about creating websites,
and for good reason. Back in the bad ol' days, people would use Java
frameworks like Struts for web apps, and it was quite painful.

For my latest project I'm using Play Framework for front-end Java, and it's
quite delightful.

~~~
boyter
Agreed. I am in the process of porting a site from PHP to Play and I am loving
it.

I am a fan of lots of languages, but recently for anything I am supporting for
a long period of time I want static typing to catch massive re-factoring
issues. I'd rather use C# than Java personally, but Play is an amazing Java
framework to use.

------
freeslave
Nothing wrong with using Java and there is something to be said for using the
language you are most productive in. But if you are thinking of building a web
crawler in Java, I would recommend taking a look at the Heritrix project:
<https://webarchive.jira.com/wiki/display/Heritrix/Heritrix> It's robust, open
source and easily extensible. Might be easier to write a custom module for it
than to roll your own web crawler.

------
samspot
The best reason to use Java is the enormous ecosystem of libraries and
resources.

The best reason to AVOID using Java is the huge demand for Java programmers
and the low supply. At my job we can barely find applicants with Java so we
end up hiring .NET people and converting them.

~~~
beothorn
Just curious, where do you live?

~~~
samspot
I live in Phoenix, Arizona, and I've heard my old market (Raleigh, NC) is
experiencing similar issues.

------
bbayer
I personally don't like Java because of its API complexity, which is why I
ended up with Python. I have implemented many crawlers using the Scrapy
framework and I believe it speeds up development. We have crawled millions of
pages without any problem.

Python is very powerful in terms of string manipulation because it has very
good language constructs (like slice syntax) which make development easy. At
the beginning it might be a little bit confusing, but once you have mastered
it you really feel the power.

Frameworks like Twisted also do a good job here. Twisted is well designed and
asynchronous, and it is well suited to multi-tier network applications.

------
jfb
It's a lousy language (IMO) with some excellent libraries and very fast
compilers. If you're comfortable with the limitations of the language, and
you're having trouble with Python, it's worth a shot, I guess.

------
Mikera
Java is a great platform to build on - and the sweet spot is definitely for
server side applications like this.

You can safely ignore the people who bash Java - they are generally clueless.
The Java language is perfectly fine: high performance, statically typed, OOP,
relatively simple and maintainable. It may not offer the most concise code and
it may not have all the "trendy" language syntax features but guess what -
that actually doesn't matter much in the real world (i.e. outside the realm of
language designers and fanboys). If saving a few characters of typing is your
major concern when choosing a language, you have much bigger problems.

But the real strength in Java is not the language but rather the overall
platform - the combination of the JVM (which is an amazing high performance
feat of engineering), the library ecosystem (which is the best overall for any
language), the tools (great IDEs, Maven, a host of other developer-focused
tools), the fact that the OpenJDK itself and most of the libraries are open
source and the portability (compiled JVM code is extremely portable, and
importantly doesn't need a recompile unlike some other so-called
"cross-platform" languages).

So overall you can't really go wrong with choosing Java for server side
applications. Although I would also give Clojure or Scala a look - if you are
after "powerful" languages then these two are pretty amazing and you still get
all the benefits of being on the Java platform.

~~~
tikhonj
I don't think Java-the-language is "perfectly fine". It's extremely verbose,
the type system is mediocre at best, it's actively hostile to functional
programming (and let's not even consider anything else like logic
programming!), the syntax is extremely inflexible.

Your tirade against "fanboys" is nothing but a straw man--the point of having
more expressive, concise code is not "saving a few characters of typing" but
making your program easier to write and easier to read (and, therefore, easier
to maintain). Sure, you can get stuff done with Java, but you can generally
get it done faster and better with other languages.

High-level features absolutely matter in the real world--they allow you to
write code faster and give you more confidence that it is correct. Code
written at a higher level is not only shorter but also more declarative and
clearer. The idea that only "language designers and fanboys" care about having
these features in their languages is patently absurd and rather arrogant.

To me, it seems Java is a compromise--it ignores decades of research and
progress in programming language design in favor of catering to people who
knew C++ and didn't want to learn something radically different. Thanks to
being widely taught, it is now essentially a lowest common denominator:
practically any programmer you meet will have at least learned the basics of
Java at some point. But I think this is exactly the sort of compromise any
good programmers should _not_ take!

Now, the JVM is, admittedly, a good platform. It has some glaring weaknesses--
poor support for functional programming, poor interoperation with native code,
long start-up time and so on--but, on the whole, is very strong. Happily, you
aren't bound to Java if you want to be on the JVM and you can use some of the
great alternatives like Scala. But this does nothing to defend Java-the-
language--having a good implementation does not make a language well-designed
or particularly usable.

------
jvvlimme
Java is well suited to this and even powers some powerful crawlers like
Heritrix (archive.org) and Nutch (Apache Foundation).

That being said, it doesn't really matter what language you write your crawler
in: its performance will much sooner be influenced by other aspects (network
latency, storage, etc) than the language you choose.

So pick the language you're most comfortable with for crawling and offload the
data processing to a lower level language that is better suited for that task.

------
Tichy
It's just that Java is very verbose, and actually I found it particularly
horrible for data driven applications (by this I mean apps whose behavior is
determined by data/config files, not "Big Data" - I have no experience with
the latter). For complex data types you always need to create complex class
hierarchies. In other languages you could just write

webInfo = {url: "bla.bla", title: "bla die blub", links: ["link1", "link2"]}

Notice that webInfo contains two different types, Strings and Arrays. In Java
arrays or hashes you cannot easily mix types - you'll end up just putting
Objects everywhere, then be forced to litter the code with type casts. Or you
create the unwieldy class hierarchy. That is my prediction, anyway - I am too
lazy to come up with a good example :-(

You can also not simply write something like the hash above. The nearest you
can get is if you have created that class hierarchy with suitable
constructors, you could instantiate that in one go. At least that is my memory
- I have now avoided it for so long that I am not even sure how to instantiate
an Array or a Hash with data on the fly anymore.

I think instantiating an array with data goes something like

links = new String[]{"bla", "blub"}, and there is nothing like that for Hashes
- you are stuck with

info = new HashMap<String, Object>(); // generics are particularly ugly and annoying

info.put("links", new String[]{"bla", "blub"});

info.put("title", "some stupid web site");

info.put("url", "undisclosed");

And so on - a far cry from the example above. (Note the Java syntax is
probably wrong, created from memory - but it is something like that).
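
For reference, a version of the snippet above that actually compiles could
look roughly like this. It is only a sketch: the class name WebInfoMapDemo and
the buildInfo helper are made up for illustration, and it assumes Java 7 or
later for the `<>` shorthand. The casts on the way out are the point:

```java
import java.util.HashMap;
import java.util.Map;

class WebInfoMapDemo {
    // Builds the same structure as the webInfo literal, one put() at a time.
    static Map<String, Object> buildInfo() {
        Map<String, Object> info = new HashMap<>();
        info.put("links", new String[]{"bla", "blub"});
        info.put("title", "some stupid web site");
        info.put("url", "undisclosed");
        return info;
    }

    public static void main(String[] args) {
        Map<String, Object> info = buildInfo();
        // Values come back as Object, so every read needs a cast.
        String title = (String) info.get("title");
        String[] links = (String[]) info.get("links");
        System.out.println(title + " has " + links.length + " links");
    }
}
```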

Even if you went through the mind numbing work of creating appropriate
classes, you'd be stuck with

info = new WebInfo(title, url, new String[]{link1, link2,...});

And that is just for two different types, and notice that there is no way to
see what the name of the parameters of the WebInfo constructor actually are
from that snippet of code.

title: someTitle

is actually much more readable because you can instantly see that someTitle is
supposed to be a title.

Also if you want to use NoSQL, I suspect converting java classes to JSON could
be a pita, too.

~~~
beothorn
Can links be a single String? From "info = new WebInfo(title, url, new
String[]{link1, link2,...});" I can see that it can't. This is the kind of
thing you get with a typed object. Also, instead of Strings for title, url,
etc. you could use tiny types. Yes, it is a lot more verbose, but it comes with
an advantage (compile-time errors over runtime errors). A tiny type also
documents what you should pass to the constructor. If you need some
preparation to get a url (I don't know, like finding it in some txt file),
typing would lead you to do it.
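
A rough sketch of what tiny types could look like, assuming Java 16 or later
for records (on older versions each one would be a small final class). The
names Title, Url and TinyTypesDemo are made up here, and the "://" check only
stands in for whatever preparation a real url needs:

```java
import java.util.List;

// Hypothetical tiny types: each wraps one String so the compiler can tell
// a title from a url.
record Title(String value) {}

record Url(String value) {
    Url {
        // Illustrative check only, not real URL validation.
        if (!value.contains("://")) {
            throw new IllegalArgumentException("not a URL: " + value);
        }
    }
}

record WebInfo(Title title, Url url, List<Url> links) {}

class TinyTypesDemo {
    static WebInfo example() {
        return new WebInfo(
                new Title("some stupid web site"),
                new Url("http://example.com/"),
                List.of(new Url("http://example.com/a"),
                        new Url("http://example.com/b")));
    }

    public static void main(String[] args) {
        WebInfo info = example();
        // Swapping the Title and Url arguments would not compile.
        System.out.println(info.title().value() + " -> "
                + info.links().size() + " links");
    }
}
```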

~~~
Tichy
If you enjoy that kind of programming, sure, go for it. It certainly is
possible to program a crawler in Java. Personally I can't go back since I
have experienced more succinct languages.

Also beware of pseudo work: I suspect Java is partly popular because it makes
you feel productive. You are constantly busy creating Tiny Types (as you call
them), generating code in Eclipse (cool: one click and you have 50 lines of
code in your class) and so on. It is all just pseudo work that accomplishes
nothing, but maybe feeling productive is worth it.

------
ljw1001
There are some good, reasoned comments here. Java suffered through some
unfortunate 'best practices' that tarnished its reputation. Building is, uh,
suboptimal, as some have pointed out, but if you're working alone just keep it
simple and it shouldn't be a problem.

Unless you're building something that needs to be (1) highly dynamic (like a
web-based spreadsheet where you don't know the column types until run-time), or
(2) true real-time software, you're probably better off using Java. Some
libraries do suck, as others wrote, but it's the volume of good libraries you
care about. In any case, I'd argue that much of the code you write so quickly
in alternative languages doesn't need to be written at all in Java, because
there's a library for it.

Verboseness is a fact in Java, but a decent IDE shields you from that as well.
With Java it takes a little longer to get things done, but (in my experience)
you spend less time working on performance, fixing problems in the underlying
tools or language, or just dealing with your own bugs and keeping things
running. Since most development is maintenance, you want to optimize for that.

------
mseepgood
> Is there any other language that fits this role

Go?

------
NTH
My main problem is that it has awful support for functional programming, which
I find to be a really helpful way of doing something like a web crawler, where
you're essentially describing a computation to parse some input. I would use
F#, because it offers powerful functional programming tools, is on .NET / VS
2012 (not sure if that's a pro or con for you), and has type inference (so you
get the benefits of static typing without the cost of writing out the types of
everything).

You should probably check existing web crawler solutions to see if you can
adapt them before rolling your own.

------
lelele
Java's reputation suffers because of the language's association with
corporate drones.

We may say that with the current crop of languages running on the JVM, Java is
a low-level language. It is to the JVM what C is to hardware. You avoid coding
in both when you have higher-level languages available which will make you
more productive.

But when you want to optimize performance on the JVM for specific chunks of
your application - without resorting to JVM bytecode of course - Java is the
right choice.

------
exelib
Java is good enough for all types of projects. People hate Java because...
they can't Java right. Personally, I really like Python and JavaScript and
features like multiple inheritance or prototyping, higher-order functions and
so on. But Java is better suited to projects more complex than "Hello
world", because it is type safe (robust, compile-time feedback), has excellent
IDE support, is incredibly fast, and allows fast development (if you can Java,
TDD and so on).

~~~
kaolinite
There are plenty of reasons other than people being unable to "Java right",
though I'm not saying that Java doesn't get plenty of hate out of ignorance.

A good example of when you can't use Java would be embedded projects or
projects where you need to squeeze even more performance out of your code.
Java is fast but it isn't the fastest.

Feel free to love your language of choice but I'd recommend not letting it get
in the way of common sense.

~~~
exelib
Java bytecode is very compact and is also used in embedded systems. Java can be
predictable.

Can you show some benchmarks comparing Java w/ JIT and other languages?

I love another language, but Java is the better choice.

------
dotborg2
You can always use JS/Python/whatever in your Java application as a scripting
language.

In a case like a web crawler, the main issue with Java is scalability, or
rather the lack of it. You need to code it yourself, but that's no different
from other languages and platforms.

------
spullara
DropWizard is a great way to start a Java project.

