Ask HN: Why not Java?
52 points by OmegaHN on Aug 20, 2012 | 109 comments
I am building a web crawler to access data to be processed. All the code is fairly high level, so I am drawn to Python, but there are certain bits of it that require data manipulation that is much easier in a C-like language (arrays are a big part of it).

Java seems to fit this role very well. It is statically typed, object-oriented, and doesn't make you manage memory by hand. However, it seems to get a lot of hate (or, at least, dismissal) from many programming communities, so I am asking: why not Java? Why is it so horrible as a systems language one level above C? Is there any other language that fits this role better?

I am asking this in particular because I have been banging my head against the Python syntax for a while, but I am trying to expand the set of languages I can program in.




It's perfectly fine to use Java for this kind of software.

The hate against Java comes from using Java for application development: this is largely due to the kinds of applications that are typically written in Java (line of business software) and (this is the most important reason) accidental complexity and low quality of APIs like Spring or J2EE.

The recipe for programming happiness is to use the right tool for the job:

* Python (or Ruby) for web application development, development tools, and "devops" scripting.

* C (or C++) for pieces that need deterministic performance[1], provide a "native" feeling user interface, or require control over memory layout.

Note: performance and efficiency are relative to what your throughput and latency requirements are. Google's crawlers and indexers will remain in C++ for the foreseeable future, but (for example) crawlers for an intranet can get away with being in Java (or Python for that matter).

* Java (or Scala, Haskell, OCaml, Go, Erlang, or one of the many Lisps) for "userland" systems programming. If the majority of the system fits under the last bullet point, use C++.

* Avoid JNI or SWIG if you can. Use JSON + REST for cross-language RPC. If you need the performance guarantees of a tight binary protocol, use Thrift or Protocol Buffers. If you have to use JNI, consider using JNA first.

* No matter what language you use, stick to high-quality libraries and tools. For Java, you'll absolutely want to use Guava, Guice, and either Netty (or NIO.2 if you are using Java 7) or Jetty + Jersey + Jackson (for REST APIs).

Pick up either Emacs and cscope, NetBeans, Eclipse, or IntelliJ for navigating a large Java codebase.

All Java build tools suck. Maven sucks less and is the de-facto standard in the open source community. Twitter's "pants" is also worth looking at.

* Don't touch Spring with a 60-foot pole: in the mildest terms it's unequivocal and absolute garbage. Ditto for any other buzzword you may see in a job listing for an "enterprise" Java development job (with 20 years of experience required, naturally).

[1] Java performance can be quite high, but a JIT-ted and garbage collected runtime implies a lack of determinism.
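The "JSON + REST for cross-language RPC" advice can be sketched in Python with only the standard library. The `/tokenize` endpoint and its payload are hypothetical; the point is that the wire format is plain HTTP plus JSON, so the client half could just as well be Java, C++, or anything else with an HTTP client and a JSON parser.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen


class TokenizeHandler(BaseHTTPRequestHandler):
    """Hypothetical toy service: POST /tokenize with {"text": ...} returns {"tokens": [...]}."""

    def do_POST(self):
        if self.path != "/tokenize":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        body = json.dumps({"tokens": request["text"].split()}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence the default per-request logging to stderr


def start_server():
    """Serve on an OS-assigned localhost port (port 0) in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), TokenizeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, "http://127.0.0.1:%d" % server.server_address[1]


def call(base_url, path, payload):
    """Client side: POST a JSON payload and decode the JSON reply."""
    req = Request(base_url + path,
                  data=json.dumps(payload).encode("utf-8"),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    server, url = start_server()
    print(call(url, "/tokenize", {"text": "fetch parse store"}))
    server.shutdown()
    server.server_close()
```

Swapping this for Thrift or Protocol Buffers changes the encoding and adds a schema, but the request/response shape stays the same.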


This post reflects the conventional wisdom very accurately. It is informative and contains good advice.

However, I also think it is a biased caricature based on "common sense" that isn't all that well founded.

Most of these languages are entirely usable in areas far outside the prescribed areas you have given for them.

If you are working on a few domains like embedded or OS stuff, low-level graphics or signal processing - or you need to interact with a specific system that is pretty language-specific like Rails or iOS - then your options narrow a lot. A few tasks are just a little forced unless your level of comfort is very high (doing 1-minute shell script jobs in C++, for example).

But it would be very hard to overstate the degree of overlap, in 2012. It no longer usually makes sense to write things in ASM for speed, for example...

In the rare situations where your favorite higher-level language is somehow not good enough for a given project, it is rare that you cannot make a mongrel project which drops to another level just where necessary. If your language does not support this then it is broken in a generally important way.

If you are not really ready or willing to fill in gaps and just want to glue existing things together, that changes things slightly - then your primary consideration is not the language but the available libraries.

The major differences between languages are mostly matters of custom and ideology rather than niche suitability.


Can you give us some evidence why Spring is "unequivocal and absolute garbage"?


Well, for starters the whole idea of programming in XML. That and gems such as http://static.springsource.org/spring/docs/2.5.x/api/org/spr... or http://static.springsource.org/spring/docs/2.5.x/api/org/spr...

Of course I can't a priori prove that Spring is garbage, much like I can't a priori prove that it's better to be healthy and rich than to be poor and sick. It is a judgement call, but a judgement call that I believe I'm qualified to make, having worked with a large Spring codebase for 2.5 years.


You can ditch the XML almost entirely in Spring 3, which I've been using for 2 years now. All you need is 50-100 boilerplate lines and the rest is annotations. I agree that 2.x was XML hell, but it is worlds better now.


I sense sarcasm, but 100 lines of XML in a 200,000-line web application is wonderful. And this XML specifies critical things like database connection and pool settings, transaction management, entity caching, and more. You are going to be configuring this stuff no matter what platform you use, and the simpler it is the better.


> All you need is 50-100 boilerplate lines

That answers "Why not Java?" perfectly ;)


Java is a verbose language, but it's still awesome for a large category of projects. Are you a Rails guy? How about those 50-500 lines of boilerplate code in the config files? Same thing as the Spring config files. Does that lead me to say "Why not Rails?" No, it doesn't!

Saying it as politely as possible: don't be a hater.



The word "SimpleBeanFactoryAwareAspectInstanceFactory" reflects everything I hate about Java and its intended approach to OOP.


Cool, you must hate Objective-C too. I've written initWithParamAAndParamBWithASideOfAnAbsurdlyLongMethodName, but that doesn't mean I hate Objective-C.


You don't use autocompletion? o_O Besides, it's much, much better than simple_bean, where you have no clue what it does.


I've got an autocompletion feature up and running, but I'm usually not dependent on it. However, that wasn't my point. I wasn't complaining about long class names; that's just a bonus. I was complaining about the misuse/overuse of design patterns throughout the Java Standard Library as well as Swing.

See, design patterns are a thing that's nice to have but some of them are more like cheap hacks that have made it into an influential book rather than mandatory patterns. I'd go as far as claiming that some of these patterns are merely a way of hiding what's fundamentally broken about OOP in a language where it's the only paradigm and where it is strongly enforced - like Java.


I haven't used it for a few years, but I always found it funny that a project that was supposed to make Java application development easier and more agile than J2EE ended up being just as large, if not larger. Core Spring did a lot for dependency injection, which was a big shift in thought. However, for dependency injection, I'd go with something like Guice today.
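Dependency injection itself does not need a framework to understand. A minimal Python sketch, assuming a hypothetical `LinkCounter` that needs something to fetch pages: the dependency is handed in through the constructor, and a single composition root decides which implementation gets wired in. Containers like Spring or Guice (via bindings declared in modules) automate that wiring step; here it is done by hand.

```python
class FakeFetcher:
    """Hypothetical test double: serves pages from a dict instead of the network."""

    def __init__(self, pages):
        self.pages = pages

    def fetch(self, url):
        return self.pages.get(url, "")


class LinkCounter:
    """Receives its collaborator through the constructor instead of creating it."""

    def __init__(self, fetcher):
        self.fetcher = fetcher

    def count_links(self, url):
        # Crude count of anchor tags; "</a>" does not match "<a ".
        return self.fetcher.fetch(url).count("<a ")


def build_app(fetcher):
    """Composition root: the one place that decides what gets wired together."""
    return LinkCounter(fetcher)


counter = build_app(FakeFetcher({"http://example.com/": '<a href="/x">x</a> <a href="/y">y</a>'}))
print(counter.count_links("http://example.com/"))  # 2
```

Because `LinkCounter` never constructs its own fetcher, a real HTTP-backed fetcher can replace the fake without touching it.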


Spring was innovative in 2008. Now, Spring is bloated and buggy (look at request-mapper) and has old, senseless integrations with other modules: see spring-data for NoSQL, or try integrating the latest version of Velocity with the latest version of Spring.

And additionally, everything you can do with Spring, you can do with JEE.


I think collectively we should point out the specific Spring modules as opposed to say that "Spring is overbloat and buggy".

Some of the Spring modules seem stable enough. Others, the newer modules, will take time to mature.

Spring's goal has always been to be the 'glue layer' of the Java standards. Of course, now they want to be the 'glue layer' of everything, including Spring-Data for NoSQL (Neo4J and co.).

Speaking of which, your last statement is partly correct if only Spring == Spring Core. There's no MVC (in the sense of ASP.NET MVC or Rails MVC) in JEE yet (yet because things might change in the future). JEE has 2-3 technologies covering the "VC" options: JSP, Servlet, and JSF. None of these are similar to that of ASP.NET MVC or Rails MVC.


I pointed one module out - Spring-Velocity integration.

"Of course, now they want to be the 'glue layer' of everything, including Spring-Data for NoSQL (Neo4J and co.)."

Why do you need this? It's just Java and you can... just use it. Without glue.

"Speaking of which, your last statement is partly correct if only Spring == Spring Core. There's no MVC (in the sense of ASP.NET MVC or Rails MVC) in JEE yet (yet because things might change in the future). JEE has 2-3 technology covering the "VC" options: JSP, Servlet, and JSF. None of these are similar to that of ASP.NET MVC or Rails MVC."

What exactly do you mean by MVC? It's just a buzzword. MVC Model 2 Architecture (and now you have Servlet 3) is a good replacement for Spring MVC. JSR 303 and JPA (or others) are a good replacement for the 'M' (and of course, Spring does it the same way). Are we talking about Spring or Rails/ASP? And of course, there are solutions for that in Play/Play2.

Really, you don't need Spring. It's a 'bug layer'.


You pointed out one module as an example that describes the whole of Spring as one unified framework.

Spring consists of multiple modules that you absolutely _don't_ have to use. This is where, I think, you misunderstood Spring.

When it comes to the modules, i.e. Spring Core, Spring Transaction, and the Spring-Data sub-modules, Spring eventually uses the JDK and/or third-party APIs (possibly, in the case of Neo4J etc.). Yes, you can use JDBC, JTA, JPA, and the Neo4J API directly. For sure. But Spring has always wanted to be an alternative to using them directly by providing more features and supposedly a better programming experience.

This is what I meant by _glue_. You're absolutely correct: you can use those libraries directly without Spring. But if Spring modules provide me with a better programming model and more features on top of bare-bones implementations, why would I not use them? This, again, is something you don't seem to see in Spring.

Let's say MVC is a buzzword and start from there. If you look at ASP.NET MVC and Rails MVC, they provide a programming model where you have a request mapper/route that maps a request to a method in your controller.

MVC Model 2 architecture does not provide you with that. It provides a 1:1 mapping between a URL and a Servlet. That means if you do simple resource-style CRUD, you either have one Servlet per resource that dispatches the CRUD operation via query parameters, or you have multiple Servlets. Again, if you choose to live with that, that's your choice.
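A rough Python sketch of the request-mapping model described above. It is not Spring MVC's or Rails's actual API, and all names are hypothetical: one dispatcher maps an HTTP method plus a URL pattern to a handler function, so every CRUD operation on a resource can be declared in one place instead of being dispatched by hand inside a servlet per URL.

```python
import re

ROUTES = []  # (HTTP method, compiled URL pattern, handler function)


def route(method, pattern):
    """Decorator registering a handler for one method + URL pattern."""
    def register(handler):
        ROUTES.append((method, re.compile(pattern), handler))
        return handler
    return register


@route("GET", r"/people")
def list_people():
    return "list people"


@route("GET", r"/people/(\d+)")
def show_person(person_id):
    return "show person %s" % person_id


@route("DELETE", r"/people/(\d+)")
def delete_person(person_id):
    return "delete person %s" % person_id


def dispatch(method, path):
    """Single entry point: call the first handler whose method and pattern match."""
    for route_method, regex, handler in ROUTES:
        match = regex.fullmatch(path)
        if route_method == method and match:
            return handler(*match.groups())
    return "404 Not Found"


print(dispatch("DELETE", "/people/42"))  # delete person 42
```

A fuller router would distinguish "no such path" (404) from "path exists, wrong method" (405); this sketch returns 404 for both.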

Spring offers a better programming model than MVC Model 2 architecture, and there are people out there who prefer it. Spring also offers WebFlow, a module on top of Spring-MVC that helps build a wizard or shopping cart that involves lifecycle/steps. Can you live without it? For sure. Roll your own.

That's one. Second, in Rails MVC you can set up content negotiation to send back either JSON, XML, or HTML. You can do this with JAX-RS, but that is a separate "servlet", per se. If you want your API to live in a different part of your system, this is acceptable. But if you want all requests to come through one entry point, you have to work harder than that.
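Content negotiation of this kind boils down to choosing a representation from the client's `Accept` header at a single entry point. A simplified Python sketch, independent of Rails, JAX-RS, or Spring, with hypothetical names: it honours q-values and `type/*` wildcards but ignores the finer specificity-precedence rules of the HTTP spec.

```python
import json
from xml.sax.saxutils import escape

# Representations this hypothetical endpoint can produce, in order of preference.
RENDERERS = {
    "application/json": lambda person: json.dumps(person),
    "application/xml": lambda person: "<person><name>%s</name></person>" % escape(person["name"]),
    "text/html": lambda person: "<p>%s</p>" % escape(person["name"]),
}


def parse_accept(header):
    """Return the media ranges in an Accept header, highest q-value first."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        q = 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if fields[0] and q > 0:
            ranges.append((q, fields[0]))
    ranges.sort(key=lambda r: r[0], reverse=True)  # stable sort: ties keep header order
    return [media for _, media in ranges]


def negotiate(accept_header):
    """Pick the first offered type matching the client's preferences, or None."""
    for media in parse_accept(accept_header or "*/*"):
        for offered in RENDERERS:
            wildcard = media.endswith("/*") and offered.startswith(media[:-1])
            if media in ("*/*", offered) or wildcard:
                return offered
    return None


def respond(person, accept_header):
    """Return (status, content type, body); 406 when nothing acceptable is offered."""
    media_type = negotiate(accept_header)
    if media_type is None:
        return 406, "text/plain", "Not Acceptable"
    return 200, media_type, RENDERERS[media_type](person)


print(respond({"name": "Ada"}, "text/html,application/xml;q=0.9"))
```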

Calling Spring a 'Bug Layer' reminds me of this article:

http://www.codinghorror.com/blog/2008/03/the-first-rule-of-p...


I can pick another module... for example key-value. What benefits do I get from the key-value module? I don't need an "integration module" for this job. Compare http://static.springsource.org/spring-data/data-keyvalue/doc...

and

https://github.com/xetorthio/jedis/blob/master/src/test/java...

I don't see any benefit or better programming experience or more features on top.

Other modules? You mentioned:

Spring Security/Acegi - Java EE does the job too, seamlessly integrated with EJB and other technologies.

Spring Transaction - also.

I mean I see no reason to use Spring or any glue.

As for "ASP.NET MVC and Rails MVC": that's not a Spring vs. Java EE question. I like Django and Wicket, and they're much better than Java EE and, of course, Spring.

Mapping to a function or to a class doesn't matter; it's the same. Spring is the same MVC Model 2. Why should I prefer writing tons of XML crap if I can use a powerful workflow engine like Activiti, AristaFlow, jBPM, or others? :)


Yes, I was careful to say "J2EE" and not "JEE". WebBeans/CDI in Java EE 6 are quite decent. JSR 330 seems as if they just standardized on Guice. JSR 318 is quite good as well: I highly recommend using Jersey if you're building a REST API.


I wish HN had the ability to pin posts. This is great. Absolutely what I was going to write, so naturally I think it is brilliant ;)

The right tool for the job. Java has its place, and it's just where strlen said it should be.


I've been highlighting stuff with Diigo these days...


Link to the comment as a bookmark.


Right tool for the right job is key here. I personally like to add Groovy to the devops list.


I think Groovy is a senseless language. You lose all the Java benefits, like checked exceptions and robust code... and what about debugging? And you're not faster developing with Groovy. You develop faster by ensuring quality, not with inline filters or other crap. If you like Groovy, look at Python.


I've used it _a lot_, especially for scripting. It has really powerful support for quickly writing dynamic code to run transformations, connect to DBs, connect to MQs, etc. It's easier to read than Python (for me), and you can just drop a lib in groovy_home to get support for whatever DB driver (like Oracle). Seriously, give it a go before you call it senseless. But you know, use what you know and what you can be effective with to solve a given problem.


You're wrong there. I use Java for most things but dive into Groovy when I want features like almost native XML handling (via XmlSlurper & MarkupBuilder). Having a joint Java + Groovy project is simple as it's all just bytecode to the IDE.


You ensure quality by writing automated tests, not by the language you pick to write software in.

And for the record, Groovy is awesome. I used to develop in it a lot, but now I do more Rails and JavaScript, and I miss its power and simplicity.


What does "userland" mean?


That's what normal applications are referred to as opposed to kernel-level code. So anything user-facing that does not need low-level system access.


Code that isn't part of the OS kernel.

http://en.wikipedia.org/wiki/User_space


Yes, but you can use Java (or Scala) where C++ was mentioned.


The rationale for the distinction is given: deterministic performance.


Spring is not garbage.


Why is it not garbage?


Nothing's wrong with Java. Commercial and research-quality crawlers of tens of billions of web resources have been written in Java for over a decade. Its threading/concurrency support and extensive well-optimized libraries make it easier for you to make your code fast over large datasets... if you're good at Java. (If you're not, there are plenty of ways to sabotage yourself.)
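The thread-pool structure that makes a JVM crawler straightforward can be sketched in Python as well (the language used for the other examples here). This is an illustration under stated assumptions, not a production crawler: `fetch` is a hypothetical stand-in that reads from an in-memory dict instead of the network, and links are found with a crude regex. Worker threads fetch one breadth-first level at a time, while only the main thread touches the shared `seen` set and frontier, so no locking is needed. In CPython the threads overlap I/O waits rather than running Python code in parallel.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the network: URL -> HTML.
FAKE_WEB = {
    "http://a.example/": '<a href="http://b.example/">b</a> <a href="http://c.example/">c</a>',
    "http://b.example/": '<a href="http://c.example/">c</a>',
    "http://c.example/": '<a href="http://a.example/">a</a>',
}

LINK_RE = re.compile(r'href="([^"]+)"')


def fetch(url):
    """Stand-in for an HTTP GET; a real crawler would use an HTTP client with timeouts."""
    return FAKE_WEB.get(url, "")


def crawl(seeds, fetch=fetch, max_pages=100, workers=8):
    seen = set(seeds)
    frontier = list(seeds)
    pages = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and len(pages) < max_pages:
            batch = frontier[: max_pages - len(pages)]
            frontier = frontier[len(batch):]
            # Fetch the whole batch concurrently; results come back in batch order.
            for url, html in zip(batch, pool.map(fetch, batch)):
                pages[url] = html
                for link in LINK_RE.findall(html):
                    if link not in seen:
                        seen.add(link)
                        frontier.append(link)
    return pages


print(sorted(crawl(["http://a.example/"])))
```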

But, Java's a bit verbose, has gaps in concise support for higher-level constructs, and sometimes the static typing gets in the way. So if you don't find those parts helpful -- some do -- and think your performance targets can be met with other later optimizations/design-choices/selective-reimplementations, stick with whatever more concise language you're good at.

Or, use any of the more concise languages available on the JVM that allow intermixing the occasional Java facility, like Jython, JRuby, Groovy, JavaScript, Scala, Clojure, and others.

(If efficiently handling massive numbers of concurrent net/IO streams is a priority, the recent JVM-based project vert.x may be of interest. I haven't used it for anything but toy tests, but it seems to combine some of the best-practices for maximum JVM IO throughput with a somewhat higher-level-language-agnostic top layer well-suited for servers/proxies/crawlers.)


I agree, Java is completely OK for implementing a crawler. For example, the well-known Mercator crawler was written in Java, and its authors stated:

Although our use of Java as an implementation language was somewhat controversial when we began the project, we have not regretted the choice. Java’s combination of features — including threads, garbage collection, objects, and exceptions — made our implementation easier and more elegant. Moreover, when run under a high-quality Java runtime, Mercator’s performance compares well to other web crawlers for which performance numbers have been published.

source: [Mercator: A scalable, extensible web crawler (1999)](http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.151.5...)


In my experience many Java programmers don't really "program" Java. They are more like "expert Eclipse users" and Eclipse happens to output Java. This style of development makes heavy use of wizards and those Eclipse refactoring tools.

This is probably a consequence of the verbosity of Java-the-language, which made heavy tooling support a necessity. And then there's Eclipse, which provides one of the tightest language integrations with Java of any IDE ever.

The sad thing is that this is not really the fault of Java-the-language or Eclipse. It did spawn a whole caste of very mediocre programmers and libraries though, which can make for a very unpleasant culture.

Used correctly, Java can be a great tool, though.


The verbosity of Java allows an IDE like Eclipse to exist. I agree that it's very hard to program Java without a powerful IDE, but there are a lot of things you can do only with a statically typed language and an IDE like Eclipse. It's a fair trade. Could you elaborate on the link between IDE dependency and the spawning of a whole caste of very mediocre programmers? (English is not my first language, so sorry if anything I wrote sounds rude.)


Sure, I can elaborate.

The idea is that as a programmer, you have to have an intimate understanding of what is going on in order to make the machine do your bidding quickly and correctly.

But that mediocre Eclipse user I caricaturized does not have that understanding. He certainly knows how to get the job done for a certain set of tasks, but he does not know the details of how this is happening. Thus, he creates programs that follow "best practices", "conventions", "design patterns" and lots of automatically created wizard-boilerplate.

That might not be "bad code" mind you, but it almost certainly is not "great code", either. Thus, mediocre. And then these people create libraries that are mediocre and try to use only libraries that they can understand and that are hence mediocre. A culture emerges that is very consistent, but also very mediocre.


From what I see, knowledge comes from experience and study. An IDE doesn't magically separate you from the need to know how stuff works. What you describe is an inexperienced programmer, and those exist in any area, whether or not they use an IDE. From what I understand, you see a problem with code generated by a wizard or by an automated process, and if that's your point I agree, but that's not how Eclipse is used. Besides that, I don't think there is a link between bad code and an IDE. You can't look at bad code and say "hmm, this code was probably written in Eclipse", or look at good code and say that it was written with vi, because that connection does not exist.


I agree that an IDE does not separate you from the need to know how stuff works. But I certainly know a few programmers who do not dare to think beyond what their IDE allows them to do. They use built-in wizards and refactorings, but they do not seek solutions that are not easily expressible with those.

Hence, their code is factored just the way the IDE would, even if there are better alternatives available. This is the kind of mediocrity I am talking about. (And by the way, you could go much worse than Eclipse at that)

I actually had quite a few discussions where the main misunderstanding was that we used different tools and thus thought of different things as "easy" or "natural". In one case, one developer argued that it would be a good idea to create a whole bunch of classes to encapsulate a problem space. However, creating all these classes really would not have been necessary at all, he could have accomplished the same goal with a much simpler list of functions. Thus, we ended up with a HUGE file containing some several dozen classes that no-one but him could navigate, because it was factored just right for his programming editor but was all but unusable for anyone else.

One developer even told me that he did not know how to write a correct if statement in C because his editor had a template for that. What my argument boils down to is that this kind of behavior is bad, and I have seen it becoming a sad kind of sub-culture in Eclipse/Java land.

That said, I completely agree that if used correctly, there is nothing inherently wrong with Eclipse or Java per se.


Refactoring can't create functionality. What you are saying is not compatible with the theory of entropy. Eclipse is a high-level language running on the JVM, and not a bad one.


I am puzzled: how are arrays hard to use in Python? I cannot understand how you could be 'banging your head' against Python's array syntax unless you are just new to Python.

If you want to use Java (e.g.: you know it already and don't like learning other things), who cares? Why is this an issue where you have to challenge other people's opinions of Java? Use it if you want to.


You can always do the speed-critical parts in C and link that from your Python code. Or, if your analysis is something already done, use a library already written in C (such as NumPy).
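A minimal sketch of the "write it in C and call it from Python" route, using the standard library's `ctypes`. Assumptions: a POSIX system, and libc's `qsort` standing in for your own compiled C function. For a real hot loop you would build your own shared library (or a C extension) and keep Python callbacks out of it, since calling back into Python for every comparison throws the speed advantage away; this only shows the linking mechanics.

```python
import ctypes
import ctypes.util

# Assumption: POSIX. find_library("c") locates libc; if it returns None,
# CDLL(None) falls back to the symbols already loaded into the Python process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# C signature: int (*compar)(const void *, const void *), specialised to int pointers.
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int))

libc.qsort.restype = None
libc.qsort.argtypes = [ctypes.POINTER(ctypes.c_int), ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]


def compare(a, b):
    """Three-way comparison of the two ints the C code points at."""
    return (a[0] > b[0]) - (a[0] < b[0])


cmp_callback = CMPFUNC(compare)  # keep a reference for as long as C may call it

values = (ctypes.c_int * 5)(5, 1, 7, 33, 99)  # a contiguous C int[5]
libc.qsort(values, len(values), ctypes.sizeof(ctypes.c_int), cmp_callback)
print(list(values))  # [1, 5, 7, 33, 99]
```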

Another approach could be Jython (or any other JVM language closer to the desired level of abstraction) and Java.

I don't have much love for Java the language. It's not much easier to program than with C, isn't faster and is very verbose. Still, what you are doing looks like a good match for it. And all the respect I don't have for the language, I have for the JVM.

I wouldn't use it for web app development, as there are much more productive options around.


I noticed a lot of criticism of Java's verbosity in the comments, and I'm a bit curious what people are referring to. I work primarily in Java, but also do quite a bit of JavaScript and Perl, and I don't notice Java being especially verbose. Maybe internally I'm giving it a break because it isn't a scripting language? I'm honestly curious to see what you guys think.


The two areas that most frequently annoy me are processing collections and (lack of) first-class functions.

Java:

  List<String> firstNames = new ArrayList<String>();
  for(Person p : people) {
    firstNames.add(p.getFirstName());
  }

  addCallback(new Runnable() {
    public void run() {
      doSomething();
    }
  });

Python:

  first_names = [p.first_name for p in people]

  add_callback(do_something)

Scala:

  val firstNames = people.map((p) => p.firstName);

  addCallback(() => doSomething());

The Python and Scala versions do exactly what they say, while the Java code has a bunch of boilerplate that you have to mentally filter out before you can understand what it's doing. And the Scala code is fully typesafe; the compiler infers types rather than making you continually repeat them.


That's because you can't write Java. If you write

    first_names = [p.first_name for p in people]

how will you find every place where you use it? It's more verbose, but the right way in Java is something like this (you can do it better with enums and/or Guava; it's just an example):

    class PersonTransformer implements Transformer {
        public Object transform(Object o) {
            return ((Person)o).getFirstName();
        }
    }

and then:

    Collection<String> firstNames = CollectionUtils.collect(people, new PersonTransformer());

You can reuse it and, more importantly, you can search for it. The same goes for the second example.
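For comparison, the reuse-and-search argument does not by itself require a transformer class. In this Python sketch (the `Person` dataclass and the `first_name_of` name are hypothetical, mirroring the examples above), the projection gets a single named, greppable definition via the standard library's `operator.attrgetter`, while call sites stay one line and need no cast.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class Person:
    """Hypothetical class mirroring the Person used in the examples above."""
    first_name: str
    last_name: str


# A named, reusable transformer: defined once, searchable by name.
first_name_of = attrgetter("first_name")

people = [Person("Ada", "Lovelace"), Person("Alan", "Turing")]
first_names = [first_name_of(p) for p in people]
same_thing = list(map(first_name_of, people))
print(first_names)  # ['Ada', 'Alan']
```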


Why would you want to reuse a construction so trivial? Why would you want to search for it?

A couple of days back I commented that Java, by making some things harder than needed, induces programmers to over-engineer, to build things for needs they don't have, and to think that's perfectly normal. Think about what you just wrote.


Because of the DRY principle. It doesn't matter whether it's trivial or not. If you scatter your 'trivial' constructions over the whole application, you don't have any chance of consistent, robust processing. If you collect/group things logically, you can easily find out where you (re)use constructions and what you might break if you change them.

It's not over-engineering; it's just how you deal with >300 people and >6-year projects without ending up with "design-dead" software.


It's just a list comprehension. Do you imply I should use a function instead? Why not use a very nice syntax feature every Python developer can understand?

Because if you do, I'd advise you not to add integers with the "+" operator, but, instead, build a class with various add methods for different types of arguments or, better yet, build add methods into every class you define so that you can better search for them. This approach would allow you to add things that aren't integers or even not the same type on both sides of the operator.


I give up. You're like a little kid. Code size or verbosity doesn't matter. Quality is the keyword, and it results in a sustainable pace. If you don't understand this, nobody can help you. Your arguments are like "Why should I use TDD if I can write code directly?"


You call me a kid because you claim using a single line syntactic feature of a language isn't maintainable. It's like claiming for loops are unmaintainable and that we should wrap them inside methods.

It's not people hating Java: it's you afraid of looking at other tools.


Are you really arguing that the 'Java' way in your example is superior to Scala's?


orangecat's example is type safe; yours trades a compile-time error for a runtime error.


Python isn't type safe (Scala is). I wrote "you can do this better with ... Guava" especially for you :)


Curious enough in fact, to do some research. I found an interesting article: http://www.informit.com/articles/article.aspx?p=1824790

I also forgot about file I/O, which I don't do much of in web applications.


Java is good, make no mistake about it. It offers you fine-grained control over almost every aspect of programming (e.g. concurrency). However, it is the same freedom that allows developers to make mistakes. For example:

One can write concurrent systems in Java without understanding concurrency. Languages like Scala and Clojure will give you some freedom but will also enforce certain design principles which will save you.

Similarly for web development, there are scores of frameworks in the Java world, and you can mess it up easily. Rails / Django on the other hand will provide one good, solid way to do web programming.

Finally, Java is showing its age. The need to write large XML files to configure things and the inability to treat functions as objects put developers off. Some of this is being addressed by Oracle, but it will take time.


I am curious about what, specifically, you find easier to do in Java syntax than Python syntax.

Seriously, there is a fairly direct translation from any Java you might want to write to completely equivalent Python. Sure, Python offers more complex techniques such as list comprehensions and iterators. But you don't need to use them. You can just write Java-like Python.


And, the OP says he wants to move away from Python to a C-like language for better arrays? It's a weird reason given that arrays are not that complex a data structure, plus the option of using numpy arrays, which are quite fast and capable.
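On the array point: Python's standard library already has typed, contiguous arrays in the `array` module, and NumPy adds fast vectorized math on top of the same idea. A small sketch:

```python
from array import array

# 'd' = C double: values are stored unboxed in one contiguous buffer,
# much like double[] in Java or C, rather than as a list of Python objects.
samples = array("d", [0.5, 1.5, 2.0])
samples.append(4.0)

print(sum(samples))      # 8.0
print(samples.itemsize)  # bytes per element (8 for a C double on common platforms)

try:
    samples.append("not a number")  # the element type is enforced
except TypeError as exc:
    print("rejected:", exc)

# Zero-copy view of the raw buffer, the form C extensions consume.
# (While a view is held, the array cannot be resized; the with-block releases it.)
with memoryview(samples) as view:
    print(view.format, view.nbytes)
```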


Interfaces.


Abstract base classes. See http://www.doughellmann.com/PyMOTW/abc/
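A small sketch of what Python's `abc` module does and does not give you (the `Fetcher` names are hypothetical): a subclass that forgets an abstract method cannot be instantiated, but the check happens at runtime when an instance is created, and nothing verifies argument types or call sites ahead of time the way a Java compiler does with interfaces.

```python
from abc import ABC, abstractmethod


class Fetcher(ABC):
    """Plays the role of a Java interface: declares what implementations must provide."""

    @abstractmethod
    def fetch(self, url):
        """Return the body of url as a string."""


class DictFetcher(Fetcher):
    def __init__(self, pages):
        self.pages = pages

    def fetch(self, url):
        return self.pages.get(url, "")


class BrokenFetcher(Fetcher):
    pass  # forgot to implement fetch(); defining the class still succeeds


print(isinstance(DictFetcher({}), Fetcher))  # True

try:
    BrokenFetcher()  # the missing method is only detected here, at instantiation
except TypeError as exc:
    print("caught at runtime:", exc)
```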


Absent static type checking and manifest typing, abstract base classes do much less than Java interfaces.


Ah yes, the age-old static typing vs duck typing argument.

Sure, in theory static typing can catch bugs. But doing it like Java does it is a lot of work per real bug actually caught.


Whether or not that's true, it is a concrete difference between Python and Java. Python is not a superset of Java.


I never claimed that the two were the same language. My claim was that you can take anything written in Java and pretty much directly translate it to Python.

The fact that there are things Java will flag as errors that a Python translation does not, does not change this fact.


As I said, you seem to be claiming that Python is a superset of Java, not equivalent to it, and that claim is manifestly false. Errors that are detected in one place and not in another are a manifest difference.

Also, Java threads and anonymous classes do not translate directly to Python.


Python has threads.

In Python classes are first class objects, and you can easily do anything you could do with Java anonymous classes on the fly. Furthermore Java anonymous classes are usually used as a verbose replacement for a lack of closures. But in Python you can create closures and pass them around. (You do have to do some juggling to mutate variables, but 1 element arrays are only a slight pain to work with.)
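A Python sketch of the closure point (function names hypothetical). The one-element-list trick was the Python 2 workaround for mutating an enclosing variable; Python 3's `nonlocal` removes the need for it. Either kind of function can be passed around where Java would need an anonymous class.

```python
def make_counter_py2_style():
    count = [0]  # one-element list: the closure mutates the list, never rebinds the name

    def increment():
        count[0] += 1
        return count[0]
    return increment


def make_counter():
    count = 0

    def increment():
        nonlocal count  # Python 3 only: rebind the enclosing variable directly
        count += 1
        return count
    return increment


def add_callback(callbacks, fn):
    """Hypothetical registry standing in for an event API that takes callbacks."""
    callbacks.append(fn)


callbacks = []
add_callback(callbacks, make_counter())
add_callback(callbacks, lambda: "done")  # inline function, no anonymous class needed
print([cb() for cb in callbacks])  # [1, 'done']
```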


Yes, but again, there is no Python syntax for anonymous classes. That you can "easily do" something a different way is not proof that two things are the same, it's proof that they are in fact different. Which again, is my point. That Python has other, different features that can be used for other different effects just makes it that much more different.

Python's threads can't run Python code in parallel in CPython (the global interpreter lock serializes them), so if you want parallelism, you have to resort to the multiprocessing module. This is not a limitation Java's threading shares.

I'm not arguing that Python is bad or inadequate, just that it is a fundamentally different thing and should not be viewed as a superset of Java.


I'm not saying you should do this, but you can have an anonymous class like this:

  type('', (dict,), {'__init__': lambda self, *args, **kwargs: not super(self.__class__,self).__init__(*args, **kwargs) and setattr(self,'__dict__',self)})

Python easily allows multiple threads to exist at once. The only reason they don't run at the same time is an implementation detail of the most common Python implementation. Python running on the CLR, the JVM, and extensions for both CPython and PyPy show that you can run multiple threads concurrently if you really want to.
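A more readable version of the `type()` trick above, as a Python sketch with hypothetical names: the three-argument form `type(name, bases, namespace)` creates an unnamed subclass at runtime, which is about as close as Python gets to a Java anonymous class, even though there is no dedicated syntax for it.

```python
class Runnable:
    """Hypothetical Python stand-in for java.lang.Runnable."""

    def run(self):
        raise NotImplementedError


def submit(task):
    """Hypothetical executor that just runs the task it is given."""
    return task.run()


# Roughly Java's `new Runnable() { ... }`: an unnamed subclass built at runtime,
# with run() supplied in the namespace dict, instantiated immediately.
task = type("", (Runnable,), {"run": lambda self: "ran"})()

print(submit(task))                # ran
print(isinstance(task, Runnable))  # True
```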


My claim is that it is fairly easy to translate any Java program into an equivalent Python one. That some things need different constructs is not evidence against this claim.


Your claim is untrue, and that different constructs are needed is evidence. It is often easy, but that isn't guaranteed.


That is a noun. What is the actual use case?


In context, the noun obviously refers to the Java interface syntax.


There's just no need for Interfaces when you have duck-typing at your disposal.


See my remark below.


> Why is it so horrible as a systems language above C?

* First class functions (interfaces with one method) plus garbage collector eventually encourage a functional programming style, with lots of little objects created on the heap. Alas, the per-object memory overhead of popular Java implementations is horrendous.

* Strong emphasis on using threads for concurrency. Alas, in practice, threads are incredibly large memory hogs.

* Verbosity. While it is possible to write clean, composable code in Java, it is also remarkably verbose. After a while, this gets old and people take every shortcut they can to limit verbosity, which is a very bad idea. To quote an esteemed colleague, "I never took a shortcut I didn't regret later". Can we have our lambdas yet, pretty please?
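To illustrate the first and third points: before Java 8, passing a function meant writing an anonymous class that implements a one-method interface such as Comparator. Java 8 (released in 2014, after this thread) added lambdas for exactly this case. The sketch below shows both styles; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class ComparatorStyles {
    // Pre-Java 8: Comparator has one abstract method, so "passing a function"
    // means writing an anonymous class around it.
    static List<String> byLengthAnonymous(List<String> words) {
        List<String> copy = new ArrayList<String>(words);
        Collections.sort(copy, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        });
        return copy;
    }

    // Java 8+: the same comparison as a lambda targeting the same interface.
    static List<String> byLengthLambda(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort((a, b) -> Integer.compare(a.length(), b.length()));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("crawler", "url", "page");
        System.out.println(byLengthAnonymous(words)); // [url, page, crawler]
        System.out.println(byLengthLambda(words));    // [url, page, crawler]
    }
}
```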


* If you are using the JVM's generational + concurrent garbage collectors, generally the hordes of little objects disappear without hiccups or leaving much of a footprint.

* My beef with threads for concurrency revolves not around memory footprint (can you substantiate threads as "memory hogs"?), but instead around the necessity to be mindful of resource sharing. Yes, the JDK gives you lots of useful tools in this quest, but it's still not all that difficult to end up with a deadlocked app.
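One common way to rule out the classic two-lock deadlock is to acquire locks in a single global order. This is a minimal sketch. It assumes each hypothetical Account has a unique id to order by, and it does not address other sources of deadlock.

```java
public class LockOrdering {
    // Hypothetical account; id must be unique because it defines the lock order.
    static final class Account {
        final int id;
        long balance;

        Account(int id, long balance) {
            this.id = id;
            this.balance = balance;
        }
    }

    // Locking "from" then "to" can deadlock when two threads transfer in opposite
    // directions. Locking the lower id first gives every thread the same order.
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }

    // Two threads transfer back and forth concurrently; returns the combined balance.
    static long stressTest(int rounds) {
        Account a = new Account(1, 1_000);
        Account b = new Account(2, 1_000);
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < rounds; i++) transfer(a, b, 1);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < rounds; i++) transfer(b, a, 1);
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return a.balance + b.balance; // join() makes the threads' writes visible here
    }

    public static void main(String[] args) {
        System.out.println(stressTest(100_000)); // 2000
    }
}
```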


Java's heavy syntax for anonymous classes and immutables actually discourages functional programming even when that's a strongly desired approach by the programmer.


There is a hate against Java, also against C#, C++, PHP (which I hate), C, and pretty much any other mainstream language.

Notice, though, that competent people have done great jobs using these languages. So you have some choices. Two of them are: wonder why people bash Java or go do something useful with it. I suggest you do the second.

The key to using programming languages is in trying to use the one which will help you the most, or get in your way the least. Sort of "the right tool for the job". I don't know what jobs Java is good at. If you find that it's good for your project, then use it.

Take a look at this article: http://prog21.dadgum.com/143.html


Just because you can do something with a language does not mean you should. Sure, lots of people make stuff with poorer languages. Lots of people also never write tests, skimp on documentation and copy and paste half their code. It works for them! The best choice would be to think about the problems people have with Java and find a good solution for them (e.g. Scala).

People use all sorts of distinctly sub-optimal tools and technologies for various reasons unrelated to those technologies' merits. One of the biggest reasons is familiarity--many people do not like learning radically new things and so stick to what they know. Popularity does not imbue any sort of quality to programming languages any more than it does to anything else like music. There's a reason that trained musicians respect classical music--even if they're making pop--and there's a reason programming language people respect ML.

In short: just because many people manage to use Java does not mean it is in any sense optimal or even good.

Also, I think the oft-repeated "right tool for the right job" bromide about programming languages is deeply flawed. Programming languages overlap far more than most tools--they are all general-purpose programming languages, after all. The difference between a hammer and a screwdriver is far greater than even the difference between Java and Haskell. Choosing a programming language is more like choosing the best power drill--they overlap almost completely and can do the same jobs. It's quite plausible that some are almost always better than others, but that you could ultimately do the job with either. It will just be more difficult with one than the other.

Also, even if languages did differ significantly, there is no guarantee that any particular language has anything it's best at--it can be strictly worse than other languages for every conceivable use.

Finally, I think that surrendering to familiarity and choosing something you know over something you need to learn is rarely a good choice. Sure, if you have a hard deadline, it might be a reasonable compromise. But learning a language is essentially a constant expense, while its effect on your productivity scales linearly with how much you program. Just because it might take more effort to get started with Scala does not mean you should immediately consign yourself to the drag on productivity that is Java.

You should be learning something new all the time, and programming languages are some of the most important things to learn in CS--they affect not only what you write but how you think. So strive to find the best one you can rather than settling for something that works--in this day and age, expecting your language to be somewhat usable is too low a bar to set.


> In short: just because many people manage to use Java does not mean it is in any sense optimal or even good.

Nobody is saying otherwise. I suggested one of the many options he has (I mentioned two), which is using java, and said that using java is better than going around looking for why people bash java. Which happens to be true.

Consider how he would choose a better language though.

It's surprising how cumbersome it may get to write a simple loop in a more "elaborate" language.

> Also, I think the oft-repeated "right tool for the right job" bromide about programming languages is deeply flawed. Programming languages overlap far more than most tools--they are all general-purpose programming languages, after all. The difference between a hammer and a screwdriver is far greater than even the difference between Java and Haskell. Choosing a programming language is more like choosing the best power drill--they overlap almost completely and can do the same jobs. It's quite plausible that some are almost always better than others, but that you could ultimately do the job with either. It will just be more difficult with one than the other.

There is more to choosing a language than "the language". Here are some reasons why he may want to use java:

1) He has books about java, but not about anything else.

2) His co-workers use java.

3) He needs to work with the JVM.

4) He really wants to use java.

5) He has a bunch of minor reasons to use java.

6) Libraries, Libraries, Libraries.

7) Development tools.

8) Good implementations.

9) There is a standard for the language.

10) There is a huge community around the language.

11) He doesn't really have a choice. His boss wants him to use java.

12) He's in academia and most people in his institute use java.

(there are more!)

The "right tool for the job" sure is true. And very much true. It is complicated to select it, though. That's why I didn't try to tell him HOW to select that tool. I really think he'll be better off with java if he saw java is good enough (another point is that a "known good enough" is usually better than an "unknown perfect").

We choose "a language" but rarely because of "the language". As you said yourself, a lot of times, these languages are general purpose languages and overlap a lot.

Have you ever noticed that a lot of the languages used today are really tied to their implementations/running systems? C and UNIX, Java and JVM, C# and CLR, Python and CPython, PHP, Ruby, Objective-C (this one is a really good example), JavaScript, etc. This was true in the past too: think of delphi, vb6 and windows, lisp and the lisp machines, fortran and cobol and the IBM systems.

A lot of what was "language design" in the past, is libraries and implementation design today. Choosing a language is more than looking at "the language", its syntax, semantics, idioms, patterns, etc.

Other things matter too, like building a GUI, communicating over a network, accessing files, dealing with the database, and doing graphics programming. You can, for example, use C# and get lots of these from the .NET CLR with little effort, or pick OCaml and have them too, but only after a lot of work you would not have to do in C#. Even if you port OCaml to run on the CLR, it's unlikely to be as CLR-friendly as C#.

But, also, there are "local" reasons to choosing a language. These are stuff you or I don't know because it's specific to him or his group of people.

> You should be learning something new all the time, and programming languages are some of the most important things to learn in CS--they affect not only what you write but how you think. So strive to find the best one you can rather than settling for something that works--in this day and age, expecting your language to be somewhat usable is too low a bar to set.

Learning new stuff is good advice, generally. But programming languages are one of the most irrelevant things in CS. In the long run, they are irrelevant.

"We" already teach/learn a lot of programming concepts and techniques without specific programming languages, but with concepts shared by a class of programming languages (as you said yourself, lots of them overlap). These are concepts and techniques that, in the past, were highly specific to particular programming languages.

Languages get obsolete. Those which remain do so usually because of practical matters (like C, or Java, or C++).

The new ideas a programming language may bring are important, though. The language itself is not. For example, closures are really catching on now, but they were invented long ago and first implemented in languages that people do not use much (I guess it was Scheme, but I am not asserting it).

And, so I can end this reply...

Sorry if this sounds arrogant, but you should not lose the ability to separate the interesting from the practical, and I got the impression you do not do that very well. Some things are both (I guess Haskell is one of these), but that is not usually the case.

What some people (fortunately, it doesn't seem to be most of them) don't understand is that lots of programming does not require the sort of elaborate constructs and idioms that, for example, Scheme allows you to use. I once talked to a guy who did lots of "business" software. I mentioned Scheme to him. I told him lots of cool stuff about recursion, closures and macros, and told him a little about lambda calculus, showing how you could do it in Scheme. And he told me "It's cool, but it's also like people don't know what is useful anymore." Well, that got me thinking back then.

You can argue all you want about whether he's right or not, or whether the software he writes is difficult or not, but it's not that he didn't see the advantage of those things. It turns out that most repetitions do not require recursion and are not a good fit for it (they just loop over a collection or a range of values; a for loop with iterators or numbers would usually do), most functions do not return other functions (not that you couldn't do it that way, but your program is usually simpler if you don't), and minimalism is not really that convenient when writing software for $$ (many people seem to reach this conclusion). Lots of applications are still single threaded and run in only one process. Immutability is a lot more interesting in theory than in practice for a large class of programs. Static typing still catches a lot of problems, and people usually do not mind that much having to write down the types. Beautiful techniques for managing large programs are very interesting ... for large programs. It turns out that lots of programs are not that large. And the list goes on and on.
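A small Java illustration of the loop-versus-recursion point, with hypothetical method names. Both methods compute the same sum. The recursive one also uses one stack frame per element, so a very long list can throw StackOverflowError. List.of needs Java 9 or later.

```java
import java.util.List;

public class LoopVsRecursion {
    // Recursive sum: one stack frame per element, so very long lists
    // can throw StackOverflowError.
    static long sumRecursive(List<Integer> xs, int index) {
        if (index == xs.size()) {
            return 0;
        }
        return xs.get(index) + sumRecursive(xs, index + 1);
    }

    // The plain for-each loop most code would use for the same job.
    static long sumLoop(List<Integer> xs) {
        long total = 0;
        for (int x : xs) {
            total += x;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(3, 1, 4, 1, 5);
        System.out.println(sumRecursive(xs, 0)); // 14
        System.out.println(sumLoop(xs));         // 14
    }
}
```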


Why is it so horrible as a systems language above C?

It's not "horrible", it just has many slight-to-moderate deficiencies and annoyances that make development more work than it should be.

Is there any other language that fits this role in a better way?

Scala is strictly superior when used as a "better Java". (If you go deep into its functional capabilities you get a different set of tradeoffs). C# is better as a language, but then you're tied to .NET.

Really we'd need to know more details of what you're doing and why you believe Python may not work. Are you concerned about performance, or do you need to do things that Python doesn't have convenient APIs for?


I've written two crawlers in Java and found it quite well-suited.

I think most people on HN who hate Java are talking about creating websites, and for good reason. Back in the bad ol' days, people would use Java frameworks like Struts for web apps, and it was quite painful.

For my latest project I'm using Play Framework for front-end Java, and it's quite delightful.


Agreed. I am in the process of porting a site from PHP to Play and I am loving it.

I am a fan of lots of languages, but recently, for anything I will be supporting for a long time, I want static typing to catch massive refactoring issues. I'd rather use C# than Java personally, but Play is an amazing Java framework to use.


Play framework is definitely a good Java framework to use. In my current projects I am finding it very useful.


Nothing wrong with using Java and there is something to be said for using the language you are most productive in. But if you are thinking of building a web crawler in Java, I would recommend taking a look at the Heritrix project: https://webarchive.jira.com/wiki/display/Heritrix/Heritrix It's robust, open source and easily extensible. Might be easier to write a custom module for it than to roll your own web crawler.


The best reason to use Java is the enormous ecosystem of libraries and resources.

The best reason to AVOID using Java is the huge demand for Java programmers and the low supply. At my job we can barely find applicants with Java so we end up hiring .NET people and converting them.


Wow, that's almost exactly the opposite of the problem I've found in Bristol (UK). There seem to be a lot of people who have stuck with Java after graduating from university, and a severe lack of mid-to-senior level .NET developers.

Out of interest, how do these guys find the switch to Java from C#, assuming they're not VB.NET guys?


Just curious, where do you live?


I live in Phoenix, Arizona, and I've heard my old market (Raleigh, NC) is experiencing similar issues.


I personally don't like Java because of API complexity, and this is why I ended up with Python. I have implemented many crawlers using the Scrapy framework, and I believe it speeds up development. We have crawled millions of pages without any problem.

Python is very powerful for string manipulation because it has very good language constructs (like slice syntax) which make development easy. At the beginning it might be a little confusing, but once you master it you really feel the power.
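For readers coming from Java, the closest standard equivalents to Python slices are String.substring and List.subList. Both take an exclusive end index, but neither accepts negative indices or a step. The slice helper below is hypothetical. It only mimics Python's negative-index and clamping behavior for strings.

```java
import java.util.Arrays;
import java.util.List;

public class SliceEquivalents {
    // Hypothetical helper mimicking Python's s[begin:end] for strings:
    // negative indices count from the end, and out-of-range bounds are clamped.
    static String slice(String s, int begin, int end) {
        int n = s.length();
        int b = begin < 0 ? Math.max(0, n + begin) : Math.min(begin, n);
        int e = end < 0 ? Math.max(0, n + end) : Math.min(end, n);
        return b >= e ? "" : s.substring(b, e);
    }

    public static void main(String[] args) {
        String url = "http://example.com/index.html";
        // Python url[7:18]: substring takes (begin, exclusive end) but throws if out of range.
        System.out.println(url.substring(7, 18));            // example.com
        // Python url[-4:]: no negative indices, so compute from length().
        System.out.println(url.substring(url.length() - 4)); // html
        System.out.println(slice(url, -4, url.length()));    // html
        // Python links[1:3]: subList is a view backed by the original list.
        List<String> links = Arrays.asList("a", "b", "c", "d");
        System.out.println(links.subList(1, 3));             // [b, c]
    }
}
```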

Twisted-like frameworks also do a good job here. Twisted is well designed and asynchronous, and it suits multi-tier network applications well.


It's a lousy language (IMO) with some excellent libraries and very fast compilers. If you're comfortable with the limitations of the language, and you're having trouble with Python, it's worth a shot, I guess.


Java is a great platform to build on - and the sweet spot is definitely for server side applications like this.

You can safely ignore the people who bash Java - they are generally clueless. The Java language is perfectly fine: high performance, statically typed, OOP, relatively simple and maintainable. It may not offer the most concise code and it may not have all the "trendy" language syntax features but guess what - that actually doesn't matter much in the real world (i.e. outside the realm of language designers and fanboys). If saving a few characters of typing is your major concern when choosing a language, you have much bigger problems.

But the real strength in Java is not the language but rather the overall platform - the combination of the JVM (which is an amazing high performance feat of engineering), the library ecosystem (which is the best overall for any language), the tools (great IDEs, Maven, a host of other developer-focused tools), the fact that the OpenJDK itself and most of the libraries are open source, and the portability (compiled JVM code is extremely portable, and importantly doesn't need a recompile unlike some other so-called "cross-platform" languages).

So overall you can't really go wrong with choosing Java for server side applications. Although I would also give Clojure or Scala a look - if you are after "powerful" languages then these two are pretty amazing and you still get all the benefits of being on the Java platform.


I don't think Java-the-language is "perfectly fine". It's extremely verbose, the type system is mediocre at best, it's actively hostile to functional programming (and let's not even consider anything else like logic programming!), and the syntax is extremely inflexible.

Your tirade against "fanboys" is nothing but a straw man--the point of having more expressive, concise code is not "saving a few characters of typing" but making your program easier to write and easier to read (and, therefore, easier to maintain). Sure, you can get stuff done with Java, but you can generally get it done faster and better with other languages.

High-level features absolutely matter in the real world--they allow you to write code faster and give you more confidence that it is correct. Code written at a higher level is not only shorter but also more declarative and clearer. The idea that only "language designers and fanboys" care about having these features in their languages is patently absurd and rather arrogant.

To me, it seems Java is a compromise--it ignores decades of research and progress in programming language design in favor of catering to people who knew C++ and didn't want to learn something radically different. Thanks to being widely taught, it is now essentially a lowest common denominator: practically any programmer you meet will have at least learned the basics of Java at some point. But I think this is exactly the sort of compromise a good programmer should not make!

Now, the JVM is, admittedly, a good platform. It has some glaring weaknesses--poor support for functional programming, poor interoperation with native code, long start-up time and so on--but, on the whole, is very strong. Happily, you aren't bound to Java if you want to be on the JVM and you can use some of the great alternatives like Scala. But this does nothing to defend Java-the-language--having a good implementation does not make a language well-designed or particularly usable.


Java is well suited to this; it even powers some powerful crawlers like Heritrix (archive.org) and Nutch (Apache foundation).

That being said, it doesn't really matter what language you write your crawler in: its performance will much sooner be influenced by other aspects (network latency, storage, etc) than the language you choose.

So pick the language you're most comfortable with for crawling and offload the data processing to a lower-level language that is better suited for that task.


It's just that Java is very verbose, and actually I found it particularly horrible for data driven applications (by this I mean apps whose behavior is determined by data/config files, not "Big Data" - I have no experience with the latter). For complex data types you always need to create complex class hierarchies. In other languages you could just write

webInfo = {url: "bla.bla", title: "bla die blub", links: ["link1", "link2"]}

Notice that webInfo contains two different types, Strings and Arrays. In Java arrays or hashes you cannot easily mix types - you'll end up just putting Objects everywhere, then be forced to litter the code with type casts. Or you create the unwieldy class hierarchy. That is my prediction, anyway - I am too lazy to come up with a good example :-(

You can also not simply write something like the hash above. The nearest you can get is if you have created that class hierarchy with suitable constructors, you could instantiate that in one go. At least that is my memory - I have now avoided it for so long that I am not even sure how to instantiate an Array or a Hash with data on the fly anymore.

I think instantiating an array with data goes something like

String[] links = new String[]{"bla", "blub"};

and there is nothing like that for Hashes - you are stuck with

Map<String, Object> info = new HashMap<String, Object>(); // generics are particularly ugly and annoying

info.put("links", new String[]{"bla", "blub"});

info.put("title", "some stupid web site");

info.put("url", "undisclosed");

And so on - a far cry from the example above. (Note the Java syntax is probably wrong, created from memory - but it is something like that).
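For reference, here is a compilable version of the map above, plus the shorter Map.of form that Java 9 (2017) added. This is a sketch with hypothetical class and method names. Reading the links back out still needs an unchecked cast, which is the complaint being made here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WebInfoMap {
    // 2012-era style: one put() per entry, with values typed as Object.
    static Map<String, Object> buildWithPuts() {
        Map<String, Object> info = new HashMap<String, Object>();
        info.put("url", "bla.bla");
        info.put("title", "bla die blub");
        info.put("links", Arrays.asList("link1", "link2"));
        return info;
    }

    // Java 9+ style: Map.of builds an immutable map in one expression (up to 10 pairs).
    static Map<String, Object> buildWithMapOf() {
        return Map.of(
                "url", "bla.bla",
                "title", "bla die blub",
                "links", List.of("link1", "link2"));
    }

    // Getting a typed value back out still needs a cast the compiler cannot check.
    @SuppressWarnings("unchecked")
    static List<String> links(Map<String, Object> info) {
        return (List<String>) info.get("links");
    }

    public static void main(String[] args) {
        System.out.println(buildWithPuts().equals(buildWithMapOf())); // true
        System.out.println(links(buildWithMapOf()));                  // [link1, link2]
    }
}
```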

Even if you went through the mind numbing work of creating appropriate classes, you'd be stuck with

info = new WebInfo(title, url, new String[]{link1, link2,...});

And that is just for two different types, and notice that there is no way to see what the names of the parameters of the WebInfo constructor actually are from that snippet of code.

title: someTitle

is actually much more readable because you can instantly see that someTitle is supposed to be a title.
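A common Java workaround for unnamed constructor arguments is the builder pattern, where each setter names the field it sets. This is a minimal sketch with a hypothetical WebInfo class. It makes call sites readable at the cost of more boilerplate in the class itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class WebInfoBuilderDemo {
    // Hypothetical value class; instances are created only through its Builder.
    static final class WebInfo {
        final String url;
        final String title;
        final List<String> links;

        private WebInfo(Builder b) {
            this.url = b.url;
            this.title = b.title;
            this.links = Collections.unmodifiableList(new ArrayList<String>(b.links));
        }

        static Builder builder() {
            return new Builder();
        }

        // Each setter names the field it sets and returns the builder for chaining.
        static final class Builder {
            private String url;
            private String title;
            private final List<String> links = new ArrayList<String>();

            Builder url(String url) { this.url = url; return this; }
            Builder title(String title) { this.title = title; return this; }
            Builder link(String link) { this.links.add(link); return this; }
            WebInfo build() { return new WebInfo(this); }
        }
    }

    public static void main(String[] args) {
        WebInfo info = WebInfo.builder()
                .url("bla.bla")
                .title("bla die blub")
                .link("link1")
                .link("link2")
                .build();
        System.out.println(info.title + " " + info.links); // bla die blub [link1, link2]
    }
}
```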

Also if you want to use NoSQL, I suspect converting java classes to JSON could be a pita, too.


Can links be a single String? From "info = new WebInfo(title, url, new String[]{link1, link2,...});" I can see that it can't. This is the kind of thing you get with a typed object. Also, instead of Strings for title, url, etc. you could use tiny types. Yes, it is a lot more verbose, but it comes with an advantage (compile-time errors instead of runtime errors). A tiny type also documents what you should pass to the constructor. If you need some preparation to get a URL (say, finding it in some text file), the types would push you to do it.
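Here is a minimal tiny-type sketch. It assumes a hypothetical Url type that only accepts absolute http(s) URLs and a hypothetical Title wrapper. With these types, swapping the two arguments of describe no longer compiles, and a bad URL fails at construction instead of somewhere deep inside the crawler.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class TinyTypes {
    // Hypothetical tiny type: only absolute http(s) URLs can be constructed.
    static final class Url {
        private final URI value;

        Url(String raw) {
            URI parsed;
            try {
                parsed = new URI(raw);
            } catch (URISyntaxException e) {
                throw new IllegalArgumentException("malformed URL: " + raw, e);
            }
            String scheme = parsed.getScheme();
            if (!"http".equals(scheme) && !"https".equals(scheme)) {
                throw new IllegalArgumentException("not an http(s) URL: " + raw);
            }
            this.value = parsed;
        }

        String host() {
            return value.getHost();
        }

        @Override
        public String toString() {
            return value.toString();
        }
    }

    // Hypothetical tiny type for page titles; it exists only to get its own type.
    static final class Title {
        private final String value;

        Title(String value) {
            this.value = value;
        }

        @Override
        public String toString() {
            return value;
        }
    }

    // describe(title, url) would be a compile-time error, not a runtime surprise.
    static String describe(Url url, Title title) {
        return title + " (" + url.host() + ")";
    }

    public static void main(String[] args) {
        Url url = new Url("http://example.com/page");
        System.out.println(describe(url, new Title("Home"))); // Home (example.com)
    }
}
```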


If you enjoy that kind of programming, sure, go for it. It certainly is possible to program a crawler in Java. Personally I can't go back since I experienced more succinct languages.

Also beware of pseudo work: I suspect Java is partly popular because it makes you feel productive. You are constantly busy creating Tiny Types (as you call them), generating code in Eclipse (cool: one click and you have 50 lines of code in your class) and so on. It is all just pseudo work that accomplishes nothing, but maybe feeling productive is worth it.


Gson solves that problem by parsing Json to objects or untyped hashmaps. It's OK to mix a mini language into your program.

ImmutableMap.Builder gives you almost syntax-free hash map creation.

You can use constructors in a loop or Lists.transform for data-driven construction. Not exactly like JavaScript, but it gets close.
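Neither Gson nor Guava ships with the JDK, so here is a sketch of the last point that uses only the standard library. It builds objects from rows of data twice: once with a plain loop, and once with the Java 8 stream API. The stream version is roughly what Lists.transform does, except that Lists.transform returns a lazy view. Page and the row layout are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DataDrivenConstruction {
    // Hypothetical value type built from one row of data.
    static final class Page {
        final String url;
        final String title;

        Page(String url, String title) {
            this.url = url;
            this.title = title;
        }
    }

    // Pre-Java 8: call the constructor in a loop, one row at a time.
    static List<Page> fromRowsLoop(List<String[]> rows) {
        List<Page> pages = new ArrayList<Page>();
        for (String[] row : rows) {
            pages.add(new Page(row[0], row[1]));
        }
        return pages;
    }

    // Java 8+: the same mapping with streams; unlike Lists.transform, this copies eagerly.
    static List<Page> fromRowsStream(List<String[]> rows) {
        return rows.stream()
                .map(row -> new Page(row[0], row[1]))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Each row is {url, title}, e.g. lines parsed from a data or config file.
        List<String[]> rows = Arrays.asList(
                new String[] {"http://a.example", "A"},
                new String[] {"http://b.example", "B"});
        for (Page p : fromRowsStream(rows)) {
            System.out.println(p.title + " -> " + p.url);
        }
    }
}
```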


There are some good, reasoned comments here. Java suffered through some unfortunate 'best practices' that tarnished its reputation. Building is uh, suboptimal, as some have pointed out, but if you're working alone just keep it simple and it shouldn't be a problem.

Unless you're building something that needs to be (1) highly dynamic (like a web-based spreadsheet where you don't know the column types until run time), or (2) true real-time software, you're probably better off using java. Some libraries do suck, as others wrote, but it's the volume of good libraries you care about. In any case, I'd argue that in many alternate languages, the code you're writing so quickly doesn't need to be written at all in java, because there's a library for it.

Verboseness is a fact in Java, but a decent IDE shields you from that as well. With Java it takes a little longer to get things done, but (in my experience) you spend less time working on performance, fixing problems in the underlying tools or language, or just dealing with your own bugs and keeping things running. Since most development is maintenance, you want to optimize for that.


> Is there any other language that fits this role

Go?


My main problem is that it has awful support for functional programming, which I find to be a really helpful way of doing something like a web crawler, where you're essentially describing a computation to parse some input. I would use F#, because it offers powerful functional programming tools, is on .NET / VS 2012 (not sure if that's a pro or con for you), and has type inference (so you get the benefits of static typing without the cost of writing out the types of everything).

You should probably check existing web crawler solutions to see if you can adapt them before rolling your own.


Java's reputation suffers because of the language's association with corporate drones.

We may say that with the current crop of languages running on the JVM, Java is a low-level language. It is to the JVM what C is to hardware. You avoid coding in both when you have higher-level languages available which will make you more productive.

But when you want to optimize performance on the JVM for specific chunks of your application - without resorting to JVM bytecode of course - Java is the right choice.


Java is good enough for all types of projects. People hate Java because... they can't Java right. Personally, I really like Python and JavaScript and features like multiple inheritance, prototypes, higher-order functions and so on. But Java is better suited for projects more complex than "Hello world", because it is type safe (robust, compile-time feedback), has excellent IDE support, is incredibly fast, and allows fast development (if you know Java, TDD and so on).


There are plenty of reasons other than people being unable to "Java right", though I'm not saying that Java doesn't get plenty of hate out of ignorance.

A good example of when you can't use Java would be embedded projects or projects where you need to squeeze even more performance out of your code. Java is fast but it isn't the fastest.

Feel free to love your language of choice but I'd recommend not letting it get in the way of common sense.


Java bytecode is very compact and is also used in embedded systems. Java can be predictable.

Can you show some benchmarks comparing Java with a JIT against other languages?

I love other languages, but Java is the better choice.


You can always use JS/Python/whatever in your Java application as a scripting language.

In a case like a web crawler, the main issue with Java is scalability, or rather the lack of it. You need to code it yourself, but that's not any different from other languages and platforms.


Dropwizard is a great way to start a Java project.



