

Copying stdin to stdout in Java - KostaC
https://gist.github.com/kosta/9777932

======
abalone
As he notes in the comments, you'd really just do this:

IOUtils.copy(System.in, System.out);

using the Apache Commons library.

That's the most apples-to-apples comparison to more "scripty" languages that
have more stuff built in. With Java you pick the libraries/frameworks you want
to use. Commons and Guava are extremely popular and robust libraries and
really should be included in any comparison against "real world" Java.

I actually like how in Java you can get down closer to the metal and control
the exact buffer size _if you want to_ , or handle all the exception cases
instead of just passing them up _if you want to_. If you're comparing
intentionally lower-level more verbose Java code to higher-level Python or
whatever, that's not really apples-to-apples. High-level Java involves
frameworks.

~~~
acqq
Still, Awk and Perl need no library: the semantics have been part of the
language since the beginning and are always available to every programmer,
including for bigger programs. Awk and Sed were obvious inspirations for Perl.

~~~
bananas
Awk and Perl are tools built on another language that does stream copying i.e.
read stdin and write stdout.

You could build Awk and Perl in Java if you wanted and have the same result.

    
    
      System.out.println("hello world");
    

Is the same as:

    
    
      printf("hello world\n");

------
patio11
If you want a single point generalization about Java programmers, I was one
for years, and the way I would have implemented this involves minimally a
BufferedReader and an InputStreamReader. You'll also probably need to catch
the IOException. Plus or minus 20 lines or so.
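
For illustration, here is a minimal sketch of the line-oriented version described above. It assumes UTF-8 text input, and the names `LineCat` and `copyLines` are made up for this example. Reading by lines rewrites line endings with the platform separator, so unlike a raw byte copy it is not guaranteed to be byte-exact.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

// Hypothetical names; a sketch of the BufferedReader + InputStreamReader idiom.
final class LineCat {
    // Copies text line by line and returns the number of lines written.
    static int copyLines(InputStream in, PrintStream out) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        int lines = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            out.println(line); // println appends the platform line separator
            lines++;
        }
        out.flush();
        return lines;
    }

    public static void main(String[] args) {
        try {
            copyLines(System.in, System.out);
        } catch (IOException e) {
            // The "catch the IOException" part mentioned above.
            System.err.println("copy failed: " + e.getMessage());
            System.exit(1);
        }
    }
}
```

Counting imports and error handling, this lands close to the "20 lines or so" estimate.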

I don't hate Java, but "Java as it is spoken" does not often get used for
Unix-style piping in a chain of small programs on a command line. It also
isn't the easiest tool in the drawer if you want to do light (or heavy) string
processing.

~~~
RyanZAG
Sure, but I wouldn't bring my forklift out of storage to move a sack of
potatoes across the room either which is what you're doing when you use Java
for a 10 line shell script. That's not what Java is designed for.

~~~
alandarev
Nor do you grab a camel when you want a ride, as there are better options.

------
Unosolo
The essence of the argument here is that Java is a poor language as it doesn't
offer simple abstraction to copy standard input into standard output; that
Java is too verbose and low-level for the task.

The lack of direct abstraction is not a valid argument, because as a
programmer, you shouldn't be writing the logic in Java, Python, C or Scala but
rather a higher order domain language implemented in the chosen host language
and that's what the process of programming is all about. For the majority of
real world programming tasks it's unlikely that a language that fits the
domain perfectly already exist, so you have to create one.

In Java one can say:

    
    
        copyStream(System.in,System.out);
    

And then one will have to implement copyStream but only once:

    
    
        long copyStream(InputStream src, OutputStream dst) throws IOException {
            long bytesCopied = 0; // locals must be initialized before use
            byte[] buffer = new byte[8192];
            int bytesRead = src.read(buffer);
    
            // read() returns -1 at end of stream
            while (bytesRead != -1) {
                bytesCopied += bytesRead;
                dst.write(buffer, 0, bytesRead);
                bytesRead = src.read(buffer);
            }
    
            return bytesCopied;
        }
    

I prefer programmers taking this approach of implementing domain specific
language first and then expressing the logic in its terms instead of trying to
express higher order concepts without resorting to available host language
abstractions.

Let's say Python or Perl let one express stream copying more concisely
straight out of the box. However when faced with real life programming
challenges one will very quickly encounter limits of what a language can
express out of the box with one-liner. But as a programmer one has the power
to create one-liners from scratch!

 _Disclaimer: I am not a Java expert, so the code above is just to illustrate
the idea based on my very limited knowledge of Java._

~~~
pekk
Two things.

First, constantly having to write implementations like this introduces a ton
of friction as compared with reusing existing implementations. Languages do
have an influence on that.

Second, languages vary in "expressiveness," determining how much code you have
to write in order to make an implementation like this. In one language,
copying stdin to stdout fits into a natural idiom which also handles other
cases naturally. In another language with less care for ergonomics, every task
might be equally un-idiomatic.

In other words, a language can offer its own "higher order domain languages"
for the core tasks that everyone is doing over and over again as part of their
general programming. Or it can choose not to do that, because what it offers
is already Turing complete. But then the ergonomics are bad, and it makes a
real difference.

It seems wrong that when I am paying for things like long JIT warmups and
stop-the-world GC, I am still writing piles of functions with low-level idioms
that are no more expressive than C's.

If Java ships with a 'copy from one stream to another' primitive then I'd find
that a much more compelling argument than that I can treat myself to
reimplementing things like stream copying, sorting and basic data structures
on a regular basis. It's unbelievably tedious and wasteful to do this, there's
just no reason.
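
For reference, JDK releases newer than this discussion do ship such a primitive: `InputStream.transferTo(OutputStream)`, available since Java 9. A minimal sketch, assuming Java 9 or later; the class name `TransferCat` and the `copy` wrapper are made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical class name; requires Java 9+ for InputStream.transferTo.
final class TransferCat {
    // Copies everything from src to dst and returns the number of bytes copied.
    static long copy(InputStream src, OutputStream dst) throws IOException {
        long copied = src.transferTo(dst);
        dst.flush(); // System.out may buffer, so flush explicitly
        return copied;
    }

    public static void main(String[] args) throws IOException {
        copy(System.in, System.out);
    }
}
```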

~~~
Unosolo
Agree, in Java (and any other language) the main culprit is not the complexity
that can eventually be abstracted away, but the trivial noise that cannot be
abstracted at all: verbose class definitions, clumsy anonymous functions prior
to SE8 (functors), lack of implicit interfaces, lack of infix method call
notation, generic type erasure, lack of basic type inference, lack of
continuations etc.

My point was that limited expression means of a language are sometimes
compounded by inability of a programmer to make a good use of the expression
means already available to them.

I also understand the desire for a more expressive language, I am a programmer
after all. One has to keep in mind, however, that the more expressive a
language is the harder it is on the reader. Java code is trivial for a reader
to follow (if not for the excessive verbosity sometimes covering up the true
intent); much of Scala code base, on the other hand, is not that trivial to
comprehend due to the high expressiveness of the language.

------
dvdkhlng
I don't know much Java. Does 'System.in.read(buffer)' block until stdin is at
EOF or buffer is full? Then this solution would be somewhat broken. Imagine a
low-bandwidth source on stdin and an audio player connected to stdout, like

    
    
      wget -O - http://audiostream | java inout.java | mplayer /dev/stdin
    

If I had to write this in C I'd use file descriptors in non-blocking mode and
select(). With Tcl one could use event-based IO.

~~~
kilburn
Using non-blocking mode may be better, but to answer your question: yes, it
does block [1].

[1]
[http://docs.oracle.com/javase/7/docs/api/java/io/InputStream...](http://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#read%28%29)

~~~
Perseids
(You referenced the one byte read function, instead of the multi-byte function
used in the OP.)

The documentation does not answer how long it blocks. Blocking when there is
no input is completely ok in this use case as there is nothing better to do
instead of waiting for input. The relevant question is how long it blocks
after there is some input available for efficiency reasons.
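
A small experiment can illustrate the documented contract of `read(byte[])`: it blocks until at least one byte is available (or end of stream), and may then return fewer bytes than the buffer can hold. The sketch below uses a `PipedInputStream` as a stand-in for stdin, which is an assumption; how many bytes `System.in` hands back per call depends on the underlying stream and the OS. `ReadDemo` and `readOnce` are made-up names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Hypothetical names; a stand-in experiment, not a statement about System.in.
final class ReadDemo {
    // Performs a single read into an 8192-byte buffer and returns the byte count.
    static int readOnce(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        return in.read(buffer);
    }

    public static void main(String[] args) throws IOException {
        PipedOutputStream producer = new PipedOutputStream();
        PipedInputStream consumer = new PipedInputStream(producer);

        // Only three bytes are available; the pipe stays open, so there is no EOF yet.
        producer.write(new byte[] {1, 2, 3});
        producer.flush();

        // The call returns once some data is available; it does not wait
        // for the 8192-byte buffer to fill or for end of stream.
        System.out.println(readOnce(consumer));
    }
}
```

The contract only guarantees a result between 1 and 3 here. The point is that the call returns at all while the pipe is still open and the buffer is far from full.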

~~~
dvdkhlng
> there is nothing better to do instead of waiting for input.

In case no more input is available it would be best to already output the data
that have accumulated in buffer. Else in case the input program deadlocks
you're never going to see the last (up to 8191) bytes output.

This is what non-blocking I/O is for. With non-blocking I/O you first wait
until your input channel becomes readable, then reading functions return as
soon as the data available for read is exhausted.

~~~
Perseids
You are right but I was talking about the case when no input was there to be
read since the call to `read` and also about this specific application:

> Blocking when there is _no_ input is completely ok _in this use case_ as
> there is nothing better to do instead of waiting for input.

(Emphasis is new.)

~~~
dvdkhlng
You're right, I overlooked the reference to the function reading a single
byte.

------
chatman
Python, shell, etc. aren't suitable for high scalability, fast applications
(e.g. Apache Solr, Lucene, Hadoop etc.). And beginner examples like these make
no sense. Java clearly has an important role in this world, maybe not in yours
(if you're so irked by "verbosity").

~~~
pekk
Are you absolutely certain that nobody has used Python in high scalability,
fast applications?

------
nemothekid
I don't have the post context, but at 13 lines, I feel even something like Go
wouldn't even be considerably larger or marginally easier to understand.

~~~
hornetblack
It's not much shorter, but it was pretty easy to write if you know the
libraries.

    
    
        package main
    
        import ("os"; "io")
    
        func main() {
            io.Copy(os.Stdout, os.Stdin)
        }

~~~
AhtiK
The order of os.Stdout, os.Stdin is interesting. I'd assume src->dest being
more intuitive order.

Any ideas if this dest<-src order is by design following go design principles
or was mainly authors' personal choice?

By the way, Pipe in the same package is the other way around: (*PipeReader,
*PipeWriter)

~~~
arthurbrown
Channels in go use the exact syntax you mention (dst <- src). With this in
mind, the argument order of io.Copy is consistent, as well as io.Pipe. You
read from a reader - the output of the pipe, and write to a writer - the input
of the pipe.

------
cromwellian
Amazingly, in Java 7, they added a Files utility class to copy an InputStream
to a Path, and copy a Path to an OutputStream, but no Files.copy(InputStream,
OutputStream).
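
To make the gap concrete, here is a sketch that chains the two existing `Files.copy` overloads through a temporary file. The names `FilesCopyDemo` and `copyViaTempFile` are made up for this example, and the detour through disk is only for illustration: it gives up streaming, so it would not suit a pipeline.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical names; shows the two overloads, not a recommended way to copy streams.
final class FilesCopyDemo {
    // Copies src to dst by way of a temporary file and returns the bytes written to dst.
    static long copyViaTempFile(InputStream src, OutputStream dst) throws IOException {
        Path tmp = Files.createTempFile("stdin-copy", ".tmp");
        try {
            // InputStream -> Path; REPLACE_EXISTING because createTempFile already created the file.
            Files.copy(src, tmp, StandardCopyOption.REPLACE_EXISTING);
            // Path -> OutputStream
            return Files.copy(tmp, dst);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        copyViaTempFile(System.in, System.out);
        System.out.flush();
    }
}
```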

At this point, Guava should just be added as part of the JDK. :) In Guava,
ByteStreams.copy(System.in, System.out)

~~~
lmm
Java has really solid dependency management with maven, so I'd actually like
to see less in the JDK. Let Guava/JodaTime/log4j/etc. keep their faster
release cycle, just deprecate the parts of the JDK that used to do those
things and point developers in the right direction.

------
blossoms
There are some comments on the Gist claiming Python or Ruby is a one liner. My
response:
[https://gist.github.com/kosta/9777932/#comment-1199293](https://gist.github.com/kosta/9777932/#comment-1199293)

~~~
pekk
Your Python code is just wrong. I'm not sure what you think it's proving when
you write explicitly wrong Python code, and then show that the output is
wrong.

~~~
blossoms
>Your Python code is just wrong.

Could you elaborate? I want to understand how to fix it.

 _EDIT: I really want to know what is wrong with my code. It works. Is it a
stylistic concern?_

    
    
        $ dd bs=1m count=10 if=/dev/random of=randomdata
        10+0 records in
        10+0 records out
        10485760 bytes transferred in 0.878533 secs (11935535 bytes/sec)
        $ python stdin_to_stdout.py < randomdata | cmp - randomdata
        $ echo $?
        0
    

_EDIT2: I think I know how you came to the conclusion my code is wrong.
Perhaps next time you should read the __entire__ post. Thank you._

------
hyp0
more C-style

    
    
      while( (bytesRead=System.in.read(buffer)) != -1 )

------
codezero
may want to link to the particular post in the gist :)

~~~
fish2000
Yeah – what post?

~~~
eshyong
"Why I like Java" by Mark Dominus:

[https://news.ycombinator.com/item?id=7463671](https://news.ycombinator.com/item?id=7463671)

~~~
acqq
Who never wrote it but assumed we know that the same program in Perl, without
using command line switches is

    
    
        while (<>) {
            print;
        }
    

Still Awk is shorter:

    
    
         { print }
    

And if you are allowed to change the command-line switches passed to Perl,
that Perl program can have _exactly 0 bytes_.

~~~
patio11
Awk is even shorter: anything truthy works. You don't even need the print
statement or brackets. Thus there are nine separate 1-byte programs which
produce cat: [1], [2], [3], etc etc.

By default, unless you use the BEGIN block or something, awk will run your
program on each line of stdin. This is useful for programs of the type:

(/some regular expression/) { some action}

The default action if you don't specify one is "print $0" (the whole matching
line). If your condition is a plain-ol' expression rather than a regular
expression, and it always evaluates truthy, you thus get every line.
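
The pattern/default-action model described above can be sketched in Java for comparison. This is only an approximation of a single awk rule, and `AwkLike` and `run` are made-up names: a predicate stands in for the pattern, and the default action prints the whole line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.Reader;
import java.util.function.Predicate;

// Hypothetical names; approximates one awk rule with the default action.
final class AwkLike {
    // For every input line matching `pattern`, print the whole line.
    static void run(Reader input, Predicate<String> pattern, PrintStream out) throws IOException {
        BufferedReader reader = new BufferedReader(input);
        String line;
        while ((line = reader.readLine()) != null) {
            if (pattern.test(line)) {
                out.println(line);
            }
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // A pattern that is always true, like the one-byte awk program `1`, behaves like cat.
        run(new InputStreamReader(System.in), line -> true, System.out);
    }
}
```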

~~~
acqq
You are right, the shortest awk is one byte. Please post a link to all nine
variants; I wasn't able to Google it.

Still the shortest Perl is 0 bytes when the command line switch is allowed to
be -p

Update: That's actually called "the sed mode of Perl." Moreover these two
command lines behave similarly:

    
    
        perl -pe 's/search/replace/g'
    

and

    
    
        sed 's/search/replace/'

~~~
dvdkhlng
SED is shorter (0 byte). SED /is/ a turing complete language [1] so don't try
to claim that I'm cheating.

    
    
      sed < input > output 
    

:)

[1] [http://robertkotcher.com/sed.html](http://robertkotcher.com/sed.html)

[update] s/set/sed/

~~~
Alphasite_
Even java can do that setOut(system.in); (iirc)

~~~
dvdkhlng
> Even java can do that setOut(system.in);

I was referring to an empty SED-program that would then be run via 'sed -f
empty-file.sed'. Of course you need >0 bytes to invoke the SED program.

