
Gonix – Unix tools written in Go - polegone
https://github.com/polegone/gonix
======
caipre
Not meaning to bring up a Rust vs Go debate, but since there are a few
comments claiming that this is a waste of time, I figured it's worth
mentioning the Rust based re-implementation of coreutils:

[https://github.com/uutils/coreutils/](https://github.com/uutils/coreutils/)

And another by suckless in plain C:

[http://git.suckless.org/sbase/tree/README](http://git.suckless.org/sbase/tree/README)

~~~
barsonme
Or my rewrite of GNU's coreutils in Go: [https://github.com/EricLagerg/go-coreutils](https://github.com/EricLagerg/go-coreutils)

It's not complete yet, but I couldn't pass up this thread :-)

~~~
polegone
I must say that yours is much more complete than mine, though.

~~~
danieldk
Work together!

It would be great if a complete coreutils was implemented!

~~~
barsonme
I'd love some help. I'm juggling 3 side projects right now, and it's hard
making time for all of them. :)

~~~
polegone
I linked it in the readme so all those people visiting my project will see
yours.

I can't help much myself (you are _way_ ahead of me), but more people should
visit your project now.

~~~
barsonme
Thank you! That's very generous of you.

~~~
polegone
No problem.

------
quink
[https://github.com/polegone/gonix/search?utf8=%E2%9C%93&q=re...](https://github.com/polegone/gonix/search?utf8=%E2%9C%93&q=readall)

Goodbye, memory.

~~~
vvpan
I don't know much about Go, but taking a quick look at the implementations,
they seem to be written by a programming novice, and they are quite primitive.
I don't mean to be negative, just my opinion.

~~~
polegone
I am a novice, and that is one of the reasons why I started this project (to
learn).

~~~
danieldk

        bytes, _ := ioutil.ReadAll(os.Stdin)
        lines := strings.Split(string(bytes), "\n")
    

Tip: use bufio.Scanner.

    
    
        scanner := bufio.NewScanner(reader)
        scanner.Split(bufio.ScanLines)
        for scanner.Scan() {
          // Do stuff
        }
    

And you can iterate over the lines 'lazily'. If you don't want it to consume
the \r's, write your own ScanLines :).

~~~
Someone
Tail also doesn't always need to read its entire input. When given a file
path, it can read backwards from the end of the file until it has found its n
lines. That makes tail fast on huge files (as long as they have normal line
lengths), but it also complicates the code.

And you may want to mmap the file rather than open it. Whether that speeds
things up depends on OS, OS version, file size, file system, available
memory, phase of the moon, etc.

------
barakm
Forget the haters, this is awesome.

GNU started as GNU's Not Unix, reimplementing Unix userland for free. The
people who are adamant about calling it GNU/Linux are, on some level,
remembering that the userspace is historically a reimplementation.

Give me a busybox that I can 'go build' and that becomes really quite
interesting.

~~~
mixologic
Except for the fact that it'll be years before all of the subtle bugs are
worked out and you can rely on those apps to be as stable as the ones we've
got:

[http://www.joelonsoftware.com/articles/fog0000000069.html](http://www.joelonsoftware.com/articles/fog0000000069.html)

~~~
eloff
We have years. They're passing whether we like it or not. In the meantime you
can choose whichever implementation of the Unix tools that you want. In the
future there may be more options that fit the bill, developed in a safe
language without the kinds of buffer overflows and things that exist in C
tools today.

------
pixelbeat
A few data points on GNU coreutils.

1\. It has a fairly good test suite that rewrites should leverage. That can
easily be done by prepending the dir of the new tools to $PATH and running
`make check`.

2\. To give an indication of the size of coreutils:

$ for r in gnulib coreutils; do (cd $r && git ls-files | tr '\n' '\0' | wc -l
--files0-from=- | tail -n1); done

985050 total
243154 total

~~~
pixelbeat
It's also worth pointing out, since busybox was mentioned a few times, that
the latest release of coreutils has the ./configure --enable-single-binary
option to build as a multi-call binary like busybox etc.

------
luckydude
Heh, this brings me back. I was a young guy at Sun, perl 4 was a thing, I
actually argued that we should redo /usr/bin in perl. In the days of a 20mhz
SPARC.

Silly me. Maybe it makes sense now.

~~~
ezequiel-garzon
I'm curious: what was your main motivation? I can understand it as a worthy
challenge, but it would probably have led to worse performance than the
C-based utilities, no?

~~~
deckiedan
It would be pretty cool if all applications on your system were written as
scripts, especially if they are very simple scripts, as it means you can open
any of them in a text editor and see what they do.

It means you can modify any of them just as easily.

I remember one time when I needed to know how much CPU a program was using; I
could _see_ it in the output from `top`. To figure out how `top` was getting
that info, I had to download the source code and grep through several .c
files. If I could just `vim top` and read the code, it would be very cool.

As an educational resource, being able to tell all your students "all binaries
in /bin and /usr/bin are editable! Read them! Find out how they do what they
do!" would be incredible.

~~~
ezequiel-garzon
Thanks for your reply. Yes, the ready availability of the code would be a
plus. In my mind utilities like grep or sort are very sensitive in terms of
performance, but that's just an impression.

~~~
deckiedan
Oh yes, absolutely. That is the big tradeoff.

One could hope that with many of these utils they're mostly IO bound, and so a
scripting language wouldn't change that much. I know at one point I actually
had a python based md5sum that was faster than the regular gnu one for big
files - as it loaded the data off the disk in huge chunks.

I believe ack-grep was written in perl? And that's pretty fast.

Also - how fast does it _Actually_ need to be? It's not high-frequency-trading
or a 60fps game... A computer these days can probably load perl, run a script,
and display the output faster than a computer from 20 years ago could run the
original c util... And if everything is written in it, then a lot will be
already in memory. If your shell is written in it too, then you could just
call the utils from within there, rather than having to fork/exec it anyway.

For most higher-level scripting languages (such as perl, python, ruby, etc.),
things like regexps and hash tables are written in C underneath anyway.

With a lot of shell scripting, you call many many utils many times - which
would slow everything down hugely if you needed the scripting language startup
costs each time. However, if you use the same higher-level scripting language
to write all your scripts in - rather than writing SH scripts that call your
utilities, then you might even get faster than SH calling external utils
(possibly). (If that makes sense...)

------
microcolonel
[https://github.com/uiri/coreutils](https://github.com/uiri/coreutils) << I
wrote some with a friend a couple years ago, maybe they suck, or maybe they
have usage strings. >.>

~~~
mpdehaan2
While I'm not sure if he posts here, I used to work in the same group with Jim
Meyering, who maintained/maintains coreutils. Great guy.

Anyway, he told some great stories about the complexities of POSIX, what
happens in Solaris when you have directories 20,000 levels deep (and how to
handle that efficiently), and the fun of teaching various coreutils commands
about SELinux. Lots of it gets surprisingly low-level quickly.

Coreutils is complex for many good reasons, so while these tools look all nice
and clever, they are not dealing with a lot of the same kinds of issues.

(I also recall some of the heck Ansible had to go through to deal with atomic
moves, grokking when something was on an NFS partition, and so on. Not the
same thing, but small, seemingly easy things get tricky quickly.)

Bottom line is appreciate those little tiny Unix utilities, a lot went into
them even when you think they aren't doing a whole lot :)

~~~
johnny22
agreed! so many folks see all that stuff as "bloat". In many cases, that code
is there for a reason.

~~~
ploxiln
Bloat always does have some reason. But it's still bloat.

Good engineering makes good tradeoffs, rather than throwing in everything
anyone wants.

(Not that I think this effort in go is "awesome". For one, it's probably
bigger executables than even statically linked coreutils. The suckless
sbase/ubase have some real potential though.)

------
zserge
It would be cool to have a "busybox" alternative targeted at Plan9 commands. I
personally find Plan9 utils much more logical (and easier to implement!).
Something like 9base from suckless, but as a single binary and hopefully in a
more modern language (most of the code like sam or rc is not so easy to
understand).

~~~
SSLy
> I personally find Plan9 utils much more logical

Could you elaborate on that?

~~~
zserge
Plan9 utils usually have less options and smaller functionality, but you can
easily compose them.

"rc": I like it much more than sh for scripting. It's easier to learn and has
less pitfalls.

"cat" has no options at all, it just concatenates.

"du": I like plan's "du" utility, which is easier than "find" ("du" simply
lists file names recursively, while "find"... Can anyone list all the options
of the find utility from memory?).

"rm" is a combination of rm and rmdir (directory is removed only if empty,
unless you do "rm -r" to delete it recursively. Only two options: "-r" and
"-f"

"tar" is a simple way to do recursive copying instead of cp, scp etc

"sleep" takes seconds only as an argument, but it can be a floating point (no
suffixes like in coreutils, e.g. "sleep 1h" or "sleep 3d")

"who" takes no options at all

I'm not really good at plan9 utils, and I still use coreutils much more, but
for embedded systems I would prefer to have a tiny number of simple building
bricks like "du", "cp", "rm", "tar" etc, and a few smarter commands like
awk/sed/grep/rc. I like toybox a lot, but if only that had an rc shell..

------
bjenk
What happens when you tail a 32 gig file?

~~~
misframer
From a glance at the code [0], it looks like it will read it all into memory
first.

A better way to do this would be to utilize the io.Reader interface.

[0]
[https://github.com/polegone/gonix/blob/0b65cd4fb9c6c44357d0a...](https://github.com/polegone/gonix/blob/0b65cd4fb9c6c44357d0a99cabae7eb14d995894/tail.go#L27-L28)

~~~
polegone
Thank you. I will look into this.

~~~
misframer
No problem. Also, please do not ignore errors. They're meant to be handled.

~~~
polegone
I'm working on that as well. I just fixed cat to crash and log on error, and
I'll be fixing the other commands soon.

By the way, I'm wondering how I should go through the file line by line with a
reader.

I think the most efficient way may be to scan byte by byte from the start
(head) or the end (tail), counting until reaching n newlines (or stopping at
the end of the file), then print the bytes between the start/end and the nth
newline.

How does this sound?

~~~
misframer
Good question. I'm not sure. You might want to seek to the end and move back.
You probably shouldn't do it byte-by-byte directly from the file since that's
very inefficient. As you can tell, this is already starting to get
complicated! Maybe you could try mmaping the file so you could treat it as a
[]byte.

~~~
laumars
Last year I was trying to write a Go routine that read a file backwards. I was
amazed how unexpectedly difficult that proved to be.

In the end I settled for reading it from the start which worked 99.999% of the
time and enabled me to finish the project to the tight deadline I had. But
I've always meant to go back and "fix" that code at some point.

------
zobzu
[https://github.com/polegone/gonix/blob/master/cp.go](https://github.com/polegone/gonix/blob/master/cp.go)

I think it needs some work for actual parity =)

~~~
maxmcd
The author specifically singles out cp as being incomplete in the readme.

------
iagooar
Very neat. I started rewriting GNU's coreutils in Rust and find it to be a
nice way to learn the language.

Also, it is interesting how many obscure and less-known features some of the
tools provide. In this case I clearly see the 80/20 rule: you can implement
80% of the main functionality in 20% of the time, but if you want to make
exact clones you're going to need to invest a lot more time.

------
giancarlostoro
I'm surprised nobody has shown someone doing this in JavaScript.

------
lttlrck
The line mentioning using gccgo to make the binaries small intrigued me... it
worked! I've only written a couple of small tools in go but the size of the
binary always bugged me. It's just a shame that setting up cross-compilation
with gccgo looks a lot more involved vs. gc.

------
tux
Very nice! Please add at least some of these: grep, rm, secure, file, top,
trace, ping, whois.

------
aceperry
Nice, I was thinking of doing something like that, but I don't know as many
unix tools as the author does. :-P

~~~
giancarlostoro
I rewrote "pause" from Windows into my Ubuntu box out of habit of using it in
some cases. First it was written in perl, then in Python, I also symlink clear
as cls out of habit from Windows. There's small little "hacks" that you can do
that are kind of fun.

~~~
gizmo686
Windows' cls actually erases the buffer from previous commands (so you can't
scroll up past it), clear does not. I alias "cls=printf '\033c'" to get the
Windows behavior.

------
electic
There is no reason to ever want to do this.

~~~
polegone
I'm just doing this for fun and to learn about Go and Unix at the same time.

I am a beginner, and a lot of my code is inefficient and/or incomplete, but by
putting it on the Internet I can get criticism and find out where I went
wrong.

For instance, some of you have told me that the way I've been reading files is
very inefficient, so now I'll try and do it the correct way.

~~~
tantalic
That is a great reason to do this and the best way to learn.

------
tezka
the point being?

~~~
iagooar
1\. Learning a programming language

2\. Learning some less-known flags and use cases for the tools we use everyday

3\. Reasoning about OS features and how they work

4\. Having fun

Please, if you want to be this rude go back to your troll cave.

------
vortico
Why do Go programmers always want to redo everything? It's rare to see
something actually new written in Go, proving that Go can do everything C can
(except make shared libraries and produce small binary sizes) but not that
it's actually _better_.

~~~
pjmlp
Because that is the whole point.

If C is still present in the stack, the typical C exploits are possible, which
is how many Oracle JVM exploits came to be, for example.

Reducing C's presence to the same level as Assembly will just make everything
in our systems safer.

Not that it will ever happen in UNIX systems, given how C came to life.

------
malkia
You should totally change the name to Goonix (it just reminds me of one of my
favourite movies - Goonies that is)

