
Learn a Programming Language Faster by Copying Unix - rodrigoavie
http://www.rodrigoalvesvieira.com/copy-unix/
======
Cieplak
<http://www.haskell.org/haskellwiki/Simple_unix_tools>

~~~
rodrigoavie
Wow! What a pearl!

------
michaelfeathers
I write a unit testing framework in every new language I learn. I find it's a
great workout because making it usable for yourself is immediately assessable,
and it often forces you into deeper areas of the language, including
reflection and meta-programming.

~~~
eru
You might want to have a look at QuickCheck. Trying to port its techniques to
other languages will also give you some insight (and will result in valuable
tools).

~~~
omaranto
And make you miss the ability to overload functions just on return type. Same
thing happens if you use Haskell's regex library or try to port monads to
another language.

~~~
eru
Indeed. Overloading on return type is one of the few things that you can't do
in dynamic languages by design (and in most statically typed ones, neither,
but that's an accident).

~~~
omaranto
Perl is dynamic and does allow return value overloading via the wantarray
function.

~~~
eru
Thanks. I will have to think a bit harder, and see whether I can rescue my
statement in modified form.

~~~
klibertp
Please do! It's very interesting; I'm not good enough in PLT to try and find
out the answer for myself (in such a short amount of free time I have lately)
but I'm very curious if it's true that you cannot have overloading on return
type in unityped languages by principle.

~~~
omaranto
This is not hard to understand: under usual semantics for function calls it is
clearly impossible to do return type overloading in a dynamically typed
language. But, as I pointed out above, it is perfectly practical to cheat and
provide an indication of the desired return type as an extra hidden argument
to every function call (this is what Perl does with wantarray: it is a
"builtin function" but the effect is really that of an extra boolean parameter
to every function call that specifies whether the result should be an array or
not).
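
To make the "extra hidden argument" concrete, here is a rough Python sketch. Python has no wantarray, so the flag is passed explicitly; `find_all` and `want_list` are hypothetical names, and in Perl the interpreter would supply the equivalent of `want_list` from the calling context.

```python
def find_all(text, needle, want_list):
    """Return every index of needle if want_list is true, else just the count."""
    positions = []
    start = text.find(needle)
    while start != -1:
        positions.append(start)
        start = text.find(needle, start + 1)
    # The "return type" is selected by an input flag, not by the caller's type.
    return positions if want_list else len(positions)


print(find_all("banana", "an", want_list=True))   # list result
print(find_all("banana", "an", want_list=False))  # scalar result
```

The dispatch still happens on an input, which is the distinction the rest of the thread turns on.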

~~~
klibertp
Ok, this much I understand, but I wouldn't call this overloading on return
type... It's - clearly - overloading on input type, either implicit (wantarray
- I checked the docs, it seems to be a variable set automatically depending on
context by the interpreter) or explicit, but input type nevertheless.

So, the short answer is - as I thought previously - that no, you can't have
implicit dispatch on return type in a dynamically typed language.

Contracts are probably getting close, but you still need a version of apply
that will get expected return type and a bunch of functions to introspect (or
a macro that would do the same, but let's not wander there, as lispy macro
systems are equivalent with static type systems anyway).

------
neverm0re
For those who want a very clear, concise set of userland code to try
translating for practice, consider using 9base:
<http://tools.suckless.org/9base>

It's not quite Unix, but it's still quite lovely and it's rather succinct:

"It also contains the Plan 9 libc, libbio, libregexp, libfmt and libutf. The
overall SLOC is about 66kSLOC, so this userland + all libs is much smaller
than, e.g. bash (duh!)."

~~~
agumonkey
in the same spirit: unixv6 <https://github.com/guilleiguaran/xv6>

------
utopkara
This works well as a learning method, most probably because UNIX tools are
well documented, and there are readily available binaries. Many introductory
programming courses I took and taught back in the day were using UNIX tools as
programming assignments.

I believe, any well documented coding problem accompanied with a sample binary
implementation should have the same educational value.

Sites like interviewstreet.com should take this as a guide, even though their
audience is not beginners. If you look at sample problems there, they are
usually poorly described, and sample outputs are trivial and don't contribute
to the textual description.

------
Jare
My favourite way to learn new languages & platforms was implementing a du-like
('disk usage') tool. It was also the exercise I proposed to my students. It
doesn't require complex algorithms but touches a lot of the basics: recursion,
filesystems, command line parsing, output formatting, etc.
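
A small Python sketch of such a tool, under simplifying assumptions: it sums apparent file sizes rather than allocated blocks (so its numbers will differ from the real du), it does not follow symlinks, and `disk_usage` and `main` are names chosen for this example.

```python
import argparse
import os


def disk_usage(path):
    """Return the total size in bytes of the files under path, recursively."""
    if os.path.islink(path) or not os.path.isdir(path):
        return os.lstat(path).st_size
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            total += disk_usage(entry.path)  # recurse into each child
    return total


def main(argv=None):
    """Parse the command line and print one right-aligned total per path."""
    parser = argparse.ArgumentParser(description="Toy du: sum apparent file sizes.")
    parser.add_argument("paths", nargs="*", default=["."])
    args = parser.parse_args(argv)
    for path in args.paths:
        print(f"{disk_usage(path):>12}  {path}")
```

Calling `main(["some_dir"])`, or wiring `main()` to the script entry point, touches each of the basics listed above: recursion, the filesystem, argument parsing and output formatting.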

~~~
CoryG89
I am taking Operating Systems at Auburn University, and my professor had us
work on the Linux kernel source, learn how to define and implement our own
system calls using the SYSCALL_DEFINE[0-6] macro and then implement our own
memory snapshot tool. It is similar to /proc/<pid>/statm. We had to get other
info on major and minor page faults as well. I learn a lot doing things like
this.

------
MaxGabriel
Rodrigo, I thought you should know that when browsing your site on an iPhone
4, the header overlapped the index. That wasn't a great description, so here
is a screenshot: <http://db.tt/7grTPHsL>

You also cannot zoom in, which made it very hard to read.

~~~
rodrigoavie
Hi Gabriel,

I'll change it. Thank you very much for reporting the problem. It now allows
you to zoom in. I'll take more time to fix the header, tho.

Thanks man.

------
Zenst
It is a good approach and back say 20 years ago the favorite was to redo the
du command and add a conversion to MB (now handled by the -h option for human
in the du and ls command).
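
The conversion itself is a nice first exercise. A rough Python sketch, assuming binary units and one decimal place; this does not reproduce the exact rounding or output format of the real -h option, and `human_readable` is a name invented here.

```python
def human_readable(num_bytes):
    """Format a byte count with a binary unit suffix and one decimal place."""
    units = ["B", "K", "M", "G", "T"]
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the number below 1024,
        # or at the largest unit we know about.
        if size < 1024 or unit == units[-1]:
            return f"{size:.1f}{unit}"
        size /= 1024


print(human_readable(1536))
print(human_readable(1048576))
```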

Cat is also a good example and not just to learn a programming language but
the OS. Ask somebody to come up with 20 ways to display a file; you can do it
with more than just the cat command. Now I'm not saying there are twenty ways,
but it is one area where some delving will let people try and
find them.

Examples are for me the best way to learn a programming language and again
with the simple Unix commands you have a common base for which people are more
likely to find an example. I can't recall the name, but there is a great site
on the net which has "Hello World" in about every programming language around.

After a while, you will start to add options to your redeveloped commands, then
add entirely new commands, and from there you can think about writing your own
shell. No graphics or the like to distract you too much. As a rule, I have found
that learning graphics in a programming language feels like an afterthought, and
you are also more constrained by the mentality behind the API. So I do advise
not even looking at graphics for a while, until you have learned how to flex
the language without the distraction; graphics, and the way they are handled,
add an overhead to your code. I'm sure somebody will list a programming
language that is easier to do graphically as opposed to text output, though it's
a safe rule. So just because you have windows doesn't mean you have to jump
into GUIs from day one, if at all.

~~~
eikenberry
I remember that hello world site, I think it fell off the internet but there
are still many of those types of sites. One of my favs was the 99 bottles song
in many languages. There are also larger projects like PLEAC and Rosetta code
that are similar.

<http://99-bottles-of-beer.net/> <http://pleac.sourceforge.net/>
<http://rosettacode.org/wiki/Main_Page>

------
Aardwolf
I learned programming in QBasic by experimenting with graphics, games and
fractals.

I honestly think that plotting per-pixel graphics is a much more fun and
rewarding way to learn programming, than a cat program.

It's only a shame Linux has no "mode 13h" screen and a PSET function in the C
language :)

~~~
klibertp
Shouldn't this be doable with simple library wrapping console framebuffer,
that you Linux guys have and I suffer lack of on FreeBSD? (I can be very wrong
- I'm just a bit jealous, not jealous enough to dive into fb implementation
:))

------
d0m
When learning a new language, my first project is usually an IRC bot. It gives
me a good feel of the language and I need to learn most core structures.

------
noonespecial
I learned both perl and python by using them to replace my sysinit scripts on
a Centos box. If you can get through this, I'd say you're "conversational" in
a language.

You'll also know Linux better than most ever will.

------
abecedarius
Here's mine: <https://github.com/darius/ung>

It can be fun if you try to put your own spin on things, though with
diminishing returns.

 _Edit:_ also
[https://github.com/darius/sketchbook/blob/master/regex/grep....](https://github.com/darius/sketchbook/blob/master/regex/grep.c)

------
davidxc
I feel like this approach might get boring after a couple of languages.

I think most people here could skim through the language intro, and then just
start working on a new project in that language (picking up more of the
language as they go).

This is the approach I usually use, and it seems more fun than porting the
same tools. Just my opinion though :)

------
minhajuddin
On a side note, a more concise version of cat in ruby is:

    
    
        puts ARGF.read
    

I know it's beside the point; however, I couldn't resist :)

~~~
rodrigoavie
Hehe. Thanks.

------
arocks
For a purely functional language like Haskell, this would not be very good
advice. Any kind of I/O would involve monads and other imperative constructs.
Better implement an algorithm involving trees or graphs to better appreciate a
functional language.

~~~
shangaslammi
Not at all. You can start doing I/O in Haskell without knowing anything at all
about monads by just treating the do-syntax as an imperative DSL.

In fact, Bryan O'Sullivan (who wrote Real World Haskell) just held a tutorial
session on Haskell a couple of weeks ago where people completely new to the
language implemented simple Unix tools like "wc". I don't think monads were
mentioned at all.

~~~
4ad
This only works if you are a top-down thinker. Many people are bottom-up
thinkers.

~~~
anothermachine
Explain?

With "do" and the coincidental naming of "return", and IORefs if you insist,
you can write imperative bottom-up code in Haskell.

------
nikcub
Your ruby version of cat implements none of the command line switches.

I learned C by going through the FreeBSD code and helping with POSIX
compliance. For fun I would implement a lot of the commands in Python. You get
the most out of learning both the language and UNIX by implementing all the
command line options.

~~~
kryptiskt
And you get a better grounding in the Unix philosophy if you omit the command
line switches and make small utilities to handle those cases, because as the
paper said, cat -v is harmful
(<http://harmful.cat-v.org/cat-v/unix_prog_design.pdf>).

~~~
Someone
The Unix philosophy is somewhat of a myth.
<http://cm.bell-labs.com/cm/cs/who/dmr/man12.pdf> shows that that same first edition already
had a -t option on ls. Surely, that should have been some | sort, but AFAICT,
they did not bother to write that until some later time.

~~~
lloeki
Sticking to a philosophy like a zealot makes it a religion. Keeping it as a
philosophy means it encourages a certain approach, not that it makes it an
absolute unbreakable rule, plus it would only result in bikeshed/flamewar
arguments.

as PEP20 says:

    
    
        Special cases aren't special enough to break the rules.
        Although practicality beats purity.

~~~
Someone
I said "somewhat" for a reason, but I think there are just too many special
cases to keep calling them exceptions. I use awk and sed, pipe things into
bzip2, but I more often use tar with a z or j option to compress stuff than in a
pipeline. We also do not have separate tools for parsing python/ruby/perl and
executing it. If code reuse using pipes were successful, wouldn't we have a
separate 'exec' tool that is backend to all scripting languages? Looking at
gcc with its zillions of options, pipes sometimes look more like an
implementation detail than as the primary way to compose tools.

Also, other kinds of reuse of the things mentioned as examples of the Unix
philosophy are rare. What languages on your system use lex and yacc? 'but
those tools are outdated' is not a good rebuttal; if they are, how has that
come about? I think the Unix philosophy is perfect for prototyping, but 'real'
stuff tends to need polish that 'the unix philosophy' cannot provide. As a
final example, consider git. As I understand it, it started life as a set of
scripts, but eventually became a C program.

~~~
qu4z-2
I generally use the -j option to tar primarily because it's less typing. I
think the ideal option is for these flags to be provided, but implemented
using pipes under the hood so _tar -cj folder_ is just syntactic sugar for
_tar -c | bzip2_. Similarly, _tar -cf folder.tar folder/_ could be implemented
as _tar -c folder/ > folder.tar_. I think this way you get the conciseness of
the flags approach, but also each item of functionality is only implemented
once. Just a random thought.
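
A rough Python sketch of that idea, using the standard tarfile and bz2 modules in memory instead of real pipes. `tar_bytes` and `tar_cj` are hypothetical helpers, not how any actual tar is implemented; the point is only that the "flag" version is the composition of two single-purpose steps.

```python
import bz2
import io
import tarfile


def tar_bytes(files):
    """Build an uncompressed tar archive in memory from a {name: bytes} dict."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def tar_cj(files):
    """The 'flag' form: archive first, then compress, like tar -c | bzip2."""
    return bz2.compress(tar_bytes(files))


# The composed output is readable as an ordinary bzip2-compressed tarball.
archive_data = tar_cj({"hello.txt": b"hi\n"})
with tarfile.open(fileobj=io.BytesIO(archive_data), mode="r:bz2") as archive:
    print(archive.getnames())
```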

------
nathan_f77
Great advice. I've written a simple version of the `column` tool as an entry
to the International Obfuscated Ruby Code Contest:
<https://github.com/saizai/iorcc/pull/1>

Just a bit of fun, the contest doesn't seem to be very 'official'.

~~~
rodrigoavie
nice!

------
stevekemp
A few years back I remember submitting the implementation of a couple of Unix
tools in Perl. The idea was to implement as many of the standard tools as
possible, in perl.

The project seems to be sleeping, although there are many contributions:

    
    
      http://search.cpan.org/dist/ppt/

~~~
draegtun
Blimey... I've completely forgotten about Perl Power Tools! Here's the project
home page - <http://ppt.sourceforge.net/>

------
ajwinn
The advantage and disadvantage of this approach: you have to already know Unix
commands. Inevitably all programmers learn Unix commands - but probably a
rough approach for beginners. Although, I'm guessing this advice isn't really
aimed at beginners anyway.

~~~
zokier
Other side of the coin: you'll learn unix commands in a very non-superficial
way.

------
pixelbeat
I had this idea to demo basic python concepts. Here's an implementation of ls
with links to further info:

<http://www.pixelbeat.org/talks/python/ls.py.html>

~~~
ahmicro
Thanks for this script, I'm following and found it very useful with links
provided.

------
beering
The example code for cat is only half of cat, which reminds me of Rob Pike's
criticism of "cat -v"---Unix programs are often more complicated than they
arguably should be, and "reimplementing" Unix programs often means
implementing a small subset of the features.

Or, look up the manpage for your locally installed "tree" and see how many
non-standard options and features have been bolted on.

I think it would be helpful when reimplementing Unix to try to work in more
core features of the language, like its object or module system, which are
more important than parsing argv or touching the filesystem.

~~~
lgas
Even if you ignore all of the bolted on options, his simple implementation of
cat doesn't even handle "cat -" so I would say less than half. I think this is
still a good exercise if you don't dig deep into the plumbing of all the tools
you're reimplementing, but it's even better if you do.
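
For reference, handling "cat -" adds only a few lines. A minimal Python sketch, assuming the usual convention that "-" (or no arguments at all) means standard input; the `stdin` and `stdout` parameters are an addition of this example so the function can be exercised without a terminal, and no command line switches are handled.

```python
import shutil
import sys


def cat(paths, stdin=None, stdout=None):
    """Copy each named file to stdout; '-' or an empty list means stdin."""
    if stdin is None:
        stdin = sys.stdin.buffer
    if stdout is None:
        stdout = sys.stdout.buffer
    for path in paths or ["-"]:
        if path == "-":
            shutil.copyfileobj(stdin, stdout)  # stream, do not slurp
        else:
            with open(path, "rb") as source:
                shutil.copyfileobj(source, stdout)
```

Calling `cat(sys.argv[1:])` from a script gives the behaviour described; working in bytes avoids guessing at the encoding of the input.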

------
mahmud
Unix, as an environment, uses a handful of primitive paradigms. You will not
learn much if you're using an advanced language.

If you were to learn Oz[1] in this manner, none of its powerful features
will be called for to implement simple filter programs, transforming text, and
crashing when confused by unexpected input.

For languages that support advanced features, you're better off modeling
"advanced" systems. Say, in the case of Oz, you might be better off modeling a
secure microkernel or a VM; not dumb text processors.

--

[1] <http://www.mozart-oz.org/>

------
guruparan18
Another good thing about learning UNIX (or using Linux) is the ability to
write code snippets that could automate daily chores (you can do in Windows
too, but I am not going to speak about it here).

Some of the code I enjoyed writing and using: code to remember directory
paths I visited (I visit a lot of them, and it is a pain to type lengthy ones),
and a wrapper for "ssh" to remember hosts and list them (again, I had to visit
a bunch of them daily, and it is a pain to type fully qualified host names),
just like PuTTY saved sessions.

These tools really boost productivity and are a joy to use.

~~~
inkel
I agree with you, having the ability to script everyday tasks is awesome,
though in the case of the ssh wrapper it might be a little overkill,
given that you can achieve the same (and more) with an ssh_config file [1] [2].
I would particularly recommend the ControlMaster option.

[1] [http://nerderati.com/2011/03/simplify-your-life-with-an-ssh-...](http://nerderati.com/2011/03/simplify-your-life-with-an-ssh-config-file/)
[2] <http://linux.die.net/man/5/ssh_config>

------
intellegacy
Can someone give a non-expert way of going about this learning method?

For instance, what are your options on Windows or Mac? And are the languages
you can do this with limited to Ruby, Python, or Javascript?

~~~
ChuckMcM
One of the simplifying assumptions that is built into this idea is that the
input and output to the program are simple streams ( a file, a terminal ). Not
surprisingly that makes things a lot less complex so you can focus on the
programming.

Since a Raspberry Pi is only $35 or $50 with enough stuff to program it, that
is one way to get started. Of course taking an older tossed-out PC and
installing Linux on it works too (which can often be done for free if there
are businesses around).

~~~
TheClassic
Or a Linux VM on your Windows/Mac host

~~~
ChuckMcM
Aye, that works well, but if you're learning to program sometimes getting a VM
installed is more frustration than it's worth.

~~~
morsch
You can download complete Linux VM images that just work from the net.
VirtualBox is free and is installed in 5 minutes. It's hardly frustrating.
Certainly less so than getting a RPi up and running, not to mention you end up
with an odd, underpowered device secondary to your regular machine.

That's not to say the RPi isn't an interesting, rewarding experience on its
own merits -- that's why I got one --, but I wouldn't recommend it to anybody
interested primarily in getting a *nix environment for experimentation or
programming.

------
pdog
For learning functional programming languages, Project Euler[1] is a really
great resource.

[1] - <http://projecteuler.net/>

~~~
okal
The exercises were paradigm-agnostic last I checked. Any reason why you feel
they're specially suited to FP?

~~~
ajanuary
They could have meant Project Euler is more suited for functional languages
than imperative, IO utilities.

------
poopicus
This is a brilliant idea, and may I suggest that one takes it slightly further
than just implementing the core functionality, and aims for a complete clone
(optional parameters and all) with a few enhancements.

Indeed, for a further ego boost, why not also benchmark the performance of
your versions against the performance of the native utilities? You never know,
your new clone might end up being the new 'less' to the old 'more'!

~~~
rodrigoavie
Yes. I think a nice approach would be just cloning the core functionality of the
program initially in your language of choice, then dealing with parameters and
more advanced options as you learn that language better.

------
jfaucett
This is kind of funny since I taught myself a lot of Go by porting a bunch of
the coreutils. It still amazes me how much there is to learn from unix. Many
many times I find myself going back to some program's source (cron, find, etc.)
when I need the design/algorithm/feature in my own programs.

------
arjn
I agree with this and have done it in the past myself. It's a great learning
exercise for new languages. The problems are easily defined and it's apparent
when you've successfully completed them. Once you're done with the easier ones
try implementing grep and then diff.

------
bediger4000
He's right. I have versions of "cat" in Java, Perl, Python, and C, all source.
Reading from stdin and writing to stdout is the skeleton of a lot of tools, so
starting from "cat" source is often the best way to get going on a new tool.
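
As a sketch of that skeleton in Python: the loop stays the same and only the per-line function changes. `run_filter`, `identity` and `upcase` are illustrative names, not taken from any of the versions mentioned above.

```python
import io


def run_filter(transform, source, sink):
    """Read lines from source, apply transform, write the results to sink."""
    for line in source:
        sink.write(transform(line))


def identity(line):
    """With this transform the filter is plain cat."""
    return line


def upcase(line):
    """Swap in a different transform and the same loop is a new tool."""
    return line.upper()


# In a real tool, source and sink would be sys.stdin and sys.stdout.
demo_out = io.StringIO()
run_filter(upcase, io.StringIO("hello\nworld\n"), demo_out)
print(demo_out.getvalue(), end="")
```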

------
leoh
I think this advice only really makes sense if you understand and have an
appreciation for UNIX tools. Otherwise, I think it would be rather burdensome
to come to understand how seemingly abstruse UNIX tools function.

------
BrianPetro
Congrats for being 18 and making it to the top of hacker news. You have proven
that simple and concise advice can be greater in value than the most complex
systems (or technically advanced articles that frequent HN).

~~~
rodrigoavie
Thanks man.

------
draegtun
Also see this HN post from a few months ago about a Linux/Unix distribution
written in Perl - <http://news.ycombinator.com/item?id=4395076>

------
bane
tl;dr - this _does_ work to a point, but won't necessarily teach you idiomatic
and community practices that come with experience, but it is surprisingly
sticky

I had the great pleasure, years ago in my undergrad Operating Systems class,
for the class assignment to be "write an OS in Java"...which of course was
handed out to a group of students who had never seen Java. By the end of the
semester we had written the core guts of a multi-tasking OS, a couple shells
and the display systems to handle even displaying things like a unix-like
console, a sane piping system, all the major user land utilities (sans some of
the compiler things, but things like ls, cat, ps, etc.) a simple text editor
and intra-system messaging system, etc. etc. etc.

It was a great curriculum and really was the first time we, as CS students,
had the chance to really spend time understanding the subject matter without
spending time focusing on stupid language tricks like we had in our various C
and C++ classes. The code we wrote was fairly straightforward (we were learning
the language as we went, so kept to the KISS method) and focused instead on the
material. It was probably among the hardest and best classes I've ever had on
any subject.

Did I know Java at the end of it?

To a point -- I knew the pidgin dialect we wrote the OS in. A few semesters
later I took a fluff software engineering course and had to hack out some
various java server bits and had a roughshod time of it as I ran head first
into the now common overengineeringitis that plagues modern Java development.
I found the syntax and most of the standard library familiar, but the
idiomatic ways of writing the code, community practices, the _shibboleths_ ,
nearly impenetrable without years buried in an enterprise software house.

I swore off Java and never looked back...moving on to Perl and Python for a
spell (incidentally my standard "learn a new language" project is to write a
simple non-lexical phrase extractor, it touches I/O, data structures, database
connectivity, program flow, and if I get daring, multi-threading and a few
other odds and ends and usually gives me a pretty good idea how a language
works).

Now years later, taking a look at Android dev, I'm finding that writing code
for the platform, _even though it's Java_ , to be like writing code for our
old OS. It's pretty simple, there's great library support, and I don't have to
wrap simple method calls in hundreds of lines of framework boilerplate
nonsense. It's actually pretty fun.

But I've definitely been drawing heavily on that pidgin dialect of Java that I
learned way back when -- it's kinda like riding a bicycle, except a few bits
have changed here and there. So yeah, I think I did "learn" the language, and
it's been amazing how much of it I can recall since it's been a decade since I
did _any_ coding in it.

(this method also handily solves the "I need a project, a goal, to learn the
language, otherwise I'm just twiddling bits" problem).

~~~
godbolev
What is a non-lexical phrase extractor?

I googled it but it leads back to this page.

~~~
kgermino
<META>

> I googled it but it leads back to this page.

It blows my mind how often this happens to me, even though I understand how
and why. Especially since it's usually just a few minutes after the original
comment is written. Google is _awesome_.

</META>

------
djhworld
I always implement "cat" as a first toy application when learning a new
language - it's a great introduction tool as it involves dealing with files,
dealing with streams (stdin) and so on.

------
manish_gill
Heh. I've been doing something similar. In my free time, I've been actually
re-implementing the `tree` command (with most of the switches as well) in
Python.

I'll also be writing some other utilities! :)

~~~
rodrigoavie
Nice, man!

Share the code with me if you feel like it. My GitHub is
<https://github.com/rodrigoalvesvieira> and my email is rodrigovieira1994 [at]
gmail [dot] com

------
crasshopper
How many people learning python for the first time have a good knowledge of
unix? This is like saying you should become a stronger first time weightlifter
by juggling barbells.

------
bjoe_lewis
My first usable program in python was an implementation of 'wget' and
seriously, implementing such unix based commands does teach you an awesome
lot.

------
opminion
Knuth and Levy used a "literate" implementation of Unix's wc as an example in
the manual for the literate programming tool CWEB.

------
eisbaw
Blog post is flawed: Try to recreate sed.

------
pilsetnieks
It reminds me of the suggestion of learning foreign languages by reading Harry
Potter in that translation.

------
denzil_correa
Very interesting and unique idea. I will suggest this to everyone. Thanks for
sharing!

------
gdg92989
What an awesome Idea! I wish I had thought of this when I was teaching intro
to java!

------
devniel
Great advice Rodrigo.

------
nirvana
I disagree. I think writing code to solve whatever small problem you're having
right now is better. No point in recreating the wheel, which is boring, and
will result in something you'll never actually use because, well, cat already
exists on your system.

But if you use it to start something new, or to add to your repertoire of
utilities, then that is something you're more motivated to complete and a more
useful expenditure of time to boot.

