
Hacking ls -l - drp4929
http://www.lemis.com/grog/diary-oct2012.php?topics=c#D-20121012-233520
======
memset
Guys! The point of this article is not to prescribe the only method of
displaying human-readable file sizes. Obviously one could use `ls -lh`; the
author clearly demonstrates that he is willing and able to read man pages to
find answers.

Rather, this is a _pretty interesting_ look into what it actually entails to
make what ought to be a very simple and straightforward change.

It turns out that these simple changes are hard! Not just because you have to
identify the piece of code to modify, but because man pages are often incomplete or unclear.
It also illustrates the complexities behind making software portable - in this
case, using the nation-neutral place separator. It also reminds us that
solving what is on the surface a simple problem lets one uncover all sorts of
interesting and messy details underneath - including more problems to solve!

These are steps that he'd have to take no matter what the code or feature.
This article is not "complexity for complexity's sake", it's illustrating the
complexity of making changes to any piece of code - and that it is
surprisingly difficult for something that one would think is very easy!

~~~
cheald
On the other hand, you could consider it a cautionary tale about not
reinventing wheels, because a problem that may seem trivial at first often
turns out to be far more complex than expected.

This is why you try to re-use work when possible, rather than endlessly
reinventing things, because while sure, adding a comma to the printf string is
easy enough, your assumptions (English locale, compiler not trying to be
clever) are going to quickly become visible as things fall apart because your
assumptions aren't in line with the system's assumptions.

What this story really demonstrates is that without a clear understanding of
how a system is designed and the basic assumptions it makes, just "hacking on
the code" is just as likely to break things as it is to fix them.

~~~
geofft
You assume that systems tend to be well-designed and that the basic assumptions
they make are justifiable, and that the systems faithfully implement that design
and those assumptions.

It sounds like the author found a bunch of bugs in the process of making a
simple code change. That happens pretty frequently, and doesn't mean that the
author should let the priests of the cathedral deal with this UNIX thing that
is too complicated for the laity to hack on. It just means there is no
priesthood.

------
pixelbeat
I enable this for GNU ls like:

    
    
        alias ls="BLOCK_SIZE=\'1 ls --color=auto"
    

The above is a bit hacky and not very UNIXy as it's lumping more logic into
ls, rather than splitting out into functional units.

Number formatting being a very common requirement, I've proposed a design for
a new numfmt GNU coreutil

[http://lists.gnu.org/archive/html/coreutils/2012-02/msg00085...](http://lists.gnu.org/archive/html/coreutils/2012-02/msg00085.html)

which would be used like:

    
    
        ls -l | numfmt --field=5 --format="%'d"

~~~
Aardwolf
I really don't understand why block size is 512 by default. It should really
be 1 by default.

Except for someone with an ancient hard disk who thinks in blocks instead of
(mega, giga, etc...)bytes, who ever needs or wants that?

~~~
alayne
512 byte blocks for ls is specified by POSIX. That would probably be a painful
change now.

------
osteele
Mr. Lehey managed to improve the system in such a way that it will make subsequent
changes easier for him and others, regardless of whether the specific
change to `ls` is ever adopted. It's “five whys” applied to “why is this
hard” and “how can I make it easier”. It's more effort, with a greater chance
that much of it will survive the current context and requirements.

Some people improve the area they travel through, others leave debris, and
many are noops who make no difference to those who come after. If there aren't
enough entropy fighters like Mr. Lehey working a system, it turns to kipple.

------
secure
In case anyone wonders, I recently looked into how locales work with respect
to LANG, LC_ALL and LC_*: <http://c.i3wm.org/6799926>
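
As a rough sketch of the usual POSIX lookup order (not taken from the linked write-up): a non-empty LC_ALL wins, then the category's own variable such as LC_NUMERIC, then LANG. The helper name `resolve_locale` and the final fallback to `C` are assumptions for illustration; POSIX leaves the real default implementation-defined.

```shell
# resolve_locale LC_ALL_VALUE CATEGORY_VALUE LANG_VALUE
# Hypothetical helper: prints the locale name that would apply to one category.
resolve_locale() {
    lc_all=$1
    lc_category=$2
    lang=$3
    if [ -n "$lc_all" ]; then
        printf '%s\n' "$lc_all"        # LC_ALL overrides everything else
    elif [ -n "$lc_category" ]; then
        printf '%s\n' "$lc_category"   # then the specific LC_* variable
    elif [ -n "$lang" ]; then
        printf '%s\n' "$lang"          # then LANG
    else
        printf 'C\n'                   # assumed fallback for this sketch
    fi
}

# Which locale would govern number formatting in the current environment?
resolve_locale "${LC_ALL-}" "${LC_NUMERIC-}" "${LANG-}"
```

This is why a one-off `LC_ALL=...` prefix on a command overrides whatever LANG and LC_NUMERIC say.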

By the way, by looking at <http://www.lemis.com/grog/index.php> you can see
that the author uses FreeBSD, just in case you were wondering about /usr/src

------
jrockway
Most annoying is that gcc warns about perfectly valid and logical code. That
causes people to ignore warnings, and before you know it, you have a piece of
software that has more warnings than lines of code.

Alternatively, when you cleverly figure out how to work around the warning,
like the author does, you now prevent that rule from triggering even when it's
right. Clearly a better unit test is needed.

~~~
TwoBit
The printf ' format specifier is not Standard C. It's in neither the C99
Standard nor the new C11 Standard. So it's not actually valid, and it's a
coincidence if it happens to work with your Standard Library.

Consider that the compiler generating that warning knows only Standard C, and
in fact you could be pairing it with any C library, including those that are
strictly conforming and don't support the ' extension.

~~~
jrockway
You're arguing that the GNU C Library is incompatible with the GNU Compiler
Collection?

(In the example, he's compiling ls with the -std=gnu99 flag, which means he's
targeting GNU, not C99 or C90 or C11 or POSIX or any other standard.)

~~~
vinkelhake
The printf(3) manpage now says: "Note that many versions of gcc(1) cannot
parse this option and will issue a warning."

FWIW, GCC 4.7.2 parses the format string without warnings.
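
For anyone who wants to poke at the `'` flag without recompiling anything, here is a minimal sketch using the shell's own printf. It assumes a shell such as bash whose printf accepts the flag (POSIX specifies it as an extension to ISO C); `fmt_grouped` is a hypothetical helper name, and it expects an integer argument.

```shell
# Print an integer with the locale's grouping separator when the shell's
# printf accepts the ' flag, and ungrouped otherwise.
fmt_grouped() {
    # Some printf implementations reject the ' flag outright;
    # in that case fall back to a plain %d.
    printf "%'d\n" "$1" 2>/dev/null || printf '%d\n' "$1"
}

LC_ALL=C fmt_grouped 5145416   # prints 5145416: the C locale has no separator
fmt_grouped 5145416            # grouped only if the current locale defines one
```

The second call is the interesting one: its output depends entirely on the environment, which is the fragility being argued about here.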

------
joeyh
Incidentally, I completed ls's set of -a-z options recently.

<http://joeyh.name/~joey/blog/entry/ls:_the_missing_options/>

(Well, actually, I never got around to writing -z, but it's clear what it
should do, and any ls hackers are encouraged to finish that up.)

~~~
haldean
Your link is broken, FYI

~~~
alexpeattie
I think it should be: <http://joeyh.name/blog/entry/ls:_the_missing_options/>

------
cheald
While I appreciate the story, what's wrong with `ls -lh`?

~~~
paddyoloughlin
For me, -h makes it more difficult to quickly compare the sizes of files in
one list by glance. This is something that I have to do often enough that it
has prevented me from adding -h to my ls alias. I'd have to use it for a bit
to be sure, but the post's suggestion seems a pretty good 'best of both
worlds' solution to me.

~~~
dredmorbius
This.

I run into the situation more often with 'du' (when trying to find which
subdirectory tree has excess junk in it), to the point that, while 'du -h' is
human readable, it's not particularly sortable so:

    
    
        du -hs $( du -s * | sort -k1nr,1 -k2 | head | cut -f2 )
    

.... which will return the human-readable output, based on numerically sorting
the full numeric output. Eyeball comparisons are easier as you're aware that
results are already sorted by size.

~~~
pwg
A reasonably recent sort from GNU coreutils has a -h (and equivalent long
option --human-numeric-sort) which properly sorts the output of du -h, meaning
you can do:

du -h | sort -h

And get properly size-sorted output.
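
A small sketch of the difference, assuming a sort that supports -h (GNU coreutils does). The sizes are made-up sample values in the style of `du -h`, not real output.

```shell
# Made-up sizes in the style of `du -h` output.
sizes='2.0K
1.5M
900'

# Plain numeric sort looks only at the leading number and ignores the suffix,
# so 1.5M sorts before 2.0K.
printf '%s\n' "$sizes" | LC_ALL=C sort -n

# Human-numeric sort takes the K/M/G suffix into account.
printf '%s\n' "$sizes" | LC_ALL=C sort -h
```

With -n the order comes out as 1.5M, 2.0K, 900; with -h it is 900, 2.0K, 1.5M.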

~~~
dredmorbius
TIL! Thanks.

------
al1x
gobble.wa@gmail.com made a similar post to the freebsd-questions mailing list
a month ago. In his case the question was how to print an md5sum along with
the file names in a given directory. I saved it because I thought it was a
clever hack.

[http://lists.freebsd.org/pipermail/freebsd-questions/2012-Se...](http://lists.freebsd.org/pipermail/freebsd-questions/2012-September/244933.html)

A lot of times I catch myself in the mindset of taking a step back and saying
"here are the set of tools I have at hand to accomplish a task" without
realizing that I should simultaneously be taking a step "in"--so to speak--and
acknowledging that the tools I have to work with are not immutable tools cast
of iron; they are malleable and can be re-tooled to suit my purposes.. and
that sometimes going that route can be the simplest--and in fact "best"--
solution.

------
rcthompson
Here's a wrapper I wrote for ls a while ago that allows you to spell "--color"
as "--colour":

<http://ubuntuforums.org/showthread.php?t=684239>

------
zapman449
Or, you could use 'ls -h'...

(that said, I do see the utility, since it gives a more obvious visual cue
as to the order of size differences... but if you're doing anything with the
sizes programmatically, you have to remove the commas afterwards... Short
version: if you're going to do this, make it a unique flag, or a new flag
modifier to the -l flag... don't overload the -l flag without recourse...)

~~~
Evbn
Ideally a parser should respect locale, and use a sane format (not commas) for
multiple numbers in a list.
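
A minimal sketch of that idea for a script that consumes grouped sizes: ask the locale for its separator instead of hard-coding a comma. `strip_grouping` is a hypothetical helper; `locale thousands_sep` is the standard locale(1) keyword query, which prints an empty line in the C locale.

```shell
# strip_grouping VALUE [SEP]
# Removes every occurrence of the grouping separator from VALUE.
# SEP defaults to whatever the current locale reports.
strip_grouping() {
    value=$1
    sep=${2-$(locale thousands_sep)}
    if [ -z "$sep" ]; then
        # No separator defined (e.g. the C locale): nothing to strip.
        printf '%s\n' "$value"
        return
    fi
    out=
    # Peel off one separator per iteration until none is left.
    while [ "$value" != "${value#*"$sep"}" ]; do
        out=$out${value%%"$sep"*}
        value=${value#*"$sep"}
    done
    printf '%s\n' "$out$value"
}

strip_grouping 5,145,416 ,   # prints 5145416
```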

Even better if the _ separator used by programming languages were a supported
locale LC=C_FOR_HUMANS :-)

------
meyering
FYI, there is no need to change GNU ls to get that behavior. You can make it
use your locale's separator with either the --block-size="'1" option or by
setting the LS_BLOCK_SIZE envvar to that same string:

    
    
        $ LC_ALL=en_US.UTF8 ls -og --block-size="'1" .
        -rw-------. 1 5,145,416 Oct  5 16:44 A
        -rw-------. 1 5,137,692 Oct  4 14:37 B
        -rw-------. 1 5,147,168 Oct  8 07:52 C
    

This feature is documented in the "Block size" section of the coreutils
manual: i.e., you can type this to see it:

    
    
        info coreutils 'block size'

------
dsr_
Now let's consider software lifecycle in a large context: longevity of forks.

If he doesn't send the changes off to upstream, and make a case good enough
for them to be approved, then all this dooms him to maintaining his fork on
all the platforms where he wants it until he gets sick of it or convinces
someone else to do it for him.

~~~
cjg_
FYI, Greg Lehey is a longtime FreeBSD committer.

------
Aardwolf
Man, such fragile stuff. Why not code a function yourself that turns a number
into a string representing it decimally with the commas every three digits. I
normally like and use good library functions and standards, but if they're
that fragile and depend on your environment then no thanks.
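
For what it's worth, the hand-rolled version is only a few lines. This is a sketch in plain POSIX shell rather than the C the article patches; `group_digits` is a made-up name, and it assumes a non-negative integer given as a string of digits.

```shell
# group_digits DIGITS [SEP]
# Inserts SEP (default: comma) every three digits, counting from the right.
group_digits() {
    digits=$1
    sep=${2:-,}
    result=
    while [ "${#digits}" -gt 3 ]; do
        rest=${digits%???}                        # all but the last three digits
        result="$sep${digits#"$rest"}$result"     # prepend separator + last three
        digits=$rest
    done
    printf '%s\n' "$digits$result"
}

group_digits 5145416        # prints 5,145,416
group_digits 5145416 ' '    # prints 5 145 416
```

The separator is a parameter here precisely because the hard-coded comma is the part that does not travel well.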

~~~
michael_h
not everyone uses commas to separate their number groupings. his solution will
work for any locale.

~~~
Aardwolf
Yes, in my country we use spaces to separate thousands and "," is used for
what the decimal point does in the US. In my handwriting I use that notation.

However, from a computer, I (and I'm certain I'm not the only one) actually
expect it to output the US notation.

I'd much rather have a computer always output the same format (and that
happens to be the US format), than try to be smart with locales, when the end
result is that some things will do this, others that. Makes stuff harder to
use, and when programming, harder to parse.

I've once had to touch Excel on a Windows machine configured for a non-US
language, and it refused to import a CSV file that had commas, even though CSV
means comma separated values. It required semicolons due to the locale
settings of Windows. This stuff should not happen. A CSV is meant for
computers, and to be interchangeable, not to use different types of commas and
refuse to work with other types depending on user locale settings...

Of course, when publishing or printing, that's a whole different matter, and
there it better get the locale of your country perfect. But this here was
about output in the console, which is often meant as input of other scripts
etc...

------
jtgeibel
For such large numbers, would it make more sense to use groups of 6 instead of
3? This would allow you to easily identify the megabyte position with the next
separator at the terabyte position.
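
To see what that would look like, here is a sketch with the group width as a parameter. `group_by` is a hypothetical helper, it assumes a non-negative integer, and the sample size is made up.

```shell
# group_by DIGITS WIDTH [SEP]
# Inserts SEP (default: comma) every WIDTH digits, counting from the right.
group_by() {
    digits=$1
    width=$2
    sep=${3:-,}
    [ "$width" -ge 1 ] || return 1
    # Build a glob of WIDTH question marks, e.g. "??????" for a width of 6.
    pat=
    i=0
    while [ "$i" -lt "$width" ]; do
        pat="$pat?"
        i=$((i + 1))
    done
    out=
    while [ "${#digits}" -gt "$width" ]; do
        head=${digits%$pat}                  # all but the last WIDTH digits
        out="$sep${digits#"$head"}$out"      # prepend separator + last group
        digits=$head
    done
    printf '%s\n' "$digits$out"
}

group_by 1234567890123 6    # prints 1,234567,890123
group_by 1234567890123 3    # prints 1,234,567,890,123
```

In the six-wide form the first separator from the right marks 10^6 and the second marks 10^12.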

------
njharman
Man, this sounds like every change I try to make to "legacy" code. There's so
much debt and smell. I find it very, very hard to leave alone.

------
thaumasiotes
Surely the appropriate option character for this new, human-readable output is
"-h".

Makes you wonder whether anyone ever considered the problem before...

~~~
chris_wot
Human readable form in bytes? I thought if you had a file greater than a
megabyte then it shows in MB? What if you want to read it in bytes with the
thousands separator?

------
codegeek
i always use one hack for ls. alias lsd="ls -ltrF | grep ^d"

This way, I quickly run lsd to only look for directories.

~~~
andreasvc
That's funny, I also have an alias for lsd which does this. But you can do it
without grep: lsd='ls -d */'

~~~
bgaluszka
Yours doesn't show dot directories, and the parent's does.
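
One way to get both, sketched as a shell function instead of an alias. This `lsd` is an illustration and not either poster's version: it prints names only, skips `.` and `..`, and, like `*/`, also picks up symlinks that point at directories.

```shell
# List the directories in the current directory, including hidden ones.
lsd() {
    # */ matches visible directories; .[!.]*/ matches dot directories
    # while leaving out . and .. themselves.
    for d in */ .[!.]*/; do
        # An unmatched glob stays literal, so check before printing.
        if [ -d "$d" ]; then
            printf '%s\n' "${d%/}"
        fi
    done
}

lsd
```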

------
lucian303
-h

~~~
chris_wot
Not for files greater than a MB.

------
3amOpsGuy
Mountain, meet molehill.

------
guylhem
Is this front-page-worthy material? I mean, it's good you took the time to add
a ', but doing the same with sed would have been faster.

Alternatively, do you know about ls -lhrS? It will print size in human formats
and reverse sort the files by size - ie the bigger will be at the end of the
list

~~~
memset
Yes, it is front-page worthy, because the details give us insight into how one
might make any sort of change to a codebase like this. "If you want to be a
hacker, _these are specific steps I took to hack something up_." In a domain
that most of us are not very familiar with!

~~~
guylhem
Editing ls/print.c sourcecode and recompiling is not hacking in my definition.

I usually consider the portability of the solution. I have linux i386 and x64
machines, my arm n900, an osx laptop, etc.

Recompiling ls (or, heavens forbid, cross compiling!) for each machine may not be
a bright idea.

Adding a line to your profile that will take advantage of the existing tools
like sed is closer to hacking in my definition, because it tries to think
about the bigger problem - but still I wouldn't dare calling the following
"hacking":

        echo "alias lll=\"ls -l | sed -e :a -e 's/\\(.*[0-9]\\)\\([0-9]\\{3\\}\\)/\1,\2/;ta'\"" >> ~/.bashrc

~~~
memset
Sure, this is fair. I tend to use "hacking" to mean "tinkering", and it could
be said that I have a fairly loose usage of the word.

To me, this article gets to the crux of what I find particularly delightful
about hacking (tinkering?): unraveling layers of complexity underneath. I feel
like I have a little better understanding of what's happening when I punch in
`ls`, and I think that particular delight and knowledge is the kind of thing
that appeals to tinkerers (hackers? :p) like myself. So - in my view -
entirely appropriate for this crowd!

But you have a fair point; there's nothing particularly out-of-the-ordinary about
this code or process, and in that sense, it isn't newsworthy to hackers.

(I did not downvote you incidentally; I think it's interesting to get a sense
of peoples' different thresholds for what constitutes "hacker." I play the
saxophone, and an instructor once told me that people always came up to him
and said "I want to be a musician. How can I do that?" Well it turns out that
the moment you play "hot cross buns" on your instrument, you are indeed a
musician. Perhaps not a skilled one, but you have in fact made music. I think
of hacking in a similar way, and freely admit that it is a loose use of the
word!)

~~~
guylhem
I can totally agree with calling that tinkering - and also that it is
interesting, if only (due to my diverging opinion on the merits of the
approach) as a warning tale about how far one should go to try and fix a
problem.

But, just like you, I consider that not newsworthy to hackers, yet at the
moment it is the #1 item on HN and it kinda makes me sad especially because of
the threshold - the idea that some people _do_ consider that hacking - here of
all places - is chilling :-/

Worse - #2 item is "more people should write". I beg to differ- more people
should _code_ , so that fixing a printf and recompiling ls wouldn't be
newsworthy.

I'm sorry if it was interpreted as being rude- it was not the point - I just
wanted to present alternative approaches to the problem, because reconsidering
the problem is sometimes the right thing to do, especially when it escalates
quickly in complexity.

~~~
omaranto
I think you're complaining too much about this momentarily being the top story
on HN: (it feels to me as if) usually the top story is gossip about how much
money some startup raised or an Apple vs Android piece. I find this to be an
improvement.

