
Writing malloc wrong, for fun - jvns
http://jvns.ca/blog/2013/12/10/day-39-i-wrote-a-malloc/
======
oblio
This is going to be a programming language readability rant - ignore at will!
:)

Looking at the code I remember why I hate C. It's because of the lack of
memory management, obviously, but also because of the crappy naming
conventions. At a glance, disregarding your prior C programming experience,
what is "mut"? What is "u8"? What is "putc"?

Why does a language designed in this day and age try to cater to old fogeys
pining for C naming conventions if it wants to supersede C? We have consoles
with widths >80 characters, we have autocompletion and even Intellisense in
any decent programming editor known to man!

Compare and contrast:

putc with WriteCharacter or write_character or write_char

uint with UnsignedInteger or unsigned_integer or unsigned_int

Don't go overboard Java Enterprise edition-style, but stop saving on the
"cheap" stuff, sheesh!

I'm also spoiled by Python here, the epitome of a "pseudocode" programming
language. What's up with "~"? Couldn't they have found a 1-2-3-4 letter word
for the random tilde?

I do understand why they did it - to make it easier to draw the C crowds in.
But really, isn't there a lower level (no memory management/optional memory
management) language with decent syntax? Why is _everybody_ creating lower
level languages trying to copy C? C was created in the age of teletypes and it
became popular because it was portable assembly.

Its syntax was _not_ a forte.

~~~
tikhonj
Long lines of code _hurt_ readability even with modern displays--there's a
reason LaTeX defaults to something like ~60 characters a line. Shorter lines
are much easier on the eyes. Besides, I usually like to have a few files open
side-by-side, so I don't have _that_ much horizontal space even on a big
screen.

I'm not saying this in defence of cryptic shorthand but in defence of sane
line lengths and margins.

Also, unsigned_int is not a great name. It's too long and it doesn't make
sense--if it can't be negative, it doesn't make sense to think of it as an
integer by analogy! I much prefer something like Haskell's Word types: Word,
Word32, Word8 and so on. Short, easy on the eyes and clear.

Ultimately, really verbose names like UnsignedInteger and WriteCharacter just
add noise to your line and obscure the structure of the code. While it's nice
to be clear, chances are you will be seeing these words all over the place so
you want them to be short and get out of your way. C goes too far in that
direction with incredibly cryptic function names[1], but I think you go too
far in the opposite.

Similarly, I think completely eschewing symbols is a bad idea. Symbols give
your code structure that plain words lack. With a mix of symbols and names,
code is neatly delimited between the parts that specify "stuff" (variable and
function names) with the parts that specify "structure" or "operations"
(control flow, auxiliary things like variable assignments and pointers and so
on).

I think a good example here is bitwise operations. I think C's operators like
<<, >>, &, ^ and | make for significantly more readable code than their named
versions (shiftLeft, shiftRight, and, xor, or). Compare a simple snippet from
_Hacker's Delight_:

    
    
        (x & y) + ((x ^ y) >> 1)
        (x and y) plus ((x xor y) shiftRight 1)
    

Sure, the first version is cryptic to someone unfamiliar with bitwise
manipulations, but I think it's ultimately much clearer. If you're going to be
doing more than a handful of bitwise operations in the future, the notation is
well worth learning.
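
For anyone wondering what that snippet computes: it is the floor of the average of two unsigned integers, done without the overflow that `(x + y) / 2` can hit. A minimal sketch in C, assuming 32-bit operands (the function name `average_floor` is made up for illustration):

```c
#include <stdint.h>

/* Floor of (x + y) / 2 without overflowing 32 bits.
 * x & y keeps the bits both operands share, which count fully toward the mean;
 * x ^ y keeps the bits only one operand has, which count half (hence >> 1). */
uint32_t average_floor(uint32_t x, uint32_t y) {
    return (x & y) + ((x ^ y) >> 1);
}
```

It works because `x + y` equals `2 * (x & y) + (x ^ y)`, so halving each part separately never needs the extra carry bit.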

Code is read more often than it's written, but it's skimmed even more often
than it's read. With a healthy mix of symbolic notation and names, I can
easily get the gist of code at a glance and mentally skip around. With just
words in the Python style, I have to actually _read_ each line of code--a
serious drag on going through a codebase. I really don't want to have to
_read_ code unless I really have to!

Words may be clearer to the uninitiated--people without experience in the
relevant language or libraries. But for common tasks, I feel this is really
optimizing for the short term (learning) and _against_ the long term (working
with code). It's worth learning some special notation if it comes up often
because that notation can be more succinct, convey information more
efficiently and--the most important part, to me--can be easily read _at a
glance_.

I certainly don't think C has a good syntax, for several reasons including the
fact that it often goes too far in trying to be concise. But at the same time,
I hardly believe in the traditions of Java which embraces verbosity and
uniformity at the expense of clarity and expressiveness (just take a look at
BigInteger!) or Python and Ruby which try too hard to read like English and
end up forcing me to _read_ too much.

[1]: Functions like: ioctl, atoi, fputc, strxfrm. Life gets even worse when
you get into intrinsics! I wrote more about this in another HN post:
[https://news.ycombinator.com/item?id=6717209](https://news.ycombinator.com/item?id=6717209)

~~~
oblio
I agree on all your points and I'd like to add that I don't know of any
definitive research backing up an optimum line length. The only thing is that
you don't want to be near the extremes - say 40 characters per line or 120
characters per line. Everything in between is acceptable.

80-90-100 characters lines are perfectly readable and also easily
diffable/viewable side by side on almost any screen (I hope you're not doing
3-way merges on a phone :p).

And thankfully new lines don't cost anything so we can just break up long
lines like in this very good example:
[http://stackoverflow.com/a/903767](http://stackoverflow.com/a/903767), and we
can afford better identifiers.

C/C++ people - break the cycle! Make new, human readable, 2000+ style APIs for
C. Death to the fputcs of the world!

(IMO symbol versus word is a personal preference thing and has no actual
impact on code readability unless I need to re-read the language spec after 6
months of not using the language).

~~~
kibwen
Rust prefers short keywords (e.g. `fn` instead of `function`) specifically to
give people more room to write descriptive identifiers while keeping line
lengths sane. The style guide also suggests limiting lines of code to 100
characters.

------
hornetblack
My friend suggested that, since the 64-bit address space is so big, you could
just throw out random numbers and get a working 'wrong' malloc.

    
    
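        #include <stdint.h>
        #include <stdlib.h>
        void* malloc(size_t sz) {
            /* widen before shifting: rand() returns an int */
            return (void*)(((uintptr_t)rand() << 32) | (uintptr_t)rand());
        }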
    

Or in rustopia:

    
    
        extern "C" fn malloc(len: uint) -> *mut u8 {
             unsafe {
                 std::rand::random::<uint>() as *mut u8
             }
        }
    

Insert more code so that it actually maps to memory.

~~~
kabdib
In fact, with a sufficiently large address space, you can prove that a
"random" return value for malloc() is more reliable than a correct
implementation, because of the likelihood that a machine failure would return
a "bad" (collision) result when running the longer code path of the correct
implementation.

256 bit addresses, anyone? Well, maybe 1024... :-)

------
gus_massa
A few months ago, I saw a submission where the standard malloc was replaced by
a non-freeing malloc. It never deallocates, just leaks memory. It was faster,
and was used in a short-lived program (grep?) where the leaks are not
important. I can't find it now. Is this strategy useful here as an easy
step towards a real malloc?
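
To make the idea concrete, here is a rough sketch of such a non-freeing ("bump") allocator in C11. It is not the code from that submission; `bump_malloc`, `bump_free`, and the fixed 1 MiB `ARENA_SIZE` are names and numbers I made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size arena; a real allocator would ask the OS for pages. */
#define ARENA_SIZE (1u << 20)

static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static size_t arena_used = 0;

/* Hand out the next free chunk, rounded up so every result stays max-aligned.
 * Returns NULL when the arena is exhausted. */
void *bump_malloc(size_t size) {
    const size_t align = _Alignof(max_align_t);
    if (size > ARENA_SIZE) return NULL;  /* also keeps the rounding from overflowing */
    size_t rounded = (size + align - 1) & ~(align - 1);
    if (rounded > ARENA_SIZE - arena_used) return NULL;
    void *p = arena + arena_used;
    arena_used += rounded;
    return p;
}

/* Freeing is deliberately a no-op: memory comes back only when the process exits. */
void bump_free(void *p) {
    (void)p;
}
```

The whole allocation path is one bounds check and one addition, which is presumably where the speedup would come from.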

~~~
dbaupp
It may've been the D compiler, described in this article:
[http://www.drdobbs.com/cpp/increasing-compiler-speed-by-over...](http://www.drdobbs.com/cpp/increasing-compiler-speed-by-over-75/240158941)

------
angersock
The really trippy part of doing malloc (for me anyways) was using extra space
around the allocations to store the data the free lists needed to work--it
felt like hiding stuff in the margins of a manuscript.
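
Roughly what I mean, as a sketch rather than the code I actually wrote: each block gets a small header placed just before the pointer the caller sees. The names (`header`, `margin_alloc`, `margin_size`, `margin_free`) are invented for this example, and it borrows the system `malloc` as its backing store to stay short:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bookkeeping record stored immediately before each user block.
 * The union pads the header so the user pointer stays max-aligned. */
typedef union header {
    struct {
        size_t size;         /* bytes requested by the caller */
        union header *next;  /* free-list link, meaningful only once freed */
    } meta;
    max_align_t align;
} header;

static header *free_list = NULL;

/* Allocate size bytes plus room for the hidden header "in the margin". */
void *margin_alloc(size_t size) {
    if (size > SIZE_MAX - sizeof(header)) return NULL;
    header *h = malloc(sizeof(header) + size);  /* assumption: system malloc as backing store */
    if (h == NULL) return NULL;
    h->meta.size = size;
    h->meta.next = NULL;
    return h + 1;  /* user memory starts right after the header */
}

/* Step back one header from the user pointer to recover the metadata. */
size_t margin_size(void *p) {
    return ((header *)p - 1)->meta.size;
}

/* Push the block onto the free list; this sketch never reuses or releases it. */
void margin_free(void *p) {
    if (p == NULL) return;
    header *h = (header *)p - 1;
    h->meta.next = free_list;
    free_list = h;
}
```

The caller never sees the header, but `margin_free` can find it from the user pointer alone, which is the whole trick.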

As the author points out, malloc is very easy if you don't worry about
free'ing anything. :)

------
xerophtye
So what other mayhem did you try causing with your WRONG malloc?

------
gcb0
Rust seems needlessly verbose... I'd never tried to read any code in it before
this post... whose purpose I fail to see...

~~~
girvo
That verbosity allows you as a programmer to get the compiler to do exactly
what you want, safely. I'm not really a fan of the syntax, but I totally get
why it is the way it is; it's hard to do that while keeping the ALGOL/C styled
syntax. Nimrod is another way of tackling it, with a Pascal/Python influenced
way of attacking the problem. It can also be a bit verbose, but trades that
off for not having some features that Rust does.

Tl;dr - it's worth it, IMO :)

~~~
derefr
Are there any homoiconic "get the compiler to do exactly what you want,
safely" languages?

~~~
chrismonsanto
[https://github.com/eudoxia0/Hylas-Lisp](https://github.com/eudoxia0/Hylas-Lisp)
is one example, targeting LLVM.

