The article is good, but I disagree with this part:
"To a large extent, the answer is: C is that way because reality is that way. C is a low-level language, which means that the way things are done in C is very similar to the way they're done by the computer itself. If you were writing machine code, you'd find that most of the discussion above was just as true as it is in C: strings really are very difficult to handle efficiently (and high-level languages only hide that difficulty, they don't remove it), pointer dereferences are always prone to that kind of problem if you don't either code defensively or avoid making any mistakes, and so on."
Not really, and not quite. A lot of the complexity of C when it comes to handling strings and pointers is the result of not having garbage collection. But it does have malloc()/free(), and that's not really any more fundamental or closer to the machine than a garbage collector. A simple garbage collector isn't really any more complicated than a simple manual heap implementation.
And C's computational model is a vast simplification of "reality." "Reality" is a machine that can do 3-4 instructions and 1-2 loads per clock cycle, with a hierarchical memory structure that has several levels with different sizes and performance characteristics, that can handle requests out of order and uses elaborate protocols for cache coherence on multiprocessor machines. C presents a simple "big array of bytes" memory model that totally abstracts all that complexity. And machines go to great lengths to maintain that fiction.
When C was invented, it was very close to reality. It isn't anymore, as you point out. (But as another commenter said, assembly language isn't that close to reality either)
Unfortunately hardware guys and software guys didn't really coordinate, and we just hacked shit up on either side of the instruction set interface.
It is kind of ironic that we write stuff mostly in "serial" languages. But the compiler turns it into a parallelizable data flow graph. Then that's compiled back to a serial ISA. And then the CPU goes and tries to execute it in parallel.
It would be a lot nicer if we wrote stuff in parallel/dataflow languages, and the CPU could understand that! Some dilettantism with FPGAs made me realize how mismatched CPUs are for a lot of modern problems.
It's kind of like the idea that Java throws away its type information when compiling to byte code, and then the JIT reconstructs the types at runtime. We have these encrusted representations that cause so much complexity in the stack. C is (relatively) great, but it's also unfortunately one of these things.
People have serious problems learning how to write dataflow languages. Itanium exposed its complexity, and was a commercial failure. Similar complaints have been made about the Playstations that made heavy use of explicit parallelism.
It's interesting that you mention Java; some ARM systems have 'Jazelle', a system for directly executing Java bytecode. I don't know how widely used it is.
Good point about Itanium, but that doesn't mean people have problems writing dataflow languages. It's more about the compilers, because very little of the code we run is written in assembly language. Any new architecture will have to gain adoption by having a C compiler.
And there were academic dataflow computers as well.
I think D 2.0 makes everything immutable by default (single assignment), so it is close to a dataflow language. And that's no surprise, because Walter Bright designed it specifically to make writing a compiler as natural as possible, rather than having to make complicated inferences about serial code.
My point was that we can never make the jump because the stack of hacks we currently have works "good enough" and it is backward compatible. Generally technologies evolve by accretion, and I'm hard pressed to find an example of radical simplification.
I guess power concerns would be the only hope for something simpler.
It's kind of funny that Backus wondered, "Can programming be liberated from the von Neumann style?" That question applies to hardware too.
That's precisely how the JVM behaves, which aggravates a lot of people when writing code using generics. In C# I can do the following:
public T getInstance<T>() {
return this.Instances[typeof(T)];
}
Or, even better:
public T createInstance<T>() where T : new() {
return new T();
}
Meanwhile Java requires this:
public <T> T getInstance(Class<T> cls) {
return cls.cast(this.instances.get(cls));
}
The type information above (the generic parameter T) is stripped after compilation, so the Class parameter is needed on the method to get a solid reference to the type we need to work with at run-time. This gets worse when we want to do the second example.
public <T> T createInstance(Class<T> cls) throws InstantiationException, IllegalAccessException {
return cls.newInstance();
}
It gets even more fun when you throw in non-default constructors. Non-reified generics are a giant pain in the butt.
There are cases when reified generics are a giant pain in the butt and non-reified ones are the right solution. See Scala or F# and .NET interoperability.
Once you're in byte code, you don't have your static types anymore -- that is, types that are independent of control flow. You have types on individual instructions.
So basically you lost some type information before the JIT even sees it. Now, it doesn't actually matter for speed, as Mike Pall (LuaJIT author) would say. His point is that dynamically typed languages can be just as fast as statically typed languages, because you have more information at runtime.
I guess he meant JavaScript. JavaScript is dynamic and JS JITs try to reconstruct types at runtime to improve performance. Java is statically typed and, with the minor exception of generic type arguments, types are accessible both at compile time and runtime.
C doesn't really have malloc or free: those are part of the standard library. You can happily code in C without malloc/free, or you can add a library that provides a garbage collected malloc. What C's type system provides really is bare metal (although, as you say, bare to the abstraction provided by the machine, not to physical reality), to a much more fundamental extent than even a heap allocator, and certainly claiming you could just swap in garbage collection and then have a string type is totally missing the point.
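For what it's worth, a minimal sketch of the "garbage collected malloc" option, assuming the Boehm-Demers-Weiser collector (libgc) is installed and the program is linked with -lgc:

    /* Allocation through the Boehm collector: GC_MALLOC'd memory is
     * reclaimed automatically, so there is no corresponding free(). */
    #include <gc.h>
    #include <stdio.h>

    int main(void)
    {
        GC_INIT();
        for (int i = 0; i < 1000000; i++) {
            char *p = GC_MALLOC(64);   /* never freed explicitly */
            if (p)
                p[0] = 'x';            /* touch it so it isn't optimized away */
        }
        puts("done");
        return 0;
    }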
> C presents a simple "big array of bytes" memory model that totally abstracts all that complexity.
I don't understand what you mean by this. Machine code itself abstracts away the underlying hierarchical memory structure. Sure, some machine language might have instructions to manipulate the cache, but those are easily invoked from C, using either inline assembly or __builtin functions.
I stumbled across this paper describing a language that aims to do that. For some reason I was not that excited by the matrix multiplication use case, mainly because that is not the kind of application I'm interested in. I'd like to see examples of stuff a lot of cycles are burnt on in "web" data centers.
But I would like to see more work along these lines -- pointers appreciated.
Agreed; a language like Forth is much closer to the machine, tho' if you really want "the machine" then assembly is the way to go, and writing with a good macro assembler is surprisingly high level. I still pine for the days of Devpac.
> "To a large extent, the answer is: C is that way because reality is that way.
I thought C was so widespread that it eventually started to affect how some computer architectures were designed? If so it seems a bit disingenuous to say that it is only dealing with the reality that it was given.
The original machine that influenced C's model of computers was the PDP-11 (http://en.wikipedia.org/wiki/PDP-11). It had a mov instruction instead of load/store. It had no dedicated IO instructions. It could be treated as a sort of generic random access machine (http://en.wikipedia.org/wiki/Random-access_machine) and that is what C did and still does. So there was a reality that C simply modeled, and it was copied (with all sorts of modifications) many times.
Shhhhhh! If you let them know how fun it is then everyone will want to be C programmers :-)
I got to use my crufty C knowledge to useful effect when I discovered that there is no standard system reset on Cortex M chips. That led me to trying to call "reset_handler" (basically the function that kicks off things at startup), which I couldn't do inside an ISR because, lo and behold, there is "magic" in ISRs: they run in "Handler" mode versus "Thread" mode, and jumping to thread-mode code is just wrong, apparently. C hackery to the rescue: hey, the stack frame is standard, so make a pointer to the first variable in the function, walk backwards up the stack to the return address, change it to be the function that should run next, and return. Voila, system reset.
The whole time I am going "Really? I have to look under your covers just to make you do something anyone might want to do?" As a respondent to one of my questions put it "ARM is a mixture of clever ideas intermixed with a healthy dose of WTF?"
It's technically dependent on external hardware in your processor subsystem, but it should work if your implementer has half a brain (or, at least cares enough to read the integration manual). If it doesn't work, please flog your implementer publicly so that I can know to avoid them in the future...
Incidentally, even that tiny code snippet uses a C extension (the __dsb intrinsic), which is either a great example of how C can be wielded to great power (I can generate raw instructions!) or how C is terribly handicapped (I need a special compiler extension or all my system code is horribly broken!). All depends on point of view, I guess...
I too had already been through that particular part of the ARMv7-M manual, after coming up through the data sheet, to the technical reference guide, to the Cortex M architecture manual, and yes, to the base ARMv7-M architecture specification.
There is "should" and "does" :-) I had followed the exact same sequence and discovered on my system it didn't cause a system reset. I escalated it a bit (in this case ST Micro) and the caveats came out, "Well if you system is correctly designed, if the actual reset pin isn't connected to something that is interfering, if the core is in a state where it can actually take a reset, ..." This being unlike pretty much every processor I've worked on, from PDP-8's, 11's, 10's, VAXen, IBM 360/370, Sun 1, 2, 3, 4, Motorola 6800, 68K, 68K, Z80, 8080, 806, Pentium *, the list goes on. Most actually have a reset instruction, usually privileged, to force a system reset. But the difference is that the reset sequence was guaranteed to force a system reset if it executed, all of those previous processors had a company that made the processor as opposed to simply licensed the processor to a third party. Since it is completely reproducible on my system I've got an action item to create a small test case for errata analysis.
Yes, that is the code they suggest, with assurances that it will work in pretty much any case. Except when it might not. It was a bit stunning for me, I still marvel at the notion. Like an add statement that will 99.999% of the time add its operands together[1] unless it doesn't.
[1] No there isn't such a thing in ARM it is just the way I hear "It will almost always reset the system."
Fair enough. The Cortex-M's that I've dealt with have generally been homed inside some larger chip, and so the "reset pin" notion was a bit more vague, and in such an environment ARM's stance makes a bit more sense as you really want the reset to reset the "subsystem" (including whatever other random hardware you glued to yourself today).
It's also true that ARM has heard some of these complaints, which is why there were steps toward standardizing some things--like, the SYSTICK interface--in v7-M. It's really been a step up since the ARM7TDMI, and I hope that it will continue with v8-M or whatever the next revision ends up being.
Personally, when I'm dealing with a discrete chip and I need to reset it I've found that the most reliable methods that don't set off the hack-o-meter too badly are to wait for or directly invoke watchdog hardware... but yes, that's essentially always device specific.
That is a very good point! If you are building some SoC and the CPU is just some small part of it, "system reset" can in fact be a very vague notion. I hadn't really been thinking that way and in that context the ARM position does make sense. As another person pointed out offline there is the watchdog timer, if you wanted you could set it and halt. Then it would kick you with a reset.
For anyone else curious on the __dsb() call, from ARM's docs:
The Data Synchronization Barrier (DSB) acts as a special kind of memory barrier. The DSB operation will complete when all explicit memory accesses before this instruction have completed. No instructions after the DSB will be executed until the DSB instruction has completed, that is, when all of the pending accesses have completed.
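For reference, here is roughly what the conventional reset-request sequence looks like in C (a sketch along the lines of CMSIS's NVIC_SystemReset, with register addresses from the ARMv7-M architecture manual; as discussed above, whether the reset actually happens still depends on the surrounding system):

    #include <stdint.h>

    #define SCB_AIRCR (*(volatile uint32_t *)0xE000ED0CUL)

    void request_system_reset(void)
    {
        __asm volatile ("dsb" ::: "memory");   /* finish pending memory accesses */
        SCB_AIRCR = (0x05FAUL << 16)           /* VECTKEY: writes without it are ignored */
                  | (SCB_AIRCR & (0x7UL << 8)) /* preserve the PRIGROUP field */
                  | (1UL << 2);                /* SYSRESETREQ */
        __asm volatile ("dsb" ::: "memory");
        for (;;) { }                           /* wait for the reset to take effect */
    }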
Exactly! Fortunately, the article didn't point out that one of C's great advantages is that it's so lightweight it can fit into and run on all kinds of interesting systems where other languages are infeasible economically. Cortices rule!
> Except there are other safer languages to choose from, like Pascal and Basic compilers.
I'm curious, why do you say "safer"? These are languages for microcontroller programming. The things you do there are bound to be "unsafe", like peeking and poking memory for memory mapped i/o and disabling/enabling interrupts.
Unless, of course, all the possible things (i/o, timers, interrupts, etc) are wrapped in some kind of "safe" api so you essentially don't have access to low level facilities any more. The Arduino programming environment is kinda like this but you can still cause bad things to happen and if all else fails, hang the device with an infinite loop.
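To make "peeking and poking" concrete, this is the sort of inherently unsafe thing microcontroller code does all the time (the register name and address below are made up for illustration):

    #include <stdint.h>

    /* Hypothetical memory-mapped UART data register. */
    #define UART0_DR (*(volatile uint32_t *)0x4000C000UL)

    void uart_send_byte(uint8_t b)
    {
        UART0_DR = b;   /* no type system or bounds check can help here */
    }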
> I'm curious, why do you say "safer"? These are languages for microcontroller programming. The things you do there are bound to be "unsafe", like peeking and poking memory for memory mapped i/o and disabling/enabling interrupts.
There are things which are inherently safe by design in languages like Pascal -- e.g. in a naively-written program you can't write past the boundary of the UART RX buffer and thrash some other array in your program -- but your observation is fair.
> There are things which are inherently safe by design in languages like Pascal -- e.g. in a naively-written program you can't write past the boundary of the UART RX buffer and thrash some other array in your program -- but your observation is fair.
Which, of course, you can accomplish in C by wrapping your UART buffer handling code in functions that do bounds checking. And I assume that a micro controller Pascal or Basic dialect will have some kind of peek/poke from/to arbitrary memory addresses that can be misused just as a pointer access in C.
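Something like this trivial sketch, say (the buffer name and size are invented):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define UART_RX_BUF_SIZE 64

    static uint8_t uart_rx_buf[UART_RX_BUF_SIZE];
    static size_t  uart_rx_len;

    /* Refuses to write past the end of the buffer instead of overrunning it. */
    bool uart_rx_push(uint8_t byte)
    {
        if (uart_rx_len >= UART_RX_BUF_SIZE)
            return false;                  /* full: drop the byte */
        uart_rx_buf[uart_rx_len++] = byte;
        return true;
    }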
Safety is hard to quantify and measure and calling one language safer than another sounds more like an opinion than a factual claim, especially in this context.
The "out of the box" defaults of Pascals are something you have to explicitly develop, test and maintain in your C. Which means that even during the maintenance cycle, long after it's originally implemented, it can still break in your C.
> The "out of the box" defaults of Pascals are something you have to explicitly develop, test and maintain in your C. Which means that even during the maintenance cycle, long after it's originally implemented, it can still break in your C.
Let's not confuse languages and libraries/apis here (that may or may not be shipped with the compiler). There are libraries for C and related languages (e.g. Arduino) that actually do give you a "safe" way to deal with the hardware on microcontrollers.
It's still easy to shoot yourself in the foot in C with a bad pointer access (esp. because there are no helpers to work with strings) but I don't really see how a Pascal dialect with peek/poke would be inherently better.
I do agree that buying one of these Pascal/Basic software products that come with a fancy standard library that does safe access to the hw may help writing safer software but I don't see how that is an inherent quality of the language.
Don't you think that Pascal's standard safeties will prevent mistakes?
1) array bounds checking (which can be used for safe hardware access instead of peek/poke)
2) pascal-style strings (with actual support for them, both in the language and in the standard library) (meaning a missing \0 doesn't erase the entire memory)
3) type-safe pointers
4) no pointer arithmetic
> 1) array bounds checking (which can be used for safe hardware access instead of peek/poke)
How? Peek/poke at the wrong address will generate an error in any case, and any MMU-less platform worth using will have the memory-mapped peripherals in a separate, lower region, where they don't get thrashed by writing past the end of a buffer. I have seen bugs occurring because of data located past a buffer getting thrashed, but I don't remember seeing one in the context of hardware access.
> 2) pascal-style strings (with actual support for them, both in the language and in the standard library) (meaning a missing \0 doesn't erase the entire memory)
That shouldn't happen in C, i.e. there are library routines you should use so that it doesn't happen. Not that string processing isn't a pain :-).
> 3) type-safe pointers
No complaints here :-)
> 4) no pointer arithmetic
That's not always good, but it does decrease the likelihood of certain types of bugs, so yep!
It happens every day and it will as long as there's C. Zero-terminated strings are part of the standard library and an almost infinite number of other libraries. You can't pretend it doesn't exist as the most common convention.
Additionally all those safety mechanisms can be turned off for performance, but only on the exact spot where it really matters, instead of being scattered all around the code.
var
EGAVGAScreen : Array[0..41360] of Byte absolute $A000:0000;
Et voila : bounds-checked memory mapped hardware access.
(note: the very well known "Crt" unit uses a memory mapped video buffer like this. So if you programmed a turbo pascal program, chances were good it was using this trick for screen output. Advantage : the speed is unbeatable)
> Which, of course, you can accomplish in C by wrapping your UART buffer handling code in functions that do bounds checking.
Of course, and you pay other hefty prices in Pascal or Basic for getting this sort of stuff "out of the box". Pascal isn't my favourite systems programming language, either :-).
There are zero features in C that Pascal and Basic dialects for system programming don't support. The only difference is that you need to turn safety off explicitly.
I was doing low level coding in Turbo Pascal before I even cared about C.
In terms of performance? None, assuming a well-written compiler. Depending on dialect you run into other issues though, such as the array size being part of the type signature, which is definitely not nice. The lack of adequate tooling and portability is another issue. Not strictly a problem of the language itself, but a problem you end up facing if you write low-level code in Pascal.
Don't get me wrong, I wrote low-level code in Pascal, too. It's nice and I probably wouldn't grumble too much if I had to do it again, but there's a bunch of stuff that comes in the same package with using something other than a language widely considered adequate for systems programming.
I do concede that even though bashing C is a pastime of mine, I would use it if it were the best option for a given project, depending on the set of factors to be considered for said project.
In real-life projects, there should be no place for tooling religion anyway.
In some cases, we want freedom... and that's something that has been neglected a bit too much with new programming languages IMHO. Safety/security feels like it's the latest of the dumbing-down "let's treat programmers like idiots" fad.
"In fact, C may be part of the problem: in C it's easy to make byte order look like an issue. If instead you try to write byte-order-dependent code in a type-safe language, you'll find it's very hard. In a sense, byte order only bites you when you cheat."
All of those examples are possible in safe systems programming languages like Modula-2 and Ada, with the difference that only the tiny spot where it matters is marked explicitly unsafe.
Whereas in C there is no way to distinguish between unsafe and safe code.
2. Direct memory mapped hardware access. For example: device drivers, kernel, embedded systems, microcontrollers.
Of course doing this usually means your code isn't portable, sometimes not even to other versions of the same compiler. I often wish C had a defined order for bitfields, for example.
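For example, a quick sketch of the bit-field portability problem (field names invented):

    /* Whether 'enable' lands in the least- or most-significant bits of the
     * unit is implementation-defined, so this struct can't portably describe
     * a hardware register layout. */
    struct ctrl_reg {
        unsigned int enable : 1;
        unsigned int mode   : 3;
        unsigned int level  : 4;
    };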
One of the things I do to help debug & test embedded code is to factor out code that isn't dependent on the hardware platform (e.g. communication protocols) into a library, compile it for the host, and write test programs that exercise it there. So even embedded code is often best written to be as portable as possible.
Who, Simon Tatham? I've been a fan ever since I modified the PuTTY code to work in an embedded system. I did not realize he was at ARM these days; gives me hope.
I too hated C once to the point I actually kicked a PC over on its side out of anger. I never understood pointers, structures and string manipulation. I genuinely hated it and wanted it to go away. I deleted all the code I'd written then went for a walk.
Then suddenly... click.
Suddenly, in my mind appeared an abstract model for it. It was complete and possible for one person to understand.
Since then I understand what I'm doing rather than know how to drive the compiler.
That has never happened for any other language I've written and that includes Z80+6502+X86 ASM, C++, C#, VB3-6, PDS7, ksh, Python, Perl and PHP to name a few. Assembly is close but not abstract enough.
Enlightenment is probably the best word to describe it.
I never had the click about pointers, I guess the way they were introduced to me just made them feel intuitive.
I'll never forget the time when programming changed from mechanically entering statements, to controlling the flow of data through a program though (also in C). Definitely a brain-expanding experience :)
The points are great and this is generally a good primer for someone who wants to understand the C mindset.
The bit at the end is a bit off, though. It feels like the author is saying "yeah, C is weird and crufty for historical reasons and some people just use it because they're backward like that". Yeah, I write kernel drivers, but I also just plain like using C, for the same reason that I like driving a manual transmission and usually disable the safety features on stuff: C tries really hard to not get in your way.
I enjoy programming in Ruby and mostly enjoy programming in Javascript. But there are times when I think "this is an unnecessary copy...this is inefficient...I wouldn't have to do this if I were writing in C".
(There are also times where I think "this one line of code would be over 100 lines of C", but we won't get into that right now...).
Perhaps I can state a point more simply than another poster did.
"this is an unnecessary copy...this is inefficient...I wouldn't have to do this if I were writing in C"
You should then ask yourself: does the inefficiency matter? Will it make the program noticeably slower? If not, then you can safely ignore the lack of machine efficiency and embrace the gain in programmer efficiency.
A valid question...sometimes it does. Sometimes it doesn't. Sometimes you think it won't, and then it does and you have to do some herculean things later to make it scale.
Also, I think it's a fallacy to say that higher level languages necessarily mean more programmer efficiency. When I'm doing network programming in C, here's what it looks like:
* Define a struct whose field layout matches the wire format
* Cast the incoming buffer to that struct
* Done
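Roughly like this sketch (the wire format and names are invented; I've used memcpy into the struct rather than a raw pointer cast to sidestep alignment issues, plus fixed-width fields and ntohs/ntohl for byte order):

    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>   /* ntohs/ntohl */

    /* Hypothetical wire header: fixed-width fields, no padding on typical ABIs. */
    struct msg_header {
        uint16_t type;
        uint16_t length;
        uint32_t session_id;
    };

    void handle_packet(const uint8_t *buf, size_t len)
    {
        struct msg_header hdr;

        if (len < sizeof hdr)
            return;                        /* too short to hold a header */
        memcpy(&hdr, buf, sizeof hdr);     /* copy instead of casting the buffer */
        uint16_t type = ntohs(hdr.type);
        uint32_t sess = ntohl(hdr.session_id);
        (void)type; (void)sess;            /* ... dispatch on type, use sess ... */
    }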
By contrast, Ruby requires me to marshal data and painstakingly extract each field...all because it tries to abstract away the fact that memory is a flat array of bytes.
Yeah, much of the time I'll be more productive in a higher-level language. But there are problem domains where that is not the case.
I fell in love with C 25 years ago, but then I moved into enterprise applications using higher level languages. How is the job market for C programmers? I would imagine that younger programmers don't go that route.
If you work on (real time) embedded systems, it is pretty much a C and C++ world, at least for the lower layer, and middleware part. You can pretty easily work in Automotive, Aeronautics, Robotics, Defense, etc...
Most of this impression is usually caused by the combination of the following things:
* fast startup
* programs in C usually do much less with more code than high level languages
Once the project gets really big and complex, C starts to get slower and harder to optimize than some higher level languages (e.g. dynamic dispatch tends to be slower in C than in C++ or Java).
>>programs in C usually do much less with more code than high level languages
Ok. So?
>>Once the project gets really big and complex, C starts to get slower and harder to optimize than some higher level languages
True. The speed boost isn't automatic. It's up to the programmer to write fast code.
I have nothing against high-level languages and garbage collection. They certainly have a place. But if you use high-level languages exclusively, you'll never (or rarely) have the special joy of seeing your program run at the full speed of the hardware.
The bigger a project is, the more stuff it does. If a language does this stuff slowly (e.g. Ruby), then it will not run faster no matter what buzzwords you invoke.
It can't be as fast as in VM-based languages, because the code (typically) can't self-optimize / modify itself according to the usage patterns to inline dynamic calls. This is the stuff that VM can do, because it has much more information. This is one of the reasons a general sorting method like qsort is so slow in C compared to general sorting method in Java (Collections.sort). Sure, you can specialize manually or do some macros, but such manual approach gets hairy pretty quickly for something more complex than a simple sorting method.
std::sort in C++ is known to be faster than qsort() in standard C, because C++ templates allow the comparison function to be inlined. I find it doubtful, though, that Java has a generic sort method that regularly outperforms qsort.
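Concretely, the indirection in question is the function-pointer call inside qsort (a minimal sketch):

    #include <stdlib.h>

    /* qsort reaches the comparator through a function pointer, so a typical C
     * compiler can't inline the per-element call the way a templated std::sort
     * (or a JIT-specialized Collections.sort) can. */
    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a;
        int y = *(const int *)b;
        return (x > y) - (x < y);
    }

    void sort_ints(int *v, size_t n)
    {
        qsort(v, n, sizeof v[0], cmp_int);
    }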
I measured Collections.sort and it was very close to C++ std::sort in performance, while C qsort was about 10x slower. It outperforms qsort for the same reason C++ does: the call to the comparison function is inlined.
There is a theoretical benefit in being able to optimize at runtime. But, in practice, these advantages are virtually always too small to outperform code compiled statically.
This is not a theoretical benefit - it is a very practical benefit, especially for object-oriented code with lots of indirection, virtual calls and dynamically loaded code. The reason it is not visible in microbenchmarks is because microbenchmarks are small and usually avoid indirection as much as possible, and even if there exist some, the code is all in one file so a static compiler can figure out all the call targets properly.
There is nothing about pointers that makes them unusually hard to keep track of. If you are prone to goofing on pointer arithmetic, then you're almost certainly going to run into problems even if you never use pointers, because arithmetic is hard to avoid in programming.
I don't use C for everything. When I do write C, I try to keep memory as simple as possible. I avoid allocating dynamic memory wherever possible, and when I do use malloc, I try to keep the logic around the pointers as simple as possible. This is beneficial all around, because malloc and free are not especially fast, and if you use them for everything, you won't see all that much benefit over a high-level language.
It's like building your own house or making your own clothes. There's no guarantee that you'll do a better job and get a better result than if you went the easy way.
First of all, in other similar languages like Modula-2 and Ada, there is no need to use pointer arithmetic as much as C developers do.
Even in C, most developers who use it are doing micro-optimizations without ever testing their assumptions about performance.
Finally, as a single developer it is easy to keep track of most C traps, the problem is when a project has more than a few developers, with different skill levels. Then the party starts.
There are probably lots of people writing bad code in C, I'll grant you that point. But that isn't a property that's built into the C language itself. I think you are attacking a straw man. My original point was just that it's possible to write very fast programs in C.
>It is, as it makes very easy to blow your leg off.
I have never seen a bug-proof programming language. If a language lets you do anything at all, then it will let you write bugs. So I don't see it as a weakness that C allows you to write bugs. If you have easy access to memory, then you can easily corrupt memory.
I experienced the inner monologue you describe for several years. It haunted my dreams when working with Ruby. But one day a rhetorical thought suddenly dawned on me that has since changed my perspective quite dramatically...
"...if I'm being incessantly bothered by what I perceive as the nagging inefficiencies of some programming language's implementation, maybe I'm not thinking about or relating to programming languages (in the large) in the way I should be..."
If all programming languages are merely tools to communicate instructions to a computer, then why is human language not merely viewed by everyone as a means to an end as well? Surely, most would agree that language is more than simply a means to an end, and that language does far more than simply transmit information between parties. If efficiency, lack of ambiguity, etc., were the paramount goals of human language, surely formal logic, or perhaps even a programming language for interpersonal communication, would be more fitting than natural language!
So why do we insist on communicating with each other with what is often such an abstract and ambiguity filled medium?
tl;dr: it is trivial, even natural, for a literate individual with the proper context to understand concepts in language that seemingly transcend the words themselves. These notions would be (and are) exceedingly difficult to formalize, and any formal expression of these ideas would cause exponential growth of the output.
Ever try explaining a joke to someone who didn't "get it"? It takes a lot more "space" to convey the same sentiment than to someone who "got it".
So what has this crazy rant have to do with anything? Well, aside from revealing I am a complete nerd, it speaks to my approach to software engineering today.
We have to let go of the machine if we ever want to really move the state of the art forward.
There are an infinitude of expressible ideas, but lacking the proper medium to abstract the expression of these ideas formally (like natural language and our brains do, well, naturally) we will never get a chance to find out what we don't know!
"We're doing it wrong" is not exactly the sentiment I'm trying to express, but it's sorta that. Maybe.
Hope this comment made any sense. :) It's 4 AM after all.
I'm afraid I couldn't hang on for the ride...perhaps I'm like the one who doesn't get the joke and needs the much lengthier explanation!
It sounds like you're saying that programming languages are constrained on two ends: on one end by being too tied to the underlying microarchitecture, and on the other end by being interpreted by our minds which think about programming in terms of language features rather than Platonic ideals.
Assuming I've come at least close to understanding your point, I guess what you're saying is that by thinking too closely about what I'm trying to do at a low level, I'm negatively affecting my ability to write idiomatic Ruby code to do useful things?
This is probably true; it's one of the curses of being a kernel developer. I think you want people who care deeply about how bits are laid out in memory being the ones who are writing your operating system.
I wasn't trying to insult you or anything, my comment was just my (extremely) sleepy attempt to express an idea that I've had shuffling around in my mind for a while now.
At times there's nothing I want to do more than solder components onto a circuit board and make a radio or something. It's really, really gratifying to make something work that's so "magical" (from a certain point of view, radio is pretty magical to me) and completely understand how everything works from start (bare materials) to finish (a working radio!).
I guess what I was trying to say was that if I would ever want to make a CPU comparable to, say, what Intel produces today, I'd have to give up my soldering gun and any notion of manufacturing the CPU with any discrete process (like soldering individual transistors) and instead adopt an entirely new approach - like maybe electroplating - in any event, it's one that allows me to make incredibly powerful things at the expense of being able to "use my hands".
Experts are always going to need to know (and I mean really KNOW) the underlying fundamentals of their field regardless of how "high level" their work becomes - see theoretical physics, et al. With that in mind, I think people who care deeply about how bits are laid out in memory are exactly the same people who will always be at the forefront of computer science and software engineering - even if 99% of their practical output in life is at a level much higher than bits. :)
> It is trivial for a literate individual with the proper context to understand concepts in language that seemingly transcend the words themselves. These notions are exceedingly difficult to formalize, and any formal expression of these ideas would cause exponential growth of the output.
My interpretation: "I find formal logic's ineptitude at resolving ambiguity disappointing. But humans resolve ambiguity without breaking a sweat. Is it possible to generalize logic to encompass ambiguity?" I think the answer you seek lies in Probability Theory.
> a word to the wise is sufficient. [1]
How does a brain quickly derive intended meaning from an ambiguous lexicon like the English Language? Realize that an infinitude of nuanced interpretations are equally possible, but not equally probable. Suppose Alice says to Bob "The sea/c". If the topic was Marine Biology, Bob will expect (assigns a high probability to the hypothesis) that Alice meant the ocean. If the topic was Typography, then Bob will expect that Alice meant the glyph. Similarly, computers which deal with ambiguity (e.g. speech interpreters, facial recognition) assign higher probabilities to some interpretations than others.
> If efficiency, lack of ambiguity, etc., were the paramount goals of human language, surely formal logic, or perhaps even a programming language for interpersonal communication would be more fitting than natural language!
Computers can communicate practically instantly, but humans are bottle-necked by how quickly we can move our lips. Therefore, I would expect spoken languages to be optimized toward articulating as little as possible. One technique is overloaded vocabulary. I think humanity prizes the ability to compress information down to a single word. This unfortunately comes at the cost of computer-level clarity. But I mean, "one-liners" do make for great movies, don't they.
> Ever try explaining a joke to someone who didn't "get it"? It takes a lot more "space" to convey the same sentiment than to someone who "got it".
As far as I know, humor is one of those things that scientists don't fully understand yet. But some have a rough idea. I'm convinced music and humor are related in the sense that they set up an ambiguous "expectation/motif/theme/ context", and then playing on that expectation.
Music is defined by tension and resolution: tension being ambiguity and resolution being validation. Google an analysis of Beethoven's 5th, and it will say that the opening intervals create tension because the key is uncertain to the listener. Google Music Theory, and you'll learn that the Chromatic Scale is built around the tension between the dominant and the tonic. Occasionally, rather than deliver the punchline, a composer will leave his or her listeners hanging on a suspended-chord or a leading-tone. To experience this cliffhanger, listen to a track with a bass drop, but turn it off right before the actual drop.
Similarly, humor revolves around setting up an ambiguous expectation and resolving it. The proposed neural mechanisms vary. But jokes always seem to involve a set-up, and a punchline which is unexpected, yet satisfying. And I think this is because the context is resolved. pg shared a related idea in one of his essays about ideas: "That's what a metaphor is: a function applied to an argument of the wrong type." [2]
With the above in mind, I believe it's possible for today's computers to predict whether a human will find something funny or not. But unless they'll be taught which types of topics humans considered relevant (i.e. deixis), computers would find themselves at a significant disadvantage.
> We have to let go of the machine if we ever want to really move the state of the art forward.
Probability theory is already used in AI. That's really cool, but I don't think the art as a whole needs to move forward. Though both are Turing complete, speech and programming languages are optimized very differently. I've already pointed out the different constraints. But also notice that while programming primarily aims toward conveying instructions, human speech encompasses a wider spectrum of goals. Meticulous clarity will have a higher impact on instructions like "automate this task" than declarations like "broccoli tastes weird".
> There are an infinitude of expressible ideas, but lacking the proper medium to abstract the expression of these ideas formally (like natural language and our brains do, well, naturally) we will never get a chance to find out what we don't know!
I'm not sure exactly what this is getting at. Incidentally you may enjoy learning about Solomonoff Induction. [3]
Recently I had to write a program for an embedded Linux router which ran on a MIPS architecture and had a 2MB flash. I only had about 40kb of space to fit the application on. I was able to get a binary that was compiling to more than 1.5mb down to 20kb through using a combination of gcc tricks like separating data and code sections, eliminating unused sections, statically linking some libraries and dynamically linking against others. It once again gave me immense appreciation for having a language and toolchain that can give you this power for those 1% of problems your career might depend on.
For amusement, the relevant section of the Makefile I ended up with:
I'm unsure how many other languages/toolchains give you that sort of flexibility down to the linking level. Also it's self contained and doesn't require some kind of "virtual machine" or interpreter to run it.
> I'm unsure how many other languages/toolchains give you that sort of flexibility down to the linking level. Also it's self contained and doesn't require some kind of "virtual machine" or interpreter to run it.
Almost every language that has an ahead of time compiler to native code.
That makes me wonder why "eliminating unused sections" isn't a default, as it feels like the compiler is doing a lot of unnecessary work if it's generating 1.5M of output that actually has only 20k of useful stuff in it.
It's only that big because of static linking. Typically you're never statically linking your applications, but in this case it's necessary if you want to use libraries but only want the space-overhead from the code and data you use from the libraries.
It reminds me of a quote I read in the book Expert C Programming: Deep C Secrets (an excellent book on C btw), that read: "Static linking is now functionally obsolete, and should be allowed to rest in peace." I kinda chuckled a bit when I saw it.
> But those aren't the reasons why most C code is in C. Mostly, C is important simply because lots of code was written in it before safer languages gained momentum...
I disagree. Certainly in the FLOSS community, I don't think this is true.
C is a lowest common denominator. No higher level language has "won". So if you want the functionality in a library you write to be available to the majority, you will need to make it available (ie. provide bindings for) a number of high level languages. The easiest way to do this is to provide a C-level API. This works well because the higher level languages are all implemented in C. This isn't because C is more popular, but because it is a low level language. The easiest way to provide a C-level API is to write the code in C. So: library writers often write implementations in C.
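Concretely, the kind of C-level API that binding generators cope with easily looks something like this (all names invented):

    /* Opaque handle plus plain functions: no templates, no exceptions,
     * nothing that assumes a particular language runtime. */
    typedef struct widget widget;

    widget *widget_create(const char *name);     /* returns NULL on failure */
    int     widget_frob(widget *w, int amount);  /* returns 0 on success */
    void    widget_destroy(widget *w);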
There are three alternatives:
1) Independently implement each individual useful piece of functionality in every high level language. This does happen, but more general implementations tend to move quicker, since they have more users (because they support multiple high level languages) and thus more contributors. The number of contributors might dwindle because of the requirement to code in C, but I don't think this has happened to a significant enough extent yet.
2) Implement libraries in a higher level language and then provide bindings to every other popular higher level language. This can be done, but I haven't seen much of it. Higher level languages seem to make it easier to provide bindings to a C-level API rather than to APIs written in a different higher level language. This may be something to do with impedance mismatches in higher level language concepts.
3) A higher level language "wins", and everyone moves to such an ecosystem. This can only happen if other higher level languages lose. I don't think there is any sign of this happening.
> This works well because the higher level languages are all implemented in C
Not actually true; there are many high level self-hosted languages, OCaml, Haskell, Forth, Lisp, etc etc. But all these languages generally prioritize having a good C FFI. It is interesting that e.g. Thrift, Protocol Buffers et al don't seem to have made much of an inroad here.
> It is interesting that e.g. Thrift, Protocol Buffers et al don't seem to have made much of an inroad here.
I'm actually working on a tool called Haris to deal with this very problem. I'm looking to do structured binary data serialization in a way that's efficient, lightweight, and portable (in that it conforms to the C standard). Keep an eye out, I'll probably be posting it on Hacker News within the next few weeks.
Thrift, protobufs don't really work for in-process communication because the cost of serialization can get pretty high - you can easily blow most of your CPU time just serializing protobufs across language boundaries (been there, done that). I think Cap'n Proto may be an interesting approach - the memory layout is the same in all languages, and Kenton is reportedly working on 0-copy IPC - but it remains to be seen how it will turn out.
> This isn't because C is more popular, but because it is a low level language.
I consider C to be high level enough to be reasonably portable and expressive, but at the same time low level enough that it can do a lot of things higher level languages can't.
> And there's no simple excuse for the preprocessor; I don't know exactly why that exists, but my guess is that back in the 1970s it was an easy way to get at least an approximation to several desirable language features without having to complicate the actual compiler.
Clearly this guy has never had to deal with a large, complicated code base in C. Dismissing the preprocessor as a crutch for a weak compiler shows a significant ignorance about the useful capabilities that it brings.
I assume when he says "no simple excuse", it's more pointing to the massive problems that the mere existence of the macro pre-processor introduces for reasoning about the text of any C or C++ program, for programmers, tools, and compilers.
I've worked in a code base where, tucked away in a shared header file somewhere up the include chain, a programmer had added the line
#define private public
(because he wanted to do a bunch of reflection techniques on some C++ code, IIRC, and the private keyword was getting in his way)
Now regardless of whether that's a good idea, if you are reading C or C++ code, you always have to be aware, for any line of code you read, of the possibility that someone has done such a thing. Hopefully not, but unless you have scanned every line of every include file included in your current context recently, as well as every line of code preceding the current one in the file you're reading, you just can't know. Clearly this makes giant headaches for compilation and tools, as well.
So yeah, of course every mid to large C / C++ program uses the macro pre-processor extensively. You can do useful things with it, and there's no way to turn it off and not use it, anyway, given the way includes work in C / C++, so you might as well take advantage of it.
But it's not an accident that more recent languages have dropped that particular feature.
The only time I have seen actual conditional compilation used in C# was hilarious - I was given a pile of code that had preprocessor directives used to make private methods public so that they could be unit tested. However, if you switched the compile time flag to the "production" state nothing compiled....
I doubt he meant that it was not useful, rather that the usefulness might have been better served as a function of the compiler rather than some disconnected transformation tool.
Of course this thought process would eventually bring you down the road of macro systems such as those in Lisps, but that's going to be more difficult with a language lacking homoiconicity of code and data.
There are macros in many non-homoiconic languages (eg. Rust, Dylan), and there are add-ons for several of the languages lacking them (eg. SweetJS for Javascript, MacroPy for Python).
This is the reason that stops me from going back to C. After coding in Java (mostly) for the past 10 years, I wanted to switch back to C or C++, mainly to save a ton of memory being used which I think is unwarranted.
So I experimented with a new service and coded it in all three: C, C++ and Java. When I did this I had not coded in C++ for 10 years, but it did not hurt at all; I could switch back with no great difficulty. There were some minor inconveniences in forgoing the Eclipse editor. I think I might have missed autocomplete the most.
But within hours of starting, I was getting my old feel for coding C++ in Vi(m) back. And with the benefit of having the STL (vectors, strings, etc.) I did not feel much discomfort.
But coding the same service in C was painful. And it was mainly because of not being able to do basic things with strings easily, like copy and concatenate.
But thankfully I still managed to do it. And on comparing the three services for latencies and memory usage, I found little difference between C and C++.
So eventually that service was deployed in C++ and still runs the same way.
The above episode happened about a year back, and recently I have been using Go for a lot of services (new ones as well as moving some old ones). Mainly I have been motivated by the promise of an easier C, which it seems to offer.
Some services coded in Go I have deployed and they are already running very well. But even now, I need some more experience on the results side to have a definitive opinion on whether Go is indeed C with a strings lib (and other niceties) for me.
Most languages get string processing (and its closely related cousin, localization) wrong, even the ones with string classes, so I don't really get my jimmies rustled on C's anemic native string support.
On large enough projects, you end up with all kinds of custom logic around user-entered and user-facing strings, so the lack of native string processing is really only a drawback for tiny and proof-of-concept projects, which aren't really what you use C for anyway.
That being said, the right way to do string processing usually ends up looking a lot uglier than the way we are used to.
Sorry for the delay in replying. No, I did not, actually. See, I was coming back after a while; I had quickly shifted to C++ (after coding briefly in C) in my career, so I did not remember using any libraries.
I am sure, my task would have been easier if I had used some lib. But my main concern (and goal) was performance and memory usage comparison.
Knowing a bit of C but often programming in just about any other language, I was recently inspired to work with lower-level languages like C++ thanks to a bunch of talks from Microsoft's Going Native 2013. Specifically Bjarne Stroustrup's The Essence of C++: With Examples in C++84, C++98, C++11, and C++14 -- video and slides at http://channel9.msdn.com/Events/GoingNative/2013/Opening-Key...
C++ really has changed and is changing from what I learned back in university. It's quite exciting. They seem to be standardising and implementing in C++ compilers the way HTML5 is now a living standard with test implementations in browsers. See also: http://channel9.msdn.com/Events/GoingNative/2013/Keynote-Her...
He's a university prof. When's the last time they ever heard applause? ;-) Really, in part, I think it was because he was trying to finish a thought and was going to summarize the feature again in two slides. But yeah, I'd have cheered. It felt like C++ the Steve Jobs keynote, in a way. All the good parts, right in front of you, shipping "soon".
For the Google index, I'd also like to note that LLVM 3.4 released three weeks ago, has support for C++14 in Clang. Incidentally it also mimics Visual C++'s compiler from Microsoft in Visual Studio. I'd expect it to ship with an Xcode 5.1 alongside iOS 7.1. Now that I think about it, I should run to Apple's developer site and see if that's true. :)
Edit: Xcode 5.1 beta 4 ships with Apple LLVM version 5.1 (clang-503.0.9) (based on LLVM 3.4svn) according to Google searches. So I guess that's a yes. I'm off to try it now. :)
C is a great language - it lets you get down and dirty with the computer.
However, the one huge downside to programming in C is having to deal with strings. Let's face it, C strings are absolutely terrible. For such an important feature, the string implementation of null terminated char* is just miserable to work with. See:
http://queue.acm.org/detail.cfm?id=2010365
That is a problem with the stdlib and not so much the language. There is nothing stopping you from creating a more robust string implementation that stores size and does bounds checking for some operations. (You can't easily/safely enforce anything like string immutability at the language level... unless anyone out there can think of a way.) I've seen it before, but not often. Usually the C code I'm working on does very little string manipulation and speed/size matters.
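A sketch of what such a string might look like (names invented; similar in spirit to existing libraries like sds or bstring):

    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>

    /* Length-carrying string; data is kept NUL-terminated for interop
     * with ordinary C APIs. */
    struct str {
        size_t len;
        size_t cap;
        char  *data;
    };

    /* Bounds-checked append: grows the buffer instead of overflowing it.
     * (Real code would also guard the size arithmetic against overflow.) */
    int str_append(struct str *s, const char *src, size_t n)
    {
        if (s->len + n + 1 > s->cap) {
            size_t newcap = (s->len + n + 1) * 2;
            char *p = realloc(s->data, newcap);
            if (!p)
                return -1;              /* report failure, don't corrupt */
            s->data = p;
            s->cap = newcap;
        }
        memcpy(s->data + s->len, src, n);
        s->len += n;
        s->data[s->len] = '\0';
        return 0;
    }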
Well, to be fair, strings are much nicer to work with when you have a suitable overload for '+', and that very much is a problem with C the language. Same goes for 3d vectors in C as well.
That's your opinion. I (and many other C programmers) find it refreshing that when I see a "+", a couple numbers are going to be added together. Nothing else could possibly happen, and I don't have to cross-reference the types of the operands to figure out whether a method in some far-off source file is going to be called.
C is willing to hide a lot of numeric type casting details about doubles vs floats vs unsigned ints vs chars behind the magic of the "+" character, so maybe you're willing to endure a lot more type-dependent compiler magic than you're letting on here, despite so graciously speaking for many other C programmers. In fact, "p+1" might very well mean "p+4" if p is an int *. Or maybe it means "p+32" if p is a FILE * (on my compiler). Or maybe it means... Wow! Wait a second! That seems pretty type dependent to me, come to think of it! But that's just my opinion.
Look back; I never said the + operator wasn't type dependent. It clearly is. It's certainly true in C that no matter what, the + operator adds two numbers together. The exact nature of that addition depends on the types of the operands, but the rules for that are dead-simple, and most importantly, aren't extensible; I can't include a header file that will change those simple rules.
i.e. Once you understand how pointer arithmetic works, and you know how C's type promotion scheme works, the meaning of any addition expression is basically evident, and you have a few guarantees about the behavior of the program (for example, I can reasonably expect that my addition won't take more than a couple clock cycles, depending on what kind of casting needs to be done and the like). In languages that support operator overloading, + can literally mean anything.
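For anyone following along, the scaling in question (a tiny sketch):

    #include <stdio.h>

    int main(void)
    {
        int a[2];
        int *p = a;
        /* Pointer arithmetic scales by the pointee size: the two addresses
         * below differ by sizeof(int) bytes, typically 4. */
        printf("%p\n%p\n", (void *)p, (void *)(p + 1));
        return 0;
    }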
BUT clearly there is a world of difference between saying "it would be nice to have char* + char* as shorthand for string concatenation somehow" - which is roughly what I was saying - versus saying "all operators in a language should be arbitrarily overloadable", which is roughly how you responded.
Now, I'm not saying there is an obvious way to handle "char* + char*" meaningfully or safely in C. But on the other hand,
char stackString = "some literal";
is generally handled in an entirely different fashion from
int myVar = 7;
if both are declared as local stack variables - and C programmers generally have no problem learning that "=" is going to mean something quite special when declaring string literals this way, compared to other data types. Because as you say, it's just one more language rule.
The only real places where I miss operator overloading in C are when dealing with strings in performance non-critical places, any time I'm using 3D vectors, and any time I'm writing matrix math. If those were handled as primitives in the language (and as the string literal example shows, C already does go partway down that road), I'd very happily part with arbitrary operator overloading.
Uh, in what way is that "entirely different" from the int declaration? In both cases, you're just copying a couple words from one place to another, which is what the = operator does. There's nothing "special" about that declaration; we're just writing a pointer into a variable on the stack.
My feeling is that you're uncomfortable with C's treatment of strings because you don't entirely understand the memory model.
However, this is an argument that Bjarne often uses in the context of C++, and it drives me a bit batty. For example, he'll wax eloquent on how it is fine that there is no multidimensional array type built into the language, because look how easy it is to write one yourself, or use one somebody else wrote.
And, at the first approximation, that is surely true. Heck, I'm taking time out to post this from working on a Kalman Filter class I wrote that uses a hand coded multidimensional array class.
But the problem is that there are thousands of string libraries out there, and thousands of multidimensional arrays, and so on. And they don't play nicely with each other.
Heck, we use char *, std::string, CString, and QString all in the same project. You can guess the evolution - a bunch of old library code written 10 years ago by people who didn't trust/like the new-fangled std::string stuff. External library code written in C. Then code written in modern C++ with an aim to be portable (std::string). Then some MFC UI code, and more code written by MFC people that didn't care about strewing that dependency in places where it didn't belong. And now we are in Qt, and I have to really clamp down on the code reviews to keep QString from straying beyond the UI components. Ugh.
Hey, I love C, and this is not a rant against the language. C strings have their place. I can't tell you how many times I've written code along the lines of:
char* c = some_data();    /* grab a pointer to the raw data */
c += header_size;         /* skip past the fixed-size header */
name = extract(c, ',');   /* scrape out the field up to the delimiter */
You get the idea. At one time that was the state of the art way to do string processing without the cost of a lot of extra creation/deletion. You just use pointer arithmetic, move along a data source, scraping it as you go.
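In real code that pattern looks something like this (a self-contained sketch using strchr; extract above was just a stand-in name, and error checking is omitted):

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        const char *data = "HDR1;alice,30,engineer";
        const char *c = data + 5;             /* skip the fixed-size header "HDR1;" */
        const char *comma = strchr(c, ',');   /* find the next delimiter */
        char name[32];
        size_t len = (size_t)(comma - c);
        memcpy(name, c, len);                 /* scrape the field out, no allocation */
        name[len] = '\0';
        printf("%s\n", name);                 /* prints "alice" */
        return 0;
    }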
But these days we aren't so interested in that kind of scraping, and are far more interested in higher level problems. And we have no standard to fall back on, in C. Sure, I can elect to use a third-party string library, but that only works under the almost-never-satisfied condition that every line in our project is written in house, or that every third-party library we pull in either uses the string library I chose or inter-operates with it seamlessly. Frankly, I've never been in a situation where either of those held. So we end up either reverting to the mean (C strings) or making endless conversion calls to switch from one form to another.
> If you've used Java or Python, you'll probably be familiar with the idea that some types of data behave differently from others when you assign them from one variable to another. If you write an assignment such as ‘a = b’ where a and b are integers, then you get two independent copies of the same integer: after the assignment, modifying a does not also cause b to change its value.
This is incorrect when it comes to Python. a and b will be two different names for the same integer object, which is stored in a single memory location. The difference is that Python guarantees that integers are immutable.
Arguably, to a user of the language, these are indistinguishable from independent copies. Changing one cannot change the other (except perhaps through some exotic double-underscore-prefixed function with which my vague knowledge of Python is unfamiliar).
> But if a and b are both variables of the same Java class type, or Python lists, then after the assignment they refer to the same underlying object, so that if you make a change to a (e.g. by calling a class method on it, or appending an item to the list) then you see the same difference when you look at b.
His point is that there isn't a difference in python: in both cases you're just changing labels, but in the case of integers they're pointing to immutable objects.
I was 13 and having written assembly for years I finally got a machine that was actually equipped for running a full-blown C compiler. Compiling was slow and the produced code was slow but all I could think of was how easily I could generate [assembly] code with just a few lines of C. Loops, pointers, function calls, conditionals... just like that. Wow. So productive.
C felt like writing assembly but with much better vocabulary. C was to assembly language what English was to the caricatured "ughs" of the stone age.
I often compared the output of the compiler to what I would've written myself: the output was bloaty, the compiler was obviously not very smart, but it did do what I wanted and the computers had just got fast enough to be able to actually run useful programs written in C without slowing down the user experience. So you couldn't necessarily distinguish a program written in C from a program written in assembly, and you could "cheat" by choosing C instead. That was so exciting!
The thing is, however, that since these trivial insights of my youth it turns out that C actually never ran out of juice.
I still write C and I'm enjoying it more than ever.
In C, I've learned to raise the level of abstraction when necessary, and writing C in a good codebase is surprisingly close to writing something like Python - except several dozen times faster, and you can lay out your memory and the little details in whatever way best suits each context.
I love doing all the muck that comes with C. String handling, memory management, figuring out the best set of functions on top of which to compose your program, doing the mundane tasks the best way in each case, and never hitting a leaky abstraction like in higher level languages.
The thing is, the time I "waste" doing all that pays me back tenfold, as I tend to think about the best way to lay out my program while writing the low-level stuff. Because such effort is required, there's a slight cost to writing code, which makes you think about what you want to write in the first place.
In Python you shove stuff into a few lists and dicts, it just works, and you figure out later what it was that you really wanted and clean it up. But often you're wrong, because it was so easy in the beginning. In C, I have to think about my data structures first because I don't want to write all that handling again for a different approach. And that makes all the difference in code quality.
However, I don't think you could impose a similar dynamic on a high-level language. There's something in low-level C that makes your brain tick a slightly different way and how you build your creations in C rather than in other languages reflects that. The OP said it very well: C reflects the reality of what your computer does. And I somehow love it just the way it is.
I've worked most of my career in higher level languages but I've never set C aside. It has always been there, even with Python, C++, or some other language. Now I'm writing C again on a regular basis and with my accumulated experience summed into the work it's truly rewarding.
I was much the same - started out with Z80 asm, moved onto x86 shortly after that, and never really liked HLLs (including C) until I was almost 18 - I always felt I could do better than the compilers at the time (and I did), so there wasn't any reason to move up. I still use C and x86 asm frequently, more the former now, but I'll sometimes go back to something I wrote in C before and start rewriting bits of it in asm just to see how much smaller I could make it.
> Because such effort is required, there's a slight cost to writing code, which makes you think about what you want to write in the first place.
It also tends to make you think of the simplest, minimal design that works, and that translates into more efficient and straightforward code. Higher level languages make some things really easy, but then I always feel a little disappointed by just how much resources I'm wasting afterwards.
Please do people a favour and either format this as code or add more linebreaks, at the moment it makes stuff rather uncomfortable to read (until one clicks ‘Fit to Width’).
Yeah sorry, I wrote it like that to make it more confusing...
But it's actually not too confusing. A brief explanation:
The first 75%ish, the bit with all the +s, is filling up the elements of the BF array with the ASCII value of all of the letters contained in the sentence. And the second 25%ish is just scrolling to the right location and printing out the letters, hence lots of <>s and .s. During the fill-up stage each ASCII value is made using two of its factors, in an attempt to reduce the number of characters. So each ASCII letter looks like this: N+[-<M+>], where N and M are the two factors chosen. For example 32, an ASCII space, is ++++++++[-<++++>], N=8, M=4.
I'm sure this isn't the most character efficient for short sentences, but it might not be too bad for paragraphs.
PS, in analysing that I have noticed there is a pointless extra > at the start.
if you're not familiar with simon tatham, do poke around his site [http://www.chiark.greenend.org.uk/~sgtatham/] - he has an eclectic and delightful assortment of code and writing. probably best known for putty, but the rest of it is a lot of fun to browse through.
The article does exactly what it sets out to do: introduce C to programmers used to more modern languages.
I started programming in C again a few months ago after a 15 year hiatus and the language I remembered loving seemed strange and tedious. This would have been a great reminder of the many differences that after a while you just take for granted. Something similar would be useful for most languages but just more so for C (or say, FORTRAN).
My only quibble would be that while malloc/free are covered, many variables are simply automatically allocated and deallocated on the stack. C's dual approach to memory management is yet another frequent source of confusion.
I love C. It wasn't my first language to jump in to, but it was eye opening to see the power of pointers and low level operations. Java just couldn't get me close enough to the system.
The article only discusses the 'extremities' (C vs. Python/Java, etc...) when there is an obvious and popular 'compromise': C++, which has most of the discussed advantages of both sides. (Although it has some drawbacks; it is a bit more difficult to master than either C, Java or Python.)
As someone who swore off c after a college class and an experience with perl (three cheers for memory management), this was a great intro article to the idioms of c.
I first learned "true" programming with Perl. My next language was C to learn how to program microcontrollers. Incidentally, they remain my two favorite languages even after Python, C++, Java, Tcl, shell, LISP, and multitudes of assemblies. I feel these two languages cover most of my uses.
I just wish the built-in XS Perl<>C integration was simpler. I need to look into some CPAN modules for a more ctypes-like interface. Better yet, it should use libclang to autogenerate bindings!
interesting read. one of the later comments is a bit off the mark though:
" As a direct result of leaving out all the safety checks that other languages include, C code can run faster"
C is fast not just because of missing safety checks but, more generally, because you don't pay for features you don't use. Things like function calls and reading data are not complicated by run-time type logic, for instance - this is very important; it's why you can write an Objective-C class with the same content as a bunch of C functions and the C functions will be (sometimes very significantly) faster.
This is one example, but many language features in high level languages suffer from similar performance problems - by being super generic and ultra late binding they can never perform as fast as a clean implementation which knows everything at compile time.
If you want dynamic late binding type functionality in C you have to do it yourself...
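i.e. the classic hand-rolled version, function pointers in a struct (a minimal sketch, not any particular library's scheme):

    #include <stdio.h>

    /* an explicit "method table": the late binding is one pointer load plus an indirect call,
       and you only pay for it where you actually ask for it */
    struct shape_ops {
        double (*area)(const void *self);
    };

    struct circle {
        const struct shape_ops *ops;
        double r;
    };

    static double circle_area(const void *self) {
        const struct circle *c = self;
        return 3.141592653589793 * c->r * c->r;
    }

    static const struct shape_ops circle_ops = { circle_area };

    int main(void) {
        struct circle c = { &circle_ops, 2.0 };
        printf("%f\n", c.ops->area(&c));   /* dispatched through the table at run time */
        return 0;
    }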
The really nice thing about Objective-C, of course, is that you can just dip in and out of plain C as the mood takes you (e.g. where speed matters, or perhaps where you're just doing numerical stuff and C is less verbose). I've come to C via Objective-C, and (like many other commenters here) have found it incredibly satisfying.
What a great explanation. I have been doing some low-level Go programming recently (including implementing the writev syscall), and I think this document would also be useful for Go programmers.
This is really great. I have found myself saying some of these same things when explaining things. Going to keep this in my pocket to use in the future. Thanks!
I was a bit bothered that, with all the talk about malloc, it was never highlighted that not all memory needs to be manually freed: local (stack) variables are quite safe, which is why it's a common pattern to pass pointers to local variables into functions so they can store their results there. There are common cases where you really do need malloc, but those should be treated as the exception.
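The everyday shape of that pattern, for anyone new to it (a minimal sketch - nothing here is malloc'd or freed):

    #include <stdio.h>

    /* the caller owns the storage; the callee just fills it in through the pointers */
    static void get_point(int *x, int *y) {
        *x = 3;
        *y = 4;
    }

    int main(void) {
        int x, y;        /* automatic (stack) variables: gone when main returns, no free() needed */
        char buf[64];    /* same for local arrays */
        get_point(&x, &y);
        snprintf(buf, sizeof buf, "(%d, %d)", x, y);
        printf("%s\n", buf);
        return 0;
    }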
I really enjoyed this article. I started out programming in C, then quickly on to Java.
I didn't appreciate the language at the time, but with hindsight, the fact that you need to worry about memory allocation and performance means you've a better understanding of what's happening on the underlying system.
I completely agree. I took one semester of Java then one semester of C when I was first learning to code. I think that extra semester of C gave me a significant advantage over some of the other programmers in my next school because I had a much better understanding of what was going on under the Java straight-jacket.
"the length of the array isn't stored in memory anywhere"
This is probably not true. For arrays on the heap, the size (or an approximation e.g. number of pages the array spans) would have to be stored somewhere in order for the array to be deallocated. For arrays on the stack, the size is either known at compile time, or else it was at least available when the array was allocated and could be kept in the stack frame (and in many cases would be kept in the frame anyway).
Not only that, but the common pattern of passing a pointer to an array and its length as arguments to a function implies that most of the time C programmers keep the length of the array stored somewhere. You are really talking about niche cases where the length of the array is truly and inherently unavailable.
Really this has more to do with the fact that C is meant to do as little as possible for programmers -- it is supposed to be "close to the machine."
I'm sorry, but I think the article's version is actually closer to the truth.
Regarding stack arrays, while the compiler certainly knows how large an array is (of course it has to - you're using the length in the code it compiles), it will almost certainly not emit code or immediates that record that size anywhere in the resulting binary. If you allocate three arrays, each of 12 doubles, a compiler can simply emit "sub esp, 288" and be done with it. It also won't stop you from referencing memory at an unreasonable offset from any of those arrays. (Compilers can warn you, though.)
Passing the array along with its size further goes to prove the original statement, not refute it. If you have to do something manually, it strongly implies that the language/runtime isn't doing it for you.
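Concretely, the manual idiom looks like this (a sketch): inside the callee, sizeof only tells you the size of a pointer, so the length has to travel as its own argument.

    #include <stddef.h>
    #include <stdio.h>

    static double sum(const double *a, size_t n) {   /* 'a' is just a pointer in here */
        double s = 0;
        for (size_t i = 0; i < n; i++) s += a[i];
        return s;
    }

    int main(void) {
        double xs[12] = {0};
        printf("%zu\n", sizeof xs);                        /* 96 with 8-byte doubles: the compiler knows */
        printf("%f\n", sum(xs, sizeof xs / sizeof *xs));   /* the callee only knows because we told it */
        return 0;
    }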
Regarding arrays on the heap, the runtime even then does not require you to place a size anywhere. What likely happens is the right cell from a bucket of the closest size to the allocation you need is returned to you and marked in use. It will be as large as the size you're allocating, or larger. The only artifact of its original size is which span of memory it's located at. When you free() that allocation, that block simply gets marked as free, no size needed.
> What likely happens is the right cell from a bucket of the closest size to the allocation you need is returned to you and marked in use. It will be as large as the size you're allocating, or larger. The only artifact of its original size is which span of memory it's located at.
Almost every segregated-fit allocator I've ever seen still stores the size of each cell in the bucket header, although of course you only need to store the size once for the whole bucket since each cell is the same size. Still, it's stored somewhere. Also, only some allocators are segregated fit. dlmalloc, for example, which is very widely used, uses boundary tags, storing the allocation size with the allocation itself.
"What likely happens is the right cell from a bucket of the closest size to the allocation you need is returned to you and marked in use"
...implying that the size of the block is, in fact, available to the allocator. This might not be stored explicitly (it could be stored as a pair of pointers) but it is not unavailable. It is also necessarily available to the deallocator, which must in some way be aware of what it is marking as free. As I said, this might only be an approximation of the size of the array, though for bounds checking purposes it would usually be good enough (since nothing else should be allocated in the "excess" space that is marked as in-use).
"Passing the array along with its size further goes to prove the original statement, not refute it. If you have to do something manually, it strongly implies that the language/runtime isn't doing it for you."
Take a closer look at the original statement. He did not merely say that the size is unavailable to programmers, he said it is not available at all. That is not really true -- it is more that the compiler does not do anything useful with the information.
To put it another way, a compiler could conceivably emit bounds-checking code but still not provide programmers with an explicit way to get the size of an array. The result would be the same: programmers would still be forced to pass array sizes around, to avoid a call to abort (or whatever behavior occurs when a bounds check fails).
No it couldn't, because the compiler cannot make assumptions about the heap allocator beyond what the C spec says, which is basically just the function signatures. An allocator that never frees memory is entirely within the C spec and would not need to know individual allocation sizes after allocation.
Or, if a compiler decided to do it anyway, it would have to link a compiler-specific standard library, because it's entirely implementation-dependent what bookkeeping information is stored and where. In turn, this means you couldn't safely link code compiled by different compilers. Bjarne might think that's fine and dandy, but C never had that level of damage.
The fact that C programmers pass around explicit lengths kind of illustrates the point the author is trying to make, and that you're arguing with. The allocator stores enough state to make "free" work, but that state is not idiomatically available to C programmers. It would be a code smell if a C programmer pawed around malloc to get the size of a block. If you need to know the size of something in C, you arrange to always know it.
It would be way, way, worse than a smell. It'd be a clear-the-room-this-sh*t-is-baad type of situation. At least in my book, but I can be somewhat delicate in my sensitivities sometimes. :)
And yes, of course it's true that C APIs use explicit lengths because they're nowhere to be found at run-time. We didn't all just miss that and add an extra argument for fun.
Consider the case where you have a struct containing something in addition to an array. If you malloc sizeof the struct, then the array size is likely not to be found anywhere.
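E.g. (a sketch):

    #include <stdlib.h>

    struct packet {
        int  kind;
        char payload[256];   /* the 256 exists only in the type, at compile time */
    };

    int main(void) {
        /* the allocator records, at best, that roughly sizeof(struct packet) bytes are in use;
           nothing at run time says "payload holds 256 chars" */
        struct packet *p = malloc(sizeof *p);
        free(p);
        return 0;
    }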
In an attempt to be pedantic for no reason you have opened yourself to criticism by other pedants ;) The words heap and stack are not even mentioned in the C standard specification. You're conflating the implementation with the language itself. The post was about the language.
I have often wished for this. If you can pass a pointer to free() and it frees the previous allocation, it must "know" how big it was, so why is there no way to ask it? You can obviously stash it yourself at malloc() time in a lookup table, but still.
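The simplest stash-it-yourself version just puts a size header in front of the block rather than keeping a separate lookup table (a sketch, glossing over max-alignment pedantry; the names are made up):

    #include <stdlib.h>

    /* remember the requested size immediately before the bytes handed back to the caller */
    static void *sized_malloc(size_t n) {
        size_t *p = malloc(sizeof(size_t) + n);
        if (!p) return NULL;
        *p = n;
        return p + 1;
    }

    static size_t sized_size(const void *p) { return ((const size_t *)p)[-1]; }

    static void sized_free(void *p) { if (p) free((size_t *)p - 1); }

    int main(void) {
        void *p = sized_malloc(100);
        size_t n = sized_size(p);   /* 100 */
        sized_free(p);
        return n == 100 ? 0 : 1;
    }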
That's a lot of words just to be pedantic. What a heap allocator does in its bookkeeping is not necessarily "storing the length of the array" - and whatever it does isn't sensibly available to the running program. Yes, the size of a static array is known at compile time, but its length is still not available to the running program unless the programmer explicitly arranges for it to be.
I looked into this. It seems that OS X provides malloc_size. [0] I wonder if other *BSDs have it or if it is a XNUism. I can't find an equivalent for glibc, which I find disappointing given how immense glibc is. Even if it is potentially a "shoot yourself in the foot" feature, I can see many judicious uses of it.
And even with such a size query, it is still obviously not something nice and built into the language. But C folks are used to that! =P
malloc_size won't give you the size of the array though, it'll give you the number of bytes malloced. malloc is free to return any number of bytes, so long as it's at least the size requested.
Perfectly cromulent C, and I submit the result of the call is not what you might hope it will be.
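For a concrete flavour of that, a sketch (assuming macOS's malloc_size from <malloc/malloc.h>; the exact number is up to the allocator):

    #include <malloc/malloc.h>   /* macOS-specific */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        void *p = malloc(100);             /* ask for 100 bytes... */
        printf("%zu\n", malloc_size(p));   /* ...and likely get told something bigger, e.g. 112 */
        free(p);
        return 0;
    }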
The article is correct. There is no way in the C language to get the size of the array. Any implementation is, of course, free to build anything extra into the memory allocator, but it is not part of the language. Specifically, you cannot add anything to the array itself, because the memory layout is prescribed: ptr[0] must be the first byte of memory that was allocated, and if you prepended some kind of size table, ptr would be pointing at that instead.
>C is quite different, at a fundamental level, from languages like Java and Python.
I suppose, but functional languages are even more different. From Python, the ascent to Lisp or Haskell is much more difficult conceptually than the descent to C.
The Java language is designed to be compiled to a Java Virtual Machine (JVM) which is a software implementation of a computer that doesn't exist (easy there existential freaks, I'm trying to keep this brief). The VM abstracts away the hardware differences between platforms, in theory allowing the Java language greater freedoms. Simple things like endian-ness are no longer a language issue as the VM says all machines are big endian. Compare this to C where the code is compiled to a native format that is very much tied to the hardware/OS platform.
Python (as most commonly implemented) is also compiled to bytecode that gets executed by a virtual machine. So it's at the same "level" as Java, really.
Java and Python are on the same level in relation to the hardware. They both get compiled to bytecode instructions that get executed by a virtual machine (basically a software program that is pretending to be a processor). The source code for that virtual machine is written in C or C++. In order to execute Java code or Python code, the machine must have the VM installed. But it doesn't matter what type of processor you are running the code on, just as long as it has the VM installed.
A C/C++ compiler takes your source code and then compiles & assembles it to binary machine instructions that are executed directly by the processor. It is not necessary to have a C/C++ compiler installed on a machine in order to execute that binary. But the code must have been compiled to the machine language that type of chip understands. That's why you can't take a program that was compiled for an Intel chip and then go run it on an ARM chip. The different chip won't understand the instructions. So you have to compile the C source separately for each type of target chip.
That's why C is considered more low-level than Java and Python. The output of the compiler is executed directly by your processor hardware, instead of by a VM.
That's because I was responding at a level of detail appropriate for the parent comment:
> I thought Java was on a much lower level than Python. Why is C more low-level?
At that level of understanding, I think it is just fine to assume that Java means Oracle JVM, Python means CPython, and C/C++ means gcc (or your choice of C-to-native compiler). Yes there are many alternative implementations of these languages, but for the most part those are pretty obscure and a beginner does not need to know or care about them just yet.
While I agree with your comment in spirit, I would like to point out that this is how beginners get messed up with implementation and language concepts.
Well, did you read the article? It describes manual memory management, which is probably the most obvious reason C is lower level. It also talks about how strings are stored directly as arrays of characters as opposed to objects wrapping character arrays. Finally, C pointers are closer to how the computer operates (they're basically memory addresses) than anything in Python or Java.
What constitutes a "low level language" is a matter of perspective, however. From the perspective of writing binary instructions by hand, all three languages are "high level." ;)
Manual memory management, mostly. Also, since C is a simpler language, its building blocks map more closely with what actually happens on hardware level (or they used to... back in the 1970's).
Then, there's the fact that you do not have direct access to native OS primitives from Java (i.e. system call vs stdlib function), but it has to do more with the run-time environment than the language itself.
This looks like a pretty good summary. The stylistic way it's written as if about a 'foreign' language sure makes me feel old though. Twenty years ago, this was completely normal. When I went to work, the code was this. This is what there was. It wasn't "low level" or esoteric - just a nice language to feed through the compiler to get executable code.
I went straight from Apple BASIC, FORTH, and 6502 assembler to Turbo Pascal, and C seemed esoteric to me at first, and even unnecessary -- compared to Turbo Pascal that is. Then when DOOM was released compiled with Watcom C, I realized this 32-bit C compiler thing was a big deal.
Some time ago, I did some Free Pascal programming and was surprised that it was much nicer than C. But C won the war because of Unix OSs, not because it's such a good language.
What baffles me is that this is the type of coding young developers associate with C and C++, whereas we had lots of options to choose from, as you well list.
Customers have been demanding books about the WWW, so book stores have been flooded with titles on whichever garbage-collected language is hot for web development at the moment.
Since the dot-com boom, your average novice trying to put food on the table has been unlikely to pick the C title off the shelf when the Java or Ruby book is far more likely to fetch a job in six months. Until recently, the perceived cost of C/C++ tools for mainstream platforms also made the simple Java downloads far more attractive.
Just getting started learning to program myself in 2000, I remember Perl and Java were the things, and PHP a year later. I had to go out of my way later on to learn C.
Yes. In fact in my local Waterstones, where you supposedly pay a little more so that the knowledgeable staff can help you, I recently saw a sign for books: "C, C+, C++".
Yes! It makes me feel like my entire CS education (early 90's) was one long hazing ritual with grad students and professors laughing at us in the backroom...while we're busy trying to fix one-off indirect pointer errors.
valgrind / gdb should have made this painless, especially for small self-contained uni projects :) I remember being a teaching assistant for some C++ courses while still a student, and the "Are you a wizard?" expression on the undergrads' faces when I'd recompile their segfaulting app with debug symbols, run gdb and tell them exactly which line was causing the problem.
Want to have fun learning C? Add Lua to the mix. ;)
All the joy and performance of C - wrapped up in a nice little language that lets you Just Get On With It. Plus, anyone who can sort out putting the LuaVM into a new set of libraries, thus creating a Framework, is one step closer to Developer God, in my opinion .. ;)
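For the curious, the C side of the embedding really is tiny (a minimal sketch against Lua 5.x's C API; link with -llua or your platform's equivalent):

    #include <lua.h>
    #include <lualib.h>
    #include <lauxlib.h>

    int main(void) {
        lua_State *L = luaL_newstate();   /* one VM per lua_State */
        luaL_openlibs(L);                 /* load Lua's standard libraries */
        luaL_dostring(L, "print('hello from Lua, hosted by C')");
        lua_close(L);
        return 0;
    }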
"To a large extent, the answer is: C is that way because reality is that way. C is a low-level language, which means that the way things are done in C is very similar to the way they're done by the computer itself. If you were writing machine code, you'd find that most of the discussion above was just as true as it is in C: strings really are very difficult to handle efficiently (and high-level languages only hide that difficulty, they don't remove it), pointer dereferences are always prone to that kind of problem if you don't either code defensively or avoid making any mistakes, and so on."
Not really, and not quite. A lot of the complexity of C when it comes to handling strings and pointers is the result of not having garbage collection. But it does have malloc()/free(), and that's not really any more fundamental or closer to the machine than a garbage collector. A simple garbage collector isn't really any more complicated than a simple manual heap implementation.
And C's computational model is a vast simplification of "reality." "Reality" is a machine that can do 3-4 instructions and 1-2 loads per clock cycle, with a hierarchical memory structure that has several levels with different sizes and performance characteristics, that can handle requests out of order and uses elaborate protocols for cache coherence on multiprocessor machines. C presents a simple "big array of bytes" memory model that totally abstracts all that complexity. And machines go to great lengths to maintain that fiction.