C really is a fantastic language to learn. When I learned it, I also came from the scripting-language world (Python, PHP). Having that as a base and then learning C led to revelations about how those scripting languages are created, and how the data structures I took for granted in a scripting language are actually implemented.
From what I understand, in formal CS courses you learn from the bottom up, but learning from the top down was much more enlightening for me.
I went in the other direction and it was enlightening too. I learned how useful hash tables can be, and that a word count algorithm need not be a complicated assignment full of segmentation faults. :-)
I also learned how data structure libraries can be separated from user code (when learning C in class we usually created ad-hoc data structures heavily coupled with the calling code).
> I took for granted in a scripting language are actually implemented.
Exactly. Not only that, but I started to appreciate the scripting language (Python) more because I could see how much it does for you. I knew that before, but after actually seeing how one would have to handle reference counting and error handling by hand, it is nice to go back to Python.
I too started with scripting languages, and I also learn in a similar way. Hearing the value that learning C provided you makes the decision to learn it myself a simple one.
To those versed in C, is the tutorial linked in the original article a recommended one?
I personally didn't like Learn C the Hard Way at all. I can see the Learn {language} the Hard Way approach (mostly examples and exercises) working well for languages like Python or Ruby, but for C I felt it was too vague, and each exercise didn't really give me an idea of what I was learning or accomplishing. With the other languages you can get away with focusing on functionality, but I think you really need a more in-depth understanding of C to avoid the headaches involved with the language.
While The C Programming Language by K&R is a little dated, I still think it gives you a better feel for the language, or at the very least for how to navigate the code and find your answers.
Disclaimer: I already knew C fairly well and decided to go through LCTHW to see if I could pick up any hidden gems; it might differ for someone with little to no experience.
C programming is still alive and well in many parts of our profession. The biggest reason why is that even though C++ _can_ do memory management like C, most of the time programs implemented in it don't, and you can really start to lose your grasp of what you're asking the computer to do when you abstract away where you're putting all the stuff you're unknowingly asking it to make.
On the other hand, I find that C++'s ability to add semantics to memory management is extremely valuable in doing the right thing at all times.
unique_ptr<>, scoped_ptr<>, and shared_ptr<> all say different things about heap memory lifetime, and those make it easier to maintain your grasp on what you're asking the computer to do.
Like many things in C++, you can do that in C too (e.g. by using a variant of Hungarian notation). But the important thing is to do it.
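As a rough sketch of what those annotations buy you (illustrative only; the Widget type and the output lines are made up for the example, and it assumes C++14 for make_unique):

    #include <cstdio>
    #include <memory>

    struct Widget {
        int id = 0;
        ~Widget() { std::printf("Widget %d destroyed\n", id); }
    };

    void observe(const std::shared_ptr<Widget>& w) {
        // shared_ptr says "I may keep this alive"; copies bump a reference count.
        std::printf("use_count = %ld\n", w.use_count());
    }

    int main() {
        // unique_ptr: exactly one owner; the Widget dies when `solo` leaves scope.
        std::unique_ptr<Widget> solo = std::make_unique<Widget>();
        solo->id = 1;

        // shared_ptr: reference counted; the Widget dies when the last owner lets go.
        std::shared_ptr<Widget> shared = std::make_shared<Widget>();
        shared->id = 2;
        observe(shared);
    }   // prints "Widget 2 destroyed", then "Widget 1 destroyed"

Roughly speaking, unique_ptr is just a raw pointer plus a destructor, while shared_ptr pays for a reference count, so the annotations cost little to adopt.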
I've been playing w/ Rust this past week, and I've had something of an enlightening experience related to the pointer semantics you describe.
Rust's memory management works similarly to what you describe in C++[0], but things like unique_ptr, et al. are baked into the syntax and compiler. -- It's basically syntactic sugar for memory semantics.
Coming from a background of GC'd languages (Java, Go, and Ruby) these different allocation semantics were Very Hard(TM) for me to understand at first.
I kept battling w/ the type checker, throwing different pointer types at it until the example compiled. (Rust's compiler is _very_ intelligent, w/o using "unsafe" code, you basically cannot have a "use after free" condition in Rust. -- It will not compile.)
As the parent points out: I've never really spent any length of time _understanding_ the allocations I'm asking a computer to make. I just generate things, they become garbage, and the runtime cleans it up eventually.
Once Rust "clicked" for me, I realized something: I know the lifetime of _every single variable_ in this piece of code.
I know when it's allocated, where it's allocated, and when it gets freed.
The reason it was so hard for me to initially grok Rust was because I'm _not used to reasoning_ about memory allocation.
Once I understood the semantics though, it was as though I had an _entirely new_ level of abstraction at my fingertips.
These semantics are definitely some powerful stuff, and I'm very glad I'm taking the time to try and understand them.
Or you can do what C++ and Rust did, which is neither reference counting nor GC (yes, I know Rust has a GC; just stay with me for a little). It's a strong and distinct message about what you plan on doing with the memory you're getting. When I run across an int* I don't know what you're doing with it just by seeing its type, but if I run across a shared_ptr<int> I know it's reference counted and will stick around until I'm done with it, and if I'm the last one holding it, it will be freed. In the same manner, if I see a unique_ptr<int> I know that if I need it I must take it from someplace, and the compiler will be kind enough to tell me if I messed that up somehow.
shared_ptr probably does use reference counting, but unique_ptr doesn't.
There's literally only ever one "owner" of a unique_ptr, which is why the parent comment talked about taking it from somewhere.
C++ exchanges ownership of unique_ptrs in the move constructor or move assignment operator (i.e. there's no "copy" operation available, unlike with std::auto_ptr). If a unique_ptr still happens to own a pointer when it is destructed, it knows to destroy the pointed-to object as well and free the memory, but the object itself is not reference-counted per se; there's only the Boolean sense "alive/dead".
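A tiny sketch of that hand-off (C++11 or later):

    #include <memory>
    #include <utility>

    int main() {
        std::unique_ptr<int> a(new int(42));

        // std::unique_ptr<int> b = a;         // won't compile: unique_ptr has no copy
        std::unique_ptr<int> b = std::move(a); // ownership transferred; `a` is now empty

        // `b` is the sole owner; when it goes out of scope, the int is deleted exactly once.
        return b ? *b : 0;
    }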
The terminology may not be completely accurate (although I did try very hard to not say anything incorrect), but I honestly think it does a good job of showing you how to actually use pointers. I know the resident C experts will probably find something wrong, but I've received tons of comments and emails from people saying they finally "get it". It's something I'm proud of.
I never really got why people found pointers to be complicated. I suck at writing C code, but pointers are something I understood pretty easily. It's in the name, really: it's something that points to (the location of) something else.
I doubt that anyone struggles to understand pointers; what they/we struggle to understand, in some cases, is the implementation of pointers in C. If you want to explain pointers to people from an object-oriented programming background, you might start by comparing pointers to "by reference" and "by value".
My issue has always been why I would care about the memory address of a variable, rather than getting the value as the default behavior, but that might be why I suck at C.
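One concrete answer to "why would I care about the address": C passes arguments by value, so a function can only modify the caller's variable if you hand it that variable's address. A minimal sketch (hypothetical swap functions):

    #include <stdio.h>

    /* Receives copies of a and b; swapping the copies changes nothing the caller sees. */
    void swap_by_value(int a, int b)     { int t = a; a = b; b = t; }

    /* Receives the addresses, so it can reach back and change the caller's variables. */
    void swap_by_pointer(int *a, int *b) { int t = *a; *a = *b; *b = t; }

    int main(void) {
        int x = 1, y = 2;
        swap_by_value(x, y);
        printf("%d %d\n", x, y);   /* still "1 2" */
        swap_by_pointer(&x, &y);
        printf("%d %d\n", x, y);   /* now "2 1" */
        return 0;
    }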
Just because you understood it great the first time doesn't mean everyone is like you. I was turned off to C for a long time due to how awful my teachers were at explaining the concept. My first time being exposed to pointers was in a binary tree traversing algorithm handout given out by my teacher - it was just way too much at once.
I just think it's weird that pointers are made out to be so complicated, while the part I don't get, which is header files, gets no attention. Header files might be the sole reason why I didn't care for C. I don't understand them and they are never properly explained.
> I don't understand them and they are never properly explained.
There is nothing to get. They're just text, inserted wherever you told the preprocessor to insert that text, exactly as if you had typed the whole lot in yourself. What you choose to do with this ability is up to you. Many people choose to use it to avoid having to copy the same text over and over again into other files.
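For instance (made-up file name): if util.h contains nothing but

    int add(int a, int b);

then a source file that starts with `#include "util.h"` compiles exactly as if that declaration had been typed at its top; the preprocessor pastes the text in before the compiler proper ever sees it (you can look at the result with `gcc -E file.c`).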
You are not alone. I picked up enough C syntax to write trivial programs like the "guess the number" game in it around age 13 or so, and I already understood the concept of a pointer or memory address from reading about (but not actually practicing until I was older) assembly language and old game consoles. The questions "what the hell are headers, how am I supposed to separate a program into multiple files, how am I supposed to structure my program?" kept me from even trying anything in C until I was 17, and even then it took months of fiddling/guesswork and finally stumbling across Learn C The Hard Way before I had any idea of what I was supposed to be doing instead of throwing everything in a single giant source file. I'm still rarely sure about where I'm supposed to draw the line, and languages like C that use headers to provide and control interface and visibility make it such a pain in the ass to refactor that I err on the side of large source files.
Since you seem to be asking, headers in C are mainly used to provide interfaces to functions and datatypes defined in other source files. It's basically C's kludgy way of marking things as public or private, and choosing what external functions a source file can see.
For each source file other than the main one, you write a separate header file that includes declarations of the functions in it (just `int dostuff(int arg);` without the actual implementation in curly brackets), and definitions of the publicly visible datatypes used in it (so your struct definitions usually go in the header, not the source file). You control whether something is visible externally based on whether you write a declaration or definition for it in the header, and you control which functions are visible to each source file based on which headers you include.
In a separate source file, you include the header of every source file that contains functions you want to use. Now, the compiler knows the signature of those functions, and when you write `some_function_in_another_file();` you can think of it as "setting aside space" for the function call, even if the compiler has no idea what code is actually contained within it. It just sets up the parameters it is going to pass to it, and relies on the assumption that the function will follow the rules and return a value in the right place that it can go on to use.
EDIT: I should add that the whole purpose of making the function declarations visible is so that the compiler knows how to arrange the stack. It can't know what kind of code to generate unless it has some idea of what arguments the function will be passed, and what the state of the stack will be after the function is called. I'll also add that it's important to understand the difference between definitions and declarations.
All source files are compiled separately into object files. When the executable is created, a program called a linker ties up the loose ends; for example, it would decide on an address to store `some_function_in_another_file()` at, and replace the "function call stub" I mentioned earlier with the code to actually call this function.
Libraries will generally use includes to generate a "super header" that a user can include with a single include statement to get access to every function and data type in the library. In your own code, however, it's usually better to explicitly include only the particular headers that a source file uses, only generating "super headers" for large internal modules that will be treated a bit like external libraries by the rest of your program. For example, if you were making a game, and you wrote all of your own code to draw graphics in software, it would probably be smart to make a "graphics.h" that includes all of the headers in the graphics module, so that other code can access all of it with a single include. Within the graphics module, however, you would only explicitly include the headers of the particular source files that each file needs to access. At least, that's how I would do it.
You should also include each source file's header in the source file itself; if you wrote your header correctly, it will work just fine, and you can use this to catch discrepancies between the source file and the header early on.
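A bare-bones sketch of that layout, with made-up file and function names (the #ifndef include guard is standard boilerplate to stop a header from being pasted in twice):

    /* point.h -- the interface: declarations and shared types */
    #ifndef POINT_H
    #define POINT_H

    struct point { int x, y; };

    int point_dist2(struct point a, struct point b);   /* declaration only, no body */

    #endif

    /* point.c -- the implementation; it includes its own header to catch mismatches early */
    #include "point.h"

    int point_dist2(struct point a, struct point b) {
        int dx = a.x - b.x, dy = a.y - b.y;
        return dx * dx + dy * dy;
    }

    /* main.c -- a user of the module; including point.h makes the signature visible */
    #include <stdio.h>
    #include "point.h"

    int main(void) {
        struct point a = {0, 0}, b = {3, 4};
        printf("%d\n", point_dist2(a, b));   /* prints 25 */
        return 0;
    }

Each source file is compiled separately (`gcc -c point.c`, `gcc -c main.c`), and the linker step (`gcc point.o main.o -o demo`) is what resolves the call in main.c to the code in point.o.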
In an ideal world, there would be no need for this. We'd have a smarter language that allowed you to just mark each datatype or function as public or private in the source file, and instead of including a header, you'd import from a source file, or import from a namespace or something. That's just the way C is, unfortunately.
If that didn't help, read Learn C The Hard Way. Opinions on it differ, but it really helped me get an understanding of problems like header files that were forming the real and frustrating roadblocks in the way of doing anything interesting in C.
>I never really got why people found pointers to be complicated. I suck at writing C code, but pointers are something I understood pretty easily. It in the name really, it's something that points to (the location of) something else.
That's the easy part. Not many people (if any) struggle to understand that. It's all the consequences of that, and the way it interplays with various C features, that get people confused.
> I never really got why people found pointers to be complicated
I find that it's often badly taught, using truly awful analogies that don't help and are often actively unhelpful as soon as you step outside the very narrow boundaries of the analogy. A simple and easy thing, badly taught, becomes difficult.
I'm curious what exactly it is about pointers that makes them so hard to understand.
Many new programmers seem to warm up to Java "references" just fine, and they have mostly the same semantics as C pointers.
Maybe it's because passing by value is conceptually harder to grasp than passing by reference, and in C you must understand both in order to use pointers?
I have never understood what was so confusing about the concept of pointers. What I can see as confusing though is the way C's syntax works with declaring multiple levels of pointer indirection with arrays mixed in, although that can be mitigated with the use of typedefs.
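For example (a made-up declaration), something like

    char *(*handlers[8])(int, void *);

is an array of 8 pointers to functions that take an int and a void * and return a char *, and a typedef makes it much easier on the eyes:

    typedef char *(*handler_fn)(int, void *);
    handler_fn handlers[8];    /* same thing */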
You may be surprised how much a bad professor or tutorial can derail someone. Almost all concepts seems "obvious" or, HN's favorite word, "trivial" once you know it! But getting to the "a-ha" moment can take varying amounts of time depending on the quality of the presentation of the material.
Well now I feel old. When I was in high school, CS was a required course for graduation and it was taught in C++ (that being the pedagogical language of the APCS exam). It wasn't even that long ago (late 1990's).
Here's another surprising (but fully understandable) perspective given in the OP:
"Function pointers — It was interesting to me to see that such a low-level language contained functional programming concepts."
If you started from assembly and worked up, this looks very different, because at that very low level, functions and data look the same too. (Just like Lisp).
The punchline: "Since calls to known procedures are just gotos that pass arguments, lifted lambda expressions are just assembly language labels that have been augmented by a list of symbolic names for the registers that are live at that label."
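In C terms, that "functional" flavor mostly shows up as passing function pointers around. A small sketch (hypothetical names), a bare-bones "map" over an array:

    #include <stdio.h>

    /* Apply f to every element of a, in place. */
    static void map_in_place(int *a, int n, int (*f)(int)) {
        for (int i = 0; i < n; i++)
            a[i] = f(a[i]);
    }

    static int double_it(int x) { return 2 * x; }
    static int negate(int x)    { return -x; }

    int main(void) {
        int a[] = {1, 2, 3, 4};
        map_in_place(a, 4, double_it);
        map_in_place(a, 4, negate);
        for (int i = 0; i < 4; i++)
            printf("%d ", a[i]);   /* -2 -4 -6 -8 */
        printf("\n");
        return 0;
    }

The standard library's qsort works the same way: you hand it a pointer to your comparison function.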
How is it that assembly has code as data just like Lisp? Is it possible to create executable code during runtime, change the language's syntax, and do stuff like that, using assembly?
Assembly language is just a high level description of machine code: each assembler statement maps to one machine instruction. Those machine instructions are nothing but sequences of bytes. It is certainly possible to create executable code during runtime; you just write the appropriate bytes into a buffer, mark the buffer as executable, and jump in.
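A minimal sketch of that "write bytes, mark executable, jump in" sequence on Linux/x86-64 (the six bytes are the machine code for `mov eax, 42; ret`; error checking is omitted, and hardened systems may refuse the mapping):

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void) {
        /* mov eax, 42 ; ret  -- returns 42 in the usual x86-64 calling convention */
        unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

        /* Get a writable page, copy the bytes in, then flip it to read+execute. */
        void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        memcpy(buf, code, sizeof code);
        mprotect(buf, 4096, PROT_READ | PROT_EXEC);

        int (*fn)(void) = (int (*)(void))buf;   /* "jump in" */
        printf("generated code returned %d\n", fn());
        return 0;
    }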
Back in the old days, before memory protection, we used to do all kinds of code-as-data-as-code tricks. I remember dynamically generating trampoline functions that effectively did partial application. I'd write a little stub of assembly code that would push a bunch of literal zeros onto a stack and then jump to zero; when I wanted to use it, I'd copy it, poke actual values into the literals, and then pass it along as a zero-parameters function pointer.
You don't change the language syntax; that doesn't really make sense in assembly world. It's more that assembly code lives in a world of bytes and pointers, and is itself very directly composed of bytes and pointers, and so it's as easy to use the same bytes-and-pointers concepts on code as it is on data.
This style still exists on microcontrollers, where there is generally no operating system. Instead, your compiler produces an image which you flash onto the controller chip. The controller has some startup sequence which jumps to some address in flash and runs. The chip's subsystems are controlled by registers located at specific addresses; in order to drive such a device you have to know how its address space is laid out, where your code is placed in it, and how your code interacts with the different types of memory and registers located in the address space.
Of course, I did not say assembly is just like Lisp in every respect.
But as long as we're picking nits --
It is of course completely possible to create executable code during runtime, using assembler. Often not advisable. (Recall the punch line of: http://www.cs.utah.edu/~elb/folklore/mel.html)
Reading the story at that link made me feel envious :)
I admire those who really understand how it all works under the hood.
Although I tried to learn assembly on my own a long time ago, I never went much further than reading some input and printing it to the screen. It seems it is very specialized knowledge nowadays.
The best way to learn assembler is by writing an assembler, which you can do in pretty much any language you're comfortable with (https://github.com/rayiner/amd64-asm) with just a manual (http://support.amd.com/us/Processor_TechDocs/24594_APM_v3.pd...) . I think x86 is really not a bad pedagogical tool. For all the crap it gets, it's really a fairly clean architecture. And while we think of 32-bit and 64-bit extensions as having "piled on cruft" what's really happened is that they have made the architecture conceptually cleaner and more orthogonal.
To be clear, even on architectures that allow it (or OS environments: the NX bit on modern x86 boxes means that you can't write to code space without mmap'ing the region yourself), it's a Very Bad Idea. Code and data use different L1 caches (on Sandy/Ivy Bridge, there's even a still lower level "uop" cache for code!), and keeping them coherent after self-modifying code has run is (1) extraordinarily difficult to get right and (2) hugely slow, much slower than simply indirecting on data in the first place.
Literally, yes. It's arguably much more readily apparent that code is just data when programming in assembly than in any other environment. It's all bytes, and you're forced to accept that from day 1.
You aren't going to have the expressiveness that a Lisp provides available to you, but self-modifying code and macro assemblers are common in some assembly programming scenes. Anything to save a few bytes or cycles. I'm a little rusty, but here's a basic example off the top of my head (sorry, wall of text incoming):
Functional programmers are familiar with the concept of "map," an operation that applies a function to every element in a list or what have you. Let's say I'm programming in 6502 assembly, and I have a little function that adds an amount to every byte in a page (or every byte in some particular 256 bytes). Let's say you also want a similar function that instead multiplies each byte by two (just a single left shift), or masks some bits off with a bitwise AND, or whatever. It would look something like this:
store zero in X register, add constant to the memory at the address (some constant address + value in X register), increment X, branch back to the adding part if the zero flag is not set (ie, loop until X overflows to 0), then return to wherever we called this function from.
You could write a handful of these almost identical subroutines, with the only real difference being the single opcode that reads, modifies, and rewrites each byte... or you could just rewrite the opcode at runtime!
Now, your add-to-page, multiply-page, mask-out-page, and any similar functions all share a little block of code that you can think of as "their map," and the actual functions you call initially could look something like this:
Write the constant for the relevant opcode to the instruction in map you want to replace ($7D for ADC absolute,X on the 6502), and jump to the map subroutine.
That's a bit of a contrived example, but in a routine with a more complicated access pattern, it might really make a difference in byte savings: Imagine a routine that clips offscreen entities in a game, that calculates something like "if their X coordinate is below or above some value, they disappear for now. If their Y coordinate is below some value, they fell into a pit and died." You could reuse a lot of the general logic for checking the left edge of the screen for the other two edges, just by rewriting a constant and instruction or two.
A perhaps more readily useful example is rewriting "constant" addresses. How would we rewrite our above "map" routine to modify arbitrary pages, and not just one in particular? We could store the address of the page we want to modify in memory, and use an indirect addressing mode for add. Indirect addressing modes of instructions do something like this: Load two bytes from some address in memory, and treat that as the address we want to operate on. Problem is, this indirect address mode adds two cycles to every operation we perform when compared to the original constant-address map. Suddenly, we're wasting over 500 cycles per map!
The solution is instead to treat the "constant" address in the instruction stream of the map routine as your address variable. Just rewrite those two bytes during your "function prologue," and voila, you have a general-purpose map routine that only uses half-a-dozen cycles or so more than one that only worked on a certain page.
The part I've been leaving out is using assembler macros to automate a lot of this stuff for you. Again, I'm rusty, and I never got all that experienced in writing macros, but someone could very easily write themselves a macro (if they're using a powerful macro assembler, like ca65) that takes a single argument, the instruction you want to execute in your map, and generates that stub subroutine that replaces the opcode for them. I found simpler macros to be more useful in everyday code, though: you make a macro that fills in a gap in the 6502's instruction set, like performing arithmetic between the accumulator and index registers, or basic 16-bit arithmetic, and from then on, you can pretend that the CPU had those instructions all along.
I may have only used these techniques for shaving bytes and cycles off of straightforward routines, but in some ways, going from writing 6502 assembler to writing C, Lua, and JavaScript actually feels like a step back in terms of expressiveness, even if they are certainly more productive languages in actuality, and first-class functions/function pointers cover much of the most practical (and least dangerous) use-cases for self-modifying code. I suppose I won't get that feeling back until I set some time aside to really learn a Lisp.
I took the AP CS exam in 1992 or so. My CS classes in high school were in Pascal, which is what the exam was in. I always thought the syntax in Pascal was a little easier to understand than C.
Variables are declared like this:
name : type;
for example, here is an integer i and a pointer to an integer p:
i : integer;
p : ^integer;
^ reads like "pointer to" here.
If you want to point p to i's address:
p := @i;
@ reads like "address of" here. "at" also makes sense. By the way := is the assignment operator.
And if you want to assign a value to what a pointer points to (in other words, dereference it), you can:
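p^ := 42;
^ after the variable reads like "the thing p points to" here.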
I don't know when it switched, but my APCS exam in 2008 was Java. The sample program we had to learn for the exam was also in Java, so I assume that was the default language.
I learned C as my first language. I haven't used it for any real-world work, but it was a fun way to get into programming. K&R taught me not just C but how to program; then I read TUPE (The Unix Programming Environment), which taught me how to use the best development environment, Unix, effectively.
"For Windows users I'll show you how to get a basic Ubuntu Linux system up and running in a virtual machine so that you can still do all of my exercises, but avoid all the painful Windows installation problems."
A bootable Ubuntu USB stick with persistent storage might be easier to get started with? Can be created from a Live CD.
I dual booted for about a year, and then I used virtual machines instead for the last year. Using virtual machines has been a LOT easier. I use Lubuntu on Virtualbox, and I don't plan on going back.
Things that annoy me about Ubuntu dual boot:
* My Wacom tablet is terribly hard to configure on Ubuntu.
* Flash support seems to be hit or miss in terms of out-of-the-box functionality. Sometimes it works right when you install Chromium, sometimes it doesn't.
* Windows' weird partitioning. Windows will only let you have 1 or 2 of these special types of partitions on your HDD, and I somehow maxed out that partition count and couldn't fix it. It was very difficult to work with.
I was thinking more of a group of people meeting to work through the Learn C the hard way book, perhaps in a coffee bar or something, bringing their own laptops. A bootable stick might make a C environment more accessible to them.
The puredyne project produced a bootable Linux specifically to provide a common environment for training in audio production. Seemed to work for them.
If you bring a copy of the ISO, you can have them all start from the same VirtualBox install (it takes 5 minutes to set up without knowing the software, plus Ubuntu installation time).
But I've heard Vagrant is good for use cases like yours, though I haven't personally used it.
That sounds like a fun activity by the way, have fun with that :)
I have 28" monitors, so I had to do some command line configuration to make the tablet respond. It may feel like the appropriate sensitivity out of the box on a smaller screen. There was also no GUI configuration last time I checked.
Re Valgrind: Last time I worked through CTHW, I couldn't use Valgrind because of missing OS X 10.8 support and I felt like I was missing a lot by not using it. I'm tempted to go back and work through it again, this time with a VM.
Spending some time learning C, ML and Scheme is well worth it. Virtually all modern languages derive from these three languages - and all 3 of them are internally consistent enough to be beautiful.
I know learning C might seem a bit of a waste of time, but given Obj-C, C++, Python and Ruby can leverage it for performance it's not a bad thing to learn.
Objective C and C++ can leverage it for performance? Objective-C is a strict superset of C and C++ a near superset. If you learnt Objective C or C++ properly, you already know C (!).
While your statement is technically correct, in practice C is a very different beast from C++/Obj-C. For example, in C dealing with global state is an important problem, while one would just encapsulate the state in an object in C++. So someone who learns just one of the supersets can easily miss all the details which the additional features of C++/Obj-C hide.
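A toy illustration of that difference (hypothetical counter example; the C++ half assumes C++11): in C the state is either a global or a struct you thread through every call by hand, while C++ lets the object carry its own state.

    /* C style: state travels explicitly as a struct pointer (or lives in a global). */
    struct counter { int value; };
    void counter_increment(struct counter *c) { c->value++; }

    // C++ style: the same state and behavior bundled into one object.
    class Counter {
    public:
        void increment() { ++value_; }
        int value() const { return value_; }
    private:
        int value_ = 0;
    };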
I am a diehard fan of C, and even I think that C++ is preferable past a certain size of program. Classes and templates are far too powerful not to use; just consider things like `scoped_ptr`. Just because C++ has nasty parts doesn't mean you have to use them; in my experience, most of my worst experiences with C++ come from reading other people's code (Boost.Spirit, I'm looking at you).
I'm going to be a contrarian and say that I've just never found a program that was more readable with classes. Classes tend to rip your algorithms to shreds and spread them over dozens of different files, which is really painful.
I've been reading through the code in libfirm lately and it's incredibly refreshing. A pass is usually in a single file (how novel!) instead of being scattered around because of the visitor pattern.
Oh, well the dot syntax is just, well, syntactic sugar, and arguing that THAT is the reason anyone (who knows what they are doing) uses C++ is silly. It's because of inheritance and DRY so you write less code.
Right, and my point is that I've never seen a situation where inheritance really makes things dramatically clearer versus the alternatives (indeed, the mantra in C++ these days is that you should have "has-a" relationships with composition versus "is-a" relationships with inheritance).
I think inheritance/OOP encourages you to draw module boundaries in the wrong place. It encourages you to spread the logic of an object over several modules of code. If you find yourself using inheritance, you'd better ask yourself: could I instead abstract my base class into a self-contained object that my new object could hold a reference to?
I used to be super into using Boost, etc. I found that it became really unmaintainable (though compiler errors might be better now). My personal style is basically C with STL containers.
Who on earth thinks it's a waste of time? I think it's essential for every programmer. I certainly wouldn't consider someone fluent in the language (say, Python) until they understand how it is implemented.
Also, C++ is arguably faster than C (for the effort put in) and Objective C is a strict superset. Saying either calls into C for performance reasons is pretty silly and doesn't make any sense.