
Scalable C – Writing Large-Scale Distributed C - ingve
https://hintjens.gitbooks.io/scalable-c/content/preface.html
======
jasode
_> The more C++ you know the worse you become at working with others. First,
because your particular dialects of C++ tend to isolate you._

So why is using something non-standard and non-universal like <czmq.h> not
perceived as a "dialect" of C?[1] Any non-trivial source codebase beyond _"
printf("hello world")"_ will be a "dialect" of the programmer(s). When looking
at the C source files of Linux kernel, Redis, and SQLite, etc, the syntax
patterns, helper macros, string manipulations, etc do not look the same.

Also the author's example of,

    
    
      for (i = List.begin (); i != List.end (); ++i) 
        cout << *i << " ";
    

is not the same semantics as:

    
    
      char *fruit = (char *) zlist_first (list);
      while (fruit) {
          printf ("%s ", fruit);
          fruit = (char *) zlist_next (list);
      }
    

The C++ loop is multithread safe. The C version is not. For the zlist_next()
to work, the "list" data structure needs to maintain mutable state in between
subsequent calls. (Think of how something like strtok() works by mutating the
string).
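
To make the mutable-state point concrete, here is a minimal sketch in C. It does not use CZMQ: `list_t`, `list_first`, `list_next` and `list_begin` are hypothetical names for a toy list, assumed only for illustration. It shows that when the iteration position lives inside the list, even a nested traversal in a single thread interferes with itself, whereas a caller-held position does not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy list; NOT CZMQ's implementation. */
typedef struct node {
    const char *value;
    struct node *next;
} node_t;

typedef struct {
    node_t *head;
    node_t *cursor;   /* iteration position stored inside the list */
} list_t;

/* Cursor-style API: every call mutates list->cursor. */
const char *list_first(list_t *self) {
    assert(self != NULL);
    self->cursor = self->head;
    return self->cursor ? self->cursor->value : NULL;
}

const char *list_next(list_t *self) {
    assert(self != NULL);
    if (self->cursor)
        self->cursor = self->cursor->next;
    return self->cursor ? self->cursor->value : NULL;
}

/* Iterator-style API: the position is held by the caller, list is const. */
const node_t *list_begin(const list_t *self) {
    assert(self != NULL);
    return self->head;
}

/* Nested traversal with caller-held positions: visits n * n pairs. */
int count_pairs_external(const list_t *list) {
    int pairs = 0;
    for (const node_t *a = list_begin(list); a; a = a->next)
        for (const node_t *b = list_begin(list); b; b = b->next)
            pairs++;
    return pairs;
}

/* Same traversal through the shared cursor: the inner loop runs the
   cursor off the end, so the outer loop stops after one pass. */
int count_pairs_cursor(list_t *list) {
    int pairs = 0;
    for (const char *a = list_first(list); a; a = list_next(list))
        for (const char *b = list_first(list); b; b = list_next(list))
            pairs++;
    return pairs;
}
```

With a three-element list, the caller-held version counts 9 pairs and the cursor version counts 3. Two threads iterating the same cursor-style list would clash in the same way, on top of the data race on `cursor`.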

[1]possibly because Pieter Hintjens is the programmer & CEO behind ZeroMQ and
czmq.h. Therefore, it doesn't feel like a dialect to him?

~~~
PieterH
CZMQ is a library of around 30K lines of code. The Scalable C book is in many
ways a guide to using that library in real projects.

Of course every programmer develops their own dialects, just as every writer
has their own voice. C is however a small language and if you stay away from
weird macro magic, any well-written C code is mutually intelligible to other C
programmers. Whereas with C++, dialects can have so little overlap they are
not mutually intelligible.

FWIW neither of the fragments is safe if you are sharing state between
threads. In practice we do _not_ share objects between threads, and each
object holds its own state, and thus our C code is 100% thread safe and
reentrant.

~~~
jasode
_> The Scalable C book is in many ways a guide to using that library in real
projects._

I'm not criticizing the book or czmq. I'm sure it's a fine library. I just
found your characterizations of C++ to be strange.

For example: " _> , any well-written C code is mutually intelligible to other
C programmers._"

If you're going to qualify C code with " _well-written_ " to help make your
point then can't the same qualification be applied to C++? If so, it means you
believe that there is " _well-written C++ code_ " that is simultaneously
unintelligible to other programmers. In your opinion, what would be an example
of that? (E.g. If you can point to github of unintelligible but well-written
C++ source code.)

And your other statement: _> , because your particular dialects of C++ tend to
isolate you._

I don't know if the CryEngine and Unreal game engines are well-written C++ but
they seem to attract developers. There's also the scuba diving app[1] written in
C++ that also has Linus (who you already know hates C++) modifying the source
code. I contend there are bigger factors than C++ dialects causing isolation.

The C++ _ABI_ may be more isolating than the C _ABI_ because of C++ name
mangling incompatibilities. But I don't see how the C++ dialects (of well-
written code) are isolating.

[1][https://github.com/torvalds/subsurface](https://github.com/torvalds/subsurface)

~~~
PieterH
Since you ask, I'll admit it: I enjoy trolling C++ users because the language
has so often thought of itself as superior to C. Beating on the language
always gets lots of discussion going, which is fun. I've nothing against the
language, or any other language, as such. Good tools, in the hands of good
developers.

~~~
unscaled
I find it the other way around. It's usually C hackers who tend to describe
C++ as the devil incarnate. C++ developers tend to be more pragmatic - we'd
happily write in C if necessary, we just find it limited. And since C++ is
almost a superset of C, there's rarely any need to use plain C.

------
SFjulie1
Exactly the kind of code I make.

I use C for performance... as an extension for Python, which has a GC and a
GIL. But more often than not, I first use numpy (fortran) because it is
dazzlingly fast and has specialized digital signal processing tricks available
(ifft). And I do C after the profiler says to do so. When needed. If needed.

Most of the dynamic data structure (message sent with zmq) the config, the
parsing are better handled in python.

And since I write as many sloc / day in C as in Python ...

(1 python loc = 6 C loc)

I code 6 times faster.

And I don't have the headaches of dependency management.

I am totally okay with C, but when doing distributed systems, more often than
not you are also multi-threading. And C has no built-in thread safety, so it
is harder.

So I C some masochisms at work here.

And I am pretty sure I am not the only coder thinking this book is full of
pedantry and of advice not to be followed, and that it is empty.

How did such a poor story make it to the top?

~~~
vidarh
> more often than not you are also multi-threading

This is a Windows-ism that's crept into Unix-likes over the years, and it's
not a good trend. Just don't do it. Especially since sharing direct access to
state makes it harder to decouple components to scale them further. This book
specifically argues to pass state via IPC for a reason.

> And I am pretty sure I am not the only coder thinking this book is full of
> pedantry and of advice not to be followed, and that it is empty.

Maybe, but iMatix has been a successful company for more than two decades. It
delivered impressive open source applications (e.g. Xitami web server),
code-generation tools (Libero etc.) and cross-platform utility libraries
for C (SFL etc.) already two decades ago, and has gone on to develop
large-scale C-based distributed systems, in the process developing things like
AMQ and 0MQ that have been incredibly successful. So I for one tend to at least
pay careful attention, as they actually do have a track record.

~~~
SFjulie1
I know a lot of successful companies with products that are notoriously poorly
coded:

- "security software";
- "game industry";
- "car industry";

And if you read correctly I think the biggest issue is TIME is MONEY.

What are the advantages of using techniques that:

- are expensive (productivity is constant in sloc whatever the language);

- are notoriously a systemic risk given their domain of work and are hard to
audit;

- deliver what can be smoothly achieved by upgrading a faster-to-build
architecture written in a scripting language;

They may have bigger balls of steel than me doing it in C and be the best
programmers in the world. My question is business oriented: what is the
economic rationale of a full C solution from the start?

~~~
ArkyBeagle
Beats me.

I know that if I had a large enough project and had to add people, I have a
dozen people in my Rolodex who speak 'C' at the expert level and that they
will perform. But depending on the problem domain, that might mean C++ or it
might mean Python or something else.

But given the level of hostility the language inspires, I have to wonder. To
wit "bigger balls of steel" and "best programmers in the world." Both
sentiments are quite foreign to the sorts of environments I've worked in, I
assure you.

~~~
SFjulie1
Sorry for you then. Living in such a boring world, and what a disdainful
answer that makes my point.

I have had my share of conferences, technical or about FOSS. And I met people
with a _lot of_ passion ... and code delivered. You probably use their
software daily.

I have been using more than 13 languages ranging from C to forth, matlab, vhdl,
spice, python, perl and php.

There are definitely cultures and beliefs associated with languages.

The Perl community thinks coding is like speaking/writing a foreign language;

Ruby coders think your IQ and technical skills are totally correlated with how
nice your Apple laptop looks and how expensive it is. They are our hipsters
(troll);

Python secretly hides a sect hating braces and everything that looks like C,
believing C coders can't get thread safety, malloc, or string handling right.
And they hate braces.

For C++ coders, referring to Linus Torvalds' rant would be the spirit.

Java coders believe in the ultimate safe portable VM and the power of GoF. And
think people look the wrong way;

Haskell coders think of themselves as alchemists loving to use obscure terms
coined by a hallucinating metaphysician priest who said ET must exist and that
no one will notice. They still laugh at their ultimate joke;

And C coders think that only they are pure programmers, the only ones who
can see the matrix between the purity of abstraction and the indeterminacy
of hardware/norms due to the imperfection of humans. But, with their
discipline that is above the norm (no noob accepted), they can fight the God of
Entropy;

FORTRAN coders think that computers are a pain, would just like to have
exact figures much more than a nice-looking interface, and wonder when a correct
intuitive language will appear (<--- My sect)

They appear maybe because for each language comes a practical field of use and
that one computer language cannot fill all the needs.

The need for correction and exactitude in science conflicts with the "ease of
use" of numbers.

The need for having cheap workforce conflicts with efficient cheap to maintain
code;

The need for preventing embezzlement (origin of SQL) conflicts with creative
accountability;

At some point, in my opinion, C is like a medieval corporation, trying to
promote one best way of CS that always boils down to C.

The C community may be "professional" as opposed to "enthusiast". But I think
it does not always serve them.

And I do not think that Computer Science is a peaceful uniform land. It is an
arena full of organic entities, each with its own distinct logic, in conflict
for the same resources.

In short, I have the right to mock other cultures.

~~~
ArkyBeagle
Nicely put. Very nicely put.

I don't know how you came up with "disdainful"; it's more sort of sad and
weary as I read it now. After all, I started with "Beats me" - such a
decision would have to be very local. The first rule of 'C' is "don't use 'C'"
these days... the people I know _DON'T_ swagger; that was my point.

The "professionals" vs. "enthusiasts" divide is extremely interesting in all
fields of endeavor. I'm definitely on the "professional" side.

I... don't think 'C' programmers are "above any norm"; they just sort of know
where the rocks are right under the surface of the water. It's more difficult
to explain than to do. If a bunch of people misrepresent themselves as ...
badass because they sling 'C', I can't help that. The appropriate mentality
for it is one of caution. I specifically called that out here...

It also matters less because coding a system is roughly 5-10% of the actual
cost of most deployed systems. Language matters much less than mechanism.

Meanwhile, the worst horrors are inflicted using systems like SAP.

Don't feel _too_ sorry for me; I use at least three language systems every
day, and have messed with ... dozens (all resulting in deployed code at some
level), including graphical CASE tools.

------
halayli
This looks like it's coming from someone who doesn't know C++ well and is just
coming up with reasons to fit their bias. The fact that he/she didn't mention
any disadvantage to the C code written besides verbosity makes it clear.

For one, it's easy to forget to call zlist_destroy. Who owns what in C can get
very complicated and you can run into dangling pointers. At least in the C++
version you can manage ownership easier in their case.

I am not defending one language over the other, I use them both and have
experienced the advantages and disadvantages of each.

What's being shown in this book is not how you typically create linked lists.
man queue(3) to see how it's generally done.
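
For readers without the man page at hand, a minimal sketch of the queue(3) style follows. It assumes a system that ships `<sys/queue.h>` (glibc and the BSDs do, but it is not part of ISO C or POSIX). The `fruit` struct and the `fruit_*` helpers are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>   /* BSD-derived macros; not ISO C or POSIX */

/* The link fields live inside the element itself (an intrusive list). */
struct fruit {
    const char *name;
    TAILQ_ENTRY(fruit) entries;
};

/* Declares struct fruit_list, the head of a tail queue of struct fruit. */
TAILQ_HEAD(fruit_list, fruit);

/* Append one element; returns 0 on success, -1 if allocation fails. */
int fruit_add(struct fruit_list *list, const char *name) {
    assert(list != NULL && name != NULL);
    struct fruit *f = malloc(sizeof *f);
    if (f == NULL)
        return -1;
    f->name = name;
    TAILQ_INSERT_TAIL(list, f, entries);
    return 0;
}

/* Traverse with a caller-held pointer; the list is not modified. */
size_t fruit_total_length(struct fruit_list *list) {
    size_t total = 0;
    struct fruit *f;
    TAILQ_FOREACH(f, list, entries)
        total += strlen(f->name);
    return total;
}

/* The caller owns the elements and must free them explicitly. */
void fruit_clear(struct fruit_list *list) {
    struct fruit *f;
    while ((f = TAILQ_FIRST(list)) != NULL) {
        TAILQ_REMOVE(list, f, entries);
        free(f);
    }
}
```

A caller initialises the head with `TAILQ_INIT(&list)` before the first `fruit_add` and calls `fruit_clear(&list)` when done. Forgetting that last call leaks every element, which is the ownership hazard mentioned above.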

The C++ for-loop is not how you typically iterate over a list; again the
author decided to show a bad example to confirm their bias:

    
    
      for (const auto& i : List)
        cout << i << " ";

~~~
unscaled
More likely, someone who hasn't programmed C++ in the last 10 years.
Forgetting auto, and using the cumbersome 3-part for loop with iterator
boilerplate when you only need the value, shows age. Initializing the list is
also easier now, with initializer-list syntax, so you could just do:

    
    
      list<string> lst = { "tomato", "grape", "apple", "orange"};
    

and cut another 4 lines, making the C++ line count half that of the C version.
Not a negligible difference, as the author claims.

------
nickpsecurity
Nice work in progress, Pieter. Look forward to seeing more of it given your
prior work. I like how you preempt many C-related counterpoints with
model-driven development that generates C. Done excellently by iMatix and many
others. I'm especially interested in how you'll apply that to distributed C.

------
petke
I'm a cpp programmer who recently spent a week learning zeromq to replace
named pipes in a project. By the end I was disappointed by the cpp language
bindings as they only cover the low level library. Had I known from the start
I probably would have looked elsewhere. It's a shame cpp is ignored in much of
the open source community in favor of c. If nothing else, cpp after all is a
safer c.

~~~
dschiptsov

       http://250bpm.com/blog:4
       
       http://250bpm.com/blog:8

~~~
petke
Yes I read those before. I didn't find them convincing. Intrusive lists are an
anti-pattern that you can also do in cpp if you want. Getting rid of
exceptions doesn't mean you get rid of errors. It just means you can more
easily ignore errors and continue running a corrupted program. But the big
picture though is that cpp is a safer language. A core library might be written
in c for whatever reason. But it's good to provide a wrapper in a safer
language for users to use.

------
neikos
> _Solution: make /usr/local writeable._

> _This is a brutal and effective solution, the best kind of solution_

I... uhm, what?

> _Solution: grab the latest CZMQ git master from github._

No, you do not want to run your software off of master. And master failing to
build (because of errors) should be a fringe occurrence, with CI now being
free/cheap and highly flexible.

~~~
michaelmior
I'm not sure about CZMQ, but I assume what you mean is that you don't want to
run code from a development branch that's rapidly changing. That's not
necessarily what master is in all projects. The master branch is sometimes
used as the latest stable release.

~~~
neikos
True, I forgot about that aspect. However in this case that doesn't apply
either as a stable branch should always compile.

------
lukaslalinsky
It's funny how this centered around ZeroMQ, which is written in C++.

~~~
geocar
This might be part of the reason:

* [http://250bpm.com/blog:4](http://250bpm.com/blog:4)

* [http://250bpm.com/blog:8](http://250bpm.com/blog:8)

~~~
jeremyjh
Which are weak arguments that point more to the author's dissatisfaction with
the architecture of libzmq than with problems in C++ language. This was
discussed previously here:

[https://news.ycombinator.com/item?id=3953434](https://news.ycombinator.com/item?id=3953434)

------
vidarh
Github repository:
[https://github.com/hintjens/scalable-c](https://github.com/hintjens/scalable-c)

------
tom_mellior
I'd be interested in this if it were nearing completion. I think structuring
the book around problem-solution pairs is a nice technique. But it would be a
much better read if fewer irrelevant statements of opinion were strewn in.
Also, a lot of the bizarre statements of irrelevant "fact" should be checked,
for example:

> In the Old Times, creating a repository was days, weeks of work.

I can't begin to comprehend what this may mean. "svnadmin create" was always
instantaneous. Setting your project up for network
access took longer because you had to read docs and write a config file, but
the same is true for Git.

> Optimizing compilers (...) may remove assertions.

Bullshit. Using NDEBUG removes assertions, and yes, this indeed means that
assertions must be side-effect free. But an optimizing compiler? No. If that
actually happened for calls to impure functions (and no, it really doesn't
happen), it would be a major compiler bug.
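
A small sketch of the NDEBUG behaviour described above. `next_id`, `release_build` and `debug_build` are hypothetical names. The file includes `<assert.h>` twice, once with NDEBUG defined and once without, which the C standard allows, purely so both cases fit in one translation unit; a real project would set NDEBUG from the build flags instead.

```c
/* Impure helper: each call has a visible side effect. */
static int calls = 0;

int next_id(void) {
    return ++calls;
}

/* Case 1: NDEBUG defined, as in a typical release build. */
#define NDEBUG
#include <assert.h>

int release_build(void) {
    int before = calls;
    /* assert() expands to ((void)0) here, so next_id() is never called. */
    assert(next_id() > 0);
    return calls - before;   /* number of times next_id() actually ran */
}

/* Case 2: NDEBUG not defined. Re-including <assert.h> redefines assert(). */
#undef NDEBUG
#include <assert.h>

int debug_build(void) {
    int before = calls;
    /* Now the expression is evaluated, side effect included. */
    assert(next_id() > 0);
    return calls - before;
}
```

Here `release_build()` returns 0 and `debug_build()` returns 1. The preprocessor, driven by NDEBUG, decides whether the expression is evaluated at all; the optimizer plays no part in it. That is also why the asserted expression should be free of side effects.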

> a nasty reminder of the old days when computers stored data and code on
> different kinds of rust, and languages enforced that

Code and data do live in different places in memory; nowadays more than ever,
for reasons of security. C's original "declarations before statements" rule
(the context here) was simply because it makes it much simpler to write a
primitive single-pass compiler.

> The standard C library often puts destination arguments first, which is a
> hangover from assembly language. MOV X, Y.

... or maybe it's an analogy with assignment statements, X = Y?

Note that three out of these four examples are just irrelevant opinions, so
they should be removed from the text even if they weren't factually false.

~~~
PieterH
Creating an svn repo was fast, yet you could not use it without a dedicated
server, DNS configuration, security configuration, firewall configuration,
etc. etc. If this was your only job, sure, a few hours' work. For the rest of
us, begging a sysadmin or spending days learning the details.

Whereas with git it's literally "git init ." or clicking on Github.com and
we're ready to roll.

I do appreciate the fact checking, and you're welcome to send me more
comments. Errors of fact don't survive the editorial process, one hopes.
Opinions, that's a different story.

~~~
tom_mellior
> with git it's literally "git init ."

That doesn't magically give you a shared, network-accessible repository with
all the correct access controls.

> or clicking on Github.com

SourceForge has existed since 1999, and after a click you have always been
ready to roll.

> and you're welcome to send me more comments

But I probably won't if your strategy is "spread misinformation first, then
make others work to point out mistakes, then defend an indefensible position,
then maybe change it". That's not how communities are built. That's what you
yourself criticize in the section on merging strategies...

------
bluejekyll
> then you know where C stops working, as a language.

He actually makes a really strong argument against using C right in the first
two paragraphs.

C is a dangerous language. Assembly is even more dangerous. There are
languages that compile to close to the same speed and are systems oriented with 0
overhead.

I'm truly curious, if you're working on a new project would you pick C? Or
would you reach for something that's going to reduce the bugs that inevitably
come from writing even 10 lines of C?

~~~
chris11
What languages would you personally pick over C?

~~~
bluejekyll
Rust, no debate.

------
signa11
this seems to still be at an early, nascent stage, with a complete toc missing
and, most likely, in the works. caveat emptor.

~~~
PieterH
Yes, indeed. I've updated the book title on Gitbooks to make this clear. I'm
writing and publishing the book piece by piece, to get feedback early on in
the process.

------
magicmu
I know the basics of C, but stopped short of getting deep into threading and
concurrency since it seems like Go and Rust handle that in a more efficient
way (although there's no way I would use Rust in production yet). Are there
any advantages to using C/C++ for a new large-scale project?

~~~
steveklabnik
Just for curiosity's sake, what specifically would make you not use Rust in
production yet?

~~~
OopsCriticality
Not OP but from the perspective of the industrial side: no track record, no
formal standard, changes too fast, incomplete documentation, doesn't have an
extensive commercial and supporting ecosystem (e.g. Parasoft, Java Path
Finder), limited pool of experienced programmers with embedded and regulated
environment experience. Arguably, it falls under the heading of "too new".

I'd prefer to deal with known knowns rather than the known unknowns or _gasp_
unknown unknowns of something new. It's a very conservative position, but it's
borne out of the expense associated with mistakes and their correction.

~~~
steveklabnik
Cool thanks! I'm trying to figure out what blockers are so we can prioritize
things; a lot of these are very reasonable, but not immediately actionable
things for me. Sounds good. :)

~~~
OopsCriticality
Sorry I can't offer anything more specific and actionable; I guess comparing
Rust to a fine wine, something that must be aged to reach full potential, will
have to do :)

~~~
steveklabnik
Hehe, no need to be sorry. It's one of the best answers, actually: it means
that there aren't any fires, it's just about playing the long game and letting
time pass. I prefer that. :)

------
_pmf_
> While C lends itself to building libraries, it has no consistent API model.

What language has? Wouldn't this require first class modules (which few
systems have; JS' hacked together solution is obviously not to be considered a
true solution)?

~~~
vog
OCaml has a type system in which modules are first-class citizens, just like
functions. They have a clear separation between interface (they call it
"signatures") and implementation. The compiler enforces that you can only
write against the interface, making modules with the same interface really
exchangeable. The modules are also parametrizable (they call such modules
"functors").

[http://caml.inria.fr/pub/docs/manual-ocaml/moduleexamples.ht...](http://caml.inria.fr/pub/docs/manual-ocaml/moduleexamples.html)

[https://realworldocaml.org/v1/en/html/functors.html](https://realworldocaml.org/v1/en/html/functors.html)

[https://realworldocaml.org/v1/en/html/first-class-modules.ht...](https://realworldocaml.org/v1/en/html/first-class-modules.html)

~~~
tom_mellior
That's all true, but it doesn't mean that OCaml has a "consistent API model",
whatever that may mean. Unless "provide a fold and a map for all datatypes",
which I guess is consistent across most APIs, is a "model".

------
doodpants
So far I've only read the Preface and part of Chapter 1. What bugs me is this:

> * Write portable code that runs on all platforms.

Ok, good plan.

> * An operating system you are comfortable with. Linux will give you the best
> results. OS/X [sic] and Windows are usable if you have no choice.

So... results vary by platform? And then after the "hello world" example:

> And you should see that familiar Hello, World printed on your console. If
> you are using OS/X [sic] or Windows, it won't be this easy. I'll repeat my
> advice to install Linux.

Funny, this example works just fine for me on OS X. You do realize OS X is a
Unix-like system, right?

> Having said that, remember this rule:

> Linux is the native environment for C development.

Gee, I wonder how people like Dennis Ritchie ever managed to write C code
before Linux came along?

------
jheriko
those three points in the bullet list near the start all seem to miss the mark
for me.

------
fizixer
Love it.

I dream of the day when all current system-level fads bite the dust, replaced
by new fads, while C is still running as the system layer. (hint: just like it
is happening today with the 90s fad called C++, replaced by fads like Go,
Rust, D).

~~~
dunerocks
lol? C++ is hardly a "fad"!

~~~
kev009
Well, IMHO it was, and C was too (google books magazines from 80s and 90s).
The people left using both languages are usually doing so deliberately rather
than because it is trendy.

