Kore: a fast web server for writing web apps in C (kore.io)
271 points by api on May 17, 2015 | 170 comments



Awesome! I've used https://www.gnu.org/software/libmicrohttpd/ in the past (for fun stuff I was building to learn C), will try this out instead. But..

> Only HTTPS connections allowed

Seems a bit painful. HTTPS on the webserver itself is a fair bit more painful to set up and administer than HTTPS in your reverse proxy / load balancer (I prefer nginx). Web servers should support plain HTTP.


You can turn off TLS on Kore.

$ make BENCHMARK=1

It is not a run time option by design, but it is there.

I want Kore to have sane defaults for getting up and running. That means TLS (1.2 only by default), no RSA-based key exchanges, AEAD ciphers preferred, and the like.

edit: spelling


RSA with DHE or ECDHE is a sane handshake. I would avoid DSA and ECDSA based key exchanges because they fail catastrophically with bad random number generators. For most APIs session caching is more important than a faster initial handshake.

The HTTPS-only choice would annoy me a lot because I run most HTTPS services behind a reverse proxy in a FreeBSD jail on the same host. HAProxy and nginx are still superior to most applications with regard to reliable TLS termination.

Using HTTPS by default is the right choice for a new project, but offering no HTTP support (outside of a benchmark) patronizes the user.

All in all this looks like a nice way to export C APIs through HTTPS.


Thanks.

I agree the BENCHMARK build option is a bit confusing. I might end up renaming it altogether.


For sanity's sake, this build option is now NOTLS.


Every time something like this comes out, I think the following: wouldn't it be great if I could develop and debug this under Windows with VS and deploy my release on Linux? The network libraries are always Linux (and family) only.


I don't see a reason why you can't make it cross-platform, with a bit of effort: https://github.com/jorisvink/kore


I'm not trying to antagonize here, but what do you find so bad? I would call it a little bit more painful; three lines to add to an nginx config, along with generating the cert. Maybe 10 minutes of work? Thirty if you're getting a CA to sign your cert for you. I could see pain if you need to wait for finance to approve, or if you're trying to get domains validated on behalf of a customer. And I suppose it adds another setup step to wireshark (if you need to debug neat bugs), but that's a set-it-up-once-and-forget-about-it thing.


It's a scalability problem.

If you have many web servers behind a reverse proxy that takes care of TLS it's often pointless to waste processing power on encryption in between.


Yep. And on top of that a reverse proxy is unable to inspect the requests to route them smartly (based on path, cookies, etc.).


Well, the reverse proxy would have made the backend request itself, so it's free to inspect or modify anything.


I'm curious- why C? Strings, scoped objects and C++11 move operators seem much safer and clearer from an API perspective.

The complaints about C++ seem to mostly be around the ability to abuse the language, not specific issues that C solves. Something like https://github.com/facebook/proxygen seems like a better API.

And I don't quite buy portability- if it's not a modern compiler with decent security checks then I'm not sure it should be building web-facing code.


I've been building an HTTP/1.1 server in C++11. Along with a C++ wrapper around SQLite, I've been having a lot of fun putting some lightweight forum software together. I definitely enjoy the code structure and compile-time safety over PHP.

Using a threaded model with tiny stacks, and std::lock_guard for atomic operations.

The biggest downside is you have to run the same OS your server uses on your dev box (which is what I do); or you have to upload the source and compile the binaries on your server directly. (or have fun with cross-compilation, I guess.)

To answer the inevitable "why?" -- for fun and learning. Kind of cool to have a fully LAMPless website+forum in 50KB of code. Not planning to displace nginx and vBulletin at a Fortune 500 company or anything.

Still wishing I could do HTTPS without requiring a complex third-party library.


Go would solve the problems you mentioned (cross compilation and HTTPS support), and would also offer first-level support to many web concepts and protocols that you need to implement from scratch in C++ as the ecosystem is not there. Of course if it's just for fun, then anything goes :)


I am so glad that my company chose to write the majority of our web oriented frameworks in Go. It's simple to the point of boring for most of this stuff.


I find this so much more pleasant to use than the alternatives. There's a portable distribution of it which you can put in your project and statically link.

http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man3/...


Just put HAProxy in front of your server. You'll want that anyway most likely.


> The biggest downside is you have to run the same OS your server uses on your dev box

You should check out Docker.


Docker only helps with portability between Linux distributions, not operating systems.

You can't run it on Windows, Mac or the various BSDs without a VM, at which point you're running "the same OS your server uses".


Yes and no.

I use OS X for my main machine, and can happily deploy something to a server running 'not OS X' and be confident that all libraries and dependencies are exactly the same in production as what I was using in development.

Sure Docker is running through a stripped down VM on the Mac, and so technically it's "the same OS my server uses", except it's abstracted away so that for all intents and purposes I'm using OS X for development (and email, browsing, and other things) and deploying hassle free to Linux.


That's not true; Docker runs great on OSX natively. You can't run the containers there, but the client itself is fine and largely abstracts away the whole "where am I building" issue.


Minor nitpick - Docker runs great on OS X, but it doesn't run natively, it runs on a minimal Linux VM - Boot2Docker.

I agree that the reality of the situation is that it's abstracted away enough that the distinction isn't really all that meaningful.


No, the docker program itself does indeed run natively on osx. You do not need to use boot2docker or any other VM.

You can't run any containers on osx, but docker itself runs fine as a client binary to docker running on a linux server. I use this configuration daily using a native osx docker binary on my workstation speaking TLS to a docker service running on a CoreOS server.


C++ is really horrible, everything feels like an afterthought. The new unique_ptr and shared_ptr end up producing really ugly code, the concept is good but wow...

Can we move away from this horrible language already?

http://harmful.cat-v.org/software/c++/ http://harmful.cat-v.org/software/c++/coders-at-work http://harmful.cat-v.org/software/c++/linus http://harmful.cat-v.org/software/c++/rms


I don't know about your case, but most of the time I find that complaints like yours come from people who never actually tried the language, or just heard a bad story, or had one bad experience reading bad legacy code.

I think Torvalds and Stallman fall in that category, actually. Stallman even mentions generics, which C++ doesn't have and which are very different from the template mechanism that C++ does have.

As for the people who actually use the language (especially after C++11), most of them would tell you that there is a great subset of the language that works for them. A valid argument I have heard before is that the subset is different for each person, and the problem arises when maintaining somebody else's code. Well, that happens with every language. (Have you ever had to debug a memory corruption bug in old C code with void* all over the place? Hint: it's not pleasant.)

C++ is a language that allows you to use abstractions with very reasonable performance and very reasonable resources, and in that niche there is no real alternative.

Yes, you can have a fine tuned VM running Java or .Net code that might be comparable with C++ but with a cost in memory. I don't think Rust or D are ready yet to be a real competitor.

So... no... we can't move away from this "horrible" language yet.


On the contrary, several years ago I always heard how good it was and that it was the language to rule them all. Now those years have passed and I have used it extensively, on and off, even jumping on the C++11 standard as soon as it was available to work in. Now I know the dismal truth about the state of C++ projects.

Now the people who defend C++ or push it to every project (like in this thread) are almost always people who sit exclusively in C++. You know, the people who think they are experts in C just because they know C++ (like in this thread)? There is the real incompetence.

I also fail to see how hunting down a memory corruption bug in C will make C++ look better, and since you mention it is legacy code it is the same in C++.

It does not matter what language I suggest, the real fact is that C++ user will use C++ for ANY project regardless. So no, we can't move away from this horrible language yet, but not for the right reasons.


> "I also fail to see how hunting down a memory corruption bug in C will make C++ look better, and since you mention it is legacy code it is the same in C++."

It doesn't, I meant it as just an example that every language has its own caveats.

>It does not matter what language I suggest, the real fact is that C++ user will use C++ for ANY project regardless.

That's just not true in my experience. Actually, none of the C++ developers that I know use just one programming language for everything. They pick the language depending on the requirements of the project (and that's how it should be, isn't it?).


I use it a lot, and find it to be quite tolerable once you get it. Its verbosity is a wart, but it's understandable when you understand what it is. ObjC has [[similar syntactical] issues].

The trick to it is not to over-use or abuse the language's many features. I tend to write C++ that is a fairly thin layer on top of C, using the STL for data structures and algorithms but using things like my own templates very sparingly.

I like Go, but it's really only well supported server side. Rust is promising but not ready for prime time.


Go seems to be slowly replacing C for server side and commandline utilities.

Server side due to its memory safety and easy concurrency.

Commandline utilities due to its easy-to-distribute statically linked binaries.


... easy to distribute HUGE statically linked binaries. Go solved the DLL hell and/or VM platform hell problem by just punting it.

Of course, I have wondered for quite a while if it might not be interesting to do away with DLLs in favor of static linking and then use both disk and memory deduplication to handle the efficiency issues.

Of course the problem here is: what happens when a major bug like Heartbleed is found in a library used by 500 things?


> Of course the problem here is: what happens when a major bug like Heartbleed is found in a library used by 500 things?

You run away screaming. Static compilation used to be the only option 30 years ago, and there are multiple reasons why mainstream moved away from it.


C++ is a swiss army knife that works with C, which I have learned very well and have great tools for. I understand the concern, but moving away is not so simple. That being said, I prefer well-designed C-style libraries in a lot of cases, since C++ libraries often have opinionated styles (the language is huge, after all).


Actually C with a pool allocator is an excellent language for writing a web server. I wrote one in C 15 years ago, and the code is very elegant and simple:

http://git.annexia.org/?p=rws.git;a=tree
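
To make the pool-allocator idea concrete, here is a minimal sketch of a per-request bump allocator. It is not taken from rws; `struct pool`, `pool_init`, `pool_alloc`, `pool_destroy` and the 16-byte alignment are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed alignment: enough for basic types on common platforms. */
#define POOL_ALIGN 16

/* Hypothetical per-request pool: one slab, bump-pointer allocation,
 * and a single free for everything once the request is finished. */
struct pool {
	unsigned char	*base;	/* start of the slab */
	size_t		 cap;	/* slab size in bytes */
	size_t		 used;	/* bytes handed out so far */
};

/* Returns 0 on success, -1 if the slab could not be allocated. */
int
pool_init(struct pool *p, size_t cap)
{
	p->base = malloc(cap);
	p->cap = (p->base != NULL) ? cap : 0;
	p->used = 0;
	return (p->base != NULL) ? 0 : -1;
}

/* Returns n bytes from the slab, or NULL if the pool is exhausted. */
void *
pool_alloc(struct pool *p, size_t n)
{
	/* Round the current offset up to the next aligned position. */
	size_t start = (p->used + (POOL_ALIGN - 1)) & ~(size_t)(POOL_ALIGN - 1);

	if (start > p->cap || n > p->cap - start)
		return NULL;
	p->used = start + n;
	return p->base + start;
}

/* Releases every allocation made from the pool in one call. */
void
pool_destroy(struct pool *p)
{
	free(p->base);
	p->base = NULL;
	p->cap = 0;
	p->used = 0;
}
```

The appeal for a web server is that a handler can allocate freely while serving a request and the server calls `pool_destroy` once at the end, so there are no individual `free` calls to forget.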


It seems that people are so intimidated by the infamous complexity of C++ that they don't even want to bother getting more familiar with it.

So, although technically the existence of C doesn't make sense, as it is superseded by C++ (except couple of things), C is winning in the branding department.


I don't know if this is the reason why people choose C over C++ for some projects, but if language complexity is the reason, it isn't that people are just "so intimidated by the infamous complexity of C++ that they don't even want to bother getting more familiar with it".

In the 90s, C++ was much more popular than it is now. It was used as the go-to general purpose language for all kinds of "serious" software (not addressed by VB or Delphi). Early on, almost everyone was very impressed with the power C++ brought, but after a few years, as codebases aged, it became immediately clear that maintaining C++ codebases is a nightmare. The software industry lost a lot of money, developers cried for a simpler, less clever language, and C++ (at least on the server side) was abandoned en-masse -- almost overnight -- in favor of Java, and on the desktop when MS started pushing .NET and C#. So while today C++ is servicing its smallish niche quite well, as a general-purpose programming language for the masses it actually proved to be an unmitigated disaster. It is most certainly not the case that C++'s infamous complexity is "intimidating"; C++'s complexity is infamous because of the heavy toll it took on the software industry. Which is why, to this day, a whole generation of developers tries to avoid another "C++ disaster", and you see debates on whether or not complex, clever languages like Scala are "the new C++" (meant as a pejorative) or not.


I also feel like a lot of people are biased because of Stallman's and Torvalds's stance on C++. I kind of have an irrational distaste towards the language, mostly due to their influence.


And Yosef Kreinin. Not arguing for or against the FQA, but its mere existence catalyzed (not fed) my journey of critical thinking towards C++.


and most of the people in coders at work, including

https://geeki.wordpress.com/2010/11/21/ken-thompson-on-c/

(when this topic comes up, I always find it funny that url's rarely contain ++ or # when talking about the descendants of C)

and apple of course, with their objective c.


I find it funny that by investing in JITs instead of optimizing compilers for Java and .NET, C++ managed to get a spot in HPC.

Now both the Java and .NET eco-systems are getting their AOT optimizing compilers. .NET with MDIL and .NET Native, Java targeted for Java 10 (if it still goes to plan).

Also, thanks to Oracle's disregard for mobile platforms, providing neither JIT nor AOT compilers for Java, C++ became the go-to language for portable code across mobile OSes when performance matters.


> I find it funny that by investing in JITs instead of optimizing compilers for Java and .NET...

But that's because Java is meant to cover a very wide "middle" -- plus maybe a few corners where possible -- rather than every possible niche.

> Also, thanks to Oracle's disregard for mobile platforms, providing neither JIT nor AOT compilers for Java

Well, it didn't start out that way, did it? But mobile platforms -- because they're rather tightly controlled -- are, and have always been, much more driven by politics than technical merit.


I am (or at least used to be, at some point) fluent in C++ and have built large applications with it. But I'd still rather write C, because its infamous simplicity is of a lot of use to me -- whereas I found that much of C++'s complexity tends to solve a lot of problems UML designers think we have, and very few problems that we actually have.


Not really. I'm sure this happens some of the time, but I suspect that's more common with developers who haven't written either.

In my experience, a lot of the developers who prefer C to C++ are developers who wrote a lot of C++, found that it only improved productivity in solving problems that it created, and went back to C and realized how much easier it is to write software in C.

C++ gets you caught up thinking about problems that don't even matter.


one of my favourite quotes from stroustrup says that 'don't use malloc and free, unless you want to debug 1980s' problems.

which i find curious, given the truism about debugging and understanding code taking longer than writing code.

personally i would prefer 1980s problems people in general know the answer to, rather than up to the minute esoterica.

all that said, the scott meyers modern c++ book is out, so i should really start and finish that before forming a worthy opinion in 2015.


I would recommend that you first read "A Tour of C++" or another book that better explains the new features in C++11. Meyers explains some of the edge cases of move semantics very well, but if you don't understand the concept first then you might not have a good experience reading the book.

>"says that 'don't use malloc and free, unless you want to debug 1980s' problems."

The problem with malloc and free is that they leave it to the developer to keep track of references, and memory corruption bugs are not pleasant to debug. The RAII idiom actually helps a lot in that context, and that is what Stroustrup was referring to.


thanks for the recommendation.

i think malloc and free are a lesser evil than this stuff:

http://thbecker.net/articles/rvalue_references/section_08.ht...

feel free to disagree!


The link that you provided does not have to do with malloc and/or free but with perfect forwarding:

http://thbecker.net/articles/rvalue_references/section_07.ht...

Also, that example uses a Factory pattern, which makes the example more verbose, but it's like comparing apples and oranges since that's OOP, which you actually wouldn't be doing in C.

What you would be comparing is something like:

mystruct * stcVar = malloc(sizeof(mystruct)); free (stcVar);

Vs.

std::shared_ptr<mystruct> stcVar = std::make_shared<mystruct>();

Of course that is a stupid example. However, things get more interesting when you have pieces of code sharing the same structure (threads maybe?) and you don't know exactly which code should be the one in charge of releasing the pointer.
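
In C, the usual answer to "who releases it?" for a shared structure is a manual reference count, which is roughly the bookkeeping that `std::shared_ptr` automates. Below is a minimal sketch, assuming single-threaded use (sharing across threads would need an atomic counter). `struct shared` and its three functions are hypothetical names, not part of any library mentioned here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical shared object: whoever drops the last reference frees it. */
struct shared {
	int	refs;	/* number of live owners; not thread-safe as written */
	int	value;	/* stand-in payload */
};

/* Creates an object with one owner; returns NULL on allocation failure. */
struct shared *
shared_new(int value)
{
	struct shared *s = malloc(sizeof(*s));

	if (s == NULL)
		return NULL;
	s->refs = 1;
	s->value = value;
	return s;
}

/* Registers one more owner and returns the same pointer. */
struct shared *
shared_retain(struct shared *s)
{
	assert(s->refs > 0);
	s->refs++;
	return s;
}

/* Drops one owner. Returns 1 if this call freed the object, 0 otherwise. */
int
shared_release(struct shared *s)
{
	assert(s->refs > 0);
	if (--s->refs > 0)
		return 0;
	free(s);
	return 1;
}
```

Every piece of code that keeps the pointer calls `shared_retain`, and every one calls `shared_release` when done, so no single owner has to be designated in advance. The cost is that a forgotten call in either direction is a leak or a use-after-free, which is the part RAII takes off the developer's hands.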


i think you misunderstand.

clearly, it is C++, not C.

instead of a simple malloc and free, you will invariably end down the rabbit hole learning about rvalues, perfect forwarding, move semantics, lvalues - the list goes on - these topics arise from the simple concept of RAII.

which i find worse than malloc and free.


>i think you misunderstand. >clearly, it is C++, not C.

I got that, what I said is that you wouldn't have that problem in C because it arises when you are doing OOP, and most C devs would not use OOP. Also, keep in mind that it is perfectly fine not using OOP in C++.

> instead of a simple malloc and free, you will invariably end down the rabbit hole learning about rvalues

That's not true. Actually you can be a very decent C++ developer without knowing the notion of what an rvalue is. Move semantics is just an optimization to avoid extra copy of objects, so it is completely optional.

At the moment of writing this, there is an entry on the front page with an example of a modern C++ piece of code. You will notice that there is not a single explicit heap allocation or any other crazy stuff.

https://news.ycombinator.com/item?id=9560667


i think moving data around in memory is not an exclusive feature of OOP and i hesitate to guess why you think that operations similar to those exposed by the 'move semantics' are not something undertaken (regularly!) by c developers using malloc and free.
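
For comparison, a "move" in plain C is usually just handing over a pointer and clearing the source so that only one side still owns the allocation. This is a minimal sketch of that convention; `struct buffer` and its functions are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical owning buffer. */
struct buffer {
	char	*data;	/* heap allocation owned by this struct, or NULL */
	size_t	 len;	/* length of the string in data */
};

/* Allocates a copy of src. Returns 0 on success, -1 on failure. */
int
buffer_init(struct buffer *b, const char *src)
{
	size_t len = strlen(src);

	b->data = malloc(len + 1);
	if (b->data == NULL) {
		b->len = 0;
		return -1;
	}
	memcpy(b->data, src, len + 1);
	b->len = len;
	return 0;
}

/* "Move": dst takes over src's allocation and src is left empty.
 * No bytes are copied. dst must not own an allocation yet. */
void
buffer_move(struct buffer *dst, struct buffer *src)
{
	*dst = *src;
	src->data = NULL;
	src->len = 0;
}

/* Safe on a moved-from buffer because free(NULL) is a no-op. */
void
buffer_free(struct buffer *b)
{
	free(b->data);
	b->data = NULL;
	b->len = 0;
}
```

The difference from C++ is that nothing enforces the convention: the compiler will not stop anyone from using the source after the move or from forgetting to clear it.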

> instead of a simple malloc and free, you will invariably end down the rabbit hole learning about rvalues

truth is subjective to me, in my experience, when you program C++ you will end up exploring a myriad of vast expanses of language features. you may feel that someone can be a very decent C++ developer without knowing that, plenty others would be aghast that someone ignorant of rvalues would describe themselves as 'very decent'.

remember the ostensible creator of the language rates himself at 8/10.

i think that front page example is rather interesting, given the number of #using directives - it gives an indication of the time that developer has spent learning the language. most of them are not trivial to understand to the degree that this guy has. also reading his background, i'm quite sure he knows what an rvalue is.

as you can imagine, i don't really mind explicit allocations - as an awful lot of knowledge is required in C++ to deal with implicit allocations.

in my opinion, there's a lot of 'crazy' stuff there, the short linecount is a product of the author having done his homework.

i get the impression that you and i would use very different subsets of C++. mine would be far smaller! i usually confine myself to whatever idioms the libraries i pull in use and go no further.


The existence of C makes perfect sense, thanks; it's a relatively small and simple language with masses of flexibility.

C++ adds masses of complexity and implicit behaviour. While development in C++ can be quicker and might be 'safer' it can also produce all sorts of unexpected problems.

It also encourages all sorts of nested template types that can make existing codebases incredibly hard to read.

Further, in embedded situations, you may not have space for its standard library.


Even without the standard library, C++ offers lots of safety mechanisms over bare bones C.


And a lot of complexity, obfuscation and implicit behaviour.

I have developed and enjoyed developing in both. C has an elegant simplicity about it and you can do literally anything. C++ can be quicker, and it has a bunch of useful standard stuff, but it does have some downsides and quirks. There's room in the world for both.


I never enjoyed C as such, always felt too little when compared with Turbo Pascal, only used it in anger for university projects and my first job.

For anything else where the option was between C and C++, I always picked C++ when given liberty of choice.


I can learn complex stuff, but with C++ I don't C the point :) Honestly, I don't see what I'd buy with all that complexity.


> So, although technically the existence of C doesn't make sense, as it is superseded by C++ (except couple of things), C is winning in the branding department.

Programming languages are in the domain of UX. Type systems, syntax, RAII -- they all just serve as a means to an end, which is usability.


Why not? I'd love to build web applications on a router with pure C.


Man, me too. I feel more comfortable knowing where all my memory is, and that my types are being checked at compile time.


Do you really know where all your memory is being used without Valgrind?


Valgrind should definitely be a part of every C developer's workflow, if possible integrated into the unit and integration tests.


It doesn't help that it isn't available on all platforms, e.g. embedded, real-time OSes, commercial UNIXes, mainframes.


That is unfortunate, but in those cases, if C or C++ is the only option, I would advocate writing cross-platform code and running the tests under a Valgrind-supported platform.

This is a bit more difficult for kernel-level code running on an RTOS, as there might be a lot more unique APIs to stub out, but it can still be done.


You are no longer testing the same code under the real execution conditions, thus missing out possible leaks on the original code path.

You could argue that it's better to test that than nothing, assuming the respective OS vendor doesn't offer tooling similar to Valgrind, and with that I fully agree.

EDIT: added some clarifications.


For two of those, embedded and real-time, Valgrind should help very little, as you shouldn't be allocating anything anyway apart from statically.


Well, there are still the other pointer misuses that are so common in C code.


Embedded systems. For example a surveillance camera could have a small web interface for configuring it and allowing remote access. Nowadays, even cameras have enough power to run real web servers with Ruby on Rails, but for smaller embedded systems, like a pacemaker, a web app written in C could make sense.


My impression has mostly been that on weird embedded platforms you'll just have some bad C compiler available, and that's the reason. Is it more than that?


C is ripe for integrating with other higher level languages, and the no-fuss license encourages this. I'm looking forward to checking this out further. Good luck Kore!!


I don't know if this is very heavy, so i might be wrong here. But one thing I can think of is embedded applications that want to have a web UI?


Isn't nginx written in C?


The only valid reason is for those embedded devices where C and Assembly are the only languages supported by the vendor's SDK.

But I doubt such tiny processors are being used in boards with network capabilities.


C++ is just one of the many failed attempts to improve on C. So, yes, C has its issues, but C++ is certainly not the solution.

The long story. There are three dominant Turing-complete axiomatizations: numbers (Dedekind-Peano), sets (Zermelo-Fraenkel) and functions (Church). The Curry-Howard correspondence shows that all Turing-complete axiomatizations are mirrors of each other.

If "everything is an object", it means that there must exist a Turing-complete axiomatization based on objects.

Well, such axiomatization does not exist at all. Nobody has phrased one. Therefore, object orientation and languages like C++ are snake oil. They fail after simple mathematical scrutiny. C++ is simply a false belief primarily inspired by ignorance.


It speaks poorly of this community that you get second-guessed more for building an application in C than for building one in JavaScript + Node.js.


If you want to use C today you need a really good excuse. The tradeoff between security and performance is simply not there anymore.

Unless you write on an embedded system, a game, or a high performance number crunching application, C is premature optimisation.

And even in the above we see drastic changes today, embedded systems have become so powerful that they can run scripting languages (http://www.eluaproject.net), game engines are written in C and scripted with other things (http://docs.unity3d.com/ScriptReference/), and inmemory-bigdata systems like spark offer significant advantages over classical HPC frameworks like MPI (http://www.dursi.ca/hpc-is-dying-and-mpi-is-killing-it/).

While JS is horrid, it at least doesn't have manual memory management.


C is a lot about portability, not necessarily performance.


If you only care for portability you can always use one of the billion safe languages whose interpreters are written in C. Lua for example.


You may not only care about portability. And "safety" sometimes also means being able to meet realtime requirements, and also often means having the ability to carefully account for resource use in ways that many interpreters do poorly (and no, kernel enforced system limits are not always an option).

Many interpreters also make assumptions (e.g. expecting a POSIX'y system) about the host system that many embedded platforms don't necessarily meet.

I've more than once looked at interpreters for embedding and found most of the alternatives sorely lacking. Very few interpreters are well suited for embedding on constrained platforms at all (Lua, admittedly, is probably one of the more solid exceptions). And once you start having to write lots of support code in C to port or sandbox your interpreter of choice, the reason for considering an interpreter quickly becomes less compelling.


Cause it would add an extra layer to my solution, which might be more complicated to get right than just bare C.


You're right, unixprogrammer; real hackers write everything in the venerable C.

On the other hand, there is a fair bit of negativity in this thread, just because it is C. That might not be in the hacker spirit, so to speak.


Writing a web application in C sounds like a good trigger for an utterance from the Jargon File: "You could do that, but that'd be like kicking dead whales down the beach."

We've advanced the state of the art quite a bit with dramatically more expressive languages than C that are sufficiently efficient in terms of memory and CPU. This is especially true when communications are occurring over HTTP and not direct socket-to-socket comms.

Why use C instead of D, Rust, Go, C#, Java, Perl, Python, Ruby, Scala, Clojure, Erlang, Elixir, Haskell, Swift, OCaml, Objective-C...?

I didn't miss C++, it just seems a worse alternative than C.


> Why use C instead of D, Rust, Go, C#, Java, Perl, Python, Ruby, Scala, Clojure, Erlang, Elixir, Haskell, Swift, OCaml, Objective-C...?

Because C runs pretty much anywhere? There are plenty of platforms where C is available where I doubt you'd find any of the others above (e.g. C64; yes there are C compilers for them; yes, I'm mentioning it tongue in cheek)

Because you can generate small, compact static executables? E.g. I used to write network monitoring software and an accompanying SNMP server for a system with 4MB RAM and 4MB flash, the latter of which had to include the Linux kernel and a shell on top of the application in question. The system was so limited we did not run a normal init, and couldn't fit bash - instead we ended up running ash as the init...

There are plenty of use-cases where "web application" == "user interface for a tiny embedded platform".


> Because C runs pretty much anywhere?

I always hear this argument, but as time has progressed, for better or worse 'anywhere' has become a much smaller target. If your language runs on Intel and ARM then it's good enough. There are a lot of reasons I might choose C for a project, but 'run anywhere' is not one of them.


I'd say Intel, ARM, Power, and Sparc is good enough.


C is a good solution for things like realtime multiplayer with lots of state, lots of side effects, etc. A lot of the modern abstractions actually get in the way, for example:

* List traversal order matters a lot when it's something like a list of monsters getting struck by a spell and the spell has complicated side effects. Brushing it under the rug with abstract iterators or functional Array.map's is a recipe for not knowing how your own game works.

* Realtime is an illusion, it really means "fast turn-based", you don't want players with fast connections to get an advantage by spamming commands and having them executed the instant they're received. You want to queue commands and execute them fairly at regular pulses. So much for all your abstract events infrastructure!!

* Certain object-oriented idioms become eye-rollingly silly when your application actually involves _objects_ (in the in-game sense). Suppose it's a game where players build in-game factories, suddenly the old "FactoryFactory" joke just got a million times worse.
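
The "queue commands and execute them at regular pulses" point can be sketched as a plain ring buffer that is drained once per tick. This is an illustration only: `struct command`, `cmd_enqueue`, `cmd_run_pulse`, the recorder and the queue size are assumptions, and any per-player fairness policy would have to be layered on top.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 64	/* arbitrary limit for the sketch */

/* Hypothetical command record: who sent it and what they asked for. */
struct command {
	int	player;
	int	action;
};

/* Fixed-size FIFO ring buffer. */
struct cmd_queue {
	struct command	items[QUEUE_CAP];
	size_t		head;	/* index of the oldest command */
	size_t		count;	/* number of queued commands */
};

/* Called when a command arrives from the network; nothing runs here.
 * Returns 0 on success, -1 if the queue is full. */
int
cmd_enqueue(struct cmd_queue *q, struct command c)
{
	if (q->count == QUEUE_CAP)
		return -1;
	q->items[(q->head + q->count) % QUEUE_CAP] = c;
	q->count++;
	return 0;
}

/* Called once per pulse: runs what was queued before the pulse started,
 * in arrival order. Returns the number of commands executed. */
size_t
cmd_run_pulse(struct cmd_queue *q, void (*execute)(const struct command *))
{
	size_t n = q->count;	/* anything enqueued meanwhile waits a pulse */
	size_t done;

	for (done = 0; done < n; done++) {
		execute(&q->items[q->head]);
		q->head = (q->head + 1) % QUEUE_CAP;
		q->count--;
	}
	return done;
}

/* Example executor: records the order in which players' commands ran. */
int	executed_players[QUEUE_CAP];
size_t	executed_count;

void
record_command(const struct command *c)
{
	if (executed_count < QUEUE_CAP)
		executed_players[executed_count++] = c->player;
}
```

Because the traversal is an explicit loop over an array, the execution order is visible in the code, which is the property the list above argues for.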

I'm not saying C is the best for those sorts of applications, but it's certainly not bad, and a lot of modern language features just aren't appropriate.


Why use C instead of D, Rust, Go, C#, Java, Perl, Python, Ruby, Scala, Clojure, Erlang, Elixir, Haskell, Swift, OCaml, Objective-C...?

C and Rust are not on the same playing field as Ruby, Python or PHP. These languages are typed, compiled and MUCH faster.

You'll obviously build 99% of your application in Ruby, but you might need C or Rust for high-volume calculations.

An example that happened to me a few weeks ago. Scaling a financial application to make millions of calculations. The core App is made with PHP, and the difference between 0.1sec and 0.000764sec gets important here.


Thank god it's written in C and not C++


Yeah. Thank God. I love broken string implementationsssssdd##%^$*$#@@#$%$$$###ludvik^0knox^0shamen^0password^0pass^012345^0qwerty^0dreamdevil^0

C is just awesome.


You're breaking the tables.


We got that solved. https://github.com/antirez/sds


Except it is neither part of ANSI C nor used by any relevant third-party C library, but it is solved....


This seems to be an adaptation of the earlier SafeStr lib. SafeStr is nice except that the C community is still too busy arguing about their awful lstr* and str*_s functions to take any notice and incorporate something genuinely secure like SafeStr into the standard lib. And the NVD just keeps growing...

So... you guys added something random and silly like <complex.h> to the standard, but still couldn't get around to a working string implementation? OK. Well, good luck with all that.

Not solved yet.


Easy strings for C, with Unicode support (and as fast as std::string): https://github.com/faragon/libsrt


libsrt, sds, SafeStr... How many times will this problem be solved independently before something like this makes it into the standard lib? Libsrt looks nice, but if you can't use it to interface to libraries that use the same types, you're still futzing around with char buffers.

What does Kore use? libsrt? Let's have a look at how you're supposed to program a web app in its examples:

   int
   serve_file_upload(struct http_request *req)
   {
       int              r;
       u_int8_t         *d;
       struct kore_buf  *b;
       u_int32_t        len;
       char             *name, buf[BUFSIZ];

       b = kore_buf_create(asset_len_upload_html);
       kore_buf_append(b, asset_upload_html, asset_len_upload_html);

       if (req->method == HTTP_METHOD_POST) {
           http_populate_multipart_form(req, &r);
           if (http_argument_get_string("firstname", &name, &len)) {
               kore_buf_replace_string(b, "$firstname$", name, len);
           } else {
               kore_buf_replace_string(b, "$firstname$", NULL, 0);
           }

           if (http_file_lookup(req, "file", &name, &d, &len)) {
               /* len is unsigned, so print it with %u rather than %d */
               (void)snprintf(buf, sizeof(buf),
                   "%s is %u bytes", name, len);
               kore_buf_replace_string(b,
                   "$upload$", buf, strlen(buf));
           } else {
               kore_buf_replace_string(b, "$upload$", NULL, 0);
           }
       } else {
           kore_buf_replace_string(b, "$upload$", NULL, 0);
           kore_buf_replace_string(b, "$firstname$", NULL, 0);
       }

       d = kore_buf_release(b, &len);

       http_response_header(req, "content-type", "text/html");
       http_response(req, 200, d, len);
       kore_mem_free(d);

       return (KORE_RESULT_OK);
   }

Uh oh. So Kore invented yet another safe string/buf type? Why didn't they use libsrt? What's all this kore_buf stuff?

Strings in C are NOT a solved problem. They're a hot mess.


Yes, Kore doesn't use libsrt, although you could use it in your Kore user-modules (not all services are going to be trivial operations). And I agree, C strings are far from being a solved problem. In the case of libsrt strings, in addition to having an embedded length (fast concatenation, search, etc.), it supports UTF-8 operations, including case conversion, without requiring OS support (i.e. without "locale", and without custom hash tables, using efficient in-code hardwired character range selection instead), and most typical string operations.

P.S. Before implementing libsrt strings I did a wide study of many string and generic C libraries, implementing the best from all, and adding things that were not yet covered (I'll investigate the Kore string/buffer implementation, too):

https://github.com/faragon/libsrt/blob/master/doc/references...
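
For readers wondering what "embedded length" buys you, here is a minimal sketch of a length-tracked string in C. The struct lstr type and the lstr_append/lstr_free names are made up for this example; this is not the sds or libsrt layout or API, just the general idea that appending never has to scan for the terminating NUL.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-tracked string; not the sds or libsrt layout. */
struct lstr {
    size_t len;   /* bytes in use, excluding the trailing NUL */
    size_t cap;   /* bytes allocated for data, including the NUL */
    char *data;   /* kept NUL-terminated so it can be passed to C APIs */
};

/*
 * Append n bytes from src, growing the buffer as needed.
 * Returns 1 on success, 0 on overflow or allocation failure.
 * A zero-initialized struct lstr is a valid empty string.
 */
int
lstr_append(struct lstr *s, const char *src, size_t n)
{
    size_t need, cap;
    char *p;

    assert(s != NULL && (src != NULL || n == 0));

    if (n > SIZE_MAX - s->len - 1)      /* len + n + 1 would overflow */
        return 0;
    need = s->len + n + 1;

    if (need > s->cap) {
        cap = s->cap ? s->cap : 16;
        while (cap < need) {
            if (cap > SIZE_MAX / 2) {   /* doubling would overflow */
                cap = need;
                break;
            }
            cap *= 2;
        }
        p = realloc(s->data, cap);
        if (p == NULL)
            return 0;
        s->data = p;
        s->cap = cap;
    }

    if (n > 0)
        memcpy(s->data + s->len, src, n);   /* no strlen() scan needed */
    s->len += n;
    s->data[s->len] = '\0';
    return 1;
}

/* Release the buffer and reset the string to the empty state. */
void
lstr_free(struct lstr *s)
{
    free(s->data);
    s->data = NULL;
    s->len = s->cap = 0;
}
```

The catch raised earlier in the thread still applies: a type like this only helps inside your own code, because third-party libraries keep taking char pointers.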


Kudos for the library design - it looks quite nice. We're looking for a safe, cross-platform, C string library at my work now - I'll do an evaluation of libsrt.

As for ANSI C, maybe someday this will get folded into the standard and we can pass around ss_t* 's rather than char* 's whenever we use third-party libraries.


Thank you for the consideration. Although the library targets safe and cross-platform code, I don't recommend using libsrt in production code yet. Rationale: the API is not yet finished, and it could have some changes that could break your build. Suggestions are welcome :-)

P.S. I don't expect any standards committee to adopt it, or even wide usage (I'm glad just having some feedback! :-D).


> I didn't miss C++, it just seems a worse alternative than C.

This is such a retarded opinion. A person expressing it probably doesn't know shit about C++ or just plainly an idiot.


This looks pretty neat actually. A ton of effort clearly went into it and it looks like the code is really well written with pretty well thought out interfaces

"Its main goals are security.."

Is it actually?

I also don't really see an advantage of using something like this over something like the Go net/http package.

Web-type API stuff is usually high enough level that something like C doesn't make sense. Go has nice enough standard packages for system things that even if I was doing a lot of system-y stuff I would be alright. I don't really see the type of work I would be doing where I want to use this.


> I also don't really see an advantage of using something like this over something like the Go net/http package.

C runs "everywhere". Your old C64 from the 80's? Has C compilers.

Go does not, even with gccgo.

I'd bet there are also still likely at least two orders of magnitude more programmers that know C well, and still more programmers with in depth experience of embedded development in C than have even tried Go.

I've still yet to meet anyone outside of the startup devops bubble who has written any Go, and often what language the developers have experience with matters more.


> Your old C64 from the 80's? Has C compilers

Yeah, but the idea to use them back then was like trying to use Ruby for real time applications in modern days, given how shitty they were.

Home computers might have had implementations of C and Pascal dialects, but we all used Assembly when stepping out of the built-in Basic and Forth environments.


I can see this being useful for embedded systems where Go is not suitable. I'm not super familiar with that whole world, though.


Ahh that is a use case I hadn't considered. I guess it would be pretty useful if some type of basic webserver was required.


It is.

For example being strict on the network input path and doing proper validation of incoming data is a strong part of the design.

Or was the question more along the lines of: it is C, therefore security cannot be part of the process?
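
To make "being strict on the network input path" concrete, here is a small sketch of whitelist validation for a form field. The function name field_is_valid and the FIELD_MAX_LEN limit are assumptions for this example; it is not Kore's validator API, only the general reject-by-default pattern.

```c
#include <assert.h>
#include <stddef.h>

#define FIELD_MAX_LEN 32

/*
 * Accept only 1..FIELD_MAX_LEN bytes drawn from [A-Za-z0-9_-].
 * The length comes from the caller, so embedded NUL bytes are
 * seen and rejected instead of silently truncating the input.
 * Returns 1 if the field is acceptable, 0 otherwise.
 */
int
field_is_valid(const char *s, size_t len)
{
    size_t i;

    assert(s != NULL || len == 0);

    if (len == 0 || len > FIELD_MAX_LEN)
        return 0;

    for (i = 0; i < len; i++) {
        const unsigned char c = (unsigned char)s[i];
        const int ok = (c >= 'a' && c <= 'z') ||
            (c >= 'A' && c <= 'Z') ||
            (c >= '0' && c <= '9') ||
            c == '_' || c == '-';

        if (!ok)
            return 0;
    }
    return 1;
}
```

The point is that anything not explicitly allowed is dropped before it reaches application code, whatever language that code is written in.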


It does seem like if your goal was security then perhaps Rust would be a better choice.


If your goal is security literally any high level language is a better choice.


A high level language like PHP? Security and secure code is a mindset. If you code without thinking that bad things will happen, you will get bitten.

Sure, a high level language can help with memory management... but plenty of CVEs are because of sloppy coding, not because of a low level language.


While you are correct in that picking a higher level language doesn't shield you from writing insecure code, insecure C/C++ failure modes are usually quite a bit worse than other languages used for that purpose. I don't trust myself to write code that handles memory 100% correctly and is also network-facing. If much better developers than I manage to screw that up, what chance do I have?


Mmm, it can be difficult to take advantage of C exploits (in the context of a webserver). In a well written C system, you might expect most bugs to lead to crashes.

PHP bugs tend to be more exploitable, because you're doing something supported by the language.


But what's stopping someone from writing an app-level vulnerability in C vs any other language? Most of them are because of horrible handling of strings, which is something C is also not that great at. I'm not seeing the security benefit here.


My gut feeling is that if you really want security in C, you have fewer constructs to misuse and so you get less unexpected behaviour. At the same time, you get more protection mechanisms; guard pages etc. I.e. harder to be secure, but if you really want to harden, you can get harder than in higher level languages.

I'm not putting any kind of weight behind that, though; I just feel that it's a bit odd for people (not specifically meaning you, just the whole thread) to criticise this purely on language choice and not put any substance behind their criticisms that actually relate to the software in question.


I think the idea still stands. PHP is a fake high level language in my opinion.

A high level language should reduce boilerplate and 'force' you to write concise and predictable code.

PHP does none of that especially in the context of error/exception handling.


None of your standards for what a high-level language should be have any bearing on what a high-level language is, according to the definition that people actually use, and those standards might exclude Python, Ruby, C++ and Java among others.


Actually yes. For security, even PHP is a better choice than C (for certain versions of the idea of "security"). There's entire classes of security problems that are literally impossible in PHP.


Rust won't help you on the dozens of platforms it isn't available.


[deleted]


http://arewewebyet.com/

If you need something running on HTTP next week or next month, it isn't.

If you can wait a year or two, it'll probably be a closer thing.


Given that this project is implementing a web server from the ground up, that site doesn't really have advice. The milestones for "Are we web yet?" include a web server, so someone writing a new web server in Rust would contribute toward the goal, rather than relying on the dependencies being discussed.

"Are we web yet?" is about whether you can effectively build web apps in Rust, not whether you can effectively build core networking infrastructure in Rust.

That's not to say I'm arguing this should have been written in Rust (if it were me, I might have done so, but it's not, so I don't get a say).

Also, you can't possibly argue that C satisfies all, or even most, of the milestones given for "Are we web yet?"


"Also, you can't possibly argue that C satisfies all, or even most, of the milestones given for "Are we web yet?""

C has all the libraries. Like, all of them, ever. I don't like it necessarily, but it's true. It has all the HTTP servers, all the database drivers, all the email and all of the "misc".

(Yes, not literally all. But it's a closer thing than we'd like to admit!)

Except for the fact that it's basically completely unsuitable for use in a network environment due to its design flaws, and one "security-focused" framework can't change that because you've still got all the rest of C's problems and large set of libraries that also really shouldn't be put on the network (and frankly I still trust the "security-focused" framework about as far as I can throw its immaterial self, because it's still written in C), C would be the perfect web programming language, and passes, yes, darned near everything.

Also, I don't know if you got here after the message was deleted. I think my reply makes more sense in the original context it appeared in.


This kind of statement warrants a lot more support.


Though Rust is very popular around here, and is an exciting project, this is a very low-effort comment which doesn't cast any light on the subject.


The site and documentation look well done, great job!

Architecture looks pretty interesting too. I wonder why there was a need for an accept lock? An ordinary accept() socket call already allows multiple threads/processes to wait on a single socket.


I think the authors want to avoid thundering herd. You can find this basic pattern in the book UNIX Network Programming.


Correct.

The accepting socket is shared between multiple workers, each of which has its own fd for epoll or kqueue. Because of this, a form of serialising the accepts between said workers is needed to avoid unnecessary wakeups.
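
One way to serialise accepts between workers is a try-lock in memory shared across fork(). The sketch below is an assumption about how such a lock could look (the accept_lock_* names are hypothetical and this is not taken from Kore's source): a worker that wins the lock registers the listening socket with its own epoll or kqueue fd, and workers that lose simply skip it for this round, so only one of them wakes up for a new connection.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc in strict modes */

#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>

/* Lock word living in memory shared by all workers after fork(). */
struct accept_lock {
    atomic_int held;   /* 0 = free, 1 = one worker owns the accept right */
};

/*
 * Allocate the lock in an anonymous shared mapping. Call this in the
 * parent before forking so every worker sees the same lock word.
 * Returns NULL if the mapping cannot be created.
 */
struct accept_lock *
accept_lock_create(void)
{
    struct accept_lock *l;
    void *p;

    p = mmap(NULL, sizeof(struct accept_lock), PROT_READ | PROT_WRITE,
        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    l = p;
    atomic_init(&l->held, 0);
    return l;
}

/* Non-blocking attempt: returns 1 if the caller now owns the lock. */
int
accept_lock_try(struct accept_lock *l)
{
    int expected = 0;

    assert(l != NULL);
    return atomic_compare_exchange_strong(&l->held, &expected, 1);
}

/* Give the accept right back so another worker can take it. */
void
accept_lock_release(struct accept_lock *l)
{
    assert(l != NULL);
    atomic_store(&l->held, 0);
}
```

A worker would typically call accept_lock_try() at the top of each event loop iteration and release the lock after accepting, or once it is busy enough that it no longer wants new connections.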


Actually that is being changed:

http://lwn.net/Articles/633422/

See part about EPOLLEXCLUSIVE


That is great, thanks for sharing.


If you are the author, thanks for sharing the project. You did a great job and made the right choice of having per CPU worker processes each with their own epoll loop.


Hmm, thought that was fixed 10 years ago or so. I can't believe the standard accept() call will wake up all threads on a modern Linux kernel.

Maybe if it is using epoll/select/etc. it would exhibit the thundering herd issue.


Great! This looks really nice, and interesting.

Some fantastically quick points from a very cursory glance at the code. Feel free to ignore this.

- The code uses the convention to put the argument of return inside parentheses, making it look like a function call. This is very strange, to me.

- It treats sizeof as a function too (i.e. always parenthesizes the argument).

- It is not C99, which always seems so fantastically defensive these days.

- It's not (in my opinion) sufficiently const-happy.

- I saw at least one instance (in cli.c) of a long string not being written as an auto-concatenated literal but instead leading to multiple fprintf() calls. Very obviously not in a performance-critical place, so perhaps it's not indicative of anything. It just made me take notice.


Author here.

I see you picked out the few things that I consistently hear on the coding style I adopted which is based on my time hacking on openbsd. I have no real points to argue against those as it is based on preference in my opinion.

I am curious why you arrived on it not being sufficiently constified however. I'll gladly make sensible changes.

As for the multiple fprintf() calls ... to me it just reads better, and the place it occurs in is, as you stated, pretty obviously not performance critical.


Right. I could have guessed these were based on some coding style guide from somewhere.

I still don't see the point, or why any sane guide would prefer to treat return as a function. It just never seems helpful to me, and always wasteful/more complicated. I realize it's just two tokens, so it's probably not "important" in any real sense of the word, but it irks me. I like to point it out since it can help others cargo-culting this.

It's not sufficiently const if there are places where a variable could be const but still isn't. :) To be super-specific, the variable 'r' here: https://github.com/jorisvink/kore/blob/master/src/cli.c#L542 is one such case. It should be declared inside the loop, i.e. as "const ssize_t r = write(...);" since once assigned the return value from write(), it's read-only.

Of course, many ancient-smelling style guides seem to outlaw declaring variables as close to their point of usage, too. Note that declaring variables inside scopes other than the "root" one in a function isn't even C99, but many people seem to think you can't do that.
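
To illustrate the const point, here is a generic write loop with the result declared at the top of the loop body. This is a hypothetical write_all() helper written for this example, not the code at the cli.c line linked above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write all len bytes of buf to fd, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error (errno is left as set by write()).
 */
int
write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);

    while (len > 0) {
        /* Assigned once per iteration and only read afterwards. */
        const ssize_t r = write(fd, p, len);

        if (r == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += r;
        len -= (size_t)r;
    }
    return 0;
}
```

Since the declaration sits at the start of the loop's block, this compiles as C89 as well, which is the point made above about nested scopes.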


That's fair. Parenthesising return is a matter of readability and flavour to me. It tickles my spidey sense if it is missing.

I strongly dislike declaring variables anywhere else but the function root, but I agree with you on the example you provided that those kinds of variables could be constified to be sane.



Was quite excited to try out a little websocket server with Kore till I saw it forks per connection. I don't really want 20k processes for handling 20k connections, I was really hoping for an event loop.


Kore does not fork per connection.

It uses an event driven architecture with per CPU worker processes. The number of workers you have can be controlled by the config.


Evented I/O is great for extremely high concurrency, but that isn't always the right thing to optimize for. A forking web server might be faster for users depending on the application.

Lastly, you can't just have an event loop without also creating an entirely async platform. For an event loop to work well, all operations from file reading to network requests need to be completely async.


Out of curiosity, in what scenarios do you see a forking web server being faster than an evented server that balances requests across cores and can direct a request to the core with the best cache for the request?

I completely agree with the need for async. The hard part is that many operations are async without an async interface. For example memory allocation, or even memory usage if the memory was not truly allocated by malloc.


What does it mean to balance requests across cores? To run T event loop threads/procs, where T is tied to the number of CPU cores? So like, a pre-forking, multi-proc, evented server?

I actually can't think of a case where a multi-threaded/forking-only web server would be faster than that. Again, assuming complete support for async libraries used throughout the web application.

Are there any web servers that have this architecture? NodeJS obviously doesn't. *

* Actually, for maximum absurdity, it looks like Kore, the web server we are currently discussing, has this architecture


Any non asynchronous application.


Why would you have a web server that doesn't process requests async?


There are a few reasons I can think of.

1) Client libraries you might need to use in your web service might not be available in asynchronous versions.

2) Blocking code is much easier to write than asynchronous code.

3) Your server code is CPU bound, so there's no benefit to an asynchronous model.

4) If your web app runs in an asynchronous server and your app crashes, it'll crash the whole server. On the other hand, in a forking model, only the client that the child is serving will be impacted; the other workers will be unaffected.

5) Memory leaks are easier to contain in a forking model, assuming the child can exit or be killed after N requests.
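
Point 4 can be shown with a few lines of C: run the request handler in a forked child and let the parent observe how it ended. The run_isolated() helper and the two example handlers are hypothetical names for this sketch; a real forking server would of course also pass the client socket to the child.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run handler() in a forked child. Returns the child's exit status
 * (0..255), or -1 if fork() failed or the child was killed by a
 * signal, for example because the handler crashed.
 */
int
run_isolated(int (*handler)(void))
{
    int status;
    pid_t pid;

    assert(handler != NULL);

    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(handler() & 0xff);   /* child: never returns to the caller */

    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Example handler that completes normally. */
int
handler_ok(void)
{
    return 0;
}

/* Example handler that crashes; only the child process dies. */
int
handler_crash(void)
{
    abort();
}
```

The parent keeps serving after the crash. The same structure also covers point 5, since any memory the child leaked is returned to the OS when it exits.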


Really?

| Event driven architecture with per CPU core worker processes

So each process should be able to handle a lot of concurrent connections, just like nginx.

And I tried the websocket example, and saw only the first worker process responding whenever a websocket is created.


Unless you are optimising for space, there is no real reason to use C in IO bound processes (for which event loops are ideal); you may as well use Python (or even JS if you must) as your performance will be dominated by IO time.


There's always Vibe.d from the D programming language. Granted you would have to write in D, but most of the libraries of C are available. Concurrency is definitely accounted for since D supports it internally in the language. If you're seriously considering a 'native' approach to web development.


I've used https://github.com/cesanta/mongoose in the past maybe I could try this one.


Writing high level C applications can be easy, if you use a library that frees you from using dynamic memory on typical data structures (e.g. strings, vectors, sorted binary trees, maps). I'm developing a C library for high-level C code, with hard real-time in mind, is already functional for static linking: https://github.com/faragon/libsrt


When an HTTP API is just an additional feature of a larger project, it may make sense to keep using C: a toolchain available everywhere and well known (including cross-compilation and full bootstrap), a small memory footprint, easy use of any library needed for the project.

I am doing a lot of that and will keep an eye on Kore. Unfortunately, HTTPS only and non-evented core is a no-go for me.

I am currently relying on the web server embedded in libevent, as well as wslay for websockets and some additional code for SSE. To easily start a project, I am using a cookiecutter template: https://github.com/vincentbernat/bootstrap.c-web



It would probably be better to also have a lib or DLL, not just a program that renders C/C++ servlets. It seems that everything Kore executes from the code it is fed runs in servlet threads, or is at least started from a servlet thread. It would also be cool to add some kind of "application" framework, not only a "C-servlet" framework.


Has anyone done any performance testing on kore? Yes I know I can do this myself, but why invent the wheel.


You'd invent the wheel to get you places. You wouldn't reinvent the wheel when someone's already done a perfectly good job.


So much for me trying to be clever and avoid a cliche :)


as in... http://okws.org


Interesting. Any performance data on this beyond the vague "it is faster"?


Anything I've seen is 10-ish years old. It was relatively fast, but also complex and covered by the GPL (for better or worse).


I found this which seems interesting

https://github.com/koanlogic/klone


The portability aspect is interesting.

I use uwsgi extensively (not just with python), and I think it sets the bar these days.


If you think Kore is interesting, then also check out Tntnet <http://www.tntnet.org>. I checked it out a few years ago and it felt good - stable, complete, easy to use etc.


Process per connection does not scale no matter how light weight, even with COW. kore's connection handling model is the reason apache2 mpm_prefork fell out of favor many iterations ago.

The only valid argument to avoid a single event based I/O is some sort of hard blocking I/O such as disk or non-queuing chardev.

However I'm still not biting, this is solved...and as usual the answer is somewhere in between. For example, in RIBS2 there are two models for connection handling, event loops for connections and "ribbons" for the non-queuing bits [1]. RIBS2 is also written in C for C.

[1] https://github.com/Adaptv/ribs2/blob/master/README

Edit - mention RIBS2 is also written in C


Kore does not fork per connection.

It uses per cpu worker processes which multiplex I/O over either epoll or kqueue.


A clone is a clone. The main implementation difference between a thread and a process in Linux is COW; for all intents and purposes a worker process and a fork are the same in this use case. Neither has led to scalable web servers.


Except you are basing yourself on the fact it creates a single worker process per connection. It does not.

Workers are spawned when the server is started. Each of them deals with tens of thousands of connections on its own via the listening socket they share.

This is a common technique and scales incredibly well.
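
For anyone still picturing fork-per-connection, here is a minimal sketch of the pre-fork pattern described above: a fixed number of workers is created once at startup and each one is handed the same listening socket. The spawn_workers() and worker_stub() names are hypothetical, and the worker body is a stub where a real server would run its epoll or kqueue loop.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork nworkers processes once, at startup. Each child runs
 * worker(id, listen_fd), which in a real server would multiplex many
 * connections over the shared listening socket until shutdown.
 * Returns the number of workers that exited with status 0,
 * or -1 if not all workers could be started.
 */
int
spawn_workers(int nworkers, int listen_fd,
    int (*worker)(int id, int listen_fd))
{
    int i, status, started = 0, ok = 0, failed = 0;

    assert(worker != NULL);

    for (i = 0; i < nworkers; i++) {
        const pid_t pid = fork();

        if (pid == -1) {
            failed = 1;
            break;
        }
        if (pid == 0)
            _exit(worker(i, listen_fd) == 0 ? 0 : 1);
        started++;
    }

    /* The parent only supervises: reap every worker that was started. */
    while (started > 0) {
        if (wait(&status) == -1) {
            if (errno == EINTR)
                continue;
            break;
        }
        started--;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return failed ? -1 : ok;
}

/* Stub worker: stands in for the per-worker event loop. */
int
worker_stub(int id, int listen_fd)
{
    (void)id;
    (void)listen_fd;
    return 0;
}
```

The number of processes depends on the configured worker count, not on the number of clients, which is why this is not comparable to one process per connection.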


> apache2 mpm_prefork fell out of favor many iterations ago.

By whom, exactly? There are still plenty of reasons to use a forking web server (see my other comment in this discussion). Saying it "does not scale" is misleading; even with an event-driven model there are only so many CPU resources that can be used to serve responses to clients.

Event-driven webservers are fantastic compared to forking ones, for keeping open many thousands of relatively idle connections (if that is your definition of "scale"). But many web services simply don't do that.

Preforking webservers, like event-driven ones, still have a rightful place in this world. As with all things technology, you have to pick the right tool for the job.


Good job & good luck! ;-)


Kore mapped to JavaScript (NodeJS):

  var kore = require("kore");

  kore.on("request", http_request);

  function http_request(req, resp) {
    var statusCode = 200;
    resp.write("Hello world", statusCode);
  }


Anyway I love this.


hmm maybe put it in the kernel and use it for debugging? can't use userspace threading libraries though.


this is AWESOME!


You can also do that with uWSGI: http://uwsgi.readthedocs.org/en/latest/Symcall.html


So, on one hand, it's one process per connection and ease of development of C, but on the other hand, it has execution speed of C.

Can anyone who works in web (I don't, but I'm curious) explain what kinds of services this is good and bad for?


It is not a forking web server.

It is evented I/O with multiple worker processes.

It is literally in the documentation and easily spottable in the code.


Control interfaces to embedded devices?


Thanks, that actually seems to be the perfect use-case (from my almost non-existent experience with web development).



