This article is full of D hype and so so so far away from reality.
Alright, some basics;
Everything that has to access persistent information frequently is bound by disk/network/whatever IO, and web programming is no different. The reason why languages such as Python and Ruby are very viable options for this task is that they are abstracted far enough away from bare metal to speed up development. The wait for IO is quite long compared to logic execution, so even though we are executing more instructions to get the same job done, the resulting overhead isn't very significant.
CPU-bound computations will most definitely execute faster in statically compiled languages such as C/C++, D, or golang than in interpreted languages, but web programming is mostly not CPU-bound. Even so, if rendering HTML is such a big pain in the ass that it is slowing you down a lot, you can always use a library that is implemented in a language that's fast, say C, and use its wrappings in your scripting language of choice. I am not sure to what extent this is supported in other languages, but I know you can do this in Python; heck, you can do this in golang, even though it is a relatively new language [1]. I wonder how the author will move to the C10M world by optimizing the wrong thing.
"Write in Python/Ruby/whatever and optimize the slow parts in C" echoes around the programming community endlessly, but I wonder how many people have actually done it. It's kind of hard.
- Automated tools like SWIG have weird limitations and are complex.
- Binding to a C library manually means you have to wrangle the data from a heapy, pointery dynamic language world into whatever format the C library wants.
- Writing the C code to operate on the dynamic language's objects directly means you have to learn how the language works under the hood, and your C code will never be useful in any other language.
- Python and Ruby both have a GIL that prevents you from using multithreading to its full potential.
- The way dynamic languages lay out objects in memory causes an inherent slowness everywhere. Your app's slowness may well be a death of a thousand cuts, with no easily optimized hot spot.
I think anyone who has actually worked on a project built this way will appreciate the idea of a compiled language that is closer to the expressiveness of Python.
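To make the "binding to a C library manually" point concrete, here is a minimal sketch using Python's standard `ctypes` module to call the C library's `strlen`. It assumes a POSIX system (Linux or macOS) where `ctypes.CDLL(None)` exposes the libc symbols already loaded into the process; the helper name `c_strlen` is made up for illustration.

```python
import ctypes

# Assumption: POSIX system, where loading "None" gives access to the
# symbols already linked into the process, including libc.
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Hypothetical helper: length in bytes of `text` as a C string."""
    # A Python str is not a char*; it has to be encoded to bytes first.
    raw = text.encode("utf-8")
    return libc.strlen(raw)


print(c_strlen("hello"))
```

Even for a one-argument function, most of the code is declaring types and converting data rather than doing the call, which is the wrangling the list above complains about.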
This isn't an assertion, just a question: isn't that exactly the reason why Cython exists, both to more easily facilitate the connection between Python and C, and to create essentially "a compiled language that is closer to the expressiveness of Python"?
Writing your application in Python/Ruby/Whatever and optimizing the slow parts in C sounds like a good way to have the worst of both worlds: the lack of efficiency of the interpreted language combined with the long development time, difficulty porting, buffer overruns and segfaults of C programming.
So you're arguing that it's "hype" because while it's all true, it doesn't matter if something is I/O-bound. Even if you're right, a server farm that uses 5x fewer CPU cycles is saving a lot of electrical power. 5x is of course a very conservative estimate.
I remember this argument being made in favor of Java (over native compiled code) and there it at least had some credibility. Python and Ruby are far, far slower.
Certainly, the only credible/really important argument in favor of Python (or whatever alternative language you want to suggest) is programmer productivity. Get the feature out the door, and then when it's making money figure out how to optimize it. If I were going to pick on anything in the article, it's the long line of "}"s in the HTML generation example. One of the best arguments in favor of Python's indentation I've ever seen.
All this theoretical bottleneck debate, but it really boils down to this:
If you're writing frequently-run code in Python/Ruby/JS, or any such highly-dynamic-at-runtime language, then chances are very good that your CPU, memory access, CPU cache, etc are going to be part of your bottleneck.
Write in something that doesn't effectively turn an i7 into a Pentium 4 (and be sure to use efficient memory management techniques), and your chances of main bottlenecks being IO-only are much better.
The belief that IO is the only bottleneck is a self-defeating prophecy. It leads to code and techniques that cause CPU to become a bottleneck once again. Don't forget Wirth's Law.
Alright, some basics:
Having data "on disk/network/whatever IO" does not mean that your application will be bound by I/O.
If you care about your application's performance you will quickly learn how to cache data (and that means figuring out algorithmic space and execution complexity).
You will then quickly care about how well your chosen programming language, libs and OS deal with memory allocations, instruction parallelization, on-die cache optimization, and so on.
And finally, after all that, you will still care about I/O, so you will figure out how to optimize your I/O access to benefit from the hardware assumptions of a particular set of storage/network devices.
Then you will look back at your solution and realize that your "use its wrappings in your scripting language of choice" is nothing more than a wrapper around another language. And then you will be wondering if it was worth starting with a different tool.
People endlessly parroting the "IO bound" line never post numbers. As far as I've seen it's just not true. Things like web apps are routinely bottle-necked by execution speed, not disk or network.
Well, my experience is that web apps are most often bottlenecked by the database speed, because almost any other problem can be solved with money (and not even much of it).
I've seen databases bottleneck on lots of different resources, even some virtual ones (unexpected serializing). But I'm very suspicious of people claiming that one must write webapps in low level languages "because speed". (And yes, I know there exist problems out there where this is true, the same way that there exist people out there that've won the lottery.)
It depends on the web app. My experience is that being bottlenecked on the database, while natural, is so crippling to scaling that extensive caching is used to prevent the majority of requests from touching the database.
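A minimal sketch of the caching idea described above, keeping most requests away from the database. `FakeDatabase` and `ArticleCache` are hypothetical stand-ins, not any real library's API; a real deployment would more likely use something like memcached or Redis and would need an invalidation strategy, which this sketch leaves out.

```python
class FakeDatabase:
    """Stand-in for a real database; counts how often it is queried."""

    def __init__(self):
        self.queries = 0

    def load_article(self, article_id):
        self.queries += 1
        return {"id": article_id, "title": f"Article {article_id}"}


class ArticleCache:
    """Hypothetical read-through cache keyed by article id."""

    def __init__(self, db):
        self.db = db
        self.store = {}

    def get(self, article_id):
        # Only go to the database on a cache miss.
        if article_id not in self.store:
            self.store[article_id] = self.db.load_article(article_id)
        return self.store[article_id]


db = FakeDatabase()
cache = ArticleCache(db)
for _ in range(1000):
    cache.get(42)
print(db.queries)  # 1
```

A thousand requests for the same article cost one database query; after that, the speed of the language serving the cached data is what is left to measure.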
I know this is an inevitable comment on any article that calls any software slow, but: "~500 requests/sec" using ten servers? Python may be slow, but I would be very surprised if it can't do 50 req/second.
This doesn't pass the smell test for me either. I'm running a Python Django server with a primarily write load on an EC2 small instance, and easily exceeding 50 requests per second, with very low load on the box.
As I read it that's one AWS server of unknown size serving ~500rps of an application of unknown complexity. Tripled when using PyPy (whatever that is).
Maybe a very small instance serving something fairly complex?
Yep, I could readily point to my Erlang application that can serve thousands of requests per second, doing real work. While simultaneously handling 10k+ websocket connections. All that on a single machine (16G RAM, 8 cores).
Does that mean that Erlang is the best? Of course not! It only means that Erlang was specifically designed for this sole use case, which only happens to encompass the whole "web" thingie.
It would be an overkill to use Erlang for single-threaded software that requires number crunching speed or for any kind of system scripting. Comparing Erlang to D or Python is stupid. As is (IMHO) using any of the latter for massively parallel servers.
How complex? In one second a modern processor can do ~1 billion operations (ish; some are faster, some are slower, sometimes multiple are done in the same clock tick). Even on a slow Core 2 architecture.
This means they have time for about 200 million instructions per request (ignoring internal disk I/O or network I/O).
That amount of work is insane!
I want to say they're doing something fundamentally wrong. And it has nothing to do with their language.
The i7 can dispatch 4 instructions per cycle. In practice, I find it can realistically execute about 2 per cycle. So at 2GHz, that's closer to 4 billion instructions per second, or ~800M instructions per request as per your calculation.
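A back-of-envelope check of the per-request budget discussed above. The request rate is an assumption: it takes the "~500 requests/sec using ten servers" reading, i.e. about 50 requests per second per server, and ignores I/O entirely. The function name is made up for illustration.

```python
def instructions_per_request(instr_per_second, requests_per_second):
    """Rough per-request instruction budget for one core, ignoring I/O."""
    return instr_per_second / requests_per_second


# Assumption: 500 requests/second spread over ten servers is 50 per server.
per_server_rps = 500 / 10

# ~1 billion instructions/second (the conservative figure above).
print(instructions_per_request(1e9, per_server_rps))      # 20000000.0
# 2 GHz * ~2 instructions per cycle = ~4 billion instructions/second.
print(instructions_per_request(2e9 * 2, per_server_rps))  # 80000000.0
```

Under those assumptions the budget comes out at tens of millions of instructions per request; the 200M and 800M figures quoted above correspond to roughly 5 requests per second per server. Either way it is a very large amount of work for one page.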
The slowness of their system can probably be blamed on slow database access, or some kind of initialization cost they're paying for every request (i.e.: calling into a binary like in the old CGI days, initializing the Python VM every time).
Like I said, something is fundamentally wrong with their approach.
Un-indexed databases, databases far away from the front-end servers (in terms of network topology), or weird VM things with Python.
Something is bad, and if changing languages solved their problems, they are just sweeping an issue under the rug. It'll hit them in the face later, and harder. Be it developer knowledge, or architectural choices. It'll surface again, they'll (hopefully) be large, and the problem will sting harder too.
Unless you're a mathematician or theoretical physicist the gross of your CPU time will be spent waiting for IO. Reading from disk, writing to the network, synchronizing, etc.
They're probably just aggregating 2 or 3 APIs, maybe hitting a database and then adding it all together.
That description can apply to almost any and all web applications and is inherently IO-bound.
"the gross of your CPU time will be spent waiting for IO. Reading from disk, writing to the network, synchronizing, etc."
People just sort of chant this, yet... if you upgrade from Python to something on the faster end of the spectrum, like D, you are very likely to still experience significant speed up, in my experience, even if you don't touch IO access patterns. You're even more likely to see a real latency decrease. And that's before we start actually multithreading or anything.
For all the work done on them in the past few years, the dynamic languages remain slow, slow, slow.
I think people often don't look at the math very carefully... if you do, say, half a dozen DB queries each less than 1ms, but your entire web page is clocking in at 50 or 100ms of rendering, all numbers that are very easy to see in real life (such as my own personal Django blog, where I've carefully counted each DB access and carefully indexed all of them), you are not actually spending all your time in IO wait.
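The math in question, as a small sketch using the numbers from the paragraph above (half a dozen queries of at most 1 ms each, a page taking 50 or 100 ms). The function name is made up for illustration.

```python
def io_fraction(query_count, ms_per_query, total_ms):
    """Share of the response time spent waiting on database queries."""
    return (query_count * ms_per_query) / total_ms


for total_ms in (50, 100):
    share = io_fraction(6, 1.0, total_ms)
    print(f"{total_ms} ms page: {share:.0%} in DB wait")
```

With these figures the database accounts for roughly a tenth of the response time at most; the rest is spent in the application.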
One of the worst cases for a dynamic language is crawling a large object hierarchy, obtaining lots of tiny objects from them, and then merging them together in the end. You pay and pay and pay for the constant new object creation, reference count management, endless resolutions of methods, and all the other things dynamic languages are doing over and over and over (even when JITed).
Now, guess what "rendering a template" looks like internally.
Oh, and don't forget, if the DB returns in 1ms but your language reports the query took 5ms, you can't count the time it took your dynamic language to handle what came back from the DB as IO wait!
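One way to keep the two costs apart when measuring, sketched with a fake query. `fake_query` and `timed_fetch` are hypothetical names, and the 1 ms sleep stands in for a real round trip; the point is only that the time spent turning rows into language-level objects is timed separately from the wait.

```python
import time

# Pretend result set, prepared up front so building it is not timed.
ROWS = [(i, f"name-{i}") for i in range(10_000)]


def fake_query():
    """Stand-in for a database round trip (assumed ~1 ms of waiting)."""
    time.sleep(0.001)
    return ROWS


def timed_fetch():
    """Return (wait_seconds, handling_seconds, records) for one query."""
    start = time.perf_counter()
    rows = fake_query()  # time spent waiting on "the DB"
    after_wait = time.perf_counter()
    # Language-side work: one new dict per row.
    records = [{"id": i, "name": n} for i, n in rows]
    done = time.perf_counter()
    return after_wait - start, done - after_wait, records


wait, handling, records = timed_fetch()
print(f"wait: {wait * 1000:.2f} ms, handling: {handling * 1000:.2f} ms")
```

Only the first number is I/O wait; the second is CPU time spent in the interpreter, however the query log labels it.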
(I have to admit, I'm really done with the dynamic languages. It was fun when the megahertz went up every year, but now it's like wearing 20lb concrete shoes and trying to pretend that's not a problem, it doesn't affect my performance at all... and the 20lb is already after we cut it down from 40lb with all the JITs and stuff, which rhetoric notwithstanding simply do not produce anything like C-like performance in practice.)
I agree with you. There is a structural problem with the view put forward by the parent. If the validity of a view relies on I/O being the bottleneck, it encourages coding practices that keep the implementation I/O bottlenecked. Once you already believe such a thing to be true you have little motivation to challenge or overcome it.
It is not the case that I/O bottlenecks can always be overcome, but it can often be, depending on circumstances, provided one tries of course.
"Unless you're a mathematician or theoretical physicist the gross of your CPU time will be spent waiting for IO"
As someone working in VM research, I'm not entirely convinced this is true. Perhaps we should do some work to find out where most web applications really do spend their time.
Here's one real-world example: Rap Genius. They certainly aren't mathematicians or theoretical physicists, all they do is process text I think, but it appears they spend over half their time in the Ruby interpreter - not waiting on network or database (if I'm reading the graph correctly).
I can't find the blog post, and the above poster likely knows more than I do.
I remember seeing a post several years ago saying that Xen increases the likelihood of cache misses by 25-50%. Thus while the CPU looks to be at 100% processing power, it's really waiting on RAM/cache.
I'm not convinced. There are benchmarks[0] that strongly suggest a choice of language is as much a factor in general performance (not merely I/O tasks) as platform (though these frequently go together) and hardware.
If you have just a few thousand pages, with a few thousand bytes each, on a normal (real) server, then no: your computer will keep everything in memory, and will only touch the disk for saving data. Also, if you have enough independent requests, throughput will not be network bound.
But CPython may still spend most of its time waiting for RAM. D is much better at this, as is PyPy.
This can be solved by minimizing blocking code, either by using actors (Erlang, Akka, etc) or just chaining callbacks on futures/promises and using monadic composition to avoid callback hell.
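A minimal sketch of the non-blocking style described above. It uses Python's standard `asyncio` coroutines rather than actors or explicit future chaining, so it is an analogue of the idea, not the Erlang/Akka approach itself. `fetch` and `handle_request` are hypothetical names, and `asyncio.sleep` stands in for real network calls.

```python
import asyncio


async def fetch(name, delay):
    """Stand-in for a non-blocking I/O call (e.g. an API request)."""
    await asyncio.sleep(delay)
    return f"{name}: ok"


async def handle_request():
    # Start all three "calls" at once; the event loop is free to run
    # other work while they are pending.
    results = await asyncio.gather(
        fetch("users", 0.05),
        fetch("orders", 0.05),
        fetch("billing", 0.05),
    )
    return results


print(asyncio.run(handle_request()))
```

The three waits overlap instead of adding up, and `async`/`await` keeps the code linear rather than nested in callbacks.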
Oooh, thank you for reminding me. I recently upgraded to an i7-4790K and I've been meaning to jump back into DF now that I should have much better single-threaded performance. I honestly haven't played since 2008 on my parents' home Pentium 3.
That's not true. Well it's true in a very narrow technical sense, but it's not really true.
For example, the amount of housekeeping Python does in order to execute a function call is staggering. It leads to all sorts of nice functionality, but it is nevertheless a cost (and C++/D does it almost entirely without housekeeping: either no housekeeping, or one level of indirection).
Python has so many indirections for a function call it hardly even makes sense to talk about it in numbers of indirects.
Assembly hello world on my machine : 86,607 cpu cycles (of which < 20 actually in the program)
Syscalls used by the assembly version : 2 (write and exit)
Python hello world on my machine (.pyc was available) : 59,099,731 instructions (including half a million branch misses)
Syscalls used by python to execute 'print "hello, world"' : 1139 (each of which causes a program reschedule)
These programs do the same thing. Programmers often forget that things they take for granted are not in fact free, they may not even be O(1). Memory allocation. Subprocess execution. Function calls in scripting languages. Syscalls. Writing to files. Allocation of bytes on disks. All of these things come at a really, really high cost, and most not even O(1) costs (e.g. memory allocation is O(N^2) on a busy server as long as things actually fit in main memory, and O(N^4) or even worse when using virtual memory).
Sadly using memory does not even have bounded complexity. At some point, just attempting to use virtual memory might cause virtual memory to be allocated just for the lookup. This is generally referred to as "thrashing" and you're very likely to have rebooted your machine before this completes because it'll be frozen for minutes, sometimes hours, if this happens.
Likewise the memory model is useful, but huge. Strings in C++ take one byte + the actual contents of the string. Strings in python take up 60 bytes + twice the length of the string. And that's assuming you just set a variable to the string. If you construct the string, the difference is going to be much bigger.
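The per-string overhead can be inspected directly with `sys.getsizeof`. This is a sketch assuming CPython; the exact numbers depend on the interpreter version, build, and string contents, so they may not match the figures quoted above, and no output is shown. `str_overhead` is a made-up helper name.

```python
import sys


def str_overhead():
    """Bytes CPython reports for an empty string: pure bookkeeping."""
    return sys.getsizeof("")


for text in ("", "a", "hello, world"):
    # sys.getsizeof reports the object's own footprint in bytes.
    print(repr(text), len(text), sys.getsizeof(text))

print("fixed overhead:", str_overhead())
```

Whatever the exact value, the fixed cost per string object is far more than the single terminating byte of a C string.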
The point here is that things that are io-bound (esp. memory bound) in python may be cpu-bound in C++ or D, simply because you avoid doing all the indirections that higher level languages do.
> The point here is that things that are io-bound (esp. memory bound) in python may be cpu-bound in C++ or D, simply because you avoid doing all the indirections that higher level languages do.
I think you mean the reverse: "things that are cpu-bound (esp. memory bound) in python may be io-bound in C++ or D".
How many instructions were executed after the interpreter was loaded into memory (a much more realistic analog to the twisted server model)?
Once loaded into memory, any program which is bound by IO to memory (i.e. moving the stack from memory to caches/registers) will show up in tools not as being IO bound, but CPU bound.
And yes, CPU bound programs will benefit greatly from moving the hotspots into a linked module written in C or Cython.
I have no problem with moving away from Python (I'm in the process of doing this myself), but the costs associated with re-writing an entire program (especially one complicated enough to only handle 50 requests per second) are non trivial, and if there was simply a small CPU hotspot, it could have been smoothed away in a number of ways that don't involve learning a new language.
In short, everything points towards OP moving to D because of a personal desire instead of a real business case.
I used to write translators for computer languages. The biggest was PL/M to C. That was easy because PL/M had fewer constructs than C. I managed to recognize constant declarations and map them to #defines or consts, which actually made the code more readable.
But these days, languages have features that may be completely orthogonal to other languages. Automatic translation may not be possible. Still it would be by far the cheapest solution.
I thought system calls were packaged into the binary itself and didn't necessarily cause a job to re-schedule. But just caused a context switch to take place, then execution continues.
I thought re-scheduling only happened on an interrupt, or when a thread reaches a blocked state.
About 10 years ago I remember prototyping some code on Linux with a perl script running a java program as a "coroutine" (er, service) via request/response pairs over a socket (not http). Then we moved it to AIX, where it was essentially unusable due to the lost time slice each time an IO sys call was made. On Linux, the remaining time slices were recovered and immediately used. On AIX, the time slice was simply lost until the next process scheduler tick. Ouch.
Technically they cause a context switch and a scheduler run upon return (I believe, not 100% sure), but you're right that this does not necessarily result in getting put on the back of the work queue.
I have a lot of respect for Walter and for D. But having not enough time to try everything, I'm leaning towards doing a project in Nimrod[0] when I have the time.
D people who have some knowledge of Nimrod (or Nimrod people who have good knowledge of D) - where do you think D outdoes Nimrod? D is more mature, with a larger community, and a recognized brand - granted, and these are NOT trivial things -- in practice, they usually matter more than any specific feature. Yet, my question in this case IS about language/environment features.
As far as I can tell, all the examples in the article can be done at least as easily/tersely/nicely, if not more so, in Nimrod.
I don't know much Nimrod, but here's an article I wrote about a D package I also wrote, in which a number of D features have come together in an especially pleasing way:
> go (Google must be joking if they actually consider it for system programming)
I am curious what led the author to be dismissive of Go in such a strongly negative manner. Lack of generics? Disagree with certain language design choices? Too many cuddly caricatures of gophers?
C and Go are both classified as high-level system languages, but I can't deny the fact that C allows for finer control over machine usage, so it's hard to put Go in the same bag.
You are right yet Go is much closer to C than Erlang in syntax, type system and memory-representation. Go is so close to C in fact, that its compiler is a modified C compiler.
Therefore I'd bundle it with C and D rather than Erlang.
Ridiculous. Go's lack of verbose classes and lack of inheritance, pass everything by value, the existence of pointers, first class concurrency primitives, compile to binary / no interpretation/JIT, lack of a VM, built in unit testing/benching, fast compile times, memory usage, easy C integration, etc etc all make it very different from Java.
I've been writing Go full-time for over 2 years and used to write mostly Java/C#, so I should know. When I started with Go and ported many of my personal Java applications, they all were much more maintainable and straight forward in Go.
Of course C had always been my favorite language, and all I ever really wanted was a "modern" C, so I am probably biased.
Nevertheless, many of the design decisions make sense in that context.
For example: the difference in initialisation of simple data types vs slices and maps. For an application programmer these are weird inconsistencies. But they make sense in the domain.
Or the way error handling works. Very tedious to have to do-check, do-check, and not be able to have automatic upwards delegation. But in system programming it's about robustness, not ease of development. A database server can't just restart if it has a file or memory problem. There needs to be a solution and it needs to be immediately next to the problem.
I don't do anything quite so tedious in my Go-based webserver projects. For functions that can return errors that I don't have a way to recover from (db calls, for example), I use a simple wrapper function that automatically logs errors and panics. That doesn't seem to me like much of a source of difficulty or complexity.
Web apps are apps, they have different expectations and provide different levels of robustness guarantees. As I said, Go's design decisions are motivated by a different problem domain.
That being said, I've followed a similar pattern in writing Go web apps. I passed errors upwards from their originating site to the HTTP handler functions, because that was where the error handling was possible with the best context.
"The strange alias _curr this is a lovely feature of D known as subtyping. It basically means that any property that doesn't exist at the struct's scope will we forwarded to _curr, e.g., when I write myCtx.foo and myCtx has no member named foo, the code is rewritten as myCtx._curr.foo."
That's a great feature. I don't see many languages investing enough focus into this kind of "delegation wiring."
Yes. It makes it easy to do things like, for example, create your own "int" type without having to duplicate all the behaviors of int. Just override the behaviors you'd like to change, then forward the rest to the wrapped int field.
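For readers who don't know D, the forwarding behavior described above can be loosely imitated in Python with `__getattr__`. This is only an analogy: D resolves the forwarding at compile time, while this sketch does it at runtime, and Python does not forward operator methods this way because it looks them up on the type. `Request` and `Context` are hypothetical classes.

```python
class Request:
    """Hypothetical wrapped object."""

    def __init__(self, path):
        self.path = path

    def method(self):
        return "GET"


class Context:
    """Forwards unknown attributes to `_curr`, loosely like `alias _curr this`."""

    def __init__(self, curr):
        self._curr = curr

    def describe(self):
        # Defined on Context itself, so no forwarding happens here.
        return f"context for {self.path}"

    def __getattr__(self, name):
        # Only called when normal lookup on Context fails.
        if name == "_curr":
            # Guard against recursion if _curr was never set.
            raise AttributeError(name)
        return getattr(self._curr, name)


ctx = Context(Request("/index"))
print(ctx.path)        # /index
print(ctx.method())    # GET
print(ctx.describe())  # context for /index
```

Members defined on the wrapper win, and everything else falls through to the wrapped object, which is the "override what you want, forward the rest" pattern.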
If I understand correctly you can achieve something similar in many other languages by implementing a dereference operator. You'll have to be explicit when you use the object though (o.foo vs. o->foo for instance). I tend to prefer these kinds of explicit constructs over compiler magic, but it's a matter of taste really.
The built-in version has the big advantage that it integrates well with tools: you can statically determine the target of the reference and jump to it.
Go has something similar, but it's implicit with type embedding, you don't have to explicitly alias anything; also the Plan 9 C dialect has it too (and was used extensively in the Go runtime until recently). One nice idiom in Plan 9 is to be able to call Lock on various structures that embed a lock.
Interesting, but this is not the same feature. Go implements what seems like multiple inheritance; D simply dispatches calls to a member, which can be a pointer to T instead of T. So you can implement custom pointer types.
Yes, it's a different feature, but it is not multiple inheritance either (although I understand why, at first glance, you'd think it seems like that). It's just composition. In this new example, look how you can do b.myInt, even though b is now a bar, not a foo. In fact this happened in the old example too, but it was invisible. This is how the Lock function accessed the mutex.
Why did Walter choose to not open source the compiler from the beginning? I would be willing to bet that it would hold the position that Ruby/Python currently hold if he had made that choice. The ability to apt-get/yum to install a language on a cheap linux server would have done wonders for its adoption, especially in the middle of the rise of linux and the web.
Perhaps that would have helped with adoption. I'm not all that convinced that would have really tipped the scales. It's been 5 years since it was open sourced and there have been over a hundred contributors but it's still only fairly recently gotten to the point where development is more about mundane bug fixes instead of implementing huge, unfinished parts of the language and blocker bugs that make the language hard to use. Open sourcing sooner would have given it a head start but even then, I don't think D would have been ready for heavy use in the middle of the rise of linux and the web.
Here's what Walter said at the time when asked why it took so long:
I've been intending to for a while, it took a while for me
to clean it up, check all the licenses, and get it into a
presentable form.
Essentially, it's pretty obvious that the world has changed,
and closed source is no longer acceptable for a mainstream
product that people will be relying on. Open source is the
future, and it's past time for dmd to join the party!
I feel like D's lack of adoption was mostly due to issues around the way things transpired, specifically the closed-source compiler and of course the big mess with the standard library. Otherwise it was well-placed and well-timed, just ill-executed.
I think it's a good point. I can't speak for Walter, but I can say I've fostered openness ever since joining D development and we're both glad things are now in the right place.
I definitely agree that D is better than many, and all the hype of the day goes to Go, not D. I really have issues with the Go syntax though: it's really not intuitive, some things are made different just for the sake of being different, and the syntax is all over the place. I think Rust is way more elegant; unfortunately, it doesn't get as much hype as it deserves compared to Go.
Go's syntax is really easy for most programmers who are willing to learn. It's probably just one day's work to pick it up. It's very similar to Swift's syntax in many aspects, and I don't hear anyone complaining that Swift's syntax is not intuitive or different just for the sake of being different.
Perhaps instead of just claiming, you can give us some examples for the "syntax is all over the place" part?
> syntax is really easy for most programmers who are willing to learn
This is exactly how this sentence should look like, no need for any language at the beginning. While syntax does matter (and I only recently arrived at this conclusion) its "intuitiveness" or "similarity" is utterly unimportant. You either are a "real programmer" and have no problem picking up different syntaxes and semantics, or you're not. That's all there is to it.
It is important - we're not just robots doing work to eat and have where to live, we need to enjoy what we do and having pleasant syntax is what made many fall in love with Ruby. Let's not pretend syntax doesn't matter - it does, indeed, otherwise there wouldn't be Scala today either!
I agree, syntax does matter. A program should look good on the page. It's much more satisfying to write code in a language that you can craft in an eye-pleasing manner.
I think the reason Swift gets away with it is that it borrows enough from a language you already know that the rest feels like an addition (e.g. Python plus that, C# plus this, OCaml plus brackets, C++ plus ARC, ObjC minus square brackets).
Also, Swift has an API that's familiar to ObjC devs, which means it's just changing brackets. It's pretty easy to port an ObjC program to Swift line by line.
Working with something familiar is always a good thing. Having something intuitive is a great pleasure for your soul - you memorize generic principles, not specific constructs. I like in rust that they really put abbreviation all across the board consistently. If they use "fn" as a keyword, also system lib is abbreviated similarly "to the max". I don't have to remember is it "string" or "str" - Rust consistency saves time and makes it obvious.
Please, it's really unfair to single out Linux GUI tools. Basic system utilities are also written in python, like yum, which masks its slowness by doing network access on each invocation and makes you think that's why it's so slow.
...and it used to be far worse. In the old days it was spending most of its time parsing metadata, until the C metadata parser was introduced, and later sqlite databases were added to repositories as an alternative to the XML metadata.
I even backported the C metadata parser to CentOS 3 and 4, because it was a whole different experience altogether:
On the other hand, every time I start a program and have to wait tens or hundreds of seconds for it to finally fully load, it's invariably written in C, C++ or Java...
Now I could start bashing Java and the others, but I hope I don't need to and that you already realized that your comment is a bit unfair.
Are you sure it's not Java you mean? Every time I see a slow desktop app it's Java. What Python apps do you mean?
Sublime Text starts and runs very fast for me, for example. No complaints here and no difference to some other light weight editor like gedit.
[1] http://gopy.qur.me/extensions/