This article shouldn't say to you: "See, Go is BAD, Python is GOOD!" It should say, "That's an interesting case study. If I'm working on a project that involves lots of sockets and concurrency, I'll want to take what they said into account when I'm making technology decisions."
Goroutines feel cheap, but if you're holding 140k connections and just 20k of them do something that spins up a goroutine each, you can easily exceed the memory constraint. As such, we had to put goroutine pools in place, plus careful select statements around handing work from connections to those pools, to ensure we didn't overwhelm external resources, etc. It was a huge pain. It has been drastically easier to control resource usage with these constraints under Python/Twisted.
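To make the "pool" idea concrete, here is a minimal sketch of a fixed-size worker pool with a bounded queue. It uses Python's asyncio rather than Go or Twisted, and the names `run_pool`, `worker`, and `stats`, as well as the pool and queue sizes, are assumptions for illustration only. The point is that the number of in-flight jobs is capped by the pool size no matter how many connections submit work.

```python
import asyncio


async def worker(queue, stats):
    """Pull jobs off the shared queue forever; one job in flight per worker."""
    while True:
        await queue.get()
        try:
            stats["active"] += 1
            stats["peak"] = max(stats["peak"], stats["active"])
            await asyncio.sleep(0)  # stand-in for a call to an external resource
            stats["done"] += 1
        finally:
            stats["active"] -= 1
            queue.task_done()


async def run_pool(jobs, pool_size):
    """Process `jobs` items with at most `pool_size` of them in flight."""
    # A bounded queue makes producers wait instead of piling up work in memory.
    queue = asyncio.Queue(maxsize=pool_size * 2)
    stats = {"active": 0, "peak": 0, "done": 0}
    workers = [asyncio.create_task(worker(queue, stats)) for _ in range(pool_size)]
    for job in range(jobs):
        await queue.put(job)  # suspends here whenever the queue is full
    await queue.join()  # wait until every queued job has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return stats


if __name__ == "__main__":
    print(asyncio.run(run_pool(jobs=200, pool_size=20)))
```

The bounded queue is the design choice that matters: a slow external resource pushes back on whoever is submitting work instead of letting memory grow.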
YMMV, of course, this is just our experience. Part of the reason for putting it out there is that there are already many people who have talked/blogged about going from Python -> Go. I thought maybe the world could handle just one story about going the other direction.
Did you try something like that?
One thing I wish were possible in Go is being able to use the `select` keyword with both channels and IO.
E.g. what do you do if you want to send notifications to lots of clients but for some the connection is very slow (you would probably need to buffer the data)? Do you have hard limits of maximum buffered data until you close the connection? End to end backpressure (for which channels are quite good) doesn't seem like the best option for 1:N broadcasts, because then the slowest receiver slows down all others.
And what do you do with connections which are sending you lots of (probably unexpected) data? Stop reading from that socket?
In our case, when notifications buffer for a slow client, this API gets triggered and we mark the client connection as 'paused'. Until that state is cleared by more data getting to the client, notifications go to the database instead with just a flag on the client connection to check the db when the pending data was retrieved.
We do a similar thing on the receiving end to pause reading off the socket if we're already doing more work on behalf of the client at once than desired.
Twisted documents this as producer/consumer: http://twisted.readthedocs.org/en/twisted-15.4.0/core/howto/...
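The pause-and-spill behaviour described above can be sketched roughly as follows. This is not the actual implementation and does not use Twisted's producer/consumer interfaces; `ClientConnection`, `Notifier`, `max_buffered`, and the in-memory dict standing in for the database are all hypothetical.

```python
class ClientConnection:
    """Hypothetical per-client state for one connected client."""

    def __init__(self, client_id, max_buffered=3):
        self.client_id = client_id
        self.max_buffered = max_buffered
        self.buffer = []        # notifications waiting to be written to the socket
        self.paused = False     # set once the client falls too far behind
        self.check_db = False   # flag: stored notifications are waiting


class Notifier:
    def __init__(self):
        # Stand-in for the database: client_id -> stored notifications.
        self.db = {}

    def send(self, conn, message):
        """Buffer for a healthy client, spill to the database for a paused one."""
        if conn.paused:
            self.db.setdefault(conn.client_id, []).append(message)
            conn.check_db = True
            return "stored"
        conn.buffer.append(message)
        if len(conn.buffer) >= conn.max_buffered:
            conn.paused = True  # too much pending data: stop buffering in memory
        return "buffered"

    def on_drained(self, conn):
        """Called once the client has caught up; returns everything to deliver."""
        delivered, conn.buffer = conn.buffer, []
        conn.paused = False
        if conn.check_db:
            delivered = delivered + self.db.pop(conn.client_id, [])
            conn.check_db = False
        return delivered
```

Memory per slow client is bounded by `max_buffered`; everything beyond that lives in storage until the client catches up.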
This post reminds me of another post I recently saw on HN, in which the author (someone with an Erlang background) lays out all sorts of reasons why he chose Ruby for a highly concurrent application that launches lots of (heavyweight) threads. Upon seeing the link on HN, my first thought was, Ruby!!?? But then I read the post and the reasons were all very sensible and practical-minded, so in that case Ruby was arguably a much better choice than Erlang, Go, Scala, Rust, etc. for a highly concurrent application.
Edit: here's the post I mentioned about Ruby being used for a highly concurrent application: https://news.ycombinator.com/item?id=10394450
The problems I observed with Go were that its regex seemed to be slower than Python's, and memory usage was way higher. I explicitly added some GC requests.
For example, to extract links from hacker news homepage, you would just do
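A minimal sketch of regex-based link extraction in Python, assuming the page HTML is already in a string. The pattern, the `extract_links` helper, and the sample markup are made up for illustration and are not necessarily what the comment had in mind; a regex like this only handles double-quoted `href` attributes.

```python
import re

# Matches <a ... href="..."> and captures the URL (double-quoted attributes only).
HREF_RE = re.compile(r'<a\s[^>]*?href="([^"]+)"', re.IGNORECASE)


def extract_links(html):
    """Return every href value found in anchor tags, in document order."""
    return HREF_RE.findall(html)


sample = (
    '<td><a href="https://example.com/a">A</a> '
    '<a class="x" href="item?id=1">B</a></td>'
)
links = extract_links(sample)
```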
Go's a good language for some things. But it does nothing special or significant to close the massive productivity gap between dynamic and static languages. Yes, it's terse compared to many other static languages and it has stuff like implicit interfaces, but those are superficial (but nice) things when it comes to what and how you do things in dynamic land. Go might actually be a step back due to its poor type system and poor reflection capabilities.
I think Go's great for a seemingly new breed of "infrastructure" systems which is becoming more important due to how systems are starting to be designed (services hosted in the cloud). It's great for building CLIs which don't require your customers to install anything else. And it's good for services / apps that need to share memory between threads (which, in my experience, is where dynamic languages really start to fail).
But for a traditional web app / service? It's horrible. At least as horrible as most static languages. It sucks for talking to databases (more than almost any other language I've used). It sucks for dealing with user-input. Like most static languages, the stuff you need to do to handle a request, which has essentially 1ms of life, is cumbersome, error prone, slow, inflexible and difficult to test.
You want my anecdote? Go is brilliant for web services. We've decreased server costs significantly while decreasing response times by orders of magnitude for write-heavy APIs. Concurrency primitives that do bleed into parallelism have made a mockery of interpreted dynamic languages.
But don't believe me. Ask Cloudflare, or Google, or Dropbox, or any other number of companies how horrible Go has been.
But just for shits and grins, I'll bet it'd take me moments to find people in situations where Go didn't meet their needs or domain requirements.
Please stop with the absolutes. They're absolutely ridiculous.
So while I was blocked on a coworker filling in one of the gaps we both needed, I was able to rewrite it all in Python.
I'll probably give this talk again at a venue where it'll be recorded, which should add a lot of missing context to these slides.
Precisely, and isn't this presentation the perfect example of this phenomenon?
* Initially we had an implementation in language X
* We then rewrote it in language Y - the lessons learned by making the system anew (this time knowing the exact problematic spots, what really to optimize for, etc) - we got a better system. Long live language Y
* We then rewrote it in language X - the lessons learned [...] - we got a better system. Long live language X
Good programmers can productively write good and fast code in C, Python, Java, Go, or whatever. The skill of the developer and the understanding of the problem matter much more than the programming language.
This is part of why Rust removed the M:N scheduler and light-weight threads before Rust 1.0. It's hard to predict your memory use if the run-time is going to be creating/destroying OS threads, and juggling your lightweight threads (goroutines/etc) between real OS threads.
I agree entirely that the next iteration you write to solve the same problem is going to be better than the one before. The problem-scope is well defined, and you're already familiar with it and where the prior implementation was lacking. In this case the extra predictability of knowing what was going to be occurring at once did help.
A web app has 4 (often more, rarely fewer) such boundaries:
- Getting input from users
- Querying a database
- Getting results from a database
- Outputting results to the user (html, json, ....)
Within these boundaries, yes, static languages are less error prone. But you get no compile time checks AT the boundaries. You'll need integration tests (and it's easier to write tests in a dynamic language (where IoC is a language feature) than in static languages).
You deal with these boundaries via automated mapping (with annotations, or external files (like in Hibernate)) or manual mapping. Automated mapping might not be much more error prone, but it's certainly much more cumbersome (especially with weak reflection). Manual mapping is also much more cumbersome. Does this cumbersomeness make it more error prone? I don't think it helps.
That data is always a string (given the nature of HTTP requests). So the only issue there is converting the string to an integer when necessary. But since you should cleanse any data that arrives via HTTP request, you'd need to validate that your "integers" are actually purely numeric even in dynamically typed languages. So there's really no extra work there between dynamic and static languages.
>- Querying a database
You'd be querying an SQL database with either parametrised queries or ORMs. Both of which are data-type agnostic (ie you wouldn't be needing to convert integers into strings to embed into SQL strings).
As for No-SQL databases, there might be an issue with some and statically typed languages. But that's not an issue I've run into with the languages and APIs against the (admittedly limited) range of no-SQL databases I've used.
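The two points above (validate "integers" that arrive as strings, then hand them to a parametrised query) can be sketched with Python's standard `sqlite3` module. The `users` table, the `find_user` helper, and the in-memory database are assumptions made up for this example.

```python
import sqlite3


def parse_non_negative_int(raw):
    """Validate a request parameter that is supposed to be purely numeric."""
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError(f"not a non-negative integer: {raw!r}")
    return int(raw)


def find_user(conn, raw_id):
    """Look up a user name by id; the id arrives as a string from the request."""
    user_id = parse_non_negative_int(raw_id)
    # The value is passed as a parameter, never formatted into the SQL string.
    row = conn.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


# Throwaway in-memory database so the sketch is runnable as-is.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "alice"))
```

The validation step is the same work in either kind of language; the parametrised query means the driver, not string concatenation, handles the value's type.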
>- Getting results from a database
This is where your argument is the strongest. Sometimes there can be an issue if you don't know what return values you're expecting from the database. But that's easily overcome if you actually chat to your database architects beforehand. But in all honesty, I'd be disappointed in any web developer who wasn't the least bit interested in the datatypes of the records he's querying nor the structure of the database he's effectively writing a frontend for.
>- Outputting results to the user (html, json, ....)
I do get the points you're making, and you're right that sometimes statically typed languages do make you jump through a few additional hoops. But most of the time these issues only arise if you're a careless programmer - in which case you're going to run into all sorts of dumb issues even with dynamically typed languages (eg if you don't validate your input data then you're going to write less secure web applications - regardless of your language of choice. That's why I sometimes look at statically typed languages as just another layer of data validation with regards to web development).
Let's take a few statements:
> Most people who rewrote their apps from X to Go and saw improved performance and readability benefited much more from the rewrite than they did from Go.
Yes, the rewrite would certainly have helped. However some compilers and runtimes are just faster than others. You wouldn't say that rewriting C code in Perl would result in faster code would you? Of course not. But that's the implication of your post. Go does outperform some languages. Granted it's still slower than some others out there, but not all languages are equal in terms of performance so it's pretty naive to imply they are.
> At best, the fact that Go has a relatively weak ecosystem, means that they had to write from scratch a lot of things they were getting for free in X. But, because in X it was a library, they only used 5% of the features, but paid a high performance cost and had a complex API to work with.
That's complete rubbish. Go is a young language, that much is true. But its ecosystem is actually very impressive given its age. Since you're talking about web development, let's look at all the libraries you might want:
x SQL / no-SQL databases: check;
x compression: check;
x hashing / encryption: check;
x image manipulation: check;
x monitoring (eg New Relic): check;
x httpd frameworks: check;
x html templating: check;
x smtp (sending e-mails): check;
x JSON / XML: check;
x web sockets: check;
> Go's a good language for some things. But it does nothing special or significant to close the massive productivity gap between dynamic and static languages. Yes, it's terse compared to many other static languages and it has stuff like implicit interfaces, but those are superficial (but nice) things when it comes to what and how you do things in dynamic land.
This sounds more like a rant about how much you hate statically typed languages than it does about how poor Go is compared to [insert preferred language]. For what it's worth, statically typed languages do also offer some productivity bonuses over dynamically typed languages: a big one being that the fussier compiler / runtime checking can pick up subtle bugs (eg 0 / "0" / false) that might otherwise take a little while to trace. And I am aware that some dynamic languages have type-checking operators, eg ===, but you can't always guarantee what your libraries are going to handle / return, so you're still sometimes left tracing values up the code base to find the origin of the problem rather than having the compiler explicitly tell you at the first point where your uncleansed data arrives.
So there are also advantages with going down the statically typed route.
> But for a traditional web app / service? It's horrible. At least as horrible as most static languages.
Sorry, but now you're just descending into unashamed language bigotry. A large proportion of the world's cloud services are supported by statically typed platforms such as ASP.NET, Java, and Go. This forum, HN, is written in Haskell, which is also statically typed. Saying "most static languages [are] cumbersome, error prone, slow, inflexible and difficult to test." is so far off the mark that it's just plain ignorant.
Which is a real pity as there would have been a few good points raised in your rant if you hadn't jumped off the deep end with your ridiculous generalisations.
No, it's written in Arc, which is a variant of Lisp and is not statically typed.
I know I've been heavily downvoted in my previous comment, but I've developed in well over a dozen different languages of different paradigms for the last 3 decades - so I have quite a broad range of experience as well as being language agnostic - ie I'm not just some angry fanboy :p
I will grant you that Go does get a little more awkward if you're dealing with null data types in the database as you then need to start casting interfaces. Which gets a pain real quick. But it's rare that you actually need null types in the database - usually that requirement can be circumvented at the database design level (eg using default values in the table design or defining flag fields).
Sometimes a different language will require you to architect your platform a little differently, but that's kind of the point of having different tools.
Although I corrected your interlocutor on a factual inaccuracy, I actually fully agree with him/her on the principle, and as the author of a strongly-typed database access library (Opaleye for Haskell) I'm in a very good position to!
Both Python and Go are fine. They both have their strengths and weaknesses. I personally wouldn't write a web app in Go (at least, anything beyond the most basic admin interface). I also personally wouldn't write a very large and complex Python system given the huge unit testing burden necessary to ensure safe refactoring down the road.
The biggest reason I like Go is because it makes it really hard for engineers to create huge, complex abstractions. Engineers (and especially less experienced engineers) just love them some abstractions. In my experience, most abstractions aren't justifiable. The net effect is that it usually makes their software harder to learn, harder to debug, inflexible (ironically), and late for whatever deadline they were supposed to hit. You can't write Java enterprise software in Go, and I really appreciate that.
> the middlebrow dismissers
You know what really grinds our gears, though? People who don't read documentation.
If you don't read documentation, you'll get a negative vibe. Because your question is literally sitting at the top of the FAQ. It's been asked 2^1024 times before. WHY WON'T YOU READ THE DOCS?
You can, and I've seen it, unfortunately.
As a side note, sometimes you do need that performance gain, without wanting to resort to C or C++. I hope to see Python make some gains there with the addition of the type information.
Why do you consider unit testing a burden? I find unit tests the best way to formalize specifications before even starting to write code.
Personally, I do love me a type system. Even if you have to think harder about how to architect your code, I think this kind of thinking is required for software to be good.
SSL is extremely expensive on RAM (perhaps the implementations have optimized for throughput over RAM). I have yet to benchmark any SSL implementation in any language, with any binding, that can use less than 20kb per SSL connection. I mentioned in my talk here that SSL is very expensive, here's my benchmark suite that others may add to:
I have implementations in several languages, so far both Go and Python 3.4 can get as low as the 20kb cited. If you can get your per-connection state below 20kb, then merely adding SSL means doubling or worse your RAM requirements, which is huge.
I appreciate that everyone loves obsessing on the language wars, but the SSL RAM overhead affects us regardless of language. I covered that in one of the slides near the end, it'd be great to see some movement on reducing the RAM footprint here.
Every connection has a base cost of the TCP kernel send/recv buffer, which in our case we dropped a bit to 4kb each. So that's still 8kb per connection right there. If we terminate the SSL on a separate machine from where we handle the connection, then it means we'll be using 8kb more memory per connection. Probably even greater because nginx has its own send/recv buffers for data.
I'm sure our use-case is a unique one. Most people care about raw throughput, so the majority of SSL optimization has focused on lowering CPU use under high load rather than memory use under massive amounts of connections.
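To put rough numbers on this, here is a back-of-the-envelope calculation using the figures mentioned in this thread: 4kb each for the kernel send and recv buffers, about 20kb of SSL state per connection, and 140k connections. It assumes "kb" means 1024 bytes and ignores all application-level state, so treat the results as a floor rather than a measurement.

```python
KB = 1024
CONNECTIONS = 140_000        # connection count mentioned earlier in the thread
KERNEL_BUFFERS = 2 * 4 * KB  # 4kb send buffer + 4kb recv buffer per connection
SSL_STATE = 20 * KB          # lowest per-connection SSL cost cited above


def total_gib(per_connection_bytes, connections=CONNECTIONS):
    """Total memory in GiB for the given per-connection cost."""
    return per_connection_bytes * connections / 1024 ** 3


kernel_only = total_gib(KERNEL_BUFFERS)            # roughly 1.07 GiB
with_ssl = total_gib(KERNEL_BUFFERS + SSL_STATE)   # roughly 3.74 GiB

if __name__ == "__main__":
    print(f"kernel buffers only: {kernel_only:.2f} GiB")
    print(f"kernel buffers + SSL: {with_ssl:.2f} GiB")
```

Under these assumptions SSL state alone accounts for well over half of the per-connection memory.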
I've had good luck writing applications in Python, then profiling them and implementing critical sections that are CPU bound in C as modules. Any sections that are memory hogs can be converted to stream processors.
Lately I've started implementing the modules with Rust and the results are promising. It seems like a nice balance of developer productivity and application performance.
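As an illustration of the "stream processor" idea, here is a minimal sketch that filters a log lazily with a generator instead of reading everything into a list first. The `error_lines` and `count_errors` helpers and the `"ERROR"` marker are made up for this example.

```python
import io


def error_lines(lines):
    """Yield matching lines one at a time; only the current line is held in memory."""
    for line in lines:
        if "ERROR" in line:
            yield line.rstrip("\n")


def count_errors(stream):
    """Count matching lines without materialising them in a list."""
    return sum(1 for _ in error_lines(stream))


# Any iterable of lines works: an open file, a socket wrapper, or StringIO here.
sample = io.StringIO("ok\nERROR disk full\nok\nERROR timeout\n")
found = list(error_lines(sample))
```

Because the function accepts any iterable of lines, the same code works on an open file of arbitrary size.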
I worked with a fairly complex HTTP API app that ran as a rather svelte WSGI framework under gunicorn, and we saw at least a 10x increase in memory usage over CPython when we switched to PyPy, once the JIT was fully warmed up (memory usage seemed to hit steady state after about an hour). PyPy has also historically (in my experience) been a bit more "lazy" (deferring GC of individual objects longer) than CPython when an object falls out of scope.
My general rule of thumb for pypy has historically been that you trade memory (requires more of it) for speed (faster) when compared against cpython.
Maybe the reimplementation itself was just far more efficient? Hopefully the talk itself clarified that point. I would be very interested in hearing more about that particular aspect.
PyPy using less memory than Go does seem weird but depending on how much the GCs differ and how they are configured it could simply be that Go's GC doesn't give up memory as freely to the OS.
The JIT memory is constant at runtime (proportional(-ish) to the amount of code, which is fixed) while it is desirable to have the number of coroutines be as large as possible.
It can take a very long time until the memory consumed by the JIT actually remains constant. If you do continuous integration and deploy several times per day, your application might never reach that point.
D in particular seems like it would be the logical upgrade path from python or ruby. it has a comfortably familiar C lineage, supports a variety of programming paradigms, and has good concurrency support. i wonder why people don't at least give it a look. (personal experience - i tried to use it twice, several years ago, and gave up because the tooling was bad, but from what i've heard that's very much improved today)
D is the sweet spot for Python programmers to upgrade to without going backwards to Go (Programming language design wise) nor weighed down by all the new (and very good) stuff in Rust.
D has everything from a nice IDE(Xamarin Studio), debugger, package management (Dub), statically compiled binaries, pretty decent std lib (not as good as python or Go, but very good nonetheless).
I still write Python if it's a "script" that has to run on a $work server, where it is safe to assume that Python would be available and sufficient for most tasks.
Are you quite sure that all those bells and whistles, all those wonderful facilities of your so called powerful programming languages, belong to the solution set rather than the problem set?
— Edsger W. Dijkstra
Why aren't you using ALGOL?
"In December 1968 the report on the Algorithmic language ALGOL 68 was published. On 20–24 July 1970 a working conference was arranged by the IFIP to discuss the problems of implementation of the language, a small team from the Royal Radar Establishment attended to present their compiler, written by I.F. Currie, Susan G. Bond and J.D. Morrison. In the face of estimates of up to 100 man-years to implement the language, using up to 7 pass compilers they described how they had already implemented a one-pass compiler which was in production use in engineering and scientific applications."
That split doesn't forbid reading the module metadata.
What i agree on:
Concurrency using goroutines and channels is f... hard beyond very primitive scenarios. Even fork-join isn't that easy. The lack of expressiveness also hurts there: it's nearly impossible to build higher level abstractions on top of channels/goroutines. You always have to do the bookkeeping of your goroutines.
I also agree on the error-handling problems: It's often hard to locate errors. It requires a big amount of discipline by the programmers to achieve some kind of ability to locate errors. No, i don't want the Java/C#-'i throw exceptions everywhere'-style back, but Go is the other extreme. Some more lightweight-panic wouldn't be bad.
What i can't agree on:
That you cannot mock without interfaces in Go is typically not a problem: There is no real encapsulation (_, ., no constructors) so in many cases you can just instantiate your structs as you need them. The classic for mocking - time - is problematic as in every language. IO is typically behind the various io.Reader/Writer... interfaces: No problems there.
The criticism about memory consumption i don't get: Every system i saw ported to go from Java, Ruby or Python had a much lower memory footprint than before. And typically go allows to optimize allocation quite well when needed.
I'm using a small utility library to wrap the original error and add function name, line, file name and optionally a descriptive message that explains what failed.
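The idea of such a wrapper, sketched in Python rather than Go (where the comment's library lives). `WrappedError`, `wrap`, and `load_config` are hypothetical names; Python exceptions already carry tracebacks, so this only illustrates attaching the call site to an error value.

```python
import inspect
import os


class WrappedError(Exception):
    """Carries the original error plus the location where it was wrapped."""

    def __init__(self, original, function, filename, lineno, message=""):
        self.original = original
        self.function = function
        self.filename = filename
        self.lineno = lineno
        detail = f"{message}: {original}" if message else str(original)
        super().__init__(f"{filename}:{lineno} in {function}(): {detail}")


def wrap(original, message=""):
    """Wrap `original`, recording the caller's function name, file, and line."""
    frame = inspect.currentframe().f_back  # the frame that called wrap()
    return WrappedError(
        original,
        frame.f_code.co_name,
        os.path.basename(frame.f_code.co_filename),
        frame.f_lineno,
        message,
    )


def load_config(path):
    """Example caller: any OSError comes back annotated with this call site."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError as err:
        raise wrap(err, "reading config failed") from err
```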
This was my point: In many areas go forces the developer to do the right thing (no unnecessary imports, gofmt...) and does not rely on developers' discipline. But when it comes to error handling it does.
What i would wish for is some extended error handling supported by the compiler. I don't want a stack trace, but the compiler easily could produce for example a line number where the error was returned.
- CI system detects a commit and checks out the latest code
- CI system makes a virtualenv and sets up the project and its dependencies into it with "pip install --editable path/to/checkout"
- CI system runs tests, computes coverage, etc.
- CI system makes an output directory and populates it with "pip wheel --wheel-dir path/to/output path/to/checkout"
- Deployment system downloads wheels to a temporary location
- Deployment system makes a virtualenv in the right location
- Deployment system populates virtualenv with "pip install --no-deps path/to/temp/location/*.whl"
The target node only needs a compatible build of python and the virtualenv package installed; it doesn't need a compiler and only needs a network connection if you want to transfer wheel files that way.
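The steps above can be written down as plain command lists. This sketch only builds the commands and does not run them; the helper names, the directory layout (`<venv>/bin/pip`), and the `test_command` parameter are assumptions, while the pip flags are the ones quoted in the list.

```python
def ci_commands(checkout, venv, wheel_dir, test_command):
    """Commands for the CI half: create env, install, run tests, build wheels."""
    pip = f"{venv}/bin/pip"  # assumes a POSIX virtualenv layout
    return [
        ["virtualenv", venv],
        [pip, "install", "--editable", checkout],
        list(test_command),  # whatever test runner the project uses
        [pip, "wheel", "--wheel-dir", wheel_dir, checkout],
    ]


def deploy_commands(venv, wheel_files):
    """Commands for the deployment half: create env, install downloaded wheels."""
    pip = f"{venv}/bin/pip"
    return [
        ["virtualenv", venv],
        [pip, "install", "--no-deps", *wheel_files],
    ]


if __name__ == "__main__":
    for cmd in deploy_commands("/srv/app/venv", ["/tmp/wheels/app-1.0-py3-none-any.whl"]):
        print(" ".join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)`; the wheel file list would typically come from globbing the temporary download directory.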
Really nice stuff
Considering I'd have to write a build script to use Platter, it didn't seem like it would be a lot of work to write a few extra lines and not require an additional dependency.
- go get
- go test
- go build
- copy to target
It's possible with Python, it's easier with Go. It's a place where we could use a lot of progress.
Once you'd done the up-front work of figuring out how to do deployment sanely, it became equally easy for both of you.
This bakes a whole virtualenv with all python dependencies (including compiled C libraries) into a .deb package. The packages tend to be big-ish (3MB to 15MB), but the target system only needs the right python version, nothing else.
Yes, it helps. But you can use Docker with Go programs as well (and drop a lot more of the base image in the process).
So as someone who is clearly in no position to be criticizing other projects yet, isn't Heka exactly the sort of project you shouldn't do in Go? I say that because I have the feeling you should use Go only for very concrete cases, given its lack of proper abstractions.
I.e. writing a tool that receives log lines over HTTP, extracts metrics and forwards those to StatsD? Perfect use case for Go. But writing a tool that lets you plug in arbitrary frontends to forward to arbitrary backends? Perhaps they got it to work nicely, but that sounds more like a case for a more general language.
Go shouldn't have "use cases". One should be able to do almost everything with ease with a language built in 2008/9.
And unfortunately that is not the case. Go has excellent concurrency features, but is limited by dumb language design decisions which make it painful to test and to write good reusable and composable libraries for.
I'd love to replace my entire stack with Go but I can't. Something I would write in Ruby in 10 days takes 2 months in Go. And worse, I can't write the code the exact way I want, which does piss me off. Give me some choice, not random constraints. Aside from concurrency, I shouldn't have to ask myself how I write something in a specific language. This is the goal of refactoring, and it comes later.
I will personally invest in Crystal and dump Go as soon as it runs on Windows. It has channels, and that's all I need.
Have you looked into Elixir? It has Ruby-like syntax, but uses an actor model for concurrency (it runs on the Erlang VM). For handling concurrent tasks it tends to benchmark around Go's speed, but is much nicer to do things in. While it is admittedly immature, the ecosystem still has a decent amount of packages and the tooling isn't bad.
Assuming that Go is a general-purpose high-level language, yes, but is it? It was created to replace C++ in critical infrastructure, not Python/Ruby as the end-all be-all of default platforms for every situation. Its syntax simplicity and speed absolutely make it attractive to a wider audience, sure, but if it does the job it was designed to do very well, should we be angry that it doesn't do all jobs well?
Developer time and execution time are both important metrics when considering a language, and Go is very well situated when the major developer time gains offset the minor execution time losses vs C/C++. That it's less well situated when the developer time losses vs Python/Ruby are incurred on a project when the execution time gains are irrelevant isn't a failing of the language, it's a trade-off.
Note: I gave this talk.
I look at Ruby or Go... or even Java and every new language feature has a much more rapid adoption curve.
is this a pretty solid statement that the entire Python 3 + asyncio path is a dead end ?
I certainly wouldn't call the Python 3 path "dead", as starting a new project you'd be silly not to use Python 3. It's just a very slow process.
Also, porting from 2 to 3 isn't even that hard (there's a script that does 95% of it for you).
I'm not sure if you have worked in the Ruby or Go ecosystem... but if you are starting a new project, you are using latest Ruby+latest Rails. this is not the case in Python. I asked a question here and on the /r/python forums. What framework should one use to build an API using asyncio+postgresql. Nobody seems to be doing it.
the answers I got mentioned using Tornado with Python 3 code on top of it.
People have compared asyncio with goroutines - but even for a project that is a migration from Go -> Python by a fairly advanced developer ... Py 2.7 is being used.
That's just FUD, it might have been true many years ago but not any longer. http://python3wos.appspot.com/, there are a few red holes but from what I could tell some of those have substitutes that in many cases are better even. It might also be true for in house libraries but at least there you have the chance to upgrade yourself.
the ecosystem is still running 2.7. I can't wait to switch to Py3.
Which one? Pyramid and SQLAlchemy work like a charm in Py3. Until you really name what your need is, that's just fud.
I hope that PyPy will have a Py3.5 compatible interpreter someday, but CPython is good enough for now.
if Pypy + asyncio was available, would you have built everything using that stack ? There have been all these benchmarks that asyncio is so much slower than threads 
How would you compare that with Go ?
I do like Rust, but must we clutter every Go-related thread with comments about it? The zealousness is annoying.
The constant conflict might be annoying but using Rust or even C++ would certainly be a very reasonable choice.
The problem was pretty simple: pull event messages from AMQP and then shove them into elastic search and file system. Heka and Flume were both sort of overkill so I decided to write it in Rust. I got extremely far but alas there were some issues with the Elastic Search Rust library that I'm still resolving. Surprisingly the AMQP library worked pretty well.
I will vouch for the OP's point on error handling as Rust has a similar issue to Golang but not as bad because of the awesome type system (still I hate to admit but I really miss exceptions at times).
Anyway to relate again to the OP I went back to what I know best.. boring ass Java and wrote the app in an hour or so. It took about the same memory as Heka (surprising since it's Java) and appeared to be slightly faster than Heka (elastic search indexing became the bottleneck for both so take that with a grain of salt).
Long story short.. I think the drivers and libraries really are the deal breakers and not so much the languages themselves (with some minor exceptions like the GIL).
I find that a weird sentence in an otherwise carefully written article. The author is talking about writing software which does a lot of socket IO. So I would expect the performance discussion to make some reference to this; I assume what he's talking about in the quote is the behavior of pure CPU-bound code but he doesn't discuss to what extent this is really relevant to his project.
1. Goroutine memory use - Post happens to be about 1.2 and 1.4 and I started with 1.5.
2. Debugging - yes, handling errors is a bit tedious. I have not written much boilerplate to handle errors. I just copy error handlers a lot. Maybe that’s why error strings are so common.
3. Goroutine leaks - This is scary, I have used goroutines with channels but properly so far. Yes you can write code that leaks. This is something you will have to check yourself.
4. Testing - not done much.
Overall - I feel the author learned of some negative aspects of Go and turned away. Some of them, like the goroutine memory footprint, will improve with time (e.g. 1.5 earned a lot of praise for GC improvements). In a lot of places the author mentions possible improvements, e.g. godebug or the latest Go with SSL, but did not try them much. So it may not be as relevant to new Go adopters.
Big statement, anyone confirm this?
On a network bound daemon, which is not CPU-bound, the GIL is really not an issue, so it never came up.
There is only one case where the serialized runtime presents a problem, and that is in CPU-bound 90s-style shared memory parallel computation that the industry as a whole has been trying to escape for the past 20 years, because thinking about individual threads and the lifetimes of shared memory allocations turned out to be an incredibly shitty abstraction.
Even if you want to shoot yourself in the foot, both Python and Node.js provide facilities to allow e.g. concurrent array access (in Python via the multiprocessing package). The reason those approaches aren't more popular in those languages is exactly because the model itself is defective. Anyone worth their salt working in a computation-heavy domain stopped writing explicit threading code a long time ago.
If I can't get a nuanced presentation out of a slide deck, then maybe an article or corresponding speech is required!
Gratz. Next time, use C. PyPy won't solve your problem. You're writing a low latency, high performance, large throughput, super-optimized message router. Even the libs you like in CPython are.. freaking C. Do you know why? Do you understand what happens underneath the language? Do you know why PyPy is faster?
Don't get me wrong - and in fact, maybe you get me right:
Go rocks. Python rocks. It's just not the tool for that very job. Today, it's still freaking C.