Systems Programming at Twitter (2012) (monkey.org)
57 points by lelf 497 days ago | 28 comments



I don't understand how this is about systems programming. Would someone care to clarify?

-----


I'm also confused. I have a book on my bookshelf titled "Systems Programming", and it's about writing kernels and runtimes in C.

Can someone explain to me when the definition went from kernel level to server duct tape? I'm guessing it's about the same time the term 'full stack developer' started showing up.

-----


I take it to imply that the 'system' they are programming is the distributed/concurrent complex they are running. To other people, the 'system' sits below that, and this is application programming, or "middleware".

It seems like at any level, you write the system, and applications are above.

-dB

-----


What else does the systems community do these days besides concurrent distributed systems?

-----


Power management and flash/SSD-aware filesystems are also quite the rage.

-----


Ya: distributed computing, storage, and power are hot. Where did all the kernel research go?

-----


Much of it happens with different objectives, or at different levels, in the context of computing, storage and power management. See Akaros for example: http://akaros.cs.berkeley.edu/akaros-web/overview.php .

-----


This is about systems programming where "system" must be understood as a composition of several networked systems.

It's different from a single-node system such as pure kernel software (e.g. HDD drivers); among other things, it's higher level.

For instance, you have to deal with the Two Generals' Problem. Then there are "expert systems", such as databases, which are closer to kernel development and require kernel knowledge. Expert systems, even distributed ones, are meant to solve well-identified problems; they are the CPU, GPU and HDD of the global system.

Usually, the higher in the stack you go, the looser the specification of the logic to be implemented becomes:

- less standardized

- less identified

Wherever the logic is a moving target, as with so-called "business logic", the way you express the logic is very significant. In other words, "code design" and code maintainability are a significant "business asset", as they allow you to react quickly to change.

All the things described in the slides are features of the Scala language and platform that Twitter thinks allow you to write well-designed code "upon which you can react quickly".

Mind that nothing described there is specific to Scala as a language or platform, except maybe that it's built upon the JVM and that it's statically typed.

Code design and code look'n'feel, especially in contexts that involve concurrency, are not new or specific to distributed systems.

The most important point in the 11th slide http://monkey.org/~marius/talks/twittersystems/#11 , which introduces what follows (try to guess), is the second bullet: "Very expressive". Also: "The point is not the language. The ideas and techniques matter. Scala happens to be our language.".

Other people seem a bit worried about the lack of interest in systems programming as it was understood before the "horizontal scaling" era. Nothing could be more wrong: "single node systems" are an active area of research and engineering:

- AOT & JIT compilation

- Erlang VM (aka. BEAM) style: Termite VM, stackless

- OS: Plan 9

- languages: Scala, Julia...

In (high-frequency) finance and scientific research in particular, people dive deep into "single node systems" to optimize them, because reliability and latency requirements are higher and everything is bigger. Examples:

- monetdb, one of the first column-based databases

- YAMI4, dubbed a messaging solution for distributed systems with a realtime bias - developed at CERN.

Such software is not so famous or easy to find out about outside of the field; nonetheless it exists.

Don't hesitate to complete or correct that list or anything else I've written.

edit: added a note about finance, scientific research & realtime bias.

-----


also have a look at The C10M problem http://c10m.robertgraham.com/p/manifesto.html

-----


Twitter seems like the poster child for Scala. I've played with Scala a little and it definitely has a lot of things going for it yet I wonder:

- Is Scala used exclusively for systems programming in Twitter or are there other languages in the mix?

- The majority of other big software players seem to be not using it for various reasons. Is that a concern?

- How is the learning curve for new hires? How long before they become productive? How long before they write great code in Scala (as opposed to still figuring things out but maybe being productive)?

- Isn't the JVM too big of a performance compromise for large scale software where every bit of performance matters?

- Is it easier to hire people for Scala positions? Harder? The same?

- How do engineers in Twitter with a lot of experience in other platforms and languages feel about Scala after having used it for a while? E.g. vs. C++, Go, Java, C# just to throw a few names into the hat there. Do people tend to just write C++ in Scala or Java in Scala or whatever or do they adapt to new paradigms with ease?

I know we always say that a good engineer can pick up any new language and to some extent that's true but I think expertise in a specific language builds up over time. Maybe an analogy would be a musical instrument. It seems some of the other languages have larger pools of talent, more mature tools, etc. I guess it's hard to be a new language in this world...

-----


May I ask what you think Scala has going for it? All of your questions undermine rather than point out the strengths of the language and the platform on which it runs.

> The majority of other big software players seem to be not using it for various reasons. Is that a concern?

Twitter, LinkedIn, Sony, FourSquare, Tumblr, Amazon, UBS, NASA, The Guardian, etc., is that a concern? On the flipside it's not likely that Facebook uses F# as its primary language, and that Apple does not use Haskell; in neither case is this a cause for concern ;-)

> How is the learning curve for new hires? How long before they become productive? How long before they write great code in Scala (as opposed to still figuring things out but maybe being productive)?

So, you're asking 2 questions in one: how long before new hires become productive in Scala, and how long before they develop mastery of the language. The former: very quickly, just write Java without semicolons. The latter depends on the abilities of the new employee; it could take as little as 3 months to learn the ropes, corner cases, etc., but for average programmers I'd say at the very least 6 months, and more likely a year or more.

The biggest hindrance in learning Scala, IMO, is not the language itself, but the build system, SBT. God help you, there are some hard yards to wade through at first, especially if you have trouble getting the build *.scala files recognized by the IDE (i.e. blind coding plain text files with no idea what <+= is vs. <<=, or % vs. %%, or that these types of symbolic method pointers are even available to begin with).
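To make the `%` vs. `%%` distinction concrete, here is a small build.sbt fragment. It is only a sketch: the organization, library name and version are hypothetical placeholders, and it assumes a project whose scalaVersion is 2.10.x.

```scala
// build.sbt fragment (sketch). "com.example" / "example-lib" / "1.0.0" are
// hypothetical placeholders, not a real dependency.

// %% appends the project's Scala binary version to the artifact name...
libraryDependencies += "com.example" %% "example-lib" % "1.0.0"

// ...so with scalaVersion 2.10.x the line above asks for the same artifact as:
libraryDependencies += "com.example" % "example-lib_2.10" % "1.0.0"
```

Plain `%` uses the artifact name exactly as written, which is what you want for ordinary Java libraries.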

> Isn't the JVM too big of a performance compromise for large scale software where every bit of performance matters?

What? The JVM is a performance beast. Perhaps if you drop down to straight C or go with C++ you'll get better performance, but you'll do so at a price (i.e. reduced maintainability and flexibility).

> Is it easier to hire people for Scala positions? Harder? The same?

Compared to what?

> Twitter seems like the poster child for Scala

sounds like a backhanded compliment...

-----


Right. Those are questions/concerns.

In terms of what I think is going for it: There's a nice IDE from JetBrains and Eclipse support is good. The REPL/worksheets. It does perform well on the JVM and running on the JVM means it's portable and you can also leverage a lot of the Java eco-system.

It seems that where performance really matters a lot of companies do choose to "drop down" to C or C++. I think there's still a gap vs. the JVM.

I guess I still need to be "sold" on it, hence all the questions. It does seem to work for Twitter as far as large applications and scale go, but I was hoping to get a little inside information on that. I thought what differentiated Twitter is how committed they are to the language and the scale at which it's used, so it seems there's a lot to learn from their story beyond what's in the presentation.

-----


In a large system like Twitter you're not going to micro-optimise every line of code. You want a good performance baseline and the ability to quickly write correct code. For many, Scala and the JVM are a better tradeoff than C or C++ in this regard. See Martin Thompson's blog if you want to know more about low-level performance optimisation on the JVM.

You're also overlooking the benefit of a modern type system. If you haven't used one before it's difficult to appreciate what they bring. Twitter's Summingbird project is a great example of how a few powerful abstractions, enabled by Scala's type system, can simplify a very complex problem.

What I like about Scala and the JVM is they have a large expressive width (http://noelwelsh.com/programming/2013/07/10/expressive-width...). I can write very terse, expressive code when productivity is the main focus, but also reach down to grub around with low-level code if I need to squeeze out performance. Very few platforms offer this.

-----


> Isn't the JVM too big of a performance compromise for large scale software where every bit of performance matters?

Actually no - having a good concurrent garbage collector helps a lot.

Consider any message passing system. Messages come into the system, you create an immutable object, and send a reference to that object to other subsystems. Eventually, when each subsystem has processed that object, it is garbage collected.

In C++ you need to manually handle tracking the object and deleting it when it's no longer needed. This dramatically hinders the sort of systems you can reliably build.
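A minimal Scala sketch of that pattern, just to illustrate it: `Message`, `Subsystem` and `Router` are hypothetical names made up for this example, not anything from Twitter's code. The same immutable object is handed to several consumers by reference, and nobody has to decide who frees it.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Hypothetical message type: immutable, so it can be shared between
// subsystems without copying or locking.
final case class Message(id: Long, body: String)

// Hypothetical subsystem: owns an inbox of message references.
final class Subsystem(val name: String) {
  private val inbox = new LinkedBlockingQueue[Message]()

  def send(m: Message): Unit = inbox.put(m)

  // Takes the next message and returns a derived string. Once this returns,
  // the subsystem no longer holds a reference to the message.
  def processNext(): String = {
    val m = inbox.take()
    s"$name handled ${m.id}: ${m.body}"
  }
}

object Router {
  // Hands the same reference to every target; no ownership bookkeeping.
  def fanOut(m: Message, targets: Seq[Subsystem]): Unit =
    targets.foreach(_.send(m))
}
```

Once every subsystem has processed its reference, the message is unreachable and the collector can reclaim it; there is no explicit delete anywhere in the code.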

Further thoughts on this here: http://www.chrisstucchio.com/blog/2013/why_not_python.html

As for low level performance, the JVM is actually very fast - i.e., nearly as fast as C. The main place where it becomes an issue is that you can't always control the cache locality of objects.

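    // Bar is its own heap object (its field x is only illustrative);
    // Foo holds a reference to it, not the data itself.
    class Bar(val x: Int)
    class Foo(val bar: Bar, val y: Int)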
In C/C++ you can demand that foo.bar be stored in the same page. On the JVM you can't.
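One common workaround, sketched below under my own assumptions (the `Point`, `Points` and `Layouts` names are hypothetical), is to store each field in its own primitive array. You still can't place objects, but on mainstream JVMs the elements of an `Array[Int]` are in practice laid out next to each other, so a scan over one field walks memory in order.

```scala
// Hypothetical layout A: an array of references. Each Point is a separate
// heap object, and the JVM decides where each one lives.
final class Point(val x: Int, val y: Int)

// Hypothetical layout B: one primitive array per field ("struct of arrays").
final class Points(val xs: Array[Int], val ys: Array[Int]) {
  def length: Int = xs.length
}

object Layouts {
  // Follows one reference per element before reading x.
  def sumX(ps: Array[Point]): Long = {
    var total = 0L
    var i = 0
    while (i < ps.length) { total += ps(i).x; i += 1 }
    total
  }

  // Reads the Ints directly from a single primitive array.
  def sumX(ps: Points): Long = {
    var total = 0L
    var i = 0
    while (i < ps.length) { total += ps.xs(i); i += 1 }
    total
  }

  // Converts layout A into layout B.
  def toColumns(ps: Array[Point]): Points =
    new Points(ps.map(_.x), ps.map(_.y))
}
```

Both `sumX` variants compute the same result; whether the second is measurably faster depends on the workload and the JVM, so treat it as something to benchmark rather than a guarantee.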

-----


My experience circa ~2011

> Is Scala used exclusively for systems programming in Twitter or are there other languages in the mix?

Java and C++ were also used.

> The majority of other big software players seem to be not using it for various reasons. Is that a concern?

Not really.

> - How is the learning curve for new hires? How long before they become productive? How long before they write great code in Scala (as opposed to still figuring things out but maybe being productive)?

Took me about four months. It would be easier these days because the related tools (mostly IDE and build) suck less.

> - Isn't the JVM too big of a performance compromise for large scale software where every bit of performance matters?

It's mostly fine. It's just the very highest throughput systems (load balancers where you expect to saturate the pipe and not pause for 5ms) where you wish you had C.

>- Is it easier to hire people for Scala positions? Harder? The same?

You mostly train into it.

>- How do engineers in Twitter with a lot of experience in other platforms and languages feel about Scala after having used it for a while?

Mixed bag. Some love it, some hate it.

-----


I've heard many complaints about Scala being too much of a "kitchen sink" language; too many paradigms and too many features from each paradigm. It still seems like the best language that uses the JVM, though.

-----


There's also Kotlin, which is largely a better Java. And Java 8 is an enormous improvement too, since it supports lambdas, default methods, streams, etc.

-----


Clojure seems to be picking up some steam, though it's a dynamic language and thus appeals to a different crowd.

-----


Hi, I'm currently in the research & prototyping phase: what's the best, and likely to be the most community-supported, distributed real-time framework based on the reactor pattern?

So far, I've got:

Queuing Framework (rolling your own streams):

1) ZeroMQ (language agnostic, so you have to roll your own serialization and services; Google Protocol Buffers seems like the best way to go)

2) RabbitMQ (relies on a central message broker, with lots of built-in bindings)

Event-Based Frameworks:

3) Akka (JVM)

4) RxExtensions (.net), which also has RxCpp and RxJava ports

5) Reactor (JVM)

Stream-Based Frameworks:

6) Twitter Storm (Java-ish, but supposed to be language agnostic?)

7) S4 (Java-ish, but supposed to have bindings?)

Can anyone help me out with an opinion on which is best, or suggest a better one? Currently, I'm having fun multiplexing sockets, dereferencing C pointers and seg-faulting, so any suggestion is welcome.

-----


If you're looking at RabbitMQ you should take a look at Kafka.

Overall, I think looking for one framework to-rule-them-all is a mistake. Different components demand different tradeoffs.

-----


You would like Storm

http://storm.incubator.apache.org/

-----


Check out Amazon Kinesis for messaging.

-----


The content here is interesting, but I don't like the way the information is displayed. Most of the time when I read HN I skim the articles I'm interested in, since it's not worth my time to dive into all the details. This layout makes it pretty difficult to do that efficiently.

-----


Not a single word about the "data bus" they use. This would be better titled something like "Scala crash course by a Twitter systems engineer".

-----


They don't use a global data bus AFAICT. It's all driven by rpc through finagle (i.e. thrift). Finding service endpoints is likely done through zookeeper or some equivalent.

-----


That is correct. The service discovery is called ServerSets and is available here: https://github.com/twitter/commons/blob/master/src/java/com/...

-----


I understand that there is finagle and also a service discovery backed by zookeeper.

I fail to understand how Twitter's data stays consistent without something like LinkedIn's Databus. Doesn't Twitter manage a single point of truth or a single version of truth and propagate changes to secondary databases like social graph & search indexes?

-----


Is anyone else having trouble viewing this on a mobile device?

-----



