I wrote and deployed (to production) some Clojure code at Netflix just yesterday. Among other things at Netflix, the Mantis Query Language (MQL, an SQL dialect for streaming data), which ferries around approximately 2 trillion events every day for operational analysis (SPS alerting, quality-of-experience metrics, production debugging, etc.), is written entirely in Clojure.
This runs in nearly every critical service (~3000 ASGs and easily >100k servers), and Clojure also lets us compile it for our NodeJS services.
Perhaps you can answer this simpler one:
Netflix has been using Clojure for a long time now; has that been a broadly positive experience, meaning Clojure is still being used for new projects, or not?
Having a large successful project in Clojure is lovely, but much of the community's concern around it is that it's hard to maintain and falling in popularity, broadly speaking.
It would be very nice indeed to see those points addressed by a large scale user of clojure.
> Did they keep writing more Clojure?
Yes but it has never been the primary language at Netflix.
> How much more did they rewrite from Java to Clojure?
Very little, if any was rewritten from Java.
> If so, how much of their code is now in Clojure compared to Java?
A very small amount, given that it isn't the primary language and Clojure code bases tend to be much smaller than equivalent Java ones.
> Do they use Clojure rather than Java for new code?
This is a personal choice each engineer makes when they write new code. Those who like Clojure might reach for it more often. Clojure is also easy to use within the environment at Netflix since everything was JVM based.
> What other languages do they use? Python? Erlang? Rust?
> Among the things that seemed great with Clojure in 2013, did they find that some of these were not so great after all once the codebase grew? Any other problems?
I've always found larger Clojure code bases to be a bit unwieldy. Fortunately you can usually continue to abstract and keep the size small. If you choose your abstractions carefully you can get a lot of mileage out of this.
I've found the lack of static typing to be a bit of a pain at times, especially when refactoring. My safety net for this in the project mentioned in the GP post is to have comprehensive unit tests. If I were to initiate this project today I'd likely explore using spec to make type assertions.
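To sketch what that safety net could look like with clojure.spec (the spec names and event shape here are invented for illustration, not MQL's actual schema):

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical specs for an event map and a predicate over it.
(s/def ::esn string?)
(s/def ::latency-ms nat-int?)
(s/def ::event (s/keys :req-un [::esn ::latency-ms]))

(defn slow-event?
  "True when the event's latency exceeds the threshold."
  [event threshold]
  (> (:latency-ms event) threshold))

;; With clojure.spec.test.alpha/instrument enabled during dev/test,
;; calls that violate this contract fail loudly, which is exactly the
;; refactoring safety net described above.
(s/fdef slow-event?
  :args (s/cat :event ::event :threshold nat-int?)
  :ret boolean?)

(s/valid? ::event {:esn "ABC123" :latency-ms 250})
;; => true
```

The specs live alongside the code and only assert shapes where you opt in, which fits the "tests as safety net" approach rather than replacing it.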
What are the reasons for this? FP language vs. OOP? Less boilerplate (again maybe due to FP)? Higher-level abstractions in the language or libraries?
I have seen that F# code (another FP language, although I've read F# is more from the ML family via OCaml, vs. Clojure being from the Lisp family) can be significantly shorter than equivalent C# code, for example, as shown in some comparisons on the fsharpforfunandprofit.com site.
Interested to know.
One big difference is that Java APIs tend to require the collaboration of various class instances to get something done, things that you would implement as a single function + options object on your own.
Bouncy Castle is a good example. You may need a Hasher, HasherStrategy, ASNEncoder, DERParameters, and ASNSerializerStrategy instances to execute what you would've implemented as `(asn-encode thing)` otherwise, maybe even having to subclass some of them to change some behavior you'd expect an option flag for.
Clojure's own Java-helper macros will compact your 1:1 Java interop code as well, so you have fewer lines even when writing Java from Clojure. Also, short-cuts like ad-hoc reification in Clojure will spare you LoC where you might otherwise have created a whole file for a class with custom interface implementation in Java.
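A small illustration of both points, using standard JDK classes rather than Bouncy Castle's actual API (the options map is just a stand-in for the kind of parameter object discussed above):

```clojure
;; doto collapses repeated method calls on one receiver into one form
;; that would be several statements in Java:
(def opts
  (doto (java.util.HashMap.)
    (.put "encoding" "DER")
    (.put "algorithm" "SHA-256")))

;; reify implements an interface inline; in Java this usually means an
;; anonymous class or a whole separate file:
(def by-length
  (reify java.util.Comparator
    (compare [_ a b]
      (Integer/compare (count a) (count b)))))

(sort by-length ["ccc" "a" "bb"])
;; => ("a" "bb" "ccc")
```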
Of course, line-to-line code is also just more compact in Clojure, but ecosystem/api difference is one I don't see mentioned as often.
I don't think this is just good vs. bad, though. There are certainly upsides to the more rigid everything-in-its-right-place code you tend to have in Java, a language which has been making big strides in improving itself over the past decade.
I agree with your last point.
You basically have maps, lists, vectors, functions and not much in terms of hierarchy. A lot of code in Object Oriented languages simply exists to manage the hierarchies and structures you build and that's something that clojure largely avoids.
There's also of course the macro capabilities of lisp that can save you a lot of boilerplate if utilized correctly.
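A tiny example of the macro point (illustrative only; `clojure.core/time` already does something similar):

```clojure
;; A small macro that removes the let/System.nanoTime boilerplate
;; you'd otherwise repeat at every call site.
(defmacro with-millis
  "Evaluates body and returns [result elapsed-millis]."
  [& body]
  `(let [start#  (System/nanoTime)
         result# (do ~@body)]
     [result# (/ (- (System/nanoTime) start#) 1e6)]))

(first (with-millis (reduce + (range 1000))))
;; => 499500
```

Because the macro receives the body unevaluated, the timing logic wraps arbitrary code without the caller wrapping anything in a function, which is the kind of boilerplate removal that has no direct analog in most OOP languages.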
>You basically have maps, lists, vectors, functions and not much in terms of hierarchy. A lot of code in Object Oriented languages simply exists to manage the hierarchies and structures you build and that's something that clojure largely avoids.
It's similar for Python's built-in data structures: tuples, lists, dicts, sets, frozensets, with their built-in features and methods, including slicing for lists and strings (even without using, say, the collections module of the stdlib), and of course including building up nested structures from the same. I read early on in my use of Python, and later experienced for myself when doing real work with it, that those structures are fairly powerful, and for many apps you do not even need OOP structures and hierarchies.
List, dict and set comprehensions are great for that, too.
Although Clojure may have some additional ones that Python does not, not sure, since I haven't used it.
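For comparison, Clojure's `for` covers roughly the same ground as Python's comprehensions, producing a lazy sequence that you can pour into whichever collection you want (a sketch, not an exhaustive comparison):

```clojure
;; List-comprehension analog: filter and transform in one form.
(for [x (range 6) :when (odd? x)]
  (* x x))
;; => (1 9 25)

;; Dict- and set-comprehension analogs: feed the sequence into a
;; target collection with `into`.
(into {} (for [x [1 2 3]] [x (* x x)]))
;; => {1 1, 2 4, 3 9}

(into #{} (for [x [1 2 2 3]] (* x 10)))
;; => #{10 20 30}
```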
- Macros allow you to greatly reduce boilerplate.
- Lots of built in functions that operate on very few types.
- Compact interop
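To make the middle point concrete, here is a tiny sketch of the same handful of functions working across vectors, maps, sets, and strings, because they all participate in the seq abstraction:

```clojure
(map inc [1 2 3])            ;; => (2 3 4)
(map first {:a 1 :b 2})      ;; => (:a :b), a map seqs as key/value pairs
(filter #{:a :c} [:a :b :c]) ;; => (:a :c), a set doubles as a predicate
(frequencies "banana")       ;; => {\b 1, \a 3, \n 2}, strings are seqs of chars
```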
And I agree with wild_preference that this isn't necessarily a good vs. bad debate. It is just a matter of fact that Clojure is concise.
5M req/sec is pretty cool.
> meaning Clojure is still being used for new projects, or not?
This has been / will always be a professional choice of the engineer(s) starting a new project at Netflix. Clojure is great for a lot of reasons and lets you target JVM/NodeJS at the same time (our two largest backend languages) but as a LISP most people aren't going to be excited about using it.
> Having a large successful project in Clojure is lovely, but much of the community's concern around it is that it's hard to maintain and falling in popularity, broadly speaking.
In my personal experience maintenance has been a breeze. I had a meeting at noon yesterday where a data scientist wanted a new feature in the query language, and we had it shipping to production by 3:00pm. If the code base were, say, 10x as large, I'm not sure I'd have the same opinion about ease of maintenance, but I haven't leveraged spec and I've been able to continually increase abstraction to keep the code size small as more features came in.
As for falling in popularity, that is my perception as well, though its current level of popularity still seems sustainable. I'd imagine this has to do with it being a LISP, with STM not being as popular as anticipated, and with spec taking longer than expected.
I agree that Clojure has a lot of strong points and that s-expressions probably put a lot of people off, but as a Lisp programmer, I was very disappointed in Clojure's debugging/interactive development story (and I've heard that from a lot of others). It feels more like using a typical scripting language compared to the traditional Lisp/Smalltalk experience, and even there, a typical scripting language would at least give useful backtraces. As it stands, I think a decent number of conventional Lisp programmers would also worry about large Clojure programs being unmaintainable unless they're
Common Lisp user here. I was also disappointed in the same way when trying Clojure.
Other minor things I didn't like were the noise in the syntax, and the fact that for practical purposes you're fully tied to the JVM and the Java runtime libs.
That's more like a positive thing about Clojure. Java inter-op and targeting the JVM gives Clojure a great chance of adoption at large enterprises.
These days no one really has an issue installing jars on a production machine.
My general experience of cider is that for a certain set of tasks it is much better than slime. But the problem is that most of my day-to-day coding tasks are hampered by the language: e.g. if I have a web server running and I want to change a request handler, I can't just recompile the handler; I also have to restart the server.
If you call the function by name yourself, like (foo arg), and you recompile foo, any code that called foo will see the new version.
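The usual fix for the server case is to hand over the var rather than its current value, so the lookup happens on every call. A minimal demonstration of that indirection (simulating interactive redefinition with a second `defn`):

```clojure
(defn handler [_] "v1")

(def direct handler)      ;; captures the function value as it is now
(def indirect #'handler)  ;; captures the var; invoking a var derefs
                          ;; it on each call

(defn handler [_] "v2")   ;; simulate recompiling at the REPL

[(direct nil) (indirect nil)]
;; => ["v1" "v2"]
```

This is why, assuming a Ring/Jetty stack, passing `#'handler` instead of `handler` to the server's run function picks up redefinitions on the next request without a restart; the same trick applies to any framework that holds onto your function.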
> I wrote and deployed (to production) some Clojure code at Netflix just yesterday.
Seems like he answered at least one.
I explored potentially using Calcite when we initiated this project. The syntax is very similar because both are SQL dialects. Some differences are that MQL largely targets unstructured data (there is no schema for the streams it operates on; they're just streams of JSON blobs).
In addition, one of the goals of MQL was to have different compiler backends, which allows different call sites to operate at different levels. For example, the code that runs in our API/proxy/other large services will look at an entire query and only evaluate the WHERE / SAMPLE clauses, expecting something downstream to complete the query. Conversely, the client side is a full SQL -> RxJava implementation for the data stream. I can explain more of how this works if there is interest; it allows us to only egress data that is actively used in queries, so devs can log every request and only pay the cost of those they are querying.
Doing our own implementation also allows us more customization, and lets us compile to a NodeJS backend as well, which is a critical ingestion point for our operational data. We support everything on the linked page except hopping windows, subqueries, and DML (MQL is query only; we assume no structure, which was true of the streams before we ever wrote a query language for them). Of course we have to implement a lot more of this ourselves, which is pretty in line with Netflix's Freedom and Responsibility: we had the freedom to implement our own query language but have the responsibility to maintain it (Calcite would have been sacrificing some freedom to avoid responsibility).
I am wondering how you do client-side data egress filtering: does each event need to get materialized, assessed for certain fields or structure, and then sent once for each outgoing stream on your sender? Seems like a good strategy for reducing network bandwidth, but it might reduce your throughput moderately (due to serialization and analysis costs) or cause hotspots (if you have a distributed stream that is partitioned in a particular way). These are normally problems each consumer individually faces, whereas here it compounds because the producer does that work for each of the N readers. I appreciate the idea a lot, just wondering if you've had any issues making it scale nicely for highly subscribed streams.
The MQL that runs in the services checks the WHERE clause of every running query and then projects a superset of all the fields necessary for every part of each matching query (including parts that the server won't be processing: GROUP BY, ORDER BY, HAVING, etc.). It then tags the event with all of the matching queries. This way we only need to egress a single event.
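A toy sketch of that shape (all names invented; this is not MQL's actual implementation): evaluate each query's WHERE predicate, project the union of the fields any matching query needs, and tag the event once:

```clojure
;; Hypothetical registered queries: a WHERE predicate plus the fields
;; the rest of the query (GROUP BY, ORDER BY, ...) will need downstream.
(def queries
  [{:id "q1" :where #(= "US" (:country %)) :fields #{:country :latency}}
   {:id "q2" :where #(> (:latency %) 100)  :fields #{:latency :device}}])

(defn tag-and-project
  "Returns nil when no query matches; otherwise a single event carrying
  the superset of fields needed by all matching queries, tagged with
  their ids so one egressed event serves every subscriber."
  [event]
  (let [matched (filter #((:where %) event) queries)]
    (when (seq matched)
      (-> event
          (select-keys (reduce into #{} (map :fields matched)))
          (assoc :matched-queries (mapv :id matched))))))

(tag-and-project {:country "US" :latency 250 :device "tv" :extra "x"})
;; => {:country "US", :latency 250, :device "tv",
;;     :matched-queries ["q1" "q2"]}
```

Note how `:extra` is dropped because no query needs it, which is the "only egress data actively used in queries" property described upthread.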
What happens next requires a lot more context about Mantis, which is a reactive stream processor. The data is egressed from the service to what is referred to as a source job in Mantis... it is the source job's responsibility to multiplex this data to any of the consuming jobs. Jobs can be subscribed to one another so this is a natural fit for Mantis.
As for scaling, everything is round robin after it leaves the service and enters Mantis, but obviously busier services will pay more cost because they're processing more events. Regular expressions in WHERE clauses have been a pain point for scale, and absurdly heavy queries such as `SELECT * FROM STREAM` result in data dropping, as most single-machine consumers can't keep up with the large streams. We have different methods of providing different delivery semantics, but in the base case we're at-most-once and will drop data if the client can't keep up.
We have some streams that exceed 1 million RPS at certain times of the day, and last we checked the system moves around 2 trillion events per day so we've managed to scale it to the needs of the company pretty well. It has been a while since I performance tested it, but it's always been a goal to have it push as close as possible to saturating the gigabit connection on the boxes on which it runs.
I'd imagine getting this for free from Calcite would have been really nice.