
I am very glad you asked!

I wrote and deployed (to production) some Clojure code at Netflix just yesterday. Among other things at Netflix, the Mantis Query Language (MQL, an SQL for streaming data), which ferries around approximately 2 trillion events every day for operational analysis (SPS alerting, quality of experience metrics, debugging production, etc.), is written entirely in Clojure.

This runs in nearly every critical service, ~3000 ASGs and easily > 100k servers, and Clojure also allows us to compile it for our NodeJS services.




Hey, would be great to have this on our success stories page at https://clojure.org/community/success_stories. If you would be able to do that, please contact me at alex.miller@cognitect.com. Thanks!


That’s an interesting point that doesn’t answer any of the questions the parent asked. :)

Perhaps you can answer this simpler one:

Netflix has been using clojure for a long time now; has that been a positive experience broadly speaking, that means clojure is still being used for new projects, or not?

Having a large successful project in clojure is lovely, but much of the community’s concern around it is that it’s hard to maintain, and falling in popularity, broadly speaking.

It would be very nice indeed to see those points addressed by a large scale user of clojure.


Hey, sorry I saw that and typed a quick response just as I woke up. I'm not usually at a computer so early in the day. I'll address these now that I'm in front of a machine. :)

> Did they keep writing more Clojure?

Yes but it has never been the primary language at Netflix.

> How much more did they rewrite from Java to Clojure?

Very little, if any, was rewritten from Java.

> If so, how much of their code is now in Clojure compared to Java?

A very small amount given that it isn't the primary language and Clojure code bases tend to be much smaller than Java.

> Do they use Clojure rather than Java for new code?

This is a personal choice each engineer makes when they write new code. Those who like Clojure might reach for it more often. Clojure is also easy to use within the environment at Netflix since everything was JVM based.

> What other languages do they use? Python? Erlang? Rust?

NodeJS and JavaScript, Python, and Ruby all have a seat at the table, but the majority of back-end code at Netflix is on the JVM, and the majority of that is Java.

> Among the things that seemed great with Clojure in 2013, did they find that some of these were not so great after all once the codebase grew? Any other problems?

I've always found larger Clojure code bases to be a bit unwieldy. Fortunately you can usually continue to abstract and keep the size small. If you choose your abstractions carefully you can get a lot of mileage out of this.

I've found the lack of static typing to be a bit of a pain at times especially when refactoring. My safety net for this in the project mentioned in the GP post is to have comprehensive unit tests. If I were to initiate this project today I'd likely explore using Spec to make type assertions.


>Clojure code bases tend to be much smaller than Java.

What are the reasons for this? FP language vs. OOP? Less boilerplate (again maybe due to FP)? Higher-level abstractions in the language or libraries?

I have seen that F# code (another FP language, although I've read F# is more from the ML family via OCaml, vs. Clojure being from the Lisp family) can be significantly shorter than equivalent C# code, for example, as shown in some comparisons on the fsharpforfunandprofit.com site.

Interested to know.


I worked at a shop that used Clojure and Java.

One big difference is that Java APIs tend to require the collaboration of various class instances to get something done, things that you would implement as a single function + options object on your own.

Bouncy Castle is a good example. You may need a Hasher, HasherStrategy, ASNEncoder, DERParameters, and ASNSerializerStrategy instances to execute what you would've implemented as `(asn-encode thing)` otherwise, maybe even having to subclass some of them to change some behavior you'd expect an option flag for.

Clojure's own Java-helper macros will compact your 1:1 Java interop code as well, so you have fewer lines even when writing Java from Clojure. Also, short-cuts like ad-hoc reification in Clojure will spare you LoC where you might otherwise have created a whole file for a class with custom interface implementation in Java.

Of course, line-for-line code is also just more compact in Clojure, but the ecosystem/API difference is one I don't see mentioned as often.

I don't think this is just good vs bad, though. There are certainly upsides to the more rigid everything-in-its-right-place code you tend to have in Java which has been making big strides in improving itself over the past decade.


Interesting, thanks.

I agree with your last point.


One thing for sure is the higher-level abstractions you tend to use in functional languages, but a big thing about Clojure is that it's very flat and generic in terms of data structures.

You basically have maps, lists, vectors, functions and not much in terms of hierarchy. A lot of code in Object Oriented languages simply exists to manage the hierarchies and structures you build and that's something that clojure largely avoids.

There's also of course the macro capabilities of lisp that can save you a lot of boilerplate if utilized correctly.


Thanks.

>You basically have maps, lists, vectors, functions and not much in terms of hierarchy. A lot of code in Object Oriented languages simply exists to manage the hierarchies and structures you build and that's something that clojure largely avoids.

It's similar for Python's built-in data structures: tuples, lists, dicts, sets, frozensets, with their built-in features and methods, including slicing for lists and strings (even without using, say, the collections module of the stdlib). (And of course including building up nested structures from the same.) I had read early on in my use of Python, and later experienced for myself a good amount when working with it, that those structures are fairly powerful and that for many apps you do not even need OOP structures and hierarchies.

List, dict and set comprehensions are great for that, too.

Although Clojure may have some additional ones that Python does not, not sure, since I haven't used it.
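
As a small illustration of the point about built-in structures and comprehensions, here is a sketch in Python. The sample records and field names are invented for the example; nothing here comes from a real system.

```python
# Hypothetical sample records: plain dicts in a list, no classes involved.
events = [
    {"service": "api", "status": 200, "latency_ms": 12},
    {"service": "api", "status": 500, "latency_ms": 340},
    {"service": "proxy", "status": 200, "latency_ms": 8},
    {"service": "proxy", "status": 200, "latency_ms": 15},
]

# Set comprehension: distinct services that reported a server error.
failing_services = {e["service"] for e in events if e["status"] >= 500}

# Generator expression plus slicing: the two largest latencies.
slowest_two = sorted((e["latency_ms"] for e in events), reverse=True)[:2]

# Dict comprehension: a nested structure built from the same primitives.
latencies_by_service = {
    name: [e["latency_ms"] for e in events if e["service"] == name]
    for name in {e["service"] for e in events}
}
```

Grouping, filtering, and ranking all happen on dicts, lists, and sets, with no class hierarchy to define or maintain.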


The two sibling replies to my own do a good job of enumerating the reasons:

- Macros allow you to greatly reduce boilerplate.
- Lots of built-in functions that operate on very few types.
- Compact interop.

And I agree with wild_preference that this isn't necessarily a good vs. bad debate. It is just a matter of fact that Clojure is concise.


Apparently Go as well:

https://www.reddit.com/r/golang/comments/765izv/what_cool_pr...

5M req/sec is pretty cool.


What is stopping you adding Spec to your existing code?


The thing I would be getting from spec is confidence when modifying the code base, which I get from the unit tests I wrote pre-spec. If I were to initiate the project today I'd probably have fewer tests and more spec.


I’ve found that overly using spec leads to more maintenance than upside. We went all in using spec and generators when they were released and ended up having to debug the specs themselves. 2c


I think of spec like seasoning; if you put lots on everything, you ruin the dish.


And now that I've responded to the GP, to address your points:

> that means clojure is still being used for new projects, or not?

This has been / will always be a professional choice of the engineer(s) starting a new project at Netflix. Clojure is great for a lot of reasons and lets you target JVM/NodeJS at the same time (our two largest backend languages) but as a LISP most people aren't going to be excited about using it.

> Having a large successful project in clojure is lovely, but much of the community’s concern around it is that it’s hard to maintain, and falling in popularity, broadly speaking.

In my personal experience maintenance has been a breeze. I had a meeting at noon yesterday where a data scientist wanted a new feature in the query language and we had it shipping to production by 3:00pm. If the code base were say 10x as large I'm not sure I'd have the same opinions about ease of maintenance but I haven't leveraged spec and I've been able to continually increase abstraction to keep the code size small as more features came in.

As for falling in popularity, that is my perception as well, though its current level of popularity still seems sustainable. I'd imagine this has to do with it being a LISP, with STM not being as popular as anticipated and with spec taking longer than anticipated.


> but as a LISP most people aren't going to be excited about using it.

I agree that Clojure has a lot of strong points and that s-expressions probably put a lot of people off, but as a Lisp programmer, I was very disappointed in Clojure's debugging/interactive development story (and I've heard that from a lot of others). It feels more like using a typical scripting language compared to the traditional Lisp/Smalltalk experience, and even there, a typical scripting language would at least give useful backtraces. As it stands, I think a decent number of conventional Lisp programmers would also worry about large Clojure programs being unmaintainable unless they're superbly written.


> I was very disappointed in Clojure's debugging/interactive development story (and I've heard that from a lot of others). It feels more like using a typical scripting language compared to the traditional Lisp/Smalltalk experience

Common Lisp user here. I was also disappointed in the same way when trying Clojure.

Other minor things I didn't like were the noisy [] in the syntax, and the fact that for practical purposes you're fully tied to the JVM and the Java runtime libs.


>>for practical purposes you're fully tied to the JVM and the java runtime libs.

That's more like a positive thing about Clojure. Java inter-op and targeting the JVM gives Clojure a great chance of adoption at large enterprises.

These days no one really has an issue installing jars on a production machine.


The latest Clojure in the pipe, 1.10, is supposed to have a lot of work done fixing stacktraces FWIW.


I'm very happy to be reading this!


Oh, yes this is another excellent example of pain while building Clojure code bases. Though I'll say I only agree with half of your statement. I've found the interactive development story to be great with nrepl/fireplace/vim but the debugging is downright terrible... this is the single biggest blocker to me using it for larger systems.


Debugging seems acceptable on emacs with cider. Sayid seems interesting as well.


Yeah, cider has an amazing form by form debugger for Clojure. And the way cider uses overlays to surface this is quite nice too.

My general experience of cider is that for a certain set of tasks it is much better than slime. But, the problem is, most of my day-to-day coding tasks are hampered by the language: e.g. if I have a web server running and I want to change a request handler, you can’t just recompile the handler, you also have to restart the server.


(Although, if you anticipate this, you can call the var (e.g. `(#'foo arg)`) rather than the function `(foo arg)`. But this means that you have to plan the dynamically modifiable parts out ahead of time.)


That's not true. You only have to prepend #' if you pass the function by value (as in your web server handler example).

If you call the function by name yourself, like (foo arg), and you recompile foo, any code that called foo will see the new version.


Hmm, I’ll try it again, but I remember issues with the function call syntax too. Iirc, it had to do with the fact that redefinitions don’t count if the use site is running in another thread. But I’ll double-check this when I get a chance.


It's a big exaggeration to say Clojure stack traces aren't useful. They have some extra noise but they do the job of pointing out the call chain and the exception value. Tooling (CIDER at least) can automatically hide the frames about the Java runtime and highlight the Clojure info & line numbers.


> Did they keep writing more Clojure?

> I wrote and deployed (to production) some Clojure code at Netflix just yesterday.

Seems like he answered at least one.


> Having a large successful project in clojure is lovely, but much of the community’s concern around it is that it’s hard to maintain, and falling in popularity, broadly speaking.

What community? The larger software community or the clojure community? I personally see no logical reason why a "large" clojure project would be harder to maintain than, say, a java, python, ruby, javascript, etc. project. If anything, the guiding principles that make Clojure a well designed language at the micro level should have an exponential effect the same way poor decisions do.


What's the condescending tone for?


Do you know offhand how MQL compares to the Apache Calcite [0] extension of SQL that some big data platforms are using for streaming SQL?

[0] https://calcite.apache.org/docs/stream.html


Hey Alex,

I explored potentially using Calcite when we initiated this project. The syntax is very similar because both are an SQL dialect. Some differences are that MQL largely targets unstructured data (there is no schema for the streams it operates on - they're just streams of JSON blobs).

In addition to that, one of the goals of MQL was to have different compiler backends, which allows different call sites to operate on different levels. For example, the code that runs in our API/proxy/other large services will look at an entire query and only evaluate the WHERE / SAMPLE clauses, expecting something down the stream to complete the query. Conversely, the client side is a full SQL -> RxJava implementation for the data stream. (I can explain more of how this works if there is interest -- it allows us to only egress data that is actively used in queries, so devs can log every request and only pay the cost of those for which they are querying.)

Doing our own implementation also allows us more customization, and to compile to a NodeJS backend as well which is a critical ingestion point for our operational data. We support everything on the linked page except hopping windows, subqueries, and DML (MQL is query only, we assume no structure which was true of the streams before we ever wrote a query language for it). Of course we have to implement a lot more of this ourselves which is pretty in line with Netflix's Freedom and Responsibility. We had the freedom to implement our own query language but have the responsibility to maintain it (Calcite would have been sacrificing some freedom to avoid responsibility.)


Thanks for the detailed answer. As you likely already know, many projects have taken the approach you detailed in your last paragraph; it makes sense if your goal is more freedom for sure.

I am wondering how you do client side data egress filtering - does each event need to get materialized, assessed for certain fields or structure, and then sent once for each outgoing stream on your sender? Seems like a good strategy for reducing network bandwidth, but it might reduce your throughput moderately (due to serialization and analysis costs) or cause hotspots (if you have a distributed stream that is partitioned in a particular way)? These are normally problems each consumer individually faces, where now it compounds where the producer does that work for each of the N readers. I appreciate the idea a lot, just wondering if you’ve had any issues making it scale nicely for highly subscribed streams.


The events are already materialized in memory as part of whatever is recording them, thankfully.

The MQL that runs in the services checks the WHERE clause of every query running and then projects a superset of all the fields necessary for every part of the query for all matching queries (including other parts that the server won't be processing: group by, order by, having, etc...). It then tags the event with all of the matching queries. This way we only need to egress a single event.
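
The tag-and-project step described above might look roughly like the following Python sketch. This is not MQL's implementation: the `Query` class, the `tag_and_project` function, and the `matched_queries` key are all hypothetical names chosen for illustration, and a WHERE clause is modeled as a plain predicate.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, List, Optional

Event = Dict[str, Any]


@dataclass(frozen=True)
class Query:
    """Hypothetical stand-in for a parsed query; not MQL's real representation."""
    query_id: str
    where: Callable[[Event], bool]  # the WHERE clause as a predicate
    fields: FrozenSet[str]          # every field any clause of the query needs


def tag_and_project(event: Event, queries: List[Query]) -> Optional[Event]:
    """Return one trimmed, tagged event, or None if no query matches."""
    matching = [q for q in queries if q.where(event)]
    if not matching:
        return None  # no running query wants this event, so nothing is egressed

    # Superset of the fields needed by all matching queries.
    needed = frozenset().union(*(q.fields for q in matching))
    projected = {k: v for k, v in event.items() if k in needed}

    # Assumed tag key; a real system would need to avoid field-name collisions.
    projected["matched_queries"] = sorted(q.query_id for q in matching)
    return projected
```

However many queries match, the service emits at most one event, carrying only the fields some matching query needs plus the list of queries it belongs to.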

What happens next requires a lot more context about Mantis, which is a reactive stream processor. The data is egressed from the service to what is referred to as a source job in Mantis... it is the source job's responsibility to multiplex this data to any of the consuming jobs. Jobs can be subscribed to one another so this is a natural fit for Mantis.

As for scaling, everything is round robin after it leaves the service and enters Mantis, but obviously busier services will pay more cost because they're processing more events. Regular expressions in WHERE clauses have been a pain point for scale, and absurdly heavy queries such as `SELECT * FROM STREAM` result in data dropping as most single machine consumers can't keep up with the large streams. We have different methods of providing different delivery semantics, but in the base case we're at most once and will drop data if the client can't keep up.
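
To make "at most once" concrete, here is a minimal Python sketch of a bounded buffer that drops events when the consumer falls behind. It is an assumption-laden illustration, not how Mantis does it: the `AtMostOnceBuffer` class is hypothetical, and dropping the newest event (rather than the oldest) is an arbitrary choice made for the example.

```python
from collections import deque


class AtMostOnceBuffer:
    """Hypothetical bounded buffer: when full, new events are dropped, never retried."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pending = deque()
        self.dropped = 0

    def offer(self, event):
        """Try to enqueue; return False and count a drop if the consumer is behind."""
        if len(self.pending) >= self.capacity:
            self.dropped += 1  # the event is lost for good: no retry, no blocking
            return False
        self.pending.append(event)
        return True

    def poll(self):
        """Hand the next event to the consumer, or None if nothing is pending."""
        return self.pending.popleft() if self.pending else None
```

The producer never blocks or retries, so a slow consumer costs the sender nothing beyond the bounded buffer; the price is that the consumer may see gaps in the stream.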

We have some streams that exceed 1 million RPS at certain times of the day, and last we checked the system moves around 2 trillion events per day so we've managed to scale it to the needs of the company pretty well. It has been a while since I performance tested it, but it's always been a goal to have it push as close as possible to saturating the gigabit connection on the boxes on which it runs.


I came back to add: One of the things that just popped into my mind that Calcite has that we don't is query planning. Obviously since MQL was developed in house we didn't get any optimizations out of the box. We've since added a layer that processes the parse trees looking for optimizations and while it isn't quite query planning it has yielded some significant performance gains (or rather poor performance mitigations, haha).

I'd imagine getting this for free from Calcite would have been really nice.



