Hobbes – A language and an embedded JIT compiler (lambda-the-ultimate.org)
152 points by ah- on July 16, 2017 | 49 comments

This should be of great interest to anybody working with high-volume sensor data in the IoT field. I will be checking it out very soon (on a few deadlines now so I can't afford the diversion). Anything that is fast and simple for massive volumes of small record types should always be of great interest to IoT/WoT folks.

I agree with the comments about kdb/q: I tried to look at it but could never understand it. Maybe that's because I'm not a full-time dev, nor in that field, but kdb just always becomes too hard. At least Hobbes looks like something I can work into C++. At first "parse", it looks like the right tool for the job of working on high-volume discrete data, which is what one would expect from an investment bank.

Kudos to the bank for releasing this part of their secret-sauce.

Thanks, I hope to hear about your experience if/when you look into this (and I'm happy to help in any way I can). :)

I started the hobbes project at MS and am happy to answer any questions that folks have about it!

Are you able to comment on your implementation of row types?

You give the following type for a field selector:

> :t .foo

(a.foo::b) => (a) -> b

These give constraints with arbitrary types embedded in them. Most examples in the literature using qualified types suggest something like the following type:

(r lacks foo) => { foo :: b | r } -> b

Where r is of kind row. The lacks constraint here is a very simple constraint, and it is clear that this function takes a record; the disadvantage is that predicate lists can get very large. The old Haskell TREX extension built into Hugs worked this way.

It would be great to learn some background behind your choice. Thanks.

Yes, TREX and Mark Jones's work on qualified types was very influential to this project. I am a big fan of that work.

I added a few details in this post below:


I hope that answers your questions although I know that there's a lot more that could be said about this topic.

What motivated you to start building your own language? Did you use much kdb/q before?

How did you manage to get this open sourced?

Well I have been interested in programming languages for a very long time. I was hired into a group at Morgan Stanley specifically to develop this project. We had some initial use-cases where some internal trading systems could be "hot patched" with new logic but it was very slow (there was basically a simple AST-walking interpreter here). So at first I just had to make a roughly equivalent thing that could be much faster. However once that immediate need was satisfied, we found many more interesting places to apply this tool. I would say that the storage and live data-analysis features are just as valuable as the efficient machine code produced by hobbes.

Yes I have used kdb/q before. It's pretty popular in the financial domain. I really like what we can do with type systems. :D

Getting this project open sourced was really accomplished by other folks who I'm not sure if I can name here (I will check with them) but I was very happy to help make it happen. There are several folks internally who have been working really hard to make it possible for the firm to open source the work that we do and hobbes is just one part of that effort. There are several folks in management positions in the firm who have made open source efforts like this a priority and have even gone out on a limb to make sure that this happens. Morgan Stanley is a really remarkable company in the technology/finance domain, in my experience.

I'd like to know this as well. I know most array languages coupled with a DB aren't as fast as a language with a JIT, but the code is quite expressive. If Kdb+ can ever easily be run on an FPGA or GPU cluster...that would be something.

Not kdb+, but a proprietary (internal-only) language that it heavily influenced was designed around execution on GPU clusters.

The FPGAs were used mainly for feedhandlers and there was a different DSL for that (compiling to verilog).

It was indeed rather something to see :)

Was it received favorably? A secret sauce, or a failed project?

Secret sauce, at least for the team involved, and well-received. Nice combination of extreme perf and decent productivity for them.

I've moved into non-finance stuff for quite a while now though, so not sure what's become of it. Given business challenges in that particular sub-field, who knows..

There was a paper a while back from someone working with MS on a Q typechecker in Prolog. Was this you?

Nope, that wasn't me. :)

The LtU blurb says "a variant of Haskell", but the github readme does not mention non-strict. Strict or not?


Any plan to move beyond "allocate linearly and never free anything" memory management?

The semantics look nice and coherent, but unfortunately without some kind of memory management (automatic or manual) it's not usable as a general-purpose tool.

I wouldn't summarize it that way. We obviously "free" memory or else our trading systems wouldn't work and we'd be in the news in a bad way. :)

Allocation does happen out of a default region and applications that embed hobbes generally define "transaction boundaries" for their use of compiled expressions which may allocate. It's also possible to manually nest these regions and control them yourself, which we do in some applications (e.g. where we're loading whole days of trading data and indexing them in batches).

Ultimately we might formalize that process with a region inference algorithm.

Wait, so there's not even manual memory management? I can't even call free()? How would this work then?

All embeddable languages should come with their own header parser/code generator to save us from the hassle of writing a bunch of class wrappers and boilerplate registration code. I think this is high up on the list of concerns in a professional setting.

This looks like a typed take on KDB/Q, something that is long overdue! The key to making this work is structural typing of rows (row polymorphism / extensible records). I'd be interested to see more details on how they tackled this.

Morgan Stanley has a history of languages in this vein. MS was a big APL shop in the 80s. Arthur Whitney [1] worked at MS and developed the languages A and A+. He later wrote J and K.

[1] https://en.wikipedia.org/wiki/Arthur_Whitney_(computer_scien...

I think J was mainly written by Roger Hui (now @ Dyalog APL) after Ken Iverson designed it. I think Roger was inspired by Whitney though... or was Whitney actually involved?

Yes, structural typing of records (and variants and recursives) has been very valuable to us.

There are a couple of type constraints (ala type class constraints) that make most of this work.

There's a constraint that looks like 'r={lbl:h * t}' that decides if/how a type "r" can be broken into a head field of type "h" (with a field named "lbl", this type variable gets the field name and can be used to e.g. relate one record type to another, like for deciding structural subtyping conversion), and a successor record type "t" (this bottoms out in the unit type).

There's also a constraint "r.f::x" (or we write "r/f::x" if "f" is a type variable) which decides if/how the type "r" can be indexed by the field name "f" to determine a type "x".

(Field access can also be overloaded, and we have some pretty awesome use-cases for that, hopefully I can put up some examples of that shortly.)

So e.g. we can use the destructuring constraint to make a generic print function for all record types, like:

  instance (r={lbl:h*t}, Print h, Print t) => Print r where
    print x = do{putStr(lbl++":");println(recordHead(x));print(recordTail(x));}
The field access constraint makes basic function definitions trivially "duck typed". So e.g. if I write ".jimmy" (shorthand for "\x.x.jimmy" or if you prefer Haskell syntax this would be like "\x -> x.jimmy") then it will select the "jimmy" field of whatever type it is given, however that is done for that particular type.

There's a lot more to say on this point but maybe it should be turned into its own doc on the site...

Ah overloaded field access, that explains the type signature. I implemented a very similar extensible record/variant system for a product definition language at Barclays. I hope we see a lot more structural typing in industry going forward, it is an especially good fit for typing queries.

I'd love to see more about this; it seems like a very interesting language. Something to explain how/where this is or could be used, maybe a few more complete examples to show off the language, etc. Also, some notes about performance would be great. I guess it is reasonably fast by the looks of it, but that is pretty vague.

Yes the whole project started because we needed dynamically-determined logic (in various places) that could fit in very tight latency budgets. And the output of this compiler is put in a critical trading path that sees a very large portion of all daily US equity trades.

So there is a lot to say about performance.

I wish there was a clearer language tutorial, instead of one that mixes the embedding tutorial with the language itself.

Yes that's a good point, I do hope to build out more targeted documentation and example programs over the coming weeks.

Is this a "pure" language like Haskell? Or it's more like OCaml?

And, even if it's not pure, is there any syntactic sugar for monads?

Anyway, this looks awesome, especially with the easy C++ interfacing that seems to be there: a system where you can have machine learning code (anything from Bayesian methods to deep NNs) in C++ and business logic in something Haskell-like is the stuff of wet dreams...

It's not pure currently, so definitely more like OCaml.

We make frequent use of overloaded array comprehensions, which is one form that monad syntax can take. The "do" notation introduced by Haskell isn't supported right now, but maybe soon. I'd like to refactor the parser code a bit anyway.

Imho non-pure is probably ok. Haskell's purity is probably what destroyed any chance of it growing popular enough to have a healthy ecosystem of libraries and community.

Anyway, since this has no memory management, neither Rust-like nor GC, this is not in the niche I care about now, so moving on...

But again, great work. "Strict Haskell" with good C++ interop and decent records is awesome.

Can someone explain variants vs sums vs tuples vs records? This sentence from the README confuses me: "We can combine types with variants or sums (the "nameless" form of variants, as tuples are to records)."

Sure, I guess starting with "record type" makes sense? Basically a record type is a "struct", or a series of values with particular names (the "field names"). A tuple is the same thing, but by convention the "names" are by position (so you have the "0th field", the "1th field", etc).

A variant can be _one of_ a series of values (where a record is _all of_ a series of values), and each of the things can have a particular name (the "constructor names"). As with tuples to records, a sum type is just a variant where you don't care to name the constructors (so you have the "0th constructor", the "1th constructor", etc).

A common example of a variant is an "enum" type, where here the payload types are trivial (just the unit type) and only the constructor names matter. In hobbes we might write the type "|red:(),green:(),blue:()|" or the shorter syntax "|red,green,blue|" for an enum that we might write in C as "enum { red, green, blue }". You could also represent this same type as the sum "()+()+()" but that looks a little weird/profane and maybe not clear to other programmers.

Tuples are sometimes called "product types" and that connects well with sum types and the general logical/combinatorial interpretation of types. If you see the type "bool * bool" (or perhaps "2 * 2"?) then it means "a bool AND a bool" and there are four such values. Similarly if you see the type "bool + bool" then it means "a bool OR a bool" and it's either the first or the second one.

If you're interested in the general area of type theory, you might like the book "Types and Programming Languages" by Ben Pierce.

> you will need LLVM 3.3 or later

Though it doesn't build with 4.0. (And is now in the AUR.)

Nice to see first-class parsing in a language.

OK, there's now a PR in the queue to allow hobbes to build with LLVM 4.0+. Pending review, it will likely be merged to master soon.

Doh you're right. We will need to look into LLVM 4.0+ soon.

Examples do not look very readable.

TL;DR: They do not look readable to you because they are a "foreign language" you are unfamiliar with, much like Japanese is to most Westerners.

The vast majority of languages are essentially minor syntax "skins" applied to the same underlying Algol ideas; this describes C, Pascal, Python, Ruby, Perl, Java, and most other languages, which is why being proficient in one lets you read the others. Haskell and OCaml are close enough to read, but usually not write, unless you are specifically proficient in one of them.

APL (and J and K, and likely Hobbes) are very different, both in concept and syntax, to the point that you actually have to know one of them to be able to read any of them. They look intimidating at first, and they often take longer (per character / per line) to read even when you do know them, but they are very orthogonal; e.g. in K, the expression |/0(0|+)\x efficiently computes the maximum subarray sum of the array x [0]. This is not a predefined program; rather, each character here carries a meaning, and combined they solve the mss problem.

If you looked at Japanese[1], it would look unreadable to you, but that's not because there's a problem with Japanese.

[0] https://en.wikipedia.org/wiki/Maximum_subarray_problem

[1] Pick any language you do not know with a character set you don't know - Japanese, Arabic, Farsi, Hebrew, Thai ...

Thanks, I do have it in the queue to make more targeted examples. I realize that what's there is a little scattershot and each of the use-cases mentioned could use a lot more detail.

For example, inside MS we have these reactive hobbes processes that watch log data and incrementally build B-trees to deliver future query responses very quickly. I hope to put some examples like that up in the coming weeks.

I may be way off base with this thought, but I can't help but ask the question anyway.

Back at last year's HotChips conference, Oracle spent a fair bit of time talking up new hardware capabilities targeting databases and the new ISA extensions that support them. It seems to me that this might be a good fit for some of the things Hobbes seems well suited for. So I wonder if you spent any time thinking about that (or any ISA beyond x86).

I am not aware of that work from Oracle, but it would be interesting to talk with those folks!

Our main focus has been x86, although there are interesting alternative platforms.

Hey thanks for the response.

I am not a SPARC developer, so I can't really point you in the right direction with any certainty. I did a quick web search for the buzzwords they were using in the HotChips presentation ("Software in Silicon") and found this link, which has the slides from the talk among a whole lot of other material. My expectation is that you might find the database/analytics acceleration material of interest.


Anyway, my question was motivated by a desire to inform my nebulous thesis that getting developers, even really smart devs working for major corporations with deep pockets, to target specialized hardware, or to do optimization beyond getting code to compile with a favorite compiler, is really difficult. My follow-on theory is that this is going to create some interesting tension, as hardware developers are increasingly motivated to do these sorts of things while Moore's law runs out of steam.

They look readable to me, certainly much more so than K or Q.

> much more so than K or Q.

For most people that is not a very high bar.

> > much more so than K or Q.

> For most people that is not very high bar.


Out of curiosity, how much experience do you have with APL, J, Q, or K? They aren't like C, so none of your previous programming skills are transferable, but that doesn't mean they are hard to read.

I have programmed some Q professionally. It provides a lot of syntactic sugar, does not support operator precedence and there is a culture of avoiding whitespace. These can make it hard to read sometimes.

That I can agree with.

