The road to OCaml 5.0 (ocaml.org)
166 points by wczekalski on Oct 7, 2021 | 115 comments



I really hope to see more interest in OCaml in the future.

It is probably one of the most underrated programming languages. The perfect marriage between state of the art functional programming and pragmatism. A great static and strong type system. Solid performance and an insanely fast compiler. Also compiles to JS if you need that.

Multicore support will make it quite perfect. Only thing that is holding it back more than that and the reason I have not done many projects with it, is its weirdly fragmented ecosystem.

Having to decide which standard library to use is a pain but you can cope with that. Tooling is getting there but stuff like automatic code formatting solutions are still pretty immature (and have really weird defaults).

Frontend there is that ReasonML/Reason/ReScript thing that Facebook is trying to do. It offers an alternative syntax but nearly nobody uses it because they changed the name and I think also the syntax three times already. So it is all a mess.

Don't let that stop you though. There are some pretty solid mature libraries in OCaml and, if need be, the interop story with C and other languages is solid.


> Only thing that is holding it back more than that and the reason I have not done many projects with it, is its weirdly fragmented ecosystem.

I wonder if that's precisely why people use it. I've been thinking about it, and I think people using OCaml value independence a lot. That's something that doesn't help building a community, since communities often thrive on consensus. As an example of that in the linked thread: Yaron Minsky's second comment about Flambda 2, which I'll copy here:

> And, I should add: Jane Street’s intent to upstream our work is not the same as upstream’s intent to accept it. None of what I’ve said is an announcement on behalf of the core OCaml team, nor am I in any position to make such an announcement!

This comment, to me, speaks volumes in terms of respect for the independence of the OCaml team. And independence seems to be something Jane Street values a lot too. They have lots of libraries that they freely share with other people. If you want to use, in a way, their "flavor of OCaml", you're free to do so. And if you don't want to, you're free to do something else.

You can see the same thing with JSOO, ReasonML/Reason/ReScript and now Melange. You're free to pick what you want. Same thing with the multicore. You want to use it? Great! You don't want to? They are working hard to make sure your code will still work and won't suffer too many performance regressions.

It may be a bit weird if you're used to other communities. I know it took me a long time to understand why things are this way, and I may still be completely wrong. But I think the angle of valuing independence explains a lot, and is also a good way to know if it's a language and ecosystem for you or not.

Another thing that may not help: the book "Le langage Caml" is a great introduction to the language and programming, but sadly it's not translated.


> Only thing that is holding it back more than that and the reason I have not done many projects with it, is its weirdly fragmented ecosystem.

Maybe you are referring to the Async/Lwt dichotomy? Hopefully with multicore (and basic support for "effects" that are going to be merged in OCaml 5.0) this will become less of an issue going forward. Since the runtime is becoming considerably more capable, I expect there to be less "real" fragmentation going forward as the libraries begin to use more of the primitives provided by the runtime rather than building their own from scratch.

But then again, fragmentation is a way of life in other ecosystems too. Haskell has an ever increasing number of effect systems and preludes, Rust has many async runtimes also (async-std, tokio etc.). Fragmentation can often mean a time of competition and vitality as different approaches duke it out.

Regarding syntax -- I feel too much time has been spent in the OCaml ecosystem on surface syntax. OCaml syntax has its flaws but syntax is really a small aspect of the overall art of programming. The OCaml format is here to stay -- even if it is a bit wonky. Once you commit to it you can begin to worry about more substantial things. The ReasonML community brought in the new syntax, but with the departure of Rescript (Bucklescript) from the OCaml community I expect the usage of the new javascript-y syntax to decrease.

(If I may be heretical, I actually prefer the traditional OCaml syntax! ReasonML tries to be like JavaScript with the braces and so forth. I prefer the Haskell/OCaml syntax to the JavaScript/Rust/C/Scala brace syntax. Interestingly, Scala 3 allows a braceless style in an effort to match Python perhaps. Fashion changes. Algorithms and programming patterns endure. We shouldn't worry about the syntax so much -- as long as it is not APL ;-) ! )


> (...) with the departure of Rescript (Bucklescript) from the OCaml community (...)

I wasn't aware anything like that is happening, is it? Rescript departed from Reason, that's all, right? They want to focus solely on js target, because... that's what rescript is. New stuff they're doing looks really good.


The best source of information is probably this thread: https://discuss.ocaml.org/t/a-short-history-of-rescript-buck...


Yes, thank you. I don't see any departure from OCaml there; only a departure from ReasonML is mentioned.


I think the "depart" is not supporting newer OCaml features. While Rescript/ReasonML at first was just an alternative syntax, it has now become a standalone programming language.


I'm sure they'll support new features that apply to rescript runtime context.

Things that don't make sense in javascript runtime context should not be supported.

Backward compatibility on the OCaml side is also something they don't need to care about, as it's ephemeral - it must typecheck and spit out valid javascript code, that's all.


What I meant is that Rescript is not really tied to OCaml as strongly as it was before. The connection before was: ReasonML frontend + Bucklescript compiler backend. Now you just have Rescript which is a rebranding of Bucklescript with a more focussed front-end language (which is a subset of ReasonML).

ReasonML aims to support all of the OCaml syntax while Rescript's aim is to preserve only the syntax that is beneficial for a JS target.

ReasonML syntax can be used to do backend OCaml programming as well as front end programming. Rescript is focussed only on the (browser) frontend and drops some features from ReasonML. It's all quite complex and confusing but the TL;DR according to me is that ReasonML syntax is not going to be as important in the OCaml community as before...


I don't think ReScript is a subset of ReasonML. It's a forked language that already has some substantial differences and will keep diverging.

ReScript is very active. ReasonML is dead for quite a while now. I'm pretty sure ReScript will drop ReasonML syntax soon?


> It's a forked language that already has some substantial differences and will keep diverging.

This is really about opinion and individual assessment. It depends on how much emphasis one assigns to surface syntax vs deep language features.

Yes, the surface syntax of Rescript has been cleaned up a bit. There are some new language features but the language is still "close" to OCaml/ReasonML as of end 2021 according to me. If you're able to understand OCaml/ReasonML today you won't have much problem reading Rescript.

But, yes, it is diverging and it could look very different in the coming years.


I find it really interesting how I had never heard of OCaml before frequenting HN, but how incredibly passionate so many people here are about the language.

It honestly seems like a great lang, and I hope I get to try it out for a project sometime soon.


"I find it really interesting how I had never heard of <THING> before frequenting HN, but how incredibly passionate so many people here are about <THING>" is one of the thing I love about this place. There are always people ready to share their knowledge and passion about something I didn't know before.


> Frontend there is that ReasonML/Reason/ReScript thing that Facebook is trying to do. It offers an alternative syntax but nearly nobody uses it because they changed the name and I think also the syntax three times already. So it is all a mess.

"Nobody" uses ReasonML / Reason.

Plenty of people use ReScript.

I wasn't a big fan of the ReScript split, but I think by now it's unfair to speak of these as if they're one community with a confusing story. I think it's very fair to say it's now two separate communities: ReScript and OCaml. It was very confusing for a while, but by now it actually is much easier to understand than before the ReScript split:

ReScript is really its own language now. The compiler for that language just happens to still understand OCaml syntax, for now. The ReScript language and community is focused on the JS ecosystem, with readable JS output.

OCaml has js_of_ocaml. JSOO compiles OCaml to JS, so it's focused on the OCaml ecosystem. The JS output is not readable but you can build "any" OCaml program.

Really that's the main story -- not so hard to grasp?

There is also melange, but that's a relatively new effort in the OCaml community (attracting OCaml-y refugees from ReScript) whose status I haven't formed a view on yet. The idea is to compile OCaml programs to readable JS. Reason used to do that, but Reason now only has a tiny community (and I believe it now uses JSOO?). ReScript still does that, but using it for that purpose is no longer supported.


> The compiler for that language just happens to still understand OCaml syntax, for now.

I think another important point is that the compiler is a fork of the OCaml compiler. That means that to contribute/maintain the compiler, you need to know OCaml. This is probably going to stay this way for a very long time, since the speed of the compiler is important.

> Really that's the main story -- not so hard to grasp?

What's not helping is that the people around Reason never really said "it's dead, move on to OCaml or Rescript". The pages for things like Reason Native, Esy, ReasonML are still up.


> I think another important point is that the compiler is a fork of the OCaml compiler. That means that to contribute/maintain the compiler, you need to know OCaml. This is probably going to stay this way for a very long time, since the speed of the compiler is important.

Important to the maintainers, absolutely -- but not directly important to users of course: it's firmly the goal that you don't need to know any OCaml to use ReScript (and I think it's mostly true by now).

> What's not helping is that the people around Reason never really said "it's dead, move on to OCaml or Rescript". The pages for things like Reason Native, Esy, ReasonML are still up.

I assume they'd say they're not dead. I think Jordan Walke, for example, didn't want to break his commitment to existing users using it for native projects, so put in effort to help people switch to a JSOO-based Reason (I haven't followed that story, so not sure how that turned out). I think that's good behaviour.

They certainly have a far smaller community than before though, because most people went to either ReScript or OCaml, or kept a foot in both communities for different uses (and no doubt some left entirely).


> Important to the maintainers, absolutely -- but not directly important to users of course: it's firmly the goal that you don't need to know any OCaml to use ReScript (and I think it's mostly true by now).

That's true, and according to the maintainers, with the recent improvements it should be easy to maintain. And using tools written in other languages is not a first for JavaScript users (ESBuild comes to mind).


That's the main story for targeting JS, but reason native is not dead (unless something has changed recently, I haven't been actively working with it for a while).


Just my experience when I was trying out OCaml (I was mostly trying it out for frontend using the bucklescript-tea package) was that most people in that space used the ReasonML syntax. I don't remember my exact issues, but I was trying to avoid using the reason syntax because I preferred OCaml's. There were a couple of hurdles I needed to get over before being able to even really begin. This was probably 3-4 years ago, so things may have changed.


If they can also ship algebraic effects (and even better, typed algebraic effects), then I think it will push the language back firmly into "state of the art". This will mean it continues to get the attention it deserves. I'm excited about algebraic effects, I think they are much more intuitive than monads (and don't require code to be rewritten).


The current plan is to have runtime support for effects in 5.0, but without dedicated syntax or an effect type system. Those two will come later in the 5.x branch. The aim is to decouple the switch to the multicore runtime from the design of the typed effect system. In the interim period, effect handlers would be exposed through an experimental module (see https://discuss.ocaml.org/t/multicore-ocaml-september-2021-e...) to allow early experimentation.


OCaml Workshop 2021 - Experiences with Effects - https://www.youtube.com/watch?v=k3oQwpyXmpo


Exciting stuff! Keep up the good work!


multicore ocaml won't change adoption of the language in any significant way


In some strange way, that is what I like about the language. OCaml is not about amassing the largest possible user base but making a great programming language. Of course, these two objectives are not contradictory but the point here is that OCaml aspires to be a language on which you can write ground breaking and innovative software, not necessarily _popular_ software.

OCaml has an academic flavor -- maybe it's not as academic as Haskell but it moves in similar ways. There is a desire to be correct and have a theoretical framework instead of amassing a ton of language features. OCaml is the foundation for Coq and other interesting compilers, type checkers and theorem provers. Over the years, the language has grown more mainstream and you can build a decent web backend on it today, for instance.

So fine, maybe multicore won't change adoption of the language in significant ways. But I foresee that the introduction of multicore will allow some amazing software to be written in OCaml in the future. Software that is truly groundbreaking and innovative. Take the example of Coq itself -- it is an important foundational software today in Computer Science. Multicore will allow Coq to potentially speed itself up and that will bring more real world applications in the ambit of Coq.


I completely agree, only libraries in popular domains might


Is there a curated list of libs to review?



I really wanted to like OCaml, still do. I gave it a good shot a couple of years ago, wrote a few basic programs and loved it.

But it to me seemed packaged like many languages in the days of yore, when a language shipped simply as a compiler, and nothing more. The way of the world today to me seems to be a compiler, together with a complete standard library and consistent packaging system.

My experience with OCaml was thwarted repeatedly by a byzantine exploration process of packages depending on other packages, which required other packaging systems. Once I reached that point where it felt like I was spending more time figuring out the complex ecosystem, rather than writing code, I rapidly lost interest.

And perhaps such a point comes in exploring any new language. But it came much too early for me in OCaml. I had so much more I wanted to learn, but couldn't. I am hopeful for the new release. Thank you for your efforts, OCaml team.


A couple of years ago, opam was the recommended package manager and dune was the recommended build system–just as today. The opam package index was also searchable for libraries. The OCaml website may have been slightly less clear about these things than it is now, but I think a reasonable user would have been able to find them, especially if they went to the forum and asked. People would have gladly answered questions.


I don't think that's totally fair. The Up and Running page of the OCaml website (https://ocaml.org/learn/tutorials/up_and_running.html) was added during 2020. Before that it lacked a straightforward introduction on what you need and how to install it. Node, Go and Rust all come with the package manager, and Rust even comes with a way of managing the different Rust versions. The essential parts are here, and everything works well, but for new users it lacks polish. You can argue that it would take a lot of time for a community that is a bit short on manpower, and that's true. But in the end the experience isn't as good as with other ecosystems.


Let's take a look at the page around this time two years ago.

Landing page: https://web.archive.org/web/20191002202720/https://ocaml.org...

From which you can click through to the Install page: https://web.archive.org/web/20190819032815/https://ocaml.org...

Over there the second and third lines are:

> The OCaml compiler and libraries can be installed in several ways:

> - With OPAM, the OCaml package manager (recommended).

That points you to the opam install instructions, which look pretty similar to what they do now.

Look, I agree with you that OCaml installation and tooling are not the easiest to get into. But it wasn't like what the GP was making it out to be.


There's no mention of dune on the old page, which sounds like what they were looking for.

> Look, I agree with you that OCaml installation and tooling are not the easiest to get into. But it wasn't like what the GP was making it out to be.

I disagree with your interpretation of the initial post. Notice the "But it to me seemed" and "My experience with OCaml", and also "The way of the world today to me seems to be". These things are important. They inform us that that person is not stating a truth, but sharing an experience, and personal preferences. They are also, I think, a way of saying "I don't know if the installation process was too hard, or if it was me that didn't understand something, but what I know is that in the end it didn't work out."

So at this point, you have multiple options. You can empathize, you can work hard on trying to solve the issues on your end, assuming they are on your end, which might not be the case, you can ignore the post. What, I think, you shouldn't do is to deny this experience and imply that that person is "unreasonable".


Not sure when you tried it, but as a newcomer I have the impression packaging has got a lot better in the past few years.

I didn't have the experience you described (not yet anyway!).


>Hopefully, OCaml 5.0 will then be released between March and April 2022.

Just to call out expectation-setting here in the comments: yes, the MVP of multicore will ship in OCaml 5.0, but OCaml 5.0 will ship no sooner than March 2022 (and very likely some point later, based on how challenging it appears to be to integrate the large-scale changes for multicore).


My favourite quote about OCaml:

"Never have I took so long, to write so little code, that does so much"

OCaml can be a big learning curve, but I urge you to push through. The syntax might not be everyone's cup of tea, but you get used to it quickly.


I really wanted to settle on OCaml as the "real programming language" that I would learn for any "serious programming" I had to do. I couldn't make it stick (in part because I don't actually do any "serious programming") precisely because of the syntax.

There's too little of it! OCaml seems to take a "you don't need syntax except when you need syntax" approach, which I found very destabilizing. One of the major online OCaml tutorials said something like "If it doesn't work the way you expect, try adding parentheses", and I thought "Oh hell no. In a Lisp I know exactly how many parentheses I need: all of them". I prefer not having to think about it, and letting the parentheses become invisible to me.

But otherwise I have a deep and irrational fondness for the language, and still wish I'd been able to make it stick.


On parentheses, this is one of the main reasons why I integrated ocamlformat with my editor: I write explicit parentheses around everything and I let the formatter remove the superfluous ones. No surprises or guesswork that way.


> One of the major online OCaml tutorials said something like "If it doesn't work the way you expect, try adding parentheses"

That sounds like what I did with C++ with * and & when I didn't understand them. Do you think it's a lack of experience/comprehension on your part, or that some parts of the syntax are fundamentally flawed?


Well anything can be chalked up to lack of experience -- I'm sure if I was programming OCaml every day it wouldn't be an issue. Nor would I try to categorically claim the syntax is flawed!

But so much of the syntax, particularly around function calls, is simply a long row of whitespace-separated tokens, and I feel like my brain has to do extra work to parse what's what, and figure out associativity, and constantly remember what the ~ and the ? are doing. This[1] section of the tutorial makes perfect sense when you read it, but that doesn't mean you can easily scan a long function call with a lot of arguments, and instantly see what's happening.

The block-level syntax is great. But it got to the point where if I didn't write/read OCaml for a few days in a row, I forgot how it worked. And that's simply calling a function, nothing esoteric.

[1]: https://ocaml.org/learn/tutorials/labels.html#When-and-when-...


To quote the tutorial you linked:

> The syntax for labels and optional arguments is confusing, and you may often wonder when to use ~foo, when to use ?foo and when to use plain foo. It's something of a black art which takes practice to get right.

So I don't think it's just you.


Labels and labeled optional arguments, they were pretty straightforward to me when I was a newbie. I came from C. OCaml was the type of language in which I just wrote something and it worked on the first, or at worst on the second try without even having to look up the syntax. I would never be able to read Haskell, for what it is worth, even though in OCaml I mix imperative, functional, and OO; I use whichever makes the most sense. In a project I typically use all three.


Dang, it's finally happening. I've been waiting for 10+ years.


I run a hedge fund. On any given day I hear a large number of complaints from the technologists that complex python systems are difficult to look after and we should use something else instead. There's some Rust being used, but there's little chance to get a quant to use Rust to do research because research is an exploratory process and the last thing one wants is a language that requires a lot of thought about lifetimes etc.

How is the python-ocaml interop story? To be clear, any language that does not have first-class interop with python is basically dead in the water (at least for our case).


Have you looked into Julia, Nim, Clojure, or even Common Lisp? I'm not sure about Python interop with CL, but Nim and Clojure seem to have some kind of beta-grade interop, and there's a solid interop story in Julia. And all of those languages have some of their own "native" data analysis and scientific computing toolkits (Julia having more than "some", of course).

That said, complicated Python systems can be improved a lot by adding type annotations. That's more of a solution for web servers and other "easily type-able" applications. Typing support for scientific computing isn't quite there yet. So it depends on what kinds of systems are the complicated ones.


Thank you, we've dabbled with Julia and indeed it works very well. We are just a bit worried about betting the barn on it so to speak. It's still very niche and we are just not seeing the kind of meteoric rise that Rust is exhibiting for example. We would ideally not want to become the sole caretaker of some niche language. Jane Street can afford it with OCaml, but we can't :(

For that reason, Julia is being closely watched, but so far we are not thinking of pulling the trigger.


Link to the interop lib for Clojure you're referring to for people who don't know it: https://github.com/clj-python/libpython-clj

Really a remarkable feat of engineering. Here's its author giving a talk: https://www.youtube.com/watch?v=vQPW16_jixs


I was thinking of linking this as well. Clj-python is just such a fascinating junction. I don't care so much for Clojure, though I'm continuously impressed by the ecosystem and the productivity of its experts. Very cool stuff.

Core.logic and the like opened a huge door and similar ideas have exhausted my free time for several years now.


Julia's interop with Python is excellent: https://github.com/JuliaPy/PyCall.jl (also with R, see RCall.jl). It's just not statically typed, so the original problem is not solved, although Julia is the better language for scientific purposes.

CL could also be great language-wise (https://digikar99.github.io/py4cl2/, https://github.com/snmsts/burgled-batteries3) but I don't know how good the interop is in reality since I haven't tried it.


There is an actively developed python to ocaml interop library for purposes quite similar to yours. I have seen demos where ocaml and python are used within the same jupyter notebook

https://signalsandthreads.com/python-ocaml-and-machine-learn...

https://github.com/thierry-martinez/pyml


Thank you, I'll pass this on. An important feature is zero copy arrays, which seem to be supported.


I personally haven't used it but Jane Street heavily uses OCaml and has written a blog post on this: https://blog.janestreet.com/using-python-and-ocaml-in-the-sa...


I think Elixir would be interesting for your use case.

It's a dynamic, garbage collected language. It's easy to pick up and get going with. As a functional programming language there isn't a lot to learn in the way of language constructs, and you don't even have to do the 'wrestling with the type system' thing that you have to do in compiled functional languages like OCaml or Haskell (like you do in Rust).

Its processing 'horsepower' is probably comparable to Python, but it's much better for building low latency things if you want to run something in a bit more of a production use case. This is also improving due to the recent addition of a JIT.

The addition of NX is making Elixir an increasingly interesting place to do ML - write Elixir, have it run on GPU etc. See https://dashbit.co/blog/nx-numerical-elixir-is-now-publicly-...

Python integration is probably best done using the Erlang 'port' system - running Python as a managed process and communicating with it using messages over stdin/stdout. I use it for C interop and it works well (and fits well with the Elixir/Erlang process model). It's not difficult to roll your own in Python e.g. https://github.com/fujimisakari/erlang-port-with-python/blob... or look at something like http://erlport.org/
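
To make the port approach concrete, here is a minimal sketch of what the Python side of such a port might look like. It assumes the Erlang/Elixir side opens the port with the `{packet, 4}` option, so every message is preceded by a 4-byte big-endian length. The function names and the upper-casing handler are made up for illustration and are not taken from either linked project.

```python
import io
import struct


def read_packet(stream):
    """Read one length-prefixed message; return None at end of stream."""
    header = stream.read(4)
    if len(header) < 4:
        return None
    (length,) = struct.unpack(">I", header)  # 4-byte big-endian length
    payload = stream.read(length)
    if len(payload) < length:
        return None
    return payload


def write_packet(stream, payload):
    """Write one message preceded by its 4-byte big-endian length."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)
    stream.flush()


def handle(payload):
    """Hypothetical request handler: upper-case the request bytes."""
    return payload.upper()


def serve(inp, out):
    """Answer requests until the other side closes the pipe."""
    while True:
        request = read_packet(inp)
        if request is None:
            break
        write_packet(out, handle(request))


# Demo with in-memory streams standing in for the stdin/stdout pipes.
# A real port program would call:
#     serve(sys.stdin.buffer, sys.stdout.buffer)
inp = io.BytesIO()
write_packet(inp, b"ping")
inp.seek(0)
out = io.BytesIO()
serve(inp, out)
reply = read_packet(io.BytesIO(out.getvalue()))
print(reply)
```

Keeping the framing in two small functions means the same loop can be exercised with in-memory streams, as above, before it is wired to a real port.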


Thank you! So this looks interesting but it seems like there's no easy way to share numpy arrays?

The main use case for a language other than python is a more robust codebase but also performance. We need to be able to efficiently ship lots of large arrays between the languages and the Rust-Python interop supports zero copy arrays for example.
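
For readers unsure what "zero copy" buys you, here is a standard-library-only sketch of the idea. It is not the Rust-Python interop itself, just an illustration of Python's buffer protocol, which lets two objects share one block of memory. The variable names and values are invented.

```python
from array import array

# One block of eight float64 values owned by `prices`.
prices = array("d", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])

# A memoryview exposes the same memory without copying it.
view = memoryview(prices)
window = view[2:6]  # slicing a memoryview is also copy-free

# Writing through the view is visible in the original array...
window[0] = 30.0

# ...whereas a list built from a slice is an independent copy.
copied = list(prices[2:6])
copied[1] = -1.0

print(prices[2], prices[3])
```

With large arrays the difference is that the view costs a few bytes of bookkeeping, while the copy costs as much memory and time as the data itself.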


Elixir and Rust are very good friends, so to speak. Writing a library in Rust that you can use from Elixir is only slightly worse than trivially easy.

But I agree somebody has to put in the work.

I've made a good career with Elixir but I still don't think it's a good fit for a hedge fund. IMO invest in Rust.


Ah, no. I'm sure that's buildable in Elixir using a NIF (function built into the VM, in a similar manner to Python modules written in C) but you'd have to implement it, I'm not aware of anything out of the box.


Python type checking (type annotation, mypy) should at least partially solve the problem of maintaining complex Python systems. Though it doesn't help with performance.


The larger problem in my view is that big Python systems tend to follow OOP design since functional programming patterns do not work well in Python. So you start with something minimal and simple inside a script or notebook, but quickly it evolves into something more like a Java code-base.

Typing does help, agreed.


I strongly suggest the Attrs library for cutting down the boilerplate of making small "data classes": https://attrs.org/

With type annotations, you can move away from "inheritance OO" to logicless "data classes" and functions that operate on them.
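
A small sketch of that style, using the standard library's `dataclasses` instead of Attrs so that it runs without extra dependencies. The `Trade` type and the two functions are invented for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Trade:
    """Logic-free record: just typed, immutable fields."""
    symbol: str
    quantity: int
    price: float


def notional(trade: Trade) -> float:
    """Plain function operating on the data, instead of a method."""
    return trade.quantity * trade.price


def with_price(trade: Trade, price: float) -> Trade:
    """Return an updated copy; the original is left untouched."""
    return replace(trade, price=price)


t1 = Trade("ABC", 100, 2.5)
t2 = with_price(t1, 3.0)
print(notional(t1), notional(t2))
```

Because the class is frozen, "updating" a record means building a new one, which keeps the functions free of hidden state and easy for a type checker to follow.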


Java has good to great FP, and making most objects immutable is also trivial. Mypy is good for getting strict typing enforced, but I still prefer a native, slightly wonky type system to a tacked-on one that builds on trust.


> Java has good to great FP

Java absolutely does not have "good to great FP" support. It's an imperative and OO programming language that recently got lambdas, no more, no less.


Yeah 100% - Java lacks many of the features required for serious FP and the ecosystem of libraries is heavily OOP too (although functional wrappers can usually be implemented).


There's sealed classes, records, pattern matching, optionals. It's getting there.


All good improvements, but Java is missing some really key pieces:

- pipe operator (or custom operators in general)

- do-notation

- tail call optimisation

- currying / partial application

- (better) type inference

- expression orientated

- less syntactic noise around function calls (fewer commas and parentheses)

- type-classes or even runtime generic information to work around that

Some of the things I’ve listed can never be added without dramatically changing what Java is. At that point it would be a new language.


I think you're moving the goalposts from "good to great FP support" to "is Haskell". I think that's the only one that fulfills both do-notation and tail call optimisation at the same time. Well, there's also Purescript and Scala.js running on JavaScriptCore since it has tail call optimisation, but that's a bit of a stretch. I see a lot of people like you that seem to have a very precise idea of what FP is supposed to be, and that idea always boils down to "basically Haskell". I think that's a fallacy, and that Haskell is not the only way to do FP. We have now languages like Koka, and soon (ish) OCaml that have effects, and are going to provide an alternative to Haskell's monads. There's also stuff like Scala with the DOT calculus. Lots of exciting stuff is happening in that space, and being a purist of one specific implementation isn't helping.

To get back to the subject of "good to great FP support", maybe we disagree fundamentally. Here are a few examples:

I would say that Rust is a good example of "great FP support". It's not a functional programming language, but it supports a lot of the patterns that make statically-typed functional programming languages effective. However, its DNA is still mostly "imperative programming language".

For "good FP support", I'd say TypeScript and C# are nice examples. C# has LINQ, pattern matching, records, optionals in a way. TypeScript has the usual JS stuff, plus the type system is nice. Both are lacking in some places. For example, in TypeScript/JavaScript, the usual map/reduce/filter are only functions on Array, and don't work with iterators.

To take the opposite view: what would be a functional programming language with good or great support for another paradigm? OCaml could be an example of good/great support for imperative programming. Scala has great functional programming and great object-oriented programming.


I would add F#, which can do Object Programming like C# but has most of the features I listed above.


Works up until every function has a

  def calc_xxx(df: pd.DataFrame) -> pd.DataFrame

type...


I would write principled Python with strict coding standards. Make type annotations mandatory and turn up pylint or flake8 to maximum warnings. It really helps avoid a bunch of silly mistakes, while still providing a way out for doing crazy stuff that Python is good at.


Some of the Python FFI tools are listed here: https://ocamlverse.github.io/content/ffi.html. But clicking through to GitHub, the repos haven't been updated in a while.



I wasn't counting changes to project metadata like gitignore


If your language doesn't worry about lifetimes, they don't go away. It just means you have to worry about them yourself instead.

Sometimes that is great. Other times, that will be very hard and error-prone.


When you are trying to solve a complex optimal payoff problem, you really don't want to get bogged down with lifetimes. That's a completely orthogonal concern to what you are trying to establish. You are not writing production code, you are doing research. It's the core reason why languages with easy REPL and immediate feedback (like matlab, R, python, julia, etc...) are used for research, because you get immediate and interactive feedback. The keyword is interactive.

Once you have to think of types and lifetimes, a lot of the productivity goes down the drain.

99% of the stuff you do in research ends up on the cutting room floor because it doesn't work. The 1% that ends up being useful is the only part worth productionizing.


Most of my day-to-day coding is in TypeScript and I often find myself wondering if more than a few jobs wouldn't be easier and faster with plain JS and no need to feed the type checker. Many of the same thoughts about you vs the language tracking things apply here too.

In my case, I'd say that yes, I have to track the types myself, but the tradeoff is at least sometimes worth the extra mental overhead in my opinion. I'd say that the same can be true for lifetimes as well.

In your case, you will definitely be tracking lifetimes on some level and to say otherwise is going to be false (even GC'd languages must track lifetimes to ensure garbage is eliminated). The question is about the mental tradeoff vs the time taken. I'd guess that you are correct in your assessment. My only real point is that there is a cost that should be considered.


You are correct, lifetimes are being tracked in some form at all times. So let me rephrase: lifetimes are irrelevant to the primary work of a quant during research. The objective is to establish whether an idea works. In 99% of the cases, it doesn't and the code and the project are a dead end. Under these circumstances you need to reduce the amount of cognitive overhead that goes into program structure to a minimum.

Research is fundamentally different from the usual programming exercise. Research is like prospecting. You want to try as many different locations as possible. You don't want to build nuclear shelter grade construction at every potential site because 99 out of 100 are duds. You'd never find anything.

You want a tool that allows you to get quick results to confirm if a site has potential and then you want to be able to scale your tools for proper mining.


An extremely simple thing like having two objects stored in a struct where one object has a reference to the other is a Herculean task in Rust. This is not a language designed for prototyping...


This is perhaps where I get hate from both sides, but I think such cases are best done in unsafe code blocks.

A tool that makes something much harder without any meaningful gain should be avoided. Rust provides the tools to not have to fight the system and they should be used in these situations.


That's what my solution is. I am not going to get into pins and get cargo crates to solve such a simple problem. I just resort to unsafe. But then at that point...C++ is easier for me.


You can easily avoid such constructs in most cases though.


I've allocated both these things on the heap and Rust simply won't let me store them both together. I don't know about you but this is an extremely common pattern in almost all languages, you just don't think about it in the gc languages and in C++ storing pointers is no issue. The popularity of crates like rental also shows that it's not as easily avoidable as you suggest.


Can you post a simple code example? It is hard to imagine what the difficulty is.


You can read through this SO question and the top response does a good job explaining the possible solutions: https://stackoverflow.com/questions/32300132/why-cant-i-stor...


Seems like using Box solves this issue fairly simply.


It's not the language, it's the people. Instagram runs almost entirely on Python; if they can, so can you. https://instagram-engineering.com/tagged/python


This is a terribly misinformed take. If you throw enough resources at Python then sure, you can probably get adequate throughput. The problem is that in finance a lot of problems require you to think about latency, which is a total non-starter for Python.


If it were a total non-starter, then why would their entire company be using it?


It's not clear which company you're referring to, but

- Instagram was at one point a startup. It's common for startups to write in a scripting language to optimize for speed of adding features. Then if you grow or get acquired, the scripting language often eventually gets replaced by a language more optimized for maintenance cost, safety, and/or speed.

- Data scientists often only really know scripting languages, and at any rate scripting languages are useful for prototyping algorithms that need to change daily. Hence a lot of hedge funds use Python. For code that is stable and really matters for performance purposes, it's common for funds to use C/C++ or even FPGAs.


I have no doubts that we are able to handle our requirements with python, but if there's a better way, does it not make sense to investigate?

A skilled carpenter can undoubtedly use a hammer instead of a screwdriver. This doesn't mean that I should insist they use a hammer when a screwdriver would do a better job.


Sounds like they're already using Rust where performance matters and are looking to switch


Instagram has much lower latency requirements…


I believe they’re referring to the hedge fund. Lower latency requirement is a confusing way of phrasing it as lower latency is a higher bar. But generally most hedge funds do not need to operate at the speed Instagram does.


So Instagram runs on Python? That only proves that the Instagram team can build Instagram in Python. How does that help me with my technology choices?


> last thing one wants is a language that requires a lot of thought about lifetimes etc.

I challenge you: A lack of understanding about the data lifetimes in a program means lack of understanding about the data.

Not saying you can't have a lot of short-lived data items that you don't want to manage one-by-one. I'm saying that for the vast majority of data items, one should be able to give a reasonably well defined lifetime upper bound. So a good solution is to make a few boxes that group items by lifetime. And from time to time, throw the outdated boxes away.

And of the few items that don't have such an upper bound at creation time, many can be created in a special box that allows migrating boxes later when required.
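A minimal sketch of the "boxes" idea described above, assuming a hypothetical `Box` class (the name and methods are illustrative and not taken from any library). In Python, discarding a box only drops the references and leaves the actual reclamation to the garbage collector; in a language with manual memory management the same shape would free the whole region at once.

```python
class Box:
    """Hypothetical container whose items all share one lifetime."""

    def __init__(self, name):
        self.name = name
        self.items = []

    def add(self, item):
        # Register an item with this box's lifetime and hand it back.
        self.items.append(item)
        return item

    def migrate(self, item, target):
        # Move an item whose lifetime turned out to be longer than expected.
        self.items = [x for x in self.items if x is not item]
        target.add(item)

    def discard(self):
        # Drop every item in the box in one step.
        self.items.clear()


session = Box("session")   # long-lived box
request = Box("request")   # short-lived box

request.add({"tmp": 1})
cached = request.add({"result": 42})

# The result needs to outlive the request, so it changes boxes.
request.migrate(cached, session)
request.discard()
```

After `request.discard()`, only the migrated item is still reachable through `session`.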


> A lack of understanding about the data lifetimes in a program means lack of understanding about the data.

But this argument can extend forever.

Is your program precisely dependently typed? If not is that a lack of understanding about the nature of the data as well and should you challenge yourself to fix that?

You have to trade-off how much you specify things with how valuable it is to get the result more quickly.


What you say is true. I only brought up "boxes" because the concept is still not widely known.


You don't have to challenge the person you're responding to. You have to challenge their quants. And they're not going to want to add that into the million other things they're thinking about while doing research in a Jupyter notebook or something.

You're just not going to get this buy-in from people who want to use a tool to get their work done.


Thanks, but I think we may be talking at cross purposes here. 99% of the research code ends up being thrown away (well, archived). Not because it's bad code necessarily, but because the idea that was being prototyped is a dead end. This means it's paramount that the language you use is as low friction and interactive as possible.

Imagine you are trying to establish whether there's a relationship between timeseries X and timeseries Y. You just want a tool that allows you to quickly calculate some summary statistics of these timeseries, clean them, convince yourself that they behave according to your expectations and then run some form of regression.

Nowhere in this process do you care about lifetimes. It's literally irrelevant. In fact, as long as all your work fits into memory, you don't even care about memory management. Your objective is to answer the primary question, everything else is a costly distraction.

The 1% of ideas that ends up being worthwhile is what gets productionized and needs to be robust. But obviously rewriting everything from language A to radically different language B adds its own headaches.
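For illustration, the kind of throwaway exploration described above (clean two series, look at summary statistics, run a regression) might look roughly like this. It is a standard-library-only sketch; the data and the helper names `clean` and `ols` are made up for the example.

```python
from statistics import mean, stdev


def clean(xs, ys):
    # Keep only the observations where both series have a value.
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def ols(xs, ys):
    # Ordinary least squares fit of y = slope * x + intercept.
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Made-up series with gaps; the complete pairs satisfy y = 2x + 1.
x = [1.0, 2.0, None, 4.0, 5.0]
y = [3.0, 5.0, 7.0, None, 11.0]

cx, cy = clean(x, y)
print("x: mean", mean(cx), "stdev", stdev(cx))
print("y: mean", mean(cy), "stdev", stdev(cy))

slope, intercept = ols(cx, cy)
print("slope", slope, "intercept", intercept)
```

Nothing here involves ownership or memory management, which is the point being made: the only question is whether the relationship holds.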


This is a great explanation about concurrency and parallelism and where multicore fits, FYI https://discuss.ocaml.org/t/multicore-ocaml-vs-thread/5838/1...


I write a lot of Scala for a living, and OCaml looks a bit outdated to me. Having said that, the OCaml compiler is one of the greatest miracles in PL when it comes to speed vs complexity of the language. Scala/Haskell/TS are not even close. I hear OCaml's runtime performance is not too shabby either.


What do you find outdated about it?

> Having said that, Ocaml compiler is one of the greatest miracles in PL when it comes to speed vs complexity of the language. Scala/Haskell/TS are not even close.

Someone will probably come correct me but what I've heard is that the compilation speed partially comes from the Pascal/Modula-3 influence, since Niklaus Wirth took compilation time into account when designing programming languages. From what I understand, OCaml doesn't allow circular dependencies outside of a single file, and that helps. Go doesn't allow them either, and is also known for its compilation speed.


The OCaml module system and its separation between interface and implementation are indeed inspired by Modula-3. And the OCaml compiler is built to be able to compile each compilation unit while knowing only the types of its direct dependencies. This helps with both separate compilation (you don't need to recognize cycles, nor do you need to know anything about the implementation of your dependencies) and incremental compilation (you can minimize the number of components to rebuild if only an implementation changed and not an interface). It is surprisingly easy to break this property, for instance by requiring global knowledge of all types involved in a program during compilation, or by only compiling monomorphic functions.
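A toy model of the incremental-compilation point above, sketched in Python. This is not how dune or the OCaml compiler actually tracks dependencies; the function `units_to_rebuild`, the unit names, and the interface strings are all hypothetical. It only illustrates the rule that a changed implementation triggers a rebuild of that unit alone, while a changed interface also triggers a rebuild of its direct dependents.

```python
def units_to_rebuild(old, new, deps):
    """old/new map unit name -> (interface_text, implementation_text);
    deps maps unit name -> set of direct dependencies."""
    missing = (None, None)
    changed_iface = {n for n in new if old.get(n, missing)[0] != new[n][0]}
    changed_impl = {n for n in new if old.get(n, missing)[1] != new[n][1]}

    # Any unit whose own source changed must be recompiled.
    rebuild = changed_iface | changed_impl

    # Dependents are recompiled only if an interface they rely on changed.
    for unit, direct_deps in deps.items():
        if direct_deps & changed_iface:
            rebuild.add(unit)
    return rebuild


old = {
    "Parser": ("val parse : string -> ast", "let parse s = ..."),
    "Eval": ("val eval : ast -> int", "let eval a = ..."),
    "Main": ("", "let () = ..."),
}
deps = {"Parser": set(), "Eval": {"Parser"}, "Main": {"Eval"}}

# Case 1: only Parser's implementation changes.
impl_only = dict(old, Parser=(old["Parser"][0], "let parse s = (* faster *) ..."))

# Case 2: Parser's interface changes.
iface_too = dict(old, Parser=("val parse : string -> ast option", old["Parser"][1]))

print(units_to_rebuild(old, impl_only, deps))
print(units_to_rebuild(old, iface_too, deps))
```

In the first case only `Parser` is rebuilt; in the second, `Eval` is rebuilt as well, while `Main` is left alone because the interface of `Eval` did not change.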


Thank you for the explanation.


That's funny because the Scala 3 new syntax seems to copy quite a bit from OCaml.


Out of curiosity, what problems does Haskell's compiler(s) (I think it's really just GHC these days?) face that OCaml's doesn't?


Speed. The OCaml compiler is probably as fast as the Go one.


GHC has many backends and there is GHCi as well.

No need to always go through the slowest path.


The regular backend is the fastest according to their docs: https://downloads.haskell.org/~ghc/latest/docs/html/users_gu.... Having an interactive toplevel isn't a substitute for fast compilation (and OCaml has both).


Agree - there are some aspects to OCaml that feel a bit outdated but the language has been trying to refresh itself over the last few years. With multicore (and a minimal version of effects) in OCaml 5.0, certain aspects of OCaml will become state of the art again. This is just the start though -- lots of interesting features (around effects especially) should land in the future.

You mention that you write a lot of Scala for a living -- just as a friendly (and intended to be a light hearted) riposte, some aspects of Scala strike me as "long in the tooth" too. With Scala 3 the language has done an admirable job to modernize but I find:

- The language feels heavy and (unnecessarily) "enterprise-y" -- reminiscent of the early 2000s rather than 2021

- The JVM is capable and performant, no doubt, but adds another heavy-weight and monolithic feel to the Scala platform. (Scala Native is likely to remain minuscule for years to come.)

- The language veers towards a C++ style "I will have every PL feature." Sometimes less is more

- A Scala IDE (metals or JetBrains) feels clunky. sbt is over engineered and slow and given how important it is to Scala, does not give a good overall impression of the Scala platform

- Some questionable language features like implicits remind me of magic in Ruby (implicits are addressed in Scala 3 but I wonder how many years the ecosystem will have to deal with their complications -- forever??)

- The JVM seems to let down Scala in other places. Example (a) Null is rarely used in Scala but it could still pop-up in weird situations and not always because of Java interop. (Scala 3 tries to fix this via "explicit nulls" but there are compromises with that feature also). (b) A Functional style Scala (Cats and others) is popular. But true functional style has a lot of recursion. This, according to me, requires proper tail call support in the runtime which the JVM will never have. The Scala compiler tries to be smart but I wonder if it is able to deal with tail calls without blowing the stack in _all_ situations. In other words, it is difficult to do a "Haskell" on the JVM -- which we can see in a lot of places in the Scala ecosystem.

(BTW, I have pointed out some flaws of Scala but notwithstanding my criticism, Scala has many good features that make it worthwhile. I may use it for a future project, let's see...)

> Having said that, Ocaml compiler is one of the greatest miracles in PL when it comes to speed vs complexity of the language.

I totally agree with the statement. It's a very balanced language in all important parameters: a high level of programming abstraction is possible, the LSP language server is responsive, the dune build system is great, compile times are really minuscule and run-time performance is great for a garbage collected language.


I disagree with everything you said about Scala, except your point about the JVM :) but obviously I am biased. WRT the JVM, pure FP recursion (beyond simple tail call elimination) relies on trampolining, which is a whole other can of worms. Stack-safe, but with heavy performance penalties.
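For readers unfamiliar with trampolining, here is a minimal sketch of the technique. It is written in Python rather than Scala, and the names `Done`, `trampoline`, `is_even` and `is_odd` are illustrative only. Instead of making a recursive call directly, each function returns a thunk, and a loop runs the thunks one at a time so the call stack never grows.

```python
class Done:
    """Marks a finished computation and carries its result."""

    def __init__(self, value):
        self.value = value


def trampoline(step):
    # Keep bouncing: call each thunk until a Done value comes back.
    while not isinstance(step, Done):
        step = step()
    return step.value


def is_even(n):
    # Return a thunk instead of calling is_odd directly.
    return Done(True) if n == 0 else (lambda: is_odd(n - 1))


def is_odd(n):
    return Done(False) if n == 0 else (lambda: is_even(n - 1))


# 100,000 mutually recursive steps would exceed Python's default
# recursion limit if written as direct calls.
print(trampoline(is_even(100_000)))
```

Every step allocates a closure and goes through the driver loop, which is where the performance penalty mentioned above comes from.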


Even Odersky regrets the abundance of curly braces, now that Python is eating the world.


> - The language veers towards a C++ style "I will have every PL feature." Sometimes less is more

Do you still feel that way with Scala 3? From what I understood, the work on the DOT calculus helped reduce and simplify the core of the language.


Yes, part of the reason why I am generally excited about Scala 3 is because of the work on the theoretical underpinnings of the language on the DOT calculus.

Unfortunately I don't know much more about this other than "this is a Good Thing" and has helped/will help with dealing with edge cases in the language and the compiler, better type inference etc.

But Scala 3 is still overwhelmingly compatible with Scala 2.x (which is required because of the tons of legacy code out there). Given that Scala 3 continues to be essentially the same language as Scala 2, the overall complexity of the language has not gone down very much even though the core of the language is now more consistent.

Put another way, the emergent complexity of the (tad more uniform) building blocks of Scala 3 still needs to be tackled by programmers.

I also want to point out that Scala 3 compilation speed is supposed to be faster but generally speaking the compiler is still slow-ish.

All in all, Scala 3 is more compelling than before. I may still adopt it in the future for a project. But I'm not as starry-eyed about it as some others may be...


That's great. It's a terrific language.

Python lets me start programming quickly. ML lets me finish quickly.


A lighthearted but very true remark: OCaml is a wonderful language.


<Obi Wan's voice> "OCaml... That's a name I haven't heard in a long time..."



