New implementation of Git in OCaml (github.com/mirage)
245 points by tosh on Aug 26, 2017 | 124 comments

I've been using OCaml for a couple of side projects, and I have been absolutely blown away by the amount of power this language provides. It strikes a nice balance, offering high-level expressiveness without sacrificing performance.

I feel like OCaml is one of the programming world's best-kept secrets - if it had better tooling and better marketing, it could have taken over the world. I'm cautiously optimistic about ReasonML for this reason.

Version incompatibility is a big problem for OCaml. Every time I try to use it seriously, the recommended tools never seem to work together on anything but the latest version. It's like Rust, but without the "young language" excuse. My impression is that INRIA is too eager to add features and tweak the language.

Next, everything builds so slowly, and just trying to set up Core, utop, and ocp_indent is a trial in patience, watching the same dep build again and again, confusion about OPAM switches.... nope nope. OCaml/OPAM has got a distribution problem. They've pushed the complexity off to the end user and the experience is terrible. Jane Street planting their flag on the language and encouraging massive dependencies like Core, or shit like "corebuild" is even worse. I would never depend on anything OCaml-based unless it could be installed by my OS' package manager (Arch/AUR doesn't count, because it builds). That rules out most popular OCaml tooling, and ruins the development experience.

I'd prefer OCaml to Haskell, I find it more practical, but I feel much better depending on Haskell-based software, which generally "just works." As much as they're a wart, Haskell syntax extensions are a real benefit. You can build new fancy Haskell software with older GHC versions.

> the recommended tools never seem to work together on anything but the latest version

Whenever a package is added to the repository, the CI tests (the latest compatible version of) every package that depends on it to check everything still works. However, non-current packages may break if they're missing upper bounds (it doesn't check all previous releases).

However, since opam-repository is just a Git repo, you can clone a known snapshot of it at some point in time and keep using that. That allows you to continue using old versions of packages without the risk of new software causing unwanted upgrades.

> My impression is that INRIA is too eager to add features and tweak the language.

That's strange. I've always had the opposite impression - that they maintain compatibility at all costs, including leaving sub-optimal APIs in the standard library.

> Next, everything builds so slowly, and just trying to set up Core, utop, and ocp_indent is a trial in patience, watching the same dep build again and again

The Core alternative standard library is pretty huge, indeed (I don't use it). I just tried installing all the tools you mention in a fresh container:

$ time docker run ocaml/opam:debian-9_ocaml-4.05.0 opam install core utop ocp-indent


So, 6 min to set up a dev environment with those tools. That installed 67 packages, so compilation took about 5 seconds/package on average. I'm not sure why it would build the same dep twice - I haven't seen it do that.

"> My impression is that INRIA is too eager to add features and tweak the language.

"That's strange. I've always had the opposite impression - that they maintain compatibility at all costs, including leaving sub-optimal APIs in the standard library."

When I started using OCaml, many standard library functions weren't tail recursive---you couldn't get the length of a list with more than ~10,000 elements, for example. The library definitely seems like an afterthought compared with the language. (And that would be why there are so many of them.)
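To make the tail-recursion point concrete, here is a minimal sketch. It is not the standard library's actual code, and `length_naive`, `length_tr`, and `make_list` are hypothetical names chosen for illustration. Whether the naive version really overflows at a given size depends on the OCaml version and the stack limit, so the sketch only runs the tail-recursive one on a large list.

```ocaml
(* Naive length: each call leaves a pending "1 + ..." on the stack,
   so stack depth grows linearly with the list. *)
let rec length_naive = function
  | [] -> 0
  | _ :: tl -> 1 + length_naive tl

(* Tail-recursive length: the accumulator carries the running count,
   so the recursive call is the last action and the stack stays flat. *)
let length_tr l =
  let rec go acc = function
    | [] -> acc
    | _ :: tl -> go (acc + 1) tl
  in
  go 0 l

(* Build [1; 2; ...; n] with a tail-recursive loop as well. *)
let make_list n =
  let rec go acc i = if i = 0 then acc else go (i :: acc) (i - 1) in
  go [] n

let () =
  let big = make_list 1_000_000 in
  Printf.printf "%d\n" (length_tr big)
```

The accumulator style is the usual way to make such functions safe on long lists; the cost is that functions which build a list (like a map) end up producing it in reverse and need a final `List.rev`.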

Why would you hold 10,000 elements in a linked list?

That's just - for example - every point in a 100×100 Cartesian grid in R^2. 10k elements is not by any measure a 'large number of elements'.

Just pick your algorithmic use case for a good excuse to use list and not an array from here:


When you first design your program, you might indeed use a different data structure than a linked list when you anticipate 10k+ elements.

But often programs are used long after they are designed, and it is great if they can degrade gracefully as you move outside their original design parameters.

I couldn't hold them in my cupped palms.

> However, non-current packages may break if they're missing upper bounds....

Out of curiosity, how feasible is it for opam packages to use lockfiles like npm does nowadays? Then packages which depend on some other package will continue to build against the exact same version they always did, until and unless they are explicitly upgraded to the latest version of that dependency.

> I'm not sure why it would build the same dep twice - I haven't seen it do that.

Maybe https://discuss.ocaml.org/t/why-does-opam-recompile-so-much/... provides a clue to that.

I largely agree with you on OCaml, which was one of my favorite languages for quite a while and still is somewhere near the top. Last I checked, there were 3 (?) standard libraries, counting Core and the OCaml release.


"I'd prefer OCaml to Haskell, I find it more practical, but I feel much better depending on Haskell-based software, which generally "just works.""

I take it you haven't been bitten by cabal? It's a nightmare every time I touch it.

On the other hand, I've been using stack (https://docs.haskellstack.org/en/stable/README/) with hakyll, which seems to improve the situation dramatically. It's just that first build for a new project that's very slow.

`cabal new-build` basically fixes the problems you have had with Cabal, and the development experience with Haskell under Nix(OS) is pretty great too.

I think a first build with cabal is also very slow.

Haskell could solve that with binary distributions - but that requires more resources than are devoted to it, it seems.

I've been working on a side project in OCaml and I can totally relate. I'm an OCaml old timer and I find amazing the amount of development that has happened recently. There's a lot of ongoing development in the libs and the surrounding tools (more so than in the language). I've spent a lot of time just to set up my environment and I had to pin several packages to their development version to make things work (jbuilder, merlin, ppx...). Moreover, a lot of these tools lack proper documentation, and it's difficult to get answers to your questions since it's a very small community.

I'm also an OCaml old timer and I think I can relate too. I believe the recent tooling changes are going in the right direction and will eventually fix several of these problems, for example:

There's a push to remove "optional dependencies" which are the reason why opam dependencies rebuild again and again: http://rgrinberg.com/posts/optional-dependencies-considered-... For example in the Mirage project we've been working on this https://discuss.ocaml.org/t/ann-major-releases-of-cohttp-con... but it has caused some breakage here and there.

jbuilder (from Jane Street) is excellent: expressive, easy to understand, builds packages extremely quickly, is actively developed, has minimal dependencies and a lovely manual http://jbuilder.readthedocs.io/en/latest/ It takes care of generating boilerplate for other tools like merlin (which to be honest I never got around to manually configuring). There's also work to integrate it with utop https://github.com/janestreet/jbuilder/issues/114

jbuilder also supports building multiple libraries in one big source tree, so we could switch to a package lockfile model: the author uses opam to create a solution to the package constraints and checks in the specific versions known to work, the build clones the dependency sources and jbuilder builds it all simultaneously. I'm keen to try this on one of my larger projects so that "git clone; make" just works, irrespective of where the host OCaml comes from.

PPX syntax extensions depend on specific compiler versions, so when (for example) homebrew updates to OCaml 4.05 you might find that extensions you need have not been ported yet. ocaml-migrate-parsetree aims to fix this problem http://ocamllabs.io/projects/2017/02/15/ocaml-migrate-parset...

There's obviously still plenty of work to do, but I think things are improving!

Yes, I agree that things are going in the right direction.

But a lot of things I'm trying to use aren't quite there yet. You mention jbuilder and the generated .merlin... it just doesn't work with current opam packages (at least not with ppx - I'm not doing anything fancy, just a basic project with core). To fix this, I had to learn about opam pinning features, and then I had to carefully choose the jbuilder commit that works (the one after the bug has been fixed, but before it breaks the compilation of bin_prot that I also use) and so on... As for Merlin, it's great, but the documentation is minimal. When it doesn't work as expected, it's hard to find where to look.

Core and Async are great, but besides 'Real World OCaml' that is a bit outdated and insufficient, there is little documentation and not much help on Stack Overflow.

That being said, I'm not complaining and I'm grateful to the guys developing these tools. But I think newcomers should expect some difficulties if they want to do anything serious using these most recent tools.

> It's like Rust, but without the "young language" excuse.

How so? Stable Rust has been nothing but 100% solid for the past 1+ year I've worked with it.

Tools depend on Rust nightly. My points are more about the respective ecosystems than the core languages. For instance, getting up and running with RLS is a real hassle (in my experience, but I'm sure it's improving).

RLS is a pretty new thing and is very much a nice to have but not in any way required. I haven't used it until about a month ago. Racer worked just fine on Stable.

I dunno, maybe you had a bad experience but considering how well Cargo and the stable compiler work I don't think it's fair to categorize Rust as never working correctly.

There's a path to RLS, and eventually Clippy, both working on stable, and to be distributed via rustup. The work isn't quite done yet. I'm guessing ~3 releases for RLS, counting the release next week.

opam is pretty new. When I looked at OCaml a few years back for my PhD work, it didn't exist, so there's some churn in the toolchain.

I think you're overstating the pain though. Even in other languages, best practice demands you pin a library version so you wouldn't have some of those issues.

OPAM was released in 2013 and I recall it being mainstream by 2014. Maybe I am overstating the pain, but I've jumped through hoops for plenty of languages, and OCaml stands alone here. As somebody who's pretty good on the command line and at solving installation issues, the thought of a junior developer trying to get started with OCaml gives me shivers (probably another impetus for Facebook's Bucklescript).

> Facebook's Bucklescript

I think you mean FB's ReasonML. Buckle is Bloomberg's OCaml-to-JS compiler.

OCaml is great in itself, but in practical terms, it is only great for limited dependency projects (this however doesn't mean limited complexity--a Perl 6 implementation called Pugs was once written in OCaml. FFTW's code generator is also written in OCaml).

In many projects where one has to "stand on the shoulders of giants", OCaml has a limited ecosystem to draw from.

A practical compromise is F#, which is an ML language for the .NET platform. And even with Microsoft's heft behind it and a rich .NET ecosystem, F# is still a niche language used in finance and a few other specific domains.

A functional language "taking over the world" is a tough proposition. That usually requires a strong use-case (like scientific computing--and now data science--for Python) and fungibility of expertise, which in turn calls for a language with a very low barrier to entry.

Minor correction: `pugs` was written in Haskell, not OCaml.


You are right. Thanks for the correction.

You are right about the difficulty of a functional _language_ taking over the world anytime soon. But as a consolation prize we are seeing functional features making it into more and more mainstream languages.

Even people using eg C++ no longer look at you too funny if you explain that you prefer all your variables to be 'const'.

F# is even less used than OCaml and Microsoft lost interest in it.

If you are into functional programming, you should rather choose Scala. Huge ecosystem, widely used in production. Compatibility with Java means you can have all the libraries you want. A lot of people love to trash sbt, its build system, but for me it always works as expected. And the other tooling is just awesome (you even have IDEs).

According to members of the .net team, one should embrace f#: https://blogs.msdn.microsoft.com/dotnet/2017/05/31/why-you-s...

> Microsoft lost interest in it.

Do you have any sources for this?

F# clearly lags behind C# in editor attention and integration into the latest and greatest GUI techs from Microsoft...

For many on the MS treadmill this is seen as a negative, as you can't always get visual studio RCs and be months ahead of the competition on the next version of blend/WPF/silverlight/whatever.

For others, the community driven aspect and lack of obsessions with technical fashion in the MS product line (with the track record they have...), is seen as a _major_ feature.

F# has a much more active open source community and presence than C#, most of the devs are actively cross-platform in a way C# still struggles with, and it has a better .net stack than C# aimed at command line efficiency and relatively stable approaches to common problems.

F# is still a first class language from MS, and would be quite OK on its own these days. Regardless, there is no chance MS will "lose interest" in F# given the impact on data science, machine learning, cloud computing, parallel computing, and stream-processing F# has. MS invested heavily in filling out their language portfolio to avoid being excluded from the big servers and big clusters. Time has shown those concerns to be growing, not shrinking :)

Have you seen https://github.com/BuckleScript/bucklescript ?

Its output JS is fantastic.

OCaml even does relatively well compared to Haskell: OCaml is a much simpler language with more predictable resource usage (space and time), but still gives a sizable fraction of Haskell's power.

If you like Reason, you might also be interested in BuckleScript.

This is exactly my experience. In fact, I moved from O'Caml to Haskell, but I still miss some things from O'Caml. (The most important of which is a good module language... but I think the Haskell world is getting that, eventually, via Backpack.)

On a micro-level, or-Patterns are also pretty nifty. There's nothing technically keeping Haskell from getting them, it's just a problem of how to fit them into Haskell's already crowded syntax.
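For readers who haven't met or-patterns, a small sketch of what they look like in OCaml. The `token` and `shape` types and the function names are hypothetical, made up for illustration only.

```ocaml
type token = Plus | Minus | Times | Divide | Num of int

(* An or-pattern lets several constructors share a single branch. *)
let is_operator = function
  | Plus | Minus | Times | Divide -> true
  | Num _ -> false

type shape = Square of float | Cube of float | Point

(* Or-patterns can also bind variables, provided every alternative
   binds the same names at the same types. *)
let side = function
  | Square s | Cube s -> Some s
  | Point -> None
```

Without or-patterns, `side` would need two branches with identical right-hand sides, which is the duplication the comment above is missing in Haskell.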

> if it had better tooling and better marketing

It would be known as F# :)

Project tooling and standard library are the two things keeping me from using it. If they built Cargo for OCaml and improved the standard library, I think people would flock to it. Although I never seem to quite get used to the syntax...

Maybe you'll have better luck with ReasonML syntax?

Yeah, I came to OCaml by way of reason, and I preferred reason's syntax, but the tooling and documentation were still very unpolished. The syntax niceness didn't do enough to justify sticking with it. I really like Rust's syntax and tooling (best of any functional/ML IMHO), I just don't want to spend all my time thinking through moves and borrows and lifetimes.

Tooling is getting better! Reason/BuckleScript recently shipped Elm/Rust-like errors: https://reasonml.github.io/community/blog/#way-way-waaaay-ni...

Docs are more of a WIP right now, but I'm confident they'll get a handle on them: https://reasonml.github.io/api/index.html

I'm happy to hear it. My grievances were with project tooling (and documentation thereof). It was hard to figure out how to compile a multifile project, add a dependency, etc. Some of the problem was that there was a lack of documentation about how to use the OCaml tools to build reason projects, and the other part of the problem is that OCaml project management leaves a lot to be desired.

I'm very happy to see the standard library documentation is available for Reason, that alone tells me the Reason community cares about improving the experience for new developers (which is not the impression I get from the OCaml community).

Not the parent, but ReasonML's syntax is actually not great. They fixed a bunch of pain points with OCaml but then added their own idiosyncrasies to make it look more like JS

They actually thought the JavaScript-like syntax through very carefully, trying to balance several factors (like JS appeal, future-proofing, simplifying, etc.). See https://www.reddit.com/r/reasonml/comments/6v2olv/new_syntax... for a quite informative summary of the changes.

> I feel like OCaml is one of the programming world's best-kept secrets

A little bit exaggerated... In France we learn this language in a lot of post-secondary universities / engineering schools

Do many French businesses use it?

Not sure :/ : startups seem to follow the SV technology hypes just like everyone else and well-established businesses like Airbus, Thales etc. use languages like Ada or Java

Threading is the one thing that's missing for me, in OCaml; they may have added it, but it wasn't there last I checked.

You have light threads (Lwt or Async) that are sufficient for most purposes.

SML/CML might interest you.

http://cml.cs.uchicago.edu/ -- the CML idea

Three SML implementations with CML support




Poly/ML is the only "mainstream" implementation (and the only one on your list) with OS thread support. The rest are green threads. Various research offshoots from MLton have existed with full support for OS threads but none are still around and they haven't made it upstream. For the most part, the Standard ML native thread situation is not much better than OCaml's.

You're forgetting SML#, which uses the MassiveThreads library and despite the name isn't for .NET.




It was conceived at Tohoku University for use in Japan's high performance computing projects. SML# uses LLVM as a backend, has a nice FFI and EDSLs for SQL and JSON and a non-moving GC. All stemming from the use case it's designed for.

Agreed. I'm always looking for excuses to use it or F# at work but they rarely come up.

This implementation is freely licenced enough that the BSDs and others that don't use Git for license reasons could use it.

It would be interesting to see an alternative implementation like this get enough feature parity that projects like the BSDs could migrate and still be protocol and worktree compatible so most users could continue using the GPL2 implementation.

Edit: Seems people here are unaware that the BSDs treat GPL software like cooties, see e.g. [1]. They wouldn't start using source control that gave them less freedom than SVN, and with CVS there's the expectation that OpenCVS might get finished.

1. https://wiki.freebsd.org/GPLinBase

"with CVS there's the expectation that OpenCVS might get finished."

Good lord. Surely, no one is working on reimplementing CVS in 2017?

The CVS repository (https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/cvs/) seems to actually have quite a bit of recent activity, so it seems to be a serious project that someone intends to deploy in the real world. There are a handful of things that the OpenBSD project does that I feel are quixotic (continuing to use CVS for everything, in general, is one of them), but this one takes the cake. CVS was good when compared to what came before (and in some regards even compared to Subversion), but there's no comparison with git (or any of the modern DVCS, really).

Unfortunately OCaml itself has its own weird licensing issues being distributed under a combination of the QPL and GNU licenses.

Every few years I take a look at it, notice the license and move on. There's no reason whatsoever for a language not to have an implementation under a permissive license.

The QPL is not used anymore. The OCaml compiler is GPL, the OCaml standard library is LGPL with the OCaml linking exception and the license of produced binaries is whatever the author wants.

If you still have issues with that .. I would really like an explanation.

People just want reasons to complain about languages, see also: racket, when they have nothing to complain about they pick the license.

No, there's no reason to avoid the GPL for languages, because you aren't building a derivative work of the language itself, so its licensing has no bearing on your software.

That's not entirely true; GCC (and almost every other GPL-licensed compiler) has a specific exception for its runtime libraries that lets you distribute their compiled version under any license [1].

But for example, AdaCore's GNAT is licensed under the GPL but doesn't have that exception, which puts the binaries it outputs under the GPL as well, to make you buy the Pro version of the compiler.


There isn't one single reason for avoiding use of GPL software that is true for all BSD variants.

Requiring OCaml in the base operating system could equally be a reason to avoid this, whatever licence is used.

> Requiring OCaml in the base operating system

I'm not sure I understand this concern in the majority use cases. Why would you need an OCaml compiler installed?

The base system is usually capable of compiling itself. If the source for your base system is stored on a remote git repository, and your implementation of git in your base system is written in OCaml, you're going to have to use an OCaml compiler to recompile it.

But you don't need source control to compile the base. You only need it to retrieve the base sources - and even then only when you want to get the latest, since the installer comes with a tarball of /usr/src.

Hmmm... in what way does Git being GPL/LGPL cause licence problems for the BSDs ?

I guess you'd want to be able to include it in your distribution. FreeBSD for example includes svn (or svnlite or whatever) so you can fetch/update the source and do a buildkernel or buildworld without having to install any ports.

But you can include Git in the distribution, it's a self-contained program, its license won't affect anything else it ships alongside.

    > The FreeBSD Project aims to produce a complete, 
    > BSD-licensed operating system allowing consumers of the
    > system to produce derivative products without constraint
    > or further license obligations.

Also https://www.openbsd.org/goals.html and https://www.netbsd.org/about/redistribution.html#why-berkele...

Ok, so it's license philosophy rather than any actual legal problem with the license, I can understand that.

If someone wanted to produce a FreeBSD derivative would they really need to be able to ship a modified version of git?

Suffice to say, FreeBSD's current stance on licensing is that new GPL2 software in base is unacceptable, and the goal is to get to no GPL2 software in base.

Probably not, but then it wouldn't be "Free".

It does affect the users' abilities to hack on the software. Since BSDs are used a lot in commercial environments this is a real concern for BSD-hackers.

What's the problem with using GPL software if you don't want to change it?

AFAIK, software for the BSDs is usually maintained in-tree, and patched to integrate with the rest of the distribution. I don't think including GPL-licensed software would be compatible with the BSD license of the overall project. That doesn't mean you can't use anything GPL, just that it can't become part of the system.

All the BSDs have traditionally included GPL'd software in the base system, so I don't think many people share this view (it's a possible interpretation of the GPL, but not a common one). As one major example: every BSD until recently included gcc in-tree, and OpenBSD and NetBSD still do.

FreeBSD sees this as a necessary evil and aims to move to zero GPL code in base, eventually (when ports GCC works well enough for 2nd tier architectures and Clang/ias / elftoolchain works well enough for 1st tier architectures).

Doesn't FreeBSD use Clang exclusively in the base system these days, and gcc from ports?

On x86 and arm64, yeah. MIPS and Sparc still build with base GCC (4.2, last GPL2 version). I'm not sure what arm32 or ppc uses by default these days.

Who says they don't want to change it? Integration with OpenBSD pledge or FreeBSD capsicum could be useful.

You can't use GPL code in non GPL software.

I thought you can't link to GPL Software in non-GPL software?

Shipping an unmodified copy of git and calling it should be fine even in proprietary code, no?

This is an open question. The GPL itself does not define what constitutes a derivative work. The FSF does, however, promote the view that linking does create a derivative work, while simply calling an external binary in usual cases does not. But that can't function as a hard rule. Because I could just make a wrapper binary that handles all of the scenarios I need a GPL'd library for, and then call my wrapper executable from my proprietary code.

> Because I could just make a wrapper binary that handles all of the scenarios I need a GPL'd library for, and then call my wrapper executable from my proprietary code.

Yes, you can literally do this. It's not a derived work because your code is entirely independent of the GPL'd code and the GPL can't possibly cover code that you wrote independently. Simply interacting with some external GPL'd program does not expose your code to any GPL requirements.

You can't just work around the GPL by placing a command-line interface as a shim between your code and the function call interface of the GPL'd library. I'm talking about a contrived binary that just calls library functions, forwarding command line parameters to the arguments of that function as necessary.

Oh yes you can. The GPL gets its power from copyright - the rights that the authors of the original software have over their work. It does not and cannot place any restrictions on any software that you write, unless you literally incorporate the GPL'd source or binary into your software, which a CLI interface does not. The same would be true of an RPC interface.

For example, many databases are GPL'd and provide their entire functionality over RPC. Many CLI programs are GPL'd and (of course) provide their entire functionality over CLI. Neither of these cases place GPL obligations on clients, and it's no different for any given library which you wrap with a CLI.

The FSF disagrees with you. https://www.gnu.org/licenses/gpl-faq.html#GPLInProprietarySy...

A contrived wrapper that is not an independently useful program does not allow you to circumvent the GPL.

You are leaning very heavily on a technical distinction (linking vs. not linking) instead of on a holistic analysis of what a "derived work" is.

The FSF is neither an authoritative nor an unbiased source for an answer to this question. I'd like to see a few court cases that have gone through the motions before considering this subject settled.

Pretty sure the FSF can't magically extend what copyright means to comport with what they think it ought to mean in regard to free software.

Now I am not a lawyer, but I think by analogy the FSF's interpretation would also be needed to enforce the Creative Commons NoDerivatives licenses. Imagine someone creates an interview-based documentary video series on a controversial scientific topic, say global warming, and releases it under a NoDerivatives license because they are afraid that a clever edit of their videos would present the topic incorrectly and they have made promises to the scientists about how they will and will not present them.

Now imagine that, rather than editing the video files themselves, I create a playlist in a (fictive?) format which has the ability to play subsequences of the linked files, and thereby create a version of the series which presents the complete opposite message of the original. Would a court find this to be a derivative work of the original videos? What if, in my need to edit the video, I want to insert certain new short sequences, and thus I distribute a second video along with the playlist, but this video is useless in itself since it just contains a number of short clips in sequence? Would this video be considered a derivative work of the original - remember it serves no purpose in itself except along with the playlist and the original? To me it is not clear, but if they are not derivatives, then this will practically render the concept moot, since most derivatives could potentially be formulated as the original + a diff. Now I would expect in this case that the combination of the playlist + my video file + original series would be considered a derivative work; remember that in the arts transferring a work from one format to another (a dramatization, for example) is considered to be a derivative work, even if all the actors were seen on stage reading from the original book.

As I read it, this is the center of the FSF's argument. Now whether this would apply equally to software I do not know. But I have a hard time imagining that copyright is as easily circumvented as you seem to indicate.

That's a great question. Would this be a derivative work? I suspect the answer is yes, because in your scenario you've effectively shipped a piece of software which creates a derivative work on-the-fly. I guess it's sort of like an analogy of dynamic linking.

The default in copyright is that you have no rights to a copyrightable work. The GPL grants you some rights under certain conditions. If the conditions fail to be met, your granted rights are revoked. As the creator of the GPL, the FSF's widely published opinion is likely to matter to judges evaluating a case. But in the end no one knows the outcome until a case comes to court. So seek to minimize legal risks.

Your comment is literally FUD. Copyright law gives the authors of the GPL'd work exactly no rights over your independent work. Distributing a GPL'd binary alongside other works cannot have any impact on those other works. (Otherwise every Linux distribution would be in violation). The GPL is built on top of existing copyright laws which simply don't provide that kind of power. So there is no risk because there's no conceivable legal mechanism in copyright law which behaves like you claim.

I meant rights to the GPLed software/library in use. Not to independent works merely aggregated on the same medium.

Can you substantiate that claim?

What claim?

I think it's pretty obvious that the GPL derives its power from copyright. Reasoning: If the GPL does not apply, then you don't have any right to anybody else's code (by the Berne Convention), so it gets its power by granting you more rights than you would have had by default. Make sense?

It might be easier to just write a permissively licensed 'gitlite' that can checkout sources (and maybe even update a clean tree). That is the potential solution I've heard discussed in BSD land.

If OP interests you, then this may too: https://pijul.org/

IIRC, when I last looked at this, they were doing dual implementations in OCaml and Scala, but now it looks like it's being done in Rust.

I'm unconvinced.

>Means no more downtime, no possibilities of censorship, be it from states or from companies.


>historically, patch-based systems have been very simple to learn and use, but slow, whereas snapshot-based systems can be extremely fast, but are usually hard to use for more than simple operations. As an example, cherry-picking is not intuitive in git/mercurial

How is cherry-picking not intuitive?

>Category theory has certainly been an inspiration for Pijul, but categories are neither algorithms nor data structures in themselves. In order to get the semantics we wanted, especially the handling of multiple files, rollbacks and unrecords, designing and implementing new algorithms and data structures was at least as useful as learning theoretical stuff.

Yet another thing with relatively little practical use, but hey, it uses category theory! It must be good! Oh look, it's also functional!

All of that said, props to Pierre-Étienne for putting in the time and effort to make this. He probably learned a lot.

This is fantastic. I have played around with Clojure and Haskell a bit, but really want to get into functional programming more. Not to hijack the thread, but I guess I am having trouble getting started. Any good resources for novices out there?

Edit: for the record I have Googled the topic and am overwhelmed by how much is out there. Looking for personal recommendations.

I highly recommend https://fsharpforfunandprofit.com/. F# is spiritually similar to OCaml, so there's a lot of crossover there. This is geared towards "enterprise" developers, and so tends to be more pragmatic than most other resources.

For what it's worth, I got a surprising amount of mileage from learning Idris[0], of all things. Even if you ignore the dependent type system, it is still essentially Haskell but with a lot of the warts removed, making it much simpler to grok.

Beyond that, the usual recommendations apply: Learn You a Haskell[1], and Real World OCaml[2] were what got me started.

[0] http://docs.idris-lang.org/en/latest/

[1] http://learnyouahaskell.com/chapters

[2] http://dev.realworldocaml.org/

The way I first learned to think functionally was reading and working through http://realmofracket.com/ It intros you to Racket, which is sort of a Scheme dialect (which is a Lisp dialect) and teaches functional programming.

Haskell, OCaml, etc. take those functional idioms and apply strong typing, algebraic data types, pattern matching and something like the Option/Maybe type - but they build on the same functional concepts in Realm of Racket.

Caveats to the above:

1) Racket has an optional 'typed' variant which gives you type checking.

2) Rust is (IMHO) a little weaker with regards to pure functional programming than OCaml, but very nicely integrates pattern matching, type checking and algebraic data types in.
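To make the ideas above a bit more concrete, here is a minimal OCaml sketch of an algebraic data type, pattern matching and the option type working together. All names (`shape`, `area`, `largest`, `describe`) are made up for illustration and do not come from any of the books mentioned.

```ocaml
(* An algebraic data type: a shape is exactly one of these cases. *)
type shape =
  | Circle of float              (* radius *)
  | Rectangle of float * float   (* width, height *)

(* Pattern matching: the compiler warns if a case is left unhandled. *)
let area (s : shape) : float =
  match s with
  | Circle r -> Float.pi *. r *. r
  | Rectangle (w, h) -> w *. h

(* The option type makes "no result" explicit instead of using null. *)
let largest (shapes : shape list) : shape option =
  List.fold_left
    (fun best s ->
      match best with
      | Some b when area b >= area s -> best
      | _ -> Some s)
    None shapes

(* The caller has to handle both the None and the Some case. *)
let describe (shapes : shape list) : string =
  match largest shapes with
  | None -> "no shapes"
  | Some s -> Printf.sprintf "largest area: %.1f" (area s)
```

Calling `describe []` takes the `None` branch, so the empty list needs no special sentinel value. Removing one of the `match` cases in `area` would produce a compiler warning about non-exhaustive matching, which is most of what the typed languages add on top of the Racket idioms.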

If you want to learn Haskell the most recommended book nowadays is Haskell Programming From First Principles[0].

[0]: http://haskellbook.com

The consensus on the Haskell subreddit was that the Haskell Wikibook (https://en.wikibooks.org/wiki/Haskell) has gotten even better than "Haskell Programming From First Principles".

In any case, that book ain't bad either. It's better than Learn You A Haskell.

I also like "Discrete Mathematics Using a Computer" (https://github.com/ryukinix/discrete-mathematics/blob/master...). It focuses on teaching discrete mathematics and introduces some Haskell as executable mathematical notation on the side.

The wikibook has gotten really good, and it's the reference I've started going to when I need a refresher on a concept, but HPfFP was the book that helped me actually "get" Haskell -- and I'd read pretty much every other beginner-to-intermediate resource by that point.

My experience has been that, generally speaking, programmers of language [X] don't have a particularly good grasp of what makes a good introduction to language [X], and I think that effect is even stronger with a difficult language like Haskell.

That said, I was about to read some of the new material in HPfFP, and I should also read the analogous wikibook content as an experiment. Thanks for linking it.

I do agree with your observation:

I remember a particularly vivid example of some of my coworkers at Google suggesting the Go tutorial, starting at https://tour.golang.org/welcome/1, to someone completely new to programming. They couldn't even understand how it's not perfectly clear and simple.

That being said, the book that I've seen the most success with getting someone from absolutely zero to "can start making progress on their own" is "How to Design Programs" (http://www.ccs.neu.edu/home/matthias/HtDP2e/index.html). It helps that the authors regularly teach absolute beginners.

Real World Haskell[0] is the best freely available resource imho.

If some concepts remain unclear, Learn You a Haskell[1] is a good complement, though it doesn't go as deep as Real World Haskell. For example there is no coverage of monad transformers, which is a pity.

Typeclassopedia[2] is a great overview of common category-theoretic abstractions used in Haskell programs.

[0] http://book.realworldhaskell.org/read/

[1] http://learnyouahaskell.com/chapters

[2] https://wiki.haskell.org/Typeclassopedia

RWH chapters 1-3 are great. It goes a bit weird from chapter 4. LYAH is better IMHO.

As an FP newbie, for Clojure, I used https://www.braveclojure.com/ to start with.

When I wanted to build something beyond a toy program, https://pragprog.com/book/vmclojeco/clojure-applied helped me immensely. It goes beyond "what you can do" and delves into "when you should do it". IMHO every language should have a book like this.

There’s a beginner’s guide to OCaml’s beginner’s guides: http://blog.nullspace.io/beginners-guide-to-ocaml-beginners-...

IMO, the best functional programming resource out there is this free online course by Prof. Dan Grossman at UW: https://www.coursera.org/learn/programming-languages

He focuses on teaching the syntax, semantics, and idiom of Standard ML, a classic functional programming language that's closely related to OCaml, Haskell, and others. He has a gift for explaining simply and clearly and keeps his lecture videos short (~10min) which helps to absorb their info. Check it out: https://www.coursera.org/learn/programming-languages

I was in two minds about attending FunctionalConf in Bangalore, India ... but after seeing this PR, I'm there.


What is the benefit of having an OCaml implementation of git, besides that it's written in OCaml?

Edit: I get it now: it's an OCaml library that allows you to interact with git repositories and provides many (most?) operations.

I thought there might be some comments on darcs in here. I've only just started learning OCaml so haven't yet used darcs, but am looking forward to trying it out.

Darcs is interesting but not the same system as Git.

Yes, the first two lines of the front page: "Darcs is a free and open source, cross-platform version control system, like git, mercurial or svn but with a very different approach"

This is a +19k -8k line pull request. That is the dumbest thing I have ever seen. For having written something that interfaces with git, this person obviously doesn't know how to use it effectively.

OP of this PR here. When Thomas Gazagnaire started ocaml-git, it was more of a PoC for Irmin and a MirageOS system. He did not think about, for example, the memory consumption of each Git operation's implementation.

In this case, he saw a limitation in the garbage collector of git (which can use a lot of your memory) and in the push command. The approach was simply: "I want to understand what Git is (per the specification) and I will implement it". So the big goal of my work (paid for by the MirageOS team) was to switch to a non-blocking implementation of Git with predictable memory use.

I have no criticism of this approach. Sometimes you just want a project to work, and ocaml-git was developed with that in mind.

However, ocaml-git is now used by some big companies (like Docker, Tezos, etc.), and first of all it needs strong guarantees about memory consumption. Second, the missing push command is a big gap in the previous implementation. And finally, the GC and the encoding of the PACK file are the key to solving all of these problems.
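For readers unfamiliar with the PACK file, here is a minimal sketch of what "non-blocking with predictable memory" can look like for one small piece of it: the variable-length header that precedes each object in a pack (a 3-bit type, then the size in 4 + 7 + 7 + ... bit groups, least significant first). This is not the ocaml-git API or the code from the PR; the names `kind`, `state` and `feed` are hypothetical, and the decoder is deliberately simplified (no overflow check on very large sizes).

```ocaml
(* Object types that can appear in a pack entry header. *)
type kind = Commit | Tree | Blob | Tag | Ofs_delta | Ref_delta

let kind_of_int = function
  | 1 -> Some Commit
  | 2 -> Some Tree
  | 3 -> Some Blob
  | 4 -> Some Tag
  | 6 -> Some Ofs_delta
  | 7 -> Some Ref_delta
  | _ -> None

(* Decoder state: constant size, whatever the length of the input. *)
type state =
  | Start
  | More of { kind : kind; size : int; shift : int }
  | Done of { kind : kind; size : int }
  | Error of string

(* Feed one byte (0..255). The decoder never reads input itself, so the
   caller decides when and how bytes arrive (file, socket, ...). *)
let feed (st : state) (byte : int) : state =
  let continues = byte land 0x80 <> 0 in
  match st with
  | Start -> (
      (* First byte: bit 7 = continuation, bits 6-4 = type, bits 3-0 = size. *)
      match kind_of_int ((byte lsr 4) land 0x07) with
      | None -> Error "unknown object type"
      | Some kind ->
          let size = byte land 0x0f in
          if continues then More { kind; size; shift = 4 }
          else Done { kind; size })
  | More { kind; size; shift } ->
      (* Later bytes: bit 7 = continuation, bits 6-0 = next 7 bits of size. *)
      let size = size lor ((byte land 0x7f) lsl shift) in
      if continues then More { kind; size; shift = shift + 7 }
      else Done { kind; size }
  | Done _ | Error _ -> st
```

For example, `List.fold_left feed Start [0xBC; 0x12]` ends in `Done { kind = Blob; size = 300 }`. Because the caller pushes bytes in and the state is a small fixed-size value, the decoder can be paused at any point without buffering the whole pack in memory.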

As I said in this PR, I worked with Thomas Gazagnaire and some other people from the MirageOS ecosystem to improve ocaml-git, implemented the Git GC and Git push, and tried to avoid any memory consumption problems in a server context in the low-level API.

So, yeah, it's a +19k -8k PR, of course. But it's not a case of an OCaml noob trying to restart the world, like: yeah, I recoded Git in OCaml based only on my own opinion, just for fun, without caring about what happens in the OCaml world, the Haskell world and the industrial world.

For this PR, I explained IRL and in comments what is going on: why I made each choice rather than another, Thomas Gazagnaire's point of view (he reviewed my PR) and mine, what the MirageOS team expects and what I did.

So clearly, yeah, you did not read my comments and just saw a big PR, as if a noob were trying to tear down the whole project, but unfortunately for you that is not what happened. It's the result of a long discussion between Thomas Gazagnaire (specifically), other people and me to find what is best for everyone (in the implementation and in the API).

Now, I can say this PR will be merged. I need to polish some details, improve the API and test it. So yeah, this big PR will be merged, because it's what the creator of ocaml-git expects, what the MirageOS team expects, what my boss expects and what other users of this library expect, as you can see if you just look at the issues.

The pull request currently contains 81 commits. It is NOT a single monolithic commit.

I'm probably too harsh, but I can't think of a single developer who would welcome such a large PR with open arms.

But if you have a better idea about the implementation of Git in OCaml, you can comment on my PR. Then we can talk technically about your point :) !

I'm not saying that your implementation of Git is bad. I didn't look at your code at all; it could be perfect, for all I know.

What I'm saying is you are not using git effectively. It is much easier to read and understand merge requests the smaller they are. Someone who didn't write the code should be able to go through your merge request in one sitting and understand all of its implications.

You may have written a good "implementation of Git," but you have also demonstrated that you don't know how to use it effectively.

> It is much easier to read and understand merge requests the smaller they are.

I'm always amused by people telling others what is more readable and understandable as if it was a hard fact and applied universally to all situations.

In reality, you can have a +19k diff which is easier to understand than a +100/-100 diff, and it happens quite often. Have you ever read a PR and clicked the "view" button to see the whole file? Have you ever clicked on the arrows to expand the context of a particular diff line to learn what the heck a given name is? Consider that you wouldn't have to do this if it was all in the diff in the first place.

The readability of the code is an elusive quality, with largely inconclusive research. There's very little in terms of facts to rely on. The best you can say is that a particular way of presenting the code, or changes, feels better to you. Good for you, but don't try to force that way on others, as you're guaranteed not to improve the readability and instead encounter a violent pushback.

Your personal preference for how a tool should be used is not the same as using the tool effectively. Pull requests aren't even an inherent part of the Git workflow, and not all projects prefer short-lived topic branches.

From the OP's other comment, it sounds like this merge strategy was the preference of the repository owner. Why substitute your own preferences for theirs?

ETA: Also note that the PR was reviewed several times over the summer. GitHub has a feature that allows you to only review changes to a PR since the last time you reviewed. You can also manually use `git diff` to review that difference.

What you are saying is bullshit; OP clearly explained the situation. You just had no idea about the constraints he was facing, but even after he gave you a detailed answer you still did not bother.
