Hacker News
The Unison language (unisonweb.org)
262 points by ocfnash 37 days ago | 141 comments



One thing that takes away from programming languages is not having code samples on the index. I think this lets someone know immediately if it's interesting to them or not. Show a sample of code with the console output or a screenshot of a UI from the code. I think Racket had it right ages back. They would show everything from a web server to other things. This is what major programming languages (with some exceptions) basically do and it gives a developer more information than any amount of marketing speak.

That said, I had to dig in the documentation to find code, and it wasn't obvious when I found it, because the syntax is definitely different from what I'm used to and there's weak syntax highlighting.


TBH if you're creating a new language after 2014 you should be thinking about how to get good syntax highlighting and other tooling from the language definition itself, because you're competing with a ton of mature languages that had to do this manually through a ton of effort of their communities.


How do you get it from the language definition? By adding metadata to it?


It can be (kinda) done if your language has a parsing grammar.

It doesn't work in practice, since most languages eventually discard parser generators for custom parsing, and good tooling like syntax highlighters don't rely on the literal grammar but something close enough to the grammar to work and still be useful for programmers.

Classic example is you don't want your parser to be strictly defined and fail at any error. You want it to handle errors gracefully and detect almost correct syntax.

Having written some textmate grammars for syntax highlighting I think the post is ridiculous. I'd be terrified to generate that json from the parser, since a bit of thought and experimentation has to go into getting a good syntax highlighter that works with external tooling.

As well I think an LSP implementation is really important, but writing that in your language or generating it from the compiler is a monumental task for a young language and a terrible idea.



https://forum.nim-lang.org/t/2811

The Nim language in 2008 (when it was known as Nimrod) had originally planned a similar approach: a unified AST with multiple "syntax skins," and as far as I understand there used to be a limited implementation of this. This would theoretically have been a boon for Nim, because a frequent (rather superficial) complaint about Nim from programmers coming from languages with a C-style syntax was "no braces? Pythonic syntax? Yuck!"

https://forum.dlang.org/thread/mailman.1427.1428691340.3111....

The problem with this - and why Nim never really committed to the "syntax skin" concept - is that it would have led to far too much fragmentation within Nim's own community, as some users would prefer one "sub-language" over another. Given that new languages (Nim included) lean towards tool-based unified formatting (e.g. certain naming conventions for identifiers, number of spaces for indent), "syntax skins" have become a harder sell compared with linguistic consistency.

The original idea was that the same program could be rendered in a variety of different syntaxes in the author's IDE - but the implied maintenance cost of such skins plus that previously mentioned potential for fragmentation led to this idea falling by the wayside. Nim, as a language, grew to be quite flexible within its primary syntax over the years - e.g. the macro system (which modifies the AST directly) made way for a variety of DSLs (e.g. https://github.com/khchen/wNim#code-examples) that remain valid Nim code.


Interesting. Maybe they should add further representations of the code. This is not the only language where there are multiple representations of the same "code".

Wolfram Mathematica (or "the Wolfram language") introduced "forms" for that, for instance https://reference.wolfram.com/language/ref/FullForm.html or https://reference.wolfram.com/language/ref/InputForm.html or, quite charming in the math context, https://reference.wolfram.com/language/ref/TraditionalForm.h... or https://reference.wolfram.com/language/ref/TeXForm.html -- See also https://reference.wolfram.com/language/tutorial/EverythingIs...


Also Bucklescript and ReasonML respectively provide OCaml and javascript-like syntax over the same AST and can translate your code between the two (as seen in the try reasonML UI).


That's not exactly how it works: Bucklescript is a JS backend for OCaml, and ReasonML is an alternative syntax for OCaml. Bucklescript is not involved in this translation, it generates the JS output on the top right pane of that UI.


Thanks for this, I totally missed this nugget. If it's on the front-page it should be made more obvious. Whether it's code or some sort of video, every language should showcase something that tells you what development is like. Code you can interactively compile from a front page is always a plus too.


> One thing that takes away from programming languages is not having code samples on the index. I think this lets someone know immediately if it's interesting to them or not.

Because when I see Chinese characters I know immediately if it's interesting to learn Mandarin or not.


Small code samples tell me next to nothing. How could they. I want a clear explanation of the foundational aspects.


I don't want small code samples. I want well defined problems that this particular language is well suited for, as demonstrated by a dozen (or a few dozen) line program which uses the core features of the language to elegantly solve that problem. That gives me a feel for what the language is, and if interested, I'll learn about the foundations.


How can you choose a language without knowing the syntax? It's not arbitrary. Some languages have elegance or gotchas in how the syntax is laid out


Syntax is easy to show, but ultimately rather unimportant. If the big idea of a language is something substantial, starting by showing the syntax arguably just creates a distraction.


Syntax stops me from even considering some languages, because to a lot of us it is important. I spend most of working hours reading code; as a result how easy it is to read is important not just for my productivity, but for my overall happiness in life.

A small sample on the front page will not tell me if a language is good enough, but it can give me an idea if it's worth spending more time reading up on it or not.


What makes a program easy to read is not its syntax but its semantics, which dictate how much context one has to keep in mind to be able to understand a small fraction of code.

For instance, C and Go are easy because they do not hide anything and what one sees is pretty much unaffected by what one does not see. On the other hand, because those languages do not let you build efficient abstractions, they are also harder to read: each given sample, although unambiguous, does very little.

Other languages like C++ (despite a syntax that is close to that of C) or Haskell allow the simplest operators, including the application and the sequencing operators, to behave in unfamiliar ways or hide subtle but meaningful details, which makes understanding any small bit of code a gamble (one has to assume some behavior for operators, constructors...). On the other hand, they allow building abstractions that make programs more terse and therefore easier to read.

In this trade-off, none of the alternatives depends significantly on the syntax.

I think a lot of people have associated obscure syntax with bad programming experience because of Perl, the language whose syntax became synonymous with bad. Indeed, this is a case where the syntax can be treacherous, but still the language is made more problematic by its semantics, which try hard to give a meaning to any use of any variable in any context, thus delaying runtime errors even more than other dynamically typed languages and even masking them entirely (then performing something other than intended).

I think this mental association must be done away with.


The problem is that readability is largely a matter of naming conventions, organization, and adherence to known patterns. The language becomes more readable as you use it. Anything that isn't a C derivative is unreadable if you only know C.

There are of course actual blockers to readability, like usage of similar-looking sigils and poor system library naming, but otherwise the syntax really doesn't tell you any useful information. It's the semantics that tell you how it'll all fall apart


I couldn't disagree more. Syntax changes whether I can look at a page of code and get an idea of structure and which elements to focus attention on, or whether I need to read everything in detail.

It doesn't matter that much when I have found a piece of code that I need to read and understand every detail of, but that is a tiny proportion of the amount of code reading I do - most of the time reading code is about grasping overall structure to learn a codebase well enough to find the points you need to focus on and read.

A clean syntax makes the difference between always having to read each token vs. being able to speed-read by pattern matching high level structure when you don't need the details of that specific piece of code.

I've worked with dozens of languages, with wildly differing syntax, and I'll say that once you reach a certain base capability-level, syntax matters more to me than semantics.

You can paper over a lot of semantic difference.

I can (and have, and so have many others) build libraries to do object-oriented programming in C, for example, and I have implemented closures in C too [1]. So the reason I don't do much C any more is not semantics, but that there is no mechanism for plugging those semantic gaps that creates readable code.

There are languages where the semantics paints you into a corner and makes plugging the holes too hard, so it's not that semantics doesn't matter at all. But it's usually easier to paper over semantic gaps than unreadable syntax.

[1] http://hokstad.com/how-to-implement-closures


Syntax is something you become used to in a few hours. But it's not what's the most important and will make your time more efficient in the end.


2000 years ago, they didn't have spaces or punctuation in writing. We could try dropping them again, and people would adjust after a few hours, but that doesn't mean that the spaces and punctuation don't provide value.


Maybe you do. I don't. Syntax is something that has a substantial impact on how I think and work. Presentation overall matters immensely.


> Syntax is something you become used to in a few hours.

This is completely subjective and not at all universal.


Arguably, the syntax is the simplest thing one can change in a language, by creating a wrapper (as mentioned in this thread, such wrappers exist around OCaml, for instance).

Frankly, syntax has little to no bearing on how well a given language fits a problem domain, and in my experience the only thing syntax is really important to is how attractive the language is to programmers who judge languages by their cover.

I remember having read somewhere that the initial versions of Lisp were missing an actual syntax, and that it was planned to add one later, but that until then users would have to write the AST directly in an ugly syntax full of parentheses and devoid of familiar syntactic landmarks. Well, time passed, and nobody cared enough to add that friendlier syntax :)


How can you tell the elegance or gotchas from a 6 line Hello World?


It's how I picked my favorite Python web framework, I compared all the hello world samples they all had, and wound up with CherryPy. The rest had too many decorators and it just looked silly to me.

You can also figure out if it's C-like or Lisp-like, and so on from looking at syntax.


From what you describe, sounds to me you picked your framework mostly at random, like one would pick a car for its color.


you can infer a lot; is it LISPy, is it C-ish, or one of the Haskell family, etc. I think you can tell a lot from just which family of syntax.


No, take Reason as an example: it is basically Ocaml. Despite having a C-like syntax, it is quite different from it. The syntax barrier is usually much smaller than the language itself.


And yet... someone trying to “sell” me a language while seemingly refusing to show me any code samples feels like someone trying to sell me a car based entirely on a spec sheet with no pictures and without actually seeing, touching or experiencing the car. They both provoke a reaction of suspicion - “what are they trying to hide?” As if they don’t trust me, the customer, to be able to understand the very thing they want/expect me to buy.

I’m perfectly fine with a page that says, “here’s a code sample - notice that it looks similar to C, but here’s a list of reasons / an explanation of why/how it’s actually quite different from C.”

One of my favorite programming books of all time is K&R, from back before ANSI, even. It said, “here’s a simple problem to solve, and here’s how you solve it in our language.” Repeatedly. Explaining how the features helped to solve the problem at hand. It was revelatory. And it assumed its audience to be knowledgeable programmers. It didn’t spend the first chapter just describing all the features or how great the language is without showing any code.


For many programming languages the line count of a "hello, world!" program is exactly one.

So you can deduce something from the line count of a simple program alone. One such deduction answers the question "are there at least some attempts to provide useful defaults?"


I agree that a Hello World is a fairly useless example for anyone already familiar with programming languages. Perhaps something like a Fibonacci function should be the de facto short syntax example, as is common in functional languages. More useful, though, would be an example of how composite data structures work in the language - a standard example could be a binary tree.


Little code snippets are nice to get a quick feel of the language. A good "Get Started" page is also a must have. I love languages or frameworks that have a step by step clear instructions on how to get started.

It's almost like taking test drives when you go car shopping. You don't want to open the hood and check what's inside right away you just want to be able to drive the car.


I want both.


That's the ideal amount of information.

A reasonable example of a good page that is informative and provides lots of examples (enough at least!) is the one for D:

https://dlang.org/


I also like the "Ecosystem" part of Julia: https://julialang.org/ Sadly, for some reason code samples are lacking.


Julia is a language I love that I don't see myself using sadly. It does some amazing things that leave me shocked, like it can show you the assembly you generate while using a REPL. That's impressive to me. However, I'm not the target developer for Julia.

Edit:

Course after clicking your link, I am wondering how wrong I am since it is still technically general purpose capable. Wow Julia has grown so much!


Out of curiosity, what sort of development do you do? Julia is a very flexible language. The majority of its community is still focused on numerical scientific computing, but it's great for all sorts of usecases these days.

The way I think of it, in order to build a language that was flexible enough to cover all these different weird needs you find in the scientific computing community, they ended up needing to build a general purpose language.


Have a look if you want some examples and to see what the dev process looks like: https://dev.to/petets/palindromes-with-unison-5h9o


Reading the tour[1], I got the impression of being struck by something not that surprising, but still fundamentally groundbreaking.

Beyond just the idea of a global program store (which I hope has space for some virtualenv equivalent), the cleverness of effect handlers as a first-class language feature is very exciting.

Once the type system is more mature, this could easily be the next kind of Haskell - a language which redefines modern ideas of good programming.

1: https://www.unisonweb.org/docs/tour


Unison seems to combine ideas from Forth, Haskell, and Clojure's Datomic and stir them together into an interesting mix:

Forth ideas: a global dictionary with pre-existing and user-definable words

Haskell ideas: functional programming and type signatures of functions / words

Clojure Datomic ideas: immutable store of object definitions that append and use references to point to the latest definitions.

I am curious how I/O and error handling are done in Unison.


If you look at the other reply to my comment, someone has linked the Unison docs on their effect handlers implementation - which is what is used for IO, and with any luck, error handling as well. Haskell has several packages for effect handlers, and they're taking off in programming and category theory circles as alternatives to monads for IO.

The short summary is that you can write code with side-effects that you don't define the implementation for, and then let call sites handle those effects or defer them.
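
To make that summary concrete, here is a minimal sketch in Python rather than Unison. It imitates the idea with generators and is not how Unison implements abilities; the effect names `ask` and `say` and the `run` driver are made up for illustration.

```python
from typing import Any, Callable, Generator

# Hypothetical effect requests: plain tuples of (effect_name, payload).
Effect = tuple[str, Any]


def greet() -> Generator[Effect, Any, str]:
    """Business logic that *requests* effects without implementing them."""
    name = yield ("ask", "What is your name?")
    yield ("say", f"Hello, {name}!")
    return name


def run(program: Generator[Effect, Any, Any],
        handlers: dict[str, Callable[[Any], Any]]) -> Any:
    """Drive the generator, answering each requested effect with a handler."""
    try:
        request = next(program)
        while True:
            effect, payload = request
            request = program.send(handlers[effect](payload))
    except StopIteration as done:
        return done.value


# This call site handles both effects purely, e.g. for a test.
log: list[str] = []
result = run(greet(), {"ask": lambda prompt: "Ada", "say": log.append})
```

A different call site could pass handlers backed by `input` and `print` without touching `greet`, which is the "let call sites handle those effects" part.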


Since it took me a while to find it, the relevant documentation is under "Abilities and Ability Handlers" [1]

[1] https://www.unisonweb.org/docs/language-reference#abilities-...


Kudos to them for not using the buzzword, but if you believe in the notion that 'blockchain is just a fancy buzzword that gives developers the ability to get managers to agree to larger budgets to clean up tech debt', this is.. blockchain programming.

One problem I did find: Somewhat deeper into the tour, the tour makes the claim:

> a Unison codebase can be versioned and synchronized with Git or any similar tool and will never generate a conflict in those tools.

This is practically speaking of course not true: If you have some unison codebase, whilst under the hood it's all hashes, that doesn't change the simple fact that if 2 people both check out the exact same code, and then both people make a change to how the software operates, where that change is conflicting (say, I make the green 'OK' button now red, and you make it blue), you.. have a conflict. That is a fundamental property of this interaction; no language bells and whistles can ever change this fact.

So, what does happen in unison? Is it 'last committer' wins? That'd be a bit problematic. A conflict in your version control system, _IF_ it is representative of the fact that there was an actual conflict (as above with the button and the conflicting colours), is a good thing, not a bad thing.


" I make the green 'OK' button now red, and you make it blue"

I did not fully read the introduction yet, but in my mind, in a truly content-addressed system this is not a conflict: you have the hash of a main() function which ultimately makes the button red, and the other guy has a main() function that makes the button blue. No conflict in the physical sense. Yes, philosophically there is a conflict, which is resolved by you deciding which main() function you find more useful.


Of course there are still conflicts everywhere.

Developer A makes the button red, developer B changes the label from "OK" to "Accept", both inside the same function.

You can't just pick one or the other, you have to combine the changes.

Or two developers add two different fields to a type. Or ...
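
A rough sketch of what "combine the changes" means, using a field-by-field three-way merge in Python. This is a generic illustration, not Unison's or Git's actual algorithm; the button is modelled as a plain dict and `None` stands for "field absent".

```python
def three_way_merge(base: dict, ours: dict, theirs: dict):
    """Merge two edited copies of `base` field by field.

    Returns (merged, conflicts), where conflicts lists the keys that both
    sides changed to different values.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        base_v, ours_v, theirs_v = base.get(key), ours.get(key), theirs.get(key)
        if ours_v == theirs_v:      # both sides agree (or neither changed it)
            value = ours_v
        elif ours_v == base_v:      # only "theirs" changed it
            value = theirs_v
        elif theirs_v == base_v:    # only "ours" changed it
            value = ours_v
        else:                       # both changed it differently
            conflicts.append(key)
            value = base_v          # keep the old value until a human decides
        if value is not None:
            merged[key] = value
    return merged, conflicts


base = {"label": "OK", "color": "green"}
dev_a = {"label": "OK", "color": "red"}         # developer A
dev_b = {"label": "Accept", "color": "green"}   # developer B
dev_c = {"label": "OK", "color": "blue"}        # a third, hypothetical edit

clean = three_way_merge(base, dev_a, dev_b)
clash = three_way_merge(base, dev_a, dev_c)
```

A and B merge cleanly into a red "Accept" button, while A and C both touched `color` and still need a human decision, whatever the storage format.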

Sadly the docs only have placeholders for "Concurrent work and resolving edit conflicts" and "Pull requests and collaboration". Would be very interesting to read the developers thoughts on this.

The system seems to support patches, forking and merging.


Hmmm, interesting. It is getting a bit philosophical: you can view it as a conflict. But you can also view it as: hm we have these nice 2 main functions. Let's create a merged one. Kind of what was previously conflict resolution becomes part of regular programming... A slight perspective change.


Mostly they are just not relying on Git for conflict detection and resolution, but bringing it up into the Unison system itself, in order to deal with it more gracefully and intelligently.


There's nothing philosophical about it! There will exist a conflict as far as Git (or any other) version control system is concerned. That's all he's saying!


My small understanding is that the "(or any other)" part is not correct in your statement, and that you are locked into thinking of version control as being git-like.

I think the implication is that version control is not git like, and that it's not that we both changed the text on line 17, rather we both made additions to the underlying structure.

Indeed, it's impossible for us to both edit the same file, because files are never modified. They are immutable, like a blockchain transaction.

I haven't quite understood how it works in practice, though, but definitely don't think "git".


But git works the same way. All files are immutable. All directory trees are immutable. Basically all version control works that way at some level.

Storing data this way doesn't solve the problem of merging the two changes into a new single change. When you can't do that automatically, it's called a conflict.

The only mutable data in git is the list of hashes you've given names to. You can use git without branch names if you want, living in a world of pure immutable hashes. It doesn't do anything to help you get rid of merge conflicts.
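
For reference, a small Python sketch of how git names a blob in its classic SHA-1 object format: the ID is the hash of a short header plus the file's bytes. The button snippets are invented for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID git assigns to a blob (classic SHA-1 object format)."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# Identical bytes always get the same ID; any edit yields a new object.
red = git_blob_id(b'button.color = "red"\n')
blue = git_blob_id(b'button.color = "blue"\n')
```

Both versions can sit in the object store side by side. Deciding which one a branch should point to is the part that can conflict.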


That seems to be incorrect. The linked page seems to suggest that both changes will exist without any conflict at the VCS level at all.


That's like saying Git has no conflicts because a file can have different contents in different commits.

A conflict can only arise when trying to unify (aka merge) two different states. The difference is the representation, with Git (naively explained) mapping file paths to text blobs, and Unison mapping identifiers to AST nodes.


Then how do you know which main function to use? That information has to be stored somewhere. Or does the user provide the hash for the main function they wish to run?


The files holding the ASTs are append-only. So you get:

PR #1: appended stuff

PR #2: appended stuff

So there is no merge conflict in the git sense.


It seems to me that if we both edit the same function then we end up with three functions in store (original + two modified). Now obviously I want the main function to call my new version function and you want the main function to call your version. So in the end there is a conflict somewhere - even in the git sense.

If that conflict always shows up in one single line saying what hash of the main function is the one to use - then it doesn’t seem like a huge improvement in terms of conflict handling. Conflicts bubble to the root hash. I assume they thought about this and have some sort of tooling for “traversing” conflicts from the root, choosing my function, your function, or allowing a merged function to be created.
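
A toy Python illustration of "conflicts bubble to the root hash", assuming each definition's hash covers the hashes of the definitions it references. The three-level `main -> render -> button` program and the hashing scheme are made up, not Unison's real format.

```python
import hashlib


def node_hash(body: str, deps: list[str]) -> str:
    """Hash a definition together with the hashes of what it references."""
    payload = body + "|" + ",".join(deps)
    return hashlib.sha256(payload.encode()).hexdigest()[:12]


def root_for(button_body: str) -> str:
    # Hypothetical three-level program: main -> render -> button.
    button = node_hash(button_body, [])
    render = node_hash("draw(button)", [button])
    return node_hash("render()", [render])


original = root_for("color = green")
mine = root_for("color = red")
yours = root_for("color = blue")
```

Editing only the leaf gives three different root hashes, so the disagreement surfaces as "which root is main?" even though no stored object was ever overwritten.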


> The files holding the ASTs are append-only. So you get: PR #1: appended stuff PR #2: appended stuff

But that description also applies to git. Which could not avoid having to deal with conflicts.


I disagree: in #1 and #2, the mapping between the key and the function name are different, so there is a conflict.


This is the right idea. Conceptually there are conflicts which get resolved in the Unison tooling, but you will never get Git conflicts.


If it works precisely as you say, then most likely the last committer wins, and that's bad.

The most hopeful case is that there is ONE place that isn't 'strictly append only', and that's, for lack of a better term, the 'main pointer'. Which hash is the actual state of the app as it should be compiled and deployed right now?

Then THAT conflict, together with tooling to diff 2 different hashes, should swiftly lead to figuring it out.

But then you're still kinda hosed; how do you merge 2 nodes? It sounds like you can't; you can only pick one. With git, you can do a text merge.

I get the beauty of AST based storage and reasoning, but, hey, trying to use git to merge 2 versions of a word doc together is also a disaster, so it sounds like it'd be the same here.

In that sense, unison is worse than your average (text based) language, not better.

That's not actually a downside though; it's different. If I demerited the language for this, that's akin to complaining about a delicious roasted bell pepper because it doesn't taste anything like an onion.

But I do find it a spot irksome that the otherwise well written tour is implying something here that doesn't appear to be true. Perhaps I'm merely misinterpreting it and reading more into that sentence than is implied.


> The most hopeful case is that there is ONE place that isn't 'strictly append only', and that's, for lack of a better term, the 'main pointer'. Which hash is the actual state of the app as it should be compiled and deployed right now?

I think that's how it works. The way I understand it from reading the tour, there is a separate store for the code and for the name mappings. Both main functions would merge without conflict into the code store, but the mappings would conflict.
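
A minimal Python sketch of that split, under the assumption that a codebase is just a pair of dicts: an append-only code store keyed by content hash, and a separate name table. The layout is invented for illustration, and the merge ignores the common ancestor for brevity.

```python
import hashlib


def store(code: dict[str, str], source: str) -> str:
    """Add a definition to an append-only store keyed by content hash."""
    digest = hashlib.sha256(source.encode()).hexdigest()[:12]
    code[digest] = source
    return digest


def merge(ours, theirs):
    """Merge two (code, names) codebases; return (code, names, conflicts)."""
    code = {**ours[0], **theirs[0]}   # same key implies same content
    names, conflicts = dict(ours[1]), {}
    for name, digest in theirs[1].items():
        if names.setdefault(name, digest) != digest:
            conflicts[name] = (names[name], digest)
    return code, names, conflicts


mine_code: dict[str, str] = {}
yours_code: dict[str, str] = {}
mine = (mine_code, {"main": store(mine_code, "button red")})
yours = (yours_code, {"main": store(yours_code, "button blue")})
code, names, conflicts = merge(mine, yours)
```

The code stores union without any clash, while `conflicts` reports that `main` now has two candidate hashes.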


This [1] seems to be the "head pointer", and I assume it would conflict when merging. In fact, would it not conflict on every merge even if unrelated code was changed, since it hashes on the value of the entire codebase?

Edit: After thinking some more, it wouldn't conflict on a file-level, since the file itself has a different name. But there would be now two files in the _head folder, and I assume the `ucm` tool would detect that and present the user with merging options?

[1]: https://git.io/Jvfen


Looks pretty practical to me? Here is a described situation:

Developer A made a change which adds feature X and makes a button blue.

Developer B made a change which adds feature Y and makes a button green.

Now the system does not allow both feature X and feature Y at the same time. Our options are either drop X, drop Y, or expend more effort/programming time to make a version which includes both X and Y.

The description above applies equally well to Unison code CAS system and regular program in git. Sure, one is at the file level, and other at the function level - but you still need same kind of tooling. You want to look at the changes and somehow produce a merged version.


> Yes, philosophically there is a conflict, which is resolved by you deciding which main() function you find more useful.

it's not just philosophical though. In git terms, when merging the two sets of changes, someone has to choose which of these functions goes onto the master branch. git calls this a "merge conflict".


No changes are ever made to an existing file, so there are never any merge conflicts. All changes result in a new object/file, named based on a content hash.

So in git terms, there are never any updates, only additions, and therefore never any merge conflicts.


Git doesn't store updates or additions. It only stores new objects/files, named based on a content hash.

When the git interface shows you 'updates', it's just looking at the contents of different commits and guessing how the files are related. It's not part of the git data model. You could apply the exact same processing to Unison commits.

So the systems work the same way. They make a new object/file based on parent objects. And when it can't create that new object/file automatically, that's called a "conflict".


Ok, if you want to be pedantic, let's be pedantic:

> No changes are ever made to an existing file,

I did not say or imply that there were. It's a new version, of course.

> so there are never any merge conflicts

Does not follow. There is only one HEAD (latest version) of foo.txt on the master branch. In a merge, it has two (or more) parent objects (1).

> All changes result in a new object/file ... therefore never any merge conflicts.

And yet somehow git calls the process of deciding what's in this new (version of the) file with two parents "resolving a merge conflict". Because the two parent changes can't be automatically reconciled. They are in conflict.

1) https://softwareengineering.stackexchange.com/questions/3142...


David Bowie.


> It is append-only: once a file in the .unison directory is created, it is never modified or deleted, and files are always named uniquely and deterministically based on their content.

That's how conflicts are avoided.


So I append X to my file and you append Y to your file, do we not have a conflict?


Yes and no. Yes as in, if you'd like them to be merged into a new one, then you'll have to resolve a conflict. But no as in, you both appended different data, so you both have new files from the original one, as you're dealing with immutable data.


If you want to resolve merges using the workflow of git, you can generate scratch files and diff them, right?


Yes TFGP. This is...blockchain programming. What I find (somewhat) amusing () is that you changed your name from TTGP to TFFP. Why did she do that to you?


"Unison: a new distributed programming language" by Paul Chiusano [1]

1: https://www.youtube.com/watch?v=gCWtkvDQ2ZI


This talk highlights the benefits and interesting features of Unison much better than the docs.

Highly recommended.


The video is from Sep 15, 2019 and still more useful than their website? Anyone have a transcript to link?


I find the content-addressed, name-independent, immutable (kind of blockchainy) approach to source code very interesting. On the other hand this seems to be rather orthogonal to other (semantic) aspects of the language.

I wonder what would happen if we would change one thing at a time to test the idea: create something like unison based on an existing popular programming language. (python, javascript, java, etc...) All the syntax handling, building and version control would be done the unison way, but the interpreter would be the same as today in that popular language. The benefit would be that a lot of existing code could be transferred into the initial 'blockchain', with all the current dependency hell being solved from day one. I did not think this through deeply though... Maybe some aspects of existing languages prevent them from being handled the 'unison way'?


I tried something like that in 2016 with Javascript. I parsed the source code and put every AST node on IPFS and linked it to the other node using the hashes. The problem was that this is a _lot_ of overhead...
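
As a rough idea of what that looks like, here is a sketch using Python's own `ast` module and an in-memory dict as stand-ins for a JavaScript parser and IPFS. The record format is invented.

```python
import ast
import hashlib

STORE: dict[str, str] = {}   # hash -> serialized node record


def put_node(node: ast.AST) -> str:
    """Store a node under the hash of its type, fields and child hashes."""
    parts = [type(node).__name__]
    for field, value in ast.iter_fields(node):
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, ast.AST):
                parts.append(f"{field}->{put_node(item)}")   # link by hash
            else:
                parts.append(f"{field}={item!r}")
    record = "(" + " ".join(parts) + ")"
    key = hashlib.sha256(record.encode()).hexdigest()[:16]
    STORE[key] = record
    return key


one = put_node(ast.parse("def f(x):\n    return x + 1\n"))
same = put_node(ast.parse("def f(x):  return x+1"))
other = put_node(ast.parse("def f(x):\n    return x + 2\n"))
```

Formatting differences vanish because only the tree is hashed, but even a two-line function expands into many small linked records, which hints at where the overhead comes from.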


Spent a year doing R&D into storing an "abstract syntax DAG" consisting of extremely rich syntax nodes (effectively entire programs/modules). That overhead you mentioned is why we decided that failure was a good result.


Interesting, have you published the results/source somewhere public? Would love to take a look at it.

If you're using a language that doesn't need an AST, or where the AST is the same as the source code, I'm sure we could pull more benefits out of this approach. Seems one would need a Lisp or Lisp-like language that is homoiconic.


I have done exactly that with a language called Pilot. It is not pure JS but based on JavaScript and all code is content addressable and indexed by the sha256 of the code


Google isn't turning up much. Is the project open source? Would love to take a look.


Sure, go to yazz.com and click on github to see the source


I'm guessing language purity is important when it comes to this. Translating the "unison way" to an imperative world of statefulness and random side effects could be rough.


You don't need it down to each function. Just having hashes at the #include/import level of the source code would be a win for optimizing build and test of big code bases. Most build systems today just operate on file timestamps, which have a lot of tricky cases where the build system gets out of sync and requires a make clean to get back into decent shape.
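
A small Python sketch of the file-level version of this idea: decide whether to rebuild by comparing content digests instead of timestamps. The `needs_rebuild` helper and the `util.h` file are hypothetical, and a real build tool would also track dependencies between files.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def needs_rebuild(sources: list[Path], cache: dict[str, str]) -> bool:
    """Return True if any source's content differs from its cached digest.

    Updates the cache as a side effect.
    """
    changed = False
    for path in sources:
        digest = file_digest(path)
        if cache.get(str(path)) != digest:
            cache[str(path)] = digest
            changed = True
    return changed


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "util.h"
    cache: dict[str, str] = {}
    src.write_text("int answer(void);\n")
    first = needs_rebuild([src], cache)      # never seen before
    src.write_text("int answer(void);\n")    # rewritten: same bytes, new mtime
    second = needs_rebuild([src], cache)
    src.write_text("long answer(void);\n")   # a real change
    third = needs_rebuild([src], cache)
```

Rewriting the file with identical bytes changes its timestamp but not its digest, so only the first and third checks ask for a rebuild.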


I have learned a lot from the creators Paul and Runar in the past. Their book on FP is legendary in some circles. Unison seems like a super ambitious undertaking, but I wouldn't bet against this team.


Algebraic effect handlers no doubt seem to be the future of getting side effects under control in programming languages, much as immutability has done for data.

My (admittedly little) experience with Unison, though, is that it's far from ready for the spotlight. Many of the docs pages 404, and joining their Discord mostly resulted in the advice to wait for future releases.

I wish Microsoft would invest in something like the koka language https://github.com/koka-lang/koka. Perhaps the idea of content-addressable / immutable code could find a place too, but I think it's less critical.


They already have F* for such type systems.

https://www.fstar-lang.org/


I've pondered using IPFS as a backend for content-addressable, immutable code blocks. That idea seems to fit in with the concept of structured or syntaxless programming, to assemble blocks into systems. Here's a list of the latter. https://www.reddit.com/r/nosyntax/


> I wish Microsoft would invest in something like the koka language

Given the ML-like syntax of Unison, it would be nice to see F# taking some of the key ideas.



I recently came across Unison through YouTube recommending me a series of videos from the "Strange Loop" channel [1]. The fundamental idea of uniquely addressing functions based on a hash of their AST is mind-blowing to me. Immediately my mind started to consider many of the possible paths such an idea could lead down, many of which are clearly tickling the minds of many of the commenters in this thread.

My first thought was the same insight from von Neumann architectures: code is data. So I thought of package repositories with URLs corresponding to hashes of function definitions. http://unison.repo/b89eaac7e61417341b710b727768294d0e6a277b could represent a specific function. A process/server could spin up completely dumb, bootstrapped only with the language and the ability to download the functions it needs. You could seed it with a hash URL pointing to the root algorithm and a hash URL for the data to run against, and it could download all of its dependencies and then run. I imagine once such a repo was up and running you could do something like a coroutine-like call, and that entire process just happens in the background, either on a separate thread or even scheduled to an entirely different machine in a cluster. You could then memoize that process such that the result of running repo-hash-url against data-hash-url is cached.

e.g. I have http://unison.myorg.com/func/b89eaac run against http://unison.myorg.com/data/c89889 and store the result at http://unison.myorg.com/cache/b89eaac/c89889
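
To make the memoization part concrete, here is a minimal in-memory sketch in Python. Everything in it is an assumption for illustration: the `Runner` class, the convention that a published function is named `run`, and hashing source text as a stand-in for hashing an AST. A real version would fetch from the repo URLs instead of a dict.

```python
import hashlib
import json


def text_hash(text: str) -> str:
    """Content address for a piece of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class Runner:
    """Toy worker that caches results by (function hash, data hash)."""

    def __init__(self):
        self.functions = {}  # function hash -> callable
        self.cache = {}      # (function hash, data hash) -> result
        self.executions = 0  # how many times real work was done

    def publish(self, source: str) -> str:
        """Store a function under the hash of its source text."""
        namespace = {}
        exec(source, namespace)  # assumed convention: source defines `run`
        key = text_hash(source)
        self.functions[key] = namespace["run"]
        return key

    def run(self, func_hash: str, data):
        """Run the function against the data unless the pair is cached."""
        data_hash = text_hash(json.dumps(data, sort_keys=True))
        key = (func_hash, data_hash)
        if key not in self.cache:
            self.executions += 1
            self.cache[key] = self.functions[func_hash](data)
        return self.cache[key]


if __name__ == "__main__":
    runner = Runner()
    total = runner.publish("def run(xs):\n    return sum(xs)\n")
    print(runner.run(total, [1, 2, 3]), runner.run(total, [1, 2, 3]))
    print("executions:", runner.executions)
```

The second call with the same function and data never executes anything, because the pair of hashes already has a cached result.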

[1] https://www.youtube.com/watch?v=gCWtkvDQ2ZI


This is cool.

I've wanted to see a Merkle tree based AST for faster incremental compilation (and incremental static analysis).

I wonder if the source manager interface makes this easier to use than editing source files?

The AST can still be the source of truth. To edit, you generate source text from the AST, edit it as text, parse it back to an AST, and save.

In other words, I think most of the benefits of this can live behind the scenes for other languages.
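
As a small illustration with an existing language, Python's standard `ast` module is enough to try the render, edit, parse loop and to hash the tree instead of the text. The helper names `ast_digest` and `edit_cycle` are mine, and this is a sketch of the idea rather than anything Unison does internally.

```python
import ast
import hashlib


def ast_digest(source: str) -> str:
    """Hash the parsed tree, so layout and comments do not affect the result."""
    tree = ast.parse(source)
    # ast.dump leaves out line and column attributes by default.
    return hashlib.sha256(ast.dump(tree).encode("utf-8")).hexdigest()


def edit_cycle(stored_source: str, edit) -> str:
    """Render the stored tree as text, apply a text edit, and parse it back."""
    rendered = ast.unparse(ast.parse(stored_source))  # needs Python 3.9+
    edited = edit(rendered)
    ast.parse(edited)  # raises SyntaxError, so a broken edit is never saved
    return edited


if __name__ == "__main__":
    original = "def area(w, h):\n    return w * h\n"
    reformatted = "def area(w,h):\n    # width times height\n    return w*h\n"
    print(ast_digest(original) == ast_digest(reformatted))
    swapped = edit_cycle(original, lambda text: text.replace("w * h", "h * w"))
    print(ast_digest(original) == ast_digest(swapped))
```

Reformatting or adding a comment leaves the digest alone, while a change to the expression itself produces a new one.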


I don't think you can have a proper Merkle tree if cycles are possible, and cycles definitely are possible in this AST representation: https://www.unisonweb.org/docs/faq#how-does-hashing-work-for...


I like the idea of syntax-tree rather than filesystem directories (if that's the main difference).

Seems like this could be implemented in other languages though, and I don't really understand why we need yet another new language.

For instance, can I not build my JavaScript code in similar ways: I would need a compiler that stores things in trees rather than dirs.


The big issue is the presence of cyclic dependencies between parts of code. If a piece of code is identified by its content, and its identifier is the Merkle root of its parts, then you can't possibly have cycles because you can't compute the hash of a parent before computing the hash of its children (which in turn depend on the parent). You have to use DAGs or plain old trees and eliminate any cycles.
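
A minimal Python sketch of why the cycle is a problem: each node's hash covers its own body plus the hashes of its children, so a node that is reached again while it is still being hashed has no answer. The function name and the `body`/`deps` dictionaries are invented for the example.

```python
import hashlib


def merkle_hash(node, body, deps, done=None, in_progress=None):
    """Hash a node from its body and its children's hashes.

    `body` maps node -> text; `deps` maps node -> list of child nodes.
    Raises ValueError if the node is part of a cycle.
    """
    done = {} if done is None else done
    in_progress = set() if in_progress is None else in_progress
    if node in done:
        return done[node]
    if node in in_progress:
        raise ValueError(f"cycle through {node!r}: no hash can be computed")
    in_progress.add(node)
    h = hashlib.sha256(body[node].encode("utf-8"))
    for child in deps.get(node, ()):
        child_hash = merkle_hash(child, body, deps, done, in_progress)
        h.update(bytes.fromhex(child_hash))
    in_progress.discard(node)
    done[node] = h.hexdigest()
    return done[node]


if __name__ == "__main__":
    body = {"a": "calls b and c", "b": "calls c", "c": "leaf"}
    print(merkle_hash("a", body, {"a": ["b", "c"], "b": ["c"]}))
    try:
        merkle_hash("a", body, {"a": ["b"], "b": ["a"]})
    except ValueError as err:
        print(err)
```

Shared children are fine (that is what makes it a DAG rather than a tree); only a path that leads back to a node still being hashed fails.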

So to get it to work with other languages, you would need a way to convert their abstract syntax into a DAG compatible structure. It may be possible to do so, but I doubt there is a general approach which will work across multiple existing languages. It would require per-language hacking, if it is even possible without rewriting some parts of code.

Seems like languages like OCaml and F# might be slightly more suitable for this approach because they require compilation units to be ordered by their dependencies, and thus promote a convention of avoiding cyclic dependencies between types (although it is possible to make such dependency in the same compilation unit with `type A and B`). They also allow recursive functions, but these should be simple enough to represent with a non-cyclic syntax tree.

If you design a language from scratch, you can simplify things by requiring that no cycles exist in the language - at least at the syntax tree level. You might be able to design representations for recursion at a higher level which collapse to non-cyclic structures in the AST.


"As a result, a Unison codebase can be versioned and synchronized with Git or any similar tool and will never generate a conflict in those tools"

Conflicts are a feature, not a bug. If two developers are making divergent changes in the same piece of code, I suppose Unison would just let the divergence happen, essentially forking the code. I don't think forking code is the right default.


Very cool. Glad to see someone tackling the idea I first saw brought up by Joe Armstrong[0]. When I read his post it made so much sense, I was hoping I'd get to see someone try it.

[0] https://joearms.github.io/published/2015-03-12-The_web_of_na...


from the link:

> Once we have the SHA1 name of a file we can safely request this file from any server and don't need to bother with security.

> For example, if I request a file with SHA1 cf23df2207d99a74fbe169e3eaa035e623b65d94 from a server then I can check that the data I got back was correct by computing its SHA1 checksum.

—-

that’s crazy! i’ve had the same thought myself, and wondered why no one had done something like that... i hope this becomes more and more common (though not using sha-1 ^^)
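
for anyone curious, the check from the quote is only a few lines. a sketch in python, with sha-256 swapped in for sha-1 and a plain dict standing in for the untrusted server (both are my assumptions, not from the post):

```python
import hashlib


def content_address(data: bytes) -> str:
    """Name a blob by the SHA-256 of its bytes."""
    return hashlib.sha256(data).hexdigest()


def fetch_verified(address: str, server: dict) -> bytes:
    """Accept bytes from an untrusted server only if they hash to the address."""
    data = server[address]
    if content_address(data) != address:
        raise ValueError("content does not match its address")
    return data


if __name__ == "__main__":
    address = content_address(b"hello")
    honest_server = {address: b"hello"}
    lying_server = {address: b"something else"}
    print(fetch_verified(address, honest_server))
    try:
        fetch_verified(address, lying_server)
    except ValueError as err:
        print(err)
```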


There are a bunch of systems implementing things like this, commonly referred to as "content-addressable storage", as the address of the content is based on the content itself. See https://en.wikipedia.org/wiki/Content-addressable_storage


that’s really cool, thanks for the link


No mention of Smalltalk in the comments? The "codebase" that is managed by ucm sounds very similar to Smalltalk images. I am personally delighted about this.


Content-addressed, immutable code? Reminds me of https://thedailywtf.com/articles/the-inner-json-effect


At first, this sounds like a great idea. On second thought, I have two major categories of concerns:

1) If I fix a bug in a function, (thus creating a new version of an existing function) how do I propagate that fix to all that code that transitively depends on the buggy version? Doesn't that mean I need to create new versions of all the downstream dependencies? Doesn't that defeat separate compilation? Does this scale?

2) Does this closely couple interface and implementation? Is it a hash of an implementation, or a hash of an interface? Is it possible to depend only on an interface, and not on an implementation?


> 1) If I fix a bug in a function, (thus creating a new version of an existing function) how do I propagate that fix to all that code that transitively depends on the buggy version? Doesn't that mean I need to create new versions of all the downstream dependencies? Doesn't that defeat separate compilation? Does this scale?

This can vary depending on how many dependants the code you change has. If you have some fundamental type which an entire codebase depends on, then modifying it will essentially require recomputing hashes for most of the codebase. On the other hand, if it's something close to the entry point, you would only need to recompute a few hashes and the root hash which represents your entry point. Any parts which are unchanged effectively have their hashes cached.
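
To put rough numbers on that, here is a small Python sketch that lists which definitions need a new hash after one of them changes. The dependency graph and the names in it are invented for the example; this is not Unison's actual algorithm.

```python
def needs_rehash(changed, deps):
    """Return the changed definition plus everything that transitively uses it.

    `deps` maps each definition to the definitions it refers to.
    """
    # Invert the graph: for each definition, who refers to it?
    dependents = {}
    for name, children in deps.items():
        for child in children:
            dependents.setdefault(child, set()).add(name)

    affected, stack = set(), [changed]
    while stack:
        current = stack.pop()
        if current in affected:
            continue
        affected.add(current)
        stack.extend(dependents.get(current, ()))
    return affected


if __name__ == "__main__":
    deps = {
        "main": ["parse", "render"],
        "parse": ["Token"],
        "render": ["Token"],
        "Token": [],
    }
    print(sorted(needs_rehash("Token", deps)))   # fundamental type
    print(sorted(needs_rehash("render", deps)))  # close to the entry point
```

Changing `Token` touches every definition in this toy graph, while changing `render` touches only `render` and `main`.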

> 2) Does this closely couple interface and implementation? Is it a hash of an implementation, or a hash of an interface? Is it possible to depend only on an interface, and not on an implementation?

I don't think unison has interfaces or any similar construct yet. There's an FAQ item mentioning why type classes are not included[1] yet.

Personally I think the right approach would be to use a structural type system similar to OCaml's. You would have functors which describe the code you're calling, and call sites would use the hashes of the functor definition. You can then define concrete representations of types independently of the functor, and instantiate them against the functor afterwards. Thus, if you change some implementation detail in a type, then the only hashes that will need updating are those of your concrete type and the instantiation of the functor against your type. The functor, and any code which depends on it, will remain unchanged.

[1]:https://www.unisonweb.org/docs/faq


Hell yea, we can finally move forward with editors now. No more saving files with syntax errors. I imagine you could move a lot of paredit functionality into an editor like this, or have excellent voice editing.


A TLDR for the tour[0] based on my understanding (please fix me if I’m wrong):

Unison is a functional language that treats a codebase as a content-addressable database[1] where every ‘content’ is a definition. In Unison, the ‘codebase’ is a somewhat abstract concept (unlike other languages where a codebase is a set of files) where you can inject definitions, somewhat similar to a Lisp image.

One can think of a program as a graph where every node is a definition and a definition’s content can refer to other definitions. Unison content-addresses each node and aliases the address to a human-readable name.

This means you can point a name at another definition, and since Unison knows the node a human-readable name is aliased to, you can find exactly every use of a name and replace it with another node. In practice I think this means very easy refactoring, unlike today’s programming languages where it’s hard to find every use of an identifier.

I’m not sure how this can benefit in practical ways, but the concept itself is pretty interesting to see. I would like to see a better way to share a Unison codebase though, as it currently is only shareable in a format that resembles a .git folder (as git also is another CAS).

[0]: https://www.unisonweb.org/docs/tour

[1]: https://en.wikipedia.org/wiki/Content-addressable_storage

[2]: https://github.com/unisonweb/quickstart

PS: The website is very well made, but I would like to comment that there should be two buttons on the top front page, the tour and installation, as I think most people want to have a glimpse of the PL before installing. I, at least was puzzled because I thought the button was a link to an introduction of the language.


Wondering if Unison was designed as (or ever planned to become) a general-purpose language, or is it something more domain-specific, e.g. distributed computations?


This is seriously impressive. I've always felt that some kind of database system is needed for code in which you store each function individually. Unison not only has that, it goes one step further to identify a function by its hash. I can't wait to see what the world of programming can become along this direction.


Append only is nice, but I would think a "Garbage Collect" would be useful. Remove all the code not attached to the import tree anymore. Otherwise you will just keep collecting junk forever. Honestly git has the problem too, but I expect orphaned/dead code will happen a lot more with this approach.


Git has this problem only as far as the developer's storage of the history of all files is concerned. That "junk" doesn't remain in the final program.

You can have a decades-old git repository containing hundreds of MBs of files, but output a 2kB program. Someone correct me, but it sounds like if you had the same thing with a unison project, the actual executable size can only grow?


My assumption is the content store can only grow, but any executable produced from it would only contain live code.
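
Sketching that assumption in Python: the store keeps everything, and a build walks references from the entry point and keeps only what it reaches. The definition names are hypothetical, and I don't know whether Unison's tooling does it exactly this way.

```python
def live_definitions(root, refs):
    """Collect every definition reachable from the entry point.

    `refs` maps each definition to the definitions it refers to.
    """
    live, stack = set(), [root]
    while stack:
        current = stack.pop()
        if current not in live:
            live.add(current)
            stack.extend(refs.get(current, ()))
    return live


if __name__ == "__main__":
    store = {
        "main": ["render"],
        "render": ["escape"],
        "escape": [],
        "old_render": ["escape"],  # superseded, but still in the store
    }
    print(sorted(live_definitions("main", store)))
```

Here `old_render` stays in the append-only store but never makes it into the set that would be shipped.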


I suppose you could do it for the top level application code, but all past definitions of libraries must remain. That's the thing that allows safe updates of shared dependencies.


Was thinking of writing this year's Advent of Code in Unison but changed my mind because of its very young age and switched to Zig instead. But I'll very much consider it for next year. I really think some of its ideas are, if nothing else, super cool and worth learning and thinking about.


Does anyone know how concurrency is handled in Unison? I couldn't find any references in the docs.


There's an example in this talk: https://youtu.be/gCWtkvDQ2ZI?t=1903

Basically you can quote code (like in Lisp) and pass it to an executor (thread, remote server, ...).

The main difference is that thanks to the immutability, you can also have a protocol which shares efficiently all the dependencies required to execute the code (like a git pull).

It also uses an effect system, so the implementation does not need to decide which executor to use, it only uses an interface called an "ability", and the caller can choose the implementation.
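
A loose analogy in Python, in case it helps: the function below only describes the work and takes the "executor" as a parameter, and the caller decides whether that means running inline or on a thread pool. This is plain dependency passing, not an effect system, and the names (`sum_of_squares`, `run_inline`, `run_on_threads`) are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(numbers, run_all):
    """Describe the work; `run_all` decides where and how the tasks run."""
    tasks = [lambda n=n: n * n for n in numbers]
    return sum(run_all(tasks))


def run_inline(tasks):
    """Handler that runs every task in the calling thread."""
    return [task() for task in tasks]


def run_on_threads(tasks):
    """Handler that runs the tasks on a small thread pool."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda task: task(), tasks))


if __name__ == "__main__":
    print(sum_of_squares([1, 2, 3], run_inline))
    print(sum_of_squares([1, 2, 3], run_on_threads))
```

Both calls give the same result; only the caller's choice of handler differs.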


Maybe I'm missing something, but this seems like it would get unwieldy very quickly for anything more serious than toy examples.

Even the toy examples I'm having trouble following along with.

The benefit of code-as-text is that you can navigate the text without knowing what you are looking for.

This seems to take the opposite approach of making it very easy to look things up assuming you know ahead of time what you are looking for.

That is something that a good usage of grep or some more sophisticated IDE integration will already take care of for you.

The only thing I could see this being potentially useful for is "notebook" style computation, where you are really only using the language to do things like structured math calculations.

What am I missing here?


> What am I missing here?

I think IDE tooling for Unison could pretty easily give you the exact same experience. Generate code from the AST, group it reasonably, and give you a view that is indistinguishable from a traditional source tree.

It might be non-trivial to structure things nicely, but I think there's a good chance your generated structure would be better organized than a traditional codebase.


For a good introduction, watch this talk at 1.5x speed from 2:50 to 5:40

https://youtu.be/gCWtkvDQ2ZI?t=170


I really like the idea, I like the language, but the world will not be rewritten in Unison. A web assembly FFI story is needed for Unison to succeed.


Is this satire? Web assembly has no real adoption on the browser yet, is waiting for a standard I/O interface on the desktop to be useful and is itself the nth iteration of an old concept.


Web assembly is the first time a binary assembly has been adopted as an open web standard. What do you propose as an alternative safe sandbox for other languages?

Large quantities of code can already be compiled to WASM. Since Unison works best with pure functions, most system interfaces would presumably be emulated. This emulation is already possible with Browsix.


Unity targets WASM. I've run into several browser games using it in the wild.



> A friendly programming language from the future.

I loathe marketing BS like this. No, you’re not from the future, you pretentious twats.

If you have a good idea, let it stand on its own merit, don’t dress it up with manipulative language.


Is the Unison codebase manager written in Unison?

What other user-facing open-source software has been written in Unison?

Which of those programs has been used for months or years by at least one person?


I found this rather incomprehensible.


Maybe I've been out of the game or something. But I don't think saying everything is "content-addressed" can land as such a big mic-drop moment when I've never heard the term or the concept in my life and the front page does absolutely nothing to explain it beyond that.

------ EDIT: My tone here is a little more antagonistic than I meant (proofread kids, don't just check for spelling and publish) but my first impression was a kind of "buzzword-y" vibe. I've looked a bit more and the authors seem genuinely passionate about shaking things up and trying something completely different, which I can respect. I'll leave it up as is because I still think my issues are legitimate, but the tone is a little off and I apologize. ------

Let's take my discovery path so far as an example:

> Functional language: cool cool. I know what that is. Immutable: also cool. Content-addressable: no clue. Saying it'll simplify code bases and eliminate dependency and "no builds" is supposed to sound amazing, but it really doesn't land when I have no clue what the premise is.

> Alright, that's it for the front page. No code examples, nothing linking the currently known to this new paradigm-shifting stuff. K. Let's try the docs.

> The Welcome is the state of the project without saying anything about the language itself. Roadmap isn't gonna help. Quickstart! Let's look there.

This amount of effort for discovery isn't very helpful for me. I found a selection of talks elsewhere on the site. One of them is labeled "This is a longer (40 min) introduction to the core ideas of Unison and probably the best talk to start with." That might be a nice link for the front page.

I unfortunately don't have any more time to look at this this morning, but the first 5 or so minutes of the talk say waaaaaaaaay more than the first however long I spent. Honestly, I think all of that information (code examples, linking core ideas to what we already know, basis of comparison) could be on the front page.

The tl;dr I can glean from it is:

It's kinda like Haskell, but every definition is turned into an AST and put in a hash table. Any references are secretly references into that table, so changing a definition means making a new hash and updating the reference, and renaming means changing an entry in the name table.

Also, everything is append instead of rewrite, which makes all kinds of stuff cleaner, like versioning and refactoring.
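
Here's my understanding as a toy Python model. It's only a sketch: the `Codebase` class is made up, and I'm hashing the definition text where the real thing hashes the AST.

```python
import hashlib


class Codebase:
    """Toy model: an append-only hash table plus a separate name table."""

    def __init__(self):
        self.definitions = {}  # hash -> definition text, never overwritten
        self.names = {}        # human-readable name -> hash

    def add(self, name: str, body: str) -> str:
        """Store a definition under its hash and point the name at it."""
        key = hashlib.sha256(body.encode("utf-8")).hexdigest()
        self.definitions.setdefault(key, body)
        self.names[name] = key
        return key

    def rename(self, old: str, new: str) -> None:
        """Renaming only touches the name table; no definition changes."""
        self.names[new] = self.names.pop(old)

    def lookup(self, name: str) -> str:
        return self.definitions[self.names[name]]


if __name__ == "__main__":
    cb = Codebase()
    first = cb.add("inc", "x -> x + 1")
    cb.rename("inc", "increment")
    second = cb.add("increment", "x -> 1 + x")  # "changing" the definition
    print(cb.lookup("increment"))
    print(first in cb.definitions, second in cb.definitions)
```

The old version is still in the table after the update, which is the append-only part.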

Pretty neat idea, I just don't really like how long it took me to come to that knowledge. I don't know how well it would scale or be usable in a real world scenario, but I also have never used Haskell or any functional language in a "real world" scenario so I'm not the best one to ask.


Hey; Unison author here—

Definitely appreciate this honest and helpful, specific docs feedback, so thanks for taking the time to put it together. We do want the value of Unison to be straightforward given even a quick skim of the site, so: sorry about the current state, we'll try to improve it. :)


Just to add to that - I’m kind of in the same boat here. I had to read the homepage and intro sections a few times and I’m honestly still a bit lost in understanding who this benefits and why.

Just by way of anchoring: I’m no dunce, I’m a typical nerdy guy who knows his UNIX and has written lots of C, Python, JavaScript, Ruby, etc. Written some mobile apps. Written plenty of web stuff. But I’m not immersed in any academic language study and not really an FP guy.

I don’t think you should dumb this down - the text itself is very good prose. But the context is missing something — perhaps a paragraph or two of anchoring for us “normal programmers” to explain why this is relevant to people who normally write Python web apps or Rust utilities or C++ games all day long (or if it’s not relevant, why, and who it is relevant for).

Good luck with the project!


No worries. I guess it sounded a bit antagonistic, I apologize for that. I am actually interested in the idea, I was just a little frustrated that it was hard to figure out what was different about the language.


It just sounded honest. ¯\_(ツ)_/¯

It's mainly just an issue of manpower on our end (we do accept doc contributions, but probably we should be doing it); having fresh reader eyes on it like this is still valuable though.


> It's mainly just an issue of manpower on our end (we do accept doc contributions, but probably we should be doing it); having fresh reader eyes on it like this is still valuable though.

That's absolutely fair, and tbh probably the problem. You've probably been thinking about the concept for so long that it doesn't occur to you that the phrase isn't that well known.

I remember a multi-year project in college I was trying to explain to someone; I blew right through "potential energy surface" and lost him completely. I didn't realize it was something I'd have to explain.


Content-addressable is admittedly perhaps a little bit niche. If you spend some time in the version control/distributed data structures space you will have run into it. But if you haven't, then you will perhaps never have encountered it. Your description of a hash table and references is a pretty close description of the idea at its core.


>>code is content-addressed and immutable.

No, as I keep insistently repeating: Code is content-immutable and addressed.

Also, Urbit is a lie, Surface detail is true and you are a lie and your code is retarted.


The language of the month... (Can't wait for February)


Looking at their GitHub [1], it's been actively developed and open source since 2013.

[1] https://github.com/unisonweb/unison


The name is already pretty taken....

https://en.wikipedia.org/wiki/Unison_(disambiguation)


That list looks like it's very available to me? The only overlap with software is two unmaintained projects.


It's incomplete. There's at least the Unison file synchronizer as well:

https://www.cis.upenn.edu/~bcpierce/unison/


And the Unison programming language:

https://www.unisonweb.org



