That said, I had to dig through the documentation to find code, and it didn't seem as obvious, because the syntax is quite different from what I'm used to and the syntax highlighting is weak.
It doesn't work in practice, since most languages eventually discard parser generators for custom parsing, and good tooling like syntax highlighters doesn't rely on the literal grammar but on something close enough to it to work and still be useful for programmers.
A classic example: you don't want your parser to be strictly defined and fail on any error. You want it to handle errors gracefully and detect almost-correct syntax.
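A toy sketch of that property in Python (made-up token stream and grammar, just to illustrate recovery):

```python
# A parser that records an error and keeps going, instead of bailing out
# on the first problem - the property you want for editor tooling.
def parse_statements(tokens):
    statements, errors, i = [], [], 0
    while i < len(tokens):
        # Only rule in this toy grammar: "let <name> <value>"
        if tokens[i] == "let" and i + 2 < len(tokens) and tokens[i + 1].isidentifier():
            statements.append(("let", tokens[i + 1], tokens[i + 2]))
            i += 3
        else:
            # Almost-correct syntax: note the problem, skip a token, resume.
            errors.append(f"unexpected token {tokens[i]!r} at position {i}")
            i += 1
    return statements, errors

stmts, errs = parse_statements(["let", "x", "1", "???", "let", "y", "2"])
# Both bindings are recovered; the stray token is reported, not fatal.
```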
Having written some TextMate grammars for syntax highlighting, I think the post is ridiculous. I'd be terrified to generate that JSON from the parser, since real thought and experimentation have to go into getting a good syntax highlighter that works with external tooling.
I also think an LSP implementation is really important, but writing one in your language or generating it from the compiler is a monumental task for a young language and a terrible idea.
The Nim language in 2008 (when it was known as Nimrod) originally planned a similar approach: a unified AST with multiple "syntax skins," and as far as I understand there used to be a limited implementation of this. This would theoretically have been a boon for Nim, because a frequent complaint about Nim from programmers coming from languages with C-style syntax was the (rather superficial) "no braces? Pythonic syntax? Yuck!"
The problem with this - and why Nim never really committed to the "syntax skin" concept - is that it would have led to far too much fragmentation within Nim's own community, as some users would prefer one "sub-language" over another. Given the inclination of new languages (Nim included) toward tool-enforced unified syntax formatting (e.g. fixed naming conventions for identifiers, a fixed number of spaces for indentation), "syntax skins" have become a harder sell than lingual consistency.
The original idea was that the same program could be rendered in a variety of different syntaxes in the author's IDE - but the implied maintenance cost of such skins, plus the previously mentioned potential for fragmentation, led to this idea falling by the wayside. Nim, as a language, grew to be quite flexible within its primary syntax over the years - e.g. the macro system (which modifies the AST directly) made way for a variety of DSLs (e.g. https://github.com/khchen/wNim#code-examples) that remain valid Nim code.
Wolfram Mathematica (or "the Wolfram Language") introduced "forms" for that; for instance https://reference.wolfram.com/language/ref/FullForm.html or https://reference.wolfram.com/language/ref/InputForm.html or, quite charming in the math context, https://reference.wolfram.com/language/ref/TraditionalForm.h... or https://reference.wolfram.com/language/ref/TeXForm.html -- see also https://reference.wolfram.com/language/tutorial/EverythingIs...
Because when I see Chinese characters I know immediately if it's interesting to learn Mandarin or not.
A small sample on the front page will not tell me if a language is good enough, but it can give me an idea if it's worth spending more time reading up on it or not.
For instance, C and Go are easy because they do not hide anything, and what one sees is pretty much unaffected by what one does not see. On the other hand, because those languages do not let you build efficient abstractions, they are also harder to read, because each given sample, although unambiguous, performs very little.
Other languages like C++ (despite a syntax close to that of C) or Haskell allow even the simplest operators, including the application and sequencing operators, to behave in unfamiliar ways or hide subtle but meaningful details, which makes understanding any small bit of code a gamble (one has to assume some behavior for operators, constructors...). On the other hand, they let you build abstractions that make programs more terse and therefore easier to read.
In this trade-off, none of the alternatives depends significantly on the syntax.
I think a lot of people have come to associate obscure syntax with a bad programming experience because of Perl, the language whose syntax became synonymous with bad. Indeed, this is a case where the syntax can be treacherous, but the language is made even more problematic by its semantics, which try hard to give a meaning to any use of any variable in any context, thus delaying runtime errors even more than other dynamically typed languages do, and even masking them entirely (the program then does something other than what was intended).
I think this mental association must be done away with.
There are of course actual blockers to readability, like usage of similar-looking sigils and poor system library naming, but otherwise the syntax really doesn't tell you any useful information. It's the semantics that tell you how it'll all fall apart.
It doesn't matter that much when I have found a piece of code that I need to read and understand in every detail, but that is a tiny proportion of the code reading I do - most of the time, reading code is about grasping the overall structure, learning a codebase well enough to find the points you need to focus on and read closely.
A clean syntax makes the difference between always having to read each token vs. being able to speed-read by pattern matching high level structure when you don't need the details of that specific piece of code.
I've worked with dozens of languages, with wildly differing syntax, and I'll say that once you reach a certain base capability-level, syntax matters more to me than semantics.
You can paper over a lot of semantic difference.
I can build libraries to do object-oriented programming in C, for example (and have, and so have many others), and I have implemented closures in C too. So the reason I don't do much C any more is not semantics, but that there is no mechanism for plugging those semantic gaps in a way that produces readable code.
There are languages where the semantics paints you into a corner and makes plugging the holes too hard, so it's not that semantics doesn't matter at all. But it's usually easier to paper over semantic gaps than unreadable syntax.
This is completely subjective and not at all universal.
Frankly, syntax has little to no bearing on how well a given language fits a problem domain, and in my experience the only thing syntax really matters for is how attractive the language is to programmers who judge languages by their cover.
I remember reading somewhere that the initial versions of Lisp were missing an actual syntax, and that it was planned to add one later, but that until then users would have to write the AST directly, in an ugly syntax full of parentheses and devoid of familiar syntactic landmarks. Well, time passed, and nobody cared enough to add that user-friendlier syntax :)
You can also figure out if it's C-like or Lisp-like, and so on from looking at syntax.
I’m perfectly fine with a page that says, “here’s a code sample - notice that it looks similar to C, but here’s a list of reasons / an explanation of why/how it’s actually quite different from C.”
One of my favorite programming books of all time is K&R, from back before ANSI, even. It said, “here’s a simple problem to solve, and here’s how you solve it in our language.” Repeatedly. Explaining how the features helped to solve the problem at hand. It was revelatory. And it assumed its audience to be knowledgeable programmers. It didn’t spend the first chapter just describing all the features or how great the language is without showing any code.
So you can deduce something from the line count of a simple program alone - for one, it answers the question "are there at least some attempts to provide useful defaults?"
It's almost like taking test drives when you go car shopping. You don't want to open the hood and check what's inside right away; you just want to be able to drive the car.
A reasonable example of a good page that is informative and provides lots of examples (enough at least!) is the one for D:
Of course, after clicking your link, I am wondering how wrong I am, since it is still technically general-purpose capable. Wow, Julia has grown so much!
The way I think of it, in order to build a language that was flexible enough to cover all these different weird needs you find in the scientific computing community, they ended up needing to build a general purpose language.
Beyond just the idea of a global program store (which I hope has space for some virtualenv equivalent), the cleverness of effect handlers as a first-class language feature is very exciting.
Once the type system is more mature, this could easily be the next Haskell - a language which redefines modern ideas of good programming.
Forth ideas: a global dictionary with pre-existing and user-definable words
Haskell ideas: functional programming and type signatures of functions / words
Clojure/Datomic ideas: an immutable store of definitions that appends rather than overwrites, using references to point at the latest versions.
I am curious how I/O and error handling are done in Unison.
The short summary is that you can write code with side-effects that you don't define the implementation for, and then let call sites handle those effects or defer them.
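Unison's actual ability syntax is different, but a rough Python analogue of the shape of the idea (names here are made up) might be:

```python
# The function *requests* a logging effect but doesn't implement it;
# each call site decides what the effect actually does.
def greet(name, log):                 # `log` stands in for an unhandled ability
    log(f"greeting {name}")           # the effect is performed abstractly
    return f"hello, {name}"

greet("world", log=print)             # handle immediately: write to stdout
deferred = []
greet("world", log=deferred.append)   # or defer: record effects to run later
```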
One problem I did find: somewhat deeper in, the tour makes this claim:
> a Unison codebase can be versioned and synchronized with Git or any similar tool and will never generate a conflict in those tools.
Practically speaking this is of course not true: in a Unison codebase, even though under the hood it's all hashes, that doesn't change the simple fact that if two people both check out the exact same code, and then both make a change to how the software operates, where those changes conflict (say, I make the green 'OK' button red, and you make it blue), you... have a conflict. That is a fundamental property of this interaction; no language bells and whistles can ever change this fact.
So, what does happen in Unison? Is it 'last committer wins'? That'd be a bit problematic. A conflict in your version control system, _IF_ it is representative of the fact that there was an actual conflict (as above, with the button and the conflicting colours), is a good thing, not a bad thing.
I have not fully read the introduction yet, but in my mind, in a truly content-addressed system this is not a conflict:
you have the hash of a main() function which ultimately makes the button red, and the other guy has a main() function that makes the button blue. No conflict in the physical sense. Yes, philosophically there is a conflict, which is resolved by you deciding which main() function you find more useful.
Developer A makes the button red, developer B changes the label from "OK" to "Accept", both inside the same function.
You can't just pick one or the other, you have to combine the changes.
Or two developers add two different fields to a type. Or ...
Sadly the docs only have placeholders for "Concurrent work and resolving edit conflicts" and "Pull requests and collaboration". Would be very interesting to read the developers thoughts on this.
The system seems to support patches, forking and merging.
I think the implication is that version control is not git like, and that it's not that we both changed the text on line 17, rather we both made additions to the underlying structure.
Indeed, it's impossible for us to both edit the same file, because files are never modified. They are immutable, like a blockchain transaction.
I haven't quite understood how it works in practice, though, but definitely don't think "git".
Storing data this way doesn't solve the problem of merging the two changes into a new single change. When you can't do that automatically, it's called a conflict.
The only mutable data in git is the list of hashes you've given names to. You can use git without branch names if you want, living in a world of pure immutable hashes. It doesn't do anything to help you get rid of merge conflicts.
A conflict can only arise when trying to unify (aka merge) two different states. The difference is the representation, with Git (naively explained) mapping file paths to text blobs, and Unison mapping identifiers to AST nodes.
PR #1: appended stuff
PR #2: appended stuff
So there is no merge conflict in the git sense.
If that conflict always shows up in one single line saying what hash of the main function is the one to use - then it doesn’t seem like a huge improvement in terms of conflict handling. Conflicts bubble to the root hash. I assume they thought about this and have some sort of tooling for “traversing” conflicts from the root, choosing my function, your function, or allowing a merged function to be created.
But that description also applies to git. Which could not avoid having to deal with conflicts.
The most hopeful case is that there is ONE place that isn't 'strictly append only', and that's, for lack of a better term, the 'main pointer'. Which hash is the actual state of the app as it should be compiled and deployed right now?
Then THAT conflict, together with tooling to diff 2 different hashes, should swiftly lead to figuring it out.
But then you're still kinda hosed; how do you merge 2 nodes? It sounds like you can't; you can only pick one. With git, you can do a text merge.
I get the beauty of AST based storage and reasoning, but, hey, trying to use git to merge 2 versions of a word doc together is also a disaster, so it sounds like it'd be the same here.
In that sense, unison is worse than your average (text based) language, not better.
That's not actually a downside though; it's different. If I demerited the language for this, that's akin to complaining about a delicious roasted bell pepper because it doesn't taste anything like an onion.
But I do find it a spot irksome that the otherwise well written tour is implying something here that doesn't appear to be true. Perhaps I'm merely misinterpreting it and reading more into that sentence than is implied.
I think that's how it works. The way I understand it from reading the tour, there is a separate store for the code and for the name mappings. Both main functions would merge without conflict into the code store, but the mappings would conflict.
Edit: After thinking some more, it wouldn't conflict on a file-level, since the file itself has a different name. But there would be now two files in the _head folder, and I assume the `ucm` tool would detect that and present the user with merging options?
Developer A made a change which adds feature X and makes a button blue.
Developer B made a change which adds feature Y and makes a button green.
Now the system does not allow both feature X and feature Y at the same time. Our options are either drop X, drop Y, or expend more effort/programming time to make a version which includes both X and Y.
The description above applies equally well to Unison's code CAS and a regular program in git. Sure, one is at the file level and the other at the function level - but you still need the same kind of tooling. You want to look at the changes and somehow produce a merged version.
it's not just philosophical though. In git terms, when merging the two sets of changes, someone has to choose which of these functions goes onto the master branch. git calls this a "merge conflict".
So in git terms, there are never any updates, only additions, and therefore never any merge conflicts.
When the git interface shows you 'updates', it's just looking at the contents of different commits and guessing how the files are related. It's not part of the git data model. You could apply the exact same processing to Unison commits.
So the systems work the same way. They make a new object/file based on parent objects. And when it can't create that new object/file automatically, that's called a "conflict".
> No changes are ever made to an existing file,
I did not say or imply that there were. it's a new version, of course.
> so there are never any merge conflicts
Does not follow. There is only one HEAD (latest version) of foo.txt on the master branch. In a merge, it has two (or more) parent objects (1).
> All changes result in a new object/file ... therefore never any merge conflicts.
And yet somehow git calls the process of deciding what's in this new (version of the) file with two parents "resolving a merge conflict". Because the two parent changes can't be automatically reconciled. They are in conflict.
That's how conflicts are avoided.
I did not think this through deeply, though... Maybe some aspects of existing languages prevent them from being handled the 'unison way'?
If you're using a language that doesn't need an AST / where the AST is the same as the source code, I'm sure we could pull even more benefits out of this approach. Seems one would need a Lisp or Lisp-like language that is homoiconic.
My (admittedly little) experience with Unison though is that it's far from ready for the spotlight however. Much of the docs 404, and joining their discord mostly resulted in the advice to wait for future releases.
I wish Microsoft would invest in something like the koka language https://github.com/koka-lang/koka. Perhaps the idea of content-addressable / immutable code could find a place too, but I think it's less critical.
Given the ML-like syntax of Unison, it would be nice to see F# taking some of the key ideas.
My first thought was the same insight from von Neumann architectures: code is data. So I thought of package repositories with URLs corresponding to hashes of function definitions. http://unison.repo/b89eaac7e61417341b710b727768294d0e6a277b could represent a specific function. A process/server could spin up completely dumb, bootstrapped with nothing but the language and the ability to download necessary functions. You could seed it with a hash URL pointing to the root algorithm to employ and a hash URL to the data to run against, and it could download all of its dependencies and then run. I imagine once such a repo was up and running you could do something like a co-routine-style call where that entire process just happens in the background, either on a separate thread or even scheduled to an entirely different machine in a cluster. You could then memoize that process such that the result of running repo-hash-url against data-hash-url is cached.
e.g. I have http://unison.myorg.com/func/b89eaac run against http://unison.myorg.com/data/c89889 and store the result at http://unison.myorg.com/cache/b89eaac/c89889
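A minimal sketch of that memoization idea in Python, with dicts standing in for the repo and cache servers (everything here is invented for illustration):

```python
import hashlib

store = {}  # hash -> function source or data blob (the "repo")
cache = {}  # (func_hash, data_hash) -> result (the "/cache/..." endpoint)

def put(blob: bytes) -> str:
    key = hashlib.sha256(blob).hexdigest()
    store[key] = blob
    return key

def run(func_hash: str, data_hash: str):
    # Same code hash + same data hash => the result can be served from cache.
    if (func_hash, data_hash) not in cache:
        env = {}
        exec(store[func_hash].decode(), env)   # "download" and load the code
        cache[(func_hash, data_hash)] = env["main"](store[data_hash])
    return cache[(func_hash, data_hash)]

f = put(b"def main(data): return len(data)")
d = put(b"some input bytes")
assert run(f, d) == run(f, d) == 16            # second call hits the cache
```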
I've wanted to see a Merkle tree based AST for faster incremental compilation (and incremental static analysis).
I wonder if the source manager interface makes this easier to use than editing source files?
The AST can still be the source of truth. To edit, you generate source text from the AST, edit it as text, parse it back to an AST, and save.
In other words, I think most of the benefits of this could live behind the scenes for other languages.
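Python's own ast module (3.9+, for ast.unparse) can demo that round trip:

```python
import ast

tree = ast.parse("def double(x):\n    return x * 2")  # AST as source of truth
text = ast.unparse(tree)               # render the AST to text for editing
text = text.replace("x * 2", "x + x")  # edit as plain text
tree = ast.parse(text)                 # parse back and "save" the new AST
```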
Seems like this could be implemented in other languages, though, and I don't really understand why we need yet another new language.
So to get it to work with other languages, you would need a way to convert their abstract syntax into a DAG compatible structure. It may be possible to do so, but I doubt there is a general approach which will work across multiple existing languages. It would require per-language hacking, if it is even possible without rewriting some parts of code.
Seems like languages like OCaml and F# might be slightly more suitable for this approach because they require compilation units to be ordered by their dependencies, and thus promote a convention of avoiding cyclic dependencies between types (although it is possible to create such a dependency in the same compilation unit with `type A and B`). They also allow recursive functions, but these should be simple enough to represent with a non-cyclic syntax tree.
If you design a language from scratch, you can simplify things by requiring that no cycles exist in the language - at least at the syntax tree level. You might be able to design representations for recursion at a higher level which collapse to non-cyclic structures in the AST.
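One way such a collapse could work (a toy sketch, not how Unison actually does it): replace self-references with a placeholder before hashing, so a recursive definition still hashes as an acyclic value.

```python
import hashlib

SELF = "$self"  # sentinel standing in for the function's own, unknown hash

def hash_recursive(name: str, body: str) -> str:
    # Self-references can't be hashed directly (the hash would depend on
    # itself), so swap them for a fixed placeholder first.
    return hashlib.sha256(body.replace(name, SELF).encode()).hexdigest()

h = hash_recursive("fact", "fact n = if n == 0 then 1 else n * fact (n - 1)")
# The definition's hash no longer depends on its own name at all.
```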
Conflicts are a feature, not a bug. If two developers are making divergent changes in the same piece of code, I suppose Unison would just let the divergence happen, essentially forking the code. I don't think forking code is the right default.
> Once we have the SHA1 name of a file we can safely request this file from any server and don't need to bother with security.
> For example, if I request a file with SHA1 cf23df2207d99a74fbe169e3eaa035e623b65d94 from a server then I can check that the data I got back was correct by computing its SHA1 checksum.
that’s crazy! i’ve had the same thought myself, and wondered why no one had done something like that... i hope this becomes more and more common (though not using sha-1 ^^)
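for concreteness, the check is a few lines of Python (mirror URL hypothetical, and you'd want sha-256 today):

```python
import hashlib, urllib.request

def fetch_by_hash(url: str, expected_sha1: str) -> bytes:
    # Fetch from *any* untrusted mirror, then verify the content actually
    # matches its address before trusting it.
    data = urllib.request.urlopen(url).read()
    if hashlib.sha1(data).hexdigest() != expected_sha1:
        raise ValueError("content does not match its address - reject")
    return data
```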
1) If I fix a bug in a function, (thus creating a new version of an existing function) how do I propagate that fix to all that code that transitively depends on the buggy version? Doesn't that mean I need to create new versions of all the downstream dependencies? Doesn't that defeat separate compilation? Does this scale?
2) Does this closely couple interface and implementation? Is it a hash of an implementation, or a hash of an interface? Is it possible to depend only on an interface, and not on an implementation?
This can vary depending on how many dependants the changed code has. If you have some fundamental type which an entire codebase depends on, then modifying it will essentially require recomputing hashes for most of the codebase. On the other hand, if it's something close to the entry point, you would only need to recompute a few hashes and the root hash which represents your entry point. Any parts which are unchanged effectively have their hashes cached.
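A toy Merkle-style model in Python of why that's the case (the node structure is invented for illustration):

```python
import hashlib

def node_hash(label: str, children: list) -> str:
    # A node's hash covers its content plus its children's hashes, so an
    # edit only forces rehashing along the path from the edit to the root.
    return hashlib.sha256((label + "".join(children)).encode()).hexdigest()

util   = node_hash("util v1", [])
helper = node_hash("helper", [util])
root   = node_hash("main", [helper])

util2   = node_hash("util v2", [])       # the edited leaf
helper2 = node_hash("helper", [util2])   # dependant: must be recomputed
root2   = node_hash("main", [helper2])   # dependant: must be recomputed
# Any subtree not on that path keeps its old hash - effectively cached.
```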
> 2) Does this closely couple interface and implementation? Is it a hash of an implementation, or a hash of an interface? Is it possible to depend only on an interface, and not on an implementation?
I don't think unison has interfaces or any similar construct yet. There's an FAQ item mentioning why type classes are not included yet.
Personally I think the right approach would be to use a structural type system similar to OCaml's. You would have functors which describe the code you're calling, and call sites would use the hashes of the functor definition. You can then define concrete representations of types independently of the functor, and instantiate them against the functor afterwards. Thus, if you change some implementation detail in a type, then the only hashes that will need updating are those of your concrete type and the instantiation of the functor against your type. The functor, and any code which depends on it, will remain unchanged.
Unison is a functional language that treats a codebase as a content-addressable database where every 'content' is a definition. In Unison, the 'codebase' is a somewhat abstract concept (unlike other languages where a codebase is a set of files) into which you inject definitions, somewhat similar to a Lisp image.
One can think of a program as a graph where every node is a definition, and a definition's content can refer to other definitions. Unison content-addresses each node and aliases the address to a human-readable name.
This means you can repoint a name to another definition, and since Unison knows which node a human-readable name is aliased to, you can find exactly every use of a name and replace it with another node. In practice I think this means very easy refactoring, unlike today's programming languages where it's hard to find every use of an identifier.
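A toy model of the two mappings in Python (not Unison's real storage format):

```python
import hashlib

def addr(definition: str) -> str:
    return hashlib.sha256(definition.encode()).hexdigest()

code  = {}  # hash -> definition (append-only, immutable)
names = {}  # human-readable alias -> hash (the only thing that mutates)

old = "x -> x + 1"
code[addr(old)] = old
names["increment"] = addr(old)

new = "x -> x + 2"
code[addr(new)] = new           # the old definition is untouched
names["increment"] = addr(new)  # "editing" = repointing one alias
# Finding every use is exact: look up the hash, then find every definition
# whose body references that hash - no grepping for a string name.
```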
I'm not sure how this can benefit in practical ways, but the concept itself is pretty interesting to see. I would like to see a better way to share a Unison codebase though, as it currently is only shareable in a format that resembles a .git folder (git being another CAS itself).
PS: The website is very well made, but I would suggest two buttons at the top of the front page, the tour and installation, as I think most people want to get a glimpse of the PL before installing.
I, at least, was puzzled because I thought the button was a link to an introduction to the language.
You can have a decades-old git repository containing hundreds of MBs of files, but output a 2kB program. Someone correct me, but it sounds like if you had the same thing with a unison project, the actual executable size can only grow?
Basically you can quote code (like in Lisp) and pass it to an executor (thread, remote server, ...).
The main difference is that, thanks to the immutability, you can also have a protocol which efficiently shares all the dependencies required to execute the code (like a git pull).
It also uses an effect system, so the implementation does not need to decide which executor to use, it only uses an interface called an "ability", and the caller can choose the implementation.
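On the sharing point, a sketch of what that pull could look like, assuming each node is stored as (definition, dependency hashes) - my invention, not Unison's actual protocol:

```python
def pull(root_hash, local, remote):
    # Fetch exactly the missing nodes of the dependency graph; hashes make
    # "do I already have this?" a constant-time, always-safe check.
    todo = [root_hash]
    while todo:
        h = todo.pop()
        if h in local:
            continue                  # immutable => never re-download
        definition, deps = remote[h]  # assumed node format
        local[h] = (definition, deps)
        todo.extend(deps)
```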
I'm having trouble following along with even the toy examples.
The benefit of code-as-text is that you can navigate the text without knowing what you are looking for.
This seems to take the opposite approach of making it very easy to look things up assuming you know ahead of time what you are looking for.
That is something that a good usage of grep or some more sophisticated IDE integration will already take care of for you.
The only thing I could see this being potentially useful for is "notebook" style computation, where you are really only using the language to do things like structured math calculations.
What am I missing here?
I think IDE tooling for Unison could pretty easily give you the exact same experience. Generate code from the AST, group it reasonably, and give you a view that is indistinguishable from a traditional source tree.
It might be non-trivial to structure things nicely, but I think there's a good chance your generated structure would be better organized than a traditional codebase.
Large quantities of code can already be compiled to WASM. Since Unison works best with pure functions, most system interfaces would presumably be emulated. This emulation is already possible with Browsix.
A larger thread, also from then: https://news.ycombinator.com/item?id=9512955
I loathe marketing BS like this. No, you’re not from the future, you pretentious twats.
If you have a good idea, let it stand on its own merit, don’t dress it up with manipulative language.
What other user-facing open-source software has been written in Unison?
Which of those programs has been used for months or years by at least one person?
EDIT: My tone here is a little more antagonistic than I meant (proofread, kids; don't just check the spelling and publish), but my first impression was a kind of "buzzword-y" vibe. I've looked a bit more and the authors seem genuinely passionate about shaking things up and trying something completely different, which I can respect. I'll leave it up as-is because I still think my issues are legitimate, but the tone is a little off and I apologize.
Let's take my discovery path so far as an example:
> Functional language: cool cool. I know what that is. Immutable: also cool. Content-addressable: no clue. Saying it'll simplify code bases and eliminate dependency and "no builds" is supposed to sound amazing, but it really doesn't land when I have no clue what the premise is.
> Alright, that's it for the front page. No code examples, nothing linking the currently known to this new paradigm-shifting stuff. K. Let's try the docs.
> The Welcome page is about the state of the project without saying anything about the language itself. The Roadmap isn't gonna help. Quickstart! Let's look there.
This amount of effort for discovery isn't very helpful for me. I found a selection of talks elsewhere on the site. One of them is labeled "This is a longer (40 min) introduction to the core ideas of Unison and probably the best talk to start with." That might be a nice link for the front page.
I unfortunately don't have any more time to look at this this morning, but the first 5 or so minutes of the talk say waaaaaaaaay more than however long I spent on the site. Honestly, I think all of that information (code examples, linking core ideas to what we already know, a basis for comparison) could be on the front page.
The tl;dr I can glean from it is:
It's kinda like Haskell, but every definition is turned into an AST and put in a hash table. Any references are secretly references to the table, so changing a definition means making a new hash and changing the reference, and renaming means changing the reference in the name table.
Also, everything is append instead of rewrite, which makes all kinds of stuff cleaner, like versioning and refactoring.
Pretty neat idea, I just don't really like how long it took me to come to that knowledge. I don't know how well it would scale or be usable in a real world scenario, but I also have never used Haskell or any functional language in a "real world" scenario so I'm not the best one to ask.
Definitely appreciate this honest and helpful, specific docs feedback, so thanks for taking the time to put it together. We do want the value of Unison to be straightforward given even a quick skim of the site, so: sorry about the current state, we'll try to improve it. :)
Just by way of anchoring: I'm no dunce, I'm a typical nerdy guy who knows his UNIX and has written lots of C, Python, JavaScript, Ruby, etc. Written some mobile apps. Written plenty of web stuff. But I'm not immersed in any academic language study and not really an FP guy.
I don’t think you should dumb this down - the text itself is very good prose. But the context is missing something — perhaps a paragraph or two of anchoring for us “normal programmers” to explain why this is relevant to people who normally write Python web apps or Rust utilities or C++ games all day long (or if it’s not relevant, why, and who it is relevant for).
Good luck with the project!
It's mainly just an issue of manpower on our end (we do accept doc contributions, but it's probably on us to write them); having fresh reader eyes on it like this is still valuable though.
That's absolutely fair, and tbh probably the problem. You've probably been thinking about the concept for so long that it doesn't occur to you that the phrase isn't that well known.
I remember trying to explain a multi-year project from college, and I blew right through "potential energy surface" and lost my listener completely. I didn't realize it was something I'd have to explain.
No, as I keep insistently repeating: code is immutable and content-addressed.