Skip – A programming language to skip the things you have already computed (skiplang.com)
283 points by pbowyer on Sept 26, 2018 | hide | past | favorite | 100 comments

Just introduce a syntax to define immutable variables (e.g. like "val" in Scala) and to mark particular functions as pure (easy to implement manually as a decorator in Python, I've been using it a lot) and memoization becomes a seemingly easy task. Why a new language?
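For illustration, a minimal sketch of the kind of decorator the commenter describes (the `pure` name and the trust model are my own invention, not from any library):

```python
import functools

def pure(fn):
    """Mark a function as pure and memoize it by its arguments.

    Nothing here verifies that fn is actually free of side effects;
    that trust is exactly the guarantee Skip's type system enforces.
    """
    cache = {}

    @functools.wraps(fn)
    def wrapper(*args):
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]
    return wrapper

@pure
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```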

By the way, it seems very sad to me that the majority of imperative and hybrid (functional×imperative) languages lack syntax for immutable variables. In so many cases I introduce a variable to store the intermediate result of an operation and don't mean to re-assign it ever after; it's nice for a developer to state this intention explicitly, and for a compiler to raise an error when the variable is re-assigned accidentally. This simple feature is among the reasons why Scala programs are usually comparatively easy to debug and tend to run as expected as soon as they compile successfully.

The problem that the language is trying to solve is essentially building safe caches. Once a value is in the cache it cannot be changed from inside the program (the solution we opted for is to make it deeply immutable) and externally (if it depends on some external data, we need to invalidate the cache when it changes).

The big challenge is how do you maintain those guarantees while having a language that product engineers would want to write in.

You can decide to not allow mutability in your language but developers need to think about code in a very very different way than what they do now. It’s also going to be challenging to convert/translate existing codebases and patterns to it.

So we decided to allow both mutability and immutability. But it poses very interesting challenges. For example, in the standard library, what is the type of x.map(y -> z)? Which pieces are mutable or immutable? If you are not precise enough in your type system, then you are going to type it as “I don’t know (readonly)” or “mutable”, but then you can’t pass the result to a memoized function, so you lose the point of the language.

The other interesting aspect is that the backend relies on these invariants being true in order to output optimized code and a specialized garbage collector.

So we cannot “cheat” like many gradually typed languages (Hack, TypeScript, C#), where they can just say: actually this type is wrong, but I’m going to ignore it and throw an exception at runtime if the programmer got it wrong. In Skip’s case it would segfault.

It feels like I didn't really get it from your explanation, as I still can't understand why Scala doesn't suit these needs (as a language, that is — I know its actual compiler doesn't handle memoization by itself and, because of type erasure, won't remember the exact type in many cases).

This goes far beyond a local variable being re-assigned (assuming you mean re-assigning the local, not mutating the object assigned to it).

Skip has to know that the actual instance is immutable, and that no one has a reference to a mutable version of that instance. If the inputs/outputs change after the memoization, we cannot guarantee the correctness of the reactivity/cache invalidation. Thus, to memoize the inputs/outputs of a function, we need to know that the object, and any of its fields, will never be modified. In other words, we need to know that the object is transitively immutable. To get this sort of analysis, you need the type system to provide the information; a simple local analysis is not enough.

[[This was written by Todd Nowacki but his new HN account has been flagged as commenting too fast...]]

I understand this, but can't an object be considered immutable when all its fields are immutable and all its member functions (except perhaps the constructor) are pure? And isn't this what C# structs (and Scala case classes) are meant to be (although both can be equipped with mutable fields and impure methods if the developer wants, that isn't a recommended pattern)? If this doesn't solve the problem, then what is the solution proposed?
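To make the "transitively immutable" requirement concrete, here is a hedged Python sketch (frozen dataclasses play roughly the role of case classes here): freezing only forbids re-binding the fields, and says nothing about the values the fields hold, which is why "all fields immutable" has to hold all the way down:

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Point:
    x: int
    y: int            # both fields hold immutable values: transitively immutable

@dataclass(frozen=True)
class Path:
    points: list      # the field can't be re-bound, but the list itself can mutate

p = Path(points=[Point(0, 0)])
p.points.append(Point(1, 1))   # no error: frozen is only shallow
try:
    p.points = []              # re-binding the field is what's forbidden
except FrozenInstanceError:
    pass
```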

PS: Whatever, it pleases me that somebody has finally come to the idea of a heavily-memoized language that would still support mutable imperative parts and tried implementing it. I've been thinking "why the heck is there no language that memoizes pure functions automatically" for so many years already (but I've always been dismissing the idea of building it myself, as I lack sufficient background in computer science to build a compiler that would produce reasonably fast programs, so it would be impractical to invest the time). The only language I've found easy to add automatic memoization to is Python.

PPS: It's a pity Todd Nowacki has been flagged; HN should really learn to distinguish between garbage fast commenters and qualified, resourceful writers.

I got unflagged!

> but can't an object be considered immutable when all its fields are immutable and all its member functions (but the constructor perhaps) are pure?

How exactly do you expect to track this accurately without integrating it into the type system? It's not an issue of whether or not the field itself is assignable, but whether or not the type that it points to is mutable. This becomes much more difficult to determine when generics are involved. This sort of type-level knowledge for generics would be hard to staple onto an existing type system. Even if you did, you would likely end up with a lot of code duplication, as you would need two forms of each type, immutable and mutable, e.g. `ImmVector` and `MutVector`, `ImmMap` and `MutMap`, `ImmMyObject` and `MutMyObject`, etc. To make this more ergonomic, Skip has a mode-based mutability model, where mutability is determined not at the class declaration, but rather at the type annotation/object instantiation. So we would then have `Vector` and `mutable Vector`, two modes of the same class.

You can read more here: http://skiplang.com/blog/2018/04/10/understanding-mutability...

I think integrating this sort of thing into the type system is long overdue.

Ideally, in an OO language, what I want is to be able to author a single class, but then "overload on immutability", so to speak. So there are some class members that are shared by both implementations, and then there are some that are only there when the object is mutable, or when it is immutable (in the most extreme case, there are no shared data members or method implementations at all, and only the API is common).

That would imply that a class definition effectively implicitly defines a trait derived from the common members - e.g. "Vector", having operations such as "size" or "[index]" - and then two distinct classes, e.g. "mutable Vector" and "immutable Vector", both implementing the trait, and providing some extra APIs where that makes sense.

This would be helped a lot by a full decoupling of classes from types, as some modern (and not so modern - e.g. OCaml is practically the gold standard here) languages do.

I might be misremembering, but isn't overloading on immutability what C++ does with the const keyword on member functions? e.g.

    struct Foo {
      A bar();       // Overload called when the instance is non-const.
      A bar() const; // Overload called when the instance is const.
    };
Of course this is C++ so it's up to you to make sure that all the types contained in your object are also immutable when they're const-ed (e.g. avoid pointers). And the mutable keyword is a backdoor.

Or perhaps you wanted to be able to change the representation (e.g. member variables) of your type based on whether it's mutable or not? Although in that case I'd probably go with an interface.

Const pointers are not a problem - a pointer is a value, what it points to is a whole different thing altogether, and you can mark either or both as "const" as needed. That part is done right.

As far as changing representation, that's exactly why I said "class members" (and not "class methods"). An immutable map should have a different internal representation from a mutable map, for example, to permit efficient copy-with-update. But logically, they're both maps. C++ punts on this problem - a map is a map, and the only thing that "const" does is make its data immutable, which in practice means that immutable containers aren't used much, because there's no way to do copy-with-update (e.g. given a const std::list, you can't efficiently create a new const std::list with an item prepended, like you can in Lisp or ML - you have to create a whole new const std::list, copying all items from the original one).

Languages that tried to tackle this basically came up with a pattern where you write two completely different classes that share a naming convention (like Map and ImmutableMap) and implement a common interface/trait. But there's no link between those two classes other than convention, and the common interface has to be authored manually, even though in practice it could be derived entirely from the two classes.

So, what I'm proposing is basically a better way to write Map and ImmutableMap in a way that would 1) make it clear that those are semantically related, even though they're distinct in implementation, and 2) automatically derive the common trait. So when I write a function, I can say "takes a Map", and that's a trait that covers both "mutable Map" and "immutable Map" - automatically!

Are different modes implemented differently? E.g. an immutable (but extensible) Map should be implemented as a tree, whereas mutable maps are best implemented using arrays.

This is something that was a big source of discussions in the team. The state we're in so far is that we have two different classes for those two use cases.

The hashmap version of immutable maps is really useful for the "builder pattern". This is very common in product code. You create a mutable local map by slicing the inputs in many ways and freeze it at the end. We have an optimization that makes freeze a no-op if we can prove that no references to the variable escape (true in many cases). We often talked about doing a compaction step at this point but haven't played with it.

The tree version of immutable maps is really useful when you are mutating (hmmm...) it after it escapes the function.

The problem is that the two have very different APIs and complexity trade-offs. We haven't found a way to unify the two APIs and have the compiler pick one or the other transparently behind the scenes.

Also, one thing the language tries to have is predictable performance. Having a different complexity depending on whether an optimization kicks in is something that bit us many, many times in the dynamic languages we work on (Hack, JavaScript, Python), so it is something we're trying to avoid with Skip.

No. The modes are there so that you don't need to completely switch what you are doing and do a full copy when you need to memoize; they do not switch the underlying implementation.

As an example in Scala, consider:

  val buf = scala.collection.mutable.ArrayBuffer.empty[Int]
  buf += 1
is perfectly valid. You can't reassign buf to another object, but you can change the underlying object. Making something "truly" immutable when that's allowed is tricky. Apache Spark (built on top of Scala) goes a long way to try and achieve this, and is able to do a number of optimizations as a result, but IIRC there are still significant cases it isn't able to resolve.
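The Python equivalent of the same pitfall, for comparison: a name you never re-bind can still refer to an object that mutates underneath you.

```python
buf = []             # never re-bound, but freely mutable
buf.append(1)

nested = ([],)       # a tuple is shallowly immutable...
nested[0].append(2)  # ...but it can still hold mutable values
```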

I know this (and even use it occasionally); that's why there are whole separate families of collections in Scala: mutable and immutable. And I think it is possible for the compiler to figure out whether a type (e.g. a collection type) is mutable given its source code, and even if it isn't, it probably can be made possible by introducing some sort of special annotation on class definitions.

The ! syntax is really cosmetic; we could have gone the let/const route. The bigger aspect is deep immutability. Most languages implement memoize by doing pointer equality and have a notion of shallow immutability. But if someone mutates some value deep down in the object, those languages are fine with it.

The other approach is to do deep copies but it’s very expensive in practice.

The other other approach is to make references explicit in the syntax, so that it is possible to say things like "this is an immutable reference for a mutable object" and "this is a mutable reference for an immutable object". Because sometimes I really do want shallow mutability, and sometimes I want something even more complicated than that (e.g. deep to a certain level and then shallow, or vice versa).

Kinda like C++'s const, except that's actually "readonly" (which is also handy to have!), while what we really need is "immutable". Or rather make immutable the default, and then "readonly" and "mutable" have to be explicit everywhere.

> Scala programs usually are comparably easy to debug and run as expected as soon as they get compiled successfully.

I wholeheartedly disagree. Scala is among the most difficult languages to master. From experience, a successful compilation does not shield you from bugs and runtime exceptions. NullPointerException is my favourite, a reminder that lots of libraries are built on top of Java's ecosystem and thus plagued with the same curse.

And even if you're toying with pure Scala code, without any dependencies, it's still possible to end up in a state where the given traceback would be absolutely useless and full of (Anonymous) calls. Good luck with that.

Beyond memoization, you can do cool stuff with scheduling if you annotate pure functions and execute lazily. I actually mocked this up a little while back: https://github.com/bwasti/lazy

also just shared it here https://news.ycombinator.com/item?id=18080598 if you have comments

> Scala programs usually are comparably easy to debug and run as expected as soon as they get compiled successfully.

I am quite frankly speechless that anyone could hold this opinion.

> and run as expected as soon as they get compiled successfully

If a non-trivial program runs as expected on the first try I get really suspicious.

So did I. My first Scala experience was shockingly pleasant (I indeed "couldn't believe my eyes" and literally thought "that's so weird, it probably works so wrong that it results in an illusion of working right"; it felt like magic and was frightening that way): a program either fails at compile time (throwing very informative error messages, some of which can be unintuitive and seem weird as long as you don't know what they mean, but which let you identify a problem easily once you learn to read them) or just does what you meant it to do. But the programs I've written can indeed be considered trivial: I didn't make any serious use of actors or weird types, and mostly used Scala as "a better C#" plus a humble amount of FP. It seems to me this makes my point even stronger: write almost the same program in a very similar style in two languages, and one of them requires far less debugging.

The hardest part of software development is figuring out exactly what you mean it to do, though. Making a program that runs without crashing and does something is relatively easy.

That tends to happen with very strict type systems; in my earlier experience with Haskell, non-trivial programs usually got correct results on the first try, though they didn't always have the expected time or space complexity on the first try.

Don’t you also need the requirement that inputs with the “same value” will have the same references? Just because a string is immutable doesn’t mean that two “identical” strings will have the same reference, which is something you would want for a memoized function.

Skip has a concept of an "intern heap" that contains all the values used for memoization. By default everything is allocated on the normal stack/heap.

When you memoize a function, we move all its arguments to the intern heap, which makes structurally equal values have the same pointer. Then we can quickly figure out whether two immutable values are equal.

We have an intern() function that lets you move a value to the intern heap manually. We are also thinking about adding a field for objects such that once it is interned somewhere, if you intern it again, it's a no-op and we use the pointer in the object. This would avoid accidentally increasing the complexity of an algorithm by repeatedly interning the same value.
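A rough Python sketch of the interning idea (a hand-rolled table, not Skip's actual runtime): structurally equal values collapse to one canonical object, so equality checks become identity checks.

```python
_intern_table = {}

def intern_value(value):
    """Return the canonical instance for a hashable value."""
    return _intern_table.setdefault(value, value)

# Build the tuples dynamically so they start as distinct objects.
a = intern_value(tuple(range(3)))
b = intern_value(tuple(range(3)))
assert a is b   # structural equality has become pointer equality
```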

> implement manually as a decorator in Python

Why don't you just use the functools.lru_cache decorator in the standard library?

Somehow I couldn't find it by googling "python memoization" when I initially came to the point where memoization became a necessity for me. I found some custom solutions and built my own inspired by them but suiting my needs better (e.g. memoize to a database + cache in memory). I'll take a look at functools.lru_cache and consider migrating to it in the cases where I don't need persistent memoization of big datasets; it's always better to use a standard solution when it's enough. Thanks.
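For reference, basic usage of the standard-library decorator in question:

```python
from functools import lru_cache

@lru_cache(maxsize=None)   # maxsize=None makes the cache unbounded
def slow_square(n):
    return n * n

slow_square(4)
slow_square(4)   # the second call is served from the cache
```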

Incremental computation is a fascinating area of research. I've followed Matthew Hammer's work on Adapton[1] for a while, as well as Daco Harkes' work on IceDust[2]. The issues are obviously quite deep (i.e. how can you change a single value and recalculate quicksort without the effects of the change fanning out massively?), whether you're targeting existing languages and runtimes or building from scratch.

I would very much like to see this as an area of active development both in programming languages and distributed computing platforms. Current streaming platforms are great and it's getting easier and easier to create declarative data flow pipelines, doing real-time processing, splitting things into windows etc. But on top of that I imagine incorporating incremental computation, allowing you to keep a durable history of the event stream (or at least manageable parts of it) in a way that automagically allows incremental recalculation of upstream data when an underlying fact is retracted or updated.

[1] http://adapton.org/

[2] https://dcharkes.github.io/

Of course if that already exists then I'd love to hear about it! There seem to be all sorts of hybrid stream/batch systems but none with quite this focus. Effectively I want stuff that streams into a time series database in realtime, with the opportunity to edit or delete events later. On top of that there would be a processing pipeline with familiar streaming paradigms (transforming data, windowing data arbitrarily, aggregating on top of that) with the additional magic that the necessary events/windows/calculations re-fire if some underlying data changes. All in a magically efficient and transparent way.

Shameless plug for something that does all of these things (perhaps not exactly as you want, but ..)


Wonderful! I feel bad that I didn't follow the work after hearing about Naiad originally. Where might I go to read more about the fault tolerance story - I note that's an ongoing area of research in the README. Also the link to the Kafka adapter is dead, is that work ongoing or does the story end at the capture/replay API? (Happy to bombard you with email if that's any more manageable for you!)

Reading the 'when not to use Timely Dataflow' section on sorting, are there any ideas from Nominal Adapton that are relevant in this distributed setting, when we're talking about small updates or inserts?

Durability is a student's research project at the moment. It's up and running for "Spark" style computations but the work has broader ambitions. In essence, all of the internal state in DD are easily (automatically) serialized LSM slabs, and it is cake to write them out and read them back (and almost cake to make sense of them).

Which Kafka link is dead? The work has quiesced because any next step seems to involve assuming something about timestamps in the input. At the same time, it seems to take about 10 lines of code to write a Kafka source or sink, minus any careful worrying about acks for durability.

The DD and Adapton approaches may be a bit tricky to hybridize. Much incremental compute works by moving through a sequence of valid configurations, and DD's main departure is generalizing this to partial orders. So, maybe you could borrow ideas, but it would probably be original research to do so.

Email is great, as it is def no longer about skiplang.

> The Skip project concluded in 2018 and Skip is no longer under active development at Facebook.

Just a warning if you're interested in using this.

I would like to add that I intend to maintain it and build a community around it outside of FB, as mentioned on the front page.

"The language, compiler and libraries are maintained as a side project by Julien Verlaguet, the main designer of the language."

Can you speak to what the takeaways for Facebook were, now that this project has concluded as a Facebook-sponsored project?

Do you see the language supporting caching beyond in-memory (i.e. plugging in Redis/memcached etc.) and allowing caching between processes?

How does your language compare to the glut of modern systems-ish languages like Rust, Go, Swift, Julia, Nim? I see a lot of C++ influence here. Your docs don't explain how you make parallelism ergonomic, but the "async" keyword is getting popular. Why would I choose to invest in your language over one of these other ones that has more momentum?

Skip is not meant to be a system programming language. So the comparison with Rust for example is difficult.

> "I see a lot of C++ influence" I don't see what gave you that impression. Perhaps the syntax? But I don't think it looks more like C++ than Java or C#.

> Why would I choose to invest in your language over one of these other ones that has more momentum?

There are several reasons, but of course I am biased: 1- builtin cache invalidation 2- safe parallelism 3- predictable GC

While 2-3 can be found in other languages, I think I can argue that 1 is very unique to skip.

The other thing is: what are you trying to build? If what you want to do is develop an incremental tool, then Skip is probably the best option out there right now.

Let's say you decided tomorrow to build a fully incremental C++ front-end, to support auto-complete for example (I chose C++ because it is notoriously complex). What would you write this in? I think Skip should be the language of choice.

I can see why someone who stumbles on Skip would classify it as "systems-ish": General-purpose statically-typed language that compiles to native code with an emphasis on being fast with predictable performance. Go is thought of as a systems language, and it is a garbage-collected language meant for writing servers!

How would you describe when to use Skip? It looks really interesting.

I don't know anyone who considers Go a systems language. That wording was also removed from the Go website a long time ago.

The GC and the heavyweight runtime make it unsuitable in this space.

Skip seems very similar to Swift and Java, with inspiration from Rust wrt mutability.

> Skip is not meant to be a system programming language. So the comparison with Rust for example is difficult.

Can you elaborate on this comment in the context of some of the other languages listed by the grandparent (Go, Swift, Julia, Nim)? Granted, writing bare-metal operating system kernels may severely limit practical options, but I can imagine using Go, Swift, and Nim for slightly higher level "systems programming". Skip would be a reasonable alternative to these, right?


It really depends what one means by "system programming". I would add Java/C# and many others to that list.

I'm curious what your thoughts are on Nim. Especially seeing as we're both FB employees :)

I think you've done a really good job with Skip. I really love the incremental type-checking. It's a breath of fresh air to see a compiler created with IDEs in mind from the beginning.

Effect tracking is something that Nim also boasts so I'm curious how that works in Skip. Are there any docs/articles going into detail about this?

Generally, "system programming" means being able to interact with the system without some sort of "bridge".

For that reason Java and C# cannot be counted as system programming languages, because neither of those can interact directly with the system they're running on.

Ex. C# needs PInvoke, whereas C can just call system functions directly.

At least that's my understanding.

You have a doc section named "Lvalues." Nobody talks about lvalues outside C++.

This is the technical term for what's on the left of the `=` sign. If you are from a JavaScript background, it is commonly called "destructuring assignment".

The interesting thing about Skip is that local mutations are explicit, with a ! in front of them. And this works in the context of an lvalue.

We may want to rename this title if it's confusing.
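For readers coming from the Python side, the analogous lvalue forms look like this (illustrative only; Skip's `!` syntax has no Python counterpart):

```python
x, (y, z) = 1, (2, 3)      # tuple destructuring on the left of `=`
first, *rest = [1, 2, 3]   # star-unpacking as an lvalue
d = {}
d["key"] = x               # a subscript expression as an lvalue
```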

> Why would I choose to invest in your language over one of these other ones that has more momentum?

FWIW I don't think anyone is claiming you should, especially given it's not under active development (:

That's not exactly true. It's not under active development at Facebook. I still intend to work on Skip as it is said on the front-page: "The language, compiler and libraries are maintained as a side project by Julien Verlaguet, the main designer of the language."

And I am looking for people to help ;-)

This was an unfortunate choice of wording, that maybe should be updated:

> The Skip project concluded in 2018 and Skip is no longer under active development at Facebook.

This makes it sound like the project was killed off entirely, since it says the project 'concluded'.

The key question for me is what about the project made Facebook uninterested in using the language or continuing development. On paper, Skip sounds quite compelling, so I assume there is some reason, if just politics.

(I work at FB, not on Skip.) The project was originally started with the hopes of converting Facebook's entire (Hack) web codebase to use Skip and support reactivity. It seemed like the Skip team hoped that it would be possible to convert incrementally while reaping performance wins incrementally too.

Converting a multi-million-line codebase takes many years, and in practice, it turned out that huge swaths of the code would need to be converted before any wins were had. (To fully leverage reactivity, all your dependencies also need to be reactive.) Our codebase is intertwined and doesn't have many self-contained parts that could be fully converted in isolation so this wasn't possible.

Essentially, Skip is still very promising as a language for new projects but it feels like Facebook management couldn't justify staffing a permanent team given that we don't have a path to move huge chunks of development to it except in projects being developed more from scratch, which we tend to have few of.

I wouldn't see any of this as an indictment of Skip's potential generally.

Thanks for the insight. That seems quite reasonable, especially the fact that you'd need your dependencies to implement it before seeing any speed up.

Thanks for the insight.

Looks to me like it has a lot of the benefits of Haskell without the mind-numbing learning curve.

Under Safe Parallelism it says

> Skip supports ergonomic asynchronous computation with async/await syntax

Anyone have some insider knowledge on the history of this project? I'm curious as to how a big multi-year create-a-language research project like this gets pitched, accepted, and finally sunset -- Was it a full-time project? Were there deadlines? Was this considered a success or a failure?

The page itself mentions the project took place from 2015-2018. There's also what looks like a devlog. I'm not sure how far back it goes but it may have the details you're looking for.

> Skip also features a code-formatter to ensure consistent code style and a tool for running codemods.

Nice! I love this trend.

Yeah, before joining the Skip team I worked on Prettier. Once you have an automatic formatter you can't really go back, so I ported the core algorithm of Prettier to Skip and built the whole formatter with it.

My biggest surprise was that, because the language was designed by people who have been working on languages their entire lives, it was dead simple to write the formatter for it compared to JavaScript.

It only took me ~2 weeks to get it in a place where we could reasonably convert the entire codebase to it. It took Prettier ~6 months to get to that point.

I recently read a couple of GitHub threads in Prettier's repo saying Prettier won't do import optimization/organization, nor object property reordering (e.g. {c: 3, b: 2} -> {b: 2, c: 3}), because it has the potential to cause AST changes and could break things for people.

Do you think it's feasible to have a "jsfmt" tool or is the technical hurdle too great?

A lot of people have successfully coupled eslint with prettier, where they enable eslint rules that do those transforms with autofix.

Yeah my first experience was with gofmt. I am currently working in both large and old java code bases so I have needed to only use tools that auto format lines that are staged for committing, but it was one of the first things I set up.

By the way, I've got a question for programming language engineers and people keeping track of emerging and experimental languages: is there a language where everything (or almost everything, excluding elementary types and structs perhaps) is an "actor" and every class method call is asynchronous message passing? Together with the idea of a heavily-memoized language (which Skip is meant to be an implementation of), this idea hasn't left my mind in the years since I first learnt about the actor model.

I believe that some Smalltalkers were able to turn objects into things that were "like actors" when implementing Croquet / OpenCobalt [1]. These systems relied upon creating their own sense of "time" via a protocol named TeaTime [2]. I have tried to get more information on the actual implementation of TeaTime but have come up short so far.

[1] https://en.wikipedia.org/wiki/Croquet_Project

[2] http://www.vpri.org/pdf/tr2003001_croq_collab.pdf

On the subject of actor-concurrency languages, you should check out Pony. In Pony, actors are very common, but they are not the ubiquitous data type. Actors exist alongside regular classes in the language. See https://tutorial.ponylang.io/types/actors.html

Well, Obj-C used true message passing to call methods. And Smalltalk is also a true OO language in that sense. I wonder if Smalltalk isn't a good starting point for such a language.

There was Axum[0], a research language by Microsoft.

[0]: https://en.wikipedia.org/wiki/Axum_(programming_language)

Unfortunately, it was discontinued long ago, almost immediately after being introduced. And for the .NET ecosystem this means it can hardly be used nowadays, as it is probably incompatible with modern libraries.

I used it for my senior project in high school (or the German equivalent of that), to write a lattice gas cellular automaton. One day after deciding on my topic, it was announced by Microsoft that they would discontinue it.

Even back then it was very buggy, and in a normal size program, you would certainly run into some bugs. Sadly Microsoft has taken most resources regarding Axum down some while after discontinuing it, so if someone would want to try it out they would run into a lot of dead links.

Based on my fading memories (but take this with a grain of salt, since I was much less experienced back then), I would say that Axum and its actor-focused model provided too few upsides to warrant being its own programming language. I had better experiences using libraries like Akka in JVM languages, or Actix in Rust.


I always have my go-to questions for commercially sponsored programming language projects, after having done some of them: How did they start? Did they pivot from, say, a personal project, or were they planned from the outset within the company? How many resources were spent, and how was the project's (current) fate determined?

How are the side effects tracked? I couldn't find it in a brief read of the source code. The documentation mentions sequence points at which previous side effects are guaranteed to have finished. But what actually tracks whether IO or state changes have occurred?

IO is managed through very specific native extensions, where updates to that IO source can be tracked outside of the language and fed back in. For example, we don't have anything that lets you read from the command line or stdin. But you can open and watch a file. Any changes to that file will be tracked by the runtime.

Other non-deterministic computation, like randomness or time, is outright banned in the tracked environment. (There is an opt-in 'untracked' modifier if you want to write code outside of the reactive environment.)

For other tricky behavior, like mutability/mutations on an object, the type system tracks the mutability mode of any object. Only objects that are fully immutable ('frozen' in the language) can be memoized inputs/outputs to a function.
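As a rough Python analogy (the `freeze` helper and `memoized` decorator here are invented for illustration, not Skip's actual runtime): only values that can be recursively converted into deeply immutable, hashable structures are allowed as cache keys.

```python
def freeze(value):
    """Recursively convert a value into a deeply immutable, hashable form."""
    if isinstance(value, dict):
        return tuple(sorted((k, freeze(v)) for k, v in value.items()))
    if isinstance(value, (list, tuple)):
        return tuple(freeze(v) for v in value)
    if isinstance(value, set):
        return frozenset(freeze(v) for v in value)
    hash(value)  # anything else must already be immutable/hashable
    return value

def memoized(fn):
    cache = {}
    def wrapper(*args):
        key = freeze(args)  # only deeply immutable keys enter the cache
        if key not in cache:
            cache[key] = fn(*args)
        return cache[key]
    return wrapper
```

Skip enforces the "frozen" part statically in the type system instead of converting at runtime, but the cache-safety property is the same: nothing mutable ever ends up as a memoized input.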

> For example, we don't have anything that lets you read from command line or stdin. But you can open and watch a file.

What's the difference between opening and watching a file, and watching stdin?

inotify and seek work on most files.

FIFO file descriptors, not so much.

If you can't seek backwards, you pretty much have to cache the whole contents in memory, which seems wildly inefficient.

Apparently this was a research project. Anyone have links to papers, conclusions, results? Should be interesting…

We posted about some of those in the blog over the years. http://skiplang.com/blog

Some of the interesting ones:

- An overview of how memoization works and the MVCC model behind the scenes: http://skiplang.com/blog/2017/01/04/how-memoization-works.ht...

- How pattern matching is implemented and the tricks to make goto work in JavaScript: http://skiplang.com/blog/2017/11/15/simulating-goto-in-javas...

- The work done on making error messages much more helpful by understanding common idioms from other programming languages: http://skiplang.com/blog/2017/11/20/fixing-the-syntax-barrie...

- The macro syntax that elegantly solves a lot of use cases where dynamism is commonly used: http://skiplang.com/blog/2018/07/24/macros.html

Thank you, I feel like Skip is very close to the language I've always wanted. Its MVCC looks very similar to Clojure's, I'm having trouble finding a good link about it:


The gist of it is that instead of tracking multiple separate locations in memory for values, the software transactional memory (STM) references data by value. So if you assign the value {a: 42, b: 24} to two variables x and y, that value is only stored in memory in one place that the variables both point to. Then if one of the variables changes something, for example y.b = 25, this big tree structure works like copy-on-write and makes copies of branches when mutations occur. So internally, a: 42 is one reference and b: (24 or 25) is another reference. So rather than using 4 cells of memory, we've only used 3.

This frees the developer from having to micromanage memory resources and makes a lot of other things like concurrency "just work" without locks. Do I have this correct?
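In Python terms, my mental model of the path-copying step is roughly this (a toy sketch of my understanding, not Skip's actual representation):

```python
def set_key(node, key, value):
    # Path copy: allocate a new top-level dict, but every unchanged
    # branch is shared by reference with the old version.
    new = dict(node)
    new[key] = value
    return new

x = {"a": [42], "b": 24}
y = set_key(x, "b", 25)

assert y["a"] is x["a"]              # the "a" branch is shared, not copied
assert (x["b"], y["b"]) == (24, 25)  # each version sees its own "b"
```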

Your explanation looks correct!

The practical application is having a webserver that runs a bunch of requests in parallel but that can all share the same memoization cache. We don't want to lock the full memoization cache and therefore block all the other requests when one thread writes a new value.

FWIW, MATLAB does a lot of this kind of optimization in its internal implementation of the core datatypes. A lot of smart compiler/JIT people have worked hard over the years to optimize the internal guts, despite the naive design of the language due to its age and some poor design decisions made ages ago that cannot be changed due to extreme backwards-compatibility requirements. Kind of like JavaScript and V8/SpiderMonkey :)

Thank you, I didn't know that MATLAB did that. I'm a huge fan of it and hope that someday other matrix programming frameworks like TensorFlow, OpenCL and CUDA move to a similar general-purpose syntax that frees the developer from having to deal with the boilerplate associated with old OpenGL-style buffer representations.

No papers unfortunately.

As far as results, we saw some great numbers for the effectiveness of the recomputation. The language is self hosted. And the type checker is currently incremental. On my machine the initial type checking of the compiler itself is in the ballpark of ~40s. Changing a file and getting new type errors returns in <0.5s

This could be really useful for game development!

Certain kinds of networked games are built as a model, deterministically updated by clock ticks and commands from the server, and a view layer on top of that. And even model-level objects often monitor each other for changes. Seems like a perfect fit.

Game objects often have graph-like (not tree-like) dependencies though. E.g. two characters may want to move towards each other, creating a pointer cycle. Ideally I'd want one to be "magically" updated if the other disappears or changes state. I wonder if Skip has a good answer there, or if a good language-level answer is even theoretically possible.

Game development was a major driver for the design of Skip but for another reason: the GC.

The big problem with languages with GC is that it is unpredictable: when the amount of globally allocated memory passes a certain threshold, a GC pass happens, it's unclear how much time it'll take, and you may miss your frame.

The idea of Skip is that, for a given function, GC either always triggers or never does. So if, while you're developing your game/app, the time it takes to GC is within the bounds you wanted, then it'll keep behaving that way in production.

If it is too expensive for your use case, then you can optimize that specific function (and what it calls) using a profiler. You don’t have to think about the context of the entire app to figure out how to reduce GC pauses.

It's definitely possible. You need to "break the cycle" by adding an additional layer. You can encode any graph as an array of objects where you replace the pointers with indices into the array. Of course, the granularity (how big the arrays are) is up to you, and there is a tradeoff here.

The other very important thing to note is that Skip has a memory model suitable for that use case. Every function has a GC overhead, but that overhead is non-contextual. Meaning, the GC will collect the memory for only one function (instead of the entire heap). This is possible thanks to the guarantees of the type-system.
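A toy Python version of that "graph as array" encoding (the names and record shapes here are invented for illustration): the "pointers" between characters are indices into one shared tuple, so the data itself stays acyclic and can be deeply frozen.

```python
# Each character is (name, score); cross-references are indices, not pointers.
characters = (
    ("A", 3),
    ("B", 5),
    ("C", 4),
)

def retarget(chars):
    # Recompute, for each character, the index of the first character
    # with a strictly higher score (or None). This is a pure function
    # of the whole array, so it could be memoized as one unit.
    def first_higher(i):
        for j, (_, score) in enumerate(chars):
            if score > chars[i][1]:
                return j
        return None
    return tuple(first_higher(i) for i in range(len(chars)))
```

The tradeoff mentioned above shows up here: any change to one character invalidates the whole array's derived data, so the array size sets the granularity of recomputation.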

So, say I have: `characters = [A, B, ...]` and I want to maintain for each character `x` the invariant: `x.target = (first y in 'characters' where y.score > x.score)`

Can Skip maintain such a self-referential invariant automatically? Does the answer depend on A and B's mutability?

(This "graph as array" escape hatch seems to come up in many language designs with compile-time pointer analysis. It's useful, maybe even essential, but you tend to lose some nice language features/guarantees with it, in my limited experience.)

The GC thing sounds excellent!

Not responding to your question directly but I just want to mention that if you mark your class as mutable and use mutable instances, you can write code as you’d expect from other languages. The only downside is that you can’t pass those values to a memoized function unless you do a deep copy.

The IBM DataPower XSLT compiler has automatic memoization, I remember the compiler engineers implementing this. Of course XSLT is a functional language, so it's not so hard..

Bob Morgan was behind it.

Sounds a bit like netkernel to me. http://www.1060research.com/products/

Can anyone comment on the similarities with (apparently somewhat less sophisticated) laziness as in Haskell, or the memoisation/caching done by the Nix package manager, or (perhaps most interestingly) the approach of the Funflow library for Haskell[1]?

[1]: https://www.tweag.io/posts/2018-07-10-funflow-make.html

You should consider https for your site.

Does Skip target JS just through Emscripten? Does it have good support for browser interaction?

We have JS printed at the "OuterIST" phase right now. It's the phase just before we send to the native backend, which generates LLVM. You can go to the playground ( http://skiplang.com/playground/ ) and click "JavaScript" to see the output.

We talked about doing it at an earlier phase in order to output JavaScript closer to the input code, in a very similar fashion to BuckleScript. But we never got around to it, as we focused on the native backend.

We also have a prototype of a WASM backend using Emscripten.

Right now none of them are really usable as is but there's a lot of potential there.

The link to specification is broken.

Where do you see a link to that?

We had a specification in the works, but it fell horribly out of date. I wasn't sure we had kept it around.

It's in the footer. CTRL + F for specification.

I'll look into removing it.

If you want an overview of language features/design. I'd start with the docs: http://skiplang.com/docs/hello_world.html

FWIW, http://skiplang.com/docs/type_definitions.html says "A class can define a type, and later redefine it in on of [sic] it's children." Thanks for sharing your work.

Can I email you?

You didn't ask me personally but feel free to email me, I've helped with the open sourcing effort: vjeux@fb.com

"supports ergonomic asynchronous computation"... because you wouldn't want any of your computations getting carpal tunnel or anything.
