Loci: A C++-like systems programming language (loci-lang.org)
109 points by rayiner on Feb 19, 2015 | hide | past | favorite | 61 comments



Found this the other day, and wanted to share. Source is here: https://github.com/scross99/locic. Pleasantly readable and surprisingly compact given that it implements polymorphism, templates, modules, etc, as well as code generation to LLVM.

What I think is particularly neat is making lvalues and rvalues explicit constructs in the language: http://loci-lang.org/LvaluesAndRvalues.html, as well as being able to redefine the behavior of certain implicit operations relating to copies and moves: http://loci-lang.org/ImplicitOperations.html.


Note that the given C++ class in the Lvalue/Rvalue example could simply be written like so:

  class SomeType {
    public:
       SomeType(int a) : p{std::make_unique<int>(a)} {}
    private:
       std::unique_ptr<int> p;
  };
This version supports move operations as well.


Yes, this is true; the point was to highlight more generally the complexity of ensuring correctness when moving/copying C++ objects versus doing the same thing in Loci.



It really is! I wonder why they didn't specify the namespaces they were using instead of littering it with namespace and scope operators?


The intention is to avoid any kind of name clashes; I don't think this really inhibits readability.


Yep, it's not the most attractive parser ever. I chose to use Bison in order to save a lot of time/work and to take advantage of its GLR implementation; unfortunately its interaction with C++ isn't pleasant (due to the use of a union type for the symbol values).


That looks fairly clean for a Yacc parser. I do wish there were a more modern, friendlier parser generator; Yacc-generated C++ tends to leak AST objects on error unless you use a pool allocator.


I like this language because it doesn't try to do anything too fancy. It's "just" a streamlined C++.

I do wonder how they're planning on avoiding overhead with the garbage collector, though. Once you have a single garbage collected object on your heap, it seems like you'd have to start scanning the whole thing to make sure the garbage collected object is still alive.


The overhead of GC is independent of heap size. Take the following variables:

    L = total size of live objects in heap
    H = maximum size of heap
    P = time between collections
    R_a = rate of allocation
    R_s = rate of heap scanning
It's easy to define P and R_s (assume a mark-sweep collector so we can ignore copy reserve):

    P = (H - L) / R_a
    R_s = L / P
Substitute and simplify:

    R_s = L / ((H - L) / R_a)
    R_s = L / (H - L) * R_a
In other words, it doesn't matter how big your heap is. If you make your heap twice the live size, the amount of heap scanning you do will always be equal to your allocation rate times some constant. If H = 2L, that constant is 1.

So the way to improve performance is to cut down on that allocation rate. If you use RAII for most objects, and GC only occasionally, your allocation rate, net of non-GC allocations, will be very low, which means your scanning rate will be very low. You'll have to traverse all live objects every GC cycle, but you'll rarely have to do a GC cycle.


I followed your math, but not your conclusion!

Say we use your example of H = 2L. Then we have R_s = L / (2L - L) * R_a, or R_s = R_a. Sure.

Now say we double the heap: H = 4L. Then we have R_s = L / (4L - L) * R_a, or R_s = R_a/3. For a fixed allocation rate, increasing the heap makes the scan rate go down, i.e. we scan less often.

It seems like you're assuming that the live set is a fixed fraction of the heap size, but that doesn't make sense to me. The size of the live set is a function of what the program does, not its heap size.


The calculation assumes the program is in equilibrium (no net growth of L). Many GCs, such as Boehm's, maintain a fixed ratio of L and H.


Are you sure about this? The main overhead of a mark-and-sweep GC isn't the marking phase, it's the sweeping phase - you need to find and free all dead objects (to mark them as free/add them to the free list), and since you don't know where they are (by definition: since they are dead, they have no references pointing at them) you have to scan the whole heap (or at least all the pages that objects were allocated/live on since the last GC cycle).


R_s for a naive mark-sweep looks like this:

    R_s = (L + H) / (H - L) * R_a
For a copying collector:

    R_s = L / (H - 2L) * R_a
If H is a constant multiple of L, then your scan rate will be a constant multiple of your allocation rate, independent of heap size.

In practice, you won't sweep the whole heap after each GC cycle, but will do it lazily: http://www.hboehm.info/gc/complexity.html. The point in that article about marking time dominating allocation/sweep is even more true today. Allocation and sweep access memory linearly and are easily parallelized. Meanwhile, marking accesses memory randomly, and the maximum concurrency is dependent on the shape of the live object graph.


I don't think that is generally true. Sweeping has much better cache locality than marking, which is following pointers all over who-knows-where in the heap.


Sure, but since any object could have a reference to a GC object, you still have to eat the latency of scanning the entire heap (not just the GC'd objects) every so often. That seems like a pretty big performance cliff.


Well, theoretically, you have two choices:

1. Use the garbage collector.

2. Don't use the garbage collector.

I'd expect most users to choose option (2); however, the idea is to support garbage collection for those users who find it worthwhile.

I say this choice is theoretical because the current implementation doesn't yet have a garbage collector (this is not a high-priority task); ultimately I want to provide a second implementation of std.memory that supports garbage collection, and when users build their project they can choose whether they want the garbage-collected implementation.


This looks very promising. As a Python developer who primarily does systems programming, I sometimes have to jump into C for system calls and optimization, or to Java for stronger type safety and real interfaces, neither of which I particularly enjoy. I could definitely see doing new projects in a language like this.


Have you seen Nim? (http://nim-lang.org)


Nim should look very attractive to Python developers. It's a good looking language that has positioned itself in a good market: static typing, looks and feels like scripting (not verbose), and almost systems level (has GC).

I haven't got any experience with Nim development myself, but from those I know who use it, I hear nothing but promising things. "Modern" is a word that gets used.

Between Rust and Nim, we're starting to see some really clever new languages that fix common pain points. Both of these are so neat it makes me feel as though we're in a programming language boom right now.


While it is not the case for Nim, I tend to think LLVM enabled a lot of this "boom".

It gives you a high quality set of components that cover a big bunch of the things you need when implementing a compiler. Energy can be spent in the language itself.


That was my position before I discovered Scala - stronger type safety than Java, more concise/clear/expressive than Python.


I respect the design decisions that Scala made, but I don't find the syntax to be very clear or intuitive.


A lot of Python corresponds 1:1, IME. The biggest difference I notice is the _ shortcut, which I wish Python would adopt - "_ + 1" is so much shorter than "lambda x: x + 1" and no less clear. (And you can always use the "{x => x + 1}" style if you want).

Oh, and I guess the constructor syntax, which again I wish Python would adopt. I end up with too many "self.x = x" lines in my Python __init__ methods; pure syntactic ceremony.


Have you seen D? (http://dlang.org/)


[dead]


It's number 30 on the TIOBE list of programming language popularity. I'd hardly call that dead.


So you don't want the D?


What about D or Nim?


I don't understand why vtables are implemented as hash tables.

The docs [1] describe that objects (of class type) only contain data (i.e. object fields), and no vtable pointers. Only when an object is cast to an interface type is a vtable generated and passed along with the object pointer in a fat pointer.

However, instead of just structuring the vtable as an array of pointers to methods (ordered e.g. by name), Loci instead generates a hashtable with method resolution stubs in case of conflicts. I don't understand why - since the target interface's type (and methods) are known at compile-time, it would be just as easy to fix the ordering of the methods and use the array approach (like in C++), instead of using a hashtable.

[1] http://loci-lang.org/DynamicDispatch.html


This is the reason:

"This design also differs from C++ in that vtable pointers are not stored in objects, but are part of an interface type reference. This decision is particularly appropriate to the language, since Loci doesn’t require classes to explicitly implement interfaces"


This is irrelevant. Every time an object is cast to an interface, the compiler needs to know the definition of that interface. Therefore, it's just as easy to generate a hashtable or an array.


This question was asked by someone else earlier and I answered it in full here: https://github.com/scross99/locic/issues/1

I hope this helps.


After a quick read, I'm impressed. Looks like a general improvement over C++.


But why? If it is already like C++ why build a new language?


I think this page answers your question: http://loci-lang.org/LanguageGoals.html

Basically I've had a lot of experience working with C++ on various projects and while I like it very much I've also observed its weaknesses (right now I'm struggling against slow build times); after a while it seemed logical to build a language that would solve a lot of these problems so that I wouldn't have to face them over and over again for each project.


I think this may fill a void in the "teaching language" space. It doesn't require lots of boilerplate but retains some key teachable moments (type system, pointers, stack vs heap, interfaces, etc.). I think a language with this sort of brevity and breadth of features would do well as a replacement for the Java/Python/C/C++ hodgepodge that currently exists at many universities.


So the kicker: is it ABI-compatible with C++? I've come to like the C compatibility in Julia and Rust (and I'm sure I'll come to like the C compatibility in Loci, too), but the relative lack of C++ interface support (something that I know Julia's been working on, at least based on what I've seen on the mailing lists) has been a bit of a sore point for integrating with C++ projects.


A quick look at the docs makes this a definite no.

The name mangling scheme is similar to (one of) C++'s so that could probably be made more compatible. However the vtable mechanism is completely incompatible and given the reasoning in the docs (to support structural typing), I don't think it's a big stretch to say that it's a fundamental incompatibility.

TBH I'd be quite surprised if many (any?) languages will aim for C++ ABI compatibility that's much beyond simple extensions to the C ABI. Even ignoring the lack of a standard C++ ABI, once you start getting into virtual functions, exception handling and, especially, templates, you end up constraining your language to a point where you're probably better off just using C++.

I may well be wrong on this; LLVM helps a lot with the grunt work of name mangling, exceptions, trampolines, etc., so it may be possible to get to a sweet spot that supports a significant number of important libraries without hamstringing the language too much. Not going to hold my breath though...


I can't find any mention of whether they provide a C-compatible ABI. Kinda useless as a systems language if they don't - you can't call it from anything else.



Perhaps I scanned too quickly. I probably missed it, since there is no actual mention of an ABI (or e.g. which calling conventions it supports on which platforms). The page is confusing - it seems to discuss ABI and syntax and such, which isn't exactly related to "compatibility" afaict.

I digress... just the usual problems with wiki-type sites.


It compiles to LLVM and LLVM abstracts the calling convention. So it might be safe to assume it supports all the calling conventions that LLVM supports.


Sure, but that requires knowledge of how LLVM works. Some dude checking out the language probably doesn't.


> the language aims to have no performance overhead versus C and C++.

Could someone expand on this for me? What is the performance overhead typically associated with C/C++?


It means Loci will perform similarly to C and C++, not that C and C++ have some particular overhead that Loci doesn't.


An example would be dynamic dispatch of virtual functions.

In C++ virtual functions are implemented as vtables attached to each object. As such you need to retrieve the object (possibly from the heap), there's a function pointer indirection, there may be trampolines if you're using multiple inheritance. That sort of thing.

The author has a nice piece in the documentation regarding how dynamic dispatch is implemented in Loci in order to support structural typing. He explains the design choice, how it differs from C++ (a vtable at each interface reference call site), and gives some qualitative estimates as to how the performance would differ. It's worth the read.

C doesn't have dynamic dispatch built in to the language. If you want it, you'd have to code it by hand.


I think they mean that Loci should have the same performance as C or C++.


E.g. null-terminated strings without length perform badly in some operations.


Who runs the project? Who created it?


Seems it was created by one developer called Stephen Cross (scross.co.uk). I am quite impressed by his accomplishment. The only thing I miss is custom allocators.


He's 2.5 years out of his undergraduate degree. This is incredibly sophisticated code and documentation for such a person. Quite aside from the language, the person himself is worthy of note!


The guy seems to be one of those super-productive programmers.


We should ask him how he manages it - is it lack of sleep, loads of coffee and no social life?

I found that the best thing when I needed to be "productive".


Languages are tools you use for their syntax, not for the work they do for you or the abstract paradigms they encourage.


There may be some duality in the use of the word "language". If it is used in the literal sense, I agree. Otherwise (i.e., if you were referring to languages and their execution environments), I beg to differ. For the end user of a programming environment, differences in syntax are mostly tangible in the aesthetic sense. Some languages are more readable or expressive, indeed, but, in the end, syntax has something to do with our perception of how beautifully the code lays out.

Execution environments, on the other hand, are tailored for a class of problems, incorporating useful abstractions. Yes, I know it is not so evident in a world where we have general-purpose programming environments by the dozen; when we talk about domain-specific languages, this makes a lot of sense. They can only gain expressiveness if they can represent bigger abstractions with fewer words.


Eh? Languages are tools you use for their ecosystem, and especially for what work you can have done for you and what kinds of correctness the language encourages. Syntax is something you have to put up with as part of constructing a program.


Languages can be cross-platform: you can talk to two different people and they will understand you.

Syntax is a huge help; it's a shortcut to understanding what the program does. Syntax is good.


You are probably a troll, but:

> A language that doesn't affect the way you think about programming, is not worth knowing.

- Alan Perlis


I don't think abstract paradigms really fit into "the way you think about programming". Abstraction is software design; it doesn't belong to algorithms and how you make them work in an environment.


It bothers me that this is offered anonymously. I can't invest any time into checking it out without any concept of who produced it.


If that's seriously a problem then just don't check it out. I personally find it ridiculous that it's a requirement for someone's identity to be attached to a project in order for someone to even consider seeing what it's about.



