Comparing Rust and JavaScript Ergonomics with a Simple Linked List (codesections.com)
59 points by codesections 12 days ago | 57 comments

Heh, I feel like "someone" ought to write the Big List of Bad First Projects for This Language, e.g.:

Go/Erlang/Elixir: "Testing" the concurrency by parallelizing the addition of an array of integers by sending single integers around as messages, or in the worst cases literally spawning entire processes/goroutines/etc. to add two ints (and then wondering why the concurrent program fully consumes all eight CPUs but is still twenty-plus times slower).

Go (as of this writing): Immediately trying to implement a generic data structure.

Rust: Complicated pointer-type data structures like linked lists, or goodness-forbid, doubly-linked lists.

Haskell: Starting right off by trying to implement an in-place sorting algorithm.

Python: Taking your C numeric algorithm and converting it into pure Python (and then being shocked at the performance).

My criterion for inclusion is that I personally have seen each of these many times.

It's not that these tasks are necessarily impossible, or even necessarily all that hard when you know what you are doing, just that they are bad first tasks; but they seem to tempt people for some reason. For example, someone reading a Python tutorial hasn't heard of numpy yet; if you're just learning Rust, immediately learning how to break rules you don't fully understand isn't the best use of your time; and so on.

In this particular case it all turned out OK in the end, but it often doesn't go this well. :)


I can see real value in such a Big List of Bad First Projects - why not toss what you have on github/gitlab/etc and ask for contributions? The explanations as to why those are Bad First Projects can be very informative.

And what's your bad first project for javascript? I'm curious because I think JavaScript, despite the bad parts, is on average pretty forgiving.

My two cents for C++:

- Anything where you revert to C-style memory management and wonder why the code is so complex/segfaulty


For JS: straightforward batch processing. Open this CSV, iterate over the data, munge it in some way, spit it out again as a new CSV. This kind of thing is a 4-line Python script, but in JS/Node it requires readline or some other standard library thing, nested callbacks and/or promises, or maybe async and understanding how that works (which, for a brand-new dev... have fun with that), etc. If you need to do IO but don't have a task that benefits from async, things in JS tend to feel unnecessarily torturous.

It's not so bad.

  const fs = require("fs");

  const filename = process.argv[2];
  const fileContents = fs.readFileSync(filename, "utf8");
  const output = munge(fileContents);
  process.stdout.write(output);
It's worse if you want to use async IO, mostly because the ecosystem is still catching up to async/await. But with a bit of boilerplate or a willingness to use experimental APIs, it's also not bad:

  // This API is still experimental
  const fs = require("fs").promises;

  // This will become unnecessary when top-level await is supported
  run().catch((err) => console.error(err));

  async function run() {
    const filename = process.argv[2];
    const fileContents = await fs.readFile(filename, "utf8");
    const output = munge(fileContents);
    process.stdout.write(output);
  }
My Node.js projects' tooling is all written in Node.js and I enjoy it.

The specific situation I've encountered is that it's a file that's bigger than I want to hold in memory and I want to read it and process it a line at a time, but I don't want to do anything fancy or threaded or concurrent. Read a line, do something to it, write a line. Bog-standard ETL stuff.

And yeah, async/await helps, but is unambiguously ergonomically worse than:

    import csv
    incsv = csv.reader(open('file1.csv'))
    outcsv = csv.writer(open('file2.csv', 'w'))

    for row in incsv:
        outcsv.writerow([row[0], row[1] + ' blah'])
and also just requires understanding a lot more stuff before you can be productive if you're new to the language.

I'm not saying it's not possible to do it in JS, just that it's not a task that plays to JS's ergonomic strengths, just like it's possible to write a linked list in Rust but kind of sucks.


I would call it a weakness of JS's ecosystem, not an ergonomic problem. I believe JS could allow nearly the same code. Haven't tried it, though:

  const fs = require("fs");
  const csv = require("some-csv-package");
  const incsv = csv.reader(fs.createReadStream("file1.csv"));
  const outcsv = csv.writer(fs.createWriteStream("file2.csv"));

  for await (const row of incsv) {
    await outcsv.writeRow([ row[0], `${row[1]} blah` ]);
  }
Granted, that package doesn't exist, and the popular CSV package for Node [1] makes my eyes cross.

As for me, none of the scripting I do with Node involves files that exceed Node's 1GB memory limit, so I take the cheap and easy way out and just readFile(). Which I suppose proves your point, although, again, I would argue it's an ecosystem problem, not an ergonomic problem.

[1] https://csv.js.org/project/examples/


For JS, implement a Map data structure that is keyed by the value (not identity) of arbitrary objects.

After I implemented this and compared it with just encoding the key as JSON, it turned out that JSON.stringify is so fast that it's not really worth the effort.

Unfortunately JSON.stringify makes no guarantees about the order of keys in an Object, so that can break in subtle ways.

Aha, that's true. I wasn't using hash maps when I did this, but what you write makes sense.

"And what's your bad first project for javascript?"

Since I'm trying to target not hypothetically bad things, but things I've specifically seen numerous people try, I don't have an answer because I'm not involved enough in that community.

Ten years ago I would have said "anything involving a massive, browser-freezing computation", with "why did this O(n^3) code freeze the browser?" being very common, but I'm not sure if that's still current. I'd suspect a newbie armed with a fresh tutorial is nowadays more likely to overuse promises than underuse them (i.e., very similar to the Go example I gave; yes, if you're doing a huge computation you may want to break it up, but you do not want to literally yield a promise for every integer addition). But those with experience in the community may be able to speak to that.

Twenty years ago it would have been trying to do too much purely in the client side. That's definitely out of date.

(Single Page Apps have kind of an interesting history. In 1999 they were difficult, and by modern standards impossible; today they're merely one established option in an array of options. But I wouldn't say it was any one browser advance that changed that... it was a long series of incremental advances. I remember trying to take some relatively intelligent-for-the-time client code for IE4, which at least in theory had a dynamic HTML engine much like today's browsers (albeit one that crashed very, very, very frequently), and make it work in Netscape 4's "layers". Yeowch.)


The really big enabling feature was XMLHttpRequest (introduced with IE 5), which allowed client-side JavaScript to make HTTP requests for the first time. And then, as you say, the improvements in performance, reliability, and consistency of the APIs between browsers made a big difference.

I agree that was the biggest individual piece, but even XMLHttpRequest had to grow up over time. After all... just look at the name... it was not initially a generic "run a web request without a page refresh" object. :)

JS: Code that manipulates bits, code that needs a notion of value type structs to be well-optimized.

To be fair, modern JS engines are pretty good at switching to integer representation internally if there's bit twiddling involved.

Or simply, code that needs 64-bit integers.

> Complicated pointer-type data structures like linked lists

If a singly linked list is a complicated data structure then what's a simple one?


A heap would be simple.

Rust's ownership rules make it hard to implement data structures composed of nodes that own references to other nodes. (This includes linked lists, binary trees, and naive implementations of graphs.) There is a reason for this: it's rather tricky to implement such data structures truly safely in any language without introducing hidden assumptions; that's why they are so frequently seen in interview questions for Java and C++ coders.
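
To make that concrete, here's a minimal sketch (mine, not from the article) of what the safe workaround usually looks like: shared ownership plus interior mutability via Rc<RefCell<...>>, which is exactly the kind of ceremony people complain about.

  use std::cell::RefCell;
  use std::rc::Rc;

  // Each link is a shared, runtime-borrow-checked handle to a node.
  type Link = Option<Rc<RefCell<Node>>>;

  struct Node {
      value: i32,
      next: Link,
  }

  fn main() {
      let first = Rc::new(RefCell::new(Node { value: 1, next: None }));
      let second = Rc::new(RefCell::new(Node { value: 2, next: None }));

      // Two owners of `second` are fine; mutation goes through RefCell's
      // runtime borrow checking instead of the compile-time borrow checker.
      first.borrow_mut().next = Some(Rc::clone(&second));

      let tail_value = first.borrow().next.as_ref().unwrap().borrow().value;
      assert_eq!(tail_value, 2);
  }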


The difficulty of such interview questions doesn't stem from the problem of ensuring memory safety, but stuff like "finding the form of the recursion you need".

It's complicated because it's self-referential, not because it's conceptually complicated. Self-referential data structures in rust are more difficult because of the way ownership works.

Of course, self-referential, mutating data structures are nefarious and hard to build in a thread-safe way. So the extra difficulty is actually a symptom of the inherent complexities of making them thread-safe.


There aren't really simple data structures that involve pointers. Pointers are just hard (where hard means they require programmers to hold state in their head), and rust makes that explicit.

The reality is that implementing data structures that require pointers is not a typical task, and they're ideally provided by the stdlib or crates.


An article you might find helpful is this one on why imperative algorithms are harder to mathematically verify than functional ones:

https://semantic-domain.blogspot.com/2018/04/are-functional-...

You don't need to be a mathematician to follow it; this excerpt summarizes its key point:

"The difficulty of imperative programming arises from the combination of state, aliasing and procedure calls. Any two of these features can be handled without that much difficulty, but the combination of all three effectively makes reasoning (nearly) as difficult as correct concurrent programming. "

Garbage-collected languages let you ignore the aliasing problems, at a performance penalty. Rust in safe mode, without a GC, forces you to (a) deal with it and (b) deal with it in a way that works for all inputs (type/memory safety). Doing that by hand usually requires mathematical verification using tools such as separation logic; Rust's method is much easier to use, with the tradeoff of limitations on expressing code in certain ways.
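
As a tiny illustration of the aliasing point (my own example, not from the linked post): safe Rust simply refuses to hand out two live mutable references to the same data.

  fn main() {
      let mut v = vec![1, 2, 3];

      // Fine: the two mutable borrows never overlap.
      {
          let first = &mut v[0];
          *first += 10;
      }
      v.push(4);

      // Rejected at compile time: two simultaneous mutable aliases.
      // let a = &mut v;
      // let b = &mut v; // error[E0499]: cannot borrow `v` as mutable more than once at a time
      // a.push(5);

      println!("{:?}", v); // [11, 2, 3, 4]
  }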

Folks who get along with the borrow checker say it helps to design the program around the data and easy ways of using it, rather than starting with control flow and forcing it onto a data structure. I'll also add that linked lists are inherently hard to get right on all inputs regardless of the low-level language; they're actually a popular way to test new mathematical methods for specifying and proving algorithms correct. Fortunately, the Rust folks have a go-to write-up on linked lists in their language:

https://cglab.ca/~abeinges/blah/too-many-lists/book/


In comparison to a vector, yes, it's complex.

It's recursive. It has pointers. Algorithms that work with one aren't trivial and are often destructive.


Anything that doesn't involve aliased pointers will be simpler from a safety perspective.

Lists, stacks, queues, hash tables. Maybe B-trees.

An array?

Hashmap or array :)

Hard to imagine a hashmap is less complicated than a singly linked list under the hood.

From a memory management perspective, a probing hashmap is just a vector/array. All of the bookkeeping is "safe" from a memory perspective.
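
As a sketch of that (a toy, fixed-capacity example I'm making up here, with no resizing or deletion): all the state lives in one Vec, so nothing ever aliases.

  // Toy linear-probing map over a single Vec; assumes it never fills up.
  struct ProbingMap {
      slots: Vec<Option<(u64, u64)>>, // (key, value) pairs
  }

  impl ProbingMap {
      fn with_capacity(cap: usize) -> Self {
          ProbingMap { slots: vec![None; cap] }
      }

      fn insert(&mut self, key: u64, value: u64) {
          let cap = self.slots.len();
          let mut i = (key as usize) % cap;
          // Probe forward until we find this key or an empty slot.
          while let Some((k, _)) = self.slots[i] {
              if k == key { break; }
              i = (i + 1) % cap;
          }
          self.slots[i] = Some((key, value));
      }

      fn get(&self, key: u64) -> Option<u64> {
          let cap = self.slots.len();
          let mut i = (key as usize) % cap;
          while let Some((k, v)) = self.slots[i] {
              if k == key { return Some(v); }
              i = (i + 1) % cap;
          }
          None
      }
  }

  fn main() {
      let mut map = ProbingMap::with_capacity(8);
      map.insert(3, 30);
      map.insert(11, 110); // 11 % 8 == 3, so this probes to the next slot
      assert_eq!(map.get(3), Some(30));
      assert_eq!(map.get(11), Some(110));
      assert_eq!(map.get(4), None);
  }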

They're roughly equal if the singly-linked list is just a head pointer, because the "each object is only reachable through a single path, which owns it" property means the compiler can reason about what's going on. It's when you add a tail pointer (or go whole hog and make it doubly-linked) that you lose that property and have to use unsafe code (or heavyweight things like Arc<Mutex<...>>) to get the compiler off your back.
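
For anyone who hasn't seen it, the head-only version really is plain safe Rust; this is roughly the "ok stack" from the too-many-lists book linked elsewhere in the thread (a sketch, not the article's code):

  // Singly-linked stack: every node has exactly one owner, reachable
  // through exactly one path, so no unsafe code is needed.
  struct Node<T> {
      elem: T,
      next: Option<Box<Node<T>>>,
  }

  struct List<T> {
      head: Option<Box<Node<T>>>,
  }

  impl<T> List<T> {
      fn new() -> Self {
          List { head: None }
      }

      fn push_front(&mut self, elem: T) {
          // Move the old head into the new node, then own the new node.
          let node = Box::new(Node { elem, next: self.head.take() });
          self.head = Some(node);
      }

      fn pop_front(&mut self) -> Option<T> {
          // Move the old head out and promote its `next`.
          self.head.take().map(|node| {
              self.head = node.next;
              node.elem
          })
      }
  }

  fn main() {
      let mut list = List::new();
      list.push_front(1);
      list.push_front(2);
      assert_eq!(list.pop_front(), Some(2));
      assert_eq!(list.pop_front(), Some(1));
      assert_eq!(list.pop_front(), None);
  }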

A hashmap is in its simplest form just an array.

> It's not that these tasks are necessarily impossible, or even necessarily all that hard when you know what you are doing, just that they are bad first tasks,

We programmers are supposed to be intelligent, but somehow we seem to be easily focused on micro-details, to the point where we start to rant and fail to pay attention to the whole. There are some people who read car reviews and just want to know the horsepower. There are other people who read car reviews and want to know about the ergonomics, visibility, handling, ride, comfort, noise, features, reliability, fuel economy, etc...

There is nothing at the level of complexity of a programming environment that is devoid of trade-offs. There are going to be hundreds if not thousands of trade-offs, recurring many times over a long span of time, across many different people, in an ever-varying environment of changing requirements. And yet there are many things that are like "reviews" of programming languages with only a micro focus on a few features. They're like car reviews that just quote 0-60 times and horsepower stats.

Relevant: https://xkcd.com/356/

(Is "Blub" best thought of as analogous to "Corolla?")


Oh no, anything but the linked list.

Real-world Rust programs don't implement linked lists. There's LinkedList in libstd if you really want one, but on modern architectures cache locality is so important that some other data structure is almost always a better choice. Rust's standard library and crates.io have plenty of containers to choose from.

The linked list is popular because it's a CS101 topic, and because C is so barren. Only when you don't have any containers in the stdlib, dependency management is hell, and the lack of generics means any 3rd-party implementation will be either inefficient or full of macros does a low-effort hand-rolled linked list seem like a sensible choice.
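
If you genuinely need the shape, the batteries are already included; a quick sketch of reaching for the standard containers instead of hand-rolling one:

  use std::collections::{LinkedList, VecDeque};

  fn main() {
      // The linked list that already ships in std, if you insist:
      let mut list: LinkedList<i32> = LinkedList::new();
      list.push_back(1);
      list.push_front(0);
      assert_eq!(list.iter().copied().collect::<Vec<_>>(), vec![0, 1]);

      // A growable ring buffer usually wins on cache locality for the
      // same push/pop-at-either-end workload.
      let mut deque: VecDeque<i32> = VecDeque::new();
      deque.push_back(1);
      deque.push_front(0);
      assert_eq!(deque.iter().copied().collect::<Vec<_>>(), vec![0, 1]);
  }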


Linked lists have a really important property: each node can be owned by a different allocator. You can have a node in stack memory, linking to a node heap-allocated with libc malloc, linking to a node arena-allocated per frame of a game engine, linking to a node in global static memory. And you can set this up in a maintainable, sensible way.

Linked lists are incredibly useful when an object can exist in a predetermined set of lists, and allocation failure must not be possible when adding the object to a list.


Yes, and in C++, linked lists can retain this property and be implemented in a memory safe way using non-owning reference-counting smart pointers[1].

Many defend the use of unsafe Rust in cases like these, in part by asserting that C++ would have the same (un)safety issue. I don't think that's right. A C++ implementation would not have to use (unsafe) raw pointers, or (expensive, intrusive) owning pointers from the standard library.

I think C++ has the advantage in cases like these because of the aforementioned non-owning reference-counting smart pointers. I don't think the same type of pointer/reference can be implemented in Rust (as it would need a constructor), but it's not immediately obvious to me that such a reference type couldn't be added to the language as an intrinsic element (like Rc). I think such an element is dearly missing from the Rust language. I suspect that adding it would make the language safer and easier to use.

[1] shameless plug: https://github.com/duneroadrunner/SaferCPlusPlus#norad-point...


I'm not a Rust expert, but aren't you describing https://doc.rust-lang.org/std/rc/struct.Rc.html?
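
For what it's worth, the closest thing I know of in safe Rust is Rc plus Weak: a non-owning handle that can check at runtime whether its target is still alive (a sketch of the std types, not a claim that it matches the linked C++ library):

  use std::rc::{Rc, Weak};

  fn main() {
      // Owning, reference-counted handle.
      let owner: Rc<String> = Rc::new(String::from("node data"));

      // Non-owning handle: it does not keep the value alive.
      let observer: Weak<String> = Rc::downgrade(&owner);

      // While an owner exists, upgrade() hands back a temporary owning Rc.
      if let Some(strong) = observer.upgrade() {
          println!("still alive: {}", strong);
      }

      drop(owner);

      // Once the last owner is gone, upgrade() returns None instead of dangling.
      assert!(observer.upgrade().is_none());
  }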

Your comment kind of reminds me of this implementation http://www.rosettacode.org/wiki/Power_set#C for generating a power set in C. The code recursively calls a function that permutes through the members of the power set and creatively constructs linked lists pointing up the call stack.

Linked lists get shit on a lot, but they're great for a lot of use cases that Rust is ideal for. Hell, doubly linked lists are pretty much the data structure of the Linux kernel, and it seems to run pretty fast.

Bryan Cantrill has some really interesting blog posts on this: http://dtrace.org/blogs/bmc/2018/09/28/the-relative-performa...

Sometimes, choices aren't entirely about the structure itself, but about the affordances that a language offers you.


I recently had to write a bunch of custom linked-list-like structures in C++. Seeing how Rust adds that big heap of complicated boilerplate on top of the actual algorithms doesn't exactly make me a fan of the language.

It's not 'boilerplate', and rust is remarkably good at removing boilerplate. It's that in this case, what you've described as "the actual algorithms" is only a small part of the semantics that are provided by a safe, generic data structure under Rust. If you want to ignore all of that and only want the C++ level of semantics, writing the code in unsafe blocks is all you need to do.

If I have to write in unsafe blocks, then what is the point? I get the intent to prove memory accesses to be correct during compilation. But if a simple linked list turns into a nightmare project then I get the feeling that this language has missed the mark on usability as a systems language.

The thing is that proving the interface for "a simple linked list" safe is actually nontrivial, and it's not something that translates into day-to-day Rust. It's just a different language; different parts are hard (most parts are easier, IMO). A linked list just happens to be one of the hardest pieces.

No double deletes and no dangling pointers is the tradeoff.

I once spent two 12-hour days hunting a dangling pointer. It was fun finding it, that one time.


If all you want is the C++ experience there is no complicated boilerplate.

There are so many ways to write good C++ code; it depends on the use case. I have code that is nice, modern, and concise, and I have code where I recently threw all of that out and even introduced custom containers and custom memory management to get an important 2x to 3x speedup.

I'm baffled how the author could look at these code samples and say that the Rust and Javascript versions were mostly identical.

> What's more, according to that book, there simply isn't a good way to implement this structure in safe Rust. The way to go is to venture into unsafe Rust.

Noooo, you don't want to do that. Just give up on circular references, doubly-linked lists, and so on. Use different data structures and work around the [perceived] downsides of not having the data structures you're used to in Lisp, C, ECMAScript, ... You'll find it pays off.

See this HN post for more: https://news.ycombinator.com/item?id=18098239


If I had a dollar each time I needed to use and implement single linked list... (I wouldn't have many dollars)

Yeah, I 100% agree that they're not good examples of the sort of problem you'll need to solve in a given language.

That said, I still like them—at least enough to write this one, anyway!—because they're small, self-contained, and dive deep enough into the language internals that you can (start to) see the differences in the "world view" of the language. (Compare them to something like Advent of Code challenges—those are also small, self-contained, and unrealistic to real programming, which makes them pretty similar. But many of the early ones don't get deep enough into the language to really get a feel for how the language pushes you to think/approach problems)


> Rust is pretty

> Of course, you might feel differently, but one of my biggest takeaways from all of this side-by-side code is that Rust is clear, expressive

As someone who doesn't use either of these languages, I couldn't disagree more. The Javascript was intuitive and immediately made sense, while the Rust code gave me a headache.


I'd like to see this comparison, but with TypeScript instead of JavaScript. I'm sure it will still be better than Rust, but it seems like the main thing making Rust really awkward here is the memory safety.

Can someone explain why raw_tail was necessary in the add_to_tail function?

We needed to use a raw pointer there because of Rust's ownership system. Here's the short version: Rust wants everything to have exactly one owner and enforces that by disallowing multiple mutable references to the same data (or, for that matter, multiple references to the data if even one of the references is mutable).

But that doesn't play too well with linked lists. You need to have a reference to the tail node from the previous node and have a reference from the list itself. And you need to be able to mutate the tail node, so you can't just do that with immutable references.

Now, Rust does have some safe ways to solve that type of problem (Rc combined with RefCell is the usual one). But it turns out that the best way to solve it here is to make the jump to `unsafe` code, since we can guarantee the invariants that keep it from being unsafe in practice.
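
A rough sketch of the shape this takes (my reconstruction of the pattern, essentially the unsafe queue from the too-many-lists book, not the article's exact code):

  struct Node<T> {
      elem: T,
      next: Option<Box<Node<T>>>,
  }

  struct List<T> {
      head: Option<Box<Node<T>>>,
      tail: *mut Node<T>, // raw, non-owning pointer to the last node
  }

  impl<T> List<T> {
      fn new() -> Self {
          List { head: None, tail: std::ptr::null_mut() }
      }

      fn add_to_tail(&mut self, elem: T) {
          let mut new_tail = Box::new(Node { elem, next: None });
          // Grab a raw pointer before the Box is moved into the list;
          // the heap allocation itself doesn't move.
          let raw_tail: *mut Node<T> = &mut *new_tail;

          if self.tail.is_null() {
              // Empty list: the new node becomes the head.
              self.head = Some(new_tail);
          } else {
              // SAFETY: `tail` always points into a node owned by `head`,
              // so it stays valid for as long as the list does.
              unsafe { (*self.tail).next = Some(new_tail); }
          }
          self.tail = raw_tail;
      }
  }

  fn main() {
      let mut list: List<i32> = List::new();
      list.add_to_tail(1);
      list.add_to_tail(2);
      assert_eq!(list.head.as_ref().map(|n| n.elem), Some(1));
  }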


Found the JS implementation horrible; he could just have used plain objects (which would be fair, since he used a plain struct in Rust) and functions, instead of 'this' everywhere.

I'd be interested to see what you have in mind/hear why you think that would be a big improvement on the way I had it set up. (I'll admit to not putting a ton of thought into the JS version, though)


