Hacker News new | past | comments | ask | show | jobs | submit | Fiahil's comments login

Yet, legal opinions seems to differ : https://news.ycombinator.com/item?id=40423224

Can I mix french and english when talking to it ?

stomata open, light interacts with chlorophyll, water is moved and creates a tiny depression in the xylem tube, more water moves up through capillarity

How does this compare to ZeroMQ (ZMQ) ?


Not the OP or familiar with Hatchet, but generally ZeroMQ is a bit lower down in the stack -- it's something you'd build a distributed task queue or protocol on top of, but not something you'd usually reach for if you needed one for a web service or similar unless you had very special requirements and a specific, careful design in mind.

This tool comes with more bells and whistles and presumably will be more constrained in what you can do with it, where ZeroMQ gives you the flexibility to build your own protocol. In principle they have many of the same use cases, like how you can buy ready made whipped cream or whip up your own with some heavy cream and sugar -- one approach is more constrained but works for most situations where you need some whipped cream, and the other is a lot more work and somewhat higher risk (you can over whip your cream and end up with butter), but you can do a lot more with it.

ZeroMQ is a library that implements an application layer network protocol. Hatchet is a distributed job server with durability and transaction semantics. Two completely different things at very different levels of the stack. ZeroMQ supports fan-out messaging and other messaging patterns that could maybe be used as part of a job server, but it doesn't have anything to say about durability, retries, or other concerns that job servers take care of, much less a user interface.

I used the same approach while designing a lock-free bounded broadcast log (as in a "RWLock<Vec<T>>"; a MCMP, append-only Vec). It's quite easy to do because it's bounded. However, I could not find a way to make it both unbounded and efficient.

Any ideas ?

It's very hard to beat locking when the queue needs grow. It's the performance statistics - you're going to grow more often as the app warms up, reach a steady state, and only need to grow under heavier than expected load. And in that case you probably aren't going to see performance dominated by time spent waiting to acquire a write lock.

The alternative is to dive into the literature on lock-free mpmc queues, which are kind of gnarly to implement. A lot of the literature handwaves ABA problems with stuff like "use hazard pointers and RCU" without correct pseudo code to help you.

That's why locking in unbounded queues is popular, imo. It's not actually inefficient, and the alternatives are arcane enough to avoid. No one is going to understand or trust the code anyway.

It's worth mentioning that "lock free" means "one task does not prevent other tasks from making progress" and in the case of a bounded queue, you can trivially accomplish that by busy-waiting when the queue is full. This isn't appropriate if you need consumers to receive events in the order in which they happen, but you can kind of fix that using an atomic counter (to paraphrase Mike Acton, if you have a safe counter, you have a safe queue) and time stamp events/sort on which ones are consumed.

> The alternative is to dive into the literature on lock-free mpmc queues, which are kind of gnarly to implement. A lot of the literature handwaves ABA problems with stuff like "use hazard pointers and RCU" without correct pseudo code to help you.

Yes and I do think they are fundamentally incorrect or they cannot be translated to the log use case.

I also agree that, performance-wise, the lock make little difference. In fact my old benchmarks tend to point that locks were MORE EFFICIENT than atomics on ARM64 (MBP M1), for example. It's more like a fun little exercise, and also to confirm that I'm not completely dumb and that the problem is not solvable with the simple use of 2 atomics and a counter.

I'm no expert here, but I wonder if a linked list of bounded logs would work well.

So, everyone has a pointer to the current one, and there's an atomic pointer to the next.

When the log fills up, you allocate a new one and set the next pointer to it using compare_and_swap. If someone beat you to it, you can walk the list and add yours to the end so that the allocation isn't wasted.

This way, most of the time you're using the efficient bounded log.

in the case of a bounded log, readers are expected to give an offset at which they want to perfom the read (kafka-like).

So the linked list would involve going through all the links and following end tail refs/pointers. It would make reading O(n) and that's a Nope.

However, you could imagine having a Vec index that contains a ref to all allocated inner-logs, and query the index first in order to obtain the buffers' location. That works, but then the index has to go through a lock (either a RWLock or a mutex) as the CaS operation isn't enough if we get to the end of the index and it needs to be reallocated. It's fine, and I think that's the most appropriate solution.

PS : In fact, there is a sweet spot where you'd like to have a semi-unbounded log. If your index is big enough to contain something like 4Md entries, you'd probably end-up splitting the log in several pieces for archiving and performances purposes. Loading the log (fully or partially) from disk efficiently is then more important than having a real unbounded log. Then you would not necessarily use a lock and could CaS in the index.

I am not sure! Most of the data structures I design are for embedded systems without allocators. On the desktop, I mostly defer to others.

I've used tokio's broadcast channel quite a bit before, but it is also bounded. After talking to Eliza from the tokio project, I'm fairly convinced that unbounded queues are a scary thing to have around, operationally :).

But again - this is a bit out of my actual expertise!

Not sure if it could be extended here, but I've seen a lock free hash map that supported lock free reallocation by allocating new space and moving each entry one by one, either when the entry is accessed or in a separate thread concurrently. Accessing an entry during the reallocation would check the new region first, and if not found check the old region. Entries in the old region would be marked as moved and once all entries were moved the old allocation could be freed.

For the (un)bounded logs, the whole concept reside on the fact that the log isn't going to move once allocated, and that references to an item will never be invalided until the end of the program

I'm a 5y user of Firefly-III, and was very happy with it. Unfortunately, JC5, I discovered Actual recently.

It's not that FIII is bad at was it was doing, but simply Actual was a bit faster, a bit simpler, a bit more practical to use when importing. I'm mostly using rules, categories and reporting.

> after the number of times I resorted to things like Arc<Box<..>>.

... What are you doing ? It's definitively unusual.

> ... What are you doing ? It's definitively unusual.

No it's not, it's extremely common. `Arc<Box<...>>` or `Arc<Mutex<Box<...>>>` or similar is used all the time when you want to share a mutable reference (on the heap, in the case of Box); especially in the case of interior mutability[1]. It's pretty annoying, but I have learned to love the borrow checker (although lifetime rules still confuse me). It really does make my code extremely clear, knowing exactly what parts of every struct is shareable (Arc) & mutable (Mutex).

[1] https://doc.rust-lang.org/book/ch15-05-interior-mutability.h...

> No it's not, it's extremely common.

Arc-Mutex yes, Arc-Box no.

It's like "I want a shared, immutable reference to something recursive, or unsized". The heap part makes no sense because it's already allocated there if you use an Arc : https://doc.rust-lang.org/std/sync/struct.Arc.html

> The type Arc<T> provides shared ownership of a value of type T, allocated in the heap.

Moreover, you don't even need the box for a dyn : https://gist.github.com/rust-play/f19567f8ad4cc00e3ef17ae6b3...

> Arc-Mutex yes, Arc-Box no.

Yeah, looks like Arc-Box is kind of pointless, but isn't there some thread locality reason why people wrap `Box` in `Arc`? I remember reading something about it a while ago but maybe I'm misremembering.

To play devil's advocate:

`Arc<T>` places the refcounts immediately before `T` in memory. If you are desperate to have `T` and `T`'s refcounts be on different cachelines to reduce false sharing in some contrived scenario, `Arc<Box<T>>` would technically accomplish this. I think a more realistic optimization would be to pad the start of `T` however.

`Box<Box<T>>` and `Arc<Box<T>>` are thin pointers (size ≈ size_of::<usize>()) even when `Box<T>` is a fat pointer (size ≈ 2*size_of::<usize>() because `T` is `str`, `[u8]`, `dyn SomeTrait`, etc.). While various "thin" crates are typically saner alternatives for FFI or memory density (prefixing lengths/vtables in the pointed-at data instead of incurring double-indirection and double-allocation), these double boxes are a quick and dirty way of accomplishing guaranteed thin pointers with only `std` / `alloc`.

I would not call either of these use cases "extremely common" however.

I don't exactly remember the exact thing I used that's why I said "I resorted to things like.. ". Also, I used actix web, which could be the reason of slow compiling code.

At that point a regular GC is probably faster (at least from my experience of doing memory management in C++ with ref-counted smart pointers, which has a 'death-by-a-thousand-cuts' performance profile, e.g. the refcounting overhead is smeared over the whole code base and doesn't show up as obvious hotspots in the profiler).

A Zoe is 1.5t ..

Which means the cute french car has more than 500kg of headroom until it would have to pay the increased parking tariff...

> and more than 2 tonnes for electric vehicles.

It also excludes the Toyota Prius and a few other hybrid because they weight more than 1.6t, despite not being an SUV.

FYI, a Zoe weights 1.5t

Question to my fellow Scandinavians friends (Hi! from France) :

In this matter, the union agreement is more or less what we call a "convention collective" in France ?

It's a collective agreement. I assume that's roughly the same


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact