This type of runtime type checking is quite common in Go. An example from bufio.Reader's WriteTo method:
    // WriteTo implements io.WriterTo.
    // This may make multiple calls to the [Reader.Read] method of the underlying [Reader].
    // If the underlying reader supports the [Reader.WriteTo] method,
    // this calls the underlying [Reader.WriteTo] without buffering.
    func (b *Reader) WriteTo(w io.Writer) (n int64, err error) {
        b.lastByte = -1
        b.lastRuneSize = -1

        n, err = b.writeBuf(w)
        if err != nil {
            return
        }

        if r, ok := b.rd.(io.WriterTo); ok {
            m, err := r.WriteTo(w)
            n += m
            return n, err
        }

        if w, ok := w.(io.ReaderFrom); ok {
            m, err := w.ReadFrom(b.rd)
            n += m
            return n, err
        }

        if b.w-b.r < len(b.buf) {
            b.fill() // buffer not full
        }

        for b.r < b.w {
            // b.r < b.w => buffer is not empty
            m, err := b.writeBuf(w)
            n += m
            if err != nil {
                return n, err
            }
            b.fill() // buffer is empty
        }

        if b.err == io.EOF {
            b.err = nil
        }

        return n, b.readErr()
    }
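Stripped of the bufio details, the pattern is just a type assertion against an optional interface. A minimal sketch (the copyAll function name is made up, not from the standard library):

    package main

    import (
        "fmt"
        "io"
        "strings"
    )

    // copyAll copies src to dst. If src happens to implement io.WriterTo,
    // it takes that fast path; otherwise it falls back to a generic copy.
    func copyAll(dst io.Writer, src io.Reader) (int64, error) {
        if wt, ok := src.(io.WriterTo); ok {
            // The runtime check succeeded: let the source write itself out directly.
            return wt.WriteTo(dst)
        }
        // Fallback: generic copy through an intermediate buffer.
        return io.Copy(dst, src)
    }

    func main() {
        var sb strings.Builder
        n, err := copyAll(&sb, strings.NewReader("hello"))
        fmt.Println(n, err, sb.String()) // 5 <nil> hello
    }

(io.Copy performs the same WriterTo/ReaderFrom checks internally, which is how an io.Copy over a bufio.Reader ends up in the WriteTo method shown above.)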
We recently went through a similar situation at a logistics company in Europe. We decided to bring software development in-house after facing similar frustrations with a third-party provider. It was a challenging process, especially in attracting talent, but ultimately, it allowed us to build a solution tailored to our needs.
I’ve been with the company for the past 4 years, working as a lead software engineer. If you’re interested, my contact details are in my profile, and I’m happy to set up a call.
https://www.youtube.com/watch?v=wwoWei-GAPo — Project has come a long way since this. Happy that it's still around and thriving. I don't think we expected that in 2009. I don't believe Go would have been where it is without Russ. His contribution to the project has been tremendous.
> I'd also like to see less google control of the project.
That doesn't look like it's going to happen; the leadership change announced here seems to me to continue on the Google path. Both Austin and Cherry are relatively unknown outside Google and are to my knowledge not active in the community outside Google.
> Both Austin and Cherry are relatively unknown outside Google and are to my knowledge not active in the community outside Google.
I don't believe this is true at all. They are both highly active in external Go development, far more active than I have been these past few years. (It's true that neither gives talks or blogs as much as I do.)
I understand and respect your perspective on Austin and Cherry’s involvement in the Go community. Their contributions may indeed be less visible but still impactful. However, the community’s perception of leadership is crucial, and visibility plays a big part in that. For instance, your long-form blog posts add context to decisions you’ve made in the past. I hope their active roles will become more apparent, fostering a stronger connection with the broader Go community.
See also The Linux Memory Manager: https://linuxmemory.org/chapters. The last update the author sent out was in early July, noting that the book is now in editing:
> I am happy to report that I have completed the first draft of the book [...]
> I am now in an editing phase, which may well take some time. Sadly I can't give a reasonable estimate as this will be done in concert with my publisher.
I cannot remember (or find) where I signed up for updates, but I get an email every 6 months (or so) from Lorenzo Stoakes' personal email. Probably just send him an e-mail and he'll add you to his list.
It was pretty surreal to sit next to someone at a dinner in NYC two months ago, be introduced, and realize that they're someone you had an HN exchange with 5 years ago.
LSM trees do not need a write-ahead log in the general case:
- When new data arrives, it is converted to an SSTable, which is then stored to disk atomically before 'success' is returned to the client that wrote the data (see the sketch after this list). If the computer crashes in the middle of the write, there is no problem: the partially written SSTable is dropped on database start, since it isn't registered in the database yet.
- When the computer crashes in the middle of a background merge of smaller SSTables into bigger ones, there is also no problem: the source SSTables are still available after the database restarts, while the partially written output SSTable can be safely dropped.
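A rough sketch of that write path (hypothetical code, not taken from any particular database; the function and file naming are made up):

    package sstable

    import (
        "os"
        "path/filepath"
    )

    // flushAtomically writes an in-memory batch to a new SSTable file so that
    // a crash at any point leaves either a fully registered table or leftover
    // garbage that the startup scan ignores. Only after it returns nil would
    // the caller report 'success' to the client.
    func flushAtomically(dir, finalName string, data []byte) error {
        // 1. Write under a temporary name that startup never registers.
        tmp, err := os.CreateTemp(dir, "sstable-*.tmp")
        if err != nil {
            return err
        }
        if _, err := tmp.Write(data); err != nil {
            tmp.Close()
            return err
        }
        // 2. Make sure the bytes are on disk before the table becomes visible.
        if err := tmp.Sync(); err != nil {
            tmp.Close()
            return err
        }
        tmpName := tmp.Name()
        if err := tmp.Close(); err != nil {
            return err
        }
        // 3. Atomically "register" the table by renaming it to its final name.
        //    A crash before this point leaves only a *.tmp file, which is dropped.
        if err := os.Rename(tmpName, filepath.Join(dir, finalName)); err != nil {
            return err
        }
        // 4. Persist the rename itself by fsyncing the directory.
        d, err := os.Open(dir)
        if err != nil {
            return err
        }
        defer d.Close()
        return d.Sync()
    }

The same rename-after-fsync trick is what makes the background merge in the second point safe: the merged output only becomes visible once it has been fully written.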
VictoriaMetrics and VictoriaLogs use LSM trees without a WAL, while providing good durability. They can lose recently ingested metrics or logs on a server crash if those weren't yet converted to SSTables and written to disk. But this is a very good tradeoff compared to the data corruption or failed WAL replays in other systems, which use the WAL in improper ways - https://valyala.medium.com/wal-usage-looks-broken-in-modern-... .
> TimescaleDB relies on PostgreSQL’s WAL mechanism, which puts data into WAL buffers in RAM and periodically flushes them to WAL file. This means that the data from unflushed WAL buffers is lost on power loss or on process crash.
That links to the manpage, which says "The contents of the WAL buffers are written out to disk at every transaction commit". Maybe there's a missing "TimescaleDB only commits periodically" that would make the quote above true, but any suggestion that PostgreSQL does not guarantee durability of committed transactions out of the box is incorrect.
A broader issue: the article talks about how WALs may be "lost / corrupted" before fsync, then about how the "write directly to SSTable" approach merely loses recently added data, and "IMHO, recently written data loss on process crash has lower severity comparing to data corruption". But in general, I'd expect these databases to have a mechanism for not applying a corrupted WAL (typically a strong checksum on WAL entries). So ultimately these are two ways of describing the same thing. If those databases really do apply corrupt/half-written/unflushed WAL entries and thus corrupt previously committed data, yes, that's very interesting, but the smoking gun is missing. The article is either wrong or incomplete.
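For reference, the mechanism I have in mind looks roughly like this (hypothetical record format, not any specific database's on-disk layout): each WAL record carries a checksum, and replay stops at the first record that is truncated or fails verification instead of applying it.

    package wal

    import (
        "encoding/binary"
        "hash/crc32"
        "io"
    )

    var castagnoli = crc32.MakeTable(crc32.Castagnoli)

    // replay reads length-prefixed, CRC-protected records and stops at the
    // first record that is truncated or fails its checksum, treating it as a
    // torn write at the tail rather than applying corrupted data.
    func replay(r io.Reader, apply func(payload []byte)) error {
        for {
            var hdr [8]byte // 4-byte payload length + 4-byte CRC32C of the payload
            if _, err := io.ReadFull(r, hdr[:]); err != nil {
                if err == io.EOF || err == io.ErrUnexpectedEOF {
                    return nil // clean end of log, or a torn header: stop here
                }
                return err
            }
            length := binary.LittleEndian.Uint32(hdr[0:4])
            want := binary.LittleEndian.Uint32(hdr[4:8])
            payload := make([]byte, length)
            if _, err := io.ReadFull(r, payload); err != nil {
                return nil // torn record at the tail: do not apply it
            }
            if crc32.Checksum(payload, castagnoli) != want {
                return nil // corrupted tail: do not apply, stop replaying
            }
            apply(payload)
        }
    }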
LSM-trees do need a WAL. The entire idea of LSM-trees is that writes are buffered in memory and written out all at once. But a particular write doesn't wait for the memtable to be flushed. For that reason you still need a WAL (there is committed state in memory).
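For clarity, the classic commit path looks something like this (minimal sketch with made-up types, not any particular engine's code):

    package lsm

    import "os"

    // db keeps committed state in RAM (the memtable); the WAL is what makes
    // that state survive a crash until the background SSTable flush happens.
    type db struct {
        wal      *os.File
        memtable map[string][]byte
    }

    // commit appends the write to the log and fsyncs before touching the
    // memtable, so an acknowledged write cannot be lost on a crash.
    func (d *db) commit(key string, value []byte) error {
        rec := append(append([]byte(key), 0), value...) // toy record encoding
        if _, err := d.wal.Write(rec); err != nil {
            return err
        }
        if err := d.wal.Sync(); err != nil {
            return err
        }
        // Apply to the memtable; the SSTable flush happens much later, in the
        // background, for many writes at once.
        d.memtable[key] = value
        return nil // safe to acknowledge the client now
    }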
Those implementations use a WAL, but it seems to be only as a performance optimization to decrease the size of the in-memory index; is there a theoretical reason one is needed? It looks equivalent to a WAL-less write path combined with an almost immediate compaction. If you remove the compaction and don’t delete the WAL it seems like you can eliminate that write amplification (at least temporarily).
The original purpose of an LSM-tree is to take I/O off the critical path of a write (there are other reasons to use them though, for example reducing space amplification).
I would argue that by definition an LSM-tree buffers committed writes in memory, and that means you need a WAL for recovery.
If you are going to immediately flush the memtable then I/O is on the critical path. And if you have fine-grained updates you'll end up with lots of small files, which seems like a bad thing. It could be reasonable if you only receive batch updates.
Any durable commit is going to have I/O in the critical path unless you're Paxos/Raft replicating in-memory across failure domains (which we're not discussing here), but I think you mean it takes random I/O out of the critical path. You can get that without a WAL, though; just have the LSM keep appending out of order to a growing file and keep the in-memory index. That's the exact same I/O pattern that the WAL would generate; there just isn't an immediate compaction. The in-memory index will stay fragmented for longer, though (which is why I call the WAL a performance optimization above). I suppose the WAL-less design lets you defer compaction for longer, which might be an advantage if you have lots of disk and lots of RAM, but don't want two-thirds of your throughput (read + write) taken away at critical moments.
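Roughly the shape I mean (hand-wavy sketch, made-up names): one growing unsorted file plus an in-memory index, i.e. the WAL's sequential write pattern without treating the file as something to replay and throw away.

    package store

    import "os"

    // appendStore appends values out of order to one growing file and keeps
    // their locations in an in-memory index. Reads go through the index and
    // a ReadAt; nothing is compacted or replayed immediately.
    type appendStore struct {
        f     *os.File
        off   int64
        index map[string]loc // key -> location of the latest value
    }

    type loc struct {
        off int64
        n   int
    }

    func (s *appendStore) put(key string, value []byte) error {
        if _, err := s.f.Write(value); err != nil {
            return err
        }
        s.index[key] = loc{off: s.off, n: len(value)}
        s.off += int64(len(value))
        // Durability still needs an fsync here (or a periodic group fsync).
        return s.f.Sync()
    }

    func (s *appendStore) get(key string) ([]byte, bool, error) {
        l, ok := s.index[key]
        if !ok {
            return nil, false, nil
        }
        buf := make([]byte, l.n)
        _, err := s.f.ReadAt(buf, l.off)
        return buf, true, err
    }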
> I would argue that by definition an LSM-tree buffers committed writes in memory, and that means you need a WAL for recovery.
This is true, but note that the WAL does not need to be in the database. You can use an event stream like Kafka and replay blocks of events in the event of a failure. ClickHouse has a feature to deduplicate blocks it has seen before, even if they land on a separate server in a cluster. You still need to store checksums of the previously seen blocks, which is what ClickHouse does. It does put the onus on users to regenerate blocks accurately but the overhead is far lower.
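In sketch form (hypothetical code, not ClickHouse's implementation; the real feature also bounds how many checksums it retains and persists them):

    package ingest

    import "crypto/sha256"

    // dedupIngester drops blocks it has already seen, identified by a content
    // hash, so replaying the same blocks from the event stream after a failure
    // does not insert them twice.
    type dedupIngester struct {
        seen  map[[sha256.Size]byte]struct{} // checksums of previously ingested blocks
        write func(block []byte) error       // actually stores the block
    }

    func (d *dedupIngester) ingest(block []byte) error {
        sum := sha256.Sum256(block)
        if _, ok := d.seen[sum]; ok {
            return nil // replayed duplicate: already applied, safe to skip
        }
        if err := d.write(block); err != nil {
            return err
        }
        d.seen[sum] = struct{}{} // in a real system this set must be persisted too
        return nil
    }

As you say, this puts the onus on the producer to regenerate exactly the same blocks, since deduplication is by block content rather than by a logical sequence number.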
An in-process database like SQLite is the holy grail for low-latency database access at the edge. However, synchronous I/O is holding it back; what would an in-process database with asynchronous I/O look like?