> we compile the binary with debug symbols and a flag to compress the debug symbol sections to avoid having a huge binary.
How big are the uncompressed debug symbols? I'd expect processing of uncompressed debug symbols to happen via a memory-mapped file, while compressed debug symbols probably need to be extracted into anonymous memory.
The compressed symbols sound like the likely culprit. Do you really need a small executable? The uncompressed symbols need to be loaded into RAM anyway, and if that is delayed until they are needed, you will still have to allocate memory to decompress them.
For this particular service, the size does not really matter. For others, it makes more of a difference (several hundred MB), and since we deploy on customers' infrastructure, we want image sizes to stay reasonable.
For now, we apply the same build rules for all our services to stay consistent.
Maybe I'm not communicating well. Or maybe I don't understand how the debug symbol compression works at runtime. But my point is that I don't think you are getting the tradeoff you think you are getting. The smaller executable may end up using more RAM. Usually at the deployment stage, that's what matters.
Smaller executables are more for things like reducing distribution sizes, or reducing process launch latency when disk throughput is the issue. When you invoke compression, you are explicitly trading off runtime performance in order to get the benefit of smaller on-disk or network transmission size. For a hosted service, that's usually not a good tradeoff.
It is most likely me reading too quickly. I was caught off guard by the article gaining traction on a Sunday, and as I have other duties during the weekend, I am reading/responding only when I can sneak it in.
Regarding your comment: I think you are right that the compressed debug symbols add to the peak memory, but I think you are mistaken in assuming the debug symbols are decompressed when the app/binary is started/loaded. In my case, decompression only happens when that section is accessed by a debugger or equivalent.
It is not the same thing as when the binary is fully compressed, like with upx for example.
I did a quick sanity check on my desktop; here is what I got.
From RSS, I get ~128 MB at startup and ~474 MB at peak after the panic.
So the peak is indeed higher when the debug section is compressed, but the binary in memory at startup is roughly equivalent (virtual memory too).
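In case anyone wants to reproduce that kind of sanity check, here is a minimal sketch of what I mean (Linux-only, parsing /proc/self/status; run it with RUST_BACKTRACE=1 on a build with debug symbols, and of course the exact numbers will differ from mine):

```rust
use std::fs;

/// Read a field (in kilobytes) such as VmRSS or VmHWM from /proc/self/status.
fn status_kb(field: &str) -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    status
        .lines()
        .find(|l| l.starts_with(field))
        .and_then(|l| l.split_whitespace().nth(1))
        .and_then(|v| v.parse().ok())
}

fn main() {
    println!("VmRSS at startup: {:?} kB", status_kb("VmRSS:"));

    // With RUST_BACKTRACE=1 the default panic hook resolves debug symbols and
    // prints a backtrace; catch_unwind keeps the process alive afterwards.
    let _ = std::panic::catch_unwind(|| -> () { panic!("trigger a backtrace") });

    // VmHWM is the peak resident set size, which is what matters for OOM sizing.
    println!("VmRSS after the panic: {:?} kB", status_kb("VmRSS:"));
    println!("VmHWM (peak RSS):      {:?} kB", status_kb("VmHWM:"));
}
```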
I had a hard time finding a source that would validate my belief about when the debug symbols are decompressed. But based on https://inbox.sourceware.org/binutils/20080622061003.D279F3F... and the help of claude.ai, I would say it is only when those sections are accessed.
For what it's worth, here is the whole answer from claude.ai:
The debug sections compressed with --compress-debug-sections=zlib are decompressed:

- At runtime by the debugger (like GDB) when it needs to access the debug information:
  - When setting breakpoints
  - When doing backtraces
  - When inspecting variables
  - During symbol resolution
- When tools need to read debug info:
  - During coredump analysis
  - When using tools like addr2line
  - During source-level debugging
  - When using readelf with the -w option

The compression is transparent to these tools - they automatically handle the decompression when needed. The sections remain compressed on disk, and are only decompressed in memory when required.

This helps reduce the binary size on disk while still maintaining full debugging capabilities, with only a small runtime performance cost when the debug info needs to be accessed.

The decompression is handled by the libelf/DWARF libraries that these tools use to parse the ELF files.
The analysis looks rather half-finished. They did not analyze why so much memory was consumed: is it a cache that persists after the first call, temporary working memory, or an accumulating memory leak? And why does it use so much memory at all?
I couldn't find any other complaints about Rust backtrace printing consuming a lot of memory, which I would have expected if this were normal behaviour. So I wonder if there is anything special about their environment or use case?
I would assume that the same OOM problem would arise when printing a panic backtrace. Either their instance has enough memory to print backtraces, or it doesn't. So I don't understand why they only disable lib backtraces.
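One way to answer the cache-vs-leak question empirically would be to force backtraces in a loop and watch RSS between iterations: a one-time jump that then plateaus would point at a cache kept after the first resolution, while steady growth would point at a leak. A rough sketch of that experiment (Linux-only; the rss_kb helper just parses VmRSS out of /proc/self/status):

```rust
use std::backtrace::Backtrace;
use std::fs;

// Read this process's resident set size (VmRSS) in kB from /proc/self/status.
fn rss_kb() -> u64 {
    fs::read_to_string("/proc/self/status")
        .ok()
        .and_then(|s| {
            s.lines()
                .find(|l| l.starts_with("VmRSS:"))
                .and_then(|l| l.split_whitespace().nth(1))
                .and_then(|v| v.parse().ok())
        })
        .unwrap_or(0)
}

fn main() {
    for i in 0..5 {
        // Formatting the backtrace is what forces symbol resolution.
        let bt = Backtrace::force_capture();
        let _ = bt.to_string();
        println!("after backtrace {}: {} kB RSS", i + 1, rss_kb());
    }
}
```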
2. and that we ship our binary with debug symbols, with these options:
```
ENV RUSTFLAGS="-C link-arg=-Wl,--compress-debug-sections=zlib -C force-frame-pointers=yes"
```
For the panic, indeed, I had the same question on Reddit. For this particular service, we don't expect panics at all; it is just that by default we ship all our Rust binaries with backtraces enabled. And we have added an extra API endpoint to trigger a caught panic on purpose in other apps, to be sure our sizing is correct.
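For the curious, the body of such a "panic on purpose" endpoint can be as small as running a panic under std::panic::catch_unwind, so the default panic hook still resolves and prints the backtrace while the process keeps serving. This is only a sketch of the idea (handler name made up, web-framework wiring omitted), not the code we actually ship:

```rust
use std::panic;

/// Hypothetical body of a "trigger a caught panic" endpoint handler.
/// With RUST_BACKTRACE=1, the default panic hook resolves debug symbols and
/// prints the backtrace (that resolution is the memory-hungry part), and
/// catch_unwind stops the unwind so the service keeps running.
fn trigger_test_panic() -> &'static str {
    let result: Result<(), _> = panic::catch_unwind(|| {
        panic!("intentional panic to exercise backtrace sizing");
    });
    match result {
        Ok(()) => "no panic?",
        Err(_) => "panic caught, check the memory usage",
    }
}

fn main() {
    println!("{}", trigger_test_panic());
}
```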
I don't think the 4 GiB instance actually ran into an OOM error. They merely observed a 400 MiB memory spike. The crashing instances were limited to 256 and later 512 MiB.
(Assuming that the article incorrectly used Mib when it meant MiB. Used correctly, b = bit and B = byte.)
When using deniable authentication (e.g. Diffie-Hellman plus a MAC), the recipient can verify that the email came from the sender. But they can't prove to a third party that the email came from the sender, and wasn't forged by the recipient.
No, no, in these systems Alice and Bob both know a secret. Mallory doesn't know the secret, so, Mallory can't forge such a message.
However, Bob can't prove to the world "Alice sent me this message saying she hates cats!" because everybody knows Bob knows the same secret as Alice, so, that message could just as easily be made by Bob. Bob knows he didn't make it, and he knows the only other person who could was Alice, so he knows he's right - but Alice's cat hatred cannot be proved to others who don't just believe what Bob tells them about Alice.
Now it makes sense why Alice was sending me that kitten in a mixer video.
But seriously, in a case before a court or jury, wouldn't there be much more evidence? Down to your own lawyer sending a complete dump of your phone with all those Sandy-Hooks-conspiracies and hate messages to the opposing side?
Sometimes instead of a complete dump, you might mutually agree on a third party forensic lab to analyze your phone and only provide relevant details to the opposing side. Usually there's a few rounds of review where you can remove some details/records that are not relevant before the distilled records are sent to the opposing counsel.
Deniability is just that, the opportunity to deny. Will denying change what people believe? Well, maybe or maybe not. I'm sure you can think of your own examples without me annoying the moderators.
The idea is to use a MAC instead of a signature. As long as Alice isn't compromised and sharing her key with Mallory (which she could do even in the signature case), when Bob receives a message with a valid MAC on it, he knows that Alice authorized the message.
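A tiny sketch of that setup, using the hmac and sha2 crates (the key below is just a placeholder; in the deniable-authentication setting it would be derived via Diffie-Hellman). The important detail is that verification uses the exact same shared key as tagging, which is why a valid tag convinces Bob but proves nothing to a third party:

```rust
use hmac::{Hmac, Mac};
use sha2::Sha256;

type HmacSha256 = Hmac<Sha256>;

// Alice tags the message with the key she shares with Bob.
fn tag(shared_key: &[u8], message: &[u8]) -> Vec<u8> {
    let mut mac = HmacSha256::new_from_slice(shared_key).expect("HMAC accepts any key length");
    mac.update(message);
    mac.finalize().into_bytes().to_vec()
}

// Bob verifies with the same key. Because he could have computed the identical
// tag himself, a valid tag convinces him but proves nothing to anyone else.
fn verify(shared_key: &[u8], message: &[u8], tag: &[u8]) -> bool {
    let mut mac = HmacSha256::new_from_slice(shared_key).expect("HMAC accepts any key length");
    mac.update(message);
    mac.verify_slice(tag).is_ok()
}

fn main() {
    let shared_key = b"placeholder-for-a-dh-derived-key";
    let msg = b"I hate cats";
    let t = tag(shared_key, msg);
    assert!(verify(shared_key, msg, &t));
    // Mallory, without the key, cannot forge a valid tag; Bob, with the key, could.
}
```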
Of the three bad things they've been accused of, I'd consider that by far the least. Selling tracking data is an invasion of privacy. Deliberately not showing better discounts violates their core value proposition. Replacing referral links doesn't hurt the user, and isn't much different from blocking ads.
1. Do you support compression for data stored in segments?
2. Does the choice of storage class only affect chunks or also segments?
To me, the best solution seems to be combining writes stored initially on EBS (or even NVMe), to minimize the time until writes can be acknowledged, with creating a chunk on S3 standard every second or so. But I assume that would require significant engineering effort for applications that require data to be replicated to several AZs before acknowledging it. Though some applications might be willing to sacrifice 1 s of writes on node failure in exchange for cheap and fast writes.
3. You could be clearer about what "latency" means. I see at least three different latencies that could be important to different applications:
a) time until a write is durably stored and acknowledged
b) time until a tailing reader sees a write
c) time to first byte after a read request for old data
4. How do you handle streams which are rarely written to? Will newly appended records to those streams remain in chunks indefinitely? Or do you create tiny segments? Or replace an existing segment with the concatenated data?
1) Storage is priced on uncompressed data. We don't currently compress segments.
2) It only affects chunk storage. We do have a 'Native' chunk store in mind, the sketch involves introducing NVMe disks (as a separate service the core depends on) - so we can offer under 5 millisecond end-to-end tail latencies.
3) The append ack latency and end-to-end latency with a tailing reader is largely equivalent for us since latest writes are in memory for a brief period after acknowledgment. If you try the CLI ping command (see GIF on landing page) from the same cloud region as us (AWS us-east-1 only currently), you'll see end-to-end and append ack latency as basically the same. TTFB for older data is ~ TTFB to get a segment data range from object storage, so it can be a few hundred milliseconds.
4) We have a deadline to free chunks, so we PUT a tiny segment if we have to.
> To me, the best solution seems to be combining writes stored initially on EBS (or even NVMe), to minimize the time until writes can be acknowledged, with creating a chunk on S3 standard every second or so.
Yep, this is approximately Gazette's architecture (https://github.com/gazette/core). It buys the latency profile of flash storage, with the unbounded storage and durability of S3.
An addendum is there's no need to flush to S3 quite that frequently, if readers instead tail ACK'd content from local disk. Another neat thing you can do is hand bulk historical readers pre-signed URLs to files in cloud storage, so those bytes don't need to proxy through brokers.
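A very condensed sketch of that write path, as described in the quoted comment (append to a local WAL and ack, then roll pending bytes into an object-storage chunk roughly every second). Here put_chunk_to_object_storage is a stand-in for an S3 PUT, not a real SDK call, and fsync/replication are omitted:

```rust
use std::fs::{File, OpenOptions};
use std::io::{BufWriter, Write};
use std::time::{Duration, Instant};

/// Stand-in for an object-storage PUT (e.g. to S3); hypothetical, not a real SDK call.
fn put_chunk_to_object_storage(key: &str, bytes: &[u8]) {
    println!("PUT {} ({} bytes)", key, bytes.len());
}

struct Log {
    wal: BufWriter<File>, // local EBS/NVMe append log: fast acknowledgments
    pending: Vec<u8>,     // records not yet uploaded as a chunk
    last_flush: Instant,
    next_chunk: u64,
}

impl Log {
    fn append(&mut self, record: &[u8]) {
        // Durable locally first, then acknowledge to the client.
        self.wal.write_all(record).unwrap();
        self.wal.flush().unwrap(); // a real implementation would fsync here

        // Roughly once a second, roll the pending records into an object-storage chunk.
        self.pending.extend_from_slice(record);
        if self.last_flush.elapsed() >= Duration::from_secs(1) && !self.pending.is_empty() {
            let key = format!("chunks/{:020}", self.next_chunk);
            put_chunk_to_object_storage(&key, &self.pending);
            self.pending.clear();
            self.next_chunk += 1;
            self.last_flush = Instant::now();
        }
    }
}

fn main() {
    let file = OpenOptions::new().create(true).append(true).open("wal.log").unwrap();
    let mut log = Log {
        wal: BufWriter::new(file),
        pending: Vec::new(),
        last_flush: Instant::now(),
        next_chunk: 0,
    };
    log.append(b"hello\n");
}
```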
Here "log" means "append-only stream of small records". This isn't just about traditional logs (including http request logs and error logs). You could use it to store events for an event-sourced application, and even as the Write-Ahead-Log (WAL) for a database.
A distributed, but still consistent and durable log is a great building block for higher level abstractions.
"Small" means 1MiB per record here. But a higher level abstraction could split one logical operation into multiple records. Just like FoundationDB has severe limits on its transaction size, while higher level databases built on top of it work around that limit.
This product offers two advantages over S3: 1) appending a small amount of data is cheap, and 2) writes are forced into a consistent order (so you don't need to implement Paxos or Raft yourself). Neither of these is useful for backups. Raw S3 already works well for that use case, especially now that Amazon added support for pre-conditions.
> we are considering an in-memory emulator to open source ourselves
I'd suggest a persistent emulator, using something like SQLite (one row per record). Even for local development, many applications need persistence. And it'd even be enough to run a single-node, low-throughput production server that doesn't need robust durability and availability. But it would still have enough overhead and limitations not to compete with your cloud offering.
What is important, however, is staying as close as possible to your production system behavior-wise. So I'd try to share as much of the frontend code (e.g. the gRPC and REST handlers) as possible between the two.
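To make the one-row-per-record idea concrete, a minimal sketch with rusqlite could look like this (the table name, schema, and function names are made up for illustration; a real emulator would mirror the actual service API and share its request handlers):

```rust
use rusqlite::{params, Connection, Result};

/// Append a record to a stream and return its sequence number (the SQLite rowid).
fn append(conn: &Connection, stream: &str, body: &[u8]) -> Result<i64> {
    conn.execute(
        "INSERT INTO records (stream, body) VALUES (?1, ?2)",
        params![stream, body],
    )?;
    Ok(conn.last_insert_rowid())
}

/// Read records of a stream starting at a given sequence number.
fn read_from(conn: &Connection, stream: &str, seq: i64) -> Result<Vec<(i64, Vec<u8>)>> {
    let mut stmt = conn.prepare(
        "SELECT rowid, body FROM records WHERE stream = ?1 AND rowid >= ?2 ORDER BY rowid",
    )?;
    let rows = stmt.query_map(params![stream, seq], |r| Ok((r.get(0)?, r.get(1)?)))?;
    rows.collect()
}

fn main() -> Result<()> {
    let conn = Connection::open("emulator.db")?; // persists across restarts, unlike :memory:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (stream TEXT NOT NULL, body BLOB NOT NULL)",
        [],
    )?;
    let seq = append(&conn, "my-stream", b"hello")?;
    println!("appended at seq {}", seq);
    for (s, body) in read_from(&conn, "my-stream", 1)? {
        println!("{}: {} bytes", s, body.len());
    }
    Ok(())
}
```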
> How big are the uncompressed debug symbols? I'd expect processing of uncompressed debug symbols to happen via a memory-mapped file, while compressed debug symbols probably need to be extracted into anonymous memory.
https://github.com/llvm/llvm-project/issues/63290