I can't say if there will be enough continued investment in RISC-V for it to catch up, but the mere fact that there is a viable x86 alternative today makes it easier for a third option to enter the picture.


For better or for worse (for them, I mean), ARM seems to be selling their licenses at fairly low prices, so they don't need to be dramatically better than RISC-V to stay attractive. ARM just doesn't feel like the evil empire of embedded computing that everyone wants to beat.


I think the groundswell of support from the ultra-low-end is the only thing that can help it overcome ARM's head start. Embedded users won't abandon it, since there's no ARM licensing fee, so it will always have some major users and thus some funding to keep supporting the community at the low end and to keep the tooling from bitrotting and disappearing. It's taught in universities as well. That is fertiliser; still not sure if it will bloom.


Oh man, this is great timing – I played the hell out of this game in middle school, and I've recently been investigating getting it running on modern hardware. I got it installed & launching inside an XP VM, but that is (unsurprisingly) not ideal.

I've been thinking about building a retro gaming PC for these kinds of games, and now I can kick that can a little further down the road.


I'm glad to hear that Wind Waker HD has fan support like this. It was always a bummer to me that the best versions of Wind Waker and Skyward Sword were trapped on the Wii U, given the incredible preservation that Dolphin provides for their original versions.

Edit: I own a Wii U, I'm not trying to be a hater. For years, it really was the ultimate Zelda box.


I found the Wii U joystick too springy to beat OOT or even play much of Twilight Princess. I was able to compensate for the springiness until the penultimate fight with Ganondorf at the top of the castle in OOT, at which point it got too annoying and I just gave up; with Twilight Princess, I hit that threshold in the fire temple's boss fight. Possibly even just a miniboss.

It's possible I just had a Wii U with an unusually tight joystick or something.


> given the incredible preservation that Dolphin provides for their original versions.

Does cemu not provide comparable preservation for the HD versions? I played through both WW:HD and TP:HD on my Steam Deck using cemu and found it a great experience.


I slightly disagree, but only because Skyward Sword HD on the Switch is a pretty big improvement.

The Wii U was indeed a fantastic Zelda box though.


He also seems completely uninterested in finding that "elegant language struggling to get out." He just asserts that it's there, as if its mere existence is a virtue.

I think Herb Sutter is at least trying to find that elegant language, with his "syntax v2" project. It's one way to preserve compatibility with the incalculable amount of C++ in the wild, while also providing a simplified syntax with better defaults and fewer foot-guns.

Of course, Herb isn't immune to making hand-wavy claims[0] of his own, but he seems to bring forward more good ideas than bad.

[0] https://herbsutter.com/2025/03/30/crate-training-tiamat-un-c...


I watched a conference talk[0] about using MSDFs for GPU text rendering recently, really interesting stuff!

[0] https://www.youtube.com/watch?v=eQefdC2xDY4


Disclaimer: I'm not an allocator engineer, this is just an anecdote.

A while back, I had a conversation with an engineer who maintained an OS allocator, and their claim was that custom allocators tend to make one process's memory allocation faster at the expense of the rest of the system. System allocators are less able to make allocation fair holistically, because one process isn't following the same patterns as the rest.

Which is why you see it recommended so frequently with services, where there is generally one process that you want to get preferential treatment over everything else.


The only way I can see that this would be true is if a custom allocator is worse about unmapping unused memory than the system allocator. After all, processes aren't sharing one heap; fragmentation in one process's address space isn't visible outside of that process... The only aspect of one process's memory allocation that's visible to other processes is "that process uses N pages worth of resident memory, so there's less available for me". But one of the common criticisms of glibc is that it's often really bad at unmapping its pages, so I'd think that most custom allocators are nicer to the system?
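
(To be fair, glibc does expose knobs to hand pages back explicitly; a minimal C++ sketch, assuming Linux/glibc, with a purely illustrative threshold value:)

    #include <malloc.h>   // glibc-specific: mallopt, malloc_trim
    #include <cstdlib>

    int main() {
        // Ask free() to return memory to the kernel more eagerly
        // (the default trim threshold is 128 KiB and adjusts dynamically).
        mallopt(M_TRIM_THRESHOLD, 64 * 1024);

        void* big = std::malloc(16 * 1024 * 1024);
        std::free(big);

        // Explicitly release free memory at the top of the main heap;
        // modern glibc will also madvise() away unused interior pages.
        malloc_trim(0);
    }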

I would be interested in hearing their thoughts directly; I'm also not an allocator engineer, and someone who maintains an OS allocator probably knows wayyy more about this stuff than me. I'm sure there's some missing nuance or context which would've made it make sense.


I don't think that's really a position that can be defended. Both jemalloc and tcmalloc evolved and were refined in antagonistic multitenant environments without one overwhelming application. They are optimal for that exact thing.


> Both jemalloc and tcmalloc evolved and were refined in antagonistic multitenant environments without one overwhelming application. They are optimal for that exact thing.

They were mostly optimised on Facebook/Google server-side systems, which were likely one application per VM, no? (Unlike desktop usage where users want several applications to run cooperatively). Firefox is a different case but apparently mainline jemalloc never matched Firefox jemalloc, and even then it's entirely plausible that Firefox benefitted from a "selfish" allocator.


Google runs dozens to hundreds of unrelated workloads in lightweight containers on a single machine, in "borg". Facebook has a thing called "tupperware" with the same property.


I think Tupperware was rebranded to Twine sometime about 6-7 years ago.


It's possible that they were referring to something specific about their platform and its system allocator, but like I said it was an anecdote about one engineer's statement. I just remember thinking it sounded fair at the time.


The “system” allocator is managing memory within a process boundary. The kernel is responsible for managing it across processes. Claiming that a user space allocator is greedily inefficient is voodoo reasoning that suggests the person making the claim has a poor grasp of architecture.


There are shared resources involved though, for example one process can cause a lot of traffic in khugepaged. However I would point out that is an endemic risk of Linux's overall architecture. Any process can cause chaos by dirtying pages, or otherwise triggering reclaim.


That’s generally true of any allocator, and assuming glibc’s behavior would help mitigate this is mistaken: it’s not something kernel engineers design around, nor something the glibc allocator is trying to achieve as a design goal.


For context, the "allocator engineer" I was talking to was a kernel engineer - they have an extremely solid grasp of their platform's architecture.

The whole advantage of being the platform's system allocator is that you can have a tighter relationship between the library function and the kernel implementation.


I’m not generally aware of any system allocator that’s written hand in glove with the kernel’s allocator or somehow interops better for overall system efficiency at the cost of behavior in-app. Care to provide an example?


The "greedy" part is likely not releasing pages back to the OS in a timely manner.


That seems odd though, seeing as this is one of the main criticisms of glibc's allocator.


In the containerized environments where these allocators were mainly developed, it is all but totally pointless to return memory to the kernel. You might as well keep everything your container is entitled to use, because it's not like the other containers can use it. Someone or some automatic system has written down how much memory the container is going to use.


Returning no longer used anonymous memory is not without benefits.

Returning pages allows them to be used for disk cache. They can be zeroed in the background by the kernel which may save time when they're needed again, or zeroing can be avoided if the kernel uses them as the destination of a full page DMA write.
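
As a sketch of what returning a page usually looks like from the allocator's side (assuming Linux; the helper name is made up):

    #include <sys/mman.h>
    #include <cstddef>

    // Made-up helper: hand a page-aligned span back to the kernel while
    // keeping the virtual address range mapped.
    void release_span(void* addr, std::size_t len) {
        // Pages are reclaimed immediately; the next touch of this range
        // faults in fresh zero-filled pages.
        if (madvise(addr, len, MADV_DONTNEED) != 0) {
            // handle errno (e.g. EINVAL for an unaligned address)
        }
        // MADV_FREE (Linux 4.5+) is the lazier variant: pages are only
        // reclaimed under memory pressure, cheaper if they're reused soon.
    }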

Also, returning no-longer-used pages helps get closer to a useful measurement of memory actually in use. Measuring memory usage is pretty difficult, of course, but making the numbers a little more accurate helps.


Zeroed pages also compress more efficiently because the compressor doesn’t actually need to process them.


I know Google has good engineering, but I find this a bit implausible?

For most applications, especially request/response type apps like web servers, truly "right sizing" while accounting for spikes takes a lot of engineering effort: you have to work out how much allocation a single request needs, then ensure the number of concurrent requests never goes beyond the point where you'd risk OOMs.

I can see this being fine-tuned for extremely high-scale, core services like load balancers, SDNs, file systems etc., where you probably want to allocate all your data structures at startup time and never actually allocate anything after that, and you probably have whole teams of engineers devoted to just single services. But not most apps?

Surely it's better for containers to share system memory, and rely on limits and resource-driven autoscaling to make the system resilient?


The reason I hedged and said "... or some automatic system ..." was because they use a machine-learned forecast of the memory requirements of every container and use that as the soft limit for the container when it starts. You can read about that at [1]. But what I was getting at is that using less than the configured amount of memory does not lead to more containers able to be scheduled on a given machine, nor does it lead to lower economic chargeback. Machines are scheduled and operators are charged by the configured limit, not the usage.

Giving memory back to the operating system is antithetical to the nature of caching allocators ("caching" is right there in the name of "tcmalloc"). The whole point of a caching allocator is that if you needed the memory once, you'll probably need it again, and most likely right now. At most what these allocators will do unless you configure them differently is to release memory to the system very, very slowly, and only if an entirely empty huge page — a contiguous area of several megabytes — surfaces. You can read how grudgingly the tcmalloc authors allow releasing at [2]. jemalloc was once pretty aggressive about releasing to the OS, but these days it is not. I think this reflects its evolution to suit Meta internal workloads, and increased understanding of the costs of releasing memory from a huge-page-aware allocator.

1: https://dl.acm.org/doi/pdf/10.1145/3342195.3387524

2: https://github.com/google/tcmalloc/blob/master/docs/tuning.m...
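
To make the grudging release concrete: in google/tcmalloc it's opt-in and rate-limited. Roughly (a sketch, assuming the MallocExtension API that the tuning doc at [2] describes):

    #include "tcmalloc/malloc_extension.h"

    void configure_release() {
        // Trickle free memory back to the OS at a fixed rate; 0 disables
        // it. Only takes effect if the process runs tcmalloc's background
        // thread (ProcessBackgroundActions).
        tcmalloc::MallocExtension::SetBackgroundReleaseRate(
            static_cast<tcmalloc::MallocExtension::BytesPerSecond>(
                1 << 20));  // ~1 MiB/s

        // Or release a bounded amount manually, e.g. after a load spike.
        tcmalloc::MallocExtension::ReleaseMemoryToSystem(64 << 20);  // 64 MiB
    }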


glibc was not written for a containerized environment, and I personally think it’s telling that a core feature of the more recent tcmalloc Google open-sourced is that it returns memory efficiently, so clearly it’s important even in containerized environments. The reason is how kernels deal with compressing pages: pages released to the kernel are explicitly zeroed (unlike pages held by the user-space allocator), which aids compression efficiency even in a containerized workload, since unused pages can simply be skipped and the kernel can share the reference zeroed page for lazy allocations.

Also, the kernel itself has memory needs for lots of things, and it not having memory, or having to go on a hunt to find contiguous pages, is not good. Additionally, in a VM or container environment there are other containers and VMs running on that machine, so the memory will also eventually get percolated up to the hypervisor to rebalance. None of this happens if the user-space allocator hangs on to memory needlessly in a greedy fashion, and indeed such an application would be more subject to the OOM killer.


I think the majority of programmers would enjoy it, but most would first need to pick an ISA (something older is probably going to be more approachable for beginners), learn enough about it to understand basic arithmetic instructions, learn enough about the dev tools to be able to assemble, link, and execute their code, etc.

For most folks, that's going to be a couple days of prep work before they can get to the fun part of solving the puzzle.


The post notes that the user-facing app was "introduced in the fall of 2024," so presumably the services aren't that legacy.


You can learn a lot when writing V2 of a thing though. You've got lots of real world experience about what worked and what didn't work with the previous design, so lots of opportunity for making data structures that suit the problem more closely and so forth.


But did they write the backend from scratch or was it based on a number of “com.apple.libs.backend-core…” that tend to bring in repeating logic and facilities they have in all their servers? Or was it a PoC they promoted to MVP and now they’re taking time to rewrite “properly” with support for whatever features are coming next?


I did some very broad testing of several PDF text extraction tools recently, and PDF.js was one of the slowest.

My use-case was specifically testing their performance as command-line tools, so that will skew the results to an extent. For example, PDFBox was very slow because you're paying the JVM startup cost with each invocation.

Poppler's pdftotext utility and pdfminer.six were generally the fastest. Both produced serviceable plain-text versions of the PDFs, with minor differences in where they placed paragraph breaks.
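
(If you'd rather link the library than shell out, Poppler is also easy to drive directly; a minimal sketch, assuming the poppler-cpp bindings:)

    #include <poppler-document.h>
    #include <poppler-page.h>
    #include <iostream>
    #include <memory>

    int main(int argc, char** argv) {
        if (argc < 2) return 1;

        // Load the PDF named on the command line.
        std::unique_ptr<poppler::document> doc(
            poppler::document::load_from_file(argv[1]));
        if (!doc) return 1;

        // Dump each page's text as UTF-8.
        for (int i = 0; i < doc->pages(); ++i) {
            std::unique_ptr<poppler::page> p(doc->create_page(i));
            auto utf8 = p->text().to_utf8();  // poppler::ustring -> bytes
            std::cout.write(utf8.data(), utf8.size());
            std::cout << '\n';
        }
    }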

I also wrote a small program which extracted text using Chrome's PDFium, which also performed well, but building that project can be a nightmare unless you're Google. IBM's Docling project, which uses ML models, produced by far the best formatting, preserving much of the document's original structure – but it was, of course, enormously slower and more energy-hungry.

Disclaimer: I was testing specific PDF files that are representative of the kind of documents my software produces.


> a language with a garbage collector (Swift)

You can certainly make the case that reference counting is a form of garbage collection, but it is absolutely false to say Swift has "a garbage collector." There is no runtime process that is responsible for collecting garbage – allocated objects are freed deterministically when their reference count hits zero.

The same thing is true of `shared_ptr` instances in C++, and there's certainly no "garbage collector" there, either.
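
To illustrate with shared_ptr: the deallocation happens synchronously, on the exact line where the last reference goes away, with no collector involved:

    #include <cstdio>
    #include <memory>

    struct Widget {
        ~Widget() { std::puts("freed"); }
    };

    int main() {
        auto a = std::make_shared<Widget>();
        auto b = a;    // refcount: 2
        a.reset();     // refcount: 1, nothing happens yet
        b.reset();     // refcount: 0, ~Widget() runs right here,
                       // deterministically; prints "freed"
        std::puts("after last reset");
    }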


That reference counting is done at runtime. It’s a runtime garbage collector. It’s different than a generational GC, but it’s GC. Those cycles to increment and decrement a counter attached to every object at every touch aren’t free. All the downvotes in the world won’t make that free.


> It’s a runtime garbage collector

What does "it" refer to? The function calls to _swift_release_()? Because if function calls are a "garbage collector," then free() is a garbage collector. And if free() is a garbage collector, then the term is too vague to have any useful meaning.


Yes. Garbage collectors also call free. They call functions. They do all kinds of things. They even increment and decrement reference counters on your behalf. When there’s a system that manages your memory for you at runtime, that’s a garbage collector.

Swift is great. And reference counting is exactly the right kind of GC for UIs because there are no pauses. But GC it still is. And it wrecks throughput and is not appropriate for situations where you don’t want GC.

And in reference to `shared_ptr`, or Rc and Arc in Rust, that's manual memory management because you're doing it... manually. Swift is like C++ or Rust if you were never allowed to have a reference to anything that wasn't behind an Arc. Then it's no longer manual, it's automatic.


> Yes. Garbage collectors also call free. They call functions.

Ok, what is calling `free` here? Point to the garbage collector. Show me the thing that is collecting the garbage.

> And in reference to `shared_ptr`, or Rc and Arc in Rust, that's manual memory management because you're doing it... manually.

You're also doing it manually when you decide to make a type a class in Swift. You're opting in to reference counting when you write a class, or use a type that is backed by a class.

It also seems that our goalposts have gone missing. Before, "it" (whatever "it" is) was a garbage collector because it happened at runtime:

> That reference counting is done at runtime. It’s a runtime garbage collector.

shared_ptr, Rc, and Arc also manage their memory at runtime. But now, "it's" a garbage collector because the compiler generates the retain/release calls...


The garbage collector is what wraps every reference to every object on the heap.

But fine, no GC. I wonder why every language in the world doesn’t use reference counting, since it’s not GC AND you don’t have to clean up any memory you allocate. I guess everyone who ever designed a language is kinda dumb.


> And reference counting is exactly the right kind of GC for UIs because there are no pauses.

That's not the reason it uses reference counting. The overhead of scanning memory is too high, the overhead of precisely scanning it (avoiding false-positive pointers) is higher, and the entire concept of GC assumes memory can be read quickly which isn't true in the presence of swap.

That said, precise GC means you can have compaction, which can potentially be good for swap.


> That reference counting is done at runtime.

I thought Swift uses ARC just like Objective-C? The compiler elides much of the reference counting, even across inlined functions. It’s not like Python or Javascript where a variable binding is enough to cause an increment (although IIRC the V8 JIT can elide some of that too).

I don’t disagree that it’s a runtime GC but there’s a bit of nuance to its implementation that resists simple categorization like that.

