I'm not surprised to see that Jank's solution to this is to embed LLVM into their runtime. I really wish there was a better way to do this.
There are a lot of things I don't like about C++, and close to the top of the list is the lack of standardization for name-mangling, or even a way to mangle or de-mangle names at compile-time. Sepples is a royal pain in the ass to target for a dynamic FFI because of that. It would be really nice to have some way to get symbol names and calling semantics as constexpr const char* and not have to deal with generating (or writing) a ton of boilerplate and extern "C" blocks.
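To make the pain concrete, here's a rough sketch of dynamically loading a C++ function today, assuming a GCC/Clang toolchain on the Itanium ABI and a made-up libgeom.so that exports double norm(double, double). You either hard-code the compiler-specific mangled symbol or ship exactly the kind of extern "C" wrapper boilerplate mentioned above:

    // Hypothetical library: libgeom.so exports   double norm(double, double);
    #include <dlfcn.h>
    #include <cstdio>

    int main() {
        void* lib = dlopen("./libgeom.so", RTLD_NOW);
        if (!lib) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }

        using norm_fn = double (*)(double, double);

        // Option 1: hard-code the Itanium-mangled name. Non-portable: MSVC
        // would mangle the same function as something like "?norm@@YANNN@Z".
        auto norm = reinterpret_cast<norm_fn>(dlsym(lib, "_Z4normdd"));

        // Option 2: the library author writes boilerplate such as
        //   extern "C" double geom_norm(double x, double y) { return norm(x, y); }
        // so there is a stable, unmangled symbol to look up instead.
        auto norm_c = reinterpret_cast<norm_fn>(dlsym(lib, "geom_norm"));

        if (norm)   std::printf("%f\n", norm(3.0, 4.0));
        if (norm_c) std::printf("%f\n", norm_c(3.0, 4.0));
        dlclose(lib);
    }

Neither option gives you the mangled name as a constexpr const char* from within the language itself, which is the missing piece.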
It's absolutely possible, but it's not low-hanging fruit so the standards committee will never put it in. Just like they'll never add a standardized equivalent for alloca/VLAs. We're not allowed to have basic, useful things. Only more ways to abuse type deduction. Will C++26 finally give us constexpr dynamic allocations? Will compilers ever actually implement one of the three (3) compile-time reflection standards? Stay tuned to find out!
Carmack did almost exactly the same thing with the Trinity / Quake 3 engine: IIRC it was LCC, maybe tcc, one of the C compilers you can actually understand in its entirety as an individual.
He compiled C with some builtins for syscalls, and then translated that to his own stack machine. But, he also had a target for native DLLs, so same safe syscall interface, but they can segv so you have to trust them.
Crazy to think that in one computer program (that still reads better than high-concept FAANG C++ from elite legends, truly unique) this wasn't even the most dramatic innovation. It was the third* most dramatic revolution in one program.
If you're into this stuff, call in sick and read the plan files all day. Gives me goosebumps.
Carmack actually deserves the moniker of 10x engineer. Truly his work in his domain has reached far outside it because of the quality of his ideas and methodologies.
I have a bit I do where I do Carmack's voice in a fictional interview that goes something like this:
Lex Fridman: So of all the code you've written, is there any that you particularly like?
Carmack: I think the vertex groodlizer from Quake is probably the code I'm most proud of. See, it turns out that the Pentium takes a few cycles too long to render each frame and fails to hit its timing window unless the vertices are packed in canonically groodlized format. So I took a weekend, 16-hour days, and just read the relevant papers and implemented it in code over that weekend, and it basically saved the whole game.
The point being that not only is he a genius, but he also has an insane grindset that allows him to talk about doing something incredibly arcane and complex over a weekend -- devoting all his time to it -- the way you and I talk about breakfast.
Another weird thing about Carmack, now that you mention it -- and Romero, coincidentally -- is their remarkable ability to remember technical challenges they've solved over time.
For whatever reason, the second I've solved a problem or fixed a bug, it basically autopurges from my memory when I start on the next thing.
I couldn't tell you the bugs I fixed this morning, let alone the "groodlizer" I optimized 20 years ago.
Oh btw, Jank is awesome and Jeaye is a great guy, and also a game industry dev!
The trick for remembering those things is debriefing with other devs and then documenting. And then keep talking about it.
I don't do mind-blowing stuff like Carmack, but just yesterday I came across a bug that helps support my thesis that "splitting methods by LOC" can cause subtle programmer mistakes. Wanna write a blog post about it asap.
I find it's actually a good guideline for what to work on. If I'm 1, 3, 6, 12 months into a job or some other project and I can't remember what I was doing X months ago it tends to mean that I'm not improving during that period of time either.
Carmack is always trying to get better, do more, push the envelope further. He was never in it for money or fame, he was in it to be the best near as I can tell. And he's still among the truly terrifying hackers you wouldn't want to be up against, he just never stopped. You get that with a lot of the people I admire and try to emulate as best I can, Thompson comes to mind, Lamport, bunch of people. They just keep getting more badass from meeting their passion to the grave, a lifelong project of unbounded commitment to excellence in their craft.
Linking directly to C++ is truly hell just considering symbol mangling. The syntax <-> semantics relationship is ghastly. I haven't seen a single project tackle the C++ interface in its entirety (outside of clang). It nearly seems impossible.
There's a reason Carmack tackled the C ABI and not whatever the C++ equivalent is.
The C ABI is the System V ABI for Unix, since C was literally created for it. And that is the ABI followed by pretty much any Unix successor: Linux, Apple's OS, FreeBSD.
Windows has its own ABI.
The differing ABI is pretty much legacy, plus the fact that the x86_64 ABI was built by AMD + Linux etc., while Microsoft worked with Intel on the Itanium ABI.
> And that is the abi followed by pretty much any Unix successor: Linux, Apple's OS, FreeBSD.
Even limiting that to “on x64”, I don’t see how that’s true. To make a syscall, the ABI on Linux says “make the call”, while MacOS (and all the BSDs, I think) says “call the provided library function”.
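A minimal sketch of that difference, with a hypothetical helper (Linux/x86-64 only for the raw form; on macOS and the BSDs the supported route is the libc wrapper):

    #include <unistd.h>   // write(): the stable interface on macOS and the BSDs
    #include <cstddef>

    // Linux/x86-64 treats raw syscalls as a stable userspace ABI:
    // rax = syscall number (1 = write), args in rdi, rsi, rdx.
    long linux_raw_write(int fd, const char* buf, std::size_t len) {
        long ret;
        asm volatile("syscall"
                     : "=a"(ret)
                     : "a"(1L), "D"(static_cast<long>(fd)), "S"(buf), "d"(len)
                     : "rcx", "r11", "memory");
        return ret;
    }

    int main() {
        const char msg[] = "hi\n";
        linux_raw_write(1, msg, sizeof msg - 1);  // fine on Linux, not on macOS
        write(1, msg, sizeof msg - 1);            // portable: let libc make the call
    }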
Also (https://developer.apple.com/documentation/xcode/writing-64-b...): “Apple platforms typically follow the data representation and procedure call rules in the standard System V psABI for AMD64, using the LP64 programming model. However, when those rules are in conflict with the longstanding behavior of the Apple LLVM compiler (Clang) on Apple platforms, then the ABI typically diverges from the standard Processor Specific Application Binary Interface (psABI) and instead follows longstanding behavior”
Some of the exceptions mentioned there are:
- Asynchronous Swift functions receive the address of their async frame in r14. r14 is no longer a callee-saved register for such calls.
- Integer arguments that are smaller than int are required to be promoted to int by the caller, and the callee may assume that this has been done. (This includes enumerations whose underlying type is smaller than int.) For example, if the caller passes a signed short argument in a register, the low 32 bits of the register at the moment of call must represent a value between -32,768 and 32,767 (inclusive). Similarly, if the caller passes an unsigned char argument in a register, the low 32 bits of the register at the moment of call must represent a value between 0 and 255 (inclusive). This rule also applies to return values and arguments passed on the stack.
Swift has its own ABI and calling convention, so that makes sense that Apple adapted to it.
The System V ABI doesn't say anything about syscalls.
The Windows x86_64 ABI is the same ABI as for x86; for this reason, you can only pass arguments in 4 registers (while Unix uses 6), because x86 only had 8 registers.
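As a concrete sketch (the function here is made up), the same six-integer-argument call lands in registers differently under the two conventions:

    // Declared with C linkage so only the calling convention is in play.
    extern "C" long f(long a, long b, long c, long d, long e, long g);

    // System V AMD64 (Linux, macOS, the BSDs):
    //   a=rdi, b=rsi, c=rdx, d=rcx, e=r8, g=r9   -> all six in registers
    // Microsoft x64 (Windows):
    //   a=rcx, b=rdx, c=r8, d=r9, e and g on the stack,
    //   plus 32 bytes of caller-reserved "shadow space" for the register args.
    long call_it() { return f(1, 2, 3, 4, 5, 6); }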
I think people have expectations that are misaligned with history and reality about this, to be honest. We can't expect all OS to do things in the same way.
C was created to rewrite the UNIX system, and POSIX compliance is followed by all successors, with minimal differences.
When it became clear that "Itanium" was a failure, Microsoft couldn't just pull a new ABI out of thin air and break all applications, so they just reused the same x86 ABI.
The C ABI is basically per-platform (+ variations, like 32- vs 64-bit). But you can get by quite well pretending there is something like a C ABI if you use <stdint.h>.
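In practice that tends to look like a hypothetical header along these lines: only C linkage, fixed-width integers, and opaque pointers cross the boundary, and the per-platform C ABI takes care of the rest (all names here are made up):

    // plugin_api.h - a sketch of a "pretend C ABI" boundary
    #include <stdint.h>
    #include <stddef.h>

    #ifdef __cplusplus
    extern "C" {
    #endif

    typedef struct plugin_handle plugin_handle;   /* opaque to callers */

    plugin_handle* plugin_open(const char* config_utf8);
    int32_t        plugin_process(plugin_handle* h,
                                  const uint8_t* input, size_t input_len,
                                  uint8_t* output, size_t output_cap);
    void           plugin_close(plugin_handle* h);

    #ifdef __cplusplus
    }  /* extern "C" */
    #endif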
Yeah, my favorite Q3 engine game was Jedi Outcast (the engine must have been licensed, since the game was released before 2005, when it was open sourced)
My childhood game built on ioquake3 was Tremulous, plus a bit of Urban Terror, around the age of 14.
Side-note: before that it was Armagetron Advanced, but that has nothing to do with Q3.
I hear you when it comes to C++ portability, ABI, and standards. I'm not sure what you would imagine jank using if not for LLVM, though.
Clojure uses the JVM, jank uses LLVM. I imagine we'd need _something_ to handle the JIT runtime, as well as jank's compiler back-end (for IR optimization and target codegen). If it's not LLVM, jank would embed something else.
Having to build both of these things myself would make an already gargantuan project insurmountable.
> the lack of standardization for name-mangling, or even a way to mangle or de-mangle names at compile-time.
Like many things, this isn't a C++ problem. There is a standard and almost every target uses it ... and then there's what Microsoft does. Only if you have to deal with the latter is there a problem.
Now, standards do evolve, and this does give room for different system libraries/tools to have a different view of what is acceptable/correct (I still have nightmares of trying to work through `I...E` vs `J...E` errors) ... but all the functionality does exist and work well if you aren't on the bleeding edge (fortunately, C++11 provided the bits that are truly essential; everything since has been merely nice-to-have).
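For what it's worth, on Itanium-ABI toolchains the mangler/demangler is reachable at runtime through the GCC/Clang support library; a small sketch (this is <cxxabi.h>, not standard C++, which is exactly the grandparent's complaint):

    #include <cxxabi.h>    // abi::__cxa_demangle, GCC/Clang only
    #include <cstdio>
    #include <cstdlib>
    #include <typeinfo>
    #include <vector>

    int main() {
        // On GCC/Clang, typeid(...).name() returns the mangled name,
        // e.g. "St6vectorIiSaIiEE" for std::vector<int>.
        const char* mangled = typeid(std::vector<int>).name();

        int status = 0;
        char* demangled = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
        if (status == 0) {
            std::printf("%s -> %s\n", mangled, demangled);
            std::free(demangled);   // caller owns the malloc'd result
        }
    }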
Like many things people claim "isn't a C++ problem but an implementation problem"... This is a C++ problem. Anything that's not nailed down by the standard should be expected to vary between implementations.
The fact that the standard doesn't specify a name mangling scheme leads to the completely predictable result that different implementations use different name mangling schemes.
The fact that the standard doesn't specify a mechanism to mangle and demangle names (be it at runtime or at compile time) leads to the completely predictable result that different implementations provide different mechanisms to mangle and demangle names, and that some implementations don't provide such a mechanism.
These issues could, and should, have been fixed in the only place they can be fixed -- the standard. ISO is the mechanism through which different implementation vendors collaborate and find common solutions to problems.
> The fact that the standard doesn't specify a name mangling scheme leads to the completely predictable result that different implementations use different name mangling schemes.
The ABI mess predates the standard by years, and if we look that far back, the Annotated C++ Reference Manual included a mangling scheme in its description of the language. Many compiler writers back then made the intentional choice to ignore it. The modern-day ISO standard would not fare any better at pushing that onto unwilling compiler writers than it fared with the C++03 export feature.
Well yeah, it can't be fixed now. It could have been specified near the beginning of the life of the language though, and the standard would've been the right place to do that.
I'm saying that the non-standard name mangling is a problem with C++. I'm not saying that it's an easily solvable problem.
> Anything that's not nailed down by the standard should be expected to vary between implementations.
When you have one implementation you have a standard. When you have two implementations and a standard, you don't actually have a standard in practice. You just have two implementations that kind of work similarly in most cases.
While the major compilers do a fantastic job they still frequently disagree about even “well defined” behavior because the standard was interpreted differently or different decisions were made.
> When you have two implementations and a standard you don’t actually have a standard in practice
This simply isn't true. Plenty of standardized things are interchangeable: internet RFCs followed by zillions of players and implementations, medical device standards, encryption standards, weights and measures, currency codes, country codes, time zones, date and time formats, tons of file formats, compression standards, the ISO 9000 series, ASCII, testing standards, and on and on.
The poster above you is absolutely correct - if something is not in the standard, it can vary.
Lots of things have standards that are mostly interchangeable. But pretending that the implementations don't interpret standards differently and have important differences is naive. This is doubly so for language implementations, which frequently leave things as "implementation defined". Clang and GCC have intentionally made significant efforts to minimize their differences, which is why it's less noticeable if you're just swapping between them (it didn't start out this way). MSVC has not made these efforts. Intel abandoned their compiler for Clang. So basically you already have MSVC & Clang/GCC as dialects, ignoring the more minor differences that readily exist between Clang and GCC.
Compare this with languages like Zig, Rust, and Python that have 1 compiler and don't have any of the problems of C++ in terms of interop and dialects.
Java is the closest to C++ here, but even there it's one reference implementation (OpenJDK, which Oracle derives their release from) and a bunch of smaller implementations that derive from it. Java is aided here by the fact that the JDK code itself is shared between JVMs, the language itself is a very thin translation of code to bytecode, and the language is largely unchanging. JavaScript is in a similar boat, aided by the same thing as Java: the language is super thin and has almost nothing in it, with everything else deferred to browser APIs, where the dialect problem exists despite the existence of standards.
HN is a censorship haven and all, but I'd like to point out just one thing:
>Compare this with languages like Zig, Rust, and Python that have 1 compiler and don't have any of the problems of C++ in terms of interop and dialects.
For Python, this is straight up just wrong.
Major implementations: CPython, PyPy, Stackless Python, MicroPython, CircuitPython, IronPython, Jython.
If you have an area where the standard is ambiguous about its requirements, then you have a bug in the standard. And hopefully you also have a report to send along to help it communicate itself more clearly.
This is like getting mad at ISO 8601 because it doesn't define the metric system.
No standard stands alone in its own universe; complementary standards must necessarily always exist.
Besides, even if the C++ standard suddenly did incorporate ABI standards by reference, Microsoft would just refuse to follow them, and nothing would actually be improved.
A better situation than today would be only having to deal with Microsoft and Not Microsoft, rather than multiple different ways of handling the problem that can differ unexpectedly
> There is a standard and almost every target uses it ... and then there's what Microsoft does. Only if you have to deal with the latter is there a problem.
To be fair, jank embeds both Clang and LLVM. We use Clang for C++ interop and JIT C++ compilation. We use LLVM for IR generation and jank's compiler back-end.
> they're not embedding LLVM - they're embedding clang
They're embedding both, according to the article. But it's also just sloppy semantics on my part; when I say LLVM, I don't make a distinction between the frontend and any other part of it. I'm fully relying on context to include all relevant bits of software being used, in the same way I might use "Windows" to refer to any part of the Windows operating system like dwm.exe, explorer.exe, command.com, ps.exe, etc. LLVM is a generic catch-all for me; I don't say "lli", I say "the LLVM VM", for example. I can't really consider Clang to be distinct from that ecosystem, though I know it's a discrete piece of software.
> name mangling is by the easiest part of cpp FFI
And it still requires a lot of work, and increases in effort when you have multiple compilers, and if you're on a tiny code team that's already understaffed, it's not really something you can worry about.
You're right, writing platform specific code to handle this is more than possible. But it takes manhours that might just be better spent elsewhere. And that's before we get to the part where embedding a C++ compiler is extremely inappropriate when you just want a symbol name and an ABI.
But this is beside the point: the fact that it's not a problem solved by the gargantuan standard is awful. I also consider the ABI to be the exact same issue, that being the absolutely awful support for runtime code loading, linking, and interoperation. There's also no real reason for it, other than the standards committee being incompetent.
I don't see the point of standardizing name mangling. Imagine there is a standard, now you need to standardize the memory layout of every single class found in the standard library. Without that, instead of failing at link-time, your hypothetical program would break in ugly ways while running because eg two functions that invoke one other have differing opinions about where exactly the length of a std::string can be found in the memory.
The naive way wouldn't be any different than what it's like to dynamically load sepples binaries right now.
The real way, and the way befitting the role of the standards committee, is actually putting effort into standardizing a way to talk to and understand the interfaces and structure of a C++ binary at load time. That's exactly what linking is for. It should be the responsibility of the software using the FFI to move its own code around and adjust it to conform with information provided by the main program as part of the dynamic linking/loading process... which is already what it's doing. You can mitigate a lot of the edge cases by making interaction outside of this standard interface undefined behavior.
The canonical way to do your example is to get the address of std::string::length() and ask how to appropriately call it (how to pass "this", for example).
This standard already exists; it's called the ABI, and the reason the STL can't evolve past 90s-era data structures is that breaking it would cause immeasurable (read: quite measurable) harm.
Like, for fuck's sake, we're using red/black trees for hash maps, in std - just because thou shalt not break thy ABI
We're using self-balancing trees for std::map because the specification for std::map effectively demands that given all the requirements (ordering, iterator and pointer stability, algorithmic complexity of various operations, and the basic fact that std::map has to implement everything in terms of std::less - it's emphatically not a hash map). It has nothing to do with ABI.
Are you rather thinking of std::unordered_map? That's the hash map of standard C++, and it's the one where people (rightfully) complain that it's woefully out of date compared to SOTA hashmap implementations. But even there an ABI break wouldn't be enough, because, again, the API guarantees in the Standard (specifically, pointer stability) prevent a truly efficient implementation.
Are there open source libraries that provide a better hash map? I have an application which I've optimized by implementing a key data structure a bunch of ways, and found boost::unordered_map to be slightly faster than std::unordered_map (which is faster than std::map and some other things), but I'd love something faster. All I need to store are ~1e6 things like std::array<int8_t, 20>.
You should probably use either boost::unordered_flat_map or absl::flat_hash_map if you don't need ordering. Especially with 20-byte values. (Though you didn't mention the key type). If you're dealing with building against Boost already, I'd just switch and measure it.
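A rough sketch of what the swap could look like, assuming Boost 1.81+ (where boost::unordered_flat_map landed), that the 20-byte arrays are the keys, and a hand-rolled FNV-1a hash just to keep it self-contained:

    #include <boost/unordered/unordered_flat_map.hpp>  // Boost >= 1.81
    #include <array>
    #include <cstdint>
    #include <cstdio>

    using Key = std::array<int8_t, 20>;

    // FNV-1a over the 20 key bytes; any decent byte-string hash will do.
    struct KeyHash {
        std::size_t operator()(const Key& k) const noexcept {
            std::uint64_t h = 14695981039346656037ull;
            for (int8_t b : k) {
                h ^= static_cast<std::uint8_t>(b);
                h *= 1099511628211ull;
            }
            return static_cast<std::size_t>(h);
        }
    };

    int main() {
        boost::unordered_flat_map<Key, int, KeyHash> counts;
        counts.reserve(1'000'000);   // ~1e6 entries, avoid rehashing during fill

        Key k{};                     // all-zero key, just to exercise the map
        ++counts[k];
        std::printf("%d\n", counts[k]);   // prints 1
    }

One caveat: open-addressing flat maps don't give pointer stability across rehash, so this only works if nothing holds references into the table.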
I would think name mangling is out of scope for a programming language definition, more so for C and C++, which target running on anything under the sun, including systems that do not have libraries, do not have the concept of shared libraries or do not have access to function names at runtime.
> It would be really nice to have some way to get symbol names and calling semantics
Again, I think that’s out of scope for a programming language. Also, is it even possible to have a way to describe low level calling semantics for any CPU in a way such that a program can use that info? The target CPU may not have registers or may not have a stack, may have multiple types of memory, may have segmented memory, etc.
2nding this. The "non-phonetic alphabet" is the biggest non-issue I see people raise a stink about. It really doesn't matter, context is the heavy-weight backbone of language.
On top of that, I think people really underestimate how inappropriate diacritics would be for English. It has a massive phonemic inventory, with 44 unique items. Compare with Spanish's 24. English's "phonetic" writing system would have to be as complex as a romanized tonal language like Mandarin (which has to account for 46 unique glyphs once you account for 4 tones over 6 vowels + the 22 consonants). Or you know, the absolute mess that is romanization of Afro-Asiatic languages. El 3arabizi daiman byi5ali el siza yid7ako, el Latin bas nizaam kteebe mish la2e2 3a lugha hal2ad m3a2ade.
> The "non-phonetic alphabet" is the biggest non-issue I see people raise a stink about
Myself and many friends who aren’t native have struggled with speaking fluently because of it. Most of us still mispronounce some words (my friend pronounced “draught beer” like the lack of rain, instead of like draft).
Doesn’t mean things should change, but it’s certainly not a “non-issue”
The bureaucratization of language is more problematic in my view, where things are seen as wrong and right and we try to cram the beauty of natural language into a restricted box that can be cleanly and easily defined and worked with universally. I quite literally have nothing but disdain for this conception of language, that it must bend to the whims of rigidity when it's very clearly a natural, highly chaotic, dynamic system constantly undergoing evolution in unexpected ways.
How would you account for the fact that for many words, there isn't a consistent pronunciation rule for it at all? For example, I would guess that 50% of English speakers are non-rhotic.
Same way other dialect continuums account for it: you standardize spelling on some variant, or several variants if that is non-viable (which, yes, does mean that e.g. American and British English spellings would diverge somewhat).
To be clear, I'm not particularly advocating for making english a phonetic language. I'm just saying it being non-phonetic does cause issues (and makes it frustrating, but also shows a very interesting history).
Assuming we wanted to make English a phonetic language, then your question is kind of moot: phonetic means we need to pick the pronunciation rules for phonemes, which would make other ways to pronounce these phonemes incorrect. Some of currently-correct english would become incorrect english.
> For example, I would guess that 50% of English speakers are non-rhotic
Note that accent isn't really what people talk about when they complain about pronunciation. The problem is that there's no mapping from letters to phonemes in any English accent: laughter/slaughter, draught/draught, G(a)vin/D(a)vid...
All those examples follow the linguistic patterns of the languages they come from. They aren't arbitrary, they just don't teach us the context when we're learning as children.
Of course there’s always reasons. Teaching it to children isn’t really a solution: you’d need to know where words come from before reading them correctly, and also many people don’t learn English as children.
Phonetic languages borrow words from other languages too; they adapt them to their own language while keeping the pronunciation (the only example coming to mind right now is the Czech for sandwich, sendvič). English could do that just fine if being phonetic was a goal.
You would know where words came from based on the way they're spelt. That would let you know how to pronounce them. It's the exact same thing people do now we just do it without thinking.
The systems at work in English are not nonsensical like people like to parrot. To say it's not phonetic is just wrong on every level as well.
Frankly I'm fine with the historical oddities that have led to modern English. If non native speakers have issues, that's tough luck for them!
Does relate to the point that English still doesn't have a central linguistics authority (and likely won't ever). Just various reformers that have been more or less successful and in how distributed their reforms have been. Draught versus draft was indeed one of Noah Webster's proposed reforms that influenced a lot of American spellings and in turn is still influencing UK spellings. It's not as obvious as color versus colour, but there is a bit of US versus UK in draft versus draught.
(Webster also went on to suggest dawter over daughter, to remove more of these vestigial augh spellings, but that one still hasn't caught on even in the US. Just as the cot/caught split is its own weird remaining reform discussion.)
Pronunciation is not mandated to be correct or wrong; as long as you're within a radius, it's good. Pronunciation has changed in languages before en masse. Look at the Great Vowel Shift.
> It has a massive phonemic inventory, with 44 unique items. Compare with Spanish's 24, or German's 25.
I'm not sure where you're getting these numbers from, but German has around 45 phonemes according to all sources I could find, depending on how you count: 17 vowels (including two different schwa sounds), 3 diphthongs, 25 consonants.
If Arabic had to cater to Afro-Asiatic dialects' phonemes, the script would have been even messier. I'm a speaker of one, and my dialect is heavily influenced by the indigenous Tamazight language. And I think this is why many in the Amazigh community were, and some still are, disappointed with the neo-Tifinagh script. While it carries symbolic weight, it doesn't offer the practical readability, phonemic clarity, and tech accessibility of a modern script that Tamazight deserves. Latin script, ironically, fits Tamazight much more naturally.
You don't have to make a perfect pronunciation system. It's OK if a vowel is pronounced slightly differently, as long as its pronunciation can be predicted from context. Even if it can only be predicted 99% of the time.
Insisting that the writing system captures every little distinction is a common mistake enterprising linguists make (often when designing an alphabet for a bible translation, or "modernizing" the spelling of a language which is not their own). They don't have to. Even if you do it, it won't last long. Letters only have to be a reasonably consistent shorthand for how things are pronounced. People don't like a ton of markers or, god forbid, digits sprinkled into their writing to specify a detailed pronunciation.
English has accumulated inconsistencies for so long, though, that it can't really be said to be consistent anymore. Usually, there are radicals who just cut through and start writing more sensibly here and there (without digits or quirky phonetical markers), cutting down on the worst excesses of inconsistency. But in English, these radicals have been soundly defeated in prestige by conservative writers.
Agreed. We don't need an IPA-level alignment between writing and pronunciation, and you could never have a workable single system that reflected all speakers.
I do think we could have a "light touch" reform that cleaned up some of the more egregious cases like "...ough" and some others that trip people up all the time.
Diacritics don't need to be used the way they are in French, i.e. to preserve the original spelling. On the contrary, most languages use them to make their spelling more phonetic.
Nor is there a need for some insane kind of diacritics to handle English. Its phonemic inventory is considerable, yes, but it can be easily organized, especially when you keep in mind that many distinct sounds are allophones (and thus don't need a separate representation) - a good example is the glottal stop for "t" in words like "cat", it really doesn't need its own character since it's predictable.
Let's take General American as an example. First you have the consonant phonemes:
Nasals: m,n,ŋ
Plosives: p,b,t,d,k,g
Affricates: t͡ʃ, d͡ʒ
Fricatives: f,v,θ,ð,s,z,ʃ,ʒ,h
Approximants: l,r,j,w
Right away we can see that most are actually covered by the basic Latin alphabet. Affricates can be reasonably represented as plosive-fricative pairs since English doesn't have a contrast between tʃ/t͡ʃ or between dʒ/d͡ʒ; then we can repurpose Jj for ʒ. For ŋ one can adopt a phonemic analysis which treats it as an allophone of the sequence ng that only occurs at the end of a word (with g deleted in this context) and as an allophone of n before velars.
Thus, distinct characters are only strictly needed for θ,ð,ʃ, and perhaps ʒ. All of these except for θ actually exist as extended Latin characters in their own right, with proper upper/lowercase pairs, so we could just use them as such: Ðð Ʃʃ Ʒʒ. And for θ there's the historical English thorn: Þþ. The same goes for Ŋŋ if we decide that we do want a distinct letter for it.
If one wants to hew closer to basic Latin look, we could use diacritics. Caron is the obvious candidate for Šš =ʃ and Žž=ʒ, and we could use e.g. crossbar for the other two: Đđ and Ŧŧ. If we're doing that, we might also take Čč for c. And if we really want a distinct letter for ŋ, we could use Ňň.
You can also consider which basic Latin letters are redundant in English when using phonemic spelling. These would be c (can always be replaced with k or s), q (can always be replaced with k), and x (can always be replaced with ks or gz). These can then be repurposed - e.g. if we go with two-letter affricates and then take c=ʃ x=ð q=θ we don't need any diacritics at all!
Moving on to vowels, in GA we have:
Monopthongs: ʌ,æ,ɑ,ɛ,ə,i,ɪ,o,u,ʊ
Diphthongs: aɪ,eɪ,ɔɪ,aʊ,oʊ
R-colored: ɑ˞,ɚ,ɔ˞.
Diphthongs can be reasonably represented using the combination of vowel + y/w for the glide, thus: ay,ey,oy,aw,ow.
For monophthongs, firstly, ʌ can be treated as stressed allophone of ə. If we do so, then all vowels (save for o which stands by itself) form natural pairs which can be expressed as diacritics: Aa=ɑ, Ää=æ, Ee=ɛ, Ëë=ə, Ii=i, Ïï=ɪ, Oo=o, Uu=u, Üü=ʊ.
For R-colored vowels, we can just adopt the phonemic analysis that treats them as vowel+r pairs: ar, er, or.
To sum it all up, we could have a decent phonemic American English spelling using just 4 extra vowel letters with diacritics: ä,ë,ï,ü - if we're okay with repurposing existing redundant letters and spelling affricates as two-letter sequences.
And worst case - if we don't repurpose letters, and with each affricate as well as ŋ getting its own letter - we need 10: ä,č,đ,ë,ï,ň,š,ŧ,ž,ü.
I don't think that's particularly excessive, not even the latter variant.
SML, specifically the MLton dialect. It's good for all the same reasons OCaml is good; it's just a much better version of the ML language, in my opinion.
I think the only thing that ocaml has that I miss in sml is applicative functors, but in the end that just translates to slightly different module styles.
I remember working through Appel's compiler textbook in school, in SML and Java editions side by side, and the SML version was of course laughably more concise. It felt like cheating, because it practically was.
The difference is that I get to use Java at work and never will be able to use SML; perfection usually fails at mainstream adoption, as proven by the whole crop of "worse is better" languages.
I'd rather have Java, C#, C++, Kotlin, Rust, ... with their imperfect takes on ML, Smalltalk, and Lisp ideas, than keep hoping to get the right languages at the workplace.
> Good enough wins. Worse is better rules. Release early... and so on.
None of that applies to this case, where ML is not only a fair bit older than Java but abides by the principle of "worse is better" far more.
> While scheme, standardml and numerous other perfect little languages are mostly just Wikipedia pages now.
Frankly, I'm thankful I work in an environment with creative autonomy and can choose (and create) my own tools, rather than being forced to work with ones I don't particularly care for because they had a large marketing push.
Why do you think ml is on the worse side of the principle when compared to java?
SML is a super-polished language, very well-defined formally, being the culmination of years of design effort and uniting a family of academia-born languages. It took 40 years of research to release the final version in 1997.
Java is on the industry side of things: semi-formalized, numerous omissions when created, a bunch of ad hoc ideas, released in 1995. Not sure it took more than 3-4 years to get the first official version out.
"Worse is better" is an extremely misleading phrase. It's supposed to be ironic, not sarcastic. Unfortunately, the authors of "Unix Haters Handbook" exercised terrible judgement used it as a sarcastic pejorative. Now nobody understands what anybody is actually saying when they use it. I mean it in the original sense, where it's basically another way to express "Less is more."
SML is borderline minimalist, it's "worse" because it doesn't try to solve everybody's problems with a bunch of ad hoc language features and a stdlib and formalization larger than the entire legal code of the United States. It's "worse" because it comes with "less". You can't just import solution and start writing AbstractFactoryManager2 garbage. It's not actually worse to anybody but beancounters.
Also, lucky you :-) The logic of software development usually means that, beginning with mid-size companies, it doesn't make sense to have more than 2 major languages, usually on the boring side.
But... I feel that this will not matter all that much soon
Can you expand on what makes SML better, in your eyes, than OCaml?
IMO: it's certainly "simpler" and "cleaner" (although it's been a while but IIRC the treatment of things like equality and arithmetic is hacky in its own way), which I think causes some people to prefer SML over aesthetics, but TBH I feel like many of OCaml's features missing in SML are quite useful. You mentioned applicative functors, but there's also things like labelled arguments, polymorphic variants, GADTs, even the much-maligned object system that have their place. Is there anything SML really brings to the table besides the omission of features like this?
> the treatment of things like equality and arithmetic is hacky in its own way
mlton allows you to use a keyword to get the same facility for function overloading that is used for addition and equality. it's disabled by default for hygienic reasons, function overloading shouldn't be abused.
generally speaking if my functions are large enough for this to matter, i'd rather be passing around refs to structures so refactoring is easier.
> polymorphic variants
haven't really missed them.
> GADTs
afaik being able to store functors inside of modules would fix this (and I think sml/nj supports this), but SML's type system is more than capable of expressing virtual machines in a comfortable way with normal ADTs. if i wanted to get that cute with the type system, i'd probably go the whole country mile and reach for idris.
> even the much-maligned object system that have their place
never used it.
> Is there anything SML really brings to the table besides the omission of features like this?
mlton is whole-program optimizing (and very good at it)[1], has a much better FFI[2][3], is much less opinionated as a language, and the parallelism is about 30 years ahead[4]. the most important feature to me is that sml is more comfortable to use over ocaml. being nicer syntactically matters, and that increases in proportion with the amount of code you have to read and write. you don't go hiking in flip flops. as a knock-on effect, that simplicity in sml ends up with a language that allows for a lot more mechanical sympathy.
all of these things combine for me, as an engineer, to what's fundamentally a more pragmatic language. the french have peculiar taste in programming languages, marseille prolog is also kind of weird. ocaml feels quirky in the same way as a french car, and i don't necessarily want that from a tool.
I respect the sheer power of what mlton does. The language itself is clean, easy to understand, reads better than anything else out there, and is also well-formalised. I read (enjoyed!) the tiger book before I knew anything about SML.
Sadly, this purism (not as in Haskell but as a vision) is what probably killed it. MLTon or not, the language needed to evolve, expand, rework the stdlib, etc.
But authors were just not interested in the boring part of language maintenance.
I'm not sure why we would want to replace mechanical systems. A portable AC unit with 10,000 BTUs draws ~650 W these days, and it'll turn a 50 m^2 apartment into an icebox. That's not much more than a flagship GPU pulls. This is also about the least efficient class of aircon you can get.
Sure, less energy usage is always better, and if we could get the same out of mere single-digit-watt power draws that would be cool. But I don't think thermoelectrics are ever going to get us there.
I know that I've stopped using Google's search primarily because it's been intentionally[1] made a worse tool. The AI overviews sound obnoxious and I'm not surprised if they're driving more people off of the platform.
I don't know what the hell is going on over there, but the systemic issues and awful decisions I see coming out of Alphabet companies in recent years paints a picture of suicidal levels of hubris at the c-suite level. The boards need to pull their heads out of their asses, otherwise incompetence is going to sink that ship before 2040.
> Apptainer is for running cli applications in immutable, rootless containers.
What's the appeal of using this over unshare + chroot to a mounted tarball with a tmpfs union mount where needed? Saner default configuration? Saner interface to cgroups?
Apptainer, like the vast majority of container solutions for Linux, takes advantage of Linux namespaces, which are a lot more robust and flexible than a simple chroot.
In Linux (docker, podman, lxc, apptainer, etc.), containers are produced by combining underlying Linux features in different ways. All of them use Linux namespaces.
When using docker/podman/apptainer you can pick and choose when and how to use namespaces. For example, I can use just the 'mount' namespace to create a unique view of file systems, but not use the 'process', 'networking', and 'user' namespaces, so that the container shares all of those things with the host OS.
For example, when using podman the default is to use the networking namespace, so the container gets its own IP address. When you are using rootless (unprivileged) mode it will use "usermode networking" in the form of slirp4netns. This is good enough for most things, but it is limited and slow.
Well, I can turn that off, so that applications running in a podman container share networking with the host OS. I do this for things like syncthing, so that the containerized version runs with the same performance as non-containerized services, without requiring special permissions to set up rootful networking (i.e. macvlans or Linux bridges with veth devices, etc.).
By default apptainer just uses mount and user namespaces. But it can take advantage of more Linux isolation features if you want it to.
So the process ids, networking, and the rest of it is shared with the host OS.
The mount namespace is like chroot on steroids. It is relatively trivial to break out of chroot jails. In fact it can happen accidentally.
And it makes it easier to take advantage of container image formats (like apptainer's SIF or more traditional OCI containers).
This is Linux's approach, as opposed to the BSD one of BSD jails, where the traditional, limited chroot feature was enhanced to make it robust.
I'm aware of all of this, it might not be clear if you've not used it directly yourself, but unshare(1) is your shell interface to namespaces. You still need to use a chroot if you want the namespace to look like a normal system. Just try it without chrooting:
unshare --mount -- /bin/bash
> It is relatively trivial to break out of chroot jails. In fact it can happen accidentally.
Usability, for one. I know how to `apptainer run ...`. I don't know how to do what you're describing, and I'd say I'm more Linux savvy than most HPC users.
Engineers like to work from the bottom up because nature isn't made out of straight lines and regularity. Breaking things apart into abstractions works to a certain point, then the natural irregularity of reality steps in and you need to be able to react to that and adjust your foundation. So yes, fundamentally it is about thermodynamic knock-on-effects of handling things from the top-down.
Messing around with lisp and smalltalk environments is fun. But the moment you have to stick your hand beneath the surface, you can very quickly end up in someone else's overengineering hell. Most serious GNU Emacs users shudder at the thought of hitting Doom Emacs with a wrench until it suits their tastes. It's universally agreed that building your Emacs environment up from the defaults is fundamentally a wiser choice. Now realize that a full Lisp operating system is several orders of magnitude more complex than Doom Emacs, and most things you're going to program emacs to do are absolutely trivial in comparison to most industrial-grade application logic.
> That's simply a reality, just as weavers saw massive layoffs in the wake of the automated loom, or scribes lost work after the printing press, or human calculators became pointless after high-precision calculators became commonplace.
See, this is the kind of conception of a programmer I find completely befuddling. Programming isn't like those jobs at all. There's a reason people who are overly attached to code and see their job as "writing code" are pejoratively called "code monkeys." Did CAD kill the engineer? No. It didn't. The idea is ridiculous.
I'm sure you understand the analogy was about automation and reduction in workforce, and that each of these professions have both commonalities and differences. You should assume good faith and interpret comments on Hacker News in the best reasonable light.
> There's a reason people who are overly attached to code and see their job as "writing code" are pejoratively called "code monkeys."
Strange. My experience is that "code monkeys" don't give a crap about the quality of their code or its impact with regards to the product, which is why they remain programmers and don't move on to roles which incorporate management or product responsibilities. Actually, the people who are "overly attached to code" tend to be computer scientists who are deeply interested in computation and its expression.
> Did CAD kill the engineer? No. It didn't. The idea is ridiculous.
Of course not. It led to a reduction in draftsmen, as now draftsmen can work more quickly and engineers can take on work that used to be done by draftsmen. The US Bureau of Labor Statistics states[0]:
Expected employment decreases will be driven by the use of computer-aided design (CAD) and building information modeling (BIM) technologies. These technologies increase drafter productivity and allow engineers and architects to perform many tasks that used to be done by drafters.
Similarly, the other professions I mentioned were absorbed into higher-level professions. It has been stated many times that the future focus of software engineers will be less about programming and more about product design and management.
I saw this a decade ago at the start of my professional career and from the start have been product and design focused, using code as a tool to get things done. That is not to say that I don't care deeply about computer science, I find coding and product development to each be incredibly creatively rewarding, and I find that a comprehensive understanding of each unlocks an entirely new way to see and act on the world.
My father in law was a draftsman. Lost his job when the defense industry contracted in the '90s. When he was looking for a new job everything required CAD which he had no experience in (he also had a learning disability, it made learning CAD hard).
He couldn't land a job that paid more than minimum wage after that.
Wow, that's a sad story. It really sucks to spend your life mastering a craft and suddenly find it obsolete and your best years behind you. My heart goes out to your father-in-law.
This is a phenomenon that seems to be experienced more and more frequently as the industrial revolution continues... The craft of drafting goes back to 2000 B.C.[0] and while techniques and requirements gradually changed over thousands of years, the digital revolution suddenly changed a ton of things all at once in drafting and many other crafts. This created a literacy gap many never recovered from.
I wonder if we'll see a similar split here with engineers and developers regarding agentic and LLM literacy.
> Actually, the people who are "overly attached to code" tend to be computer scientists who are deeply interested in computation and its expression.
What academics are you rubbing shoulders with? Every single computer scientist I have ever met has projects where every increment in the major version goes like:
"I was really happy with my experimental kernel, but then I thought it might be nice to have hotpatching, so I abandoned the old codebase and started over from scratch."
The more novel and cutting edge the work you do is, the more harmful legacy code becomes.
"I saw this a decade ago at the start of my professional career and from the start have been product and design focused"
I have a similar view of the future as you do. But I'm just curious what the quoted text here means in practice. Did you go into product management instead of software engineering, for example?
Purity has nothing to do with functional programming.
> It is also not a list based language, but a tree based one.
It's not called Trep, it's called Lisp. Yes, you can represent trees as lists embedded in lists, and the cons cell allows that thanks to type recursion. But it's still based around lists. Trees don't allow cycles, lists formed from ordered pairs do. Every single piece of literature written by McCarthy talks about lists[1].
> The fundamental point of lisps is that you can manipulate the abstract syntax tree directly
Macros weren't introduced until 1963[2]. They might be the reason you use a Lisp, and they may have become synonymous with Lisp after 1.5, but they are not a "fundamental point" of Lisp. The fundamental point was always about lambda calculus[1].
From even before Lisp was implemented for the first time, McCarthy envisioned it as a language for symbolic processing: manipulating representations of formulas, exhibiting nesting. So yes, it was actually tree processing, using lists.
When Lisp was implemented for the first time, the evaluation semantics was already expressed as operations on a tree.
Datalog is simpler, not Turing-complete, and IIRC uses forward chaining, which has knock-on effects in its performance and memory characteristics. Huge search spaces that are trivial in Prolog are impossible to represent in Datalog because it eats too much memory.
Datalog is a commuter car with a CVT. Prolog is an F1 car. Basically, it's not about improvement. It's about lobotomizing Prolog into something people won't blow their legs off with. Something that's also much easier to implement and embed in another application (though Prologs can be very easy to embed.)
If you're used to Prolog, you'll mostly just find Datalog to be claustrophobic. No call/3? No term/goal expansion? Datalog is basically designed to pull out the lowest-common-denominator feature set of Prolog for use as an interactive database search.
It's easier to write fast Datalog code but the ceiling is also way lower. Prolog can be written in a way to allow for concurrency, but that's an intermediate level task that requires understanding of your implementation. Guarded Horn Clauses and their derived languages[2] were developed to formalize some of that, but Japanese advancements over Prolog are extremely esoteric. Prolog performance really depends on the programmer and the implementation being used and where it's being used. Prolog, like a Lisp, can be used to generate native machine code from a DSL at compile-time.
If you understand how the underlying implementation of your Prolog works, and how to write code with the grain of your implementation, it's absolutely "fast enough". Unfortunately, that requires years of writing Prolog code with a single implementation. There's a lot of work on optimizing[3][4] prolog compilers out there, as well as some proprietary examples[5].