I remember being shocked when I saw someone (on HN) use the phrase “running on the bare iron” to mean “in user space of a computer running a multitasking operating system”. They in turn were incredulous that it could mean anything else — apparently an OS is needed, right?
Still, in some ways it’s a good thing. People can write good code without knowing what an ALU is, much less a flip flop.
The bigger problem is that there is little incentive these days to write good code at all; those who do of necessity know more than just their tiny domain.
Many terms about computers in English have been borrowed from similar acts in real life: compiling, interpreting, context switching, branching, etc. Similarly, rendering is reserved for the last step: converting whatever you have into something that can be shown.
You render videos, text, etc. The thing is, they are not modified further, they are shown.
When you "render" to HTML, you "render it again" for display. So it's not rendering in the "Computer Science sense", which makes the usage wrong.
Otherwise, JIT and interpreted would mean the same thing, but they're not. Because JIT allows tons of other things from a technical point of view.
In the sense that the signals generated by the photons hitting your eyes are transformed by your brain. They don't change the essence.
However, going from text to pixels is a bigger and more profound change. You’re still in the same domain from framebuffer to the brain, with an optoisolator in the middle.
Nobody reinvented the wheel. The old wheel didn't run interactively on the client. The React devs that came up with the term used to be PHP devs. Do you remember XHP?
SSR simply refers to the "render" method of the react-dom package being called on the server. It's similar to, but distinct from, template-engine/MVC-style server-side generated HTML.
In my experience it is now such that even just writing and running a C++ program on a modern computer is considered "extremely low level". It's almost an extreme thing to do to decide to write a native program.
Eh, that's just an issue of term reuse being rampant in this industry.
In the context of servers, it's accepted usage to use "baremetal" to mean "running on the physical host's operating system" as opposed to in a VM or container.
It's a domain thing. And one domain in the world is expanding. Everyone in the world is becoming a web developer. Actually, it already happened. Throw a dart at a group of software developers and most likely that dart will hit a web developer.
Web developers can write shitty code because most of the time the bottleneck is in the database. You just need to write code that's faster than the database and you don't need to optimize beyond that.
Now all the optimizations center around pointless refactorings to keep up with the newest technology trend.
IME it's quite rare for the database to be the bottleneck. People just don't know how to use it right. It's a pretty simple optimization to gather web requests into batches, and that can easily 10x your database throughput. At my last job we had architects pushing to do the opposite and turn batch files into single requests! Argh!
Also if people are using RDS with EBS, they'll think storage IO is ~500x slower than a modern NVMe disk really is, which will warp their perception of how well an RDBMS should scale. Their gp3 SSD storage comes with 3k IOPS baseline up to 16k[0]. lol. "SSD storage". The volume size "scales" up to 16TB. You can get a new condition P5410 for $420 retail: 8 TB/drive with 100x the performance of a maxed out EBS volume.
Similar misconceptions must exist with application performance. Lambda advertises being able to scale up to "tens of thousands" of concurrent requests[1]. My 6th gen i5 can do that with a few GB of RAM allocated to a single JVM process...
Database is the primary bottleneck of almost all web apps. This is different from solving bottlenecks you may find in other places.
Let me put it this way. If you did everything right, then the slowest part of your web application is usually the database. Now you may find problems in the web application and fix bottlenecks there but that's not what I'm talking about. I'm talking about the primary bottleneck.
In web apps, the database is basically doing all the work; the web application is just letting data pass through. The database is so slow that the speed of the web application becomes largely irrelevant. That's why languages like Ruby or Python can be used for the application, and that's why C++ is rarely needed.
This ultimately makes sense logically. The memory hierarchy in terms of speed goes from CPU cache to RAM to the filesystem, with the filesystem being the slowest.
Web applications primarily use the filesystem, as it's all just about manipulating and reading state from the DB, while a triple-A game primarily runs out of RAM. Games do use the filesystem, but usually that's buffering done in the background or load screens.
This is why your reply doesn't refute a single thing I said. It doesn't matter if DBs use NVMe; the FS is still slower than an in-memory store by a huge magnitude. Not to mention the IPC or network connection is another bottleneck. Web apps only need to be trivially made magnitudes faster than the IPC and database, and it's fine.
Your batch calls to the DB are only optimizations to the DB, which is still magnitudes slower than RAM. Try switching those batch requests to in-memory storage in the same process and keep the storage from bleeding pages to the FS, and that will be way faster.
After you fix that, the bottlenecks of most web apps lie in the HTTP calls and the parsing and deserialization of the data. But usually nobody in web cares about that stuff because, like I said, the DB is way slower.
Do you think a game engine can afford to do this stupid extra parsing and serialization step in between the renderer and the world updater? Hell no. That's why web developers can put all kinds of crap in the http handlers, because it doesn't matter.
Though I will say I have seen a bottleneck where parsing was actually slower than access to redis. But that's rare and redis is in-memory so it's mostly the ipc or network connection here.
I don't know how your performance expectations are calibrated, but for example on my i5 (4 cores) using a SATA SSD, I can get ~100k "Hello World" web requests/second, and ~60-70k web requests that update the database/second. The database can do ~300k updates/second on my hardware by doing something like generating synthetic updates in SQL, or updating by joining to another table, or doing a LOAD DATA INFILE.
Obviously the app is the primary bottleneck (well, they both are; CPU time is the bottleneck. Using an NVMe disk doesn't move the needle much). That's without TLS on the HTTP requests. Turns out JSON parsing has a noticeable cost.
Meanwhile I see posts about mastodon running into scaling issues with only a few hundred thousand users. Are they each doing 1+ toot/second average or something? 'cause postgres should not be a bottleneck on pretty much any computer with that few users. Nothing should be a bottleneck at that level.
The DB has to do IO too, which you likely aren't measuring.
You are probably doing 300k writes with only one IO call to indicate that the 300k were done. Then for the web app you're doing 100k IO calls. If you want an equivalent test on the web app, you need to have the web app write to memory something like 300k times in one request. But that's not the best way to test.
The full end to end test is the io call to the web app then processing in the web app then the io call from the web app to the db then processing in the db, then the return trip.
Picture a timeline with different colored bars of different lengths, with the length indicating the amount of time spent in each section of the full e2e test. You will find that the web app should be the fastest section. (It should also be divided in two, for the initial request and the return trip, with DB access in between.)
Flamegraph generators can easily create this visualization. Also be sure to handle what the other replier said as well. Make sure the db isn't caching stuff.
I would say a read test here is a good one. The web app parses a request and translates it to SQL, then the SQL has to join and search through some tables and deliver results back to the user.
I would run the profiling with many simultaneous requests as well. This isn't fully accurate though, as write requests lock sections of the DB, which is actually a big slowdown as well, but that may be too much work to simulate here.
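To make the timeline idea concrete, here is a rough sketch of that kind of per-phase timing in a Node/TypeScript handler (queryDb and the SQL are made up for illustration; this is just to show where each colored bar would come from):

    // Per-phase timing sketch: how much of the end-to-end time is app work vs. waiting on the DB.
    import { performance } from "node:perf_hooks";

    async function handleRequest(rawBody: string, queryDb: (sql: string) => Promise<unknown[]>) {
      const timings = { parse: 0, db: 0, serialize: 0 };

      let t = performance.now();
      const req = JSON.parse(rawBody) as { userId: number };   // app work: parsing
      timings.parse = performance.now() - t;

      t = performance.now();
      const rows = await queryDb(                              // waiting on the DB (IPC/network + query)
        `SELECT * FROM orders WHERE user_id = ${Number(req.userId)}`
      );
      timings.db = performance.now() - t;

      t = performance.now();
      const body = JSON.stringify(rows);                       // app work: serialization
      timings.serialize = performance.now() - t;

      console.log(timings); // e.g. { parse: 0.1, db: 4.2, serialize: 0.3 } in ms
      return body;
    }

Sum those per request and you get exactly the bars I'm describing: for a typical app the db slice dominates.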
My point is instead of doing 100k transactions in your web app, you should look at how to gather them into batches. Think of how an old school bank mainframe programmer would do something, then realize you can do that every couple milliseconds so it feels realtime.
If you submit a batch every 5ms, then on average you are adding 2.5 ms of latency per request sitting in the batch buffer. But with 100k requests/second for example, you are getting batches of 500, which will get a lot more out of your database. Realistically, most people have far fewer than 100k requests/second, but the idea works at lower levels too. Even batches of 10 will give you a good boost. If you don't have enough requests/second to put together a 10-request batch every 5 ms (2k req/s), then why worry about bottlenecks?
Wall time spent on a single request is the wrong way to understand performance for scalability. You can amortize wall time across many requests.
Also your db should be caching stuff for many workloads. Reddit gets about 80 GB/month of new posts/comments based on scraped dumps. An entire month of threads should fit in buffer pool cache no problem. If they really wanted to, their whole text history is only ~2.5TB, which you can fit in RAM for a relatively small amount of money compared to an engineer's salary. They should be getting very high cache hit ratios. Workloads with smaller working sets should always be cached.
> My point is instead of doing 100k transactions in your web app, you should look at how to gather them into batches.
This sounds odd to me. Assuming 100k independent web requests, a reasonable web app ought to be 100k transactions, and the database is the bottleneck. Suggesting that the web app should rework all this into one bulk transaction is ignoring the supposed value-add of the database abstraction and re-implementing it in the web app layer.
And most attempts are probably going to do it poorly. They're going to fail at maintaining the coherence and reliability of the naive solution that leaves this work to the database.
Of course, one ought to design appropriate bulk operations in a web app where there is really one client trying to accomplish batches of things. Then, you can also push the batch semantics up the stack, potentially all the way to the UX. But that's a lot different than going through heroics to merge independent client request streams into batches while telling yourself the database is not actually the bottleneck...
There aren't really heroics involved in doing this. e.g. in a promise-based framework, put your request onto a Queue[(RequestData, Promise)]. Await the promise as you would with a normal db request in your web handler. Have a db worker thread that pulls off the queue with a max wait time. Now pretend you received a batch file for your business logic. After processing the batch, complete the promises with their results. The basic scaffolding is maybe 20 LoC in a decent language/framework.
Much easier than alternatives like sharding or microservices or eventual consistency anyway.
I think somewhere I have a more built-out example with error handling and where you have more than just "body" in the model. In the real world the business logic for processing the batch and the error handling is going to be a decent amount more complicated, etc. etc. But the point is more about how the batching mechanic works.
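For reference, a minimal sketch of that scaffolding in TypeScript (the Batcher name and the processBatch callback are made up; error handling and the richer model are left out, as above):

    // Queue of (request data, pending promise); a worker drains it every few ms and
    // runs one batched DB operation instead of N single-row ones.
    type Pending<T, R> = { data: T; resolve: (r: R) => void; reject: (e: unknown) => void };

    class Batcher<T, R> {
      private queue: Pending<T, R>[] = [];

      constructor(
        private processBatch: (items: T[]) => Promise<R[]>, // e.g. one multi-row INSERT ... RETURNING
        intervalMs = 5
      ) {
        setInterval(() => void this.flush(), intervalMs);
      }

      // Called from the web handler; to the caller it looks like an ordinary async DB call.
      submit(data: T): Promise<R> {
        return new Promise<R>((resolve, reject) => this.queue.push({ data, resolve, reject }));
      }

      private async flush() {
        if (this.queue.length === 0) return;
        const batch = this.queue.splice(0, this.queue.length);
        try {
          const results = await this.processBatch(batch.map((p) => p.data));
          batch.forEach((p, i) => p.resolve(results[i]));
        } catch (err) {
          batch.forEach((p) => p.reject(err));
        }
      }
    }

In a handler you just await batcher.submit(requestData) as if it were a single-row call; the ~2.5 ms average latency I mentioned above comes from sitting in the queue until the next flush.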
The batch request just eliminated io. It doesn't change the speed of the database overall. It would make that e2e test I mentioned harder to measure as how would you profile the specific query related to the request within that batch? I can imagine an improvement in overall speed, but I don't think this is a common pattern and the time spent in the db will still be slower than the web app for all processing related to the request.
Also, for the cache... it's a hard thing to measure and define, right? Because it can live anywhere. It can live in the web app or in another third-party app, and depending on where it is, there's another IO round-trip cost.
Because of this, I usually don't classify the cache as part of the DB for these types of benchmarks. I just classify actual hits to the FS as the DB bottleneck.
I mean, if everything just hits an in-memory cache, that's more of a bottleneck bypass in my opinion. It's definitely important to have a cache, but measuring time spent in the cache is not equivalent to measuring where the bottleneck of your application lives.
> Web developers can write shitty code because most of the time the bottleneck is in the database. You just need to write code that's faster than the database and you don't need to optimize beyond that.
Is the wrong way to look at things. Shitty code might bottleneck because of the way it uses the database. But that's like saying "you can write shitty code because it will bottleneck on global locks". But you can write better code that uses fine-grained locks or read-write locks or something, or uses a lock-free algorithm, and now it's not your bottleneck. Or you could write shitty code that bottlenecks on single byte unbuffered IO, or you could not do that and not have that bottleneck. It's the exact same idea, really. Write code with more efficient access patterns, and your database won't be as much of a bottleneck.
To say something is a bottleneck, you need to look at what it's capable of under optimal usage patterns. If you have 1TB of data to transfer, and you have a 10 Gb/s link, it's going to take you a little while. If you have 100 MB of data, but you have a tiny receive window and send small packets, it might take you a while, but you're not bottlenecking on the network. Your code just isn't using the network well.
And in the real world, if your working set is smaller than a few TB, then you shouldn't be doing disk IO on your read workloads. You definitely shouldn't be doing read IO if your working set is in the 10s of GB or less. That's not a "bypass" if that's how real-world performance works. Don't run your database on a phone with 8 GB of RAM. Batching writes will cut down on fsync calls for the WAL, so you'll get better real-world performance with actual databases.
You're right that it's not a common pattern among web developers. That's why I'm bringing it up. It's not hard, but it's also not talked about (it is known among lower level programmers obviously. e.g. they're generally not doing unbuffered single byte file IO).
I'm not saying you can write just ANY code, code so shitty that it's slower than the database. Heck, you could have the handler sleep for a minute before forwarding the request to the DB; I obviously don't mean that.
I'm talking about a specific kind of shitty code in the sense you don't have to worry about move semantics, memory management. You don't have to preallocate memory, you don't have to think about a lot of stuff and optimize for the heap or the stack. That kind of thing.
Just write some trivial non-blocking code don't worry about anything else and you're overall good. I guess "shitty code" is the wrong choice of words here. You don't need to spend time optimizing is more what I mean. If you know what you're doing it's fine... no major effort needed to change things.
> Web developers can write shitty code because most of the time the bottleneck is in the database.
That is because most of the time web developers are bad at database things, it isn't an intrinsic property of database engines.
A modern database engine on a single AWS VM can readily support 10M writes/sec through disk storage, not even in memory. And run operational queries against that data at the same time. Most web apps don't have requirements anywhere close to that. In practice, you are much more likely to run out of network bandwidth to the database than database throughput if your systems are properly designed. Even the database engines themselves are usually bandwidth bound these days.
It is an intrinsic property of database engines. Database engines access the FS, which is the slowest part of memory in a computer. The web app is stateless; it's never supposed to touch the FS.
Additionally, database engines usually exist as separate processes or servers, necessitating IPC, which is another intrinsic bottleneck.
If an AWS VM can support 10 million writes/sec to disk storage, then a stateless web app on the same VM should be even faster doing the same reads and writes to in-memory storage.
I can agree with your latter statement that io can become the bigger bottleneck these days.
You do not understand how modern databases actually work.
Databases don't need to access the filesystem, fast ones work with the block devices directly. The slowest part of modern hardware is the network, not the storage. IPC performance isn't a real issue. Databases do their own I/O and execution scheduling, that is how they achieve extremely high write throughput. Unless your web app is designed this way (it isn't), you cannot achieve anything close to the throughput of a modern database engine.
Performance is a product of software architecture. Being stateless does not make your software fast in any absolute sense, many stateless in-memory systems have mediocre performance. A disk-backed design with superior architecture can out-perform an in-memory design, both in theory and practice.
DBMSs do not bypass the filesystem. If that were the case, table names would not be case-insensitive under Windows and case-sensitive under Linux (in MySQL). What they do is allocate a large space on the filesystem (the data is still visible as a file / set of files in the underlying operating system, by the way) and manage an internal data structure. This lowers fragmentation and the overall overhead. Cache systems work in a similar way - Varnish allocates the entire memory it needs with a single call to the operating system, then maintains an internal data structure.
>Unless your web app is designed this way (it isn't), you cannot achieve anything close to the throughput of a modern database engine.
You're saying that if I write to a section of memory on the heap or stack 4000 times with a different random byte each time, it will be slower than doing the same thing to a DB? Let's be real.
>Performance is a product of software architecture. Being stateless does not make your software fast in any absolute sense, many stateless in-memory systems have mediocre performance. A disk-backed design with superior architecture can out-perform an in-memory design, both in theory and practice.
Uh, the fundamental performance of software is absolutely limited by hardware. Stateless removes some fundamental limits. Sure, architecture can change things, but not absolutely. Given the best architecture for all cases, the best possible performance is in the end determined by which one is stateless. The absolute speed in the end is determined by hardware. SW architecture is changeable and thus NOT absolute.
I would know I work in HPC. We don't have any databases in any of our processing pipelines for obvious reasons.
Optimization has taken a back-seat now, I'd say. We focus more on developer productivity, and that extra mile optimization isn't deemed necessary if that effort and time can be put toward another project, squeezing out more value from the developer.
I wouldn't call syscall (and strace) knowledge "hardware knowledge". It's mostly about OS knowledge. You can go a long way with strace without knowing how a PCI bus works or other hardware details.
Which brings me to another point.
Virtualization ensured you could decouple (to an extent) the behavior of an operating system from the actual hardware used underneath. Therefore, some hardware knowledge is often ignored, but you still need to know about the OS.
With containers, you are decoupling the application from the host operating system! Which means some OS knowledge is often ignored.
That said, abstractions can be leaky. Most of the time, and in the most common scenarios, you can ignore the lower-level details. No one develops a simple website thinking about NUMA cores or C-state transitions. But if you want to really squeeze every ounce of performance, or if you run very complicated systems, then you still need to peek at the lower levels from time to time. Which means looking at the hypervisors for virtual machines (assuming you can... on public cloud you cannot), or looking at the operating system for containers.
> With containers, you are decoupling the application from the host operating system!
Yes and no. The host’s kernel is still running the show, which catches people by surprise. Reference Node (or more accurately, libuv) incorrectly reporting CPU count by relying on /proc/cpuinfo, etc.
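A quick way to see the mismatch, assuming cgroup v2 (the paths and the helper below are just a sketch): os.cpus() reports the host's cores, while the container's actual CPU quota lives in cgroupfs.

    import * as os from "node:os";
    import { readFileSync } from "node:fs";

    function effectiveCpuCount(): number {
      const hostCpus = os.cpus().length; // what Node/libuv sees: the host's cores
      try {
        // cgroup v2 cpu.max is "<quota> <period>", e.g. "200000 100000" => 2 CPUs, or "max 100000"
        const [quota, period] = readFileSync("/sys/fs/cgroup/cpu.max", "utf8").trim().split(" ");
        if (quota !== "max") return Math.max(1, Math.ceil(Number(quota) / Number(period)));
      } catch {
        // not under cgroup v2; fall back to the host count
      }
      return hostCpus;
    }

    console.log(os.cpus().length, "vs", effectiveCpuCount());

Size worker pools off the second number, not the first, or a 2-vCPU container will happily spawn 64 workers on a 64-core host.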
> But if you want to really squeeze every ounce of performance
Or if your C-suite orders a massive reduction in cloud cost. My company’s devs were recently told to cut vCPU allocation by 50%. Then, when they did so, they were told to do it again. My team was told to right-size DB instances. In fairness, some of them were grossly overprovisioned. ZIRP was quite a thing.
Unsurprisingly, this has resulted in a lot of problems that were being papered over with massive headroom.
So you downsized databases? I get that you can, to an extent, scale databases horizontally (especially if you don't need JOINs or you just use them for read operations), but autoscaling databases sounds a bit risky
Not autoscaling; you just change the instance size. If you have a separate reader and writer instance you can do basically zero downtime by shifting everything to the primary, resizing the reader, then failing over and repeating.
Interesting. Where I work we just jumped from huge databases on prem to managed databases on AWS, so we totally skipped that step. Interestingly, we are now starting to experiment with smaller databases managed by us, so we may find ourselves in a similar situation.
> With containers, you are decoupling the application from the host operating system! Which means some OS knowledge is often ignored.
containers are just a fancy "tar" package with a fancy "chroot". they shouldn't really have ever been viewed as being decoupled from the host operating system. you're still just installing software on the host.
Believe it or not, there are these things called books which contain all sorts of knowledge about things like low level C, system calls, assembly, even digital logic. I know because I have several of them sitting on my office shelf thanks to an EE oriented education.
I could have picked up the same set of books and obtained the same knowledge with a more CS focused background. The hardware stuff is harder to pick up compared to software when self learning (for most - I know counter examples). But it’s by no means impossible.
Point being, I don’t agree with the author. If you’re a cloud native engineer and you want to learn the lower levels of abstraction, you can. But the point of cloud native is that you don’t need hardware knowledge and you can focus on your product/users/hobbyist itch instead.
One can fund the other though. Besides, if you’re tinkering with hardware, it really is super cheap already. ATmega’s are less than a meal. Various capacitors and resistors on Amazon in variety packs for a couple 20s. A soldering iron station for $50. Power supply for $40. It’s the same cost of entry as an entry level laptop…
How do you program your ATmega? With a laptop. Also, you won't go anywhere with a few resistors and capacitors; you also need transistors, diodes, maybe LEDs, maybe some soldering, wires, a breadboard, some sensors to have fun with, and some ICs such as a 555 timer and some op-amps. Eventually some voltage converters, a multimeter and logic gates.
The list can be long, and at the end, for software you need a laptop; for electronics you need a laptop + many things. And those things cost money (the shipping costs more, actually).
I'm seeing electronics kits on amazon for $55... Yes, you need a laptop (or another atmega, or a rpi) to program your atmega. I'm saying it's not as capital restrictive as it seems. For <$500 USD you can be in for both the laptop and the necessary components, sensors, soldering iron, flux, resistors, caps, diodes, transistors, buttons, breadboard, brass wool, A book on electronics, and multimeter.
The list is long, but the prices aren't. All of this is less than an NVidia card.
This has nothing to do with 8-bit. Deep OS-level knowledge can and should be taught starting from 64-bit (where most OSs run), plus maybe a tiny speck of 16-bit because most BIOS code still runs in 16-bit Intel mode.
It's about exposure to syscalls, C, assembly, lower level debugging and just making sure people aren't afraid to touch that layer. Same goes with other foundational knowledge like packets and networking protocols
Are you suggesting that 'Deep OS knowledge', syscalls, C, assembly, packets and networking protocols should be taught in schools?
What I think the author is getting at with '8-bit' is that presenting children with a complete machine that is simple enough to understand from top to bottom gives a foundation that they can build on when they encounter more complex systems later on.
The 32-bit ARMv7-M micro-controllers with Cortex-M4 or Cortex-M7 cores that are available for instance on $10 development boards from ST or from many other companies are conceptually simpler and easier to understand when programmed at the bare-metal level than the 8-bit computers of 40 to 50 years ago.
They are a much better target for teaching children about hardware.
Hard disagree. 8 bit micros are far simpler to understand from every perspective and depth than the 32 bit micros.
One of the main benefits of teaching on 8-bit micros is that they function like basic HW Harvard state machines, meaning their code execution is easily predictable and explainable, from the diagrams on paper/whiteboard to the disassembly in the debugger in practice. Every CPU instruction takes exactly one clock cycle, there's no memory caching, and there's no PLL with various multipliers and dividers feeding 50 on-chip clock domains and IRQ tables; they have one single clock from an external oscillator, and the CPU, RAM and FLASH memory, and all peripherals run on only that clock, meaning it's easier to draw clock/signal diagrams to show "on this rising edge the CPU fetches the instruction, on the next clock edge the ADC triggers the sampling", etc.
Secondly, 8-bit micros come in DIP packages that are easier to breadboard and are 5V tolerant, meaning you can just hook them up to USB, and most of the time they have high-current pins with Schottky diodes, meaning you can have them rectify AC voltage or direct-drive LEDs without transistors.
Thirdly, IMHO, 8-bit ASM disassembly, like the ATmega AVR's, is much easier for beginners to follow and understand in the debugger than 32-bit ARM, due to the lower instruction count and simpler, less powerful instructions that are always 8 bit, and fewer features that can confuse you, like configurable IRQ tables, powerful instructions that can do two operations at once per clock cycle for extra performance, or optimized instructions that are 16 bit instead of 32 bit to save code space like ARM Thumb.
8-bit micros like the old Arduinos are far more capable for learning and hobbyist tinkering than people give them credit for. People just needlessly fixate against them because "32 bit is bigger than 8 bit".
> If you wanna play with 32 bit machines to understand how they work you can just build and debug 32 bit code on your x86_64 machine, no need to go embedded.
That’s like comparing a sand castle to a hospital complex. You might want some intermediate steps in between.
I have begun learning about computer hardware many years ago on computers with 8-bit Intel 8080, Zilog Z80 and Motorola MC6809 CPUs and I was among those who could read hexadecimal Z80 machine instruction codes as easily as a source text in a high-level programming language, so I know from first-hand experience how it is to learn on 8-bit CPUs.
Nevertheless, given a choice, I would never use those again for this purpose.
None of the traditional 8-bit CPUs (i.e. introduced before 1980) had many instructions that took exactly one clock cycle. Most of their instructions required multiple clock cycles with a number different for each instruction. Nevertheless, on many 8-bit CPUs you could guess approximately the number of clock cycles for many instructions based on the number of memory cycles needed for executing the instruction (including the cycles required for fetching the instruction code).
On the contrary, modern 32-bit microcontrollers have a majority of instructions that are executed in one clock cycle. Regarding the cache memory, the slower models with a clock frequency below 100 MHz, e.g. from 24 to 72 MHz may have no cache, and even if they have a cache, not enabling it would not make much difference.
From all that you have said I agree only with the fact that a 32-bit MCU will have usually a complex clock tree with various PLLs that must be programmed for maximum performance.
Nevertheless, besides the fact that programming the PLLs is usually needed only in a single place, at startup, it is also optional for many applications. All 32-bit MCUs that I have seen start after reboot in a safe mode using an internal RC oscillator. The speed is low, but it works. You need to program the PLLs only if you want to reach the maximum speed by using an external quartz oscillator or resonator. The 32-bit MCUs may have a large number of complex internal peripherals but at least in the beginning it is possible to ignore their existence. Only the few peripherals that are used must be enabled and configured and the MCU vendors provide example programs that can be used before learning how to modify them.
8-bit disassembly does not have a lower instruction count but it always has a much higher instruction count for doing the same task, because each 8-bit instruction does less work.
The easiest to program in assembly language are the 64-bit CPUs, because the range of 64-bit numbers is big enough to use them in most applications without worries. Already with 32-bit numbers you must be careful to avoid overflows, while with 16-bit or 8-bit numbers you must be prepared to handle overflow at each operation. So on CPUs with narrower hardware instructions and registers, you must almost always write or use a library implementing operations with wider numbers, which makes the programs more complex than for 32-bit CPUs, because only seldom can you do an addition or multiplication etc. with just one hardware instruction.
For the 32-bit ARM CPUs all the software tools that one may need are free. For most 8-bit CPUs the tools are either proprietary or of lower quality than for the 32-bit CPUs.
There are 32-bit ARM CPUs that are soldered together with support circuits on a very small PCB with DIP pins (e.g. various models of STM32 Nucleo-32 boards), which can be inserted in DIP sockets or in solderless breadboards, so DIP is not an advantage exclusive to obsolete 8-bit CPUs.
There are many such 32-bit microcontrollers that have high-current pins that can drive directly LEDs and most cheap development boards may include buffers to provide 5-V tolerant I/O pins on the headers mounted on the board.
While for the cheapest 32-bit development boards the user interface must be done on some PC to which the development board is connected via USB, or the board may be used through a console on a serial interface or through a remote shell over Ethernet (for the boards including an Ethernet RJ-45 connector), there are more expensive boards, e.g. around $50, to which it is possible to connect a display panel directly.
It is true that I have never used an Arduino, because every time when I looked at any model it appeared overpriced and without any advantage whatsoever over a Cortex-M board, so I could not understand why would someone waste time with it.
On the other hand, the costs of learning to program a development board with Cortex-M are close to zero and the experience is much more valuable.
> For most 8-bit CPUs the tools are either proprietary or of lower quality than for the 32-bit CPUs.
Arduino IDE is definitely not low quality.
>It is true that I have never used an Arduino, because every time when I looked at any model it appeared overpriced and without any advantage whatsoever over a Cortex-M board, so I could not understand why would someone waste time with it.
- Because you don't need 32 bits to turn on a LED and read an ADC.
- Because in 2008-2014 there were not many cheap Cortex M boards, and hobbyist ARM tooling was low quality or expensive (Keil, IAR, etc) and didn't come out of the box with pre-made sample projects that could get you up and running on stuff like UART, ADC, SPI, etc.
- Because Arduino code was easily portable, meaning sharing projects that drove particular displays or other external peripherals or widgets was just copy-paste plug and play, instead of having to tailor the C code you got from someone else to your particular ARM microcontroller, configure the peripherals and clocks, and keep fucking with it to get it to work like it worked for the other person, etc.
Project portability and cross-compatibility was the big selling point of the Arduino ecosystem. You didn't need to know any programming or C to get some humidity or ultrasonic sensor from Adafruit working on an Arduino, because most likely someone had already published the code or library for it.
For real-world problem solving, a 32- or 64-bit processor is easier. But for learning it doesn't make any difference. In fact the limited size of 8-bit is an educational advantage, as the programmer is forced to deal with overflow rather than just ignoring the possibility. You can work on interesting problems like "write the code to multiply two 16-bit values". I find this sort of problem an excellent way to start learning assembly language.
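For flavour, here's that exercise sketched in a high-level language rather than assembly (mul16 is made up; the point is just the limb decomposition an 8-bit programmer has to spell out by hand):

    // Multiply two 16-bit values using only 8-bit partial products, the way you would
    // on a CPU with (at best) an 8x8 hardware multiply. The result fits in 32 bits.
    function mul16(a: number, b: number): number {
      const aLo = a & 0xff, aHi = (a >>> 8) & 0xff;
      const bLo = b & 0xff, bHi = (b >>> 8) & 0xff;

      const p0 = aLo * bLo; // lands at bit 0
      const p1 = aLo * bHi; // lands at bit 8
      const p2 = aHi * bLo; // lands at bit 8
      const p3 = aHi * bHi; // lands at bit 16

      // Sum the partial products at their byte offsets (the carries are handled by the additions here;
      // on an 8-bit CPU you propagate them yourself).
      return p0 + (p1 + p2) * 0x100 + p3 * 0x10000;
    }

    console.log(mul16(0xbeef, 0x1234) === 0xbeef * 0x1234); // true

On a 6502 or Z80 each of those partial products is itself a shift-and-add loop, which is exactly why it's such a good teaching problem.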
While 8-bit processors take multiple cycles per instruction, the number of cycles is usually fixed. On superscalar pipeline processors it is very difficult for the programmer to predict the performance without profiling.
Looking back at what I studied in my EE degree in the early 90s, my first introduction to assembly language and hardware was with the 6809. Then I moved on to the 68000. I am probably biased but I think starting with 8-bit then moving to a simple 32-bit architecture is a good introduction to computer hardware.
As someone pretty good at embedded, I always found 32-bit micros absolutely more difficult and opaque compared to the good old AVR and 6502. ARMs are a lot less convenient to program in assembler, and not only due to the unfriendly ISA, but also due to complex hardware interaction. Speaking of instruction count - no, for simple IO tasks AVR code would certainly be denser than ARM's.
This isn’t what the comment I replied to was suggesting.
Also, I can see there is a case for using a simple modern design such as Cortex-M as a base but - as the peer comment also says - it’s really not the case that a Cortex M7 is conceptually simpler than a 8-bit design.
Programming in assembly language a Cortex-M7 for doing any non-trivial task is much easier than writing an equivalent program for a Zilog Z80 or a MOS Technology 6502 or any other classic 8-bit CPU.
For most operations that can be done in one machine instruction on a 32-bit CPU like Cortex-M7 (which also includes support for floating-point numbers), you need to write a complex sequence of instructions for 8-bit CPUs like Z80, 6502 and the like, where you do not have even a multiplication instruction, much less more complex operations.
To be able to write performant programs on ancient 8-bit CPUs requires much more knowledge and experience than on modern 32-bit CPUs, because you need to implement in software many algorithms for things that are done in hardware on the modern CPUs.
Yeah, there's another thing that is bugging me a bit about all this...
8-bit basically means ASCII when you deal with text (doesn't it?) - so how do you teach middle-schoolers who don't use a Latin-based alphabet in their native tongue??
The problem is that there are so many layers between someone's code and the bare metal that optimization often gets completely lost.
The mentality is that computing is cheap and developer time is expensive. Writing an inefficient function that wastes several thousand compute cycles but saves an hour of developer time is given priority in most cases.
This mentality works great when the function is only run a few million times (or much less) over its lifetime. But when it gets added to some popular library; distributed across millions of machines; and used thousands of times each hour on each machine; those wasted cycles can really add up.
What percentage of worldwide compute (with its associated wasted electricity and heat) can be attributed to inefficient code? Spending a few hours optimizing some popular code might actually do more to help the environment than driving an EV for an entire year.
I agree. And it's common to all engineering disciplines: DO a prototype that achieves the thing it is designed for, and IF you're going to deploy it en masse, optimize it. ELSE you don't need to.
Honestly, maybe a majority of modern engineers not being familiar with anything but the pointy tip of the stack is a good thing.
I come very much from the old world - I learned to code on graph paper, as that was how you saved your work, and being able to wield a scope and iron was kinda mandatory for being able to meaningfully use a computer.
As tech grew up, so did I - and while it’s useful to be able to understand what is happening from clicking a button in a gui down to electrons tunnelling through a bandgap, particularly when it comes to debugging the truly arcane, I actually find that the level of abstraction I carry around in my head sometimes gets in the way.
I look at newer techies bouncing around with their lofty pyramids of containerised cloud based abstracted infrastructures, and I almost envy the superficiality of their engagement - I can’t help but look at it and see immense complexity, because I see behind the simple and intuitive userland, and that makes me run for vim and some nice bare metal where I know what my hardware is doing.
You're probably getting old, but the truth is that modern stacks are stupidly complex.
Layer after layer of abstraction, along with immense bloat. Twenty years ago I ran immensely complex web properties, ones that dwarfed what the average dev deploys today, on the hardware of the time, yet without all that cruft.
Well, let's take PHP and Node, and use Laravel as an example. Easily 10,000x the codebase, for the same result.
It's not efficient, or lean, or performant at all.
But it does do one thing.
Allow people without extensive security, database, and coding experience to push safe code quickly.
You can throw a new grad at Laravel, and they're off to the races.
I liken it to C replacing assembler. A way to abstract.
> I can’t help but look at it and see immense complexity, because I see behind the simple and intuitive userland, and that makes me run for vim and some nice bare metal where I know what my hardware is doing.
But is it humanly possible to know details in every level of abstraction? There should be a balance between abstractions and knowing some details beneath some layers of abstraction. What do you think?
Maybe not to an expert level (for some, undoubtedly it is), but it’s eminently possible to have a working knowledge of the stack, from hardware to the nth abstraction.
I’ve recently become interested in how data makes it from {INSERT,SELECT} to the disk, and so set about understanding that. You can read Postgres or MySQL’s source code and use a debugger to understand how it goes through that layer, then strace to follow the base syscalls, then blkparse to watch it go to the device driver. From there, you can experiment with forcing OS page cache drops to see when a “disk read” isn’t really a disk read, and also observe read-ahead behavior. I’ve skipped the device driver layer for now, but beyond that you can also use dd to observe changes to the raw block device. The latter is easier on a simplistic level with a few bytes at a time.
You have to be genuinely interested in this stuff though, because there are precious few companies who are going to pay you to play around like this all day.
Oh, it isn’t always play - sometimes you discover that you are IO bound when you shouldn’t be, and end up tracing your woes through a similar route to that which you describe to a bug in the hypervisor scheduler where network and disk IO are blocking each other due to a shared interrupt specific to the combination of disk controller and silicon that’s running the kernel. You blow a day of engineering time, but you improve performance of your product by an order of magnitude on the same hardware.
No, as it was a configuration issue down to the VPS provider, who are now huge but at the time were just starting out, and a known issue - we coached them through fixing it.
I do have bug reports and patches floating around for all sorts of other arcane stuff, from drivers to the Linux kernel to memcached to nginx - most of which are from 10-15 years ago now, as I ended up bogged down running the business.
These days The Understanding is instead deployed for living off grid and making infrastructure work.
sqlite has a fantastic codebase for understanding how databases work.
there's a pretty comfortable parser generator in there as well (lemon), that I found nicer than bison when I was still messing around with them instead of just defaulting to recursive descent as I do these days.
Why not? It's really not that many principles; to me, it is entities that add complexity by deviating from standards or introducing esoteric nomenclature on top of existing concepts.
Nah, but I hear you- sometimes its just knowledge bias. Much of the techie scene is dogmatic and bounces as you say, from one trend to the next. To me, this is why we are in such a low period of innovation.
Yeah, it might look nice and easy. But that's when it works. When it doesn't, they are unable to understand many kinds of problems and have to resort to trial and error.
> I can’t help but look at it and see immense complexity, because I see behind the simple and intuitive userland
An idiot admires complexity, a genius admires simplicity, a physicist tries to make it simple, for an idiot anything the more complicated it is the more he will admire it. If you make something so clusterfucked he can't understand it he's gonna think you're a god cause you made it so complicated nobody can understand it.
Douglas Adams said this thing about how you should never consider how utterly improbable it is that miraculously complex things keep working, because suddenly they won't. I mean, if you woke up each day and pondered the intricacies of cellular mitochondria and your own endocrine system... the anxiety would be crippling.
After 50-odd years of computing, which began with soldering my own from TTL chips and transistors in the 1980s, I find the only way to navigate current levels of abstraction is to join the kids in their kind of optimistic, faith-based wishful thinking, and just let go and float above it all.
But every now and again something goes wrong and absolutely nobody but people like you and I can even imagine what it might be.
More terrifying is meeting people with very inflated job titles, in positions of enormous wealth, power and responsibility, whose knowledge of even the most elementary physics and logic is absent. They're actually kind of proud of that, and positively celebrate ignorance of all the "things they don't need to know". Part of me finds that insulting and disrespectful.
I find myself thinking about the respiratory complexes more than I probably should. It’s unbelievable that we function, at the most basic level. Molecular motors? Complexes that can only work in conjunction? Good grief.
When I first started university I had already been tinkering with computers and programming for about 10 years. I started in Computer Science, where I took a Java class from Urs Hölzle, who later went on to be Google's first VP of engineering. It was an amazing class, and I almost scored a 100% on every test, except for the one where I accidentally wrote "<=" as a less-than sign with a line under it and got dinged a point. However at the end of my first year I felt profoundly unsatisfied, like I was just learning superficial tricks to get computers to do what you wanted, but I didn't feel like I really understood what was actually happening.
I switched schools and majors to a Computer Engineering course taught primarily by EE professors, hoping to learn about the lower-level stuff so I didn't feel so ignorant. I learned about logic gates, K-maps, BJT transistor characteristics, N/P-well doping, VLSI, and so forth. All was going swimmingly for me until it came time to take a physics course on crystal structures. At that point I realized that the rabbit hole goes very, very deep -- much deeper than I could ever hope to really fully "understand" from the quantum level all the way up to Java code.
Recognizing that I only had so much time and mental capacity to learn about the whole system, I had to come to peace with knowing that I would have to choose a layer in which to increase my expertise and then largely stick to that. For me that started out being architecture-level operating system coding in C and some assembly, but I popped up the stack a smidge to then get to C and applied cryptography.
I'm now one of the few people at my company that (IMO) understands operating systems at all. The organization is deathly afraid to do anything with the kernel that's being deployed in their Cloud instances, and most view it as some mystical "black magic" that just works if you don't touch it. Hardly anybody fully understands what a so-called "container" actually is, but they sure know all the API calls to "make the container work" and push container images around repositories. They're often surprised when things that I know will be issues with their containers ahead of time happen to them. Whenever they run into problems, they'll often open two dozen Stack Overflow tabs and try copying and pasting stuff until something seems to work. People approach me as some sort of mystical oracle who can bestow arcane knowledge of syscalls and what not when their Stack Overflow searches turn up dry.
I feel like the pendulum perhaps has swung too far into the layers of abstraction in our universities, but I'm not sure what to do about that. I wonder what will happen as people like me ride off into the sunset.
Your self reflection to go deeper and then resurface to the level you are comfortable expanding your expertise in parallels some of mine. Hailing from a university in my country which didn’t go deep enough in any of the subjects, I was left with this insatiable desire to do something about it. It led me to my Master’s in US where I got (or at least I thought I did) once in a lifetime opportunity to take one of the hardest and most fulfilling classes: Operating Systems. It opened my eyes to the magic I couldn’t understand and gave me the confidence of “I can do anything if I put my mind to it” by building a kernel. I learned enough to realise that it is not something I can do as a day job but still venture into should I need to while working on something. To this day, I thank my past self for making this decision of taking on a task that I felt so insurmountable. Whenever I come across a hard problem now which seems insurmountable, the confidence from that experience is what keeps me going.
Simon Wardley would like a word… in his model this is the natural order of things. As technology matures and standardizes, a new generation of tools is built on top of new abstractions, and the details of that tech no longer need to be understood in order to use it.
Subjects and skills that were requisite basics a generation* ago, become advanced, under the hood topics for specialists. The next generation of people need different skills in the day to day.
This post is a great account of what that feels like from the inside, from the perspective of the newer generation learning these (now) ‘advanced’ topics.
(Funnily enough, I don’t (yet) see anyone commenting "real men write assembler" - a skill that has long ago moved from required by all developers to super-specialized and not particularly useful to most people.)
*I am using the word generation in the broadest sense as it relates to cycles of technology
Whether or not this state of affairs is "natural", I do not think it is "good".
Civil engineers still need to understand calculus and how to analyze structural integrity even though they can rely on modern computer modeling to do the heavy lifting.
All engineers are expected to have some requisite level of knowledge and skill. Only in software do we accept engineers having the absolute bare minimum knowledge and skill to complete their specific job.
Not that we shouldn't use modern tools, but having a generation of developers unable to do anything outside their chosen layer of abstraction is a sad state of affairs.
> Only in software do we accept engineers having the absolute bare minimum knowledge and skill to complete their specific job.
You can require that your frontend engineer absolutely must have good assembly knowledge but you'll pay more for them and fall behind your competitors. You can require that your DBA knows how to centre text with CSS, but you'll pay more for them and fall behind your competitors. You can require that the people managing the data centre understand the internals of the transformer architecture or that the data scientists fine tuning it understand the power requirements and layout of the nodes and how that applies to the specific data centre, you'll just pay more for someone who understands both.
Everyone requires the bare minimum knowledge to accomplish their job; that's pretty much the definition of "require" and "minimum", limited by your definition of someone's job.
"software" is such a ludicrously broad topic that you may as well bemoan that the person who specifies the complex mix of your concrete doesn't understand how the HVAC system works because it's all "physical stuff".
> but having a generation of developers unable to do anything outside their chosen layer of abstraction is a sad state of affairs.
Whether it's sad depends if they're better in their narrower field, surely. It's great if we can have a system where the genius at mixing concrete to the required specs doesn't need to know the airflow requirements of the highrise because someone else does, compared to requiring a large group of people who all know everything.
Yeah, the flip side of there being 'less skilled' developers who operate at a higher level of abstraction is that it is easier to train more of them.
In absolute numbers, there are probably more people today who understand the fundamentals of computer hardware than there were 40 years ago, but it's a much smaller percentage of all computing professionals.
> but having a generation of developers unable to do anything outside their chosen layer of abstraction is a sad state of affairs.
This is the normal state of affairs, and is really the only reason we can build meaningful software systems. Software is much too complicated, to understand even one layer of abstraction can be a multi decade journey. The important thing though, is that when the abstractions are leaky (which they always are), the leakiness follows a good learning curve. This is not true for cloud though.
> All engineers are expected to have some requisite level of knowledge and skill. Only in software do we accept engineers having the absolute bare minimum knowledge and skill to complete their specific job.
Most software engineers just produce websites and nothing that impacts the safety of other humans. Other types of engineers have to ensure people do not die.
> All engineers are expected to have some requisite level of knowledge and skill. Only in software do we accept engineers having the absolute bare minimum knowledge and skill to complete their specific job.
If that was true, then there would be opportunities for entry into professional software engineering careers. Because the only opportunities there are for software engineering jobs are opportunities for "senior" software engineers. Which entails much more than the absolute bare minimum knowledge and skill.
So there's some inconsistency going on within the mindset of people who measure competence and fitness in engineering, in the broadest sense of the concept of engineering.
Maybe engineering itself, then, isn't even remotely the noble profession it is widely believed to be? Maybe engineers and even scientists aren't that really intelligent? Or intelligent at all? Maybe science and mathematics should be abandoned in favor of more promising pursuits?
Engineering as applied to software is completely watered down in practice compared to Professional Engineering as implemented by many states.
If a software engineer "signs off" on software design, they have no personal or professional liability in the eyes of the law, or anywhere near the same expectations and professional/ethical oversight that comes with the territory of being a PE.
Until a "Software Engineer" can basically look a company in the face and deny a permit to implement or operate a particular stack/implementation, this will not change.
And yes, I am fully aware that this software engineer would basically become an "approver of valid automated business process implementations". This would also essentially be a social engineering exploitable position for implementing nepotistic dominion over a business jurisdiction. Hence why I'm not sure it is even a desirable path to go down.
> Until a "Software Engineer" can basically look a company in the face and deny a permit to implement or operate a particular stack/implementation, this will not change.
The possibility of a business not earning revenue or income as a result of its software development attempt is a form of software authorization that prefers "good" coding over "bad" coding. Whatever the global industrialist landscape decides is good and bad.
And, interestingly, earning income with software development is a much harder hazing ritual than the paths of traditional academia.
There are plenty of entry level software roles out there. They are often listed as senior and may not align with your particular definition of entry level, but there are definitely people that are getting those jobs who have limited prior professional experience.
> Not that we shouldn't use modern tools, but having a generation of developers unable to do anything outside their chosen layer of abstraction is a sad state of affairs.
Funnily enough my day job is writing software for structural engineers (and I am a licensed engineer). Your comments are absolutely on point. One of the most important discussions I have with senior engineers is "how will we train tomorrow’s engineers, now that the computer does so much work?"
40 years ago, the junior engineers were the calculators, using methods like moment distribution, portal frame, etc… today the computer does the calculation using the finite element method. Engineers coming straight out of school are plunged right into higher level work right away - the type of work that junior engineers a couple of generations ago might not have seen for 5-10 years.
My first career development discussion with a senior engineer was "Just work for 10-15 years, then you'll know what you need to be a good engineer."
I have discussed this under the theme of Generation Gap (https://www.youtube.com/watch?v=5gqz2AeqkaQ&t=147s, 2:27 - 8:58), and have a similar conclusion to you: what at first appears as a different generational approaches are actually different facets of a well-rounded, senior technical skill set. Maybe the kids are just learning things in a different order than we did?
Lots of HN commenters are younger generation folks, and lots of them have poor fundamentals. They will certainly deny the need for wider scope of knowledge, as they do not have it themselves.
While I mostly agree, I think one thing to keep in mind is that we still need people somewhere who know how to do that. e.g. FAANG might have data center people and sysadmins that know the hardware... we (they? not sure) just need to ensure that, in the future, we still have _some_ people who possess that knowledge.
I do not think it is requisite that _all_ developers have that knowledge.
Yes, absolutely - skills move from mainstream to niche, but are still required! For example, a much smaller proportion of the population knows how to farm today than 100 years ago, but it's still important :)
I disagree. I think it's about pivot time, not having a warmed up stable of skilled workers just in case. Nature never optimizes for that and it shouldn't. We should lazy-load that skillset if and when it's necessary. We have writing to carry knowledge forward. Also, video and other media. People are smart and I'm sure a large cohort could be assembled with the right amount of money in fairly short order. As long as that's cheaper than keeping a battalion ready just in case, then I'd argue it's the "correct" way to approach it.
"What can one do in the face of a relatively shrinking population?" is the more interesting question to me.
As someone who's managed a team before, there is a minimum population of people practically required to sustain a particular corpus of actionable information without severe degradation in how that information gets applied.
Once you fall below that point, things tend toward "rediscovery from scratch required" until the population of people capable of acting on the information is restored.
Whether that actually happens is a prioritization decision balancing against everything else that still has to be done.
When the industry cries out for it, universities will magically start offering (and heavily incentivizing) Computer Engineering degrees instead of Computer Science ones.
I feel like most of the stuff the author is complaining about is taught in computer engineering and not computer science. Computer engineering curricula usually focus on lower-level subjects, computer science on higher-level abstractions.
If the author wants people with the lower level skills then they should filter resumes based on the degree. But then they will probably be missing higher level skills.
In the end there is not enough time to learn everything in 4 years. If you add a new class then an old class has to be dropped or modified.
I'm in integrated circuit / semiconductor design and we aren't going to even look at someone with only a computer science degree. We look at people with electrical and computer engineering degrees. Even a bachelors degree is not really enough anymore. 90% of the new grads we hire have a masters degree and there is still a ton for them to learn on the job.
The thing is, what the original article actually describes as "hardware knowledge" in its examples doesn't even touch on electrical engineering. At most it's stuff you'd expect to learn in an operating systems class, which many CS graduates will take. OP isn't anywhere close to asking "wtf is a flip-flop?"; it's more like "wtf is select()?".
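To make that concrete, "knowing select()" is roughly this level of knowledge: waiting on several sockets at once and only touching the ones that are ready. A minimal sketch using Python's stdlib (the port and the echo behavior are my own illustration, not anything from the article):

    # Minimal select()-based echo server: one thread, many clients.
    import select
    import socket

    server = socket.socket()
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", 9000))   # hypothetical port, for illustration
    server.listen()

    sockets = [server]
    while True:
        # Block until at least one socket is readable.
        readable, _, _ = select.select(sockets, [], [])
        for sock in readable:
            if sock is server:
                conn, _ = sock.accept()      # new client connection
                sockets.append(conn)
            else:
                data = sock.recv(4096)
                if not data:                 # client hung up
                    sockets.remove(sock)
                    sock.close()
                else:
                    sock.sendall(data)       # echo it back

That's operating-systems-class material, not electrical engineering.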
All of which makes me wonder how the author is interviewing people. I've been working over 25 years and interviewed at least 100 people at all levels of experience. The hardest to interview are college students because they have so little real world experience. But that doesn't really matter.
If the author thinks people should have 8-bit computer experience then ask the question during the interview. If the author wants to ask about operating system topics then ask those during the interview.
I've worked with some interns and new grads that didn't have the skills I thought they should. So I added those things to my list of interview questions.
Most (good) computer people don’t learn about computers in university, they learn at home through tinkering. When everything comes in a container and recommends using AWS, there’s no need to bang your head on the wall trying to figure something out, which is what actually leads to learning and tinkering with hardware at home.
A few bright spots, however, would be gaming (the constant need to pay attention to new GPUs), and smart home/arduino type stuff.
In other words, the cloud has no soul. Just as devs were partitioned into frontend, backend, and full stack, admins are going through a similar transformation.
I really don't know how bad all of this is, but it's surprising that graduate engineers don't know how to configure a basic network without DHCP.
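For what it's worth, most of what "no DHCP" actually requires is subnet arithmetic: knowing which addresses are usable, what the netmask is, and where the gateway sits. A small illustration with Python's ipaddress module (the 192.168.1.0/24 subnet is just an example I picked, not from the comment):

    # Subnet arithmetic you need when assigning addresses by hand.
    import ipaddress

    net = ipaddress.ip_network("192.168.1.0/24")
    hosts = list(net.hosts())                      # usable host addresses

    print("network:  ", net.network_address)       # 192.168.1.0
    print("netmask:  ", net.netmask)               # 255.255.255.0
    print("broadcast:", net.broadcast_address)     # 192.168.1.255
    print("usable:   ", hosts[0], "-", hosts[-1])  # 192.168.1.1 - 192.168.1.254

The actual interface and route configuration is distro-specific, but without this part none of it makes sense.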
On one hand, you can have a router at home from your ISP and everything seems to work fine. On the other hand, these routers ship with plenty of misconfigurations and security issues, which is why ISPs used to update their firmware automatically.
If people can set up and operate a basic Debian server and a bit of networking, I suspect that's more than enough for a cloud where anything lower-level is concealed anyway.
I would love to see, at minimum, a basic "here's how datacenters are designed" course, even if the only use is to see how "the cloud" just maps those concepts to APIs. You'll immediately understand why AWS networking and EC2 have all the different concepts they have.
I had to explain the point of Finesse OS (used in the original NTI PIX firewalls[1]) to someone technical the other day, within the context of a single-purpose (in fact single-service) OS concept. "But it was Linux, right?" was a cognitive hurdle it took several attempts to clear.
You will never know the joy of diagnosing a bad mainboard over the course of several hours or days. Just stop and start the instance, and the problem magically goes away.
You'll never realize that you could pay $1k for the server and run it yourself instead of renting forever.
You’ll never learn that arranging your cables nicely and labeling everything takes five times as long and is 100% worth it.
I used to be half decent at navigating and using Linux, writing ksh, etc. With cloud, I don't have any need to know that information anymore. I guess it's kind of good in general, but feels bad to have significant portions of my professional skills be essentially obsolete.
Pretty soon developers will be measured by how fast they can tell an LLM what to build for them.
Absolutely. I've also witnessed a significant decline in networking knowledge. For example, a simple conversation about routing, web proxies, and firewalls rapidly creates uncomfortable silences in most teams I interact with...
I had two markedly different experiences interviewing for DBRE roles recently.
One was at a well-known SaaS “here’s a spreadsheet we call a database” company. I knew way more about Python than the interviewers, which became a problem because I used its stdlib to blow through the tasks they had. I then had a second interview, where they effectively asked me to code DFS. I am terrible with LC. I explained DFS perfectly, but struggled to turn it into code. The interviewer even commented that it had nothing to do with the position (you don’t say).
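For reference, the DFS they wanted is only a handful of lines; a minimal sketch in Python (the graph shape and names are mine, not from the interview):

    # Recursive depth-first search over an adjacency-list graph.
    def dfs(graph, start, visited=None):
        # Returns the set of nodes reachable from `start`.
        if visited is None:
            visited = set()
        visited.add(start)
        for neighbor in graph.get(start, []):
            if neighbor not in visited:
                dfs(graph, neighbor, visited)
        return visited

    # A tiny directed graph as a dict of adjacency lists.
    graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
    print(dfs(graph, "a"))   # {'a', 'b', 'c', 'd'}

Which, as the interviewer admitted, had nothing to do with the position.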
The other was at a quant trading firm. I had multiple interviews, ranging from “can you use a programming language to accomplish tasks” (again, Python and its stdlib was quite helpful, and wasn’t met with pushback), to DB-specific questions, to Linux internals. All of it felt extremely on-task, and had they not required me to relocate for a remote position (?!) I would’ve taken the job in a heartbeat.
I'm not convinced that 8-bit computers are the ideal starting point for education. My reasoning is that Dijkstra had a point when he wrote:
"It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration."
This is of course an exaggeration, and many great programmers did in fact start with BASIC, but more importantly, Dijkstra wrote this in 1975, when BASIC really did have some glaring flaws. '10 PRINT "HELLO WORLD"\n 20 GOTO 10' has appealing immediacy, but it sets students down the path to spaghetti code. A modern structured BASIC, or perhaps MicroPython, is a much better starting point.
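To make the contrast concrete, the structured equivalent of that two-line BASIC program, written here in Python (close enough to the MicroPython suggested above):

    # The GOTO-free version of '10 PRINT "HELLO WORLD" / 20 GOTO 10':
    # the loop is an explicit structure rather than a jump target.
    while True:
        print("HELLO WORLD")

Same immediacy, but the control flow scales to programs longer than two lines.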
And that's problematic on an 8-bit machine. IMO, an educational computer should have a flat address space without paging. Bank-switched paging isn't relevant to modern computers; it complicates things without teaching anything valuable. The natural address bus for an 8-bit machine is 16 bits wide, and that limits you to only a 64KiB address space without paging. For simplicity, the frame buffer should live in this address space too, so you don't have much left for a modern programming language (ideally stored in ROM in the same address space), especially if you want things like user-friendly error reporting and built-in documentation, which you probably do (think of QBASIC).
Therefore, I think a 16-bit computer is better. I'd prefer something like a 16-bit version of Ben Eater's breadboard CPU[0], using SMD components on a PCB to allow for speeds fast enough for interactive graphics. All chips should also be available in DIP for easy breadboard experimentation. It should support fully static operation, with a pause switch and a single-step clock button, indicator LEDs for all the registers, and test points everywhere you might want to probe with an oscilloscope.

It should have a simple GPU that just converts a 320x240 1-bit monochrome frame buffer to VGA, with memory access interleaved with the CPU; the CPU can poll for vblank. It should support PS/2 or similar serial keyboard input, with a FIFO buffer so you don't lose keystrokes.

There should be a very primitive distinction between user space and kernel space, where some RAM is read-only unless the program counter is in ROM. Although I generally like the PEEK/POKE-anywhere freedom of the old 8-bit machines, I want this feature so the machine can support a memory monitor and disassembler in ROM, entered by triggering the single non-maskable interrupt. The monitor can copy all the registers and the framebuffer to kernel RAM on entry and restore them on exit, so it's always possible to inspect things, debug any problem you run into, and continue afterwards.

Data storage could be via a simple audio interface, like the cassette tape data storage on old home microcomputers. Advances in both software and audio recording hardware should make this substantially faster and more reliable than it was back then.
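As a rough sanity check on the memory budget, here's the framebuffer arithmetic, assuming for illustration a 64KiB flat address space like the 8-bit case above (the actual 16-bit design could of course use a wider address bus):

    # Back-of-the-envelope memory budget for a flat 64 KiB map.
    ADDRESS_SPACE = 64 * 1024                 # 65,536 bytes
    FB_WIDTH, FB_HEIGHT = 320, 240            # 1 bit per pixel, monochrome
    framebuffer = FB_WIDTH * FB_HEIGHT // 8   # 9,600 bytes

    remaining = ADDRESS_SPACE - framebuffer
    print(f"framebuffer: {framebuffer} bytes "
          f"({framebuffer / ADDRESS_SPACE:.0%} of the map)")
    print(f"left for ROM, RAM, and I/O: {remaining} bytes")

About 9,600 bytes go to the display alone, before you've spent a byte on the language ROM, documentation, or user programs.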
This computer would have maximum observability and ease of understanding, while still being advanced enough to be user-friendly and interesting. It should even be powerful enough for games, which will provide some extra motivation.
It doesn't match your concept precisely, but it concretely implements a good combination of observability and capability: graphics, sound, keyboard support. The BASICs can be safely ignored in favor of Forth, which goes directly to the point of tangibly understanding the machine: Forth programs are already structured code, interactive, with extensible syntax, but they also let you crash the computer in moments with a bad memory write. The pedagogical value is immense within an 8-bit environment.