Hacker News new | past | comments | ask | show | jobs | submit login
Running a Unix-like OS on a home-built CPU with a home-built C compiler (edby.coffee)
604 points by abc_tkys on Oct 4, 2020 | hide | past | favorite | 69 comments

> Reinventing the wheel is generally said to be something to be avoided, but there’s quite a bit to learn from actually doing it. It made me realize that I didn’t understand it as well as I could implement it from scratch.

I wrote a bitmap image library in C a few months ago and when I realized that I was in too deep I just kept going because I can never abandon a project I start.

I felt bad when I came out the other end because of all the time I spent on it that I could have spent doing other things and because I could have just taken a library off the shelf.

In doing the exercise I did realize that what appears to be a simple and straight forward image format, at first glance, is actually ambiguously defined and full of edge cases. There was no way for me to know that when I started.

> There was no way for me to know that when I started.

Any format/protocol/RFC that's been around for a significant amount of time is going to be bastardized and bent to solve people's problems. Right now I'm going through that with RTSP. I did my best to try to avoid writing my own client/server implementation because I knew that I would be taking on the responsibility to run down the quirks, but after some major vendor problems here we are =(

Just throwing this out there for folks new in their career: if it's been "a thing" for awhile you can almost guarantee that doing a full implementation against it is going to be time-consuming because of the quirks/edge-cases. If it's something common like handling bitmaps you can get ahead of your problem by reading open-source image libraries to get an idea of how much "edge-case support" you're going to have to dig through.

I just don't like forcing myself through "clean-room"/from-RFC/from-spec implementations unless I professionally have to - they're almost always painful and impossible to time-manage.

What problems are you encountering with RTSP? I used GstRTSPServer for an open-source project (https://github.com/shortstheory/adaptive-streaming) and it's filled my requirements pretty well. RTSP clients on the other hand...are a much more difficult story.

HA. Oh boy.

Well. I sorta mis-spoke. I need to ship RTSP/RTP with hella-different stream profiles over WebSocket. Imagine slamming the entire RTP and RTSP datastream into binary packets shipped around via WebSocket only to have the packets reassembled in-order on the other end.

RTSP isn't quite the problem, especially if I had your use case. I work in PSaaS (physical security as a service) which has really unique streaming circumstance where security + latency are prime. Also, there's tons of additional features I need from RTSP as well, ex: "hey give me a new I-frame right now".

Unfortunately the RTSP spec is all over the place with my vendors... and throw the whole ONVIF WebSocket thing at it + trying to actually keep stuff cryptographically secure and welcome to my nightmare. I'm talking about craziness like not having an "ack" to the final "PLAY" command in the prelude but the server still shipping the data, packet parsers/fixers implemented in ECMA, browsers acting as the RTSP session client... unfortunately the list goes on.

There's a lot of overlap with your world. I even have your project bookmarked + labeled "sane light-weight RTSP gstreamer implementation" from when I was in my research phase! Small world =)

If you're interested you can check out the ONVIF streaming spec: https://www.onvif.org/specs/stream/ONVIF-Streaming-Spec.pdf

And the pipeline tech that most of my jazz is based on: https://github.com/AxisCommunications/media-stream-library-j...

Thanks for being an open-source resource with well-documented work!!!

Haha thanks, I appreciate the compliment! I'm actually in the middle of a rewrite to make adding new camera types a lot easier and the code a lot cleaner in general.

I wasn't aware that it was possible that you could request I-frames in the RTSP spec. Which clients provide such functionality? I am looking for some web based RTSP streaming solutions, so your project could be very useful :-) Good luck for it!

> I-frames in the RTSP spec

It's non-standard for the spec but is common for security industry. All of my Axis cameras support it, and some of my other vendors support it too... unfortunately it's vendor-specific but it's super useful when you're dropping someone into an pre-established stream and/or recovering from adverse stream conditions.

> I am looking for some web based RTSP streaming solutions

Start searching GH for "RTSP WebSocket" and you'll end up with a TON of stuff! It's how I had your work bookmarked in the first place.

I would also take a serious look at that previous Axis Communications lib I sent over as well for web-based stuff. One thing I'll warn you on is that you'll need to work within the HTML video element sink to look for "seek drift" and correct it when it comes up... VLC has some of these same problems too where I want to "FF to the end of the buffer" for low-latency video feeds VS. every "hiccup" bumping me further and further back on a buffer. This all gets weird for use cases like ours because the normal social contract for video is "don't lose frames", where ours is likely "give me the most up-to-date/valid video frame possible". (which is why requesting a new I-frame is SUPER useful!)

Also let me know if you know of any modern communities around this stuff... I'm seriously on Archive.org ripping through the wayback machine to learn (rtsp.org) or reading through ancient Live555 docs/GH projects... I could really use an accessible Discord channel around modern video engineering with experts like yourself in it!

Sorry for the late reply, but I found #gstreamer on FreeNode IRC super useful for getting support on how GStreamer works internally! The developers are usually active all the time so you should be able to find someone around to help out.

I think a better way to phrase it is that reinventing the wheel has very little business value.

But IMO you can learn a lot reinventing things badly and correct software has little business value anyway.

As the saying goes: we reinvent the wheel not because we need more wheels, but because we need more inventors.

I like to call these discoveries as unknown unknowns. It’s when not only the answer/fact is not known, but even the existence of the question/fact is unknown.

In this case, not only the fact that the format is ambiguous (the answer), but also the fact that there exists something interesting/unusual about the format (the question).

For example, you could have learned the answer if you could have searched “is there something unusual about the format?”, or even “is the format actually ambiguous?”, but those ideas were not present!

Active mentors can help with this a lot.

...but, you learned something new, and hopefully had at least some fun...

This was an excellent read. Congrats to the team on getting their port of Xv6 up and running!

I had the opportunity to help with a project to port UNIX to a data general NOVA computer as an undergrad and the combination of "non toy" problems and results you can see is really fun. We didn't have to write the compiler though! That is an amazing part of this story too.

Given the availability of RISC-V descriptions and FPGA boards that will host it with memory, this sort of project should be within reach of anybody willing to invest about $150 in parts and a lot of labor. It would make for an interesting basis for a demo contest!

That's a current project for me right now. When I took computer architecture in university, it was fairly light on hard details about things like superscalar and out of order designs. Sure there's the concepts, but there are so many interesting problems it didn't cover.

E.g. how do you, at a hardware level, actually do the reservation stations for a out of order design. How do you actually implement a register file with enough read and write ports to satisfy such a design without taking up a whole die for it?

I know there are a few Linux capable soft-core RISC-V designs out there (VexRisc, etc.) and microcontroller class ones (PicoRV32, etc.). If my goal was implement a system and it needed one of those things, sure, I'd use an off the shelf one. But I really want to understand how the CPUs work, and the best way to do that is doing it myself without looking at the answer key.

Turns out register files are complicated and fascinating. I'd never come across "register file banking" in my architecture courses. Makes what I had to deal with in CUDA make a lot more sense now.

That sounds awesome! You should definitely post a Show HN on it.

I am going to comment on this though: But I really want to understand how the CPUs work, and the best way to do that is doing it myself without looking at the answer key.

I am right there with you this, however with experience I've come to appreciate that there is a lot of complexity in this topic and I personally have a limit on how steep a learning curve I'm willing to climb in my spare time. As a result I've taken to trying to isolate topics to learn around things that are known to work. Here is an example ...

In 2015 I discovered you could get software radios for $25 and there was a bunch of open source software one could use to play with them. I wanted to write my own modulator and de-modulator code but kept running up against too many unknowns so having it be impossible to debug. Was it my code? Was it the radio setup? Was it the signal format?

I didn't start making real progress until I got a real Spectrum Analyzer and Vector Signal Generator. This let me work from a "known good source" and generate signal that I could compare on my Spectrum Analyzer with the signal the VSG generates. THAT let me find bugs in my code and once I understood more of the basics of the DSP that was going on then I could branch into things like front end selectors and polyphase filtering.

So I applaud your effort, and more power to you if you punch through and get this done. It will be huge. But if someone reading this were to think this is the only, or best, way to do something I would encourage them to recognize that one can break these problems apart into smaller, sometimes more manageable chunks.

Another detail I never see covered is the implementation of variable length instruction decoders. Most every book seems to assume a classic RISC design with 32-bit wide instructions, single-issue and non-microcoded. Are there any advanced undergraduate or graduate level books that cover details like this?

Modern Processor Design: Fundamentals of Superscalar Processors is a great book. It uses the P6 uarch (so Pentium Pro through Pentium III) as a case study to contrast against the PowerPC 620 and is a great place to start as it covers a lot of the details you're asking about. That P6 arch is the basis for modern Intel cores after they dropped P4/NetBurst when Dennard scaling hit a wall. Yes there's been updates, but the book is still basically on point.

Real quick overview for some of the archs I know (that happen to all be x86), cache lines coming in from the I$ fill into a shift register. Each byte of the shift register has logic that in parallel reports "if an instruction started here, how long would the instruction be or say IDK". That information is used to select the instruction boundaries, which are then passed to the full instruction decoders (generally in another pipeline stage). After the byte lengths recognized are consumed, new bytes are shifted in, and the process starts over. This separation between length detection and full decode lets you have 16 or whatever length decoders but only three or four full decoders. Additionally the rare and very complex instructions are generally special cased and only decoded by the byte 0 length/instruction decoders. And even then, sometimes even the byte 0 decoder takes a few cycles to fully decode (like in the case of lots of prefix bytes).

I imagine superscalar processors for other CISC archs have very similar decoders, maybe just aligned on halfwords rather than bytes if that's all the arch needs (like for 68k and s/390).

Right, most lecture notes I've seen on the internet only go as far as implementing a simple 32-bit RISC machine where everything happens in one cycle. For variable-length instructions I think you'll need a multicycle state machine: fetch the first instruction byte on the first cycle, then fetch the next instruction byte if needed on the second cycle, etc.

I came across these notes a while ago when trying to implement something similar: https://my.eng.utah.edu/~cs6710/slides/mipsx2.pdf (it's an 8-bit MIPS, so it needs 4 cycles to fetch an instruction).

Note that you can easily 'cheat' on this, especially if you already have the hardware to support unaligned reads. Just load-unaligned a (say) 64-bit 'instruction' and then only increase IP/PC by however many bytes you actually used.

That will carry a pretty heavy performance penalty though.

Yes, that's what makes it cheating, rather than good parsimonious design that intellegently reuses preexisting resources.

> We didn't have to write the compiler though! That is an amazing part of this story too.

As noted in the article, the compiler was written in OCaml. If you follow links, the compiler emits assembly which is then assembled by an assembler written in Python.

I think developing the compiler in a more productive language (certainly for writing compilers) than C greatly helped.

To be honest, I'm kinda jealous. I do have a general understanding of how a CPU works, and I do understand what preemptive multitasking operating systems do, but my understanding is mostly abstract. And, fwiw, I studied most of this by myself out of sheer curiosity.

I did graduate a university but the closest thing we had was using a custom CPU emulator, given by the teacher, to run programs, given by teacher, and tracing every step it takes. It was so tedious and boring (the thing ONLY accepted binary, too, hexadecimal would be too convenient I guess), I wrote my own emulator that took an assembly file as input and generated the paper as the output, you only had to add your name to it before printing and turning it in. It then took me 5 whole minutes to generate semester worth of those. Of course I got a passing grade. Still, no substantial understanding of CPU inner workings was gained during the process.

I was reading a book on early computing and I believe folks working on some of the original computers at MIT (?) would add their own instructions. They would just wire them in.

Steven Levy's book Hackers talks about this in chapter 5, "The Midnight Computer Wiring Society."

I’ve always wanted to do this. I helped my housemates implement a CPU with custom ISA in college (this was before FPGAs, so there were a lot of chips in that CPU) but that’s as close as I’ve gotten.

I kind of feel like you can do something like this in college, or after you retire, but nowhere in between. :)

Why not? I made my first Phone OS (RTOS) on 8051 and 256KB ROM in college 20 years ago. Then I worked on several projects and my skills improved. Then I designed a processor architecture, made C-like programming language, created a compiler, an IDE, created a mobile OS and want to build a hardware chip in my startup https://animationcpu.com/

That’s awesome! To be clear, I’ve been able to pick (or create) some fantastic “day jobs” where I got to do lots of things I wanted to do. For some reason it’s harder for me to focus on a big long-term “outside interest” project like this. But you’re exactly right...why the heck not?

I did a course like this while on exchange programme to UCSB! In the ECE 152B Digital Design Methodologies class, teams of 2 people had to design and build a CPU from scratch.

We had to choose all our own chips and wire them together by hand. We had an FPGA for the Control Unit, but had to design the bus, pipelining, stack, carry-borrow, all the way up to running a demonstration program.

We did that in teams of 2 people, with only 1 semester. I was broke, so we chose the cheapest chips (I built MUXes out of NANDs). I realised we couldn't finish on time. I wandered the hallways, saw a project poster from a previous year, took some photos, reverse-engineered and reimplemented their design. We didn't have the FPGA code, but knowing the breadboard layout helped enormously. A few more nights in the lab, working to exhaustion and sleeping under the desk, and we passed the class. If you think industrial espionage only happens in other countries, that taught me that it happens in the US too. Arguably we made enough changes that it wasn't total plagiarism, but it did help to have a known-working design to build on.

The most lasting memory of that project, though, was when it worked fine during debugging, but failed whenever we removed the test probes. We were using 2 breadboards, but no common GND. During programming, they were grounded together through the GND on the PC's USB port. Always check for a common ground!

Very cool. I once taught a class similar in spirit to this, although it was aimed at first-year undergraduates with only a basic knowledge of programming, squashed into a single quarter, and designed on the fly, so we only got partway through implementing a C compiler. Instead of writing a full C compiler ourselves, we considered "cross-compiling" one by taking one that compiled to bytecode, and writing an emulator for that in assembly, taking that to be our first "compiler" and then progressively bootstrapping from there to a better compiler until it became feasible to start writing (or porting) a C-based OS.

Reminds me a little of Wirth's Project Oberon

Indeed, while i was reading the blog post i had in my mind Wirth's Project Oberon which is also about implementing a custom CPU on FPGA and then a complete OS with its own compiler, tools, GUI, etc on it.

Jaw-droppingly impressive to have gone from nothing to a custom ISA, C compiler toolchain and a whole freaking OS with userland apps in six months!

My favorite detail was that having spent all this time on extra credit, they only managed to complete their primary task an hour before the presentation - an unmistakable mark of a true hacker if there ever was one.

They completed the primary task way before that. This was free time and they did it for fun. The fact that they re-implemented their initial task as a program in the OS itself was the cherry on top, to impress their teachers (and also to learn a heap load of stuff in the process).

> The original task of the CPU experiment was “Run the given ray-tracing program on your homebrew CPU”. Now that you’ve got an operating system running on your CPU, you know what you’re supposed to do, right? We decided to run the ray-tracing program “on the OS “on our own CPU. We had a few bugs, but we managed to finish it an hour before the final presentation.

> to impress their teachers

If I was that teacher, I would have flunked them for "doing something just because you can" and not using their capabilities for a more useful goal. And also because I'm tired of yet another toy CPU and toy OS that's just like all the others.

I suspect you’re failing to understand the goal of being a teacher, which is to nurture your students. Toy operating systems are a time-tested way to learn.

No, I would have taught them the valuable lesson of not trying to invent the wheel. These students are obviously smarter than the average student, and therefore I would expect more of them, like inventing new OS concepts instead of mimicking age-old designs.

When I was 12 or so, I was fascinated by a video [0] visualizing the memory architecture of the Nintendo Entertainment System. I started researching the technical details of the console in order to try to make sense of what I was watching, and within a few hours I had decided to try to write a NES emulator.

Now, the NES emulation scene is extremely mature, and so I quickly came across a number of warnings essentially saying "the world has too may NES emulators, please don't write another one." I decided to go ahead anyway, and after spending about 2 years' worth of afternoons and weekends, the result was just another bad emulator. It was slow, buggy, incomplete, and the code was a mess.

By your standards, I wasted 2 years of my life making "yet another toy" emulator instead of "using [my] capabilities for a more useful goal." But I had a ton of fun, and the learning experience was absolutely invaluable. I learned in depth how a computer actually works at the hardware level (even though the NES is much simpler than a modern computer, the concepts and skills transferred over very quickly).

As a result, my practical programming knowledge today is vastly improved over where it would have been had I not "wasted" all those afternoons. Many of the career opportunities available to me so far have been directly attributable to the skills I gained working on that emulator.

[0]: https://www.youtube.com/watch?v=xI3xZAn7r2A

Link to your emulator? I don't care if it's buggy, I want to try playing a game on it!

Thanks :) I never published it anywhere and it's Mac-only, but if you really want to try it you can download the source from https://nobodynada.com/emulator.zip.

You can't build the next wheel without first getting a handle on how & why the existing wheels were built the way they were. Even CS undergrads start with toy operating systems that they fill in the parts for (adding a filesystem, writing a compiler for a toy language, etc). There's simply no better way to learn.

> These students are obviously smarter than the average student, and therefore I would expect more of them, like inventing new OS concepts instead of mimicking age-old designs.

You can disagree but here are the facts:

1. The students completed the task.

2. The professor had intentionally built slack into the schedule to let ambitious students take this further to play around with the tools & their knowledge to try doing creative things.

3. The students used this to recruit others students to form a larger team around a more ambitious end goal

4. Self-organized to distribute work & build a schedule that minimized interdependencies

5. Completed their goal.

Aside from the technical stuff, these all sound like valuable soft skills that were learned/applied in addition to the technical achievement. I'd say both the professor & students did a good job here.

What's hard for me to say is what year this is. In my old engineering school there was a final project in years 3 & 4. Your team would pick some kind of vague final project (with consultation from your teacher), you'd get a budget for materials + connections to companies/vendors for sponsorship, & then go about building your concept. That's a bit more complex than this but also lasts 1.5 years and happens largely outside of school. This blog post is about a project done within the context of 1 subject AFAICT. That makes it way more impressive.

Students at this level simply don't yet (at least generally) have the experience nor a good understanding of the space to understand what are more significant problems to go tackle. From what I've seen you usually start that journey as a masters/PhD student.

Im from the same uni, but from a different department. The project is famous as well as notorious for the commitment required.

Jp uni students typically write a thesis in the fourth/final year (is it called senior in the US?). This project is for the third year (junior in the US?). Probably there is also a difference between a (e.g. mechanical) engineering and a CS department. A typical CS conference paper does not need 1.5 year from initiation to publication, while I understand a mech work will need a lot more time. I moved from B.Eng to CS. With a proper guidance that lets a student focus on a particular subject (a research theme is given by the advisor), they do incredible work, although they may not have a broader view of the research field yet.

> These students are obviously smarter than the average student,

And yes these national university students are the top brains of the country. BTW, I am very sad that typical Japanese corporate organization basically rejects them as they are too smart and do not fit in their age-based structure. The author is in Microsoft anyways.

It's an undergrad class, and they sailed right past the expectations of the rubric and fit in with the culture of taking their project a bit further than what's required (but in a different direction than most.

Perhaps at some point one would expect students to invent something new, but it's generally after the OS/arch classes where you do things like implement toy OSes or toy CPUs. :P

By your logic, nobody should do almost anything unless its completely new or novel. Nobody would learn enough to actually make anything new and a lot of us would get bored or frustrated and leave to do something more fun instead.

I'm glad we can do whatever we find fun or interesting or because we want to learn it, rather than being told by someone like you that we're not allowed.

There's obvious educational value in having students build tools like compilers and operating systems.

> I would expect more of them, like inventing new OS concepts

We're talking about undergraduates, not researchers.

Yeah, a certain Linus should not have tried to invent the wheel.

new doesn't come before knowing what exists and why, and to understand the latter making it is important. as a teacher if you fail to understand you should probably spend some time to read more why people study what exists in school and college for the better part of their life.

But, this was done by third-year students in an educational context. It is just a way of learning by doing instead of by only studying books, and even bright students have to learn the basics first.

I'm sympathetic for your goals, if not your means. I would point out that they ported an existing OS: a sign that perhaps they wouldn't have had enough time to finish with a novel design.

I hope that the code in the commons can continue to become more modular to the point that is is practical to try out some fresh ideas with like a library OS / exokernel that provides all non-novel bits without constraining the design space.

How they can invent new concepts before trying to invent the wheel?

Good luck inventing something new and better without first understanding what already exists. In my experience it's much more efficient to master what already exists, first, noting down any novel ideas as they come to you, and then move on to implementing those novel ideas.

How ironic that you say this on Hacker News.

That is one of the ideas at the very heart of hacker culture—doing things just because you can.

True, but these students were following an academic education, not "hacker university".

It was their free time. You get the idea behind that concept, yes? As in do whatever they want. And they wanted an OS! opposite of let's say drinking in bars trying to pick girls.

Well, they also fulfilled the requirements for the credit (which was to get the raytracing program running bare-metal on their CPU), so if you want to be pedantic about them being in an academic course, then you should honour the original assignment requirements, right?

Why do people even engage with this guy. Lol

Unbelievable, you understand that Unix is developed out of free time to play games and not as a requirement right?

If you had a say you might point to Multics and shutdown the Unix project altogether, fortunately, you did not.

The author also did another project named Noah that's like WSL on macOS. https://news.ycombinator.com/item?id=18765184

WOW. that's all I have got to say. I wish my country's universities had such educational quality, help guiding students properly and would encourage students to work on such projects.

If you want to do the lite version of this there is a book called “From NAND to Tetris” that is an instructional book for a similar class and walks through doing much of the same thing. It helped me connect the dots between “I know how a transistor works” and “I know how assembly language works” that I have never learned otherwise. Even if you just skim the book without doing the project you will likely learn some things.

The Elements of Computing Systems: Building a Modern Computer from First Principles https://www.amazon.com/dp/0262640686/ref=cm_sw_r_cp_api_i_yH...

I’m doing this right now and it’s a lot of fun and very accessible.

I saw there’s a highly rated Coursera version too (I prefer books to videos so I haven’t looked at it)

> Reinventing the wheel is generally said to be something to be avoided, but there’s quite a bit to learn from actually doing it. It made me realize that I didn’t understand it as well as I could implement it from scratch.

The paradox of any performance that seems effortless, is that tons of effort invariably went into producing that effect. Good going mate!

What would be the best FPGA entry point today, for hobbyist?

I would prefer if tool chain works on linux ?

All the major toolchains work on Linux. I personally liked Pynq http://www.pynq.io/, it works fairly well and is beefy enough to do non-toy projects with it. For something totally barebones you can try https://www.latticesemi.com/Products/DevelopmentBoardsAndKit..., it has the advantage that it has a very nice open source tool flow available for it: http://www.clifford.at/icestorm/. I own both boards and I work professionally with EDA tools, honestly the biggest cliff is setting up tooling and processes around a given project. If you don't want to do everything from scratch you need to either rely on proprietary libraries, which will cost you a fortune and/or lock you into a very specific tool flow (Synopsis Designware libraries). The other alternative is to use one of the open source ecosystems like https://github.com/m-labs/migen or tooling around Chisel or various scattered libraries of open-source components which are often hard to get to work together, but also this requires a massive amount of time investment to just get the synthesis / simulation flow working.

The situation is slowly improving and there are various university research groups and corporations that have released larger amounts of open source code that one can use as a starting point:

- https://github.com/openhwgroup/cva6 (mostly done by ETHZ people initially, also see https://github.com/pulp-platform/pulpino)

- https://github.com/chipsalliance/Cores-SweRV-EL2 (one of several high quality open-source RISC-V cores by Western Digital)

- https://github.com/chipsalliance/rocket-chip (UCB by initial creators of RISC-V)

- https://github.com/electronicvisions/nux (POWER based embedded processor, that we have used in several neuromorphic chip tapeouts (silicon proven))

On the tooling side one great thing that happened is that the verilator open source simulator has gained a lot of traction.

- https://github.com/verilator/verilator

On the hardware synthesis side some exciting developments are also happening, mostly driven by the desire of companies like Google to be vendor independent for their Deep learning accelerator designs and through funding by DARPA.

Crowd Supply has a variety of very cheap FPGA breakout boards that make it easy to get started.

Very interesting article. An ambitious project and I'm certain a great learning experience

I see "cofee" written a bunch across the site is that some alternate spelling or just a typo?

Made my day :)


These days there is a low barrier of entry to create "own" CPU. Just get your favourite FPGA ecosystem, configure ready made core and off you go. Route manufacturer's reference design in Eagle or Kicad add your peripherals then send the design to JLCPCB or other fab and job done. There is plenty of good soldering tutorials on YT.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact