I wrote a bitmap image library in C a few months ago, and when I realized I was in too deep I just kept going, because I can never abandon a project once I start it.
I felt bad when I came out the other end, both because of all the time I spent on it that I could have spent doing other things, and because I could have just taken a library off the shelf.
In doing the exercise I realized that what appears, at first glance, to be a simple and straightforward image format is actually ambiguously defined and full of edge cases. There was no way for me to know that when I started.
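To make "full of edge cases" concrete, here's a hand-wavy TypeScript sketch of just the header-level decisions (offsets follow the Windows BITMAPINFOHEADER layout; the function name and everything else is mine, not from any particular library):

    // Toy sketch of BMP header quirks, not a real parser.
    function describeBmp(buf: ArrayBuffer): string {
      const v = new DataView(buf);
      // "BM" magic; then the DIB header SIZE tells you which header you got:
      // 12 = BITMAPCOREHEADER, 40 = BITMAPINFOHEADER, 108/124 = V4/V5...
      if (v.getUint16(0, false) !== 0x424d) throw new Error("not a BMP");
      const dibSize = v.getUint32(14, true);
      if (dibSize === 12) {
        // Ancient OS/2 core header: width/height are only 16 bits here.
        return `core header, ${v.getUint16(18, true)}x${v.getUint16(20, true)}`;
      }
      const width = v.getInt32(18, true);
      const height = v.getInt32(22, true); // signed on purpose!
      const bpp = v.getUint16(28, true);
      // Quirk 1: a NEGATIVE height means rows are stored top-down
      // instead of the usual bottom-up order.
      const topDown = height < 0;
      // Quirk 2: every pixel row is padded out to a 4-byte boundary.
      const stride = Math.floor((bpp * width + 31) / 32) * 4;
      return `${width}x${Math.abs(height)} @ ${bpp}bpp, ` +
        `${topDown ? "top-down" : "bottom-up"}, stride=${stride}`;
    }

And that's before you get to RLE compression, 16-bit bitfield masks, or palettes whose declared length doesn't match reality.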
Any format/protocol/RFC that's been around for a significant amount of time is going to be bastardized and bent to solve people's problems. Right now I'm going through that with RTSP. I did my best to try to avoid writing my own client/server implementation because I knew that I would be taking on the responsibility to run down the quirks, but after some major vendor problems here we are =(
Just throwing this out there for folks new in their career: if it's been "a thing" for a while, you can almost guarantee that doing a full implementation against it is going to be time-consuming because of the quirks/edge-cases. If it's something common like handling bitmaps, you can get ahead of your problem by reading open-source image libraries to get an idea of how much "edge-case support" you're going to have to dig through.
I just don't like forcing myself through "clean-room"/from-RFC/from-spec implementations unless I professionally have to - they're almost always painful and impossible to time-manage.
Well. I sorta misspoke. I need to ship RTSP/RTP with hella-different stream profiles over WebSocket. Imagine slamming the entire RTP and RTSP data stream into binary packets shipped around via WebSocket, only to have the packets reassembled in order on the other end.
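Roughly, the reassembly half looks like this (hand-wavy TypeScript sketch; it assumes one RTP packet per binary WebSocket message, which sidesteps the real interleaved framing, and all the names are mine):

    // Sketch of the "reassembled in-order" half. 16-bit RTP sequence
    // numbers wrap, so comparisons use serial-number arithmetic (RFC 3550).
    const seqAfter = (a: number, b: number): boolean =>
      ((a - b) & 0xffff) !== 0 && ((a - b) & 0xffff) < 0x8000;

    class RtpReorderBuffer {
      private pending = new Map<number, Uint8Array>();
      private next: number | null = null;

      constructor(private deliver: (pkt: Uint8Array) => void) {}

      push(pkt: Uint8Array): void {
        const seq = (pkt[2] << 8) | pkt[3]; // seq number lives in bytes 2-3
        if (this.next === null) this.next = seq;
        if (seq !== this.next && !seqAfter(seq, this.next)) return; // stale
        this.pending.set(seq, pkt);
        while (this.pending.has(this.next)) { // flush the contiguous run
          this.deliver(this.pending.get(this.next)!);
          this.pending.delete(this.next);
          this.next = (this.next + 1) & 0xffff;
        }
      }
      // NB: a real implementation also needs a timeout/jitter window so one
      // lost packet doesn't stall the stream forever.
    }

On the receiving side it's then just ws.binaryType = "arraybuffer" and ws.onmessage = (e) => buf.push(new Uint8Array(e.data)).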
RTSP isn't quite the problem, especially if I had your use case. I work in PSaaS (physical security as a service), which has really unique streaming circumstances where security + latency are prime. Also, there are tons of additional features I need from RTSP, ex: "hey, give me a new I-frame right now".
Unfortunately the RTSP spec is all over the place with my vendors... and throw the whole ONVIF WebSocket thing at it, plus trying to actually keep stuff cryptographically secure, and welcome to my nightmare. I'm talking about craziness like not getting an "ack" to the final "PLAY" command in the prelude while the server ships the data anyway, packet parsers/fixers implemented in ECMAScript, browsers acting as the RTSP session client... unfortunately the list goes on.
There's a lot of overlap with your world. I even have your project bookmarked + labeled "sane light-weight RTSP gstreamer implementation" from when I was in my research phase! Small world =)
If you're interested you can check out the ONVIF streaming spec:
And the pipeline tech that most of my jazz is based on:
Thanks for being an open-source resource with well-documented work!!!
I wasn't aware that requesting I-frames was even possible in the RTSP world. Which clients provide such functionality? I am looking for some web-based RTSP streaming solutions, so your project could be very useful :-) Good luck with it!
It's non-standard for the spec but is common in the security industry. All of my Axis cameras support it, and some of my other vendors support it too... unfortunately it's vendor-specific, but it's super useful when you're dropping someone into a pre-established stream and/or recovering from adverse stream conditions.
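FWIW, the closest standardized equivalent I know of is SetSynchronizationPoint in the ONVIF Media service, which asks the device to emit an I-frame as soon as possible. A very rough sketch (the endpoint path and profile token vary per device, and I've left out the WS-Security auth dance entirely):

    // Rough sketch only: path, token, and auth are all device-specific.
    async function requestIFrame(host: string, profileToken: string) {
      const body = `<?xml version="1.0" encoding="UTF-8"?>
    <s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope"
                xmlns:trt="http://www.onvif.org/ver10/media/wsdl">
      <s:Body>
        <trt:SetSynchronizationPoint>
          <trt:ProfileToken>${profileToken}</trt:ProfileToken>
        </trt:SetSynchronizationPoint>
      </s:Body>
    </s:Envelope>`;
      await fetch(`http://${host}/onvif/media_service`, {
        method: "POST",
        headers: { "Content-Type": "application/soap+xml" },
        body,
      });
    }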
> I am looking for some web based RTSP streaming solutions
Start searching GH for "RTSP WebSocket" and you'll end up with a TON of stuff! It's how I had your work bookmarked in the first place.
I would also take a serious look at that previous Axis Communications lib I sent over for web-based stuff. One thing I'll warn you on is that you'll need to work within the HTML video element sink to look for "seek drift" and correct it when it comes up... VLC has some of these same problems too, where I want to "FF to the end of the buffer" for low-latency video feeds vs. every "hiccup" bumping me further and further back in the buffer. This all gets weird for use cases like ours because the normal social contract for video is "don't lose frames", where ours is likely "give me the most up-to-date/valid video frame possible". (Which is why requesting a new I-frame is SUPER useful!)
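The drift correction itself is only a few lines; a rough sketch of what I mean (the threshold and polling interval are arbitrary, tune them for your streams):

    // Sketch of "FF to the live edge" drift correction; numbers are arbitrary.
    function keepAtLiveEdge(video: HTMLVideoElement, maxDriftSec = 1.0): void {
      setInterval(() => {
        if (video.buffered.length === 0) return;
        const liveEdge = video.buffered.end(video.buffered.length - 1);
        if (liveEdge - video.currentTime > maxDriftSec) {
          // Jump forward; leave a tiny margin so we don't stall at the edge.
          video.currentTime = liveEdge - 0.1;
        }
      }, 2000);
    }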
Also let me know if you know of any modern communities around this stuff... I'm seriously on Archive.org ripping through the Wayback Machine to read rtsp.org, or reading through ancient Live555 docs/GH projects... I could really use an accessible Discord channel around modern video engineering with experts like yourself in it!
But IMO you can learn a lot reinventing things badly, and correct software has little business value anyway.
In this case the unknown was not only the fact that the format is ambiguous (the answer), but also the fact that there exists something interesting/unusual about the format at all (the question).
For example, you could have learned the answer if you had searched "is there something unusual about the format?", or even "is the format actually ambiguous?", but those ideas were not present!
Active mentors can help with this a lot.
I had the opportunity to help with a project to port UNIX to a Data General NOVA computer as an undergrad, and the combination of "non-toy" problems and results you can see is really fun. We didn't have to write the compiler though! That is an amazing part of this story too.
Given the availability of RISC-V descriptions and FPGA boards that will host it with memory, this sort of project should be within reach of anybody willing to invest about $150 in parts and a lot of labor. It would make for an interesting basis for a demo contest!
E.g. how do you, at a hardware level, actually build the reservation stations for an out-of-order design? How do you actually implement a register file with enough read and write ports to satisfy such a design without taking up a whole die for it?
I know there are a few Linux-capable soft-core RISC-V designs out there (VexRiscv, etc.) and microcontroller-class ones (PicoRV32, etc.). If my goal was to implement a system and it needed one of those things, sure, I'd use an off-the-shelf one. But I really want to understand how the CPUs work, and the best way to do that is doing it myself without looking at the answer key.
Turns out register files are complicated and fascinating. I'd never come across "register file banking" in my architecture courses. Makes what I had to deal with in CUDA make a lot more sense now.
I am going to comment on this though:

> But I really want to understand how the CPUs work, and the best way to do that is doing it myself without looking at the answer key.
I am right there with you on this; however, with experience I've come to appreciate that there is a lot of complexity in this topic, and I personally have a limit on how steep a learning curve I'm willing to climb in my spare time. As a result I've taken to trying to isolate the topics I'm learning around things that are known to work. Here is an example ...
In 2015 I discovered you could get software radios for $25, and there was a bunch of open-source software one could use to play with them. I wanted to write my own modulator and demodulator code but kept running up against too many unknowns, which made it impossible to debug. Was it my code? Was it the radio setup? Was it the signal format?
I didn't start making real progress until I got a real Spectrum Analyzer and Vector Signal Generator. This let me work from a "known good source": I could compare the signal my code generated against the signal the VSG generated, right on the Spectrum Analyzer. THAT let me find bugs in my code, and once I understood more of the basics of the DSP that was going on, I could branch into things like front-end selectors and polyphase filtering.
So I applaud your effort, and more power to you if you punch through and get this done. It will be huge. But if someone reading this were to think this is the only, or best, way to do something I would encourage them to recognize that one can break these problems apart into smaller, sometimes more manageable chunks.
Real quick overview for some of the archs I know (which all happen to be x86): cache lines coming in from the I$ fill into a shift register. Each byte of the shift register has logic that, in parallel, reports "if an instruction started here, how long would it be" or says IDK. That information is used to select the instruction boundaries, which are then passed to the full instruction decoders (generally in another pipeline stage). After the recognized instruction bytes are consumed, new bytes are shifted in and the process starts over. This separation between length detection and full decode lets you have 16 (or whatever) length decoders but only three or four full decoders. Additionally, the rare and very complex instructions are generally special-cased and only decoded by the byte-0 length/instruction decoders. And even then, sometimes the byte-0 decoder takes a few cycles to fully decode (like in the case of lots of prefix bytes).
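A toy software model of that two-stage idea, in case it helps (the byte values and length rules here are invented, NOT real x86 encodings):

    // Toy model of the two-stage decode; fake "ISA" for illustration only.
    const IDK = -1; // complex case: punt to the byte-0 slow path

    // Stage 1: every byte position speculates a length in parallel
    // (in hardware this is one small logic block per shift-register byte).
    function lengthAt(bytes: Uint8Array, i: number): number {
      const b = bytes[i];
      if (b === 0x0f) return IDK;               // pretend: rare/complex opcode
      return b < 0x80 ? 1 : b < 0xc0 ? 2 : 4;   // pretend length rule
    }

    // Stage 2: chain the speculated lengths from position 0 to pick the real
    // instruction boundaries, stopping at the issue width or at an IDK.
    function pickBoundaries(bytes: Uint8Array, maxInsns = 4): number[] {
      const lens = Array.from(bytes, (_, i) => lengthAt(bytes, i));
      const starts: number[] = [];
      let pos = 0;
      while (pos < bytes.length && starts.length < maxInsns) {
        starts.push(pos);
        if (lens[pos] === IDK) break; // slow-path decoder handles this one
        pos += lens[pos];
      }
      return starts; // these offsets feed the three or four full decoders
    }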
I imagine superscalar processors for other CISC archs have very similar decoders, maybe just aligned on halfwords rather than bytes if that's all the arch needs (like for 68k and s/390).
I came across these notes a while ago when trying to implement something similar: https://my.eng.utah.edu/~cs6710/slides/mipsx2.pdf (it's an 8-bit MIPS, so it needs 4 cycles to fetch an instruction).
As noted in the article, the compiler was written in OCaml. If you follow links, the compiler emits assembly which is then assembled by an assembler written in Python.
I think developing the compiler in a language more productive than C (certainly for writing compilers) greatly helped.
I did graduate from a university, but the closest thing we had was using a custom CPU emulator, given by the teacher, to run programs, also given by the teacher, and tracing every step they take. It was so tedious and boring (the thing ONLY accepted binary, too; hexadecimal would be too convenient, I guess) that I wrote my own emulator that took an assembly file as input and generated the paper as the output; you only had to add your name to it before printing and turning it in. It then took me 5 whole minutes to generate a semester's worth of those. Of course I got a passing grade. Still, no substantial understanding of CPU inner workings was gained in the process.
I kind of feel like you can do something like this in college, or after you retire, but nowhere in between. :)
We had to choose all our own chips and wire them together by hand. We had an FPGA for the Control Unit, but had to design the bus, pipelining, stack, carry-borrow, all the way up to running a demonstration program.
We did that in teams of 2 people, with only 1 semester. I was broke, so we chose the cheapest chips (I built MUXes out of NANDs). I realised we couldn't finish on time. I wandered the hallways, saw a project poster from a previous year, took some photos, reverse-engineered and reimplemented their design. We didn't have the FPGA code, but knowing the breadboard layout helped enormously. A few more nights in the lab, working to exhaustion and sleeping under the desk, and we passed the class. If you think industrial espionage only happens in other countries, that taught me that it happens in the US too. Arguably we made enough changes that it wasn't total plagiarism, but it did help to have a known-working design to build on.
The most lasting memory of that project, though, was when it worked fine during debugging, but failed whenever we removed the test probes. We were using 2 breadboards, but no common GND. During programming, they were grounded together through the GND on the PC's USB port. Always check for a common ground!
My favorite detail was that having spent all this time on extra credit, they only managed to complete their primary task an hour before the presentation - an unmistakable mark of a true hacker if there ever was one.
If I was that teacher, I would have flunked them for "doing something just because you can" and not using their capabilities for a more useful goal. And also because I'm tired of yet another toy CPU and toy OS that's just like all the others.
Now, the NES emulation scene is extremely mature, and so I quickly came across a number of warnings essentially saying "the world has too many NES emulators, please don't write another one." I decided to go ahead anyway, and after spending about 2 years' worth of afternoons and weekends, the result was just another bad emulator. It was slow, buggy, incomplete, and the code was a mess.
By your standards, I wasted 2 years of my life making "yet another toy" emulator instead of "using [my] capabilities for a more useful goal." But I had a ton of fun, and the learning experience was absolutely invaluable. I learned in depth how a computer actually works at the hardware level (even though the NES is much simpler than a modern computer, the concepts and skills transferred over very quickly).
As a result, my practical programming knowledge today is vastly improved over where it would have been had I not "wasted" all those afternoons. Many of the career opportunities available to me so far have been directly attributable to the skills I gained working on that emulator.
> These students are obviously smarter than the average student, and therefore I would expect more of them, like inventing new OS concepts instead of mimicking age-old designs.
You can disagree but here are the facts:
1. The students completed the task.
2. The professor had intentionally built slack into the schedule to let ambitious students take things further and play around with the tools & their knowledge to try doing creative things.
3. The students used this to recruit other students to form a larger team around a more ambitious end goal.
4. They self-organized to distribute work & build a schedule that minimized interdependencies.
5. They completed their goal.
Aside from the technical stuff, these all sound like valuable soft skills that were learned/applied in addition to the technical achievement. I'd say both the professor & students did a good job here.
What's hard for me to say is what year of study this is. In my old engineering school there was a final project in years 3 & 4. Your team would pick some kind of vague final project (in consultation with your teacher), you'd get a budget for materials plus connections to companies/vendors for sponsorship, and then go about building your concept. That's a bit more complex than this, but it also lasts 1.5 years and happens largely outside of school. This blog post is about a project done within the context of 1 subject AFAICT. That makes it way more impressive.
Students at this level simply don't yet (at least generally) have the experience or a good enough understanding of the space to know which problems are the more significant ones to go tackle. From what I've seen, you usually start that journey as a masters/PhD student.
Japanese uni students typically write a thesis in the fourth/final year (is it called senior year in the US?). This project is for the third year (junior in the US?). Probably there is also a difference between a (e.g. mechanical) engineering department and a CS department. A typical CS conference paper does not need 1.5 years from initiation to publication, while I understand mechanical engineering work will need a lot more time. I moved from a B.Eng to CS. With proper guidance that lets a student focus on a particular subject (a research theme given by the advisor), they do incredible work, although they may not have a broader view of the research field yet.
> These students are obviously smarter than the average student,
And yes, these national university students are among the top brains of the country. BTW, I am very sad that the typical Japanese corporate organization basically rejects them, as they are too smart and do not fit into its age-based structure. The author is at Microsoft anyway.
Perhaps at some point one would expect students to invent something new, but it's generally after the OS/arch classes where you do things like implement toy OSes or toy CPUs. :P
I'm glad we can do whatever we find fun or interesting or because we want to learn it, rather than being told by someone like you that we're not allowed.
> I would expect more of them, like inventing new OS concepts
We're talking about undergraduates, not researchers.
I hope that the code in the commons can continue to become more modular, to the point that it is practical to try out some fresh ideas with, say, a library OS / exokernel that provides all the non-novel bits without constraining the design space.
That is one of the ideas at the very heart of hacker culture—doing things just because you can.
If you'd had a say, you might have pointed to Multics and shut down the Unix project altogether; fortunately, you did not.
The Elements of Computing Systems: Building a Modern Computer from First Principles https://www.amazon.com/dp/0262640686
I saw there's a highly rated Coursera version too (I prefer books to videos, so I haven't looked at it).
The paradox of any performance that seems effortless is that tons of effort invariably went into producing that effect. Good going, mate!
I would prefer it if the toolchain worked on Linux.
The situation is slowly improving and there are various university research groups and corporations that have released larger amounts of open source code that one can use as a starting point:
- https://github.com/openhwgroup/cva6 (mostly done by ETHZ people initially, also see https://github.com/pulp-platform/pulpino)
- https://github.com/chipsalliance/Cores-SweRV-EL2 (one of several high-quality open-source RISC-V cores by Western Digital)
- https://github.com/chipsalliance/rocket-chip (UCB by initial creators of RISC-V)
- https://github.com/electronicvisions/nux (a POWER-based embedded processor that we have used in several neuromorphic chip tapeouts; silicon proven)
On the tooling side, one great thing that happened is that the Verilator open-source simulator has gained a lot of traction.
On the hardware synthesis side, some exciting developments are also happening, mostly driven by the desire of companies like Google to be vendor-independent for their deep-learning accelerator designs, and through funding by DARPA.
I see "cofee" written a bunch across the site is that some alternate spelling or just a typo?