While writing a text editor, a compiler, an operating system, or a raytracer might make you a better programmer, it won't make you a better software engineer. In fact, it might make you worse at software engineering, because it embodies the disastrous "Not Invented Here" doctrine.
Hackers like to obsess about Big-O, data structures, HoTT, and other high-theory stuff, yet the following skills, essential for software engineering, are almost never discussed and even more rarely practiced:
- Deciding what to write yourself and what to take from a library
- Identifying high-quality libraries and frameworks that meet your project needs
- Deciding where optimization is worth the effort and where it is not
- Writing code that will still be readable to you (and others) a few years from now
- Thinking about the project as a large-scale, complex system with software and non-software dependencies
In that spirit, I offer the following alternative challenge: Create a web search engine. Don't bother with string matching algorithms etc., others have already done that for you. "Just" make a search engine (and crawler) that can actually work, even if it only supports a subset of the web and a single concurrent user at the beginning.
I don't understand why you think a search engine requires the use of "real engineering skills" like picking and choosing libraries and identifying which opportunities will yield fruitful optimization.
Literally all of the listed projects, text editors, compilers, operating systems, and ray tracers, can exercise the exact same activities.
I'm more inclined to think that your comment is really more revealing about what kind of knowledge is in your own personal wheelhouse. Having worked on a ray tracer that was subsequently used on many feature films, I can assure you that all of your bullet points apply to that project, too.
It's much more a matter of whether you want to do something small scale and fun, or whether you want to suck all the joy out of it by applying the same soul crushing constraints we already get paid to do in our day jobs. Bleh.
> Literally all of the listed projects, text editors, compilers, operating systems, and ray tracers, can exercise the exact same activities.
In the linked article, these projects are all explicitly described as opportunities to learn about low-level stuff like how to efficiently store editable text. The difference with a web search engine is that nobody today can build such a thing completely from scratch, therefore it forces you to give up the toxic NIH mentality, which in my experience is usually driven by elitism and ego (on full display in this comment thread), not by some lofty desire to learn something new.
If you want to build a compiler from scratch, you must first invent the universe. Peeling back abstractions to see how things could or should work is perfectly fine, even for professionals.
Case in point, I've spent a year excising bloated frameworks from my stack at work and replacing the few corners we needed from those frameworks with, e.g. 50 lines of curl calls. The C compiles instantly and is tailored for our tiny use case, produced much quicker delivery on our one related feature we wanted, and removed chains of dependencies.
Being reliant on far-too-abstract libraries and frameworks to do simple jobs is also a curse. But nobody at work had the experience to know that curl was sitting right there on our image, available for our use. And nobody had that experience because nobody had taken the time to build something from low-level libraries. Now we know how, and we can make an intelligent decision without defaulting in either direction out of fear of trying.
20,000 fewer lines of code, faster compile / deploy times, significantly less interfacing / translation between "their" types and "their" APIs, and much better control over types and structure across our codebase b/c we didn't need "their" types and structure anywhere. It had crept everywhere.
It's just faster, cleaner, and easier in a few cases to do precisely what you need right now, rather than anticipate a million things you might need and refactor your code to adopt a given "solution".
Yeah, the problem / benefit is never the one framework you added / removed, it is the mindset of reaching for another dependency as a default which leads to a behemoth which no one enjoys working with.
I also think a person who has had to strip out or replace dependencies is more likely to be careful when choosing them in the future. This is definitely the case for me, at least.
You know those car guys who will rebuild their engine, just because? Or those retro computer guys that will recap an ancient board rather than buying a modern pc?
For you it is a job, for me it is a hobby. I have no interest in making something 'professional' I want to take it apart to understand how it works. Want to understand how a text editor works? Write one.
What component of that is ego?
It seems to me you're the elitist, you're not far off saying mere users shouldn't be allowed to modify their own software, shouldn't be allowed to install software that hasn't been okayed by the people who know what they're doing.
Your lack of constraints on personal exploration in software is interpreted as hubris by people who came to software seeking high-paying, regimented recipe-following.
Assuming you get the TCP/IP stack for free, you still need to build fully-featured HTTPS and a "webscale" multi-server database for document storage from scratch. The crawler is easy and so is something like PageRank, but then building the sharded keyword text search engine itself that operates at webscale is a whole other project...
The point is that it's too much work for a single person to build all the parts themselves. It's only feasible if you rely on pre-existing HTTPS libraries and database and text search technologies.
Simple methods of search like exact matching are very fast using textbook algorithms. There are well-known structures like suffix trees that can search millions of documents in milliseconds.
That's not enough. It needs to be sharded and handle things like 10 search terms, each of which matches a million documents, while you're trying to find the intersection. Across sharded results from 20 servers. Quickly.
Intersecting posting lists is a solved problem. You can do it in sublinear time with standard search engine algorithms.
The problem space is embarrassingly parallel, so sharding is no problem. Although, realistically, you probably only need one server to cope with the load and storage needs. This isn't 2004. Servers are big and fast now, as long as you don't try to use cloud compute.
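For the curious, sublinear posting-list intersection really is a textbook trick. Here is a rough sketch (mine, not taken from any particular engine) of the galloping-search variant, in Python for readability:

```python
from bisect import bisect_left

def gallop_intersect(short, long):
    """Intersect two sorted posting lists of doc IDs.

    Galloping (exponential) search from the shorter list into the
    longer one takes O(n log(m/n)) comparisons, which is sublinear
    in m when one list is much shorter than the other.
    """
    result, lo = [], 0
    for doc_id in short:
        # Gallop: double the step until we overshoot doc_id...
        step = 1
        while lo + step < len(long) and long[lo + step] < doc_id:
            step *= 2
        # ...then binary-search only the bracketed window.
        lo = bisect_left(long, doc_id, lo, min(lo + step + 1, len(long)))
        if lo < len(long) and long[lo] == doc_id:
            result.append(doc_id)
    return result

print(gallop_intersect([3, 7, 1000], list(range(0, 2000, 2))))  # → [1000]
```

The shorter list drives the search, so intersecting a rare term with a very common one touches only a small fraction of the long list.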
1 server to index the whole internet? I don't care how big your server is -- that's not going to happen.
And of course all of this is based on "standard" algorithms. But it doesn't change the fact that implementing them all to build a functional performant web crawler+database+search is a project that is larger than any one person could build from scratch.
It would only be feasible to build on top of existing libraries, which is the whole point here.
> It would only be feasible to build on top of existing libraries, which is the whole point here.
I don't think you've made a single point that substantiates that claim and neither has anyone else, really. The person you're arguing with has built (as per their own estimation) more from scratch when it comes to the actual search engine core than anything else in their own project (https://marginalia.nu/).
Honestly, it seems more like the people arguing a search engine is somehow "off limits" from scratch are doing so because they imagine it's less feasible, probably because they simply don't know or are in general pretty bad at implementing things from scratch, period.
To a large degree there is also a wide difference in actual project scope. Remember that we are comparing this to implementing Space Invaders, one of the simplest games you can make, and one that is doable in an evening even if you write your own renderer.
To talk about "webscale" (a term that is so deliciously dissonant with most of the web world, which runs several hundred times slower and less efficiently than it should) is like suddenly talking about implementing deferred shading for the Space Invaders game. Even if this were something you'd want to explore later, it's just not something you're doing right now, because this is a project you're using to learn as you go.
> 1 server to index the whole internet? I don't care how big your server is -- that's going not to happen.
Well let's do the math!
1 billion bytes is 1 GB. The average text size of a document is ~5 KB of uncompressed HTML. 1 billion documents is thus about 5 TB of raw HTML. In practice the index would be smaller, say 1 TB, because you're not indexing the raw HTML but a fixed-width representation of the words therein, typically with some form of compression as well. (In general, the index is significantly smaller than the raw data unless you're doing something strange.)
You could buy a thumbdrive that would hold a search index for 1 billion documents. A real server uses enterprise SSDs for this stuff, but even so, you don't have to dip into storage-server specs to outfit a single server with hundreds of terabytes of SSD storage.
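The comment's back-of-envelope arithmetic, spelled out. The inputs are the comment's own assumptions, not measurements: ~5 KB of raw HTML per document and an index roughly one fifth the size of the raw text.

```python
docs = 1_000_000_000               # one billion documents
raw_bytes = docs * 5_000           # ~5 KB of raw, uncompressed HTML each
index_bytes = raw_bytes // 5       # index assumed ~1/5 of the raw size

TB = 1_000_000_000_000
print(f"{raw_bytes / TB:.0f} TB raw HTML")   # 5 TB
print(f"{index_bytes / TB:.0f} TB index")    # 1 TB
```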
> And of course all of this is based on "standard" algorithms. But it doesn't change the fact that implementing them all to build a functional performant web crawler+database+search is a project that is larger than any one person could build from scratch.
I'm still not seeing why this is impossible.
> It would only be feasible to build on top of existing libraries, which is the whole point here.
I don't think there are existing libraries for most of this stuff. Most off-the-shelf stuff in the search space is for indexing smaller corpora; it generally isn't built to scale up to corpus sizes in the billions (e.g. using 32-bit keys, etc.)
No, it doesn't require that. If you take all public web text, it is just something like 40 trillion characters, which could fit on a single hard disk uncompressed, including the suffix tree structure.
It is a joke based on needing to define "from scratch". Similar to: "How do you bake a cake from scratch? First you must create the universe." OP is gathering the raw ingredients to start fabricating his chips.
Back when online learn-to-code courses like Codecademy and Udemy were a fad, I remember that one of them (and unfortunately I don't remember which, and Google, ironically, turns up nothing) taught how to build a search engine in Python from scratch as a first project for complete beginners. I thought it had a reasonable level of complexity for this task.
You can still find search-engine-from-scratch courses on Udemy, complete with all the necessary algorithms [1].
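For a sense of scale, the core those beginner courses build is roughly this: a toy inverted index with AND queries answered by intersecting posting sets. This is my own sketch, not taken from any specific course:

```python
from collections import defaultdict

class TinyIndex:
    """A toy search engine core: term -> set of doc IDs."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = []

    def add(self, text):
        doc_id = len(self.docs)
        self.docs.append(text)
        for term in text.lower().split():
            self.postings[term].add(doc_id)
        return doc_id

    def search(self, query):
        """Return IDs of documents containing every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self.postings[t] for t in terms))
        return sorted(hits)

idx = TinyIndex()
idx.add("the quick brown fox")
idx.add("the lazy dog")
idx.add("quick brown dogs and foxes")
print(idx.search("quick brown"))  # → [0, 2]
```

Everything that makes a real engine hard (ranking, stemming, crawling politely, sharding, compressing the postings) is deliberately absent, which is exactly why it works as a beginner project.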
While NIH ultimately depends on multiple different aspects, I would argue that, ironically, your approach is more likely to create an NIH-qualifying product.
Because when you write things from scratch, you actually have the space to innovate. But when building a product from preexisting puzzle pieces, there is much less room to make any useful changes that would make your product actually stand out from existing alternatives (which is what NIH is all about).
but a text editor, a space invaders clone, a tiny BASIC compiler, or a mini operating system is an afternoon project. a spreadsheet or a video game console emulator is maybe comparable to implementing datalog
>> The difference with a web search engine is that nobody today can build such a thing completely from scratch
Really? I think there are a couple out there. The main issue today is scale, but you could constrain that by limiting your crawler to a fixed set of sites.
I enjoy reading your discussion and just wanted to add that some people do write big-scale software from scratch nowadays - for instance Marginalia for web search, and Andreas Kling and team for an operating system and web browser
Marginalia, which I'm a big fan of, is not "written from scratch" (it would be stupid to do so). Check the project on GitHub, it has lots of third-party dependencies.
I definitely lean more toward NIH than what's conventionally considered wise, but most of the time it's not NIH for the sake of NIH.
I do pull a lot of libraries, but an enormous amount of what the search engine does is very much built from scratch. The libraries generally deal with parsing common formats, compression, serialization, and various service glue like dependency injection. I think the number of explicit dependencies is a bit inflated by the choice not to use a framework like Spring Boot, which pulls many of the same dependencies (or equivalent ones) implicitly.
What makes the search engine a search engine, the indexing software (all the way down to database primitives like btrees etc.), a large chunk of the language processing, and so forth; that's all bespoke. I think it needs to be. A lot of existing code just doesn't scale, or has too many customizations that would add unnecessary glue and complexity to my own code.
I'm going to echo SerenityOS Andreas and suggest that it's a skill like any other. If you shy away from building custom solutions to hard problems, you will never be good at it; and it will become a self-fulfilling prophecy that these NIH solutions are too hard to build.
At the same time, there's a time and a place and you should indeed be judicious as to when to roll your own solutions, but maybe that time and place is exactly in a hobby project like the ones suggested in this thread (and is how my search engine started out; a place to dick around with difficult problems).
I'd also add that being able to tackle problems yourself, rather than needing a library to do all the heavy lifting at all times, is a great enabler. Sometimes there is no adequate library, but that doesn't mean the conclusion has to be "welp, I guess we can't do that yet..."
Well, if I’m going to build a web search engine from scratch don’t I first need to write a compiler from scratch? Which means I first need to write an editor from scratch…
I encountered some push back on this in my "code editor from the ground up" post[1]. I think the only reasonable definition of from scratch is:
Does not have domain-specific dependencies.
So a code editor based on ACE or CodeMirror would not be from scratch, obviously, but one that involves writing all of the domain-specific logic would be. Using generic libraries doesn't stop something being from scratch. (In my case Tree-sitter is arguably domain-specific, but an early version did use a hand-coded JavaScript tokeniser in its place.)
> It's much more a matter of whether you want to do something small scale and fun, or whether you want to suck all the joy out of it by applying the same soul crushing constraints we already get paid to do in our day jobs. Bleh.
Amen. And further, what better prepares a programmer to assess the relative costs of implementing a thing vs using a library providing that thing than having attempted an implementation?
Learning by doing is a valid approach, and this can even be called fun.
This sounds like sensible advice. But there are a few problems with making software purely from ready-made building blocks. Here are two of them:
1) Much more often than not, the ready-made building blocks are crap. Your software will reflect that crappiness, and your life will consist not of writing software, but of maintaining and massaging crap.
2) Even if your ready-made building blocks are high-quality, they limit what kind of software you can write. Ideally, you imagine what you want your software to do, and then you write software to do it. You create the building blocks in that process. If you are only using ready-made building blocks, there is a lot of great software you just cannot write. And I think we all see that, because there isn't much great software around.
These problems might not be a concern for you. If you write software that is pretty much just more of the same, 2) does not apply to you. And while 1) is still problematic, it will be just manageable because you use these building blocks mostly on their happy path.
But I have bad news for you. This kind of software will soon be written not by you, but by an AI.
> If you are only using ready-made building blocks, there is a lot of great software you just cannot write. And I think we all see that, because there isn't much great software around.
I claim that the opposite is true: The reason there are so few software projects that are really great is because high-level skills, not low-level ones, are in short supply. Every CS graduate can implement Paxos. The number of people with the experience needed to take an implementation of Paxos, and many other components, and put them together into something that works really well is much, much smaller.
> But I have bad news for you. This kind of software will soon be written not by you, but by an AI.
AI is actually much better at writing low-level code than high-level code. If you tell it to implement a piece table or a Fibonacci heap, it will do so without problems, in any language. If you tell it to create a simple CRUD application using existing libraries, it will output something that doesn't actually work.
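For reference, a piece table of the kind mentioned above really is a small, well-defined exercise. Here is a minimal sketch (my own, insertion only, no deletes): the original buffer stays immutable, insertions go into an append-only "add" buffer, and the document is a list of pieces referencing the two buffers.

```python
class PieceTable:
    """Minimal piece table: pieces are (buffer_name, start, length)."""

    def __init__(self, text):
        self.original = text
        self.add = ""
        self.pieces = [("orig", 0, len(text))] if text else []

    def _buffer(self, name):
        return self.original if name == "orig" else self.add

    def insert(self, pos, text):
        # Append the new text to the add buffer; the buffers never mutate.
        new_piece = ("add", len(self.add), len(text))
        self.add += text
        out, offset, placed = [], 0, False
        for buf, start, length in self.pieces:
            if not placed and offset <= pos <= offset + length:
                # Split the piece that spans the insertion point.
                split = pos - offset
                if split:
                    out.append((buf, start, split))
                out.append(new_piece)
                if length - split:
                    out.append((buf, start + split, length - split))
                placed = True
            else:
                out.append((buf, start, length))
            offset += length
        if not placed:  # insertion at/after the end of an empty table
            out.append(new_piece)
        self.pieces = out

    def text(self):
        return "".join(self._buffer(b)[s:s + n] for b, s, n in self.pieces)

pt = PieceTable("hello world")
pt.insert(5, ",")
pt.insert(12, "!")
print(pt.text())  # → hello, world!
```

Deletion, undo, and line indexing are where the real design work starts, but the core idea fits in a screenful.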
I guess it all depends on your definition of low-level. High-level can also be low-level, just with different building blocks, if the building blocks are based on well-specified abstractions.
I am actually currently writing a novel text editor, and I can guarantee you, my requirements mean that an AI cannot help me with that directly, because the underlying mechanisms I need don't exist yet. The AI is still helpful though when researching existing mechanisms, and to pinpoint why they don't work for me.
> AI is actually much better at writing low-level code than high-level code.
> If you tell it to create a simple CRUD application using existing libraries, it will output something that doesn't actually work.
If your low-level code is just a copy or adaptation of something existing, then yes, AI is really good at that. Something novel though, AI cannot help you much here.
On the other hand, simple CRUD applications will all be done by AI very soon. Probably not using existing libraries though, because these are crap.
But there is certainly a role for humans currently in high-level design, I agree with you here. But this high-level design crucially relies on your ability for low-level design, with the ability to imagine and then implement (possibly with the help of AI) the building blocks you need, not just reuse the ones that are there.
There is a world of difference between "can be done" and "can be done well". Existing building blocks are typically good for the former but often not the latter, or you have to exercise a good amount of judgement when piecing things together.
No AI needed. I use a custom code generator to generate 80% to 90% of the code I need in CRUD applications. And I can actually trust the code unlike code written by an AI.
If you feel like telling us about your novel approach to text editing, I'd like to hear it (or I'll just wait for the announcement post. That's cool too).
You’ve just shifted the goalposts twice here. I never learned about Paxos, but I did learn about consensus algorithms. Second, learning about something and implementing something are completely different. Most CS graduates have an idea of how syntax parsing works. How many can implement a parser? What about a syntax highlighter? Most graduates have an idea of how an OS works. How many can build an OS?
I guess if you’re always using libraries though, you may mistakenly be thinking that these libraries are just doing trivial work. Once you dive in and try to implement the low level stuff, then you realize how big of a disconnect there is between a fuzzy idea in your head and the lines of code that constitute that idea in reality.
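To make that concrete: even the smallest "real" parser, say a recursive-descent evaluator for arithmetic, forces choices (tokenising, precedence, recursion) that the fuzzy "I know how parsing works" mental model glosses over. A minimal sketch:

```python
import re

def tokenize(src):
    # Numbers, operators, and parentheses; whitespace is skipped.
    return re.findall(r"\d+|[-+*/()]", src)

def parse(src):
    tokens = tokenize(src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():       # expr := term (('+'|'-') term)*
        value = term()
        while peek() in ("+", "-"):
            value = value + term() if eat() == "+" else value - term()
        return value

    def term():       # term := factor (('*'|'/') factor)*
        value = factor()
        while peek() in ("*", "/"):
            value = value * factor() if eat() == "*" else value / factor()
        return value

    def factor():     # factor := NUMBER | '(' expr ')'
        if peek() == "(":
            eat()
            value = expr()
            eat()     # consume the closing ')'
            return value
        return int(eat())

    return expr()

print(parse("2 + 3 * (4 - 1)"))  # → 11
```

The grammar here is tiny and there is no error handling at all; a production parser adds both, but the recursive structure is the same.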
I have a Comp Eng degree from the 90s, been in the industry since then (~25 years), and I have never heard of Paxos. Even the top results from a Google search yield that Paxos is one of: 1. a blockchain company, or 2. a Greek island. More than halfway down the page there is a mention of an algorithm of some kind. Probably not a good example of "baseline knowledge" for a software engineer.
This is like telling a student learning an instrument never to practice scales or études because there are other skills necessary for playing in an orchestra. Those other skills are important, but you have to develop your chops at some point, and being a better a programmer will certainly help with basically all those skills you listed anyway.
Nowadays (and increasingly, going forward) it's possible to be a very productive programmer without knowing much low-level stuff. This may seem unfair to those who spent years wrestling with assembly and then C pointers, but that's just today's reality.
It's not possible to be a "productive" musician on a traditional instrument (i.e. excluding iPads) without knowing how to play scales, chords, etc.
Hard disagree. Those “productive” programmers are a nightmare, their code is a barely decipherable jumble of randomly gluing function calls together until they kinda sorta do something that approximates working some of the time. They cause an unfathomable amount of damage, lost productivity due to bugs and data nightmares, lost data, and headaches.
They do make idiot managers happy though, because look at all those features and scrum points they “finished!” That’s why armies of them will always be there in the software ecosystem, blithely and ignorantly a net drain on whatever unlucky company is currently employing them, busily making a mess for more diligent programmers to clean up for the rest of eternity. It’s called “job security.”
According to what evidence? I have worked with developers who later ended up working for Google and they were OK but not great. They didn’t stand out in any way.
> That’s demands for faster delivery of software and ridiculous demands on user experience.
Absolutely not. It's about bloat. Many times people "just want to code" or eschew optimizing, repeating the mantra that "premature optimization is the root of all evil", and you often end up with bloated software.
They can't read music, but they can still make their hands move in repeatable patterns without thinking about each finger position, which I think is roughly the equivalent of implementing quicksort.
As someone who has played guitar on and off for years as a hobby, I'm shocked by how much harder it is than programming.
I can pick up almost any random codebase on GitHub, and as long as they use libraries and don't touch low level stuff, I can probably start working with it in 10 minutes to a day at most.
Then, I could leave that project for a year and be almost as productive as when I left. So much of the action is on the screen, not in the head, and there's no muscle memory required.
If any instrument was as easy as coding... I think a lot of coders might have music careers instead....
> This is like telling a student learning an instrument never to practice scales or études
No, it's like recognizing that an orchestra conductor doesn't need to play every instrument (or even any instrument!) in order to make music.
Or alternatively, that a violin player doesn't need to understand the physics of acoustic dispersion in order to be the best in the world at playing the violin.
Yes a software engineer understanding data structures and runtime complexity is akin to a violin player understanding the physics of acoustic dispersion.
So here’s the thing… I did an EE/CS dual degree. I’m going to focus on the EE side of it for a bit.
We learn the high level practical behaviour of devices: resistors, capacitors, BJT and MOSFET transistors, opamps, logic gates, etc in 2nd year. What we learn there is enough to be an effective practical circuit designer.
We also dive deep into how that stuff works under the hood. One course I remember fondly started the first lecture with “This course is called semiconductor physics. To understand semiconductor physics you need to understand solid state physics. To understand solid state physics, you need to understand quantum mechanics. We have a lot of material to cover so let’s get started.” A similar course was Electric and Magnetic Fields, where we spent a ton of time applying vector calculus to situations of varying complexity to get a better understanding of how the things we learned in 2nd year actually work and some of the limitations that you wouldn’t pick up on from the practical (generally first-order) models.
You don’t get to graduate with an EE degree without learning this stuff. And in my mind, that stuff is like a mixture of data structures, runtime complexity, assembler, and cache coherency. Fundamental understanding that while you won’t use directly in your day-to-day work but underpins everything you do.
The job market for EEs... still isn't as hot as for CS people. However, you could also focus on DSPs in EE and use that as a head start on machine learning. This assumes you are ready to specialize, however. Anyways, none of it should go to waste.
I’m quite a ways into my career and being able to dance up and down the “full stack” from designing circuit boards, to writing low level firmware, to backends in Python/Elixir/Java/etc, to frontends in JS/ObjC/Kotlin/Swift/Dart has been a lot of fun. I did the dual EE/CS degree because the “computer engineering” program wasn’t introduced until a few years after I graduated.
That’s all kind of missing the point though. This whole thread has been a discussion about whether Software Engineers need to get down into the nitty gritty low-level stuff. My take is absolutely yes. You might not use it directly every day, but you’ll make consistently better design decisions if you really feel it in your bones. Over on the electrical engineering side of the world you don’t get to graduate without that background understanding. In the software engineering world some people seem to take pride in their ignorance of the foundations they’re building in; in my 20-ish years of experience, those people often end up building things that work poorly and they don’t have the background understanding needed to fix it.
> an orchestra conductor doesn't need to play [...] any instrument! in order to make music
Ok, this is clearly a side-topic AND at the risk of being pedantic: Is this actually true?
Like, I can see how theoretically one could learn to sight-read music well enough to be able to direct an orchestra of individual musicians, then do enough ear-training to identify enough notes to keep tabs on everyone (especially if you're conducting a high school or (god help you :) ) middle school orchestra), etc, etc.
But does anyone actually do that? Has anyone ever done that?
"I'm going to learn how to conduct an orchestra without learning any instruments" kinda feels like "I'm going to become a software engineer using an LLM instead of learning any foundations"
(To be clear - the LLM path might be viable in the future, possibly the near future, but at least today it's not quite there. My apologies in advance if my analogy doesn't work in the future :) )
No, it's not true, there are no prominent conductors that cannot play an instrument (or sing at a high level).
A conductor should have a deep understanding of music, theory, and rehearsal pedagogy. At least in current western schools of music I don't see how you would explore these topics outside of the study of an instrument. Maybe there is some esoteric path to this; I just can't imagine how you go to Berklee and begin to explore the nuances of a composition without ever engaging with it as an instrumentalist.
On top of that, the conductor isn't just engaged at performances; they're also responsible for leading rehearsals. If they've never deliberately practiced learning music from an instrumentalist's perspective, I think they would be very hard-pressed to structure strategies for getting the larger group to a high level.
I guess it depends what you mean by "play an instrument" - almost all proficient conductors have proficiency in at least one orchestral instrument.
There is at least one proficient conductor (Leopold Stokowski) who had no real proficiency with any instrument, so I'm speculating he's not the only one, though it is likely to be rare. He did have some very rudimentary piano training, and then pretty much taught himself conducting. Whether that rudimentary piano ability counts as "playing an instrument" is debatable.
I was intrigued and looked into this. According to Wikipedia:
"He studied at the Royal College of Music, where he first enrolled in 1896 at the age of thirteen, making him one of the youngest students to do so. In his later life in the United States, Stokowski would perform six of the nine symphonies composed by his fellow organ student Ralph Vaughan Williams. Stokowski sang in the choir of the St Marylebone Parish Church, and later he became the assistant organist to Sir Walford Davies at The Temple Church. By age 16, Stokowski was elected to membership of the Royal College of Organists. In 1900, he formed the choir of St. Mary's Church, Charing Cross Road, where he trained the choirboys and played the organ. In 1902, he was appointed the organist and choir director of St. James's Church, Piccadilly. He also attended The Queen's College, Oxford, where he earned a Bachelor of Music degree in 1903" [1]
That sounds like a bit more than rudimentary piano training to me :)
The skills of conducting do not require any instrument. Many conductors tell someone how to play their part despite not knowing how to play it themselves. However, it is hard to imagine anyone learning music theory outside the context of learning an instrument.
> it's like recognizing that an orchestra conductor doesn't need to play (...) any instrument
i heard this before and it seemed dubious to me
on investigating, it turned out that there were literally zero famous orchestra conductors who don't know how to play any instruments. most of them know how to play numerous instruments and do so in their spare time, though there were one or two who had stopped. from my experience with musicians i think it's likely that they could play any conventional orchestra instrument after a short exploration period, even if they don't have documented histories of playing it
i suspect that this is true of non-famous orchestra conductors as well, but i could only find public information about the famous ones
of course that doesn't prove anything about programming, which is not the same skill as playing a musical instrument or conducting an orchestra, but it does show that you're constructing your arguments without much concern for veracity
the fact that in https://news.ycombinator.com/item?id=38769008 you claimed that web search engines are not built by people who work on string matching or distributed database transaction consistency further calls your credibility into question
I'd rather put the violin player with understanding of acoustic dispersion in the same bracket as a software engineer who understands the effects of magnetic interference on a memory controller.
Speaking for myself, software engineering is something I do because I have bills to pay. Programming is something I started doing because it’s fun. The projects in this article are the types of projects I would do for fun if I had the time.
>Speaking for myself, software engineering is something I do because I have bills to pay. Programming is something I started doing because it’s fun.
Same for me. While at work I strive to do stuff fast, in a way that is easy to maintain, modify, and extend, to see the big picture, take the right kind of shortcuts, and understand business and customers, that is not where the fun is.
I enjoy the time when I am alone with the PC, far from user stories and scrums.
I love to program, to convince the PC to do the trick I tell it to do. What I enjoy also is CS, algorithms, data structures, abstract thinking.
> Software engineering is something I do because I have bills to pay. Programming is something I started doing because it’s fun.
Sudden clarity.
I feel like printing it and putting it next to my workstation, so that I remember this both when doing something fun and when having to work so that my bills get paid.
Also, I'd add that doing HoTT or other such "high theory" stuff actually enhances my understanding of the fundamentals and gives me the machinery to handle complexity systematically, which building a mere web search engine would not. I already have a job for building such commodity stuff every day.
I like watching Tsoding in my spare time and I love what he calls "recreational programming". That is what we need imho, more recreational programming.
> In that spirit, I offer the following alternative challenge: Create a web search engine.
Yea, no thanks.
If I'm going to work on something in my spare time, I am going to work on something that I am interested in and find fun. I generally loathe the dirty work often involved in software engineering and debugging all of these dependencies all the time. It's exhausting. Doing things from scratch puts me in charge.
And I mean, creating a search engine is the ultimate "not invented here" project. I'd just pay for and use Google, Bing, or whatever else API to do things for me. So it doesn't even make sense in the context of your objections.
> And I mean, creating a search engine is the ultimate "not invented here" project. I'd just pay for and use Google, Bing, or whatever else API to do things for me. So it doesn't even make sense in the context of your objections.
Except that none of the established search engines actually does full-text search (anymore). There is no way to prevent those engines from breaking apart and rewriting your query as they see fit. No matter how many quotation marks you put around it, and even if you enable "Verbatim" mode or whatever.
So if you just implement the option to do that (which is actually the default if you leverage a standard FTS library), you already have something that AFAIK you can't get anywhere else. Then you can work on kicking the blogspam out of your index, and before long you will have a search engine of real value for yourself (and possibly others).
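The verbatim matching described above, which a standard FTS library gives you by default, can be sketched with a positional inverted index: record where each term occurs in each document, then require consecutive positions for a phrase to match. This is a minimal illustration (class and method names are my own, not any particular library's API):

```python
# Minimal positional inverted index supporting exact phrase queries --
# the kind of verbatim matching the comment describes.
from collections import defaultdict

class PhraseIndex:
    def __init__(self):
        # term -> doc_id -> sorted list of word positions
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id, text):
        for pos, term in enumerate(text.lower().split()):
            self.postings[term][doc_id].append(pos)

    def phrase_search(self, phrase):
        terms = phrase.lower().split()
        if not terms:
            return set()
        # Candidate docs must contain every term.
        docs = set(self.postings[terms[0]])
        for t in terms[1:]:
            docs &= set(self.postings[t])
        hits = set()
        for d in docs:
            # The phrase matches only if the terms occur at consecutive positions.
            for p in self.postings[terms[0]][d]:
                if all(p + i in self.postings[t][d] for i, t in enumerate(terms)):
                    hits.add(d)
                    break
        return hits
```

Unlike a big search engine's query rewriting, nothing here will ever "helpfully" drop or reorder your terms: `phrase_search("quick brown")` matches only documents where those words appear adjacently, in that order.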
Or, you can build a crappy clone of a text editor from the 70s.
There are a few things in these comments that are causing noise in my mind.
>might make you a better programmer, it won't make you a better software engineer
I think it is a short-sighted definition of SE, but in any case, there is no mention anywhere in the article claiming that this was aimed at software engineers.
Also, although I appreciate the concerns of "Not Invented Here" doctrine, we also have the other side of the spectrum, where we don't know how to do a left-pad anymore [1], and introduce dependencies everywhere.
The third point is that I don't think that having more knowledge of how things work contributes to the "Not Invented Here" mindset. I would argue it is the opposite. When engineers are hungry to learn, they find excuses to address that appetite at work, but once they know the tradeoffs and the amount of effort it takes to make things work, they will think twice before starting anything from scratch.
If it is not "invented here", then in the limit it is not invented anywhere, because nobody knows how to build these things anymore; we have been on that path, more or less.
Perhaps you were visualizing that guy at your job who creates his own libraries and obscure systems within the system before telling anyone; that is generally a bad peer.
But the universe is much bigger and more diverse than that. People who have programmed their own things, but do a Google search first and use other software because they know it's more feature-complete, better known, trusted, and more likely to be supported in the future, are the best to work with.
We are all using the products of not invented here syndrome right now, don’t forget that.
If you want to learn how things scale across a team and last for years, read or contribute to open source code.
It takes years for a single person to get a project to the point where it's a good learning ground for scaling and maintenance.
Gluing a few libraries together is real software engineering but unless you're really invested in the outcome it's not that engaging and it's not that educational.
> Gluing a few libraries together is real software engineering but unless you're really invested in the outcome it's not that engaging and it's not that educational.
That's only true if complex systems don't interest you.
Personally, I have always found the experience of "putting the pieces together" and orchestrating highly diverse systems into a coherent whole to be much more educational than learning about algorithmic details. I also generally find making things work well more interesting than making things work.
So if I'm understanding your argument correctly, if you enjoy it, it's what everybody else should do. The other points are just post-hoc justification.
>While writing a text editor, a compiler, an operating system, or a raytracer might make you a better programmer, it won't make you a better software engineer.
To me learning the engineering part is easiest. And also not terribly valuable if you don't know CS fundamentals.
I don't understand why you believe low-level "intrusive" programming and general software engineering are mutually exclusive. There is a big, open field for high quality, low-level software to be written well bearing good practices and practical decisions.
1. To be fair, these kinds of "hacking" projects aren't technically meant for learning software engineering in the context of a job, but they are really interesting and a treasure trove of new knowledge for anyone willing to put in the effort. I don't think it's a bad thing. A person can learn to be a good software developer and simultaneously have deep knowledge of the systems they are working with.
2. The goal of this project is not to research and implement some novel algorithm to replace existing works. The goal is to learn how things work. You are still going to use libraries, procedures and algorithms designed by other people and compile them into a cohesive system, it is merely the depth and complexity of the problem that makes it an interesting learning challenge. It's not about some absurd NIH philosophy, it's about curiosity.
3. Software development is not as straightforward as you make it seem. It's not just about best practices, management and finding an optimal way to combine existing libraries into your project. Sometimes you run into unforeseen issues, strange niches that your libraries may not account for. Sometimes you need to dive into the specifics of these libraries, understand at a lower level why they do not function for your use case. You might need to "hack" these libraries. Fork and customise them to truly fit in your project and produce the desired results. And the only thing that will help you then is deep knowledge and the ability to work with low level software.
>I don't understand why you believe low-level "intrusive" programming and general software engineering are mutually exclusive.
Hmm, while I don't fully agree with his comment,
your comment reminded me how much I hate discussing programming languages and programming ecosystems with C people.
It feels like C/low-level people often refer to some "standard" that they treat as the bible of programming languages, even when nobody mentions C, and they treat GCC/Clang as the perfect reference for compilers.
So while high-level devs don't understand low-level stuff, I believe low-level programmers aren't creative at a very abstract level (very abstract system modeling) because they're too used to modeling everything with integers in C.
And they're so used to being hurt by the C ecosystem that they believe it is "normal".
High level people (web developers) spend a lot of time arguing about their buzzwords like microservices, ddd, oop vs fp, etc.
Honestly, that is fair. My perspective on this is as a high-level developer, looking into how the things I am using daily actually work. I haven't actually interacted with any proper low-level developer, kind of hard to come across them where I am tbh, so I don't have any clue about how snobbish they can be, but I have seen a fair share of high level devs do it too, as you mentioned. I kind of get it? It's like a way to get validation on the technology you are stuck with for the rest of your working lives, I'm guilty of pushing Flutter onto people despite having had a painful experience with it (state management is a nightmare).
But hey, ultimately I just think people should satisfy their curiosities. My work is almost always in Python, JS, or Java, but at some point I got tired of just doing regular software development stuff, and lately I've happened to gain an interest in low-level development. (Like any definitely sane person, I chose to start with Rust and decided my first ever Rust project should be an emulator, so maybe I'm just dead inside and I don't know it yet.)
There is a need for both "solving complex problems alone" and "learning how to work as a team developing a complex project within a timeline". The Software Engineering part is best learnt in industry under a mentor.
The tooth extraction skill is fundamental if you are a dentist. You learn it in school. Knowing how to market and sell that skill is valuable but you can learn it while working.
CS I learned in school, engineering I learned while working.
Who do you think puts systems like a web search engine together?
It's certainly not people who are lost in details like string matching or how ACID is implemented in distributed databases.
Of course you need all these things in order for the search engine to work – but if you write them yourself, even if you think about them excessively, you will never finish.
Software engineering is knowing that you don't need to know everything, you only need to know where to find everything. "Programming" is an important part of it, but far from the only one.
> Who do you think puts systems like a web search engine together?
i've met a good fraction of the first 128 google employees, and this could not be farther from the truth:
> It's certainly not people who are lost in details like string matching or how ACID is implemented in distributed databases.
the people who thought those things were unimportant details were the ones who tried to compete against google and failed, like lycos, inktomi, and pets.com
like, check out the wikipedia article on udi manber, who is best known for spending 15 years 'lost' in string matching:
> In 2002, he joined Amazon.com, where he became "chief algorithms officer" and a vice president. (...) In 2004, Google promoted sponsored listings for its own recruiting whenever someone searched for his name on Google's search engine. In 2006, he was hired by Google as one of their vice presidents of engineering. (...) In October 2010, he was responsible for all the search products at Google.
BTW at Yahoo in the 2000s I learned that Manber had invented at least one new trade-secret algorithm while he was there. The one I ran across wasn't for string matching but it was a fairly basic kind of thing.
I find your comment quite ironic considering the engineers that built Google search did it by developing novel software systems, to great success. The Google codebase is basically the definition of NIH syndrome.
> Who do you think puts systems like a web search engine together?
Literally by stitching together people you’ve listed here and making them work together:
> It's certainly not people who are lost in details like string matching or how ACID is implemented in distributed databases.
> Of course you need all these things in order for the search engine to work – but if you write them yourself, even if you think about them excessively, you will never finish.
Being able to write something from scratch is not the same as always writing from scratch. Let me turn this around: who would you trust with writing a search engine, a software engineer who has worked on and written a search engine from scratch, or someone who only knows about it theoretically?
> Software engineering is knowing that you don't need to know everything, you only need to know where to find everything. "Programming" is an important part of it, but far from the only one.
I hear this excuse more and more when it comes to threads like this. You don’t need to defend your life position, it’s okay to not know everything.
I believe this more and more as I go through my career. There is usually one person carrying progress hard. If you can get three or four of those people and another one to coordinate them you can do truly amazing things. Usually though you just need one unencumbered by bureaucracy.
To expand: the software industry moves fast, and you have to keep learning to stay afloat, or risk stagnating and ending your career early. Look at where software was 5 or 10 or even 20 years ago. There was no React and no Rust 20 years ago, nor was there even git! Whatever you know now is going to be out of date in a matter of years. Good luck getting a job if that's all you ever want to learn.
The good news is that the field of computer programming is huge - so there is room for all sorts of preferences, skills, and abilities.
And sure, use of the word "engineer" will cause angst among some - because it used to have a precise meaning which has been diluted (and in the case of software, ignored.)
Indeed the list of projects runs the gamut of where a programmer might go in life. They might do user UI, or games, or work on an OS or compiler. More likely they'll work at some random company doing databases and reports. Or they'll be Web focused doing lots of JavaScript. Or they'll be deeply involved in the field of Advertising (Advertising Engineer anyone?)
Of course wherever you end up, you can still have some fun. And if you're just starting out then it can help you understand your desires, and limitations.
So don't worry too much about the "engineer" word. It's pretty irrelevant. Even the title of the thread is "programmer" not "engineer".
Not ignored! It's precisely because software engineering discards the very parts of programming that are closest to real engineering that this term came to describe it. "Don't bother learning the fundamentals, but focus on bureaucratic ritual instead" is a message that requires excellent marketing.
Maybe the *term* "software engineer" is not an established thing.
But these points raised by GP... are basically self-evident.
> - Deciding what to write yourself and what to take from a library
> - Identifying high-quality libraries and frameworks that meet your project needs
> - Deciding where optimization is worth the effort and where it is not
> - Writing code that will still be readable to you (and others) a few years from now
> - Thinking about the project as a large-scale, complex system with software and non-software dependencies
... But since they don't have an immediate effect (along the lines of "look, I made this shiny thing myself in a weekend!"), it's hard to brag about them, and perhaps even hard to establish that your skills in the areas above contributed to the software's/business's success. But you can always brag about how you implemented a Fibonacci heap or something and made your heap operations 0.2ms slower.
I'm working on a hobby OS. It's mostly just a fun project, but there's certainly a lot of deciding what to write and what to use, and identifying quality libraries when you want to use something. You can start from nothing and build your own board and bring it up with no software you didn't write or you can use a commercially available board, bios or uefi boot, use an existing boot loader, use an existing libc, etc etc, and explore the specific thing you're interested in, while still earning a lot of systems knowledge.
Deciding when to optimize isn't too hard when nobody is ever going to use it (only optimize run time if something really takes too long, otherwise don't do anything tremendously stupid, but optimize for less development time because this is taking forever)
Readable code would be nice, because sometimes it takes a long time to get back to it, but I've also designed in a real reason to come back and retouch everything (x86 -> amd64; aarch64 as a stretch) and maybe I'll make things reasonable then... Although it's not my strongest skill.
Even most simple OSes become complex. Maybe the dependency chain isn't too long though.
I figured that was the point here. It's nice to follow tutorials for a new language or framework or whatnot, but to really stress and test what you absorbed, you want some non-trivial project with minimal dependencies. The game section exemplifies this best:
>The idea here is to implement a well-defined game from start to finish without getting bogged down on the other fun stuff (e.g., game design and art). Also, it is best if you use a barebones 2D graphics library (e.g., SDL, SFML, PyGame), not a big game engine that'll hide all of the interesting bits from you.
This kind of game won't go on sale, nor even be a portfolio piece for a non-junior engineer, but it's a good enough idea for someone like me who, say, really wanted to solidify their Rust skills and discover where my own and the language's weaknesses are.
After I can get that done, I may feel confident enough to move on to work on a toy renderer based on what I figured out in the game. And then later contributing to a proper engine that already has a lot of groundwork figured out. I'd just be getting in the way if I jumped straight into trying to throw anything but the simplest PRs into Bevy.
I think those who deal with OS kernels, compilers, etc. on a daily basis also understand those points. They use various tools to make code review, time management, etc. easier.
Not all hackers are overly obsessed with the most efficient Big O and various low level (sorry pun intended) details.
Most of the time, big O is not relevant if you're not writing a new algorithm. For the usual CRUD app? Not so much. Heck, even for more involved work, you often won't touch it, because your solution will be called by, or will call, some external tools.
The good news is that, while still valuable, you won't have to lose time creating a new broken wheel. The bad news is that this part of CompSci is now mostly fundamental research.
And then you get needlessly inefficient CRUD apps that take minutes or even hours to generate a simple report because they're doing accidentally quadratic or even cubic work somewhere.
This is just wrong. Understanding the difference between O(1), O(n) etc is essential for literally everyone who writes code. Every single programmer is better with this understanding than without it. You should know the complexity of the code you write - and most of the time that doesn't even require actively thinking about it. If you know the basics of complexity analysis it's just intuitive.
Sure, you don't strictly need it to write working code. But you will very quickly run into situations where you're writing unnecessarily slow code because you don't know what you're doing.
Understanding algorithm costs can be extremely important when building a simple CRUD app because it will help you reason how things will scale with the size of the data you’ll see in the real world.
It’s easy to make things that are accidentally quadratic but are fine until your big customer has 10 times the data and they suddenly aren’t fine anymore.
Doesn’t mean you need to optimise everything, but it doesn’t mean you shouldn’t think about this stuff.
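The "accidentally quadratic" pattern mentioned above usually looks harmless. A hypothetical report that matches each order to its customer record illustrates it: the nested-loop version rescans the customer list for every order, while a one-time dict index makes it linear (the data shape here is invented for illustration):

```python
# Hypothetical report: attach a customer name to each order.

def report_quadratic(customers, orders):
    rows = []
    for order in orders:                      # O(n) orders ...
        for cust in customers:                # ... times an O(m) scan = O(n*m)
            if cust["id"] == order["customer_id"]:
                rows.append((cust["name"], order["total"]))
                break
    return rows

def report_linear(customers, orders):
    by_id = {c["id"]: c for c in customers}   # O(m) one-time index
    return [(by_id[o["customer_id"]]["name"], o["total"]) for o in orders]
```

Both produce identical output; the difference only shows up when that big customer arrives with 10x the data, which is exactly why the quadratic version survives code review.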
> In fact, it might make you worse at software engineering, because it embodies the disastrous "Not Invented Here" doctrine
I get your point against NIH, but I don't see how doing such a project on your own in your free time could possibly make you a worse programmer. If anything, you would likely learn exactly what it takes to go down the NIH path so strictly, and be able to make better choices.
Going from O(n) to O(1) on an operation where n is 10 could be a performance downgrade. That doesn't mean it's useless, it just means there's more to it than the big O.
Asymptotic complexity is about how things scale up. You should care about it when you're working with things that scale. Doesn't mean it's useless if your data is static, just means you need to understand when to go for the high scaling solution and when not to.
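The small-n point above is easy to check yourself: a linear scan over ten elements can keep up with (or beat) a hash lookup, because hashing carries constant overhead that big-O ignores. A quick sketch, with the caveat that the timeit numbers vary by machine and are illustrative only:

```python
# Constant factors at small n: both containers answer the same question,
# but big-O only predicts the trend as n grows, not the winner at n = 10.
import timeit

small_list = list(range(10))
small_set = set(small_list)

list_time = timeit.timeit(lambda: 7 in small_list, number=100_000)  # O(n) scan
set_time = timeit.timeit(lambda: 7 in small_set, number=100_000)    # O(1) hash

# Same answer either way; which is faster at this size depends on the
# machine and the hash cost, not on the asymptotic class.
assert (7 in small_list) == (7 in small_set) == True
```

Rerun it with `range(10_000_000)` and the set wins decisively; that crossover is the whole content of the asymptotic claim.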
① you're writing a real-time control program such as a motor controller and you care about not just the asymptotic complexity of your algorithms but their worst-case execution time, in microseconds; or
② you should just do the computation by hand with pencil and paper instead of writing and debugging a program to do it for you
at the point that the input data becomes too big for ② to be an appealing option, n is at least 100, which means you probably care a lot about whether your program is O(n) or O(n⁴), at least if you wrote it in something slow like python
maybe the time to start worrying about it is after you get the program debugged and running for the first time
It can be useful in general, but it is hard to provide the type of tolerances expressed in physical units you’d expect in an engineered solution when all the constants are ignored.
I'm just curious about what you're talking about. Tolerances and physical units are not usually mentioned in relation to time/space complexity. I'm open to the idea that you may know something I don't, feel free to enlighten me.
Actually reimplementing something that’s already exists is a great way to get an understanding of it precisely so you can deal with the questions you mention.
My rule of thumb is that you should always understand at least one level of abstraction down.
There is a lot of disagreement with this idea in the thread, but I really like this project suggestion as a multi-faceted project that'll help someone "grow up", and it resonates with something I've long felt was missing from comp sci education: it doesn't focus enough on studying and replicating big, successful projects.
You mostly learn by doing, true, but you learn by copying what the masters did. Software engineering education feels more like taking classes and writing code for exercises, whereas it probably should have more projects of the form "study this complicated-implementation-of-X and use those techniques in your implementation of Y", where x = google search, postgresql, whatevers and y is a complicated project [for one person or a small team] like your own single concurrent user search engine
I've held this idea for a really long time, in fact, I've not actually looked at comp sci education recently, so maybe this isn't true anymore, or maybe it never has been
It really depends on your goals. Why would someone want to become a better software engineer? To me, the things you listed aren't fun, so I wouldn't do them in my spare time unless I see the benefit in it.
If we are talking making more money at a job, I'd say that for most people I worked with and me included there are more important things to concentrate on. Like, soft skills, communicating clearly, understanding politics at work.
Myself, I'm okay with learning things you listed from more senior colleagues, on the job, and just having fun in my spare time.
You're making a strawman here, based on a very much false dichotomy. The things you list here are all very important.
But something you learn from writing a larger piece of software yourself is that you get to appreciate complexity and "bulk" (for lack of a better term) that you yourself created. You get to ask yourself questions like:
Is all this code needed? (Why is it 400 lines to do X?!)
Why did this get so involved/ complicated?!
How do I progress (eat the elephant)? Can I (teach myself to) make small iterative units of progress every day, for instance?
These kinds of katas are very very useful for a 5-7 year beginner (which I use as a label of respect - the longer we can remain beginners, the longer we learn with an open attitude!)
> might make you a better programmer, it won't make you a better software engineer
This is the distinction between solving "local" problems and solving problems at scale.
But I think there is a transitional phase where requirements (gathering, documenting), unit testing, UI design (human or programmatical) are essential. Learning these as nuts-and-bolts in any language is essential to becoming skilled in the craft.
one wonders whether you've created any of these yourself. having completed two of the items on the list, i can confirm that it makes you better, whichever way you'd like to measure it. as a google engineer told me once: operating systems and compilers are where all the good ideas are. study them! the only thing that beats that is probably writing them yourself, even if toy versions.
I agree that these are all valuable things for a software engineer to learn, but in my experience, it's valuable to at least take a crack at the projects listed in the post. I wasn't under any illusion that I was going to build anything amazing, and it gave me a deeper appreciation of why you don't try to build everything in-house.
It’s simple really: write libraries and frameworks yourself and use them for real projects. It will give you a great understanding of what makes a quality library/framework and make you a better programmer.
Are you perchance working in academia? Your suggestion seems to assume unlimited time and resources to write everything yourself, while the purpose of using third-party components usually is to save time and resources.
Think about it as a hobby. Most people have time for hobbies. The rest spend way too much time on social media or watching YouTube. Manage your time better.
Thank you for the advice which is no doubt well-intentioned.
But don't kid yourself. If you use your spare time for research and development to solve problems at work, then it is not really a hobby, it is just you working overtime for free. If that makes you happy, good for you, but it doesn't change the fact that time and resources are limited. The time is just cheaper in your case.
Isn't this the main point of the editor wars? Programmers built tools for enhancing their programming abilities, but then argued about which tool is the best one...
I wish. As you can plainly see in this thread, most "real programmers" consider such things beneath them, and the scarcity of relevant resources is a natural consequence.
>> Writing code that will still be readable to you (and others) a few years from now
This will be a wonderful feature for copilots, to allow for the human to provide a verbal narrative of the logic/process or even the environment and of others in the room commenting on why a such-and-such is being done, and the copilot elegantly documenting the narrative.
Wait until we have "Thick Code", where an app comes with an AI-generated mini-documentary series on the creation of the piece (as an optional function of your enterprise Copilot Agent, with the [Security Assistant], [Compliance Assistant], [Legal Assistant], [Media Assistant], and whatever Alignment Agents are required to inform the narrative of the project).
EDIT: @Tao3300 - I do that all the time. You just need to be a better MeatPT and infer "like, my, opinion, man"
Honestly this just sounds like me when I first encountered coding puzzles like project euler / leetcode and interview challenge practice problems. I made up excuses like this because I simply didn't want to spend time on lower level problems and instead wanted to feel good sticking to my high level application comfort zone.
I eventually learned to like solving lower level problems and I'm much better off for it. And these days I can tell people who haven't paid their dues because the only solutions they ever come up with are slow quadratic ones no matter the occasion.
No, building your own text editor, compiler, OS, ray tracer is not going to make you a worse engineer nor keep you from learning how to google and evaluate libraries or think critically, nor are those things reserved for an engineer who isn't building those things.
I strongly disagree. Software engineering is simply the next higher abstraction level above programming. Just like programming is the level of abstraction above writing machine code, and writing machine code is the level above hardware manipulation.
Almost all "programming" problems have been solved. Today's software is laughably inefficient not because we need better implementations of data structures, but because putting all those existing pieces together properly is a neglected discipline.
The best place to see this is in Machine Learning. The vast majority of ML/AI software that is published (which often implements revolutionary techniques!) is of incredibly poor quality from a software engineering standpoint. As a result, practical AI applications are orders of magnitude slower than they could be.
Not to split hairs but writing machine code is programming. I'm not sure what you're thinking about as "hardware manipulation". Programming is just one (and a very important one) part of what it takes to be a "software engineer" (or software developer, or whatever you want to call having skills to deliver, or be part of a team that delivers, major software projects).
"The vast majority of ML/AI software that is published (which often implements revolutionary techniques!) is of incredibly poor quality from a software engineering standpoint. As a result, practical AI applications are orders of magnitude slower than they could be."
That's quite the claim. Why should anyone take your word for it?
There are literally performance breakthroughs being made every few weeks (most recently, PowerInfer, which can speed up LLM inference by 10x or more) where the main improvement amounts to the application of caching/preloading techniques.
If the "engineering" part of ML had kept up with the "science" part, I have no doubt that performance for typical use cases would be 100x-1000x higher than we are seeing today.
> If the "engineering" part of ML had kept up with the "science" part, I have no doubt that performance for typical use cases would be 100x-1000x higher than we are seeing today.
>Almost all "programming" problems have been solved. Today's software is laughably inefficient not because we need better implementations of data structures, but because putting all those existing pieces together properly is a neglected discipline.
YES!
But you learn that by writing low-level code, not high-level.
Writing a text editor, or even a compiler, you don't spend much time on algorithm optimization. You spend it on designing good abstractions. You are given basic tools and have to come up with a good architecture on your own. And because you are forced to build more levels yourself than with pre-made libraries (which create the structure for you), you will see your mistakes much faster.
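A concrete example of the kind of abstraction a text editor forces on you is the gap buffer: the classic data structure that makes repeated insertion and deletion at the cursor cheap. A minimal sketch (the class and method names are my own, not any particular editor's):

```python
# Minimal gap buffer: characters left of the cursor in one stack,
# characters right of it in another (stored reversed), so edits at
# the cursor are O(1) amortized.
class GapBuffer:
    def __init__(self, text=""):
        self.before = list(text)  # chars left of the cursor
        self.after = []           # chars right of the cursor, reversed

    def move_to(self, pos):
        # Shuttle characters across the gap until the cursor is at pos.
        while len(self.before) > pos:
            self.after.append(self.before.pop())
        while len(self.before) < pos and self.after:
            self.before.append(self.after.pop())

    def insert(self, s):
        self.before.extend(s)     # insert at the cursor

    def delete(self, n=1):
        # Delete up to n characters to the right of the cursor.
        for _ in range(min(n, len(self.after))):
            self.after.pop()

    def text(self):
        return "".join(self.before) + "".join(reversed(self.after))
```

The interesting lesson isn't the structure itself but the interface decision: once the rest of the editor talks only to `move_to`/`insert`/`delete`, you can later swap in a rope or piece table without touching the callers.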
I genuinely love your sentiment. We have way too much faux asceticism and way too little self-mockery in this godforsaken industry that pays us all way too much money. Programming is wizardry. But being a professional one (a software engineer) is a ridiculous occupation we're coerced into. We all know this, otherwise Silicon Valley wouldn't have been a universally cathartic show.
But GP is completely correct that all of the fun, magical projects in the OP article are antithetical to the profession.
For a more UI / Web based slant, I would recommend these (beyond a Spreadsheet - which is incredibly helpful for understanding data flow systems):
* Simple video game using Unity or Unreal (understanding perf limitations in a game where 30-60 fps is critical helps make performant interfaces elsewhere - even on Web)
* A simple Javascript framework similar to React (again, will help understand data flow & handling events)
* An http library wrapper around XMLHttpRequest (fetch exists now, but understanding how to send & read HTTP requests from scratch will help in debugging any problems down the line like CORS issues, OPTIONS requests, etc.).
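To illustrate that last point, here's a hedged sketch of issuing an HTTP GET "from scratch" over a raw socket (in Python rather than browser JS, since XMLHttpRequest hides exactly this layer; the host name and function names are just illustrations):

```python
# Minimal HTTP GET over a raw socket, to see what a request and
# response actually look like on the wire.
import socket

def http_get(host: str, path: str = "/", port: int = 80) -> bytes:
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Connection: close\r\n"
        f"\r\n"
    ).encode("ascii")
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request)
        chunks = []
        while data := sock.recv(4096):
            chunks.append(data)
    return b"".join(chunks)

def parse_response(raw: bytes):
    """Split a raw response into status line, header dict, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return status_line, headers, body
```

Once you've parsed a response by hand like this, things like CORS preflights stop being magic: they're just extra request/response pairs with particular headers.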
Maybe Godot would be a better alternative to Unity and Unreal for this? Because, quite aside from not having to wallow in proprietary licensing, you can find out what's going on.
We application developers rely on many OS features: memory management, the filesystem, etc. I'm sure eventually we'll ask "how are such things done behind the scenes?"
That's why I tinker with xv6 (https://github.com/mit-pdos/xv6-public) in my spare time. Learning various process scheduling algorithms from a textbook is one thing. Implementing them is another. I learn a lot. And it's definitely fun, even though there's almost zero chance the knowledge gained is relevant for my job (I'm a mobile app dev).
Doing some 2D game dev without an engine was the most humbling experience that showed me just how ridiculously fast computers have become. You do things that you swear are far too slow to work, and yet it works even on a Game Boy because it’ll do a million operations in a second and a million is apparently a big number.
“Wait… so I’m looking up and drawing every sprite every time? There’s no clever diffing happening to know which ones to change? There’s no second buffer to reference the previous frame and steal work from?!” And then you find the much more complex games that do start using these tricks and it’s ridiculous.
well, there WERE some very interesting tricks back in the old days, on both the application and hardware side, to help out with these exact problems (for one thing, this is exactly how a SpriteAtlas as a data structure is utilized: why waste time uploading a dozen sprites that usually render in a predictable way when you can upload one big image and sample parts of it?).
But in terms of today, yes. You can certainly brute force render 99% of anything from 15+ years ago on modest commercial hardware (i.e. not even gaming PCs) by drawing and re-drawing a scene.
I got my start doing 2D games on the TRS-80 Color Computer and it really gave me a good foundational understanding of computers. Because it was just bare hardware and assembly, you had to do everything manually: graphics, sound, input, memory management, etc.
I think an ideal intro to programming would be an environment where a new programmer could handle all of these things to create a 2D game, but in a simplified way, so they could get the idea without too much excess baggage just to make it work.
Not sure if you’re joking, but in a way this feels very true to me. After a decade or so of doing OOP, I started doing a lot of 6502 assembly (for a home brew computer) and not applying OOP principles felt dirty at first. Same as not having 100% (or any for that matter) test coverage. I felt like I was cheating by doing things a simpler way. But then I started to really enjoy it and it felt liberating.
I would really love to try my hand at something much more physical, like a robot; or a drone with an autopilot; maybe an accurate simulation of the flight dynamics of a spaceplane with programmable GNC parameters?
I have a copy of "Fundamentals of Astrodynamics" by Bate, Mueller et al and I would love to do something with it this holiday season.
I say simulation because all the rest of the stuff seems to cost a lot. I am really interested in the GNC aspect of robots, and would love some good pointers on the topic!
https://ksp-kos.github.io/KOS/ Kerbal Space Program might not simulate aerodynamics quite to the realism level you prefer, but a lot of people seem to enjoy writing GNC software for it.
I bet a self-balancing two-wheeled robot would be a fun (and relatively safe) project. I mean a Segway-like thing that can just stand in one place without falling over or yeeting itself off the table.
You'd need a microcontroller, an IMU, a stepper motor controller, a motor, some LEGO wheels, and maybe a block of wood.
I haven't tried this, so my guesses are probably way off.
Sounds like a good project to me (we did this for a couple of labs in an undergraduate [sophomore maybe junior year] control systems class). The "inverted pendulum" problem and its subsequent derivations should be a good model for this. When the object is nearly upright (or at small angles from the vertical axis) a nice linear control loop should suffice but if you leave that region the control solution becomes more difficult (would make a good target for improvement after initially getting it going).
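As a sketch of that linear control loop: a discrete PD controller stabilizing the linearized inverted pendulum. All the constants (gains, g/l ratio, timestep) are illustrative values I made up, not tuned for any real hardware:

```python
# Linearized inverted pendulum: theta'' = (g/l)*theta + u, where u is
# the control torque. A PD law u = -kp*theta - kd*omega keeps it upright
# near the vertical; outside the small-angle region this model breaks down.
def simulate(kp=30.0, kd=8.0, dt=0.01, steps=500):
    g_over_l = 9.81 / 0.5        # gravity over pendulum length (made up)
    theta, omega = 0.2, 0.0      # start tilted ~11 degrees, at rest
    for _ in range(steps):
        u = -kp * theta - kd * omega     # PD control torque
        alpha = g_over_l * theta + u     # angular acceleration
        omega += alpha * dt              # semi-implicit Euler step
        theta += omega * dt
    return theta                         # final tilt angle (radians)
```

With these gains the closed-loop dynamics are theta'' = -(kp - g/l)·theta - kd·omega, which has two negative real eigenvalues, so the tilt decays to essentially zero within the 5 simulated seconds.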
I feel the same way - almost all my personal projects have a hardware component. After spending all day on software I very much appreciate working towards a physical object.
I think a little toy raytracer is another great thing to try out. Just something that outputs bitmap graphics of spheres and does diffuse and specular reflection, couple light sources. Should be a relatively self-limited project if you don't go too crazy with it.
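The self-limited core of such a toy ray tracer is ray-sphere intersection plus a diffuse (Lambertian) shading term. A hedged sketch, with vectors as plain tuples and all names my own:

```python
import math

def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def norm(a):
    l = math.sqrt(dot(a, a))
    return (a[0] / l, a[1] / l, a[2] / l)

def hit_sphere(origin, direction, center, radius):
    """Nearest positive t where the ray origin + t*direction hits the sphere,
    or None. Assumes direction is unit length (so the quadratic's a == 1)."""
    oc = sub(origin, center)
    b = 2.0 * dot(oc, direction)
    c = dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0:
        return None
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 0 else None

def diffuse_shade(point, center, light_pos):
    """Lambertian term: surface normal dotted with direction to the light."""
    n = norm(sub(point, center))
    l = norm(sub(light_pos, point))
    return max(0.0, dot(n, l))
```

From there, the rest of the weekend version is a loop over pixels generating camera rays, a specular term, and writing the results out as a PPM file.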
I went to a university with a great computer science & video games major, and while I didn’t finish out all the credits, I took a couple of the videogame classes; holy shit, you will never learn more about how to abuse geometry and data structures than trying to render graphics.
Hey, putting a ray tracer and a web browser in the same category of "more challenging projects" seems a bit weird. A ray tracer is a weekend project. A web browser is a multiple man-years project, unless you use a third-party HTML+CSS engine and a third-party JS engine.
depends on the ray tracer. You can make "a" ray tracer in a weekend. You can spend months making a mid-scoped tracer if you decide to pick up PBRT (which is online for completely free now!): https://pbr-book.org/4ed/contents
If you want to upgrade your "weekend" ray tracer to a 2-4 week project, I'd suggest:
1. have it take input a scene/model file (FBX is the industry standard but also a pain in the butt because Autodesk. I'd suggest looking at gltf or blender scenes). This may or may not mean supporting triangles/quads if you only focused on spheres and planes.
2. texture support. Which sounds easy, and then you enter the wonderful world of sampling. You can dive as shallow or as deep as you want there.
Try using USD instead [0]. It's an open scene description from Pixar that is well documented and seeing adoption across the board - including apps like Blender.
My point is that a barebones ray tracer (spheres, planes, metal vs plastic) is much, much simpler than a barebones web browser (a minimal HTML parser + a minimal CSS parser and cascading + a minimal JS implementation + a minimal layout engine + a minimal renderer).
A text-only browser without scripting support, like lynx, could be a more reasonable project, but still larger than a basic ray tracer, or even a ray tracer + a basic material system.
It's a great list. It would be good to keep it as a living list so we can help it evolve over time.
I'd add two ideas:
Build a lightweight memcached
- it requires a mixture of basic algorithms and good system design practices
- it's a "simple" problem, but it requires thinking it through from multiple angles - invalidation, concurrency, memory management...
Build your own Docker
- it's not as complex as it might sound
- it gives you an opportunity to understand the basics of OS programming, which most engineers nowadays don't really use in their day-to-day jobs
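The core of the memcached idea fits in a short sketch: an LRU cache with per-key TTL, built here on OrderedDict (the class and method names are my own). Real memcached layers slab allocation, concurrency, and a network protocol on top of this:

```python
import time
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()   # key -> (value, expires_at); insertion order = recency

    def set(self, key, value, ttl: float = 60.0):
        self._data.pop(key, None)                          # re-insert to mark as newest
        self._data[key] = (value, time.monotonic() + ttl)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)                 # evict least recently used

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() > expires_at:                  # lazy invalidation on read
            del self._data[key]
            return None
        self._data.move_to_end(key)                        # mark as recently used
        return value
```

Even this toy version forces you to face the invalidation question (expire lazily on read vs. sweep proactively) - exactly the kind of multi-angle thinking the project is good for.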
> The biggest challenge is figuring out how to store the text document in memory. My first thought was to use an array, but that has horrible performance if the user inserts text anywhere other than the end of the document.
I guess this is only an issue in low level languages, as I just used a JavaScript string and I don't think it's been a perf issue in 2+ years using my editor full time. Plenty of other things have been, of course - rendering long horizontal lines comes to mind, as my approach to optimisation assumed rendering a single line would be cheap and expanding the logic to render partial lines would add another whole dimension of complexity.
(By "long" I mean an entire minified file, for example.)
And here I was thinking the author must have been using a higher-level language. I remembered reading this blog[0] where the author stated that the worst-case scenario, an insertion at the beginning of a 2 MB file, only took 2 ms on an old laptop while using an array as the data structure. I believe the author was using Zig here.
A text editor can use an array as its data structure. You only need it to be fast during typing, where you are only changing one line. And for entering new lines, the extra latency of rebuilding the array after pressing Enter is not noticeable even for several million lines.
The more challenging part of a text editor is making sure you only render what the user sees.
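That array-of-lines approach can be sketched in a few lines (the class and method names are my own): typing edits a single string, while Enter splices the list, which stays fast because the element shift is effectively one memmove under the hood:

```python
class LineBuffer:
    def __init__(self, text: str = ""):
        self.lines = text.split("\n")

    def insert(self, row: int, col: int, s: str):
        """Typing: rebuild a single line (cost independent of file size)."""
        line = self.lines[row]
        self.lines[row] = line[:col] + s + line[col:]

    def newline(self, row: int, col: int):
        """Enter: split one line in two; the list shifts everything after it."""
        line = self.lines[row]
        self.lines[row:row + 1] = [line[:col], line[col:]]

    def text(self) -> str:
        return "\n".join(self.lines)
```

Fancier structures (ropes, piece tables, gap buffers) only start to pay off once you need cheap undo, huge files, or heavy multi-cursor editing.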
And the most challenging part is making a text editor that actually works well from a usability perspective.
Which has almost nothing to do with data structures and optimization. I've used dozens of text editors. I have never once thought "man, that thing is slow". But I have thought "man, that thing is a bug-ridden, unintuitive piece of garbage" many, many times.
That will not be because of an array data structure for the lines.
When properly done, plugins/extensions run in a separate process that does not hang the main UI thread. It helps if these processes have low priority. Also, ionice is good when processing many documents.
Another flaw many extensions/language servers have is the lack of an abort-controller implementation. This means that debouncing doesn't work, so if there are many document updates, the server hangs.
I'm sure then you'd be perfectly content if your otherwise perfect car were replaced with one identical in all ways except that the acceleration was 1/4 of the previous rate.
Try re-indenting a large xml file, or something else that will result in a very large number of small deletions and insertions throughout a file. Even on a modern computer the underlying data structures will make the difference between something near instantaneous and the user giving up in despair after a few minutes. A simple array does not cut it.
Formatting a file could be a single operation that creates a single array of lines. Normally the formatter is a separate program that actually gets the input as string and returns a new string as output. So the extra concatenation and splitting would not cost much extra.
The formatting is often difficult enough that it will need a separate data structure anyways.
the original Eclipse was so godawful slow, this was back in the days where you would install Textpad after installing Java because it would pick up your classpath during installation and configure it for you. Java 1.4 I think.
Total long shot. But does anyone have any good side projects that would center around simulating fluid dynamics? I’ve always been interested in aerodynamics and I’ve wanted to see if there is a way to learn more about it with my programming skills.
I recently went this route. I didn’t want to set up or use Unity so I wrote my own 2D fluid simulator based on some of the same papers using Metal compute shaders (though I’d love to try again using webgpu). Sebastian’s video is great and the implementation is good. But this was a great (and fun) opportunity to look for ways to improve on it.
For starters, the way he’s doing the spatial lookup has poor cache performance, each neighbor lookup is another scattered read. Instead of rearranging an array of indices when doing the sort, just rearrange the particle values themselves. That way you're doing sequential reads for each grid cell you look for neighbors in, instead of a series of scattered reads. The performance improvement I got was about 2x, which was pretty impressive for such a simple change.
The sorting algorithm used isn’t the fastest, counting sort had much better performance for me and was simpler for me to conceptualize. It involves doing a prefix sum though, which is easy to do sequentially on the CPU but more of a challenge if you want to try keeping it on the GPU. "Fast Fixed-Radius Nearest Neighbors: Interactive Million-Particle Fluids", by Hoetzlein et al [0].
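That counting-sort reordering can be sketched like this (a toy sequential version; the function name is my own, and a GPU version would replace the loop-based prefix sum with a parallel scan):

```python
# Bucket particles by grid cell, prefix-sum the counts to get each
# cell's start offset, then scatter the particle values themselves into
# cell order so neighbor lookups read contiguous memory.
def sort_by_cell(particles, cell_of, num_cells):
    counts = [0] * (num_cells + 1)
    for p in particles:                    # pass 1: count particles per cell
        counts[cell_of(p) + 1] += 1
    for i in range(1, num_cells + 1):      # exclusive prefix sum -> start offsets
        counts[i] += counts[i - 1]
    out = [None] * len(particles)
    cursor = counts[:num_cells]            # running write position per cell
    for p in particles:                    # pass 2: scatter into cell order
        c = cell_of(p)
        out[cursor[c]] = p
        cursor[c] += 1
    return out, counts   # out[counts[c]:counts[c+1]] are cell c's particles
```

The payoff is exactly the cache behavior described above: iterating a cell's neighbors becomes a handful of contiguous slice reads instead of a pile of scattered index chases.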
Or, if you want to keep using bitonic sort, you can take advantage of threadgroup memory to act as a sort of workspace during bitonic merge steps that are working on small enough chunks of memory. The threadgroup memory is located on the GPU die, so it has better read/write performance.
I ended up converting his pure SPH implementation to use PBF ("Position Based Fluids", Macklin et al, [1]), which is still SPH-based but maintains constant density using a density constraint solver instead of a pressure force. It seems to squeeze more stability out of each “iteration” (for SPH that’s breaking up a single frame into multiple substeps, but with PBF you can also run more iterations of the constraint solver). It’s also a whole lot less “bouncy”. One note: I had to multiply the position updates by a stiffness factor (about 0.1 in my case) to get stability, the paper doesn’t talk about this so maybe I’m doing something wrong.
The PBF paper talks about doing vorticity confinement. It’s implemented exactly as stated in the paper but I struggled for a bit to realize I could still do this in 2D. You just have to recognize that while the first cross product produces the signed magnitude of a vector pointing out of the screen, the second cross product will produce a 2D vector in the same plane as the screen. So there’s no funny business in 2D like I had originally thought. Though, you can skip vorticity confinement, the changes aren't very significant.
There’s a better (maybe a bit more expensive) method of doing surface tension/avoiding particle clustering. It behaves a lot more like fluids in real life do and avoids the “tendril-y” behavior he mentions in the video. "Versatile surface tension and adhesion for SPH fluids" by Akinci et al [2].
One of the comments on Sebastian's video mentions that doing density kernel corrections using Shepard interpolation should improve the fluid surface. I searched and found this method in a bunch of papers, including "Consistent Shepard Interpolation for SPH-Based Fluid Animation" by Reinhardt et al, [3] (I never implemented the full solution that paper proposes, though). There's kernel corrections, and then there's kernel gradient corrections, which I never got working. With the kernel corrections alone, the surface of the fluid seems to "bunch up" less when it moves, and it was pretty simple to implement. Otherwise, the surface looks a bit like a slinky or crinkling paper with particles being pushed out from the surface boundary.
I found [0] and [1] on my own but I found [2] through a thesis, "Real-time Interactive Simulation of Diverse Particle-Based Fluids" by Niall Tessier-Lavigne [4]. I also use the 2nd order integration step formula from that paper. It has some other excellent ideas that are worth trying.
Many years ago I used a paper (that is in fact one referenced by Sebastian’s video) and some C sample code I found to write an SPH simulator in OpenCL. I had been wanting to write one again but this time get a real understanding of the underlying mathematics now that I have some more tools under my belt. I owe it to Sebastian that I finally started on my implementation and I understand SPH a lot more now.
If you're interested to learn more about aerodynamics I would highly suggest learning a bit of classical aerodynamics. It will not be software oriented, since most of the theory deals with approximating very complicated behavior with simple analytical models.
It could be interesting to do a comparison with finite volume methods to see when/how those approximations break down.
Totally newbie question - 'approximating very complicated behavior' - this seems like a perfect problem for ML to me. Is this something that's used or explored ?
It's absolutely being explored. There is a lot of active research into using ML to learn solutions of PDEs (Navier-Stokes in this case). It's not my field so I don't know much about the specifics.
The works that I've read train an NN on numerical solutions for different geometries and boundary conditions. Then they try to infer the solutions for configurations outside the training set, which should be much faster than recomputing the numerical solution.
First, be aware of FluidX3D [0] [1] - which is awesome for CFD simulations, OSS, etc...
Here is the premise of the CFD question:
It has long been known that eddies [2] were studied by da Vinci - he was the first to propose the eddy pump - along with how eddies work in hydrodynamics and aerodynamics.
The barnacles on the leading edge of a whale's fin are also thought to cause beneficial eddies that help the fin cut through water more efficiently, with less drag.
Dimples on a golf ball affect the airflow in tiny micro-eddies, but at extraordinary speeds - where (I surmise) a certain amount of 'cavitation' may occur in a very thin film around the ball - kind of like water tension, but with tiny eddies [4]
SO:
Create a helicopter blade with leading-edge 'barnacles' similar in shape to the acorn barnacles on whale fins, which will create eddies as the air passes over the foil and affect the flow.
Add dimples of varying shape profiles, from convex round dimples to hexagonally based ones (much easier on aircraft already built from titanium honeycomb-sandwich material).
But make the dimples morphic - able to be electrostatically "activated" (meaning they are either on or off for the simulation).
The goal is to determine whether having dimples and/or barnacles has a net positive impact on the flow and conditions of air over a foil in the helicopter blade - or the fixed wing of larger craft - or an entire fuselage dimpled like a golf ball, affecting fuel efficiency or other factors of lift or flight that could be visualized easily using something like [0]
I guess you could do something ultra-simple (no loadable programs or memory allocation, for example) but that doesn’t feel like an OS to me.
You can sort of do that with an emulator too. For example, on the GB you could do just the CPU/memory at first, skipping some rare/tough/quirky instructions. Graphics aren’t that bad - you can avoid emulating some bugs/ultra-tight timing. Controls are quite easy.
Honestly none of it is that tough as long as you skip sound. Sound is the worst part, and it’s what’s next on the plan for my Swift-based emulator.
Of course, if you wanted to keep going you could add Game Boy Color support, Game Boy Advance support, cheat codes, save states, stuff like that.
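None of what follows is real Game Boy behavior - the opcodes below are for a made-up toy machine - but the fetch-decode-execute skeleton at the heart of any CPU emulator looks roughly like this:

```python
# Toy machine: one accumulator, flat byte-addressable memory, a handful
# of instructions. A real emulator adds registers, flags, interrupts,
# and per-instruction cycle counting for timing-accurate peripherals.
def run(memory: bytearray, steps: int) -> int:
    pc, acc = 0, 0
    for _ in range(steps):
        op = memory[pc]                          # fetch
        if op == 0x00:                           # NOP
            pc += 1
        elif op == 0x01:                         # LDA imm: acc = next byte
            acc = memory[pc + 1]; pc += 2
        elif op == 0x02:                         # ADD imm: acc += next byte, 8-bit wrap
            acc = (acc + memory[pc + 1]) & 0xFF; pc += 2
        elif op == 0x03:                         # STA addr: memory[next byte] = acc
            memory[memory[pc + 1]] = acc; pc += 2
        elif op == 0xFF:                         # HLT
            break
        else:
            raise ValueError(f"unknown opcode {op:#04x} at {pc:#06x}")
    return acc
```

The hard-to-get-right part mentioned elsewhere in the thread isn't this loop - it's making each instruction consume the correct number of cycles so the emulated PPU and CPU stay in lockstep.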
A GameBoy emulator has a lot of edge cases to cover. Yes, it's straightforward to get most games running in a playable state, but there are several games (Prehistorik Man) and demos that rely on precise timing of the PPU relative to the CPU and those are notoriously hard to get right.
Having checked off most of the list, I would agree the Game Boy emulator is far easier than the OS.
I guess with the emulator it depends how much you want to do. You'd be foolish to write a console emulator and not import someone else's CPU code. When I wrote a Game Gear emulator 26 years ago I used off-the-shelf CPU code and just mapped all the in/out and did the graphics bits. Took me and a friend a single evening to get it happily playing games.
And OS is harder. Well, it was last time I wrote one, again about 26 or 27 years ago. Getting something to boot at all was tricky. I used this book as my main source of inspiration:
Let's agree to disagree. Something being done before is hardly relevant when you're using a project to learn. "Optimized to hell" I don't know about; it's a CPU emulator, there isn't that much you should be doing with it anyway. If you're doing accurate emulation speed doesn't seem like your primary factor as long as you're not implementing it in something ridiculous that will involve a GC.
Also, I don't get the point about research: It's a learning project... Doing research and learning is the actual point. Sure, maybe you slice off one part right now if you're not interested in learning how the CPU works, but to say that you'd be foolish to not use someone else's implementation seems to me to be missing the point by quite a margin.
Isn't that most of the point of making an emulator? If you are just gonna focus on the graphics you might as well just make your own game in SDL or whatever.
I did a gameboy emulator one week, when I had time from things like waiting for planes to arrive. I set the goal of playing the logo animation and sound. I achieved that goal within the time I allowed, and then I stopped. I have a good background in game development on Z80-based game systems, so what I learned had more to do with the original GameBoy than with how to write a game.
>So whenever I don't know what to build or I want to learn a new programming language or framework
While the projects mentioned have some merit when you're learning how to program, doing them once might be enough. I'm not sure how implementing a text editor for the nth time will help you learn language X.
If I am learning a new language, I try to implement what interests me at that time, from small to big. It might be a small game, it might be an AI algorithm, it might be an authentication system, it might be an ORM. Doing thing that I like or need helps.
So unless you fancy text editors, I wouldn't churn one after another.
What’s with many programmers’ obsession with writing their own text editors? In part a rhetorical question, but it still boggles me. Not that many people are obsessed with manufacturing their own hammers and nails, for example, hence why I’m asking.
Well you would normally use hammers and nails for woodworking but you wouldn't be able to use woodworking to make hammers and nails... But I guess with a text editor it's a weird recursive thing where the programming tool is itself a programming project. I think people just do it for fun and because they can. You'll probably never make something you can productionize but maybe you will learn some cool things along the way.
The same reason as all the other projects mentioned, because it teaches you many useful concepts that show up elsewhere when writing code. A carpenter likely won't learn much by manufacturing their own hammer that they can apply in their own workflow, a blacksmith on the other hand might.
It's a nice self-contained project that's relatively easy to make fast progress on, while also being challenging. Plus it's something that programmers have to use every day so we're bound to have pain points with existing options.
I've never seen an explanation of Gröbner bases that's accessible enough to the average non-mathematician developer to make that practical. Any pointers?
Writing a tool that can do large template conversions is particularly fun - like turning Jinja into Markdown. You can use this in future projects to build out documents without too much effort, if the documentation for the things you work on follows a formula.
When I was 17 I gave myself the challenge of making a profitable and useful app. The discipline and real market skills I learned made the 6-month development process worth it, even though the app ended up being scrapped.
Those are all good easy projects to do. Good intermediate-level follow-ons would be a JIT binary translator from any architecture to any other and expanding that simple OS to support SMP scheduling.
All of these are fairly easy to make in languages like Scratch but hard to make in lower-level ones. I don't know enough about coding theory and such to understand why, though.
He forgot to mention building a small database (DBMS). For me, the big 3 projects are: 1. a computer language (which I already did), 2. a small operating system, and 3. a small database (DBMS).
Another interesting and challenging project: design and implement a file system. I only assume it is interesting and challenging, having not done it myself.
I feel the profession has layers. I had a dream in which I had a seat next to Bill Gates on a commercial flight (bear with me!). I asked him how he pursued software development now that he was running the world’s largest software company, and he said, by running the world’s largest software company. Of course that wasn’t necessarily his actual opinion, more like my own brain laundering my own opinion.
Thank you for this idea - I was inspired by it years ago and wrote a delay queue in Golang. But it depends on Redis; recently I've wanted to remove Redis and write a key-value store myself. You're welcome to contribute code to it: https://github.com/raymondmars/go-delayqueue
I wish every dev would try projects and things completely unrelated to computers.
I enjoy being outside.
These headlines “projects every dev should try” really annoy me. I try to put in enough effort at work, and that’s tough. But I have been messing with computers for over 20 years now.
We're on Hacker News. Do you go to a fitness community and say "I wish every gym bro would put down a dumbbell and pick up a book"?
But if you're curious, I hike, am learning Japanese, and want to eventually clean up and pick back up my saxophone once I can get the money to get it fixed. Tech is a big but not the only part of my life.
I think it’s important that those who want a 9-to-5-only job are able to get it, and that those who want to make it their calling in life are also able to have that.
>This site selects for people who have made computers their entire life
Isn't it the other way around? The site allows anyone to post on nearly any topic, but the longest-standing audience veers tech. There are still quite a few topics on medicine, transportation, economics, and politics as well.
> Consider this an unsolicited reminder to touch grass.
Do you find yourself personally attacked by someone doing something that you don’t enjoy?
Or is it the notion of them having fun AND improving their direct skills that bothers you?