Challenging projects every programmer should try (utk.edu)
911 points by azhenley on Dec 14, 2019 | 297 comments



I think the #1 project every programmer should try is a project that has customers. There's so much involved in achieving that. It will possibly prove more difficult than any of these. It will probably lead to a different understanding of every other project that programmer then takes on, too.


Solving a programming problem is very different from solving a business problem, and building something that has customers falls more into the latter category. Lots of people want to simply improve their programming and problem-solving skills, and this list is perfect for that. Customers add a whole new dimension of complexity which most people might not want to deal with, especially with fun side projects. So setting the expectation that everything you build needs to have customers is definitely wrong.

Building stuff like a text editor, simple game, database etc. from scratch which will never be used by anyone other than yourself can be immensely challenging and satisfying.


Build an OSS driver for something popular. Or an alternative. Even just addons and plugins for popular products will get users. They don’t actually have to pay you for them to be customers, but that’s a benefit. Go where the demand is.


That kind of work is a whole lot less satisfying. Why would I want to deal with GitHub issues and fixing bugs when I could solve genuinely interesting problems?


I’d wager that many don’t find it less satisfying, and maybe even more so, if only for the reason that solving an issue for a customer comes with the benefit of immediately and clearly improving the life of someone else. Solving curiosity-sating programming challenges doesn’t have the same social implication built in. Personally, I’d muuuuch rather work on a mundane problem that immediately improves someone’s day than an issue that is in and of itself more interesting, but also lacking the clear customer/user demand.

This isn’t to say that solving neat programming challenges doesn’t improve anyone’s day, but the demand for the solution isn’t being made public and clear, which many people see as a necessary requirement for meaningful work.


> the benefit of immediately and clearly improving the life of someone else

More like the benefit of people being ungrateful for the effort you put in and then being hostile about your lack of continued motivation to solve their problems. All for absolutely zero financial gain, so the difference between that and a personal project is that you feel good for solving something, then bad because people hate you.


> Personally, I’d muuuuch rather work on a mundane problem that immediately improves someone’s day than an issue that is in and of itself more interesting

Don't get me wrong, helping people is great and I do in fact find it one of my greatest motivators, especially for things I would otherwise consider boring (or indeed, mundane).

But something that is in and of itself more interesting? If you wouldn't rather work on that, doesn't that mean it's just not interesting enough? :)

I guess everybody is different. Even though I have found that helping people is for me probably the strongest motivator there is, if there is one thing that can make me procrastinate on that, it's problems that are in and of themselves more interesting :)


I discovered over the years that I like solving business problems, but in that context programming is purely utilitarian and just a tool like I would use in another industry. But since I also love coding, I code in my free time without any constraint other than having fun and improving.

So for me the solution was simply moving away from an everyday developer job at work and more into a product management/CTO-like role.


I started off with non-tech businesses (literally house painting as a teenager) and I've got to say I like solving business problems with engineering a lot more than solving them with literal sweat and pain.


How did you start making a change? I feel like I'm in the same boat


What worked for me was transitioning to a “solution engineer” role. The name can vary but basically it’s consulting and integration to help clients use the product. Here I got an overview of the whole lifecycle of a product from a client perspective: their business problem, the need for a solution, the product fit and the sales process, then the integration challenges, user training, and hopefully a good reference and testimonial for securing the renewal and providing the marketing team with material. There was always a “tension” between what the client wanted and what the product was doing, but learning how to understand their need and the appropriate answer (technical or not), and when it really makes sense to add a feature to the product, was the key skill I developed in that role. Participating in sales meetings was invaluable for starting a business later but maybe less relevant strictly for product management.


It definitely will not be for everyone. It's a bit like how many people suggest learning to code just a little, even if not intending to pursue it as your career. It still leads to this huge new understanding of things. Similarly with programmers and customers.


From a technical standpoint, I disagree. Customers tend to get in the way of good code: they have ever-changing requirements and set deadlines and priorities that more or less force you to be messy. Experimenting with techniques, redoing things, etc. is hard when you have a customer expecting this or that feature to be implemented yesterday, and yet it is important if you want to progress.

Working with a customer will make you a better businessman, not really a better programmer.

Doing a project with customers is extremely valuable if you intend to start a business, but if you hate that and just want to be a good coder and make a living out of it, there is a solution: find a good boss/manager. They will take care of the business stuff and allow you to put all the skills you got doing toy projects to good use.

Otherwise, if you want to do something that looks more like a real-world project without the hassle of dealing with customers, make software that you really use, and share it. A good candidate would be a video game that you enjoy playing, or some tool for a hobby of yours, or a utility. That way, no one will stop you from experimenting, but you still need to think about usability.


>Customers tend to get in the way of good code

Wow, everything wrong with the tech industry right here.


They're not wrong, but neither are you. As a programmer who doesn't write for customers, I'll say, yeah it's sometimes true. But if I were in the business of writing software for customers, I would word that very differently. The issue described is a challenge in said industry, and being good at navigating that problem is a quality that separates the great from the mediocre. Describing that as "getting in the way" ignores the skill and pride in taking on that challenge.

Actually I'm not even sure. I have a little experience with freelance web development around 2007 (browser war veteran), and I found navigating between customers' expectations and "good code" to not be hard at all. I did read a lot of blogs on webdev (including the business side of it, contracts and expectation management, etc.) so I probably got some great advice somewhere[0]. Reading about it actually got me more enthusiastic about tackling the problem as well as I could. But usually the details that made it good code or not were technical details that I didn't bother them with. Or they were things I could sell: browser compatibility and accessibility (back then I could use the line "Google is your most important blind user" -- now no longer true).

Of course I am aware that as a freelancer, I enjoyed a lot of freedoms that you don't always get as a programmer-for-customers in different business environments. But if that's the case, that really proves it's not the customers that get in the way, but the institution.

[0] I kinda miss the ... blogosphere.


"The tech industry"; maybe. "Hacker culture"; no.


Not even close to everything wrong in the tech industry


(Note: I speak only for myself and others who share my personal preferences)

This comment really hit home for me. Making something good is often at odds with making something that's profitable/popular. I've often found that the most popular software is far more bloated/limited than the best software. This might be because people like me tend to favor smaller programs that are FOSS and work well with other FOSS; this view is held by a minority, and is thus a less profitable niche to cater to.

One example is mpd+ncmpcpp versus Spotify or iTunes. The former is far more well-designed, performant, featureful, and flexible; however, it will never be popular/profitable because it's FOSS targeting end-users, it runs in the terminal, it lacks CPU-intensive pretty animations, and requires users to understand that it involves a client and a daemon.

The list goes on: from IRC/Matrix versus Discord, to Linux/BSD vs macOS/Windows (an opinion that's a bit more controversial here on HN, but held by many nonetheless). I've defaulted to assuming that if a program intended to be used directly by end-users is mainstream and/or quite profitable, it's likely not for me. The few exceptions (e.g., massive web browsers like Firefox) exist only because there is no alternative I can get away with.


At the end of the day, the code is written for someone, to do something. Not to be. You gotta serve somebody, as Bob Dylan said: https://www.youtube.com/watch?v=wC10VWDTzmU

[EDIT] To elaborate: I recently thought about coding, and I realized that at some point all code "touches" reality in some way, and that point of contact is what matters most. Either it moves something physical, or provides an answer to a person, or makes them happy. So when coding, it is a good thing to ask yourself how your code "touches" reality.


Lennon's response to Dylan is just as apropos here as it was in the original context: "Serve yourself".

The distinction here is probably between being an "engineer" vs. being a "programmer". The former's concerns include all this stuff that's for the business. The latter is just interested in the craft of writing computer programs. And that's all the article was about: skill as a programmer. Which might help you be more effective as an engineer, depending on the projects you face. But the point is self-edification; not everything has to be in service of money.


> So when coding, it is a good thing to ask yourself how your code "touches" reality.

What an odd thing to say on a site named after the Y-combinator ... ;-)


I don't disagree with you, but I think you're missing the author's point. The author suggested these projects as a quick way to learn a new language or get familiar with a set of tools. Arguably, working on a more time-consuming product for customers is not the most efficient way to do this.

The author's suggested projects hone in on programming mastery, whereas your suggested project hones in on a plethora of skills. We can't really say that one substitutes the other as they achieve two different things. I wouldn't necessarily start a startup just to learn WebAssembly, but I would learn WebAssembly if I needed to for a startup. In the former case, my goal is to learn WebAssembly, whereas in the latter, it's to start a startup.


Not sure these are projects to tackle with a new language. Simple, repeatable projects like HN clones are better suited for those use cases. These are more along the lines of projects to do in your favourite language, with bonus points for using a language that isn't typical for the domain.

Like: a compiler in PHP, a spreadsheet in COBOL, or a console emulator in JavaScript.


Counter-point: if a programmer is successful in delivering business value, the programmer might have an inflated view of their programming abilities. I know I certainly have been affected by this. Being able to get things done does not necessarily make a person a great and talented programmer, but it may make them accomplished.

My projects have grossed a lot of revenue, but several of these projects would be challenging for me and push the limits of my skills.

What I am saying is that there is room for more nuanced language to describe all these matters in a way that is detached and clinical.


Counter-counter-point: Delivering business value is a more desired and compensated skill than raw programming ability.

Moreover, learning how to deliver value builds empathy for non-programmers who also deliver value and also the realization that programming ability is not where a company lives and dies. These are the number one and number two things most often lacking in programmers.

I have to fight with "seasoned pros" on the regular to get them to stop doing things like sending passwords in email or worse, put them in text files in git. Because A) what the hell are you thinking and B) holy shit, we're a public company WHAT THE HELL ARE YOU THINKING!? You have to explain to them, repeatedly, why this is bad and also things like why a production-ready database isn't the same as the single-instance point and click AMI they spun up...

All of this because the only virtue that they know is "I shouldn't be blocked by anything." Unfortunately some of them are such skilled programmers that they'll drag entire IT and GRC organizations screaming in their wake trying to make sense of the mess.


> things like sending passwords in email or worse, put them in text files in git.

This doesn't sound like the activity of someone with a high level of 'raw programming ability'.


Being a great programmer doesn't stop you from doing really stupid shit.


>Delivering business value is a more desired and compensated skill than raw programming ability

Not so much. First, great programmers are very well compensated, and second, most tech companies are organized to keep great programmers from even considering business issues: That's the land of managers, product owners, and scrum leads. Programmers are supposed to implement the requirements they're given quickly. Not how it should be maybe, but how it generally is.


That's the way it is at Google and the FAANGs in general and companies that try to ape Google's practice but that's far from most companies. This is very much a selection bias. "Most companies" don't have the budget for the roles you've listed or aren't middle-manager heavy organizationally like Google is. Most of my career I've been my own project manager.

At least in my 20 years of experience, the most effective companies have developers (sometimes in those roles you listed) involved in the requirements gathering or at least work planning phases. Does anyone really enjoy just being a cog in the "feature factory" you described? I doubt it.

Given the number of broken Google SDKs or Cloud features I have to deal with in my day to day though (and game of whackamole that we have to play with them), this seems accurate.


Isn’t that part of the problem with Google? If they focused more on the actual user they wouldn’t have five failed messaging initiatives including three that they were working on simultaneously.

Developers not caring about the customer explains most of Google’s major failures outside of advertising.


> Delivering business value is a more desired and compensated skill

Desired for who? Your employer? Programmers are human beings that can do things for their own personal enjoyment, not just to increase shareholder value.


What’s the use of “programming skills” in the abstract that doesn’t serve anyone’s needs?

“programming” is just a tool to me - not an end goal. I’m just as proud of the code that I was smart enough not to write and use an existing product/service/module for as I am for the code that I did write.


This is me. Made some money myself, made a lot for bigco's, and constantly feeling that literally anyone else is better at writing software, actual programs than I am.

Also see: imposter syndrome :-)


Author here.

I agree with you! This list was about projects for learning. A few months ago I had a blog post that was more product focused, regarding the lessons I learned from releasing games (I think they're generalizable though). Definitely a different world when you have customers to please and motivate you!

http://web.eecs.utk.edu/~azh/blog/8lessons8games.html


Or better, just forget about business customers, concentrate on problem solving skills, and build tools just for programmers.

Basically I found that the fewer customers you have and the further you are from the business side, the happier you are, assuming you are still programming, of course.

*Edit: For programmers focusing on businesses, the only happy solution, imo, is to be a consultant. But that basically requires taking on business skills. Of course this is a biased view, as I work as a BA who deals with business every minute.


That’s what we do at work all day. Business problems tend to be technically shallow, except when they aren’t, and to handle those you need to keep your skills up.


Some problems are "technically straightforward", where it is "easy" to build but "hard" to find and work with customers. To-do lists are a classic example.

Others are technically challenging. That difficulty is precisely why others don't tackle the problems, but generally customers are looking for a solution and you have a ton of flexibility when working with customers.

To put it in different terms, in some cases "if you build it, they will come" is true. The hard part is determining when it applies.


So I'm someone who codes for their own use, what I write isn't meant for customers, it's just tools that make the machine do what I need. Programming as a way of using the computer.

However I also used to do some freelance web development back in the day, and sure enough I did learn things. Mostly things about customers, not about programming. They don't know what they want, they don't know what is good and they need help with that. This is called requirements engineering, and IMVHO starting a project that has customers is the literal worst moment to start learning that. Fortunately I had a class about it in uni and I happen to be great at explaining technical things to non-technical people. I also learned that I'm actually really good at what I did (looks great + works well + happy customers), they gave loads of recommendations and I got requests for at least a year after. Too bad about the burnout that happened soon after, for no particular reason, that turned my life upside down.

Either way, yes it was educational but in no way did it make me a better programmer. Maybe a better business person? (And I'm not a very good business person)

All in all, I think I would recommend "a project that has customers" only to people who are young. The stakes are lower when you're young, and it's chock-full of wonderful learning opportunities for young people. If you're older, I think that most of the lessons you would also have gotten through general life experience. Which again demonstrates how little this exercise has to do with programming. But yeah, it can definitely be a valuable experience.


I don't have experience with a side project that has customers, but several times I did do a side project for a paying client.

It was one of the most stressful periods of my life. I took them on with another programmer, and I completely misjudged the amount and quality of work they would deliver. I solved it by working day and night for a year or two, or so.

I'd say, do try a project that has customers. Think twice about a project for a client.


Wait, what's the difference between a customer and a client? I'm not a native speaker and always thought the words meant the same thing (they have the same root in my language).

I thought "client" was just the fancier term, therefore you have customers at a book shop and clients at a law firm. Does a hairdresser or tattoo artist have customers or clients?


I'm not a native speaker either, maybe that causes the problem :D

I meant the following: customers -> group of people who buy or subscribe to your application, preferably via the App Store, Google Play or some other in-between party. Usually no contract involved.

Paying client -> a single person or business that pays you to develop software, probably with a contract that is signed and specifies what you're going to build, on which timeline, etc.


I agree with that and would add that dev tools are a manageable way for programmers early in their career to get feedback and learn what matters.

It’s pretty hard for one person to solve enough of a problem to have users in the general public. It obviously happens, but it’s not a sure thing.

It’s hard to get users that are not in the public too. I had internships early in my career working on IT stuff, at a large business and a university. I found those jobs were actually terrible for getting feedback from users because there was so much bureaucracy. The management would actually insulate you too much from complaints or feedback.

Dev tools are a good learning experience because programmers will give honest feedback if they don’t like something :)


I think the takeaway is that you shouldn't get feedback from users filtered through management. It's exactly the same problem that your "write dev tools for programmers" solution solves. You like it better because you expect you'll get the feedback from the programmers directly instead of through management. If that feedback would have to filter through someone's supervisor before it landed in your inbox, it would be about just as useless.


How many programmers don’t work for companies with customers?


The definition of a programmer is different from that of a developer.


The biggest challenge is figuring out how to store the text document in memory. My first thought was to use an array, but that has horrible performance if the user inserts text anywhere other than the end of the document.

The counter-argument to that is that processors are ridiculously fast in human timescales --- copying memory at gigabytes per second --- so unless you're focusing your use-case on editing correspondingly huge files, there's no real need to make your implementation more complicated than a single array. Even when DOS machines with <640K of memory and memcpy() speeds in the low MB/s were the norm, people edited text files of similar sizes, with editors that used a single array buffer, and for that purpose they weren't noticeably slower than ones today.
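
For concreteness, here's a minimal sketch of that single-array approach in C (made-up names, error handling omitted); every insert is just one memmove of the tail:

    #include <stdlib.h>
    #include <string.h>

    /* Whole document in one contiguous buffer; a sketch, not production code. */
    typedef struct {
        char  *data;
        size_t len;
        size_t cap;
    } Buffer;

    /* Insert one character at pos: O(len - pos) per keystroke, which is
       plenty fast for documents of a few megabytes. */
    void buffer_insert(Buffer *b, size_t pos, char c) {
        if (b->len == b->cap) {                       /* grow geometrically */
            b->cap  = b->cap ? b->cap * 2 : 4096;
            b->data = realloc(b->data, b->cap);       /* NULL check omitted */
        }
        memmove(b->data + pos + 1, b->data + pos, b->len - pos);
        b->data[pos] = c;
        b->len++;
    }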

My ideas for challenging projects are a little less open-ended, so they exercise a different set of skills: being able to implement a specification correctly and efficiently.

    - TTF renderer
    - GIF or JPEG en/decoder
    - Video decoder (start with H.261)
IMHO being able to consume existing content with code you wrote is very rewarding.


Check out: https://en.wikipedia.org/wiki/Rope_(data_structure)

>> A rope, or cord, is a data structure composed of smaller strings that is used to efficiently store and manipulate a very long string. For example, a text editing program may use a rope to represent the text being edited, so that operations such as insertion, deletion, and random access can be done efficiently.
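
To make the idea concrete, here is a minimal sketch of the indexing half of a rope in C (hand-built nodes, no balancing or concatenation): internal nodes store the length of their left subtree, so a lookup walks down the tree instead of scanning one big array.

    #include <stdio.h>

    typedef struct Rope {
        struct Rope *left, *right;   /* NULL for leaves */
        const char  *str;            /* leaf payload */
        size_t       weight;         /* leaf: its length; internal: length of left subtree */
    } Rope;

    char rope_index(const Rope *r, size_t i) {
        if (r->left == NULL)                 /* leaf: index directly */
            return r->str[i];
        if (i < r->weight)                   /* character lives in the left subtree */
            return rope_index(r->left, i);
        return rope_index(r->right, i - r->weight);
    }

    int main(void) {
        Rope hello = { NULL, NULL, "Hello, ", 7 };
        Rope world = { NULL, NULL, "world!",  6 };
        Rope root  = { &hello, &world, NULL, 7 };    /* weight = length of left leaf */
        for (size_t i = 0; i < 13; i++)
            putchar(rope_index(&root, i));           /* prints "Hello, world!" */
        putchar('\n');
        return 0;
    }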


I just read from somewhere that VSCode uses a "piece tree" for code strings. Damn couldn't find the link...



Thanks yeah that's the name


> there's no real need to make your implementation more complicated than a single array

Yeah, good luck enabling line numbers in such an editor.

In Emacs, which uses a gap-buffer for storing text, line numbers have been notoriously slow. It's gotten a bit better lately, but suffice to say, a naïve flat array / gap-buffer approach is not good enough for some relatively common scenarios even on modern hardware.


I suspect that slowness is due to something else; remember that computers these days can execute a few billion instructions per second.

I've written code to do word wrapping, and it was surprising how fast it was. Line numbers are similarly complex.


We expect modern computers to do something more than just run a full-screen text editor.


I don't think there should be a problem with line numbers. I would make two helper arrays containing the indices of the new-line characters, corresponding to the two gap buffer text arrays (new-line positions are sorted ascending for the first array, descending for the second array).

Speaking as someone who's gone all the way from implementing a Red-black tree to making a rope data structure using the RB tree, to making a text editor that can edit almost arbitrarily large text files (dozens of gigabytes) without user-perceivable latency ;-)
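
A sketch of the line-number part of that idea in C, simplified to a single sorted array of '\n' offsets (the split into two arrays mirroring the gap buffer is omitted): the 1-based line number of a cursor position is just a binary search for how many newlines come before it.

    #include <stddef.h>

    /* newlines: sorted byte offsets of every '\n' in the text; count: how many. */
    size_t line_number(const size_t *newlines, size_t count, size_t pos) {
        size_t lo = 0, hi = count;              /* find: newlines strictly before pos */
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (newlines[mid] < pos) lo = mid + 1; else hi = mid;
        }
        return lo + 1;                          /* lines are numbered from 1 */
    }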


But line numbers are trivial to update when a gap buffer needs a move.

A list of strings is more elegant, of course, where only the line being edited becomes a gap buffer. It taxes the allocator a bit more, though, which might be a concern on computers of the time when Emacs was born.


That's a problem to deal with when/if you need to add line numbers. Not a minute before that!


While that is true to an extent, I've made a lot of money cleaning up after people who didn't architect and design their code to cleanly grow into a fairly obvious potential use case, requiring major rewrites. It isn't premature optimization to avoid walling yourself into a corner.


This is a complex and nuanced topic.

I agree strongly with designing your code so it's easily changeable into whatever new features are needed. This is much easier said than done, and I don't know if anyone has written well about the tricks of that trade.

But anyway, if you have that kind of code, swapping out whatever you need to make line numbers happen is no more work later than sooner.

Code bases with features implemented that are never used, but you still have to keep working through all changes, because someone imagines it will be a real requirement someday, are what my nightmares are made of.


It's all about the interfaces. More performant solutions require (in general) more complex interfaces.

If your application has grown as long as it could with the simple implementation, and now it is all too slow, chances are there's a lot of code depending on the interface. If your interface (and the implementation) is too simplistic, then all of that code will need rearchitecting, too.


IIRC many DOS-era editors used an array/list of line buffers, which to me seems like a good middle-ground solution. Certainly for today's typical use case when you care about performance, i.e. editing a >100MB text data file by hand, which is a giant pain in Emacs because the gap buffer simply is not a good structure for doing a few simple edits across three places 10MBs apart (as you spend most of the time moving the gap around, while touching essentially all of the memory).


Do you know of any reason why the gap buffer couldn't be easily replaced by a proper rope?


As far as the Emacs API is concerned it should not be an issue, but on the other hand there is a bunch of elisp code that really expects a gap-buffer-based implementation and is in turn depended on by who knows what...


I don't edit really huge files that often (maybe a couple of times a week), but when I do I want to be able to use the same editor I use for everything else. A really great text editor is fast and flexible and powerful regardless of the size of the file you're trying to edit.


That's one issue I have with these proposed programming tasks.

You are not going to write a really great text editor as a learning exercise. It has been done by better programmers who had a better overview of the problems, over thousands of man-hours.

This automatically means the task is as useless as a Game Boy emulator or a basic compiler. The underlying "things to learn" points are good, but the tasks themselves are not.


Writing experimental text editors for fun in various programming languages has been one of the most rewarding learning exercises of my life.

It's not really clear what your point is. You say the task is "useless"—what does that mean? Personally I can say that you are categorically wrong, because the skills I gained building things that are not completely new ideas fueled my passion for programming and opened up doors for me that otherwise would have remained closed. Even if I didn't still use a lot of these projects myself (because I built them to fit me), the value I derived from them would still be significant in the "grand" scheme of my life.

If a programmer is excited about the idea of writing her own text editor, what would you suggest she build instead that will sustain that same excitement and offer exploration into the same diverse subject matter but also satisfy your nebulous criterion of not being "useless"?


> there's no real need to make your implementation more complicated than a single array.

I think you are mistaken about the concept of an array. An array has 1) an interface that is easy to use. On the other hand, by definition, an array is 2) contiguous in memory. Property 1 is good but 2 can cause problems. I think you want only 1.

The solution is to create a data type that has the interface of an array but a different implementation under the hood. You can have a linked-list of arrays, a tree of strings, etc.


I think the original commenter knows full well what an array is.

Vague justifications like "can cause problems" are probably exactly what he's referring to, in fact - people who know that inserting elements into an array is "slow" and end up writing large and complex code as a result. Yes, it's O(N) in the length of your code, but the point is that for a couple of megs of text, O(N) is perfectly acceptable.

At least on a desktop, that'll fit in L3 cache, which these days runs at around 175GB/sec. Or to put it another way, inserting that single char can probably be done around 40,000 times per second. Which is faster than I can type, at any rate.
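
If you want to sanity-check that estimate on your own machine, a crude benchmark is easy enough (hypothetical 2 MB document, one big memmove per simulated keystroke):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    int main(void) {
        size_t len = 2 * 1024 * 1024;            /* "a couple of megs of text" */
        char *buf = malloc(len + 1);
        memset(buf, 'x', len);

        clock_t start = clock();
        long inserts = 0;
        while ((double)(clock() - start) / CLOCKS_PER_SEC < 1.0) {
            memmove(buf + 1, buf, len);          /* shift the whole tail right */
            buf[0] = 'y';                        /* "insert" one char at position 0 */
            inserts++;
        }
        printf("%ld single-char inserts per second\n", inserts);
        free(buf);
        return 0;
    }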


You'd be correct if people used editors for opening only source code files. The problem is that people usually open data files too, which can be not only larger than the L3 cache, but larger than the entire system memory. The magic of a good editor like Vim is the capability to handle such files.

The other problem with your comment is support for the undo operation. Even if you use a flat array, you need a more sophisticated data structure for storing previous changes. Storing a separate array for every single change is not an option.


Whether an array is contiguous in memory depends on the language (and the specific implementation of that language). JavaScript uses hash tables for its arrays which are really objects.


Good point! Dynamic languages are different in their terminology. AFAIK strongly typed languages have a clear definition of arrays. The OP was talking about arrays having "horrible performance if the user inserts text anywhere other than the end of the document". I think this statement has an implicit assumption that arrays are contiguous which is not true in Javascript.


> Dynamic languages are different in their terminology

By different you mean wrong. PHP calling an ordered hash map an array doesn't make it one.


I wonder what a text editor made by HN would be like, everyone is already thinking up strategies :)


But did you try editing a multi megabyte file with that method? Cause I have seen enough editors struggle with big files (especially if the file is a single line and you're going through it with word-wrap on), that I think the basic straight-forward approach already isn't sufficient on such workloads.

from the article:

> Luckily, there are some nice data structures to learn to solve this.

You could have also learned a new data structure!

I mean, it should be obvious that "this thing that the cursor does when moving lines" isn't the big takeaway from this challenge. It's almost cute that the author never noticed it (as a programmer), because I actually use that behaviour to navigate code sometimes. Who hasn't done a quick arrow-left/right to make the cursor lose its memory of which column it used to be on?

> Even when DOS machines with <640K of memory and memcpy() speeds in the low MB/s were the norm, people edited text files of similar sizes, with editors that used a single array buffer, and for that purpose they weren't noticeably slower than ones today.

No way. Every reasonably performant text editor in those days used special data structures and not just an array. Imagine having to copy the entire buffer on each key press (so, when inserting at the start of the file). Believe me, on a 640K DOS machine you'll feel that.

This isn't new stuff, I learned about these data structures in uni -- except I don't remember them because back then I was young and arrogant and didn't think you'd need these fancy data structures for something as simple as an editor :) :)

... but if you never tried to write one, it's hard to see in what ways these editors are not as easy as you think.


I don't see any problem with an array. Make it huge so you only have to reallocate every megabyte or so. Keep track of the document length and only move as much as needed. Your processor can do this for every character faster than you can type. No need for fancy data structures, and it's trivial to load and save files. The interesting part then becomes formatting for the display.


Imagine inserting text in the middle of 1GB file. Moving 500MB of data will definitely take longer than 18ms, and thus will cause at least some visible lag.


This is an editor for textual documents. Where did you get a 1GB file?


For example a server log file?


Not a document and why are you editing it?


This. They should be using awk/perl on that and just operating on the chunks they found, never on the whole file at once.

But, OFC, these new "programmers" can't even figure out basic Unix tools. Or performance.


Try a 100MB file.


Takes less than 100 ms to read in the file (calling realloc in a loop), insert a byte in the middle (realloc + memmove), and write the modified file out on stdout. The byte insertion amounts to about 4ms.

That's hardly fast, yet still a lot snappier than most modern editors' UI or the web, where apparently achieving 60 fps for a few hundred dynamic DOM nodes is some kind of an achievement.

https://gist.github.com/hmkemppainen/376b973c568fc122e2d8c84...

This approach really starts to suck when you implement macros that are going to perform a lot of one-char inserts quickly. Or when you're editing multi-gigabyte files.


I must admit I was surprised, although I shouldn't be. Are we at > 10GB/s memory bandwidth now?

> This approach really starts to suck when you implement macros that are going to perform a lot of one-char inserts quickly. Or when you're editing multi-gigabyte files.

I'm working on an editor that I've optimized for such cases. In a test it made random edits to a 4GB file in < 50 microseconds. But, it cost a load of sweat and blood to get that rope data structure right. And it loads files only at about 100MB/s (should optimize for bulk inserts). https://github.com/jstimpfle/astedit


> Are we at > 10GB/s memory bandwidth now?

It's been around a decade since that line was crossed. The peak bandwidth of DDR3-1333 is just a bit over 10GB/s.


Interesting project. You don't say so specifically, but it looks like it should compile on both Windows and Linux?


Yes, I make it compile on both platforms from time to time. The current commit should compile using MSVC, gcc, and clang, I believe. I'm happy to fix any issues if you find them :-)


>> This approach really starts to suck when you implement macros that are going to perform a lot of one-char inserts quickly.

What operation is that? Search and replace might have that effect but could be done by copying the entire buffer with replacement happening along the way.


> The counter-argument to that is that processors are ridiculously fast in human timescales

Until you actually have to implement your algorithm on a mobile device that is both memory- and power-constrained and that doesn’t have a swap file. The OS will either kill your program for being too memory- or power-inefficient, kill another program running in the background (not a great user experience), and/or force the use of the high-power cores, using unnecessary battery life when a more efficient algorithm could have used the lower-power cores.

Attitudes like this also explains why developers don’t think twice about delivering battery consuming Electron apps.


Umm, the DOS 640K was paged, non-contiguous. Additionally, smoothly editing larger texts back then required some clever linked lists of blocks to give a truly instantaneous editing experience for inserting text at the beginning of a large text. Those were the 286/386 days.

Today you have fancy rendering, and an instantaneous editing experience for that reason again suggests a more sophisticated data structure for the editor.

Which all text editors have, when you look inside vi/emacs/nano/whatever...


> so unless you're focusing your use-case on editing correspondingly huge files, there's no real need to make your implementation more complicated than a single array.

I think the other major corner case is when you need concurrent, distributed editing (although that's not popular or anything these days), in which case an array is a very poor datastructure.


TIF and GIF are trivial compared to JPEG and H.261; I say that having implemented LZW and GIF (in assembly, for the 8086) as a teenager.


For TIFF (and most formats) that's heavily dependent on if you're talking about implementing a reader or a writer for the format. TIFF readers need to handle JPEG streams, so in that sense implementing a general purpose TIFF reader is a superset of implementing a general purpose JPEG reader.

On the other hand, TIFF writers can (very conveniently!) be almost as simple as you want, including no compression at all, just blobs of raw pixel values, and a smattering of tags for width, height, pixel format, and that's it. The only thing simpler to output IMO would be uncompressed ASCII formats like XPM.

So in that sense you're correct- the simplest possible JPEG writer is much more complicated than the simplest possible TIFF writer, but TIFF in general is extensible to a fault (arguably), in the sense that the number of possible combinations of pixel and metadata encodings you have to prepare yourself for when opening arbitrary .tif files are far greater than when opening arbitrary .jpg files, including JPEGs within TIFFs.


Back in the day, TIF was just a large uncompressed file.

The initial format is older than GIF87a (which had no animation, the thing people associate GIF with nowadays). It had a header, but that was pretty much it. Of course the format developed with time and even added LZW once the patent expired. Currently TIF is all kinds of things, so writing a fully featured reader is a proper challenge (perhaps not coding-wise, but understanding it and implementing the myriad of types/extensions, etc.).


Spec implementation is indeed very valuable. But in your cases these are data-oriented problems. I'd add some systemic cases: networking, security...

my 2 cents


Last I looked into writing a JPEG en/decoder, I ran into the issue that I was unable to find a specification not behind a $800 paywall.


Here you go:

https://www.w3.org/Graphics/JPEG/itu-t81.pdf

https://www.w3.org/Graphics/JPEG/jfif3.pdf

I'll also link Cristi Cuturicu's "A note about the JPEG decoding algorithm", which is where I started my decoder implementation from, and it was indeed a ton of fun.

http://www.opennet.ru/docs/formats/jpeg.txt


The JPEG reference source code is pretty readable.


"no real need to make your implementation more complicated than a single array"

That's our industry in a nutshell. Our computers, instead of becoming more capable over time, can barely keep pace with the increasing naivety of our programmers.


Yes, this is engineering in a nutshell: determining a course of action within a set of constraints that meets your objectives. Where constraints can be time, cost, physical limitations (processor speed, memory size, disk space), etc; and objectives can be functional (user can edit files), nonfunctional (user can edit large files in < X seconds, energy usage), personal learning, or any number of other requirements.

The GP offered a valid decision point to consider based upon what an engineer is solving for. I don’t think he said that an array was the solution he’d ship in a production text editor to millions of end-users.

Engineering is hardly naive. :)


Actually, that's exactly what I was saying --- plenty of existing text editors use the "stupid" single array, yet no one complains about their performance.

One example? Notepad.


Notepad is Notepad because someone, god bless their soul, had the sense to put new features into a different app, WordPad.

In some terrible, dark dimension, Notepad has a ribbon interface and supports PDFs.


Just because people tend not to edit large files in Notepad doesn't mean they'll complain about it when they do. Actual complaints are of course sparse because hardly anyone uses Notepad for anything serious if they can use an alternative. BUT when they do, oh they will complain.

I believe an older version of Notepad even had a (fairly low) limit on file size it would open.

I mean that's the reverse argument, computers have gigabytes of memory today, and are super fast, so you should be able to load a multi gigabyte text file and edit it, on a single line, with word wrapping.


In webdev, increasingly often, the expectation is that programmers not only do the backend, but also the database management, the frontend (which used to be graphic design, css/html, and js, separately), and everything devops.

Outside of webdev, Unity springs to mind, as another great example of this: The stuff you can do as a single game developer is mind boggling, or at least used to be, until indie devs everywhere started boggling our minds on a daily basis and thus raising the standard of what consumers expect an indie game to be.

This is, of course, not possible because within 50 years humans evolved to be a lot better or smarter or faster than their predecessors. It is made possible through more flexible higher level tooling, that you don't have to understand the inner workings of to take advantage of, and more abundant computing resources, that in tandem, enable work that will be in the "good enough" territory for most use cases.

This is also not a choice that programmers as individuals or even a group make. It's a choice that the market makes.

There is nothing naive about it. Naive is assuming, it would be any other way.


In web dev I have observed the opposite trend: when I first started my career everyone was expected to be full stack and know how to deploy a thing. nowadays devs tend to be strictly front end or back end or dev ops, etc. Devs that can optimize a sql query, model a db schema and then write a well organized react or angular front end seem to be the exception not the rule.


> In webdev, increasingly often, the expectation is that programmers not only do the backend, but also the database management, the frontend (which used to be graphic design, css/html, and js, separately), and everything devops.

What do you mean, increasingly often? This was the case 15 years ago already and I see only examples that it has gotten less, because of all the frameworks that exist.

Also it's exactly what I liked about webdev. When your existing talents for graphics design and explainer-of-technical-things shine in a tech context, that feels good. A lot of programmers have no feel for this, and a lot of designers write awful code. Which could have, but historically did NOT improve at all with higher level tooling, mainly because of this "good enough" attitude. Feel free to prove me otherwise, but what did happen: Thanks to things like Bootstrap, now programmers can avoid the worst design mistakes without having to learn design. Graphics Designers, however, well .. I don't know? Are there tools that allow them to write or generate code that doesn't suck? (Without programming skills, like the coders without design skills).

> This is also not a choice that programmers as individuals or even a group make. It's a choice that the market makes.

> There is nothing naive about it. Naive is assuming, it would be any other way.

I don't know ... Do you believe there no longer exist people that deliver quality over this entire skill set? Or that they somehow exist outside of the market?


People jump for complex and overly optimized solutions too quickly, IMHO. From a conceptual perspective, I enjoy these sort of challenges but that's where it ends.

For product demands where deadlines are constantly unrealistic, underfunded, underscoped and demands are ever changing, I'm a fan of providing the simplest conceptual solution to the task at hand and not focusing on developing complex abstractions and optimizations too early.

From my experience, that time is typically wasted until functionality is zeroed in and real money is available to pay for the work, as the early complex abstractions typically fail to keep pace with demands and the optimizations break when ever-changing requirements... change. That's just my experience, YMMV.


Are you suggesting that it’s bad to use the simple uncomplicated approach because it’s inefficient, or that it’s bad to add layers upon layers of complexity which end up bringing modern computers to their knees?

Personally I’m in the latter camp. There’s so many layers of abstraction nowadays which each in theory make programming better/safer/easier which in practice end up creating an incredibly inefficient mess.


Complexity != abstraction != leverage.

Today's software suffers from too many layers of complexity that are each pretty dumb and serve mostly bookkeeping. The result looks like an overinflated bureaucracy. In the example above, using a more efficient data structure for text representation will add at most one layer of abstraction (but there's a good chance you'd create that layer to hide the array anyway), but offer significant benefits in terms of performance, at a cost of little and well-isolated complexity.

This is the best kind of abstraction: complex, deep behavior hidden behind simple interface.


Same. I generally write in C without too many layers between my code and the CPU, and it is just incredible how fast modern CPUs are with naive code that doesn't even attempt to be optimal.

I wish others understood that, because the things I work on are losing performance (and a massive amount of developer time, which could be used for optimization or other useful work) to excess complexity, not too simplistic code.


Vim is 25 years old. Efficient text-handling data structures aren't newfangled gobbledygook.


And vi is even older. Plus ed...look at the release date.


Yet if you read the rest of the comment you would realize this specific use-case (editing text) was done fine with a single array buffer when computers had less than 1MB of memory to work with.

This is a perfect example of when it’s stupid to keep optimizing.


How does a 1MB computer keep a 2MB text file in an array in RAM?


Do you have text files you need to edit that are that big?

I've opened files that big in a text editor before, but it was definitely the wrong tool for the job.


pointers?


I wish that were our industry. Instead, we make things super complicated and make them slower at the same time.

Let's take the text editor example. Let's say we use it to edit a large document. Is Moby Dick large enough? It's around a megabyte of (pure) text. Let's figure out a persistence solution. How about "we save the entire text to disk"? So a megabyte to disk. My laptop's SSD does (large) writes at 2GB/s. So the ultra simple solution could save the entire text around 2000 times per second.

That's a lot faster than I can type.
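
A rough way to check that on your own machine (made-up file name, Moby-Dick-sized buffer; note this mostly measures the copy into the OS page cache unless you add an fsync):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    int main(void) {
        size_t len = 1250000;                    /* roughly the plain text of Moby Dick */
        char *doc = malloc(len);
        memset(doc, 'm', len);

        clock_t start = clock();
        FILE *f = fopen("snapshot.txt", "wb");
        fwrite(doc, 1, len, f);                  /* "save" = dump the whole buffer */
        fclose(f);
        double ms = 1000.0 * (clock() - start) / CLOCKS_PER_SEC;

        printf("one full save took %.2f ms\n", ms);
        free(doc);
        return 0;
    }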


Your laptop's SSD, sure, 2GB/s - but that 5400 rpm laptop hard disk that your user has is writing at a measly 1MB/s because the disk is also being accessed by 5 other programs.

Now the user is either queuing up a bunch of background saves leading to overload or is forced to wait 1s per keystroke.

Well done!

I guess the simple solution then is to tell the user to buy a $3000 laptop just so it's capable of running notepad.


Hmm...Mac laptops have been SSD-only for how many years?

Anyway, even laptop drives are well over 40-50 MB/s these days, and any disk scheduler worth its salt will schedule this kind of write (one contiguous chunk) near optimally, so still 40-50 writes/s.

And of course, you queue these writes asynchronously, dropping as needed, so if you actually manage to out-type your disk, all that happens is that your save-rate drops to every couple of characters. Big whoop.

Also remember that this is Moby Dick we're talking about. 700+ pages, so something that vastly exceeds the size of the kinds of documents people are likely to attempt with a Notepad class application.

Last but not least, this is a thought experiment to demonstrate just how incredibly fast today's machines are, and that if something is slow, it is almost certainly because someone did something stupid, often in the name of "optimization" that turned into pessimization, because Doing the Simplest Thing that Could Possibly Work™, i.e. brute-forcing, would have been not only simpler but significantly faster.


Raise your hand if just running your web browser has pegged your top-end, multiprocessor, high-mem system in the last month. Both Firefox and Chrome have for me.


I think that's largely due to JavaScript and its ecosystem of abstraction-bloat that has been mentioned in another comment here, along with the trend of "appifying" what should really be static sites. A static page that contains even dozens of MB of content won't stress a browser as much as a "web app" containing only a few hundred KB of countless JavaScript frameworks glued together --- despite the latter presenting a fraction of the actual content.


I used to read the blog on virtualdub.org (video capture and processing) and enjoyed his rants on bundled library bloat. VirtualDub was small in footprint and great to use. So do programmers become reliant on scaffolding too much, or is it a necessity as you learn?


Why not both? I mean, I wouldn't say that it's actually necessary, but scaffolding exists to hide away the incidental complexities of the problem being solved, revealing the problem for what it is. Demonstrations of recursion and pattern matching tend to use the same problems because they're such a good fit that there's a very close correspondence between the high-level explanation of how to solve the problem and the code itself.

At the same time we ought to be aware of that scaffolding and how it works (or could work), and how to build such abstractions ourselves. Not just because all abstractions leak[1][2] and potentially introduce bloat, but also because it means I don't have to pull in another dependency to save me a page (or three lines) of trivial code. Or maybe because the "standard" solution doesn't quite support your use case (I can't count the number of times that I've rewritten python's lru_cache[3] because of it not accepting lists and dicts).

[1] https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-a...

[2] https://www.joelonsoftware.com/2001/12/11/back-to-basics/

[3] https://docs.python.org/3/library/functools.html#functools.l...


Not sure that GBs of throughput make editor memory representations unimportant. Just this week my friend said he killed Emacs with a few-MB text file. I was astonished that software of that esteem would struggle with that.


In my first job I would routinely open 10-20 MB files in Emacs. It handled it just fine. I mean, it gives a warning that this is considered big, but I ignored it.

Now if you open a large text file in something other than text mode, it could bring it to its knees depending on the mode. As an example, opening an XML file in the nXML mode is quite expensive, because nXML mode is powerful and utilizes your XML structure. I just tried a 12 MB XML file and told it to go to the end of the file. It's taking Emacs forever to do it (easily over 30s). But if I switch to text mode for that same file, it handles it just fine.

I just tried an 800 MB text file. It handled it fine.

The one thing where you can easily get in trouble: Long lines. Emacs cannot handle long lines well. Kinda sad.


Are you sure text mode is fine? I usually have to use fundamental mode to edit big files (more than a dozen MBs or so).


Yup. Text mode is fine. If that's causing problems, perhaps you have things enabled in your config that cause problems?

As an example, I have anzu minor mode selected. So if I try to search in the 800MB file, it hangs until I cancel.


It's unlikely plain emacs struggled with that file.


Try `emacs -nw -q` for a stock experience. That should have no problem with any reasonable text file.


> there's no real need to make your implementation more complicated than a single array

Ugh. That's just offensive.


Not really, it's a commonly used scheme. Read on gap buffers: https://en.wikipedia.org/wiki/Gap_buffer

It's really an array with a gap at the cursor location. Used by emacs and others for decades.


In the article, I listed a rope, gap buffer, and piece table as potential solutions instead of "just an array".


VSCode uses a JS piece tree, which allows it to be faster* at manipulating large files than Sublime’s native code implementation.

https://code.visualstudio.com/blogs/2018/03/23/text-buffer-r...

*or at least competitive with. I’ve measured it to be faster but I’ve heard others have had different experiences


From a quick read it's actually pretty close to exactly what Sublime Text does. Source: Worked on that code at sublime recently.


But that's massively better than just an array. It changes the time complexity from O(char_inserts x file_len) to O(cursor_moves x file_len), which is likely a couple orders of magnitude better.


That's not "just an array".


How big a file can a gap buffer be used on before it starts to slow down on current hardware?


That's not a single array though (per the OP's point)


Kind of related, there's this list of "Build Your Own $FOO" (that I'm pretty sure I learned about here on HN originally :) https://github.com/danistefanovic/build-your-own-x

- Build Your Own Text Editor

- Build Your Own Shell

- Build Your Own Git (!)

etc.

Comes complete with the Feynman quote "What I cannot create, I do not understand".


Wasn't the quote "What I cannot explain, I do not understand"?


Hah, yes, good point, I think you're correct about that.

The repo has an adaptation of the quote, then. :)


Honestly, I hate the way these articles are framed: "X things every programmer should [read|try|learn]". I've certainly seen a list of 1000 books everyone should read. I hate the way these are framed because you'd spend your entire life doing all of them so it's clearly not practical.

Can we step away from the hyperbole here and just start saying (in this case) "Interesting project ideas" or somesuch?

This of course leads to people piping up their own "must dos" like "write a compiler" (huge undertaking).

Interestingly I see a comment here like "do something with actual customers" and the replies are interesting, essentially dismissing this as a business rather than technical problem.

I find this interesting because software exists to solve problems for people so this is probably the most useful advice I've seen. The ability to identify a pain point and use software to solve it is arguably the most useful ability a software engineer can have.

You will of course learn things by writing a compiler or a text editor or a ray tracer, and if it scratches an itch for you, by all means go for it.


The author thinks these are valuable challenges every programmer should try. Not must, should. It's just an opinion. If you don't agree with it, then don't do the challenges. Why does it cause you to write a 200 word rant?


Why did cletus' post cause you to write a 40 word rant?

I thought it was a valid point. 1000 books I should read? If I devoted all my free time, for the rest of my life, I might make it. But that leaves me exactly zero time left for the next person who's got some fine-sounding "should" for me, or the next, or the one after that. It's all my time for the rest of my life - for one person's "should".

I don't think it changes the problem to say "should" instead of "must". It's just an opinion. It doesn't create any more of a "should" for me than it does of a "must".


I second this. When such advice comes from people who have some authority, it can be very confusing to a lot of young people entering the field. It could even be disheartening and really demotivating for some, who otherwise might have exceptional logical/design/frontend/database/statistics/ML... skills.


These are great suggestions!

A few years ago I worked through building a spreadsheet in JavaScript. It was a great introduction to interpreters. I read through Writing an Interpreter in Go by Thorsten Ball [1]. Constraining the interpreter to execute formulas in cells was a straight-forward way to approach building one from scratch.

Writing a Pratt parser as part of this forced me to understand how it works. Figuring out how to process a sheet led me into algorithms and structures like directed acyclic graphs (as mentioned in the article). I found myself referencing Introduction to Algorithms and really studying it [2].
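
To give a rough idea of the dependency-graph part (the cell representation below is invented for illustration, not taken from the talk): the core is evaluating cells depth-first in dependency order, so every formula only sees already-computed inputs, and a visiting set catches circular references.

    def recalc(cells, deps):
        """cells: name -> constant or callable(values); deps: name -> list of input cells."""
        values, in_progress = {}, set()

        def evaluate(name):
            if name in values:
                return values[name]
            if name in in_progress:
                raise ValueError(f"circular reference involving {name}")
            in_progress.add(name)
            for d in deps.get(name, []):
                evaluate(d)                          # inputs first (topological order)
            cell = cells[name]
            values[name] = cell(values) if callable(cell) else cell
            in_progress.discard(name)
            return values[name]

        for name in cells:
            evaluate(name)
        return values

    cells = {
        "A1": 2,
        "A2": 3,
        "B1": lambda v: v["A1"] + v["A2"],           # =A1+A2
        "C1": lambda v: v["B1"] * 10,                # =B1*10
    }
    deps = {"B1": ["A1", "A2"], "C1": ["B1"]}
    print(recalc(cells, deps))                       # {'A1': 2, 'A2': 3, 'B1': 5, 'C1': 50}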

In the end I turned it into a talk at Big Sky Dev Con in Montana. The whole thing was a good experience - from researching how to do it, to sticking it out through the implementation, to distilling it to a 45 minute talk. Be sure to check out the recording [3] and code [4] if you're interested.

Any of these suggestions will lead you down a rabbit hole of learning with a clear objective in sight to keep you motivated to dig deeper.

[1] https://interpreterbook.com/

[2] https://www.amazon.com/Introduction-Algorithms-3rd-MIT-Press...

[3] https://www.youtube.com/watch?v=Sj4h0DcVLL0

[4] https://github.com/lancefisher/cellularjs


Thank you for sharing this, Lance! I'm a "recovering actuary/Excel power user/Excel developer (shudder)" currently in Lambda School in their web development track.

I'm really enjoying watching the recording of your talk, and in addition to learning one way to build a spreadsheet, I'm also learning lots of good software development practices orthogonal to the specific project, which is great.

Again, thank you for sharing!


Thanks! Glad you like it!


By the way, I'm curious, how did the company you work(ed?) for in Missoula, MT, get started? I'm always interested when I come across software companies in non-megalopolises.


The three founders live here, and wanted to build a software company. They worked hard to make something people wanted, and raised money from the three Fs (friends, family, and fools). They applied to YC on a lark, actually got in, moved to Mountain View for 3 months, learned a ton and made good connections.

After gaining traction in the market it was possible to raise from angels and then VCs.

I believe you can build and sell software from anywhere if you have the drive and find a way to solve problems that people are willing to pay money to solve.

Here's an article that talks more about it: https://missoulacurrent.com/business/2018/08/missoula-tech-s...


Write a database. Depending on what angle you take, it can look like a compiler (SQL parsing, predicate evaluation, optimizing for index usage), an OS (file systems for in-place updates of complex data structures, and scheduling with lock dependencies), or a distributed system (consistency and availability tradeoffs across machines, with possible partitions).

And then there's the whole mental model of relational algebra and stream processing of queries.

It'll give new appreciation for existing databases and what they can and can't do for you.
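
As a tiny illustration of the relational-algebra/stream-processing angle (names invented), relational operators can be written as generators that pull one row at a time from the stage below them:

    def scan(table):
        yield from table

    def select(rows, predicate):          # relational selection (WHERE)
        return (r for r in rows if predicate(r))

    def project(rows, columns):           # relational projection (SELECT cols)
        return ({c: r[c] for c in columns} for r in rows)

    users = [
        {"id": 1, "name": "ada", "age": 36},
        {"id": 2, "name": "bob", "age": 17},
        {"id": 3, "name": "eve", "age": 29},
    ]

    # Roughly: SELECT name FROM users WHERE age >= 18
    query = project(select(scan(users), lambda r: r["age"] >= 18), ["name"])
    print(list(query))  # [{'name': 'ada'}, {'name': 'eve'}]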


One of my favorite courses in college was basically this. I learned so much about filesystems, data structures and concurrency that has been useful in my career!


For me, it's hard to learn by making useless things. Without any impact for users, I have no motivation to make correct design decisions.


I totally agree. I personally will never work on any of these things unless the end result is being used, either by myself or someone else. Or I'm getting paid for it. Spending all that time working on something that ultimately reinvents the wheel and just ends up gathering dust in a github repo just feels so pointless to me.

No offence if you enjoy this sort of thing though, of course. Just not my bag.


It differs from person to person I guess. For me it is more about learning and understanding stuff. It motivates me more than doing my day to day job, because I encounter completely different domains and problems :)


1000% the same. I often get motivated to mess around with or learn something for the sake of it, but have a very hard time choosing something to get motivated to mess around with or learn about.

Give me someone who needs me to build something I don't already know how to build, though, and I will figure that shit out.

I've been lucky that this has worked out well for me so far, but it means I need to always try to get on projects with unfamiliar things or take new jobs involving unfamiliar things, or I'll have a really hard time expanding my skillset.

Oh, and school sucked.


:( These are the projects that I look forward to though. Not having anyone to please is a very nice break. I find that to be the opposite of useless.


Then make something useful, at least for yourself. A tool that helps your productivity. An emulator that you will actually play games on.


IMHO I think databases could be added to the list.

It’s one of the most complex systems one can develop, and you end up learning about multiple areas, such as OS, compilers, distributed systems, data structures, parallelism, etc.


The problem with writing a database, and maybe a few others of these, is that given the enormous compute/IO capabilities of a modern machine, it's quite possible to implement it completely wrong and never really know.

AKA, a toy database might be enough to handle some simple storage/retrieval problems but be full of hidden O(n^2) or higher logic which would fall down hard with even fairly simple usage in the "real world".

Reminds me of my own text editor, written in Applesoft BASIC when I was in middle school. It worked for its intended purpose (editing small assembly files), but was really quite terrible all things considered. I remember it being quite slow to save/restore, and it was only really capable of editing files of a few hundred lines before it started breaking BASIC's memory allocation schemes. AKA, I didn't really learn any of the data structure finesse needed to implement a "real" text editor with line wrap/etc.

Worse I remember trying to read the code a few years later, and while it fit on two printed pages, it was 100% unreadable.

(for those that don't know, Applesoft's speed was influenced by "formatting", if you will. It encouraged line number usage only really for control flow, plus the long list of call/peek/poke magic numbers required a handy cheat sheet of what each address did)


> It’s one of the most complex systems one can develop

The same could probably be said about the internal combustion engine, but it might soon be replaced by electric batteries, which provide a much more elegant solution.

I believe that "unbundled" databases, such as Crux[1], can become the electric batteries of the database world by making a lot of the current complexity irrelevant.

[1] https://opencrux.com/


Whoa! Somebody actually tried to implement the log as the central component of a DB!

After reading https://www.confluent.io/blog/turning-the-database-inside-ou... I wondered about that. I think that making the log first class and (optionally) "plugging in" relational tables would make an amazing database engine. In short, you PERSIST your commands:

    POST /City ..
    PUT /City ..
    DELETE /City ..
and add listeners that decide whether or not to persist each command. This allows you to do:

POST /SendMail (to:...)

and have the flexibility to bundle the domain logic on top of the data logic in a single language. This is my long-term goal.
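
A very rough sketch of that idea, with everything below invented for illustration: commands are appended to a log first, and listeners decide how (or whether) to materialize each one.

    class CommandLog:
        """Log-first store: the append-only command log is the source of truth."""

        def __init__(self):
            self.log = []
            self.listeners = []

        def subscribe(self, listener):
            self.listeners.append(listener)

        def submit(self, command):
            self.log.append(command)       # persist the command itself
            for listener in self.listeners:
                listener(command)          # e.g. update a table, send a mail

    def make_city_table():
        table = {}
        def on_command(command):
            verb, key, payload = command
            if verb in ("POST", "PUT"):
                table[key] = payload
            elif verb == "DELETE":
                table.pop(key, None)
            print("cities:", table)
        return on_command

    log = CommandLog()
    log.subscribe(make_city_table())
    log.submit(("POST", "City/1", {"name": "Missoula"}))
    log.submit(("DELETE", "City/1", None))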


What makes crux's internals less complicated than a more conventional database?


1) the single-writer principle of the transaction log means there's no need for any transactional locking

2) the separation of reads and writes allows for elegant horizontal read-scaling without coordination/consensus

3) pluggable storage backends implemented as simple Clojure protocols (as the sibling comment mentions), which eliminates a large number of performance and durability concerns

4) combining schema-on-read with entity-attribute-value indexing means there's no need to interpret and support a user-defined schema

5) Datalog is simpler to implement and use than the full SQL standard or alternative graph query languages

...I work on Crux :)


Please tell us more about point 5!


Crux uses a Worst-Case Optimal Join [0] algorithm with bitemporal indexes, and the Datalog-specific query layer is implemented in less than a thousand lines of Clojure: https://github.com/juxt/crux/blob/master/crux-core/src/crux/...

SQL certainly provides a lot of bells and whistles but Crux has the advantage of consistent in-process queries (i.e. the "database as a value") which means you can combine custom code with multiple queries efficiently to achieve a much larger range of possibilities, such as graph algorithms like Bidirectional BFS [1].

[0] https://arxiv.org/pdf/1803.09930.pdf

[1] https://github.com/juxt/crux/blob/master/docs/example/imdb/s...


I suspect the gp was referring to it using either Kafka or RocksDB for storage.


This. Just implementing a persistent key-value store with B-trees is a great jumping-off point.

This is a rabbit hole that goes as deep as you want, which has both positives and negatives, of course.
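
Even before B-trees, an append-only log plus an in-memory index makes a workable first cut (a sketch with made-up names; keys and values here can't contain tabs or newlines, and a real store would add compaction, checksums, fsync, etc.):

    import os

    class LogKV:
        """Toy persistent key-value store: append-only log + in-memory index."""

        def __init__(self, path):
            self.path = path
            self.index = {}                            # key -> latest value
            if os.path.exists(path):                   # rebuild the index on startup
                with open(path, encoding="utf-8") as f:
                    for line in f:
                        key, _, value = line.rstrip("\n").partition("\t")
                        self.index[key] = value

        def put(self, key, value):
            with open(self.path, "a", encoding="utf-8") as f:
                f.write(f"{key}\t{value}\n")           # durability first: append to the log
            self.index[key] = value                    # then update the in-memory view

        def get(self, key):
            return self.index.get(key)

    db = LogKV("toy.db")
    db.put("greeting", "hello")
    print(db.get("greeting"))                          # hello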


I'm doing a relational lang (http://tablam.org) that could be considered an in-memory kind of DB.

It's certainly challenging.

Just look at joins. You have (at least) two nested-loop join algos, then sort and hash joins, and then cross, left, right, inner, and outer joins. All of them need small, subtle tricks to make them performant. (In theory you can build them all on top of CROSS. But that will get very wasteful very fast!)
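
For instance, here is roughly what a naive nested-loop equi-join looks like next to a hash join (plain Python for illustration, not TablaM code):

    def nested_loop_join(left, right, key):
        # O(len(left) * len(right)): compare every pair of rows.
        return [{**l, **r} for l in left for r in right if l[key] == r[key]]

    def hash_join(left, right, key):
        # ~O(len(left) + len(right)): build a hash table on one side, probe with the other.
        buckets = {}
        for r in right:
            buckets.setdefault(r[key], []).append(r)
        return [{**l, **r} for l in left for r in buckets.get(l[key], [])]

    cities = [{"city_id": 1, "city": "Missoula"}, {"city_id": 2, "city": "Bozeman"}]
    people = [{"name": "ada", "city_id": 1}, {"name": "bob", "city_id": 1}]
    assert nested_loop_join(people, cities, "city_id") == hash_join(people, cities, "city_id")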


The caveat there is you can write up a fairly simple NoSQL database in an afternoon. What I like about the other projects is that the barrier to even a minimal thing is a bit higher. I think that leads to more opportunity to work your creative muscles.

Though if you add some constraints, like it must have JDBC compatibility, then that becomes interesting.


This is a surprisingly nostalgic list. It's not that there's no benefit to these projects, it's that there are more modern projects - custom web servers, practical DSLs, 2D and 3D iOS/Android games, simple AR, simple ML, simple embedded hardware - at a similar level of difficulty, which are easier to extend to real commercial applications, and also teach important CS fundamentals.

You could also do worse than learn Erlang, Haskell, Julia, or some other interesting-but-not-really-mainstream language.

I'm surprised the author appears to be a millennial. I would have expected a list like this from someone my age, who started coding when TinyBASIC and Space Invaders were the new shiny.


   practical DSLs
Isn't embedding a "practical DSL" another toy compiler?

Except that with a DSL, you have to worry about gnarly but not-so-interesting problems like what happens to my syntax (given by a CFG) when I add another syntax (another CFG)? Are the combined CFGs ambiguous etc?

Keep things simple!


A few things I did in the past I would definitely recommend:

* Make a Lisp https://github.com/kanaka/mal — tutorial on how to make a lisp language interpreter.

* Make an 8 bit computer https://eater.net/8bit — tutorial on making a simple 8 bit computer out of a bunch of AND, OR, NAND chips. Really helps you understand how computers work, what microcode is, etc. Honestly, building it wasn't quite as much fun as I thought it would be, but just learning how to build one was very useful.


When I tried MaL I got hung up on the regex in one of the first steps, which is complex and tricky to translate into any of the umpteen other variants of regular expressions, and utterly vital to the overall project. Frustrating.


I would like to add that a good project is to fix some bugs or add some feature to a project like, say, GNU Octave; especially for new developers. It really expands your mind as a new developer learning how to read code and understand where to make changes and add features.


I agree! I'll be teaching a graduate-level SE course in the future and I intend on making contributions to open source projects a requirement.


I sat around with a desire to “make an open source contribution” for years. Followed the advice about looking through bug trackers for tractable issues. Never could make heads or tails of a project.

Then at work I became an ambitious user of a relatively immature open source platform. As a power user, I finally had real motivation to fix specific defects, and a good enough mental map to get it done. It came naturally.


Won't this mean that those projects' maintainers will suddenly get a random influx of half-baked pull requests, which are somehow tied to a grade? This sounds a little too close to the "term papers must be good enough to be submitted to conferences" anti-pattern.


* Low latency. Receive 10k messages per second (one every 100 microseconds), have a non-trivial amount of business logic, and ensure all responses are sent within 50 microseconds. I did a proof of concept of an algorithmic trading platform in C/Common Lisp for a brokerage house that was interfacing with a stock exchange; it was tested to withstand 10k mps over an hour with 10 microsecond end-to-end latency (measured on the switch). I learned A LOT.

* A non-trivial application with a very restricted amount of memory and crappy networking that has to be very reliable. I did an entire payment POS application (EMV, contactless) on a very ancient device. It had a total of 2MB of memory (SRAM + flash; it had execute-in-place). I needed to do a lot of research to optimize the application for memory usage -- for example, I had to implement a transactional database that works within a constant amount of memory. I also needed to research a lot of techniques to make my application reliable regardless of what happened. I also learned A LOT on that project.


In typical HN fashion, this just sounds like an excuse to jaw about your resume projects.

How would these clearly niche projects be something "every programmer" should try? When in enterprise engineering is 2MB a normal amount of memory for something to need? I've worked in videogames, fintech, and e-commerce, worked on many kinds of systems, and have not run into anything with such requirements.

Not to mention, the sorts of "lessons learned" on such projects could actually hurt an engineer when working on more realistic enterprise systems. At my current job (where I manage many teams of engineers), if I saw engineers micro-optimizing to save 2 MB of memory instead of getting features shipped I would ask their tech-lead wtf was going on.

I've seen far more damage done by premature optimization in my career than I've seen it actually help matters. Engineers trying to be "extra smart" and show off their extreme memory/CPU optimization skills so often leads to wasted time, hard-to-debug code, or worse, annoying bugs which are much more difficult to diagnose later.


You miss the whole point of the list. The idea is that the experience of working on a variety of projects makes you a better developer in whatever field you are in.

This assumes you will understand where to use and when not to use those techniques.

The list gives examples of projects each with unique requirements. Each requiring you to think about the problem in a different way.

I just gave two more examples that I personally found very illuminating, that also have unique requirements, different from other projects on the list.

I may be working on some more mundane software atm but, for example, I know how important specifying the size of every buffer and data structure is for the reliability of the service I am working on.

In fact, the service had a bad track record of reliability, which was quickly fixed by specifying what it means to do stream processing (no unbounded data structures in memory, etc.) and quickly verifying each component against this spec.

I learned this and many other techniques from MISRA, which is a standard that helps write reliable software for the automotive industry. I adopted it for my embedded app to help me work on a complex application that had to be very reliable.


Can you tell us more about what you learned from point one and two?


A lot of these things involve graphics. Every time I try doing my own graphics (not using an engine like Unreal/Unity) I have trouble even getting a triangle to show up on screen. There is nearly always something that doesn't work like the tutorial says it should and that isn't common enough to come up when I google the error. In C/C++ I can never get the toolchain working correctly, and compiling step one never works even when I pull the author's project exactly. Last time I tried pygame it turns out it just doesn't work on Macs or something? None of the examples worked, I just got blank boxes. I couldn't make it past the first few steps in lazyfoo's tutorial, and even Unreal Engine's tutorials are constantly outdated due to short release cycles, although I've had the most success with Unreal.

I’m not sure if I’m just thick or something, but any project that involves low-ish level graphics I shy away from because I’ve had so much trouble in the past.


don't worry, it's not you alone.

openGL is only a specification. and there are many openGL versions. and drivers have different implementations. and modern openGL uses shaders. which use another language, GLSL, which also has different versions. and openGL needs a context to draw on. each OS has different display servers to create those contexts and windows. here we also have to differentiate protocols and actual implementations. and then you might want to write openGL in a certain language different from C/C++. now choose a library or two or five, where each one will deal with one or two or twenty of those issues, plus other things like keyboard and mouse input, etc.

so, either you follow config instructions very closely and repeat until you find one that works, or you try to start understanding all this, get very annoyed and throw your computer through the real window.


> Last time I tried pygame it turns out it like just doesn’t work on Macs or something? None of the examples worked, just got blank boxes

PyGame seems to work if you install the official Python 3 binary from python.org and install PyGame with the pip package manager included with the Python installation. (Last time I checked, Conda Python and the version from Homebrew had some problems when installing PyGame.)


I imagine most of these 'low level' approaches involve OpenGL, which seems to be a bit troublesome on macOS. I can recommend having a look at WebGL; it's basically a limited OpenGL subset that is called from JavaScript. It's really easy to get started as it involves no dependencies, compilers, etc. except for (basically any) web browser.

Used it for my first graphics related programming project. Was lots of fun and I learned a lot.


Why not write a terminal app? Everything except for the game emulator can be usefully done inside the terminal, which is pretty easy to get up-and-running with (and provides its own unique set of design constraints).


It might be genuinely easier to do it in emulated MSDOS QBASIC or BBC BASIC.


Check out QB64. Modern implementation of quickbasic. Looks, behaves, and feels like the original if you want it to.


all my attempts at learning graphical programming consisted of getting about 3 tutorials in and then getting to a point where they just didn't work anymore

eventually I ended up learning via the HTML5 canvas element and WebGL, which gets rid of the problem of having a different setup from the author


>how do you generate a dynamic number of enemies? The factory pattern helps a lot.

Why?

The factory pattern is about adding logic and state to the creation of objects. What's wrong with creating enemies using their usual constructors and dumping them into a resizable array?


You use a factory when you have code that needs to create something but shouldn't determine the details of their creation. If you want to create enemies from multiple places (seems common) you don't want to have to refactor every place when you change something about the creation or pooling.
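
A small Python sketch of that point (all names invented): call sites just ask for "a goblin", while scaling, randomness, or pooling decisions live in one place.

    import random

    class Enemy:
        def __init__(self, kind, health, speed):
            self.kind, self.health, self.speed = kind, health, speed

    class EnemyFactory:
        """Centralizes creation details so call sites never need to change."""

        BASE_STATS = {"goblin": (30, 2.0), "ogre": (120, 0.8)}

        def __init__(self, difficulty=1.0):
            self.difficulty = difficulty

        def create(self, kind):
            health, speed = self.BASE_STATS[kind]
            # Scaling, randomness, pooling, etc. belong here and only here.
            return Enemy(kind, int(health * self.difficulty),
                         speed * random.uniform(0.9, 1.1))

    factory = EnemyFactory(difficulty=1.5)
    wave = [factory.create("goblin") for _ in range(10)]   # a dynamic number of enemies
    print(len(wave), wave[0].health)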


Yes, if you determine that creating your objects requires passing in state or using some logic that you don't want to put in the constructor, then using a factory or factory function makes sense. However that has nothing to do with creating a dynamic number of enemies.

Object pooling is a good point actually.


It does make it a lot easier to hide implementation details down the road, for instance if you start using object pooling.


If anyone has any other good ideas, please let me know. I'll make a list of them at the end of the article and provide credit.


My suggestions:

Implement a basic widget toolkit. Making something that essentially works is simple (regardless of whether you implement it on top of a dumb framebuffer, an existing display server like X11/WinAPI, or even on top of HTML). But there are lots of nuanced details that you will probably learn along the way: some of them overlap with the text editor problem, and then there are things like not opening windows/popup menus outside the display area (there are widely used toolkits that get this wrong even for the simplest “select box on the bottom of the screen” case), clipping, handling scrolling (both detecting that something is too large and should be scrollable and actually implementing scrolling efficiently), and last but not least designing the API to be both powerful and simple to use.

Simple transactional database that really has ACID semantics. Locking, BTrees, journaling, isolation levels, how to actually store all that in files.

A toy unix block filesystem (there can be some overlap with the previous point, and for the journaling design it might be better to start with the FS case). Recommended reading: https://news.ycombinator.com/item?id=12309686

And, maybe somewhat motivated by having done this mostly pointless thing as a long-abandoned pet project, and thus maybe slightly nonsensical: implement an AFS-like “distributed” filesystem. The interesting part lies in the RPC and authentication mechanisms, not in doing some kind of distributed consensus (that's why “AFS-like”, as AFS has a single master server for each volume).

And at last, although the theme is somewhat different: implement a tool that, given a path to a unix file, prints a list of users who have access to the file and what kind of access they have. This sounds like a simple tool, but it is surprisingly non-trivial even without taking ACLs into account. (This is not from me; I read it in some similar list ~15 years ago, IIRC by Andrew Tridgell.)


One professional project that really changed some of the ways I think about programming was writing a PCIe Linux driver. Learning how the kernel lays out data structures, how it provides callbacks, and how to pass data back to the 'user' was just different from how I had thought about things before. Definitely a lot of pitfalls, but lots of learning.


My goto project is a terminal emulator, for reasons I went into here: http://neugierig.org/software/blog/2016/07/terminal-emulator... (summary: it touches a surprising breadth of details!)


Are you familiar with Etudes for Programmers? There's a bit of overlap with your list and it's also an interesting look back at what someone thought programmers ought to try their hands at a few decades ago.


- A load balancer module that accepts generic worker functions.

- A service that can take real-time timestamped data and generate an internalized time series. For example, it returns first, last, high, low, and average values at fifteen-second intervals (see the sketch after this list).

- A mechanism for serializing structured data. Something like Google Protobuf.

- A chart widget optimized to display and navigate date based content.
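
For the time-series item above, a rough sketch of the bucketing (the function name and tick format are invented): group timestamped values into fifteen-second windows and report first/last/high/low/average per window.

    from collections import defaultdict

    def bucketize(points, interval=15):
        """points: iterable of (unix_ts, value); returns per-interval summary stats."""
        buckets = defaultdict(list)
        for ts, value in sorted(points):
            buckets[int(ts // interval) * interval].append(value)
        return {
            start: {
                "first": vals[0],
                "last": vals[-1],
                "high": max(vals),
                "low": min(vals),
                "avg": sum(vals) / len(vals),
            }
            for start, vals in buckets.items()
        }

    ticks = [(0, 10.0), (3, 10.5), (14, 9.8), (16, 11.0), (29, 10.9)]
    print(bucketize(ticks))   # one summary for 0-15s, one for 15-30s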


It's fun trying to recreate GNU coreutils. Stuff like grep.


Port Scanner/nmap clone - This project is much simpler than the others, but is more approachable. It's a great project to implement when picking up a new language. If it's your first time writing one, it forces you to learn a bit about networking and concurrency if you want it to scan quickly.
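
A bare-bones sketch of the concurrency part (host and port range are placeholders): attempt TCP connects from a thread pool and report which ones succeed.

    import socket
    from concurrent.futures import ThreadPoolExecutor

    def is_open(host, port, timeout=0.5):
        """Return True if a TCP connection to host:port succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def scan(host, ports, workers=100):
        # Threads keep the scan fast even though each probe blocks on the network.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = pool.map(lambda p: (p, is_open(host, p)), ports)
        return [port for port, is_up in results if is_up]

    if __name__ == "__main__":
        print(scan("127.0.0.1", range(1, 1025)))   # open ports on localhost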


MS paint clone.


You'll have to establish an MVP program on top of the clone:

https://weblogs.asp.net/alex_papadimoulis/430478

</fun>


I've never tried but I can see that being really fun. Someone else sent me an email suggesting a vector graphics editor. I'll add both, thanks!


A lightweight clone of git: implement basic commands such as log, merge, commit, fork.


What is the “fork” command? I don’t see it here https://git-scm.com/docs


Create a new branch via "git checkout -b"


An emulator of something is a pretty good idea I think.


a simple paint program in assembly


Perhaps something web-based--perhaps collaborative or real-time--and get exposure to sessions, database work, networking, APIs, and the tooling ecosystem, etc.


Seconded, a minimal full-stack system like a messaging service could go in many directions depending on your interests.


I am curious why the author suggests programming spreadsheets is "hard" but programming an operating system isn't. Arguably, all you're doing in a spreadsheet is building a dependency graph of nodes, then re-rendering them when an input changes. It's about as complex as building your own React.


Didn't somebody win an IOCCC contest with a spreadsheet coded in under 2K bytes?


What a great list! Adding some suggestions... some I've done, others are goals (especially APL and Prolog)

1. Design a new programming language. Prove it Turing complete (or go back to the design phase). Maybe do a compiler for it.

2. Design some hardware. There's cheap FPGAs and nice walkthroughs for architectures like the 6502, but my next project will probably be a custom architecture because I've never done a proper compiler before.

3. Raytracers! I did one in JavaScript back in '02... ah, how languages evolve...

4. Break your language paradigms! Learn Lisp, APL, Prolog, Haskell... This is a meta-challenge: do another challenging project in an unfamiliar language, and get to know powerful idioms in the language and how they're handled behind the scenes.


These aren't bad. They are, by no means, a definitive list. If you can do all of these, good stuff; but there's plenty more where they came from.

To be honest, the very first professional project I ever did (in 1987), was a complete embedded OS.

In 1983, I wrote a Space Invaders game on my VIC-20, in Machine Code. In those days, you had to use characters to represent aliens, so I made up my own font.

I've done a number of text editors over the last 30-some years. Nowadays, most operating system frameworks have 99% of a full-featured text editor built in, and you can do it with a few calls to system resources.

I haven't written any real compilers (haven't ever felt the need), but I have done a ton of parsers and whatnot.


What are you referring to when you say “operating system frameworks”?


UIKit, AppKit, Windows SDK, Cocoa, Android SDK, ChromeOS, etc.

In AppKit, you can do a pretty full-featured text editor as a "Hello World" project.


Which is pretty much exactly what TextEdit is: https://developer.apple.com/library/archive/samplecode/TextE...


How about contributing to someone else's hard project? Because doing it alone, whatever the level of the task, is often easier than grasping an existing code base and working with others ;-)


I think the best project to try is one that you have little to no knowledge of. I find sometimes the best way to learn is by failing. For myself, I hate the feeling, and failing isn't something that I do well. Pushing through setbacks strengthens your abilities and produces a new confidence that anything is possible. Also, you can bring an outside perspective to a project that isn't typical for that field or group. When the project is complete, you look back and realize how far you have traveled! At that point the wealth of knowledge can be quantitatively measured. When we stretch beyond what seems possible, dig in a little deeper, and strive to become a little better than we are, at that point we grasp the intangible; at that point we define ourselves. It's those small victories that give birth to new innovation, define generations, and forge a way for everything's future. If you stand up and tackle the unknown, brush off the ridicule of ignorance, and get crazy enough to defy logic itself, I believe those minuscule moments in the history of humanity are the very iotas that allowed God to see the future of man and understand his creation is 'good.'


I understand this is posted on a University website, so I guess it is ok, but gosh I've wasted way too many days of my early career chasing things that the 'seniors' said I 'should' do as a programmer.

This is just not true, but worse still it confuses people who are just starting. You can be an exceptional front end developer with HTML/CSS/JS scripts and not necessarily have a mini OS in your past projects.


There is a difference between "must" and "should". Take this list as advice on how to become better at understanding the fundamentals. A front-end developer, for instance, could take HTML, JS, and the DOM as gospel. Or they could try to understand their foundation and how the same concepts apply in different domains (e.g., what the distinctive advantages/shortcomings of JS are, and why). I prefer the latter and I think it makes me better at my job.

Also it is kind of sad when a fellow developer is stuck with a problem that can easily be debugged with some OS knowledge.


I can't help but notice that this post is just below PG's "Having Kids" post. That's a challenging project right there.


Either a raytracer or a software rasterizer. Or both.


I almost added raytracer! I'll add it to the end with some other community suggested projects. Thanks for the suggestion.


Raytracer was the thing that sprang to mind for me.

Currently the only two I have left to do on the list are the BASIC compiler and the spreadsheet. I plan to do the compiler quite soon, once the emulator IDE is nice enough. If I decide to do a BASIC without line numbers I'll probably be doing another text editor as well.

Not sure if I could bring myself to write a spreadsheet. I feel queasy just looking at them.


Seconded! A raytracer is phenomenal, especially if you do it “purely” (no GPU/OpenGL/etc.).

It’s relatively straightforward and the satisfaction of seeing even a basic shaded sphere that you literally created from scratch in code by writing pixels to a bitmap file is awesome. I did this about a year ago and it was still one of the most fun/educational projects I’ve ever done.


Writing an emulator is a great project indeed. I've been working on my GBA emulator for almost a year now and I learned so many different things about hardware, assembly programming and video output.


https://adventofcode.com should probably be on the list. You end up doing bits of several of the suggestions along the way.


My suggestion: build a basic calculator then extend it... Add scientific and fixed formats. Add functions like log, sine, exp - especially if you create the math ops from scratch. Detect roundoff error and suppress it gracefully. Add binary, octal, and hex base numbers and binary, shift, rotate and bitwise operations. Add 2D graphs and 3D surfaces. Add control of illumination and stacking order for multiple plots. Add animation. Evaluate expressions. Add programmability.

Or just invent your own private Emacs.
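
For the "evaluate expressions" step, a recursive-descent evaluator is a nice starting point. Here is a sketch handling +, -, *, /, parentheses, and unary minus (a real calculator would add functions, error handling, and so on); splitting it into expression/term/factor is what gives * and / higher precedence than + and -.

    import re

    def evaluate(expr):
        """Evaluate +, -, *, /, parentheses and unary minus over floats."""
        tokens = re.findall(r"\d+\.?\d*|[()+\-*/]", expr)
        pos = 0

        def peek():
            return tokens[pos] if pos < len(tokens) else None

        def take():
            nonlocal pos
            tok = tokens[pos]
            pos += 1
            return tok

        def expression():                      # term (('+'|'-') term)*
            value = term()
            while peek() in ("+", "-"):
                value = value + term() if take() == "+" else value - term()
            return value

        def term():                            # factor (('*'|'/') factor)*
            value = factor()
            while peek() in ("*", "/"):
                value = value * factor() if take() == "*" else value / factor()
            return value

        def factor():                          # number | '(' expression ')' | '-' factor
            if peek() == "(":
                take()
                value = expression()
                take()                         # consume ')'
            elif peek() == "-":
                take()
                value = -factor()
            else:
                value = float(take())
            return value

        return expression()

    print(evaluate("2 + 3 * (4 - 1)"))   # 11.0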


If the 'every programmer' already has a paying job, then there may be other ways to advance their craft. For example, joining an open-source project that's worth their while and maybe contributing when ready.

Of course, one's motivations may be as diverse as their perceived rewards. Still, solving problems which need a solution always leaves room for programming/design challenges, while working with others lets one connect and learn from the collective knowledge.

Eager minds with free time are precious!


Another interesting project is to do your own web framework. Routing, ORM, validations that work both server- and client-side, and rendering server- and client-side transparently are some of the challenges. Just do something and improve some actual pain point you see with a popular one. I did mine and, even though I have never used it for anything important, I still think it was a good learning experience, and I can point to some aspects of it that are better than all the major web frameworks.


I did a text editor once; it was bad.


For those who want a real challenge, I'd recommend trying to write code that can read/write PDF files without using any purpose-built libraries.


I guess most of the time we do not know what we want to do. That's the time you can productively spend working on improving your skills. But unfortunately it is hard to work on something that does not lead to worthiness of some sort. So I guess it is good to look for something worthy to be done in your life while sharpening your skills.


Implement Forth, Lisp, and Prolog. (For extra fun write your Lisp in Forth and then your Prolog in Lisp...)


Here's a good list of project based tutorials that you should follow https://github.com/tuvtran/project-based-learning/


That list sounds great, and I can see how each of those projects presents a unique set of problems to solve.

For me, though, working on my day job and my "for fun" side project take enough of my time and brainpower. I never lack for problems to solve.


Does anyone have a good guide like this but with good answers?

I can come up with decent solutions to problems, but the best way to learn is to be able to see alternatives you hadn't thought of so you learn something new.


I was surprised to discover that interpreting BASIC is more difficult than a C-like language, as most statements require their own syntax rule.


My "problem" is that I struggle to find the motivation to program side projects when I already spend 8 hours a day programming at work.


You could try a realtime online multiplayer game if you want a category of difficulty past the emulator. Lol


Great idea for projects!


Web developers should write a simple web server.


Just a quick nitpick; could the link be updated to use https? I'm a bit surprised it doesn't automatically redirect in the first place.


I'd probably use a doubly linked list for the text document. Then you're looking at O(1) for operations, I think.


Not for searching. Linked lists are fine if you are at the point where you need to do an operation, but getting to that point can be expensive.


I'd strongly recommend using a gap buffer for storing the text in while it's being edited, and within reason I wouldn't worry too much about runtime performance, as any reasonable solution should be able to keep up with a user typing.


A neural network from scratch?


I thought about saying this as well, but this isn't really a challenging programming assignment. It's a challenging math assignment, but once you know the math the programming is very easy (assuming you use a numerical library that supports vectorization).
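
For anyone who wants to try it anyway, the vectorized version really is compact. A sketch of a tiny two-layer network learning XOR with plain NumPy (hyperparameters are arbitrary, and convergence on this toy setup is usual but not guaranteed):

    import numpy as np

    # Tiny 2-layer network trained on XOR with full-batch gradient descent.
    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    W1, b1 = rng.normal(size=(2, 8)), np.zeros((1, 8))
    W2, b2 = rng.normal(size=(8, 1)), np.zeros((1, 1))
    lr = 0.5

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for _ in range(5000):
        # Forward pass.
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # Backward pass for mean squared error, via the chain rule.
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0, keepdims=True)
        W1 -= lr * (X.T @ d_h)
        b1 -= lr * d_h.sum(axis=0, keepdims=True)

    print(out.round(2).ravel())   # should end up close to [0, 1, 1, 0]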


That's fair, and you're right about that. I'd still highly recommend anyone interested in ML to implement their favorite algorithms from scratch.

