Challenging projects every programmer should try (utk.edu)
911 points by azhenley on Dec 14, 2019 | 297 comments



I think the #1 project every programmer should try is a project that has customers. There's so much involved in achieving that. It will possibly prove more difficult than any of these. It will probably lead to a different understanding of every other project that programmer then takes on, too.


Solving a programming problem is very different from solving a business problem, and building something that has customers falls more into the latter category. Lots of people want to simply improve their programming and problem-solving skills, and this list is perfect for that. Customers add a whole new dimension of complexity which most people might not want to deal with, especially with fun side projects. So setting the expectation that everything you build needs to have customers is definitely wrong.

Building stuff like a text editor, simple game, database etc. from scratch which will never be used by anyone other than yourself can be immensely challenging and satisfying.


Build an oss driver for something popular. Or an alternative. Even just addons and plugins for popular products will get users. They don’t actually have to pay you for them to be customers, but that’s a benefit. Go where the demand is.


That kind of work is a whole lot less satisfying. Why would I want to deal with GitHub issues and fixing bugs when I could solve genuinely interesting problems?


I’d wager that many don’t find it less satisfying, and maybe even more so, if only for the reason that solving an issue for a customer comes with the benefit of immediately and clearly improving the life of someone else. Solving curiosity-sating programming challenges doesn’t have the same social implication built in. Personally, I’d muuuuch rather work on a mundane problem that immediately improves someone’s day than an issue that is in and of itself more interesting, but also lacking the clear customer/user demand.

This isn’t to say that solving neat programming challenges doesn’t improve anyone’s day, but the demand for the solution isn’t being made public and clear, which many people see as a necessary requirement for meaningful work.


> the benefit of immediately and clearly improving the life of someone else

More like the benefit of people being ungrateful for the effort you put in and then being hostile about your lack of continued motivation to solve their problems. All for absolutely zero financial gain, so the difference between that and a personal project is that you feel good for solving something, then bad because people hate you.


> Personally, I’d muuuuch rather work on a mundane problem that immediately improves someone’s day than an issue that is in and of itself more interesting

Don't get me wrong, helping people is great and I do in fact find it one of my greatest motivators, especially for things I would otherwise consider boring (or indeed, mundane).

But something that is in and of itself more interesting? If you wouldn't rather work on that, doesn't that mean it's just not interesting enough? :)

I guess everybody is different. Even though I have found that helping people is for me probably the strongest motivator there is, if there is one thing that can make me procrastinate on that, it's problems that are in and of themselves more interesting :)


I discovered over the years that I like solving business problems, but in that context programming is purely utilitarian and just a tool like I would use in another industry. But since I also love coding, I code in my free time without any constraint other than having fun and improving.

So for me the solution was simply moving away from an every day developer job at work and more into a product management/CTO like role.


I started off with non-tech businesses (literally house painting as a teenager) and I've got to say I like solving business problems with engineering a lot more than solving them with literal sweat and pain.


How did you start making a change? I feel like I'm in the same boat


What worked for me was transitioning to a “solution engineer” role. The name can vary but basically it’s consulting and integration to help clients use the product. Here I got an overview of the whole lifecycle of a product from a client perspective: their business problem, the need for a solution, the product fit and the sales process, then the integration challenges, user training, and hopefully a good reference and testimonial for securing the renewal and providing the marketing with material. There was always a “tension” between what the client wanted and what the product was doing, but learning how to understand their need and the appropriate answer (technical or not), and when does it really make sense to add a feature to a product was the key skill I learned to develop in that role. Participating in sales meetings was invaluable for starting a business later but maybe less relevant strictly for product management.


It definitely will not be for everyone. It's a bit like how many people suggest learning to code just a little, even if not intending to pursue it as your career. It still leads to this huge new understanding of things. Similarly with programmers and customers.


From a technical standpoint, I disagree. Customers tend to get in the way of good code: they have ever-changing requirements and set deadlines and priorities that more or less force you to be messy. Experimenting with techniques, redoing things, and so on is hard when you have a customer expecting this or that feature to be implemented yesterday, and yet it is important if you want to progress.

Working with a customer will make you a better businessman, not really a better programmer.

Doing a project with customers is extremely valuable if you intend to start a business, but if you hate that and just want to be a good coder and make a living out of it, there is a solution: find a good boss/manager. He will take care of the business stuff and allow you to put all the skills you got doing toy projects to good use.

Otherwise, if you want to do something that looks more like a real-world project without the hassle of dealing with customers, make software that you really use, and share it. A good candidate would be a video game that you enjoy playing, or some tool for a hobby of yours, or a utility. That way, no one will stop you from experimenting, but you still need to think about usability.


>Customers tend to get in the way of good code

Wow, everything wrong with the tech industry right here.


They're not wrong, but neither are you. As a programmer who doesn't write for customers, I'll say, yeah it's sometimes true. But if I were in the business of writing software for customers, I would word that very differently. The issue described is a challenge in said industry, and being good at navigating that problem is a quality that separates the great from the mediocre. Describing that as "getting in the way" ignores the skill and pride in taking on that challenge.

Actually I'm not even sure. I have a little experience with freelance web development around 2007 (browser war veteran), and I found navigating between customers' expectations and "good code" to not be hard at all. I did read a lot of blogs on webdev (including the business side of it, contracts and expectation management, etc) so I probably got some great advice somewhere[0]. Reading about it actually got me more enthusiastic about tackling the problem as well as I could. But usually the details that made it good code or not were technical details that I didn't bother them with. Or they were things I could sell; browser compatibility and accessibility (back then I could use the line "Google is your most important blind user" -- now no longer true).

Of course I am aware that as a freelancer, I enjoyed a lot of freedoms that you don't always get as a programmer-for-customers in different business environments. But if that's the case, that really proves it's not the customers that get in the way, but the institution.

[0] I kinda miss the ... blogosphere.


"The tech industry"; maybe. "Hacker culture"; no.


Not even close to everything wrong in the tech industry


(Note: I speak only for myself and others who share my personal preferences)

This comment really hit home for me. Making something good is often at odds with making something that's profitable/popular. I've often found that the most popular software is far more bloated/limited than the best software. This might be because people like me tend to favor smaller programs that are FOSS and work well with other FOSS; this view is held by a minority, and is thus a less profitable niche to cater to.

One example is mpd+ncmpcpp versus Spotify or iTunes. The former is far more well-designed, performant, featureful, and flexible; however, it will never be popular/profitable because it's FOSS targeting end-users, it runs in the terminal, it lacks CPU-intensive pretty animations, and requires users to understand that it involves a client and a daemon.

The list goes on: from IRC/Matrix versus Discord to Linux/BSD versus macOS/Windows (an opinion that's a bit more controversial here on HN, but held by many nonetheless). I've defaulted to assuming that if a program intended to be used directly by end-users is mainstream and/or quite profitable, it's likely not for me. The few exceptions (e.g., massive web browsers like Firefox) exist only because there is no alternative I can get away with.


At the end of the day, the code is written for someone, to do something. Not to be. You gotta serve somebody, as Bob Dylan said: https://www.youtube.com/watch?v=wC10VWDTzmU

[EDIT] To elaborate, I recently thought about coding, and I understood that at some point all code "touches" reality in some way, and that place is the most important. Either it moves something physical, or provides an answer to a person, or makes him happy. But when coding, it is a good thing to ask yourself how your code "touches" reality.


Lennon's response to Dylan is just as apropos here as it was in the original context: "Serve yourself".

The distinction here is probably between being an "engineer" vs. being a "programmer". The former's concerns include all this stuff that's for the business. The latter is just interested in the craft of writing computer programs. And that's all the article was about: skill as a programmer. Which might help you be more effective as an engineer, depending on the projects you face. But the point is self-edification; not everything has to be in service of money.


> But when coding, it is a good thing to ask yourself how your code "touches" reality.

What an odd thing to say on a site named after the Y-combinator ... ;-)


I don't disagree with you, but I think you're missing the author's point. The author suggested these projects as a quick way to learn a new language or get familiar with a set of tools. Arguably, working on a more time-consuming product for customers is not the most efficient way to do this.

The author's suggested projects hone in on programming mastery, whereas your suggested project hones in on a plethora of skills. We can't really say that one substitutes for the other, as they achieve two different things. I wouldn't necessarily start a startup just to learn WebAssembly, but I would learn WebAssembly if I needed to for a startup. In the former case, my goal is to learn WebAssembly, whereas in the latter, it's to start a startup.


Not sure these are projects to tackle with a new language. Simple repeatable projects like HN clones are better suited for those use cases. These are more like projects to do in your favourite language, with bonus points for using a language that isn't typical for the domain.

Like: A compiler in php or a spreadsheet in cobol or javascript console emulators.


Counter-point: if a programmer is successful in delivering business value, the programmer might have an inflated view of their programming abilities. I know I certainly have been affected by this. Being able to get things done does not necessarily make a person a great and talented programmer, but it may make them accomplished.

My projects have grossed a lot of revenue, but several of these projects would be challenging for me and push the limits of my skills.

What I am saying is that there is room for more nuanced language to describe all these matters in a way that is detached and clinical.


Counter-counter-point: Delivering business value is a more desired and compensated skill than raw programming ability.

Moreover, learning how to deliver value builds empathy for non-programmers who also deliver value and also the realization that programming ability is not where a company lives and dies. These are the number one and number two things most often lacking in programmers.

I have to fight with "seasoned pros" on the regular to get them to stop doing things like sending passwords in email or worse, put them in text files in git. Because A) what the hell are you thinking and B) holy shit, we're a public company WHAT THE HELL ARE YOU THINKING!? You have to explain to them, repeatedly, why this is bad and also things like why a production-ready database isn't the same as the single-instance point and click AMI they spun up...

All of this because the only virtue that they know is "I shouldn't be blocked by anything." Unfortunately some of them are such skilled programmers that they'll drag entire IT and GRC organizations screaming in their wake trying to make sense of the mess.


> things like sending passwords in email or worse, put them in text files in git.

This doesn't sound like the activity of someone with a high level of 'raw programming ability'.


Being a great programmer doesn't stop you from doing really stupid shit.


>Delivering business value is a more desired and compensated skill than raw programming ability

Not so much. First, great programmers are very well compensated, and second, most tech companies are organized to keep great programmers from even considering business issues: That's the land of managers, product owners, and scrum leads. Programmers are supposed to implement the requirements they're given quickly. Not how it should be maybe, but how it generally is.


That's the way it is at Google and the FAANGs in general and companies that try to ape Google's practice but that's far from most companies. This is very much a selection bias. "Most companies" don't have the budget for the roles you've listed or aren't middle-manager heavy organizationally like Google is. Most of my career I've been my own project manager.

At least in my 20 years of experience, the most effective companies have developers (sometimes in those roles you listed) involved in the requirements gathering or at least work planning phases. Does anyone really enjoy just being a cog in the "feature factory" you described? I doubt it.

Given the number of broken Google SDKs or Cloud features I have to deal with in my day to day though (and game of whackamole that we have to play with them), this seems accurate.


Isn’t that part of the problem with Google? If they focused more on the actual user they wouldn’t have five failed messaging initiatives including three that they were working on simultaneously.

Developers not caring about the customer explains most of Google’s major failures outside of advertising.


> Delivering business value is a more desired and compensated skill

Desired for who? Your employer? Programmers are human beings that can do things for their own personal enjoyment, not just to increase shareholder value.


What’s the use of “programming skills” in the abstract that doesn’t serve anyone’s needs?

“programming” is just a tool to me - not an end goal. I’m just as proud of the code that I was smart enough not to write and use an existing product/service/module for as I am for the code that I did write.


This is me. Made some money myself, made a lot for bigco's, and constantly feeling that literally anyone else is better at writing software, actual programs than I am.

Also see: imposter syndrome :-)


Author here.

I agree with you! This list was about projects for learning. A few months ago I had a blog post that was more product focused, regarding the lessons I learned from releasing games (I think they're generalizable though). Definitely a different world when you have customers to please and motivate you!

http://web.eecs.utk.edu/~azh/blog/8lessons8games.html


Or better, just forget about business customers, concentrate on problem solving skills, and build tools just for programmers.

Basically I found out that the fewer customers you have and the further you are from business, the happier you are, i.e. considering you are still programming of course.

*Edit: For programmers focusing on businesses, the only happy solution, imo, is to be a consultant. But this basically requires business skills. Of course this is a biased view as I work as a BA, who deals with business every minute.


That’s what we do at work all day. Business problems tend to be technically shallow, except when they aren’t, and to handle those you need to keep your skills up.


Some problems are "technically straightforward", where it is "easy" to build but "hard" to find and work with customers. To-do lists are a classic example.

Others are technically challenging. That difficulty is precisely why others don't tackle the problems, but generally customers are looking for a solution and you have a ton of flexibility when working with customers.

To put it in different terms, in some cases "if you build it, they will come" is true. The hard part is determining when it applies.


So I'm someone who codes for their own use, what I write isn't meant for customers, it's just tools that make the machine do what I need. Programming as a way of using the computer.

However I also used to do some freelance web development back in the day, and sure enough I did learn things. Mostly things about customers, not about programming. They don't know what they want, they don't know what is good and they need help with that. This is called requirements engineering, and IMVHO starting a project that has customers is the literal worst moment to start learning that. Fortunately I had a class about it in uni and I happen to be great at explaining technical things to non-technical people. I also learned that I'm actually really good at what I did (looks great + works well + happy customers), they gave loads of recommendations and I got requests for at least a year after. Too bad about the burnout that happened soon after, for no particular reason, that turned my life upside down.

Either way, yes it was educational but in no way did it make me a better programmer. Maybe a better business person? (And I'm not a very good business person)

All in all, I think I would recommend "a project that has customers" only to people who are young. The stakes are lower when you're young and it's chock full of wonderful learning opportunities for young people. If you're older, I think you would have gotten most of the lessons through general life experience anyway. Which again demonstrates how little this exercise has to do with programming. But yeah, it can definitely be a valuable experience.


I don't have experience with a side project that has customers, but several times I did do a side project for a paying client.

It was one of the most stressful periods of my life. I was picking them up with another programmer, and I completely failed to understand the amount and the quality they would deliver. I solved it by working day and night for a year or two, or so.

I'd say, do try a project that has customers. Think twice about a project for a client.


Wait, what's the difference between a customer and a client? I'm not a native speaker and always thought the words meant the same thing (they have the same root in my language).

I thought "client" was just the fancier term, therefore you have customers at a book shop and clients at a law firm. Does a hairdresser or tattoo artist have customers or clients?


I'm not a native speaker either, maybe that causes the problem :D

I meant the following: customers -> group of people who buy or subscribe to your application, preferably via the App Store, Google Play or some other in-between party. Usually no contract involved.

Paying client -> a single person or business that pays you to develop software, probably with a contract that is signed and specifies what you're going to build, on which timeline, etc.


I agree with that and would add that dev tools are a manageable way for programmers early in their career to get feedback and learn what matters.

It’s pretty hard for one person to solve enough of a problem to have users in the general public. It obviously happens, but it’s not a sure thing.

It’s hard to get users that are not in the public too. I had internships early in my career working on IT stuff, at a large business and a university. I found those jobs were actually terrible for getting feedback from users because there was so much bureaucracy. The management would actually insulate you too much from complaints or feedback.

Dev tools are a good learning experience because programmers will give honest feedback if they don’t like something :)


I think the takeaway is that you shouldn't get feedback from users filtered through management. It's exactly the same problem that your "write dev tools for programmers" solution solves. You like it better because you expect you'll get the feedback from the programmers directly instead of through management. If that feedback had to filter through someone's supervisor before it landed in your inbox, it would be just about as useless.


How many programmers don’t work for companies with customers?


Definition of programmer is different from developer.


> The biggest challenge is figuring out how to store the text document in memory. My first thought was to use an array, but that has horrible performance if the user inserts text anywhere other than the end of the document.

The counter-argument to that is that processors are ridiculously fast in human timescales --- copying memory at gigabytes per second --- so unless you're focusing your use-case on editing correspondingly huge files, there's no real need to make your implementation more complicated than a single array. Even when DOS machines with <640K of memory and memcpy() speeds in the low MB/s were the norm, people edited text files of similar sizes, with editors that used a single array buffer, and for that purpose they weren't noticeably slower than ones today.
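
For reference, a minimal sketch of the single-array approach being argued for here, assuming Python's `bytearray` stands in for the flat buffer. The class name `FlatBuffer` is made up for illustration, and positions are byte offsets rather than characters. Every insert shifts the tail of the array, which is the O(N) copy under discussion.

```python
class FlatBuffer:
    """Text stored in one contiguous bytearray; inserts shift the tail."""

    def __init__(self, text: str = "") -> None:
        self.data = bytearray(text.encode("utf-8"))

    def insert(self, pos: int, text: str) -> None:
        # Slice assignment moves everything after pos, much like memmove() in C.
        self.data[pos:pos] = text.encode("utf-8")

    def delete(self, pos: int, length: int) -> None:
        # Deleting closes the hole by shifting the tail back down.
        del self.data[pos:pos + length]

    def text(self) -> str:
        return self.data.decode("utf-8")


buf = FlatBuffer("hello world")
buf.insert(5, ",")
buf.delete(0, 1)
buf.insert(0, "H")
print(buf.text())  # Hello, world
```

Loading and saving are a single read or write of `data`, which is the simplicity the comment is pointing at.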

My ideas for challenging projects are a little less open-ended, so they exercise a different set of skills: being able to implement a specification correctly and efficiently.

    - TTF renderer
    - GIF or JPEG en/decoder
    - Video decoder (start with H.261)

IMHO being able to consume existing content with code you wrote is very rewarding.


Check out: https://en.wikipedia.org/wiki/Rope_(data_structure)

>> A rope, or cord, is a data structure composed of smaller strings that is used to efficiently store and manipulate a very long string. For example, a text editing program may use a rope to represent the text being edited, so that operations such as insertion, deletion, and random access can be done efficiently.
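
A rough sketch of the idea in the quoted definition, not a production rope: leaves hold short strings, inner nodes hold the combined length, and an insert is a split followed by two concatenations. The `Rope` class and its method names are assumptions made for this example, and there is no rebalancing, which a real implementation would need to keep operations logarithmic.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Rope:
    left: Optional["Rope"] = None
    right: Optional["Rope"] = None
    text: str = ""     # only used by leaves
    length: int = 0    # total characters under this node

    @staticmethod
    def leaf(text: str) -> "Rope":
        return Rope(text=text, length=len(text))

    @staticmethod
    def concat(a: "Rope", b: "Rope") -> "Rope":
        return Rope(left=a, right=b, length=a.length + b.length)

    def is_leaf(self) -> bool:
        return self.left is None

    def split(self, pos: int) -> Tuple["Rope", "Rope"]:
        """Return two ropes holding [0, pos) and [pos, length)."""
        if self.is_leaf():
            return Rope.leaf(self.text[:pos]), Rope.leaf(self.text[pos:])
        if pos <= self.left.length:
            a, b = self.left.split(pos)
            return a, Rope.concat(b, self.right)
        a, b = self.right.split(pos - self.left.length)
        return Rope.concat(self.left, a), b

    def insert(self, pos: int, text: str) -> "Rope":
        # Only the leaf containing pos is copied; the rest of the tree is reused.
        a, b = self.split(pos)
        return Rope.concat(Rope.concat(a, Rope.leaf(text)), b)

    def __str__(self) -> str:
        if self.is_leaf():
            return self.text
        return str(self.left) + str(self.right)


r = Rope.leaf("Hello world")
r = r.insert(5, ",")
r = r.insert(r.length, "!")
print(r)  # Hello, world!
```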


I just read from somewhere that VSCode uses a "piece tree" for code strings. Damn couldn't find the link...
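
For anyone curious what that looks like, here is a hedged sketch of a plain piece table, the simpler structure that a piece tree builds on. It assumes the tree variant keeps the same pieces in a balanced tree instead of a list, and it is not VSCode's actual code. `PieceTable` and its fields are names invented for the example.

```python
from typing import List, Tuple

Piece = Tuple[str, int, int]  # (which buffer, start offset, length)


class PieceTable:
    def __init__(self, original: str) -> None:
        # The original text is never modified; new text goes to the "add" buffer.
        self.buffers = {"orig": original, "add": ""}
        self.pieces: List[Piece] = [("orig", 0, len(original))] if original else []

    def insert(self, pos: int, text: str) -> None:
        if not text:
            return
        new_piece = ("add", len(self.buffers["add"]), len(text))
        self.buffers["add"] += text  # conceptually append-only
        offset = 0
        for i, (buf, start, length) in enumerate(self.pieces):
            if pos <= offset + length:
                cut = pos - offset
                # Split the piece at the insertion point, dropping empty halves.
                replacement = [(buf, start, cut), new_piece,
                               (buf, start + cut, length - cut)]
                self.pieces[i:i + 1] = [p for p in replacement if p[2] > 0]
                return
            offset += length
        self.pieces.append(new_piece)  # empty table, or pos past the end

    def text(self) -> str:
        return "".join(self.buffers[b][s:s + n] for b, s, n in self.pieces)


pt = PieceTable("Hello world")
pt.insert(5, ",")
pt.insert(12, "!")
pt.insert(0, ">> ")
print(pt.text())  # >> Hello, world!
```

An insert only edits the list of pieces, so its cost depends on the number of edits made so far rather than on the size of the file.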



Thanks yeah that's the name


> there's no real need to make your implementation more complicated than a single array

Yeah, good luck enabling line numbers in such an editor.

In Emacs, which uses a gap-buffer for storing text, line numbers have been notoriously slow. It's gotten a bit better lately, but suffice to say, a naïve flat array / gap-buffer approach is not good enough for some relatively common scenarios even on modern hardware.
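
For readers who have not met the structure, a gap buffer can be sketched as below. This is an illustrative toy, not how Emacs implements it: `GapBuffer`, the default capacity of 16, and the doubling growth policy are all assumptions made for the example. Edits at the gap are cheap, and the cost shows up when the gap has to move.

```python
class GapBuffer:
    def __init__(self, text: str = "", capacity: int = 16) -> None:
        capacity = max(capacity, len(text) + 1)
        self.buf = list(text) + [""] * (capacity - len(text))
        self.gap_start = len(text)  # first free slot
        self.gap_end = capacity     # one past the last free slot

    def _move_gap(self, pos: int) -> None:
        # Shift characters across the gap until the gap starts at pos.
        while self.gap_start > pos:
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def _grow(self) -> None:
        # Double the storage by widening the gap in place.
        extra = len(self.buf)
        self.buf[self.gap_end:self.gap_end] = [""] * extra
        self.gap_end += extra

    def insert(self, pos: int, text: str) -> None:
        self._move_gap(pos)
        for ch in text:
            if self.gap_start == self.gap_end:
                self._grow()
            self.buf[self.gap_start] = ch
            self.gap_start += 1

    def delete(self, pos: int, count: int) -> None:
        # Deleting just widens the gap over the removed characters.
        self._move_gap(pos)
        self.gap_end = min(self.gap_end + count, len(self.buf))

    def text(self) -> str:
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])


g = GapBuffer("hello world")
g.insert(5, ",")
g.delete(0, 1)
g.insert(0, "H")
print(g.text())  # Hello, world
```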


I suspect that slowness is due to something else; remember that computers these days can execute a few billion instructions per second.

I've written code to do word wrapping, and it was surprising how fast it was. Line numbers are similarly complex.


We expect modern computers to do something more than just run a full-screen text editor.


I don't think there should be a problem with line numbers. I would make two helper arrays containing the indices of the new-line characters, corresponding to the two gap buffer text arrays (new-line positions are sorted ascending for the first array, descending for the second array).
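
One possible reading of the two-helper-array idea, as a sketch under assumptions: the gap buffer is modelled as two stacks (`before` and a reversed `after`), and each stack gets its own list of newline offsets. Because `after` is stored reversed, its offsets ascend in storage order, which is descending in document order. All names here are invented for the example.

```python
from typing import List


class LineIndexedGapBuffer:
    def __init__(self, text: str = "") -> None:
        self.before: List[str] = []     # text left of the gap
        self.after: List[str] = []      # text right of the gap, stored reversed
        self.nl_before: List[int] = []  # newline offsets in `before`
        self.nl_after: List[int] = []   # newline offsets in reversed `after`
        for ch in text:
            self.insert(ch)

    def insert(self, ch: str) -> None:
        if ch == "\n":
            self.nl_before.append(len(self.before))
        self.before.append(ch)

    def move_left(self) -> None:
        if not self.before:
            return
        ch = self.before.pop()
        if ch == "\n":
            # The newline changes sides, so its index entry does too.
            self.nl_before.pop()
            self.nl_after.append(len(self.after))
        self.after.append(ch)

    def move_right(self) -> None:
        if not self.after:
            return
        ch = self.after.pop()
        if ch == "\n":
            self.nl_after.pop()
            self.nl_before.append(len(self.before))
        self.before.append(ch)

    def cursor_line(self) -> int:
        return len(self.nl_before) + 1  # 1-based, constant time

    def line_count(self) -> int:
        return len(self.nl_before) + len(self.nl_after) + 1

    def line_start(self, line: int) -> int:
        """Document offset where the given 1-based line begins."""
        k = line - 2  # index of the newline that ends the previous line
        if k < 0:
            return 0
        if k < len(self.nl_before):
            return self.nl_before[k] + 1
        idx = self.nl_after[-1 - (k - len(self.nl_before))]
        return len(self.before) + (len(self.after) - 1 - idx) + 1


b = LineIndexedGapBuffer("ab\ncd\nef")
for _ in range(5):
    b.move_left()         # cursor now sits at the start of "cd"
print(b.cursor_line())    # 2
print(b.line_start(3))    # 6
```

Moving the gap by one character touches at most one entry in each index, so the bookkeeping costs no more than the move itself.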

Speaking as someone who's gone all the way from implementing a Red-black tree to making a rope data structure using the RB tree, to making a text editor that can edit almost arbitrarily large text files (dozens of gigabytes) without user-perceivable latency ;-)


But line numbers are trivial to update when a gap buffer needs a move.

A list of strings is more elegant, of course, where only the line being edited becomes a gap buffer. It taxes the allocator a bit more, though, which might be a concern on computers of the time when Emacs was born.


That's a problem to deal with when/if you need to add line numbers. Not a minute before that!


While that is true to an extent, I've made a lot of money cleaning up after people who didn't architect and design their code to cleanly grow into a fairly obvious potential use case, requiring major rewrites. It isn't a premature optimization to avoid walling yourself into a corner.


This is a complex and nuanced topic.

I agree strongly with designing your code so it's easily changeable into whatever new features are needed. This is much easier said than done, and I don't know if anyone has written well about the tricks of that trade.

But anyway, if you have that kind of code, swapping out whatever you need to make line numbers happen is no more work later than sooner.

Code bases with features implemented that are never used, but you still have to keep working through all changes, because someone imagines it will be a real requirement someday, are what my nightmares are made of.


It's all about the interfaces. More performant solutions require (in general) more complex interfaces.

If your application has grown as long as it could with the simple implementation, and now it is all too slow, chances are there's a lot of code depending on the interface. If your interface (and the implementation) is too simplistic, then all of that code will need rearchitecting, too.


IIRC many DOS-era editors used an array/list of line buffers, which to me seems like a good middle-ground solution. Certainly for today's typical use case where you care about performance, i.e. editing a >100MB text data file by hand, which is a giant pain in Emacs because the gap buffer simply is not a good structure for doing a few simple edits across three places 10MB apart (you spend most of the time moving the gap around, touching essentially all of the memory).


Do you know of any reason why the gap buffer couldn't be easily replaced by a proper rope?


As far as how the Emacs API works, it should not be an issue, but on the other hand there is a bunch of elisp code that really expects a gap-buffer-based implementation and is in turn depended on by who knows what...


I don't edit really huge files that often (maybe a couple of times a week), but when I do I want to be able to use the same editor I use for everything else. A really great text editor is fast and flexible and powerful regardless of the size of the file you're trying to edit.


That's one issue I have with these proposed programming tasks.

You are not going to write a really great text editor as a learning exercise. It has been done by better programmers who had a better overview of the problems, over thousands of man-hours.

This automatically means the task is as useless as a gameboy emulator or basic compiler. The underlying "Things to learn" points are good, but tasks themselves are not.


Writing experimental text editors for fun in various programming languages has been one of the most rewarding learning exercises of my life.

It's not really clear what your point is. You say the task is "useless"—what does that mean? Personally I can say that you are categorically wrong, because the skills I gained building things that are not completely new ideas fueled my passion for programming and opened up doors for me that otherwise would have remained closed. Even if I didn't still use a lot of these projects myself (because I built them to fit me), the value I derived from them would still be significant in the "grand" scheme of my life.

If a programmer is excited about the idea of writing her own text editor, what would you suggest she build instead that will sustain that same excitement and offer exploration into the same diverse subject matter but also satisfy your nebulous criterion of not being "useless"?


> there's no real need to make your implementation more complicated than a single array.

I think you misunderstand the concept of an array. An array has 1) an interface that is easy to use. On the other hand, by definition, an array is 2) contiguous in memory. Property 1 is good but 2 can cause problems. I think you want only 1.

The solution is to create a data type that has the interface of an array but a different implementation under the hood. You can have a linked-list of arrays, a tree of strings, etc.
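
As a hedged illustration of that suggestion, here is a sequence-like type backed by a list of small chunks rather than one contiguous block. `ChunkedText` and the deliberately tiny `CHUNK` size are assumptions for the example. Indexing walks the chunk list linearly, which a real implementation would replace with a tree or an offset index.

```python
class ChunkedText:
    CHUNK = 4  # tiny on purpose so the example splits; real code would use KBs

    def __init__(self, text: str = "") -> None:
        self.chunks = [list(text[i:i + self.CHUNK])
                       for i in range(0, len(text), self.CHUNK)] or [[]]

    def _locate(self, index: int, for_insert: bool = False):
        # Find (chunk number, offset inside chunk) for a document index.
        for n, chunk in enumerate(self.chunks):
            limit = len(chunk) + 1 if for_insert else len(chunk)
            if index < limit:
                return n, index
            index -= len(chunk)
        raise IndexError("index out of range")

    def __len__(self) -> int:
        return sum(len(c) for c in self.chunks)

    def __getitem__(self, index: int) -> str:
        n, offset = self._locate(index)
        return self.chunks[n][offset]

    def insert(self, index: int, ch: str) -> None:
        n, offset = self._locate(index, for_insert=True)
        chunk = self.chunks[n]
        chunk.insert(offset, ch)       # only this small chunk shifts
        if len(chunk) > self.CHUNK:    # split an overfull chunk in two
            half = len(chunk) // 2
            self.chunks[n:n + 1] = [chunk[:half], chunk[half:]]


t = ChunkedText("abcdefgh")
t.insert(2, "X")
t.insert(9, "!")
print("".join(t[i] for i in range(len(t))))  # abXcdefgh!
```

Callers only see `len`, indexing and `insert`, so the storage behind them can change without touching the rest of the editor.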


I think the original commenter knows full well what an array is.

Vague justifications like "can cause problems" are probably exactly what he's referring to, in fact - people who know that inserting elements into an array is "slow" and end up making large and complex code as a result. Yes, it's O(N) on the length of your code, but the point is that for a couple of megs of text, O(N) is perfectly acceptable.

At least on a desktop, that'll fit in L3 cache which these days is around 175GB/sec. Or to put it another way, inserting that single char can probably be done at around 40,000 times per second. Which is faster than I can type, at any rate.
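
The back-of-the-envelope figure can be reproduced as follows, under assumptions that are not stated in the comment: that "a couple of megs" means 2 MB, that the worst case moves the whole file, and that a move costs one read plus one write of every byte. The function name is made up for the example.

```python
def inserts_per_second(bytes_moved: int, bandwidth_bytes_per_s: float) -> float:
    # A move reads each byte once and writes it once.
    traffic = 2 * bytes_moved
    return bandwidth_bytes_per_s / traffic


rate = inserts_per_second(2_000_000, 175e9)
print(round(rate))  # 43750
```

That lands in the same ballpark as the 40,000 per second quoted above; with different assumptions about file size or bandwidth the number scales linearly.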


You'd be correct if people used editors for opening only source code files. The problem is that people usually open data files too, which can be not only larger than L3 cache, but larger than the entire system memory. The magic of a good editor like Vim is the capability to handle such files.

The other problem with your comment is the support for Undo operation. Even if you use a flat array, you need a more sophisticated data structure for storing previous changes. Storing a separate array for every single change is not an option.
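
One common way to avoid a full copy per change, sketched here under assumptions, is to log each edit as (position, removed text, inserted text) and invert it on undo. `UndoableText` is a made-up name, and the text itself is kept in a plain Python string to keep the example short.

```python
from typing import List, Tuple

Edit = Tuple[int, str, str]  # (position, removed text, inserted text)


class UndoableText:
    def __init__(self, text: str = "") -> None:
        self.text = text
        self.undo_stack: List[Edit] = []
        self.redo_stack: List[Edit] = []

    def _apply(self, pos: int, removed: str, inserted: str) -> None:
        self.text = self.text[:pos] + inserted + self.text[pos + len(removed):]

    def replace(self, pos: int, length: int, inserted: str) -> None:
        removed = self.text[pos:pos + length]
        self._apply(pos, removed, inserted)
        self.undo_stack.append((pos, removed, inserted))
        self.redo_stack.clear()  # a fresh edit invalidates the redo history

    def undo(self) -> None:
        if self.undo_stack:
            pos, removed, inserted = self.undo_stack.pop()
            self._apply(pos, inserted, removed)  # swap roles to invert the edit
            self.redo_stack.append((pos, removed, inserted))

    def redo(self) -> None:
        if self.redo_stack:
            pos, removed, inserted = self.redo_stack.pop()
            self._apply(pos, removed, inserted)
            self.undo_stack.append((pos, removed, inserted))


doc = UndoableText("hello world")
doc.replace(0, 5, "goodbye")
doc.replace(len(doc.text), 0, "!")
doc.undo()
doc.undo()
print(doc.text)  # hello world
doc.redo()
print(doc.text)  # goodbye world
```

Each log entry stores only the text that changed, so memory grows with the size of the edits rather than with the size of the document.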


Whether an array is contiguous in memory depends on the language (and the specific implementation of that language). JavaScript uses hash tables for its arrays which are really objects.


Good point! Dynamic languages are different in their terminology. AFAIK strongly typed languages have a clear definition of arrays. The OP was talking about arrays having "horrible performance if the user inserts text anywhere other than the end of the document". I think this statement has an implicit assumption that arrays are contiguous, which is not true in JavaScript.


> Dynamic languages are different in their terminology

By different you mean wrong. PHP calling an ordered hash map an array doesn't make it one.


I wonder what a text editor made by HN would be like, everyone is already thinking up strategies :)


But did you try editing a multi-megabyte file with that method? Cause I have seen enough editors struggle with big files (especially if the file is a single line and you're going through it with word-wrap on), that I think the basic straight-forward approach already isn't sufficient on such workloads.

from the article:

> Luckily, there are some nice data structures to learn to solve this.

You could have also learned a new data structure!

I mean, it should be obvious that "this thing that the cursor does when moving lines" isn't the big takeaway from this challenge. It's almost cute that the author never noticed it (as a programmer), because I actually use that behaviour to navigate code sometimes. Who hasn't done a quick arrow-left/right to make the cursor lose its memory of which column it used to be on?

> Even when DOS machines with <640K of memory and memcpy() speeds in the low MB/s were the norm, people edited text files of similar sizes, with editors that used a single array buffer, and for that purpose they weren't noticeably slower than ones today.

No way. Every reasonably performant text editor in those days used special data structures and not just an array. Imagine having to copy the entire buffer on each key press (so, when inserting at the start of the file). Believe me, on a 640K DOS machine you'll feel that.

This isn't new stuff, I learned about these data structures in uni -- except I don't remember them because back then I was young and arrogant and didn't think you'd need these fancy data structures for something as simple as an editor :) :)

... but if you never tried to write one, it's hard to see in what ways these editors are not as easy as you think.


I don't see any problem with an array. Make it huge so you only have to reallocate every megabyte or so. Keep track of the document length and only move as much as needed. Your processor can do this every character faster than you can type. No need for fancy data structures, and trivial to load and save files. The interesting part then becomes formatting for the display.
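A rough sketch of that scheme, assuming a byte buffer with spare capacity at the end that grows in 1 MiB steps. The names and the chunk size are illustrative:

```python
class ArrayBuffer:
    """Single contiguous buffer with spare capacity at the end."""

    CHUNK = 1 << 20  # grow in 1 MiB steps (assumed from "every megabyte or so")

    def __init__(self, data=b""):
        self.length = len(data)                       # document length
        self.buf = bytearray(self.length + self.CHUNK)
        self.buf[:self.length] = data

    def insert(self, pos, data):
        n = len(data)
        if self.length + n > len(self.buf):
            # Reallocate only when the spare capacity is used up.
            self.buf.extend(bytes(n + self.CHUNK))
        # Shift only the tail [pos, length) to the right by n bytes.
        self.buf[pos + n:self.length + n] = self.buf[pos:self.length]
        self.buf[pos:pos + n] = data
        self.length += n

    def contents(self):
        return bytes(self.buf[:self.length])


buf = ArrayBuffer(b"hello world")
buf.insert(5, b",")
buf.insert(0, b">> ")
```

The cost of each insert is proportional to the number of bytes after the insertion point, so typing at the end is cheap and typing at the start moves the whole document.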


Imagine inserting text in the middle of a 1GB file. Moving 500MB of data will definitely take longer than 18ms, and thus will cause at least some visible lag.


This is an editor for textual documents. Where did you get a 1GB file?


For example a server log file?


Not a document and why are you editing it?


This. The former should be using awk/perl on that and just operating on the chunks he found, never on the whole file at once.

But, OFC, these new "programmers" can't even figure basic Unix tools. Or performance.


Try a 100MB file.


Takes less than 100 ms to read in the file (calling realloc in a loop), insert a byte in the middle (realloc + memmove), and write the modified file out on stdout. The byte insertion amounts to about 4ms.

That's hardly fast, yet still a lot snappier than most modern editors' UI or the web, where apparently achieving 60 fps for a few hundred dynamic DOM nodes is some kind of an achievement.

https://gist.github.com/hmkemppainen/376b973c568fc122e2d8c84...

This approach really starts to suck when you implement macros that are going to perform a lot of one-char inserts quickly. Or when you're editing multi-gigabyte files.


I must admit I was surprised, although I shouldn't be. Are we at > 10GB/s memory bandwidth now?

> This approach really starts to suck when you implement macros that are going to perform a lot of one-char inserts quickly. Or when you're editing multi-gigabyte files.

I'm working on an editor that I've optimized for such cases. In a test it made random edits to a 4GB file in < 50 microseconds. But, it cost a load of sweat and blood to get that rope data structure right. And it loads files only at about 100MB/s (should optimize for bulk inserts). https://github.com/jstimpfle/astedit


> Are we at > 10GB/s memory bandwidth now?

It's been around a decade since that line was crossed. The peak bandwidth of DDR3-1333 is just a bit over 10GB/s.
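The arithmetic behind both numbers, as a quick sketch. It assumes a single 64-bit DDR3 channel and takes the 100MB / 4ms figures from the test upthread at face value:

```python
# DDR3-1333 performs about 1333 million transfers per second
# over a 64-bit (8-byte) wide channel.
transfers_per_second = 1333e6
bytes_per_transfer = 8
peak_bandwidth = transfers_per_second * bytes_per_transfer  # bytes/second

# The 100MB test moved roughly half the file (50MB) in about 4ms.
observed_bandwidth = 50e6 / 4e-3

print(f"DDR3-1333 peak per channel: {peak_bandwidth / 1e9:.1f} GB/s")
print(f"implied by the 4ms insert:  {observed_bandwidth / 1e9:.1f} GB/s")
```

The second figure counts only the bytes moved; a memmove both reads and writes them, so the actual memory traffic would be about double.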


Interesting project. You don't say so specifically, but it looks like it should compile on both Windows and Linux?


Yes, I make sure it compiles on both platforms from time to time. The current commit should compile using MSVC, gcc, and clang I believe. I'm happy to fix any issues if you find them :-)


>> This approach really starts to suck when you implement macros that are going to perform a lot of one-char inserts quickly.

What operation is that? Search and replace might have that effect but could be done by copying the entire buffer with replacement happening along the way.
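A sketch of the "copy with replacement along the way" idea, assuming the buffer is a plain string. It makes one left-to-right pass and builds a new buffer, instead of doing one in-place insert per match:

```python
def replace_all(text, needle, replacement):
    """Build a new buffer in one left-to-right pass instead of editing in place."""
    if not needle:
        raise ValueError("needle must be non-empty")
    pieces = []
    start = 0
    while True:
        hit = text.find(needle, start)
        if hit == -1:
            pieces.append(text[start:])   # copy the remaining tail
            break
        pieces.append(text[start:hit])    # copy the unchanged run
        pieces.append(replacement)        # emit the replacement
        start = hit + len(needle)
    return "".join(pieces)
```

Total work is proportional to the output size, regardless of how many matches there are. A macro that does arbitrary one-char inserts at scattered positions cannot always be batched this way.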


> The counter-argument to that is that processors are ridiculously fast in human timescales

Until you actually have to implement your algorithm on a mobile device that is both memory and power constrained and that doesn’t have a swap file. The OS will either kill your program for being too memory or power inefficient, kill another program running in the background (not a great user experience) and/or force the use of the high power cores, using unnecessary battery life when a more efficient algorithm could have used the lower power cores.

Attitudes like this also explain why developers don’t think twice about delivering battery consuming Electron apps.


Umm the dos 640k were paged, non contiguous. Additionally, smoothly editing larger texts back then required some clever linked lists of blocks to give a truly instantaneous editing experience for inserting text at the beginning of a large text. Those were the 286/386 days.

Today you have fancy rendering, and an instantaneous editing experience for that reason again suggests a more sophisticated data structure for the editor.

Which all text editors have, when you look inside vi/emacs/nano/whatever...


> so unless you're focusing your use-case on editing correspondingly huge files, there's no real need to make your implementation more complicated than a single array.

I think the other major corner case is when you need concurrent, distributed editing (although that's not popular or anything these days), in which case an array is a very poor datastructure.


TIF and GIF are trivial compared to JPEG and H.261, saying that having implemented LZW and GIF (in assembly for 8086) as a teenager


For TIFF (and most formats) that's heavily dependent on if you're talking about implementing a reader or a writer for the format. TIFF readers need to handle JPEG streams, so in that sense implementing a general purpose TIFF reader is a superset of implementing a general purpose JPEG reader.

On the other hand, TIFF writers can (very conveniently!) be almost as simple as you want, including no compression at all, just blobs of raw pixel values, and a smattering of tags for width, height, pixel format, and that's it. The only thing simpler to output IMO would be uncompressed ASCII formats like XPM.

So in that sense you're correct- the simplest possible JPEG writer is much more complicated than the simplest possible TIFF writer, but TIFF in general is extensible to a fault (arguably), in the sense that the number of possible combinations of pixel and metadata encodings you have to prepare yourself for when opening arbitrary .tif files are far greater than when opening arbitrary .jpg files, including JPEGs within TIFFs.
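The "almost as simple as you want" writer described above can be sketched roughly as follows: one uncompressed 8-bit grayscale strip plus a handful of tags. The function name is made up, and the resolution tags that strict baseline TIFF also lists as required are left out for brevity, so a picky reader might object:

```python
import struct


def encode_gray_tiff(width, height, pixels):
    """Encode 8-bit grayscale pixels (row-major bytes) as an uncompressed TIFF."""
    if len(pixels) != width * height:
        raise ValueError("pixels must contain width * height bytes")

    SHORT, LONG = 3, 4
    tags = [                       # (tag, type, value), in ascending tag order
        (256, LONG, width),        # ImageWidth
        (257, LONG, height),       # ImageLength
        (258, SHORT, 8),           # BitsPerSample
        (259, SHORT, 1),           # Compression: none
        (262, SHORT, 1),           # PhotometricInterpretation: black is zero
        (273, LONG, 0),            # StripOffsets, filled in below
        (278, LONG, height),       # RowsPerStrip: whole image in one strip
        (279, LONG, len(pixels)),  # StripByteCounts
    ]
    ifd_offset = 8                                     # directly after the header
    data_offset = ifd_offset + 2 + 12 * len(tags) + 4  # directly after the IFD

    out = bytearray(struct.pack("<2sHI", b"II", 42, ifd_offset))  # little-endian header
    out += struct.pack("<H", len(tags))                # number of directory entries
    for tag, kind, value in tags:
        if tag == 273:
            value = data_offset
        if kind == SHORT:
            # A SHORT value sits in the first 2 bytes of the 4-byte value field.
            out += struct.pack("<HHIHH", tag, kind, 1, value, 0)
        else:
            out += struct.pack("<HHII", tag, kind, 1, value)
    out += struct.pack("<I", 0)                        # no further IFDs
    out += bytes(pixels)                               # raw pixel values
    return bytes(out)


tiff_bytes = encode_gray_tiff(4, 2, bytes(range(8)))
```

Writing the returned bytes to a file opened in binary mode is all that is left. A general-purpose reader has to cope with far more than this one layout, which is the asymmetry the comment describes.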


Back in the days TIF was just a large uncompressed file.

The initial format is older than GIF87a (no animation, which people associate GIF with nowadays). It had a header, but that's pretty much it. Of course the format developed with time and even added LZW once the patent expired. Currently TIF is all kinds of things, so writing a fully featured reader is a proper challenge (perhaps not coding-wise, but in understanding it and implementing the myriad types/extensions, etc.)


Spec implementation is indeed very valuable. But in your cases these are data oriented problems. I'd add some systemic cases: networking, security..

my 2 cents


Last I looked into writing a JPEG en/decoder, I ran into the issue that I was unable to find a specification not behind a $800 paywall.


Here you go:

https://www.w3.org/Graphics/JPEG/itu-t81.pdf

https://www.w3.org/Graphics/JPEG/jfif3.pdf

I'll also link Cristi Cuturicu's "A note about the JPEG decoding algorithm", which is where I started my decoder implementation from, and it was indeed a ton of fun.

http://www.opennet.ru/docs/formats/jpeg.txt


The JPEG reference source code is pretty readable.


"no real need to make your implementation more complicated than a single array"

That's our industry in a nutshell. Our computers, instead of becoming more capable over time, can barely keep pace with the increasing naivety of our programmers.


Yes, this is engineering in a nutshell: determining a course of action within a set of constraints that meets your objectives. Where constraints can be time, cost, physical limitations (processor speed, memory size, disk space), etc; and objectives can be functional (user can edit files), nonfunctional (user can edit large files in < X seconds, energy usage), personal learning, or any number of other requirements.

The GP offered a valid decision point to consider based upon what an engineer is solving for. I don’t think he said that an array was the solution he’d ship in a production text editor to millions of end-users.

Engineering is hardly naive. :)


Actually, that's exactly what I was saying --- plenty of existing text editors use the "stupid" single array, yet no one complains about their performance.

One example? Notepad.


Notepad is notepad because someone, god bless their soul, had the sense to put new features into a different app as Wordpad.

In some terrible, dark dimension, Notepad has a ribbon interface and supports PDFs.


Just because people tend not to edit large files in Notepad doesn't mean they'll complain about it when they do. Actual complaints are of course sparse because hardly anyone uses Notepad for anything serious if they can use an alternative. BUT when they do, oh they will complain.

I believe an older version of Notepad even had a (fairly low) limit on file size it would open.

I mean that's the reverse argument, computers have gigabytes of memory today, and are super fast, so you should be able to load a multi gigabyte text file and edit it, on a single line, with word wrapping.


In webdev, increasingly often, the expectation is that programmers not only do the backend, but also the database management, the frontend (which used to be graphic design, css/html, and js, separately) and everything devops.

Outside of webdev, Unity springs to mind, as another great example of this: The stuff you can do as a single game developer is mind boggling, or at least used to be, until indie devs everywhere started boggling our minds on a daily basis and thus raising the standard of what consumers expect an indie game to be.

This is, of course, not possible because within 50 years humans evolved to be a lot better or smarter or faster than their predecessors. It is made possible through more flexible higher level tooling, that you don't have to understand the inner workings of to take advantage of, and more abundant computing resources, that in tandem, enable work that will be in the "good enough" territory for most use cases.

This is also not a choice that programmers as individuals or even a group make. It's a choice that the market makes.

There is nothing naive about it. Naive is assuming, it would be any other way.


In web dev I have observed the opposite trend: when I first started my career everyone was expected to be full stack and know how to deploy a thing. Nowadays devs tend to be strictly front end or back end or dev ops, etc. Devs that can optimize a sql query, model a db schema and then write a well organized react or angular front end seem to be the exception not the rule.


> In webdev, increasingly often, the expectation is that programmers not only to do the backend, but also the database management, the frontend (which used to be graphic design, css/html, and js, separately) and everything devops.

What do you mean, increasingly often? This was the case 15 years ago already and I see only examples that it has gotten less, because of all the frameworks that exist.

Also it's exactly what I liked about webdev. When your existing talents for graphics design and explainer-of-technical-things shine in a tech context, that feels good. A lot of programmers have no feel for this, and a lot of designers write awful code. Which could have, but historically did NOT improve at all with higher level tooling, mainly because of this "good enough" attitude. Feel free to prove me otherwise, but what did happen: Thanks to things like Bootstrap, now programmers can avoid the worst design mistakes without having to learn design. Graphics Designers, however, well .. I don't know? Are there tools that allow them to write or generate code that doesn't suck? (Without programming skills, like the coders without design skills).

> This is also not a choice that programmers as individuals or even a group make. It's a choice that the market makes.

> There is nothing naive about it. Naive is assuming, it would be any other way.

I don't know ... Do you believe there no longer exist people that deliver quality over this entire skill set? Or that they somehow exist outside of the market?


People jump for complex and overly optimized solutions too quickly, IMHO. From a conceptual perspective, I enjoy these sort of challenges but that's where it ends.

For product demands where deadlines are constantly unrealistic, underfunded, underscoped and demands are ever changing, I'm a fan of providing the simplest conceptual solution to the task at hand and not focusing on developing complex abstractions and optimizations too early.

From my experience, that time is typically wasted until functionality is zeroed in and real money is available to pay for the work, as the early complex abstractions typically fail to meet pace with demands and the optimizations break when ever changing requirements.. change. That's just my experience, YMMV.


Are you suggesting that it’s bad to use the simple uncomplicated approach because it’s inefficient, or that it’s bad to add layers upon layers of complexity which end up bringing modern computers to their knees?

Personally I’m in the latter camp. There’s so many layers of abstraction nowadays which each in theory make programming better/safer/easier which in practice end up creating an incredibly inefficient mess.


Complexity != abstraction != leverage.

Today's software suffers from too many layers of complexity that are each pretty dumb and serve mostly bookkeeping. The result looks like an overinflated bureaucracy. In the example above, using a more efficient data structure for text representation will add at most one layer of abstraction (but there's a good chance you'd create that layer to hide the array anyway), but offer significant benefits in terms of performance, at a cost of little and well-isolated complexity.

This is the best kind of abstraction: complex, deep behavior hidden behind simple interface.


Same. I generally write in C without too many layers between my code and the CPU, and it is just incredible how fast modern CPUs are with naive code that doesn't even attempt to be optimal.

I wish others understood that, because the things I work on are losing performance (and a massive amount of developer time, which could be used for optimization or other useful work) to excess complexity, not too simplistic code.


Vim is 25 years old. Efficient text-handling data structures aren't newfangled gobbledygook.


And vi is even older. Plus ed...look at the release date.


Yet if you read the rest of the comment you would realize this specific use-case (editing text) was done fine with a single array buffer when computers had less than 1mb of memory to work with.

This is a perfect example of when it’s stupid to keep optimizing.


How does 1mb computer keep a 2mb textfile in an array in RAM?


Do you have text files you need to edit that are that big?

I've opened files that big in a text editor before, but it was definitely the wrong tool for the job.


pointers?


I wish that were our industry. Instead, we make things super complicated and make them slower at the same time.

Let's take the text editor example. Let's say we use it to edit a large document. Is Moby Dick large enough? It's around a megabyte of (pure) text. Let's figure out a persistence solution. How about "we save the entire text to disk"? So a megabyte to disk. My laptop's SSD does (large) writes at 2GB/s. So the ultra simple solution could save the entire text around 2000 times per second.

That's a lot faster than I can type.


Your laptop's SSD sure, 2GB/s - that 5400 rpm laptop hard disk that your user has is writing at a measly 1mb/s because the disk is also being accessed by 5 other programs.

Now the user is either queuing up a bunch of background saves leading to overload or is forced to wait 1s per keystroke.

Well done!

I guess the simple solution then is to tell the user to buy a $3000 laptop just so it's capable of running notepad.


Hmm...Mac laptops have been SSD-only for how many years?

Anyway, even laptop drives are well over 40-50 MB/s these days, and any disk scheduler worth its salt will schedule this kind of write (one contiguous chunk) near optimally, so still 40-50 writes/s.

And of course, you queue these writes asynchronously, dropping as needed, so if you actually manage to out-type your disk, all that happens is that your save-rate drops to every couple of characters. Big whoop.

Also remember that this is Moby Dick we're talking about. 700+ pages, so something that vastly exceeds the size of the kinds of documents people are likely to attempt with a Notepad class application.

Last but not least, this is a thought experiment to demonstrate just how incredibly fast today's machines are, and that if something is slow, it is almost certainly because someone did something stupid, often in the name of "optimization" that turned into pessimization, because Doing the Simplest Thing that Could Possibly Work™, i.e. brute-forcing, would have been not only simpler but significantly faster.
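A sketch of "queue these writes asynchronously, dropping as needed", assuming one background thread and a single pending slot where a newer snapshot simply overwrites an unsaved older one. The class name and the `write` callback are illustrative:

```python
import threading


class LatestWinsSaver:
    """Background saver that keeps only the most recent unsaved snapshot."""

    def __init__(self, write):
        self._write = write            # callable that persists one snapshot
        self._pending = None
        self._has_pending = False
        self._closed = False
        self._cond = threading.Condition()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def request_save(self, snapshot):
        """Called on every keystroke; overwrites any snapshot not yet written."""
        with self._cond:
            self._pending = snapshot
            self._has_pending = True
            self._cond.notify()

    def close(self):
        """Flush the last pending snapshot and stop the worker."""
        with self._cond:
            self._closed = True
            self._cond.notify()
        self._thread.join()

    def _run(self):
        while True:
            with self._cond:
                while not self._has_pending and not self._closed:
                    self._cond.wait()
                if not self._has_pending:
                    return             # closed and nothing left to write
                snapshot = self._pending
                self._has_pending = False
            self._write(snapshot)      # slow I/O happens outside the lock


written = []
saver = LatestWinsSaver(written.append)
for i in range(1000):
    saver.request_save(f"text after keystroke {i}")
saver.close()
```

If the disk keeps up, every keystroke gets saved; if it does not, intermediate snapshots are skipped and only the save rate drops, as described above.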


Raise your hand if just running your web browser has pegged your top-end, multiprocessor, high-mem system in the last month. Both Firefox and Chrome have for me.


I think that's largely due to JavaScript and its ecosystem of abstraction-bloat that has been mentioned in another comment here, along with the trend of "appifying" what should really be static sites. A static page that contains even dozens of MB of content won't stress a browser as much as a "web app" containing only a few hundred KB of countless JavaScript frameworks glued together --- despite the latter presenting a fraction of the actual content.


I used to read the blog on virtualdub.org (video capture and processing) and enjoy his rants on bundled library bloat. Virtualdub was small in footprint and great to use. So do programmers become reliant on scaffolding too much, or is it a necessity as you learn?


Why not both? I mean, I wouldn't say that it's actually necessary, but scaffolding exists to hide away the incidental complexities of the problem being solved, revealing the problem for what it is. Demonstrations of recursion and pattern matching tend to use the same problems because they're such a good fit that there's a very close correspondence between the high-level explanation of how to solve the problem and the code itself.

At the same time we ought to be aware of that scaffolding and how it works (or could work), and how to build such abstractions ourselves. Not just because all abstractions leak[1][2] and potentially introduce bloat, but also because it means I don't have to pull in another dependency to save me a page (or three lines) of trivial code. Or maybe because the "standard" solution doesn't quite support your use case (I can't count the number of times that I've rewritten python's lru_cache[3] because of it not accepting lists and dicts).

[1] https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-a...

[2] https://www.joelonsoftware.com/2001/12/11/back-to-basics/

[3] https://docs.python.org/3/library/functools.html#functools.l...
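For the lru_cache case, a rough sketch of that kind of rewrite. This is not the standard library's implementation: it is a small hand-rolled LRU that keys on a frozen copy of the arguments so lists and dicts are accepted. `freeze` and `lru_cache_any` are made-up names, and sets or other unhashable types are not handled:

```python
from collections import OrderedDict
from functools import wraps


def freeze(value):
    """Recursively convert lists and dicts into hashable stand-ins."""
    if isinstance(value, dict):
        return ("dict", frozenset((k, freeze(v)) for k, v in value.items()))
    if isinstance(value, list):
        return ("list", tuple(freeze(v) for v in value))
    if isinstance(value, tuple):
        return tuple(freeze(v) for v in value)
    return value


def lru_cache_any(maxsize=128):
    """Decorator: LRU cache keyed on frozen copies of the arguments."""
    def decorator(func):
        cache = OrderedDict()

        @wraps(func)
        def wrapper(*args, **kwargs):
            key = (freeze(args), freeze(kwargs))
            if key in cache:
                cache.move_to_end(key)      # mark as most recently used
                return cache[key]
            result = func(*args, **kwargs)
            cache[key] = result
            if len(cache) > maxsize:
                cache.popitem(last=False)   # evict the least recently used entry
            return result

        return wrapper
    return decorator


calls = []


@lru_cache_any(maxsize=2)
def total(values, weights=None):
    calls.append(values)
    weights = weights or {}
    return sum(v * weights.get(i, 1) for i, v in enumerate(values))


total([1, 2, 3])
total([1, 2, 3])                    # served from the cache
total([1, 2, 3], weights={0: 10})   # different key, computed again
```

Freezing walks the whole argument on every call, so this only pays off when the wrapped function is much more expensive than that walk.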


Not sure GB/s of throughput makes editor memory representations unimportant. Just this week my friend said he killed Emacs with a few-MB text file. I was astonished that software of that esteem would struggle with that.


In my first job I would routinely open 10-20 MB files in Emacs. It handled it just fine. I mean, it gives a warning that this is considered big, but I ignored it.

Now if you open a large text file in something other than text mode, it could bring it to its knees depending on the mode. As an example, opening an XML file in the nXML mode is quite expensive, because nXML mode is powerful and utilizes your XML structure. I just tried a 12 MB XML file and told it to go to the end of the file. It's taking Emacs forever to do it (easily over 30s). But if I switch to text mode for that same file, it handles it just fine.

I just tried an 800 MB text file. It handled it fine.

The one thing where you can easily get in trouble: Long lines. Emacs cannot handle long lines well. Kinda sad.


Are you sure text mode is fine? I usually have to use fundamental mode to edit big files (more than a dozen MBs or so).


Yup. Text mode is fine. If that's causing problems, perhaps you have things enabled in your config that causes problems?

As an example, I have anzu minor mode selected. So if I try to search in the 800MB file, it hangs until I cancel.


It's unlikely plain emacs struggled with that file.


Try `emacs -nw -q` for a stock experience. That should have no problem with any reasonable text file.


> there's no real need to make your implementation more complicated than a single array

Ugh. That's just offensive.


Not really, it's a commonly used scheme. Read on gap buffers: https://en.wikipedia.org/wiki/Gap_buffer

It's really an array with a gap at the cursor location. Used by emacs and others for decades.
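A minimal gap buffer sketch, to make "an array with a gap at the cursor" concrete. It uses a Python list of characters, which is an illustration only and not how Emacs implements it:

```python
class GapBuffer:
    """Text stored in one array with a movable gap at the cursor."""

    def __init__(self, text="", gap_size=16):
        self.buf = list(text) + [None] * gap_size
        self.gap_start = len(text)     # cursor position
        self.gap_end = len(self.buf)   # one past the last gap slot

    def __len__(self):
        return len(self.buf) - (self.gap_end - self.gap_start)

    def move_cursor(self, pos):
        """Slide the gap so that it starts at `pos`."""
        pos = max(0, min(pos, len(self)))
        while self.gap_start > pos:    # gap moves left, characters hop to its right
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:    # gap moves right, characters hop to its left
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def insert(self, ch):
        """Insert one character at the cursor; no shifting unless the gap is full."""
        if self.gap_start == self.gap_end:
            self._grow()
        self.buf[self.gap_start] = ch
        self.gap_start += 1

    def _grow(self):
        extra = max(16, len(self.buf))  # roughly double when the gap runs out
        self.buf[self.gap_end:self.gap_end] = [None] * extra
        self.gap_end += extra

    def text(self):
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])


gb = GapBuffer("helloworld")
gb.move_cursor(5)
for ch in ", ":
    gb.insert(ch)
```

Typing at the cursor fills the gap without moving anything; the copying cost is paid when the cursor jumps, in proportion to the distance jumped.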


In the article, I listed a rope, gap buffer, and piece table as potential solutions instead of "just an array".
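Of those three, the piece table is probably the least familiar, so here is a bare-bones sketch. It supports insert only and keeps the pieces in a plain list, where a real editor would likely use a tree; the names are illustrative:

```python
class PieceTable:
    """Document described as a list of spans over two buffers."""

    def __init__(self, original=""):
        self.original = original   # the loaded file, never modified
        self.added = ""            # everything typed so far, append-only
        # Each piece is (buffer_name, start, length).
        self.pieces = [("original", 0, len(original))] if original else []

    def insert(self, pos, text):
        """Insert `text` at document offset `pos` by splitting one piece."""
        if not text:
            return
        new_piece = ("added", len(self.added), len(text))
        self.added += text
        offset = 0
        for i, (name, start, length) in enumerate(self.pieces):
            if pos <= offset + length:
                split = pos - offset
                replacement = []
                if split > 0:
                    replacement.append((name, start, split))
                replacement.append(new_piece)
                if split < length:
                    replacement.append((name, start + split, length - split))
                self.pieces[i:i + 1] = replacement
                return
            offset += length
        self.pieces.append(new_piece)  # empty document, or pos past the end

    def text(self):
        buffers = {"original": self.original, "added": self.added}
        return "".join(buffers[name][start:start + length]
                       for name, start, length in self.pieces)


pt = PieceTable("hello world")
pt.insert(5, ",")
pt.insert(0, ">> ")
```

The original text is never copied or shifted: an insert only appends to the add buffer and splits one piece, so the cost depends on the number of pieces and not on the file size.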


VSCode uses a JS piece tree, which allows it to be faster* at manipulating large files than Sublime’s native code implementation.

https://code.visualstudio.com/blogs/2018/03/23/text-buffer-r...

*or at least competitive with. I’ve measured it to be faster but I’ve heard others have had different experiences


From a quick read it's actually pretty close to exactly what Sublime Text does. Source: Worked on that code at sublime recently.


But that's massively better than just an array. It changes the time complexity from O(char_inserts x file_len) to O(cursor_move x file_len), which is likely a couple orders of magnitude better.


That's not "just an array".


How big a file can a gap buffer be used on before it starts to slow down with current hardware?


That's not a single array though (per the OP's point)


Kind of related, there's this list of "Build Your Own $FOO" (that I'm pretty sure I learned about here on HN originally :) https://github.com/danistefanovic/build-your-own-x

- Build Your Own Text Editor

- Build Your Own Shell

- Build Your Own Git (!)

etc.

Comes complete with the Feynman quote "What I cannot create, I do not understand".


Wasn't the quote, what I cannot explain I do not understand?


Hah, yes, good point, I think you're correct about that.

The repo has an adaptation of the quote, then. :)


Honestly, I hate the way these articles are framed: "X things every programmer should [read|try|learn]". I've certainly seen a list of 1000 books everyone should read. I hate the way these are framed because you'd spend your entire life doing all of them so it's clearly not practical.

Can we step away from the hyperbole here and just start saying (in this case) "Interesting project ideas" or somesuch?

This of course leads to people piping up their own "must dos" like "write a compiler" (huge undertaking).

Interestingly I see a comment here like "do something with actual customers" and the replies are interesting, essentially dismissing this as a business rather than technical problem.

I find this interesting because software exists to solve problems for people so this is probably the most useful advice I've seen. The ability to identify a pain point and use software to solve it is arguably the most useful ability a software engineer can have.

You will of course learn things by writing a compiler or a text editor or a ray tracer, and if it scratches an itch for you, by all means go for it.


The author thinks these are valuable challenges every programmer should try. Not must, should. It's just an opinion. If you don't agree with it, then don't do the challenges. Why does it cause you to write a 200 word rant?


Why did cletus' post cause you to write a 40 word rant?

I thought it was a valid point. 1000 books I should read? If I devoted all my free time, for the rest of my life, I might make it. But that leaves me exactly zero time left for the next person who's got some fine-sounding "should" for me, or the next, or the one after that. It's all my time for the rest of my life - for one person's "should".

I don't think it changes the problem to say "should" instead of "must". It's just an opinion. It doesn't create any more of a "should" for me than it does of a "must".


I second this, when such advice comes from people who have some authority, it can be very confusing to a lot of young people entering the field. It could even be disheartening and really demotivating for some, who otherwise might have exceptional logical/design/frontend/database/statistics/ML... skills.


These are great suggestions!

A few years ago I worked through building a spreadsheet in JavaScript. It was a great introduction to interpreters. I read through Writing an Interpreter in Go by Thorsten Ball [1]. Constraining the interpreter to execute formulas in cells was a straight-forward way to approach building one from scratch.

Writing a Pratt parser as part of this forced me to understand how it works. Figuring out how to process a sheet led me into algorithms and structures like directed acyclic graphs (as mentioned in the article). I found myself referencing Introduction to Algorithms and really studying it [2].

In the end I turned it into a talk at Big Sky Dev Con in Montana. The whole thing was a good experience - from researching how to do it, to sticking it out through the implementation, to distilling it to a 45 minute talk. Be sure to check out the recording [3] and code [4] if you're interested.

Any of these suggestions will lead you down a rabbit hole of learning with a clear objective in sight to keep you motivated to dig deeper.

[1] https://interpreterbook.com/

[2] https://www.amazon.com/Introduction-Algorithms-3rd-MIT-Press...

[3] https://www.youtube.com/watch?v=Sj4h0DcVLL0

[4] https://github.com/lancefisher/cellularjs
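To make the directed-acyclic-graph part concrete, here is a small sketch in Python (not the JavaScript from the talk): cells are recalculated in dependency order using the standard library's `graphlib`. The sheet layout with explicit dependency tuples is a made-up simplification; a real spreadsheet would derive the dependencies by parsing each formula:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical sheet: each cell maps to (cells it reads, function computing it).
sheet = {
    "A1": ((), lambda v: 2),
    "A2": ((), lambda v: 3),
    "B1": (("A1", "A2"), lambda v: v["A1"] + v["A2"]),
    "C1": (("B1", "A1"), lambda v: v["B1"] * v["A1"]),
}


def recalculate(sheet):
    """Evaluate every cell after the cells it depends on."""
    dependencies = {cell: deps for cell, (deps, _) in sheet.items()}
    values = {}
    # static_order() yields dependencies before dependents and raises
    # graphlib.CycleError if the formulas refer to each other in a cycle.
    for cell in TopologicalSorter(dependencies).static_order():
        _, formula = sheet[cell]
        values[cell] = formula(values)
    return values


values = recalculate(sheet)
```

A circular reference between cells surfaces as a `CycleError`, which maps naturally onto the error a spreadsheet shows for it.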


Thank you for sharing this, Lance! I'm a "recovering actuary/Excel power user/Excel developer (shudder)" currently in Lambda School in their web development track.

I'm really enjoying watching the recording of your talk, and in addition to learning one way to build a spreadsheet, I'm also learning lots of good software development practices orthogonal to the specific project, which is great.

Again, thank you for sharing!


Thanks! Glad you like it!


By the way, I'm curious, how did the company you work(ed?) for in Missoula, MT, get started? I'm always interested when I come across software companies in non-megalopolises.


The three founders live here, and wanted to build a software company. They worked hard to make something people wanted, and raised money from the three Fs (friends, family, and fools). They applied to YC on a lark, actually got in, moved to Mountain View for 3 months, learned a ton and made good connections.

After gaining traction in the market it was possible to raise from angels and then VCs.

I believe you can build and sell software from anywhere if you have the drive and find a way to solve problems that people are willing to pay money to solve.

Here's an article that talks more about it: https://missoulacurrent.com/business/2018/08/missoula-tech-s...


Write a database. Depending on what angle you take, it can look like a compiler (SQL parsing, predicate evaluation, optimizing for index usage), OS (file systems for in-place updates of complex data structures, and scheduling with lock dependencies), distributed systems (consistency and availability tradeoffs across machines, with possible partitions).

And then there's the whole mental model of relational algebra and stream processing of queries.

It'll give new appreciation for existing databases and what they can and can't do for you.


One of my favorite courses in college was basically this. I learned so much about filesystems, data structures and concurrency that has been useful in my career!

