Ask HN: What overlooked class of tools should a self-taught programmer look into
927 points by nathanasmith on May 13, 2019 | 401 comments
15 years ago I learned Python by studying some O'Reilly books and I have been a hobbyist programmer ever since.

The books went into detail and since reading them I've felt confident writing scripts I needed to scratch an itch. Over time, I grew comfortable believing I had a strong grasp of the practical details and anything I hadn't seen was likely either a minor quibble, domain-specific, or impractically theoretical.

This was until last year when I started working on a trading bot. I felt there should be two distinct parts to the bot, one script getting data then passing that data along to the other script for action. This seemed correct as later I might want multiple scripts serving both roles and passing data all around. Realizing the scripts would need to communicate over a network with minimal latency, I considered named pipes, Unix domain sockets, even writing files to /dev/shm but none of these solutions really fit.

Googling, I encountered something I hadn't heard of called a message queue. More specifically, the ZMQ messaging library. Seeing some examples I realized this was important. The step of then plowing through the docs was nothing short of revelatory. Every next chapter introduced another brilliant pattern. While grokking Pub/Sub, Req/Rep, Push/Pull and the rest I couldn't help breaking away, staring in space, struck by how this new thing I had just read could have deftly solved some fiendish memorable problem I'd previously struggled against.
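To make the Push/Pull idea concrete, here is a minimal sketch of the two-part design described above: one part fetches data and pushes it, the other pulls it and acts. This is not ZeroMQ; it is an in-process stand-in built on the standard library's `queue` and `threading`, and every name in it (`fetch_prices`, `act_on_prices`, the buy/hold rule) is made up for illustration. With a messaging library the queue would be replaced by sockets, so the two halves could live in separate processes or on separate machines.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker


def fetch_prices(out_queue, prices):
    """Producer ("push" side): hands each data point to the channel."""
    for price in prices:
        out_queue.put(price)
    out_queue.put(SENTINEL)


def act_on_prices(in_queue, decisions):
    """Consumer ("pull" side): blocks until data arrives, then acts on it."""
    while True:
        price = in_queue.get()
        if price is SENTINEL:
            break
        # Toy rule, purely illustrative.
        decisions.append("buy" if price < 100 else "hold")


def run_pipeline(prices):
    """Wire one producer to one consumer and return the consumer's decisions."""
    channel = queue.Queue()
    decisions = []
    producer = threading.Thread(target=fetch_prices, args=(channel, prices))
    consumer = threading.Thread(target=act_on_prices, args=(channel, decisions))
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()
    return decisions
```

The point of the pattern is that neither side calls the other directly; both only know about the channel, which is what makes it easy to add more producers or consumers later.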

Later, I pondered the meaning of only now stumbling on something so powerful, so fundamental, so hidden in plain sight, as messaging middleware? What other great tools remain invisible to me for lack of even knowing what to look for?

My question: In the spirit of generally yet ridiculously useful things like messaging middleware, what non-obvious tools and classes of tools would you suggest a hobbyist investigate that they otherwise may never encounter?

Makefiles. I always dismissed them as a C compiler thing, something that could never be useful for Python programming. But nowadays every project I create has a Makefile to bind together all the tasks involved in that project: bootstrapping the dev environment, running checks/tests, starting a devserver, building releases and container images. Makefiles are just such a nice place to put scripts for these common tasks compared to normal shell scripts. The functional approach and dependency resolution of Make allow you to express them with minimal boilerplate, and you get tab completion for free. Sometimes I try to use more native solutions (e.g. Tox, Docker), but I always end up wrapping those tools in a Makefile somewhere down the road since there are always missing links. And because Make is ubiquitous on nearly every Linux and macOS system, it is all you need to get a project started.

Example: https://gitlab.com/internet-cleanup-foundation/web-security-...

Running the 'make' command in this repo will set up all requirements and then run code checks and tests. Running it again will skip the setup part unless one of the dependency files has changed (or the setup env is removed).

In 2019 Makefiles are a useful tool for automating project-level things. Too often webapps will require you to install X to install Y to run producing artifact Z. Since Make is old and baked and everywhere, specifying "make Z" is a useful project-level development process. It's not tied to a language (e.g. Tox) nor a huge runtime (Docker). Make is small enough in scope to be easy, and large enough to be capable without a lot of incantations.

The big downside of Make, alas, is Windows compatibility.

> The big downside of Make, alas, is Windows compatibility.

GNU Make works fine on Windows. The sources come with a vcproj to build it natively, or you get it from ezwinports. At my dayjob, we have a pretty complicated build with GNU Make for cross-compiling our application to Arm and PowerPC, and it works on Windows, even with special Guile scripts to reduce the number of shell calls which are extremely slow on Windows.

The most popular folder on Windows is "My Documents", which has a space in its name in at least some Windows versions. Make doesn't support such paths: http://savannah.gnu.org/bugs/?712

VBScript works better on Windows, IMO. Also works out of the box on all Windows versions since at least 2000 (on Win9x it was shipped with IE).

>Most popular folder on Windows is "My documents"

Not really... not since XP, anyway. Unless you have a space in your username (which is a terrible idea for many other reasons), your "Documents" path is C:\Users\JohnSmith\Documents. "Program Files" is pretty much the only important path which is likely to have spaces, and your makefiles (hopefully!) don't need to touch that.

On Windows, it's not up to me to decide where users will keep my stuff, and where it will work. Users decide.

For software to work well on Windows, it must support spaces in file names and paths. Also Unicode in file names and paths.

VBScript does, GNU Make doesn't.

> which is a terrible idea for many other reasons

If you use make to set up stuff, it's very possible you'll need to access "c:\Users\All Users", which does contain a space in the user name. Also "c:\Program Files (x86)\Common Files", which contains more than one.

You can try the 8.3 convention,

DOCUME~1 Documents


<SYMLINKD> ALLUSE~1 All Users [C:\ProgramData]

> Make doesn't support such paths

That is entirely correct and really the most glaring downside of Make. In my opinion: If you have spaces in your dependency names, just stay away from Make as far as possible.
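A rough way to see why spaces hurt: the prerequisite list of a rule is whitespace-separated, so a path containing a space is read as two names. The sketch below is a deliberately simplified model of that splitting, not GNU Make's actual parser; `parse_rule` and the file names are hypothetical.

```python
def parse_rule(line):
    """Split 'target: prerequisites' into a target and a whitespace-separated list."""
    target, _, prerequisites = line.partition(":")
    return target.strip(), prerequisites.split()


target, prerequisites = parse_rule("app: main.o My Documents/util.o")
# The single file "My Documents/util.o" comes out as two separate names.
print(target, prerequisites)
```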

You couldn't easily install Visual Studio 6 on XP, since it didn't support spaces in the "Program Files" directory :)

Long paths were introduced in Windows NT in 1993; VC6 was released in 1998.

I’ve just installed Visual C++ 6.0 Professional on a WinXP VMware machine. Took less than a minute, BTW; modern SSDs are awesome. The default installation path under Program Files also contains spaces: it’s "C:\Program Files\Microsoft Visual Studio\VC98"

BTW, they have a bug even on the very first welcome screen: http://const.me/tmp/vc6.png

You'll have WSL/WSL2 to work with, too. If not make, then CMake is now supported in Visual Studio (2017/2019) and works well.


Interesting, hadn't run into this before. What's the advantage over MSYS2?

Eli is one of the few free software veterans who exclusively works on Windows. His ports are excellent and native Windows binaries wherever possible ("native" meaning: no MSYS at all). It's not that "native" is always better, but it is good to have the choice. Especially w.r.t. GNU Make, I found the MSYS version to be very hard to reason about, since the additional MSYS path conversion makes things even more complicated than they already are...

Do Makefiles ever actually work that way? In my experience they always require you to install a bunch of packages for libraries, and the instructions usually only give the package names for Ubuntu, so you have to hunt down what the package is called on your distro or whether that version of the package is even in the repos.

If you write them yourself they certainly can. For small projects, you can leave everything explicit and it works great.

Can you break down your problem into a bunch of rules of the form “To produce this file, I need to run these shell commands, which read those other files over there as input”? If so, Make can take care of figuring out which steps actually need to be run.
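The "to produce this file, run these commands, which read those files" idea can be sketched in a few lines. This is a toy model under stated assumptions, not how Make is implemented: rules are a plain dict, recipes are Python callables standing in for shell commands, and all names (`build`, `out_of_date`, `copy_upper`, the file names) are made up for the example.

```python
import os
import tempfile


def out_of_date(target, prerequisites):
    """A target needs rebuilding if it is missing or older than a prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


def build(target, rules, log):
    """Bring `target` up to date; `rules` maps target -> (prerequisites, recipe)."""
    if target not in rules:
        return  # a plain source file: nothing to produce
    prerequisites, recipe = rules[target]
    for prerequisite in prerequisites:
        build(prerequisite, rules, log)
    if out_of_date(target, prerequisites):
        recipe()
        log.append(os.path.basename(target))


def copy_upper(src, dst):
    """Stand-in recipe: write an upper-cased copy of src to dst."""
    with open(src) as infile, open(dst, "w") as outfile:
        outfile.write(infile.read().upper())


def demo():
    """Build a two-step chain three times and report which steps ran each time."""
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "notes.txt")
        mid = os.path.join(workdir, "notes.upper")
        out = os.path.join(workdir, "report.txt")
        with open(src, "w") as f:
            f.write("hello\n")
        rules = {
            mid: ([src], lambda: copy_upper(src, mid)),
            out: ([mid], lambda: copy_upper(mid, out)),
        }
        first, second, third = [], [], []
        build(out, rules, first)   # nothing exists yet: both steps run
        build(out, rules, second)  # everything is up to date: nothing runs
        # Simulate a later edit of the source by backdating the built files.
        old = os.path.getmtime(src) - 3600
        os.utime(mid, (old, old))
        os.utime(out, (old, old))
        build(out, rules, third)   # both steps run again
        return first, second, third
```

Asking for `report.txt` is enough; the intermediate step is found by walking the rules, and the second call does no work at all.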

Most plain makefiles I've used use a tool like pkg-config to resolve library/header paths.

Will not be a problem for much longer :) https://www.theverge.com/2019/5/6/18534687/microsoft-windows...

Hopefully Windows 11 is just a reskinned Ubuntu running all the old Windows programs through Wine.

I'll be surprised if there will ever be a Windows 11.

> The big downside of Make, alas, is Windows compatibility.

You'd have to give me a _very_ compelling reason to support developers who use Windows, when Windows lacks these essential tools. Besides, don't people who develop on Windows live in WSL?

Nope. I develop in Python, Java, and Kotlin on Windows and never touch WSL. Make is available natively through Chocolatey (a Windows package installer), but I prefer Gradle.

(I also write code to run on Linux, but still prefer Gradle.)

Slightly off topic but what would you suggest for someone who is familiar with build systems but who hasn’t used Gradle?

I’m just getting into Kotlin and Gradle isn’t something I’ve used before since I’ve mostly done web and .NET until now.

Why don't you use WSL?

I can barely understand why you'd want to develop on Windows (ok, for non-Windows-only products) with it, but without it...

If you're already using a Vagrant or Docker-based development workflow, WSL doesn't really add much, and takes some things away. I/O performance, for example.

> If you're already using a Vagrant or Docker-based development workflow, WSL doesn't really add much, and takes some things away. I/O performance, for example.

I've been actively using WSL for over a year along with Docker and set up the Docker CLI in WSL to talk to the Docker for Windows daemon.

Performance in that scenario is no different than running the Docker CLI in PowerShell, or do you just mean I/O performance in general in WSL? In which case once you turn off Windows Defender it's very usable. WSL v2 will also apparently make I/O performance 2-20x faster depending on what you're doing.

WSL adds a lot if you're using Docker IMO. Suddenly if you want, you can run tmux and terminal Vim along with ranger while your apps run nice and efficiently in Docker. Before you know it, you're spending almost all of your time on the command line but can still reach into the Windows cookie jar for gaming and other GUI apps that aren't available on Linux and can't be run in a Windows VM.

I find that it depends a lot on what you're doing. The real problem with WSL is I/O latency.

It's acceptable for relatively infrequent file access, but will eat you alive if you're doing anything that involves lots of random file access, or batch processing of large sets of small files, or stuff like that.

I just haven't seen that as a problem in my day to day as a developer working with Flask, Rails, Phoenix and Webpack.

That's dealing with 10k+ line projects spread across dozens of files quite often, and even transforming ~100 small JS / SCSS files through Webpack. It's all really fast even on 5 year old hardware (my source code isn't even on an SSD either).

Fast as in, Webpack CSS recompiles often take 250ms to about 1.5 seconds depending on how big the project is and all of the web framework code is close to instant to reload on change. Hundreds of Phoenix controller tests run in 3 seconds, etc.

It isn't perfect. The IO performance is currently poor and it doesn't play well with Windows Defender (wastes a lot of CPU). Also, since your IDE would live in Windows, you can sometimes have issues with Windows and Linux both interacting with the same files.

More developers are coding in Windows than any other operating system -- almost more than Mac and Linux combined. The Hacker News filter bubble might lead us to believe otherwise.


Windows      47.5%
MacOS        26.8%
Linux-based  25.6%
BSD           0.1%

(87,851 responses)

(The Stack Overflow survey is a poor representation of the entire development community, but it's worth something, maybe the best we have.)

I've compiled a thing or two with MSYS2.

Developers in corporate environments?

VS2019 supports Clang and cmake now.

>The big downside of Make, alas, is Windows compatibility.

Isn't the big problem that you have no idea what it's doing to your system? Also that you aren't expected to be able to undo it. You can read the makefiles, of course, but it seems simpler not to have to. (Just update the necessary packages yourself, to the latest version.)

Forgive me if this is naive of me.

>Isn't the big problem that you have no idea what it's doing to your system?

As opposed to what exactly? Any other alternative, e.g. separate shell scripts, "npm run" scripts in package.json, running a Docker image, hell even cmake or other make-like tools - does stuff you don't know about without reading the files either.

With Docker at least everything is contained in the container. Which makes isolating and resetting environments a breeze. Something I worry about often is contaminating my system's 'state'. Which always leads to broken builds or incomplete build systems because a missing dependency is not spotted on your system because it was installed by some other tool some other time.

I tend to write my Makefiles to create as much of a local dev environment as possible for every project. Using Python virtualenv/Pipenv/Poetry, Ruby vendored dirs, custom Gopath per project (using direnv), etc. Most tools support some sort of isolation/localisation, but it's often just not on by default.
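For the Python case, a per-project environment can be created with nothing but the standard library's `venv` module. A minimal sketch, assuming the environment lives in a `.venv` directory inside the project (that name and the `create_project_env` helper are my own choices, not something from the comment above):

```python
import os
import tempfile
import venv


def create_project_env(project_dir, env_name=".venv"):
    """Create an isolated virtual environment inside the project directory."""
    env_dir = os.path.join(project_dir, env_name)
    # with_pip=False keeps the sketch fast; a real setup would likely want pip.
    venv.EnvBuilder(with_pip=False).create(env_dir)
    return env_dir


def demo():
    """Create an environment in a throwaway project and check that it exists."""
    with tempfile.TemporaryDirectory() as project_dir:
        env_dir = create_project_env(project_dir)
        return os.path.isfile(os.path.join(env_dir, "pyvenv.cfg"))
```

Because the environment is just a directory under the project, deleting it resets everything, which is the property being asked for here.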

I wish more tools did this, I almost always want a local, self-contained environment for everything. The few times I don't actively want one, I don't see much pain in having one. A couple minutes setup time, maybe?

I have seriously considered hiring someone to audit and prune all the random little libraries and tools I've installed over the years for that one-off time I had to process a weird file format or wanted to try something from HN.

To keep my system clean, I use Darch.


Every boot is a fresh install. A one-off is not persisted unless I add it to my recipes.


Maybe you'd like NixOS?

I'm fascinated by NixOS and am following it, just haven't had a lot of time to dive in yet.

It does sound like the right idea. This is hypocritical as someone who doesn't use it but I hope more people use it.

(I was heavily downvoted). What I was thinking is: as opposed to just running your built-in package manager yourself, to upgrade your system to the latest version of all the packages it might require.

Makefiles are used for more than package management. In fact, it doesn't seem very common to use them for package management. Maybe I'm missing something?

I think they were talking about projects that tell you to run `make install`, which I agree is less than ideal.

If you write your own Makefiles you know what they do. They’re not that hard to grok and even hand-rolled Makefile use is (IMO) underrated.

I always had trouble reading makefiles because the control flow is not very linear. At least with shell files basically everything is explicit.

The dependency graph is one of the better reasons to use makefiles--think of the nonlinearity as a bonus!

I often find the non-linear way of working with Make an advantage, since it allows you to break a big piece of procedural shell code with lots of control flow (if X is installed don't install again, etc) into small self-contained functional pieces with clean input and output boundaries which can be run individually. It also greatly improves code reuse as every target/recipe can be considered a function.
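The "non-linear" part is really just a dependency graph that gets flattened into a valid order on demand. A small sketch using the standard library's `graphlib` (Python 3.9+); the target names and the `run_order` helper are hypothetical, and this only models the ordering, not Make's timestamp checks:

```python
from graphlib import TopologicalSorter

# Hypothetical project targets; each maps to the set of targets it depends on.
DEPENDENCIES = {
    "release": {"test", "build"},
    "test": {"build"},
    "build": {"setup"},
    "setup": set(),
    "docs": {"setup"},
}


def run_order(goal, dependencies):
    """Return the targets needed for `goal`, each listed after its prerequisites."""
    needed = {}
    stack = [goal]
    while stack:
        target = stack.pop()
        if target not in needed:
            needed[target] = dependencies.get(target, set())
            stack.extend(needed[target])
    return list(TopologicalSorter(needed).static_order())


print(run_order("release", DEPENDENCIES))
print(run_order("test", DEPENDENCIES))
```

Each target only declares what it needs, and unrelated targets (here `docs`) are simply never visited when you ask for `release`.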

> Since it allows you to break a big piece of procedural shell code with lots of control flow (if X is installed don't install again, etc) into small self contained functional pieces with clean input and output boundaries which can be run individually

At that point, I use a scripting -not shell- language (which is not as implicit)

I don't program in C anymore, so my major workflow is all in one language; I write my server in the same language that does the compilation, which is the same language that does utilities like creating network tunnels to my lab nodes

I hate makefiles.

That said, I wholeheartedly agree with the comment.

It's sad that something so central to a project and so useful and important to so many people seems like it hasn't advanced ... ever.

Developers generally do the minimum with Makefiles and get out. They are similar to 1040 forms in popularity.

I've always had a dream of a redesigned "make" system... with import statements, object oriented rules, clear targets, rules and files clearly separated, structure and organization... sigh.

I can agree with your sentiment. I often sought alternatives for Make because some things are just missing. However I always end up with a Makefile because Make is just so basic and ubiquitous. Any big change or alternative to Make and you will lose that. That's why I try to avoid newer Make features. So for me Make not advancing is actually a feature, as it is one of the few things I can depend on to stay the same.

I find Nix to be a really nice alternative to Make; although it's also quite heavyweight and "invasive" (it's not just a self-contained binary like `make`), so it's more a case of "I'm already using Nix to install dependencies, why not use it for orchestration too?"

Bazel is pretty nice.

Bazel is great until you have to install tensorflow from source into a container and are sitting wondering why you have to put and configure a JVM inside a container-destined-to-be-static-binary, temporarily to get a python program with no Java bindings installed.

As I'm not the most intelligent developer, I'm sure there's a better, more sophisticated way to do this but I got really frustrated and gave up.

Why not use multiple stages in your docker file?

I'm sure bazel is a good tool when it is used properly, but as a greenhorn in the tech field, the constant version mismatching between bazel and tensorflow can become quite the pain when you have to build tensorflow from source.

I will +1 that. I like Bazel a lot because it really forces you into a singular way of building a project which is clean and nice.

That said, mileage may vary. It was originally built by Google so it has quirks. I find it is best suited for projects with compiled dependencies and large repos.

Otherwise, I was going to add that Gradle as a build system is very advanced and improves upon make in many ways.

Oh whoops, I basically wrote the same comment without reading yours. In any case, hear, hear!

> It's awesome that something so popular is so well engineered that it doesn't need changes.

There, fixed it for you. In particular, I'm glad that the OO cancer hasn't spread to something as basic as a build system.

What I meant in this respect is that a lot of makefile rules have a lot of commonality - it would be great to inherit a workhorse rule and tweak an option instead of having to copy/paste a rule or twiddle makefile variables (which aren't normal programming language variables)

> Doesn't know how to write OO code. > Calls OO cancer.

I really like this tutorial to get into Make files: https://swcarpentry.github.io/make-novice/.

fabfile.py (Fabric) could be used as a Makefile in Python. If you don't ever need to ssh to other machines to run your tasks, you could use the pyinvoke library directly (tasks.py). https://www.fabfile.org/

It is easy to add command line arguments to the tasks, configure them using files (json, yaml), environment variables, to split the task definitions into several modules/namespaces.

Having used fabric in the past, I've always found it just as easy to use a shell script and make files.

There's always some level of bootstrapping a project (installing packages/software, compiling libraries and dependencies) where it's easier to just write a shell script than to program it in Python. E.g. How do you get fabric installed on a system?

There's also been this longevity of sorts that Make seems to have gotten right. People just keep going back to it because it's simple.

I've been moving away from using shell scripts in a tools/ directory to using Python Invoke (http://www.pyinvoke.org), which is the library underlying fabric.

I used bash scripts for years, but for a lot of reasons made the switch:

- It was always painful to create small libraries of functions used across multiple scripts in a project

- It's difficult to consistently configure them with per-user settings. I've written bash implementations of settings management, Invoke handles this for me.

- I'd still have to reach for Python whenever I needed to do anything with arrays or dicts, etc.

- Getting error handling correct can be a chore

Invoke has a lot of nice-to-haves too:

- Autogenerated help output

- Autocomplete out of the box

- Very easy to add tasks, just a Python function

- Easy to run shell code when needed

- Very powerful config management when needed

- Supports namespacing, task dependencies, running multiple tasks at once and deduplicating them

It's not perfect, but it's a lot better than my hand rolled scripts were.

Groxx's reply hits the point. I might work with a small number of platforms, but the "super-simple" qualifier is the point. The point at which you need dictionaries (associative arrays) in your install script, not to mention settings management beyond a make include, is the point at which you've outgrown make.

it's also far, far, far easier to make it work predictably on multiple platforms. and easier to understand and change later. that can get nightmarishly hard in make/bash, once you go outside the super-simple realm.

Snakemake is a quite nice improvement on make for data munging stuff.

Wow I've never seen such a bloated Python project before. It has 10 dependencies with 2 additional optional dependencies and the introduction/tutorial is absurdly overspecific.

The first example they use to describe the tool is:

> Cufflinks is a tool to assemble transcripts, calculate abundance and conduct a differential expression analysis on RNA-Seq data. This example shows how to create a typical Cufflinks workflow with Snakemake. It assumes that mapped RNA-Seq data for four samples 101-104 is given as bam files.

This is the epitome of non-programmer programming, colour me disappointed.

And yet it's still less crufty than the "by hackers for hackers" GNU Automake and less over engineered than the "made by real professional programmers at a real big tech company" Luigi. Would love to hear if you have any suggestions for actual alternatives for doing this type of automation beyond what make can neatly deal with rather than just going "eww.. it has dependencies"; "eww.. it's made by bioinformaticists".

One of the worst problems with using windows (in my opinion) is that there’s no native GNU make.

> One of the worst problems with using windows (in my opinion) is that there’s no native GNU make.

GNU Make even comes with a vcproj file for building a native binary with Visual Studio. Worked fine for me. Building it with Guile support though is difficult, but fortunately Eli Zaretskii provides native binaries through his ezwinports, and they worked pretty much flawlessly for me. Of course you will usually need a shell to execute recipes, but Make itself runs natively. For more information, see README.W32 in the sources.

Scoop has it in their repos as well (the gow package).

There are a number of ports of GNU Make to Windows. MSYS2 [1], for example, provides a reasonable development environment that includes Make.

If you just want a Make, there is [2] which can be installed separately and is part of the GNUWin32 collection.

[1] https://www.msys2.org/ [2] http://gnuwin32.sourceforge.net/packages/make.htm

Isn't non-native development on Windows a solved problem nowadays with WSL(2)?

WSL is currently horrendously (unusably, IMO) slow. WSL2 promises a 20x speed up, but it was already 100x slower than native Linux at some actually-realistic workloads that happen all the time when you're developing (e.g. `git grep`), so it's probably still too slow to be tolerable.

I had the opposite problem of wanting to develop some stuff for Windows from a Linux environment, and I settled on running a linux VM and copying binaries over by scping to WSL, which works reasonably well.

A nice thing with WSL is that you get working make and rsync. But I would like make for coding on native windows. Many FOSS projects use Makefile as the parent post described.

Windows does ship nmake but it is a little different.

I develop on Windows, and I like make as a lazy default you can just type in and as long as you maintain the make file, it will build the thing.

It is also a nice document of what you can build and how.

I also like it because Netlify supports it, so you can get it to run make to deploy your site when you push a commit, giving you a lot of control about your CI, while keeping it simple.

Any good resources for learning about make files that you can recommend?

I don't really know a good all-in-one manual; most things about Make I learned over years of using it, from different sources. And I still sometimes discover new features (and new ones are still added in recent releases, but I tend to avoid them to keep Makefiles compatible with older systems).

But the Make manual is pretty comprehensive as a guide and reference: https://www.gnu.org/software/make/manual/make.html

Also (as with most things) knowing what name some concept has makes it easy to search for references. For example the terminology of rules (target, prerequisite, recipe): https://www.gnu.org/software/make/manual/make.html#Rule-Synt...

Things I tend to google often because I forget them (some are used more often than others) are: automatic variables, implicit rules, conditionals and functions.

One trick that really helps making Make complete is making your own pseudo state files and understanding the dependency system. One of the best features of Make is its dependency resolving. You generally write rules because you want a target (a file or directory) to be created, based on prerequisites (dependencies) according to a recipe (shell scripts). Make figures out that if the prerequisites didn't change, it doesn't need to run the recipe again and it will reuse the target. Greatly saving on build time.

Because Make relies on file timestamps to do its dependency resolving magic if you don't have a file there is not much Make can do. So what you can do instead is create a pseudo target output yourself. For example: https://github.com/aequitas/macos-menubar-wireguard/blob/mas... Here a linter check is run which creates no output. So instead a hidden file .check is created as target. Whenever the sources change the target is invalidated and Make will run this recipe again updating the timestamp of .check. Also note the prerequisites behind the pipe (order-only prerequisites). These don't count toward the timestamp checking, but only need to be there. Ideal for environment dependencies, like in this case the presence of the swiftlint executable.
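The stamp-file trick and the order-only distinction can be modelled in a few lines. This is a simplified sketch, not Make's behaviour in full: real Make would try to build a missing order-only prerequisite rather than raise, and the file names (`main.swift`, a fake `swiftlint` file, `.check`) and helpers (`needs_run`, `touch`) are stand-ins chosen for the example.

```python
import os
import tempfile


def needs_run(stamp, sources, order_only=()):
    """Decide whether a recipe that produces no real output must run again.

    `sources` are compared by timestamp against the stamp file;
    `order_only` entries merely have to exist.
    """
    missing = [p for p in order_only if not os.path.exists(p)]
    if missing:
        raise FileNotFoundError(f"missing order-only prerequisites: {missing}")
    if not os.path.exists(stamp):
        return True
    stamp_mtime = os.path.getmtime(stamp)
    return any(os.path.getmtime(s) > stamp_mtime for s in sources)


def touch(path):
    """Create the file if needed and set its modification time to now."""
    with open(path, "a"):
        pass
    os.utime(path, None)


def demo():
    """Walk through: first run, tool updated, source updated."""
    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "main.swift")
        linter = os.path.join(workdir, "swiftlint")  # stand-in for the executable
        stamp = os.path.join(workdir, ".check")
        touch(source)
        touch(linter)
        results = [needs_run(stamp, [source], [linter])]  # no stamp yet
        touch(stamp)  # the "recipe" ran and recorded that fact
        future = os.path.getmtime(stamp) + 10
        os.utime(linter, (future, future))  # newer tool: must not trigger a re-run
        results.append(needs_run(stamp, [source], [linter]))
        os.utime(source, (future, future))  # newer source: must trigger a re-run
        results.append(needs_run(stamp, [source], [linter]))
        return results
```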

Matt Might's article is really good:


Worth noting that that's an introduction to GNU make, which, while the most common implementation, isn't the only one out there.

The GNU Make manual is excellent.

For learning advanced techniques: "The GNU Make Book" by John Graham-Cumming.

This is a nice video. The only thing I'm missing that should be covered imho (as you will encounter it even if you don't use it) is implicit/pattern rules: https://www.gnu.org/software/make/manual/html_node/Pattern-R...

"GNU Make Book" by John Graham-Cumming


We're doing this, and I mostly love it. I haven't found a great way to do code re-use across projects yet, and I'm not super happy with the Make function syntax (but, maybe if it needs a function, I should turn it into a shell script that itself is called by the Make command...).

All in all tho, it's a fantastic place to write down long CLI commands (ex: launching a dev docker container with the right networking and volume configurations) that you use a lot when working on the project.

Our Jenkins pipeline also relies on the Makefiles, literally just invoking `make release`, which is also pretty awesome.

When using it in multiple projects and CI you also tend to develop some kind of Developer-API with common commands/targets. No matter what kind of project you run you always use the same target names to get started. No remembering which tool is used for this language, just clone it, run `make` and you're off, `make test` to test, etc.

Make does support includes (https://www.gnu.org/software/make/manual/html_node/Include.h...) which allow for some form of code reuse across projects. But then you encounter the balance between DRY and clarity. There are always exceptions, so you try to make stuff too universal, but then it's hard to grok the code. And I feel that if I start to use functions I'm using Make wrong and that kind of logic better fits in shell scripts called from the Makefile. Makefiles (the way I use them at least) should be simple to read and explain themselves. But it's often hard to balance this with the features Make provides, like implicit rules and automatic variables. And if I ever turn to generating Makefiles (other than for C projects where it's kind of expected) I will probably retire.

> common commands

Oh absolutely. It's fantastic for that. Our build pipeline actually relies on that; every project has a "release" target that is basically for the CI to use.

> Make includes

Yeah, I looked into that, and I think I had the same conclusion.

> scripts called from the Makefile

That's what I'm thinking is the way to level up this kind of system. Although then, why have `make init` instead of just `./bin/init` ?

The biggest reason I use Make is the dependency resolving.

In the `make init` example, it doesn't matter how many intermediate steps are involved; `init` is the end state I want to achieve. So in most of my Makefiles the `init` target will fan out into requirements as wide and deep as it needs, including running apt to install missing system dependencies. But then the good part: if a dependency is already fulfilled, Make won't have to run it again. Although sometimes it's hard or clunky to convert some dependencies into 'files' so Make can do its dependency resolving work properly.

Have you ever considered using Rakefiles instead?


Never wrote them myself but have encountered them sometimes. Had no major issues with them then I believe. However I would probably write a Makefile to manage the Ruby environment and install Rake as they don't come installed by default.

Self-taught dev here too. I have never used Makefiles, but I'm pretty sure I'm using Node.js in a similar role. I use it to automate all my "scripting", including deploying my SaaS product to the cloud and running unit tests.

If it sounds interesting, check out https://www.npmjs.com/package/shelljs

PS: I do my primary development on Windows, but my production environment is Ubuntu. Node apps "just work" on both environments. Truly cross-platform.

I put Make in the same class as Vi. I hate using them but I have to learn them because they're the least of N evils, the most pragmatic way out of a hole.

I second this... a lot of times, broken make files are standing between you and victory, so it would be good to at least have some familiarity with them.

I also use similar Makefiles in my projects. I use "make release" to generate the docker container.

I love Make in concept and kind of hate it in practice. There is sooo much incidental complexity and so many warts to work around. I think it's a concept that is ripe for a new approach that thoughtfully keeps the good, ditches the bad, and maybe even adds some useful capabilities that aren't already there.

But of course I'm immediately skeptical of this idea a la https://xkcd.com/927/ (Standards). For instance, maybe this is what npm and all the rest thought they were doing. Certainly Rake in the Ruby world tried to do this, and I never really liked it, so clearly they missed the mark somehow, at least for me. But then when I feel discouraged about the ability to improve on things, I think about how I felt this way when I first heard about Git. Why would you implement a new source control system when we already have Subversion? Sure, svn has its frustrations and warts, but this new thing is just gonna have its own frustrations and warts and now we'll just have another frustrating warty thing and we haven't really gained anything. And this is totally true! Git is super frustrating and warty. Except that it's also way better than Subversion, much faster and far more flexible. It was a revelation when I started using it. So I think back to Linus when he was thinking about creating Git and think that he probably didn't have this discouraged uncertainty about improving things; he just had ideas for a better way and he went out and did it. (And yes, I know it was influenced by BitKeeper and other DVCSs exist, so it's not like he invented the concept, but my point stands.)

So maybe someone could make a better Make?

On Windows there is a great PowerShell module, Invoke-Build.

Makefiles are so old and quaint, why not use "{flavorofthemonth}".format(flavorofthemonth=np.random.choice(frameworks)) ?

Read the curriculum of an undergraduate computer science course and read up on the things you haven't heard of. Some courses will even have lecture notes available.

E.g. these four pages are the University of Cambridge masters in computer science:





(Or a MOOC, but the links above are easy to browse text, syllabuses and lecture notes, not a load of videos.)

I support this 100%. I worked for years as a self-taught programmer. When I went back for my CS degree, I was shocked at how much I didn't know that I didn't know.

Numerous times I'd be sitting in a class and we'd go over a solution to some theoretical problem, and I'd realize that this solved a problem that had taken me days to discover on my own (and this solution was usually better than what I'd come up with).

If you are the kind of person who can work through everything on your own (including what may seem like the boring parts), I highly recommend doing so.

Could you give an example of one of the times something theoretical helped you solve a real world problem?

I've thought about going back for my CS degree a lot but can't really justify the cost and time investment vs self teaching. But it's something that's always been in the back of my mind.

Not the OP, but I too was a self-taught programmer as a teen who got a CS degree in my 20s. I independently came up with the idea of binary search in sorted data structures. But the first time I encountered hash tables in the course of getting my CS degree my reaction was "That's impossible! You can't get O(1) efficiency!"

(Sadly though this exposure was not in the context of a theoretical course on data structures, but rather in the context of reading the docs for HashMap as my university dropped older courses and languages to jump on the bandwagon of becoming a "Java school".)

Sure. It's been a while, but the first one that comes to mind is when we went over Floyd's Tortoise and Hare cycle detection algorithm. I realized it was a much cleaner solution to detecting cycles in a linked list than a solution I developed on my own over several days.
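For anyone who hasn't seen it, here is a minimal sketch of the tortoise-and-hare idea on a singly linked list. The `Node` class and the `has_cycle` name are made up for this illustration: one pointer advances one step at a time, the other two, and they can only meet again if the list loops back on itself.

```python
class Node:
    """Singly linked list node (hypothetical helper for this sketch)."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def has_cycle(head):
    """Return True if following .next from head eventually revisits a node."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next          # tortoise: one step
        fast = fast.next.next     # hare: two steps
        if slow is fast:          # the hare caught up, so there is a loop
            return True
    return False                  # the hare fell off the end: no loop


# Build a -> b -> c, check it, then close the loop c -> b and check again.
a, b, c = Node("a"), Node("b"), Node("c")
a.next, b.next = b, c
before = has_cycle(a)
c.next = b
after = has_cycle(a)
```

It uses constant extra memory, which is the usual reason to prefer it over remembering every visited node in a set.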

Another example: the automata class I took went over pushdown automata, and I immediately saw that it would solve the issues I'd been having with a finite state machine I was using to handle input for a game.

Oh and recently I needed to put different sections of a screen on different layers so that no 2 adjacent sections were on the same layer. I realized that this was basically just graph coloring, so I was able to find a solution in minutes instead of hours.
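As a rough illustration of the layering problem (the section names and the `greedy_coloring` helper are invented for this sketch, not the commenter's actual code), a greedy graph coloring assigns each section the lowest layer not already used by a neighbour. Greedy coloring is not guaranteed to use the minimum number of layers, but it never puts two adjacent sections on the same one.

```python
def greedy_coloring(adjacency):
    """Assign each node the smallest color not used by an already-colored neighbour."""
    colors = {}
    # Visiting high-degree nodes first is a common heuristic, not a requirement.
    for node in sorted(adjacency, key=lambda n: len(adjacency[n]), reverse=True):
        used = {colors[nb] for nb in adjacency[node] if nb in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors


# Hypothetical screen sections and which ones touch each other.
sections = {
    "header": ["sidebar", "body"],
    "sidebar": ["header", "body", "footer"],
    "body": ["header", "sidebar", "footer"],
    "footer": ["sidebar", "body"],
}
layers = greedy_coloring(sections)
```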

I'm sure there are people who can get through most of a CS curriculum on their own, but I'm not that disciplined. I've also never met anyone who was. It has been immensely helpful.

To clarify: the first three links are for each year of the (three-year) undergrad program, the fourth is for the Masters.

The Cambridge course isn't perfect, but they do a very good job of making as much teaching material as possible publicly available.

FWIW, I've found many undergraduate computer science courses to lag behind on tooling, so take the recommendations they have with a grain of salt.

The Cambridge course is much more theoretical than most others, afaik. Tooling on programming language semantics, for example, doesn't change that much.

Do Cambridge courses not have labs/projects? I looked at the course materials on a few of the courses and couldn't find any. Or are they given out to students separately?

There are hardware and software labs, which are administered on paper by PhD students. These include(d): ML (the functional programming language), FPGA/soft core development, Java tasks, breadboarding some logic, prolog and probably some different ones now (looks like some machine learning tasks?). Some of them are referenced and described on the links above. There's also a group project in year 2, a dissertation individual project in year 3, and a small holiday project between 1 and 2. Overall, a few students get through it without being able to properly program, but most basically self teach.

There is a system of supervisions, that is a bit like doing homework and going over it in a private (1/2/3 students to one prof) lesson once every two weeks. Sometimes the questions would be standard for a course, sometimes the professors chose their own. They are not necessarily directly tied to the course as lectured.

Thank you very much. This is very valuable.

Does an offline copy of this exist? Do you think it will go down when the term probably ends soon?

I would expect the material to stay up: at the moment, everything back to 1998/1999 is still accessible:


Do you know about curl/wget? Each one does pretty much the same thing as the other, but you can start a religious war by suggesting that one is preferable.

Anyway, either of them will let you mirror a website so you never have to worry about it going down.

And, since every Unix command line tool inevitably gets mined and turned into a web service, you could always submit those urls to archive.org instead of or as well as curl-ing/wget-ing them.

https://github.com/ArchiveTeam/grab-site if you're super serious. Also archive.org will probably accept those output warcs.

1. Profiler. There's a standard tool that tells you what part of your code is slow. Over half the time it'll find something dumb and easy to fix instead of whatever you expected.

2. SQL / relational database schemas. Persistence opens up a lot of capabilities. And databases themselves are very well-optimized; if you do any nontrivial data manipulation it's likely that whatever the query planner comes up with will be faster than your first idea of how to do it by hand.

3. Graph searches. An awful lot of problems can be solved by knowing how to turn a problem into a graph search. Make sure not to fall into the trap of thinking a graph search is limited to paths through space - you can solve problems like "get through this dungeon with keys and doors" by adding duplicate nodes for the different states.

4. Sequential Bayesian Filters. Are almost as useful as graphs, but aren't in a standard CS curriculum so you'll look like a wizard. These solve the problem of "I want to know a thing and I know how it changes over time, but I only can get rough estimates for its current state." Kalman Filters are simple and give great results when applicable. Particle Filters have lower quality but are applicable to more problems and dirt simple to code.
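To make point 3 concrete, here is a small sketch of the "keys and doors" trick. The grid format is an assumption made up for this example: `S` is the start, `E` the exit, `#` a wall, a lowercase letter a key, and the matching uppercase letter a door. The only change from a plain breadth-first search is that a node is `(position, keys held)` rather than just a position, so the same square can be revisited once you are carrying something new.

```python
from collections import deque


def shortest_path_with_keys(grid):
    """BFS over (position, keys) states; returns the step count or None."""
    rows, cols = len(grid), len(grid[0])
    start = next(
        (r, c) for r in range(rows) for c in range(cols) if grid[r][c] == "S"
    )
    start_state = (start, frozenset())
    queue = deque([(start_state, 0)])
    seen = {start_state}
    while queue:
        ((r, c), keys), steps = queue.popleft()
        if grid[r][c] == "E":
            return steps
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            cell = grid[nr][nc]
            if cell == "#":
                continue
            # Uppercase letters other than S and E are doors needing their key.
            if cell.isupper() and cell not in "SE" and cell.lower() not in keys:
                continue
            new_keys = keys | {cell} if cell.islower() else keys
            state = ((nr, nc), new_keys)
            if state not in seen:
                seen.add(state)
                queue.append((state, steps + 1))
    return None


# The key 'a' is a detour: you must walk down, grab it, and come back.
GRID = [
    "S.A.E",
    ".####",
    "a....",
]
steps_needed = shortest_path_with_keys(GRID)
```

A search that only tracked visited positions would refuse to walk back over the start square and would report the exit as unreachable.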

Support for 4! Yet, my understanding is that particle filters are superior but computationally more demanding. For nonlinear problems, the extended Kalman Filter linearizes the task, whereas particle filters don't and work with many point estimates instead.

I loved this book: https://users.aalto.fi/~ssarkka/pub/cup_book_online_20131111...

and also Thomas Schoen group does great work on Sequential Monte Carlo (SMC), MCMC for sequential data :) http://user.it.uu.se/~thosc112/index.html

They are also building a probabilistic programming language for sequential data! https://github.com/lawmurray/Birch

Regular old Kalman Filters are the best (literally perfect) when your problem fits all their requirements. They also have a lot of nice properties if you're dealing with a problem that mostly fits their requirements. But the linear-gaussian requirement is pretty steep, they don't always work.
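For a feel of how small the linear-Gaussian case is, here is a one-dimensional sketch that tracks a value assumed to stay roughly constant. The function name, the parameters, and the sample readings are all invented for illustration; a real filter would normally carry a state vector and matrices rather than two scalars.

```python
def kalman_1d(measurements, process_var, measurement_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a (nearly) constant state.

    x is the current estimate and p its variance. Each step first inflates
    p by the process noise (predict) and then blends in the measurement
    (update), weighted by the Kalman gain k.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + process_var               # predict: state unchanged, less certain
        k = p / (p + measurement_var)     # gain: how much to trust the measurement
        x = x + k * (z - x)               # update the estimate toward z
        p = (1 - k) * p                   # update: more certain than before
        estimates.append(x)
    return estimates


# Made-up noisy readings of something that is really about 10.
readings = [10.3, 9.6, 10.1, 9.9, 10.4, 9.8]
estimates = kalman_1d(readings, process_var=1e-4, measurement_var=0.25,
                      x0=0.0, p0=100.0)
```

With a large `p0` the filter trusts the first reading almost entirely, then smooths later readings more and more as `p` shrinks.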

I don't like the EKF much and prefer the UKF. The core filtering code is a little more complex but they're much easier to actually work with; you can give them arbitrary functions like a particle filter.

Particle filters have the advantage of being able to handle arbitrarily wacky distributions. But they are random and do some wacky things in edge cases. They'll behave much more poorly in low-evidence situations than other filters will. And they fall over spectacularly if you switch from low-evidence to high-evidence (there's a workaround for this but it's still counterintuitive). Finally they're just more computationally expensive than the others.

Birch sounds interesting, I'll take a look.

strongly agree on profiler and SQL.
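Since the OP is on Python: the standard library ships a profiler, `cProfile`, with `pstats` for reading the results. A minimal sketch follows; `slow_concat` and `work` are throwaway functions invented here to give the profiler something to find.

```python
import cProfile
import io
import pstats


def slow_concat(n):
    """Deliberately naive: builds a string one piece at a time."""
    out = ""
    for i in range(n):
        out += str(i)
    return out


def work():
    return slow_concat(20000)


profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Collect the five entries with the highest cumulative time into a string.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

For a whole script, `python -m cProfile -s cumulative your_script.py` gives the same kind of table without touching the code.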

Horror story re SQL: in my SaaS I skipped SQL and went with cloud datastore (NoSQL) and regret it. Basically (to simplify) you can't query your data without doing a full table scan (i.e., slow).

NoSql is not no sql though..

Unit testing, mocking, and various other testing techniques.

Why? Any project of sufficient complexity is very hard to test. If all you're doing is code -> build -> run to debug your code, you can very easily break something that's not in your immediate attention.

The problem is that good unit testing is hard, and time consuming. It can be so time consuming, that unless you can really plan in advance how you test, you could spend more time writing test code than real application code. (This is what happens when writing professional, industrial-strength code.)

So, when a hobby project becomes sufficiently interesting, such that the code will be so complicated that your code -> build -> run loop won't hit most of your code, you should think about how to have automated tests. They don't have to be "pure, by the book" unit tests, but they should be an approach that can hit most of your program in an automated manner.

You don't need to do "pure" mocking either. If you're writing something that calls a webservice, you could write a mock webserver and redirect your program to it. If you're writing something that works with pipes, you could have a set of known files with known results, and always compare them.
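The "known files with known results" approach can be as small as the sketch below. Everything in it is hypothetical: `normalize` stands in for whatever your program does to its input, and the `.in` / `.golden` file names are just one possible convention. The files are created in a temporary directory only so the example runs on its own; in a real project they would be checked in next to the tests.

```python
import tempfile
from pathlib import Path


def normalize(text):
    """Stand-in for the program under test: trims and upper-cases non-empty lines."""
    lines = [line.strip().upper() for line in text.splitlines() if line.strip()]
    return "\n".join(lines) + "\n"


def check_against_golden(input_path, golden_path):
    """Run the program on a known input and compare with the stored result."""
    actual = normalize(Path(input_path).read_text())
    expected = Path(golden_path).read_text()
    return actual == expected


with tempfile.TemporaryDirectory() as tmp:
    input_file = Path(tmp) / "case1.in"
    golden_file = Path(tmp) / "case1.golden"
    input_file.write_text("  buy 10\n\nsell 5  \n")
    golden_file.write_text("BUY 10\nSELL 5\n")
    golden_ok = check_against_golden(input_file, golden_file)
```

When the output is supposed to change, you regenerate the golden file and review the diff like any other code change.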

The goal is that you should cover most of your program with code -> build -> tests; and only do code -> build -> run for experimentation.

Let me second this. And in particular, I strongly encourage every developer to try starting a new project in a test-driven fashion (by which I mean that you advance the code by writing a bit of test and making it pass, and then doing that over and over.)

There's a qualitative difference between working in a well-tested code base that's very hard to describe convincingly. A lot of my early development experience was in code bases that had little or no testing. Experiencing a well-tested code base totally changed things for me. Instead of work being a death-of-a-thousand-cuts experience, it became pleasant, steady progress.

> Experiencing a well-tested code base totally changed things for me. Instead of work being a death-of-a-thousand-cuts experience, it became pleasant, steady progress.

I had the luxury of taking a well known data process and rewriting it with integration tests (input in, matching output with a golden file). It changed my professional life. Whereas before our deployment process included a 3 day wait and manual data checking on stage, after I was able to do deploys multiple times a day with confidence.

Made a believer out of me.

Unit tests can give you false positives (test failed but code is correct) and false negatives (test passed but code failed).

And TDD seems to create so many tests that you get huge false positive rates. I recently jumped on a project and I made a couple of fairly small code changes (a couple of hours) which caused 100 tests to fail. I then spent the next two days going through and correcting all 100 tests none of which found an issue in my code.

If you're saying that it's possible to do testing badly, I agree, just like it's possible to write production code badly. Sometimes teams new to unit testing do it ritualistically, without really understanding the purpose. That can lead to all sorts of bad outcomes. E.g., lots of tests that look impressive and even generate good coverage numbers, but don't really test what matters. Or tests that are highly duplicative, such that changing one thing in the code requires changing a lot of things in tests.

I have definitely dealt with code bases like that, and that sucks. But I have also dealt with code bases where the tests were great, and that's an amazing experience.

To do TDD well, I think it's important to release early and often and to reflect on one's experience (e.g., with weekly team retrospectives). That way if people are doing something unhelpful, like writing very duplicative tests, pretty soon they'll become an impediment to progress. The team will learn to write the useful tests, while skipping the ones that might fit some hypothetical pattern. It also helps people learn to design for testability; often, painful tests are a sign of bad design of the production code.

What are some resources for “good testing”, test boundaries, and possibly antipatterns?

(Ruby, Rails)

I've read a couple TDD books and this definitely seems to be a big blind spot. How to deal with the maintenance issues of unit tests.

They all seem a little fanatical in their pro unit test talk and don't discuss the downsides.

I find https://www.youtube.com/watch?v=EZ05e7EMOLM describes my own experiences with automated testing quite well.


- Focus on "automated testing", don't get obsessed with philosophising about "the true nature of a 'unit'", or other such dogma.

- Be empirical: base your rules on what works; don't base your work on "the rules".

- The goal of testing is to expose problems in our program: "test failure" is a success, because we've found a problem (even if that problem is with the test!). Anything else is secondary (e.g. isolating the location of failures, documenting our API, etc.). Avoiding this goal defeats the point (e.g. choosing to ignore edge cases).

- Focus on functionality rather than implementation details, e.g. 'changing a user's email address' rather than 'the setEmail method of the User class'. This improves reliability and makes failures more useful/meaningful (i.e. "this feature broke" vs "this calling convention has changed").

- Mocking is a crutch: it works-around problems that can usually be avoided entirely during design; it can still be very useful when a design can't be changed (e.g. adding tests to a legacy system).

- Testing a real thing is objectively better than testing a fake thing; we should only mock if testing the real thing is unacceptable.

- If two components always exist together, pretending that they're independent is a waste of time and complexity.

- Having some poor tests is better than having no tests. Tests can be added, removed and improved over time, just like anything else.

- "Property checking" is a quick way to find edge-cases and scenarios we wouldn't have thought of.

- Fast feedback loops are important. Reducing I/O and favouring pure calculation usually speeds up testing more than reducing the number or size of tests (e.g. "unit" vs "end-to-end"). Incidentally, this is also how we avoid having to mock.
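On the "property checking" point above: the idea is to state something that should hold for every input and then throw many generated inputs at it. Below is a hand-rolled sketch using only the standard library; the run-length encoder is an invented example, and dedicated libraries such as Hypothesis do the generation and the shrinking of failing cases far better than this loop.

```python
import random


def run_length_encode(s):
    """'aab' -> [('a', 2), ('b', 1)]"""
    out = []
    for ch in s:
        if out and out[-1][0] == ch:
            out[-1] = (ch, out[-1][1] + 1)
        else:
            out.append((ch, 1))
    return out


def run_length_decode(pairs):
    return "".join(ch * n for ch, n in pairs)


def check_round_trip(trials=500, seed=0):
    """Property: decoding an encoding gives back the original string."""
    rng = random.Random(seed)  # fixed seed so a failure is reproducible
    for _ in range(trials):
        s = "".join(rng.choice("ab") for _ in range(rng.randint(0, 12)))
        assert run_length_decode(run_length_encode(s)) == s, s
    return trials


checked = check_round_trip()
```

A two-letter alphabet is deliberate: it makes repeated runs, the interesting edge case here, very likely.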

The type of engineers who would screw up 100 unit tests independently are exactly the kind of engineers who should be forced to write tests for their code. Can you imagine the integration tests had they not been doing any testing at all?

Does that indicate that the tests were not written correctly in the first place?

I don't think so. They probably could have been written better, they weren't written poorly, but it's really hard to write 200 unit tests for a feature that don't break when the feature is updated.

This is the gospel truth. It does take discipline though, because writing tests sucks. I like to have a policy of never committing a non-trivial function without a test. That way, I can never put it off and wind up with a huge chunk of untested code.

Are there any resources out there you would recommend for learning testing techniques in a Python context?

Is that useful if you’re never writing django apps?

It depends on your background. Having written a web app before lets you quickly grasp the ideas laid out in the book.

To me, the most important chapters are

- https://www.obeythetestinggoat.com/book/chapter_mocking.html - https://www.obeythetestinggoat.com/book/chapter_purist_unit_...

Having said that, the concepts are universal.

Brian Okken's "Python Testing with pytest"[1]. More recent than Harry Percival's book.

[1] https://pragprog.com/book/bopytest/python-testing-with-pytes...

You are completely wrong.

Mocking is a huge design smell. The more mocks or integration tests your project requires to get full coverage the less modular your program is. A program that uses many mocks is a sign of very very poor design. You will find the code more complex to reason about and much harder to reuse code without necessitating a lot of glue code to make things work together. Without proper knowledge you won't even know the program is poorly designed.

I will grant you that 90% of programmers out there don't know how to design programs in a truly modular way, so most engineering projects will require extensive mocking. In fact most engineers can go through their entire career without knowing that they are making their programs more complex and less modular than they need to be. Following certain design principles I have seen incredibly complex projects require nearly zero mocking (very very rare though).

Mocking indicates a module is dependent on something. Dependency is different from composition.

     Dependencies                                Composition

           C                                        C
 +---------------------+       +----------------+       +-----------------+
 |     A               |       |                |       |                 |
 |                     |       |                |       |                 |
 |        +----------+ |       |                |       |                 |
 |        |          | |    in |                |       |                 |  out
 |        |          | |    -->+       A        +------>+         B       +-->
 |        |    B     | |       |                |       |                 |
 |        |          | |       |                |       |                 |
 |        |          | |       |                |       |                 |
 |        |          | |       |                |       |                 |
 |        +----------+ |       |                |       |                 |
 +---------------------+       +----------------+       +-----------------+
What's going on here? Both examples involve the creation of module C from A and B.

left: 'A' exists as wrapper code around B and is useless on its own. To unit test A you must mock B.

right: every module is reusable on its own. Nothing needs to be mocked during unit testing. No dependencies.

The only exception to the right example where you MUST mock is a function that does IO. IO functions cannot be unit tested period, they can only be tested with integration tests.

There's a name for the left approach. It's called object oriented programming using inheritance or composition (the OOP version of composition, not functional composition) as a design pattern. (Both are bad.)

There's also a name for the right approach. It's called functional programming using function composition.

I don't advocate that you strictly follow either style. Just know that when you go left you lose modularity and when you go right you gain it. All functional programming does is force your entire program to be modular down to the smallest primitive unit. Extensive mocking in your program means you went too far to the left.

tangent: Another irony around this world is that a lot of functional programmers (javascript and react developers especially) don't even know about the primary benefit of functional programming. They harp about things like "immutability" or how it's more convenient to write a map reduce rather than a for loop without truly ever knowing the real benefits of the style. They're just following the latest buzzword.

Forgive me, if I'm being dense, but doesn't either of these cases depend on how the composed objects are being used?

In your functional example A is an input to B (or vice versa?), how do you propose testing one of the modules without first instantiating the other one?

I'll give you two examples. One functional and the other OOP. Both programs aim to simulate driving given an input of 10 energy units to find the final output energy.


 # OOP
 class Engine:
     def __init__(self, energy):
         self.energy = energy

 class Car:
     def __init__(self, engine):
         self.engine = engine

     def ignite(self):
         self.engine.energy -= 1

     def run(self):
         self.engine.energy -= 1

     def drive(self):
         # driving means igniting, then running, then reading what is left
         self.ignite()
         self.run()
         return self.engine.energy

 engine = Engine(10)
 car = Car(engine)
 car.drive()  # result 8

 # ignite not testable without engine
 # run not testable without engine
 # drive not testable without engine and a car
 # ignite, run, and drive are not modular, cannot be used without engine.
 # engine testable with any integer.
 # Car useless without engine
 # engine useless without car

 # functional
 def composeAnyFunctions(a, b):  # returns function C from A and B. See illustration above.
     return lambda x: a(b(x))  # applies b first, then a

 def ignite(total_energy):
     return total_energy - 1

 def run(total_energy):
     return total_energy - 1

 drive = composeAnyFunctions(run, ignite)
 drive(10)  # result 8

 # composeAnyFunctions testable with any pair of functions
 # run testable with any integer
 # ignite testable with any integer
 # drive testable with any integer
 # all functions importable and reusable with zero dependencies.
 # input_energy -> ignite -> run -> output_energy

"I think the lack of reusability comes in object-oriented languages, not functional languages. Because the problem with object-oriented languages is they’ve got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle." - Joe Armstrong

you don't necessarily need the car or engine to simulate the energy output of driving.

I've been using static methods in java that follows the pure function way, it proved very easy to maintain even to those who inherited my code later on.

That's mainly just namespacing. The only point to use an Object in object oriented programming is to unify functions and state. To combine them together into a single primitive. This combination breaks compose-ability.

Static functions avoid state. You put them in an object in java because java has to have everything in an object. In any other language these would just be top level functions namespaced into a package or something. You are basically using java in a more functional way. Which is fine.

Thank you so much for a concrete example. I need to think about this some more. Clearly the code makes sense, but in a wider context, can you have a banana without a jungle? I'm dabbling with some functional programming but I definitely have more experience with oop, so what you're saying is difficult for me to grasp, but the benefits are hard to ignore.

There are downsides to FP as well. I am not advocating one over the other. But there is a concrete theoretical reason why FP is more modular, reusable and organized than OOP code.

Smalltalk is possibly the only OOP language that lets objects be compose-able and modular. Check out Pharo if you're interested. If you learn smalltalk well enough, you could apply its principles to traditional OOP languages and gain the modularity benefits.

> There are downsides to FP as well.

I've seen Java 8 functional stuff get unreadable.

But other than that, is there any other concrete downside?

For using pure functions, I don't see any downside, aside from it being impossible to use them for side effects outside the program, like IO to a device.

I mostly agree with what you're saying, but I will add that it is also possible to write well-designed, modular, easy-to-test (minimal mocks) OOP code. It does provide more guns to shoot yourself in the foot with, I will admit.

Yes you are correct. Check out smalltalk, it fits the paradigm you describe. It was actually rated the most productive programming language in the world according to namcook. Ironically, it's definitely one of the least popular languages as well.

Your argument appears to be, in TL;DR form: OOP and dependencies are bad and wrong, you must use Functional Programming or you will be wrong.

Isn't that a little extreme?

No. You are putting words in my mouth and accusing me of being extreme. I am NOT promoting one paradigm over the other. TLDR? I hope you read my stuff. I find it rude if someone just comments with a one liner and summarizes everything I said into a catchphrase that is a perversion of the truth. I feel like a presidential candidate.

Anyway, this is what I am saying:

If you use functional programming your code will be more modular and reusable because the paradigm forces you to be that way.

If you use Object Oriented Programming your program will automatically be less reusable and less modular but more object oriented.

This is all I am saying. Your mistaken statement that I am promoting one style over the other is based off of this assumption: Modular programs are better than less modular programs. This is not true.

Something like a physics engine is a better fit for OOP than it is for functional. Although your program will be less modular as a result, OOP is still a better fit because physical objects are easily modelled with OOP objects.

Trees, graphs and algorithms involving things of that nature are a better fit for object oriented programming than functional because many of these algorithms involve mutating nodes. Again, if you follow this style your program will become less modular overall as a result.

The ideal program is one that spans the spectrum of both OOP and functional. When it calls for it use functional or OOP depending on context. Overall for complex web applications that most startups make, in my opinion, the program should be more functional than it is OOP. A web request is basically a function that takes in a request as an input and outputs a response. The form factor of a function is a better fit for this, and you get high modularity as a side benefit. There is no point in simulating the request/response paradigm in a stateful Object while losing modularity in the process.

For a game, OOP is better in my opinion. Gaming entities involve constant mutation of things with state so OOP is a better fit. UI is a better fit for OOP as widgets are better represented by objects (FRP aka react&redux, imo works well but is an awkward abstraction).

There is one exception to this rule. In general Objects in object oriented programming are not compose-able. However, Smalltalk is an object oriented language where objects ARE compose-able. Smalltalk is the language that coined the term "object oriented" and although it is no longer popular as it was before it is still a very robust language and learning from it has huge benefits.

Thank you for the clarification!

Learning your various options for persisting data in depth is very useful, since most applications have to deal with persistence in some form, and increasingly in a distributed manner. Go beyond simply skimming the surface of SQL vs. NoSQL and the marketing claims different databases make about their scalability and consistency. Learn what ACID and CAP stand for and the tradeoffs involved in different persistence strategies. Learn SQL really well. Learn how to read a query plan, which is the algorithm your SQL query gets compiled into. Learn about the tradeoffs of row-based vs column-based storage. Learn how indexes work, and what a B-tree is. Learn the MapReduce pattern. Think about the tradeoffs between sending code to run everywhere your data is stored vs. moving your data to where your code is running.
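A cheap way to start reading query plans is SQLite, which ships with Python. The sketch below (the `trades` table, its columns, and the index name are invented for the example) asks SQLite for its plan for the same query before and after adding an index. The exact wording of the plan text varies between SQLite versions, so treat the strings as something to read rather than to parse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trades (id INTEGER PRIMARY KEY, symbol TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO trades (symbol, price) VALUES (?, ?)",
    [("AAA", 1.0), ("BBB", 2.0), ("AAA", 3.0)],
)


def query_plan(sql, params=()):
    """Return SQLite's plan for a query as one human-readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The descriptive text is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


query = "SELECT price FROM trades WHERE symbol = ?"

plan_before = query_plan(query, ("AAA",))   # expect a full scan of the table
conn.execute("CREATE INDEX idx_trades_symbol ON trades (symbol)")
plan_after = query_plan(query, ("AAA",))    # expect a search using the index

print(plan_before)
print(plan_after)
```

Other databases have their own spelling of the same idea, typically some form of `EXPLAIN`.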

Two great resources I've been going through are

- https://dataintensive.net - Really deep dives into different types of data storage solutions, their history, and how they actually work.

- http://www.cattell.net/datastores/Datastores.pdf - Good paper that helps differentiate similar but different datastores. Really helpful when you're trying to pick a modern data solution.

Designing Data-Intensive Applications is probably the best O'Reilly (if not overall technology) book of the past decade.

The talk on "Turning the database inside-out" [0][1] by the author, Martin Kleppmann, is a fantastic intro to these dynamics, and it's something I'll always recommend to both experienced and inexperienced data modelers and backend developers.

It goes pedagogically through the way things are typically done in a relational database in such a clear way that word-for-word it's one of the best tutorials I've seen... but it also weaves a narrative of "how can this be done better/more scalably/more reliably/more flexibly-to-business-needs" in pointing to a streaming/event-sourcing architecture. You may or may not need the latter right away, but it's a fantastic tool to have in your toolbox to be able to say "ah, this new requirement feels like it would benefit hugely from this architecture."

Especially for OP who's starting to think about the "why" of messaging queues, this could be a fantastically valuable first step.

[0] https://www.youtube.com/watch?v=fU9hR3kiOK0

[1] https://www.confluent.io/blog/turning-the-database-inside-ou...

Another good resource that guides one through both the philosophy (why) and technical details (how) of building a web application is Software Engineering for Internet Applications:


Learning how to use dtrace / bpftrace [0] is very valuable if you ever need to get into serious systems profiling.

There are some really cool data structures out there you might not know about. One of my favorite basic ones that I get a lot of use out of is the trie [1] (a.k.a. prefix tree). Very useful for IP calculations.
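As one guess at what "IP calculations" can look like, here is a sketch of a binary trie doing longest-prefix matching, the lookup a routing table performs. The `PrefixTrie` class and the network labels are invented for the example, and it handles IPv4 only.

```python
import ipaddress


class PrefixTrie:
    """Binary trie keyed on the leading bits of IPv4 networks."""

    def __init__(self):
        self.root = {}

    def insert(self, cidr, value):
        network = ipaddress.ip_network(cidr)
        # Keep only the bits covered by the prefix length, e.g. 8 bits for a /8.
        bits = format(int(network.network_address), "032b")[: network.prefixlen]
        node = self.root
        for bit in bits:
            node = node.setdefault(bit, {})
        node["value"] = value

    def longest_match(self, address):
        """Return the value of the most specific network containing address."""
        bits = format(int(ipaddress.ip_address(address)), "032b")
        node, best = self.root, self.root.get("value")
        for bit in bits:
            node = node.get(bit)
            if node is None:
                break
            best = node.get("value", best)  # remember the deepest match so far
        return best


table = PrefixTrie()
table.insert("10.0.0.0/8", "corp")
table.insert("10.1.0.0/16", "lab")
```

A lookup walks at most 32 nodes regardless of how many networks are stored.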

Also look into probabilistic data structures [2], very amazing things can be done with them.

[0] https://en.wikipedia.org/wiki/DTrace

[1] https://en.wikipedia.org/wiki/Trie

[2] https://en.wikipedia.org/wiki/Category:Probabilistic_data_st...

DTrace is life-altering.

I keep hoping that someone will build a dtrace(1) CLI that transpiles to bpftrace.

Profiling, period

Approximate Windows equivalent of DTrace is sysinternal process monitor, freeware. Very useful sometimes.

The Windows equivalent of DTrace is.. DTrace. [0] DTrace is about far, far more than snooping the filesystem. At best, Process Monitor is an equivalent of Brendan Gregg's DTrace utility, opensnoop. The true power of DTrace is to correlate events across subsystem boundaries. Like, graphing the top quartile of latencies from network accesses initiated via a given function in your application.

[0] https://techcommunity.microsoft.com/t5/Windows-Kernel-Intern...

Bloom Filters are awesome.
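A minimal sketch of why: a Bloom filter answers "have I possibly seen this?" in a fixed amount of memory, with occasional false positives but never a false negative. The sizes and the SHA-256-based hashing below are arbitrary choices made for the illustration, not tuned values.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        # One byte per "bit" keeps the sketch readable; a real one would pack bits.
        self.bits = bytearray(size_bits)

    def _positions(self, item):
        """Derive num_hashes positions by salting one hash function."""
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means definitely absent; True means "probably present".
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
seen.add("alice@example.com")
```

The typical use is as a cheap pre-check in front of something expensive, such as a disk or network lookup.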

Shell scripting for processing text. You can often get so much done with so little code and effort.

Also on a semi-related note, I think as a self taught programmer, it's easy to get stuck on things that seem cool but are just procrastination enablers (I know, I've been guilty of it for 20 years). Like, if you're about to start a new project and you want to flesh out what it's about, you really don't need to spend 5 hours researching which mind map tool to use. Just open a text document and start writing, or get a piece of paper and a pen. It won't even take that long.

I spent about 1.5 hours the other day planning a substantially sized web app. All I did was open a text file and type what came into my head. For fun I decided to record the whole process too[0]. I wish more people recorded their process for things like that because I find the journey more interesting than the destination most of the time. Like your journey of eventually finding message queues must have been quite fun and you probably learned a ton (after all, it led you to message queues, so it was certainly time well spent).

[0]: https://nickjanetakis.com/blog/live-demo-of-planning-a-real-...

These days it might be better to just learn Python. It's cleaner and scales better to complex code. And it's available out of the box on most modern systems where shells are available too. Shells are still good for simple one-liners and for knotting multiple processes together, but text processing involves so many different commands, each with their own quirks, that a consistent simple language is IMHO superior.

For processing text, using Python doesn't really make sense in a ton of cases.

If I want to search a text file for a specific string, why wouldn't I just use `grep "hello" myfile.text`, or if I wanted to do it on a directory of text it's a minor change of `grep -R "hello" .`.

Why would I go through the trouble of opening a Python interpreter, or writing out a Python script to do the equivalent in Python?

Or if I wanted to grab the third column of a CSV, I would for sure just use the `cut` command or maybe `awk` (depending on what I'm doing).

For more complex parsing you can often pipe together a few commands and maybe convert it into a 5 line Bash script to make it a little easier to create variables, etc.. It becomes something you can whip up in 1 minute.

Then there's also more involved text parsing that doesn't require piping a bunch of commands together or shell script glue, in which case it comes back to using grep with its various flags and potentially a regexp. It's a natural fit for the problem and you can iterate on it so quickly.

Honestly this applies to Ruby & Javascript/Typescript as well and not just Python. I really don't see the value of learning shell scripting anymore when the newer languages are just as easy to learn, terse, and you can adapt better to changing conditions when needed with libraries.

I often find multi-line Python scripts with `import os` and others that could be a fraction in size (and just as clear) in bash. Even more ridiculous are the times I find a node script (published to npm, even) that is little more than a wrapper on a shell script.

Inevitably someone will read these arguments and think “those are just bad programmers”, but your point was that you “don't see the value of learning shell scripting”. The value is in not spewing absurd code like that. Shell commands are fast and efficient. There isn’t an emphasis on libraries because instead you use tools. Is `grep` not enough? Try `the silver searcher`[1] or `ripgrep`[2].

Are shell scripts the best instrument for every job? No, but no tool is.

[1]: https://github.com/ggreer/the_silver_searcher

[2]: https://github.com/BurntSushi/ripgrep

Just because someone knows shell scripting I'm not going to consider them "bad programmers". I know it myself and I've used it extensively before I learned python & ruby.

My point is that the cost for learning and using shell scripts is just too high compared to just using a modern language that's just as terse and a lot more powerful and flexible. Context switching from one language to another isn't free either.

imo the only time shell scripting was practical was when the only major programming languages were C, C++, and Java. imo even Perl5 is more practical than shell scripting.

Also I doubt that the python program you mentioned was that much bigger than a shell script

This 1000x. I put it off and just got by with basic shell scripts for some years, and when I finally decided to get deeper into it, it was magical.

A good series of piped commands with tools available basically everywhere can solve problems you had no idea could be so simple to solve.

have any good resources for this?

  man bash  
  man awk  
  man sed  
  man grep
You can also do most of this stuff with Perl one liners if that suits your fancy.

sed + awk - https://www.amazon.com/sed-awk-Dale-Dougherty/dp/1565922255

awk - https://ia802309.us.archive.org/25/items/pdfy-MgN0H1joIoDVoI...

general *nix text processing - https://www.tldp.org/LDP/abs/html/textproc.html

Perhaps these are a good start:

awk: - GNU Awk User Guide - https://www.gnu.org/software/gawk/manual/gawk.html#Getting-S...

- Grymoire guide - http://www.grymoire.com/Unix/Awk.html

sed: - Grymoire guide - https://www.grymoire.com/Unix/Sed.html - Official docs - http://sed.sourceforge.net/#docs

The IBM developerworks articles about this are old (2000!), but still incredibly useful and well written. Start here:

https://www.ibm.com/developerworks/library/l-sed1/index.html https://developer.ibm.com/tutorials/l-awk1/

There's a free O'Reilly book from last year: https://www.datascienceatthecommandline.com/

awk is pure magic.

Also, just get to know your shell.

big timesaver: control-r

basic but useful all over:

    for i in *.c; do cp "$i" "$i.bak"; done

What does double quotes dollar do in the command?

Ensures filenames with spaces in them get passed as a single argument, instead of being inadvertently expanded into several.

Quoting and expansion issues are a pain in shell languages...

$i is the variable i declared in the for loop. The quotes just wrap it, so that it's (somewhat) safe if the file has a space in the name.

What do you mean, "somewhat"? This looks safe for all I can tell.

You know how you should almost always eat your vegetables?

You should almost always quote your Bash variables too. Here's why: https://nickjanetakis.com/blog/here-is-why-you-should-quote-...

It prevents filenames with spaces from expanding into two arguments to the command.
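The word-splitting behaviour described in these replies can be seen without touching any files. This is only a rough sketch: Python's `shlex.split` uses shell-like splitting rules, so it approximates what the shell does to a command line, but it does not model globbing or variable expansion. The filename is made up.

```python
import shlex

filename = "my notes.c"  # hypothetical file with a space in its name

# Unquoted: the splitter sees the space and produces extra words.
unquoted = shlex.split(f"cp {filename} {filename}.bak")

# Quoted: each name survives as exactly one argument.
quoted = shlex.split(f'cp "{filename}" "{filename}.bak"')
```

Here `unquoted` has five words, so `cp` would be handed four arguments instead of two, while `quoted` has the intended three words.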

I'd split your first point in two:

- Shell scripting for running commands and managing files

- Unix utilities for processing text

These happen to complement each other nicely. It's not that bash is better at manipulating text than Python, it's that Python makes it painful to invoke commands (like those Unix utilities) and pipe data between them (e.g. https://news.ycombinator.com/item?id=17733865 ).
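To give a feel for the difference, here is roughly what a two-stage pipeline (`producer | consumer` in a shell) looks like with Python's standard `subprocess` module. This is a sketch, not the only way to do it; the two commands are stand-ins that run the current Python interpreter so the example doesn't depend on any particular Unix utility being installed.

```python
import subprocess
import sys

# Stand-ins for two real commands: one prints lines, one upper-cases stdin.
producer_cmd = [sys.executable, "-c", "print('alpha'); print('beta')"]
consumer_cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
consumer = subprocess.Popen(
    consumer_cmd, stdin=producer.stdout, stdout=subprocess.PIPE, text=True
)
producer.stdout.close()  # drop our copy so the consumer owns the read end
output, _ = consumer.communicate()
producer.wait()
```

That is about ten lines of plumbing for what the shell expresses with a single `|`.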


I can't recommend this book enough. I have a CS background, and still had quite a few "I can't believe this thing has been hiding in plain sight!" moments while reading it.

It's great. Incredibly dense with useful information and it just blows my mind how much knowledge Martin has about the topic. I recommend watching this talk from him to give a little glimpse of the book: https://www.youtube.com/watch?v=5ZjhNTM8XU8 This is just about a little part of one of the chapters.

Oh man, this is good, thank you.

I'm now torn between reading this one first or the Architecture of Enterprise Applications.

I loved Designing Data-Intensive Applications. It gives you the reasons why NoSQL databases exist and the problems they solve. Moreover it gives you reasons to select one over another. It's really excellent and one of my top two CS books

Your other top CS book out of interest?

If it helps, IMO "Designing Data-Intensive Applications" is a better bang-for-the-buck. Enterprise-scale applications are a world unto themselves.

Edit: I meant Patterns of Enterprise Application Architecture by Fowler in my comment above. Recommended by DHH.

My advice would be to skip it completely. It's just packed full of standard GoF OO dogma.

Thanks. So, what you're saying it is redundant if you've read GoF?

But this is mainly for distributed (web) systems, right?

Are there good books for data intensive desktop apps? Like games or CAD design tools?

Debuggers and property-based testing. Only a select few people can actually use print statements for debugging productively (and not just by their own metrics). Learning how to craft repro scenarios and adequately capture state in a debugging session can enable junior devs to easily surpass senior devs.

Property-based tests aren't quite formal methods, but I think they are a good stepping stone. They also somewhat force your code into an algebraic/functional style, which makes it more amenable to refactoring, easier to test and easier to understand.
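For readers who haven't seen the idea: instead of asserting on a few hand-picked inputs, you state a property that should hold for any input and let the machine generate cases. Here is a hand-rolled sketch using only the standard library. The run-length encoder is a made-up example, and in practice you would probably reach for a dedicated library such as Hypothesis, which also shrinks failing inputs for you.

```python
import random


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Run-length encode: 'aaab' -> [('a', 3), ('b', 1)]."""
    runs: list[tuple[str, int]] = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Inverse of rle_encode."""
    return "".join(ch * count for ch, count in runs)


def check_roundtrip(trials: int = 500, seed: int = 0) -> None:
    """Property: decoding an encoding returns the original, for any input."""
    rng = random.Random(seed)
    for _ in range(trials):
        # A two-letter alphabet makes long runs likely, which is the hard case.
        text = "".join(rng.choice("ab") for _ in range(rng.randint(0, 20)))
        assert rle_decode(rle_encode(text)) == text, f"failed for {text!r}"


check_roundtrip()
```

Note how the property only works because both functions are pure: same input, same output, no hidden state.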

Design tools like Swagger can help one think through services w/o diving into code. Code itself is a liability and should be thought of as "spending" not creating. Code is a debt.

Refactoring and code understanding tools. If you use PyCharm (you should, it is free in all senses), learn how to navigate into your libraries. Read your libraries.

This x1000 debuggers are seriously undervalued by many developers. It’s like a super power.

What are some good resources to learn about debugging patterns and tips/tricks? My preferred language, Julia, recently introduced a nice set of tools related to debugging. I feel like there's probably things that would make me more productive but I think the techniques would be more broadly applicable than a specific language.

One thing I always do these days is I step through any new code I've written the first time I run it. This usually weeds out some bugs that might take a while to find because they are easy to miss. It also ensures that you actually go through each line of code you write, doing a forced code review on yourself early on.

I highly recommend learning PROLOG & understanding how to write your own simple planner system. The hairiest real problems are hairy because they're best suited to a declarative style (and programs written declaratively can be made much more efficient through more clever solvers -- given naive code, a clever solver has a much bigger efficiency boost over a dumb solver than an optimizing compiler does over a non-optimizing one -- although PROLOG itself leaks too much abstraction for many of these techniques to be viable in it).

I also recommend understanding message routing systems used in file sharing, like CHORD.

If you don't have a strong background in the math behind theoretical computer science, you might benefit a lot from an understanding of the formal rules around boolean logic, symbolic logic, & state machines -- especially, rules about when certain kinds of things are equivalent (since stuff like De Morgan's laws is used for simplifying and optimizing expressions a lot, and rules for state machines are used to prove upper limits on resource usage).
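As a small illustration of the equivalence rules mentioned above, De Morgan's laws can be checked exhaustively in a few lines, and then used to rewrite a condition with confidence. The two `rejects_*` functions and their parameter names are hypothetical.

```python
from itertools import product

# Exhaustively check both of De Morgan's laws over every truth assignment.
for a, b in product([False, True], repeat=2):
    assert (not (a and b)) == ((not a) or (not b))
    assert (not (a or b)) == ((not a) and (not b))


def rejects_original(is_admin: bool, is_owner: bool) -> bool:
    return not (is_admin or is_owner)


def rejects_simplified(is_admin: bool, is_owner: bool) -> bool:
    # Same predicate after applying the second law.
    return (not is_admin) and (not is_owner)
```

With only two boolean inputs, checking all four combinations is a complete proof that the two forms agree.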

If you don't already, learn to use awk. It's a much more powerful language than it seems, and fits extremely well into the gap between command-line prototyping in shell one-liners & porting a prototyped tool to python or perl, and so it's a huge time saver: it is faster to write many kinds of tools in a mix of shell and awk and then rewrite them in python than it is to write them in python in the first place.

I've never used Prolog in the 15 years since I learned it in college. It's an interesting take on programming, for sure, and I appreciated the mind-expanding exercise, but hasn't helped me in my career at all.

Totally agree on awk. I use it almost every day for quick little one-liners. Big time saver.

Also agree on state machines, because from there it is a short hop to understanding formal grammars and the foundation of compilers and languages, which has been immensely useful in my career.

Learning about automata theory was one of the most mind-expanding experiences I had in college. It was certainly not something I would've stumbled upon without guidance. I believe that describes most of the value I derived from a degree, being nudged in the right directions toward solutions and problems that a lot of smart people have thought about for a while.

Understanding how different languages, or inputs more generally, can be transformed into meaningful outputs is pretty satisfying. It's a topic that almost seems to transcend the realm of computer science.

The best thing you can do for your career is learn things that don't apply to your career. After all, it's impossible to predict unknown unknowns, which makes accidentally having already learned something nobody else valued enough to pick up the most valuable skill.

Formal methods.

It took me nearly a decade of working in distributed systems to be introduced to TLA+ and other tools in this space. Until then my knowledge had been built from textbooks describing the fundamental data structures, algorithms, and protocols... but those texts take an informal approach with the mathematics involved. And since I was self-taught I was reading those texts with more of an eye for practical applications than for theoretical understanding. I had no idea that a tool existed that would let me specify a design or find potential flaws in systems and protocols, especially concurrent or parallel systems, with such ease.

I think type theory and category theory have also been great tools to have... but I think mathematics in general is probably the more useful tool. Being able to think abstractly about systems in a rigorous way has been the single-biggest booster for me as a practitioner.

> the single-biggest booster for me as a practitioner.

Can you instantiate this claim with an example? I'm somewhat knowledgeable in both math and computer science theory but have yet to feel as though my math background has helped me in practical CS.

About 3-4 years ago I was working on an open source cloud platform for a company deploying a public cloud. There was a particular error that sometimes happened in our production environment where a rebooted VM would come up but couldn't connect to the network.

We tracked it down to a race between two services in the data plane. It turns out the VM controller wouldn't wait for the network controller to unplug a virtual interface before requesting the interface to be plugged back in. There was a lack of co-ordination happening. However it only happened when the network component was under heavy enough load that it would take too long to respond before the VM service finished rebooting the VM -- usually it was fast enough and the error wouldn't appear.

I managed to model this interaction at a high level in TLA+. From there I had suspected that the error was in the mutex locking code in the async library this system depended on so we refined the model pretty close to that implementation. As I recall we found that the mutex code wasn't the culprit -- a fine result. We ended up implementing some light-weight co-ordination mechanism to ensure that the VM service waits to acknowledge the progress of the network service.
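To make the shape of that fix concrete, here is a toy sketch of the kind of coordination described, using threads and an event in Python. It is not the actual code from that platform: the two services, the 50 ms delay and all of the names are assumptions for illustration only.

```python
import threading
import time

events: list[str] = []          # ordered log of what happened
unplugged = threading.Event()   # set by the network service once unplug is done


def network_service() -> None:
    time.sleep(0.05)            # simulate a network controller under load
    events.append("unplug")
    unplugged.set()             # acknowledge progress


def vm_service() -> None:
    events.append("reboot")
    unplugged.wait()            # the coordination step: block until acknowledged
    events.append("plug")


threads = [
    threading.Thread(target=network_service),
    threading.Thread(target=vm_service),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the `unplugged.wait()` line, "plug" would usually be logged before "unplug" whenever the network side is slow, which is the failure described above.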

Since then I've continued to use TLA+. I find that programming languages are insufficiently expressive to describe high-level interactions between other processes, network events, and humans.

I'm sorry but I don't see why the TLA+ modeling was necessary. You said that you noticed the lack of coordination before that. From your description, it seems like the mutex thing was a diversion. Anyway, a mutex would not necessarily be adequate for coordination (and many types of mutexes will not work between processes at all).

So to me it seems that you could have gone straight to the lightweight coordination mechanism without the TLA+ model. And anyway, if there was a problem with the mutex, you could test that theory by doing additional logging or an experiment around the mutex functionality.

Sorry for what? It was my first time using such a tool. I found it useful for understanding the system. I've improved my understanding of TLA+ since then and it has been valuable.

The premise that self-taught programmers necessarily have core holes in their knowledge and skills that would have been filled if they had a CS degree is entirely false.

Start with the example you gave of messaging middleware. There are many BS CS curricula that do not address this at all. Also he mentioned that he had already learned about named pipes on his own. For many applications, named pipes could be a perfectly valid alternative to some external message queue system.

Looking at the items submitted, the vast majority are core skills that would necessarily be picked up by people who need to work with them. The idea that someone would not know about Makefiles or debugging or profiling or SQL just because they were a hobbyist or self-taught is ludicrous. If you are serious about C programming, whether it's a job or your hobby, you are going to learn about Makefiles. Likewise anyone seriously working on a data-centric application is likely to become well-versed in some database technology, up until a few years ago that would have automatically been relational.

And one other thing. Some of the most important skills in programming are in the domain of software engineering. Software engineering is very poorly addressed by many BS CS programs. So again, whether they have good SE skills is often not going to be determined by whether they have a BS degree or not. It's not even necessarily determined by whether they are working in a professional environment. It's mainly going to be a factor of their motivation to self learn and above all practical experience.

My experience has been quite different. True that some of the most technically skilled programmers I know of had no degree, but the polished ones, the ones I find easier to work with, tend to have one. Further it's pretty easy for me to tell if a person's degree was a CS degree or not just by talking to someone about the problems they have and how such problems might be solved with code.

That's not to say it's required; some of the best professionals I know have non-CS degrees (one in fine art -- painting) or no degree. But if you're still young, I submit that a CS degree is totally worth your time.

The argument that I was making was clearly stated at the beginning of my comment. It is significantly different from the supposed argument that you seem to be refuting.

Notice how I did not say that "the developers that certain degree holders find to be most 'polished' and easiest to work with will on average not have a degree".

Notice how I also did not say that a CS degree was not worth people's time.

We all have holes in our knowledge.

We all, generally, specialize. I have never worked with 'big data', implemented any part of a commercial web site, or worked on HPC (just to give a few examples). What may be useful to one may be useless to another.

Definitely think about Software Engineering. Maintainability, debuggability, extensibility. Don't get lost in stupid details. Don't nitpick the coding style or try to optimize code that isn't currently meeting the requirements (if there are requirements, and if there aren't you should try to define them).

SQL (even if just SQLite) as databases open up a lot of power.

Vim or Emacs for powerful text editing.

A low level language. Sometimes Python doesn't cut it, or it is pretty suboptimal. If you're writing a trading bot, is speed of execution not important?

Operating System knowledge can be helpful at times. I bought one of the No Starch books "How Linux Works" and it is very helpful.

The command line and by that I guess you should know the common Linux commands (cat, grep, sort, uniq, head, tail, ls, top) if you use Linux and how to chain them together via pipes. To give some context, I can write one command which would require 8 lines of Python (saves you valuable time). If you use Windows, learn enough Powershell to be comfortable with it. On occasion I'll use Powershell over Python even though it is dirt slow for reading files.

I thought this course did a good job at getting SQL to stick in my brain, largely due to the relational algebra section. https://lagunita.stanford.edu/courses/DB/2014/SelfPaced/abou...

The W3Schools site is what helped me


It has a database you can query and add to and what not.

Firmly agree -- I'm pretty sure I've said it elsewhere, this was a transformational course for me.

I got my start in product support where a lot of our problems could be solved in SQL. We were encouraged to learn it, but most people were happy with the most basic syntax for DML. I worked through the entirety of Dr. Widom's course and it gave me the fundamental data understanding to be a real value to the teams I've worked for, and I used that opportunity to transition into being a full time developer.

* Regular expressions. The most valuable "not a language" that I know.

* SQL. For me, it was a trip coming from a procedural language background. I kept trying to figure out how to do loops.

* Command line - DOS and unix

* Batch languages for those.

* HTML / CSS / Java (please don't kill me) Script and the DOM
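On the SQL point about looking for loops, here is a small side-by-side sketch using Python's built-in `sqlite3` module. The table and its rows are invented; the point is only that the second query describes the result and leaves the iteration to the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 10.0), ("bob", 5.0), ("ann", 2.5)],
)

# Procedural habit: fetch every row and loop to accumulate totals.
totals_loop: dict[str, float] = {}
for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
    totals_loop[customer] = totals_loop.get(customer, 0.0) + amount

# Set-based SQL: describe the result and let the database do the iteration.
totals_sql = dict(
    conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
)
```

Both dictionaries end up with the same totals per customer.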

Second regular expressions. I use it all the time to edit large bundles of code.
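A typical example of that kind of bulk edit, sketched with Python's `re` module: renaming one function without touching a similarly named one. The function names are made up, and for real refactoring an IDE's rename is usually safer since a regex knows nothing about scope.

```python
import re

source = "get_user(id)\nget_user_name(id)\nx = get_user (id)"

# \b keeps the match from touching get_user_name; the group preserves spacing.
renamed = re.sub(r"\bget_user\b(\s*\()", r"fetch_user\1", source)
```

Only the two calls to `get_user` are rewritten; `get_user_name` is left alone because `_` counts as a word character, so there is no word boundary after `get_user` there.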

Value/message/actor/event oriented programming like you mentioned is useful for building distributed systems. I am a huge fan of going a step further and learning this model:

Imperative shell, functional core.

The external shell of code in a project is responsible for network connections, console IO, etc. But the internal guts of a program should be largely functional, that is, instead of mutating (changing) values, consider returning different forms of the same value.

Decisions (branches, logic) are made at one level, data dependencies at another.

The talk Boundaries by Gary Bernhardt describes this model in detail: https://www.destroyallsoftware.com/talks/boundaries
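A tiny sketch of what that split can look like in Python. This is my own illustration of the pattern, not code from the talk; the `Account` type and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    owner: str
    balance: int


# Functional core: pure functions, no I/O, values in and new values out.
def deposit(account: Account, amount: int) -> Account:
    if amount <= 0:
        raise ValueError("deposit must be positive")
    return Account(account.owner, account.balance + amount)


def summary(account: Account) -> str:
    return f"{account.owner}: {account.balance}"


# Imperative shell: the only place that touches the outside world.
def main(raw_amounts: list[str]) -> None:
    account = Account("alice", 100)
    for raw in raw_amounts:
        account = deposit(account, int(raw))
    print(summary(account))


main(["25", "50"])
```

The core can be tested with plain assertions and no mocks, because nothing in it reads input or prints.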

I'm also a fan of "imperative shell, functional core, imperative implementation of that core," where within a function that you promise to have no side effects or very specific side effects (say, a component's render function, or a transformation from one complex data representation to another), you should still feel free to use loops, imperative-style control flow, even network requests, etc. I find this puts folks used to "imperative everything" at ease, while still maintaining almost all of the benefits of functional-core provided that execution is scheduled properly.

It's really easy to do this with pure functional programming in impure languages like Scala. You can use arbitrary impure code within lazy IO values. Outside, the functional purity makes reasoning easy. If, as recommended, there is only a single point in your program where the IO values are actually executed, then the execution order can be reasoned about statically.

I'm surprised nobody's mentioned spreadsheets - specifically Google Sheets or something scriptable and hosted. Recently I've built up a small system which sucks in data from a few places (fitness, task management, calendar,..) and analyses it against several goals I've set. This means I can see how I'm progressing towards what I want without even touching it anymore.

They're a really nice UI for bootstrapping projects, and even some small databases. A current project of mine for a client uses sheets as a backing store for an email collection list. Since there are only a few hundred rows, it makes sense at this scale and works really well for non-technical users.

Similarly Microsoft Access. I learned Access for my first job (100% of my job was moving tasks into Access, 3 years later I helped some interns transition to a SQL server+web application). 10 years later I'm still using Access for some aspects of my new job. The ability to quickly prototype something and receive almost immediate value is vastly underestimated (I know of one company which runs all of their tasks out of Access).

I suppose what I've described is just a little more of an unstructured version of this. Mixing data and computation in such a way is definitely a tech debt tradeoff though.

Excel & Google Sheets are probably the best enterprise software development platform ever made.

Your IDE, its refactoring tools, but especially its debugger.

VSCode or PyCharm (assuming you are still a Python developer) could be a good place to start. I'm always surprised when I see professional developers coding in Sublime Text and debugging with print statements (or their equivalent). Usually you have better options than that, especially for statically-typed languages - but even for JS and Python.

I don't know how people could work on large scale projects without a step through debugger.

Honestly I still find a lot of engineers don't know git properly. Like they know enough to commit and push but that's about it. It really helps to understand everything git has to offer.

Absolutely. When I get people into learning Git, I start with the obvious (checkout, commit) move to the essential (branches, forks) and then usually say "When you're comfortable with that, start playing around with rebase. When you get into trouble with that, come talk to me again, because that's the key learning moment."

I've learned more messing up when using fancy git rebase stuff than any tutorial.

O'Reilly have a free book on Git that is amazing. I find this to be the perfect level of detail. Easy enough to read in a short enough time, detailed enough to grasp the magic under the hood.


Yes, and in this vein, git became way less scary after learning how it works. This course helped a lot (unfortunately paid, but a lot of businesses have a licence - https://www.pluralsight.com/courses/how-git-works).

I also use magit (https://magit.vc/) - inside spacemacs (http://spacemacs.org/)

Ha, I just did a talk and asked how many people knew about git bisect. No one.

I found this zine (not free) to be helpful:


The three most important basic git operations to know (in my opinion)

    git checkout -b
    git log
    git rebase -i

I strongly prefer git merge over git rebase.

Using rebase results in a cleaner history and simplified workflow in many cases. However it also means that when you have a disaster, it can be truly unrecoverable. I hope you have an old backup because you told your source control system to scramble its history, and you don't have any good way to back it out later.

For those who don't know what I mean, the funny commit ids that git produces are a hash signifying the current state of the repository AND the complete history of how you got there. Every time you rebase you take the other repository, and its history, and then replay your new commits as happening now, one after the other. Now suppose that you rebased off of a repository. Then the repository is rebased by someone else. Now there is no way to merge your code back except to --force it. And that means that if your codebase is messed up, you're now screwed up with history screwed up and no good way to sort it out.

That result is impossible if you're using a merge based result. The cost is, though, that the history is accurately complicated. And the existence of a complex history is a huge problem for useful tools like git bisect.

I don't know. Since git uses content-based addressing, you can't actually alter any commit, only create new ones. And orphaned commits don't get garbage collected for like 30 days even if you explicitly tell git to clean things up. So, the original stuff will still be there. It might just not be obvious how to access it. Part of the commit is the reference to zero or more parent commit object ids. So, if you find the old commit, it still has its history intact. `git log -g` is a handy command to see a composite git log that travels across branch changes.
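The content-addressing point can be sketched in a few lines of Python. This is a deliberately simplified commit: real Git commits also carry author and committer lines with timestamps, so these ids won't match anything Git itself produces. The `commit_id` helper and the messages are hypothetical; the "commit <size>" header followed by a NUL byte and the SHA-1 over header plus body are how Git hashes objects.

```python
import hashlib


def commit_id(tree: str, parents: list[str], message: str) -> str:
    """Hash a simplified commit body the way Git hashes objects: header + body."""
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    body = ("\n".join(lines) + f"\n\n{message}\n").encode()
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # Git's well-known empty tree
base = commit_id(tree, [], "initial")
new_base = commit_id(tree, [], "someone else's work")

# Same tree and message, different parent: a different commit id.
mine = commit_id(tree, [base], "my change")
mine_rebased = commit_id(tree, [new_base], "my change")
```

Because the parent id is part of what gets hashed, replaying "my change" on a new parent necessarily yields a new commit, and the old one is left in place until it is garbage collected.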

I do get what you mean though. You effectively create new commits with an alternate view of history. I don't get quite why/how that causes a situation in which the code can't be merged? I don't rebase much, I prefer merging. Is there any resource that can explain why rebasing might be dangerous like that?

In general if branches diverge too far then you have difficulties merging no matter which strategy you use and sometimes if it diverges too far it just becomes hopeless. Mostly though if you are working in a team, committing daily and merging/rebasing frequently, it should present fairly few problems.

I find I never run the actual command git bisect. I just do `git log --decorate --oneline --graph` and eyeball a good commit to start from and then basically do it by hand using commit messages to aid in making reasonable guesses as to where to try but following the basic binary search philosophy. Works well enough even with a complex history.

Here is an example of how to create a problem.

You rebase your private branch off of a shared master and pull in other people's commits. Someone else pushes out their rebased version using force. More commits are made on top of the other people's commits, including reversing some bad commits. You try to rebase off of the shared master.

In your last rebase, you are trying to replay all of the commits in your history that are not in the remote history. However git does not understand which local commits are from you and not pulled in on the previous rebase. It therefore tries to play them on top of the remote master if it can make sense of them. Which means that you bring back the reversed commits. You might find conflicts in code that you have not touched. You resolve them as best you may. And now you've got the definitive version of what happens, and no way with the screwed up history to figure out why it is going to go wrong. Then you force commit because that is how a rebase flow works... and everyone is screwed.

I agree on branches diverging too far. Merge early, merge often.

If you never run the command to git bisect, you should try it. What it's for is finding the random commit that recently broke a piece of functionality that nobody realized would break. Because nobody realized it, the log messages will say nothing useful. And you don't need to figure out where the change is - just write a test program for the breakage, run git bisect, and look at the offending commit.
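The idea behind the search is plain binary search over history. Here is a rough Python sketch over a linear list of commits; real `git bisect` works on the actual commit graph, and the commit names and the `breaks_on` check here are invented stand-ins for your test program.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search for the first bad commit.

    Assumes commits are ordered oldest to newest, the first is known good,
    the last is known bad, and every commit after the first bad one is bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]


history = [f"c{i}" for i in range(100)]
tested: list[str] = []


def breaks_on(commit: str) -> bool:
    tested.append(commit)
    return int(commit[1:]) >= 37  # pretend c37 introduced the breakage


culprit = first_bad(history, breaks_on)
```

With 100 commits, only a handful of them ever need to be tested, which is why it beats eyeballing the log once the history gets long.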

> Then you force commit because that is how a rebase flow works

Absolutely not. Force pushing a shared master is probably the worst sin one can commit with git. I guess you already have come upon the 'why' of it.

A "Rebase workflow" works so that devs use rebase to 'move' their work on an updated master after a pull/remote update, resolve potential conflicts locally, and do a fast-forward push to origin/master. This also works on copying work between different feature branches just as well.

Ah OK. Hmm my policy is to always disable force push on master. Force pushing to master should never be allowed.

> Since git uses content-based addressing, you can't actually alter any commit, only create new ones. And orphaned commits don't get garbage collected for like 30 days even if you explicitly tell git to clean things up. So, the original stuff will still be there. It might just not be obvious how to access it.

SmartGit does a splendid job with this. In the Log window where you see all your commits and how they are related to each other, there is a Recyclable Commits checkbox. Turn that on and everything in the reflog shows up in the log tree, just as if it were any ordinary commit. You can right click one of these commits and add a branch there, or do any of the other operations you can do in the commit log window.

Same thing for stashes. Did you know they are just commits too? I didn't, until I clicked one of the stash checkboxes in SmartGit. Then the stash showed up in the tree just like any other commit.

I don't understand why so many developers are resistant to the idea of using a powerful Git GUI like SmartGit. For me it is like having a superpower compared to the meager options the Git command line gives you.

Even if you like the command line, it's not like you have to choose one or the other. You can use SmartGit and the Git command line, whichever makes any task easier for you.

I work on a small open source project. Our master branch is protected from force pushes, and I only use rebase on personal branches for features or bug fixes. It's very nice if I'm planning to merge several commits but I need to fix something in one or more of them.

I agree that rebasing should not happen in the main branch(es) of a repo.

I'll have to add `git reflog` for instances where you've felt that you've completely screwed up something as one can always move back to a previous state. I think this is essential.

A useful one that I'll add is what I call the sword command: `git log -S<word>`. This one allows one to list commits that contain a particular change. This has been useful in tracking down old changes.

I know you're probably just trolling but I'll take the bait — I'd put all of these six before any of those three:

    git pull
    git clone
    git status
    git commit -a
    git push
    git diff

I mean, the ones you mentioned are pretty useful, but if you don't have a repo in the first place, even git-log isn't going to be very useful; and if you're branching and rebasing, you probably have to commit first.

(I actually prefer Magit, to the point where I sometimes run Emacs just for Magit when I'm using a different IDE.)

I'm not trolling, and to assume that I am doesn't exactly "assume good faith." [0]

The comment I replied to says:

> Like they know enough to commit and push but that's about it.

In that vein, I was suggesting what I consider to be the most basic git commands outside of the "clone, commit, push" workflow.

[0]: https://news.ycombinator.com/newsguidelines.html

Oh, I didn't know you meant to imply that qualifier; I was responding only to what you said, which was absurd, not what you meant, which I now see was reasonable.

Things that really helped me understand git as a tree of references:

`git reset --soft` and `git reset --hard`

`git cherry-pick`

`git rebase`

`git reflog`

`git stash`

Understanding these lets you manipulate branches and commits like a wizard. Once I learnt them, I could get myself out of the hairiest git problems.

I also find `git blame` extremely useful, along with code exploration tools like DeepGit [0].

[0] https://www.syntevo.com/deepgit/

`git add -p` changed the way I think about commits

I love that one too! It's great for creating commits that address one specific thing.

When I went from amateur Python programmer to Google Cloud developer support, I remember being completely blown away by the technology and design patterns I'd never heard of that go into modern web/enterprise architecture. I had to learn it all the hard way, but these days there are great free (or mostly free) courses you can take to learn this stuff.

For Google, check out the study guides for their certifications, specifically Cloud Architect (basic overview), Cloud Developer, and Data Engineer.


Be sure to follow along with the recommended Coursera and Qwiklabs tutorials and do the exercises. You'll learn about all kinds of neat stuff, like scalable application design, container technology, monitoring+metrics, various types of database technologies, data pipelines (including Pub Sub messaging), SRE best practices, networking+security, and machine learning.

I currently work on AWS, and don't find it a good starting point for diving into these things quickly, but most companies use it, so it wouldn't hurt to learn, I guess. I still recommend GCP over AWS to start with, as their technology is far more interesting and focused, and quicker/easier to work with.

A new version of "The Pragmatic Programmer" recently came out. [EDIT: not available yet, only preorder at amazon, beta version available at pragprog.com.] That book is all about tools and methods that a self-taught programmer should look into:


For me, that Amazon page is listing it as a pre-order, without any release date. And all the other versions (Kindle, Paperback) are the 1st edition instead of the 2nd.

Very frustrating, as I considered the first edition to be essential and upon reading your comment, instantly went to purchase the 2nd edition.

Edited to add: Found a date, Amazon is listing it as October 21, 2019.

Sorry, I thought I'd read a review of it already, so just didn't look closely to see it wasn't available yet.

It looks like you can get a DRM-free beta version of the ebook on their website, with free upgrades to published version once it's finalized: https://pragprog.com/book/tpp20/the-pragmatic-programmer-20t...

Learn Emacs. Stick with it until you get comfortable. Text editing is a transferable skill that you can reuse again and again across different projects and languages.

Org-mode in Emacs. My poor man's project management for a number of projects consists of a todo-<project>.org file listing all the planned features, the pending TODO items, the DONE items for the current working release, and release notes describing features and changes for each released version. In one place, I have the future features, the immediate todo items, the currently completed items, and the history of all past releases, which makes things simple to access and manage.

For theoretical stuff, learn about transactions.
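
As a small taste of what a transaction buys you, here is a sketch using Python's built-in `sqlite3` module. The `accounts` table, the names, and the `transfer` helper are all made up for illustration; the point is only that a group of statements either all take effect or none do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit above
            # is rolled back too, so money is never created out of thin air.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


print(transfer(conn, "alice", "bob", 30))   # True
print(transfer(conn, "alice", "bob", 500))  # False, nothing changes
print(balances(conn))                       # {'alice': 70, 'bob': 30}
```

The second transfer credits bob first and then fails on alice's debit; without the transaction, bob would keep the extra 500.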
