This subject is tightly coupled with system modeling, and can touch many different domains:
Operational Amplifier Circuits
Computing Systems and Networks
Atomic Force Microscopy
If you really explore control theory you'll learn when the P or the I or the D is unnecessary. Or when you need a second derivative (not just the first). And it'll often give you closed-form solutions for the coefficients so you're not just blindly hunting around in parameter space.
Very simple rules can create lifelike emergent behavior. This is an example of 2 simple motors using PID to try to target a red dot, in 3D, with gravity on:
I can watch control systems all day.
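For reference, the discrete form of the controller being discussed is only a few lines. A minimal sketch, where the gains and the toy first-order plant are made up purely for illustration:

```python
# Minimal discrete PID controller; gains and the toy plant below are
# invented for illustration only.
class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        # Skip the derivative term on the very first sample.
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

# Drive a first-order plant (dx/dt = u - x) toward a setpoint of 1.0.
pid = PID(kp=2.0, ki=1.0, kd=0.1, dt=0.01)
x = 0.0
for _ in range(2000):
    u = pid.update(1.0, x)
    x += (u - x) * 0.01
```

The integral term is what removes the steady-state offset here; with only P and D the plant would settle short of the setpoint.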
If you have a (rough) model of the system e.g. a transfer function, you can invert the model to get some initial parameters, and then fine-tune by hand.
If you don't have a model but are able to run tests, you can apply any number of heuristic methods to systematically perturb the system and collect data to aid in tuning. Then fine-tune by hand.
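As a concrete instance of the model-inversion idea above: if the rough model is first-order-plus-dead-time, Skogestad's SIMC rules turn the model parameters directly into starting PI settings. A sketch with hypothetical process numbers:

```python
# Model-based starting point: for a first-order-plus-dead-time model
#   G(s) = K * exp(-theta*s) / (tau*s + 1)
# Skogestad's SIMC rules give initial PI settings. Numbers are hypothetical.
K, tau, theta = 2.0, 30.0, 3.0     # process gain, time constant, dead time
tc = theta                         # SIMC-recommended closed-loop time constant

Kc = tau / (K * (tc + theta))      # controller gain
Ti = min(tau, 4.0 * (tc + theta))  # integral (reset) time
```

These are only initial values; per the comment above, you still fine-tune by hand on the real loop.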
Difficult loops e.g. MIMO (multiple input multiple output) loops that are larger than 2x2 where there's interaction between the variables (i.e. you move one variable, others move as well) require more analysis with loop-pairing techniques like the RGA (relative gain array).
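For a 2x2 case the RGA is just the elementwise product of the steady-state gain matrix with its transposed inverse. A quick sketch with invented gains (rows and columns of the result each sum to 1, and pairings with relative gains near 1.0 are preferred):

```python
import numpy as np

# Relative gain array for a 2x2 steady-state gain matrix (gains invented).
K = np.array([[2.0, 1.5],
              [1.0, 2.0]])
rga = K * np.linalg.inv(K).T   # elementwise product: RGA = K o (K^-1)^T
```

Here the diagonal elements come out above 1, suggesting the diagonal pairing with noticeable interaction between the loops.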
For really difficult loops (or loops that have degraded over time), you can use software like Loop Pro. This only makes sense if you're controlling something valuable and where loop degradation could lead to safety or performance issues. You probably don't need software to tune the PID in your espresso maker.
Some systems are "open-loop unstable", which means they can go haywire if you perturb them too much in the wrong direction. Blindly perturbing these systems is inadvisable. Process understanding/analysis is needed to avoid unstable situations.
For really large MIMO systems, PID is no longer the right control mechanism. Advanced control techniques like MPC (model predictive control) are used, where a computer repeatedly solves a numerical optimization problem at some frequency to calculate the next control moves. MPC is usually used to control chemical plants with hundreds/thousands of control variables. PID is used in local loops, whereas MPC sits on top as an optimizing layer.
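A toy receding-horizon sketch of the idea for a scalar linear plant (all numbers invented; a real MPC handles constraints and uses a proper solver): at each step, solve a small optimization over a prediction horizon, apply only the first move, then re-solve.

```python
import numpy as np

# Toy receding-horizon controller for a scalar plant x[k+1] = a*x[k] + b*u[k].
a, b, r = 0.9, 0.5, 1.0    # plant parameters and setpoint (invented)
H, lam = 10, 0.1           # prediction horizon and move penalty

def mpc_move(x0):
    # Predicted states are linear in the moves:
    #   x[k+1] = a^(k+1)*x0 + sum_j a^(k-j)*b*u[j]
    G = np.zeros((H, H))
    for k in range(H):
        for j in range(k + 1):
            G[k, j] = a ** (k - j) * b
    free = np.array([a ** (k + 1) * x0 for k in range(H)])
    # Minimize ||G u + free - r||^2 + lam*||u||^2 via ridge least squares.
    A = np.vstack([G, np.sqrt(lam) * np.eye(H)])
    y = np.concatenate([np.full(H, r) - free, np.zeros(H)])
    u = np.linalg.lstsq(A, y, rcond=None)[0]
    return u[0]            # apply only the first move, then re-solve

x = 0.0
for _ in range(50):
    x = a * x + b * mpc_move(x)
```

Applying only the first move and re-solving each step is the "receding horizon" part; it's what lets the controller react to disturbances the prediction didn't anticipate.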
Edit: Also, I learned a lesson about closed-loop systems with the launch (well, EDL) of the Curiosity rover on Mars. It's mentioned in this article, but the trigger to start lowering the rover on the crane wasn't some process monitoring distance from the ground...it actually had no direct notion of that. It just watched the throttle setting on the retro-rockets and started unwinding the winch when the throttles reached a reduced steady-state setting...something expected once the descent stage had reached its hover altitude.
 (search for throttle) - https://www.planetary.org/blogs/emily-lakdawalla/2012/070607...
Totally unrelated, but the first time I saw that technique with writing on the glass and then flipping the video L-R was about 4 years ago, when F5 did a good video introduction on elliptic curve cryptography.
They (F5) even wrote up the technical details of their Lightboard.
I had the same idea (i.e. a PID loop to control system behaviour) some time ago: controlling traffic to an external system so that it sits just at the limit of that system's throughput. The external system is really badly implemented and we had to put much effort into maximizing throughput.
In the end we are replacing HTTP requests with Kafka topics, which lets the other system process as fast as possible without destabilizing. On our end we continuously monitor latency and have flow control based on consumer behaviour.
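For anyone curious what the original PID-style approach could look like, here's a hedged sketch: a velocity-form PI loop that nudges the request rate until measured latency sits at a target. The latency model of the "external system" (capacity around 110 rps) is invented purely for the simulation.

```python
# Velocity-form PI loop: adjust request rate so measured latency holds
# at a target. The latency model below is invented for this simulation.
target_latency = 0.100   # seconds

def observed_latency(rate):
    # Made-up model: latency climbs steeply as rate approaches capacity.
    return 0.02 + 0.08 * rate / (110.0 - rate)

kp, ki = 100.0, 40.0
rate, prev_error = 50.0, 0.0
for _ in range(300):
    error = target_latency - observed_latency(rate)  # positive => headroom
    rate += kp * (error - prev_error) + ki * error   # velocity-form PI update
    rate = max(1.0, min(rate, 109.0))                # keep inside sane bounds
    prev_error = error
```

The clamp matters: latency models like this blow up near capacity, so an unbounded step in the wrong direction can destabilize the loop.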
Took me a while to find the article, but here it is: http://web.archive.org/web/20160828042756/https://www.qualit...
The general idea is that PID controllers make unnecessary adjustments in response to regular, small, expected fluctuations in the input. These adjustments frequently increase variation beyond what a more tempered approach would produce.
As Wheeler notes in his article, PID controllers are based on the assumption that calibration/adjustments are free. As any practitioner of SPC knows, adjustments are never actually free in the face of natural variation in the process.
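Wheeler's point is easy to demonstrate: if the process is already just white noise around target, "compensating" after every reading roughly doubles the variance (this is rule 2 of Deming's funnel experiment). A quick simulation:

```python
import random

# If the process is ALREADY just white noise around target, "correcting"
# after every reading increases variation rather than reducing it.
random.seed(42)
noise = [random.gauss(0.0, 1.0) for _ in range(10000)]

untouched = noise[:]          # hands-off: output is just the noise

adjusted, aim = [], 0.0       # over-adjusting: cancel each observed error
for eps in noise:
    y = aim + eps
    adjusted.append(y)
    aim -= y                  # shift the aim by the deviation we just saw

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((v - m) ** 2 for v in xs) / len(xs)
```

With full compensation each output becomes eps[n] - eps[n-1], so its variance is about twice that of the untouched process.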
- PID theory, especially pole analysis
- linear quadratic regulator
- digital filter design
- the Hebbian rule of neurons
- distributed consensus: Paxos, the Raft protocol, blockchain protocols
- STAR voting
- pretty much everything in deep learning, in particular descent algorithms like AdaBoost
- Conflict-free replicated data type (those are neat!)
- Lorenz attractors
- type theory
- category theory
0 - For simple things PIDs work well, but then you don't need an entire course on Control Theory to learn how PIDs work.
1 - Most books focus too much on maths and very little on applications. You'll see the same toy examples repeated over and over in the literature.
2 - For more complex things related to CT that's even worse. I've read some Multi-variable Control Theory books that were 100% maths, 0% on how to apply the goddamn thing you're learning.
3 - Complex CT techniques are often fragile (because you're modeling systems with high order polynomials), so most people just skip to using Machine Learning.
4 - You often need to make too many assumptions about a system to apply CT techniques.
I've actually used CT for a few things related to electronics circuits in my life, but overall I think most courses I took in university (Analog CT and Digital CT, Multivariate and Adaptive CT) were complete overkill and way too much theory without practical insights.
Kalman filters seem pretty useful, especially with phones and stuff having an array of different sensors that can be combined, and probably require a class more than something simpler like a PID controller.
Best approach I found was high-sample-rate LQE / LQR control - the LQE part is a Kalman filter, which is a piece of cake now, but was very hard to achieve with available hardware 25 years ago.
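The LQE half really is little code in the scalar case nowadays. A minimal 1-D Kalman filter estimating a constant value from noisy readings (all noise figures invented):

```python
import random

# Minimal 1-D Kalman filter (the LQE half of LQE/LQR): estimate a constant
# value from noisy measurements. Noise figures are invented.
random.seed(0)
true_value, meas_std = 5.0, 0.8
R = meas_std ** 2          # measurement noise variance
Q = 1e-6                   # tiny process noise: the value is nearly constant
x_hat, P = 0.0, 1000.0     # initial estimate and (deliberately huge) variance

for _ in range(500):
    P += Q                                   # predict: constant model
    z = true_value + random.gauss(0.0, meas_std)
    K = P / (P + R)                          # Kalman gain
    x_hat += K * (z - x_hat)                 # blend prediction and measurement
    P *= 1.0 - K                             # posterior variance
```

With a huge initial variance the first measurement dominates; as P shrinks, the gain falls and the estimate settles toward the running average.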
The biggest enemy of simplicity was nonlinearities, and much of the 'art' of precision digital control system designs was in knowing how to linearize them through software tricks and/or adapt the algorithm to provide piecewise linear control.
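One common form of that piecewise-linear trick is plain gain scheduling: look up controller gains from a table keyed on the operating point. A hypothetical sketch (breakpoints and gains are invented):

```python
# Hypothetical gain-scheduling table: piecewise-linear control of a
# nonlinear plant by switching PI gains with the operating point.
SCHEDULE = [
    # (upper bound of region, kp, ki)
    (10.0, 4.0, 0.50),          # low range: plant is sluggish, aggressive gains
    (50.0, 2.0, 0.25),          # mid range
    (float("inf"), 0.8, 0.10),  # high range: plant is touchy, back off
]

def gains_for(operating_point):
    # Return the (kp, ki) pair for the region containing the operating point.
    for upper, kp, ki in SCHEDULE:
        if operating_point < upper:
            return kp, ki
    return SCHEDULE[-1][1], SCHEDULE[-1][2]
```

In practice you'd also blend or ramp between regions, since hard switches at the breakpoints can bump the output.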
I had to re-tune one after they replaced the quartz chamber.
Maybe part of the game could be determining where to add instrumentation of various kinds or making changes to the process. Extra fun is when the process behaviour depends on feedstock properties that aren't directly observable or environmental conditions.
This works when the cost of managing the request queue is low relative to the cost of serving the requests themselves (usually true whenever there is some IO involved).