Lessons I wish I had learned before I started teaching differential equations [pdf] (1997) (williams.edu)
426 points by Tomte on Aug 20, 2022 | 177 comments



One of the most interesting applications of ODEs (in my opinion) is the field of control theory, which empowers autonomous systems/robots/vehicles to guide themselves. Brian Douglas has some fantastic introductory videos on the subject [1].

I don't think I agree completely with section 5 (Forget About Existence and Uniqueness of Solutions). For an ODE in the form xdot = f(t,x), existence and uniqueness on an interval follows if f is continuous in t and Lipschitz continuous in x [2]. This often isn't too difficult to demonstrate for a given system. In addition, if you can't prove uniqueness of solutions then you likely have to treat the system as a differential inclusion [3]. I personally find differential inclusions fascinating, but the math can be much more dense.

[1]: https://engineeringmedia.com/videos

[2]: https://en.m.wikipedia.org/wiki/Picard%E2%80%93Lindel%C3%B6f...

[3]: https://en.m.wikipedia.org/wiki/Differential_inclusion
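
To make that caveat concrete, here is the textbook counterexample (my illustration, not from the article or the parent comment):

  xdot = x^(2/3),  x(0) = 0

Both x(t) = 0 and x(t) = (t/3)^3 solve it, so uniqueness fails. Here f(x) = x^(2/3) is continuous but not Lipschitz at x = 0, so Picard-Lindelöf simply does not apply.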


> 7. stay away from differentials

I always had issues with this. My undergraduate math classes were very formal and well thought out: every object was defined, and almost all theorems were proved in class.

In particular, we were taught the theory of differential equations: some general theory like systems of differential equations with constant coefficients and Cauchy's theorem, plus a bunch of recipes.

In a separate class, we were also taught differential forms (e.g. a.dx + b.dy being a linear combination of base elements dx and dy - if I remember correctly).

But then came the physics class, and why we could go from dy/dx = f(x) to dy = f(x).dx and integrate both sides remained a mystery. "dx" suddenly became an infinitely small thing, when it used to be a notation for a linear form OR a notation for f'(x) === df/dx.

Generally speaking, I had a lot of issues in physics because the maths used in that context was not formally defined and seemed to obey some unwritten set of rules that you were supposed to find out by yourself during your final exam...


>But then came the physics class, and why we could go from dy/dx = f(x) to dy = f(x).dx and integrate both sides remained a mystery. "dx" suddenly became an infinitely small thing, when it used to be a notation for a linear form OR a notation for f'(x) === df/dx.

It's the mathematicians who changed it.

Infinitesimals used to be how differential equations were defined in the first place (by Newton and Leibniz). Physicists still think like that because we are end users of mathematics. So the interface has to stay the same. Old texts have to keep working, etc. Mathematics, for us, is not so different from software as an end user sees it.

In the quest for formalization, infinitesimals have been removed and replaced by limits by mathematicians.

And now they have found a way to have a rigorous notation that has infinitesimals again:

https://people.math.wisc.edu/~keisler/foundations.pdf

Your example is explained on page 34.

On the other hand, a mathematical notation that is incompatible with infinitesimals will not be a success in physics or engineering, for obvious reasons: compatibility, and also because using infinitesimals one almost always gets the correct result anyway, even in very complicated cases, and they are far more intuitive than limits.

So there's no problem--it's just that life is short and there's too much to learn.

If you want to prove theorems, by all means, use the most rigorous tools--but to use the result it's nice to have a simple interface.

Mathematics, for us, is a tool to communicate ideas (about the physical world). Overcomplicating it will obscure the ideas.


So this should be explained to the students instead of leaving them to feel around in the dark, confused.


I just (barely) passed what is known as my university's hardest first year physics course: electricity and magnetism. We didn't use differential equations, but we had the same hand wave "just think about it" relationship with line integrals. The way my professor explained it was "we're engineers, not mathematicians, so we actually know what we're talking about".

What he meant by this is that as engineers, we need to have intuition for what an integral actually is rather than knowing how to prove things with it. To an extent, this is a good way to go about it. Really, it's not at all useful to understand things like epsilon-delta arguments as an engineer, because those concepts have little application there. It's much more useful to just gain an intuition for what you're actually doing when you take a line integral, and apply that to, for example, calculating the electric field strength produced by some object.

Physics teaches physics, not math.


My engineering and physics intuition came from understanding the math behind it. For example, I've seen a lot of handwavy explanations of how jet engines worked, but understanding eluded me until I went through the math in jet engine class.


Can you elaborate? I'm an Aero, and while there is definitely a bunch of math involved, "suck, squeeze, bang, blow" is pretty much the gist of how jet engines work, and I don't find it really all that hand-wavy; it's just a funny way of describing gas pressure and velocity graphs.


The hand-wavy explanations never explained why the exhaust went out the back rather than the front and the back.


Do elaborate. The most funky differential I know comes from EE's Maxwell eqns (how capacitors work, and weird ideas about "ground").


The compressor up front produces enough pressure to prevent the exhaust going out that way.

Another way to see it is you cannot start a turbine by dumping fuel in and igniting it. The turbine has to be spun up first, usually with an electric motor. The Me262 engines had an ingenious tiny gas engine in the nacelle which served that purpose, complete with a little handle to start it!

Next time you're on a jet, watch them do the engine start. You can see it slowly gaining speed, then suddenly it dramatically speeds up - that's when the fuel gets squirted in and ignited.

Anyhow, jet engines look deceptively simple. But they are real masterpieces of engineering.


Different strokes for different folks.

Michael Faraday was brilliant and driven, but his intuition did not come from understanding the math as he was famously math-illiterate.

On a personal note, the subject I had the most trouble with was classical electromagnetism (calc-based freshman physics): it was the first time I hit a wall/limit on what I could grasp.

The wall was there because I could never develop an intuition for the physics behind it.


>The wall was there because I could never develop an intuition for the physics behind it.

On the off chance that you can understand German, check out chapter 9.1.1 Fluidvolumen und Divergenz (until and including chapter 11 Erhaltung der Energie), in the book Einführung Theoretische Meteorologie by Michael Hantel (ISBN 978-3-8274-3055-7). It even uses a car analogy ;)

You will understand fluid dynamics, and by extension electrodynamics, like you never did before. And it will teach you the physical intuition first.

This book was such a lucky find for me ten years ago.

Equation (11.3) is the fluid analogue of the continuity equation (in electrodynamics that is: ∂/∂t ρ + ∇⃗⋅J⃗ = 0)--and you can see how, physically, you get to it, and what the parts and the whole mean. Lots of diagrams and geometry and simple analogies make that take about 30 pages.


I think you are just reiterating what the person I was replying to said. Makes sense, what I was saying is that it should be explained that there are differences with pure mathematics and that it's meant to be a pragmatic tool, and maybe even some of the reasoning as to why the differences are the way they are.


Did this professor think conservative fields and independence of path were for schmucks too?


It's not possible for a teacher to guess all the possible ways a student might be confused. When the student is confused, the onus to ask questions is on the student.


Everyone in this class was uniquely confused compared to every other class I took. It had an 80% drop-out rate. I don't think it was entirely a problem with the students.


It was explained in my physics classes over and over again


As you pointed out, infinitesimal manipulation in physics produces the correct results most of the time. In fact, it's almost always correct, because Taylor series say so. In the physical world we deal with differentials that are always real. Technically, physicists are discarding the later terms, which deal with convergence, because for all practical purposes they don't matter.

This frustrates math people like me to no end, but I can see why they do that. To a mathematician, dx is certainly not a number; it's an infinitesimal and therefore not a real number. But physicists can get away with it... their math is simpler.


Thanks for this link. A bit hard to digest more than a decade after I last did formal math, but very approachable nonetheless.


> But then came the physics class, and why we could go from dy/dx = f(x) to dy = f(x).dx and integrate both sides remained a mystery.

That is just integrating both sides with respect to x, but you're skipping the step that makes that clear.

Start with dy/dx = f(x).

Integrate both sides with respect to x: ∫(dy/dx)dx = ∫f(x)dx.

Doing the integrals produces: y = F(x) + C, where F(x) is the integral of f(x) and C is an arbitrary constant of integration.

Above, the integral on the left doesn't do anything except cancel the derivative. A similar but more substantive method is solving a DE by separation of variables, where the "mystery" is integration by substitution (a.k.a. the chain rule backwards).

Start with dy/dx = f(x)g(y). This is a "separable" DE.

Rearrange to (1/g(y)) dy/dx = f(x).

Integrate both sides with respect to x: ∫(1/g(y))(dy/dx)dx = ∫f(x)dx.

We are integrating with respect to x, but where is the x in 1/g(y)? It is hiding inside the y: y = y(x).

Define a substitution: Let u = y(x). Then du/dx = dy/dx ⇒ du = (dy/dx)dx, and our equation becomes ∫(1/g(u))du = ∫f(x)dx.

Do we really need to introduce another variable? No. We can just use y: ∫(1/g(y))dy = ∫f(x)dx.

We often jump straight to this point which makes it pretty confusing, as it looks like we're doing different things to each side of the equation.
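
If it helps to see that bookkeeping done mechanically, here is a small sketch using sympy on a separable equation of my own choosing (dy/dx = x*y, i.e. f(x) = x, g(y) = y); it reproduces exactly what the manual steps give:

  from sympy import Function, dsolve, Eq, symbols

  x = symbols('x')
  y = Function('y')

  # dy/dx = x*y is separable: integrate dy/y = x dx to get ln|y| = x**2/2 + C
  sol = dsolve(Eq(y(x).diff(x), x * y(x)), y(x))
  print(sol)   # Eq(y(x), C1*exp(x**2/2))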


Once you reach the concept of differential forms in physics, a lot of these seemingly "ad-hoc" rules suddenly make sense. The issue is that outside of theoretical physics grad students, few people will ever encounter those, since they basically only allow you to re-learn Maxwell's equations on a more fundamental level. But then again a lot of this directly translates into other field theories, including QFT, so there's that. For undergraduates, this seemingly wrong notation is mostly just taught out of convenience and readability. For example, you can do separation of variables more formally in undergrad notation by simply applying the integral as an operator to both sides. In your example that would be Integral(1 dy/dx dx) = Integral(f(x) dx), which works exactly the same, but the mnemonic "separation" of variables is no longer obvious at all.


There's definitely a lack of rigor math-wise in physics as a rule. It's not for everyone.

That being said, that lack of rigor is a useful tool sometimes. The Dirac delta "function" (functional?) was described in the world of physics first, as the only reasonable derivative of a step function/impulse, before it was properly defined in the words of mathematics; just because the equational rigor isn't there yet doesn't mean the argument isn't solid or the tool is poor.


It is not a lack of rigor. It is more about using math as a tool, if it fits the purpose.


I think what is confusing about differentials is that variables are meta-objects. You can apply a function to a number, you can apply a function to another function, and so on, but you cannot really apply a function to a variable; you apply it to something that the variable stands for.

The introduction of calculus in my university went like this:

At first we introduce the derivative f'(x) as the limit of [f(x+h)-f(x)]/h as h goes to 0. We call d the function that takes f(x) and h and returns f'(x)*h, which is linear in h (or it can be defined by currying, whatever). So now we can write df and it makes sense.

Ok, so far so good. But what even is dx? We cannot just apply a function to a meta-symbol. Variables are not first-level objects; they are just notational fiction. You may say that it is an identity function, but that explanation breaks down when more free variables are introduced.


You're making a good point. However, taking inspiration from differential topology (and differential forms in particular), the notation dx can be made sense of once you interpret x as a coordinate function x: M -> R on a manifold. So in a sense you are replacing the variable x with a function x (whose parameter is usually left unspecified).


Yeah, it’s the standard way to do it. But there is a reason the 7th point is called “stay away from differentials.” It’s cryptic and unintuitive for many people.

I wonder whether name binding concepts from CS theory a la lambda calculus can be leveraged for an alternative formalization.


I think usually a slightly different definition of derivative is used - https://en.wikipedia.org/wiki/Total_derivative


If f is a function and x is a variable then dx means that you should express f with respect to an input x. So if A is the area of a circle, then what is dA?

Since A = pi*r^2, you have dA = 2*pi*r*dr. Since A = pi*(l/(2pi))^2, where l is the perimeter, you have dA = 2*pi*(l/(2pi))*(1/(2pi))*dl. So the idea of dA is something related to the differential of the variable used to define A. The chain rule allows us to operate with differentials: for example, since here r = l/(2pi), we have dr = dl/(2pi), so dA = 2*pi*r*dr = 2*pi*(l/(2pi))*dl/(2pi) = (1/(2pi))*l*dl. And finally this implies dA/dl = (1/(2pi))*l.
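
For what it's worth, a two-line sympy check of that last identity (my sketch):

  from sympy import symbols, pi, diff, simplify

  l = symbols('l', positive=True)
  A = pi * (l / (2 * pi))**2           # A = pi*r**2 with r = l/(2*pi)
  print(simplify(diff(A, l)))          # l/(2*pi), i.e. dA/dl = (1/(2*pi))*l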


HN formatting has swallowed your asterisks; try writing them as \*


It’s my experience that there are some people who cannot understand calculus, no matter how hard they try. The fact that they do try is often indicative of considerable talents in other areas.


This echoes my own experience of differential equations in physics. I once actually asked my teacher directly about the dx shenanigans, and he basically said 'whatever, it's just what it is'. It left me with a rather sour attitude towards the subject as a whole: the idea that you should just manipulate symbols in some arbitrary way.


> I once actually asked my teacher directly about the dx shenanigans, and he basically said 'whatever, it's just what it is'.

I received pretty much the same answer…

Did you study in the Netherlands as well? Your name suggests that.


> But then came the physics class, and why we could go from dy/dx = f(x) to dy = f(x).dx and integrate both sides remained a mystery.

This never made sense to me. When I asked about it, I was just told to go with it.


I expect you asked the physicists rather than the mathematicians. The mathematicians might roll their eyes a little at those awful sloppy physicists, but would also be able to explain why it works (in so far as it does) and say a bit about the sort of worries that make mathematicians uneasy about such notational carelessness.

Here's what I'd say if asked.

The handwavy idea is that derivatives are limits of difference ratios, and integrals are limits of sums, and if you let dx, dy stand for the finite changes whose ratios and sums we are working with, it makes perfectly good sense to go from dy/dx = f(x) to dy = f(x) dx, and it turns out that when you do all the adding up and taking limits everything still works -- provided all the functions involved are "nice enough", which in physics they almost always are, but the possibility that they might not be is why mathematicians get cross about this sort of sloppiness.

Let's fill in the details.

"dy/dx = f(x)" means: there is some currently-unknown functional dependence of y on x; when you make very tiny changes in x and y consistent with this functional dependence, the ratio of those very tiny changes is always approximately f(x), in the sense that you can force the ratio to be as near f(x) as you like by requiring the changes to be small enough.

Well, if for small changes dy/dx is as close as you like to f(x) then dy is as close as you like to f(x) dx, in a slightly stronger sense: the error divided by dx is as small as you like, even though dx is very small.

Now, let's think about those integrals. When we write "integral dy" or "integral f(x) dx" this is shorthand for the thing you get very close to by adding up lots of things that look like "dy" or "f(x) dx", with the value of x or y advancing in tiny steps from an initial to a final value. (I am talking specifically about definite integrals here, but we can be lazy and not always write down the endpoints.)

In the situation we're looking at, we know that "dy" and "f(x) dx" are always very close to one another: they differ by as small a multiple of dx as you please. So when we look at the sums that are approximations to those integrals, the difference between them is as small a multiple of the total change in x as you please. So if we fix the endpoints of the integrals, this means that the sums can be made as close together as you like; so the integrals, being the limits of those sums, must be equal.

But! There's one thing there about which you should be a little uneasy. For each specific (x,y) we can make dy/dx as close as you like to f(x) by requiring dx to be very small. But what if this doesn't happen "uniformly"? I.e., what if the dependence of y on x is "less differentiable" at some points than at others? Then we'd have to make the steps smaller in some places than in others, maybe by an unbounded factor, and maybe that causes trouble. And you'd be right to worry about this. There are in fact possible ways the dependence of y on x could go that fail in just this sort of way. (See e.g. https://en.wikipedia.org/wiki/Volterra%27s_function.) But they require y to be a rather pathological sort of function of x, and if e.g. your function f is continuous then the trouble can't arise.
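
If anyone wants to see the "adding up" argument numerically, here is a quick sketch (my example, with the nice function y = sin x, so f(x) = cos x): the sum of the actual small changes dy and the sum of the f(x) dx terms land on the same number.

  import numpy as np

  a, b, n = 0.0, 2.0, 100_000
  x = np.linspace(a, b, n + 1)
  dx = np.diff(x)
  dy = np.diff(np.sin(x))          # the actual small changes in y
  fdx = np.cos(x[:-1]) * dx        # f(x) dx on each small step

  # Both sums approximate sin(b) - sin(a) ≈ 0.9093
  print(dy.sum(), fdx.sum(), np.sin(b) - np.sin(a))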


Perhaps I’m just too much of a physicist, but I’m not sure what there is not to get. Granted, I’m an experimentalist, so that doesn’t help with my rigour. Still, I will encounter students who will have the same concerns, so I’d love to have my blind spot corrected. From my perspective, it’s just two steps: multiply both sides by the same value, then integrate both sides. However, you seem like a smart individual, so it’s probably something more foundational.


I think there are two steps of weirdness. First, many of us see dy/dx not so much as a division of values, but as just some notation for a rate of change strongly linked to f(x). Handling that algebraically seems strange. The explanation of infinitesimals will allow us to sort of begrudgingly accept that such manipulations might be doable, but if we do so we picture dx and dy purely as very small values with a now kind of unclear relationship to f(x).

The integration step then becomes a weird mess. First of all, most of us are used to only ever taking integrals with respect to a variable. Except now when we integrate, that 'with respect to' part is somehow already pre-filled by the earlier shenanigans. On top of that, we are somehow now integrating with respect to different variables on both sides, which intuitively feels like we just did different things to each side. And what in the world do our new forms now have to do with f(x)? dx and dy were supposed to relate to a change of f(x), but somehow we just took dy out entirely? At this point we've lost all intuition for what we're doing, and feel like we're just staring at some weird notation hack.


Thank you. I’m going to need some time to think about that.


The other part of this is that while 99% of the time these tricks give the right answer, there are cases where they can lead you into doing invalid things, like switching between sums of integrals and integrals of sums (which in pathological cases can change the answer).


Within the scope of calculus and modeling with differentials, this has a fairly simple geometric interpretation. I assume the author you're quoting means f'(x), though.


This is one of the places where I disagree with the article. As an (experimental) physicist, I've always found the dx notation better to understand. When I asked the professor about how it works, he called it a calculus (as in German Kalkül, set of rules to almost mechanically manipulate a mathematical expression, not to be confused with "Calculus"). Of course there exist proofs in terms of limits, or differential forms. But it is something one has to learn, just like div and curl and so on.

We had very mathematically rigorous math lectures. I was able to do most of the proofs, but failed to calculate even a simple derivative when thinking in differential forms. (Maybe because I failed to see what is a function and what is a variable in the "mathematical" notation and misapplied the chain rule.)

The other thing where I disagree is the part about avoiding word problems. I think the word problems were the only interesting part - how to translate a given physical situation into a differential equation and variables. Solving them is not really interesting - everything interesting for me has been solved before, so I can just ask around the hallway or punch it into a computer algebra system.


One way I explained this the other day (or rather, soothed concerns, lacking a simpler answer): the notation is designed to (mostly) make you write down things that make sense.


I find the following remark by Rota interesting:

Insofar as the Laplace transform goes, two radically different uses of the word “function” are dangerously confused with each other. The first is the ordinary notion of function as a something that has a graph. The second is the radically different notion of function as density, whether mass density or probability density. For the sake of the argument, let us agree to call this second kind of function “density function.” Professional mathematicians have avoided facing up to density functions by a variety of escapes, such as Stieltjes integrals, measures, etc. But the fact is that the current notation for density functions in physics and engineering is provably superior, and we had better face up to it squarely

I've suggested to mathematicians that the notation

  ⌠             
  | f(x) dμ(x)
  ⌡             
in measure theory where μ is a measure should be replaced with

  ⌠             
  | f(x)⋅μ(x) dx
  ⌡             
where the measure μ is treated as a density function. What I've found is a mixture of either not understanding the logic, or the response "What's the point?" So it's interesting seeing Gian-Carlo Rota suggest the same idea AFAICT.

As a bonus, the Radon-Nikodym "derivative" would be written as μ/λ instead of dμ/dλ. The RN derivative then doesn't appear like a derivative - it looks simply like division.


I think I am one of these mathematicians that doesn't understand the logic. How can I write μ(x)dx instead of μ(dx) without risking the confusion that dx is Lebesgue measure? You may have explained this in your other reply, but I don't quite follow.

P.S. Beautiful integral signs.


I'm suggesting maybe writing the Lebesgue measure as

  1
so the Lebesgue integral of a function f becomes

  ⌠             
  | f(x)⋅1 dx
  ⌡             
The logic is that the Lebesgue measure is a density which is everywhere equal to 1. Given a measurable space over \mathbb R^n, I think there is only one such measure.

Another example is that δ(x) in

  ⌠             
  | f(x)⋅δ(x) dx
  ⌡             
represents the Dirac measure.

For producing this ASCII art, I use Sympy. I write for instance

  pprint(Integral(f(x) * delta(x)))
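
For anyone who wants to reproduce that, a self-contained version (assuming delta here stands for sympy's DiracDelta):

  from sympy import Function, Integral, DiracDelta, symbols, pprint

  x = symbols('x')
  f = Function('f')

  # Pretty-prints roughly the integral-sign art above, with the Dirac "density" inside
  pprint(Integral(f(x) * DiracDelta(x), x))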


So what does dx mean in this setting then? If the answer is nothing, then let me suggest simplifying your expression to the following:

  ⌠             
  | f(x)⋅μ(x)
  ⌡   
Now it occurs to me that the only problem with this new notation is that you risk confusing which term is the density (especially if there are multiple greek letters floating around). To clarify this potential confusion I have a solution! Add some notation to indicate which is the density:

  ⌠             
  | f(x)⋅dμ(x)
  ⌡   
Wait...


> So what does dx mean in this setting then?

It means a small change in x. It means the same thing in dy/dx.

So you know who GCR was, but this is the first time you're hearing of this...

> Wait...

So your comment's sarcastic. You're a rude anonymous maths fan.


> It means a small change in x.

So here is the issue. "A small change in x" is a concept that is relevant and meaningful for Riemann integration: it represents that the integral is defined as a limit of Riemann sums as the change in x becomes infinitely small.

But this is not relevant for Lebesgue integration! We are not taking a limit of Riemann sums and there is not a limit of change in x getting arbitrarily small. What matters is the measure (and the definition of the integral in terms of simple functions is something entirely different).

So you see using dx this way in the context of Lebesgue integration seems like a potential source of confusion, not simplification.

This is why i ask what dx means. Either it represents the standard Lebesgue measure on R (this is valid and a special case of the notation d\mu(x) since \mu is the identity function), or it is nonsense that potentially confuses concepts from other integration theories.

This is why mathematicians don't like your idea. They encounter many students who are confused about this distinction and don't find that it's a useful simplification.


I think you're the rude person. GP wasn't being sarcastic, they were showing you why they think that taking your notational convention to its logical conclusion, you end up with the notation we have today.


Indeed.


Thanks for your reply above.

I don't think the comment was sarcastic or rude. They are pointing out the following inconsistency: you've basically attached "dx" to every integration sign, making the "dx" essentially irrelevant.

Moreover, "dx" does not mean "a small change in x". "dx" is a differential form; it is in particular the "d" operator applied to the function f : x --> x.

As I revisit your comment, I think the point Rota is making about physics notation -- which I _do_ agree with -- is that one should use density functions instead of measures, in general. So, for instance, using the Dirac "density"

\int f(x) \delta(x) dx

instead of

\int f(x) \mu(dx)

where \mu is a point mass at 0. This happens again in the context of stochastic differential equations, where mathematicians shy away from writing dB_t = xi(t) dt, where xi(t) is "white noise". One can make sense of this in the sense of distributions, and then everything happens in a nice inner product space. Indeed, the physicists are much more competent at actual calculations, and the density representation of things (e.g., in terms of "xi") is very useful for those.


Please excuse my imposition, as I am a humble programmer who spends his days adding and subtracting 1, and not a mathematician.

Yet, this discussion of the confusion and potential confusion of misinterpreting notation strikes me as something that has long (well, in the sense of programming) been solved in my area with type systems and syntax highlighting.

Do mathematicians not have these tools?


There is no syntax highlighting on the blackboard.

Math notation is not designed. It has haphazardly evolved over centuries. It is not rigorous even though math itself is (attempts to be) rigorous, it is a language as imperfect as its users. But it does its job well enough.


They do! It’s called abstract algebra and it’s very similar to type theory in a lot of ways.

But to get to the rigorous mathematician definition of manipulating dx and dy, it requires a large amount of the machinery from abstract algebra that’s hard to quickly absorb or explain.


Did you mean to write dμ(x) instead of μ(dx)? As a non-measure-literate person, I can understand dμ(x) as the d of density μ(x) evaluated at x. But μ(dx) has μ evaluated at the infinitesimal dx which is very different. Is μ(dx) the correct notation?


What is wrong with the two different usages of a function? If you treat a function as a system that produces an output given an input - does it matter if it is a graph or a probability distribution?


> If you treat function as a system that produces an output given an input - does it matter if it is a graph or a probability distribution?

Because that's wrong. For example, take the Dirac delta function (point mass). You can only integrate it, not evaluate it.

GCR was saying that it's OK to use the same notation for both kinds of function. Mathematicians use separate notations, which is ugly and awkward. Engineers and physicists conflate these objects in their terminology, which is misleading. The right answer is to use the same notation but explain there's a difference.
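
A small sympy illustration of that distinction (my sketch): you can integrate against the delta, and the integral picks out f(0), but there is no useful pointwise value.

  from sympy import Function, DiracDelta, integrate, symbols, oo

  x = symbols('x')
  f = Function('f')

  print(integrate(f(x) * DiracDelta(x), (x, -oo, oo)))   # f(0)
  print(DiracDelta(0))   # stays unevaluated: no finite value to report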


Now I'm thinking what the type of the Dirac delta function is.

I'm leaning to something like: (X (R -> R) ((R -> R) -> (R -> R))) -> R where we are given a function (R -> R) on which it acts and a linear operation (R -> R) -> (R -> R), like integration, which acts on it and gives us a single real number as the answer.


This is basically correct, the theory of distributions can be presented in terms of linear functionals. A functional is a map that takes a function and returns a scalar. I'm not sure how to encode linearity in your type system though...
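
A toy version of that picture in code (my sketch, ignoring linearity and all the topology): a distribution is just a map from test functions to scalars, and delta evaluates the test function at 0.

  from typing import Callable

  TestFunction = Callable[[float], float]
  Distribution = Callable[[TestFunction], float]   # eats a test function, returns a scalar

  # The Dirac delta as a functional: evaluate the test function at 0
  delta: Distribution = lambda test: test(0.0)

  print(delta(lambda x: x**2 + 3.0))   # 3.0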


>I'm not sure how to encode linearity in your type system though...

You don't really, in the same way that a regular function R->R has no specific type information that tells you whether it's linear or not; e.g. both x and sin(x^2) have the same type.


What do you mean you can't evaluate it? It evaluates to zero everywhere but the origin.


The rigorous definition of the Dirac delta is a function that doesn't take points as inputs. It's a function that takes in sets and integrates over that set to produce a value.


This is what is often taught in engineering/physics, but is not correct. One physics professor tried to explain it by pointing out that it only has meaning under an integral sign (not sure it's correct).

The other obvious problem is its value at the origin. Infinity as a value is not something mathematicians like - at least not the way it is used here. And integrating it to get 1 makes no sense mathematically merely by setting its value to infinity.


These concerns never really made much sense to me. 1/x also has a singularity, but as far as I'm aware, it's a perfectly normal function. So why is it suddenly a problem that the Dirac delta is zero almost everywhere? I recall that in my analysis class, they wanted to define the Dirac delta as the limit of a sequence of functions. That never seemed very different from defining a matrix exponential as the limit, as n -> infinity, of a Taylor series.


> 1/x also has a singularity, but as far as I'm aware, it's a perfectly normal function.

No one integrates 1/x across 0, but people do integrate the delta function across 0.

> I recall that in my analysis class, they wanted to define the dirac delta as the limit of a sequence of functions.

In my engineering courses, they often tried to make it a limit of the "rectangle" function. Consider a function that is 0 everywhere except between [-delta, delta], where its value is whatever is needed to make the area under this rectangle 1. Then let delta approach 0.

This definition still fails using the standard limits/analysis that is taught (convergence problems, etc). In mathematics, the delta function needs the theory of distributions to "work".
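
For what it's worth, the rectangle picture does work fine numerically against a smooth test function, which is roughly the sense in which physicists use it; it's the limit as a standalone function that has no meaning. A quick sketch (my numbers):

  import numpy as np

  def rect(eps):
      # width 2*eps, height 1/(2*eps): area 1 for every eps
      return lambda x: np.where(np.abs(x) <= eps, 1.0 / (2 * eps), 0.0)

  f = np.cos                           # smooth test function, f(0) = 1
  x = np.linspace(-1, 1, 200_001)
  dx = x[1] - x[0]
  for eps in (0.5, 0.1, 0.01):
      print(eps, np.sum(rect(eps)(x) * f(x)) * dx)   # tends to f(0) = 1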


Dirac delta being zero almost everywhere is sometimes assumed (when describing infinitely short impulse) but often it is a superfluous assumption, that is not needed for the use of delta. This is because delta is a distribution, not a function. It does not need to be ascribed values.

For example, delta can be put as initial condition for psi function in the time-dependent Schroedinger equation to search for Green's function. Evolution of this initial condition in time is a regular oscillatory function that looks nothing like 0 or almost 0 almost everywhere for any $t>0$. So if it isn't 0 almost everywhere for t>0, why should we impose that at t=0.


I hated this course, and later taught it at Stanford after enough years of real engineering that I didn’t even remember the topics. By that time I had developed my own way of doing everything: linear algebra notation, ODE and PDE solvers, etc. If you start with the solver, it makes more sense. For instance: let’s say you have something like a driven heat equation in front of you. It is a second-order mess, foreign and unrecognizable. Now discretize it. It becomes stupidly obvious: change in heat occurs at a rate proportional to the sum of differences with the neighbors. Okay, now I get it, and I can write out the solver iteration. How can you speed up the solution? Convolution. Derive that kernel from your solver iteration. We can also do that in a transformed space like Fourier. Okay so, now let’s transform the original equations and then use some magic tricks to simplify. Wow, look how much faster our solution is! Does it give the same result as our simple simulator? Great, now we know why we’re learning this.
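
A minimal sketch of that discretized solver (my toy 1-D version with explicit time stepping, not anything from the course):

  import numpy as np

  # du/dt = alpha * d2u/dx2 + source: each interior cell changes at a rate
  # proportional to the sum of differences with its neighbors, plus the drive.
  n, alpha, dx, dt = 50, 1.0, 0.02, 1e-4     # dt kept below the stability limit dx**2 / (2*alpha)
  u = np.zeros(n)
  source = np.zeros(n)
  source[n // 2] = 100.0                     # drive the middle cell

  for _ in range(5000):
      neighbors = u[:-2] - 2 * u[1:-1] + u[2:]        # (left - u) + (right - u)
      u[1:-1] += dt * (alpha * neighbors / dx**2 + source[1:-1])
      # u[0] and u[-1] stay at 0: fixed-temperature boundaries

  print(u.round(2))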


Sounds great! Any literature recommendations for an approach like that? Online course?


Slick!


> it led me to realize that I had no idea what a differential equation is

I remember how I realized that I had no idea what a differential equation is when reading W. L. Burke's book "Applied Differential Geometry" (highly recommended). Modern differential geometry turned out to be the strict mathematical bedrock for a lot of undergrad math. I was even more surprised to learn that a lot of my former teachers weren't really aware of that.

In my view, the introductory math courses are supposed to just introduce the basic vocabulary, show the big picture and demonstrate the usefulness of the subject by showing algorithms for solving typical problems. It is not supposed to make sense on deeper inspection - and, I believe, teachers should be candid about it.


An education in geometry and its prerequisites is a very rewarding journey.

F. Schuller's lectures on this (starting with logic and ending with applications of geometry to physics) are very pleasant.


After reading Strogatz, I view past courses I have taken in engineering DEs as a crime against students and lecturers alike. I know there's a substantial difference between pure/applied maths courses and engineering, but managing to make such a broad subject boring with dry, abstract examples is still a failure beyond what I can forgive. That said, the courses did not traumatise me nearly enough to stay away from DEs, which I guess is the only positive take I have at the moment.


Thanks for mentioning Strogatz! I'm watching the first few lectures of his nonlinear dynamics course[1] and love how clear and approachable he makes the subject.

[1] https://www.youtube.com/watch?v=ycJEoqmQvwg&list=PLbN57C5Zdl...


What do you like about Strogatz?


I enjoy the overall fashion in which he presents the topics — clear, motivating example to start with, building on relevant details etc. His textbooks read like a good popsci book, yet they are of course much more detailed. I’m not sure how suitable his e.g. « Nonlinear dynamics and chaos » are for math students, since it feels a bit informal, but for developing intuition, interest and the scope of the subject, I haven’t seen anything better.


Thanks! In particular, which title? I don't see a simply-titled "differential equations" book. 'Nonlinear Dynamics and Chaos'?


Yes, « Nonlinear dynamics and chaos » is my favorite, I think it’s his best book. It has tons of nice examples and gets you a crash course over multiple concepts. « Sync » is nice too, but not a textbook definitely ( I don’t recall if it even had a single equation in it).


Awesome.


For anyone who was subjected to the standard intro ordinary differential equations (ODEs) course and left unenlightened, I’d highly recommend Vladimir Arnold’s book on the subject [1]. It gives a lot of insight into the underlying geometry and can be read profitably by anyone comfortable with Calculus I-III and undergraduate Linear Algebra, and who took intro ODE. It wasn’t until I worked through this that I really got and appreciated ODEs.

[1] https://loshijosdelagrange.files.wordpress.com/2013/04/vladi...


Great writeup

It's amazing (in a bad way) how most professors couldn't care less about how/why they teach something. Didactical thinking is nowhere to be seen. And yes, even a lot of youtubers just repeat the same insufferable self-pleasuring of "solve this problem here like it's the 1900s". Or how professors will throw a fit whenever you solve a limit problem using L'Hospital instead of their farcical song-and-dance about limits.

It's appalling.

Pro-tip: Nobody cares.

> Allow me to state another controversial opinion: existence theorems for the solutions of ordinary differential equations are not as important as they are cracked up to be.

Correct. See point above

> will list a number of disconnected tricks that are passed off as useful, such as exact equations, integrating factors, homogeneous differential equations, and similarly preposterous techniques. Since it is rare – to put it gently – to find a differential equation of this kind ever occurring in engineering practice, the exercises provided along with these topics are of limited scope:

Ugh. This is so Dickensonian it hurts. See the point above about "Nobody cares"

Learn what they care about, learn how to best explain the point in question (it's hard)


> Or how professors will throw a fit whenever you solve a limit problem using L'Hospital instead of their farcical show-and-dance about limits.

They are teachers; they want you to think and practice a mathematical way of thinking, using a minimal set of assumptions. Exercising this discipline in thought to some extent is useful. Using l'Hospital's rule to solve simple problems that are easy to solve from first principles is like taking out a fly with a nuke, when you have nukes. Just because you can, does not mean you should.


"First principles". I mean, besides the whole pick and choose which first principles to teach and which ones to drop (there are a lot), they might not be as useful as one may think

When discussing limits, the traditional way this is explained is pointless and extra annoying. And then you end up using limits for pretty much nothing after working with derivatives and integrals.

So, again, nobody cares. And maybe someone can explain limits better than the "two dials" analogy because that's too much mathematical theatre that doesn't help when a good explanation is actually needed.


L'hopital*


https://en.wikipedia.org/wiki/Guillaume_de_l%27H%C3%B4pital#...

Though it seems people write it both ways (it is pronounced without the S though)

> In the 17th and 18th centuries, the name was commonly spelled "l'Hospital", and he himself spelled his name that way. However, French spellings have been altered


Definitely not. Either l'Hôpital or l'Hospital


"Why is it that no one has undertaken the task of cleaning the Augean stables of elementary differential equations? I will hazard an answer: for the same reason why we see so little change anywhere today, whether in society, in politics, or in science. Vested interests dominate every nook and cranny of our society, even the society of mathematicians."

How true! I wonder why humans are unable to devise a society where useful change occurs swiftly when necessary.


This is one of the most unnecessary aspects of IT in my opinion. I get that there are some things that require so much foundation that explaining them in technical detail cannot be done, but there are too many people in our profession that get a kick out of making things look needlessly hard, when sometimes they can be explained in a sentence so you grok the idea behind it.

Maybe it is also a nerd thing, but I know people who will explain to you some minor aspect of the thing in excruciating detail, but totally fail to give you the big idea of what this thing even does, solves, is there for, etc. It is the kind of explanation that only makes sense when you are on the same level on that topic as they are.

This always makes me angry. I can remember how hard it was to grasp concepts and conventions I use blindly nowadays. But back then I was not stupid; there was just a heap of things no documentation or tutorial would explain, because it would just be assumed. Remembering your own struggles is essential when creating explanations for others.


I just finished writing a comment in another thread (about macOS) on how I'm tired of all the change in the software industry. There's just a constant churn and change for its own sake. UX designers, especially, seem to love continuously disrupting my workflow by rearranging the interface in ways that surprise me and offer little/no obvious benefit.

Here we have the opposite: stagnation, a lack of change. Corruption and nepotism everywhere gumming up the works. To address your point: I believe the reason we cannot devise such societies is because vested interests are more motivated and better equipped to bend the rules to their advantage. That's the whole reason to invest in anything. People want leverage to secure their positions and their possessions against external risks. A society where everyone is equal is like the beginning of a Hunger Games: a deadly rush to either grab a weapon or take cover.


The vi editor was designed for slow terminals and a particular keyboard without arrow keys (ADM-3A terminal keyboard), yet after 46 years there is no shortage of vi fans that insist that modal editing (an adaptation for slow terminals) and hjkl (an adaptation for a keyboard that used to have no arrow keys) are inherently superior for all modern editing needs. :)


Terminals remain dreadfully slow compared to the speed of thought. People prefer vi because it's the best editing paradigm discovered so far which achieves one thing: reduce, as much as possible, the delay between a thought in mind and a corresponding change in the document.

Other paradigms put too many context switches between my thoughts and the changes I want to see. I do not want to take my hands off the home row to reach the cursor keys, or God forbid the mouse. I also do not want to destroy my pinkie finger with endless usage of modifier keys. So that leaves me with vi (and kakoune).


Well, for me, to record my thought as quickly as possible, it is vitally important for the editor to always be in the input mode, I can’t lose time to remember the modal context I am currently in (and emacs is hardly better, as it can suddenly switch its focus to some command buffer, and not come back unless some specific action is taken).


I don’t want to advocate for you to use something that doesn’t fit your preferences but for the record Vim/NeoVim can work that way too.

You can start it in Insert mode and you can have it stay in Insert mode as you run a single command using C-o.

There’s also a plethora of Insert mode ctrl shortcuts which are worth learning in addition to Command mode stuff.

If you’re on the emacs path you'll be able to do all the same stuff, just differently.

The major point is that terminal based software is fully keyboard driven and you can commit the shortcuts to muscle memory and compose them.


That’s interesting, thank you.


Your complaint would make sense if all of my text editing consisted of sequential writing in a stream-of-consciousness fashion. But then, I wouldn't need a text editor at all since I can do that with cat. In reality, I rarely ever do that kind of writing. 99% of the time I am working on some existing document, such as a source code file, and so I need an editor optimized for random access editing. Vim is that editor.

As a long time vim user I am in normal mode all of the time. When I sit down at a vim session I haven't touched in a while I press escape a few times to make sure I'm in normal mode. Then while I'm editing I am continuously typing various motion and change commands, some of which enter insert mode, and then quickly making the change before escaping back to normal mode.

Because I am in normal mode all the time (or perhaps vice versa), I spend most of my time jumping around the file and reading what is already there. Extremely rarely am I looking at a completely empty file where the only useful thing to do is start inserting text. And even then I am often typing some command to pull in text from another file or as the result of a bash command.


> I also do not want to destroy my pinkie finger with endless usage of modifier keys.

This is exactly why I switched from emacs to vim, after a couple of decades of emacs use. I decided I'd put up with slow editing for a few days, as I forced myself to use vim. (It was that or not type for a week or so.) After a few days of frustration, vim started to grow on me. Swearing became less frequent.

A month later, I was perhaps 75% as fast in vim as in emacs, and my hand pain had completely vanished.

And a few months after that, I noticed that I was really quite fast in vim -- certainly faster than I was in emacs -- and that the constant puzzle-solving (finding the action that required the fewest keystrokes) was intriguing.

Using emacs seems to me to be an exercise in remembering, whereas using vim is more an exercise in thinking. Part of the appeal is that this thinking does not interfere with the thinking involved in writing. It's a bit like driving a standard car: the extra cognitive demand (do I need to shift now? are the revs right yet? am I at the biting point?) seems to be handled by a different processor, and having that processor working makes the experience more enjoyable.


> Using emacs seems to me to be an exercise in remembering, whereas using vim is more an exercise in thinking.

This. This right here is one of the best characterizations of the vim/emacs debate that I have ever heard.


Sorry, but HTF did we get from differential equations to Emacs and Vim? (My Twitter personality is showing through.)


I think the reason Vi wins even at infinite terminal speed is that it minimizes what you need to type to express what you want to achieve/edit to the computer. This was motivated by slow terminal speed in the beginning but also addresses the bottleneck of the human-computer interface.


Why would anyone use a pinkie for modifier keys? Nowadays you can reconfigure the positions of Ctrl and Meta and Shift to your taste. I happily use Emacs without ever stretching my fingers, because all these keys sit happily under my thumbs (literally).


"First you need to buy a custom keyboard that puts all the modifier keys under your thumbs" is not a very enticing pitch for a text editor. Doubly so for those of us on laptops where the idea of carrying around an external keyboard is a non-starter.


In general, most people hate change. We have divided life, knowledge, every aspect into little known boxes. Change requires us to work, to adapt to the alteration, perhaps reconsider our viewpoints, reactions and even occupation. Change happens more typically when the new idea slowly seeps into mainstream society and old ways recede as people retire (or die). Modern technology has vastly increased the rate of change (ideas, lifestyles, jobs etc.) by making the transmission of ideas much easier and faster. But this places a high demand on anybody reading the news (drinking from the firehose of the internet) and trying to keep up.


And in addition to the fight against vested interests, we have to invest a lot of energy against stupidity and to jump through artificial hoops set up by lawyers, administrators and managers. Which leaves so little to actually do something!


Because people can’t agree on which change would be the most useful (we have limited capability to change things, regardless of how easy the change would be), nor when it is necessary.


> wonder why humans are unable to devise a society where useful change occur swiftly when necessary.

It does within free markets. Universities and professorial societies are notoriously elitist, exclusive and x-poly


The only reason for scientists and engineers (i.e. non-pure-mathematicians) to learn ordinary differential equations is so they can move on to partial differential equations. This is because:

> "Most of the natural laws of physics, such as Maxwell's equations, Newton's law of cooling, the Navier-Stokes equations, Newton's equations of motion, and Schrodinger's equation of quantum mechanics, are stated (or can be) in terms of PDEs, that is these laws describe physical phenomena by relating space and time derivatives. Derivatives occur in these equations because the derivatives represent natural things (like velocity, acceleration, force, friction, flux, current). Hence, we have equations relating partial derivatives of some unknown quantity that we would like to find."

The vast majority of people working with DEs will have two major goals: (1) take a physical problem they wish to model and construct (formulate) the PDE (the actual mathematical model), and (2) solve the PDE, taking into account the initial value and boundary conditions.

Going through an introductory course in ODEs is a valuable process because a common method of solving a PDE is reducing it to an ODE. However, with real-world problems, the actual approach will almost always be computational:

> Numerical methods: These methods change a PDE to a system of difference equations that can be solved by means of iterative techniques on a computer; in many cases this is the only technique that will work. In addition to methods that replace PDEs by difference equations, there are other methods that attempt to approximate solutions by polynomial surfaces - spline approximations.

A related common approach is perturbation methods, which transform a nonlinear problem into a series of linear ones that approximate the original nonlinear equation, and to which in turn numerical methods can be applied. Wikipedia has a good writeup:

https://en.wikipedia.org/wiki/Perturbation_theory
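
The flavor of a regular perturbation expansion, shrunk down to an algebraic toy problem of my own (the root of x^2 + eps*x - 1 = 0 near x = 1, solved order by order in eps), looks like this in sympy:

  from sympy import symbols, solve

  eps, x0, x1, x2 = symbols('epsilon x0 x1 x2')

  # Assume x = x0 + eps*x1 + eps**2*x2 + ... and match powers of eps in the residual
  x = x0 + eps * x1 + eps**2 * x2
  residual = (x**2 + eps * x - 1).expand()

  x0_val = max(solve(residual.coeff(eps, 0), x0))                                 # 1 (root near +1)
  x1_val = solve(residual.coeff(eps, 1).subs(x0, x0_val), x1)[0]                  # -1/2
  x2_val = solve(residual.coeff(eps, 2).subs({x0: x0_val, x1: x1_val}), x2)[0]    # 1/8
  print(x0_val, x1_val, x2_val)   # so x ≈ 1 - eps/2 + eps**2/8 for small eps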

Quotes from a very good source: "Partial Differential Equations for Scientists and Engineers", Stanley J. Farlow, 1993, Dover.

Pure mathematicians tend to oppose the introduction of approximate and computational methods early on, but there's no reason they can't be taught side-by-side with the traditional 'exact solutions' material, and it would probably greatly interest and benefit students.


As an engineer, I find approximate analytical solutions are often so much more valuable than numerical solutions. I wish approximation methods were taught more. They allow playing "what if?" exercises much more quickly, or finding important scaling laws.

For example, the Tsiolkovsky rocket equation says that delta-v goes with the log of the mass ratio. That's so useful to an engineer, to quickly estimate feasibility before diving deep into simulations! Or, the stiffness of a beam goes as thickness cubed, divided by length to the fourth power. Or the resonant frequency of an oscillator goes as the square root of 1/LC. Relationships like that are immediately useful for predicting problems and understanding what a solution looks like.
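
To put numbers on that log (made-up figures, just to show the scaling):

  import math

  def delta_v(isp_s, m0_kg, mf_kg):
      # Tsiolkovsky: delta-v = g0 * Isp * ln(m0 / mf)
      return 9.80665 * isp_s * math.log(m0_kg / mf_kg)

  print(delta_v(300, 10_000, 4_000))   # ~2.7 km/s with 6 t of propellant
  print(delta_v(300, 16_000, 4_000))   # ~4.1 km/s: double the propellant buys far less than double the delta-v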

Of course sometimes numerical solutions are necessary. But usually, you run numerical simulations after most of the major design parameters are already dialed in, and you just want to refine the solution or predict its behaviour. Searching only numerically for an optimum design is kind of a blind search, and no matter how fast your simulation, it gets increasingly difficult in higher dimensions.

And it often needs to start all over again when the boss says "what if we double the payload? How much will that cost us?"


As I read, I begin to share your blame.

That what looked so promising in the 19th century really should have been immediately quashed in the 20th when computing theory reared its head.

The types of assertions Rota chides -- must have a closed form solution, composed strictly of non-infinite, occasionally wonderfully simple solutions, and asserting that they must exist through deductive route -- all hallmarks.

I thought pure mathematics was over?


I loved my differential equations course, and it was a community college course I took senior year of high school. One of the chapters in the book was using diff eqs to solve physical systems, and it was an eye-opening experience. For a hobby project, I'm writing a video game that requires some unusual behaviors, so an off-the-shelf physics library won't cut it; I have to roll my own, and most of the math I'm using is directly from that course.

One additional note, the author mentions not being able to motivate the Laplace transform. The Laplace transform is really critical when designing robotic control systems. It's to the point that in diagrams, one usually draws "1/s" instead of an integral sign.
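
To see where the "1/s" box comes from, a quick sympy check (my sketch):

  from sympy import symbols, laplace_transform

  t, s = symbols('t s', positive=True)

  print(laplace_transform(1, t, s, noconds=True))   # 1/s: the transform of a unit step
  print(laplace_transform(t, t, s, noconds=True))   # 1/s**2: integrating once in time multiplies by another 1/s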


> The Laplace transform is really critical when designing robotic control systems.

It's used all over engineering, but its usage doesn't suggest how Laplace magically came up with it.


Why are we teaching sums and multiplications? After all, computers can do that these days. /s

The complaint that we're teaching something that we never see in engineering, like homogeneous equations, is preposterous. It's like saying why teach fractions when in engineering you always see reals. If you want more advanced differential equations, you can do a more advanced course or self-study more advanced material. I have a guy in my company who has a PhD in a very niche topic in differential equations and he can actually solve real-world differential equations.


Arguably the emphasis on fractions in primary education should have gone away as soon as calculators became common. They are pretty useless except in a few special cases.

Put another way: if we had had calculators all along, and someone had recently proposed teaching fractions as important ways to represent (some) real numbers, would that idea have gotten any traction at all in pedagogical circles? I don't see how.


Oh my. I'd rather tell myself that you're trolling, than lose my last scraps of faith in humanity.


What would be a good argument against the point that we wouldn't bother teaching fractions if calculators had always existed?

Math education is worse than religion when it comes to blind adherence to tradition. A lot of kids who learn calculus in high school should be learning statistics instead, for instance. And a lot of time in engineering school is wasted on worthless analytic techniques that should be spent on numerical methods that the student will actually use. Never mind the holy wars about how linear algebra should be taught. Compared to the ferocity of these debates, fractions seem like low-hanging fruit with few arguments in their defense.

Classroom time is limited, so we need to make every minute count.


> What would be a good argument against the point that we wouldn't bother teaching fractions if calculators had always existed?

Well perhaps the fact that we wouldn't know how to divide 1 unit into 3 equal parts?

And if you're going to respond that an approximation is always good enough "in engineering", how would you estimate the amount of error you're introducing if you don't know what fractions are?

> Math education is worse than religion

Another, more recent religion is that of people who argue that you should only learn the narrow set of things that have immediate application in the workplace (these people also forget that there are many different workplaces with different demands; these people normally have dumb jobs). If you learn only things that have evident application, nothing new will ever be invented. Math education has a lot of problems, but learning fractions isn't one of them.

Learning LESS is never the answer. I'm being very honest with you when I say that someone arguing we shouldn't learn fractions might be the absolute dumbest thing I've ever read on HN.


(1997). Discussed here in 2017: https://news.ycombinator.com/item?id=15163979 and previously in 2016 (different link): https://news.ycombinator.com/item?id=11207183


I took 18.034 at MIT, just like all Physics majors. It was absurd, stupid, and the most boring class I ever had. So after the first 3 classes, I never went. Two weeks before the final, I holed up in the Student Center Library and did all the problems. I got like 60 on the final, but the average was 40 so I aced the course.

It was criminal (I am less diplomatic than Rota, who, by the way, I had had for 18.02, the previous course) not to emphasize differential equations with constant coefficients. I had to take the graduate course in signals and systems (from the famous Communist Jack Kurzweil, no less) at San Jose State to fix this deficiency. After 35 years as an optics and laser engineer (but most successfully as a salesman), here is what I would teach:

1. LDEs with constant coefficients, up to systems, poles and zeroes, with the last week on simple control systems and the last lecture on “e to a matrix.” 10 weeks of a 15-week course. I should add that almost every really successful experimental physicist I have known has had to build a control system from scratch to solve a critical problem in his rig.

2. Numerical methods of solution, especially Runge-Kutta integration (a minimal sketch follows below). Most scientists and engineers will encounter at least one situation where the company tools won't be trickable into providing a solution.

3. If there is any time left, or perhaps in an honors class, do something with the Schrödinger equation.
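
For point 2, here is roughly what I mean, as a minimal sketch (Python; the test equation y' = -2y is just an illustration, nothing special):

    # Classical fourth-order Runge-Kutta: advance y' = f(t, y) one step of size h.
    # The test equation y' = -2y has the exact solution y(0) * exp(-2t).
    import math

    def rk4_step(f, t, y, h):
        k1 = f(t, y)
        k2 = f(t + h/2, y + h/2 * k1)
        k3 = f(t + h/2, y + h/2 * k2)
        k4 = f(t + h, y + h * k3)
        return y + h/6 * (k1 + 2*k2 + 2*k3 + k4)

    f = lambda t, y: -2.0 * y
    t, y, h = 0.0, 1.0, 0.01
    for _ in range(100):               # integrate from t = 0 to t = 1
        y = rk4_step(f, t, y, h)
        t += h
    print(y, math.exp(-2.0))           # the two agree to many decimal places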


So, what's interesting about differential equations for someone who doesn't find analysis or physics particularly compelling? Say, a computer scientist with a greater love for discrete math?

Linear differential equations are a great introduction to eigenvalues. There's a chicken-and-egg problem understanding this, and it would be helpful to take ODEs and Linear Algebra together.

As I first read in Feller, generating functions are a formation that runs the full length of mathematics and beyond; the Laplace transform from ODEs is just a continuous version of a generating function.

As Marcel-Paul Schützenberger first made clear, the notions of "rational" and "algebraic" from pure mathematics match up with finite state machines and push-down automata from computer science. One studies these most easily using generating functions. As Richard Stanley explains in Enumerative Combinatorics Vol. 2, there's a third class of generating functions beyond algebraic: D-Finite. Roughly, this corresponds to how one learns to expand the solution to a differential equation in ODEs as a power series. The true significance of this class in both pure mathematics and computer science is poorly understood, but there's something clearly out there as significant as rational and algebraic. That is intrinsically fascinating.
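
A toy case, if it helps (a sketch assuming sympy; the Fibonacci choice is mine, nothing canonical): the rational generating function x/(1 - x - x^2) encodes a linear recurrence, exactly the kind of thing a finite state machine / transfer-matrix argument produces, and a CAS will expand it for you.

    # The rational generating function x/(1 - x - x^2) has the Fibonacci
    # numbers 1, 1, 2, 3, 5, 8, 13, ... as its power-series coefficients.
    from sympy import symbols, series

    x = symbols('x')
    print(series(x / (1 - x - x**2), x, 0, 8))
    # x + x**2 + 2*x**3 + 3*x**4 + 5*x**5 + 8*x**6 + 13*x**7 + O(x**8)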

When I see someone filling a blackboard with equations, I see someone flailing because they don't know what to do. The best math is simple, and can be expressed as philosophy, like recognizing this formation crossing the sky.


For those who don't know, the author was one of the best combinatorialists ever.


He was also a very good lecturer: I took a probability class with him (he had stopped teaching the DiffEq course by the time I went through.)


Lucky you!



When I was in college, more than 50% of the courses existed for historical reasons only; not that I never used anything from them, but some continue to exist today, decades later. Higher-level mathematics in computer science is an exception in life after college, so teaching it to everyone just because 1 in a million may ever use it is a waste of time, money and brain cells.

Note: back then in Eastern Europe some college-level math was done in the last 2 years of high school, so what is considered "higher level mathematics" is a subjective term, please take that into consideration when reading the paragraph above.


I was a curve-breaker to the downside. I became convinced back then that math is exclusively about plug-and-chug formulas, and that each course would fling these at you. Now I better understand that it's a story about problem solving.

I think organizing the course around the complexity of the differential model under inspection is a bad principle.

The course should be organized from the broadest-applicability solution methods to the narrowest, with something like numerical methods early on. This might help lift the veil.


The author is spot on. Linear systems show up everywhere from EE to controls. I've never ever ever had to use a Wronskian or an ODE with non-constant coefficients. Separation of variables is also fundamental. The only thing I disagree with is what he says about Bessel functions; it's nice to have some exposure to them.


How do Electrical Engineering courses in college deal with this? I think differential equations and the Laplace transform are something they learn in the first semester or year. The dropout rate must be high unless they have found a creative and fun way to teach this stuff.


As an undergrad in EE and CS this was a bit of a mixed experience for me. The Laplace transform was presented as kind of a nice hack, but I think the actual development of intuition is delegated to non-math-specific (engineering) subjects like, for instance, Field Theory, Circuit Theory, Signal Theory or Numerical Methods. It seems that it's expected that the students just blindly learn the basics of DEs, which serve as one of the sieves for non-math-inclined students (here I may controversially say, rightly so!). You're simply to swallow a bunch of dry scones, aka theorems, and only then learn how to use them.


There is zero intuition about the Laplace Transform. It just is. Like English or Chinese.


I really would like to learn Rota's thoughts on Arnold's book on ODEs. I love that book, but I cannot recommend it easily since it has little overlap with our undergraduate curriculum DE course (in a physics department).


I was fortunate to have problem solving based DiffEQ classes as a freshman, and my prior experience in programming meshed well with differential equations.

My first published Android app was a differential equation visualizer:

https://play.google.com/store/apps/details?id=simplicial.sof...


I loved diffy-q. It made the math real for me, much like physics did.

It was still a hard course, but worth it. Now, if only I remembered some of it...


My differential equations course was one of my worst undergraduate experiences. Somehow I still passed.


My major did not require DE, but I took it anyway as a senior--mostly to test my mettle. Most of the time, I was just like why won't they just let me solve this via the Trapezoidal Rule (or similar, my memory is hazy on this), or Monte Carlo methods.


I failed my university differential equations course :D

I did a lot of homework, but none of it "stuck" and most of my classmates were just memorizing everything which didn't work for me. The pedagogy needed to be updated for sure.


People have mentioned the use of differential notation in differential equations. It's confusing.

In teaching calculus we still use both the prime/Newton notation (y', y", and so on) and the fraction/Leibniz notation ("dy/dx", etc.) for derivatives. They're both useful for writing and for visualizing how things work, and both are used outside math (e.g. in physics and engineering). "dy/dx" was originally thought of as the quotient of two infinitesimal quantities --- arbitrarily small but (not necessarily) zero.

Limits became accepted as the foundation for calculus, and infinitesimals were regarded as heuristic or non-rigorous. But if you're careful, "you can get the right answers", and Abraham Robinson's discovery of nonstandard analysis (around 1950 - 1960) showed that infinitesimals could be put on a rigorous basis. Besides Robinson's "Non-standard Analysis" and Jerome Keisler's (now free) calculus book someone else mentioned (and the supplement which provides a lot of the theory), people who are interested in this can check out "Infinitesimal Calculus" by James Henle and Eugene Kleinberg (https://store.doverpublications.com/0486428869.html - a short introduction) or "Applied Nonstandard Analysis" by Martin Davis (https://store.doverpublications.com/0486442292.html - it starts with the logic and set theory and like Keisler's supplement is fairly theoretical).

Why aren't infinitesimals used in teaching calculus? After all, we can now make them rigorous, and the notation fits well with differential forms, which are used in differential geometry. Keisler's book (which I think came out in the 1970s) was an otherwise traditional calc book which made the attempt, but there is just too much inertia behind the limits approach. Some people were bothered also by the fact that infinitesimals are definitely non-constructive; the Bulletin of the American Math Society oddly had Errett Bishop, a noted constructivist, review Keisler's book when it came out, and the review was not very positive. (I don't know why they picked him as a reviewer.)

But there's still a problem - you're defining the derivative as a limit of difference quotients, but it sure looks like a fraction. What to do? Calc books often fudge this in the following way. They define the derivative f'(x) (note the prime notation) as the limit of [f(x + h) - f(x)]/h as h goes to 0. Then "dx" is regarded as synonymous with "Δx", and both are considered to be simply increments/changes in x --- no infinitesimals. (So, e.g., dx could be 0.3.) Then you define dy to be f'(x) dx, so of course dy/dx = f'(x). In this interpretation, "dy" and "dx" are increments in y and x as you move along the tangent line to the curve. Rather than refer to these as "differentials", calc books will often put this in a section called "linear approximation", which usefully discusses using the tangent line to a curve to approximate the curve.
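
A throwaway numeric check of that reading (my own numbers, not from any particular book): take f(x) = x^2 at x = 3 with dx = 0.3; the tangent-line increment is dy = f'(3) dx = 1.8, while the actual change is Δy = f(3.3) - f(3) = 1.89.

    # dy = f'(x) dx as the tangent-line (linear) approximation to the true change.
    f = lambda x: x**2
    fprime = lambda x: 2*x

    x, dx = 3.0, 0.3
    dy = fprime(x) * dx            # tangent-line increment: 2*3*0.3 = 1.8
    delta_y = f(x + dx) - f(x)     # actual increment: 10.89 - 9 = 1.89
    print(dy, delta_y)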

I think it's good to bring up this history in calc courses - I always tried to.

Math is rigorous and proceeds according to logic (that's the aspiration, anyway), but the way math develops and is taught is a social thing.


The differential equations course was one of the most baffling experiences I ever had. The professor on the first day told us the course would be rote, as opposed to proofs, and every day he copied the methods to the chalkboard. He specifically instructed us to copy them verbatim into our notebooks. In this way there was not a lot to discuss, and from time to time the professor would gently steer us back to merely copying and memorizing the methods, even though no one had questioned him out loud. The methods were entirely disconnected. I had no indication of how they were derived or what the original motivation might be. What a differential equation was or why I wanted to "solve" one---this generated a second equation---was a mystery. None of the problems in the physics sequence looked like this. The engineering students claimed to have them, but reminded me that "this was all done by computers now." In the textbook there were no word problems, only formulas, and so I was never able to infer what this might all be about. The problems gave no opportunity for insight beyond recognizing the form. On the homework I manipulated one formula into another. On the test I did the same thing. Through memorization, I got an A in the course. I never encountered a differential equation before or since.


> The methods were entirely disconnected.

This is standard and unavoidable. There are like a dozen tricks that solve a few special cases, and they were found after heroic brute-force search in the void. (The real fact that is somewhat hidden is that most differential equations can't be solved analytically. You solve analytically only the few cases that are solvable analytically; otherwise you just get a numerical solution or an approximation.)

> In this way there was not a lot to discuss, and from time to time the professor would gently steer us back to merely copying and memorizing the methods, even though no one had questioned him out loud.

That's a horrible way to teach.


I phrase it as “The class I cheated my way to an A but didn’t commit any academic dishonesty violation.”

> The real fact that is somewhat hidden is that most differential equations can't be solved analytically. You solve analytically only the few cases that are solvable analytically

I discovered this very early in my semester of Differential Equations. We were allowed a single 8.5x11 notesheet for the exams. As there were only a handful of the “most general” cases which are solvable on paper with a basic calculator, I simply copied the step-by-step solution, completely worked out, for each of the most general cases in whatever techniques we were going to be tested on for that exam.

The professor was an engineer before becoming a math professor so he only liked to include real-world situation ODE’s on exams which further reduced the potential problem space.

While it greatly confused the professor/grader who scored my exam that I kept adding zero-coefficient terms before solving the differential equation perfectly…I got 100% on all the exams.

The catch was that I didn’t learn anything. The next semester it turned out that I needed to know those techniques for Reaction Kinetics and Heat&Mass Transfer and Biochemical Engineering (these courses involved deriving and solving many equations from first principles).

I had to crawl back to my Differential Equations professor's office hours for 3 weeks and beg him to actually teach me differential equations. He was very confused after asking me what grade I got (an A), and I had to explain to him how I got an A without learning anything.

To his credit, he did a fantastic job assigning me custom work for 3 weeks and reviewing it with me and I was able to learn what I needed for the more advanced courses.

But without his help and some additional tutelage from my peers, I would have been completely screwed for the rest of my Chemical Engineering major.


Well put. Me too.


> most differential equations can't be solved analytically.

Exactly, so why don't they teach the numerical analysis for actually solving PDEs that matter? These are equations that are very highly relevant to a wide array of real-world science and would be extremely beneficial for many people to know, even if (like calculus or even algebra) most people may not need them later.

I ended up wandering into a career where I work with PDEs nearly every day in some form or other, and would have greatly appreciated some basic training as part of my formal education.
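
To make "basic training" concrete, I mean nothing deeper than this kind of sketch (Python/NumPy, explicit finite differences for the 1-D heat equation u_t = u_xx; the grid and step sizes are arbitrary choices of mine):

    # Explicit finite differences for the 1-D heat equation u_t = u_xx on [0, 1]
    # with u = 0 at both ends. The explicit scheme needs dt <= dx**2 / 2.
    import numpy as np

    nx = 51
    dx = 1.0 / (nx - 1)
    dt = 0.4 * dx**2                      # safely below the stability limit
    x = np.linspace(0.0, 1.0, nx)
    u = np.sin(np.pi * x)                 # initial condition

    for _ in range(500):
        u[1:-1] += dt * (u[2:] - 2*u[1:-1] + u[:-2]) / dx**2

    # exact solution is exp(-pi**2 * t) * sin(pi * x); compare at t = 500 * dt
    t = 500 * dt
    print(np.max(np.abs(u - np.exp(-np.pi**2 * t) * np.sin(np.pi * x))))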


Luckily there are many interesting examples that can be solved, in particular the linear differential equations. Many equations can be approximated by a linear version.

Also, in Physics, a lot of ODEs are mysteriously integrable if the variable is x instead of t. (One reason is that it's easy to measure the force/fields, but the "real" thing is the potential, so you are measuring the derivative of a hopefully nice object.)

Also, a lot of the theoretical advanced stuff to prove analytical solutions and to estimate the error in the numerical integrations uses the kind of stuff you learn solving the easy examples analytically.

And also historical reasons. We have less than 100 years of easy numerical integration, and the math curriculum advances slowly. Anyway, I've seen a reduction in the coverage of the weirdest stuff, like the substitution t = tan(x/2) (the Weierstrass substitution; I always forget the details). It's very useful for some integrals with too many sin and cos, but it's not very insightful, so it's good to offload it to Wolfram Alpha.
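
For instance (my own toy example), an integrand that is a rational function of cos x is the classic target of that substitution, and a CAS should dispatch it directly:

    # A rational function of cos(x): the classic target of the t = tan(x/2)
    # (Weierstrass) substitution. Let sympy remember the details for us.
    from sympy import symbols, integrate, cos

    x = symbols('x')
    print(integrate(1 / (2 + cos(x)), x))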


Hmm, maybe. How would that impact the larger curriculum? Are you thinking a new class, or just change how differential equations is taught?

I think there is a little bit of an annoying situation where at least Electrical Engineering students are going to want Differential Equations pretty early on as they are pretty important to circuits (IIRC, I don't touch analog stuff anymore). Like maybe as a first semester 200 level class. This doesn't afford space to put a Linear Algebra class in beforehand (needed for numerical analysis).

Maybe the symbolic differential equations stuff could be stuck at the end of integral calculus, but

1) curriculum near the end of the semester is risky (students are feeling done, and it can suffer from schedule shifts).

2) Transfer students or students who satisfied their calc requirements in highschool (pretty common for engineering students) wouldn't be aware of your curriculum changes.

Or, a numerically focused PDE class could be added elsewhere. I bet most math departments have one nowadays, but as an elective.


They do, but if you want to solve PDEs numerically instead of DEs analytically, you should enroll in the “Numerical Methods for PDE” course instead of “Analytical Methods for DE”.


They do, but they’re fairly advanced-level courses. For example, if you go down the Theoretical Physics or Applied Maths routes you’ll do perturbation theory and asymptotic analysis, probably at Master’s level or in grad school.

Most people will do some computational courses that at least have them solving basic PDEs in their first or second year of undergrad now.

(This reflects the state of those in the UK at least)


I had (aerospace engineer) two semesters of Numerical Methods. Then we had more specialized things like Finite Elements Methods.

I wish we had been taught how to use a Computer Algebra System (example, Mathematica or Maple).


“Unavoidable” is a bit too strong I think. For an ODE course, what does the usual list of elementary methods really include?

- Separation of variables. If one is fine with differentials (or their modern cousins differential forms), there isn’t much to explain here.

- Linear equations solved with quasipolynomials. The only ODE-specific observation is that d/dx in the ( x^k e^x / k! ) basis is a Jordan block; the rest is the theory of the Jordan normal form, which makes interesting mathematical points (an embryonic form of representation theory) but exists entirely within linear algebra (even if it was motivated by linear ODEs historically). (For constant coefficients this is also where the e^(At) solution formula comes from; a small numerical sketch of that appears at the end of this comment.)

- Riccati equations. They were always a mystery to me, but it appears they could also be called “projective ODEs” to go with the linear ones, and they have pretty nice geometry behind them (even if, as you said, they were first discovered by brute force search).

- Variation of parameters. Despite the mysterious appearance, this is simply the ODE case of Green’s method beloved in its PDE version by physicists and engineers. (This isn’t often included in textbooks, in fear of scaring students with Dirac’s delta, but Arnold does explain it, and IIRC Courant–Hilbert mentions it in passing as well.)

- Integrating factors. Okay, I can’t really explain what that one means, even though it feels like I should be able to.

Not that teaching it like this would make for a good course (too general, and ODEs ≠ methods for solving ODEs), but that’s essentially it, right? There are certainly other methods you could mention, and not unimportant ones (perturbation theory!.. -ries?), but this basically covers the standard litany as far as I can see. And it’s no haphazard collection of tricks—none of these is just pulling solutions out of a hat.

(In the interest of changing things up and not spending an hour on a single comment, I will omit the barrage of references I’d usually want to include with this list, but I can dig them up if somebody actually wants them.)
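
(The promised numerical footnote to the linear-equations item, since that is the one that pays off most often later: for a constant-coefficient system x' = Ax the solution is x(t) = e^(At) x(0), and you can compute it directly. A sketch assuming scipy is available; the damped-oscillator matrix is just my own example.)

    # A linear constant-coefficient system x' = A x has the solution x(t) = expm(A t) x(0).
    # A is a lightly damped oscillator written as a first-order system.
    import numpy as np
    from scipy.linalg import expm

    A = np.array([[ 0.0,  1.0],
                  [-1.0, -0.1]])
    x0 = np.array([1.0, 0.0])

    for t in (0.0, 1.0, 2.0):
        print(t, expm(A * t) @ x0)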


I was thinking of something similar. (I have no idea what a Riccati equation is: https://en.wikipedia.org/wiki/Riccati_equation ) Perhaps I should have said "half a dozen".

Here the first ODE course is half a semester. If you spend a week or two proving existence and uniqueness, you get one week to study each method and do a few examples, and then you must move on to the next week's trick.

Fourier/Laplace and other advanced stuff are in a more advanced course.

I never used perturbation theory for ODEs. I've seen it for solving eigenvalues/eigenvectors of operators in QM. But perhaps it's one tool I don't know.


Around 50 years ago, I was a math major and consequently required to take a course on differential equations. For perhaps the first time, I was taking a class on mathematics that just didn't click for me; nothing was intuitive. The class was just a big bag of complicated tricks. Each category of differential equations covered in the course had its own special trick. There was no general or universal approach to solving a DE; one had to recognize a particular problem was a DE from a particular category and apply the special trick to solve it. The methods were often long or complex and they only solved some differential equations. Many differential equations have no trick at all to solve them. It turned out to be a tough class for me because I'd never before needed to learn math purely by rote.

Those that have taken Integral Calculus may be thinking that solving DEs sounds akin to integration, where one may have to apply substitutions, integration by parts, trigonometric substitutions, or partial fractions. Yes, Calculus requires learning a bag of tricks too, but it's a small bag of simple tricks with wide applicability. So many of the functions one needs to integrate succumb to this small bag of tricks that it's almost fun to hone one's technique. A class on elementary differential equations is just depressing.

To be fair, differential equations are important. Physical phenomena are often best described by differential equations. Fortunately, programs like Mathematica can be used to tackle real world differential equations one way or another (perhaps with numerical methods) to obtain solutions.

I was fortunate to have my Probability course (sadly, not my differential equations course) taught by Gian-Carlo Rota.


Math was all easy peasy for me until I took a course on differential equations. For the first time I couldn't just visualize the problem and due to a lack of discipline on my part I dropped out.

I've been coding for years and have been able to fake it with my limited math education but would love to have the time to learn more for the sake of understanding.


Unfortunately it seems this way with a lot of higher level math and it's not really unique to differential equations. The difference is, unlike calculus in general, in diffeq you have actual rote formulas to solve most of the known solvable cases.

I found the classes to be rote. The derivations are truly non-trivial. The book Ordinary Differential Equations by Arnold goes into more detail. Basically, if we taught the reasons, we'd require everyone to take analysis and differential geometry to truly understand how they work. Given that the MAJORITY of students in diffeq are engineers and not math majors, 99.9% don't want to know and/or don't care about this detail. You see a similar occurrence in calculus where you're basically told "don't think about it too hard" for your own safety. If you start wondering a little too hard about calculus you end up switching majors to math and taking two semesters of real analysis. It's also EXTREMELY common for engineering professors to teach differential equations rather than math professors. This further waters down the rigor because (obviously) an engineer will not know/care about the rigor. Part of the reason I've pursued a math degree is because there was so much handwaving in engineering/computer science that it became just an extremely annoying grab bag of math tricks and I wasn't satisfied.

To me we have too many inter-dependent classes to teach each class with full rigor. As a result you end up with a collection of half-understandings for most of your undergraduate career and only if you take a math major itself (or a minor in math) will you actually unlock the other half. A better path through math might be basic algebra I, II-> geometry -> trig -> abstract algebra I+II -> analytic geometry -> calculus I, II, III -> real analysis I+II -> differential equations I+II, but this would basically make every degree a math degree. What you experienced is the compromise.


A class taught like this for me was what got me to quit physics and switch to CS.


And why it took a long time for back propagation to be introduced into machine learning...

Back propagation is (almost) just a fancy word for differential equation, with derivative relative to the error in the output against your training data.


As someone who's starting to learn a bit about machine learning, it feels like the whole field is full of fancy terms like this that seem to mostly map to simpler or more familiar ones. "linear regression" instead of fitting a line, "hyperparameter" instead of user-provided argument. Half the battle seems to be building this mental translation map.


You are looking at it from a programmer standpoint rather than a mathematical standpoint.

Linear regression isn't just fitting a line, it's a statistical technique for fitting a line of best fit. Hyperparameters are a Bayesian term for parameters outside the system under test, or "algorithm". "User input" really misses the Bayesian aspect.

These terms actually have meaning, so I'd be careful about ascribing simpler definitions to them. The underlying meaning is important to the reason they work. If you don't have a really strong background in probability theory and statistics, trying to dig into machine learning will take work. I'd recommend taking an MITx course or picking up a textbook on probability so the terminology feels more natural.


To be fair, "linear regression" is standard statistics 101 that much predates machine learning or computers.


A user-provided argument could also be an input parameter or a regular function parameter altogether.

Yes, hyperparameters are often set by the user of a model, but more specifically they are parameters that exist separately from the data put into a model (input parameters) or the structure inside of neural networks (hidden parameters). Hyper-, meaning "above", helps conceptualize these parameters as existing outside the model.


Actually, backpropagation is more of a fancy word for the chain rule.


ALMOST like using the chain rule

Backpropagation ≠ Chain Rule: https://theorydish.blog/2021/12/16/backpropagation-≠-chain-r...


That's just nitpicking, but ok: backpropagation is the application of the chain rule for total derivatives.

Look into forward- vs reverse-mode automatic differentiation, and you'll understand what I'm referring to.


Yes, backpropagation isn't the chain rule itself, but just an efficient way to calculate the chain rule. (In this respect there are some connections to dynamic programming, where you find the most efficient order of recursive computations to arrive at the solution).


I think of it as: computing the chain rule in the order such that we never need to compute Jacobians explicitly; only Jacobian-vector products.

I also didn't totally grasp its significance until implementing neural networks from matrix/array operations in NumPy. I hope all deep learning courses include this exercise.
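
In case anyone wants that exercise in miniature, here is a rough sketch (made-up random data, one hidden layer, squared error; every backward step is a multiplication by a transposed weight matrix, i.e. a Jacobian-vector product, never a full Jacobian):

    # One hidden layer, squared error. The backward pass pushes a single error
    # array through the layers via Jacobian-vector products (no explicit Jacobians).
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))                 # made-up inputs
    y = rng.normal(size=(100, 1))                 # made-up targets
    W1, W2 = rng.normal(size=(3, 8)), rng.normal(size=(8, 1))

    for _ in range(200):
        h = np.tanh(X @ W1)                       # forward pass
        pred = h @ W2
        err = pred - y                            # dLoss/dpred (up to a constant)

        grad_W2 = h.T @ err                       # backward pass: chain rule,
        dh = (err @ W2.T) * (1 - h**2)            # one layer at a time
        grad_W1 = X.T @ dh

        W1 -= 1e-3 * grad_W1                      # plain gradient descent
        W2 -= 1e-3 * grad_W2

    print(float(np.mean((np.tanh(X @ W1) @ W2 - y) ** 2)))  # final training loss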


Yes, they are not the same. The chain rule is what solves the one non-trivial problem with backpropagation. Besides that, it's just the quite obvious idea of changing the weights in proportion to how impactful they are on the error.


Is that why it took long? I was under the impression it was because of diminishing gradients in backprop once you stack a huge amount of layers (the deep in deep neural networks).


Could you please forward me to a resource that explains this connection?


The reverse mode has famously been re-discovered (or re-applied) many times, for example as backpropagation in ML, and as AAD in finance (to compute "Greeks", ie partial derivatives of the value of a product wrt many inputs).

A few resources here:

An overview, with a bias towards finance: https://informaconnect.com/a-brief-introduction-to-automatic...

On the history: Andreas Griewank, Who Invented the Reverse Mode of Differentiation? https://ftp.gwdg.de/pub/misc/EMIS/journals/DMJDMV/vol-ismp/5...

On the history of back propagation: https://en.wikipedia.org/wiki/Backpropagation#History

The article that introduced it to finance: Michael Giles and Paul Glasserman, Smoking adjoints: fast Monte Carlo Greeks https://www0.gsb.columbia.edu/faculty/pglasserman/Other/Risk...

Survey of the application in finance: Cristian Homescu, Adjoints and Automatic (Algorithmic) Differentiation in Computational Finance https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1828503


It was in one of the fast.ai courses, I think where Jeremy did back propagation using Excel

https://www.fast.ai/

Could be that someone else here remembers the exact video.


Hope you don't mind me plugging my blog post, that covers chain rule -> autodiff -> training of nn. https://sidsite.com/posts/autodiff/


Absolutely not. Thank you for sharing.


I'm temporarily leaving my degree in Telco/EE due to this. I have passed and done well in all the subjects except those "memory-heavy" math courses, and those are what I have left.

We have to memorize a lot of information without any explanation of why it is done that way (due to the lack of time in those subjects), and we are also more encouraged to study how previous years' exams were made than the content itself. This is one of the big reasons only 10% to 15% (IIRC) of the enrolled students pass those exams every year.

That scene, knowing that I have to do a task that is time-consuming, pretty hard, artificial, and useless for the rest of my academic life, my work life, or my life in general, is what made me leave this year. I don't have enough mental health to do such a big thing.

PS: Sorry for the rant. I'm having too much time at home due to COVID and maybe wrote too much.


Sounds like you made a good decision.

I am middle aged and completed my EE degree when I was 20, but it was 90% theory with very little practical use (mostly useful if you were to continue climbing up the education chain). Completing the degree made me despise working with electronics, a topic I had deeply loved and had spent my teenage years learning for myself. Most courses were rote learning, and I was very good at passing exams, but it was two years before I realised how pointless the majority of the “knowledge” was, and then I forced myself to finish the degree (sunk cost), which I now regard as one of the few true mistakes of my life (wasted years, for valueless academic “knowledge”). The degree got me a software job, so there is that, but I am sure I would have ended up in software anyway (early love of computers).


I had the same experience but with a final exam that was way too long and covered all of the types of diff. equations given to us throughout the semester (expected to just memorize everything). Result was that the average score was ~29%, only reason anyone passed the class was that it had a curve. By far the worst university level math class I had, it has been much easier to learn to solve them depending on need with physics.

The required memorization made things especially difficult for me because I tend to work off intuition rather than memorization. I also usually can't name theorems despite knowing them from practice (this also used to be a huge pain for exams where solutions were unreasonably only considered correct if you named everything used).


Wow, I could have written that. The prof teaching my diff eqs course was a nice enough guy, and I think he tried to get the engineers interested by making the optimization problems as applicable to real life as possible, but it was dull and rote and I don't remember any of it 15 years later. I feel like I could still pass a statics final, though, so I don't think this is my problem per se.


Differential equations show up all the time. Your professor could have provided concrete examples -- maybe they were more interested in their own research work than their teaching duties.

In high school, I looked into analytically calculating a ball's maximal trajectory length (or something like that) and was told it required solving differential equations and would be taught in college.


This is exactly my experience with my differential equation lessons in my maths classes at Year 1 and Year 2 undergraduate chemical engineering. The way we were told to just follow the instructions and not have any critical thinking at all about what we were doing made me so unmotivated that I sort of gave up learning differential equations. I was lucky this was during lockdown, so I was assessed by online tests and was able to get through it, but my god was the teaching so so unengaging.


Very similar experience to mine, except I failed to memorize stuff and had to ask the prof how I could improve my score; he told me that I should do my best and he would fix the rest. He was kind of a grandpa figure.


they removed the human element from the content. they've focused on the outcomes, the resulting inventions of the scientists and mathematicians. they only teach how to use the techniques, not how they were made.

paving the way (or building a wall) such that few can understand how people came up with that stuff. this is intended. this literally constructs knowledge as power.

The ways of thinking used to come up with the techniques are hidden, restricted. The academics who know the whole story (who know the ending -- which is what is taught, as well as how mathematicians of old came up with such ideas) hold this kind of power.

This gets even more interesting when the academics who know the histories, cannot really use the techniques. then the only people who knew both are historical figures (who get bathed in myth).

I cannot forgive them for this, given that they are still actively doing it. E.g., finding out how they make shredded wheat cereal is not possible [1], and this must be technology from the early 20th or late 19th century... anything more recent is just hopeless.

[1] https://youtu.be/Qx8ovCJ9XPw?t=132


How to make shredded wheat has been publicly known since at least 1895. [0]. How to make it efficiently at large scale is a trade secret that the company invested in and has a right to protect. None of this is related at all to the teaching of differential equations.

[0] https://patents.google.com/patent/US548086A/en


again, on my own very stretchy way of thinking (which involves big leaps in reasoning). you're saying that a company has a right to protect its secrets, but I'm hearing something comparable to (e.g.) "colonialist superpowers have the right to enslave people from Africa". I suppose I may be tuning into a moral ethical-framework from the future when I take 'offense' by the "rightful" actions of companies to keep knowledge bound and locked.

the relation is ideological, cultural (in the sense of being close to the intention of); not direct, causal, material (in the sense of relating to the actual implementation).


You can motivate the methods somewhat - if that weren't the case, no one could have thought of them. I can't usefully explain without an example, so I apologize if the math that follows bores anyone.

One of the standard methods is "integrating factors for first-order linear equations". You are told that, faced with an equation

    y' + p(x) y = q(x)

you should multiply both sides by e^(the integral of p(x)). For example, you might have

    y' + (2/x) y = x.
Then you multiply by e^(integral of 2/x), which is x^2.

[Sometimes I wish Hacker News had TeX available.] If that's all you tell people, it looks like some random abracadabra and it's no wonder why people feel they just don't get it. So you might try to explain this way:

"The equation has a derivative in it. To undo a derivative, you need to integrate. But if you integrate as-is, you have no idea how to integrate y' + (2/x) y."

"Well, you know that the integral of df (the derivative of f) would be just f. So if you could make the left side look like the derivative of something, then you could just integrate both sides."

At this point, you scratch your head and think: "What could I do to make the left side be the derivative of something?" This kind of thought is impressionistic - you have to think in a vague way of things the left side "is like". Daydreaming for a while, you might realize it's a sum of two terms, so you think: If this is a derivative and it's the sum of two terms, what derivative rule gives a sum of two terms? And you might think of the Product Rule.

But the given thing is not the derivative of a product as is. What to do? So continuing this line of thought, you might think - maybe I can multiply it by something to make it the derivative of a product. Once again, you have to search through your experience with derivatives and maybe mess around on scratch paper. Finally, you realize x^2 works - multiplying by x^2 makes the equation

    x^2 y' + 2 x y = x^3.
The left side is d(x^2 y)/dx, so you can integrate both sides and get x^2 y = (1/4) x^4 + c.

The final step is to think whether you can generalize what you did with "p(x)" instead of "2/x". After some additional messing around, you come up with the integrating factor I gave at the start.
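
(And if you don't trust my algebra, a CAS check is quick; a sympy sketch:)

    # Check the worked example: y' + (2/x) y = x  should give  y = x**2/4 + c/x**2.
    from sympy import Function, Eq, dsolve, symbols

    x = symbols('x')
    y = Function('y')
    print(dsolve(Eq(y(x).diff(x) + (2/x)*y(x), x), y(x)))
    # something equivalent to Eq(y(x), C1/x**2 + x**2/4)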

I have no idea who discovered this method, or what their thought process was (if they even explained it at the time). This was about the extent of the motivation I got when I was taught this stuff in high school/college. I'd tell students this sort of thing when I taught differential equations. But I don't know what other people do in teaching, and I'm not sure this helps. For people who feel their differential equations courses were baffling/unmotivated, is this the kind of explanation you want? Or do you want something completely different, like applications?

At some point, explanation ends. Can a painter say why he put a daub of paint of that color in that place in a painting, or can a writer say why he had a character do this or say that? There are several points in the motivation above where all I can say is "you have to sit there and think and mess around", even after paragraphs of writing. I'm not sure how to do better.



