I'll make an attempt here, though. I think the effect on level 4 is totally classical so it shouldn't take too much background.
1. The solution to a problem involving light is called a wavefunction, which is just a name for any function that maps position and time onto what the wave is doing at that position and time. E.g., for a sound wave, W(x: position, t: time) => p: air pressure.
2. The meat of a "wave equation" is essentially also a function, but it's higher-order: it maps wavefunctions onto wavefunctions. I.e., consider a higher-order function L such that L(W: wavefunction) => X: wavefunction. The name for this kind of map is an "operator," by the way.
3. We can set up operators L(W) so that they map all "true and physical" wavefunctions onto the zero-function, f(x,t) = 0. Designing this contrivance is the physicists' job; so for our discussion let's just take it that for every physically possible W(x,t), it is the case that L(W(x,t)) = 0. (And vice-versa, every solution to that equation is physically possible.) The equation L(W) = 0 is called the "wave equation," by the way.
4. It is a property of L that L(W + Q) = L(W) + L(Q) for wavefunctions W and Q. This implies that if L(W) = 0 and L(Q) = 0, then L(W + Q) = L(W) + L(Q) = 0 + 0 = 0. Therefore, since physical possibility <-> L(W) = 0, we may conclude that the sum of any two physically possible wavefunctions W and Q is another physically possible wavefunction, W + Q.
5. Ignore time and look at my nice graph. This illustrates adding two functions (there, A(x) and B(x)) which also happen to be solutions to the wave equation. Play around with the parameter p (whose purpose is to let you select A to be one of many horizontally offset versions of itself) and see if you can make A + B do anything noteworthy.
6. Hopefully in looking at my graph, you have noticed that for some values of p, A + B became flat everywhere. Now I can finally explain what's going on with level 4. When the beamsplitter produces two beams from one, the two functions it shoots out have two different values of p, the dynamics detailed in . If two beams are incident on the splitter, four beams will shoot out, and since some of them overlap in space they will add and you'll see the superposition effects. See my drawing . (By the way, the parameter p is called phase, and the name for the thing the waves do under superposition is interference.)
 I've drawn it here. Please don't over-interpret it, I just sketched it in paint without a lot of attention to detail. https://imgur.com/a/7YLMa
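
If you'd rather poke at the phase idea in code than in the graph, here is a minimal sketch. Note that the sine shapes for A and B are my assumption for illustration (the graph's actual functions may differ), and `peak_of_sum` is just a hypothetical helper name.

```python
import math

def A(x, p):
    # Hypothetical stand-in for the graph's A: a sine wave offset by phase p.
    return math.sin(x + p)

def B(x):
    # Hypothetical stand-in for the graph's B: the same sine wave, no offset.
    return math.sin(x)

def peak_of_sum(p, samples=1000):
    # Largest |A + B| over one period, sampled on a uniform grid.
    xs = [2 * math.pi * i / samples for i in range(samples)]
    return max(abs(A(x, p) + B(x)) for x in xs)

in_phase = peak_of_sum(0.0)          # waves line up and reinforce each other
out_of_phase = peak_of_sum(math.pi)  # waves are opposite and cancel everywhere

print(in_phase, out_of_phase)
```

With p = 0 the sum is twice as tall as either wave; with p = pi it is flat up to floating-point rounding, which is the "A + B became flat everywhere" case.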
Hopefully this is helpful and sheds some light on the underlying physics!
(I'm also curious why mapping to the zero function ends up being the central criterion for physical possibility—but I'm guessing that's a rather deep subject ;))
Edit: NVM I see what's going on with 'L(W(x,t)) = 0' now: the equation is equating the full evaluation of the nested functions on the left (using the x,t params) with zero. When written like 'L(W) = 0' it looks like it's equating the function returned by L(W) with zero.
There is a deeper point to be made here that I'm glad you brought up. Functions form a vector space (because they satisfy the vector space axioms, basically because they can be added to each other and scaled by constant multiples). In linear algebra the symbol 0 often does double-duty as the zero vector, which is defined as the vector that doesn't change other vectors when it's added to them. So, here, when I write L(W) = 0, the 0 stands for the zero function f_zero, defined by f_zero(x,t) = 0 for every x and t.
As for why "mapping to zero" has a physical basis, well, it's really more of a thing we're always guaranteed to be able to do. You can always subtract everything on the right-hand side of an equation from both sides! For example, Wikipedia introduces the one-dimensional wave equation as D_t^2 u = a^2 D_x^2 u. I can also write that as L[u] = D_t^2 u - a^2 * D_x^2 u = 0, so L[u] = 0. (In my notation, D_x is the derivative with respect to x, and D_x^2 is the second derivative with respect to x.)
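
Here is a rough sketch of that L as an actual higher-order function, in case seeing it run helps. The finite-difference step `h`, the speed `a = 2.0`, the sample point, and all the function names are my own illustrative choices, and the derivatives are only approximate, so "zero" here means "tiny".

```python
import math

def make_L(a, h=1e-3):
    """Build the operator L[u] = D_t^2 u - a^2 * D_x^2 u.

    Derivatives are approximated with central finite differences of step h.
    """
    def L(u):
        def Lu(x, t):
            d2t = (u(x, t + h) - 2 * u(x, t) + u(x, t - h)) / h**2
            d2x = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h**2
            return d2t - a**2 * d2x
        return Lu  # L takes a function and returns a function
    return L

a = 2.0
L = make_L(a)

def W(x, t):
    return math.sin(x - a * t)       # right-moving wave, solves L[W] = 0

def Q(x, t):
    return math.cos(x + a * t)       # left-moving wave, solves L[Q] = 0

def W_plus_Q(x, t):
    return W(x, t) + Q(x, t)         # the sum should solve it as well

def not_a_wave(x, t):
    return math.sin(x) * math.sin(t) # not a solution when a = 2

x0, t0 = 1.0, 1.0
residual_W = L(W)(x0, t0)
residual_Q = L(Q)(x0, t0)
residual_sum = L(W_plus_Q)(x0, t0)
residual_bad = L(not_a_wave)(x0, t0)

print(residual_W, residual_Q, residual_sum, residual_bad)
```

The first three residuals come out near zero and the last one does not, which is the "physically possible <-> L(W) = 0" test from earlier, checked at a single point.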
The real question is why the addition thing works; if I had to try explaining it I would just say it's fundamental that Maxwell's equations are linear, and when dealing with things that aren't, we usually approximate them with linear functions anyways. That's how gravitational waves emerge from GR, by the way: in the weak-field limit the nonlinear equations behave nearly linearly, and in that approximation the familiar wave equation falls out.
 If you zoom in to a small enough range in the graph of all but the most esoteric functions, the thing on your screen will look like a line. Try it, it's a good intuition to have.
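
A quick numerical version of the zoom-in experiment, as a sketch: I'm using sin(x) around x0 = 1 purely as an example function, and `tangent_line` is a hypothetical helper that evaluates the straight line touching the curve at x0.

```python
import math

def f(x):
    return math.sin(x)  # example function; any smooth function would do

def tangent_line(x, x0):
    # Straight line through (x0, f(x0)) with slope f'(x0) = cos(x0).
    return math.sin(x0) + math.cos(x0) * (x - x0)

x0 = 1.0
widths = [1.0, 0.1, 0.01]  # how far from x0 we look, i.e. the zoom level
gaps = [abs(f(x0 + w) - tangent_line(x0 + w, x0)) for w in widths]

for w, gap in zip(widths, gaps):
    print(w, gap)
```

Each time the window gets 10 times narrower, the gap between the curve and the line shrinks by roughly a factor of 100, so the curve becomes indistinguishable from a line pretty quickly.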
> The real question is why the addition thing works
Unfortunately I'm still at the point where I can't see why it should be surprising that it works. I'm assuming that by the 'addition thing' you are referring to the fact that adding two wave functions always produces another wave function—or maybe it's something about the characteristics of the wave function produced through adding? I'm not sure how linearity plays into things here. Maybe it's surprising that it's possible to form a linear operator (I'm assuming the "operator" you mentioned is this: https://en.wikipedia.org/wiki/Linear_map) for wave functions? I guess not though since it's probably just using the structure of those functions as vectors and it doesn't matter what they're 'about'. Nope, not sure :)