And here's a JS-free rendering of the notebook:
Trying to "beat" this rule is like trying to beat the Pythagorean theorem. If you want to derive higher frequencies, you'll quite obviously have to make additional assumptions about the measured wave.
This point of view also has some advantages in that, if you think about it, you can see how you might play some games to work your way around things, many of which are used in the real world. For instance, if you're sampling a 20 kHz range, you could sample 0-10 kHz with one signal, and then have something that downshifts 40-50 kHz into the 10-20 kHz range, and get a funky sampling of multiple bands of the spectrum. But no matter what silly buggers you play on the analog or the digital side, you can't escape from the pigeonhole principle.
From here we get the additional mathematical resonance that this sounds an awful lot like the proof that there can be no compression algorithm that compresses all inputs. And there is indeed a similarity; if we had a method for taking a digital signal that could have been 500Hz or 50500Hz despite an identical aliased signal, we could use that as a channel for storing bits in above and beyond what the raw digital signal contains; if we figure out it's the 500Hz signal it's an extra 0, or if it's 50500Hz it's a 1. With higher harmonics we could get even more bits. They don't claim something quite so binary, they've got more of a probabilistic claim, but that just means they're getting fractional extra bits instead of entire extra bits; the fundamental problem is the same. It doesn't matter how many bits you pull from nowhere; anything > 0.0000... is not valid.
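To make the 500Hz versus 50500Hz point concrete, here is a minimal sketch. It assumes a 50 kHz sample rate (my choice, picked so that the two tones differ by exactly one sample rate) and shows that the two sample streams agree up to floating-point noise, so the samples alone carry no bit that tells them apart.

```python
import math

def sample_sine(freq_hz, rate_hz, count):
    """Return `count` samples of sin(2*pi*f*t) taken at t = n / rate_hz."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]

RATE_HZ = 50_000  # assumed sample rate; chosen so 50500 Hz folds onto 500 Hz
low = sample_sine(500, RATE_HZ, 200)
high = sample_sine(50_500, RATE_HZ, 200)

# Largest difference between the two sample streams (floating-point noise only).
max_diff = max(abs(a - b) for a, b in zip(low, high))
print(f"max difference between the two sample sets: {max_diff:.2e}")
```

Any scheme that claims to pick the right tone from these samples has to get that information from somewhere other than the samples.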
Of course, one of the things we know from the internet is that there is still a set of people who don't accept the pigeonhole principle, despite it literally being just about the simplest possible mathematical claim I can possibly imagine (in the most degenerate case, "if you have two things and one box, it is not possible to put both of the things in a box without the box having more than one thing in it").
Aliasing is always a factor because no real signal has a highest frequency nor a fixed bandwidth (noise is never zero, and all filters roll off gradually forever).
The reason it's not perfect is because of your example, where a signal is not baseband. The extra leap required to understand that is amplitude modulation and demodulation.
Notice that to reconstruct the original signal of your example, you need to know the samples which are collected following the hypothesis of the sampling theorem, and you ALSO need to know the magic frequency 100MHz so that you can shift up your 2kHz bandwidth. That's the same setting as modulation.
The only concept missing, then, is recognizing that sampling can perform demodulation.
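A small sketch of that last point, under assumptions of my own: a hypothetical 500 Hz offset tone riding on the 100MHz carrier, sampled at 8 kHz (a rate that divides the carrier exactly). The samples come out identical to those of a plain 500 Hz baseband tone, which is the demodulation. Putting the signal back at 100MHz still requires knowing the carrier separately.

```python
import math

CARRIER_HZ = 100_000_000  # the "magic frequency" from the example
OFFSET_HZ = 500           # hypothetical tone inside the 2 kHz band
RATE_HZ = 8_000           # assumed sample rate; 100 MHz / 8 kHz = 12500 exactly

def tone_sample(freq_hz, n):
    """Sample n of cos(2*pi*f*t) at t = n / RATE_HZ.

    The phase is reduced modulo one cycle with integer arithmetic first,
    so the huge carrier frequency does not cost any float precision.
    """
    cycles = ((freq_hz * n) % RATE_HZ) / RATE_HZ
    return math.cos(2 * math.pi * cycles)

passband = [tone_sample(CARRIER_HZ + OFFSET_HZ, n) for n in range(64)]
baseband = [tone_sample(OFFSET_HZ, n) for n in range(64)]

# The sampler has shifted the tone down to baseband all by itself.
print("samples identical:", passband == baseband)
```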
Take a look at Wikipedia: https://en.m.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_samp... it suggests that Shannon himself considered your case an additional point on top of the sampling theorem.
Assuming there's no external source of radiation, I can absolutely guarantee there is no energy propagating at cosmic ray frequencies in a circuit built around an audio op-amp with standard off-the-shelf components.
This isn't just pedantry; you really can't just sample at 2x the corner frequency of your circuit.
(Unless there's some quantum effect that gives a minimum energy level possible for a signal? But even then it would be probabilistic? Is this what you mean? I didn't pay close enough attention in physics class)
This is a common subtlety problem with statistical denoising (aka pattern recognition, aka lossy compression). Commonly the source data and noise models are synthetic formulas (simple sine wave or Gaussian), and knowledge of the data-generation procedure can infect the supposed algorithm. (a subtle form of training your model on your test dataset)
People are able to interpret speech under very high levels of noise due to having strong priors / being able to guess well / having shared state with the speaker.
Of course you should take advantage of whatever priors you have about the signal; we know rather trivially from information science that information gain / transmission rate is maximized when the listener knows as much as possible about the source. This is also key for compressed sensing.
You see analogues of this pattern everywhere in the world: e.g. people learn most quickly when they have some context about the problem space already.
The real question is why the sender would produce a signal with any level of predictability in it, rather than a signal that purely maximizes the effective bit rate of transmission (e.g. minimize redundancy modulo noise). But it's easy to see that physical constraints (e.g. how the human voice works) are at least partly responsible for the enforcement of regularity, at least for many signals that we're interested in in practice.
English, like all other natural languages, is very redundant. We can't change it to remove all (unnecessary) redundancy since they need to both be speakable and understandable by most possible pairs of communicators, and we are constrained by pesky little facts like the physiology of our mouths and vocal cords.
So in a very real sense, you can probably beat the sampling theorem in practice for most real-life signals that we care about, as long as you have the priors and the necessary compute to apply them.
Sampling is a self-contained symbol-transmission technique that is absolutely signal-agnostic. You can throw anything at it, including band-limited noise with no redundancy. As long as there's nothing in the signal above N/2, it works.
As soon as you include redundancy and priors, you're solving a different problem. Your channel is no longer signal-agnostic, and you can use known information external to the signal to reconstruct it. The advantage is you can use less data, but the disadvantage is that your reconstruction system has to make strong assumptions about the nature of the signal.
If those assumptions are incorrect, reconstruction fails.
Technically you could argue that the N/2 limit is a form of prior, and sampling is a special instance of a more general theory of channel transmission systems where assumptions are made.
The practical difference is that N/2 filtering followed by sampling at N is relatively trivial with a usefully general result. More complex systems can be more powerful for specific applications, but are more brittle and can be harder to construct.
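For the signal-agnostic case, here is a rough sketch of what "it works" means: Whittaker-Shannon (sinc) interpolation recovering a value between sample instants. The 8 Hz rate, the 1 Hz test tone, and the truncation to 4001 samples are all assumptions made for illustration; the ideal formula uses infinitely many samples, so the result is only approximate.

```python
import math

def sinc(x):
    """Normalised sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, first_index, rate_hz, t):
    """Whittaker-Shannon interpolation, truncated to the samples we have."""
    return sum(x * sinc(rate_hz * t - (first_index + k))
               for k, x in enumerate(samples))

RATE_HZ = 8.0   # N: assumed sample rate
TONE_HZ = 1.0   # well below N/2 = 4 Hz
HALF = 2000     # keep samples n = -HALF..HALF; truncation limits accuracy

samples = [math.sin(2 * math.pi * TONE_HZ * n / RATE_HZ)
           for n in range(-HALF, HALF + 1)]

t = 0.5 / RATE_HZ  # halfway between two sample instants
estimate = reconstruct(samples, -HALF, RATE_HZ, t)
exact = math.sin(2 * math.pi * TONE_HZ * t)
print(f"reconstructed {estimate:.4f}, exact {exact:.4f}")
```

Nothing in `reconstruct` knows the input was a sine; the only assumption baked in is the band limit.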
If you mean that the theorem isn't actually wrong, then I agree: the proof works and you can't actually avoid its conclusions given its assumptions.
But I think we can agree that for many (most?) practical symbol-systems in their particular contexts, the signals are actually highly redundant as viewed against a particular basis, so slavishly applying the conclusions of the sampling theorem will cause you to miss the possibility of side-stepping its assumptions entirely.
Whether you're really in a position to take advantage of the latent priors in the signal very much depends on the tools at your disposal; obviously you will not be extracting speech with smart priors if all you have is analog filters.
In the information theoretic sense, this just means there is enough redundancy in the signal that you can error correct the noise. If you’re in a noisy room talking to a coworker, your shared context and priors are really redundant bits of information. If she says something about “nachos” but you couldn’t tell if she said “nachos”, “tacos” or “broncos”, the nachos next to you add redundancy to the signal.
So in that context, what you’re saying is that we should take advantage of any redundancy we can find in an incoming signal.
“For example we know that the neural recordings of interest are a superposition of pulse-like events – the action potentials – whose pulse shapes are described by just a few parameters. Under that statistical model for the signal one can develop algorithms that optimally estimate the spike times from noisy samples. This is a lively area of investigation.”
But the article is about an article that tries to calculate and subtract the thermal noise, which is not nice at all.
"Arrogance" literally is akin to "to ask", e.g. interrogate ... unless the "a-" is akin to "anti" instead of "ad-" or whatever, so that arrogance would be like waltzing in and taking without asking or making an assumption without due inquiry.
About all the other things, you want to follow the field of Applied Algebraic Topology: people such as Robert Ghrist, Shmuel Weinberger, Herbert Edelsbrunner, Gunnar Carlsson, and Justin Curry, whose thesis was on applied sheaf theory (it's nice and readable). Mostly follow the collaborations they take part in, since they are in good places in the citation graph (OTOH Carlsson's Ayasdi Inc. is sadly very much vapourware). Also, in over a decade of very active research in this area, progress is being made mostly on the applicable algebraic topology part, not necessarily the applied part.
I'm not entirely sure what this means in terms of applied mathematics. Does it mean applied mathematicians produce these papers but applied fields like EECS, CS, biology etc never use them?
This is quite clearly (2).
> Effectively they can replace the anti-aliasing hardware with software that operates on the digital side after the sampling step
Sampling with filtering in the digital domain after sampling is absolutely old hat, found in everyday tech.
Let's look at audio. We can sample audio at 48 kHz, and use a very complicated analog "brick wall" filter at 20 kHz.
There is another way: use a less aggressive, simple filter, which lets through plenty of ultrasonic material past 20 kHz. Sample at a much higher frequency, say 384 kHz or whatever. Then reduce the resolution of the resulting data digitally to 48 kHz. There we go: digital side after sampling step.
This is cheaper and better than building an analog filter which requires multiple stages/poles that have to be carefully tuned, requiring precision components.
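A bare-bones sketch of that oversample-then-decimate chain, in plain Python. The filter design here (a 255-tap Hamming-windowed sinc with a 22 kHz cutoff) is my own assumption for illustration, not what any particular converter does. It feeds in a 1 kHz tone and a 30 kHz tone at 384 kHz and reduces each to 48 kHz.

```python
import math

HIGH_RATE = 384_000  # oversampled rate from the comment
FACTOR = 8           # 384 kHz / 8 = 48 kHz
CUTOFF_HZ = 22_000   # assumed cutoff, between 20 kHz and the 24 kHz Nyquist limit
TAPS = 255           # assumed filter length

def lowpass_taps(cutoff_hz, rate_hz, count):
    """Hamming-windowed sinc low-pass FIR, normalised to unity gain at DC."""
    mid = (count - 1) / 2
    fc = cutoff_hz / rate_hz  # cutoff in cycles per sample
    taps = []
    for i in range(count):
        x = i - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (count - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def decimate(signal, taps, factor):
    """Filter, then keep every `factor`-th output (only kept outputs are computed)."""
    last_start = len(signal) - len(taps)
    return [sum(h * signal[start + k] for k, h in enumerate(taps))
            for start in range(0, last_start + 1, factor)]

def tone(freq_hz, count):
    return [math.sin(2 * math.pi * freq_hz * n / HIGH_RATE) for n in range(count)]

taps = lowpass_taps(CUTOFF_HZ, HIGH_RATE, TAPS)
audible = decimate(tone(1_000, 3840), taps, FACTOR)      # in band: should survive
ultrasonic = decimate(tone(30_000, 3840), taps, FACTOR)  # would alias to 18 kHz if unfiltered

print("peak of 1 kHz tone after decimation:", round(max(map(abs, audible)), 3))
print("peak of 30 kHz tone after decimation:", round(max(map(abs, ultrasonic)), 5))
```

Note that the gentle analog filter is still needed in front of the 384 kHz sampler; the digital filter only handles what lies between 24 kHz and 192 kHz.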
> (2) existing idea spun as something new
The existing idea is oversampling and is often spun as beating the sampling theorem. But this is unlike the airless tires example, because, while oversampling works, it isn't beating the sampling theorem but rather based on it.
You can filter after sampling, but that filter is not a replacement for the analog anti-aliasing filter. To avoid aliasing you have to somehow limit the bandwidth of your input.
It seems that because the signal and noise characteristics are known (in the original article), the noise can just be "averaged away".
Let us consider a periodic signal (though, I believe, this assumption can be weakened). If we have to assume that all frequencies from the inverse of the period length up to the Nyquist frequency might occur in the signal, the sampling theorem gives us the information that we need.
Now some "practical" consideration: Assume that the signal that we want to sample is very "well-behaved", e.g. we know that lots of its low frequencies are 0 or near 0 (or in general we have some other equation that tells us what the signal has to look like). So if we reconstruct the frequencies of the signal, but some value other than 0 appears in these Fourier coefficients, we know that it has to come from "noise" that originates in some frequency above the Nyquist frequency.
My mathematical intuition tells me that it is plausible that such a trick might be used to reconstruct the signal exactly even if we sample at less than 2 times the highest occurring frequency. Why? Because we know more about the signal than the sampling theorem assumes. So the "wrong" reconstruction, whose existence the sampling theorem guarantees whenever a frequency higher than the Nyquist frequency occurs, is not important for us, since we can plausibly show that this signal cannot have 0s in the "excluded low frequencies" Fourier coefficients.
This would explain to me why such an impossible looking algorithm works so well in practice.
It is very plausible to me that such a trick is already used somewhere in engineering.
If your signals are not "full entropy" or "white noise", you can do all sorts of funny tricks to increase the SNR to some extent. These tricks sure are nice but they veil the true issue at hand which is about information and capacity.
I think the author here is clearly correct, I just don’t think their $1,000 offer does anything to demonstrate it.
What is the "unstable" truth?
What is the "proof by incentive"?
From what I've seen people in the DSP community are very anal about the inviolability of the sampling theorem. So charging $10 may be his way to punish novice "fools" who think they know better.
I'm not sure I disagree.
The sampling theorem is exactly the same thing but for trigonometric polynomials.
They have simply shifted in time the sampling of the noise. Even if they argue they synthesize the noise in realtime, they had to have sampled it at some point to know what to synthesize in the first place. It was at this point that they were required to sample at at least 2MHz to accurately quantify their noise at 1MHz.
The mechanism used in pilots' noise cancelling headphones is much the same. They don't sample the engine noise as it's being made; they synthesize it from known frequencies.
That said, his takedown of their approach makes sense to me. It's hard to really judge it though as my signal processing is getting _really_ rusty these days. I guess that's why we have professors. :)
See https://en.wikipedia.org/wiki/Sparse_dictionary_learning and https://arxiv.org/abs/1012.0621 for the gory details.
The trick works only because the signal is spike like. I reckon it will miss some of the spikes in real life on real signals or produce spurious spikes. The statistics are not accurate at any given point in time just in the limit.
During my university period, I had to digitally sample an ECG (electrocardiogram) signal. Since the shape of the signal is important in diagnosis and must be preserved, I used a Bessel filter.
The signal may be "noise-like", but it should be a noise that has a distinct bandwidth from the "real noise".
The antialias filter just reduces the amplitude of the data collected (signal plus noise) in the exact frequencies that pertain to the bandwidth of the "real noise".
I.e., you have to guarantee that your desired signal has a different bandwidth than the noise. In that case, thermal noise must have this property in comparison to neuron signals.
PS: English is not my first language. Sorry about any grammar or orthographical errors.
This is what we do in high-quality computer graphics rendering.
Suppose you want to reconstruct a signal, showing only the frequencies < N Hz.
With an analog filter, you filter out everything above N Hz. Then you can sample at the Nyquist rate of 2N Hz, and then perfectly reconstruct the filtered signal.
Alternatively, without the analog filter, you can take some number of samples M >= 2N. The samples are randomly placed/timed. You accumulate the samples at the 2N locations, but you weight the sample when you accumulate it with some filter function. This effectively convolves and filters the signal with a low-pass filter.
Because of the random sample placement/timing, high frequencies get converted to noise instead of causing aliasing.
Admittedly you are taking more than 2N samples; however, you don't have to do any analog filtering, and you can still reconstruct the bandlimited signal. Your reconstruction will have some noise, but the noise will decrease as you take more samples (e.g. as M increases).
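A toy version of this, with assumptions of my own: a 73 Hz tone, 8 output locations per second, 400 random samples per location, and a plain box filter as the weighting function (real renderers typically use smoother filters). Regular point sampling turns the tone into a full-strength 1 Hz alias, while the randomly timed, accumulated samples leave only low-level noise.

```python
import math
import random

OUT_RATE = 8   # output locations per second ("2N" in the notation above)
TONE_HZ = 73   # far above the 4 Hz limit; 73 = 9 * 8 + 1, so it aliases to 1 Hz
PER_BIN = 400  # random samples accumulated per output location (assumed)

def signal(t):
    return math.sin(2 * math.pi * TONE_HZ * t)

# Regular point sampling: the 73 Hz tone shows up as a clean, full-amplitude alias.
regular = [signal(k / OUT_RATE) for k in range(OUT_RATE)]

# Stochastic sampling with a box filter: average randomly timed samples
# that fall inside each output interval [k / OUT_RATE, (k + 1) / OUT_RATE).
rng = random.Random(0)  # fixed seed so the run is repeatable
stochastic = []
for k in range(OUT_RATE):
    total = sum(signal((k + rng.random()) / OUT_RATE) for _ in range(PER_BIN))
    stochastic.append(total / PER_BIN)

print("regular sampling peak:   ", round(max(map(abs, regular)), 3))
print("stochastic sampling peak:", round(max(map(abs, stochastic)), 3))
```

Raising `PER_BIN` shrinks the residual noise roughly as one over its square root, which is the "noise will decrease as M increases" behaviour.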
"That filtering process, because it happens in the analog world, requires real analog hardware"
are wrong or misleading.
"Filtering means removing the high frequency components from the signal"
Sure, it can mean that. To claim it always and only means that, even in the context of periodic waveforms, makes me question the author's understanding of the subject.