
Show HN: Sound visualisation, better than FFT (iOS) - avaku
https://vsound.app
======
avaku
Dear HN!

While the post is still somewhere around the front pages, I wanted to ask what
you think about the following potential applications for monetisation:

1) Desktop app with various visualisations suitable for displaying on screens
at bars / clubs. Charge a subscription for extra (good) visualisations (made
with proper graphics, not what I did).

1.a) Same but for making videos for YouTube etc.

2) Phone/Tablet app that can consume sound from other apps. Charge for extra
2D/3D dynamic visualisations (without the user being able to move around), a
one-time fee per extra visualisation.

3) 3D environments on Phone/Tablet, where things "vibrate to music", like
leaves on the trees, waterfalls, etc. Charge for extra 3D environments. Very
hard to make, but possible.

4) Apply machine learning to extract components of the tracks, and allow
mixing different components from different songs using a new kind of user
interface. Extremely hard, but potentially could lead to a DJ revolution, if
successful.

5) Any other ideas?

Thank you!

~~~
sova
I love that you released a fully functional free version. I think you could now
release a separate [pro] version that would somehow replace your lock screen
with these awesome visuals. People would pay a one-time fee of $2-$5 for that
and would happily support you and your work. Great endeavor, keep the music
coursing!

~~~
avaku
Thanks for your suggestions!

------
avaku
I also thought this might be useful for people who have problems perceiving
some frequencies (maybe all?), to just see the sound. I am not sure of the
correct way to refer to people who have trouble hearing, but I am really
wondering if this could help someone, and I'd like to help make it better.
(edit: typos)

~~~
andai
I had the same idea. I was looking at some spectrograms of a podcast in
Audacity and I began to recognize different people's voices and frequently
used words. So I realized that with enough effort it would be possible to
learn to read in this way.

I also like looking at spectrograms of birdsong. I had a setup where I would
see the spectrogram of sound coming through my microphone in realtime. I would
place it on the windowsill and a few times I would see very faint birdsong I
hadn't noticed with my ears, and then I would "tune in" my ears to that
frequency range and I would hear it.

Thinking about it, the ideal form factor for a realtime spectrogram for the
hard of hearing (or the curious) would be something like Google Glass.
(I originally thought of a smartwatch, but that doesn't make much sense for an
additional sense.)

There's a project that's the opposite of this where a colorblind man attached
a sensor to his head that converts color into sine waves he can hear. Adam
Montandon. Eventually this electronic sense became a natural part of his
experience and he began to dream in "color".

~~~
avaku
I am not sure it would be possible to visually recognise speech, but it might
help people with hearing difficulties to perceive a concert better or
something.

Funny thing about the birds! Yesterday, as I was preparing to release the app,
birds were singing outside my window (coronavirus lockdown, so it was quiet),
and I noticed how fun it was to look at their songs in my app! I spent
literally 30 minutes looking at that :) I've posted a link to a birdwatching
subreddit :)

------
panpanna
The part titled "the science bit" on their homepage is completely
unscientific.

Am I looking at the wrong place? Can someone explain the math used here?

~~~
avaku
I haven’t published the details of the algorithm. The science bit is supposed
to give some idea, but not details. I know people want the details straight
away, but I don’t work in academia, so my goal is not to make a publication,
but to gauge interest in applications of this.

~~~
panpanna
I was more hoping for something like "we use DFT instead of FFT because
reasons".

No need to publish a scientific report.

~~~
avaku
Ok, I'll try. We use a Wavelet Transform instead of FFT because it provides
suitable resolution across all frequencies. P.S. I realised that I had
implemented a Wavelet Transform only after I'd already done it :)
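(A rough numpy sketch of the resolution point, not the app's code: a Morlet wavelet spans a fixed number of cycles, so its window shrinks as frequency rises, giving time resolution for highs and frequency resolution for lows, unlike a fixed-window FFT.)

```python
import numpy as np

def morlet_coeff(signal, fs, freq, n_cycles=6.0):
    """Response of one Morlet wavelet centred mid-signal.

    The wavelet spans a fixed number of cycles, so its envelope
    shrinks as frequency rises: fine time resolution for highs,
    fine frequency resolution for lows.
    """
    sigma_t = n_cycles / (2 * np.pi * freq)           # envelope width in seconds
    t = np.arange(len(signal)) / fs
    t = t - t.mean()                                  # centre the wavelet
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2 * sigma_t**2))
    wavelet /= np.abs(wavelet).sum()                  # crude normalisation
    return np.vdot(wavelet, signal)                   # complex coefficient

fs = 8000
t = np.arange(fs) / fs                                # 1 second of audio
tone = np.sin(2 * np.pi * 440 * t)                    # a pure A4

mags = {f: abs(morlet_coeff(tone, fs, f)) for f in (220, 440, 880)}
# The 440 Hz wavelet responds far more strongly than its octave neighbours.
```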

~~~
memming
You've been retroactively scooped!

------
avaku
Hi! I made this app in my spare time off work... I described the motivation on
the website. Please let me know any feedback! Would really appreciate it.

~~~
ablerman
I'm curious how this differs from wavelet analysis? Is this a wavelet
application with the wavelet function optimized by AI?

~~~
avaku
I figured out that I'd implemented a real-time wavelet transform while I was
doing this... :) I tried and couldn't find any good apps that would use it to
display real-time analysis of sound, so I made one. Actually, I was working on
AI to generate music, but I haven't got to the AI bit yet, and decided to
release this quickly, just for fun, to see what people think. I think much
better visualisations can be made from the output of this algo...

~~~
fluffything
Could you describe more in detail how your algorithm differs from a wavelet
transform, if at all? The website doesn't say.

You mention that your wavelet transform is "real-time", but the normal wavelet
transform already is real-time... or what am I missing?

~~~
avaku
I guess you're missing the detail of how it's actually implemented in
practice. The short description would be just "wavelet transform", I guess. The
details are in how to implement it efficiently so that it can run on the phone
in real-time, and draw the visualisation at the same time, without overwhelming
the CPU (plus the additional feature of high-precision detection, which is not
solved by the wavelet transform itself). I will describe it in more detail when
I open source the algorithm (after I've implemented something on top of it that
I could potentially start selling).

~~~
fluffything
I see, thanks. FWIW, if your phone has an APU or GPU, it is quite easy to
implement with CUDA. It's pretty much a massively parallel problem.

~~~
avaku
That would make it hard to target different phones... I am just using SSE/AVX
(via Apple's Accelerate framework) for now for some parallelisation; it helps
to speed up the detection ~10x.

------
dchyrdvh
It's a topic that's been fascinating me for a while. Sounds, speech and music
"produce" images in some sense, so visualizing sound on a flat screen is a
meaningful thing to do. That parallel with the Heisenberg principle is a deep
observation, I think. A few random thoughts. Colors correspond to notes: a
clear red is like a clear "do" note. Volume is like the size of the shape.
Repetitive patterns correspond to symmetry. And there's that observation that
there's no such thing as a "moment" of sound; it's always the past few seconds
of sound that make it sound like music. I think this corresponds to multiple
image layers with exponential decay: a sound pattern from the past would look
like a faint decaying background image that eventually disappears. I wish I
could study this topic full time, but in our society you either make money or
work on interesting things: the Heisenberg principle has influence even here.

~~~
avaku
These are very interesting thoughts; they remind me of when I’m deeply
thinking about how a track/song is represented in my mind and trying to
visualise it. (With the exception of red being a “do” note; I think that’s
because I’m red-green colourblind.) Sometimes I try to imagine much more
complex visualisations that leave imprints of past music that then get
activated again. At this stage it’s more like dreaming than an actual
possibility that I could already implement. But it’s nice to dream. I agree
about society... That’s why I did this app in my free time off work. I could
only spend 1 hour per every 2 days on it. Imagine the possibilities if I (and
other people) had more time for creative work...

------
fg6hr
Cymatics. It's sound interference drawing some peculiarly complex, but cool
looking patterns.

Edit: I've tried to find the actual math behind those pictures and only found
piles of pseudoscience. There's a CymaScope app that draws them, but they are
super secretive about it. I suspect that it's just the interference pattern in
a Tibetan bowl or a cup of tea. It's almost suspicious: Wikipedia has detailed
scientific articles with hardcore math on dumb topics, but not only Wikipedia,
the entire math community seems to carefully avoid this one.

Edit2: I think the math behind this is
[https://en.wikipedia.org/wiki/Normal_mode](https://en.wikipedia.org/wiki/Normal_mode).
In this case, the picture can likely be derived by numerically solving the
differential equation of a sound wave with a boundary condition on a circle.

Edit3: This led me to
[https://en.wikipedia.org/wiki/Vibrations_of_a_circular_membr...](https://en.wikipedia.org/wiki/Vibrations_of_a_circular_membrane)
with Bessel equations and all the good stuff. Solving it (numerically) would
supposedly create the cool picture of the 432 Hz note.

Edit4: And I've returned to where I began: cymatics, cymaglyphs and that
CymaScope. It gives an impression of lazy pseudoscience at first, because of
the somewhat sloppy language they use, but after watching a John Stuart Reid
presentation (watch it, it's only 40 mins), I had to change my opinion. They
indeed capture the interference patterns created by sound in a bowl of water,
and a picture of that rapidly moving pattern is called a "cymaglyph". I'd say
visualizing sound and music is a solved problem, and the solution is called
cymaglyphs. Obviously, an app can't use a bowl of water to solve the diff
equations, but it can do that numerically with decent precision, and apply
some smoothing techniques to deal with the rapid evolution of these
cymaglyphs.

~~~
avaku
Hi! Thanks for your comment! It’s interesting. I don’t think there is any
magic about cymatics per se: it depends on how sound interacts with physical
objects, and these objects have certain resonant frequencies, which is when
they start to vibrate “interestingly”. I like it as a potential visualisation
idea though!

~~~
dchyrdvh
That above isn't just an idea, it's the idea of sound visualization. It's a
bijective projection between sound and shape that turns nice sounds into nice
shapes and vice versa. The analog sound visualizer would be an air-filled
glass sphere put next to speakers, if only we could see air. All you need is a
performant software simulation of it. The only problem to solve here is
visualizing the rapidly changing patterns in a comprehensible way. P.S.
Actually, we can see air patterns if we fill that glass sphere with dust.
Edit: In fact, that's been done already. Search for "Yantara Jiro voice made
visible". They used a cymascope (a water bowl in lab conditions) and captured
the formed patterns with a fast camera. A water bowl is indeed a great music
visualizer.

~~~
avaku
One could imagine making a “fake” cymatic visualisation that still looks
interesting!

~~~
dchyrdvh
Some kind stranger has explained the numerical solution and even provided code
in Python:

[http://hplgit.github.io/wavebc/doc/pub/._wavebc_cyborg001.ht...](http://hplgit.github.io/wavebc/doc/pub/._wavebc_cyborg001.html)

We only need to apply it to a circle and rewrite the solver in C. This will
give us a visualizer for a single frequency.

Then we run FFT, run the solver for all frequencies, observe that the final
solution is a linear combination of the individual solutions and apply the
rainbow coloring to corresponding frequencies.

This solver needs to be kinda fast to run at 15 fps, but luckily, different
frequencies can be solved in parallel. Most likely, changing the input
frequency a bit will change the solution only a little, and so we could
pre-solve the frequency range with sufficient density, cache the results and
rapidly derive actual solutions by interpolation.

Bonus points for using complex numbers. The boundary condition in a singing
bowl is u(x0, t) = A sin(Bt) where x0 denotes the circular boundary. Since
real numbers are boring, we could expand the problem into the complex plane:
u(x0, t) = A exp(iBt). In this case the solution u(x, t) would be in the
complex plane also, where the absolute value |u| is the amplitude, or pixel
opacity in our visualization, and the angular coordinate arg u could be, say,
the color of that pixel.
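A quick toy version of that complex-valued setup (a radially symmetric slice rather than the full 2-D membrane, with made-up wave speed, grid and drive frequency; the real bowl problem needs the Bessel modes above):

```python
import numpy as np

# Toy radially symmetric membrane, driven at the rim by A*exp(iBt).
# All parameters (grid size, wave speed, drive frequency) are invented
# for illustration; this only shows the |u| / arg(u) mapping idea.
c, R, nr = 1.0, 1.0, 200
dr = R / nr
dt = 0.4 * dr / c                       # respect the CFL condition
r = np.linspace(dr, R, nr)              # avoid the r = 0 singularity

A, B = 1.0, 2 * np.pi * 3.0             # rim amplitude and angular frequency
u_prev = np.zeros(nr, dtype=complex)
u = np.zeros(nr, dtype=complex)

for step in range(4000):
    lap = np.zeros(nr, dtype=complex)
    # radial Laplacian: u_rr + u_r / r on the interior points
    lap[1:-1] = ((u[2:] - 2 * u[1:-1] + u[:-2]) / dr**2
                 + (u[2:] - u[:-2]) / (2 * dr * r[1:-1]))
    u_next = 2 * u - u_prev + (c * dt)**2 * lap   # leapfrog time step
    u_next[0] = u_next[1]                         # zero slope at the centre
    u_next[-1] = A * np.exp(1j * B * (step + 1) * dt)   # driven rim
    u_prev, u = u, u_next

opacity = np.abs(u)                     # |u| -> pixel opacity
hue = np.angle(u)                       # arg u -> pixel colour
```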

~~~
avaku
This certainly seems possible. But I don't know how you would display all of
the frequencies that are present in music _at the same time_. Usually, they
only play one or two frequencies and look at the result. Mash them all
together? Maybe it can be tried in the future. I was thinking of doing
something "inspired by" this cymatics instead (when I get around to it).
Thanks for all the interesting thoughts!!

~~~
dchyrdvh
I'm pretty sure that the interference pattern from a sum of two waves is the
sum of the individual interference patterns. This should follow from the
linearity of the wave equation.

The real physical solver doesn't do FFT, though. It makes the boundary circle
vibrate with the input sound wave and effectively solves the wave equation
where the boundary condition is u(0,t)=f(t) - the input sound.

I've run some calculations suggesting that solving the wave equation in real
time would be infeasible. The convergence depends on the Courant number, which
basically says that the grid step dx must be c*dt, i.e. the sound must travel
one grid step per time step. Since sound travels at 1.5 km/s in water, the
grid needs to be super dense, and 1 second of sound would need around a
petaflop of calculations.
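A back-of-envelope version of that estimate (bowl size, grid resolution and per-cell stencil cost are all assumptions, so take the exponents loosely):

```python
# Rough FLOP estimate for simulating a water-filled bowl in real time.
c = 1500.0                  # speed of sound in water, m/s
bowl = 0.15                 # assumed bowl diameter, m
dx = 0.0005                 # grid step fine enough to resolve the pattern
dt = dx / c                 # CFL: sound crosses one grid step per time step

steps_per_second = 1.0 / dt             # ~3 million time steps per second
cells_2d = (bowl / dx) ** 2             # just the 2-D surface grid
cells_3d = (bowl / dx) ** 3             # the full water volume
flops_per_cell = 10                     # rough cost of one stencil update

flops_2d = steps_per_second * cells_2d * flops_per_cell   # teraflop scale
flops_3d = steps_per_second * cells_3d * flops_per_cell   # petaflop scale
```

Even the 2-D surface alone lands in the teraflop range per second of audio, and the full volume is near a petaflop, consistent with the infeasibility claim above.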

------
givinguflac
This is really cool, I think the linear and circle visualizations are my
favorite. I love how much more detail this provides vs fft! You can really see
the “feel” of the music. Nice work.

~~~
avaku
Thank you! I am really happy!!

------
jmenter
This app is really neat, and I'm curious about what your app is doing beyond
FFT analysis.

(I spent like 15 minutes playing a 'simulated' church organ and watching the
frequency pattern with the dots. It appears that each line is an octave with
the leftmost dot in each row tuned to a G. Super fun!)

Edit: I'm thinking how cool it would be to add some kind of visualization like
you see here: [https://paveldogreat.github.io/WebGL-Fluid-
Simulation/](https://paveldogreat.github.io/WebGL-Fluid-Simulation/)

~~~
avaku
Thanks so much for looking! You’re right, all the visualisations apart from
the first one arrange the notes in octaves (12 notes). I’m starting to work on
better 2D and 3D visualisations. Unfortunately, Apple recently deprecated
OpenGL support, so I’m trying something that’s only supported by Apple now
(Metal), which means it would have to be Apple-only. I’m keen to make some
3D visualisations... I’m imagining something like a 3D environment where you
could “walk” and watch things react to music, like tree leaves and waterfalls!
:) The app is basically just “better” at detecting notes (frequencies) than
FFT; it’s basically doing an online wavelet transform. Thanks for your
feedback!

~~~
jmenter
Would you consider releasing the core frequency detection module? (I'm
assuming it's in C++?) I'm also an iOS developer and it would be fun to play
around with it.

Is the chromatic scale you align your visualizations to just or equal
tempered?

~~~
avaku
It’s an equal temperament scale (each note’s frequency is 2^(1/12) above the
lower one) with 12 notes per octave; have a look at the user guide for more
details. Many people have asked for an open sourced version of the algo. I
want to do it. I just need to think a little about how to do it properly, so
that I can still build something on top of it to “monetise” what I’ve been
doing for a year now... (Everyone wants to do something enjoyable in their
life.) I’ll try to do it soon! Subscribe to the email list, and I’ll let
everyone know.
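(The scale is easy to check numerically; A4 = 440 Hz below is the conventional reference pitch, an assumption for the example rather than anything the app states.)

```python
# Equal temperament: each semitone multiplies frequency by 2**(1/12),
# so 12 steps give exactly one octave (a doubling of frequency).
A4 = 440.0
SEMITONE = 2 ** (1 / 12)

def note_freq(semitones_from_a4):
    """Frequency of the note n semitones above (or below) A4."""
    return A4 * SEMITONE ** semitones_from_a4

octave_up = note_freq(12)    # 880.0 Hz, one octave above A4
fifth_up = note_freq(7)      # ~659.26 Hz (E5)
```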

~~~
Optimal_Persona
If you don't already, you should check out the DSP forum at:
[https://www.kvraudio.com/forum/viewforum.php?f=33](https://www.kvraudio.com/forum/viewforum.php?f=33)

Some of the world's finest audio devs hang out there - Urs Heckmann/u-he, Andy
Simper/Cytomic, Alexsey Vaneev/Voxengo, Dave Gamble/DMG - and so do a bunch of
amateurs and rand0s. Good discussions about both classic/novel audio DSP
algorithms, and the challenges of monetization.

~~~
avaku
Thanks a lot! I've been to that website before but had forgotten about it.
I'll try to post there and ask for a discussion :)

------
bloomer
This is really cool. Have you looked at all at the constant Q transform
[https://en.m.wikipedia.org/wiki/Constant-
Q_transform](https://en.m.wikipedia.org/wiki/Constant-Q_transform), which is
basically a Fourier Transform with logarithmically spaced bins so that it can
be aligned with traditional musical notes? Miller Puckette (Max/Pure Data),
among others, has a fast algorithm for its calculation.
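(A tiny sketch of what "logarithmically spaced" means in practice; f_min, the bin count and the sample rate are arbitrary choices for illustration.)

```python
import numpy as np

# Constant-Q layout: bin bandwidth scales with centre frequency, so the
# quality factor Q = f / bandwidth is the same for every bin. With 12
# bins per octave the centres land exactly on equal-tempered notes.
bins_per_octave = 12
f_min = 55.0                        # A1, an assumed starting note
n_bins = 48                         # four octaves

k = np.arange(n_bins)
centres = f_min * 2 ** (k / bins_per_octave)
Q = 1 / (2 ** (1 / bins_per_octave) - 1)        # ~16.8 for 12 bins/octave
window_lengths = np.ceil(Q * 44100 / centres)   # samples at 44.1 kHz

# Low bins get long windows (fine frequency resolution) and high bins
# get short ones (fine time resolution): the same trade-off a wavelet
# transform makes.
```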

~~~
avaku
No, I haven't seen this; it's _very_ related. I guess if you wanted to use
this in real time, you would still need to solve the question of window
size...

------
avaku
If you want to get NOTIFIED about the DESKTOP app, so you can take screen
recordings to make videos with your music, please SUBSCRIBE to the email list
on the app page! I will make the same version as on iOS available as a DESKTOP
app for free as well. (Plus the visuals look much better if you feed sound
directly from the sound interface rather than the microphone.)

------
undershirt
A question comes to mind: What would be preserved if converting the visuals
back into audio?

This may help answer another question: What type of visual would allow feeling
the music without hearing?

~~~
klodolph
In general, one of the problems with the spectrogram is that pure frequencies
will get smeared across a few buckets, due to the finite window size.
Similarly, transients will get smeared across a few windows. These two are at
odds—no free lunch. If you increase the frequency resolution, you decrease the
time resolution, and vice versa.

If you are using a normal FFT this is no problem. You can reconstruct the
original signal with a very small amount of error. However, this works because
the FFT preserves phase data for each of the buckets. The spectrogram does not
preserve phase data, so it will really mangle things.

(I’ve tried this, but it’s been a while. You get a sound which is
recognizable but total garbage otherwise.)
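(The phase point is easy to demonstrate with numpy; the magnitude-only inverse below is a stand-in for what a magnitude spectrogram throws away, not a full spectrogram inversion.)

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)           # any test signal

X = np.fft.rfft(x)

# Keeping phase: the inverse FFT recovers the signal almost exactly.
x_full = np.fft.irfft(X, n=len(x))

# Spectrogram-style: keep only magnitudes, discard all phase.
x_mag_only = np.fft.irfft(np.abs(X), n=len(x))

err_full = np.max(np.abs(x - x_full))   # numerical noise only
err_mag = np.max(np.abs(x - x_mag_only))  # large: the signal is mangled
```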

~~~
avaku
There is a trick to do this :) If you think about it, you don't need
high-resolution information about low frequencies; they don't change that
fast. The lower the frequency, the fewer (complex) data points you need to
reconstruct the original sound, because the lower frequencies don't change as
fast over time. You wouldn't be able to do this with FFT, which is what I
started out with, but it was unsuccessful. So I had to "invent" the new
algorithm, which I later figured out is equivalent to a Wavelet Transform. The
only new thing I've done is to make it "real-time", and I have some extra math
for "high-precision" frequency detection, which deals with the problem of
frequency leakage post factum (I am willing to reveal it has something to do
with rotating complex numbers :)).
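(A generic illustration of that multirate idea, not the app's actual, unpublished algorithm: if each lower octave is analysed at half the sample rate of the one above, the total number of analysis samples stays bounded.)

```python
# Hypothetical dyadic scheme: halve the analysis rate per octave going
# down, since low frequencies change slowly. The totals form a geometric
# series, so all octaves together cost less than 2x the top octave.
fs = 48000
seconds = 1.0

samples_per_octave = []
rate = fs
for octave in range(8):                 # from the top octave downward
    samples_per_octave.append(int(rate * seconds))
    rate //= 2                          # decimate by 2 for the next octave

total = sum(samples_per_octave)         # < 2 * fs for any octave count
```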

~~~
ssfrr
I'd definitely love to hear more about your technique once you're willing to
say more about it. I'm really interested in time-frequency representations in
general.

One way to get high frequency resolution from the STFT or Wavelet transform is
to use the phase derivative within each frequency band, which is usually
called "instantaneous frequency", and is closely-related to the phase-vocoder
(different from the daft-punk-famous vocoder).

The main issue with this kind of instantaneous phase estimate is that it
assumes there's only one dominant frequency within each band - does your
method improve on that?
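(For reference, the instantaneous-frequency estimate itself can be sketched in a few lines of numpy; all sizes are illustrative, and it indeed assumes one dominant partial per bin, which is the caveat above.)

```python
import numpy as np

# Phase-vocoder style refinement: the phase advance of one STFT bin
# between two hops pins down the frequency far better than the bin width.
fs, n_fft, hop = 8000, 1024, 256
true_f = 442.7                          # deliberately between bins
t = np.arange(n_fft + hop) / fs
x = np.sin(2 * np.pi * true_f * t)

win = np.hanning(n_fft)
X0 = np.fft.rfft(win * x[:n_fft])
X1 = np.fft.rfft(win * x[hop:hop + n_fft])

k = np.argmax(np.abs(X1))               # dominant bin
bin_f = k * fs / n_fft                  # coarse estimate, a few Hz off

# Phase advance between frames, with the bin's expected advance removed
expected = 2 * np.pi * hop * k / n_fft
dphi = np.angle(X1[k]) - np.angle(X0[k]) - expected
dphi = (dphi + np.pi) % (2 * np.pi) - np.pi      # wrap to [-pi, pi)
inst_f = (expected + dphi) * fs / (2 * np.pi * hop)  # much closer to true_f
```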

~~~
avaku
The core algo is basically like a vocoder (I think), tracking certain
frequencies, and as others pointed out it might be very, very similar to
[https://en.m.wikipedia.org/wiki/Constant-
Q_transform](https://en.m.wikipedia.org/wiki/Constant-Q_transform) (I guess
another question is how to implement this efficiently in "real-time").

About the question of "assuming there is only one frequency within each band"
-- yes, it's a problem, and that's what's causing the "frequency leakage" that
you see in the app with "high-precision: off" (and FFT). To solve this, I've
added some extra math on top, which is available via "high-precision: on" in
the app. I will release more details when I open source the algo. Please
subscribe to the email list if you want to receive an update; I am not super
active on social media otherwise :).

------
ssfrr
Really nice visualizations - they feel really snappy. I really like in the
linear visualization how you can see the kick drum hits move down the
spectrum!

~~~
avaku
Yeah, this is the first one I noticed, and thought was really cool :)

------
dr_dshiv
I'm interested in scalograms and continuous wavelet transforms. Here's
something from a few years back

[http://stevehanov.ca/blog/?id=22](http://stevehanov.ca/blog/?id=22)

~~~
avaku
The pictures you have there look similar to what I see when I visualise the
spectrum over time (some examples here: [https://vsound.app/high-precision-in-
frequency-domain.html](https://vsound.app/high-precision-in-frequency-
domain.html)). I found that white noise was quite interesting too:
[https://www.instagram.com/p/BzF1f80AEvH/?utm_source=ig_web_c...](https://www.instagram.com/p/BzF1f80AEvH/?utm_source=ig_web_copy_link)

------
drawbars
Would love to be able to export videos using this, is that possible in the
future? I have a bunch of original music that I'd like to upload with visuals
like this. Nice work

~~~
avaku
Thanks! I'm also planning to release the desktop app (for Apple first), and it
would be quite easy to make actually, since I have decided not to charge
anything for it :) I actually made the videos that are on the front page by
doing screen capture on my laptop! I just need to figure out how to sign the
desktop app properly, so it can be distributed for macOS... I might even be
able to do a Windows app, since I am using the JUCE framework. If you want,
feel free to subscribe to my email list (link on the app website), so that I
can send updates. Sorry, I didn't make a twitter account for this :)

~~~
drawbars
Thanks, look forward to this. I think there is a market for 'drag your music
into the app and export a music video' using your visualisation.

~~~
avaku
I made this initial app just to gauge if there is any interest in this at all
(friends were telling me "no", but they don't know anything about music :)). I
am thinking that I could do some visualisations for free, and add the fancy
ones as a paid option. There are many visualisation apps out there, but none
have the same frequency detection algorithm... Many people are asking me to
open source it now, but I don't know how I can do that and also get paid for
making something people can pay for on top of it... I'm just trying to make
something that would allow me to leave my (already high paying!) job... I'll
need to think :)

------
dubcanada
Is this Windows Media Player from 2005? I used to love playing songs in it
just to watch the random shapes it would generate.

~~~
avaku
Ha ha, is it? No :) It's just my simple attempt at visualisation. The core
feature is the better frequency detection. I want to make better
visualisations later :)

------
anotheryou
So it's just different window sizes faded into one and then displayed as
bands?

~~~
avaku
There are different ways to describe it in one sentence :) You can say it’s
just a wavelet transform. Or that it’s just a vocoder. There are many internal
implementation details, actually. I haven’t published details of the algorithm
(yet).

------
psalminen
Any plans on releasing an android or, more preferably, a desktop app using
this?

~~~
avaku
Yes, soon! Please subscribe to my email list on the app page. I will try to
post on HN again, but I am not sure it'll reach the front page again :) P.S.
Please upvote.

~~~
psalminen
Thanks for the update! I subscribed and look forward to seeing what comes.
This new algorithm looks amazing.

~~~
avaku
Thank you!

------
yoda_sl
Pretty cool! Any chance the app could import audio files to visualize?

~~~
avaku
Not at the moment, but I will work on this. Thanks for the feedback! P.S. It
looks much better with actual audio rather than the microphone; see the
Instagram videos on the app website.

------
kitotik
Does it show the _actual frequencies_ in Hz / kHz anywhere?

edit: wording

~~~
avaku
Do you mean the numbers? I was going to add that feature; I will add an option
to turn it on/off in another release. Sorry, I’ve been working on this for a
few months, so I decided to release what I have now, and then get on with
improvements... Thanks for the suggestion, now I know people want it.

~~~
kitotik
Yes sorry, I meant the actual numbers. This would be very useful for example
when doing field recordings to get an idea of what’s happening.

Cool project, good luck!

~~~
avaku
Thanks for the suggestion! I will definitely add it in the next release.

------
canada_dry
Tip: you should credit (note) the artists used in your demos.

~~~
avaku
Of course. They are tagged below each Instagram post, and I also asked them in
person when I made those videos.

------
gok
Did you consider visualizing cepstrum instead of spectrum?

~~~
avaku
Never heard that word before :) I'll google!

------
zelienople
I wanted to try it, but apparently it's not compatible with iOS 9. I wonder if
this app has any features that require a later OS, or did you just pick the OS
compatibility at random in Xcode when you created the project?

Why am I using iOS 9, you ask? Because that is the latest my device supports.
Why not upgrade, you ask? Because my device works fine and is in great shape,
so I'd rather not destroy the planet for the sake of my own greed, vanity and
stupidity.

~~~
dewey
Someone is creating a free app in their spare time and that's how you are
going to thank them for it?

If it doesn't work for your setup then just move on instead of telling people
to spend more of their time to make it work for your edge case.

~~~
avaku
It's ok, they didn't ask rudely :) People, don't be angry; give me more
upvotes instead, so the link can stay on the front page longer and other
people can see it! :)

