
Looping Music Seamlessly - crummy
http://nolannicholson.com/looper.html
======
rmcclellan
Awesome project! As a professional audio developer, I was really blown away
that this was the author's first project working with audio.

For anyone interested, I'd recommend checking out "The Infinite Jukebox",
which has a similar goal, but perhaps a more robust approach:
[http://infinitejukebox.playlistmachinery.com](http://infinitejukebox.playlistmachinery.com)

If I had to guess at why your approach didn't work well on recorded music,
it's probably because more than one event is usually happening at any given
moment, so picking out just the highest FFT bin is not a very robust
"fingerprint" of that part of the music. The Infinite Jukebox uses timbral
features as the fingerprint, rather than just a single note.
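
To make that concrete, here's a rough sketch (mine, not the article's or the
Infinite Jukebox's actual code) of using MFCCs as a per-frame timbral
fingerprint instead of a single FFT bin, assuming librosa is available and
"song.ogg" is a placeholder filename:

    # Hedged sketch: MFCCs as a rough per-frame "timbre" descriptor.
    import librosa
    import numpy as np

    y, sr = librosa.load("song.ogg", sr=None, mono=True)   # placeholder file
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)     # shape: (13, n_frames)

    # Which frame sounds most like frame i, timbrally? Exclude i's own
    # immediate neighborhood so it doesn't just match itself.
    i = 200
    dists = np.linalg.norm(mfcc.T - mfcc[:, i], axis=1)
    dists[max(0, i - 10):i + 10] = np.inf
    print("frame", i, "sounds most like frame", int(np.argmin(dists)))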

~~~
tux1968
It's a shame you can no longer upload music to the InfiniteJukebox. Apparently
it relied on a Spotify API that was shut down on October 1st. The best hope
going forward is an open-source library that can offer the same track analysis
that the API used to provide.

~~~
johnsoft
The API was an Echo Nest API. Spotify acquired Echo Nest and shut down all
their APIs in 2016.

I'd love to see the app live on too! Spotify's API has similar
functionality[1] to the old Echo Nest API, for now at least. (But I don't know
if it returns all the same data.) Or, if you don't want to rely on Spotify, I
bet Essentia[2][3] could do the job just as well. Essentia is the open-source
brains behind AcousticBrainz[4]. So you could either use Essentia directly, or
grab the data from the AcousticBrainz API.

[1] [https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-analysis/](https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-analysis/)

[2] [https://github.com/MTG/essentia](https://github.com/MTG/essentia)

[3] [https://essentia.upf.edu/documentation/essentia_python_examples.html](https://essentia.upf.edu/documentation/essentia_python_examples.html)

[4] [https://acousticbrainz.org/](https://acousticbrainz.org/)

------
klodolph
If you’re going to do autocorrelation, why not just do it on the audio signal?
Why go through all the trouble of doing an FFT on the input signal and
extracting notes first?

Just to give more details—you can do autocorrelation using the FFT.
Autocorrelation is the convolution of a signal with a time-reversed copy of
itself, and convolution in the time domain is multiplication in the frequency
domain. So you take the FFT, multiply it pointwise by its own complex
conjugate (the spectrum of the reversed signal), and then take the inverse
FFT. Candidate loop lengths show up as spikes in the result.

There are some additional factors you will likely want to consider, like
windowing, but this is a much more straightforward way to do things.
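
A minimal sketch of that (numpy only, assuming a mono float array `x`; this is
my illustration, not the article's code):

    import numpy as np

    def autocorrelate(x):
        # Autocorrelation of x via the FFT (Wiener-Khinchin).
        n = len(x)
        nfft = 1 << (2 * n - 1).bit_length()   # zero-pad to avoid circular wrap-around
        X = np.fft.rfft(x, nfft)
        # Multiplying the spectrum by its own conjugate is the frequency-domain
        # equivalent of correlating the signal with itself in the time domain.
        ac = np.fft.irfft(X * np.conj(X), nfft)[:n]
        return ac / ac[0]                       # normalize so lag 0 == 1.0

    # ac = autocorrelate(samples)
    # candidate loop length: np.argmax(ac[min_lag:]) + min_lag  (min_lag skips the lag-0 peak)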

(As an aside, I would probably put the G in the third measure on the left
hand.)

~~~
MadWombat
> If you’re going to do autocorrelation, why not just do it on the audio
> signal?

Because the audio signal is very sparse. In a decent-quality file you have
44.1K or 48K samples per second, and you need a window of at least half a
second to pick a looping point.

~~~
klodolph
Audio signals are not sparse at all, unless you are using a definition of
“sparse” that I’m unfamiliar with. I also don’t understand your complaint
about the looping point, or window size.

If you don’t understand how autocorrelation works, I can provide some
pseudocode to demonstrate the technique. This is not an esoteric or unusual
way of finding a looping point.

~~~
MadWombat
I might be misusing "sparse". What I meant was that in music sampled at a
sufficiently high rate, there are a lot of samples between notes, and a lot of
the samples are noise and garbage.

I do understand how autocorrelation works, to a reasonable degree. And that is
the reason for the DFT. We are talking about music, so we care about the
frequencies more than about the immediate sample values.

And the window is important because autocorrelating over individual data
points is useless: sound signals are noisy and you get a lot of randomness.

Also, I tried the code provided on a few pieces I thought would be appropriate
and it didn't do a very good job. I might run a few experiments.

~~~
klodolph
> We are talking music, so we care about the frequencies more than about the
> immediate sample value.

The frequencies and the sample values are just two different ways of looking
at the same data.

> And the window is important because autocorrelating over individual data
> points is useless, sound signals are noisy and you get a lot of randomness.

I’m not sure what you mean by “individual data points”. The autocorrelation is
computed for an entire signal, not for any individual data point. Talking
about an individual data point in the input doesn’t make much sense.

Autocorrelation is not especially susceptible to noise.

> Also, I tried the code provided on a few pieces I thought would be
> appropriate and it didn't do a very good job. I might run a few experiments.

Try doing just a simple autocorrelation instead. This will work for the use
cases described (looped samples of music): you will see a large spike in the
autocorrelation at the loop point.

For music which repeats but is not looped, you can do some preprocessing,
e.g. extract envelopes, and run the autocorrelation on those.
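
As a hedged illustration of that envelope idea (my sketch, not anything from
the article; the function name and block size are made up):

    import numpy as np
    from scipy.signal import hilbert

    def envelope_autocorrelation(x, block=512):
        env = np.abs(hilbert(x))                        # amplitude envelope
        env = env[: len(env) // block * block]
        env = env.reshape(-1, block).mean(axis=1)       # crude downsampling
        env = env - env.mean()                          # remove DC so peaks stand out
        ac = np.correlate(env, env, mode="full")[len(env) - 1:]
        return ac / ac[0]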

~~~
MadWombat
> you will see a large spike in the autocorrelation for the loop point

Why? All I can hope for is to pick the max autocorrelation value, but why
would I see a spike?

~~~
alanbernstein
That is the exact intended purpose of the autocorrelation.

------
yyx
Also see:
[http://infinitejukebox.playlistmachinery.com/](http://infinitejukebox.playlistmachinery.com/)
[https://eternalbox.dev/jukebox_index.html](https://eternalbox.dev/jukebox_index.html)

------
mosselman
Cool and nicely illustrated article. It is also funny to me because in
secondary school (high school) I used to edit certain songs manually in order
to create x-hour versions of them. I'd manually look for repeat points and
just copy-paste pieces of the song in Audacity (an audio editor).

At some point, people at school came to me asking if I could make multiple-
hour versions of songs for them.

~~~
rosstex
Did you have a YouTube channel? I used to do this and post them!

~~~
mosselman
That is cool! Back when I did this, YouTube either didn't exist yet or wasn't
a thing yet, so I never posted them. I probably would have worried about
copyright issues as well.

------
zwegner
Minor correction: the graph of the PCM data displays the data as unsigned
instead of signed, so it has lots of discontinuities between ~0 and ~64k.

Though, for the sake of showing the data as "raw", I guess it doesn't matter
too much.
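
If the samples really are 16-bit two's-complement data that got read as
unsigned, a sketch of the fix (assuming numpy; not the author's code) would
be:

    import numpy as np

    unsigned = np.array([1, 65535, 32768, 0], dtype=np.uint16)
    # Reinterpret the same bytes as signed 16-bit, so 65535 becomes -1, etc.
    signed = unsigned.view(np.int16)
    # Equivalent arithmetic version:
    also_signed = unsigned.astype(np.int32)
    also_signed[also_signed >= 32768] -= 65536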

------
kenrick95
> This happened a few months ago to BrawlBRSTMs, one of the first accounts to
> do it for a lot of music tracks, and people were devastated.

You can still access their website
[https://www.smashcustommusic.com/](https://www.smashcustommusic.com/) through
the Internet Archive and can still download the BRSTM files that they use to
generate those videos. BRSTM is an audio file format that has loop points
encoded in it[1]. Of course, you'll need a special program to play and loop
it, like BrawlBox[2] or a web-based one that I created a few months ago[3].

[1] [https://wiibrew.org/wiki/BRSTM_file](https://wiibrew.org/wiki/BRSTM_file)

[2]
[https://github.com/libertyernie/brawltools](https://github.com/libertyernie/brawltools)

[3] [https://github.com/kenrick95/nikku](https://github.com/kenrick95/nikku)

------
evan_
There’s a great app for iOS called snesmusic which lets you download videogame
music from a bunch of consoles including Nintendo 64 and play it straight
through or looped endlessly using the “real” loop points. You can download
music from within the app, and it’s all ripped directly from the ROMs. The
playback sounds pretty accurate to my ears.

~~~
PikachuEXE
Does it work with GBA?

~~~
joshvm
Modizer definitely does

[https://apps.apple.com/us/app/modizer/id393964792](https://apps.apple.com/us/app/modizer/id393964792)

------
BLKNSLVR
Semi-related and possibly not adding much other than an additional anecdote to
the concept, but this is a full continuous album that loops from the end back
to the beginning:
[https://en.wikipedia.org/wiki/Nonagon_Infinity](https://en.wikipedia.org/wiki/Nonagon_Infinity)

I happen to like the music as well.

------
saagarjha
Related, but I was looking for an audio format that could embed these loop
points inside of it so the file would be small but play infinitely. Is there
anything "standard" that does this? I looked into WAV and AIFF, which seem to
have "cue" points, but they don't seem to quite do what I want…

~~~
Intermernet
The "standard" is unfortunately (as far as I can determine) not open source,
but it is an "open" standard. The REX2 file format from Propellerhead is
documented somewhere at
[https://www.reasonstudios.com/developers](https://www.reasonstudios.com/developers)
but you will need to sign up for a developer account. I'm pretty sure it's
free to use, and it's supported by most serious DAWs.

>"REX2 is a proprietary type of audio sample loop file format developed by
Propellerhead, a Swedish music software company.

It is one of the most popular and widely supported loop file formats for
sequencer and digital audio workstation software. It is supported by PreSonus
Studio One, Propellerhead Reason, Steinberg Cubase, Steinberg Nuendo, Cockos
REAPER, Apple Logic, Digidesign Pro Tools, Ableton Live, Cakewalk Project5,
Cakewalk Sonar, Image-Line FL Studio, MOTU Digital Performer, MOTU Mach 5
(software sampler), and Synapse Audio Orion Platinum, among others."

[https://en.wikipedia.org/wiki/REX2](https://en.wikipedia.org/wiki/REX2)

~~~
tempwaveinfo
Did something change about their licensing? At least up until recently, I
remember that Propellerheads applied proprietary licensing terms and required
every developer to be associated with a company for any of their third-party
stuff, including REX and ReWire.

~~~
Intermernet
I think you're probably correct. They had a press release in 2004 talking
about "opening" the format. It's a bit of a letdown, to tell the truth.

"In the past, major Developers such as Steinberg and Emagic applied for and
obtained a license to support REX playback in their applications. Now as an
open format, any manufacturer, large or small, can support REX playback in
their applications.

Third-party manufacturers are encouraged to download REX2 developer
documentation. Implementing the Propellerheads REX2 file format in other
applications or hardware is free of charge. Further information about the REX2
file format is available at
[http://www.propellerheads.se/developer](http://www.propellerheads.se/developer)"

[https://www.reasonstudios.com/press/21-propellerhead-software-opens-rex2-file-format-playback-to-third-party-manufacturers](https://www.reasonstudios.com/press/21-propellerhead-software-opens-rex2-file-format-playback-to-third-party-manufacturers)

A bit more info at
[https://www.reasonstudios.com/developer/index.cfm?fuseaction=get_article&article=rextechinfo](https://www.reasonstudios.com/developer/index.cfm?fuseaction=get_article&article=rextechinfo)

EDIT: Speeling

------
thekegsi
Could you pick a frame of audio and then cross-correlate it with the entire
song? The peaks of the cross-correlation should indicate when the segment of
audio repeats, i.e. potential loop points.
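
Roughly what that could look like (a sketch, assuming a mono float array
`song` and scipy; the excerpt position and length are arbitrary):

    import numpy as np
    from scipy.signal import correlate

    sr = 44100
    start = 30 * sr                              # half-second excerpt taken at 0:30
    template = song[start:start + sr // 2]
    corr = correlate(song, template, mode="valid")
    corr = corr / np.max(np.abs(corr))
    matches = np.argsort(corr)[::-1][:10]        # offsets where the excerpt recurs most strongly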

~~~
sorenjan
That's called autocorrelation, and it is indeed used for finding periodicity
in signals.

~~~
thekegsi
Well, it's slightly different in that you are cross-correlating only a section
of the signal with the full signal.

------
AndrewStephens
This takes me back to my days creating Amiga modules, desperately trying to
find places in the samples where I could set the loop points so they wouldn't
click annoyingly.

I didn't have access to Fourier transforms back then; I would just keep
setting the loop points in likely-looking places and hope no-one would notice
the pop.

~~~
tenebrisalietum
Sheesh, even the Akai S1000 at the time had a crossfade option to blend the
edges.

~~~
AndrewStephens
There may have been better options but I was 15 and didn't know anything.
Unlike now, where I am 44 and don't know anything.

------
rosstex
Oh man, this takes me back! I used the Echonest library and made a GUI
interface for it in Max/MSP for a class back in undergrad... good times.

[https://www.youtube.com/watch?v=DL8vJO05DCs](https://www.youtube.com/watch?v=DL8vJO05DCs)

------
EarthlyFireFly
Beware - the power of music isn't innocent:
[http://www.earthlyfireflies.org/government-use-of-music-to-influence-obedience/](http://www.earthlyfireflies.org/government-use-of-music-to-influence-obedience/).
And here we had an interesting discussion on LinkedIn of its influences:
[http://www.earthlyfireflies.org/linkedin-dialogue-on-music/](http://www.earthlyfireflies.org/linkedin-dialogue-on-music/).

------
beefhash
On a related note: I've noticed that ffmpeg supports the BCSTM format, which
has built-in loop points (unsurprisingly, since it's used for video game
music). ffmpeg decodes and stores that information; mpv, however, doesn't take
advantage of it. That's mildly unfortunate.

------
joosters
What is a 'frame' in terms of PCM audio? If the raw data is just measures of
amplitude at a set frequency, don't you have to pick an arbitrary time slice
to examine (and so risk losing lower-frequency sounds)?

~~~
anon9001
> What is a 'frame' in terms of PCM audio?

Sometimes people in the audio world use "frame" to mean "one PCM sample per
channel, taken at the same instant". So a typical stereo audio signal at
44100hz would be 44100 "frames" per second, even though it's really 88200
samples per second. The author seems to be using it that way, but it's
confusing because he's also talking about sliding an FFT window over by
frames.
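
For example, with interleaved stereo 16-bit PCM, each frame is one left sample
plus one right sample (a sketch; `pcm_bytes` is just a placeholder for the raw
data):

    import numpy as np

    raw = np.frombuffer(pcm_bytes, dtype=np.int16)   # interleaved: L R L R L R ...
    frames = raw.reshape(-1, 2)                      # one row per frame: [L, R]
    n_frames = frames.shape[0]                       # 44100 per second for 44.1 kHz stereo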

> If the raw data is just measures of amplitude at a set frequency

LPCM is literally amplitude samples at a rate. Thinking of the sample rate as
a 'set frequency' will lead to confusion (even though it obviously is a
frequency). When you're thinking of samples as sequential amplitudes, you're
thinking in the "time domain". When you're thinking in terms of the signal's
oscillations, you're in the "frequency domain". The Fourier transform is how
you convert from the time domain to the frequency domain.

> don't you have to pick an arbitrary time slice to examine (and so risk
> losing lower-frequency sounds)?

You need at least 2 samples to make an audible frequency. If you only had 1
sample, you wouldn't hear anything, because nothing would be moving. So at
44100hz of sampling frequency, you can capture 0hz to 22050hz of audio
frequency. That's called the Nyquist frequency, and it's always half of the
sample rate.

~~~
fdsa_889876
> You need at least 2 samples to make an audible frequency.

That's not strictly true. An audio file with a single non-zero sample (usually
set to full amplitude) is often used for testing -- usually called a Dirac
impulse or similar.

That impulse will (necessarily) be band-passed by the playback hardware, which
will put out filtered "white" noise.

That impulse can be recovered by a mic to show e.g., pre-ringing caused by
(FIR) filters. An FFT of that impulse will show the playback hardware's
response in the frequency domain vs full bandwidth.

> Thinking of the sample rate as a 'set frequency'

For e.g., a WAV file, that's a fixed number of samples per second (a frame
being 1 sample x n channels). That _is_ a set frequency, and deviating from it
will alter the pitch of the music.

There really is no case where sample rate varies, unless we're talking about
minute variations between the clock signals of different hardware, which
requires the use of sample rate conversion to match.

A related concept is the bits-per-second of lossy formats (e.g., AAC) which
may vary from frame to frame (and that frame will mean something different
from a WAV frame).

> You need at least 2 samples to make an audible frequency.

I think you're confusing this with Nyquist being 1/2 of the sampling
frequency. You can very much capture an audible signal with a single sample,
but that signal will be limited (by hardware, by Nyquist, etc).

[Edit]

I should say that this single sample has to be non-zero and the playback
system has to have a DC-offset that isn't equal to that sample's amplitude.

~~~
taneq
> That's not strictly true. An audio file with a single non-zero sample
> (usually set to full amplitude) is often used for testing -- usually called
> a Dirac impulse or similar.

Is this a PCM type sample or a frequency-domain sample? If the former, how
frequently does this impulse get repeated in order to turn into white noise
after going through the playback hardware? It sounds like if it's not repeated
it should just make a nasty 'pop'.

> I should say that this single sample has to be non-zero and the playback
> system has to have a DC-offset that isn't equal to that sample's amplitude.

As I understand it, if you try to play a PCM audio file with a uniform value,
you're effectively putting DC through the speakers, driving them to a
particular offset where they'll stay until the end of the track. Is that not
the case?

------
M4v3R
I use the mpv command-line tool to play music in a loop, seamlessly. You open
an audio file, press “l” (lowercase L) at the starting point, then use the
arrow keys to go to the end point and press “l” again. That’s it. The only
problem is pressing the keys precisely at a music bar boundary, but with
enough practice you can get pretty close, so the seam is barely noticeable or
not noticeable at all.

------
sambe
Maybe iOS 13 should pay attention (single-track loops often stutter at the
end).

------
guggle
[insert xkcd411 here]

