In a nutshell: the human brain has an amazing ability to separate a garbage-pail mush of audio into streams that we perceive as distinct events. This enabled us to do things like notice (as my prof says) the difference, within the white noise, between the nearby river and the rustle in the leaves from something else, like an approaching animal. Score following requires doing the same thing: separating the aggregate sound into instruments, and into voices within each instrument, suitable for note-by-note analysis. This, it turns out, is a wicked hard problem.
I'm sure there are further difficulties, just saying that your answer isn't entirely satisfying.
Sheet music is a guide for human beings who understand "musical context" to, with much leeway, create a cohesive sound.
The sound created does not have a deterministic relationship to what's written, or else, why have any live music at all?
And, more importantly, even if it did deterministically produce the same "sound", that sound is layered for us by our prior familiarity with instruments, room geometry, etc.
We aren't deconstructing it by "mere frequency", but by meaningfully parsing out & grouping frequencies. There is no naive algorithm to do this.
And that is doing it basically per-note! This task (aligning sheet music with MIDI) would allow for global optimization approaches which should be far easier and more likely to work.
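To make "global optimization" concrete, here's a minimal sketch of dynamic time warping, one standard choice for this kind of alignment. The reduction of both score and performance to bare pitch sequences is my simplification, not the actual system:

    import numpy as np

    def dtw_cost(score, performance):
        """Minimal DTW: globally align two pitch sequences."""
        n, m = len(score), len(performance)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = abs(score[i - 1] - performance[j - 1])   # local mismatch
                cost[i, j] = d + min(cost[i - 1, j],         # skipped score note
                                     cost[i, j - 1],         # extra played note
                                     cost[i - 1, j - 1])     # match
        return cost[n, m]

    # A stray wrong note barely hurts the global alignment:
    print(dtw_cost([60, 62, 64, 65], [60, 62, 61, 64, 65]))  # -> 1.0

Because the optimum is over the whole sequence at once, one wrong or extra note doesn't derail the alignment the way it would for note-by-note matching.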
Autocorrelation is the algorithm used for radar (at least, that's what my professors told me). When a radar pulse bounces off objects, you get different "echoes" of that pulse (ex: one object was 10 miles away and another 20 miles away: the echo from the 20-mile object takes 2x longer to come back). It's messy and everything.
The radar return is very messy, full of reflections, echoes, and more. But autocorrelation takes all of that information and tells you where the objects were.
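A toy version of that idea (not real radar processing; strictly this correlates the return against the known pulse, i.e. a matched filter, which is what's usually meant here):

    import numpy as np

    rng = np.random.default_rng(0)
    pulse = rng.standard_normal(64)              # transmitted waveform
    ret = np.zeros(1024)
    ret[100:164] += pulse                        # echo from the near object
    ret[300:364] += 0.5 * pulse                  # weaker echo, twice as far
    ret += 0.3 * rng.standard_normal(1024)       # receiver noise

    # Correlate the messy return with the known pulse; the peak positions
    # are the round-trip delays, i.e. the object distances.
    corr = np.correlate(ret, pulse, mode="valid")
    print(sorted(np.argsort(corr)[-2:]))         # ~[100, 300]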
Writing that on the MIDI piano roll is slobber-proof. But if you take music written that way and view it as sheet music, it is nearly impossible to read because of the many ties that don't necessarily begin or end in the same place as one another.
It all ends up looking like layered sheets of snow sliding off your roof: a ribbon of differently-sagging tie-lines that makes it not only unsightly, but impossible to read.
Why sheet music if you have MIDI already? Because as someone who records, I use MIDI for writing, but I don't expect other musicians to follow my piano rolls when recording guitar, bass, vocals, etc. and I wouldn't want them ever to sound as robotic as MIDI tends to be.
Drums however - that's one area where MIDI piano roll to sheet music actually works flawlessly, even if the notes don't have the usual x| appearance.
If you want to analyze the music or perform it, you want the sheet music. If you want a computer to perform the music, you want the MIDI.
Additionally, the system described in the article works on scanned paper sheet music, not electronic sheet music, so you also have to factor in OCR issues. All things considered, it would be very difficult to manually write an algorithm to do this reliably.
If those timing variations were intentional, then I agree they represent information. If they are just sloppy playing, they are noise. And if the performer is just applying a stylistic form ("swing these bars"), then it's an inefficient encoding of the musical intent.
And yet notating any of that information in every bar as, say, three-quarters to one-and-a-half hemidemisemiquavers early or late to each offbeat defeats the purpose of having a readable score.
The score is a reduction that's intentionally lacking tons of information that defines the essence of a performance. That's not a bug, but a feature.
This is precisely part of my point, or would be if you corrected it. The score contains, either implicitly or explicitly, global performance clues that will tell the performer what to do about timing (and if it doesn't that's because it is expected to come from a conductor or other similar source). It's a highly efficient mechanism precisely because it is a global property of the score (or perhaps locally scoped to sections). Much better than providing timing information for every note (as MIDI would do). The MIDI version is an inefficient means of information transfer.
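As a toy illustration of that efficiency (my example, nothing from the article): "swing these bars" is one global rule applied to quantized onsets, where MIDI would instead store a fresh timing offset on every single note:

    def apply_swing(onsets, ratio=2/3):
        """One global rule: move every offbeat eighth to the swing position."""
        swung = []
        for t in onsets:                       # onsets in beats, quantized
            beat, frac = divmod(t, 1.0)
            swung.append(beat + (ratio if abs(frac - 0.5) < 1e-9 else frac))
        return swung

    straight = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
    print(apply_swing(straight))   # -> [0.0, 0.667, 1.0, 1.667, 2.0, 2.667] (rounded)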
BTW, things do not get even crazier with "world music styles like samba". Samba is an extremely regular groove that is very easy to understand. This is true of most Afro-Cuban derived rhythmic structures - the complexities come from layering a set of very simple patterns. Things do get "even crazier" with rhythmic traditions from India, Bali and some parts of Africa, places where conventional western ways of describing things really don't do a good job at all.
And I'm sorry, but regarding samba, you have absolutely no idea what you're talking about. I assume you're referring only to the surdo's backbeat, but the essence of the samba rhythm is the sixteenth-note groove played by the pandeiro, and that sound is pretty much as far from "extremely regular" as you can get while still maintaining a consistent pulse. I found the style relevant to bring up, as it's a commonly given basic example of a groove featuring microrhythmic variation.
and yet ... in almost all the musical forms where this happens, it isn't notated.
I've played samba (surdo and tamborim parts, mostly). I have many friends who play Brazilian music in general. I think we have a problem with definitions, because the pandeiro part precisely fits my definition of "extremely regular timing". When playing samba, unlike various jazz influenced forms, you do not play ahead or behind the groove. The variation still uses a 16th note grid, albeit with lots of freedom of which parts of the grid to play or not play.
Regardless, the context of this discussion was determining whether MIDI data can hold "extra information" that the score does not, and I can't agree with most of your statements about that.
> technically, that would be a case of the MIDI data having noise in it, not extra information.
> and yet ... in almost all the musical forms where this happens, it isn't notated.
There's a term for this type of thinking, and it's called "notational centricity."
> "musicological methods tend to foreground those musical parameters which can be easily notated" such as pitch relationships or the relationship between words and music. On the other hand, historical musicology tends to "neglect or have difficulty with parameters which are not easily notated", such as tone colour or non-Western rhythms.
So any given parameter not being notated with ease or in detail doesn't prove anything about its role as an intentional, stylistic element of the music. That is, not being notated doesn't make a parameter any more likely to be "noise," because what does get notated is not a "core representation" of the musical text. In fact, the distinction between the two mostly comes down to historical coincidence or other non-musical factors.
MIDI or other more granular performance-capture standards have plenty of "extra information" to offer that is not noise, and I'd even say it's mostly the case that you'll see a robust SNR there.
Samba isn’t Afro-Cuban derived, whether in its rhythmic structures or otherwise.
Sure you can extract these patterns from MIDI based on things like velocity curves, but by default raw MIDI won't tell you that.
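For instance, this is roughly what "extracting it yourself" looks like using the mido library ("song.mid" is a placeholder): all raw MIDI hands you is a timing/velocity stream, and any accent pattern has to be inferred from it rather than read off a marking.

    import mido

    notes = []
    for track in mido.MidiFile("song.mid").tracks:
        t = 0
        for msg in track:
            t += msg.time                      # delta ticks -> absolute ticks
            if msg.type == "note_on" and msg.velocity > 0:
                notes.append((t, msg.note, msg.velocity))

    # Raw material only: accents, grooves etc. must be inferred from this.
    for tick, note, vel in notes[:8]:
        print(tick, note, vel)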
If you are curious, try flipping through a book on orchestration at the library sometime, or a book on music theory. Even at its most basic, a score records which notes to play, but MIDI doesn't even do that: it records only note numbers, so you can't tell the difference between G# and Ab. Then add in all the articulations, dynamics, and arbitrary instructions that a composer can put into a score.
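A tiny demonstration of that collapse (my own toy converter, using the standard A4 = 69 convention):

    PITCH_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

    def to_midi(name):
        """Convert a spelled note like 'G#4' to its MIDI note number."""
        letter, rest = name[0], name[1:]
        accidental = rest.count("#") - rest.count("b")
        octave = int(rest.lstrip("#b"))
        return 12 * (octave + 1) + PITCH_CLASS[letter] + accidental

    print(to_midi("G#4"), to_midi("Ab4"))   # 68 68 -- the spelling is gone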
That's a problem with MIDI-the-standard, not MIDI-the-idea. In the 1.x standard, it's not just that the pitch model is simplistic; the "known workaround" (pitch bend) for when you actually need to express this difference in a played note is global per channel, i.e. effectively per instrument. But it's nothing that can't be fixed with a protocol that follows the same principles, but has a richer data model (like OSC).
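A sketch of that limitation with mido (the output port is whatever your system exposes):

    import mido

    out = mido.open_output()   # default port; placeholder for a real device
    out.send(mido.Message("note_on", channel=0, note=60, velocity=80))
    out.send(mido.Message("note_on", channel=0, note=64, velocity=80))
    # Pitch bend is a *channel* message: this bends BOTH sounding notes.
    # MIDI 1.x has no per-note version (MPE works around it by dedicating
    # a channel per note; MIDI 2.0 adds true per-note pitch).
    out.send(mido.Message("pitchwheel", channel=0, pitch=4096))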
> What information isn’t captured in the movie but is captured in the novel?
This, absolutely this. And it also doesn't mean we shouldn't be getting better cameras.
For a keyboard instrument, played conventionally, MIDI does a fairly good job of capturing everything that a composer could tell a performer (and vice-versa).
Slightly less true for percussion, but still somewhat true.
Rather untrue for any instrument where technique can be (ab)used to alter timbre significantly (e.g. most reeds, most strings)
Very untrue for any instrument capable of continuous pitch generation (e.g. unfretted strings).
Obviously "arbitrary instructions" are out, but then they are also not formally part of "sheet music", but more "additional written material from the composer", which can accompany MIDI in various ways too.
What are the dimensions of a note that can be varied on a violin other than pitch and loudness?
By contrast, the technique involved in playing the violin family is very varied:
* Which string are you playing the note on? You get different sympathetic vibrations depending on which one you choose.
* Are you moving the bow up or down? Direction matters!
* How are you changing bow movement between notes? Keeping the same direction, or changing? Resting your bow on the string the entire time, or bouncing the bow?
* Where on the string are you bowing? Down by the bridge, or up by the fingerboard?
* Are you even using the bow to play the notes? You can pluck the string instead!
* Fingering too: you can make many small motions with your finger to give the note a vibrating quality (vibrato).
* On a related note, the pitch you play is a continuous quantity: you can slide from one note to the other and hit all the pitches in between, unlike a piano (and MIDI), where notes are discrete pitches (see the sketch after this list).
* You can also adjust the tuning of the strings (though not on the fly), or dampen them with a mute (which can be done during a long rest in a piece).
There's probably a few more expressive techniques I've forgotten, and I've definitely forgotten all of the fancy Italian names for these techniques.
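On the continuous-pitch point above: the closest MIDI 1.x gets is stepping the pitch wheel under a held note, which is still a staircase, not a true slide. A rough sketch with mido (assumes the receiving instrument's bend range is set to the usual ±2 semitones):

    import time
    import mido

    def slide(out, note, semitones=2, duration=1.0, steps=32, bend_range=2):
        """Fake a glissando by stepping the pitch wheel under one held note."""
        out.send(mido.Message("note_on", note=note, velocity=80))
        for i in range(steps + 1):
            bend = int(8191 * (semitones * i / steps) / bend_range)
            out.send(mido.Message("pitchwheel", pitch=min(bend, 8191)))
            time.sleep(duration / steps)
        out.send(mido.Message("note_off", note=note))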
You can look at the violin sections of Saint-Saëns' Danse Macabre (https://www.youtube.com/watch?v=71fZhMXlGT4) to see how different bowing techniques can produce rather stark effects.
If you want to see a reference, go to your local library and find a book like The Study of Orchestration (https://www.amazon.com/Study-Orchestration-Fourth-Samuel-Adl...) and flip to the section on strings. I own this book, but I don’t know where it is at the moment, so… off the top of my head, here are other dimensions besides pitch and loudness:
- Notes can be connected in different ways. Legato, détaché, martelé, staccato, spiccato, sautillé, jeté/ricochet, tremolo, pizzicato, louré, marcato
- Bowing direction: up/down (they sound different)
- Adjust bow position: sul ponticello, sul tasto
- Natural and artificial harmonics
- Use of mutes
- “Extended techniques”—altering the tuning, col legno, etc.
These all affect the sound. Note that the difference between staccato and legato is NOT accounted for solely by the length of the notes as in a MIDI file. You also might be surprised how many of these have really boring, everyday notational conventions. As in, a violinist would look at sheet music and say, “obviously, technique X is used for this note”, but that would not be encoded in the MIDI file at all.
All of the above techniques are explained and demonstrated in YouTube videos if you are interested.
Legato vs. détaché. Articulation.
All sorts of esoteric annotations, e.g. "Fire the cannon!", or "Release the penguins."
However, on other instruments, you have a much more direct relationship with the sound. With a violin or guitar, you physically touch the strings - directly or with some device. That already means you have something like 6 degrees of freedom in physical space, not counting fretting. Plus if you use a finger directly, your playing won't sound the same as someone else's with different fingers. The whole action of playing a note is a complex process in time, not a single event, so you can't capture it as one set of parameters. Then there are things like tapping on the body of the instrument, not the strings. You'd basically need a model of the contact surface that touches the string, its physical parameters, and its precise motion in time at high resolution, to accurately capture all possible playing techniques that people actually use.
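Purely as a thought experiment (nothing standard, just to illustrate the data volume involved), "capturing the gesture" would mean a dense time series of contact states rather than MIDI's one event per note:

    from dataclasses import dataclass

    @dataclass
    class ContactSample:
        t: float          # seconds, sampled at high resolution
        x: float          # position of contact along the string (m)
        pressure: float   # normal force into the string (N)
        speed: float      # speed of the contact point (m/s)
        width: float      # effective contact width (fingertip vs. pick)

    # One plucked note becomes hundreds of samples covering the whole
    # approach/press/release gesture, versus a single note_on in MIDI.
    gesture = [ContactSample(t=i / 1000, x=0.12, pressure=0.5,
                             speed=0.02, width=0.01) for i in range(300)]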
What we've done with sheet music is standardize many of these techniques into instructions that players can perform. It's obviously only a much lower information version, but it's there.
Of course, if your question is "can we encode everything in sheet music with MIDI" the answer is obviously yes, but it's not standardized. A given virtual instrument could hypothetically implement anything you could write a score for (perhaps with an obscene number of samples or a really clever modeling synth) and be controlled by MIDI, but only a tiny subset of these options have vaguely standardized mappings to MIDI. MIDI is a very limited standard with tons of room for expansion, and only the most basic MIDI features interoperate between manufacturers. You can make MIDI music sound amazing with specific setups (and this is effectively how a lot of modern music is produced), but you can't make General MIDI music (the standardized subset) sound amazing.
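For example, here's a common but entirely vendor-specific trick: many sample libraries switch articulations via "keyswitch" notes outside the playable range. The mapping below is made up; every library defines its own, and none of this is in General MIDI:

    import mido

    KEYSWITCH = {"legato": 24, "staccato": 25, "pizzicato": 26}  # hypothetical

    def play(out, articulation, note, velocity=90):
        """Select an articulation via its keyswitch, then play the note."""
        out.send(mido.Message("note_on", note=KEYSWITCH[articulation], velocity=1))
        out.send(mido.Message("note_on", note=note, velocity=velocity))

    # e.g. play(mido.open_output(), "staccato", 67)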
The other issue is that sheet music is at a higher abstraction level than MIDI. It leaves more to the interpretation of the performer, but it's designed to be easily interpretable. For the portions of this that MIDI does express at a lower level, it's very hard to infer the higher-level representation.
(This is excluding prepared pianos and techniques like reaching in and touching the strings, which are rare; of course technically you can do crazy things to a piano, but in practice we don't.)
That's the difference between sheet music and MIDI. MIDI is extremely powerful, but sheet music is about setting not just instructions but guidelines for human performers who will ultimately use their judgment on how to interpret and perform it -- often in large groups. Compared to that, MIDI is not just way more low-level, but simply made for a different purpose.
I get that mapping from an auditory stream onto a score is an interesting problem, but isn't it naturally decoupled from the visual processing problem of reading and segmenting images of sheet music?
Wouldn't it make more sense to deal primarily with a digital intermediate encoding of the score?
What am I missing?
HN: Is there a tool/service/app that allows me to point at a piece of sheet music and it just plays it? I've tried a couple apps but they don't work well.
There is only an embedding or re-presentation in some less informative form. I.e., you have to choose some partial measurement system and apply it to the images of scores to produce a numeric representation.
This is, in part, why a MIDI file is always going to sound worse than an orchestra operating from sheet music. The musicians understand the intentions of the composer.