I work for Sibelius so I'm heavily involved in this world. MusicXML is a great standard and has offered a solid basis for data interchange between music notation programs. But now there's a new group working to build a successor standard, MNX: https://w3c.github.io/mnx/docs/
It was originally going to be in XML but they recently switched to JSON, which is a good move, I think. I can't wait for it to be adopted as it will give so much more richness to the data set.
I'm all for JSON-with-comments, and I even have a bit of a soft spot for the idea of JSON-with-expressions, but considering how far I expect this one to be on the "interchange" side of the spectrum between interchange formats and authoring formats, I doubt that comments will be missed a lot.
Certainly less, less by several orders of magnitude, than in the depressingly ubiquitous JSON-as-configuration use case...
> Another concern with comments is that apps might try to (ab)use them […]
There's no "might" about. This is exactly what happened in the early/beta days of JSON, per Douglas Crockford, creator of JSON:
> I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability. I know that the lack of comments makes some people sad, but it shouldn't.
There is a big "but", though: was the loss of interoperability in those cases any real loss? Those people wanted their own files parsed a specific way by their own infrastructure. Their files already weren't meant for every regular JSON parser; if they had wanted wider interoperability, they'd have dropped those comments.
And nobody stops them from continuing to use those meta-parsing semantics even if the JSON standard prohibits comments.
I think Crockford was wrong in his objection. It shouldn't have been his concern.
"Oh no! Bad actors might abuse this absolutely necessary feature, so it's better that we leave it out and let everyone suffer to avoid that! You'll just have to cope!"
There's no "might" about. This is exactly what happened in the early/beta days of JSON, per Douglas Crockford, creator of JSON:
> I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability. I know that the lack of comments makes some people sad, but it shouldn't.
It's so "absolutely necessary" that JSON has found hardly any success and is struggling to find a use case or a niche… /s
Or it seems that JSON works just fine without comments, especially as a data exchange format, which contradicts the claim that it is "necessary" (absolutely or otherwise).
> Or it seems that JSON works just fine without comments
Code that can't be annotated might as well be machine code. And guess what? JSON largely is just that.
> It's so "absolutely necessary" that JSON has found hardly any success
I am almost 50, and since as early as I can remember, the worst technology always wins when there is some sort of standoff. That JSON succeeded doesn't mean anything, other than that it was the worst.
I'm old enough to understand why this is usually true... the best technology is usually more expensive, and economic forces favor the cheapest. JSON though? Best I can figure is that imbeciles just got used to picking the worst, even when there was no expense tradeoff.
Snarking about how it won... I guess that means you think it was perfect on its first try? When the fuck has that ever happened with software?
You're just plainly wrong, and I don't know why or how you'd bother to be. Do you have a few hundred million in JSON Inc. shares about to IPO and somehow I missed it?
I've missed comments in JSON myself, when writing stuff like configuration files. But interchange formats are a different beast, not really meant to be read by humans. And if they need comments to be comprehensible, one might wonder whether the scheme chosen was insufficiently self documenting in the first place.
Which, if dropped, are a clear change to the document in question. Whereas a dropped comment would have to be considered equal, even if the "special comment aware software" might not agree at all. I'm not happy with the decision, but I can't deny the argument some merit.
As different standards proliferate, I worry that we will end up with less interoperability, not more, since programs have limited resources and tend to pick one format to focus on supporting while perhaps half-assing another. (How) do you plan to address this? Are there plans for some library for dealing with these files to help adoption? What will be the best way for users to convert from MusicXML to MNX and back? Is losslessly roundtripping MusicXML an explicit goal? (I assume losslessly roundtripping MNX will not be possible in general as you intend to add new features to MNX that MusicXML doesn't have and will never have).
Eventually it'll work both ways. I'm hoping this is a big help for adoption, as it gives you two formats for the price of one (just write an importer for MNX, and you can get MusicXML import support "for free" if you use the library).
I know JSON doesn’t have comments, but JS and JSON5 allow for comments. It would be super nice to allow for comments because you can hand annotate sections of the MNX file for the purposes of teaching.
Thanks! We're not planning to support inline comments at this time; this was a tradeoff we knew we'd have to make when we decided to use JSON.
Given the choice between supporting comments and supporting a wider variety of implementations/libraries ("plain" JSON as opposed to a comments-supporting variant), I think the latter is a more practical priority.
With that said, we'd like to add a standard way to add vendor-specific information to an MNX document — which is definitely a must-have, for applications that will use MNX as a native format — and I could see a comments-ish thing appearing in that form.
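Just to sketch the idea (nothing like this is in any draft yet, and these key names are invented purely for illustration): vendor data would live under ordinary, namespaced JSON keys rather than in comments, something like:

{
  "vendor": {
    "com.example.notation-app": {
      "comment": "anything the exporting application wants to stash"
    }
  }
}

Because it's just regular data, a program that doesn't understand the namespace can still round-trip it untouched, which is the interoperability property that comments-as-directives would have broken.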
Regarding that examples page, I'm actually planning to do something along those lines anyway. The MusicXML docs and the MNX docs use the same system (a Django app), and the MusicXML part uses a custom XML tag to define "this part of the XML example should be highlighted in blue" (example: https://w3c.github.io/musicxml/musicxml-reference/examples/a...). It's on my to-do list to implement the same thing for the JSON version — which is essentially like inline comments(ish), if you squint.
Looking forward to discovering this standard. After 2 years working on parsing ABC, I realize how difficult it is to represent notation. Kudos on this effort!
I've written two music apps that use MusicXML as their native representation (https://woodshed.in is the newer one), so I've been involved in this world as well.
MusicXML is a great effort to tackle a very difficult problem, but some of the details can get rather hairy (e.g. having to represent many concepts twice, once for the visual aspect and once for the performance aspect; or how exactly to express incomplete slurs). Interoperability in practice seems to be fairly limited (possibly because, for many music programs, MusicXML import/export is an afterthought).
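To give one concrete taste of that duality (a hand-written, simplified fragment; real exports carry much more):

<note>
  <pitch>
    <step>C</step>
    <octave>4</octave>
  </pitch>
  <!-- performance aspect: length measured in the file's "divisions" unit -->
  <duration>2</duration>
  <!-- visual aspect: the symbol the engraver actually draws -->
  <type>eighth</type>
</note>

The two describe different things and can legitimately disagree, which is part of why importers trip over it.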
One of the biggest contributions a new standard could make is to provide as complete a test suite as possible of various musical concepts (and their corner cases!) and their canonical representation. It looks like MNX has already made good efforts in this direction.
In what way is JSON a step down versus XML? Frankly I get nervous and sweaty every time I need to deal with XML, because of the inherent performance/security issues it brings into my codebase.
XML is much more precise and much more flexible. It also benefits from much more powerful and mature tooling. The few comparative downsides it has include verboseness, which doesn't matter to machines, and that younger devs don't know how to work with it, which again shouldn't be much of an issue in this use case.
XML is less precise, because it's more flexible. Powerful and mature tooling only matters to people creating and editing XML, not computers. XML Schemas are there to support human editors, not computers. Verboseness means larger files and longer processing times, which does matter to computers; the verbosity is explicitly and only there for human readers and editors. JSON is a much better format for something humans only occasionally look at.
You're wrong in almost every point. Flexible definition of data allows for greater precision of data structure. Something like XSLT isn't for human interaction; it's for super-powerful machine transformations. I'll grant XML files are larger, but not enough to make much difference when opening a music notation file. Any time a dev pushes back against XML, it's been my experience they are uneducated on the subject.
The reason why you don’t need some big, formal JSON schema (though they do exist) is because you can notate most of the constraints people care about in TypeScript. It’s just a bunch of structs, arrays, string enums, etc. XML doesn’t really have a nice mapping to type systems like that so it needs schemas.
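For instance, a rough sketch (the names are invented, not taken from MNX or MusicXML):

// A hypothetical shape for a tiny slice of notation data; the names are
// made up, but the point stands: plain TypeScript types already express
// most of the structural constraints you'd otherwise put in a schema.
type Step = "A" | "B" | "C" | "D" | "E" | "F" | "G";

interface Pitch {
  step: Step;
  octave: number;  // e.g. 4 for the octave containing middle C
  alter?: number;  // +1 sharp, -1 flat; omit for natural
}

interface Note {
  pitch: Pitch;
  duration: number;  // in whatever time divisions the document declares
}

interface Measure {
  notes: Note[];
}

If you do need runtime validation, there are tools that can derive a JSON Schema from types like these, but day to day the compiler catches most of it.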
My coup de grace against XML is that it is wholly unsuitable for serializing arbitrary strings in most programming languages. It’s defeated by quite simple strings like “hello\0world”. You can’t just escape the null using &#; because the standard, in its infinite wisdom, forbids it. Instead, you’re just expected to come up with some completely non-standard way like <null-char /> or just interpret “\0” specially in your application code. Meanwhile, JSON just lets you put pretty much whatever Unicode you want into a string with a standard way of escaping characters like the double quote.
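A quick way to see the difference (a small sketch, runnable in any JS/TS runtime):

const s = "hello\u0000world";

// JSON round-trips the NUL character via its standard \u escape:
const encoded = JSON.stringify(s);  // the NUL comes out as the escape \u0000
const decoded = JSON.parse(encoded);
console.log(decoded === s);  // true

// XML 1.0, by contrast, simply has no representation for U+0000: neither a
// literal NUL nor &#0; is well-formed, so a conforming serializer must
// reject the string or invent a private convention for it.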
> Flexible definition of data allows for greater precision of data structure.
Other way around. If there are 3 different ways of doing something, the data structure is less precise. Say you have an object with a Name as a string. You know exactly what this is going to look like in JSON:
{
"name": "Whizzbang"
}
How is it going to look in XML? It might look like this:
<Foo name="Whizzbang" />
It might look like this:
<Foo>
  <Name>Whizzbang</Name>
</Foo>
It might look like this:
<Foo>
<Name value="Whizzbang" />
</Foo>
XML is less precise because it's more flexible. "But you just define an XML Schema to disambiguate" -- so now you're doing more work in a separate file that you have to publish and link in just to solve a problem that JSON doesn't have at all.
> Something like XSLT isn't for human interaction; it's for super-powerful machine transformations.
It's only "super-powerful" considering XSLT as a markup language. Considering it as a programming language, it rather sucks. If you're writing ETL scripts to transform data around, use an actual programming language. The fact that XSLT is Turing Complete only drives home the point that it's not a "powerful markup language", it's "poorly-designed programming language". Sure, if you're literally only transforming the exact same data from one XML schema to another, as some sort of adapter step, then XSLT beats general-purpose languages; but you're never just doing that, are you? You're linking in other data sources, validating things, sending things over the wire, etc. You already need the code to save and load your XML into a format you prefer in memory; just use that format for these tasks.
> Any time a dev pushes back against XML, it's been my experience they are uneducated on the subject.
Many people who push back against XML are not uneducated, but rather jaded on it, having worked with ambiguous formats, buggy schemas, and 4000-line long configuration behemoths that should have just been code. You can use XML parsimoniously, but there's not much overlap between the people doing that and the people who love XSLT.
Generally agree here. Just wanted to throw out another variant: an XML DTD can supply additional attributes and default values as well. The example XML could be just "<Foo />", with the name attribute coming from a default declared in the DTD (see the sketch below).
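A minimal sketch of what I mean (reusing the Foo/name example from above):

<!DOCTYPE Foo [
  <!ELEMENT Foo EMPTY>
  <!-- declare a default value for the name attribute -->
  <!ATTLIST Foo name CDATA "Whizzbang">
]>
<Foo />

A DTD-aware parser reports name="Whizzbang" on <Foo>; a parser that skips the DTD doesn't. One more way the "same" document can read differently.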
My experience has been similar to the immediate parent's, although I think XML looks better than JSON here. All fun and games till you forget to turn off transforms and the ten other knobs in your parser.
The dev will be like "okay what are transforms though?" as they watch new calculator.exe instances popping up on the screen.
It's computer readable and (with highlighting) human readable. The JSON brackets are visually jarring. I feel that the XML is almost self documenting whereas I'd probably need the schema to understand the JSON.
I like JSON for data transfer but for describing documents XML is decent.
MusicXML only works with conventional Western notation, but MNX is attempting to expand that significantly. How far they will get with that, I don't know.
They're totally different things, though the standards are maintained by the same people.
SMuFL is a font layout specification. It solves the longtime problem of "I'm making a music font. Which Unicode code point should I use for a treble clef?" For many years, this was a Wild West situation, and it wasn't possible to swap music fonts because they defined their glyphs in inconsistent ways. This problem is basically solved now, thanks to SMuFL.
MNX is a way of encoding the music itself. It solves the problem of "I have some music notation I want to encode in a semantic format, so it can be analyzed/displayed/exported/imported/etc."
MusicXML is old hat. All the cool kids are using MusicJSON now.
EDIT: I'd like to clarify that I posted this comment, as a joke, before the below comment went on to clarify that there was, in fact, a JSON-based rewrite of the music standard in progress:
I have not had much success using MusicXML to switch between different notation programs. Trying to read a score exported from Musescore as MusicXML in Sibelius or vice versa feels worse than switching between Microsoft Office and other ostensibly compatible formats.
Music notation is incredibly complex, and there are many places things can go wrong. There's a wide spectrum of error situations, such as:
* The exporting application "thinks" about notation in a different way than the importing application (i.e., it has a different mental model).
* MusicXML provides multiple ways of encoding the same musical concept, and some applications don't take the effort to check for all possible scenarios.
* Some applications support a certain type of notation while others don't.
* MusicXML doesn't have a semantic way of encoding certain musical concepts (leading applications to encode them as simple text via the words element, if at all).
* Good ol' fashioned bugs in MusicXML import or export. (Music notation is complex, so it's easy to introduce bugs!)
> MusicXML provides multiple ways of encoding the same musical concept, and some applications don't take the effort to check for all possible scenarios.
This sounded interesting, so I went to the webpage, and found this point specifically called out:
> It prioritizes interchange, meaning: it can be generated unambiguously, it can be parsed unambiguously, it favors one-and-only-one way to express concepts, and multiple programs reading the same MNX file will interpret it the same way.
But I'm curious to see some examples of this. https://w3c.github.io/mnx/docs/comparisons/musicxml/ provides an interesting comparison (and calls out how the same MusicXML can be interpreted in different ways for things like octave shifts), but it would be nice if the page also included alternate ways that MusicXML can represent the same composition and talked about how certain programs end up misinterpreting/misrepresenting them. The Parts comparison, for instance, mentions how you can represent the same thing in two different ways in MusicXML (score-timewise and score-partwise), but only provides an example for one (score-partwise), and doesn't go into much more detail about whether this leads to ambiguity in interpretation or if it's just making things needlessly complex.
Thanks, that's good feedback — will add that to the to-do list.
Just to give you a quick response: look into MusicXML's concept of a "cursor". Parsing a MusicXML document requires you to keep an internal state of a "position", which increments for every note (well, careful, not every note -- not the ones that contain a "chord" subelement!) and can be explicitly moved via the "backup" and "forward" elements: https://w3c.github.io/musicxml/musicxml-reference/elements/f...
For music with multiple voices, this gets easy to mess up. It's also prone to fractional errors in music with tuplets, because sometimes software chooses to use MusicXML position numbers that aren't evenly divisible into the rhythms used in a particular piece of music. That can result in a situation where the MusicXML cursor gets to a state that doesn't actually align with any of the music.
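Roughly, the bookkeeping an importer has to do looks like this (a heavily simplified sketch in TypeScript, nowhere near a real parser):

// Minimal illustration of the MusicXML "cursor". Assumes the measure's
// contents have already been parsed into simple event objects; a real
// importer also tracks voices, staves, ties, tuplet ratios, etc.
type MeasureEvent =
  | { kind: "note"; duration: number; isChordTone: boolean }
  | { kind: "backup"; duration: number }
  | { kind: "forward"; duration: number };

function walkMeasure(events: MeasureEvent[]): number {
  let cursor = 0;  // position within the measure, in divisions
  for (const e of events) {
    if (e.kind === "note") {
      // A <note> containing a <chord> child does NOT advance the cursor;
      // it sounds at the same position as the previous note.
      if (!e.isChordTone) cursor += e.duration;
    } else if (e.kind === "forward") {
      cursor += e.duration;
    } else {
      // <backup> rewinds the cursor, typically to lay down a second voice.
      cursor -= e.duration;
    }
    if (cursor < 0) {
      throw new Error("backup rewound past the start of the measure");
    }
  }
  return cursor;
}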
That sounds like a nightmare to deal with; I'm surprised you don't mention this in the comparison example for multiple voices.
Another suggestion: you highlight the MusicXML elements being discussed in blue, but not the MNX elements. Especially on the longer examples, highlighting the relevant MNX elements would be nice.
I recently used a funny workflow involving MusicXML. I wanted to learn a song that I only had sheet music for, and not being much of a sightsinger, I manually input the sheet music into Vocaloid so I could sing along with it (OCR exists but in my experience is in such a sorry state and requires so many manual fix-ups that for the moment it's easier to type it in manually. As for entering the data, I have experimented and I'm significantly faster and more accurate with a piano roll than typing note names in MuseScore).
Now, as this song had nonsense lyrics and many repetitions and almost-repetitions, the structure of the song didn't quite pop out to me, so what I did was export a MIDI from Vocaloid that I opened in MuseScore. From MuseScore I then exported it as MusicXML. I opened that in Notepad++ for the sole purpose of pretty-printing the XML to normalize the textual representation and saved it right back. I took that and opened it in a Jupyter notebook where I scraped it for <measure> elements with regular expressions and then searched for repeating ones, which I assembled into repeating segments and sub-segments.
This helped me memorize the song.
What I liked about MusicXML was that it was self-documenting enough that I didn't need to reference documentation, and I could find candidates for normalization quite easily (for instance I didn't care about directions of stems or inferred dynamics).
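The scraping itself is only a few lines. Roughly the idea, as a TypeScript sketch rather than a notebook (the filename and the stripped element names are just examples of what one might not care about):

import { readFileSync } from "node:fs";

// Pull out each <measure>...</measure> block, drop details that don't
// matter for spotting structure (the measure number, stems, <direction>
// blocks), collapse whitespace, and report measures whose bodies repeat.
const xml = readFileSync("song.musicxml", "utf8");

const measures = [...xml.matchAll(/<measure\b[\s\S]*?<\/measure>/g)].map((m) =>
  m[0]
    .replace(/^<measure\b[^>]*>/, "<measure>")  // normalize the opening tag
    .replace(/<stem>[\s\S]*?<\/stem>/g, "")
    .replace(/<direction\b[\s\S]*?<\/direction>/g, "")
    .replace(/\s+/g, " ")
);

const groups = new Map<string, number[]>();
measures.forEach((body, i) => {
  const seen = groups.get(body) ?? [];
  seen.push(i + 1);  // 1-based measure indices
  groups.set(body, seen);
});

for (const indices of groups.values()) {
  if (indices.length > 1) {
    console.log("repeated measures:", indices.join(", "));
  }
}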
A gotcha is that MuseScore 4 has a bug where it doesn't show the "MIDI import" panel where you can adjust the duration quantization. This didn't matter to me for this song, but it did bite me once in the past when opening a MIDI from Vocaloid. MuseScore 3 works. Without playing around with that, there can be an issue where it infers 16th notes as staccato 8th notes and similar.
Anyone remembering IEEE 1599? Seems to share a lot of goals.
And there are actually a lot of alternatives, e.g. ABC notation, Alda, Music Macro Language, LilyPond, to name a few. Difficult to decide which one to prefer.
MusicXML seems to be more for notation and sheet music typesetting rather than algorithmic operations on the notes themselves. Sure you could train a model on it but you'd be better off doing it on the specific domain and classically translating up to the XML format.
Right, but sheet music is ubiquitous in countless musical contexts and there's very little attention to it from the ML side. Sheet music is somewhat arduous to create and there is definitely room for a lot of automation and ML could help out a lot. I experimented with a tokenizer / GPT-2 (decoder-only) model for MusicXML (https://github.com/jsphweid/xamil) that is able to generate single staff music somewhat coherently. But it's just a first step and I don't care about generating (EDIT: hallucinated) music. Ideally we could add an encoder part to a model like this that takes in MIDI tokens and spits out sheet music. But I haven't gotten that far and don't have the ML chops to do it at this time. But it shouldn't be impossible.
For now, between the state of the art source separation models (e.g. demucs) and transcription models (e.g. Magenta's MT3) the last mile seems to be MIDI -> MusicXML IMO. But yes, I suspect it'll become more end-to-end ML in time.
Note that MIDI is a lot more effective when it comes to ML/AI, since it's multiple orders of magnitude less data. Daniel D. Johnson's (formerly known as Hexahedria, hired by Google Brain) model biaxial-rnn-music-composition is from 2015, requires very few resources for training or inference, and still delivers compelling, SOTA-or-close results wrt. improvising ("noodling") classical piano. https://github.com/danieldjohnson/biaxial-rnn-music-composit... You may also want to check out user kpister's recent port to Python 3.x and aesara: https://github.com/kpister/biaxial-rnn-music-composition (Hat tip: https://news.ycombinator.com/item?id=30328593 ).
Music generation from notation is pretty much the MNIST-style toy-scale equivalent for sequence/language learning models; it's surprising that there's so little attention being paid to it despite how easy it is to get started with.
MIDI is absolutely horrible for ML. It lacks very necessary information such as articulation etc which are important to make sense of music. It's popular because it's simple but there is no way to understand music by just looking at MIDI.
I'm a hobbyist in this space (a composer myself as well as a software engineer) and currently all tools are very poor. MusicXML is better than MIDI. MEI [1] is better than MusicXML, etc.
The problem is there is a minuscule amount of effort and money spent on this field because music overall makes peanuts. It really doesn't justify training expensive ML models, which is unfortunate.
> MIDI is absolutely horrible for ML. It lacks very necessary information such as articulation etc which are important to make sense of music.
This depends enormously on the instrument. Consider someone playing a piece live on a keyboard: we can keep a MIDI recording of that and we've captured everything about their performance that the audience hears.
It depends what you're trying to do. If you're trying to generate sheet music that's pretty to look at and easily understandable to a performer, then yes obviously it's not enough. If you want notes that will actually sound good when played back, it's hard to beat it.
Are you aware of the system I linked above? D.D. Johnson has a blogpost https://www.danieldjohnson.com/2015/08/03/composing-music-wi... with plenty of examples of what an instance of his model can generate. It may not be all that "good" in an absolute sense, but it's at least musically interesting, the opposite of elevator music. (There's also a proprietary model/AI called AIVA about which very little is known, but it does seem to be bona-fide AI output - albeit released in versions that have been orchestrated by humans - based on what it sounds like.)
It sounds like randomly generated MIDI... Doesn't sound like anything to me at all.
Music is very subjective but I've so far seen no model that's convincing. If you like it that's cool I suppose. I personally use algorithmic composing plenty in my own compositions (I write music for piano) and these kind of models don't do it for me. They're definitely tools, you can use them like ChatGPT to get a sense of things but we're decades away from producing "music" this way imho.
Well yes, this model like others is quite far from giving you a finished piece. But if it's giving you "a sense" of possibly useful ideas, that's enough to make it more than "random". (Besides, I'm not sure that we would even want AI to produce music on its own with zero human input - what would be the point of that? So "just noodling around, giving you some ideas to get started with" is quite good as far as it goes.)
My point is that -- in my experience and musical taste -- deterministic algorithms (e.g. literal scripts you write yourself to generate MusicXML, MIDI, lilypond, PDF etc) are orders of magnitude more useful than these neural network ML models that give you monolithic chunks of music. You can still use NN models in your scripts (e.g. have a model to determine chord distance, tonality etc) but there is no universal musical model that has come remotely close to convincing me so far. Of course, when it comes to Western classical music, "counterpoint" is probably the closest you can get to a universal musical model that you'll attempt to find in all pieces of music, but even that comes nowhere near even remotely close to explaining great majority of musical statements (even in something as contrapuntal as Bach). Especially when we come to 20th and 21st century music when people actively react to this model.
Yes it depends on what you're trying to do. If you're looking for something to automate part of your composition and make it an "algorithmic" piece where the computer picks the notes, these models are just too limited for that, at least so far.
BTW, "counterpoint" generally refers to one facet of how Western music works, the process of setting "note against note" in musical lines (or "voices") that preserve some kind of autonomy. But there's many other things that explain what makes music sound good, both within a single line and on a broader view, where music is written to target "rest points" or "key areas", and repeat or develop "thematic" material.
(The model I pointed to above doesn't even try in the least to explore these broader-scale things, it's trained on a very small-scale view of its input. It deals pretty well with counterpoint, and the inner workings of a single line. It ends up doing interesting things nonetheless when trying to make sense of its music as it randomly drifts out of the established scale - ISTM that it sometimes ends up changing key as a result of alterations in melody, not always in the background harmony. One could see this as a kind of very light and subtle atonality, even as part of what's clearly a 'tonal' style. It also knows about different historical styles within tonal music, and manages to overlay and transition between them quite well IMHO.)