I'm so excited and curious about this that I can't even structure my thoughts well. They're just flooding my brain.
What part of this is pre-coded? What part is being generated? Is the goal to give a program some sheet music (maybe a MIDI file) and it figures out the fingerings[1] and then translates the fingerings into kinematics?
Because if that's the goal... Holy forking shirtballs that would be amazing. One of the trickiest things for me as a novice pianist is figuring out the fingerings to a piece. It's like a puzzle you work at until you've figured out what's comfortable. It's all about lookahead. "This section generally goes down so I probably want to begin with my pinky and not my thumb."
And if it got really good at that, not only are the fingerings useful, but maybe we could get feedback on how physically demanding a piece is. Another challenge I've discovered as a novice is that it can be surprisingly tricky to look at sheet music or hear a piece and determine if it's as easy as it sounds. Some pieces require some very complex fingering.
[1] what pianists call the determination of what fingers go where, not just to play certain notes together, but to ensure you can fluidly and comfortably play the next notes as well.
The demo you are watching is an agent trained from scratch with reinforcement learning. It has roughly 6 days of experience (10M steps at 20 Hz). The JavaScript demo replays the policy open-loop, which is why it's not very robust to disturbances.
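As a quick sanity check on the "roughly 6 days" figure, here is the arithmetic spelled out. The only inputs are the two numbers quoted above; the variable names are just for illustration.

```python
# Back-of-the-envelope check of the "roughly 6 days" of experience.
STEPS = 10_000_000   # environment steps, as quoted in the comment
CONTROL_HZ = 20      # control frequency, as quoted in the comment

seconds = STEPS / CONTROL_HZ
days = seconds / (60 * 60 * 24)
print(f"{seconds:,.0f} s = {days:.2f} days")
```

So 10M steps at 20 Hz is 500,000 simulated seconds, a bit under 6 days.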
Re:fingering: we actually use fingering information to create a dense reward for the agent (otherwise it makes exploration super hard). It would be an exciting future direction to have the agent discover and optimize for fingering that best suits its kinematics :) And beyond that, having RL inform pianists about the difficulty of a piece or even more optimal fingering would be amazing.
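To make the "dense reward from fingering" idea concrete, here is a minimal sketch of one way such a term could look. This is an assumption for illustration, not the formula from the paper: the function names, the exponential shape, and the `scale` value are all hypothetical. The point is only that the agent gets a smooth signal as the assigned fingertip approaches its key, instead of a reward only when the key is actually pressed.

```python
import math

def fingering_reward(fingertip_xyz, key_xyz, scale=0.01):
    """Hypothetical dense term in (0, 1]: 1.0 when the assigned fingertip
    sits on its target key, decaying smoothly with distance (meters)."""
    distance = math.dist(fingertip_xyz, key_xyz)
    return math.exp(-distance / scale)

def total_fingering_reward(assignments):
    """Average the per-finger term over (fingertip, key) position pairs
    for the notes that should be played at this timestep."""
    if not assignments:
        return 0.0
    return sum(fingering_reward(f, k) for f, k in assignments) / len(assignments)
```

A fingertip 1 cm from its key scores higher than one 5 cm away, so random exploration is nudged toward the right keys long before any note sounds.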
We trained a bunch of these policies on roughly 150 songs (baroque, romantic, classical) and we did some analysis in the paper if you're interested: https://kzakka.com/robopianist/robopianist.pdf
There are two motions in particular that pianists use constantly that don't seem to be represented in the robot model, if you're looking to get closer to the way that human limbs and digits operate. (Naturally there are plenty of other goals, but if you can imitate human playing you can do things like suggest fingerings or assess difficulty, as you say.)
1) turning at the elbow (so that your forearm can make an angle with the piano keyboard instead of always being perpendicular to it). It looks like you translate the forearm back and forth instead, which I assume must be a lot easier to handle, but of course it's not how human arms work.
2) rotating the forearm/wrist (like turning a doorknob). Pianists do this on basically every note to a greater or lesser extent. To take an extreme example, if you alternate notes with your thumb and pinky you are almost completely using your wrist and not your fingers. Without this degree of freedom it is not really possible to emulate a competent pianist, if that is one of the eventual goals.
This is insanely impressive.
For fingering, in the right hand I typically put my pinky on the highest note of a phrase; it feels more comfortable and you can accent it more than the middle fingers. In the left hand I typically put the bass note in the pinky as well. The middle fingers aren't as dextrous so I use them less, though a concert pianist could probably use your fingering.
Technique-wise, humans cup their hands more: the palm is arched where the robot's is flat. But who says it needs to model humans exactly?
I can't believe this is working in three.js! Amazing work!
Nice project! Anyway, one of the unrealistic details is that the robot in the simulation curls the fingers when it is not using them. In particular the pinky finger. Can that be fixed in a future version? For comparison, I got this as the first result in Google https://www.youtube.com/watch?v=cGYyOY4XaFs
It's also strange that all fingers are always parallel, but I guess that adding that freedom makes the search space huge.
I don't think the intention of this simulation is to be realistic. This particular agent just learned to play the music it was reinforced to learn given the physics constraints programmed for the hand mechanics (as far as I understand it). I doubt the physics emulate our human hands very accurately so I wouldn't expect it to be "realistic" or something that needs to be "fixed" unless the specific intention was to optimize actual human hand movements.
Yup, we're not trying to mimic human movements exactly but rather optimizing for the reward given the robot hardware. Fun fact: we add things like an energy penalty to reduce jitteriness and un-human-like movements, and it helps enormously.
I understand that the research objective is not human-like movement, but I think changing the rewards to keep the fingers straight would make for nicer videos to show, and I don't expect it to be too hard.
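For readers unfamiliar with the term, an energy penalty is usually a small negative reward proportional to the mechanical power the actuators spend. The sketch below is a generic version under that assumption; the function names and the coefficient are hypothetical and not taken from the project.

```python
def energy_penalty(torques, joint_velocities, coef=5e-3):
    """Hypothetical penalty: negative, proportional to the summed
    |torque * velocity| (mechanical power) across actuated joints."""
    power = sum(abs(t * v) for t, v in zip(torques, joint_velocities))
    return -coef * power

def shaped_reward(task_reward, torques, joint_velocities, coef=5e-3):
    """Task reward (hitting the right keys) plus the energy penalty."""
    return task_reward + energy_penalty(torques, joint_velocities, coef)
```

With a term like this, two motions that press the same keys are no longer tied: the one that flails less scores higher, which tends to suppress jitter.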
Another question: The pinky finger is not shorter than the other fingers. Can it be a problem for the robot to use the human fingering?
Fingering is harder than it seems, especially once you start to take speed into account: fingerings that work when playing slowly may not work when playing fast. And individuals have different hand spans, so a fingering that works for one person may not work for another.
If you crack this in a deterministic way it would be super useful as a library.
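For a sense of what a deterministic approach could look like, here is a toy dynamic-programming sketch for a single right-hand melodic line. Everything in it is an assumption made for illustration: the cost function (neighboring fingers comfortably span about two semitones, and reusing a finger on a different note is penalized) is invented, and it ignores speed, hand span, chords, and thumb crossings, which are exactly the hard parts mentioned above.

```python
def transition_cost(p1, f1, p2, f2):
    """Toy right-hand cost between (pitch, finger) pairs. Pitches are MIDI
    numbers, fingers are 1 (thumb) to 5 (pinky)."""
    stretch = abs((p2 - p1) - 2 * (f2 - f1))   # assumes ~2 semitones per finger
    repeat = 4 if (f1 == f2 and p1 != p2) else 0
    return stretch + repeat

def best_fingering(pitches):
    """Return the minimum-cost finger sequence under transition_cost."""
    if not pitches:
        return []
    fingers = range(1, 6)
    cost = {f: 0 for f in fingers}   # cheapest cost so far, ending on finger f
    back = []                        # back-pointers, one dict per transition
    for p1, p2 in zip(pitches, pitches[1:]):
        new_cost, choice = {}, {}
        for f2 in fingers:
            prev = min(fingers, key=lambda f1: cost[f1] + transition_cost(p1, f1, p2, f2))
            new_cost[f2] = cost[prev] + transition_cost(p1, prev, p2, f2)
            choice[f2] = prev
        cost = new_cost
        back.append(choice)
    finger = min(fingers, key=lambda f: cost[f])
    path = [finger]
    for choice in reversed(back):    # walk the back-pointers to the first note
        finger = choice[finger]
        path.append(finger)
    return path[::-1]

print(best_fingering([60, 62, 64, 65, 67]))  # C D E F G
```

Because the whole phrase is scored before any finger is committed, this captures the "lookahead" aspect of fingering; a useful library would need a far richer, per-player cost model.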
Piano fingering is a very subjective matter, because every hand, finger, arm and taste is different. I doubt that a robot fingering is of any use. You have to find it out for yourself under guidance of a teacher and maybe inspiration by fingerings of master pianists written in the score. The Henle app has a feature to show them separately. They differ vastly for the same piece.
My taste, my pianistic abilities, my arm, shoulder and body movements, and my musical idea for the interpretation and imagination of the sound of the piece cannot be seen in a photo of my hand. Not even a good teacher can give you a finished, perfect fingering, and I doubt that a dumb machine can do any better, especially if it plays as badly as the examples sound in comparison to even a beginner human. You have to find the fingering out for yourself, and this is part of the great joy of playing / learning piano.
What if the AI gets to see you playing sample pieces or even if you have say a weighted keyboard that it gets to see the forces you can apply? I mean I've seen the Australian Sports Academy do all sorts of video and biomechanical instrumentation of elite athletes with the aim of improving their technique and provide a customised training regime. I can't see why it can't be used in music performance which in many ways is just as much athleticism as it is art.
Given sufficient information about your personal biomechanics, a sufficiently advanced AI may be able to suggest the right fingering for you. But the main problem with this (and I think this is something that many in this thread simply don't have the familiarity with) is that fingering is simply a 2D projection of the multiple-dimensional problem of how you need to move your entire upper body to play a piece.
I'll use a piece that I am practicing right now to illustrate: Chopin Etude Op 25 #1 ("Aeolian Harp").
For intermediate pianists (perhaps even beginners), the possible fingerings are actually really obvious just from looking at the notes, especially when using the suggested fingerings as a guide. This piece is structured around playing broken chords in circles, so there aren't really any fingering tricks here.
Notice the chords like the first right hand chord on the second bar on the second line, or the simpler left hand 4-note chords on the third line of the second page. Despite the obvious fingerings, somehow you need to figure out how to play a broken chord that spans 15 keys, a distance that no one can comfortably cover by just stretching thumb and pinky. And that is because this piece is a study in the circular motion of the wrist (and really, the entire arm). If you do not realize this and simply try to stretch your fingers to go from key to key, not only will that limit your ability to increase your speed, but it will build tension in your wrist as you go through the piece and eventually lead to injury. Not to mention that it really hurts to stretch your fingers with a static wrist.
(In my Jan Ekier edition of this piece, some of these ~15 key chords have two fingering suggestions that you can play with to decide which one you prefer.)
It may eventually be solvable, but this is a multiple dimensional problem, and a useful AI for this will need to give you a solution in multiple dimensions. If an AI can teach me all the motions of Chopin's etudes and allow me to just think about how to voice these pieces, maybe I won't need a teacher anymore.
Most of the information a pianist uses for fingering is the feeling of the hand and the sound of the piano while playing. This is not visible. Eyes are of little use for musicians. There are blind master pianists.
Hi author here! The app should work on mobile/desktop and was tested on both Safari and Chrome. I've heard it's buggy for some people (unclear if it's an older hardware problem) but you can try this embedded demo which works better: https://kzakka.com/robopianist/#demo
I had been playing with the idea of creating a browser-based virtual piano for when I'm travelling and don't have access to a real piano but have my laptop with me. The idea would be to point the webcam down at the table between me and the laptop, and play on the table as if a piano were there. Then use the mediapipe framework [1] to capture finger positions, and use those to update a virtual environment like the one you have here.
I put it on hold due to the significant engineering required, but it seems you have already implemented (and open sourced!) the browser-based piano simulation component.
A quick scan through your repo indicates that this is all implemented in Python. I see that you are using mujoco_wasm [2]. Can you please comment on what is required to compile your project to work in the browser?
That sounds like a great idea. I think you should go ahead and build it, because this project is quite different. And mediapipe is surely the way to go for your idea, since its hand tracking is quite robust.
Pretty interesting. Just a heads-up: On my system, in desktop Safari, the simulation stalled almost immediately. Upon reload, the fingers moved but no sound came out. After I turned up the volume, I opened the control panel and suddenly the sound started working and blasted me out. Beware!
This is really freakin cool ! As a mechatronics engineer, this is the sort of thing I imagined I would be doing. This really makes me want to dive deep into RL
And while the refinements in movement finesse and control are obviously needed, out of ignorance: what other abilities will this allow?
I assume that it will allow for much finer touch control of hands/digits and, coupled with sensors as they evolve, make much finer crafts possible, such as embroidery?
Uncaught Error: buffer is either not set or not loaded
ti https://unpkg.com/tone@14.7.77:1
start https://unpkg.com/tone@14.7.77:21
triggerAttack https://unpkg.com/tone@14.7.77:21
triggerAttack https://unpkg.com/tone@14.7.77:21
processPianoState https://kevinzakka.github.io/robopianist-demo/examples/main.js:174
render https://kevinzakka.github.io/robopianist-demo/examples/main.js:255
onAnimationFrame https://kevinzakka.github.io/robopianist-demo/node_modules/three/build/three.module.js:27951
onAnimationFrame https://kevinzakka.github.io/robopianist-demo/node_modules/three/build/three.module.js:12661
Indeed, browsers are pretty aggressive about not autoplaying sound, you need to interact with the screen (mouse click or finger tap + make sure your sound is up / ringer not on silent).
Oh dear, it's like listening to a 12 year old practicing before their lesson. But hey! Well done Robot for the achievement. I bet in no time you'll be https://youtu.be/e37NxUtFQSo. Good luck with the training!
They claim to be able to generate fingerings and animations from raw audio. The video demos give an idea of the output, but they seem to be shutting down.
I don't really have a background in ML or anything, but how generalizable is this? Is the goal to have a trained model for each specific song using this framework? It's pretty amazing, I love seeing anything robotics related in the browser.
Bug: if the WebGL is too slow (e.g. Haswell IGPU at 4K resolution), the song stalls completely. It should still play with reduced fps for rendering. Resizing the window down lets it proceed. Maximizing it again stalls again.
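One common way to get the behavior asked for here is to drive playback from wall-clock time rather than from rendered frames, so that a slow renderer skips frames instead of stalling the song. The helper below is a generic sketch of that idea, not a description of how the demo's loop is written; the function name, the 20 Hz default (taken from the control rate quoted earlier in the thread), and the catch-up cap are all assumptions.

```python
def steps_due(elapsed_s, steps_done, control_hz=20, max_catch_up=5):
    """Hypothetical helper: how many playback steps to advance on this
    rendered frame so the song tracks elapsed wall-clock time.

    The cap keeps one very slow frame from triggering an unbounded burst
    of catch-up work; the remainder is made up on later frames."""
    target = int(elapsed_s * control_hz)
    return max(0, min(target - steps_done, max_catch_up))
```

Each frame would call this with the time since playback started, advance that many steps, and then draw once, so audio keeps moving even at a few frames per second.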
We have the sustain pedal implemented in the standalone MuJoCo simulation, e.g. see https://www.youtube.com/watch?v=VBFn_Gg0yD8. I just couldn't figure out how to do it with Tone.js :(
It seems to play with the tops of the fingers rather than the pads of the fingertips at times. In 30 years of playing, I never even thought of trying that. I'm kind of amazed.
This is cool, but thank you for introducing me to mujoco. This could be exactly the library I've been looking for when it comes to simulating a fencing engine.
I think it's because the 38 muscles they have given their robot hand don't map perfectly with real muscles in a real hand.
In particular, there seems to be not much ability to move each finger left or right. For example, actuator 14 lets the baby finger swing outwards - but now try to stretch your baby finger out, and you see it can swing much further than this robot finger can swing out, and the pivot is closer to your wrist, allowing more reach.
From their whitepaper: "Fingerings in the dataset were provided by experienced pianists who graduated from a music college or who had played the piano for more than twenty years. The pianists were asked to choose pieces that they could play and provided the fingering that they had actually used for the performance."