Hatsune Miku has "performed at her concerts onstage as an animated projection".
Visually, it was totally convincing from the center of the floor at mid-distance. Other characters performed too, with costume changes, special effects, etc. It's a 2D effect, but at concert distances you can't really tell, and the lighting of the character convinces your eyes.
There was one point where she put her foot up on a monitor speaker and my mind was blown until I realized that the monitor was part of the hologram too. It had been sitting there by the floor for the whole song waiting to have her put her foot on it.
But what if it was trained on a dozen top pop stars? A hundred? A thousand? At what point does the resulting voice no longer belong to a human in legal terms?
My guess is we'll find out relatively quickly as pop music is willing to stretch IP pretty far in pursuit of a hit song.
This is comfortably on its way out of the uncanny valley. Very impressive.
Edit: the only thing that sticks out as being a little off is the pitch/intonation envelope. Some of the pitches are off the mark, and some of the glides between notes aren't quite what a human would do. The vocal tone is near perfect.
Pitch should be the easiest thing to fix. I wonder if that's an artefact of the training set.
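To make the "easiest thing to fix" point concrete, here is a rough sketch of the crudest possible pitch correction: snapping a detected frequency to the nearest note of 12-tone equal temperament. This is not what the paper does; the function names are made up for illustration, and the A4 = 440 Hz reference is an assumption.

```python
import math

A4_HZ = 440.0  # assumed reference tuning


def semitones_from_a4(freq_hz: float) -> float:
    """Distance from A4 in (fractional) equal-tempered semitones."""
    return 12 * math.log2(freq_hz / A4_HZ)


def snap_to_semitone(freq_hz: float) -> float:
    """Return the nearest 12-tone equal-temperament pitch to freq_hz."""
    nearest = round(semitones_from_a4(freq_hz))
    return A4_HZ * 2 ** (nearest / 12)


def cents_off(freq_hz: float) -> float:
    """How far freq_hz sits from the nearest semitone, in cents."""
    semis = semitones_from_a4(freq_hz)
    return 100 * (semis - round(semis))


if __name__ == "__main__":
    for f in (438.0, 445.0, 455.0):
        print(f, round(snap_to_semitone(f), 2), round(cents_off(f), 1))
```

Fixing the note centers this way is trivial. The glides between notes are the harder part, since hard-snapping every frame would flatten exactly the transitions and vibrato that make a voice sound human.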
Vocaloid has been around for over 13 years:
"In the initial evaluation of our system, we use three voices; one English male and female (M1, F1), and one Spanish female (F2). The recordings consist of short sentences sung at a single pitch and an approximately constant cadence. The sentences were selected to favor high diphone coverage. For the Spanish dataset there are 123 sentences, for the English datasets 524 sentences (approx. 16 and 35 minutes respectively, including silences). Note that these datasets are very small compared to the datasets typically used to train TTS systems, but this is a realistic constraint given the difficulty and cost of recording a professional singer."
Just because a singer is professional doesn't mean they're any good. My wife copes with adversity by singing and she can sing "fuck fuck fuck shit shit shit" in soprano, on key, from the kitchen. The only thing keeping her from singing in public is her stage fright.
There are a /lot/ of people like her who would answer an ad in the newspaper (or on Craigslist) and /volunteer/ to contribute to a geeky project, as long as they got credit in the paper.
At that point, the largest non-tech cost winds up being the studio rental fee, if you have one.
Here is their paper: https://arxiv.org/pdf/1704.03809.pdf