
In the first clip, I'd say 80% of the soundbites were obviously robot-like, but one or two of the "Obama" quotes were startlingly clear - "The good news is, that they will offer the technology to anyone" - I can't hear anything wrong with that in the first clip at all. If they were all that quality I'd say we'd be easily fooled. As a proof of concept this is pretty big.



I can definitely hear issues with that phrase. It has quite robotic drop-offs.

Though coming soon: Neural networks to determine whether speech is NN-generated? :P


> Neural networks to determine whether speech is NN-generated?

I guess this would be an ideal use case for a generative adversarial network based approach.


This is likely part of how the speech-generating NNs are trained (i.e., there's a generated-speech detector, and the generator is trained to fool it while the detector itself is trained at the same time): https://arxiv.org/abs/1406.2661


And then the generators train their own generation NN against that :P


And you have invented Generative Adversarial Networks. They are the basis of a lot of recent ML results, like pix2pix.
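For anyone who wants to see the detector-vs-generator loop from this sub-thread in code, here is a minimal sketch. It is a toy on 1-D numbers, not speech, and it is not how any particular speech system is trained: the linear generator, the logistic detector, the function name `train_toy_gan`, and all hyperparameters are assumptions chosen for illustration. The gradients are derived by hand from the standard GAN losses, so it only needs the standard library.

```python
import math
import random


def sigmoid(s: float) -> float:
    # Clamp the logit so math.exp cannot overflow.
    s = max(-60.0, min(60.0, s))
    return 1.0 / (1.0 + math.exp(-s))


def train_toy_gan(steps: int = 2000, lr: float = 0.01,
                  real_mean: float = 4.0, seed: int = 0):
    """Alternate detector and generator updates on 1-D data.

    Generator:  x_fake = a * z + b          (z ~ N(0, 1))
    Detector:   D(x) = sigmoid(w * x + c),  the estimated P(x is real)
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters (hypothetical toy model)
    w, c = 0.0, 0.0  # detector parameters (hypothetical toy model)

    for _ in range(steps):
        x_real = rng.gauss(real_mean, 1.0)  # stand-in for a real sample
        z = rng.gauss(0.0, 1.0)
        x_fake = a * z + b

        # Detector step: minimise -[log D(x_real) + log(1 - D(x_fake))].
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        w -= lr * (-(1.0 - d_real) * x_real + d_fake * x_fake)
        c -= lr * (-(1.0 - d_real) + d_fake)

        # Generator step: minimise -log D(x_fake), i.e. try to fool
        # the freshly updated detector.
        d_fake = sigmoid(w * x_fake + c)
        a -= lr * (-(1.0 - d_fake) * w * z)
        b -= lr * (-(1.0 - d_fake) * w)

    return (a, b), (w, c)


(gen_a, gen_b), (det_w, det_c) = train_toy_gan()
print(f"generator: a={gen_a:.3f}, b={gen_b:.3f}")
print(f"detector:  w={det_w:.3f}, c={det_c:.3f}")
```

Each side's update uses the other side's current output, which is the whole trick. A toy this small can oscillate instead of settling, so treat the printed numbers as a demonstration of the loop rather than of convergence.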


In a sort of Turing Test where I don't know who's a robot, or where I'm not even expecting a robot, it would probably be a bit harder.


The "Obama" material sounded quite good, but reverb can cover up a multitude of sins...


Huh.

That means that recordings of speeches/performances from concert halls are potentially suspicious.

Know of any other instances of reverb covering things up? This is interesting!


> "potentially suspicious"

History is now doomed. Crackly recordings are obviously fakeable. Children will listen to JFK's "We shall not go to the moon" speech, proof that the moon landings are a liberal conspiracy and all that grainy footage is just CGI with a noise filter.


This idea hit me harder than I expected -- I am reminded of the scene in Interstellar, where the main character's daughter's teacher asserts that we never went to the moon. I don't recall whether she genuinely didn't believe it, or whether she felt it was better to lie to the kids to keep them motivated in the present, but apparently we are getting much closer to having gatekeepers of knowledge be able to actively subvert the artifacts of history in ways which are indistinguishable to the audience. Scary stuff.


> I don't recall whether she genuinely didn't believe it, or whether she felt it was better to lie to the kids to keep them motivated in the present

She stated it as a simple fact and seemed to believe it. There wasn't any indication that she might think it was a fact of what one might call "political convenience". It made the scene just that much more chilling to me: https://www.youtube.com/watch?v=MpKUBHz6MB4


This is nothing new. "History is written by the winners" comes to mind.


Wasn't it the main character's teacher?


There have always been gatekeepers to knowledge. Technology has in many ways blasted down these gates for many.


Yeah, that's a real issue. You can already fake audio very easily. People don't realize how constructed movie soundtracks are, and I don't mean in terms of special effects but just people talking in everyday situations.


I haven't worked extensively in voice adaptation, but I learned working in text-to-speech that adding a bit of reverb is quite effective at covering up artifacts.

Something similar seems to be going on in live vocals. If you lack confidence in your own voice, adding a bit of reverb can make it sound much better. Not sure what's going on: maybe the reverb jams the critical listening faculties in one's brain, or something like that.
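To make "a bit of reverb" concrete, here is a rough sketch of the simplest reverb-like effect, a single feedback comb filter mixed in at low level. Real reverbs are far more elaborate (many delay lines, all-pass stages, filtering), so this is a stand-in under stated assumptions: the function name `comb_reverb` and the parameter values are made up for illustration. It shows how a hard cut to silence turns into a decaying tail, which is the kind of edge-softening that can hide artifacts.

```python
def comb_reverb(signal, delay, feedback=0.5, mix=0.3, tail=0):
    """Mix a dry signal with a feedback comb filter (a crude reverb).

    wet[n] = dry[n] + feedback * wet[n - delay]
    out[n] = (1 - mix) * dry[n] + mix * wet[n]
    `tail` extra samples are appended so the decay can ring out.
    """
    n = len(signal) + tail
    dry = list(signal) + [0.0] * tail
    wet = [0.0] * n
    for i in range(n):
        echo = wet[i - delay] if i >= delay else 0.0
        wet[i] = dry[i] + feedback * echo
    return [(1.0 - mix) * d + mix * w for d, w in zip(dry, wet)]


# A "word" that stops abruptly: full level, then a hard cut to silence.
abrupt = [1.0, 1.0, 1.0, 1.0]
softened = comb_reverb(abrupt, delay=2, feedback=0.5, mix=0.3, tail=6)
print([round(s, 3) for s in softened])
```

With `mix` kept low the dry signal still dominates, and the samples after the cut fade out gradually instead of dropping straight to zero.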


> Something similar seems to be going on in live vocals. If you lack confidence in your own voice, adding a bit of reverb can make it sound much better.

a.k.a. why your singing sounds way better in the shower.


Wow. TIL x 2!

Now I understand why I liked adding a bit of reverb when listening to old MOD/IT/S3M audio files - it covered up the "digitalness" of the song structure a bit.

Thanks for the live vocals tidbit too, that's definitely something to file away.

I wonder how far you could push that in a presentational context (ie, when giving speeches), or whether "who left the speakers in 'dramatic cathedral' mode" would happen before "I dunno what they did to the audio but it sounds great". Maybe if the presentation area was fairly open/large it could work; the question is whether it would have a constructive effect.


In my (admittedly limited) experience mixing for both recording and live settings, I can say that the "sounds great" comes a long way before the "dramatic cathedral mode". If you can listen to it and hear reverb (unless you're going for that effect) you're doing it wrong. What you want is a bit of fullness, slightly softer edges at the end of words/sentences.

It's similar to the difference between 24fps cinema and 60fps home video. The video/clean signal retains more of the original information and is "more correct", but 24fps/touch of reverb adds a nuance that keeps things from getting too clinical. As to why we interpret clean signal == clinical == bad.... I can't really speculate.


Brains are incredible differential engines not evolved to handle current technology.

A standard-quality video is just a projection; a high-quality video stream on a 4K set running at a full 122Hz is a weird window from which we don't get stereo depth cues. The brain constantly has to re-conclude that it's not real, because the POV doesn't adjust as we shift our head.


I tried an Oculus a while ago and found it to be quite unrealistic, digital and "fake." (And I only had it on for about 30 seconds but my eyes felt a bit sore afterwards!)

Once LCD density allows for VR with 4K (or, if needed, 8K) per eye... yeah :) we'll firmly be in the virtual reality revolution.

Obviously we'll also need tracking and rendering that can keep up but display density is one of the trickier problems right now.


Interesting.

> If you can listen to it and hear reverb (unless you're going for that effect) you're doing it wrong.

I was thinking precisely that; I figured it'd need to be subtle and just-above-subliminal to have the most effect.

Completely agree about the 24fps-vs-60fps thing. I think this is a combination of both the fact that the lower framerate is less visual stimulation, and that I'm used to both the decreased visual stress and the overall more jittery aesthetic of 24fps.

Regarding >24fps, I think how it's used is critical.

I remember noticing https://imgur.com/gallery/2j98Y4e/comment/994755017/1 (yes, a random imgur gif - discovering that imgur doesn't have an FPS limit was nice though). I think this particular example pushes the aesthetics ever so slightly, but still looks pretty good.

I don't know where I found it but I remember watching a 48fps example clip of The Hobbit some time back. That looked really nice; I completely agree 48fps is a great target that still retains the almost-imperceptible jitter associated with 24fps playback.

To me the "nope"/sad end of the spectrum is motion smoothing. I happened to notice a TV running some or other animated movie with motion smoothing on while in an electronics store a few months ago... eughhh. It made an already artificial-enough video (I think it was Monsters University) look eye-numbingly fake (particularly because the algorithm couldn't make its mind up about how much to smooth the video as it played, so some of it was jittery and some of it was butter-smooth). I honestly hope the idea doesn't catch on; it'll ruin kids and doom us to having to put up with utterly unrealistic games.

But I can see that's the direction we're headed in: 144Hz LCD panels are already a thing, and VR has a ton of backing behind it, so it makes a lot of sense that VR will go >200Hz over (if not within) the next 5 or so years.

The utterly annoying thing is that raising the framerates this high almost completely removes the render latency margins devs can currently play with. Rock-steady 60fps (with few drops below ~40fps) is hard enough but manageable on reasonable settings in most games nowadays (I think?). But when everyone seriously starts pining for 144fps+ at 4K, it's going to get a lot harder to keep the framerate consistent. Now that we've hit ~4GHz, Moore's law won't allow breathing room for architectural overhead as has been the case for the past ~decade, and with current system designs (looking holistically at CPU, memory, GPU, system bus, game engine) we're already pushing everything pretty hard to get what we have.
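The shrinking margin is easy to put numbers on: the time available to produce one frame is just 1000 ms divided by the frame rate. A quick sketch (the helper name `frame_budget_ms` is made up for illustration):

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to render one frame, in milliseconds."""
    return 1000.0 / fps


for fps in (24, 40, 60, 144):
    print(f"{fps:>3} fps -> {frame_budget_ms(fps):5.2f} ms per frame")
```

Going from 60fps to 144fps cuts the per-frame budget from roughly 16.7 ms to roughly 6.9 ms, so everything (simulation, draw calls, GPU work) has to fit in well under half the time.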

So that problem will need to be solved before 144fps+ becomes a reality. A friend who has a 144Hz LCD says that going back to 60Hz just for desktop usage is really hard because the mouse is more responsive and everything just "feels" faster and more fluid. I'm not quite sure whether the games he plays keep up with 144fps though :P

On a separate note, I've never been able to make the current crop of 3D games "work" for my brain - everyone's pushing for more realism, more fluidity, etc etc, and it just drives things further and further into the uncanny valley for me, because realtime-rendered graphics still look terribly fake. Give me something glitchy and unrealistic in some way any day.


I think what it does is mask/smear the fine detail - the texture of the sound - but in a way that we are used to, so still sounds natural.


Adding a bit of autotune has made whole careers...


Also - lowering the bit-rate can cover up other defects (e.g. phone call).
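A rough way to see the masking effect: coarse quantization throws away small differences, so a subtle artifact can vanish entirely. This sketch swaps in plain bit-depth reduction as a stand-in for a low bit-rate channel (real phone codecs also band-limit and compress, which is not modelled here); the `quantize` helper and the sample values are made up for illustration.

```python
def quantize(samples, bits):
    """Round samples in [-1, 1] to a grid with 2**(bits - 1) steps per unit."""
    levels = 2 ** (bits - 1)
    return [round(s * levels) / levels for s in samples]


clean = [0.0, 0.5, -0.5, 0.25]
# Same signal with a small synthetic "artifact" added to every sample.
glitchy = [s + 0.01 for s in clean]

# At 16 bits the artifact survives; at 4 bits both signals collapse
# onto the same coarse grid and become indistinguishable.
print(quantize(clean, 16) == quantize(glitchy, 16))
print(quantize(clean, 4) == quantize(glitchy, 4))
```

The flip side is that the coarse version is worse overall; the listener just has fewer fine details left to judge it by.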

The cadence was a bit off/unnatural, but I'm sure that is not too hard to fix. Phone-in TV/Radio/web shows are about to get very interesting.



