I did see that, though my interpretation is that breathing is included in its voice tokenizer, which helps it understand emotions in speech (the AI can generate breath sounds, after all). Other sounds, like bird songs or engine noises, may not work, but I could be wrong.
I suspect that, like images and video, their audio system is or will become more general-purpose. For example, it can generate the sound of coins falling onto a table.
Allegedly Google Assistant can do the "humming" one, but I have never gotten it to work. I wish it would, because sometimes I have a song stuck in my head that I know is sampled from another song.
I could be wrong but I haven't seen any non-speech demos.