Love ggwave! I used it on a short film set a few years ago to automatically embed slate information into each take and it worked insanely well.
If anyone wants details: I had a smartphone taped to the back of the slate, with a UI to enter shot/scene/take. When I clicked the button, it transmitted that information along with a timestamp as sound. The sound was loud enough to be picked up by every microphone on set, including scratch audio on the cameras, phones filming BTS, etc.
In post-production, I ran a script to extract this data from all the ingested files and generate a spreadsheet. Another script then sorted the files into folders, and a Premiere Pro script placed everything onto a main timeline and a BTS timeline by timestamp.
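For anyone curious, the spreadsheet step reduces to something like the sketch below. This is my reconstruction, not the actual script: the payload format and field names are assumptions, and the ggwave decoding of each file's audio track happens before this point.

```python
import csv
from datetime import datetime, timezone

# Hypothetical payload format: "scene|shot|take|unix_timestamp",
# as decoded (via ggwave) from each ingested file's audio track.
def parse_slate(payload: str) -> dict:
    scene, shot, take, ts = payload.split("|")
    return {
        "scene": scene,
        "shot": shot,
        "take": int(take),
        "time": datetime.fromtimestamp(int(ts), tz=timezone.utc).isoformat(),
    }

def write_sheet(rows, path="slates.csv"):
    """rows: iterable of (filename, decoded_payload) pairs."""
    with open(path, "w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=["file", "scene", "shot", "take", "time"])
        w.writeheader()
        for filename, payload in rows:
            w.writerow({"file": filename, **parse_slate(payload)})
```

With the timestamps normalized like this, sorting files into folders and timelines is just a sort on the parsed rows.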
Yes, timecode exists and some implementations also let you add metadata, but we had a wide mix of mostly consumer-grade gear so that simply wasn't an option.
One of the nicest data-through-sound implementations I came across was in a kid's toy (often the best source of innovation).
It was a "Bob the Builder" play set: when you wheeled around a digger etc., the main base would play a matching sound. I immediately started investigating and was impressed to see no batteries in the movable vehicles. It turned out each vehicle made a clicking sound as it moved, with its ID encoded in the clicks, which the base station picked up. Pretty impressive that this worked regardless of how fast the child moved the vehicle.
The acoustic modem is back in style [1]! And, of course, same frequencies (DTMF) [2], too!
DTMF has a special place in the phone signal chain (signals at these frequencies must be preserved end to end, for dialing and menu selection), but I wonder if there's something more efficient using the "full" voice spectrum, with the various vocoders [3] in mind? Although it would be much creepier than hearing some tones.
I'm wondering if frequency-shifting chirps like LoRa uses would work at audio frequencies? You might get the same ability to recover a usable signal many dB below the noise, and be able to send data over normal talking/music audio without it being obvious you're doing so. (I wanted to say "undetectably", but it'd show up fairly obviously to anyone looking for it. Or to Aphex Twin, if he saw it in his Windowlicker software...)
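For what it's worth, a LoRa-style chirp scheme is easy to prototype at audio rates. Here's a minimal numpy sketch (all parameters invented for illustration): one up-chirp or down-chirp per bit, decoded with a matched filter. The long correlation is what lets it keep working well below the per-sample noise floor.

```python
import numpy as np

FS = 48_000            # sample rate (Hz)
SYM = 4096             # samples per chirp symbol
F0, F1 = 2_000, 6_000  # sweep band, comfortably inside the audio range

def chirp(up: bool) -> np.ndarray:
    """One linear chirp symbol: F0->F1 encodes 1, F1->F0 encodes 0."""
    T = SYM / FS
    t = np.arange(SYM) / FS
    f_start, f_end = (F0, F1) if up else (F1, F0)
    # phase = 2*pi * integral of the linearly swept instantaneous frequency
    return np.sin(2 * np.pi * (f_start * t + (f_end - f_start) * t**2 / (2 * T)))

def modulate(bits):
    return np.concatenate([chirp(b == 1) for b in bits])

def demodulate(signal):
    up, down = chirp(True), chirp(False)
    bits = []
    for i in range(0, len(signal), SYM):
        s = signal[i:i + SYM]
        # matched filter: correlate against both templates, pick the stronger
        bits.append(1 if abs(s @ up) > abs(s @ down) else 0)
    return bits
```

Even with noise at twice the signal's amplitude, the 4096-sample correlation gain pulls the bits out cleanly. Real LoRa uses many frequency offsets per chirp plus a sync preamble, but the principle is the same.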
The issue is the (many) vocoders along the chain remove anything that doesn't match the vocal patterns of a human. When you say hello, it's encoded phonetically at a very low bitrate. Noise, or anything outside what human vocal cords can do, is aggressively filtered or encoded as vocal-sounding things. Except for DTMF, which must be preserved for backwards compatibility. That's why I say it would be creepy to do something higher bitrate: your data stream would literally and necessarily be human vocal sounds!
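Since DTMF keeps coming up: detecting those tones is the classic use case for the Goertzel algorithm, which phone equipment traditionally uses instead of a full FFT since only eight frequencies matter. A self-contained sketch (8 kHz telephony rate; A-D are the rarely used fourth keypad column):

```python
import numpy as np

FS = 8_000                       # classic telephony sample rate
LOW = [697, 770, 852, 941]       # row frequencies (Hz)
HIGH = [1209, 1336, 1477, 1633]  # column frequencies (Hz)
KEYS = "123A456B789C*0#D"        # keypad, row-major

def dtmf_tone(key, dur=0.1):
    """A DTMF digit is the sum of one row tone and one column tone."""
    i = KEYS.index(key)
    t = np.arange(int(FS * dur)) / FS
    return np.sin(2 * np.pi * LOW[i // 4] * t) + np.sin(2 * np.pi * HIGH[i % 4] * t)

def goertzel_power(x, freq):
    """Signal power at a single frequency via the Goertzel recurrence."""
    k = 2 * np.cos(2 * np.pi * freq / FS)
    s1 = s2 = 0.0
    for sample in x:
        s1, s2 = sample + k * s1 - s2, s1
    return s1 * s1 + s2 * s2 - k * s1 * s2

def detect(x):
    """Pick the strongest row and column tone, map back to the key."""
    row = max(LOW, key=lambda f: goertzel_power(x, f))
    col = max(HIGH, key=lambda f: goertzel_power(x, f))
    return KEYS[LOW.index(row) * 4 + HIGH.index(col)]
```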
Yes. JT8 / FT8, WSPR, and then the entirety of fldigi.
To get started.
If you need more speed, you'll have to convince me you won't abuse my ham spectrum, but Winlink, PACTOR, and some very slick 16-QAM modems exist: 300 baud up to 128 kbit/s or so.
I love GGWave. We've been using it in our VR game to automatically sync in-game recordings with an external camera.
At the beginning of the recording it plays the code "xrvideo"; in the second stage of merging, the tool looks for that tag in both streams and matches them up.
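The matching step reduces to simple arithmetic once the marker is located. A hypothetical sketch (the names are mine, not the game's actual code): given the sample index where ggwave decodes the tag in each stream, the relative shift is just the difference:

```python
def sync_shift_seconds(marker_idx_a: int, marker_idx_b: int, sample_rate: int) -> float:
    """Seconds to delay stream B so its marker lines up with stream A's."""
    return (marker_idx_a - marker_idx_b) / sample_rate
```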
ham optimizes for the wrong thing, imo. look at FT8: perfect for making contacts at low power with stations far, far away, but tuned only to that particular task.
you can package some text alongside, but fundamentally all amateur operators are looking for is a SYN/ACK with callsigns.
There's also JS8Call, which is a modified version of FT8 meant for actual communication. IIRC you can do some neat things with it, like relaying a message through another user if you don't have a direct path to the recipient.
As one of the accursed hams, I wonder what ggwave's propagation profile would be compared to RTTY / CW (Morse code) etc. Would be interesting to try it out.
This is cool! Some of Teenage Engineering's Pocket Operators (at least the PO-32 [1]) use a data-over-sound feature.
Does ggwave use simple FSK-based modulation just because it "sounds good"? Would it be possible to use a higher-order modulation, e.g. QPSK, to achieve higher speeds? Or would that result in too many uncorrectable errors?
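For the sake of illustration: a QPSK variant over an audio carrier is certainly possible; here is a toy sketch (all parameters invented, not ggwave's actual scheme) showing the 2-bits-per-symbol packing. The catch is that phase recovery is much more sensitive to speaker/mic/room distortion and timing drift than simple tone detection, which may be part of why FSK is the safer choice over the air.

```python
import numpy as np

FS = 48_000        # sample rate (Hz)
BAUD = 100         # symbols per second
SPS = FS // BAUD   # samples per symbol (480 here)
FC = 4_000         # audio "carrier"; exactly 40 cycles per symbol

GRAY = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}  # Gray-coded quadrants

def qpsk_mod(bits):
    """Two bits per symbol, encoded in the carrier's phase."""
    t = np.arange(SPS) / FS
    symbols = []
    for b0, b1 in zip(bits[::2], bits[1::2]):
        phase = GRAY[(b0, b1)] * np.pi / 2 + np.pi / 4
        symbols.append(np.cos(2 * np.pi * FC * t + phase))
    return np.concatenate(symbols)

def qpsk_demod(sig):
    t = np.arange(SPS) / FS
    ref_i = np.cos(2 * np.pi * FC * t)
    ref_q = -np.sin(2 * np.pi * FC * t)
    inv = {v: k for k, v in GRAY.items()}
    bits = []
    for i in range(0, len(sig), SPS):
        s = sig[i:i + SPS]
        # project onto I/Q references to recover the symbol phase
        phase = (np.arctan2(s @ ref_q, s @ ref_i) - np.pi / 4) % (2 * np.pi)
        quadrant = int(round(phase / (np.pi / 2))) % 4
        bits.extend(inv[quadrant])
    return bits
```

A real implementation would add a sync preamble and carrier-phase tracking; without those, any clock offset between speaker and microphone rotates the whole constellation.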
It is a software modem using FSK, but I don't know anything else about it. I'm annoyed because I could have had this idea; I'm a ham who really only cares about "digital modes", and I have software modems capable of ISDN speeds over AF.
That's really neat! I realize this demo is a contrived setup, but it is basically an example of what Eric Schmidt was talking about when agents start communicating in ways we can't understand.
In the spirit of abusing an error-correction mechanism for aesthetics (see: QR codes with pictures in them, JavaScript without semicolons), could you do that here? How much abuse can the generated signal take?
Just listening to the samples here they're really not that far off. Could probably use a little softening at the edges on the higher tones but it's nowhere near as unpleasant as it could be.
I remember discovering ggwave a few years ago, before the rebrand. It's still the only working (and fastest verifiable) library I've found that can transmit data over sound.
I couldn't use it in a project back then because of college, but now I'm integrating it into my startup for frictionless user interaction. I want to thank the creators and contributors of GGWave for doing all the hard work over the years.
If I find something to improve I'd like to contribute to the codebase too.
All kinds of modems use this kind of scheme as well; PSK is too low-bandwidth for modern needs, so everything is QAM these days. DOCSIS specifies QAM-256, I think. Inter-datacenter fiber links use "modems" as well.
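A QAM constellation is just two Gray-coded multi-level axes. As a quick illustration (baseband only, no carrier or pulse shaping, so a sketch rather than a working modem), here's a 16-QAM mapper and nearest-point slicer:

```python
# Gray-coded 4-level amplitude per axis: adjacent levels differ by one bit,
# so a small noise-induced slip corrupts only a single bit.
LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}
INV = {v: k for k, v in LEVELS.items()}

def qam16_map(bits):
    """4 bits -> one complex constellation point (I from b0b1, Q from b2b3)."""
    return [complex(LEVELS[(b0, b1)], LEVELS[(b2, b3)])
            for b0, b1, b2, b3 in zip(*[iter(bits)] * 4)]

def qam16_demap(symbols):
    bits = []
    for s in symbols:
        i = min(INV, key=lambda lvl: abs(s.real - lvl))  # nearest I level
        q = min(INV, key=lambda lvl: abs(s.imag - lvl))  # nearest Q level
        bits.extend(INV[i] + INV[q])
    return bits
```

(DOCSIS 3.1 actually goes beyond QAM-256, up to 4096-QAM downstream on clean plant; each doubling of constellation size buys one more bit per symbol but halves the noise margin.)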
Yes, and also soundcard modems: https://i.imgur.com/8mhB4u7.png shows QAM-16 over a PC soundcard into a radio. It's enough bandwidth to stream video between VLC instances; not "slow-scan TV" either, but fast scan.
Uh, don't try to find this if you're going to use it to pollute the spectrum I am licensed for.
If you're interested in using GGWave in Python, check out ggwave-python, a lightweight wrapper that makes working with data-over-sound easier. You can install it with pip install ggwave-python or pip install ggwave-python[audio], or find it on GitHub: https://github.com/Abzac/ggwave-python.
It provides a simple interface for encoding and decoding messages, with optional support for PyAudio and NumPy for handling waveforms and playback. Feedback and contributions are welcome.
> Bonus: you can open the ggwave web demo https://waver.ggerganov.com/, play the video above and see all the messages decoded!
I could not get this to work unless I played the video on one device and opened the demo on another. While trying it from my MBP, Waver's spectrum view didn't show much of anything while the video was playing. Is this the Mac filtering audio coming into the microphone to reduce feedback?
I posted a short demo video on Reddit at the time, but it got basically no traction: https://www.reddit.com/r/Filmmakers/comments/nsv3eo/i_made_a...