The author is right about full duplex audio. My opinion is that manufacturers long neglected audio, incorrectly thinking that video and eye contact were the key to more natural conversation (the pandemic did do something to correct that, though).
We run https://cleanfeed.net/ to focus on full-duplex low latency audio (no video). Of course we are dealing mainly with studio environments and good quality equipment, making it a different challenge to a regular meeting tool.
But since the first prototype, we were using it ourselves as an "always on" line between developers. The ability to have a 3- or 4-way conference in full duplex with low latency appeared to foster much more of a 'human' relationship than the meeting apps. The lack of picture may even have a positive effect, because the added privacy made people more relaxed.
This x 100. In the normal office environment audio cues are pretty much the main thing you can't bring home. Just hearing small conversations going on around you and being able to passively process them and jump in and out as necessary is a game changer for collaboration.
I have tried to build an "audio office" product based on low-latency spatial audio over the web but it's way too far outside my expertise. Glad to see that someone else is on it!
This freaks me out. You like ambient conversation with discernible speech!?
I had someone send me a recruiting message on Stack Overflow Jobs once about a product like this, and it's one of the few times I declined on the basis that I fundamentally disagreed with its existence. How do you focus?
While your statement is true it reads like a non sequitur because it misunderstands GP’s objection, which is not to others’ opinions but rather to unfiltered noise in the workplace. Or rather, to a product that pipes in noise that deliberately evades filtering. To focus properly I need silence or something equivalent. This has nothing to do with others or their opinion or my subjective evaluation of the same. It’s a matter of attention.
Yes, this was my interpretation. I don't mind cafes or whatever, because as many people mention, the blending together of all those convos is fine, and helps sometimes, but I would lose my job in an office pretty quickly at this point. All it would take is one asshole crunching an apple and I'm done.
* Move away from the camera to show more of your torso. The audience will get to read more body language, if they can see more of your body.
* Move the audience’s video as close as possible to the camera (make it smaller), that will have you look like you’re “looking” at the camera versus off to another screen
* have the camera as close to eye level as possible
Mostly it is latency added by webcams, microphones (but our ears are used to delayed audio anyway) and the software. Your webcam probably has >100ms of latency, then the app adds >50ms, then multiply that by 2x (for two-way conversation). Even a display like a TV can have loads of latency.
If you just open up a camera view there's huge latency without it even sending data anywhere.
The internet latency is probably only a small portion of overall latency and WiFi latency isn't much either. There's already lots of latency with audio (perhaps having a microphone could make audio travel faster than through air?).
I wish people would take latency more seriously. Using a rolling shutter, encoding a single line at a time and sending it via UDP could probably save almost all the latency. Sub 30ms latency IMO would make it much better.
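To put rough numbers on the estimate above, here is a minimal Python sketch of that latency budget. The 100 ms and 50 ms values are the lower bounds quoted in the comment; the function name and the idea of simply summing stages are my own assumptions, and a real pipeline has more stages (display, network, audio buffers) that are left out here.

```python
def conversational_delay_ms(stage_latencies_ms, directions=2):
    """Sum per-stage one-way latencies and multiply by the number of directions.

    Mirrors the back-of-envelope estimate: (webcam + app) x 2 for a
    two-way exchange. Stage names and values are illustrative only.
    """
    one_way = sum(stage_latencies_ms.values())
    return one_way * directions


# Lower bounds quoted in the comment; network and display latency are omitted.
stages = {"webcam": 100, "app": 50}
total = conversational_delay_ms(stages)
print(f"one-way: {sum(stages.values())} ms, two-way: {total} ms")
```

Even with only these two stages the round trip lands at ten times the sub-30ms target mentioned above, before any network delay is counted.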
For distributed teams the Internet RTT between a pair of call participants is frequently higher than 300ms, and that's unloaded -- if the connection is marginal then average RTT could be much higher.
Wi-Fi is also a huge contributor to latency. In congested areas or with problematic APs or end stations it's not uncommon to see packets delayed more than 500ms by retransmits. The average might be lower, but with frequent excursions to much higher than average latency.
The thing which proved life changing for me was ditching WiFi for Ethernet in the home office. I wasn't able to lay cables but I did buy an ethernet over power adapter (you can also get them for coaxial). Speed took a hit but I didn't have the same issues I previously had with packet loss, even after manual tweaks to avoid interference with neighbours.
Worst case extra latency added by a Bluetooth headset is less of an issue, you are close and congestion is less of a problem.
Home WiFi generally has really bad worst case latency and variability (jitter), removing it from the path is one of the best things you can do to improve realtime communication performance.
Bluetooth and Wi-Fi use the same frequency bands. Bluetooth radios are much weaker than wifi radios. Bluetooth devices frequently give up much faster than wifi devices. Bluetooth devices are also much more latency sensitive than wifi.
When it's a fight between Bluetooth and wifi, Bluetooth almost always loses -- big time.
This is oversimplifying things. Bluetooth devices are typically much closer than WiFi devices, and need less bandwidth. The protocol has less overhead (on modern BT devices). It's true that implementation quality of BT devices is typically severely lacking, but with a good quality implementation (e.g. Apple AirPods with a Mac or iOS device), the WiFi is far more likely to be the weak link in a home videoconferencing setup.
I frequently hear this sentiment, yet in my country I struggle to find evidence of more than a few ms of wifi vs cabled connections. Whereabout do you live that this is such a huge issue? Is it a large, dense metropolis like Beijing or Singapore?
Usually it's not the average latency but the random spikes that kill your responsiveness. My wifi ping is usually within a couple ms of wired ping, but packet loss / latency spikes are a daily occurrence.
It looks like no one has mentioned the #1 cause of jitter (not latency per se which is an average value), which is WiFi scans.
Due to bad engineering, a WiFi chip can either be scanning or transmitting, and background location tasks can request a WiFi scan at any time. Depending on your OS and what else you run, this can cause stutters in the 250ms-1second range.
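A small sketch of why the average hides this: summarize a set of ping samples by mean, median, 99th percentile and max. The sample values are made up for illustration (mostly 3 ms with two 400 ms stalls, roughly the scan-induced stutter range described above), and `summarize_rtt` is a hypothetical helper, not part of any tool.

```python
import statistics


def summarize_rtt(samples_ms):
    """Return mean, median, 99th percentile and max of RTT samples (ms)."""
    ordered = sorted(samples_ms)
    # Nearest-rank percentile: ceil(0.99 * n), computed with integer math.
    rank = max(1, -(-99 * len(ordered) // 100))
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p99": ordered[rank - 1],
        "max": ordered[-1],
    }


# Made-up samples: 98 normal pings and two stalls.
samples = [3] * 98 + [400] * 2
stats = summarize_rtt(samples)
for name, value in stats.items():
    print(f"{name}: {value} ms")
```

The median stays at the "couple of ms" people report from a quick ping test, while the tail is what you actually hear as a stutter on a call.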
I'm in the UK and live in a student area and it's a huge issue. There are some times of day where the Wifi speeds drop massively (I assume interference and madly swapping channels) - but wired stays constant.
I have read articles like this. I comment specifically on this subject because I used to believe them without testing them myself. When I did start testing it myself I quickly discovered these rules just do not hold true where I am (Europe), so I am wondering where these claims come from.
once you have to scroll in the ap-list there is a very good chance that you are contested on air-time because some neighbour is active on any given channel. [0]
add to that interference from non-wifi or other unclean rf-equipment like babymonitors, wireless doorbells or bluetooth ... 2.4ghz is quite crowded, 5ghz is crippled by a wall...
I consider this irresponsible in the face of the Zoom CEO talking about monetizing the knowledge they have about you. Very different from face-to-face in my opinion.
Face-to-face is not only about interaction of people present, it's about eavesdropping as well.
Praise for the rest and for caring about the other participants.
I sorted through the many headsets at current when I joined and settled on the relatively cheap Logitech H340, and made it standard at the company. You can hear yourself and the microphone is very good. I consistently get told that I have better audio than others, and wired headphones just work. One other nice effect is that as communication headphones they don't pass as much very high and very low frequency stuff as hi-fi headphones (despite their claimed frequency response), and this cuts a lot of background crap out. I was on a group call with one person in a car, and the people wearing Bose headphones complained about the awful background noise... which I could not hear. This is a lesson from aviation and radio: you only need 300 Hz-10 kHz for voice.
11. Turn off the view of yourself. In almost no other situation I know of are you staring at a mirror of yourself while you're talking to someone else.
First thing I do in Zoom, immediately followed by repositioning the window right below my DSLR camera up above so it looks like I am looking people in the eyes
Impossible in Google Meet or Teams, which really sucks.
It seems that this should be a feature of the software too. You should be able to tell the software where your camera is, and you should be able to tell it to move the picture of the talking person (or a person that you select, manually) closest to the camera, always.
Exactly right. I look up into the lens that is just on the top of the monitor. The angle between the camera lens and people's faces is just a few degrees. I often shift my gaze straight into the lens too, just because I can...
Recently started doing this and it feels like it's freed up around 25% of my brain cycles to concentrate and think about what's actually happening in the meeting.
Now I wonder what the unintended effect of this deepfaked eye contact is on the conversation. Breaking eye contact, in a variety of different ways, is an important part of body language
The impression I get from those tweets is that it’s not making the eye look as if it’s always looking at the camera, it’s just adjusting so that if you look at the screen (the person’s eyes in e.g. FaceTime) it looks like you are looking at the camera.
So if you break eye contact with the image of the person on the screen, it will show you breaking eye contact in your video.
Isn't there already eye-tracking software that can tell where you're looking? If you can fake the eye position, why not keep faking it to track if the person looks away?
If you have a GPU, also check out Nvidia broadcast. It removes webcam background sure, but it can also do some seriously high quality noise cancellation, much like the krisp.ai service does from the article.
It can even run the same noise cancellation algo on incoming audio! So you can filter colleagues' noise for yourself if they don't want to bother
I have a laptop with an Nvidia GPU and gave Broadcast a try but ended up going for Krisp. It was overheating my machine and I was often working in places where other people are talking, which Nvidia's noise cancellation wasn't really useful for. Krisp has this handy "voice cancellation" feature that you can train to recognize your own voice and it cancels out other people's voices.
I said this last time and I'll bring it up again: quality of equipment is not the problem, poor UX design is the problem.
1. We still don't share a common digital space when having meetings, because everyone sees tiles in a different order. "Let's go around the table and give our updates!" Uh, in what order? There is a lot of power in the subtle social cues available when everyone knows who you are talking (or listening) to by shifting towards them.
2. There's still no effective way to non-verbally signal that someone wants to speak. If everyone was push-to-talk, then holding down the unmute button would let the UI emphasize your tile and signal to others. This would cut down on verbally stepping on each other. Nobody uses those "raise hand" buttons, because they are too single-purpose and relatively high effort.
3. Often, there's no way to tell at a glance if you're muted. I can't believe Zoom still does this: you have to move your mouse to get it to show you the UI. Why are you hiding the UI.
4. What is the point of reactions appearing on your own tile? You're not talking: nobody is looking at you. Reactions should appear on the tile you are reacting to!
It's small things like this that make video calls a mess, and nothing's changed since the beginning of the pandemic except for video filters.
> If everyone was push-to-talk, then holding down the unmute button would let the UI emphasize your tile and signal to others.
The most baffling thing to me is how common it is for video conferencing systems to either lack a push to talk system or just have a really really bad one. I agree this would be fantastic but they’ve gotta actually put it in first.
I find it odd too. VOIP services for gaming have been doing it a long time. It seems most people only unmute when they're speaking anyway, don't know why the big conferencing solutions leave it out.
This is all true, but not actionable. OP was speaking to consumers about things they can do within their control, and for that purpose it’s a great post.
What I'm getting at is that consumers throwing money at the problem will not solve the fundamental issues. You'll have a 4k webcam with brilliant audio, and you'll still have trouble getting a word in.
I agree with you that there's a trillion-dollar bill just laying on the ground for the first videoconferencing company which embraces the medium, instead of striving for the impossible impression that everyone is in the 'same place at the same time'.
It's orthogonal to having decent gear, however. Using a flattering aperture and a sensor with adequate sensitivity, not to mention a real microphone, will create a positive impression in a mostly invisible way, and building software which isn't painful to use would magnify this effect rather than flatten it.
#2 - we use Meet at work and use of the hand-raising feature is widespread. The only problem is that the accompanying noise is too intrusive: many people stop talking when they hear it, such that it’s almost the same as interrupting someone.
Yeah it's also very analogous to literally raising your hand in a classroom. Absolutely nobody does that when talking with a group of friends, and yet everyone knows when someone wants to get a quick word in!
> 3. Often, there's no way to tell at a glance if you're muted. I can't believe Zoom still does this: you have to move your mouse to get it to show you the UI. Why are you hiding the UI.
Did they change this recently? On Windows I always see an icon on the lower left corner showing that I'm muted without having to move my mouse or show the controls. I do prefer to go into the options and enable the Always show controls setting under General just so I have a bigger indicator. I think pressing Alt also toggles the controls.
> There's still no effective way to non-verbally signal that someone wants to speak
Some tools have a raise hand feature. Though not sure if it brings such users to the top of the participant list or otherwise highlights them beyond an icon.
My team does this really well. If you have something to say, put a finger up. If you see someone else with a finger up, put up two. Or three, or whatever, so it's a queue. Which is a good indicator to the current speaker of how soon they need to be done talking. If you need to interject, we have another hand signal. We also have signals for "I agree" / "+1", "ouch", "pause", etc.
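For anyone who wants the same thing in software, the finger-count convention above is just a FIFO queue where your position is the number of fingers you hold up. A minimal sketch follows; `SpeakerQueue` and its method names are hypothetical and not taken from any meeting tool.

```python
from collections import deque


class SpeakerQueue:
    """FIFO queue of people waiting to speak, mirroring the finger-count rule."""

    def __init__(self):
        self._waiting = deque()

    def raise_hand(self, name):
        """Join the queue (if not already in it) and return how many fingers to show."""
        if name not in self._waiting:
            self._waiting.append(name)
        return self._waiting.index(name) + 1

    def next_speaker(self):
        """Pop the person at the front, or None if nobody is waiting."""
        return self._waiting.popleft() if self._waiting else None

    def fingers(self):
        """Current finger count for everyone still waiting."""
        return {name: i for i, name in enumerate(self._waiting, start=1)}


queue = SpeakerQueue()
queue.raise_hand("Ana")   # first in line, one finger
queue.raise_hand("Ben")   # second in line, two fingers
print(queue.fingers())
```

The length of the queue is the signal to the current speaker: the more people waiting, the sooner they should wrap up.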
Both myself and many of my colleagues have done most/all of these things and IMHO our video calls are better but still fatiguing and nowhere near as good as face to face
that TFA recommends Zoom above all other options is strange. it has very poor design decisions in my experience and no longer works reliably without being installed.
also as debate from an HN post from a couple days ago covered well, using an external camera instead of webcam is a fool's errand. my how I've tried.
the most important change for reducing fatigue in my experience: turn video off. audio only.
I've had good luck with used Cisco/Webex video conferencing hardware off eBay. The audio and video quality is typically better than most laptops and you don't have the video/audio encoding fighting for processor time with whatever your computer is running.
The biggest issue is that some video conferencing services don't support standard protocols, so you can't connect to Zoom without buying a special license. A WebEx subscription with some of the newer equipment is sufficient to connect to Teams and Google Meet through WebRTC as well as any standards based systems (Chime, etc.)
Outdated and reads like an ad for Zoom. One of the recommendations is literally "use zoom because it's better".
Also on shaky ground is the advice to use a DSLR camera as web cam. As pointed out in a recent discussion here, the problem with adding more devices to the chain is... more devices in the chain.
Dedicated diffuse lighting, wireless headsets, separate cameras etc.. participants will need to be patient while you prepare and configure your film set.
I used a wireless headset for awhile, but swapped it for a wired headset after too many times joining meetings without the headset pairing in time.
I have a Logitech g533 headset, which is wireless, and came with a dongle. It might be something other than Bluetooth? I paired it once when I got it 2 yrs ago and that's it.
Yeah the ones with dongles aren't Bluetooth, I also haven't had many problems with mine which is similar. Bluetooth is pretty bad for audio+mic, it has to reduce audio quality to do both
As the other comment said, your Logitech g533 isn't bluetooth. It's using its own dongle and wireless transmission of lossless audio. No other wireless device will ever compete or try to pair with the dongle, so you can set and forget. Glad to hear you can pace happily during meetings!
In my case, I was using a Logitech bluetooth headset. When calls came in, I would need to switch the headset on, wait, hope it connects, hope Teams hadn't switched to default mic, and hope the battery was still good. I grew tired of the uncertainty and went back to a wired headset.
I don't know about that specific model, but I'd expect a "gaming" headset to attempt to have both a high-enough quality and a very low lag.
There are also DECT headsets which seem to have very low lag. Quality isn't great, but you wouldn't expect it to be given the size of the phones. But it works great for voice and has a very long range.
Speaking about non-bluetooth wireless headsets, I'm super satisfied with the corsair hs80 - the microphone sound quality is absolutely crazy and I can pace and walk around my whole apartment without any issues. Highly recommend them.
The author says not to use a wireless headset. And what’s the difference in time to join a meeting between a high quality mirrorless or DSLR camera and a cheap webcam after the initial setup?
I'm guessing it's a reference to the discussion on this[0] article that essentially says that the more advanced/pro level audio and video equipment in the loop, the more likely you are to have problems and delay the start of the meeting for everyone. Also many DSLR cameras have a 30 minute time limit due to the CMOS sensor overheating.
It's not because the CMOS sensor overheats. That's just some faff someone made up to cover up the real reason.
Import tariffs for digital video cameras are significantly higher than tariffs for still cameras. So companies like Canon, Nikon, and Sony artificially limit the maximum record time on their DSLRs to get around the higher taxes.
The difference is the DSLR isn't designed to function like a webcam - always plugged in ready to go. "Set and forget" doesn't apply as reliably.
For personal use it's fine. But for work meetings where someone calls you unexpectedly is where things can go wrong if you have unconventional equipment dependencies.
The author recommends an "old DSLR". But you will need utility software running, it's not just a simple driver. Manufacturers make and update these utilities at their discretion. It's a gamble whether it all runs smoothly.
The DSLR needs to be in a special mode, and won't wake up in that mode automatically for incoming calls if powered down, etc.
One hackish idea that I had for quick deployment of my mirrorless as webcam, but that I never implemented, was to use a dummy battery [*] and have a microcontroller act as the dummy battery switch. The switch is then controlled by a process that monitors whether the UVC device that's associated with the mirrorless camera is currently being used.
[*] A power adapter that feeds into battery-shaped terminals, thus allowing an "infinite" battery.
It could be better than USB charging while using the battery because USB charging might not keep up with the battery drain. The downside of course might be malfunctioning dummy battery causing your camera to fry.
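A rough sketch of the control logic for that dummy-battery idea, in Python. Everything here is hypothetical: `read_in_use` and `set_power` are callables you would have to supply yourself (for example an OS-specific check on the capture device and a serial command to the microcontroller), and it assumes the UVC device, e.g. an HDMI capture dongle, stays visible to the OS while the camera is powered off.

```python
class DummyBatterySwitch:
    """Decide when to power the camera based on whether its UVC device is in use.

    Hypothetical sketch: `read_in_use` and `set_power` are injected callables,
    not real library APIs.
    """

    def __init__(self, read_in_use, set_power, idle_polls_before_off=3):
        self._read_in_use = read_in_use
        self._set_power = set_power
        self._idle_limit = idle_polls_before_off
        self._idle_polls = 0
        self.powered = False

    def poll(self):
        """Run one polling step; call this periodically (e.g. once per second)."""
        if self._read_in_use():
            self._idle_polls = 0
            if not self.powered:
                self._set_power(True)
                self.powered = True
        elif self.powered:
            self._idle_polls += 1
            # Wait for a few idle polls so brief gaps don't power-cycle the camera.
            if self._idle_polls >= self._idle_limit:
                self._set_power(False)
                self.powered = False
        return self.powered


# Usage with stand-in callables that only record what would be sent.
events = []
state = {"in_use": False}
switch = DummyBatterySwitch(lambda: state["in_use"], events.append, idle_polls_before_off=2)
state["in_use"] = True
switch.poll()  # camera gets powered on
print(events)
```

The idle-poll delay is a design choice to avoid power-cycling the camera when an app briefly releases and reopens the device, which seems safer given the worry about frying the camera.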
Settings for 'original audio' on zoom are missing on desktop linux.
As I imagine HN skews towards that OS, I'd encourage you to write to them asking for support (or avoid zoom altogether; but that might be impossible for some).
If you really want that full-duplex audio experience on linux today, disable audio on the call and move everyone to https://cleanfeed.net. This is not an ideal workflow, but it gets you there.
It's a pity that other products don't just take audio seriously.
One piece of advice: if you have two monitors, put the one with the conferencing software and camera to the side, so you have to turn your head to look at the conference. One issue in video conferencing with multiple people is the inability to tell if a person is actually paying attention. That way you will give a clear indication that you are not engaged in something on your main screen.
As discussed lighting is very important. One thing that makes a difference is the brightness of the monitor, at least if you have a large/bright monitor which can serve as a light source. If your face is too dark, increase monitor brightness. If you're in a dark room and you start to look like an alien, decrease monitor brightness.
Doing things like throwing up a white page full screen on a monitor you're not using can help too. After all, a nearly white page is much brighter than a black one.
All these people talking about "a DSLR is too difficult to setup and you're going to waste time at the beginning of the meeting getting everything going."
I've been using my DSLR as a webcam for the last 2 years. I set it up on a short tripod so it sits just over the top of my monitor. The only time I have to screw around with setup is when I take it down to shoot my kids for family Christmas card. Takes 5 minutes and then everything is back in place.
It never ceases to amaze me how much a bunch of so-called hackers can argue about something they've never even tried to do. Hours spent bickering back and forth about something that takes literally 5 minutes to test out.
Anyway, I use a fixed 50mm lens on my camera, which is a classic lens for portrait dimensions and gives me a very shallow depth of field to blur out the junk in the room behind me.
Someone should write an AV scoring system. Measuring all the important components. Your AV score should be displayed alongside your name in meetings in order to shame people into improving their dreadful audio and video.
The latency introduced along with the decreased amount of body language will always take a lot of spontaneity out of conversations, and if you have a policy of muting yourself unless you want to speak, it's much harder to get a conversation going fast enough.
One of the questions was how to mount a smartphone. I use a car mount, specifically the iOttie one touch... But any extendable car mount would work. Just suction cup it to the back of your monitor and you're set
Where do you find a nice camera for $20-80? Most of the nice cameras from Canon, Fujifilm, Nikon, and Sony I find cost thousands of dollars. I want to replace my Logitech webcam.
A mirrorless or even an older DSLR will be a marked improvement for meeting with colleagues as well. They use HDMI out, and they aren't really being suggested for audio, so neither of the factors you mentioned are even involved.
Of course if you sound like crap and can't stream at a decent bitrate, fix that first / as well. But with crisp audio and a decent Net connection, there's plenty of room to improve video quality from awful to decent, and your colleagues notice both, whether they realize it or not.
Back in 2014, I was a freelance software consultant. I was doing some web and database development, and also helping my mother out with her consultancy doing database reports. We were traveling around the country, meeting with clients. My mom has some mobility issues due to some old, bad knee and foot injuries. She really should have been in a wheelchair, but most of these places didn't have access ramps or even elevators. So she often just sucked it up and walked the blocks and climbed the stairs.
I was pretty dissatisfied with my work at the time. I was spending a lot of time arguing with an IT manager who thought he knew things about software development on one hand, or just writing boring-ass Crystal Reports on the other. Then this newfangled device came out called the Oculus Rift DK2. I also had a Leap Motion device that I had glued to the front of it. It was an exciting time, as decent enough VR to not make the majority of people sick or have headaches hadn't been available--and certainly not at a consumer price point--until then.
I hacked around on some things, got a small amount of notoriety amongst other burgeoning VR developers at the time (I'd basically made the first JavaScript library for making VR-capable apps), and then one day I got a call from this company called AltspaceVR. They wanted me to interview for a job.
We did it in Altspace. This was my first experience with VR chat. The person I talked to had an HTC Vive dev kit. We stood in a futuristic room and talked, face to face. And it just worked. It felt like actually talking to a person. The conversation was so easy, compared to 2D teleconferencing. I wasn't distracted by my own video feed, worried about how I looked or whether I was providing enough "camera contact". There was body language and a feeling of the bodily presence of the other person.
It was immediately obvious that this was far superior to traditional teleconferencing. I had a brainwave about the future of work, meetings with clients without having to travel, my mom no longer destroying her knees, no longer having to fly around the country. And I could be involved in working on this future! If I just abandoned everyone I knew and moved to California.
I asked the guy I was interviewing with if he really thought that was necessary. He said they really liked having everyone in the same room together. I asked him what were we doing, just now, talking to each other as if we were in the same room. He vacillated, admitted they weren't dogfooding their own product.
And that's when I decided that I had to work in VR, had to work on productivity apps and not games, and had to dogfood it and stick to living on the East Coast. Now I'm the head of R&D at a foreign language training company, where I make a VR teleconferencing app for people to meet together in culturally-appropriate environs to practice their language skills.
I've always wondered about one thing -- with VR, your eye convergence distance is whatever the software wants it to be, whether that is something in your virtual hands, or something across the virtual yard. But your focus convergence distance is just millimeters away from your eyeballs.
But for the millennia that human beings have evolved, our convergence distance and our focus distance have always been the same. If it's in your hands, then the distance is about three feet, or whatever. If the object is in the yard, it might be thirty feet.
Just won’t ever be. Once you realize the delay is what’s causing you to be exhausted, what’s causing meetings to never spawn creative solutions, what’s making it difficult to focus.
you soon realize that only certain meetings work over video and some are only worth doing IRL.
I'll get all that stuff he recommends if work pays for it. I wouldn't spend a dollar of my own money to make zoom meetings "better" and neither should anyone else who doesn't own their own business. I still wouldn't turn my camera on unless I was absolutely required to.
The pandemic forced me to do my work at home, and setting the boundaries between work and home has become more important than ever.
Nice clothes are good to have regardless of work, and many jobs DO offer commuting benefits. If a job requires me to drive to work, at the very least they need to provide a place to park. If my job requires me to have a better camera or microphone than what is built into the laptop, why is it on me to buy that?
Out of consideration to your coworkers? I don't wear my button-downs on weekends. When working remote, I care more that you have a good mic than I care that you're wearing pants.