Krisp.ai – Mute background noise during your calls (krisp.ai)
393 points by nyxtom 24 days ago | 107 comments



I had been wondering when a product like this would come along. Being able to remove background noise from calls sounds pretty great, but it's not a problem I run into too often - most people on my calls are in a meeting room or a quiet place.

But, if it can really fill in missing or distorted chunks then for me that's a killer feature. I'm going to give it a try on my morning check-in call, which is coming up.

I'd love to see a product like this be able to adapt to the voices of individual speakers, and fill in gaps or distortion in their natural voices. I presume that would be more difficult, though, because you'd need a model of each speaker at the output device, so both parties would need to be running Krisp, and you'd somehow have to share the model between them, which might not be feasible if you already have network issues causing dropouts and distortion. Unless it was a side-channel thing, where for regular calls with the same people the voice model is updated after each call, ready for the next one.

Though I'm not sure I love the idea of a model of my voice being constructed and transmitted around. But I think it would be really cool :)


> most people on my calls are in a meeting room or a quiet place

Since I moved to NYC, I think I haven't had one phone call without some obnoxiously loud siren/horn/baby/dog/jack hammer/piano/bros in the background. People must think I listen to the NYC soundtrack on repeat at full volume and make phone calls.

I envy you. :)


Make sure you check back in with the results!


As requested:

This call typically has decent audio quality, and as it's a stand-up / status call people tend to speak one at a time. No-one today had much in the way of background noise, so I'm not sure there was a lot of opportunity for the app to show off.

That said, I switched the Krisp speaker mode on and off repeatedly while each person was talking.

1. When just one person was talking Krisp didn't seem to ever make the sound worse, which I guess is a start.

2. When there was a conversation going back and forth, or when one person started talking right after someone else finished, the voices got mangled or muted and I couldn't understand them. I had to turn Krisp off.

3. When there was some background echo (like in a large room) or some minor distortion I thought it maybe sounded a bit clearer with Krisp running, but it didn't really make much difference. I could understand the speaker either way.

With the audio problems I mentioned at (2) and no real gain from using Krisp I doubt I would use it regularly, though if I run into a call with bad background noise I might try it again.

I also tried the Krisp microphone, and at one point I had to repeat myself, which doesn't usually happen. But I have no way of knowing whether that was due to me speaking unclearly, or audio issues at the listener's end, or something else. So I don't really have an opinion about the microphone, but as I am in a quiet place anyway I probably wouldn't use it.

It would be nice if there was a single channel evaluation mode for the speaker. If I could hear the normal audio in my right ear and the Krisp-processed sound in my left, then I would have a better chance of evaluating the performance. I guess if you have a lot of continuous background noise that mode would be redundant, and it should be an obvious improvement switching back and forth.
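
To make the idea concrete, here is a minimal sketch of that split-ear mode in Python. It is only an illustration, not something Krisp offers as far as I know: `split_ear_mix` is a made-up name, and it assumes the raw and processed streams are already time-aligned and the same length (in practice the processed stream would lag by the processing latency).

```python
def split_ear_mix(raw, processed):
    """Pair samples as (left, right) frames: processed on the left, raw on the right."""
    if len(raw) != len(processed):
        raise ValueError("streams must be the same length")
    return list(zip(processed, raw))


# Toy mono streams standing in for the unprocessed and the processed call audio.
raw = [0.2, -0.1, 0.4]
processed = [0.1, 0.0, 0.3]

frames = split_ear_mix(raw, processed)

# Flatten to L, R, L, R, ... order, the usual layout for a stereo buffer.
interleaved = [sample for frame in frames for sample in frame]
```

With headphones on, the left ear would then carry the processed signal and the right ear the untouched one, so differences can be heard without toggling.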


Such feedback from users is priceless. Thanks for it. Krisp is still in beta and we keep experimenting with different DNN models. For example we have a DNN model which maximally preserves multiple voices while removing noise. It's not shipped yet.

Re #2, one thing we noticed is that conferencing apps themselves will distort the voice when multiple voices are overlapping, especially when there is also noise. There is not much Krisp can do here since the stream it receives is already distorted. Unfortunately, for Krisp speaker we don't control the audio stream. Imagine how many times the stream gets processed before Krisp speaker gets it (noise cancellation in the headset, noise cancellation in RTC, codec, etc).

Re Krisp microphone, the DNN model used here is more effective since what Krisp receives is a less processed, less distorted stream.

Please stay tuned, our release cycle is around 2 weeks. More quality and UX features are under way.


"Windows support is coming soon. Please leave your email."

And nowhere to "leave my email".


Intercom button on the right.


> When there was a conversation going back and forth, or when one person started talking right after someone else finished, the voices got mangled or muted and I couldn't understand them. I had to turn Krisp off.

This may have nothing to do with Krisp as an app. All phones/communication devices support either half or full duplex audio. Half duplex means only one caller can be heard at a time. If someone else were to speak, the audio would cut out as you experienced. Full duplex allows everyone to be heard at once.

So this all could just be a symptom of the feature of the device or service you're using. I would try your test on different devices and/or services.


My first internship was at a company that was doing this over a decade ago, but before machine learning had proliferated to adjacent fields. It's easy to separate out stationary signals like ambient noise, which are easily isolated in the frequency domain. But it's a totally different thing to remove something like a baby's cry, shuffling of papers, or other sharp transients. Blindly separating without a large corpus of inputs + machine learning techniques seems impossible.

I remember as an intern spending hours in front of spectrograms manually deleting noise so that the researchers could get clean targets. Let me tell you, I started being able to identify a lot of phonemes just by visually examining waveforms.

Eventually, the company did pursue some noise cancellation, but only as one part of their offering. I don't think they ever could get the holy grail of separating non-stationary noise.
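
For anyone curious why the stationary case is the easy one, here is a toy spectral-subtraction sketch in Python. It is a textbook technique, not what that company or Krisp actually does, and the signals are invented: a "voice" tone plus a steady hum whose spectrum is known exactly. The naive DFT is only there to keep the example dependency-free.

```python
import cmath
import math


def dft(frame):
    """Naive O(n^2) discrete Fourier transform; fine for short frames."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def spectral_subtract(noisy_frame, noise_frame):
    """Subtract the noise magnitude per frequency bin, keeping the noisy phase."""
    noisy_spec = dft(noisy_frame)
    noise_mag = [abs(c) for c in dft(noise_frame)]
    cleaned = []
    for c, n_mag in zip(noisy_spec, noise_mag):
        mag = max(abs(c) - n_mag, 0.0)  # floor at zero so magnitudes stay valid
        cleaned.append(cmath.rect(mag, cmath.phase(c)))
    return idft(cleaned)


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# Toy data (assumed): a "voice" tone in bin 3 plus a stationary hum in bin 10.
N = 64
voice = [math.sin(2 * math.pi * 3 * t / N) for t in range(N)]
hum = [0.5 * math.sin(2 * math.pi * 10 * t / N) for t in range(N)]
noisy = [v + h for v, h in zip(voice, hum)]

cleaned = spectral_subtract(noisy, hum)
residual = rms([c - v for c, v in zip(cleaned, voice)])
```

This only works so cleanly because the hum sits in the same bin in every frame and never overlaps the voice. A baby's cry or rustling paper has no fixed spectrum to subtract, which is where this approach falls apart.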


There is a new paper out from Google research that specifically addresses the use case with screaming children in the background - very exciting! https://www.youtube.com/watch?v=zL6ltnSKf9k


Krisp removes screaming child noise quite well.

This was a sensitive use case for the team since I (CEO/krisp) had a baby 3 days ago and really needed Krisp to do my calls :)


Here's to hoping they make a version for Windows and/or Linux. Mac support is nice, but this seems business-focused (though handy for casual use).

Most businesses, especially outside of Silicon Valley, don't use Mac; Windows will likely be a larger market.


2Hz CEO here (we built Krisp).

Windows support will come in Dec. We are working hard on it.


How about a VST plugin? That would give you cross-platform support (Mac/Windows/Linux/iOS) and would allow it to be used as a part of many, many audio processing pipelines - from simple streamers/youtubers all the way to pros


This. A VST plugin would open a lot of opportunities to integrate.


What does it actually filter on? Is it trained to recognize my voice or just any human voice? I can see you can smartly filter out noisy noise, but can it also filter out nearby conversations like in an office environment?


Oh cool, thanks for the roadmap, but do you have a way to inform us on what kind of monetization path you're going for? Where is the catch?


It's currently free as it's in Public Beta. Krisp will have a freemium business model after Beta.


How about Linux? Shouldn't be that much more of a lift.


Yeah, and I have to de-noise a conference I recorded a few days ago before uploading it. A Linux version of such a tool would be of great help, although I will rather try standard approaches first.


+1 Will help many devs who do remote work and have to take calls.


How about iOS, particularly for X series phones which have the "bionic" ML co-processor with installable models?


Any way for this to work within the browser, maybe via wasm?


I'm hoping for an iOS/Android version.


Mobile support is harder since Krisp won't be able to register as a virtual mic/speaker on Android or iOS. At least we haven't found any way around it.


There's an app for Android called SoundAbout, which can reroute audio. Not sure how it works, but maybe something similar is possible for Krisp? https://play.google.com/store/apps/details?id=com.woodslink....


This looks intriguing. Do you know if it receives the audio and re-routes or simply changes the routing configuration?


It simply changes the audio configuration. There's no way to intercept and modify audio data on Android, unless you're rooted. :(


double-edged sword - bug Apple enough and you might get support, or bought..


Or cloned


Even in Silicon Valley, any non-"art design" business uses Windows as the primary machines


How do they make money?

All audio is processed on device, but is the goal to use the public training/learning to tailor a more robust model which they can sell commercially/integrate into apps/phones/etc?


My suspicion is that the founders/engineers will be acqui-hired for a not-insignificant sum by one of the existing big players in the teleconference space (Zoom, Google, Cisco, etc.).


Further, this is probably their goal. In fact, I wouldn't be surprised if Apple bought something like this as system-wide background noise canceling would be a fantastic OS feature.


iPhones already have this feature if you hold your phone up to your ear. There is a microphone in the back of the phone near the camera that phase cancels out background noise from the input from the front mic.


They need this for the AirPods then. Everyone I call while outside in NYC says the microphone is super sensitive and picks up all the surrounding noise.


They just filed that patent. 5 mics on the device for beamforming separation


That's really cool! I knew that it did sound canceling but I thought it was all software!


I remember when they proudly talked about all the mics on the iPhone 5 used for this when they presented it on stage.


Maybe they can do semi-dense audio cancelling with some ML model for cancellation on a Mac as well.


They also recently added 4 (or 5?) mics on the new iPad Pro so that it could do FaceTime calls while cancelling out the feedback.

Wouldn't be surprised if a software update makes it possible to use all of those mics for some very good ambient noise cancellation during FaceTime calls


Or they hope to license this and have it built into every service by default.


From the other site below, which also seems to be theirs, they charge per minute of processed audio.

https://2hz.ai/api/index.html


Folks, one of the devs — artavazdm — has commented below and has stated:

"It's only free during beta. Although we haven't decided on the exact monetization strategy yet."


Getting acquired by Dialpad in 3, 2, 1..


I'd be a little worried about the feature that fills in missing voice chunks. It sounds ripe for accidentally replacing one lost word with a completely different word that could also make sense in context. Almost like the issue where Xerox copiers would sometimes replace one character with another. Hopefully the filling in of missing chunks is done in a way that doesn't allow it to fill in whole words, but rather just short sub-syllable chunks of audio?


> It sounds ripe for accidentally replacing one lost word with a completely different word that could also make sense in context.

For a while, it seemed like the autocorrection in iOS was deliberately trying to break up me and my then-girlfriend. It even seemed to have a penchant for doing an unfortunate autocorrect just a sliver of a second before my finger hit send on a txt. I finally turned the feature off.


I just tried this over a Zoom call. It worked well but not as well as the samples on their website would make you believe. Background noise was muted during silence periods but not always during speech from the active speaker. This makes it even more annoying because you hear tidbits of background noise only at the same time that the active speaker is speaking. I'm still very impressed though.


As an amateur radio operator, I struggle very often with noise-related issues. Nice tech. I'm used to seeing digital processing applied in this field, but never thought about applying machine learning. I'm curious to learn about the technology some day because, based on the solutions I currently know, this does indeed seem like magic.


My recent blog post had some details about the tech behind Krisp:

https://devblogs.nvidia.com/nvidia-real-time-noise-suppressi...


Thank you very much!


Seems to me that "a virtual microphone [that] sits between your device microphone and Apps" would be a juicy target for attack

I am /not/ saying Krisp.ai is a trojan or is nefarious, but if I was the NSA or FSB or... something like this would be very interesting to me. Both for infiltration (deliberately malicious) or for exploitation (compromised / exploited at run time).


Make this work on a mobile device for phone calls for business people and you're in the money!

Finding quiet space to make phone calls is a hassle.


I have a generic Android phone and its lack of background noise was quite surprising, so either it has active noise canceling or an extremely short-range mic. I suspect most if not all of them already have such functionality.


I'd like to see this work with a music app and replace the need of buying expensive noise-cancelling headphones.


I'd really like to see this technique applied to remastering e.g. songs from vinyl records.

Similar to that project to colorize old photos with a GAN, but in audio form. [1]

[1] https://news.ycombinator.com/item?id=18456527


This would be really great for work-at-home people, freelancers, and digital nomads! I work online for a startup and this is exactly the kind of thing I can think we could all use. Another niche that could benefit from this would be online teaching. Most of the teachers of online English teaching companies work from home. For work at home parents (and there are plenty), the mute whining/screaming child feature is a godsend. Fingers crossed, waiting for Windows support...


A dream product for new parents working from home :)


It would be nice if I could upload audio and have it processed. I have a lot of videos that were captured on an iphone or something and would love a tool to help clean up the audio track before posting.


There is noise reduction (w/ templates) in Movie & it does a decent job. You can also try out Audacity & gate noise below a certain threshold for crisper audio.
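
In case "gating" is unfamiliar, this is roughly the idea, as a minimal Python sketch. It is a per-sample simplification with made-up values and a made-up `noise_gate` helper, not Audacity's implementation; real gates usually follow a smoothed level with attack and release times so the audio doesn't chatter on and off.

```python
def noise_gate(samples, threshold):
    """Zero out any sample whose absolute level is below the threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


# Toy signal: low-level hiss around a few louder "speech" samples.
samples = [0.01, -0.02, 0.5, -0.6, 0.015, 0.3]
gated = noise_gate(samples, threshold=0.05)
```

Note that a gate only silences the quiet stretches. Any noise that sits underneath the speech passes through untouched.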


OMG! this is awesome. I just tested and it works wonderfully! It just mutes all background noise (White noise, tv sound, toy sound, kids crying, people eating..) just wonderful.


This looks nice, I hope we can get a Linux version too


The scrum meeting sample with the loud keyboard felt all too real. Look forward to trying it out next time I'm working at a coffee shop.


I've been waiting for something like this, except now I want it to be embedded in my bluetooth earphones. They all suck at removing background noise when I'm speaking on a call. Will this work well even if I'm speaking into my airpods?


Something I'd like to see is a voice recorder that only records my own voice. It would be even better if other voices were obliterated and filled over with something like the "voice" noise from Katamari Damacy.


Is there a way to run an existing audio file through this to clean it up? For example, if I have a video of a presentation with lots of background noise like chairs creaking, people coughing, etc.?


Here is demo link that you can use for that purpose. https://demo.2hz.ai/

We are also building web and iOS apps to cancel the noise in the files.


Cool! I thought I had a test recording for this, but (un?)fortunately, my workplace sprung for a fancy directional mic that already did a very good job of isolating the speaker's voice.

I still plan to play with the app for voice chat, though.


Nice, it's especially notable for loud noise (which otherwise triggers voice activation) while not talking


They have an API: https://2hz.ai/api/index.html

You have to apply for it and it's apparently priced per minute. I applied and I'm waiting for pricing info.


Sure you can. I don't think anybody can stop you doing that. You just need 2 computers. On one you play the clip. On the other you record it. Then remove the audio from the video and put the new audio onto it.


Yes, of course I can run my audio file through it by actually playing it and recording the result, but I'm asking if there's a version of this tool that takes an input file and writes to an output file on its own.


I've been using http://mizage.com/shush/ for macOS push to talk. Highly recommended.


Just tried the Skype call testing service with and without Krisp while mashing on my keyboard. It did seem to almost eliminate the typing sounds which were rather loud otherwise.


Excited to check this out but looks like it's got the HN Hug. Going to recommend this to my dad who always takes calls from noisy airports.


Will this stay free or is it only free during beta?


It's only free during beta. Although we haven't decided on the exact monetization strategy yet.


Isn't the real solution having two or more microphones and using the multiple signals to locate and isolate the desired sound?


Yes, that's how phone microphones for example work.

To remove static background noice would just need to get a silent sample and reverse out the polarity of the static noise from the audio signal. It can be done real-time or in post production.

To sum out random background sounds you would need an omnidirectional mic that picks up everything. Then you'd need a close-range shotgun-type mic, from which you subtract everything the noise mic picks up.


What I am thinking of is more about using signal phase differences on spatially separated similar microphones to isolate the desired source and filter the rest. There is probably a name for this I wouldn't remember. You wouldn't need a specific directional mic and could instead build a synthetic virtual microphone with whatever directional properties you wanted (including distance and not just directional selection)


Beamforming?
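
If that is the technique meant, the simplest variant is delay-and-sum: shift each mic's signal so the desired source lines up, then average. Below is a toy Python sketch with two mics and whole-sample delays. All the numbers are invented, and the interferer's delay is deliberately chosen so it ends up half a period out of phase after steering (the best case); in general the rejection is partial and depends on frequency and geometry.

```python
import math


def delay_and_sum(channels, delays):
    """Advance each channel by its integer sample delay, then average them."""
    usable = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
        for t in range(usable)
    ]


def delayed(signal, d):
    """Delay a signal by d samples, padding the start with silence."""
    return [0.0] * d + signal[: len(signal) - d]


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


N = 400
target = [math.sin(2 * math.pi * t / 40) for t in range(N)]      # desired talker
interferer = [math.sin(2 * math.pi * t / 16) for t in range(N)]  # off-axis noise

# Mic A hears both sources with no delay. Mic B hears the target 2 samples
# late and the interferer 10 samples late (assumed geometry).
mic_a = [a + b for a, b in zip(target, interferer)]
mic_b = [a + b for a, b in zip(delayed(target, 2), delayed(interferer, 10))]

# Steer toward the target by undoing its 0- and 2-sample delays. The
# interferer is then offset by 8 samples, half of its 16-sample period,
# so the two copies cancel when averaged.
steered = delay_and_sum([mic_a, mic_b], delays=[0, 2])
```

Changing the `delays` list is what "pointing" the synthetic microphone amounts to: the same two recordings can be re-steered toward a different source after the fact.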


*noise


First app in a long time that I've downloaded within 5 seconds of hearing the pitch. Seriously looks too good to be true.


I’ll try this tomorrow with airline lane recordings from a black box. It would be interesting to see what will happen.


Great!! Is it useful in an audio recording in a noisy area as well?

I'll give it a spin and see if it helps.


awesome app! Might wanna correct the grammer for the unfurl description though

<meta name="description" content="Take calls from wherever you want without being embarassed for a background noise. Get krisp for Mac and use with any conferecing app!">


grammar* :)



Is it possible to use something like this to turn any headphone into noise canceling headphone?


Not really. The tech used in noise-cancelling headphones is called ANC (active noise cancellation). It eliminates the noise coming to your ear from your surrounding environment.

In contrast, Krisp eliminates the noise going from your environment to the call participants and vice versa.


How would the application know what noises are being made in the environment around you?


I've read the front page and FAQ section but I'm still not 100% clear what this product is. Is it an active noise cancelling app that listens to your environment around the laptop and cancels noise out (like Bose QC25/35 headphones)? Serious comment - I think you need to explain this better on your site.


I wish this was possible in the mobile ... I have this problem on a regular basis...


How much latency does it add?


Around 15ms to the end to end call.


Waiting for the mobile app. AI does not stop amazing the community.


Most useful for me would be speech to text in a conference call.


This seems like it would be useful as a pre-processing step before feeding audio to a speech-to-text engine.


But I want to mute people who make phonecalls in the train ...


It’s great for taking conference calls from the coffee shop


Works pretty awesome!!! Can be a life-saver app!


Can you make a plugin for Garageband?


Great idea, great product, great TEAM.


"Seemless calls", so my calls won't seem? Seem to be what? What would it mean for something to be seemful?

Edit: I know it's a typo, but I figured it's a door to somewhat interesting line of linguistic thought.

You could have "seemful" which might be somewhat synonymous with "inauthentic" and "seemless" which might be in line with "genuine"?


Unconstructive comment. Please don't demean their hard work with sarcasm. Small companies are hard enough to build as is.


It's fixed now. Thanks for pointing it out.


I think that was a typo...



