
Say_what: Using speech-to-text to fully check out during conference calls - pavel_lishin
https://github.com/joshnewlan/say_what
======
jandrese
This is the high tech equivalent of drawing eyeballs on your eyelids so you
can go to sleep during a meeting.

I do kind of like the idea of dialing into a meeting but saying that you'll be
AFK and then just reading the transcript afterward. It seems like it could be
a real time saver in those cases where you're not really participating
currently but want to keep up with the project.

~~~
teen
couldn't you be fired for recording people without their consent?

~~~
danso
It varies by U.S. state...and seems to become somewhat more complicated if
callers are all in different states:

[http://www.dmlp.org/legal-guide/recording-phone-calls-and-conversations](http://www.dmlp.org/legal-guide/recording-phone-calls-and-conversations)

> _Eleven states require the consent of every party to a phone call or
> conversation in order to make the recording lawful. These "two-party
> consent" laws have been adopted in California, Connecticut, Florida,
> Illinois, Maryland, Massachusetts, Montana, New Hampshire, Pennsylvania and
> Washington. (Notes: (1) Illinois' two-party consent statute was held
> unconstitutional in 2014; (2) Hawai'i is in general a one-party state, but
> requires two-party consent if the recording device is installed in a private
> place; (3) Massachusetts bans "secret" recordings rather than requiring
> explicit consent from all parties.). Although they are referred to as "two-
> party consent" laws, consent must be obtained from every party to a phone
> call or conversation if it involves more than two people. In some of these
> states, it might be enough if all parties to the call or conversation know
> that you are recording and proceed with the communication anyway, even if
> they do not voice explicit consent. See the State Law: Recording section of
> this legal guide for information on specific states' wiretapping laws._

~~~
DeadBabyOrgasm
I'd imagine this only applies if you keep the recording. What about hearing
impaired people who might use this for accessibility purposes? One would only
need to keep it for the fraction of a second it takes to turn it
into text. At that point, it turns into the legal equivalent of notes you
write while on a phone call.

That said, IANAL.

~~~
pc86
It applies if you record. Whether you keep it or for what reasons you record
are irrelevant.

The legal question is whether speech-to-text qualifies as a "recording" when
the audio itself is not stored.

~~~
Spivak
My guess would be no, as speech-to-text is no different than having someone
transcribe the conversation.

If the law were as technical as you describe then VoIP calls themselves would
be against the law since a person's voice is recorded, transmitted, and
momentarily stored.

------
tluyben2
I tried this a few weeks ago with Watson for the same purpose (being able to phase
out during conference calls); complete gibberish came out on all my trials
(about 40 conference calls) with 2, 3, 4, 5 and 8 people, all in English. You
can't even _guess_ what we were talking about from reading the entire transcripts.

Not sure if this is not just the same thing... If it is, I would love to hear
your experiences.

Edit: shame I didn't publish it :)

Edit2: I had one call with a colleague who speaks 'the Queen's English', and
got the same result...

~~~
ldehaan
Wouldn't that just mean that Watson sucks at speech to text? Also, since that's
probably the case for most speech-to-text services (at least in my experience),
why not run the audio through all the engines you can find and compare the
results? Keep the words with the most matches and see if you can get the other
words using phonetic engines.

For words that are specific to your company, train an engine on those words
and weight your local engine higher than the other services.

You can get a pretty decent transcription back, but who would be so lazy as to
work that hard..... ;)
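The voting half of this suggestion resembles the ROVER-style system combination used in speech-recognition research. Below is a minimal Python sketch, assuming the engine outputs have already been aligned word-for-word (the genuinely hard step, skipped here), with `""` marking a slot where an engine heard nothing. `vote_transcripts` and the per-engine weights are hypothetical illustrations, not any service's API.

```python
from collections import defaultdict


def vote_transcripts(hypotheses, weights=None):
    """Combine word-aligned transcripts from several speech-to-text engines.

    Assumption: every hypothesis is already aligned to the same number of
    slots, with "" where an engine produced no word. Real combiners do that
    alignment first; this sketch only does the weighted vote.
    """
    if not hypotheses:
        return []
    length = len(hypotheses[0])
    if any(len(h) != length for h in hypotheses):
        raise ValueError("hypotheses must be aligned to the same length")
    if weights is None:
        weights = [1.0] * len(hypotheses)

    result = []
    for slot in range(length):
        # Sum the weight of every engine that voted for each word in this slot.
        scores = defaultdict(float)
        for hyp, weight in zip(hypotheses, weights):
            scores[hyp[slot].lower()] += weight
        # Highest score wins; on a tie, max() keeps the word seen first.
        best = max(scores, key=scores.get)
        if best:  # an empty winner means "most engines heard nothing here"
            result.append(best)
    return result
```

Passing something like `weights=[3.0, 1.0, 1.0]` is one way to favor a locally trained engine on company-specific words, as suggested above.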

~~~
skykooler
A large part of the issue is that if you are dealing with phone meetings,
audio is severely compressed and not very similar to the data Watson was
trained on. Google Voice _might_ be able to do a better
transcription, because they have much more data from phone messages to work
with, but eventually it comes down to the fact that voices over a phone line
are a lot harder to understand without context processing than voices recorded
with a decent microphone.

------
yoo1I
Answer to what we're all thinking:

> I do run the risk of losing credibility as to whether I'm actually listening
> in meetings now, but I'm not too concerned about that. I made this as a
> joke, and my coworkers know that

[http://www.businessinsider.com/josh-newlan-creates-say-what-to-tune-out-of-conference-calls-2016-5](http://www.businessinsider.com/josh-newlan-creates-say-what-to-tune-out-of-conference-calls-2016-5)

~~~
mistersquid
I think this answer--that Newlan's employer, Splunk, is in on the joke--is
only a partial reason that Splunk tolerates Newlan writing and _publishing_
"Say What?".

From the "Readme.md":

> Installation (OS X)

> 1\. Sign up for, install, and run Splunk Enterprise

> ◦ This has to be enterprise; the HTTP Event Collector feature used here
> doesn't exist in light

Kudos to Splunk for having the guts to "allow" Newlan to unleash "Say What?"
on conference calls everywhere, and cheers to Newlan for dreaming it up.

EDIT: formatting, readability, punctuation.

~~~
kevinschumacher
Oh, he works for Splunk?

My initial thought was, "who would choose Splunk for the datastore for
_anything_?"

This now makes sense.

edit: by "anything", I mean "anything that isn't logs/security event data"

------
adrianN
"How to get fired for sending audio of all your conference calls to IBM"

Pretty fun hack though.

~~~
cwilkes
Next post from IBM: "we trained our speech to text AI bot on your idiotic
conference calls and it killed itself out of boredom"

~~~
ianamartin
Next next post from IBM: Tay turned out to be good for something after all. It
turns out that project meetings are a lot more fun when no one is talking and
8 people are channeling an angry, racist, 19-year-old girl.

~~~
sbierwagen
Tay was Microsoft, not IBM.

------
ellisv
"… what do you think about that Josh?"

"Sorry, I didn't realize my mic was on mute there."

"No problem Josh, we can hear you now just fine."

"Sorry, I didn't realize my mic was on mute there."

"... um, Josh?"

"Sorry, I didn't realize my mic was on mute there."

~~~
klenwell
Or a conference call with a project manager (fully present) and 5 developers
all using this. Perhaps a scene for Silicon Valley.

~~~
gregmac
Reminds me of "A Conference Call in Real Life"
[https://www.youtube.com/watch?v=DYu_bGbZiiQ](https://www.youtube.com/watch?v=DYu_bGbZiiQ)

------
coleca
This is genius. How long before instead of dialing into conference calls, we
just send our bots and have bot to bot con-calls? Could this be the killer
Twilio app?

~~~
misiti3780
`Could this be the killer Twilio app?`

how so ?

~~~
nitrogen
...by merging it with a bot that quotes a random sentence from any comment
with "killer" and "app" in it, asks humans for free work by saying "How
so?" (AKA nerd sniping), and submits the results to a startup incubator
automatically.

------
ams6110
Most brilliant thing I've seen in some time.

If it highlights the complete waste of time that almost all conference calls
are, it would be a service to mankind.

------
kh_hk
Kind of related, I did something similar (as in purpose) for slack [1]. It can
be used to set yourself away or... well, it can respond with a canned message as
you. It also disables the annoying auto-away after 30min. You can even close
slack and let it send you system notifications for incoming mentions. Heck,
you could even run it on a server and pretend to be you whilst you sip mojitos
on a beach.

[1]: [https://github.com/eskerda/slack-keep-presence](https://github.com/eskerda/slack-keep-presence)

------
Phemist
Not usable for people named 'Mike'.

~~~
zeroxfe
Turn up the echo cancellation, and you'll be fine.

~~~
pizzasynthesis
'Mike' and 'mic' are homophones. Methinks it's a joke.

------
arrowoftime
Or just use this: [http://docs.gridspace.com/](http://docs.gridspace.com/)

------
zomg
my solution: don't go to meetings where you can't or don't want to pay
attention. What a waste of time.

------
transpy
I had this idea once: I routed my laptop's audio output into its own
microphone, then opened a Google Docs document and turned dictation on. It
didn't work too well because of the poor signal-to-noise ratio and the
speaker's strong Indian accent. With normal-paced pronunciation and clean
audio it definitely works better.

------
jakobegger
Pretty amazing. Makes me wonder what the current state of the art in general
purpose voice recognition is.

I recently wondered: is there a video-chat app that uses voice recognition
for live subtitles? Would be awesome for talking to people who don't have
perfect hearing. Could this be hacked together in a similar way?

~~~
skoocda
I'm working on it, making more of a general-purpose API for speech recognition
to plug in during Hangouts calls, Twitch streams, or any pre-recorded media.
It's called Spreza.

~~~
skoocda
To answer your first question as well: the state of the art is generally either IBM's 2016 English Conversational Telephone Speech Recognition System [0] or Baidu's DeepSpeech2 [1]. IBM's is a little more complex, but I'd actually argue Baidu's is more promising due to its robust ability to learn new languages.

0: arxiv.org/abs/1604.08242
1: arxiv.org/abs/1512.02595

------
AstroJetson
Very nice, and I thought I was cool putting my end of video on freeze while I
did other stuff during the conference.

------
douche
I wonder if I can hire an intern or something to just transcribe all
conference calls? Actually having a record of what was said would be lovely -
too often audio is just too ephemeral and goes poof, with nobody remembering
what was agreed upon.

It's got to be worth $10-15/hour in wasted time and recapping.

~~~
whacker
I think a note taker taking live notes that everyone can see, and perhaps
edit, is a great way to record meetings. As a bonus, you even have something
tangible to show the time you put in!

~~~
skoocda
I'm working on this at the moment, in a way that also uses the edited
transcripts to train our ASR system to perform better for later sessions. The
difficult part is the speaker diarization, however. Multiple people talking at
once requires some intricate signal processing to sort out.

------
delazeur
Obviously there is the potential for certain forms of abuse here, but it could
be really useful for creating a record of what was said and what promises were
made on the call.

"Oh, you thought it was due on Monday? Nope, transcript says you promised it
by close of business Friday."

------
ClayM
Necessity may be the mother of invention, but laziness is its apathetic dad.

Kudos, sir. Kudos.

------
srtjstjsj
Disclosure: This is an astroturf ad for the author's employer.

It's a funny gimmick, for entertainment and to get visibility for the author's
employer's name.

------
teen
I know it's a joke and all, but you're not supposed to record people without
their consent, and I think this would qualify.

------
zaidf
Boggles my mind that GoToMeeting doesn't have a feature to auto-unmute when
your name is mentioned.
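The detection half of this is mostly string matching on a live transcript. A minimal Python sketch, assuming some speech-to-text service already delivers timestamped text chunks; `NameTrigger` and its 30-second cooldown are hypothetical, not a GoToMeeting feature or API. Word boundaries stop short names from matching inside other words, and the cooldown keeps one mention from firing over and over, which is the failure mode in ellisv's "mic was on mute" loop above.

```python
import re


class NameTrigger:
    """Hypothetical helper: decide when a transcript chunk mentions a name."""

    def __init__(self, names, cooldown_s=30.0):
        if not names:
            raise ValueError("need at least one name to listen for")
        # \b keeps "Al" from matching inside "also"; IGNORECASE because
        # transcription engines are inconsistent about capitalisation.
        pattern = r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b"
        self._regex = re.compile(pattern, re.IGNORECASE)
        self._cooldown_s = cooldown_s
        self._last_fired = None

    def feed(self, text, timestamp_s):
        """Return True if this chunk should trigger an unmute prompt."""
        if not self._regex.search(text):
            return False
        in_cooldown = (
            self._last_fired is not None
            and timestamp_s - self._last_fired < self._cooldown_s
        )
        if in_cooldown:
            return False  # already fired recently; avoid repeated replies
        self._last_fired = timestamp_s
        return True
```

On a fresh `NameTrigger(["Josh"])`, feeding `"what do you think about that Josh?"` at 10.0 seconds returns `True`; another mention at 20.0 seconds returns `False` because of the cooldown.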

~~~
aggie
That adds a ton of complexity and opportunity for embarrassing bugs/flaws
while gaining only slight convenience. As a product manager I wouldn't touch
that.

~~~
zaidf
It may not be the simplest feature to implement but I do think we have enough
tech now to build a pretty solid feature around this.

Also, I wouldn't call it merely a slight convenience when multiple times a day
you see the pattern repeat of there being a long pause after you ask someone a
question, followed by "sorry, I was on mute."

~~~
mungoman2
Voice chat used for gaming has had unmute on speech for more than a decade.
That solves the problem of background humming while getting all the voice
through.
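The simplest form of that unmute-on-speech behavior is an energy gate: stay muted until a frame is loud enough, then stay open for a few frames of "hangover" so word endings and short pauses aren't clipped. A minimal Python sketch, assuming audio arrives as frames of float samples in [-1.0, 1.0]; `voice_gate`, the threshold, and the hangover length are illustrative guesses. A loud enough hum would still open this gate, which is why production voice-activity detectors usually look at more than raw loudness.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def voice_gate(frames, threshold=0.05, hangover=3):
    """Return one transmit (True) / mute (False) decision per frame.

    The gate opens when a frame's RMS exceeds `threshold` and stays open
    for `hangover` further quiet frames before muting again.
    """
    decisions = []
    remaining = 0
    for frame in frames:
        if rms(frame) > threshold:
            remaining = hangover  # speech: (re)start the hangover timer
            decisions.append(True)
        elif remaining > 0:
            remaining -= 1  # quiet, but still inside the hangover window
            decisions.append(True)
        else:
            decisions.append(False)  # quiet hum below threshold stays muted
    return decisions
```

With one loud frame followed by quiet ones and `hangover=3`, the gate transmits the loud frame plus the next three quiet frames, then mutes.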

------
VonGuard
I just want this as a call transcription service... Good ones are hard to
find.

~~~
sr_banksy
Try steno: [https://s10o.com](https://s10o.com)

------
gd2
I like it, but it also sounds so wrong. Nice hack project.

------
ascotan
I approve of this. Gold star to this man.

