
Speech and Language Processing, 3rd ed. draft (2017) - panyang
https://web.stanford.edu/~jurafsky/slp3/
======
jhanschoo
On the slim chance that someone here wants to re-implement the unmodified
Kneser-Ney algorithm[0]: the book's presentation of it does not account for
query tokens that are not in the vocabulary. I extended the recurrence to its
natural closure, including unknown tokens, here
[[https://github.com/jhanschoo/HMMTagger/blob/master/readme.pd...](https://github.com/jhanschoo/HMMTagger/blob/master/readme.pdf)].
It is a straightforward task, but it might otherwise take you an hour or two
(probably more) to derive it and prove its correctness: I couldn't find an
extension in a Google search, nor is one described in the original paper. I
believe it would likewise be straightforward to extend this to modified
Kneser-Ney.

[0]: The modification due to Chen & Goodman, which uses multiple discount
values, is regarded as better-behaved smoothing and is more popular today.
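
For anyone who wants to see where the gap shows up, here is a minimal Python
sketch of plain interpolated Kneser-Ney for bigrams with a single discount. It
is not the extension from the linked readme: the function name
`train_bigram_kn`, the discount of 0.75, and the choice to fall back to the
continuation probability for an unseen context are all assumptions made for
illustration. Note that a query word never seen in training still gets
probability 0 here, which is exactly the case an unknown-token extension has to
handle.

```python
from collections import Counter, defaultdict


def train_bigram_kn(tokens, discount=0.75):
    """Return prob(w, v) = P_KN(w | v) for an interpolated bigram model.

    Illustrative sketch only; names and defaults are assumptions.
    """
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to count bigrams")

    bigrams = Counter(zip(tokens, tokens[1:]))
    context_total = Counter()      # c(v) = sum over w of c(v, w)
    followers = defaultdict(set)   # distinct words seen after v
    preceders = defaultdict(set)   # distinct words seen before w
    for (v, w), count in bigrams.items():
        context_total[v] += count
        followers[v].add(w)
        preceders[w].add(v)
    n_bigram_types = len(bigrams)

    def p_continuation(w):
        # Fraction of distinct bigram types that end in w.
        return len(preceders.get(w, ())) / n_bigram_types

    def prob(w, v):
        total = context_total.get(v, 0)
        if total == 0:
            # ASSUMPTION: for a context never seen in training, use the
            # continuation distribution alone (avoids dividing by zero).
            return p_continuation(w)
        discounted = max(bigrams.get((v, w), 0) - discount, 0.0) / total
        # Mass removed by discounting, redistributed via p_continuation.
        backoff_weight = discount * len(followers[v]) / total
        return discounted + backoff_weight * p_continuation(w)

    return prob
```

Calling `prob = train_bigram_kn("a b a c a b".split())` and then summing
`prob(w, "a")` over the training vocabulary gives a proper distribution, while
`prob("zzz", "a")` shows the zero mass assigned to an out-of-vocabulary word.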

------
imurray
Why link to the pdf?

The webpage
[https://web.stanford.edu/~jurafsky/slp3/](https://web.stanford.edu/~jurafsky/slp3/)
links directly to the PDF, and gives context and other download options. It's
not so easy to go back from the PDF to the webpage.

Mods: I'd change the link, and the title to "...3rd Edition draft".

Everyone else: please stop linking to PDFs when there is an obvious html page
to link to instead.

~~~
sctb
Sure thing, we've updated the link from
[https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf](https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf).

------
lgessler
Note that the 3rd edition isn't entirely finished yet. A full table of planned
chapters is here:
[https://web.stanford.edu/~jurafsky/slp3](https://web.stanford.edu/~jurafsky/slp3)

------
zerkten
I found speech and language processing to be one of the most interesting
courses of my degree. Recently I decided to take a look at speech synthesis
again and discovered a book by Paul Taylor on this subject
([http://svr-www.eng.cam.ac.uk/~pat40/](http://svr-www.eng.cam.ac.uk/~pat40/)
and draft PDF at
[http://svr-www.eng.cam.ac.uk/~pat40/ttsbook_draft_2.pdf](http://svr-www.eng.cam.ac.uk/~pat40/ttsbook_draft_2.pdf)).
It is more engineering focused than other books in this area.

~~~
woodson
Note that the book was released in 2009 and the draft is from 2007. While most of
the content is still very relevant for understanding the basics and challenges
in building TTS systems, recent progress in DNN-based synthesis (including
WaveNet, GANs, and end-to-end approaches like Tacotron) is obviously not
covered.

------
contingencies
Often this stuff is used for surveillance. If you choose to study this area,
please be careful how you apply your knowledge. There are plenty of positive
ways to use it: contributing content classification systems to sci-hub or
libgen, building tools for the disabled, automating multilingual visual design
aesthetics with computational linguistics and machine learning...

~~~
ngcc_hk
Whilst I understand the concern, the Internet is now used for control and
tracking (chi-na is more obvious but post someone we all know better), and we
cannot stop the wheel.

Not knowing is even worse.

~~~
amelius
We need an equivalent of the hippocratic oath for software engineers: I swear
by Hephaestus, the ugly god of crafts, that I shall not use my skills for
controlling and tracking ...

~~~
myaso
Great, those who want to build these things will simply hire people who don't
subscribe to this oath ... And now the organization will be populated by a
whole tribe of people who have _no qualms_ about doing this -- does that sound
like something that will make things better?

Aside: the book looked interesting. Skimmed it, beautiful typesetting, not
gratuitously mathematical, looked readable -- I may be wrong, but this is just
an exercise in making me aware of its existence.

~~~
amelius
> Great, those who want to build these things will simply hire people who
> don't subscribe to this oath ...

How many doctors are willing to give up their title to do unethical stuff?

An ethical code will change people's perception, and that's a big thing. It
will morph the perception of Google and Facebook from cool tech companies into
dark places where no decent people want to work; unless of course these
companies are willing to change. And there will be a huge effect on politics,
other businesses, and media.

~~~
myaso
Um, how are they supposed to make money if they don't track you to a certain
extent? Do they have flaws? Yes -- it's not funny what kind of things you can
target with AdWords; their internal staff probably aren't creative enough
to think up ways in which their tools can be used to inflict misery. Software
engineers aren't soldiers, doctors, or civil servants, so what kind of ethical
code do you expect? How will people be held accountable to it? What will
happen to those who make transgressions?

~~~
randcraw
And how are cigarette companies going to make money if they are hindered in
addicting you and giving you cancer? Legally of course.

Software engineers are surrogates for ALL jobs because software is used
_everywhere_. When we serve medicine, we should be obliged to take the
Hippocratic oath. When we serve the military or intel agencies (with a
security clearance), we ARE obliged to take a formal oath of secrecy. IMHO,
many other S/W roles bear no less responsibility.

I think it's time that we software infrastructuralists take our role behind
the scenes as seriously as those on the front lines do.

~~~
myaso
Really? As a former smoker I don't place cigarettes and software in even
remotely the same category. Ok -- question. How will this be enforced?
Legal/Medical/Eng have licenses to practice, and you can lose these licenses. I
don't need anybody's approval to write code; all the tools are available for
free. You are naive if you think some feel-good words on paper can make this
work without some force backing these words.

~~~
randcraw
Let's say you're a programmer working for a tobacco company. Or for an alcohol
distillery making only the most down-market 'ripple' products, inevitably
consumed only by destitute inebriates dwelling in the dungeons of life. Do you
really imagine that you bear no responsibility for the impact of your work
when you enable the perpetuation of pain?

I've worked for several past employers whom I now disrespect (and whose
leadership since earned them this disrepute), so this issue isn't merely
hypothetical for me.

The principal question isn't about policing and punishment. It's about civic
duty as an enlightened human being. Each of us either takes responsibility for
our actions and does no harm, or willingly does harm. On our part, that
necessitates continuous diligence: taking an interest in how the products of
our work affect others.

Software has become an inescapable part of our society's technical and social
infrastructure. Like scientists and engineers, S/W pros bear responsibility
for how our work is used. And how it's abused. That's all I'm saying. Each of
us has to work out the details for ourselves, but dismissing them outright
shirks that duty, and I believe, diminishes our humanity bit by bit.

~~~
myaso
People have the mistaken belief that the ideal possible world is actually one
with no _pain_. The best solution possible with finite resources is maybe
somewhere short of the _best_ you imagine -- to live is to suffer, as they
say.

You had shitty leaders, and I'm sorry about that, but maybe they were trying
their best in a difficult situation -- it's probably not all fun choices. Or
maybe they were just _assholes_ -- updated.

I do not disagree with you on the "why" -- as Grove said, I want to know
_how_? You assume that each person can be trusted to figure this out for
themselves -- maybe some people can be, but if you look at the entire
population you will end up with a distribution where more and more _force_ will
need to be used to coerce the fringe elements into compliance -- these fringes
can destabilize the entire equilibrium, since it might snowball out of control
as more and more people pile on seeing the benefits that it brings.

~~~
amelius
> but maybe they were trying their best in a difficult situation

Then let them prove it! That's one thing an ethical code will ensure.

~~~
myaso
There is an asymmetry between how they perceive the world and how you do -- the
difference in available information might not let them make choices
that would satisfy your standards.

~~~
amelius
That's what communication is for. It also happens to be a function of an
ethical code.

------
hackernewsacct
What are good resources for beginners wanting to learn about natural language
processing? Are there any good books, tutorials, courses, etc?

~~~
brian_spiering
[https://github.com/keon/awesome-nlp](https://github.com/keon/awesome-nlp)

------
JabavuAdams
I need a voice activity detection (VAD) module for my wearable computer.
Should I roll my own, or use someone else's (open-source)? My immediate need
is speaker-dependent (just me), but it would be nice if I could offer up a
speaker-independent version eventually.

~~~
woodson
Check out this one:
[https://github.com/wiseman/py-webrtcvad](https://github.com/wiseman/py-webrtcvad)

If this one does not work for your application, perhaps look into simpler ones
like the ones used in mobile telephone codecs or in Speex.

~~~
JabavuAdams
Thanks! This is exactly what I needed. I was able to get it up and running on
Raspberry Pi with just "pip3 install webrtcvad", and the quality is at least
good enough to get me started.
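
In case it helps anyone else getting started, here is a rough Python sketch of
feeding audio to it. The helper names (`split_frames`, `speech_flags`,
`webrtc_classifier`) and the default aggressiveness of 2 are my own assumptions
for illustration, not part of the library. What the library itself expects is
16-bit mono PCM at 8000, 16000, 32000, or 48000 Hz, passed in frames of 10, 20,
or 30 ms to `Vad.is_speech(frame, sample_rate)`.

```python
def split_frames(pcm: bytes, sample_rate: int, frame_ms: int = 30):
    """Yield fixed-size frames of 16-bit mono PCM.

    A trailing partial frame is dropped, since the detector needs
    frames of an exact duration.
    """
    bytes_per_frame = sample_rate * frame_ms // 1000 * 2  # 2 bytes per sample
    for start in range(0, len(pcm) - bytes_per_frame + 1, bytes_per_frame):
        yield pcm[start:start + bytes_per_frame]


def webrtc_classifier(aggressiveness: int = 2):
    """Return a callable (frame_bytes, sample_rate) -> bool backed by webrtcvad."""
    import webrtcvad  # third-party: pip3 install webrtcvad

    vad = webrtcvad.Vad(aggressiveness)  # 0 = least, 3 = most aggressive
    return vad.is_speech


def speech_flags(pcm: bytes, sample_rate: int, is_speech, frame_ms: int = 30):
    """Classify each frame; returns one bool per complete frame."""
    return [
        is_speech(frame, sample_rate)
        for frame in split_frames(pcm, sample_rate, frame_ms)
    ]
```

With a buffer of raw PCM in hand, something like
`speech_flags(pcm, 16000, webrtc_classifier())` gives a per-frame
speech/non-speech list that you can smooth or threshold however your
application needs. Passing the classifier in as an argument also makes it easy
to swap in a simpler detector later.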

------
akditer
very good theoretical information

