
Show HN: Languagecrunch – NLP server Docker image - artpar
https://hub.docker.com/r/artpar/languagecrunch/
======
syllogism
Nice!

Relevant links for anyone interested:

* spaCy on GitHub: [https://github.com/explosion/spacy](https://github.com/explosion/spacy)

* NER demo: [https://demos.explosion.ai/displacy-ent/](https://demos.explosion.ai/displacy-ent/)

* Neural coref by HuggingFace: [https://huggingface.co/coref/](https://huggingface.co/coref/)

* Accuracy of built-in spaCy models: [https://spacy.io/usage/facts-figures](https://spacy.io/usage/facts-figures)

Last time I calculated, the lowest-cost way to run spaCy in the cloud was on
Google Compute Engine n1-standard preemptible instances. It should be over
100x cheaper per document than using Google, Amazon or Microsoft's cloud NLP
APIs. Accuracy will depend on your problem, but if you have your own training
data, performance should be similar.

~~~
Xeoncross
I've run spaCy on small $10/mo Linux VPS instances. What do you mean by
cheapest? I'm sure you're right, but what volume are you referring to?

~~~
syllogism
I'm referring to the best price per word when the service is continually
active. Like, if you want to parse a web dump, what type of instance do you
provision a bunch of?
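The "100x cheaper" claim comes down to simple throughput arithmetic. A rough
sketch, where every price and throughput number is an invented placeholder,
not a quoted rate:

```python
# Back-of-the-envelope cost comparison between a self-hosted VM and a
# per-record cloud NLP API. All numbers are illustrative assumptions.
def cost_per_million_words(price_per_hour, words_per_second):
    """Cost to process 1M words on a VM at a given throughput."""
    seconds = 1_000_000 / words_per_second
    return price_per_hour * seconds / 3600

# Hypothetical preemptible VM: $0.03/hour, parsing ~10k words/sec.
vm_cost = cost_per_million_words(0.03, 10_000)

# Hypothetical cloud NLP API priced per record, working out to
# roughly $1.00 per million words.
api_cost = 1.00

print(f"VM:  ${vm_cost:.6f} per million words")
print(f"API: ${api_cost:.2f} per million words")
print(f"ratio: {api_cost / vm_cost:.0f}x")
```

With these made-up numbers the API comes out well over 100x more expensive;
the real ratio depends entirely on your throughput and the provider's pricing.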

------
ivan_ah
What do you need on the docker host machine to run this? Any specific docker
version? GPU?

Also, it would be useful to see the Dockerfile or script that generated this
image.

~~~
artpar
No special requirements. I run it on an Ubuntu 16 server in production and
locally on OS X.

Adding a GPU would probably help the performance of both spaCy and
neuralcoref.

> Also, it would be useful to see the Dockerfile or script that generated this
> image.

Will put it on GitHub shortly.

------
tobilg
Seems interesting! How can it be used with languages other than English?

~~~
artpar
spaCy supports the languages listed in [1] and WordNet those in [2], but
neuralcoref (the pronoun-resolution endpoint) is available only for English.

The current Docker image does not expose those other languages, but I can
expose them in an update if it helps a lot of people.

[1] [https://spacy.io/usage/models](https://spacy.io/usage/models) [2]
[http://compling.hss.ntu.edu.sg/omw/](http://compling.hss.ntu.edu.sg/omw/)

~~~
tobilg
Thanks for the insights. Could you please share the Dockerfile so that one can
make the other languages work?

~~~
artpar
[https://github.com/artpar/languagecrunch](https://github.com/artpar/languagecrunch)

------
laretluval
The demo examples are wrong or don't make much sense.

"Donald Trump's administration" is not a person.

In the following example, "The currency" is not a subject and "India" is not
an object.

I don't know how much useful information is extracted by this system.

~~~
syllogism
That example is a tweet, which the syntax and NER models haven't been trained
on. You can make calls to `nlp.update()` to improve it on your own data. We
also have an annotation tool, [https://prodi.gy](https://prodi.gy) , to more
quickly create training data.
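For reference, `nlp.update()` in spaCy 2.x consumes examples as
`(text, annotations)` pairs with character-offset entity spans. A minimal
sketch of preparing such data, where the sentences and labels are invented:

```python
# NER training examples in spaCy 2.x's (text, annotations) format.
# Entity spans are (start_char, end_char, label) offsets into the text.
TRAIN_DATA = [
    ("Apple is opening an office in Bangalore.",
     {"entities": [(0, 5, "ORG"), (30, 39, "GPE")]}),
    ("Tim Cook visited India last year.",
     {"entities": [(0, 8, "PERSON"), (17, 22, "GPE")]}),
]

def extract_spans(example):
    """Return the (entity_text, label) pairs an annotation refers to,
    which is a handy sanity check that the offsets line up."""
    text, ann = example
    return [(text[start:end], label) for start, end, label in ann["entities"]]

for ex in TRAIN_DATA:
    print(extract_spans(ex))
```

Pairs like these would then be fed to `nlp.update()` inside a training loop;
off-by-one offsets are the most common mistake, so checking the spans against
the raw text first is worthwhile.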

(I'm the author of spaCy, not this Docker container.)

~~~
laretluval
spaCy is wonderful; I've used it a lot over the years and have high
confidence in its output.

I just wish the author of this Docker container had chosen demo sentences
that advertised it better.

------
artpar
Source on github:
[https://github.com/artpar/languagecrunch](https://github.com/artpar/languagecrunch)

------
stevemk14ebr
@artpar Can you post some docs on the endpoints and how they should be used?
I want to tie this into a speech-to-text system, but I need more API info.

~~~
artpar
Added docs at the bottom of the README.

[https://github.com/artpar/languagecrunch](https://github.com/artpar/languagecrunch)
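For anyone wiring it into another system before reading the docs, a call to
the server is just an HTTP GET. A sketch using only Python's stdlib, where
the port, endpoint path, and parameter name are guesses to illustrate the
shape of the call (check the README for the real ones):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def build_url(base, endpoint, sentence):
    """Build a request URL for a (hypothetical) languagecrunch endpoint."""
    query = urlencode({"sentence": sentence})
    return f"{base}{endpoint}?{query}"

url = build_url("http://localhost:8080", "/nlp/parse",
                "Apple is opening an office in Bangalore.")
print(url)

# Uncomment once the container is running:
# with urlopen(url) as resp:
#     print(json.dumps(json.load(resp), indent=2))
```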

------
zengid
Cool! What corpus was this trained on?

~~~
artpar
The "en_core_web_lg" model for spaCy [1], plus neuralcoref with its pre-
trained models on GitHub [2].

[1] [https://spacy.io/models/en](https://spacy.io/models/en) [2]
[https://github.com/huggingface/neuralcoref/tree/bee05b1b55e3...](https://github.com/huggingface/neuralcoref/tree/bee05b1b55e30bf1b4db3f9b931928dcf1c0ffe5/neuralcoref/weights)

