Show HN: WordsAPI

wing328hk · on Jan 6, 2015

Nice project.

I've created SDKs (REST API wrappers) in Python, PHP, Ruby, Java, C#, Android for the WordsAPI:

http://restunited.com/releases/424223873313015558/wrappers

(Objective C, Scala, ActionScript SDKs still in beta)

Hope these SDKs make it easier for developers to consume the API.

impostervt · on Jan 6, 2015

Holy crap, very cool! Very neat service too! I'll link to these from the documentation page later today.

atmosx · on Jan 7, 2015

In how many hours did you do that?!

wing328hk · on Jan 8, 2015

Definitely less than an hour (around 30 minutes if I recall correctly), as the SDKs, documentation, and sample code are automatically generated after entering the endpoint definition. Please give it a try at http://restunited.com.

atmosx · on Jan 9, 2015

Oh good to know!

miket · on Jan 5, 2015

Looking forward to seeing further development!

For a more comprehensive word API, check out the excellent Wordnik API: http://developer.wordnik.com/docs.html#!/word

esperluette · on Jan 5, 2015

You can get Wordnet through the Wordnik API too ... as well as from the American Heritage Dictionary, The Century Dictionary, Wiktionary, and the GNU version of the Collaborative International Dictionary of English

And Wordnik is becoming a not-for-profit, if that makes a difference to you. :-)

Disclaimer: I am the founder of Wordnik. (You might like my TED talk: http://www.ted.com/talks/erin_mckean_redefines_the_dictionar...)

That said, I know the guy behind WordsAPI and he's good people. :-)

deadlysyntax · on Jan 5, 2015

I absolutely love your TED talk.

esperluette · on Jan 6, 2015

aw, thank you!

RKoutnik · on Jan 5, 2015

Slightly disappointed that it doesn't have syllables. Should be easy to add, I have a list here: https://github.com/SomeKittens/Haiku-Generator

sumgy · on Jan 6, 2015

Out of curiosity, how did you generate that list? I looked into doing something similar a couple of months back, but I couldn't find nearly as complete a source for syllable counts.

RKoutnik · on Jan 6, 2015

I started with the list from this question [0] and then wrote a one-off script to convert it to the form linked above.

[0] http://stackoverflow.com/q/10414957/1216976

impostervt · on Jan 7, 2015

Just got done adding syllables and pronunciation.

abhididdigi · on Jan 5, 2015

Thank you. I was looking for an API only yesterday, and didn't find any that is as good as this one. Even the Princeton University one didn't fit into my workflow, because it is overly complex.

impostervt · on Jan 5, 2015

That's pretty much why I built this. It's a great resource, but geared towards lexicographers vs developers.

ghchinoy · on Jan 5, 2015

This is awesome! I'd love to hear more about how this was created (node/express, etc.)!

impostervt · on Jan 5, 2015

It's a Node/express app. The word data is in Postgres (hosted by heroku), user/metric data is in mongo (via mongolab.com). Hosting is from heroku fronted by cloudflare.

kamladi · on Jan 6, 2015

What led you to use Mongo for userdata/metrics and Postgres for words? Are there specific features of each that you're using? I'm new to both, so just trying to learn which use cases prefer one over the other...

ghchinoy · on Jan 6, 2015

Thank you! Impressive. How do you limit anonymous web users' queries?

impostervt · on Jan 6, 2015

Requests made for the demo don't have an access token. On the back end I look for this case, and then check whether the request has "when" and "encrypted" parameters. "when" is just a date/time stamp, and "encrypted" is the same value, encrypted. If I see both params, I decrypt "encrypted" and make sure it matches "when" to validate that the server created it, and make sure "when" is less than one hour old.

Otherwise, all requests require an access token.
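
For readers who want to try the demo-token idea described above, here is a minimal sketch. It is not the WordsAPI implementation: it swaps the reversible encryption mentioned in the comment for an HMAC signature (Python's standard library has no symmetric cipher), and the names `SECRET_KEY`, `issue_demo_params`, and `is_valid_demo_request` are hypothetical.

```python
import hashlib
import hmac
import time

# Assumption: a secret known only to the server (hypothetical value).
SECRET_KEY = b"replace-with-a-server-side-secret"
MAX_AGE_SECONDS = 3600  # the comment says "less than one hour"


def _sign(when):
    """Return a hex HMAC-SHA256 signature of the timestamp string."""
    return hmac.new(SECRET_KEY, when.encode(), hashlib.sha256).hexdigest()


def issue_demo_params(now=None):
    """Build the "when"/"encrypted" pair the demo page would send back."""
    when = str(int(time.time() if now is None else now))
    return {"when": when, "encrypted": _sign(when)}


def is_valid_demo_request(params, now=None):
    """Accept a token-less request only if the pair is genuine and fresh."""
    when = params.get("when")
    signed = params.get("encrypted")
    if not when or not signed:
        return False
    # Constant-time comparison of the expected and supplied signatures.
    if not hmac.compare_digest(_sign(when).encode(), signed.encode()):
        return False
    try:
        issued_at = int(when)
    except ValueError:
        return False
    current = time.time() if now is None else now
    return 0 <= current - issued_at < MAX_AGE_SECONDS
```

A signature is enough here because the server only needs to prove it created the timestamp; nothing in the pair has to stay secret.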

bovermyer · on Jan 6, 2015

What do you use to cache requests?

impostervt · on Jan 6, 2015

Redis & cloudflare.

mtmail · on Jan 5, 2015

Is this a thin wrapper over http://wordnet.princeton.edu/wordnet/ or does it/will it go beyond that?

andreyf · on Jan 5, 2015

> Princeton University makes WordNet available to research and commercial users free of charge provided the terms of our license are followed, and proper reference is made to the project using an appropriate citation. [1]

Interesting! I never realized this.

1. http://wordnet.princeton.edu/wordnet/download/

logn · on Jan 6, 2015

That's nearly the BSD license. Though if I recall, some of the data files have unique licenses.

impostervt · on Jan 5, 2015

For now, that's the source of most data. I'll be adding more as time goes on. First up is pronunciation.

zo1 · on Jan 5, 2015

Then I'd suggest you properly attribute them for the data, especially if you're going to charge people for access (in my opinion, IANAL). See their site on the matter:

http://wordnet.princeton.edu/wordnet/citing-wordnet/

impostervt · on Jan 5, 2015

You're right, of course. I turned the "about" page on. Was holding off until I could spruce it up a bit, but I guess it's ok for now. Wasn't expecting hackernews to really jump on this.

bbcbasic · on Jan 5, 2015

On their site it says: "Due to funding and staffing issues, we are no longer able to accept comment and suggestions."

What if you give a % of your revenue to them to help them, in return you could be the recommended API on their site?

DeepakShah · on Jan 5, 2015

Aren't we supposed to cite the use of WordNet?

zmillman · on Jan 5, 2015

Hmm, what about word stemming? I looked up "windmills" and got an empty result set.

byoung2 · on Jan 5, 2015

Same with "walked". I've used the Porter Stemming Algorithm in the past, and it works well.

http://tartarus.org/martin/PorterStemmer/

chrisfarms · on Jan 6, 2015

The data is stored in postgres, so it should be simple enough to use the Snowball dictionary/stemmer and the tsvector/tsquery functions to sort this out.

lotophage · on Jan 6, 2015

What you really want is a lemmatizer (stemming approximates lemmatization). I believe that NLTK has a WordNet lemmatizer, but I don't know much about it.
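
To make the "windmills" and "walked" cases above concrete, here is a rough sketch of a fallback lookup that tries a few base forms when the exact word is missing. It is a naive suffix-stripping heuristic, not the Porter algorithm, Snowball, or a real lemmatizer, and `SUFFIX_RULES`, `candidate_base_forms`, and `lookup_with_fallback` are made-up names for illustration.

```python
# Assumption: a tiny hand-written rule list; real stemmers use far more rules.
SUFFIX_RULES = [
    ("ies", "y"),  # ponies -> pony
    ("es", ""),    # boxes -> box
    ("s", ""),     # windmills -> windmill
    ("ed", ""),    # walked -> walk
    ("ing", ""),   # walking -> walk
]


def candidate_base_forms(word):
    """Yield the word itself, then each suffix-stripped guess."""
    word = word.lower()
    yield word
    for suffix, replacement in SUFFIX_RULES:
        # Require at least two characters to remain before the suffix.
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            yield word[: -len(suffix)] + replacement


def lookup_with_fallback(word, vocabulary):
    """Return the first candidate found in the vocabulary, else None."""
    for candidate in candidate_base_forms(word):
        if candidate in vocabulary:
            return candidate
    return None
```

Checking each guess against the known vocabulary avoids returning non-words, which is roughly the difference between stemming and lemmatization that the comment above points at.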

impostervt · on Jan 5, 2015

I'll have to see if I can find a good library for this. The ones I tried (like Node Natural) just didn't give great results.

las_cases · on Jan 6, 2015

It is blazing fast for me. I see from another response that this is a Node.js app but perhaps caching might also explain how fast this is. Also, today I have learned that jazz also means "have sexual intercourse with".

impostervt · on Jan 6, 2015

Got a great speedup when I added Redis. When a word is first requested, the JSON is put together from the Postgres database tables, then I just stuff it in Redis for subsequent requests.

Since there have been so many requests, most common words are in Redis at this point.
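
The pattern described above is usually called cache-aside. A minimal sketch follows, assuming a dict-like object stands in for Redis and a hypothetical `build_entry` callable stands in for the Postgres query; neither reflects the actual WordsAPI code.

```python
import json


class WordCache:
    """Cache-aside lookup: serve from the store, build on a miss."""

    def __init__(self, store, build_entry):
        self.store = store              # assumption: dict-like stand-in for Redis
        self.build_entry = build_entry  # assumption: slow builder (e.g. SQL joins)

    def get(self, word):
        key = "word:" + word.lower()
        cached = self.store.get(key)
        if cached is not None:
            # Hit: the JSON was assembled on an earlier request.
            return json.loads(cached)
        # Miss: assemble the entry once, then keep the serialized JSON.
        entry = self.build_entry(word)
        self.store[key] = json.dumps(entry)
        return entry
```

Storing the serialized JSON rather than the raw rows means a hit skips both the database and the assembly step.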

las_cases · on Jan 6, 2015

I have stumbled upon Redis a lot in really cool projects so I definitely need to take a deeper look into it.

WhitneyLand · on Jan 6, 2015

Nice work. Is the order of definitions supposed to be correct? The word "fast" has a first definition of "unrestrained by convention or morality", yet I would expect that to be lower down the list.

impostervt · on Jan 6, 2015

The order doesn't convey any meaning.

SwellJoe · on Jan 6, 2015

I would love to see this made available in an editor. When writing, if I could pull synonyms and definitions up instantly with a single keypress, I suspect my prose would be better. I like to think I have a good vocabulary, but this great blog post (which I found through HN) has had me thinking I could always, and should try to, do better: http://jsomers.net/blog/dictionary

Imagine something akin to tab completion for writing prose.

Swizec · on Jan 6, 2015

Please don't. Thesaurus prose is some of the worst writing out there. You can always tell when a writer tries to go [too far] out of their natural active vocabulary.

Basic rule of thumb for writing prose: synonyms are a myth. No two words have exactly the same meaning.

SwellJoe · on Jan 6, 2015

Did you read the article I mentioned? I believe that context is important.

deepGem · on Jan 6, 2015

Another spin on this is to build an editor for GRE/GMAT exam prep. Huge market.

PeterWhittaker · on Jan 6, 2015

The results for thesaurus, action (second word to pop into my head), and everything are interesting: Short set, long, long set, empty set.

iguana · on Jan 5, 2015

This is really cool!

One issue I found is that it provides alternative spellings as distinct items. Is there a workaround for this?

{ "typeOf": [ "chromatic color", "chromatic colour", "spectral colour", "spectral color", "citrus", "citrus tree", "pigment", "citrous fruit", "citrus fruit" ] }
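
One possible client-side workaround for the duplicate spellings above is to map variants onto a canonical form before de-duplicating. This is only a sketch: `SPELLING_VARIANTS` is a hypothetical hand-maintained map, not something the API provides.

```python
# Assumption: a small hand-maintained map of variant -> canonical spelling.
SPELLING_VARIANTS = {"colour": "color", "citrous": "citrus"}


def normalize(term):
    """Rewrite each word of a term to its canonical spelling."""
    return " ".join(SPELLING_VARIANTS.get(w, w) for w in term.lower().split())


def merge_spelling_variants(terms):
    """Drop terms that differ only by spelling, keeping first-seen order."""
    seen = set()
    merged = []
    for term in terms:
        canonical = normalize(term)
        if canonical not in seen:
            seen.add(canonical)
            merged.append(canonical)
    return merged
```

Applied to the "typeOf" list above, this keeps "chromatic color", "spectral color", "citrus", "citrus tree", "pigment", and "citrus fruit".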

saganus · on Jan 6, 2015

Very nice work. Interesting what can be done with this.

One question though: does anyone have any idea what the copyright situation for this would be? If you happened to use dictionary X (one for each implemented language, for example, or maybe even as a selectable source), could, say, the Oxford Dictionary sue because you are using their data? AFAIK, facts are not copyrightable, are they? Where would this stand?

senorgusto · on Jan 6, 2015

I wish synonyms were sorted according to usage frequency instead of alphabetically... maybe it's not possible with the data source though?

impostervt · on Jan 6, 2015

Not possible currently. I'd like to find a source of "how common is this word" and add some kind of quantifiable number to each word. Perhaps I can scan existing open source text and just figure it out. On the backlog.
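
The "scan existing text and figure it out" idea can be sketched in a few lines. This assumes a plain-text corpus is already loaded into a string; the function names are hypothetical and the tokenizer is deliberately crude.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Map each word to its share of all word tokens in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def sort_by_frequency(synonyms, frequencies):
    """Most common first; ties and unseen words fall back to alphabetical."""
    return sorted(synonyms, key=lambda w: (-frequencies.get(w, 0.0), w))
```

The resulting numbers depend entirely on the corpus, so a score like this is only comparable between words counted from the same text.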

jobposter1234 · on Jan 6, 2015

Would love to see something like this but that would expand abbreviations. E.g., corp -> corporation (and the reverse).

kevinweaver · on Jan 5, 2015

"An API for for the English language."

For For!

impostervt · on Jan 5, 2015

Omg that's embarrassing. Fixing it now.

hellbanner · on Jan 6, 2015

What is the minimum set of atoms needed to construct the rest of the English language?

nacnud · on Jan 6, 2015

The alphabet?

hellbanner · on Jan 7, 2015

I meant words.

bdoerrfeld · on Jan 11, 2015

Is https://www.wordsapi.com down for maintenance? I'm getting error responses. Thanks!

_bitliner · on Jan 6, 2015

I was wondering what your market is. I mean, who is going to use a service like this? What are its typical use cases?

jfoster · on Jan 6, 2015

Might be useful for search. (not necessarily just web search, but ecommerce as well)

For example, instances of "aqua" should probably match the search query "blue". Google seems like it may already be that advanced, but other search engines perhaps not. Large-scale search engines probably would keep this in their own DB, though.
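
The "aqua" and "blue" example above amounts to query expansion. Here is a small sketch, assuming the related terms have already been fetched (from any word API) into a hypothetical `related_terms` dict; matching is simple whole-word overlap.

```python
def expand_query(query, related_terms):
    """Return the query word plus any related words known for it."""
    key = query.lower()
    terms = {key}
    terms.update(t.lower() for t in related_terms.get(key, []))
    return terms


def search(documents, query, related_terms):
    """Return documents containing the query word or a related word."""
    terms = expand_query(query, related_terms)
    return [doc for doc in documents if terms & set(doc.lower().split())]
```

For instance, with `related_terms = {"blue": ["aqua", "navy"]}`, a search for "blue" would also return a product titled "Aqua cotton shirt".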

impostervt · on Jan 6, 2015

No idea yet. Seems to have gotten a good response from hacker news/product hunt crowd. I mainly built it because I needed it for another project.

bilalel · on Jan 5, 2015

Hi,

Nice project! What about adding example sentences?

impostervt · on Jan 5, 2015

Good idea - I'll add that to my todo list.

dotwebdull · on Jan 7, 2015

How did you build the is_a relationships (eg, Person is_a Animal)?

Ontology engine? If so, what's your source?

gosukiwi · on Jan 6, 2015

Nice! It seems to be quite useful.

nijiko · on Jan 5, 2015

Not receiving verification emails.

impostervt · on Jan 5, 2015

Sorry about that. Using sendgrid, which seems to delay emails from new accounts for a bit.

darkhorn · on Jan 5, 2015

How does it know that finger is part of hand?

deepGem · on Jan 6, 2015

This is wordnet++. Good going.

dspoka · on Jan 6, 2015

Can anyone compare the pros and cons of this to Princeton's WordNet?

jbpadgett · on Jan 6, 2015

Could this be used in some way for robot AI to learn to speak?

pseudometa · on Jan 5, 2015

Looks great!

dested · on Jan 5, 2015

Great service

okonomiyaki3000 · on Jan 6, 2015

What? It has no mode for finding anagrams? Useless.