Show HN: WordsAPI (wordsapi.com)
263 points by impostervt on Jan 5, 2015 | 75 comments



Nice project.

I've created SDKs (REST API wrappers) in Python, PHP, Ruby, Java, C#, Android for the WordsAPI:

http://restunited.com/releases/424223873313015558/wrappers

(Objective C, Scala, ActionScript SDKs still in beta)

Hope these SDKs make it easier for developers to consume the API.


Holy crap, very cool! Very neat service too! I'll link to these from the documentation page later today.


In how many hours did you do that?!


Definitely less than an hour (around 30 minutes if I recall correctly), as the SDKs, documentation, and sample code are automatically generated after entering the endpoint definition. Please give it a try at http://restunited.com.


Oh good to know!


Looking forward to seeing further development!

For a more comprehensive word API, check out the excellent Wordnik API: http://developer.wordnik.com/docs.html#!/word


You can get WordNet through the Wordnik API too, as well as data from the American Heritage Dictionary, The Century Dictionary, Wiktionary, and the GNU version of the Collaborative International Dictionary of English.

And Wordnik is becoming a not-for-profit, if that makes a difference to you. :-)

Disclaimer: I am the founder of Wordnik. (You might like my TED talk: http://www.ted.com/talks/erin_mckean_redefines_the_dictionar...)

That said, I know the guy behind WordsAPI and he's good people. :-)


I absolutely love your TED talk.


aw, thank you!


Slightly disappointed that it doesn't have syllables. Should be easy to add, I have a list here: https://github.com/SomeKittens/Haiku-Generator


Out of curiosity, how did you generate that list? I looked into doing something similar a couple of months back, but I couldn't find a source for syllable counts that was nearly as complete.


I started with the list from this question [0] and then wrote a one-off script to convert it to the form linked above.

[0] http://stackoverflow.com/q/10414957/1216976


Just got done adding syllables and pronunciation.


Thank you. I was looking for an API only yesterday, and didn't find any that is as good as this one. Even the Princeton University one didn't fit into my workflow, because it is overly complex.


That's pretty much why I built this. WordNet is a great resource, but it's geared towards lexicographers rather than developers.


This is awesome! I'd love to hear more about how this was created (node/express, etc.)!


It's a Node/Express app. The word data is in Postgres (hosted by Heroku), and user/metric data is in MongoDB (via mongolab.com). Hosting is on Heroku, fronted by Cloudflare.


What led you to use Mongo for user data/metrics and Postgres for words? Are there specific features of each that you're using? I'm new to both, so just trying to learn which use cases prefer one over the other...


Thank you! Impressive. How do you limit anonymous web users' queries?


Requests made for the demo don't have an access token. On the back end I look for this case, and then check whether the request has "when" and "encrypted" parameters. "when" is just a date/time stamp, and "encrypted" is the same value, encrypted. If I see both params, I decrypt "encrypted" and make sure it matches "when", which validates that the server created it, and then make sure "when" is less than one hour old.

Otherwise, all requests require an access token.
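A rough sketch of the kind of check described above, for anyone curious. It is not the author's code: it signs the timestamp with an HMAC instead of encrypting it (a deliberate swap, since the Python standard library has no symmetric cipher), and `SECRET_KEY`, `issue_demo_params`, and `is_valid_demo_request` are made-up names.

```python
import hashlib
import hmac
import time
from typing import Dict, Optional

# Hypothetical server-side secret; a real deployment would load it from config.
SECRET_KEY = b"replace-me"
MAX_AGE_SECONDS = 3600  # "less than one hour old", per the comment above


def sign(when: str) -> str:
    """Return a hex HMAC-SHA256 tag for the timestamp string."""
    return hmac.new(SECRET_KEY, when.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_demo_params(now: Optional[float] = None) -> Dict[str, str]:
    """Build the "when" and "encrypted" parameters the demo page would send."""
    when = str(int(time.time() if now is None else now))
    return {"when": when, "encrypted": sign(when)}


def is_valid_demo_request(params: Dict[str, str], now: Optional[float] = None) -> bool:
    """Accept a token-less request only if the server issued it within the last hour."""
    when = params.get("when")
    tag = params.get("encrypted")
    if when is None or tag is None:
        return False
    # Constant-time comparison proves the server created this timestamp.
    if not hmac.compare_digest(sign(when).encode("utf-8"), tag.encode("utf-8")):
        return False
    try:
        issued = int(when)
    except ValueError:
        return False
    current = time.time() if now is None else now
    return 0 <= current - issued < MAX_AGE_SECONDS
```

Anything that fails this check would fall through to the normal access-token path.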


What do you use to cache requests?


Redis & cloudflare.


Is this a thin wrapper over http://wordnet.princeton.edu/wordnet/ or does it/will it go beyond that?


> Princeton University makes WordNet available to research and commercial users free of charge provided the terms of our license are followed, and proper reference is made to the project using an appropriate citation. [1]

Interesting! I never realized this.

1. http://wordnet.princeton.edu/wordnet/download/


That's nearly the BSD license. Though if I recall, some of the data files have unique licenses.


For now, that's the source of most data. I'll be adding more as time goes on. First up is pronunciation.


Then I'd suggest you properly attribute them for the data, especially if you're going to charge people for access (in my opinion, IANAL). See their site on the matter:

http://wordnet.princeton.edu/wordnet/citing-wordnet/


You're right, of course. I turned the "about" page on. Was holding off until I could spruce it up a bit, but I guess it's ok for now. Wasn't expecting hackernews to really jump on this.


On their site it says: "Due to funding and staffing issues, we are no longer able to accept comment and suggestions."

What if you gave them a % of your revenue to help out? In return, you could be the recommended API on their site.


Aren't we supposed to cite the use of WordNet?


Hmm, what about word stemming? I looked up "windmills" and got an empty result set.


Same with "walked". I've used the Porter Stemming Algorithm in the past, and it works well.

http://tartarus.org/martin/PorterStemmer/
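To make the idea concrete, here is a minimal sketch of a lookup fallback: try the word as typed, then retry with a few common suffixes stripped. This is plain suffix stripping, not the Porter algorithm, and `SUFFIX_RULES`, `find_headword`, and the `known_words` set are hypothetical stand-ins for whatever the API's word table looks like.

```python
from typing import Optional, Set

# Deliberately tiny rule set: (suffix to strip, text to put back).
SUFFIX_RULES = [("ies", "y"), ("es", ""), ("s", ""), ("ed", ""), ("ing", "")]


def find_headword(word: str, known_words: Set[str]) -> Optional[str]:
    """Return the word itself if known, else the first known suffix-stripped form."""
    word = word.lower()
    if word in known_words:
        return word
    for suffix, replacement in SUFFIX_RULES:
        # Require a couple of characters of stem so "is" does not become "i".
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            candidate = word[: -len(suffix)] + replacement
            if candidate in known_words:
                return candidate
    return None
```

Checking each candidate against the known-word list keeps the rules from inventing non-words, which is the usual complaint about naive stemming.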


The data is stored in postgres, so it should be simple enough to use the Snowball dictionary/stemmer and the tsvector/tsquery functions to sort this out.


What you really want is a lemmatizer (stemming approximates lemmatization). I believe that NLTK has a WordNet lemmatizer, but I don't know much about it.


I'll have to see if I can find a good library for this. The ones I tried (like Node Natural) just didn't give great results.


It is blazing fast for me. I see from another response that this is a Node.js app but perhaps caching might also explain how fast this is. Also, today I have learned that jazz also means "have sexual intercourse with".


Got a great speedup when I added Redis. When a word is first requested, the JSON is put together from the Postgres database tables and then stored in Redis for subsequent requests.

Since there have been so many requests, most common words are in Redis at this point.
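The pattern described here is usually called cache-aside. A minimal sketch, assuming a plain dict as a stand-in for Redis so it runs anywhere; `WordCache` and `build_from_db` are hypothetical names, not anything from the actual app.

```python
import json
from typing import Callable, Dict


class WordCache:
    """Cache-aside lookup: serve from the cache, build from the database on a miss."""

    def __init__(self, build_from_db: Callable[[str], dict]) -> None:
        self._store: Dict[str, str] = {}  # stand-in for Redis (string keys and values)
        self._build_from_db = build_from_db
        self.db_hits = 0  # counts how often the slow path ran

    def get(self, word: str) -> dict:
        key = f"word:{word.lower()}"
        cached = self._store.get(key)
        if cached is not None:
            return json.loads(cached)
        # Miss: assemble the JSON from the database, then keep it for next time.
        self.db_hits += 1
        payload = self._build_from_db(word)
        self._store[key] = json.dumps(payload)
        return payload
```

With a real Redis client the two `_store` operations would become a GET and a SET on the same key.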


I have stumbled upon Redis a lot in really cool projects so I definitely need to take a deeper look into it.


Nice work. Is the order of definitions supposed to be correct? The word "fast" has a first definition of "unrestrained by convention or morality", yet I would expect that to be lower down the list.


The order doesn't convey any meaning.


I would love to see this made available in an editor. When writing, if I could pull synonyms and definitions up instantly with a single keypress, I suspect my prose would be better. I like to think I have a good vocabulary, but this great blog post (which I found through HN) has had me thinking I could always, and should try to, do better: http://jsomers.net/blog/dictionary

Imagine something akin to tab completion for writing prose.


Please don't. Thesaurus prose is some of the worst writing out there. You can always tell when a writer tries to go [too far] out of their natural active vocabulary.

Basic rule of thumb for writing prose: synonyms are a myth. No two words have exactly the same meaning.


Did you read the article I mentioned? I believe that context is important.


Another spin on this is to build an editor for GRE/GMAT exam prep. Huge market.


The results for thesaurus, action (second word to pop into my head), and everything are interesting: Short set, long, long set, empty set.


This is really cool!

One issue I found is that it provides alternative spellings as distinct items. Is there a workaround for this?

{ "typeOf": [ "chromatic color", "chromatic colour", "spectral colour", "spectral color", "citrus", "citrus tree", "pigment", "citrous fruit", "citrus fruit" ] }
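One possible client-side workaround, sketched under the assumption that you maintain your own table of spelling variants (the two-entry `VARIANTS` table and `collapse_spellings` below are made up for illustration):

```python
from typing import Dict, List

# Hypothetical, deliberately tiny variant table; a real one needs a proper spelling list.
VARIANTS: Dict[str, str] = {"colour": "color", "citrous": "citrus"}


def collapse_spellings(items: List[str]) -> List[str]:
    """Normalize each token through VARIANTS and drop duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for item in items:
        normalized = " ".join(VARIANTS.get(token, token) for token in item.split())
        if normalized not in seen:
            seen.add(normalized)
            result.append(normalized)
    return result
```

Run on the "typeOf" list above, this would leave one entry each for "chromatic color", "spectral color", and "citrus fruit".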


Very nice work. Interesting what can be done with this.

One question though: does anyone know what the copyright situation would be? If you use dictionary X (one for each implemented language, for example, or maybe as a selectable source), could, say, the Oxford Dictionary sue because you are using their data? AFAIK facts are not copyrightable, are they? Where would this stand?


I wish synonyms were sorted according to usage frequency instead of alphabetically... maybe it's not possible with the data source though?


Not possible currently. I'd like to find a source of "how common is this word" and add some kind of quantifiable number to each word. Perhaps I can scan existing open source text and just figure it out. On the backlog.
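The "scan existing text and figure it out" idea could look roughly like this. It is only a sketch: the tokenizer is crude, the per-million scaling is just one convenient choice, and `word_frequencies` and `sort_by_frequency` are hypothetical helpers.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List


def word_frequencies(texts: Iterable[str]) -> Dict[str, float]:
    """Count words across the texts and scale the counts to occurrences per million words."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count * 1_000_000 / total for word, count in counts.items()}


def sort_by_frequency(synonyms: List[str], freqs: Dict[str, float]) -> List[str]:
    """Most common first; unseen words last; ties broken alphabetically."""
    return sorted(synonyms, key=lambda w: (-freqs.get(w, 0.0), w))
```

The quality of the ordering depends entirely on how large and representative the scanned text is.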


Would love to see something like this but that would expand abbreviations. E.g., corp -> corporation (and the reverse).


"An API for for the English language."

For For!


Omg that's embarrassing. Fixing it now.


What is the minimum set of atoms needed to construct the rest of the English language?


The alphabet?


I meant words.


Is https://www.wordsapi.com down for maintenance? I'm getting error responses. Thanks!


I was wondering what your market is. I mean, who is going to use a service like this? What are its typical use cases?


Might be useful for search. (not necessarily just web search, but ecommerce as well)

For example, instances of "aqua" should probably match the search query "blue". Google seems like it may already be that advanced, but other search engines perhaps not. Large-scale search engines probably would keep this in their own DB, though.


No idea yet. Seems to have gotten a good response from hacker news/product hunt crowd. I mainly built it because I needed it for another project.


Hi,

Nice project! What about adding example sentences?


Good idea - I'll add that to my todo list.


How did you build the is_a relationships (eg, Person is_a Animal)?

Ontology engine? If so, what's your source?


Nice! It seems to be quite useful.


Not receiving verification emails.


Sorry about that. Using sendgrid, which seems to delay emails from new accounts for a bit.


How does it know that finger is part of hand?


This is wordnet++ Good going.


Can anyone compare the pros and cons of this to Princeton's WordNet?


Could this be used in some way for robot AI to learn to speak?


Looks great!


Great service


What? It has no mode for finding anagrams? Useless.



