
Show HN: I built a news aggregator/knowledge graph - emrgx
http://emergentdata.co
======
purplecones
Hi, this is great! I have some questions.

I'm curious how you decided to model the data in your Neo4j database. How did
you do the 'Suggested Readings' section? What does the Cypher query that
drives it look like?

How do you like using AlchemyAPI? Is it doing all the NLP stuff for you?

~~~
emrgx
Alchemy is doing all the NLP. Concepts and entities (as defined in Alchemy's
documentation) are extracted from each article. I normalize each extracted
term to prevent duplicates (some duplicates still sneak through, so it
requires a little bit of data maintenance). The way this looks is that there
is one node per term, say "Machine Learning." In one article "Machine
Learning" is a concept with negative sentiment and high relevance; in another
article it is an entity with low relevance but positive sentiment. The
relationships house the sentiment and relevance properties:
(machine_learning)-[relevance,sentiment]-(article).
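Sketched in code, the normalization and upsert step could look roughly like
this (a minimal illustration; the function name, labels, and property names
here are illustrative, not the production code):

```python
import re

def normalize_term(term: str) -> str:
    """Collapse case, whitespace, and punctuation variants so that
    "Machine Learning", "machine-learning", etc. map to one node key."""
    term = term.strip().lower()
    term = re.sub(r"[^a-z0-9]+", "_", term)
    return term.strip("_")

# Illustrative Cypher upsert: one node per normalized term, with the
# relevance and sentiment stored on the relationship to the article.
UPSERT_TERM = """
MERGE (t:Term {key: $key})
MERGE (a:Article {url: $url})
MERGE (t)-[r:MENTIONED_IN]->(a)
SET r.relevance = $relevance,
    r.sentiment = $sentiment,
    r.kind = $kind
"""
```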

The suggested readings section pulls the most relevant concept of that
article and finds connected articles that share the same concept at high
relevance. This way suggested articles are more than just keyword hits; it's
all about relevance. I'm still continuing to tweak this query, and there's a
lot more that can be done with it, such as matching sentiment and emotion. As
the dataset grows I'll look to add a feature that pulls a list of articles
based on a cluster of highly associated entities.
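In spirit, the suggested-readings logic works like this (an in-memory sketch
with made-up data and an illustrative threshold, not the actual query):

```python
# Each article maps concepts to relevance scores in [0, 1], as the
# NLP step would extract them. Illustrative data only.
graph = {
    "a1": {"machine_learning": 0.95, "privacy": 0.40},
    "a2": {"machine_learning": 0.88, "hardware": 0.30},
    "a3": {"privacy": 0.91},
    "a4": {"machine_learning": 0.15, "privacy": 0.85},
}

def suggest(article_id, threshold=0.7):
    """Take the article's most relevant concept, then return other
    articles that share that concept at high relevance."""
    concepts = graph[article_id]
    top = max(concepts, key=concepts.get)
    return sorted(
        other for other, cs in graph.items()
        if other != article_id and cs.get(top, 0.0) >= threshold
    )

# A rough Cypher equivalent (schema names are illustrative):
SUGGEST_QUERY = """
MATCH (a:Article {url: $url})<-[r:MENTIONED_IN]-(t:Term)
WITH a, t, r ORDER BY r.relevance DESC LIMIT 1
MATCH (t)-[r2:MENTIONED_IN]->(other:Article)
WHERE other <> a AND r2.relevance >= 0.7
RETURN other
"""
```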

As for Alchemy, I've tried a number of different NLP APIs and, in my opinion,
none of them have come close to matching Alchemy's accuracy. It does make
mistakes, but at a low enough rate that they're easy to correct manually.

~~~
purplecones
Thanks for the background. I'm working on a similar project, but currently
I'm parsing news articles from a collection of specific RSS feeds and calling
Google's NLP API with the text. It sounds like AlchemyAPI may be a better fit
in this case.

How are you finding Neo4j handles the scale of reading and writing all these
stories? I've had a positive experience so far, but I'm only in the few-
thousands range.

~~~
emrgx
I've found Neo4j handles reads and writes seamlessly, but I'm only at around
10,000 nodes and 20,000+ edges. I've heard of Neo4j use cases in the range of
50M+ nodes. My position is that the question is not whether Neo4j can handle
it but whether your code and infrastructure can.

------
emrgx
Hi HN: I built a curated news feed covering advancements in technology and
global issues. I'm using Neo4j and AlchemyAPI, along with some custom code,
to build a knowledge graph in the background. I have a few ideas for
additional features for the dataset but would love to hear some feedback.

