1. You never set background-color. Leaving it unset has unintended consequences in any browser configured with a default other than the expected white background. A good example in your case: visited links become grey on grey (assuming a browser with a grey default background).
2. You need to be cautious of and properly handle notorious webspam/blogspam regurgitation sites like "recode.net":
"Wal-Mart Scammed Into Selling PlayStation 4 for $90" http://recode.net/2014/11/19/wal-mart-scammed-into-selling-p...
Which leads to:
http://www.cnbc.com/id/102197050
Which leads to:
http://consumerist.com/2014/11/18/terrible-people-create-fak...
Which leads to:
http://kotaku.com/people-are-scamming-walmart-with-bogus-che...
Finding the original source can be a PITA, but it's still much better than rewarding spam regurgitation sites with traffic.
3. Having a "sign in" icon link on every single article listing looks odd. Is it a placeholder for some voting mechanism that's only available to logged-in users?
4. Grey text on a white background is common fare on the web (including HN) for de-emphasizing text, but it's a poor design decision in terms of accessibility. People with even minor visual impairments (e.g. poor vision) have difficulty reading grey text on white backgrounds.
All in all, it looks interesting. Good luck with it!
1. Got it. We'll set a background color on the body.
2. I agree. We will think about ways to handle it. As a last resort we may blacklist those kinds of publishers.
3. Sign in is only needed to favorite stories (using the little bookmark icon below each line). I agree that it's not clear what the sign-in is for; we will fix that.
On #2, using binary block/allow lists for sites works, but long term it can be problematic in terms of maintenance effort (i.e. your time ;). A less common but more interesting approach is site-based weighting. There will still be some manual wrangling involved, but you might be able to automate some of it eventually (analysis of the site's content, DNS, linked-from analysis, Google PageRank lookup, etc.)
With site-based weighting, you can still block with a threshold, but you gain the ability to adjust rankings based on past/known site quality, plus the advantage of a "default weight" for unknown sites you've never seen before.
It's really just a different approach that you might want to consider to
handle the webspam/blogspam problem.
We already have mechanisms like weighting, whitelisting, blacklisting, etc. in place. We generally trust a good quality source (site domain) to produce good quality content, but there are exceptions. Since our popularity scoring algorithm relies heavily on social signals, it trusts the crowd to reward good content, and ultimately good content will rank higher.
>notorious webspam/blogspam regurgitation sites like "recode.net"
Actually recode.net does a lot of original reporting in addition to regurgitation. Disclosure: some of my former CNET/CBS colleagues are reporters there.
The problem (and I've seen this when building http://recent.io/ as well) is that every news organization does this to some extent. If organization X has a scoop, Y and Z will "follow" it by summarizing and rewriting it. This has been going on for over a century; the AP, founded in the 1840s, does it very well.
In the most egregious cases like the one above it's just a theoretically fair-use excerpt from the original story followed by a "read more" link. But sometimes it serves your users to link to the followup coverage instead. That's when the original article (FT, WSJ, Economist, etc.) may be behind a paywall, or when the followups have more context or additional details.
Even Google News only mostly gets this right: it tries to group stories on the same topic, but sometimes it has multiple buckets for the same topic, and sometimes an unrelated story gets thrown into the wrong bucket.
I'm curious why you chose the categories that you did. Why focus on business/tech instead of having a broader focus on more mainstream current event news?
We decided to start with just a few topics that we are relatively familiar with, to get a sense of the quality of the curation. We plan to expand to broader, more mainstream topics in the future.
The quality of the curation, at least for my taste and at least at this moment, is exceptional. Maybe the best algorithmic curation I've seen.
I'm curious what inspired you. There are many algorithmically curated news feeds; why add another? Is it because you believe your algorithm is a leap forward? Elsewhere you say it's mostly the same one used by HN.
I'm not sure I agree. I browsed the tech news and got the same story 3 times from different web sites. The diversity of web sites is great, but recognizing that you are repeating the same story 3 or 4 times is important as well.
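A first pass at catching those repeats could be simple title overlap. This is not what the site actually does (its dedup logic isn't described anywhere in the thread); it's just a sketch, and the 0.5 similarity threshold is an arbitrary assumption.

```python
# Illustrative near-duplicate story grouping by title word overlap.
# A real system would also use URLs, timestamps, and body text.

def title_tokens(title: str) -> set[str]:
    """Lowercase, whitespace-split bag of words for a headline."""
    return set(title.lower().split())

def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: |intersection| / |union|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def group_stories(titles: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Greedily assign each title to the first group it resembles."""
    groups: list[list[str]] = []
    for title in titles:
        for group in groups:
            if jaccard(title_tokens(title), title_tokens(group[0])) >= threshold:
                group.append(title)
                break
        else:
            groups.append([title])
    return groups
```

Even this crude version would collapse the "same story from 3 different sites" case when the headlines share most of their words, though it misses rewrites with entirely different wording.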
> The quality of the curation, at least for my taste and at least at this moment, is exceptional. Maybe the best algorithmic curation I've seen.
Glad you liked it, thanks!
> There are many algorithmically curated news feeds; why add another? Is it because you believe your algorithm is a leap forward? Elsewhere you say it's mostly the same one used by HN.
There are many attempts at "personalized news". Also some niche "community curated" news (influenced by HN and Reddit). And some are mostly algorithmic, like the new Digg and Techmeme. We fit in this third group, and are just experimenting to see what we can come up with quality-wise.
Thank you for making this. It's exactly what I always wanted; I started building it myself a few times but only made half-assed efforts. So thank you. Exactly what I need. Perfect.
> Where and how do you get your news? How do you order them?
We get the news via data feeds like RSS and social networks. The ranking algorithm is similar to the HN one (http://amix.dk/blog/post/19574), but instead of upvotes it uses our popularity score system.
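The linked post describes the classic HN gravity formula, so the ranking described here presumably looks something like the sketch below. The gravity value 1.8 is the commonly cited HN default; this site's actual parameters and popularity scoring are not public, so treat everything here as an assumption.

```python
# Sketch of HN-style gravity ranking (per http://amix.dk/blog/post/19574),
# with the upvote count replaced by a generic popularity score.
# The gravity constant 1.8 is the commonly cited HN default, assumed here.

def rank_score(popularity: float, age_hours: float, gravity: float = 1.8) -> float:
    """Newer, more popular stories score higher; scores decay with age."""
    return popularity / ((age_hours + 2) ** gravity)
```

With `popularity` set to upvotes minus one, this reduces to the textbook HN formula; swapping in a social-signal score keeps the same freshness/popularity trade-off without needing on-site voting.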
> And why only 30 in each category?
In the future we may add mechanisms to let users browse more links, like a 'Show more' button below the links list, or an Archive section.
> Other than that, nice design. Bookmarked, will see. Btw, I guess comments would be nice.
When you add more topics, I would pay for API access to get an hourly list of popular stories in each topic. If this is ever added please let me know :-)
We're thinking about doing at least daily digests – with the top x stories of the last 24 hours – for each topic, via email and/or RSS. Thanks for your suggestion!
It could be "popularity" or "trending factor" instead. Basically the same way stories are ranked on HN (http://amix.dk/blog/post/19574), but instead of upvotes it uses our popularity score system.
The stories are curated and ranked by an algorithm. We give it a list of publishers, and it does most of the work; however, sometimes a little human moderation is still needed for cleaner, on-topic results. The newest and most popular stories appear at the top, like on Hacker News (we use a similar ranking algorithm). To measure the popularity of the stories, the algorithm uses social signals.
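One plausible shape for such a social-signal popularity score is a weighted sum of log-damped counts. To be clear, the signal names and weights below are invented for illustration; the thread never specifies the actual model.

```python
# Purely illustrative popularity score from social signals.
# Signal names and weights are assumptions, not the site's real model.
import math

SIGNAL_WEIGHTS = {"shares": 3.0, "comments": 2.0, "likes": 1.0}

def popularity_score(signals: dict[str, int]) -> float:
    """Weighted sum of log-damped signal counts.

    log1p damping keeps one viral metric from dominating the score.
    """
    return sum(w * math.log1p(signals.get(name, 0))
               for name, w in SIGNAL_WEIGHTS.items())
```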
I think I'd echo what jcr said: it needs an additional human layer, to nail down the original source of articles. Else it's just another random aggregator site with a random set of sources, however pretty.