This is a very early proof of concept and any suggestions on how to make it better are welcome!
Just remove the unnecessary semi-colon at the end of the code ;-)
We initially tried training the classifier only with GitHub-based samples, using the user-given tags from there. Although we grouped the tag base into a reasonable number of distinct categories, the way GitHub users tag their projects turned out to be too inconsistent and often unrelated to the titles, so manual tagging was seen as a better option for getting decent results fast enough.
If you have any more specific questions, feel free to drop me a mail at firstname.lastname@example.org
Getting to the front page via 'Show HN' was very helpful to us. It'd be nice (for others too) to be able to both replicate that success, and soften the blow when you get a grand total of 2 upvotes.
I think it really boils down to making sure that people coming to your site find something new and creative all the time - to help turn lurkers or one-time visitors into repeat visitors. I think PH does that quite well with their podcasts, daily digests, Twitter updates (they feel forced, but they do work), etc. Also, you're building a community site, so if the traffic dies in a month, keep at it; it looks like sites like these take many years to gain that traction. Basically, I think you have something really great going here; just make sure to focus on bringing the visitors back and you will definitely have a winner!
I know it was just a joke, but it really doesn't apply to this case.
I'm now feeling the need to properly contextualize hyperbolic jokes, so that people reading don't take them literally.
I wondered if it might work to have the list for today, then perhaps some other recent options in a slimmer format either beside or below?
Nice work though!
A feature I've been missing on Hacker News, that perhaps you'd be willing to add, is a community-written tl;dr for each link.
I.e., apart from the title and link, a short (200 chars or so) description and tl;dr.
The vector of an article can be obtained by summing the vectors of its words (minus stop words). For a topic you just sum up 5-10 of the topic keywords. You don't need to exhaustively list all the topic keywords because word2vec automatically maps them in close vicinity.
This system has the advantage that you don't need a training dataset. It's unsupervised learning coupled with a small amount of supervised topic pointers.
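To make the idea concrete, here is a rough sketch of how it could look. The tiny 3-dimensional `EMBEDDINGS` table, the stop word list, and the two topics are all made up for illustration; in practice the vectors would come from a trained word2vec model, and cosine similarity is just one reasonable way to compare the summed vectors.

```python
import math

# Toy 3-d vectors standing in for real word2vec embeddings.
# These values are invented for illustration, not taken from any model.
EMBEDDINGS = {
    "python": (0.9, 0.1, 0.0),
    "compiler": (0.8, 0.2, 0.1),
    "library": (0.7, 0.3, 0.0),
    "startup": (0.1, 0.9, 0.1),
    "funding": (0.0, 0.8, 0.2),
    "investor": (0.1, 0.9, 0.0),
}
DIM = 3
STOP_WORDS = {"a", "the", "for", "of", "and", "in", "new"}

# Each topic is described by a handful of keywords (hypothetical examples).
TOPICS = {
    "programming": ["python", "compiler", "library"],
    "business": ["startup", "funding", "investor"],
}


def text_vector(words):
    """Sum the vectors of all known, non-stop words."""
    total = [0.0] * DIM
    for word in words:
        word = word.lower()
        if word in STOP_WORDS or word not in EMBEDDINGS:
            continue
        for i, x in enumerate(EMBEDDINGS[word]):
            total[i] += x
    return total


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(title):
    """Return the topic whose keyword vector is closest to the title vector."""
    vec = text_vector(title.split())
    scores = {name: cosine(vec, text_vector(kws)) for name, kws in TOPICS.items()}
    return max(scores, key=scores.get)


print(classify("A new Python library for the compiler"))
print(classify("Funding for the startup"))
```

With real embeddings, words that are not among the topic keywords would still pull the title vector toward the right topic, which is why a short keyword list can be enough.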
I have a question, does it also index submissions which never make it to the main show page?
Also, shameless plug: I am hosting an event inspired by Show HN in Hyderabad, India (showhyd.com)
I basically used Reddit's BigQuery data for the dataset (it's huge!). My algorithm and code are here.
As HackerHunt says, /shownew is just a place where awesome new stuff is hidden.
Two bug reports:
1. Do you have a way to receive bug reports other than HN? :)
2. After doing a search, the left category menu disappears, and stays disappeared even after clicking the HH "home" link at top left. This is true for FF and Chromium latest-ish on LinuxMint.
There are two possible bugs here:
a) whether you actually want the left menu to disappear, and b) what your intent is for clicking the top left "HH".
- Go to HH.
- Search for something. Results appear as you type, nice. No indication from browser that a new page is loading; guessing no load by design. But left menu disappears.
- Manually erase search bar. Menu back.
- Type out a search again, menu disappears.
- Click "HH" at top left. Browser indicates a page is loading, but the search is not erased and (therefore?) the menu is still missing.
- Re-enter HH either by typing the URL into the browser location and clicking "make it so", or by clicking in from another site (like HN). Search field is empty, therefore the left menu is available.
EDIT: This was going to be a separate bug, but I think it's related to
Scrollbar behavior is buggy.
- Clear site cookies. ("It's the only way to be sure.")
- Don't click anything, just move the mouse around and scroll, with
mousewheel or dragging scrollbar. Scrollbar intact, entire page
- Click in search field, don't type anything.
Scrollbar disappears, mousewheel scrolling has no effect,
regardless of where the mouse hovers.
Entire page jumps slightly to right, appearing to "chase" the
- Type something in search field that gets results. Scrollbar returns, top of scrollbar is even with bottom of search field, page does not jump back; I'm guessing this is "your" scrollbar rather than the browser's scrollbar. Mousewheel only has effect if mouse is hovered below the search field, in the region where the scrollbar exists.
- Click on any non-active area outside the search field. Search field jumps left very slightly. Scrollbar is back to full length (browser's scrollbar?), but there are now two separate scrolling areas:
  - Hover mouse at or above search field level. The entire original front page, including the missing menu and the default "Today" list of sites, scrolls up into the area from viewport top to bottom level of search field (which also scrolls up and away with the rest of the page). Search results do not scroll.
  - Hover mouse below the search field, mousewheel scrolls the search results, phantom page at top of viewport does not scroll.
  - Drag the scrollbar, the "top" scroll area scrolls.
I find these kinds of post-factum concerns funny as well (same with banning Mein Kampf etc. today) -- it was 1933-1945 when the German people should have been really concerned about Hitler, not 2017. The new threats of today don't need Mein Kampf or the freedom to use "HH" initials -- they can make their own stuff even if they care for the Nazis, and usually they have their own 2017 names and agendas to use anyway. So all it does is reduce some historical guilt.
If anything, you should be grateful to see other people use "HH" for things completely and utterly unrelated to extremists.