PDFtribute website with links scraped from Twitter (pdftribute.net)
85 points by c16 on Jan 13, 2013 | 8 comments

Tweets are so short to begin with, why on earth are you truncating them? It's unreadable nonsense.

Nice job here. Any way to direct link the pdfs or get people to submit their URL to Google for indexing?


Thanks. The Twitter API gives t.co links, so the next step will be to crawl the sites/documents to get a little more data from them, as well as purge the blogs which got past the filter. From there we can use 'dofollow' links to let search engines find the documents.
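The two steps described here (pulling the t.co links out of tweet text, then following the redirect to the real document) could be sketched roughly like this. A minimal Python sketch; the function names and the regex are my own assumptions, not the site's actual code:

```python
import re
import urllib.request

# t.co wrapper links as they appear in raw tweet text
# (hypothetical pattern, matching Twitter's shortener format)
TCO_PATTERN = re.compile(r"https?://t\.co/\w+")

def extract_tco_links(tweet_text):
    """Pull the shortened t.co URLs out of a raw tweet."""
    return TCO_PATTERN.findall(tweet_text)

def resolve_link(short_url):
    """Follow the t.co redirect to the underlying site/document.
    (Needs network access; urlopen follows redirects by default,
    and geturl() reports the final URL after redirection.)"""
    with urllib.request.urlopen(short_url) as resp:
        return resp.geturl()
```

The resolved URLs could then be fetched to check whether they point at an actual PDF (e.g. by Content-Type) before indexing.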

Not very helpful when there is very little indication of what each link is or whether it's a duplicate, and the tweets are truncated. How is this more useful than searching Twitter?

We've been hacking away for the past few hours. I've not yet had time to look into the links, etc., so data is still scarce.

As for 'How is this more useful than searching Twitter?', I think it's nice to have everything in one place and listed. We're collecting data which we can manipulate at a later date when we have more time. As I see it, gathering the data first and then analysing it is far better than coding everything up front, losing lots of data, and only being able to analyse a small portion of it.

Nice idea. Question: is there a way to "Storify" these archives? If not, you could build a UI that helps users search through the data. Another idea would be to collect information on the linked PDFs.

We're working on the search implementation. It will be available soon.

Perhaps filtering on both #pdftribute and the word 'paper'/'papers' would give better results.
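That hashtag-plus-keyword filter could be sketched like this. A minimal Python sketch; the function name and the word-boundary heuristic are my own assumptions:

```python
import re

# Matches "paper" or "papers" as a whole word, case-insensitively.
# (A plain substring check would also hit words like "newspaper".)
PAPER_WORD = re.compile(r"\bpapers?\b", re.IGNORECASE)

def looks_like_paper_tweet(tweet_text):
    """Keep tweets that carry the hashtag AND mention a paper."""
    return "#pdftribute" in tweet_text.lower() and bool(PAPER_WORD.search(tweet_text))
```

This would cut down on the blog posts and retweet noise that slip through a hashtag-only filter, at the cost of missing tweets that link a PDF without using the word.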
