Xafnor's comments

I may be biased, but the recent trend in online 'OSINT' or 'investigation-like' tools seems to have shifted toward mass scraping. I recently released a tool that lets you search by keywords and operators (OR/NOT/AND/NEAR) through 8.3 billion comments, 2 billion videos (descriptions, transcripts, etc.) and 1 billion users (descriptions, usernames, etc.).

In addition, it lets you define filters, for example by POI country, date, or video category.
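
To give a rough idea of what keyword operators plus filters mean in practice, here is a minimal sketch in Python. This is not the tool's actual code: the `Comment` class, its `country` field, and the helper names `has`, `near` and `search` are all made up for illustration, and a real system at this scale would use a proper full-text index rather than a linear scan.

```python
import re
from dataclasses import dataclass


@dataclass
class Comment:
    text: str
    country: str  # hypothetical metadata field used as a filter


def tokens(text):
    """Lowercase word tokens of a text."""
    return re.findall(r"\w+", text.lower())


def has(text, word):
    """True if `word` appears as a whole token in `text`."""
    return word.lower() in tokens(text)


def near(text, a, b, window=5):
    """True if tokens `a` and `b` occur within `window` positions of each other."""
    toks = tokens(text)
    pos_a = [i for i, t in enumerate(toks) if t == a.lower()]
    pos_b = [i for i, t in enumerate(toks) if t == b.lower()]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


def search(comments, predicate, country=None):
    """Keep comments whose text satisfies `predicate` and match the optional filter."""
    return [
        c for c in comments
        if predicate(c.text) and (country is None or c.country == country)
    ]


comments = [
    Comment("I love this cat video so much", "FR"),
    Comment("cat and dog together", "US"),
    Comment("dog only", "FR"),
]

# "cat AND NOT dog", filtered to country FR
hits = search(comments, lambda t: has(t, "cat") and not has(t, "dog"), country="FR")
```

AND, OR and NOT map directly onto Python's boolean operators over `has(...)`, while NEAR needs token positions, which is why it gets its own helper.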

No data is inferred: the video location, transcript and category are the ones TikTok returns. The bot is still scraping data; this is only one month's worth of scraping.

I thought it might be interesting to talk about the fact that big scraping projects can be done by anyone. I have always found it fascinating that a single person nowadays can simply decide to scrape everything and build huge libraries like this.

It's a paid tool, so this is a promotion. I know that kills the mood for some, but it helps me make even more projects like this (I did one for YouTube with 45B comments and 2B users). Servers get expensive :)


I've archived public comments posted on YouTube: not all of them, but about 20 billion so far, from 1.4 billion different users (I archived user information too).

I've made a tool to search through this database: you input a user and it outputs the comments they've written.

I've stopped the crawling due to server costs, though, so this database may not be expanded further. I reached this amount in about 40 days.

This is a paid tool, and while I do of course pay myself, I also use a good chunk of the money to make new tools and improve current ones. I've been doing that for the past 3 years now, and the set of tools on my website grew that way.

If anyone has doubts, I can run test queries on users you give me :)

