Ask HN: Best tool for technical web crawler data (entire internet)
13 points by ghawkescs 11 months ago
I am looking for a tool or search engine for technical web crawler results. I want to know which sites are using certain products and/or plugins. I have seen nerdydata.com, but I am looking for other options.

We are using publicwww for ad scraping (finding which ad network a website uses). It works well!

Our product is very simple:

a) Scrape ads b) Display ads c) Sell subscriptions to customers.

Advertisers pay to spy on competitors' ads. Surprisingly, it's quite easy to build if you know how to scrape and implement search on top of it.
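
For anyone wondering what "scrape, then search on top" might look like, here is a minimal sketch, not our actual code. It assumes the pages are already downloaded, and the signature table is only an illustrative sample of script hosts for three networks; `AD_SIGNATURES`, `detect_ad_networks`, and `build_index` are made-up names.

```python
from collections import defaultdict

# Illustrative sample only: host substrings that show up in each network's
# script tags. A real table would be much longer and need upkeep.
AD_SIGNATURES = {
    "Google AdSense": "pagead2.googlesyndication.com",
    "Google Ad Manager": "securepubads.g.doubleclick.net",
    "Amazon Ads": "amazon-adsystem.com",
}


def detect_ad_networks(html):
    """Return the sorted names of networks whose signature appears in the HTML."""
    lowered = html.lower()
    return sorted(name for name, sig in AD_SIGNATURES.items() if sig in lowered)


def build_index(pages):
    """Map each ad network to the sorted list of domains that use it.

    `pages` is a dict of domain -> raw HTML that was scraped earlier.
    """
    index = defaultdict(list)
    for domain in sorted(pages):
        for network in detect_ad_networks(pages[domain]):
            index[network].append(domain)
    return dict(index)
```

A lookup like `build_index(pages).get("Amazon Ads", [])` then answers "which sites use this network". Plain substring matching is crude, but it is cheap and easy to extend.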

Scraping costs us $10 per customer, and we sell the subscription for $159 per month.
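
To put those two numbers side by side, here is a quick back-of-the-envelope calculation. It assumes the $10 scraping cost is also per month, which the figures above do not state, and it ignores every other cost.

```python
# Figures from the comment above; treating both as monthly is an assumption.
cost_per_customer = 10    # USD, scraping cost
price_per_customer = 159  # USD, subscription price

gross_profit = price_per_customer - cost_per_customer
gross_margin = gross_profit / price_per_customer

print(f"gross profit per customer: ${gross_profit}")
print(f"gross margin: {gross_margin:.1%}")
```

Under that assumption the scraping cost eats roughly 6% of the subscription price.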

Besides nerdydata, there’s also the similar publicwww.com for custom searches.

Then there are the usual suspects with predefined technology definitions (builtwith, similartech, datanyze).

Last but not least, I’m also building such a product and will be launching next month.

I would love to discuss your needs and see if we can accommodate them - you can reach out to me at mg@locatetech.io

I have tried publicwww, but over 80% of the results were either duplicate domains, dead links, or didn't contain my search term.

I'm glad I only wasted $49 and didn't opt for the annual license.
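
If you end up with a noisy result list like that, a small post-filter can remove the three problems named above. This is only a sketch: `clean_results` is a made-up helper, and `fetch` is a function you supply yourself that returns the page body as a string, or None when the link is dead.

```python
from urllib.parse import urlsplit


def clean_results(urls, term, fetch):
    """Drop duplicate domains, dead links, and pages missing the search term.

    `fetch` is a caller-supplied function (hypothetical here) that returns
    the page body as a string, or None if the URL is dead.
    """
    seen = set()
    kept = []
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if not host or host in seen:
            continue  # duplicate or unparseable domain
        seen.add(host)
        body = fetch(url)
        if body is None:
            continue  # dead link
        if term.lower() not in body.lower():
            continue  # page does not contain the search term
        kept.append(url)
    return kept
```

Only the first URL seen for each domain is checked, so a domain whose first hit is dead gets dropped even if a later URL would have matched. That keeps the request count at one per domain.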

https://builtwith.com/ is in the same space.

I'm getting security warnings for the site on all browsers. I'm guessing something is wrong right now?

They don't advertise it well, but nerdydata's custom reports had some excellent results... way more than their regular search website shows.

They were also able to customize our search and extract specific data from the page. Pretty advanced stuff.

That sounds promising. Can they run a custom report across their entire index? I'm trying to get a count of competing products to gauge market size and interest.

They gave us the option of either running a report on their entire index or searching an explicit list of domains, if we had one (we didn't).

Our use case was different, though: we wanted to find websites using a JavaScript library, and then run a JavaScript function on all of those sites to extract account information and specific HTML patterns.

But yes, their larger custom reports index did the trick, would recommend.
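
For a rough idea of that find-then-extract workflow, here is a sketch that is not what nerdydata ran for us. It swaps static HTML parsing in for actually executing JavaScript on the page, and the library file name and account-ID pattern (`examplelib.js`, `ExampleLib.init(...)`) are made up for illustration.

```python
import re
from html.parser import HTMLParser


class ScriptSrcCollector(HTMLParser):
    """Collect the src attribute of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


# Hypothetical library file name and account-ID pattern, for illustration only.
LIBRARY_FILE = "examplelib.js"
ACCOUNT_ID = re.compile(r"ExampleLib\.init\(\s*['\"]([A-Za-z0-9-]+)['\"]")


def extract_account(html):
    """Return the account ID if the page loads the library, else None."""
    parser = ScriptSrcCollector()
    parser.feed(html)
    if not any(LIBRARY_FILE in src for src in parser.sources):
        return None
    match = ACCOUNT_ID.search(html)
    return match.group(1) if match else None
```

Static parsing misses anything injected at runtime, so a site that loads the library dynamically would need a headless browser instead.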

Great to know, thank you. Was the pricing reasonable?

