
I'm always amazed that there isn't more competition in this space.

For my Master's thesis [0], I built a crawler that did similar fingerprinting (although less generic). It wasn't anything breathtakingly novel, but all in all it was a reasonably successful project.

It detected more than 100 CMSs, plus additional features like ad networks, social embeds, CDNs, industry, company size, etc. In the end, you could run a search and get the result as an Excel sheet (because apparently that's what people like).
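
For anyone wondering what this kind of fingerprinting looks like, here is a minimal sketch in Python. It is not the thesis code. The `SIGNATURES` table, its patterns, and the `fingerprint` function are illustrative assumptions, and a real crawler would also look at HTTP headers, cookies, and script URLs rather than only the raw HTML.

```python
import re

# Hypothetical signature table (an assumption for illustration):
# technology name -> regex patterns matched against the raw HTML.
SIGNATURES = {
    "WordPress": [r"/wp-content/", r'<meta name="generator" content="WordPress'],
    "Drupal": [r"/sites/default/files/", r'<meta name="generator" content="Drupal'],
    "Google Analytics": [r"google-analytics\.com/(ga|analytics)\.js"],
}


def fingerprint(html):
    """Return the sorted names of all technologies whose patterns match the HTML."""
    found = set()
    for name, patterns in SIGNATURES.items():
        # One matching pattern is enough to flag a technology.
        if any(re.search(p, html, re.IGNORECASE) for p in patterns):
            found.add(name)
    return sorted(found)
```

Calling `fingerprint(page_html)` on a fetched page returns a list of names that can go straight into one row of a spreadsheet export.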

The whole thing took about 6 months and ended up covering more than 100 million domains, on a single (mediocre) machine humming away at around 100 domains/s. The sales/marketing folks loved it.

Since I was just finishing university, my skills were still pretty raw, so I'd assume that an experienced engineer could do this a lot faster. From what I can tell, there was a lot of demand out there, and sites like BuiltWith sold their somewhat limited reports (at least at the time) for a good amount of money.

[0] http://blog.marc-seeger.de/2010/12/09/my-thesis-building-blo...

Previous discussion: https://news.ycombinator.com/item?id=2022192

That was 2010. In 2017 this space is flooded. We all know how to write web crawlers now and this data is sold by hundreds of companies.

In France alone we have 3 or 4 main players in this space. I can't even imagine how many US-based companies are doing this.

Ha, maybe the market has indeed adjusted without me hearing about it :)

There's plenty of competition in this space—there's just no reason for them to advertise it.


