
Interesting; I had never considered scraping sites looking for specific embeds as a way of sourcing potential leads.

Scraping sites looking for complementary or competitive products' customers sounds like a novel way to do market research.

BuiltWith[1] and Wappalyzer[2] offer this as a service.

For software with client-side exposure that can be discovered during scraping, BuiltWith has pretty solid coverage (at least when I used it ~1 year ago).

I don't know if those services do it, but you can also find some really useful intelligence in DNS records, from email and calendar providers to third-party services like analytics trackers and landing page providers (such as Unbounce). If you have an app that integrates or competes with those services, it can be really useful. If you use a DNS lookup service that provides historical record changes, you can even time your outreach to coincide with their annual renewal period, when they're most likely to be entertaining the idea of a switch. Or, in the case of an integration, wait until after the renewal period to start outreach, since you know they're locked in for at least another year.
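To make the DNS idea concrete, here's a rough Python sketch of matching already-fetched records against provider signatures. The signature table and the function name `detect_services` are my own illustrative assumptions, not anything BuiltWith or Wappalyzer is known to use, and a real list would need to be much longer and kept up to date. It does no lookups itself; you'd feed it records from whatever resolver or historical DNS service you use.

```python
# Illustrative substring signatures per record type.
# This mapping is an assumption for the sketch, not a vetted list.
SIGNATURES = {
    "MX": {
        "aspmx.l.google.com": "Google Workspace (email)",
        "mail.protection.outlook.com": "Microsoft 365 (email)",
    },
    "TXT": {
        "include:_spf.google.com": "Google Workspace (SPF)",
        "include:spf.protection.outlook.com": "Microsoft 365 (SPF)",
        "include:sendgrid.net": "SendGrid",
    },
    "CNAME": {
        "unbouncepages.com": "Unbounce",
    },
}


def detect_services(records):
    """Return the services hinted at by (record_type, value) pairs.

    `records` is any iterable of tuples that were already fetched
    from DNS; record types without signatures are ignored.
    """
    found = set()
    for rtype, value in records:
        value = value.lower().rstrip(".")  # normalise case and trailing dot
        for needle, service in SIGNATURES.get(rtype.upper(), {}).items():
            if needle in value:
                found.add(service)
    return sorted(found)


records = [
    ("MX", "1 ASPMX.L.GOOGLE.COM."),
    ("TXT", "v=spf1 include:_spf.google.com include:sendgrid.net ~all"),
    ("CNAME", "unbouncepages.com."),
    ("A", "93.184.216.34"),
]
print(detect_services(records))
```

Run the same check against historical snapshots and the date a signature first appears gives you a rough start for the renewal clock.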

You can also use DNS records to piece together entity ownership relationships. Say a company has competing product lines and doesn't overtly market them as owned by the same company. If they happen to use Salesforce Communities, the CNAME for the Salesforce community subdomain will be specific to each site but will contain the same Salesforce account ID[3]. Now you know that they're operating under the same entity, which is useful intelligence in itself, and you can also combine the technology usage you sniffed from both sites.
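A minimal sketch of that grouping step, in Python. I'm assuming the CNAME target looks like `<custom-domain>.<18-character org ID>.live.siteforce.com`, with the ID starting with `00D`; check the Salesforce article in [3] before relying on that shape. The hostnames and IDs below are made up, and `group_by_org` is just a name I picked.

```python
import re
from collections import defaultdict

# Assumed CNAME target shape (verify against Salesforce's docs):
#   <custom-domain>.<18-char org id starting with 00D>.live.siteforce.com
ORG_ID_RE = re.compile(
    r"\.(00d[a-z0-9]{15})\.live\.siteforce\.com\.?$", re.IGNORECASE
)


def group_by_org(cnames):
    """Group hostnames by the org ID embedded in their CNAME target.

    `cnames` maps hostname -> CNAME target (already resolved).
    Only org IDs shared by more than one hostname are returned,
    since those are the ones that suggest common ownership.
    """
    groups = defaultdict(list)
    for host, target in cnames.items():
        match = ORG_ID_RE.search(target)
        if match:
            groups[match.group(1).lower()].append(host)
    return {org: sorted(hosts) for org, hosts in groups.items() if len(hosts) > 1}


# Made-up example data.
cnames = {
    "help.brand-a.example": "help.brand-a.example.00d300000000abceaq.live.siteforce.com.",
    "support.brand-b.example": "support.brand-b.example.00D300000000ABCEAQ.live.siteforce.com",
    "community.other.example": "community.other.example.00d500000000xyzeaa.live.siteforce.com",
    "www.plain.example": "plain.example.cdn.example.net",
}
print(group_by_org(cnames))
```

Lowercasing the ID before grouping matters because resolvers don't return DNS names in a consistent case.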

[1] https://builtwith.com/
[2] https://wappalyzer.com/
[3] https://help.salesforce.com/articleView?id=000205653&type=1

Are these sorts of scrapers able to grab any data from the post-payment side of things in any way? I imagine there are a lot of interesting tags for ad tech, remarketing, etc. firing after checkout.

There's no reason these scrapers couldn't be coded to identify fields, insert plausible but synthetic data that'd validate, and submit forms. At least in the case of lead gen forms where payment details aren't required. It's a bit skeezy, but that's never really deterred the industry before.

Back when I used BuiltWith (1-2 years ago), they didn't appear to do that. But then, it's not really necessary since their use case is just binary identification of users. With the advent of universal tags that fire on every page (and you configure in the backend which page or funnel is considered a "conversion"), you can identify a lot of the ad tech in use without any form submission. Plus a lot of conversion and remarketing tags aren't hardcoded on the post-submission page, but wrapped in JavaScript functions. With minification and bundling, you can get a high success rate just parsing through the JavaScript files included on any page.
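For what it's worth, the "just parse the JS" approach can be as crude as a handful of regexes run over the fetched script source. Here's a hedged Python sketch; the pattern list and the name `detect_tags` are my own assumptions, and I have no idea what BuiltWith actually does internally. Vendor hostnames and global function names tend to survive minification, which is why this works at all.

```python
import re

# Hypothetical signature list for illustration; real tools keep far larger ones.
TAG_PATTERNS = {
    "Google Tag Manager": re.compile(r"googletagmanager\.com/gtm\.js"),
    "Google Analytics": re.compile(
        r"google-analytics\.com/(analytics|ga)\.js|googletagmanager\.com/gtag/js"
    ),
    "Meta Pixel": re.compile(r"connect\.facebook\.net/[^/]+/fbevents\.js|\bfbq\("),
    "LinkedIn Insight": re.compile(r"snap\.licdn\.com/li\.lms-analytics"),
}


def detect_tags(js_source):
    """Return the names of tags whose signature appears in the JS text."""
    return sorted(
        name for name, pattern in TAG_PATTERNS.items() if pattern.search(js_source)
    )


# A made-up minified bundle fragment.
bundle = (
    '!function(f,b){f.fbq=function(){}}(window,document);fbq("init","123");'
    'var s="https://www.googletagmanager.com/gtm.js?id=GTM-XXXX";'
)
print(detect_tags(bundle))
```

This only tells you a tag is present somewhere in the bundle, not which page or event fires it, so it fits the binary-identification use case and nothing more.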

Where automated form submission would come in really handy is a competitive intelligence tool that scrapes an entire site (and its subdomains), identifies which actions on which pages trigger which tags, and stitches together entire marketing funnels. Being able to monitor a competitor's likely marketing funnels (and seeing which ones they keep over time and which change) would be incredibly valuable, and would necessitate knowing precisely which tags fired on every page, including post-submission pages.

Interesting, thanks for sharing--these are great insights.

You're right that universal tags and the event naming that you could parse from the JS could be very valuable, although it would be hard to normalize.

And you're totally right about stitching together marketing funnels for lead gen conversions, but I'm not sure how you would get that from an ecommerce setup short of making a purchase and refunding it (which might be impossible to do in many cases).

Part of me has wondered if there are any partnerships BuiltWith or others have with popular browser addons. I imagine if they snooped this somehow from users, it would be valuable to them (setting aside whether it is ok to get this level of data).

I played around with someone's open source version of BuiltWith (I forget what it was called) a while back and it was pretty cool to see how it works. I'm not a developer (although I'm learning to code), but I've done similar research manually as part of my job, so it's really interesting to me to see what else can be learned.

For example, if there's a publicly traded ad tech company and you know a substantial customer of theirs just removed their tag, or many customers did in a certain time frame, you could short their stock (or vice versa if you see huge growth).
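Mechanically, spotting that kind of churn is just a set comparison between two crawls. A small Python sketch, assuming you already have per-site tag detections for two dates; the function name `tag_churn`, the site names, and the vendor name are all made up.

```python
def tag_churn(before, after, tag):
    """Compare two crawls and report which sites dropped or adopted `tag`.

    `before` and `after` map site -> set of detected tags at two crawl
    dates. Only sites present in both crawls are compared, so a site
    that simply failed to load is not counted as a removal.
    """
    sites = before.keys() & after.keys()
    removed = sorted(s for s in sites if tag in before[s] and tag not in after[s])
    added = sorted(s for s in sites if tag not in before[s] and tag in after[s])
    return {"removed": removed, "added": added, "net": len(added) - len(removed)}


# Made-up crawl data for a hypothetical vendor tag "AdVendorX".
q1 = {
    "shop-a.example": {"AdVendorX", "Google Analytics"},
    "shop-b.example": {"AdVendorX"},
    "shop-c.example": {"Google Analytics"},
}
q2 = {
    "shop-a.example": {"Google Analytics"},
    "shop-b.example": {"AdVendorX"},
    "shop-c.example": {"AdVendorX", "Google Analytics"},
    "shop-d.example": {"AdVendorX"},
}
print(tag_churn(q1, q2, "AdVendorX"))
```

A missing tag in one crawl can just mean a failed page load or a tag moved behind a consent banner, so you'd want several consecutive snapshots before reading anything into it.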

https://publicwww.com/ is better for searching the web for JS or CSS.

I always wondered if someone could do this for households. Provide a way for people to find everyone with a lawn needing mowing, or with a picket fence that would need regular painting, or with a tile roof, or a hedge to trim, or a driveway needing repair.

When I was 13 I simply went around all of the houses in the area with overgrown gardens and popped a note through their door offering grass cutting.

Not a wealthy area at all, conversion rate well over 50%

Nice! "Do things that don't scale" applied by a teenager!

Datanyze (https://www.datanyze.com/) has built a nice business doing this. You can learn a lot from understanding what software a company is using on their website. It's especially useful for generating sales leads.

True, but Datanyze comes out to be crazy expensive.

Some time ago Rob Walling, on Startups for the Rest of Us, mentioned that Datanyze is closer to real time than BuiltWith. So real time, in fact, that you can get a list of sites that have just (yesterday?) started a trial of your competitor's product, and approach them during the buying cycle.

Ghostery does this as a service: https://sitescan.ghostery.com/

Pretty much everyone in a relevant space is already doing it.
