1. If something changes at one of your competitors' products (it got acquired, a beloved feature changed, etc.), go through the threads on public forums discussing it. You'll find a lot of current users of that product (i.e., customers to poach). A recent example: Atlassian acquiring Trello.
1b. Also look through the comment sections of news sites that reported the change; you'll find tons of current users there too.
2. Go on user-submitted product review sites. There are lots of them for different types of products: Chrome extension review pages, Capterra, G2 Crowd, etc. Approach the users who left reviews expressing dissatisfaction with the competitor's product, or users who use a complementary product.
3. If a competitor product hosts customer sites on its servers (Shopify, etc.), you can reverse-lookup its IPs to find all of them.
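The reverse-lookup idea in step 3 can be sketched in Python. This is a minimal, hypothetical helper (the candidate domain list would come from a zone file or crawl, which isn't shown): resolve each domain and group the ones that share a hosting IP. The resolver is injectable so the grouping logic can be exercised without network access.

```python
import socket
from collections import defaultdict

def group_by_ip(domains, resolve=socket.gethostbyname):
    """Group domains by resolved IP address; sites sharing a competitor's
    hosting IPs cluster together. Domains that fail to resolve are skipped."""
    by_ip = defaultdict(list)
    for domain in domains:
        try:
            ip = resolve(domain)
        except OSError:  # socket.gaierror is a subclass of OSError
            continue
        by_ip[ip].append(domain)
    return dict(by_ip)
```

In practice you'd seed this with the competitor's known customer sites to learn their hosting IP ranges first, then scan candidate domains against those IPs.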
Then it's a matter of introducing your product with a semi-personalized cold email, reassuring the prospect that you have the competitor's most vital features and that you also do XYZ better. Here's a template: http://www.artofemails.com/cold-emails#competitor
Though obviously be careful to filter those. There are some customers who are never happy with just one moon-on-a-stick, whom you would rather leave for your competitors to waste time on!
On the plus side, it was a great leading indicator of vulnerabilities in various bits of code: we would see searches trying to match a particular package and version spike, and shortly thereafter a story would break about a data breach and personal data being stolen.
I was always a bit conflicted by it. I developed a number of tools that could identify this traffic and automatically ban it on our search engine, which was the right thing to do, but you could probably sell that information to these organizations. So as a startup it was leaving money "on the table", as it were. I expect the way to extract money from that stream would be a site that accepted Bitcoin and returned the URLs of pages matching a particular software package pattern.
It's also a great tool for salespeople who are trying to sell WordPress themes, for example (or for identifying who is using your non-free theme without paying).
For my Master's thesis, I built a crawler that did similar fingerprinting (although less generic). It wasn't breathtakingly novel, but all in all a somewhat successful project.
It detected more than 100 CMSes, plus additional features like ad networks, social embeds, CDNs, industry, company size, etc. In the end, you could run a search and get the results as an Excel sheet (because apparently that's what people like).
The whole thing took about 6 months and ended up with > 100 million domains on a single (mediocre) machine humming away at around 100 domains/s. The sales/marketing folks loved it.
Since I was just finishing university, my skills were still pretty raw, so I'd assume that an experienced engineer would be able to do this a lot faster.
From what I can tell, there was a lot of demand out there, and sites like BuiltWith sold their somewhat limited reports (at least at the time) for a good amount of money.
Previous discussion: https://news.ycombinator.com/item?id=2022192
Scraping sites looking for complementary or competitive products' customers sounds like a novel way to do market research.
For software with client-side exposure that can be discovered during scraping, BuiltWith has pretty solid coverage (at least when I used it ~1 year ago).
I don't know if those services do it, but you can also find some really useful intelligence in DNS records: email and calendar providers, third-party services like analytics trackers, landing page providers (such as Unbounce), and so on. If you have an app that integrates or competes with those services, it can be really useful. If you use a DNS lookup service that provides historical record changes, you can even time your outreach to coincide with their annual renewal period, when they're most likely to be entertaining the idea of a switch. Or, in the case of an integration, wait until after the renewal period to start outreach, since you know they're locked in for at least another year.
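The inference step here is mostly string matching against known record values. A rough sketch (the record values would be fetched separately, e.g. with `dig` or a DNS library; the three patterns below are illustrative examples I believe to be current, but verify before relying on them):

```python
# Map DNS record values to known service providers. Patterns are
# illustrative: Google Workspace and Microsoft 365 MX hosts, and the
# CNAME target Unbounce uses for hosted landing pages (an assumption
# worth checking against real records).
PROVIDER_PATTERNS = {
    "google-workspace": "aspmx.l.google.com",
    "microsoft-365": "mail.protection.outlook.com",
    "unbounce": "unbouncepages.com",
}

def infer_providers(records):
    """records: iterable of (record_type, value) pairs, e.g.
    ('MX', 'aspmx.l.google.com.'). Returns the set of matched providers."""
    found = set()
    for _rtype, value in records:
        for provider, needle in PROVIDER_PATTERNS.items():
            if needle in value.lower():
                found.add(provider)
    return found
```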
You can also use DNS records to link together entity ownership. Say a company has competing product lines and doesn't overtly market them as owned by the same company. If they happen to use Salesforce Communities, the CNAME for each Salesforce community subdomain will be specific to that site but will contain the same Salesforce account id. Now you know they're operating under the same entity, which is itself useful intelligence, and you can also combine the technology usage you sniffed from both sites.
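A minimal sketch of that linkage, assuming the account id shows up in the CNAME target. Salesforce org ids do start with "00D", but the exact CNAME format varies, so treat the regex as a placeholder to adjust against the records you actually see:

```python
import re
from collections import defaultdict

def group_by_org_id(cnames, pattern=re.compile(r"00d[a-z0-9]{12,15}", re.I)):
    """cnames: dict of domain -> CNAME target hostname. Domains whose
    targets embed the same Salesforce org id are grouped under one owner."""
    owners = defaultdict(list)
    for domain, target in cnames.items():
        m = pattern.search(target)
        if m:
            owners[m.group(0).lower()].append(domain)
    return dict(owners)
```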
Where automated form submission would come in really handy is a competitive intelligence tool that scrapes an entire site (and its subdomains), identifies which actions on which pages trigger which tags, and stitches together entire marketing funnels. Being able to monitor a competitor's likely marketing funnels (and seeing which ones they keep over time and which change) would be incredibly valuable, and that requires knowing precisely which tags fire on every page, including post-submission pages.
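The tag-identification piece can be approximated with regexes over raw page HTML. This is only a first pass, since tag managers inject tags dynamically and you'd need a headless browser for the full picture; the id formats below are the common Google ones, used illustratively:

```python
import re

# Patterns for a few common tracker ids (illustrative, not exhaustive):
# classic Universal Analytics, GA4/gtag measurement ids, and Tag Manager.
TAG_PATTERNS = {
    "universal-analytics": re.compile(r"\bUA-\d{4,10}-\d{1,4}\b"),
    "ga4": re.compile(r"\bG-[A-Z0-9]{8,12}\b"),
    "gtm": re.compile(r"\bGTM-[A-Z0-9]{4,8}\b"),
}

def extract_tags(html):
    """Return the tracker ids found in raw HTML, keyed by tag type."""
    found = {}
    for name, pat in TAG_PATTERNS.items():
        ids = sorted(set(pat.findall(html)))
        if ids:
            found[name] = ids
    return found
```

Run per page across a crawl and diff the results over time to see which funnel pages a competitor keeps and which they change.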
You're right that universal tags and the event naming that you could parse from the JS could be very valuable, although it would be hard to normalize.
And you're totally right about stitching together marketing funnels for lead gen conversions, but I'm not sure how you would get that from an ecommerce setup short of making a purchase and refunding it (which might be impossible to do in many cases).
Part of me has wondered if there are any partnerships BuiltWith or others have with popular browser addons. I imagine if they snooped this somehow from users, it would be valuable to them (setting aside whether it is ok to get this level of data).
I played around with someone's open source version of BuiltWith (I forget what it was called) a while back, and it was pretty cool to see how it works. I'm not a developer (although I'm learning to code), but I've done similar research manually as part of my job, so it's really interesting to see what else can be learned.
For example, if there's a publicly traded ad tech company and you know a substantial customer of theirs just removed their tag, or many customers did in a certain time frame, you could short their stock (or vice versa if you see huge growth).
Not a wealthy area at all, conversion rate well over 50%
I kinda get what you're going for here, but I think there's probably a better way to describe it, "best list of results for a query" sounds a lot like a standard search engine to me.
Here is an example search that can extract the IDs from Google Analytics code on websites into a downloadable list https://nerdydata.com/search?query=UA-%5Cd%2B-%5Cd%2B&regex=...
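For reference, the query parameter in that URL is just a URL-encoded regex (`%5C` is `\` and `%2B` is `+`), which the stdlib can decode:

```python
from urllib.parse import unquote

# Decode the URL-encoded search query back into a readable regex.
decoded = unquote("UA-%5Cd%2B-%5Cd%2B")
print(decoded)  # UA-\d+-\d+
```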
When you do get to their main domain, it has these weird links (sitemap a, b, c...) to another domain, which appear in the footer of an extraordinary number of other domains (SEO?)
Kind of like using the private keys found in GitHub
Glad to see all these other tools in the market now; there was, and is, clearly a need for them.
I chose a more rudimentary route than this, which was to convert the huge lists of hostnames to their IP addresses.
Everything in our world back then used shared hosting, so a relatively small list of IP addresses covered the mapping.
We then proceeded to scrape the sites for emails and phone numbers, which was easy because our competitors had standard templates for their clients.
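That scraping step is mostly regex work once you have the page text. A rough sketch (the phone pattern assumes North American formats, and real-world email extraction needs more edge-case handling than this):

```python
import re

# Loose patterns for contact details in scraped page text.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\(?\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}")

def extract_contacts(text):
    """Pull deduplicated email addresses and phone numbers out of page text."""
    return {
        "emails": sorted(set(EMAIL_RE.findall(text))),
        "phones": sorted(set(PHONE_RE.findall(text))),
    }
```

Because the competitors used standard templates, the contact details tended to sit in the same page regions, so even patterns this loose worked well.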
Looks like they added regular expression searches and a few new data sets. sweeeeeet.