
How We Found All of Optimizely's Clients - rexbee
http://nerdydatablog.com/2016/12/04/how-we-found-all-of-optimizleys-clients/
======
Sapph
Always liked this stealthy prospect finding method. Here's another few (best
for B2B):

1\. If something changes at one of your competitors' products (it got acquired,
a beloved feature changed, etc.), go through threads on public forums
discussing it. You'll find a lot of current users (i.e. customers to poach) of
that product. A recent example: Atlassian acquiring Trello.

1b. Also, look through the comment sections of news sites that reported this
change. You'll also find tons of current users there.

2\. Go on user-submitted product review sites. There are lots of them out there
for different types of products: Chrome extension review pages, Capterra, G2
Crowd, etc. Approach the users who left reviews expressing dissatisfaction with
the competitor's product, or users of a complementary product.

3\. If a competitor product hosts customer sites on their own servers (Shopify,
etc.), you can do a reverse IP lookup to find all of them.

Then it's a matter of introducing your product with a semi-personalized cold
email, reassuring the prospect that you have the competitor product's most
vital features and that you also do XYZ better than them. Here's a template:
[http://www.artofemails.com/cold-
emails#competitor](http://www.artofemails.com/cold-emails#competitor)

~~~
martin-adams
Also, if you want to find leads on Twitter, find out who follows your
competitors' support Twitter accounts.

------
ChuckMcM
Interesting. The biggest source of annoying search traffic at Blekko was
people trying to find exploitable shopping carts, WordPress blogs, and forums.
It was annoying because one could be 100% certain these folks were never going
to click on ads, so the traffic cost money to serve and generated no revenue.

On the plus side it was a great leading indicator of a vulnerability in
various bits of code because we would see searches that tried to match a
particular package and version increase and then shortly thereafter a story
would break about some data breach and personal data being stolen.

I was always a bit conflicted by it. I developed a number of tools which could
identify this traffic and automatically ban it from our search engine, which
was the right thing to do, but you could probably sell that information to
these organizations. So as a startup it was leaving money "on the table", as it
were. I expect the way to extract money out of that stream would be to have a
site that accepted bitcoin and returned the URLs of pages that matched a
particular software package pattern.

It's also a great tool for salespeople who are trying to sell WordPress themes,
for example (or for identifying who is using your non-free theme without
paying).

~~~
rexbee
Finding unlicensed WordPress themes is a great use case! You can identify them
since the theme name appears in the stylesheet URL by default.

[https://nerdydata.com/search?regex=true&query=wp-
content%2Ft...](https://nerdydata.com/search?regex=true&query=wp-
content%2Fthemes%2F%5Cw%2B%2Fstyle.css)
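For anyone who wants to apply this to their own crawl data, the same pattern as the NerdyData regex search above works locally. A minimal sketch in Python (the `[\w-]+` capture is a slight widening of the original `\w+`, since theme slugs often contain hyphens):

```python
import re

# Matches the default WordPress theme stylesheet path and captures the
# theme name, mirroring the regex in the NerdyData search above.
THEME_RE = re.compile(r"wp-content/themes/([\w-]+)/style\.css")

def find_themes(html: str) -> list[str]:
    """Return the theme names referenced in a page's HTML."""
    return THEME_RE.findall(html)

html = '<link rel="stylesheet" href="https://example.com/wp-content/themes/avada/style.css">'
print(find_themes(html))  # → ['avada']
```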

------
rb2k_
I'm always amazed that there isn't more competition in this space.

As my Master's thesis [0], I built a crawler that did similar fingerprinting
(although less generic). It wasn't something breathtakingly novel, but all in
all a somewhat successful project.

It detected 100+ CMSes, plus additional features like ad networks, social
embeds, CDNs, industry, company size, etc. In the end, you could run a search
and get the results as an Excel sheet (because apparently that's what people
like).
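For readers curious what this kind of fingerprinting amounts to at its simplest, here is a toy sketch of signature-based detection in Python. The signature strings are illustrative assumptions, not the thesis crawler's actual rules; a real tool would also look at headers, cookies, meta tags, and script URLs:

```python
# Map telltale substrings in a page's HTML to the technology they
# suggest. These markers are common but purely illustrative.
SIGNATURES = {
    "/wp-content/": "WordPress",
    "/sites/default/files/": "Drupal",
    "cdn.shopify.com": "Shopify",
    "optimizely.com/js": "Optimizely",
    "www.google-analytics.com/analytics.js": "Google Analytics",
}

def fingerprint(html: str) -> set[str]:
    """Return the set of technologies whose markers appear in the HTML."""
    return {tech for marker, tech in SIGNATURES.items() if marker in html}

page = '<script src="//cdn.optimizely.com/js/12345.js"></script><link href="/wp-content/themes/x/style.css">'
print(sorted(fingerprint(page)))  # → ['Optimizely', 'WordPress']
```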

The whole thing took about 6 months and ended up with > 100 million domains on
a single (mediocre) machine humming away at around 100 domains/s. The
sales/marketing folks loved it.

Since I was just finishing university, my skills were still pretty raw, so I'd
assume that an experienced engineer would be able to do this a lot faster.
From what I can tell, there was a lot of demand out there and sites like
builtwith sold their somewhat limited reports (at least at the time) for a
good amount of money.

[0] [http://blog.marc-seeger.de/2010/12/09/my-thesis-building-
blo...](http://blog.marc-seeger.de/2010/12/09/my-thesis-building-blocks-of-a-
scalable-webcrawler/) Previous discussion:
[https://news.ycombinator.com/item?id=2022192](https://news.ycombinator.com/item?id=2022192)

~~~
3pt14159
That was 2010. In 2017 this space is flooded. We all know how to write web
crawlers now and this data is sold by hundreds of companies.

------
cblock811
This is just recycled content from them....

[https://news.ycombinator.com/item?id=6363979](https://news.ycombinator.com/item?id=6363979)

------
yeldarb
Interesting; I had never considered scraping sites looking for specific embeds
as a way of sourcing potential leads.

Scraping sites looking for complementary or competitive products' customers
sounds like a novel way to do market research.

~~~
cosmie
BuiltWith[1] and Wappalyzer[2] offer this as a service.

For software with client-side exposure that can be discovered during scraping,
BuiltWith has pretty solid coverage (at least when I used it ~1 year ago).

I don't know if those services do it, but you can also find some really useful
intelligence from DNS records. From email and calendar provider data to third
party services like analytics trackers and landing page service providers
(such as Unbounce). If you have an app that integrates or competes with those
services, it can be really useful. If you use a DNS lookup service that
provides historical record changes, you can even time your outreach to
coincide with their annual renewal period when they're most likely to be
entertaining the idea of a switch. Or in the case of an integration, wait
until _after_ the renewal period to start outreach since you know they're
locked in for another year at least.

You can also use DNS records to link together entity ownership relationships.
Say a company has competing product lines and doesn't overtly market them as
owned by the same company. If they happen to use Salesforce Communities, the
CNAME for the Salesforce community subdomain will be specific to each site but
will contain the same Salesforce account id[3]. Now you know that they're
operating under the same entity, which is itself useful intelligence, and you
can also combine the technology usage you sniffed from both sites.
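As a sketch of that linking trick, here is some Python that groups domains by the org id embedded in their CNAME targets. The `<domain>.<18-char org id>.live.siteforce.com` format and the lookup values below are assumptions based on the Salesforce article referenced at [3]; verify against real records before relying on it:

```python
import re
from collections import defaultdict

# Assumed CNAME target format: <domain>.<18-char org id>.live.siteforce.com
CNAME_RE = re.compile(r"\.([0-9A-Za-z]{18})\.live\.siteforce\.com\.?$")

def group_by_org(cnames: dict[str, str]) -> dict[str, list[str]]:
    """Group domains by the Salesforce org id in their CNAME target."""
    orgs = defaultdict(list)
    for domain, target in cnames.items():
        m = CNAME_RE.search(target)
        if m:
            orgs[m.group(1)].append(domain)
    return dict(orgs)

# Hypothetical lookups: both brands resolve to the same org id.
lookups = {
    "community.brand-a.com": "community.brand-a.com.00D4100000xyzABCDE.live.siteforce.com",
    "portal.brand-b.com": "portal.brand-b.com.00D4100000xyzABCDE.live.siteforce.com",
}
print(group_by_org(lookups))
```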

[1][https://builtwith.com/](https://builtwith.com/)
[2][https://wappalyzer.com/](https://wappalyzer.com/)
[3][https://help.salesforce.com/articleView?id=000205653&type=1](https://help.salesforce.com/articleView?id=000205653&type=1)

~~~
shostack
Are these sorts of scrapers able to grab any data from the post payment side
of things in any way? I imagine there are a lot of interesting tags for ad
tech, remarketing, etc. firing after checkout.

~~~
cosmie
There's no reason these scrapers couldn't be coded to identify fields, insert
plausible but synthetic data that'd validate, and submit forms. At least in
the case of lead gen forms where payment details aren't required. It's a bit
skeezy, but that's never really deterred the industry before.

Back when I used BuiltWith (1-2 years ago), they didn't appear to do that. But
then, it's not really necessary, since their use case is just binary
identification of users. With the advent of universal tags that fire on every
page (where you configure in the backend which page or funnel counts as a
"conversion"), you can identify a lot of the ad tech in use without any form
submission. Plus, a lot of conversion and remarketing tags aren't hardcoded on
the post-submission page but wrapped in JavaScript functions. Even with
minification and bundling, you can get a high success rate just parsing
through the JavaScript files included on any page.
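A toy version of that JavaScript parsing in Python. The patterns are a few well-known public tag signatures (assumed, and far from exhaustive); a real tool would carry hundreds of vendor signatures:

```python
import re

# Illustrative regexes for a few well-known tags.
TAG_PATTERNS = {
    "Google Analytics": re.compile(r"\bUA-\d+-\d+\b"),
    "Google gtag.js": re.compile(r"\bgtag\("),
    "Facebook Pixel": re.compile(r"\bfbq\("),
    "Google Tag Manager": re.compile(r"\bGTM-[A-Z0-9]+\b"),
}

def detect_tags(js_source: str) -> list[str]:
    """Return the names of tags whose signatures appear in the JS."""
    return [name for name, pat in TAG_PATTERNS.items() if pat.search(js_source)]

bundle = "fbq('init','123');gtag('config','UA-12345-1');"
print(detect_tags(bundle))
```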

Where automated form submission would come in really handy is in a competitive
intelligence tool that scrapes an entire site (and its subdomains), identifies
which actions on which pages trigger which tags, and stitches together entire
marketing funnels. Being able to monitor a competitor's likely marketing
funnels (and see which ones they keep over time and which change) would be
incredibly valuable, and would necessitate knowing precisely which tags fired
on every page, including post-submission pages.

~~~
shostack
Interesting, thanks for sharing--these are great insights.

You're right that universal tags and the event naming that you could parse
from the JS could be very valuable, although it would be hard to normalize.

And you're totally right about stitching together marketing funnels for lead
gen conversions, but I'm not sure how you would get that from an ecommerce
setup short of making a purchase and refunding it (which might be impossible
to do in many cases).

Part of me has wondered whether BuiltWith or others have partnerships with
popular browser addons. I imagine if they somehow snooped this from users, it
would be valuable to them (setting aside whether it's OK to collect that level
of data).

I played around with someone's open source version of BuiltWith (forgetting
what it was called) a while back and it was pretty cool to see how it works.
I'm not a developer (although learning to code), but I've done similar
research manually as part of my job, so this is really interesting to me to
see what else can be learned.

For example, if there's a publicly traded ad tech company and you know a
substantial customer of theirs just removed their tag, or many customers did
in a certain time frame, you could short their stock (or vice versa if you see
huge growth).

------
dantiberian
> Our search engine is different from search engines you’ve used before.
> Traditional search engines are geared towards providing answers, whereas our
> goal is to give you the best list of results for a query.

I kinda get what you're going for here, but I think there's probably a better
way to describe it; "best list of results for a query" sounds a lot like a
standard search engine to me.

------
aresant
There's an existing service, builtwith.com, that does a nice job of
productizing this type of research. E.g.:

[https://trends.builtwith.com/analytics/Optimizely](https://trends.builtwith.com/analytics/Optimizely)

~~~
rexbee
It looks like BuiltWith only supports a set of predefined technologies.
NerdyData can search for any string in the HTML or JS files of millions of
websites, and supports regular expression searches as well.

Here is an example search that extracts the IDs from Google Analytics code on
websites into a downloadable list:
[https://nerdydata.com/search?query=UA-%5Cd%2B-%5Cd%2B&regex=...](https://nerdydata.com/search?query=UA-%5Cd%2B-%5Cd%2B&regex=true)
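If you have your own page sources, the same `UA-\d+-\d+` pattern from that search can be applied locally. A minimal Python sketch:

```python
import re

# The UA-<account>-<property> pattern used in the NerdyData search above.
GA_ID_RE = re.compile(r"\bUA-\d+-\d+\b")

def extract_ga_ids(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each URL to the unique GA IDs found in its source."""
    return {url: sorted(set(GA_ID_RE.findall(src))) for url, src in pages.items()}

pages = {
    "https://example.com": "ga('create', 'UA-12345-1', 'auto');",
    "https://example.org": "ga('create', 'UA-99999-2', 'auto');",
}
print(extract_ga_ids(pages))
```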

------
seriocomic
Slightly OT: anyone else find it hard to navigate to their main site from the
blog (other than the single link to the search)? Strange to have its own
domain without core nav links back to the main site.

When you do get to their main domain, it has these weird links ("sitemap a b
c...") to another domain, which also appear in the footer of an extraordinary
number of other domains (SEO?).

------
dorianm
I thought they were gonna use those client IDs and get more info about them
using the Optimizely API or some hack.

Kind of like using the private keys found on GitHub.

------
krmmalik
I want to know how to do this. There are a number of apps used by companies
that fit my demographic, and I want to know how to find them. Do you provide
this as a service? I couldn't find a link to a homepage or anything.

~~~
dandelany
See [https://nerdydata.com/search](https://nerdydata.com/search)

------
timsayshey
Cool, but they need better coverage of the internet. It looks like they are
only scraping some sites; I couldn't find any sites in my niche. Top Alexa
sites only? ¯\\_(ツ)_/¯

~~~
rexbee
We crawl 200+ million websites :) Click the "Deep Web" button under the search
box on the homepage.

~~~
timsayshey
Thanks for the tip, but I tried the "Deep Web" option and still can't find any
of the sites. They are small sites that get about 100-500 visitors per month.
Also, there are a lot more than 200 million websites on the web.

~~~
true_religion
What is your niche?

------
garazy
On an interesting side note, Optimizely was a very early customer of ours
(BuiltWith), starting in 2010, which helped them find customers for their own
tool based on sites using their competitors' products (of which, at the time,
there weren't very many). I don't think it will bother them that other
businesses can do the same thing.

Glad to see all these other tools in the market now; there was, and is,
clearly a need for them.

------
RabbitmqGuy
I wonder if you could use nmap or something like
masscan ([https://github.com/robertdavidgraham/masscan](https://github.com/robertdavidgraham/masscan))
to figure out the IP addresses of hosts running certain software (say, MongoDB
on port 27017), then reverse-look-up those IPs to figure out which companies
they belong to, and contact said companies to sell something.
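A sketch of the step after the scan: run something like `masscan -p27017 <range> -oL results.txt`, then parse the list output into IPs, which you could feed into reverse lookups (e.g. `socket.gethostbyaddr`). The line format shown is an assumption about masscan's `-oL` output; check your version's actual format:

```python
# Parse masscan's simple list output (-oL). Lines are assumed to look
# roughly like:
#   open tcp 27017 203.0.113.9 1480000000
# Comment lines start with '#'.
def open_hosts(list_output: str, port: int) -> list[str]:
    """Return the IPs reported open on the given port."""
    ips = []
    for line in list_output.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[0] == "open" and parts[2] == str(port):
            ips.append(parts[3])
    return ips

sample = """#masscan
open tcp 27017 203.0.113.9 1480000000
open tcp 27017 198.51.100.4 1480000001
# end
"""
print(open_hosts(sample, 27017))  # → ['203.0.113.9', '198.51.100.4']
```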

------
ErikAugust
Cool stuff. Back in the day, I worked for a website hosting company and was
tasked with finding several of our major competitor's customers.

I chose a more rudimentary route than this - which was to convert the huge
lists of hostnames to their IP addresses.

Everything in our world then used shared hosting, so the hostnames mapped to a
relatively small list of IP addresses.

We would then scrape the sites for emails and phone numbers, which was easy
because our competitors had standard templates for their clients.
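That hostname-to-IP clustering can be sketched in a few lines of Python. Resolution is stubbed here with made-up addresses from the `192.0.2.0/24` documentation range; in practice you'd fill the dict with `socket.gethostbyname` results:

```python
from collections import defaultdict

# Invert a hostname -> IP mapping to reveal shared-hosting clusters:
# many customer sites of the same host collapse onto a few IPs.
def group_by_ip(resolved: dict[str, str]) -> dict[str, list[str]]:
    clusters = defaultdict(list)
    for host, ip in resolved.items():
        clusters[ip].append(host)
    return dict(clusters)

resolved = {
    "shop-a.example": "192.0.2.10",
    "shop-b.example": "192.0.2.10",
    "shop-c.example": "192.0.2.11",
}
print(group_by_ip(resolved))
```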

------
inian
This can be achieved just by using the publicly available HTTP Archive dataset
and Google's BigQuery to search it.

------
jasim
How do you do market research for SaaS products that don't leave an artifact
on the client's website? Consider project management: how can we find people
who use Trello or Basecamp?

------
carlmungz
Nice work. I can imagine tools like this being hella useful for identifying
potential new employers, or users of an open-source library you're developing.

------
rch
Handy for finding employees too:

[https://nerdydata.com/search?query=humans.txt](https://nerdydata.com/search?query=humans.txt)

------
shefaliprateek
Love NerdyData, always wanted to build something similar myself. But would any
sane business person enter the space now?

------
optgotgot
Love NerdyData.com! I used their service a while back to find Optimizely
clients for my sales team to go after.

Looks like they've added regular expression searches and a few new data sets.
Sweeeeeet.

