
Why scraping and ecommerce are a perfect fit - sradu
http://blog.twotap.com/post/102415169870/why-scraping-and-ecommerce-are-a-perfect-fit
======
firloop
Interesting. So, it seems like you aren't respecting robots.txt. I picked Old
Navy, as it was on your supported stores page [0], and went to their
robots.txt [1]

    User-agent: *
    Disallow: /buy/
    Disallow: /checkout/

So, do you have permission to violate robots.txt, since I'm sure there is some
automated interaction with the checkout/purchasing pages? Or am I missing
something about how TwoTap works? Scraping is one thing, but accessing pages
when the management of the website prohibits it seems like a big no-no.

[0]: [https://twotap.com/supported-stores/](https://twotap.com/supported-stores/)

[1]: [http://oldnavy.gap.com/robots.txt](http://oldnavy.gap.com/robots.txt)
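For anyone who wants to reproduce the check, Python's stdlib robots.txt parser gives the same answer when fed the rules quoted above (the example paths are illustrative, not actual Old Navy URLs):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Feed it the rules quoted above instead of fetching the live file.
rp.parse([
    "User-agent: *",
    "Disallow: /buy/",
    "Disallow: /checkout/",
])

# Any generic bot is barred from the checkout flow...
print(rp.can_fetch("*", "http://oldnavy.gap.com/checkout/cart.do"))  # False
# ...but product pages are fair game.
print(rp.can_fetch("*", "http://oldnavy.gap.com/products/shirt"))    # True
```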

~~~
razvanr
Old Navy and a few other retailers are not active yet. We've pre-built these
integrations despite not yet having requests to sell their inventory.

I'd mention more on the BD side but can't at this point for competitive
reasons. The fact that we currently support sending orders through to 450
retailers does not mean we have deals in place with all of them, but that the
infrastructure is built to allow this to happen -- if affiliates or publishers
get an approval from retailers or the affiliate networks that govern this.
Perhaps we should make this clearer on the supported stores page.

All in due time. The industry as a whole is being pushed to decide which
models they will embrace -- and as always some will be slower to adapt than
others. The pressure comes from lost revenue on mobile which makes retailers a
LOT more flexible now compared to even 6 months ago when talking about this.

With screens of multiple formats and devices fragmenting retailers'
distribution channels over the next few years, this is set to become an even
bigger chapter down the line.

------
tommccabe
Looks like I'm at one of the retailers you crawl. Recently, our site was
getting hit by a web crawler that was following links incorrectly. I
blacklisted several IP addresses from accessing the site, and now I wonder if
it was this!

Does your crawler obey robots.txt rules?

~~~
sradu
Tom, that most likely wasn't us.

We don't spider retailer websites. That means we don't follow links or go
hardcore on building a database of products.

We hit your website:

* if someone has asked us information about a product url

* when we place an order

* weekly for regression tests

Ping us at contact@ and we're more than happy to jump on a call and describe
exactly what we're doing. Most of the time we're completely unnoticeable,
except for the fact that you're getting more orders.

We know for sure nobody is spidering through us.

~~~
tommccabe
Thanks for the confirmation. How do you get the product URL in the first
place?

~~~
razvanr
That's the app developer's responsibility.

~~~
Jake232
Is there any caching that goes on here? I would presume not, as the vendors
could change prices/details at any point.

------
josephjrobison
I'm confused about the legality of scraping. Is it completely open, or are
there some restrictions on scraping any site without explicit permission?

~~~
malandrew
AFAIK, the main legal issue is a trespass to chattels tort. The data collected
is generally uncopyrightable if not reproduced in its entirety without
modification. The relevant case is Feist v. Rural [0].

IANAL, but I think the best bet for staying technically legal is to use
jurisdictional arbitrage and tit-for-tat to liberate the data. If someone
scrapes a US server while in the US and generates enough load to deprive
the owner of use, then they are technically liable for damages under trespass
to chattels. If they instead trade scraping labor with people in other
jurisdictions, then that other entity would be liable. There might be some
other legal defense/attack usable by the entity whose data is being liberated,
but I reckon it would be tenuous at best.

[0]
[http://en.wikipedia.org/wiki/Feist_v._Rural](http://en.wikipedia.org/wiki/Feist_v._Rural)

~~~
incunix
Doubt they would sue if they're making more money through the
scraping.

------
monksy
I don't understand why you're pro-scraping. (I did write a blog post on
this, and I believe that I posted it to HN before:
[http://theexceptioncatcher.com/blog/2012/07/how-to-get-rid-of-screen-scrapers-from-your-website/](http://theexceptioncatcher.com/blog/2012/07/how-to-get-rid-of-screen-scrapers-from-your-website/))

But, wouldn't it be more beneficial to get websites to open up an API to you,
communicate to them to do so, or even offer consulting services to build an
API?

I know that there are a few cart/store offerings out there. It seems to me
that they would have an API.

Magento:
[http://www.magentocommerce.com/api/soap/checkout/checkout.ht...](http://www.magentocommerce.com/api/soap/checkout/checkout.html)

OpenCart Proprietary API: [http://opencart-api.com/](http://opencart-api.com/)

Prestashop API:
[http://doc.prestashop.com/display/PS14/Using+the+REST+webser...](http://doc.prestashop.com/display/PS14/Using+the+REST+webservice)

~~~
razvanr
Good question. That's because this method doesn't scale and fails as a
solution to the industry's challenges.

There are companies trying to get retailers to implement APIs, but this
leads to a fragmented ecosystem. In years past, payment processors that sold
"pay/checkout with ..." buttons and wallets failed to achieve significant
merchant adoption despite being fuelled with billions in marketing spend.

The solution everyone embraces seems to lie in building an independent and
neutral piece of infrastructure (an API) that any publisher can integrate and
that plugs into every checkout out there. It's the missing pipes in ecommerce:
anyone can use it and nothing really changes (we don't process payments, it's
all automated, etc.) -- and conversions go UP.

I'm repeating some ideas from the post, but on the publisher side it's worth
noting NONE would entertain the idea of integrating multiple APIs -- one for
each merchant. Did I also bring up the combined effort required of all
merchants to keep those APIs up & running? :)

So pro-scraping because it's the only way to build adoption in ecommerce.

------
grandalf
The hard part is not scraping, it's returns. For many kinds of online
products, the return rate is over 40%. The shopper must be completely aware of
how to contact the merchant of record and how to return the product.

Also, if you are scraping a large retailer you are effectively required to be
PCI DSS level 1 compliant, which takes a bit of extra effort.

~~~
opendais
Are you sure that is true for all retailers of those products?

I deal with a high return rate industry [specialty products many customers
can't size correctly] and I only see return rates of 3-7% depending on the
product. 40% seems very high.

~~~
rgbrenner
Zappos has a 35% return rate.

[http://www.internetretailer.com/2010/03/31/get-back](http://www.internetretailer.com/2010/03/31/get-back)

~~~
razvanr
This is strongly correlated with the brand values they push in their marketing
campaigns, where both easy returns and excellent service are promoted.
Flipping it around, you could say they attract people who make returns more
often than average.

------
lloyddobbler
I've worked with two shopping search engines, and interestingly, scraping
sites was one of the things they did to build up their inventory as well. The
big difference being, they simply organized the products into a searchable
format, then sent traffic to the ecommerce site and let them handle the
checkout. What you're doing is arguably more complex.

(They also prioritized the feeds that were sent to them directly by retailers
above the scraped items feeds - thus prioritizing paid listings, similar to
the Google SERPs - so a different business model entirely.)

That being said, it's a very cool concept -- and agreed that, given the
relatively small number of ecommerce platforms out there, scraping then
serving them up seems pretty scalable. Interested to see how it goes.

~~~
razvanr
At least one of them might have been using our technology in the backend,
especially if they're one of the top 5 shopping search engines.

The downside to feeds is that they become obsolete very quickly, especially if
the product is popular. Products sell out very quickly, retailers lose money
on traffic they can't onboard and shoppers get frustrated.

Thanks for your thoughts!

~~~
lazyjones
> _At least one of them might have been using our technology in the backend,
> especially if they're one of the top 5 shopping search engines._

Which of these "top 5 shopping search engines" have you worked with? You don't
seem to mention any on your website.

> _The downside to feeds is that they become obsolete very quickly, especially
> if the product is popular. Products sell out very quickly, retailers lose
> money on traffic they can't onboard and shoppers get frustrated._

Feeds are the only way to keep up with frequently changing listings from large
retailers (apart from doing live API requests), since scraping is several
orders of magnitude slower. Amazon gives selected partners incremental feeds;
scraping their millions of products takes days.

------
coupdejarnac
I built a CJ scraper for a deals website that is now defunct. What a pain it
was to maintain. All the different retailers dump their data into CJ in
different ways. I might just put it on GitHub if anyone's interested. (Python +
chromedriver + BeautifulSoup + mechanize.)
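To illustrate the maintenance pain: a common approach (a sketch, not the actual scraper -- the column names below are invented, not CJ's real ones) is a per-retailer field map that normalizes each datafeed into one schema, so every new retailer means another mapping to write and keep current:

```python
import csv
import io

# Per-retailer column mappings (hypothetical examples): each retailer
# dumps its data into the feed under different column names.
FIELD_MAPS = {
    "retailer_a": {"name": "PRODUCTNAME", "price": "SALEPRICE", "url": "BUYURL"},
    "retailer_b": {"name": "title", "price": "price", "url": "link"},
}

def normalize(feed_csv, retailer):
    """Yield uniform product dicts from one retailer's datafeed."""
    mapping = FIELD_MAPS[retailer]
    for row in csv.DictReader(io.StringIO(feed_csv)):
        yield {field: row[column] for field, column in mapping.items()}

feed = "title,price,link\nLego Castle,99.00,http://example.com/p/1\n"
print(list(normalize(feed, "retailer_b")))
# → [{'name': 'Lego Castle', 'price': '99.00', 'url': 'http://example.com/p/1'}]
```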

~~~
Jake232
I'd be interested in seeing this.

------
blaze33
I tried the demo with a Lego castle priced 99€ and got a grand total of more
than $10k...

FYI, Lego showed me the French version of their website, as that's where I
live. You seem to only offer shipping in the US, though that's not clear from
reading your website. Still very interesting.

Product URL: [http://shop.lego.com/fr-FR/Le-ch%C3%A2teau-fort-70404?fromListing=listing](http://shop.lego.com/fr-FR/Le-ch%C3%A2teau-fort-70404?fromListing=listing)

Screenshot: [http://imgur.com/mlr8Q2e](http://imgur.com/mlr8Q2e)

~~~
razvanr
Nice :) We currently focus on the US market with both retailer and publisher
integrations; perhaps we should make that clearer.

Stay tuned though, we'll have news on this.

------
dchuk
Can anyone go into a bit more detail about how the affiliate commissions work
here? From what I have read, I would feed my affiliate link through TwoTap and
you would then handle the cookie and conversion and everything?

If I was using URLs gathered from a Commission Junction datafeed, is this
basically a plug and play solution? Or do I need to process those URLs?

Do you have a backend stats dashboard? Or would I still rely on CJ for that
data?

~~~
sradu
We simulate what a shopper would do. We first go through your affiliate link
(which drops a cookie) and then go to the retailer website to place the order.

All the commissioning, connecting/talking to retailers, receiving the money,
is directly between you and the affiliate network. We're plug and play :)

We do have a stats backend where you can see all the purchases that went
through Two Tap. And you can also use CJ's dashboard, just like you are
probably doing right now.
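A sketch of that two-step flow (not Two Tap's actual code): one HTTP session follows the affiliate link first, so the tracking cookie it sets is still present when the order is placed on the retailer's site. The session is duck-typed here; in production you would pass something like `requests.Session()`, which persists cookies across requests automatically.

```python
def place_attributed_order(session, affiliate_url, product_url):
    # Step 1: the affiliate redirect drops the network's tracking
    # cookie into the session's cookie jar.
    session.get(affiliate_url)
    # Step 2: the retailer request reuses the same session, so the
    # cookie rides along and the order is credited to the affiliate.
    return session.get(product_url)
```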

~~~
dchuk
Thanks for the reply. Are the retailers cool with all of this?

~~~
razvanr
Yes. It used to be more controversial 2 years ago.

They're only reluctant about not getting the consumer data, breaking the
relationship with shoppers, or not processing the payments. And control over
who sells their inventory, obviously -- which is already in place through
their relationships with affiliate networks.

Tick those boxes and they're cool with it and supportive. That being said,
we're still expanding tech support for retailers faster than BD can keep
up -- 75 new retailers monthly at this point. That's why we're pushing all our
affiliates to get approval before using Two Tap, in order to keep getting
affiliate revenues.

Experiments from Twitter and Facebook also do a good job of educating the
market, which works in our favor -- at least merchants learn what they don't
want :)

------
quaffapint
So you guys are scraping all the product information for a retailer and
keeping it up to date? Or is it all live, you fetch it when that particular
url is called? Where do you get the list of retailers to scrape?

~~~
razvanr
Not really, we're not building a product catalog.

We're fetching live data only for the products requested via a URL. Two
Tap mimics a consumer visiting the retailer and getting that info for
themselves, which also allows retailers to retain their analytics layer with
no negative impact.

Our current supported stores span the top 500 as well as a number of specific
integration requests.

------
Animats
This is sort of what Google Shopping was before it went all-ads.

~~~
opendais
That isn't quite right. Many vendors provided them with XML feeds of their
products just as they do with Amazon.

~~~
razvanr
Correct. We don't get any input from the retailers. Two Tap can get product
availability info and place an order just by having the product URL, nothing
else.

Also, the full retailer inventory is available, unlike FB or other models that
require the shop to upload a certain number of products.

------
dmritard96
How many proxy nodes do you have?

~~~
sradu
Paraphrasing New Relic, "it takes a village to count our proxy nodes".

We have our order-placing infrastructure on AWS, and a whole in-house cloud
dedicated to product crawling built on top of DigitalOcean.

~~~
dmritard96
Hm, I didn't mean infrastructure. I mean, did you buy proxy nodes from someone
like sslprivateproxy and slap HAProxy in front? Most ecommerce sites wise up
to bots crawling them and have a robots.txt that suggests you might not want
to...

------
michaelmcmillan
This seems hard, but I think that's your big advantage (business-wise).

------
notastartup
I don't get it. Is this just a middleman between all the retail websites and
the publishers? Sort of like what Google is doing with product search, also
giving commissions on the items sold?

~~~
razvanr
Yes, you could say that. We're laying down pipes in ecommerce so you can send
an order to a merchant from anywhere on the web through a standardised API.

Retailers can extend their reach and make their inventory shoppable from
anywhere with an internet connection and publishers can build ecommerce in
their apps.
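Purely to illustrate the "one standardised API for every merchant" idea: a sketch where the only merchant-specific input is the product URL. These field names are invented for illustration and are NOT Two Tap's real API.

```python
from dataclasses import dataclass, field

@dataclass
class OrderRequest:
    """One uniform order shape, regardless of which retailer fulfils it."""
    product_url: str            # the only merchant-specific input needed
    quantity: int
    shipping_address: dict
    affiliate_id: str           # so the purchase is attributed correctly
    options: dict = field(default_factory=dict)  # e.g. size, color

# The same structure works whether the URL points at Lego, Old Navy, etc.
order = OrderRequest(
    product_url="http://www.example-retailer.com/product/42",
    quantity=1,
    shipping_address={"name": "Jane Doe", "zip": "94107"},
    affiliate_id="pub-123",
)
print(order.product_url)
```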

~~~
notastartup
Interesting but why the need for web scraping? If the retailers saw the
monetary benefit of this wouldn't they go out of their way to provide you the
data directly and an API as well?

~~~
lurcio
Tap certainly would solve the feed problem - but thats a problem for
affiliates, not retailers :-)

It's likely that I don't have much clue what I'm talking about, but I can't
see _great_ benefits for many retailers from this as it stands. They like more
control over leads, and to onboard customers into their brand. Typically, this
is managed through affiliate programs and partnerships.

Two Tap looks like it allows prospects/customers to be kept under the
publisher's wing. The retailer gets the transaction, but the publisher keeps
the relationship -- where the real value (capital) is created and can be
realised.

As someone with an interest in a small time publisher - I'm very interested in
this.

However, I have a concern with how well it sits with affiliate T&Cs. These
vary by program of course -- but those I know prohibit scraping by any means.
That's stopped me from doing similar things on 80legs or elsewhere in the
past.

If the above is right, Two Tap may well have to develop the kind of
relationships with retailers that better serve their needs at the mouth of the
funnel. Then Two Tap risks becoming like any other program (CJ/AW..) -- & the
scraping will likely have to stop. I'm sure they know the space better than
anyone here, though -- and certainly better than me -- so I'd be interested to
hear if they have these relationships in place alongside the scraping MO, or
are confident of a way forward.

Given that, the CTA only leaves me wondering what's on the other side of the
wall: "Sounds interesting? Let's talk! - Sign up below"

Do you have traffic or other requirements? What's the pricing? It'd be nice if
you could inform me more / make me work less if I want to use your service :-)

~~~
razvanr
You're right on a lot of counts. Let's talk -- can you get in touch at hello@
please? Happy to talk about all of this.

The main difference now is that both retailers as well as affiliate networks
are actively trying to push this model once they've seen they can make 5x more
money from the same amount of impressions.

Also, the publisher gets to keep PART of the relationship -- the part dealing
with the user's interaction with their product. The relationship with the
retailer that sells the product is also kept intact -- the consumer loses just
one touchpoint (a visit to the product page) but gets the same confirmation
email, returns, customer support, etc. The retailer has the same load on its
servers and gets the exact same user data it normally would (shipping, email,
address, billing, payment, etc.).

