
Show HN: Spider Pro – easy and cheap way to scrape the internet - hyperyolo
https://tryspider.com/
======
seeekr
While looking at the tool I realized that I built something similar many
years ago. I wonder if it's worth digging up the source code, polishing it,
and publishing it. Does the market need more of these tools? Are there
features in this type of tool on the market that seem to be completely
missing or inadequately implemented?

And it's likely that running these as a server-side application is ultimately
more appealing than this simpler (to implement) automation inside the user's
browser, right? Seems that many companies providing similar (server-side)
scraping tools have been successfully sold off... Is that an indication of
(still existing) high demand for these?

EDIT to add: The tool and the accompanying website look fantastic, by the way.
Congratulations on the 1.0 launch, Amie!

~~~
meritt
Does the market need yet another browser extension scraper like this? No, I
don't think it does.

[1]
[https://chrome.google.com/webstore/search/scraper?_category=...](https://chrome.google.com/webstore/search/scraper?_category=extensions)

[2] [https://addons.mozilla.org/en-US/firefox/search/?platform=wi...](https://addons.mozilla.org/en-US/firefox/search/?platform=windows&q=scraper&type=extension)

~~~
Mathnerd314
25 is not very many on AMO; only a few seem relevant, and most are
unmaintained. The only one that really seems usable is the webscraper.io
one. Maybe ScrapeMate, but the author has left and is working on the
Python/cloud ScrapingHub. Similarly, Chrome has Web Scraper Plus with poor
maintenance, and the rest are wrappers/helpers for various websites.

The extensions market isn't particularly lucrative; extensions are like
mobile apps, but with only 33% of users even knowing they exist. The SaaS
companies don't have that growth limitation, hence their success. But if you
want an extension for a resume item or something, getting a few thousand AMO
users or Chrome reviews shouldn't be hard.

------
uptown
Interesting that their demo screen cast shows Zillow. Zillow is fairly
aggressive in applying anti-scraping defensive measures.

~~~
mahesh_rm
.. which may be the reason why their demo screen cast shows Zillow.

~~~
uptown
Yeah, the problem is that Zillow imposes IP bans on you when you've been found
to be scraping their site.

~~~
invisiblerobot
Which is why any serious effort involves rotating pools of proxies.

~~~
walrus01
Not just rotating pools of proxies, but sometimes shady gray-market
residential proxies, so that you can appear to be coming from hundreds or
thousands of unique, geographically distributed last-mile end-user
DOCSIS3/ADSL2+/VDSL2/GPON/whatever customer netblocks.

If you want to go down a rabbit hole of shady proxies run on
compromised/trojaned end-user SOHO routers or PCs, google "residential
proxies for sale":

[https://www.google.com/search?client=ubuntu&channel=fs&q=res...](https://www.google.com/search?client=ubuntu&channel=fs&q=residential+proxies+for+sale&ie=utf-8&oe=utf-8)
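For illustration, the rotation part can be sketched with Python's standard library alone. This is a minimal round-robin pool, not a real client; the proxy addresses below are placeholder IPs, not actual endpoints:

```python
import itertools
import urllib.request

# Placeholder proxy endpoints; a real pool would hold hundreds of
# residential exit addresses.
PROXIES = [
    "http://198.51.100.10:8080",
    "http://203.0.113.25:3128",
    "http://192.0.2.77:8000",
]

# Round-robin over the pool so consecutive requests exit from different IPs.
_pool = itertools.cycle(PROXIES)

def opener_for_next_proxy():
    """Return (proxy, opener) where the opener routes through the next proxy."""
    proxy = next(_pool)
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return proxy, urllib.request.build_opener(handler)

# Each call rotates to the next proxy in the pool:
# proxy, opener = opener_for_next_proxy()
# opener.open("https://example.com", timeout=10)
```

A production setup would also retire proxies that return bans or timeouts rather than cycling blindly.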

~~~
core-questions
Once worked for a place using this to scrape search engines.

It's amazing how easy and comparatively cheap it is to get access to thousands
of residential IPs. Is it via spyware running on people's machines? Shady
people working at ISPs doing nefarious things for cash? We never knew....

The key thing to know is that if you want your traffic to come from an IP "in"
some other country (according to geolocation databases anyway) it's really
only a few bucks a month to get a proxy. Most of them have poor IP reputation
so they suck to use on Google, but work very well for everything else out
there...

~~~
luckylion
> Is it via spyware running on people's machines? Shady people working at ISPs
> doing nefarious things for cash?

Might be as simple as [https://hola.org/](https://hola.org/) &
[https://luminati.io/](https://luminati.io/) - "unblock a website, download
our VPN client", meaning you "unblock" by using somebody else's line. And
they also sell access at Luminati. Most users aren't aware of the
implications.

~~~
walrus01
It's a combination of three general things:

a) The type of "services" luckylion mentions where people have opted in to a
shady gray market thing reselling proxies through their connection.

b) compromised home routers/gateway devices/internet of shit devices

c) compromised home PCs (mostly windows 7/10 trojans/botnets)

------
johnwheeler
As an aside, it looks like this was created by one person, which shows an
amazing level of talent in design, UX, programming, marketing, and
presumably devops. Kind of scary.

~~~
shujito
is that a bad or a good thing?

~~~
meddlepal
Well it makes them a damn unicorn so... good thing?

~~~
bob1029
Depends on the unicorn. Some can spin up the full stack, put it under source
control, wrap a CI/CD pipeline around it, and have it deployed to a TLS-
encrypted public website in under an hour.

I think you can actually find the above level of developer - what many would
classify as a 'unicorn' - in approximately 1-2% of the workforce.

That doesn't mean they understand your business or products, know how to work
well with other individuals, etc. There are a lot of other factors beyond just
the productivity angle.

Now, the above developer _also_ with the ability to fundamentally understand
the business, as well as interact with all of the key people while performing
said productivity stunts _on a consistent basis_... I think you are getting
into the .1-.01% range.

~~~
rajacombinator
Wut. If you think someone can build the OP product in less than an hour, this
is a new level of HN delusion ...

~~~
mariomariomario
I imagine that "full stack" in this context refers to the equivalent of a
Hello World project wrapped around a CI/CD pipeline.

~~~
bob1029
Correct. Not an actual complete project, but the boilerplate required to begin
implementing & delivering features.

~~~
rajacombinator
Ah, yes, good and experienced engineers can do that.

------
bvm
Ah it reminds me of Kimono Labs. I miss that product, it was fantastic.

~~~
soared
I was trying to remember the name! I set up an incredible amount of automation
with Kimono Labs.. that was one of the best products I've ever used. I
remember when they shut down, but only realized now they were acquired by...
Palantir?

~~~
MattyMc
Me too. Absolutely one of the best products I’ve ever used. It’s also the
product that taught me that I can’t trust a SaaS business with important work.
Wish they still existed.

------
ikeboy
webscraper.io is free and has more functionality. I've used it for a quick and
easy way to scrape data off multiple pages.

Probably has a slightly higher learning curve but once you get past that it's
easy.

------
seanwilson
Any comments on using Gumroad as your payment processor? I went with Paddle
for a Chrome extension - it seems more flexible (e.g. you could do team
subscription plans with custom pricing) but I think Gumroad is a lot easier to
integrate (e.g. they have a simple license check API).

------
xurias
Unfortunate that the contact us page doesn't work. I'm interested, but I
wanted to ask if a FF extension is planned, since that's my primary browser.

~~~
hyperyolo
Yup, Firefox is supported! Though it's not distributed through the webstore
yet - I'm planning to move it to the store sometime soon.

~~~
xurias
Awesome, thanks.

------
joshmn
This is scary good and easy to use. Clever idea with the Chrome extension too.

~~~
mmcwilliams
Clever, but how long until sites learn to detect the extension and block the
user?

~~~
penagwin
Detection is always a cat-and-mouse game. Using an extension in a real
browser (instead of Electron/headless Chrome) is probably one of the hardest
approaches to detect, precisely because it runs in a "real" browser.

Of course somebody will find a way to detect it, then the extension maker will
patch it, and the cycle will continue.

~~~
mmcwilliams
Correct me if I’m wrong, but is it not still as simple as knowing the “chrome-
extension://“ unique id of the extension? I’m aware of the cat and mouse
aspect of scraping and that was one of the pitfalls I’ve been wary of as a
fingerprinting vector.

~~~
SquareWheel
I'd be surprised if sites had permission to read a chrome-extension:// URL.
That'd be a sizable privacy leak.

~~~
mmcwilliams
I'm not sure about the chrome-extension protocol, but this API still seems
to be present: [https://developer.chrome.com/extensions/runtime#method-sendM...](https://developer.chrome.com/extensions/runtime#method-sendMessage)

~~~
penagwin
I just tested in chrome 77, and I could only do
`chrome.runtime.sendMessage("clngdbkpkpeebahjckkjfobafhncgmne",
{},{},console.log)` from within the Stylus extension page, not from an
external page like hacker news.

------
turtlebits
Seems to work well, but it appears that you can't re-scrape a site without
starting from scratch - it's a one-time deal.

~~~
tmikaeld
Yeah, it needs a lot of additional features.

I hope this isn't one of those "abandonware" deals.

------
Sephr
The DRM scheme could use some work. Here's a simple crack (run from the
extension license page):

    
    
        chrome.storage.sync.set({
            spider_valid_license: {
                key: 'No license (00000000-00000000-00000000-00000000)',
                lastChecked: new Date('Jan 1 3000')
            }
        })

------
riantogo
I'm not familiar with web scraping but have been looking for feed(s) from
shopping sites (e.g. deals.ebay.com, deals.amazon.com etc.). In good ol' days
they used to publish RSS, but not any more. Can I use this for what I'm
looking for? Will eBay and Amazon end up banning me? Alternately does anyone
know of a good service that aggregates shopping feeds?

------
porker
This has a nice UI. It reminds me of Kantu (now
[https://ui.vision/](https://ui.vision/)) which I've used with varying degrees
of success. That works by recording Selenium scripts; is Spider Pro entirely
custom?

------
visualphoenix
This is lacking clarity in the payment schedule... Is it One-time? Monthly?
Yearly?

~~~
ithkuil
EDIT: snippet from the website to help answer this question without requiring
you to click. I didn't intend this to be a rude answer, I don't think it
deserves downvotes.

" \+ No server involved (zero subscription fee for you!)

...

Unlike other web scraping softwares, it requires only one time payment to
scrape for unlimited time and data. No more subscriptions or huge fee for your
small data analysis projects! "

~~~
gravypod
How is this at all sustainable as a business model? My company is in charge of
scraping a few TB/week. Would we be allowed to use your service?

~~~
primitivesuave
This is a browser extension that takes the place of copy/pasting from
websites (i.e., manual scraping); what you're looking for is probably more
automated.

------
wumms
Error info: _contact us_ at the bottom shows _page not found_.

~~~
esnard
This is quite ironic, since crawling its own website is surely one of the best
ways to detect broken links.
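As a sketch of that idea, a minimal link extractor can be built on Python's standard-library HTML parser (the URLs and HTML below are made up for the example); a real checker would then request each collected link and flag non-2xx responses as broken:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect href targets from anchor tags on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))

# Example: extract links from a fragment of a hypothetical page.
parser = LinkExtractor("https://tryspider.com/")
parser.feed('<a href="/contact">Contact us</a> <a href="https://gumroad.com/">Buy</a>')
# parser.links now holds the absolute URLs to check.
```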

~~~
y4mi
It's not a crawler... it's a browser extension to extract data from an
optionally paginated list.

------
ct520
Good time to showcase, given this was recently in the news. Looks awesome,
thanks for sharing: [https://news.bloomberglaw.com/privacy-and-data-security/insi...](https://news.bloomberglaw.com/privacy-and-data-security/insight-linkedin-data-scraping-case-9th-circuits-trigger-for-cfaa-liability)

------
atarian
Since the extensions are being distributed via Gumroad, how would updates
work?

------
s3nnyy
Have a look at dashblock.com, they went through YC recently

------
neovive
Nice example of Tailwind CSS website as well.

------
chirau
How is it the cheapest if it is not free? There are plenty of free scraping
tools out there.

~~~
tmikaeld
How many are offline-only?

Almost every one on the Chrome Web Store is online-based and "freemium"
only.

~~~
CobrastanJorji
Sorry, maybe this is obvious, but what is an offline web scraper?

~~~
bauerd
Probably meant installed and running locally

~~~
tmikaeld
Yes.. It felt silly writing "serverless".

------
impatientduck
Nonsense - Beautiful Soup is free.

Don't pay for something you can program yourself.

~~~
Cyberdog
Sure, everyone's time is free too, after all.

(Alternate smart-ass reply: Surely you coded the browser, operating system,
and network stack involved in posting this response by yourself, right?)

~~~
sillydinosaur
[https://www.crummy.com/software/BeautifulSoup/bs4/doc/](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)

I mean...

    soup.find(id="link3")
    # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>

Is this hard?

~~~
Cyberdog
For HN's audience, no, that's not hard. But this product is clearly not only
for us, and you've also cherry-picked a very simple example.

There's also the aspect of downloading the page in the first place and dealing
with things like authentication and bot detection which a product like this
helps solve.
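That fetching step is doable with the standard library too, though it's more than one line once headers and sessions enter the picture. A minimal sketch (the URL, cookie value, and User-Agent string here are invented for illustration; a real scrape would obtain the session cookie by logging in first):

```python
import urllib.request

# Build an authenticated request by hand; the cookie and UA are placeholders.
req = urllib.request.Request(
    "https://example.com/dashboard",
    headers={
        # A realistic User-Agent helps avoid naive bot detection.
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
        # Session cookie captured from a prior login.
        "Cookie": "session=abc123",
    },
)

# Fetching would then be:
# html = urllib.request.urlopen(req, timeout=10).read().decode()
# ...before any soup.find() can run at all.
```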

I personally don't have a use for this product right now, but I won't be so
bold as to say I'll never find a case where using it wouldn't be easier or
more cost-effective than hacking up my own solution.

