

Ask HN: Get feedback on my bootstrapped web scraping as a service project? - notastartup

Hi HN,

I worked on https://scrape.it for the past 5 months or so. I would like some feedback please.

Thank you.
======
techdragon
The UI for defining the scraper targeting rules is an A+ in my book. I've used
a number of visual tools for this over the years but most try to do too much
and the usefulness suffers. Yours doesn't.

Your price, however, feels off by a factor of somewhere between 4 and 8. I
don't have your system usage data to analyse fully, but it feels too high a
price. If you're using full desktops and Chrome browsers and the overhead is
higher than I'm expecting based on past experience, then perhaps the pricing
is off by a factor of only 2.

That also doesn't account for the ability to take the scrape rules and
reformat them to run with a headless scraper like Scrapy when the website is
suitable for it. A reduced-cost scraping option powered this way, throttled
back to "human speed", is probably best priced around $1 or $2 a month per
crawl thread.

I'm not saying the Chrome-powered crawling part couldn't be so inefficient
that you need to charge $20 to make 1 crawler pay for itself, but if that's
the case, it's a growth limiter you should consider ways to optimise.

Right now your UI is about 10 times better than the open source Portia UI from
Scrapinghub, but you're also 10x the cost of operating a Portia crawler. It's
just something to keep in mind.

TL;DR? Nice product, but a bit expensive.

~~~
notastartup
You can create as many crawlers as you want. If you created 100 crawlers in a
month and ran them, it would cost less than a dollar.

------
dennybritz
It looks great, very good job.

One question you should be prepared to answer is how you compare to
alternatives:

\- Services like Kimonolabs and Import.io

\- Building a scraper with Scrapy

\- Using Scrapinghub

From what I understand the price point of the above is lower (some are free)
while still offering the same features. What can you offer that they don't?

~~~
notastartup
\- Simple and Minimal UI

\- Crawl & Extract Static, Dynamic, Single Page Applications.

\- Fixed pricing for unlimited scraping

I used Kimonolabs but found it falling short of Import.io. Import.io is great
but it felt very bloated and complicated to use.

If you already know how to use Scrapy or a free solution and it works for you
then honestly you won't need it.

There's also a free plan where you can run any number of jobs but each job has
a limit of 30 minutes and a wait line.

Further on the price point: you can create as many scraping jobs as you want
and run them as often as you want, but on the free plan there's a wait line at
peak load.

------
Jake232
Needs more details. A couple of examples that instantly spring to mind:

Do you spread requests out across multiple IPs to avoid bans / rate limiting?

Do you run JavaScript on the pages?

~~~
notastartup
No, it will not fly through thousands of pages at a blinding rate using
multiple IP addresses. It goes at a human-like pace to avoid getting banned.
If an IP is banned, a new one may be assigned. If you need to go faster, you
will need concurrent jobs.
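
To illustrate what a "human-like pace" might look like, here is a minimal Python sketch. It is not the service's actual code: the function name `crawl_at_human_pace`, the `fetch` callable, and the 2 to 8 second delay range are all assumptions made for the example.

```python
import random
import time
from typing import Callable, Iterable, Iterator, Tuple


def crawl_at_human_pace(
    urls: Iterable[str],
    fetch: Callable[[str], str],          # hypothetical page-fetching callable
    min_delay: float = 2.0,               # assumed lower bound, in seconds
    max_delay: float = 8.0,               # assumed upper bound, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[str, str]]:
    """Fetch URLs one at a time, pausing a random interval between requests."""
    for index, url in enumerate(urls):
        if index > 0:
            # Randomized pause so requests do not arrive at a fixed rhythm.
            sleep(random.uniform(min_delay, max_delay))
        yield url, fetch(url)
```

Each call processes its URLs strictly one after another, so under these assumptions the only way to raise throughput is to run several such loops side by side, which corresponds to the concurrent jobs mentioned above.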

All jobs use the Chrome browser and support any website.

------
Patrick_Devine
It looks quite easy to use from your video.

Which cloud provider do you use for your backend crawl, and do you run the
scraper from a single node or from multiple nodes?

------
lgmspb
The page itself looks nice, but why did you decide to use a custom scroll? I
think it would be so much better without it. I will try to look into the app
itself later.

------
notastartup
clickable link: [https://scrape.it](https://scrape.it) since you can't click
the one above.

