
Show HN: WrapAPI v2 – Build APIs, scrapers, bots on any website - wrapapi
https://wrapapi.com/v2
======
captn3m0
Suggestion: Please be upfront about pricing. I've been bitten by similar
tools dying because they were too shy to ask for money.

~~~
chipperyman573
They have a link to the pricing page[1] on the navbar

[1] [https://wrapapi.com/v2#/pricing](https://wrapapi.com/v2#/pricing)

~~~
wrapapi
Thanks for linking to this, chipperyman573! The parent comment used to be
valid, but we added a dollar amount for the "Business" plan based on this and
a few other comments; we are still willing to discuss discounts for non-
profits, smaller companies, etc.

------
OhSoHumble
I've always felt that a service like this would be great for Code for America
projects. A big problem I have with creating technology applications for civic
goods is that the government is _terrible_ at providing open data. Even when
they do open source civic data, they do a terrible job of it.

An example of this is California drought data: automatically grabbing data on
the drought is incredibly difficult because it involves scraping HTML tables.
I tried to build an API that presents drought data so volunteers would have an
easier time building out data visualizations. I ended up just getting
exhausted doing all the scraping work.
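
To make the HTML-table scraping chore concrete, here is a minimal sketch using only Python's standard library. The table contents and the names TableParser and table_to_records are made up for illustration; a real drought page would need its own handling for nested tables, colspans, and markup changes.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_records(html):
    """Treat the first row as headers and return one dict per data row."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical sample data, not taken from any real agency page.
SAMPLE = """
<table>
  <tr><th>Reservoir</th><th>Percent of capacity</th></tr>
  <tr><td>Example Lake</td><td>41</td></tr>
  <tr><td>Sample Reservoir</td><td>57</td></tr>
</table>
"""

records = table_to_records(SAMPLE)
print(records)
```

Every value comes back as a string, so converting numbers and dates is still left to whoever consumes the records.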

I then moved onto a new project: building a free-to-use Padmapper for
affordable housing. The data for income-restricted apartment units are driven
by a government-contracted vendor. A city or county will declare income
stabilization policies and legally enforce them against landowners and then
the landowners would send over their list of units to the vendor.

This would be great except the vendor does the bare minimum. Padmapper looks
amazing but, really, it's only applicable for the upper middle class due to
explosive housing costs in the Bay Area. So, in order to provide a more modern
website and mobile application for the community, I started to scrape the
vendor's website. It was terrible. I kept getting throttled. So I gave up.

~~~
dhruvkar
What was the vendor's website?

~~~
OhSoHumble
socialserve.com

------
wrapapi
Hi everyone! We just released the second version of WrapAPI.

We have a new WrapAPI API Builder that looks like a browser, and is as easy to
use as one too. You can define your API's inputs with a quick tap on the
address bar, and point and click at the data you want to extract.

We also have a Chrome extension that is smarter and better-integrated than
ever. It records your requests and automatically creates parameter inputs for
the values that change between requests to the same endpoints. The contents of
your captures are immediately ready for you to start defining outputs and data
to extract too.

Let me know if you have any questions or feedback!

~~~
sharemywin
If you take a screenshot of all the items being scraped, you could build a
dataset for a pretty powerful AI: something that takes an image of a webpage
and outputs machine-readable data. Not saying there's a NN that can do it
right now, but it seems like eventually it could get there.

~~~
sharemywin
That same dataset in reverse could be an interesting GAN too: it takes useful
data and outputs a webpage for it.

------
zackify
Doesn't work at all with JS.

This is a big thing on many sites now.

Also, since that is the case, you could build this in a few hours using
something like:
[https://github.com/bda-research/node-crawler](https://github.com/bda-research/node-crawler).
Yes, it would have no gui, so you lose that.

~~~
RandomBookmarks
If JS is a problem for you, try Kantu. It works with screenshots and uses OCR
for scraping. The beauty is that it works with any kind of site. But clearly,
the speed cannot match a node.js or perl based scraper (mechanize etc), so it
is not suitable for high volumes.

~~~
gardnr
Do you find it better than Phantom?

Just reading about Kantu now. It reminds me of
[http://www.sikuli.org/](http://www.sikuli.org/)

~~~
RandomBookmarks
Yeah, the concept is the same as Sikuli, but all inside Chromium (and the OCR
is better).

>Do you find it better than Phantom?

It depends. Once you have a working script, web scraping with Phantom is much
faster and much more resource efficient. But since Kantu works visually, you
do not have to touch any page source code. That makes it much easier/faster to
create the automation in the first place, especially for complex sites with
date controls, drag & drop and other Javascript.

------
superasn
By the way, your onboarding step-by-step wizard[1] is really awesome. I've used
similar scripts on my sites before, but they keep breaking because users (who
are only learning) often click on some div or button they weren't meant to, or
are on mobile phones, and then the wizard can't sync to the next step and the
whole thing breaks :/

Is this happening on your site? If not, I would appreciate some tips about
coding it and how to handle exception cases where the wizard can't keep in
sync or the user clicks on unintended page elements.

[1] [http://i.imgur.com/aJzuvSD.jpg](http://i.imgur.com/aJzuvSD.jpg)

~~~
wrapapi
Thanks! We used this awesome library called React-Joyride
([https://github.com/gilbarbara/react-joyride](https://github.com/gilbarbara/react-joyride)),
which made setting up the product tour a breeze. Since our product tour is on a
single-page app, it works quite well.

The most helpful part is that you can pass a callback which will trigger
before/during/after each step, which can let you ensure that the state of the
page matches what you're expecting. In our case, we use it to make sure that
you're switched to the right tab, etc. Take a look! I highly recommend it.

------
webninja
This tool is really well thought-out and useful! I made a working API in less
than 1 hr. This tool has a much better design & implementation than Kimono and
is easier than using Python 3 + Beautiful Soup 4, which is how I made my previous
web scrapers. This tool also works for POSTing to web forms.

~~~
dsacco
No offense, but your comment sounds like astroturfing (I'm not saying you are,
just that it's part of a pattern I see).

I often see one or more commenters write what seems like an excessively
positive thought dump on Show HNs. It just doesn't seem like the natural
conversational tone everyone uses, but I can't quite put my finger on it.

Has anyone else noticed it? Is there a term for this sort of writing style?

~~~
webninja
It could be that I need to work on my writing skills. I'll admit, I'm a
systems engineer not a writer. On the other hand, HN commenters tend to convey
a healthy dose of cynicism and skepticism. Also it's known that negative
comments come across as more trust-worthy than positive comments on the
internet. I simply used this tool and it did what it said it did. I don't give
a positive review unless I had a good experience. But it is easier for all of
us to believe people with some degree of skepticism and cynicism.

------
salimmadjd
What if the content you want to scrape->API is behind a login gate? Is there
an option of authentication?

~~~
wrapapi
You can actually write an API endpoint that'll retrieve a state token (which
includes the cookies). For one example, you can view our Hacker News login
endpoint at
[https://wrapapi.com/#/view/phsource/hackernews/login/latest](https://wrapapi.com/#/view/phsource/hackernews/login/latest)

That endpoint will then emit a state token, which includes the session
cookies. You can feed that state token into your next request and it'll
authenticate you.
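
As a rough illustration of the state-token idea described above, here is a minimal Python sketch. The token format (base64-encoded JSON) and the function names are assumptions made up for this example; WrapAPI's real token format is not documented in this thread.

```python
import base64
import json


def make_state_token(cookies):
    """Pack session cookies into an opaque string (hypothetical format)."""
    raw = json.dumps({"cookies": cookies}, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def headers_from_state_token(token):
    """Unpack the token and rebuild the Cookie header for the next request."""
    state = json.loads(base64.urlsafe_b64decode(token.encode("ascii")))
    pairs = sorted(state["cookies"].items())
    return {"Cookie": "; ".join(f"{name}={value}" for name, value in pairs)}


# Step 1: a login call would capture the session cookies (value is made up).
token = make_state_token({"user": "alice&abc123"})

# Step 2: the next call feeds the token back in to authenticate.
headers = headers_from_state_token(token)
print(headers)
```

The point is only that the caller carries the session between otherwise stateless API calls. A token like this is as sensitive as the cookies inside it.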

------
programbreeding
This looks awesome. Thank you!

I wanted to give you a heads-up that the youtube video at the end of your
joyride tutorial is broken.

It tries to play this:
[https://www.youtube.com/watch?v=10yKzP3gtkc](https://www.youtube.com/watch?v=10yKzP3gtkc)

------
randomsofr
"Price: Contact us"

Why?

~~~
dyim
Likely because they have custom pricing based loosely on how much business
value they create for the customer. E.g. if a philatelist wanted to scrape
stamp catalogs, and if an industry-specific analytics platform wanted to
scrape a directory of prospects - you'd want two different prices. Otherwise,
you'd either 1) leave stamp enthusiasts out in the rain, or 2) leave a whole
lot of meat on the bone w/r/t enterprise pricing. There might also be a
consulting upsell!

~~~
dsacco
Eh, that seems like a non-problem. The solution to me is to leave stamp
enthusiasts out in the rain. If your SaaS product can provide a lot of value
in enterprise companies, $500 a month is not a lot to ask. And many people
just want to see the price of a line item if it's a productized service so
they can go back to someone higher with a purchase request.

When I was last working inside an organization and reviewing vendors for a
product, it really left a bad taste in my mouth when they had "Ask for
Pricing." I get it, my consulting work is basically Ask for Pricing, I
understand the business strategy. But it's such a headache to sit through
bullshit product demos for multiple vendors over a few weeks just to hear that
their pricing structure is way out of line.

There is this idea that a lot of companies have, where they're more
"professional" or conversion-optimized by removing public pricing and putting
everyone through a sales funnel. But that concept only works if 1) you have a
great product and 2) you have a great sales team, capable of making my time to
failure in the conversion process _fast and painless._ Every company thinks
they have this, but they almost never do. I really don't think you want to
optimize your business for keeping stamp enthusiasts happy.

------
startupdiscuss
The problem with opaque pricing is this: people don't want to start
experimenting with something if it could be infinitely bad. i.e. if they can
imagine the worst.

In the back of their heads, some people imagine the service is going to be
huge, and then they worry that all the profits will be paid out to wrapapi.

Better to have a high headline number and then offer discounts for certain
uses (non-profit, open source, students, etc). People are optimistic about how
much money they might make so a high headline future price for when you
graduate from the free tier is not necessarily bad.

------
throwaway8123
Good luck--- I built a service that did very 'light' scraping and it was
forced to shut down. I imagine your days are numbered.

------
RandomBookmarks
Interesting... how does this compare to
[https://a9t9.com/kantu/scraping](https://a9t9.com/kantu/scraping) ?

WrapApi seems to tackle the same task (web scraping) from a very different
angle. I wonder if anyone has used both and can compare.

~~~
wrapapi
WrapAPI is meant to not only do scraping (reading information), but also to
(1) perform actions with side effects and (2) allow for complex chaining

Let's say you have a web-based inventory management system or CRM that
requires a login, but you want to take data a customer has sent you in a
spreadsheet and automatically batch enter it into the CRM, which doesn't have
that functionality. You could then:

1\. Create an API endpoint that allows you to log into that system and return
a state token

2\. Create a second API endpoint that parametrizes the inputs of the form to
create a new inventory entry

3\. Chain those 2 API endpoints together so that the 2 actions are actually
combined into one API call
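
The three steps above can be sketched in Python against an in-memory stand-in. FakeCRM, its methods, the credentials, and batch_import are all hypothetical names invented for this illustration; they are not WrapAPI's API or any real CRM's.

```python
class FakeCRM:
    """In-memory stand-in for a web system that requires a login."""

    def __init__(self):
        self.sessions = set()
        self.inventory = []

    def login(self, username, password):
        # Step 1: log in and return a state token.
        if password != "secret":
            raise PermissionError("bad credentials")
        token = f"session-{username}"
        self.sessions.add(token)
        return token

    def create_entry(self, state_token, name, quantity):
        # Step 2: the form's inputs become parameters of the endpoint.
        if state_token not in self.sessions:
            raise PermissionError("not logged in")
        entry = {"name": name, "quantity": quantity}
        self.inventory.append(entry)
        return entry


def batch_import(crm, username, password, rows):
    """Step 3: chain login and entry creation into a single call."""
    token = crm.login(username, password)
    return [crm.create_entry(token, **row) for row in rows]


# Rows as they might arrive from a customer's spreadsheet (made-up data).
rows = [
    {"name": "widget", "quantity": 10},
    {"name": "gadget", "quantity": 3},
]
crm = FakeCRM()
created = batch_import(crm, "alice", "secret", rows)
print(created)
```

Logging in once and reusing the token for every row keeps the batch to a single session.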

Our focus is not only on getting data, but automating the many things that you
or your company does with websites to save time.

~~~
egfx
Man, what happened to Yahoo pipes? A tool that did all this so well.

------
superasn
This is really neat. But I think you guys are losing a lot of potential
customers by not having clear-cut pricing on your site.

I've used similar services like parsehub.com in the past and if they didn't
have a pricing page I would have never tried it. Just my 2 cents.

------
krmmalik
I'm asking since the term API is mentioned. Is this designed for technical or
non-technical people? I'm a non-coder but really could do with the scraper, so
would this work for me?

~~~
wrapapi
This is designed for at least semi-technical people, but it's really not that
hard to give it a try for simple sites. Try watching the video, and shoot me
an email at peter@wrapapi.com if you run into any issues!

------
notwhoyouthink
Glitchy on Chrome 57. When I load a page to build, I'm unable to scroll more
than 1/4 of the page then it jumps back to the top.

~~~
wrapapi
Let us know what page you're trying at peter@wrapapi.com, and I'll take a look
and fix it ASAP.

------
profalseidol
How does it do when websites don't have a good standard in their HTML?

You are using xpath here right?

------
danvoell
How does this compare to Kimono?

~~~
nicoboo
Kimono shut down on February 29th, 2016 and the cloud service has been
discontinued. It only exists as a desktop app now.

Bought by Palantir, they retired it gracefully, keeping people's data
available for a while and communicating well.

It was a great product, but it was still complicated to find a practical
business model.

This WrapAPI v2 is an alternative, I think, but I would use it with care, as
the business model is uncertain and it seems to be really new. Still promising!
:)

~~~
codezero
The desktop app + browser plugin still seem to work fine. I've run into a few
things that don't quite work well, like pages that have a combination of
click-to-paginate and auto-scroll-paginate, but in general, it's good.

------
krystiangw
This is awesome. I'll give it a shot in some of my next projects.

------
MUCHZER
Great! It really looks like a nice Portia fork

~~~
kasbah
Any evidence that it is in fact a fork of portia?

[https://github.com/scrapinghub/portia](https://github.com/scrapinghub/portia)

------
Jayakumark
The UI looks more like import.io's

------
graphememes
I've never had one of these work properly, better to build your own using some
language and an html / text parser

------
matz1
Using the CSS selector can easily cause the page to become unresponsive.

~~~
hsource
Can you let us know which page and selector you're trying? I can debug it

~~~
devin
Try typing in "span a" as a CSS selector for the HN homepage.

------
iambpentameter
Does this violate any current US law?

~~~
cookiecaper
The software itself probably wouldn't, but the use of it for anything anyone
cares about probably would. The CFAA, etc., make unwanted scraping illegal and
this has been tested repeatedly in court.

The company that runs this software as a service needs to be very careful.
3Taps was similar and got destroyed for relaying data scraped from Craigslist.

Contacting the server after its operator has expressed its wish for you to
stop is a violation of the CFAA (in that you are "exceeding authorized access"
and/or gaining "unauthorized access" to a protected computer system). If it's
found that the site's ToS is binding upon you, which it typically would be,
you don't really even need separate notice to be held liable.

Storing a copy of a web page in RAM creates a copy that is eligible for
copyright protection, and it is likely that any implied license to read that
page will be invalidated by the access revocation.

IANAL.

~~~
wolco
Another court stated that copying data into a ram buffer for under 1.2 seconds
was allowed. Depending on how they structure this it might be legally allowed.

[https://books.google.ca/books?id=a-yu2-JUQNAC&pg=PT249&lpg=P...](https://books.google.ca/books?id=a-yu2-JUQNAC&pg=PT249&lpg=PT249&dq=copyright+ram&source=bl&ots=5UVZjSNmlF&sig=DQ1k2TAEknpFp701zrq3PWkcChM&hl=en&sa=X&ved=0ahUKEwiQn_KJ2rbTAhXJxYMKHcZfCWo4ChDoAQhzMA0#v=onepage&q=copyright%20ram&f=false)

~~~
cookiecaper
Your link doesn't work for me, but I was able to find the case you're
referring to:
[https://en.wikipedia.org/wiki/Cartoon_Network,_LP_v._CSC_Hol...](https://en.wikipedia.org/wiki/Cartoon_Network,_LP_v._CSC_Holdings,_Inc).

Thanks for that! Like I said, I'm not a lawyer and I'm sure there are other
gaps in my case knowledge. It's certainly positive to see the Second Circuit
recognizing that there is some need to consider the transient nature of RAM
copies before ruling them infringing.

The ruling suggests that _MAI v. Peak_ did not address the transitory argument
merely because it was not raised by the litigants, and that the precedent set
there (which wouldn't have necessarily been binding anyway) is therefore not
abrogated by ruling that some RAM copies are transient enough to fail to
qualify.

Importantly, the durations listed here describe the runtime of the content,
not the amount of time the data is held in the RAM. It is said that the system
would buffer 0.1 seconds (100ms) of content at one point and 1.2 seconds of
content at another point.

The Court does not seem to establish "1.2 seconds" as a general benchmark for
RAM transience, but rather it suggests that transience should be considered on
a case-by-case basis, per the language of the statute.

However, the general rule of thumb is that if a copy exists long enough to
derive any value from it, it is non-transient. Guidance from the Copyright
Office [0] reads:

>[...] _we believe that Congress intended the copyright owner’s exclusive
right to extend to all reproductions from which economic value can be derived.
The economic value derived from a reproduction lies in the ability to copy,
perceive or communicate it. Unless a reproduction manifests itself so
fleetingly that it cannot be copied, perceived or communicated, the making of
that copy should fall within the scope of the copyright owner’s exclusive
rights. The dividing line, then, can be drawn between reproductions that exist
for a sufficient period of time to be capable of being "perceived, reproduced,
or otherwise communicated" and those that do not. As a practical matter, as
discussed above, this would cover the temporary copies that are made in RAM in
the course of using works on computers and computer networks._

and scrapers have been held liable for copyright infringement via RAM copies
on multiple occasions. Ticketmaster v. RMG states:

>[...] _copies of ticketmaster.com webpages automatically stored on a
viewer's computer are “copies” within the meaning of the Copyright Act._

despite the fact that they likely would've been held for a much shorter time
than either 100ms or 1.2 seconds.

Notably, this was before the case referenced above, but it's typical of later
cases, and it succinctly demonstrates that courts are likely to find RAM
copies of an entire work (the web page) more likely to be of non-transitory
nature than snippets of ~ 1/1500th of an entire work, regardless of how long
they're stored in RAM.

[0; PDF]
[https://www.copyright.gov/reports/studies/dmca/sec-104-repor...](https://www.copyright.gov/reports/studies/dmca/sec-104-report-vol-1.pdf)

