The presentation is beautiful and the website is great, but the tech broke so I have no idea how or if this even works. This is a wonderful concept and one I've talked about doing with others. I was really excited to try this. I watched the demo video and it seemed straightforward.
I went to try and use it on the demo page it provides, going through and adding things, but when I went to save it, I just received an error that something went wrong. Well, crap. That was a waste of time. Oh well, maybe it's just me.
Alright, I'll give it another shot using the website they used in the demo. Opened up a Hacker News discussion page and started to give it a try. Immediately it was far less intelligent than the demo. Clicking on a title proceeded to select basically every link on the page. Somehow I clicked on some empty spots as well. Nothing was being intelligently selected like it was in the demo. Fine, that wasn't working tremendously well, but I wanted to at least see the final result.
Same thing: just got an error that something went wrong and it couldn't save my work.
Disappointing. I still might try it again when it works 'cause it's a great idea if they really pulled it off. So far: doesn't seem to be the case.
Sorry you had a bad first experience. We've tested this on a lot of sites and it works stably across a lot of different cases, but we haven't solved every case yet. Thanks for letting us know about the discussion page. We'll look into the bugs right now.
Having written a whole lot of crawling code throughout the years, I can totally understand how monumental of a task this is. This really does look like a cool product. Glad you're actively hunting down ways to improve the demo before asking for money :)
Feel free to drop me a line if you want any specifics about the troubles I had.
hacker news has probably been scraped too many times. that's why it's not working. it didn't work for me, but other sites worked easily.
I will comment that the user experience for first adding elements is not what I expected. Once you understand it, it's great overall, but I'd have some popup tooltip-type guide the first time a user uses it, explaining that you should choose one element first and then check a matching element. Then explain the number bubbles next to the property field; you don't get what those numbers are at first. Then, for the property field, somehow make it more obvious that you should fill it out.
Basically you have the interface set up so any action can be done at any time, but it should be presented to the end user as a forced sequential set of steps, at least at first.
HN pages are possibly the worst case, very hard to infer structure from due to its 1998 coding standards. You'll have a better chance with an alternative interface like http://ihackernews.com/ or http://hckrnews.com (no comments though).
I really wish that HN wasn't even in the running for a "worst case". For a community that seems to be all about UX and innovation, shouldn't it run on at least a marginally user-friendly piece of software with this-century markup?
I get that it has a sort of kitschy or retro appeal, but it's just basically a pain to use and looks terrible.
I can't tell you how often I click next page to find that I've taken too long and my session or whatever has expired.
No. This site isn't about UX or innovation, it's about tech start ups. It's a constant reminder that something can be successful even if it was written in a LISP dialect and has a bunch of UX misses as long as the core idea is valid and the product usable enough.
I've used a bunch of HN skins that were supposedly better designed, but none of them stuck. Apparently it's just plain unnecessary for HN to be better.
The website that lets technologists hang out with people who might give them large sums of money to see their ideas succeed doesn't have to be good, or pretty.
If the website were better, or prettier, that would not add any additional value to the previously mentioned large sums of money.
(Oh, and advice, and being able to chat with industry leaders and experts on diverse arrays of topics, etc.)
I like HN because it gives me exactly what I want: a list of interesting links, ranked by what people I trust more than news sites think of them. Oh, and it offers a comment section for each of them. HN is wonderful for link discovery during compile time or when you need a break.
If "improve" would involve turning it into AJAX-heavy app with images, CSS effects and some weird-ass scrolling interceptors, that requires me to load 200+ files for seeing the main page, I would rather live without any improvements.
I think better terms might be "clean up" or "restructure".
Kimono can handle several pages with malformed and old/bad HTML. We're still in beta though, so we're handling more edge cases as we encounter them. Should work on HN main page.
Have you thought of implementing a system for custom handling of edge cases? Realistically you can't handle all edge cases, so leaving the last 2% up to the user (who is a developer) would be a good idea. It'd still save him 98% of the work, but also give him the comfort of knowing it won't break even if site X does something weird with its HTML in the future.
I'll leave it to you to figure out how one would implement such a feature ;-)
OK, so you're saying that instead of using a scraper to deal with the malformed data out there (the whole reason for its existence), we should instead use a format that is better suited for machine representation? That's like saying 'yeah, I've got this car here to take you to places that are very far away, too far to walk; except it doesn't work very well if you want to go far away, so you're better off just staying at home. Or walking a long time.'
Scrapers exist to take data out of HTML, malformed or not, when an API or feed is not available. Yes, ideally HTML is written in a semantic, annotated machine-parseable way; that also enables smarter search engines, better accessibility, interoperability and so on. That's one of the main reasons behind the changes to the standard made in HTML5.
A better example would be "hey I have this cool device that will improve your mileage by 100%, but it only works on cars built according to current emission standards...".
Kimono can handle many pages with malformed/old HTML. It's still in beta, though, and there are still pages that break it, so we're improving it as we go with the help of early adopters like everyone on this thread. Our goal, of course, is an ideal state where it works everywhere perfectly :)
I'd argue that PG probably thinks this (no scraping) is a good thing, and for good reasons.
Who is paying the bandwidth bills in the end, and for whom, you might ask. And whether those users will end up contributing to HN in any intellectual way, or are just scraping content to spin into their ad-filled auto-blogs.
HN pages are not too hard: each story is a <tr>, the title is in 'tr .title' and the link is in 'tr .title a'.
There are some irregularities (e.g. YC announcements without a score, the "More" button, self posts) but it's not really structurally complicated (compare reddit, which has 3 "score" fields per link).
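For what it's worth, pulling those out takes only a few lines with requests and Beautiful Soup. A rough sketch using the selectors above (treat the exact class names as an assumption; HN's markup has changed over the years):

    # Sketch: list front-page titles and links using the 'tr .title a' selector.
    # Assumes the markup described above; HN's HTML may differ today.
    import requests
    from bs4 import BeautifulSoup

    html = requests.get("https://news.ycombinator.com/", timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    for link in soup.select("tr .title a"):
        title = link.get_text(strip=True)
        # Skip the "More" pagination link and any empty anchors.
        if title and "morelink" not in (link.get("class") or []):
            print(title, "->", link.get("href"))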
In my case, it worked for some pages and not for others. Currently I'm using Feedity: http://feedity.com for all business-centric data extraction and it has been working great (although not as flexible as kimono).
It worked great for me (http://www.kimonolabs.com/kimonoapp/aws-status-check). It would be helpful if you shared what you tried to do when it failed. I don't work for or with Kimono, but am curious what does and does not work.
The Simile group at MIT did something similar back around 2006. Automatic identification of collections in web pages (repeated structures), detection of fields by doing tree comparisons between the repeated structures, and fetching of subsequent pages.
The software is abandoned, but their algorithms are described in a paper:
Oh, hey, memories. I worked one summer with David Huynh (who you're linking to there) and David Karger (his thesis advisor) on one of the Simile projects.
I vaguely remember playing around with this tool you mentioned. I thiiiiink it was this one[0], although it seems to be superseded by this one[1] now.
Just had to chime in and say that David Huynh and his fellow programmers will be forever heroes to me and a small group of data journalists who depended on Gridworks/Google Refine/OpenRefine.
If you're interested in hosted solutions that try to do automatic identification of pages, diffbot is worth a look. We've had some good experiences: http://diffbot.com/
Show me it working with authentication and you will have a customer. Scraping is always something you need to write because the shit you want to get is only shown when you are logged in.
how are you going to do it without having to know the actual authentication key(s)? if i don't trust anyone enough to give my auth away, then unless the site being scraped has some sort of oauth support, how are you going to get any data?
of course, if this was an offline or self-hosted product, it would solve that auth problem instantly.
Would there be any way to fake the beginning of an OAuth session with Facebook, Google or any other OAuth-authenticated site? Kind of like replaying cookies to hijack sessions?
Proxying the web page makes it very difficult to do actual authentication on Facebook's or Google's website via the proxied page without first rewriting most of the JavaScript and hijacking their Ajax calls on the fly.
The approach I took was to hijack the cookies from the browser once the user has signed in on e.g. Facebook, via the browser extension.
Proxying the website does, in fact, do away with the need to install any external third-party libraries.
The browser extension I built, coupled with the web service it's integrated with, does allow scraping of logged-in pages from Facebook, Google and LinkedIn as well.
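For anyone wondering what the cookie approach looks like from the scraping side, the gist is reusing the session cookie captured from your own logged-in browser session. A minimal sketch (the cookie name, value and URLs below are placeholders, not any real site's):

    # Sketch: reuse a session cookie captured from a logged-in browser session
    # to fetch a page that normally requires authentication.
    # The cookie name/value and URLs are placeholders for illustration.
    import requests

    session = requests.Session()
    session.cookies.set("sessionid", "value-copied-from-your-browser",
                        domain="example.com")

    resp = session.get("https://example.com/private/dashboard", timeout=10)
    resp.raise_for_status()
    print(resp.text[:500])  # logged-in HTML, ready to hand to a parser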
Hah, I've been working on this recently with Facebook, on a TV set-top-box. It was painful and I ended up giving up. xd_arbiter.php is the key, I think.
Creator of Automately here; our service could definitely be something you'd be interested in. While we aren't directly in the business of web scraping, we do have a powerful automation service that can accomplish those needs using simple JavaScript and our scalable automation API.
We are accepting early access requests right now.
Check us out! http://automate.ly/
I've written more web scraping code than I care to admit. A lot of the apps that ran on chumby devices used scraping to get their data (usually(!) with the consent of the website being scraped) since the device wasn't capable of rendering html (it eventually did get a port of Qt/WebKit, but that was right before it died and it wasn't well integrated with the rest of the chumby app ecosystem).
This service looks great, good work! But since you seem to host the APIs that get created, how do you plan to get around centralized-access issues? On the chumby we had to do a lot of web scraping on the device itself (even though the string-processing operations needed for scraping required a lot of hoop-jumping optimization to run well in ActionScript 2 on a slow ARMv5 chip with 64MB of total RAM) to avoid all the requests coming from the same set of chumby-server IP addresses. Companies tend to notice lots of requests coming from the same server block really quickly and will often rate-limit the hell out of you, which could result in a situation where one heavy-usage scraper destroys access for every other client trying to scrape from that same source.
Access, legality and rate-limiting issues come up a lot. We're working on a couple of things to address them. The first is an intelligent job-distribution system that consolidates scrapes across users and hits sites (and pages) at human-like intervals. The second is a portal for webmasters that gives them privileged access to analytics on data being extracted from their sites, plus the ability to turn kimono APIs on or off if they see fit. This way, via kimono, a webmaster at chumby could "provision" certain kimono users. We're still yet to see whether the latter works out. Thanks for the input.
I'm curious how you plan to avoid/circumvent the inevitable hard IP ban that the largest (and most sought after targets) will place on you and your services once you begin to take off?
I could have really used a service like this just yesterday actually, I ended up fiddling around with iMacros and got about 80% of what I was trying to achieve.
It's a great question. What we're really trying to do is make data accessible programmatically and at scale. We want to connect data providers and data consumers with APIs in a way that's mutually beneficial, rather than being a tool for data theft. Our hope is (once we scale) to actually work with data providers directly on the distribution of their data, so the IP ban becomes a non-issue.
But isn't the point kind of to let the users come up with data providers themselves? If you say "only these 500 data providers are available for scraping", you don't have a business. If you don't have such a limitation, you won't be able to work directly with all data providers. You'll have IP problems.
Another feature, a simple one: allow adding filters to the data stream. For example: only posts that contain the word "bitcoin" in the name, or only those with 50 upvotes or more.
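Until something like that is built in, it's easy enough to filter client-side on whatever the API returns. A rough sketch, assuming a hypothetical result shape with "title" and "points" fields (not kimono's actual output format):

    # Sketch: filter scraped results by a keyword regex or an upvote threshold.
    # The result shape here is made up for illustration.
    import re

    results = [
        {"title": "Bitcoin hits a new high", "points": 120},
        {"title": "Show HN: my side project", "points": 12},
        {"title": "Why bitcoin mining is hard", "points": 47},
    ]

    pattern = re.compile(r"bitcoin", re.IGNORECASE)
    filtered = [r for r in results
                if pattern.search(r["title"]) or r["points"] >= 50]

    for r in filtered:
        print(r["points"], r["title"])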
Thanks. We support regex matching now. Try dragging to select text; if there's a relevant regex pattern, kimono will find it (there's an example in the blog post). You can also preview (and soon, edit) the CSS and regex in advanced mode.
Thanks, glad you like it. Pagination, dynamic tabs (and crawling in general) are big features we really want to add soon; a lot of people are asking for them. The challenge will be integrating them with the current UX, which we're trying to keep super simple.
Constructive Tone: I figured that it might be nifty to scrape cedar pollen count information from a calendar and then shoot myself an email when it was higher than 100 gr/m3.
This would be a pretty difficult thing to grab when scraping normally, but the app errors before loading the content:
Thanks for letting us know. Just tried and am getting the same error. The page is loading content dynamically from another source... We'll look into this and see if we can get it working on this page
Great work so far. The tool was very intuitive and easy to use.
My suggestion: once I've defined an API, let me apply it to multiple targets that I supply to you programmatically.
The use case driving my suggestion: I'm an affiliate for a given eCommerce site. As an affiliate, I get a data feed of items available for sale on the site, but the feed only contains a limited amount of information. I'd like to make the data on my affiliate page richer with extra data that I scrape from a given product page that I get from the feed.
In this case, the page layout for all the various products for sale is exactly the same, but there are thousands of products.
So I'd like to be able to define my Kimono API once - let's call it the CompanyX.com Product Page API - then use the feed from my affiliate partner to generate a list of target URLs that I feed to Kimono.
Bonus points: the list of products changes all the time. New products are added, some go away, etc. I'd need to be able to add/remove target URLs from my Kimono API individually as well as adding them in bulk.
Thanks for listening. Great work, again. I can't wait to see where you go with this.
Thanks a ton for the feedback. Getting data from multiple similarly structured URLs programmatically is something we're working on now. We love hearing about the use cases you want to use this for so we can make sure we build out the right features to make kimono useful for you.
The people who scrape data at Scraperwiki -- which was made by the same people who opened up parliamentary transcripts in the UK for the first time, and the UN's proceedings, and data about how MPs in London vote -- generally don't have an option to buy anything because the data's hidden by governments from the people who paid for it, on purpose.
But by all means take this opportunity to dismiss all of us as freeloaders.
This is a great tool! In a past life we needed a web scraper to pull single game ticket prices from NBA, MLB, and NHL team pages (e.g. http://www.nba.com/warriors/tickets/single). We needed the data. But, when you factor in dynamic pricing and frequent page changes you are left with a real headache. I wish Kimono was around when we were working on that project.
I love how you can actually use their "web scraper for anyone" on the blog post. Very cool!
That UI made me go wow, this could be an awesome tool. Idea that pops into my mind is being able to grab data from those basic local sites run by councils, local news papers etc and putting it into a useful app.
How dedicated are you guys to making this work because I'd imagine there are quite a few technical hurdles in keeping a service like this working long term while not getting blocked by various sites?
Love your suggestion. We're committed to making kimono better and we're working on it all the time. We want to make sure it's a responsible scraper, so we want to work together with webmasters in cases where there might be blocking but the data is legal to share...
HTTPS is definitely a problem for proxy servers, unless the proxy server rewrites all the URLs in the HTML pages it loads, as well as all the URLs of the Ajax calls, to point back to the proxy server.
> Web scraping. It's something we all love to hate. You wish the data you needed to power your app, model or visualization was available via API. But, most of the time it's not. So, you decide to build a web scraper. You write a ton of code, employ a laundry list of libraries and techniques, all for something that's by definition unstable, has to be hosted somewhere, and needs to be maintained over time.
I disagree. Web scraping is mostly fun. You don't need "a ton of code" and "a laundry list of libraries", just something like Beautiful Soup and maybe XSLT.
The end of the statement is truer: it's not really a problem that your web scraper will have to be hosted somewhere, since the thing you're using it for also has to be hosted somewhere, but yes, it needs to be maintained and it will break if the source changes.
But I don't see how this solution could ever automatically evolve with the source without the original developer doing anything?
It would be great to automate this eventually. For now, we're trying to make it really easy to set up and rebuild the scraper. If it goes down, you'll see it in the status on your user dashboard. We're also implementing alerts, so you can opt to get an email notification if a scrape fails
I assume you get an error on the hourly, daily, monthly, whatever update which you are notified about. Then you can redo the semi-manual setup of the scraper.
Wow, this is looking good, I wish I had it available to me 6 months ago! Nice job :D
I don't know if it's just me or not, but it's not working for me in Firefox (OSX Mavericks 10.9.1 and Firefox v26). The X's and checkmarks aren't showing up next to the highlighted selections. Works fine in Safari.
I'm coming at things from a non-coder perspective and found it easy to use, and easy to export the data I collected into a usable format.
For my own enjoyment, I like to track and analyze Kickstarter project statistics. Options up until now have been either labor intensive (manually entering data into spreadsheets) or tech heavy (JSON queries, KickScraper, etc. pull too much data and my lack of coding expertise prevents me from paring it down/making it useful quickly and automagically), as Kickstarter lacks a public API. Sure, it is possible to access their internal API, or I could use KickScraper, but did I mention the thing about how I don't, as many of you say, "code"?
What I do understand is auto-updating .CSV files, and that's what I can get from Kimono. Looking forward to continued testing/messing about with Kimono!
To be fully usable for me, there are some features missing:
- It lacks manual editing/correcting possibilities: I've tried to create an API for http://akas.imdb.com/calendar/?region=us with "date", "movie", "year". Unfortunately, it failed to group the date (title) with the movies (list entries) and instead created two separate, unrelated collections (one for the dates, one for the movies).
- It lacks the ability to edit an API; the recommended way is to delete and recreate it.
Small bug report: there was a problem saving the API, or at least I was told saving failed - it nevertheless seems to be stored in my account.
Thanks for the feedback. We're working on a feature that will allow you to edit APIs you've created and also edit the selectors and regex (right now, in advanced mode, you can see them, but cannot edit). We're looking into your bug now...
To me, the favicon merely looks like a sumo wrestler’s head with a short ponytail and scowling/serious eyebrows. I can’t tell what NSFW thing you see it as.
Spitballing here, but it could be a POV dick's eye view.
I am really reaching however.
edit: Interesting downvotes. I don't mean to be rude, I'm just trying to see what someone could find offensive. (I don't see the problem with the favicon.)
Nice work, this is much better than I expected! Does it require Chrome? It doesn't seem to work in Safari for me. Also, does Kimono work for scraping multiple pages or anything that requires authentication?
7.0.1, the latest. I also don't have Flash installed, but it doesn't look like you're using Flash. The entire top bar doesn't show for me. Feel free to email me and I can send you screenshots.
I found the one click action for selecting an entire column of values as well as the UI/UX on the top column of the page to be very impressive. We were thinking of a nice clean way to represent that particular UI/UX flow in this browser extension we built as well. Will incorporate that in our next release.
I like how you've thought through the end to end use case: not just generating an API, but actually making it usable. I've done my fair share of web scraping and it's not an easy task to make accessible and reliable -- good luck!
It makes me wonder if there isn't a whole "API to web/mobile app with custom metadata" product in there somewhere. I can imagine a lot of folks starting to get into data analysis and pipelines having an easier time of it if they could just create a visual frontend in a few clicks.
Yes, we're excited about the possibilities of an end-to-end use case as well... in fact we were surprised when we found a lot of interest in front-end output layers on top of the APIs than just the APIs themselves. Would be curious to know what output features would be most valuable for you.
Well. Think about spreadsheets. Think about live spreadsheets powered directly by APIs. Bingo!
I'm doing a couple of data mining projects right now, and simply being able to query and look into the API outputs, as well as my local database, without building a custom frontend would've saved me a bunch of time. But I'm thinking more of the knowledge worker, or even the power user who wants to view their Fitbit, Up and Lark data all in the same dashboard. Can't help but think this already exists somewhere, though.
We all know there are a lot of existing tools that do the same things, but I've not met one with such a polished UX.
Kudos to the Kimono team, I'll definitely recommend your product.
Great request... it's on our feature shortlist. Definitely a feature we want to implement as soon as we can (after we tackle some basics like pagination and getting images)
Apart from that, here are some other items. You may have these on your list already, but you can count my vote to prioritize them.
1. Pagination
2. Image URLs
3. Focus on page types such as product pages, posts, etc. That way it's easy to go from content to content. Will help crawling too.
4. Link back to original page included in JSON
Finally, common sites/pages used by multiple users of your system should not count against the API count limit under pricing. You may want to charge against total calls, like Parse.
Cool concept. One concern I'd have about this type of tool is that when it encounters something it can't handle, I'm stuck. Writing your own scraper means that you can modify it when you need to. I think the ultimate solution would be something like Kimono with the ability to write snippets of custom javascript to pull out anything that it can't handle by default.
We're in the middle of implementing a more powerful developer version of the tool to handle the use cases you're talking about. The beginning of this is surfaced under the "advanced" tab in the data model view, where we show the selectors and regular expressions that are produced. We ultimately want to let you edit those to customize the extractor. From there it'd be super cool to implement the JavaScript snippet feature you suggested.
I'm normally a bit worried when a thread quickly fills up with praise, but this looks very nice.
It's something I have thought about, as I'm sure many people who have done any amount of scraping have, but never went forward and tried to implement. The landing page with video up top and in-line demo is a pretty slick presentation of the solution you came up with. Good job.
If it changes the format significantly, the scraper will break, so for now you'll have to use the tool to rebuild. You will see on your API status page that it's down. As for robots.txt, we do respect it... for now we're leaving that to the user, but we're trying to implement a proactive way of checking for disallows and stopping those scrapers from being built.
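For reference, a proactive disallow check is already cheap to do with Python's standard library; a minimal sketch (the user-agent string and URLs are just example values):

    # Sketch: check whether robots.txt disallows fetching a given URL.
    # The user-agent name and URLs are example values.
    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    url = "https://example.com/some/listing/page"
    if rp.can_fetch("example-scraper-bot", url):
        print("allowed to fetch", url)
    else:
        print("robots.txt disallows", url)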
At the moment, we rely on users to be responsible. We spell it out in the terms and FAQ. We've been in private beta, keeping usage very limited until today. We fully understand the seriousness of the issue as we scale. We're committed to becoming a responsible bot that respects robots.txt
I would say that how you're scraping differs from how, say, Google, a search engine, scrapes. I'm not sure there is a way in robots.txt to define rules for each kind of use? Extracting the data in a structured way and then allowing it to be displayed in full off-site is quite different from using the scraped data to link back into a website.
Feedback from webmasters is really helpful for us. We want to make sure we're making data available via API responsibly, so would love to hear your suggestions/ thoughts as we define a scalable solution.
The example doesn't seem to work right on Firefox. On Chrome, if I click "Character" in the table, it highlights the whole column and asks if I want to add the data in the column. On Firefox, clicking "Character" just highlights the word "Character" and that's it.
Yeah, we've been working on this for a while too... took a while to polish it a bit before we could put it out there. Will check out exfiltrate.org - looks cool!
Nice tool, slick UI. It worked for some pages and not for others. Currently I'm using Feedity: http://feedity.com for all business-centric data extraction and it has been working great (although not as flexible as kimono).
Nice job! I really liked it; it's a fantastic idea! And your UX is great! Just one thing I found when testing: I had some problems with non-ASCII characters when visiting Brazilian websites, such as this: www.folha.com.br.
Well done on the product & solving a clear need! This is extremely useful for hackathons/prototyping. I also loved the live demo in the blog post and you did a wonderful job with the design/layout/colorscheme of the site.
We don't support logging in yet, but it's a feature we're working on adding. Scripting will also be cool, but right now it's further down our feature queue.
I don't think this can beat the speed of a hand-tuned crawler. When I write crawlers, I skip page rendering and JavaScript execution when they aren't needed, which massively speeds up the crawling process.
There are definitely things that a custom-built scraper can do more efficiently than kimono, but our focus right now is making scraping accessible across a broad range of web sites.
Really cool idea and tool. Still need to test this out properly. Is it possible to scrape not just one page but a stack of them? For example, a product catalog of 1000 SKUs extending up to 50 pages.
Is such web scraping legally allowed? Since it is not done directly from our servers, if any legal action is taken by the scraped website, will it fall on kimonolabs or on the user?
really excited to see this. i've had the idea (and nearly this execution) in mind for years but no use or ambition to get it done.
given the pricing though i'm almost motivated to make my own. as a hosted service the fees make sense with the offerings. but not only would i rather host my own- it would be cheaper all around. would you consider adding a free or cheap self hosted option?
aside, i think there is a mislabel on the pricing page. i'm guessing the free plan should not have three times as many "apis" as the lite plan.
I thought it was intentional. Swedish chef style. I like it, but I'll need to go back and re-read to understand how I can use this on other pages than the homepage/demo page. I've nothing immediate to try it with right now. :)
Edit: I'll watch the video after work, probably will clear everything up for me.
This looks really useful, and I'm trying to figure out if I could use it on a project I'm working on, but hitting an issue. I sent a support message. Nice job!
Looks awesome, however I keep getting errors and 404s. Could this be an issue on my end (seems to be working for others) or just HN making the servers beg for mercy?
Wow looks amazing. I tried doing some queries on public directories, and it even supports parameter passing. Will be using this for some side projects.
JS-heavy sites can be tricky. We position it so it should execute after most of the on-page JS, so it handles a lot of cases. There are still sites that break it, though... We're trying to tackle these one by one right now, as we work to generalize a broader solution.
I'm not sure scraping itself is illegal, depending on what you're doing with the data. (Though it may be against a site's Terms of Use which may be binding. IANAL.)
I can tell you that on several occasions I've scraped commercial sites with the permission of the owner. They want me to have access to the data but don't have the time or ability to create a proper API.
Thanks... yes, public data from governments is a great use case. Often, apps built using scrapers will wind up driving up traffic/sales at the source site, so it's okay. We want to do responsible web scraping, so we will respect webmasters' robots.txt files to make sure it's legal.
I love the execution, but I also see inherent problems.
Robots.txt is just a convention to advise crawlers. I'm confident most sites explicitly state this is against their terms of service.
You will encounter terms along the lines of:
"Unauthorized uses of the Site also include, without limitation, those listed below. You agree not to do any of the following, unless otherwise previously authorized by us in writing:
Use any robot, spider, scraper, other automatic device, or manual process to monitor, copy, or keep a database copy of the content or any portion of the Site."
The law isn't entirely blind to conventions, though. They don't guarantee anything, but if a court understood that there exists a convention for saying "no robots, please", and the robot operator in question followed it, then a court could well look less favorably on the damages claims of a website operator who didn't make use of the widely known convention.
You've got a valid point. We want to eventually create a space that allows responsible scraping - so webmasters can have access to analytics on what's being scraped and can explicitly turn off kimono APIs for their domains if they see fit. We also think there are use cases for people who own their own data. Often, APIs will provide a way for companies to streamline their internal app development and figure out what to expose to the developer community before investing in an expensive API deployment.
Thanks for the suggestion. We're rolling out advanced mode soon, which will allow you to edit the CSS selectors and regex operating on the page's HTML to define the selected data elements.
As someone building a home-grown proprietary scraping engine: consider alternative locations of elements. Most sites use templating engines, so it's fairly reliable to find things in the same place, but more often than you might expect, things move around ever so slightly. Navigation is a fun one also. ;)
So sorry for missing this earlier. See our response in comments below: "At the moment, we rely on users to be responsible. We spell it out in the terms and FAQ. We've been in private beta, keeping usage very limited until today. We fully understand the seriousness of the issue as we scale. We're committed to becoming a responsible bot that respects robots.txt"
Not yet, but it's our #1 feature request, so we're working on it now. For now, you can make multiple APIs (one for each URL). If the URL takes query parameters though, you can re-use the same API and programmatically cycle through query parameters.
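Cycling through query parameters from the consumer side could look roughly like this (the endpoint URL and the "page" parameter below are placeholders for illustration, not kimono's actual API format):

    # Sketch: reuse one API definition across many query-parameter values.
    # The endpoint and parameter name are placeholders.
    import requests

    API_ENDPOINT = "https://example.com/api/my-scraped-page"  # placeholder

    all_items = []
    for page in range(1, 6):
        resp = requests.get(API_ENDPOINT, params={"page": page}, timeout=10)
        resp.raise_for_status()
        all_items.extend(resp.json().get("results", []))

    print(len(all_items), "items collected across 5 query values")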
Yes, in concept quite similar. We wanted to make something that you can use from within your browser as part of your natural workflow, without installing any other software. We also really wanted to figure out the right data associations intelligently based on user selections, rather than asking users to think through a data model up front.