
I'm not sure. There are a couple of existing services that do that already, and they give you an RSS feed you could then import here. But if it gets requested I'll look into whether it's doable within the constraints of this system - at least I remember Yahoo Pipes supported it, that's a plus.


>I remember yahoo pipes supported it, that's a plus.

Yup, pretty much why I used pipes to begin with. I think being able to scrape/manipulate/output data, while being able to keep it private, would be a fantastic service. Looks good so far!


I added a block to download a page (instead of a feed), another block for extracting content via a CSS selector or XPath, and a feed builder block to later combine those (though the extract block already creates a feed one could use as pipe output). If more is needed, please say so; I'm now convinced this fits the site well.
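
To make that a bit more concrete, a chain of those three blocks boils down to roughly this. Just a sketch using Nokogiri and the stdlib rss library; the URL and selector are placeholders, and the real blocks aren't necessarily implemented this way:

  require 'net/http'
  require 'nokogiri'
  require 'rss'

  # "Download" block: fetch a raw page instead of a feed
  html = Net::HTTP.get(URI('https://example.com/articles'))

  # "Extract" block: pull items out via XPath (CSS works too, e.g. doc.css('h2.title'))
  doc = Nokogiri::HTML(html)
  titles = doc.xpath('//h2[@class="title"]').map { |node| node.text.strip }

  # "Feed builder" block: combine the extracted items into an RSS feed
  feed = RSS::Maker.make('2.0') do |maker|
    maker.channel.title = 'Scraped articles'
    maker.channel.link = 'https://example.com/articles'
    maker.channel.description = 'Items extracted from a plain HTML page'
    titles.each do |title|
      maker.items.new_item do |item|
        item.title = title
        item.link = 'https://example.com/articles'
      end
    end
  end

  puts feed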


Yo, good stuff on the added features. I'm currently using Huginn to scrape data, use a portion of that data to format a POST request, and then combine the results of the POST request with the scraped data, finally outputting it as RSS. Maybe some features to consider: the ability to format GET/POST/PUT/DELETE requests, and the ability to correctly (in order) merge objects (I haven't had the chance to try out your merge block yet). The merge, in my opinion, would be the biggest consideration, as I have to use a custom agent to merge my Huginn events, and it's really a pain. Great start to the service, man, keep up the good work!
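
Conceptually the flow I'm after is something like this rough Ruby sketch; the endpoint, selectors and field names are all made up for illustration, not what Huginn or Pipes actually do:

  require 'net/http'
  require 'json'
  require 'nokogiri'

  # Scrape the source page
  html = Net::HTTP.get(URI('https://example.com/products'))
  scraped = Nokogiri::HTML(html).css('.product').map do |node|
    { name: node.at_css('.name')&.text&.strip, sku: node['data-sku'] }
  end

  # Format a POST request from a portion of the scraped data
  response = Net::HTTP.post(URI('https://api.example.com/prices'),
                            { skus: scraped.map { |p| p[:sku] } }.to_json,
                            'Content-Type' => 'application/json')
  prices = JSON.parse(response.body)  # assumed to be an array, one row per sku

  # Merge in order: the n-th response row is attached to the n-th scraped item,
  # and the merged items could then be turned into RSS at the end
  merged = scraped.zip(prices).map { |item, row| item.merge(price: row) }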

edit: I just tried the download agent on a site (http://www.plndr.com) and it's throwing parse errors, and clicking the [x] won't close the output box, but the red portion works.


Thanks :)

I now understand the issue with the red portion and the [x]. That will be fixed soon.

For the page, there was a bug with GET params; those killed the output inspector. I've fixed that now, so it should be able to fetch pages like http://www.plndr.com/product/browse?a=34714&catId=0&version=.... I was able to extract the product names from there in an example, using just a download block and an extract block selecting `.product-cell .product-title`. If you still have problems, would you please comment again, open a bug at https://github.com/pipes-digital/pipes/issues, or send me a mail? It's kind of crucial to iron out the kinks.
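
For reference, those two blocks roughly amount to this, assuming Nokogiri (the version parameter is cut off in the URL above, so it's left out here):

  require 'net/http'
  require 'nokogiri'

  uri = URI('http://www.plndr.com/product/browse')
  uri.query = URI.encode_www_form(a: 34714, catId: 0)  # the GET params that used to break the inspector
  doc = Nokogiri::HTML(Net::HTTP.get(uri))
  puts doc.css('.product-cell .product-title').map { |node| node.text.strip }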

The parse errors are annoying, but I haven't managed to silence them so far. The XML parser throws them regardless of try/catch, and I don't know why. They do get ignored later on, though: 'View output' should then show the plain HTML (instead of parsed and highlighted XML). That seemed to work fine so far (but might fail in a browser other than the ones I tested...).


Sure, I'd be happy to help. As for viewing the HTML, I'm not sure what software stack you're using, but maybe check out the riko[1] library, which was recommended in a parent comment.

[1] https://github.com/nerevu/riko


Pipes is a Ruby stack, with Sinatra at its core. But viewing the HTML is actually all client-side, meaning JavaScript. When the block output view gets more complicated I'll probably move it to the server.
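
If it does end up server side, the XML-or-raw-HTML fallback could look roughly like this minimal Sinatra sketch (the route and the storage helper are invented, Nokogiri assumed):

  require 'sinatra'
  require 'nokogiri'

  # Hypothetical stand-in for however block output is actually stored
  def fetch_block_output(block_id)
    File.read("outputs/#{block_id}.html")
  end

  get '/inspect/:block_id' do
    raw = fetch_block_output(params[:block_id])
    doc = Nokogiri::XML(raw)
    if doc.errors.empty?
      content_type :xml
      doc.to_xml   # parsed fine, show it as highlighted XML
    else
      content_type :text
      raw          # parse errors are ignored, fall back to the plain HTML
    end
  end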

For now I've added some code to detect the differently formatted parse errors in WebKit browsers; maybe that also catches yours?

But I'm not very happy with just showing the HTML (though it at least goes through a formatter) in that error case. In the long term the extract block should get a visual element picker for creating selectors; then this will matter less, but that's something for later.



