

Ask HN: Any open source alternative to element selection on a webpage within a page? - paraschopra

I know the title sounds confusing, but I wasn't able to put it with more clarity.

Anyhow here is what I am looking for: a (preferably open source) solution such as the one implemented in 'Select Content' step of http://www.dapper.net/dapp-factory.jsp

Dapper lets you select elements of a webpage inside the browser and do stuff with it. I was wondering how they do it and whether any of you have implemented similar functionality?
======
Scriptor
I know exactly what you're talking about and have done a similar thing in one
of my own projects. Basically, you need an iframe to contain the HTML, and you
fetch the actual page through your own server. If you load the page directly
through the iframe, it comes from a different origin, and the browser's
same-origin policy will stop your JavaScript from manipulating its contents.
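
To make the "fetch the page through the server" step concrete, here is a minimal sketch of a same-origin proxy. It is not the code from the project described above: it assumes Python 3's standard library (`http.server`, `urllib.request`) instead of Django and urllib2, and the `/proxy?url=...` route, `target_from_path`, `ProxyHandler`, and `serve` are hypothetical names chosen for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlsplit
from urllib.request import urlopen


def target_from_path(path):
    """Return the URL named in a request path like /proxy?url=..., or None."""
    query = parse_qs(urlsplit(path).query)
    target = query.get("url", [""])[0]
    # Only allow plain web URLs so the proxy cannot be pointed at local files.
    if urlsplit(target).scheme not in ("http", "https"):
        return None
    return target


class ProxyHandler(BaseHTTPRequestHandler):
    """Fetches the requested page and re-serves it from our own origin."""

    def do_GET(self):
        target = target_from_path(self.path)
        if target is None:
            self.send_error(400, "expected ?url=http(s)://...")
            return
        try:
            with urlopen(target, timeout=10) as upstream:
                body = upstream.read()
                content_type = upstream.headers.get("Content-Type", "text/html")
        except OSError:  # URLError and HTTPError are subclasses of OSError
            self.send_error(502, "upstream fetch failed")
            return
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the proxy on localhost; call this explicitly to start it."""
    HTTPServer(("127.0.0.1", port), ProxyHandler).serve_forever()
```

With `serve()` running, pointing the iframe's `src` at `/proxy?url=<encoded page URL>` on the same host means the framed document shares your page's origin, so script in the parent page can reach into it. An open proxy like this should not be exposed publicly without further restrictions.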

I've posted a Pastie of the Javascript code here: <http://pastie.org/369649>.
It does use jQuery though. The w variable points to the window object of the
external page that's loaded, so you can pass it around to other functions.
Note that 'pageframe' is the ID of the iframe element.

As for the server-side code, I used Django, with Python's urllib2 to load the
page and BeautifulSoup to turn relative URLs into absolute URLs, so that links
and images still resolve once the page is served from your own domain. If you
end up using BeautifulSoup, note that even though it's supposed to handle
invalid HTML, there are some pages that still won't work because Python's
HTMLParser library isn't robust enough.
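
The relative-to-absolute URL step can be sketched as follows. This is an illustration under stated assumptions, not the BeautifulSoup code described above: it swaps in Python 3's standard-library `html.parser` and `urllib.parse.urljoin` so that it runs without extra packages, and `AbsolutizingParser`, `absolutize`, and the `URL_ATTRS` set are hypothetical names. It only rewrites `href`, `src`, and `action`, and it shares the standard parser's limits on badly broken markup.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes assumed to hold URLs; a real page may need more (e.g. srcset).
URL_ATTRS = {"href", "src", "action"}


class AbsolutizingParser(HTMLParser):
    """Re-emits the parsed HTML with URL attributes resolved against base_url."""

    def __init__(self, base_url):
        # Keep character references as written so text is passed through as-is.
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.out = []

    def _render_tag(self, tag, attrs, closing=""):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # bare attribute such as "disabled"
                parts.append(name)
                continue
            if name in URL_ATTRS:
                value = urljoin(self.base_url, value)
            # The parser hands us unescaped values, so escape them again.
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        return "<" + " ".join(parts) + closing + ">"

    def handle_starttag(self, tag, attrs):
        self.out.append(self._render_tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._render_tag(tag, attrs, " /"))

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def absolutize(html, base_url):
    """Return html with relative href/src/action values made absolute."""
    parser = AbsolutizingParser(base_url)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `absolutize('<a href="/about">x</a>', "http://example.com/dir/page.html")` rewrites the link target to `http://example.com/about`. Tag and attribute names come back lowercased and attribute quoting is normalized, which is usually acceptable for a page that is only being displayed in an iframe.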

------
Fenn
You want Crowbar + Solvent: <http://simile.mit.edu/wiki/Crowbar> and
<http://simile.mit.edu/wiki/Solvent>

Both come from an open source MIT project and do something very close to what
you're trying to do.

Alternatively, you could ask the YC startup
<http://www.awesomehighlighter.com/> for advice; it does EXACTLY what you're
talking about.

------
euroclydon
Is something like this what you are talking about?
<http://www.crummy.com/software/BeautifulSoup/>

~~~
paraschopra
BeautifulSoup is for parsing webpages, so it could be used, but I think it
would be an incomplete solution. You would still have to write a wrapper to
display the parsed page.

