I've got heaps of experience with both, and I cringe every time I have to touch the phantomjs API.
It feels like a half-assed imitation of Node's, and for the most part it isn't even internally consistent. For example, you can render to a file, or to a Base64 string. But heavens no, you can't render to stdout -- the output format is decided by the file name's extension, so /dev/stdout is out of the question and the only workaround is making a pointless symlink. And that's to say nothing of the showstopper bugs where it's literally impossible to exit() the process from inside a script in certain cases.
Don't get me wrong; phantom is awesome and it's great at what it does. But it's not "a dream compared to Node.js".
Or you could write a script that scrapes the site and pulls the content out automatically. That will probably save you time right off the bat. And if the client realizes that they want another piece of information pulled from each page, you just make a minor tweak to your script and rerun it.
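The shape of such a script is simple. A sketch against hypothetical markup (a real job would fetch the live page and use a proper HTML parser, but the structure is the same):

```javascript
// Sample of the page's markup (hypothetical).
const html = `
  <div class="product"><h2>Widget</h2><span class="price">$9</span></div>
  <div class="product"><h2>Gadget</h2><span class="price">$12</span></div>
`;

// Pull out every product name. If the client later wants prices too,
// it's a one-line tweak to the pattern and a rerun.
const names = [...html.matchAll(/<h2>([^<]+)<\/h2>/g)].map(m => m[1]);

console.log(names); // → [ 'Widget', 'Gadget' ]
```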
Once you remove the requirement for remote, unauthorized access, every data transformation process becomes "scraping".
That's not a "problem"; you shouldn't be using WebKit to download files in the first place.