Worked successfully in Windows CMD for me, without using the \bin shell script:
C:\path-to-py27 reader_archive\reader_archive.py --output-directory C:\mystuff
I know next to nothing about the Windows command line or Python. I tried to apply your method, but "path-to-py27" is not recognised.
Does it mean that I should put the path to the Python 2.7 program directory? In what form exactly? (On my computer, it is at C:\Python27.)
Or does it just mean the path to the directory where the app is already?
I tried without C:\path-to-py27, just typing "reader_archive\reader_archive.py --output-directory C:\mystuff", and got the following response:
Traceback (most recent call last):
  File "C:\mihaip-readerisdead\reader_archive\reader_archive.py", line 12, in <module>
ImportError: No module named base.api
set PYTHON_HOME=C:\mihaip-readerisdead
After that, you should be able to just run "reader_archive\reader_archive.py --output-directory C:\mystuff"
(and you should use PYTHONPATH in this case)
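For anyone who would rather not set the variable by hand, here is a rough sketch of the same idea as a small Python launcher. It only assumes what is mentioned in this thread: the checkout contains reader_archive\reader_archive.py, and the script accepts --output-directory. The helper names (build_env, build_command, run_archive) and the example paths are made up for illustration and are not part of the tool.

```python
import os
import subprocess


def build_env(repo_root, base_env=None):
    """Return a copy of the environment with repo_root prepended to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH")
    # Prepend so the checkout's "base" package is found before anything else.
    env["PYTHONPATH"] = repo_root if not existing else repo_root + os.pathsep + existing
    return env


def build_command(python_exe, repo_root, output_dir):
    """Build the argument list for running reader_archive.py explicitly with python_exe."""
    script = os.path.join(repo_root, "reader_archive", "reader_archive.py")
    return [python_exe, script, "--output-directory", output_dir]


def run_archive(python_exe, repo_root, output_dir):
    """Run the archiver with PYTHONPATH set only for the child process."""
    return subprocess.call(
        build_command(python_exe, repo_root, output_dir),
        env=build_env(repo_root),
    )


# Example call (paths are assumptions; adjust them to your own machine):
# run_archive(r"C:\Python27\python.exe", r"C:\mihaip-readerisdead", r"C:\mystuff")
```

Because the variable is set only for the child process, nothing is changed globally on the machine.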
I didn't read the instructions too well, so the half hour I spent carefully deleting gigantic/uninteresting feeds out of my subscriptions.xml file was all for naught. Because I didn't know I needed to specify the opml_file on the command line, the script just logged into my Reader account (i.e., it walked me through the browser-based authorization process) and downloaded my subscriptions from there -- including all the gigantic/uninteresting subscriptions that I did NOT care to download.
So now I've gone and downloaded 2,592,159 items, consuming 13 GB of space.
I'm NOT complaining -- I actually think it's AWESOME that this is possible -- but if you don't want to download millions of items, be sure to read the instructions and use the opml_file directive.
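If hand-editing subscriptions.xml sounds tedious, the pruning can be scripted. This is only a sketch, assuming the file is ordinary OPML where each feed is an outline element with an xmlUrl attribute; prune_opml and the example URL are hypothetical names, not anything the tool provides.

```python
import xml.etree.ElementTree as ET


def prune_opml(opml_text, unwanted_urls):
    """Return the OPML text with every feed whose xmlUrl is in unwanted_urls removed."""
    root = ET.fromstring(opml_text)
    # Snapshot the elements first so removing children does not disturb the walk.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "outline" and child.get("xmlUrl") in unwanted_urls:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


# Example use (file names and URL are placeholders):
# with open("subscriptions.xml", encoding="utf-8") as f:
#     pruned = prune_opml(f.read(), {"http://example.com/huge-feed.xml"})
# with open("subscriptions-pruned.xml", "w", encoding="utf-8") as f:
#     f.write(pruned)
```

The pruned file still has to be handed to the script through the opml_file option; otherwise it falls back to the subscription list stored in the Reader account, as described above.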
My only gripe would be the tool's inability to continue after a partial run, but since I won't be using this more than once that's probably OK.
All web services should have a handy CLI extraction tool, preferably one that can be run from a cron job. On that note, I'm very happy with gm_vault, as well.
Edit: getting a lot of XML parse errors, by the way.
If the XML parse errors are listing any item IDs, feel free to email them to me (mihai at persistent dot info) and I'll see if there's any workaround from my side.
Edit: If it's "XML parse error when fetching items, retrying with high-fidelity turned off" messages that you're seeing, then those are harmless (assuming no follow-up exceptions). The retry must have succeeded.
The tool uses the "high-fidelity" Atom output mode for getting at item bodies. That preserves namespaced XML elements and other extra data from the feed. It uses JSON for everything else, and will fall back to regular Atom output if the high-fidelity output is not well-formed (the mode was added in late 2010, as things were winding down, and thus never got a lot of testing).
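To illustrate the fallback described above, here is a minimal sketch of the pattern. It is not the tool's actual code: fetch_high_fidelity and fetch_regular are hypothetical callables standing in for the two kinds of Atom request, and each is assumed to return the response body as a string.

```python
import xml.etree.ElementTree as ET


def parse_with_fallback(fetch_high_fidelity, fetch_regular):
    """Try the high-fidelity Atom body first; fall back to regular Atom on a parse error.

    Returns (root_element, used_high_fidelity).
    """
    try:
        return ET.fromstring(fetch_high_fidelity()), True
    except ET.ParseError:
        # The high-fidelity output was not well-formed XML, so retry without it.
        return ET.fromstring(fetch_regular()), False
```

A "retrying" warning followed by no exception corresponds to the second branch succeeding, which is why those messages are harmless.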
Should we be concerned with errors like this?
[W 130629 03:11:54 api:254] Requested item id tag:google.com,2005:reader/item/afe90dad8acde78b (-5771066408489326709), but it was not found in the result
Is there some way to skip all the years of Explore and suggested items with reader_archive? I tried limiting the maximum number of items to 10,000, but it was still running and growing after 12 hours. Interesting, though, to see what it was able to accomplish in that time.
Thank you. mihaip, if you are ever in Houston I will buy you a beer and/or a steak dinner.
echo %pythonpath% gives c:\readerisdead
I copied 'base' from the readerisdead zipfile to c:\python27\lib & also copied the base folder into the same folder as reader_archive.py
C:\readerisdead\reader_archive\reader_archive.py --output-directory C:\googlereader gives "ImportError: No module named site"
What am I doing wrong? How can I get this to work?