Great work! I have some questions if you don't mind.
I kind of wonder why you chose XML; the page doesn't really elaborate on that. Wouldn't CSV be much smaller (XML is a pretty noisy format, though 7z can probably take care of that) while also being more searchable? You can search CSV with just grep, while with XML it gets a little more difficult, and CSV is also simpler to load into a database in case someone wanted to make a TPB web mirror or something.
EDIT: sorry if I seem annoying :) but I'd had something like this on my to-do list for a while, and you did my work for me, so I kind of wonder why you made different decisions.
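To illustrate the "simpler to load into a database" point, here is a minimal sketch using only Python's standard `csv` and `sqlite3` modules. The column layout (`id`, `title`, `size_bytes`) and the function name `load_torrents` are hypothetical; nobody has published a CSV schema for this dump.

```python
import csv
import io
import sqlite3

# Hypothetical CSV layout (id, title, size_bytes); the real dump may differ.
SAMPLE_CSV = 'id,title,size_bytes\n1,Example torrent,734003200\n2,"Title with, a comma",1048576\n'

def load_torrents(csv_text: str) -> sqlite3.Connection:
    """Load CSV rows into an in-memory SQLite table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE torrents (id INTEGER PRIMARY KEY, title TEXT, size_bytes INTEGER)"
    )
    # csv.DictReader handles quoted fields, so the comma inside row 2's title is fine.
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    rows = ((int(r["id"]), r["title"], int(r["size_bytes"])) for r in reader)
    conn.executemany("INSERT INTO torrents VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

if __name__ == "__main__":
    conn = load_torrents(SAMPLE_CSV)
    # Prints the id of the row whose title contains "comma".
    print(conn.execute("SELECT id FROM torrents WHERE title LIKE '%comma%'").fetchall())
```

One caveat on the grep side: grep treats each line as a record, so a quoted CSV field that contains a newline (such as a multi-line comment) gets split across lines.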
- I was saving the comments too, so right now a torrent element has comment elements as sub-elements; in CSV that would probably need to be two tables/files instead of one, which would get a little more complicated.
- I didn't want to think too much about escaping the newlines (which appear in the comments and infos) and the delimiters; right now I only escape < to &lt; and > to &gt;.
- It was easier to check whether the script is working correctly or not (probably the top reason :) ).
- I thought parsing XML would be easier than parsing some other format, since there are tools already available for that.
But if somebody really wants a CSV version, they can easily convert it from the XML...
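A conversion along those lines might look like the following sketch, which splits the nested comments into a second CSV table keyed by torrent id. The element names (`torrents`, `torrent`, `id`, `title`, `comment`, `user`, `text`) and the function `xml_to_csv` are assumptions for illustration, not the dump's actual schema. Python's `csv.writer` quotes any field containing the delimiter, a quote character or a newline, which covers the escaping worry from the list above.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical dump layout; the real element names may differ.
SAMPLE_XML = """<torrents>
  <torrent>
    <id>1</id>
    <title>Example &amp; co</title>
    <comment><user>alice</user><text>Works fine,
thanks</text></comment>
    <comment><user>bob</user><text>Seeding now</text></comment>
  </torrent>
  <torrent>
    <id>2</id>
    <title>Another one</title>
  </torrent>
</torrents>"""

def xml_to_csv(xml_text):
    """Split nested torrent/comment elements into two CSV tables.

    Returns (torrents_csv, comments_csv) as strings; each comment row
    points back to its parent torrent through the torrent_id column.
    """
    root = ET.fromstring(xml_text)
    torrents_out, comments_out = io.StringIO(), io.StringIO()
    torrents = csv.writer(torrents_out)
    comments = csv.writer(comments_out)
    torrents.writerow(["id", "title"])
    comments.writerow(["torrent_id", "user", "text"])
    for t in root.iter("torrent"):
        tid = t.findtext("id")
        torrents.writerow([tid, t.findtext("title")])
        for c in t.findall("comment"):
            # Fields with commas, quotes or newlines are quoted automatically.
            comments.writerow([tid, c.findtext("user"), c.findtext("text")])
    return torrents_out.getvalue(), comments_out.getvalue()

if __name__ == "__main__":
    torrents_csv, comments_csv = xml_to_csv(SAMPLE_XML)
    print(torrents_csv)
    print(comments_csv)
```

Reading the files back with `csv.reader` restores the multi-line comment text intact, which a naive split on commas and newlines would not.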
"I didn't want to think so much about escaping newlines (that are in the comments and infos) and the delimiters, right now I was only escaping < to < and > to >."
Then at the very least you need & -> &amp; too, and it has to be replaced before the others.
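A minimal sketch of the escaping, assuming the comment text goes into ordinary element content (not attributes). `escape_text` is a hypothetical helper name; the standard library's `xml.sax.saxutils.escape` performs the same three replacements and is used here only as a cross-check.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def escape_text(s: str) -> str:
    """Escape character data for XML element content.

    '&' must be replaced first: doing '<' first would turn 'a < b' into
    'a &lt; b', and a later '&' pass would then produce 'a &amp;lt; b'.
    """
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

if __name__ == "__main__":
    s = "a < b && c > d"
    escaped = escape_text(s)
    print(escaped)
    # Round trip: a real XML parser recovers the original text.
    print(ET.fromstring("<c>" + escaped + "</c>").text == s)
```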
Alternatively, wrap the whole comment in CDATA, but don't forget to replace every ]]> with ]]]]><![CDATA[> so the text can't close the CDATA section early. (There may not be any in there now, but there will be once people hear about you doing this...)
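The usual way to split a ']]>' that occurs inside the text is to end the current CDATA section after ']]' and start a new one beginning with '>'. A short sketch, with `wrap_cdata` as a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

def wrap_cdata(s: str) -> str:
    """Wrap text in a CDATA section, splitting any ']]>' inside it.

    Each ']]>' becomes ']]]]><![CDATA[>': the first section ends with ']]',
    the next one starts with '>', and the parser joins them back together.
    """
    return "<![CDATA[" + s.replace("]]>", "]]]]><![CDATA[>") + "]]>"

if __name__ == "__main__":
    s = "x ]]> y & <z>"
    wrapped = wrap_cdata(s)
    print(wrapped)
    # ElementTree merges the two CDATA sections into a single text value.
    print(ET.fromstring("<c>" + wrapped + "</c>").text == s)
```

Note that CDATA does not help with characters XML 1.0 forbids entirely (most ASCII control characters); those would still need to be stripped or encoded separately.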
I think it will only go down if the people keeping it running stop bothering.
They have stable network access covered via the Swedish Pirate Party. This party (which holds two of Sweden's twenty seats in the European Parliament) has, I think, won a lot of legitimacy in the media over the past two years or so by doing solid work.