This is common and just means that the rsync server your warrior tried to upload to was too busy. It'll retry and try another upload host if you leave it to do its thing.
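For anyone curious what "retry and try another upload host" could look like, here is a rough sketch. It is not the warrior's actual code: `upload_with_retry`, `try_upload`, the host names, and the attempt limit are all made up for illustration; only the 60-second delay comes from the log message in this thread.

```python
import time
from typing import Callable, Optional, Sequence


def upload_with_retry(
    hosts: Sequence[str],
    try_upload: Callable[[str], bool],
    delay_seconds: float = 60.0,
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Rotate through upload hosts, pausing after each failed attempt.

    Hypothetical helper: returns the host that accepted the upload,
    or None if every attempt failed.
    """
    if not hosts:
        raise ValueError("at least one upload host is required")
    for attempt in range(max_attempts):
        # Pick the next host in round-robin order.
        host = hosts[attempt % len(hosts)]
        if try_upload(host):
            return host
        # Only wait if another attempt is still coming.
        if attempt < max_attempts - 1:
            print(f"Retrying after {delay_seconds:g} seconds...")
            sleep(delay_seconds)
    return None
```

The `sleep` parameter is there only so the delay can be swapped out when trying the function without actually waiting a minute per attempt.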

Got my threads stuck on that as well. I've now increased concurrency to the maximum of 6 so that those 60-second delays don't choke it completely, or at least choke it less. It's working reasonably well so far: 5 threads waiting for their uploads to complete, 1 still going. (I can't imagine that 1 thread working continuously would remain unbanned 24/7 anyway.)

It also happens every time it retries, so it spends most of the time doing nothing.

    Retrying after 60 seconds...
Is that really normal?

The downloads from reddit work fine, but if the upload doesn't work then I don't see the point of running this.

Yes, it is really normal when lots of people try to upload at the same time. Bandwidth is limited, so when lots of people start to run the warrior, the servers need some space to do their thing. Also, IA has limited bandwidth, so sometimes that's the bottleneck too.

If you give it time, it'll work eventually. Up the concurrency to max so you can have more items in the upload state. As long as you don't start hitting rate limits from reddit, it'll be fine.
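A back-of-the-envelope way to see why more concurrency helps: if each worker alternates between downloading an item and waiting on its upload, the average number of workers downloading at any moment is roughly the worker count times the share of each cycle spent downloading. The sketch below assumes that simple steady-state model, and the 20-second and 100-second figures are invented, not measured.

```python
def average_busy_workers(
    workers: int, download_seconds: float, upload_wait_seconds: float
) -> float:
    """Estimate how many workers are downloading at once, on average.

    Assumes every worker repeats the same cycle (download, then wait
    for the upload) independently of the others. Hypothetical model.
    """
    cycle_seconds = download_seconds + upload_wait_seconds
    return workers * download_seconds / cycle_seconds


# Invented numbers: 20 s to download an item, 100 s stuck in upload retries.
print(average_busy_workers(6, 20, 100))  # 6 workers -> 1.0 downloading
print(average_busy_workers(2, 20, 100))  # 2 workers -> about 0.33 downloading
```

With those made-up numbers, 6 workers average out to about 1 actively downloading while the other 5 wait on uploads, which is the kind of picture reported above.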

The point of having many people run it is to maximise the number of different IP addresses scraping the data.

Even if you are only using a small percentage of your available bandwidth you are still helping out by running this.

If they attempted to max out the download bandwidth of all clients they’d only end up getting everyone IP banned by Reddit, and then the scraping would not be successful.

So even if most of the time is spent waiting to upload the scraped data, it’s still good.

Slow and steady.

Please allow it time to catch up. It always does.

