Show HN: A Python Script to Download Thousands of Wallpapers at Once

slenk · on April 23, 2018

Side question - are there any websites like wallhaven, but with fewer people and less anime? I'm thinking of the type of content Chromecast uses, or /r/TechnologyPorn.

xemoka · on April 23, 2018

Have you seen http://interfacelift.com/ ?

umeshgmrl · on April 23, 2018

What about https://unsplash.com/ ?

kspy · on April 23, 2018

I enjoy https://papers.co

shubb · on April 23, 2018

You might want to async the downloading part. It is often faster to download 5 images at once than to download them one after another.

In the choices section you use input("text") properly in one place but not in others. You use a couple of different ways of decoding codes; the dict way is nicer, but consistency is nice too. Also, I am not sure you handle bad input (default to all?)...

Personally, I'd pull the meat out of the loops in main() into functions - GetImageList() and GetImage(). The logic there is relatively complex, so it would be easier to read and spot errors in those bits of code in isolation.

ASalazarMX · on April 23, 2018

Unless you're in dire need of thousands of wallpapers (most of them you are going to delete anyway) it's better not to hammer the website. I'd even limit the download rate.
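A minimal sketch of what limiting the download rate could look like. Everything here is an assumption for illustration: `fetch` is a hypothetical callable that downloads one URL (it is not from the script under discussion), the one-second gap is arbitrary, and `sleep`/`clock` are parameters only so the function can be tried without real waiting.

```python
import time

def throttled_download(urls, fetch, min_interval=1.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Call fetch(url) for each URL, starting calls at least min_interval seconds apart."""
    results = []
    last_start = None
    for url in urls:
        if last_start is not None:
            # Wait out whatever is left of the interval since the previous request began
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(fetch(url))
    return results
```

Something like `throttled_download(image_urls, fetch=my_download_function, min_interval=2.0)` would then space requests two seconds apart, where both names are placeholders.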

ejdanderson · on April 24, 2018

Agreed - I looked at the code hoping for a good example of async in Python, as suggested by the “at once” in the title.

jfz · on April 23, 2018

Using i and j in loops makes code much less self-explanatory, especially when you could use "page_index" and "image_index" instead.

lou1306 · on April 23, 2018

Even better, replacing

    for i in range(len(imgid)):

and similar lines with:

    for i, img in enumerate(imgid):

would allow one to get rid of all these list accessors.
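A small before/after sketch of that change. The IDs in `imgid` and the filename pattern are made up; only the loop shape matters.

```python
imgid = ["1001", "1002", "1003"]  # hypothetical image IDs

# Index-based version: every use of the element needs imgid[i]
names_indexed = []
for i in range(len(imgid)):
    names_indexed.append(f"{i}-wallhaven-{imgid[i]}.jpg")

# enumerate version: the element arrives as a named variable
names_enumerated = []
for i, img in enumerate(imgid):
    names_enumerated.append(f"{i}-wallhaven-{img}.jpg")
```

Both loops build the same list; if the index is not needed at all, a plain `for img in imgid:` is simpler still.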

myroon5 · on April 23, 2018

It looks like this downloads them one-by-one instead of downloading at the same time.

Is there a simple Python equivalent to Java's .parallelStream().forEach() that would allow these calls to easily be run in parallel?

mynewtb · on April 23, 2018

Sure thing...

    from multiprocessing import Pool
    with Pool(8) as p:
        p.map(function, sequence)

myroon5 · on April 23, 2018

Neat.

I see that you explicitly specify 8 as the number of processes here. In Java, parallelStream() will pick a sane default for you if you haven't previously specified (based on the number of available processors, I believe). Is something like that possible in Python?

jstarfish · on April 23, 2018

The ThreadPool class picks a sane default (number of cores), but I believe it uses Python threads instead of processes.
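As a sketch of that: `multiprocessing.pool.ThreadPool` (like `multiprocessing.Pool`) can be created without a size, in which case the worker count is based on the CPU count the `os` module reports. `square` is only a stand-in for real work.

```python
from multiprocessing.pool import ThreadPool

def square(n):
    return n * n

# No size given: the pool chooses its own worker count
with ThreadPool() as pool:
    results = pool.map(square, range(5))

print(results)  # [0, 1, 4, 9, 16]
```

`map` returns results in input order, regardless of which worker finished first.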

pddubs · on April 23, 2018

You could just use threads here too - CPython releases the GIL during IO.

  from concurrent.futures import ThreadPoolExecutor
  with ThreadPoolExecutor() as executor:
      executor.map(function, sequence)

kpenc · on April 23, 2018

Every function has side effects. Could've written the whole thing in one function. Also, PEP8.

apetresc · on April 23, 2018

One function? Could've written the whole thing in one `wget` call.

pknerd · on April 23, 2018

teach me how.

apetresc · on April 23, 2018

Here's a start:

  for x in $(curl -s https://alpha.wallhaven.cc/random | pcregrep -o1 "https://alpha.wallhaven.cc/wallpaper/(\d+)" | sort | uniq) ; do wget "https://wallpapers.wallhaven.cc/wallpapers/full/wallhaven-$x.jpg" ; done

akx · on April 23, 2018

bash, curl, pcregrep, sort, uniq and wget is not what I'd call "one wget call".

slenk · on April 23, 2018

To the hardcore bash users, I think they call that 'easy'. I have someone like that on my team - holy crap some of the bash stuff they can come up with

lou1306 · on April 23, 2018

Hardcore bash sure sounds like fun, but when things start getting too big or messy I usually find a Python script with some `subprocess` [0] tricks to be way easier on the eye.

[0]: https://docs.python.org/3/library/subprocess.html
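A minimal sketch of that approach, assuming Python 3.7+ for `capture_output`. The child command is just the current Python interpreter printing a few made-up IDs, standing in for whatever external tool would really be called; the sort/uniq step is done in Python instead of in a pipeline.

```python
import subprocess
import sys

# Stand-in for an external tool: a child process that prints IDs with a duplicate
child_code = "print('42'); print('7'); print('42')"
completed = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,  # collect stdout/stderr instead of inheriting them
    text=True,            # decode output bytes to str
    check=True,           # raise CalledProcessError on a non-zero exit code
)

# Equivalent of `| sort | uniq`, done in Python (string sort, like plain `sort`)
unique_ids = sorted(set(completed.stdout.split()))
```

Passing the command as a list avoids shell quoting issues, which is a large part of why this tends to read more easily than the equivalent one-liner.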

zo1 · on April 23, 2018

I hope none of that stuff makes it into your main codebase? That is, if there is even a remote chance of it requiring maintenance by someone other than the person who wrote it, which is probably a 99.99% chance unless it's temporary/throwaway code.

shubb · on April 23, 2018

I guess wget has a useful spidering function that could probably page through the website's search results pages, downloading all the preview and real images. You'd have to do the login bit as a different call and fetch the first search page's URL yourself?

mynewtb · on April 23, 2018

All code has side effects. Splitting code into appropriate functions makes those side effects much more manageable.