WAV almost always holds uncompressed PCM data, so it is lossless. I'm not sure where you got the impression that "you don't see that in the wild too often". Depending on what kind of analysis you need, having your audio at 8 kHz may render your results useless, since an 8 kHz sample rate only captures content below 4 kHz. I would use 16 kHz as a minimum and aim for 44.1 kHz in order to preserve the top end, which is where a large quantity of useful information is. The reason most data sets are recorded at 8 kHz is that they are used with MFCCs, which are fairly insensitive to the high end anyway, and enough information for machine learning exists in the bottom end. If you're doing music or environmental sounds, you really need to preserve the other frequency bands.
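To make the sample-rate trade-off concrete, here is a minimal sketch using only Python's standard `wave` module. It reads the sample rate from a PCM WAV and reports the highest frequency that rate can represent (half the sample rate). The helper names `nyquist_hz`, `describe_wav`, and `make_test_tone` are my own for illustration, and the test tone just stands in for your real files.

```python
import io
import math
import struct
import wave


def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_hz / 2


def describe_wav(fileobj):
    """Return (sample_rate, bits_per_sample, channels) for a PCM WAV."""
    with wave.open(fileobj, "rb") as wav:
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()


def make_test_tone(sample_rate_hz, freq_hz=440.0, seconds=0.1):
    """Build an in-memory 16-bit mono PCM WAV (hypothetical helper, demo only)."""
    n_frames = int(sample_rate_hz * seconds)
    frames = b"".join(
        struct.pack(
            "<h",
            int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)),
        )
        for i in range(n_frames)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes = 16-bit PCM
        wav.setframerate(sample_rate_hz)
        wav.writeframes(frames)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for rate in (8000, 16000, 44100):
        sr, bits, channels = describe_wav(make_test_tone(rate))
        print(f"{sr} Hz, {bits}-bit, {channels} ch: content up to {nyquist_hz(sr):.0f} Hz")
```

For a file on disk you can pass a path straight to `describe_wav`, since `wave.open` accepts either a filename or a file-like object. If the reported limit falls below the band you care about, the recording has already lost that information and upsampling will not bring it back.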

