Keep in mind, S3 "fails" all the time. We regularly make millions of S3 requests at my work. Usually we see a failure rate of about 1 in 240K requests (mostly GETs returning 500 errors). But if you're really hammering a single S3 node in the hash ring (e.g. from a Spark job), failures climb to the 1-in-10K range, including SocketExceptions where the routed IP is dead.
Your code always needs to expect such services to die: set proper timeouts, backoffs, retries, queues, and dead-letter queues.
Sometimes it's a 404 for an object written one second prior; other times it's an S3 node that died mid-request. Retrying gets you to a different node.
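A minimal sketch of the backoff-and-retry idea, independent of any S3 SDK (the function names and parameters here are illustrative, not from boto3 or the AWS API): retry transient failures with capped exponential backoff plus jitter, and only surface the error (or dead-letter it) once retries are exhausted.

```python
import random
import time

def retry_with_backoff(fn, retries=5, base_delay=0.1, max_delay=5.0,
                       retryable=(ConnectionError, TimeoutError)):
    """Call fn(), retrying transient errors with exponential backoff + jitter."""
    for attempt in range(retries):
        try:
            return fn()
        except retryable:
            if attempt == retries - 1:
                raise  # out of retries: surface the error or dead-letter it
            # full jitter: sleep a random fraction of the capped exponential delay
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))

# Toy stand-in for a flaky S3 GET: fails twice transiently, then succeeds
# (in the real case the retry lands on a different node in the hash ring).
calls = {"n": 0}
def flaky_get():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("500 from a dying S3 node")
    return b"object bytes"

print(retry_with_backoff(flaky_get))  # b'object bytes'
```

The jitter matters: without it, a fleet of clients (like a Spark job's executors) all retry on the same schedule and keep hammering the same node in lockstep.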