If the Google version of S3 supports append, why not?
One reason these services don't support append is that at some point they need to choose the "version" of the object. Usually this is done when an object upload has completed.
If they allow arbitrary appends to objects, then they would have a hard time assigning any type of ordering to them, as the concept of an object being "complete" would be thrown out the window.
(EDIT: and what does it mean to GET an object if you don't know the latest version to return?)
I think something like this could be implemented, but it would probably be an entirely different product that supported some specific traditional file operations (rename, ftruncate, link, etc) but had different scaling properties.
The real benefit of the new append blob is that you get a one-request append, instead of the block-blob dance of get block list, upload block, commit block list.
Also, append blobs (like block blobs) are limited to 50,000 blocks, i.e. 50,000 append operations.
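To make the request-count difference concrete, here's a toy in-memory simulation. These classes are illustrative stand-ins, not the Azure SDK; every name is invented. It just counts how many requests each blob type needs per append:

```python
# Toy simulation of the request cost of appending to the two Azure blob
# types. Illustrative stand-ins only, NOT the real Azure SDK.

class BlockBlob:
    """Appending takes three requests: get block list, put block, commit."""
    def __init__(self):
        self.blocks = []
        self.requests = 0

    def append(self, data: bytes):
        self.requests += 1          # 1. get the committed block list
        committed = list(self.blocks)
        self.requests += 1          # 2. upload the new block
        committed.append(data)
        self.requests += 1          # 3. commit the new block list
        self.blocks = committed

class AppendBlob:
    """Appending is a single Append Block request."""
    MAX_BLOCKS = 50_000             # documented per-blob block limit

    def __init__(self):
        self.blocks = []
        self.requests = 0

    def append(self, data: bytes):
        if len(self.blocks) >= self.MAX_BLOCKS:
            raise RuntimeError("append blob is full (50,000 blocks)")
        self.requests += 1          # 1. append block
        self.blocks.append(data)

block, app = BlockBlob(), AppendBlob()
for chunk in (b"a", b"b", b"c"):
    block.append(chunk)
    app.append(chunk)

print(block.requests)  # 9 requests for 3 appends
print(app.requests)    # 3 requests for 3 appends
```

Three appends cost nine requests on a block blob but only three on an append blob, which is the whole point of the new API.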
A) Keep a consistent manifest of chunk range to keys.
B) Keep an ordered list of keys that represents the DAG.
In case A, you'll even be able to assemble your blob in parallel.
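A minimal sketch of option A, using a plain dict as a stand-in for the object store (all names here are invented for illustration): the manifest maps byte ranges to keys, so reassembly only depends on the manifest, and each range could be fetched in parallel.

```python
# Option A sketch: a manifest mapping byte ranges to object keys.
# `store` is an in-memory stand-in for S3/GCS; names are illustrative.

store = {}          # key -> bytes
manifest = []       # list of ((start, end), key), kept in offset order

def append_chunk(data: bytes) -> None:
    start = manifest[-1][0][1] if manifest else 0
    key = f"chunks/{start:012d}"             # key derived from the offset
    store[key] = data                        # one PUT per append
    manifest.append(((start, start + len(data)), key))

def assemble() -> bytes:
    # Each range is independent, so these GETs could run in parallel.
    return b"".join(store[key] for (_rng, key) in manifest)

append_chunk(b"hello ")
append_chunk(b"world")
print(assemble())   # b'hello world'
```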
If you're happy with out-of-order appends, just use a container file format like Parquet, where appends are actually additional file creations.
After leaving that running overnight, all of the files appeared to be uploaded... until the owner of the company needed to use them.
I'm still not sure if that's an exceptional use case, but it left a pretty bad taste in my mouth about S3 ever since.
I've considered "faking" the append functionality by making a new file per append action, then performing a periodic compaction.
Even compaction-via-combine-and-delete-old is clunky.
aws s3 combine --target s3://bucket-name/output-file.txt \
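The fake-append-plus-compaction pattern can be sketched against an in-memory stand-in for a bucket (there's no real `aws s3 combine` today; all names below are invented for illustration):

```python
import itertools

bucket = {}                                  # key -> bytes, stand-in for S3
_seq = itertools.count()

def append(log: str, data: bytes) -> None:
    """Fake append: each append is a brand-new object under a log prefix."""
    bucket[f"{log}/part-{next(_seq):08d}"] = data

def compact(log: str) -> None:
    """Combine-and-delete-old: fold the parts into one object, drop them."""
    parts = sorted(k for k in bucket if k.startswith(f"{log}/part-"))
    old = bucket.pop(f"{log}/compacted", b"")  # keep prior compactions
    bucket[f"{log}/compacted"] = old + b"".join(bucket[k] for k in parts)
    for k in parts:
        del bucket[k]

append("events", b"one\n")
append("events", b"two\n")
compact("events")
print(bucket)  # {'events/compacted': b'one\ntwo\n'}
```

A reader has to merge `compacted` plus any not-yet-compacted parts, and a real implementation needs to worry about a compaction racing with concurrent appends, which is exactly why this feels clunky.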
I've seen "eventually" consistent mean up to 24 hours in the face of problems. Several minutes seems common when versioning/bucket replication is enabled.
1. Start a multipart object upload
2. Issue "Upload Part - Copy" requests for each part of an object ( http://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPartCopy.html )
3. Complete the multipart object upload
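The three steps above can be sketched as a function shaped for a boto3 S3 client (`create_multipart_upload`, `upload_part_copy`, and `complete_multipart_upload` are real boto3 calls; the fake client below is invented so the flow can be exercised offline):

```python
def s3_concat(s3, bucket: str, dest_key: str, source_keys: list[str]) -> None:
    """Server-side concatenation via multipart "Upload Part - Copy".

    `s3` is expected to be a boto3 S3 client; no object data passes
    through the caller, S3 copies the bytes itself.  Caveat: on real
    S3, every part except the last must be at least 5 MiB.
    """
    upload = s3.create_multipart_upload(Bucket=bucket, Key=dest_key)
    parts = []
    for n, src in enumerate(source_keys, start=1):
        resp = s3.upload_part_copy(
            Bucket=bucket, Key=dest_key,
            PartNumber=n, UploadId=upload["UploadId"],
            CopySource={"Bucket": bucket, "Key": src},
        )
        parts.append({"PartNumber": n,
                      "ETag": resp["CopyPartResult"]["ETag"]})
    s3.complete_multipart_upload(
        Bucket=bucket, Key=dest_key, UploadId=upload["UploadId"],
        MultipartUpload={"Parts": parts},
    )

class _FakeS3:
    """Minimal in-memory stand-in (invented) to demo the flow offline."""
    def __init__(self, objects):
        self.objects = dict(objects)
        self._pending = {}
    def create_multipart_upload(self, Bucket, Key):
        self._pending[Key] = []
        return {"UploadId": "fake-upload-id"}
    def upload_part_copy(self, Bucket, Key, PartNumber, UploadId, CopySource):
        self._pending[Key].append((PartNumber, self.objects[CopySource["Key"]]))
        return {"CopyPartResult": {"ETag": f"etag-{PartNumber}"}}
    def complete_multipart_upload(self, Bucket, Key, UploadId, MultipartUpload):
        parts = sorted(self._pending.pop(Key))
        self.objects[Key] = b"".join(data for _n, data in parts)

fake = _FakeS3({"a": b"foo", "b": b"bar"})
s3_concat(fake, "bucket", "combined", ["a", "b"])
print(fake.objects["combined"])  # b'foobar'
```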
1. Enable bucket versioning
2. Download the parsed S3 objects
3. Start a multipart object upload to S3 with the specified target object as the object name
4. Reupload the parsed S3 objects as parts of a single multipart object upload
5. Delete the previous parsed objects once the multipart object upload is complete (a delete marker should be added to the top of the version stack, but the previously stored version should still be available if you specify its handle/version id).
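The versioning behavior in step 5 can be illustrated with a toy version stack (invented names, not the AWS SDK): a delete only pushes a marker on top, so a plain GET fails while the old version stays reachable by its version id.

```python
import itertools

class VersionedBucket:
    """Toy model of S3 bucket versioning with delete markers."""
    def __init__(self):
        self.versions = {}               # key -> list of (version_id, value)
        self._ids = itertools.count(1)

    def put(self, key, value):
        vid = f"v{next(self._ids)}"
        self.versions.setdefault(key, []).append((vid, value))
        return vid

    def delete(self, key):
        # A delete adds a marker on top of the stack; nothing is removed.
        return self.put(key, None)

    def get(self, key, version_id=None):
        stack = self.versions.get(key, [])
        if version_id is not None:
            return next(v for vid, v in stack if vid == version_id)
        vid, value = stack[-1]           # latest version wins
        if value is None:
            raise KeyError(f"{key} has a delete marker on top")
        return value

b = VersionedBucket()
old = b.put("parsed.json", b"old contents")
b.delete("parsed.json")                      # step 5: delete marker added
print(b.get("parsed.json", version_id=old))  # b'old contents' still there
```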
Why do people who ask questions fall slightly short of providing enough information to meaningfully answer them?