
Ask HN: Why does S3 still not support append? - whatnotests
Is there some technical reason why it's simply infeasible? Is there some fundamental architectural decision laid down years ago that prevents "append" operations?

If the Google version of S3 supports append, why not?
======
codemac
Google Cloud Storage does not support append.

One reason these things don't support append is that at some point they need
to choose the "version" of the object. Usually this is done when an object
upload has completed.

If they allow arbitrary appends to objects, then they would have a hard time
assigning any type of ordering to them, as the concept of an object being
"complete" would be thrown out the window.

(EDIT: and what does it mean to have a GET on an object, if you don't know the
latest version to return?)

I think something like this could be implemented, but it would probably be an
entirely different product that supported some specific traditional file
operations (rename, ftruncate, link, etc) but had different scaling
properties.

------
abuqutaita
FWIW, Azure supports Append Blobs:
[http://blogs.msdn.com/b/windowsazurestorage/archive/2015/04/...](http://blogs.msdn.com/b/windowsazurestorage/archive/2015/04/13/introducing-azure-storage-append-blob.aspx)

~~~
victorNicollet
The ability to append to a blob already existed with block blobs, which are a
bag of data blocks plus an ordered list of block identifiers: you could just
create a new block, then commit a new list with its identifier at the end.

The real benefit of the new append blob is that you have a one-request append
(instead of read list, upload block, commit list).

Also, append blobs (like block blobs) are limited to 50000 append operations.
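
For concreteness, here is roughly what the two approaches look like with the
azure-storage-blob Python SDK. This is a hedged sketch: conn_str and the
container/blob names are placeholders, and the exact signatures should be
checked against the SDK docs.

    # Sketch only: placeholder names, signatures from memory.
    from azure.storage.blob import BlobClient, BlobBlock
    import uuid

    conn_str = "..."  # placeholder connection string

    # Old block-blob "append": three round trips.
    blob = BlobClient.from_connection_string(conn_str, "logs", "block-log.txt")
    committed, _ = blob.get_block_list("committed")      # 1. read the current list
    new_id = uuid.uuid4().hex
    blob.stage_block(new_id, b"new data\n")              # 2. upload the new block
    blob.commit_block_list(                              # 3. commit list + new id
        [BlobBlock(b.id) for b in committed] + [BlobBlock(new_id)])

    # Append blob: a single request per append.
    # (Assumes the blob was created once with alog.create_append_blob().)
    alog = BlobClient.from_connection_string(conn_str, "logs", "append-log.txt")
    alog.append_block(b"new data\n")

The block-blob path is a read-modify-write of the block list, which is exactly
what the one-request append avoids.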

------
jedberg
S3 is a key/value store. Appends don't make sense in that context. If you
think of it as a key/value store, then a lot of their constraints start to
make more sense.

------
mailslot
Google Cloud Storage does not support append. Their docs: "... you cannot make
incremental changes to objects, such as append operations or truncate
operations."

~~~
whatnotests
Oops, my mistake. I could've sworn it did.

Still wishing.

~~~
tacos
Azure recently added it for logging scenarios.
[https://azure.microsoft.com/en-us/blog/azure-storage-release...](https://azure.microsoft.com/en-us/blog/azure-storage-release-append-blob-new-azure-file-service-features-and-client-side-encryption-ga/)

~~~
victorNicollet
Azure already had append support for block blobs (i.e. you could upload new
data to an existing blob without having to upload the entire blob). But unlike
S3, Azure Storage tends to favor features (and consistency) over performance.

------
ChuckMcM
Well answered by other comments here: it's the whole "eventually consistent
blob" vs. "directed graph of mutation operations" problem. FWIW, this is a
good distributed systems interview question :-)

~~~
bm1362
You can really do either with S3 - it's just an eventually consistent,
immutable KV store.

A) Keep a consistent manifest mapping chunk ranges to keys. B) Keep an
ordered list of keys that represents the DAG.

In case A, you'll even be able to assemble your blob in parallel.
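
A minimal sketch of approach (A) with boto3, assuming a made-up layout where a
manifest object maps byte ranges to chunk keys (the bucket, key names, and
manifest format are all illustrative):

    import json
    import boto3

    s3 = boto3.client("s3")
    BUCKET, MANIFEST = "my-bucket", "logs/manifest.json"

    def append(data: bytes):
        manifest = json.loads(
            s3.get_object(Bucket=BUCKET, Key=MANIFEST)["Body"].read())
        offset = manifest["length"]
        chunk_key = "logs/chunks/%020d" % offset
        s3.put_object(Bucket=BUCKET, Key=chunk_key, Body=data)  # immutable chunk
        manifest["chunks"].append(
            {"key": chunk_key, "start": offset, "len": len(data)})
        manifest["length"] = offset + len(data)
        # NB: this read-modify-write of the manifest is not atomic; concurrent
        # appenders would need some external coordination.
        s3.put_object(Bucket=BUCKET, Key=MANIFEST,
                      Body=json.dumps(manifest).encode())

    def read_all() -> bytes:
        manifest = json.loads(
            s3.get_object(Bucket=BUCKET, Key=MANIFEST)["Body"].read())
        # The chunk fetches are independent, so they can be issued in parallel.
        return b"".join(
            s3.get_object(Bucket=BUCKET, Key=c["key"])["Body"].read()
            for c in manifest["chunks"])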

------
EwanToo
S3 is eventually consistent, and appending to an eventually consistent file is
going to get very messy, very fast - what happens when an append reaches a
replica node before an earlier one does?

If you're happy with out-of-order appends, just use a container file format
like Parquet, where appends are actually additional file creations.

------
bpchaps
After a decently large RAID failure, I needed to gzip as many large files as I
could and send them over to S3 as quickly as possible, given the risk of
another failure. The script would gzip each file and then sync it up to S3,
each in its own backgrounded process. If two large files were sent at the same
time, both uploads would die and /leave incomplete files/.

After leaving that running overnight, all of the files appeared to be
uploaded... until the owner of the company needed to use them.

I'm still not sure if that's an exceptional use case, but it left a pretty bad
taste in my mouth about S3 ever since.

~~~
donavanm
It sounds like you were missing the "Content-MD5" header on your PUT requests.
As I recall, S3 will return an HTTP error response if the complete object does
not match the Content-MD5 the client sends. The other issue with the HTTP
protocol is that the request body doesn't have a mandatory delimiter: the
client/server can't really distinguish between a terminated TCP connection and
a complete HTTP body without the optional Content-Length/Content-MD5 headers.
It really sounds like one or more of your large files were timing out
somewhere and the checksum wasn't sent.
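
For reference, a hedged sketch of attaching that checksum with boto3 (the
bucket and key are made up): S3 should reject a truncated body with an error
instead of silently storing it.

    import base64
    import hashlib
    import boto3

    s3 = boto3.client("s3")

    with open("backup.tar.gz", "rb") as f:
        body = f.read()
    # Content-MD5 is the base64-encoded MD5 digest of the request body.
    md5_b64 = base64.b64encode(hashlib.md5(body).digest()).decode()

    # If the bytes S3 receives don't match this digest, the PUT fails instead
    # of leaving a partial object behind.
    s3.put_object(Bucket="my-bucket", Key="backups/backup.tar.gz",
                  Body=body, ContentMD5=md5_b64)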

------
gonyea
Because reconciling two separate appends on two separate nodes, each holding a
different copy of the data, would be a huge mess.

------
yid
S3 is more of a simple key-value store than a full filesystem (and for good
reason). I suspect the reason their docs push the filesystem metaphor so much
is that filesystems are more familiar to many people, and _most_ filesystem
semantics can be implemented using a key-value store. In that sense, there is
no update() or append() in S3, just a simple set().

~~~
dheera
Also, because AWS doesn't provide a decent networked file system where
multiple instances can mount the same volume at the same time in read/write
mode. S3 is as close as one can get, in many cases.

~~~
objectivefs
You can use our ObjectiveFS[1] if you want a networked file system where
multiple instances can simultaneously mount it read/write. It is backed by S3
and gives you a standard POSIX interface.

[1] [https://objectivefs.com](https://objectivefs.com)

~~~
dheera
Thanks, interesting. Can you comment on how it compares to s3fs? I've used
s3fs and it can sometimes be buggy (as in, to the point where files get
clobbered and corrupted) and slow (especially in listing directories with a
large number of files). Does ObjectiveFS solve these issues, and do you have
any reliability statistics?

------
difosfor
I mostly miss a MoveObject operation to rename files myself, but I guess
they're keeping things simple and scalable on their end and requiring us to
work around it with the existing lower-level operations.

~~~
nickcw
You can use a server-side copy followed by a delete to rename an object
reasonably efficiently. I guess that is what you mean by using the existing
lower-level operations, but if you weren't already doing that, you might find
it helpful!
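
A hedged sketch of that copy-then-delete rename with boto3 (names are made up;
note that a single copy_object call only handles objects up to around 5 GB,
above which you'd need the multipart copy APIs):

    import boto3

    s3 = boto3.client("s3")

    def rename(bucket, old_key, new_key):
        # The copy happens inside S3; the object bytes never transit the client.
        s3.copy_object(Bucket=bucket, Key=new_key,
                       CopySource={"Bucket": bucket, "Key": old_key})
        s3.delete_object(Bucket=bucket, Key=old_key)

    rename("my-bucket", "site/v1/index.html", "site/v2/index.html")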

------
codeonfire
I don't think it has a traditional filesystem. It probably just writes all
puts sequentially as fast as possible, stores the location, and then
replicates. The easiest way to append would be to read the object, append, and
then write a new object. If they did that internally there would be no
transfer out and no revenue, although they could probably charge for the
internal expense. Another reason is that people would probably think appends
are no big deal and try to append continuously to multi-gigabyte files. If
that's the case, it's best to let the client handle appends, where the costs
are out in the open.
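
That client-side approach is roughly the following (a boto3 sketch with
made-up names):

    import boto3

    s3 = boto3.client("s3")

    def append(bucket, key, data: bytes):
        # The whole object crosses the network twice (download, then upload),
        # which is exactly the cost that stays out in the open on the client.
        old = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        s3.put_object(Bucket=bucket, Key=key, Body=old + data)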

~~~
whatnotests
All excellent points, especially about cost.

I've considered "faking" the append functionality by making a new file per
append action, then performing a periodic compaction.

Even compaction-via-combine-and-delete-old is clunky.

    
    
        aws s3 combine --target s3://bucket-name/output-file.txt \
          s3://bucket-name/input-file-1.txt \
          ... \
          s3://bucket-name/input-file-n.txt
    

I, for one, would pay extra for that.
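
For what it's worth, something close to a server-side combine can be
approximated today with the multipart-upload copy APIs, with the caveat that
every part except the last must be at least 5 MB. A hedged boto3 sketch with
made-up names:

    import boto3

    s3 = boto3.client("s3")
    bucket, target = "bucket-name", "output-file.txt"
    inputs = ["input-file-1.txt", "input-file-2.txt"]  # each >= 5 MB except the last

    mpu = s3.create_multipart_upload(Bucket=bucket, Key=target)
    parts = []
    for i, key in enumerate(inputs, start=1):
        # Each part is copied server-side from an existing object.
        resp = s3.upload_part_copy(
            Bucket=bucket, Key=target, UploadId=mpu["UploadId"], PartNumber=i,
            CopySource={"Bucket": bucket, "Key": key})
        parts.append({"PartNumber": i, "ETag": resp["CopyPartResult"]["ETag"]})

    s3.complete_multipart_upload(
        Bucket=bucket, Key=target, UploadId=mpu["UploadId"],
        MultipartUpload={"Parts": parts})
    # Deleting the old input objects afterwards completes the compaction step.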

~~~
benjiweber
The lack of read-after-delete consistency makes this tricky.

[https://aws.amazon.com/s3/faqs#What_data_consistency_model_d...](https://aws.amazon.com/s3/faqs#What_data_consistency_model_does_Amazon_S3_employ)

I've seen "eventually" consistent mean up to 24hrs in the face of problems.
Several minutes seems common when versioning/bucket replication is enabled.

~~~
mikgan
I can second that. Personally I'd like to see them support symbolic links, so
version control and rolling deployments of static websites become a little
easier.

------
jheriko
You mean appending things onto the end of files, or what? If so, probably
because it's trivial to work around by storing the data in new files - and
large data where this would be valuable should be broken down into pieces for
n different reasons anyway.

Why do people who ask questions fall slightly short of providing enough
information to meaningfully answer them?

