
Show HN: Amazon S3 API Support for Backblaze B2 Cloud Storage Service - y4m4b4
https://blog.minio.io/experimental-amazon-s3-api-support-for-backblaze-b2-cloud-storage-service-685e0f35a6d7
======
krullie
I've never really been able to figure out what a good strategy is for object
storage organization. Do you create a bucket per application instance? user?
organization?

Right now I'm playing with a new service and came up with this scheme, which
is probably over-engineered:

Here is an actual object key including the bucket:
7dcdb229600e4467a2714866e0d406f6/85/26c/c0271374067b5db832adb7909a7/bbda55db15266f7ce2284d8f5f66fc85e495e2b12265ef87537237ad5e2658b24c081970332417f60e5fc352ae9b8c1031398c02ecde03eb29af2d3c8eda8a4b/y18.gif

Given the file's uuid is aabbbcccccccccccccccccccccccccccccc

for original images: {{organizations_uuid as
bucket}}/aa/bbb/cccccccccccccccccccccccccccccc/{{sha512sum}}/{{originalfilename}}

And for all derivatives of it: {{organizations_uuid as
bucket}}/aa/bbb/cccccccccccccccccccccccccccccc/derived/{{this file's
uuid}}_{(unknown)}

My thinking was that:

\- Using the organization's uuid (which can have multiple users) as a bucket
makes per-organization backups and on-prem deployments easier.

\- Encoding the file's uuid in the object name makes the file easy to
identify, and splitting that uuid into 2/3/rest segments helps spread objects
across key prefixes.

\- Encoding the file's sha512sum in the key enables checking that file's
integrity even without a database.

\- Putting all derived files under derived/ with the original file's uuid as
prefix makes the link between them clear.

I know this results in long object names, as the actual example above shows,
but it packs in quite a bit of information. Which parts of this are
considered bad practice? Do you have any real-world examples of other
strategies? They seem hard to come by.
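As a rough sketch, the key above (bucket excluded) could be assembled like
this — the function name and split points are just my own illustration of the
2/3/rest scheme:

```python
import hashlib
import uuid


def object_key(file_uuid: str, sha512_hex: str, original_filename: str) -> str:
    """Build the key for an original upload: aa/bbb/rest/sha512sum/filename."""
    u = file_uuid.replace("-", "")  # 32 hex chars
    return f"{u[:2]}/{u[2:5]}/{u[5:]}/{sha512_hex}/{original_filename}"


data = b"example image bytes"
digest = hashlib.sha512(data).hexdigest()
key = object_key(uuid.uuid4().hex, digest, "y18.gif")
```

The upside is that everything needed to locate and verify the object is in
the key itself; the downside is the key length, as noted above.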

~~~
aaronds
Perhaps I'm missing something about your use case, but I only create buckets
per application, or sometimes file category (videos, profile images,
whatever).

Trivial example following your org/user pattern:

my_app/profile_images/org_id/user_id/aabbccccccc.jpeg

I then obviously have a reference to that file in my db.

~~~
krullie
I don't have any other real use case for a bucket per org other than easy
bucket mirroring, backup, and maybe migration from shared hosting to
on-premises.

I didn't think of using different prefixes for different media uses. We, for
example, would then use thumbnail/originating_file_uuid.png and
poster/originating_file_uuid.png.

How would you handle uploaded media then?

~~~
aaronds
Something to the effect of:

my_app/profile_images/org_id/user_id/uuid_thumb.png
my_app/profile_images/org_id/user_id/uuid_large.png
my_app/profile_images/org_id/user_id/uuid_original.png

can work.

Then in your app, you can have a way of specifying which version of the file
you'd like to reference, for example:

`user.avatar.large -> '<path>/uuid_large.png'`

Not sure if that helps?
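A minimal sketch of that kind of lookup, with hypothetical names following
the my_app/profile_images pattern above:

```python
def avatar_key(org_id: str, user_id: str, file_uuid: str, variant: str) -> str:
    """Map a variant name (e.g. "thumb", "large", "original") to its key."""
    return f"my_app/profile_images/{org_id}/{user_id}/{file_uuid}_{variant}.png"


# e.g. user.avatar.large would resolve to:
key = avatar_key("org1", "user9", "aabbcc", "large")
```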

~~~
krullie
The problem with that is that the originally uploaded filename is lost. At
least without storing it in a separate database.

I could shorten the key by moving the sha512sum from the url to a CHECKSUM
file.

org/file_uuid/original/original_filename.png
org/file_uuid/original/CHECKSUM
org/file_uuid/thumb/160x90.png
org/file_uuid/thumb/48x48.png
org/file_uuid/poster/1k.png
org/file_uuid/poster/2k.png
org/file_uuid/other/
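Assuming the CHECKSUM file uses the usual `sha512sum` text format (hex
digest, two spaces, filename per line), verifying an object without a
database could look roughly like this:

```python
import hashlib


def verify(data: bytes, checksum_file_text: str, filename: str) -> bool:
    """Check data against a sha512sum-style CHECKSUM file."""
    for line in checksum_file_text.splitlines():
        digest, _, name = line.partition("  ")
        if name == filename:
            return hashlib.sha512(data).hexdigest() == digest
    return False  # filename not listed


data = b"poster bytes"
checksum = hashlib.sha512(data).hexdigest() + "  1k.png"
```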

~~~
tedmiston
> The problem with that is that the originally uploaded filename is lost. At
> least without storing it in a separate database.

Sure, but that's a tradeoff nearly every website accepts because they just
need the image itself. If you do want to preserve the original filename, is
there a reason for not just keeping it in a database?

~~~
krullie
I'd like to keep these systems as decoupled as possible, or at least retain
some meaningful information without depending on an external datastore. This
might just be me being paranoid and overthinking it, but after dealing with a
nasty monolith of an application for the last couple of years, and finally
convincing the rest of the team that we need to change if we want to be able
to expand, I want to do it right.

------
therealmarv
This is something Backblaze should provide on their side. But kudos and
thanks to Minio for the work. Really impressive!

------
homero
Gorgeous. When I first beta-tested it, I told them to make it S3-compatible,
but they convinced me it wasn't the best idea. I still want it, though.

------
anoother
First time I've heard of B2. Makes you wonder why it's not S3-compatible in
the first place...

~~~
jjeaff
According to them, it's to keep their costs low. The B2 API requires a call
to get an upload endpoint before you make the upload.

S3 just uses one endpoint and proxies the upload to wherever it needs to go.
B2 saves money by not proxying everything coming in.

------
microcolonel
Let's hope Amazon doesn't start doing something anticompetitive with the
network routing to B2 from EC2, and instead competes on price.

~~~
ComputerGuru
What’s the point of an answer like that? Does Amazon have a history of
sabotaging network links between, say, Google’s or MSFT’s networks and their
own? Or is this just an attempt at being funny?

~~~
jjeaff
They have a history of sabotaging anyone who competes with them, though not
with AWS.

For example, they banned all Chromecast sales from amazon.com after they
launched the Fire TV.

