
Cloud Filestore, high-performance file storage for GCP users - theDoug
https://cloudplatform.googleblog.com/2018/06/New-Cloud-Filestore-service-brings-GCP-users-high-performance-file-storage.html
======
mleonard
A few questions come to mind after reading the docs (minus the currently
404ing ones): (1) is this a zonal or regional product? ie does it replicate
data across zones in a region? (2) roughly what latencies should I expect to
see? eg for the following: a 1KB, 1MB, 1GB read and write. Any info here would
be helpful. (3) does it have close-to-open consistency? or something weaker /
stronger? (4) any plans to add gcp pub/sub integration somehow? would be great
to be able to subscribe to changes like with other gcp storage products. (5)
any plans to move from NFSv3 to NFSv4? (6) backups available or planned as a
feature? eg to GCS (7) can you share anything about how it is implemented?

Thanks

~~~
jabl
Yeah, considering NFSv4 was published almost 20 years ago (December 2000), not
supporting it is kinda disappointing.

~~~
kjeetgill
I'm unfamiliar, what does NFSv4 add that Cloud Filestore users will be missing
out on?

~~~
jabl
A good place to start is to read the introduction chapter of the NFSv4.0 RFC:
[https://tools.ietf.org/html/rfc7530](https://tools.ietf.org/html/rfc7530)

Further minor versions add other things that could be useful for cloud usage,
e.g. in NFSv4.1
([https://tools.ietf.org/html/rfc5661](https://tools.ietf.org/html/rfc5661) )
parallel data access, sessions, improved delegations, and in NFSv4.2
([https://tools.ietf.org/html/rfc7862](https://tools.ietf.org/html/rfc7862) )
server-side clone/copy, sparse file support.

------
_wmd
They've undercut Amazon EFS by 33% for the budget option ($.20/gb vs.
$.30/gb). Hope this kicks off a little pricing war - shared filesystem is a
hugely useful solution for migrating old apps, would be nice if it were cheap.
Both options are still grossly overpriced (IMHO)

~~~
panarky
Amazon EFS bills for usage while Google bills for provisioned capacity.

So unless you run GCP at a very high % of provisioned capacity, it's more
expensive.

I'm pretty confident EFS can't match GCP's 700 MB/s and 30,000 IOPS though.
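
A rough sketch of the break-even arithmetic behind the provisioned-vs-usage point, using the per-GB monthly prices quoted upthread ($0.20 for Filestore, $0.30 for EFS). The function names and the 1000 GB figure are made up for illustration, and real bills depend on tiers, minimum sizes and other factors this ignores.

```python
# Prices per GB-month as quoted in this thread; treat them as assumptions.
FILESTORE_PER_GB_PROVISIONED = 0.20
EFS_PER_GB_USED = 0.30


def filestore_cost(provisioned_gb: float) -> float:
    """Filestore bills for provisioned capacity, whether or not it is used."""
    return provisioned_gb * FILESTORE_PER_GB_PROVISIONED


def efs_cost(used_gb: float) -> float:
    """EFS bills only for the data actually stored."""
    return used_gb * EFS_PER_GB_USED


def break_even_utilization() -> float:
    """Fraction of provisioned capacity at which both services cost the same."""
    return FILESTORE_PER_GB_PROVISIONED / EFS_PER_GB_USED


if __name__ == "__main__":
    provisioned = 1000.0  # GB, hypothetical instance size
    for used in (250.0, 500.0, 750.0, 1000.0):
        print(
            f"{used:6.0f} GB used: "
            f"Filestore ${filestore_cost(provisioned):.2f}, "
            f"EFS ${efs_cost(used):.2f}"
        )
    print(f"break-even at {break_even_utilization():.0%} utilization")
```

With these numbers, Filestore only comes out cheaper once roughly two thirds of the provisioned capacity is actually in use.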

~~~
halbritt
I did a fair number of tests last year and couldn't get anything like that. If
anyone is interested, the poorly formatted output of that is here:

[https://github.com/halbritt/benchmarks](https://github.com/halbritt/benchmarks)

~~~
tjenkinsqs
Can you please elaborate a little? EFS scales with size, i.e. did you have at
least a few hundred GB on there or was it mostly empty? Roughly how many IOPS
did you get?

~~~
halbritt
IIRC, I populated the thing with 10TB or thereabouts.

For small block sizes, I got hundreds of IOPS.

Latency was pretty terrible in almost all cases.

I wrote a great deal on this topic for internal consumption. Those benchmarks
weren't really meant for the general public. My biggest conclusion is that EFS
isn't useful for any workload where performance is a concern. Unfortunately,
it's priced so high, that I'd never consider using it otherwise.

------
mixmastamyk
Very happy that GCP is opening in LA, however pushing the VFX angle is
interesting considering the bulk of that work has moved overseas due to tax
credits. Sent the link to a friend at Dreamworks that still has a sizable
amount of work in Glendale.

Will also be interesting to find out how they are solving/pricing the immense
bandwidth/storage requirements needed to make such work practical.

~~~
manigandham
They had a big event about this for the media/vfx industry today:
[https://cloudplatformonline.com/LA-Cloud-Region-Launch-Media...](https://cloudplatformonline.com/LA-Cloud-Region-Launch-Media-Home.html)

The data all lives in the cloud, only the models and specs are sent back and
forth. Zync is their rendering tech:
[https://www.zyncrender.com/](https://www.zyncrender.com/)

------
manigandham
This is great for persistent storage on Kubernetes clusters. Storage has been
complicated so far and while there are external solutions for pooling and
replicating locally attached disks, this is much nicer and simpler for most
uses.

------
paulsutter
As convenient as this seems, there are good reasons that many shops prohibit
the use of NFS in production.

There’s nothing to prevent any application from doing full error checks on
every disk i/o call, dealing with timeouts, etc. Except that nobody wrote that
stuff into software designed for local disk.
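
For illustration, a minimal Python sketch of the kind of defensive I/O described above: every call is checked, short writes are handled, and transient failures are retried. The function name `write_durably` is hypothetical, and the set of errno values treated as transient is an assumption; what an NFS client actually returns depends on mount options (soft vs. hard mounts, timeouts) and the client implementation.

```python
import errno
import os
import time

# Errors treated as transient here. This set is an assumption, not a
# definitive list of what an NFS client can return.
TRANSIENT_ERRNOS = {errno.EIO, errno.ESTALE, errno.ETIMEDOUT, errno.EAGAIN}


def write_durably(path: str, data: bytes, retries: int = 3, delay: float = 0.5) -> None:
    """Write data to path, checking every step and retrying transient failures."""
    for attempt in range(1, retries + 1):
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            try:
                view = memoryview(data)
                while view:
                    # os.write may write fewer bytes than requested.
                    written = os.write(fd, view)
                    view = view[written:]
                # Push the data out to storage; write errors can surface here.
                os.fsync(fd)
            finally:
                # close() can also report an error from an earlier write.
                os.close(fd)
            return
        except OSError as exc:
            # Give up on non-transient errors or when out of attempts.
            if exc.errno not in TRANSIENT_ERRNOS or attempt == retries:
                raise
            time.sleep(delay * attempt)
```

Software written for local disks often skips most of this (ignoring the result of `close`, never calling `fsync`, assuming writes are never short), which is where the trouble on network file systems tends to start.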

~~~
tannhaeuser
> _As convenient as this seems, there are good reasons that many shops prohibit
> the use of NFS in production._

Such as?

~~~
yjftsjthsd-h
Reliability issues, IME.

------
shoo
At $enterprise-dayjob we've ended up using EFS in AWS after migrating an
application that was using NFS. One of the main obstacles to adopting this was
perceived lack of security as EFS didn't support encryption of NFS connections
between EC2 instances and EFS mount targets. AWS EFS released support for
encrypting the NFS connections earlier this year, using Stunnel.

What's the corresponding story re security for cloud filestore?

~~~
mlrtime
How did you get around the terrible performance of EFS? Are you not
reading/writing lots of small files?

~~~
brazzledazzle
NFS in general seems to struggle with that scenario. Do you get better
performance with your own NFS server with that type of workload?

~~~
X-Istence
A NetApp filer on 40 Gbit/sec will do NFS plenty fast for running VMs directly
to/from the system using ESXi for example.

EFS is not even in the same ballpark in terms of speed :/

~~~
brazzledazzle
Right, I’ve had that setup in the past and I wouldn’t disagree with that but
I’ve also seen a netapp choke on lots of tiny files over NFS.

------
lazharichir
The price is more attractive (read cheaper) than AWS, but it is still on the
pricey side considering the storage sizes people work with these days.

Using persistent storage with Google sets me back a little over $175 per TB of
SSD, per month, without networking factored in.

At $0.20, times a thousand for a TB, Cloud Filestore comes to $200. Let's see
how the performance goes.

~~~
gigatexal
Spin up a huge render job, use the perf of the storage, and then nuke the
storage as you have a final product? That's probably what they're getting at.

------
stemuk
This seems like a really unfortunate name choice to me. When I first read the
title I misread it as Cloud Firestore, which just has one letter different to
Cloud Filestore.

~~~
rishav_sharan
Aren't these the same? Cloudstore and Firestore are the same product with
different branding.

~~~
manigandham
...no, first of all there is no such thing as Cloudstore.

Cloud FIREstore is a document-store database for Firebase, which is the GCP
suite of services primarily for mobile apps.

Cloud FILEstore is an NFS file system mountable across multiple compute engine
VMs.

------
theDoug
Product page:
[https://cloud.google.com/filestore/](https://cloud.google.com/filestore/)

(disclosure: I work in GCP, but not on this product. Just happy to see it go
public for more users.)

~~~
khc
Is it intentional that on the page, there's a list of Partner Solutions that
are essentially a competitor of Filestore?

~~~
vishwajeetv
I think these are the solutions that GCP Partners have created USING
Filestore.

~~~
manigandham
No, they are alternatives that have existed as solutions before Filestore was
created. They still have unique features, better performance and other
advantages if you need them.

------
cowmix
The performance of EFS is horrible. I hope this is better. On paper, it seems
more performant.

------
chrisprobert
I'm curious whether this is backed by Google File System
([https://static.googleusercontent.com/media/research.google.c...](https://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf)), or something else.

~~~
ebikelaw
Nothing at Google has been backed by GFS in many years.
[https://www.wired.com/2012/07/google-colossus/](https://www.wired.com/2012/07/google-colossus/)

~~~
puzzle
Well, the poster probably meant whatever GFS incarnation is around these days
(Colossus is v2 or v3, depending on how you look at things). In the end,
almost everything at Google is backed by "GFS": Bigtable, Blobstore, GCS,
Spanner, Megastore, etc. There's little else that is not backed by GFS, but
talks to D directly. At least that has been mentioned in public, of course.
Still, none of it is user facing/serving, though.

------
chippy
How does it compare (pricing / performance) to using Cloud Storage for files
that are not web accessible?

Example use case: screenshots of websites, where instances both request and
write / overwrite them.

~~~
manigandham
This is a product that gives you a file system, like attaching a disk, but
over NFS so it's available for multiple servers to mount over the network.

If you don't actually need a disk-based file system and just want to
read/write individual files as objects, then object storage like Cloud Storage
is your best option.

~~~
namibj
Specifically, it depends on whether you need random writes to files and/or
(some) file locking semantics/consistency.
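
To make the distinction concrete, here is a small Python sketch of the two things a file system gives you that a plain object store typically does not: overwriting a few bytes in the middle of a file, and taking a lock while doing it. The helper name `patch_in_place` is hypothetical, a local temp file stands in for a file on a mounted share, and `fcntl` is Unix-only. Whether and how such locks are honored across NFS clients depends on the client, server and mount options, so treat that part as an assumption.

```python
import fcntl
import os
import tempfile


def patch_in_place(path: str, offset: int, payload: bytes) -> None:
    """Overwrite bytes at an offset while holding an exclusive advisory lock."""
    with open(path, "r+b") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)  # blocks until the lock is granted
        try:
            f.seek(offset)
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)


if __name__ == "__main__":
    # A local temp file stands in for a file on a mounted share.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789")
    patch_in_place(tmp.name, 4, b"XY")
    with open(tmp.name, "rb") as f:
        print(f.read())
    os.unlink(tmp.name)
```

With object storage the equivalent update usually means downloading the object, changing it locally and uploading the whole thing again, with no lock to stop another writer doing the same at the same time.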

------
stevekemp
Even google's blog is prone to spam-comments!

------
merb
wow the premium filestore is as good as a single ssd (replicated). that's
really cool to use for mysql/postgres databases on gke, way cheaper than using
GCP SQL if you already have a GKE cluster.

~~~
thesandlord
What's the advantage of using NFS instead of just attaching a Persistent Disk
to a StatefulSet? Wouldn't that be a lot cheaper?

~~~
manigandham
NFS is much faster to attach than PD to a GCE node, which can take minutes at
times and lead to crash loops waiting for the storage to be ready. This is
especially problematic if the disk needs to be moved from one node to another
for some reason.

~~~
toredash
Isn't running databases on NFS really bad?

~~~
manigandham
Depends. There's nothing wrong with NFS itself, although v4 is better
obviously. There's a performance trade-off with going over the network and
sharing the storage with multiple servers, along with the necessary metadata
for each file, but if you don't have much contention then it's just like a
drive with higher latency.

If that latency is low and your workload can handle disk concurrency well then
it works fine. It helps if you use (or configure) a database with more
sequential access and buffering for large updates rather than lots of random
small writes, as well as spreading the data over several disks.

AWS EFS has latency problems which make it problematic, but this product seems
to have a better performance profile, which could work well.

~~~
toredash
I get the performance issue but I'm more concerned about data integrity.

I did a quick lookup in the MySQL docs
([https://dev.mysql.com/doc/refman/8.0/en/disk-issues.html](https://dev.mysql.com/doc/refman/8.0/en/disk-issues.html)) and
was surprised that this isn't really an issue.

Learned something new, thanks!

~~~
manigandham
Databases all use some kind of write-ahead logging so you'll be safe as long
as that file is safe, and they're even capable of recovering the file all the
way up to any corrupt records that may have been appended at the end.

You shouldn't use multiple servers writing to the same volume for database
drives, but other than that it's no different than any other disk that might
lose connection. Most VMs' "local" disks are still attached over the network
anyway, emulating a PCIe bus interface instead of NFS.
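
A toy sketch of the write-ahead-log recovery idea mentioned above: each record carries a checksum, and on startup the log is replayed until the first torn or corrupt record, which is then cut off. The record layout and the names `append_record` and `recover` are assumptions for illustration only; this is not how MySQL or PostgreSQL actually lay out their logs.

```python
import os
import struct
import zlib
from typing import List

# Record layout (an assumption for this sketch):
# 4-byte little-endian length, 4-byte CRC32 of the payload, then the payload.
HEADER = struct.Struct("<II")


def append_record(path: str, payload: bytes) -> None:
    """Append one checksummed record and force it to stable storage."""
    with open(path, "ab") as f:
        f.write(HEADER.pack(len(payload), zlib.crc32(payload)) + payload)
        f.flush()
        os.fsync(f.fileno())


def recover(path: str) -> List[bytes]:
    """Return all valid records, truncating the log at the first bad one."""
    records = []
    good_end = 0
    with open(path, "r+b") as f:
        data = f.read()
        pos = 0
        while pos + HEADER.size <= len(data):
            length, crc = HEADER.unpack_from(data, pos)
            start = pos + HEADER.size
            payload = data[start:start + length]
            if len(payload) < length or zlib.crc32(payload) != crc:
                break  # torn or corrupt tail: stop replaying here
            records.append(payload)
            pos = start + length
            good_end = pos
        # Drop anything after the last record that verified.
        f.truncate(good_end)
    return records
```

The point is the same whether the disk is local or on the network: a write that was interrupted when the connection dropped shows up as a bad record at the tail, and everything before it is still usable.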

------
evancox100
This is also what EDA/ASIC design tools need to move to the cloud.

~~~
madspindel
What are EDA/ASIC design tools? AutoCAD?

~~~
namibj
No, it's exotic software that people use to design integrated circuits, like a WiFi
modem, or a Bitcoin miner. Autodesk, the creator of AutoCAD, sells Eagle, an
EDA software. It does not go lower than you can go with a soldering iron,
except for allowing solder joints that are better created with hot air /
infrared heating, and fancy multi-layer circuit boards with components on the
inside.

------
oavdeev
So now they have two cloud storage products, called Cloud Firestore and Cloud
Filestore? Their branding team is on top of their game, as always.

~~~
deesix
Disclosure: I work on GCP

Thanks for the feedback. As the person that named both products, I can say we
spent a ton of time debating this but we felt that the fact one is an
enterprise file share and the other a document database service focused on
mobile and web would mean very little conflict for customers. We will keep an
eye on any customer confusion it might cause.

~~~
ttul
This will be extremely confusing for Japanese and Korean speakers.

~~~
yongjik
Actually, not that much for Korean speakers, because "file" becomes "pa-il"
and "fire" becomes "pa-i-eo". (Damn English triphthongs...)

Hopefully they won't launch Cloud Pyrestore any time soon...

~~~
lioeters
Similarly for Japanese, "file" is pronounced/written as "fairu" and "fire" as
"faiyaa". If the service docs are translated, then they would look like (and
sound as) different names.

------
gigatexal
premium tier iops is a fixed perf vs the per GB iops scaling of AWS, nice.

------
ebikelaw
"Typical" availability number is both low and wishy-washy.

------
danra
Given Google's history, is there any good reason to believe this service would
still be supported by Google in a few years and not be replaced by yet another
new-and-of-course-much-better-than-the-old-one iteration?

~~~
numbsafari
It's currently in Beta, which means it will hopefully eventually go GA. As
with any product from any company ever, just because it's in beta doesn't mean
it necessarily ever goes GA.

In the case of Google Cloud Platform products, many of them [1] are subject to
the deprecation policy [2]. Basically it states that they'll give you one year
advance notice of any intent to deprecate those products. This is functionally
the exact same policy as that offered by AWS [3].

[1]
[https://cloud.google.com/terms/deprecation](https://cloud.google.com/terms/deprecation)

[2] [https://cloud.google.com/terms/](https://cloud.google.com/terms/) (see
Section 7)

[3]
[https://aws.amazon.com/agreement/#2._Changes](https://aws.amazon.com/agreement/#2._Changes).

