There is something I don't understand about Firestore: if you use the web client, your JavaScript code communicates directly with the Firestore database, so you have no control over per-user limits. How do you prevent a rogue actor from effectively doing billions of reads or writes?
With server-side logic you can implement rate limits, but not with the Firestore web client. Is the only thing you can do to limit your monthly budget? So one bad guy can burn all your money for a month without you having any way of limiting it?
In these kind of systems, your app users also become database users and effectively each visitor gets their own token to access the database.
You then add rules to limit users so they can only read/write their own data. This works fine for many simpler scenarios and avoids the need for a backend completely, with the potential risk for overuse.
If you need more control and protection then you should have a backend layer to get the data server-side.
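To make the "users can only read/write their own data" idea concrete, here is a minimal sketch of the check such a rule expresses. It is written in TypeScript as a plain predicate and is not Firestore's rules language; the `users/{ownerId}/...` path layout and the names `AuthContext`, `ownerIdFromPath` and `isAllowed` are assumptions made for illustration.

```typescript
// Hypothetical model of an "owner-only" access rule. This is NOT Firestore's
// rules syntax, only the logic such a rule expresses.
type AuthContext = { uid: string } | null; // null means an unauthenticated visitor

// Assumed layout: every user's documents live under users/{ownerId}/...
function ownerIdFromPath(path: string): string | null {
  const parts = path.split("/");
  if (parts.length < 2 || parts[0] !== "users" || parts[1] === "") {
    return null;
  }
  return parts[1];
}

// Allow access only when the caller is signed in and owns the document.
function isAllowed(auth: AuthContext, path: string): boolean {
  if (auth === null) {
    return false;
  }
  const ownerId = ownerIdFromPath(path);
  return ownerId !== null && ownerId === auth.uid;
}
```

Note that a check like this controls who can touch which document; it says nothing about how often they do it, which is the overuse risk raised above.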
By "In these kind of systems, your app users also become database users", do you mean by RBAC or some other session-based control on the backend, or does Firestore perform some intelligent user entity resolution I'm missing?
As far as I can tell, any naked frontend-only client serving content without its own access management sounds quite open to a single bad actor spawning a large number of sessions on the same host, and on a small number of distributed hosts, to eat up all available resources. I'm not inclined to ever trust a number of users behaving nicely at n>100 personally.
Given that Firestore appears to advocate "serverless" client-driven applications, unless there's some foolproof DDoS mitigations I'm finding it a hard sell, especially on the whole "if you need more control and protection, do X" argument - you rarely need that until somebody straight-up exploits you, and when you do, you're not going to be particularly sympathetic towards Google's marketing speak.
I struggled with this. We implemented Firestore (successfully) in a production mobile game for the real-time chat and configuration aspects. The SDK WebSocket implementation for real-time updates was a huge factor here.
At the time we were using the Firestore beta, Security Rules were also very poorly documented with no tooling apart from the editor (beta product - I know, I know).
Ultimately we found several vulnerabilities in our own implementation of Firestore that were difficult or practically impossible to resolve on our end. The workaround was to communicate with Firestore only by using the admin SDK.
I still completely vouch for Firestore and Firebase as a product.
You have control over user specific limits. You can use Firestore Security rules, and if that doesn't do it you can create a service that runs server-side code to do what you want and then have the client hit that service.
I think the main problem is that they bolted security (access control) on top of their existing system, rather than make it an integral part of the design.
We built it specifically for Cloud Firestore and as an integral part of the system from the beginning. We always like to hear where we can do better to help prioritize the next set of improvements, and would love to know what gives you the impression of being bolted on.
I’m building an app at the moment with Firestore and 99% of it is amazing or better. It’s a really fantastic tool and y’all are enabling people like me who are less adept to make good stuff. So thanks!
Two minor notes (one on security rules):
(1) in the tutorials I’ve seen, the security rules have received little attention and the documentation I’ve read is a bit confusing about the spectrum of possibilities for security rules. Tbh, I’ve punted security rules altogether until I’m closer to “beta ready” as opposed to writing them along the way. I suspect this is bad practice but haven’t found a better way. Much of this is my fault lol.
(2) “presence” systems still require the use of Real-Time DB. +1 feature request to support user connect/disconnect events in Firestore.
firebaser here. Completely agreed that we should do a better job of documenting the security rules. They're very powerful, but a huge paradigm shift for most developers, one that is too easily ignored (at your own and your users' peril). We've already released a bunch of improvements to the documentation, and continue to work on adding new documentation, video tutorials, and console improvements. Please keep telling us where we can/should improve security rules further though.
And yes, also agreed on needing a solution for presence system, either in Firestore itself, or separate from it.
That would be a really weird user experience given how long this has been in Beta. Google has had over 1.5 years to get access control right, so I bet (hope) this is not an issue for new users.
Limiting writes would be fairly easy. You could have an onWrite Function trigger[0] that writes activity to a rate limit Firestore document. When new write requests come in, the Firestore rules can read that rate limit document and decide if the particular write should be allowed or not.
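A minimal sketch of the bookkeeping this scheme needs, assuming a fixed-window counter per user. An in-memory `Map` stands in for the rate limit document, `recordWrite` plays the role of the trigger and `isWriteAllowed` plays the role of the rule; the names, the window length and the budget are all hypothetical, and no Firebase API is called.

```typescript
// In-memory stand-in for the per-user "rate limit document" (hypothetical shape).
interface RateLimitDoc {
  windowStart: number; // ms timestamp at which the current window began
  count: number;       // writes recorded so far in the current window
}

const WINDOW_MS = 60000; // assumed window length: one minute
const MAX_WRITES = 5;    // assumed budget per window

const rateLimitDocs = new Map<string, RateLimitDoc>();

// Role of the trigger: record that `uid` performed a write at time `now`.
function recordWrite(uid: string, now: number): void {
  const doc = rateLimitDocs.get(uid);
  if (doc === undefined || now - doc.windowStart >= WINDOW_MS) {
    // No document yet, or the old window expired: start a fresh window.
    rateLimitDocs.set(uid, { windowStart: now, count: 1 });
  } else {
    doc.count += 1;
  }
}

// Role of the rule: read the document and decide whether to admit the write.
function isWriteAllowed(uid: string, now: number): boolean {
  const doc = rateLimitDocs.get(uid);
  if (doc === undefined || now - doc.windowStart >= WINDOW_MS) {
    return true;
  }
  return doc.count < MAX_WRITES;
}
```

One caveat under these assumptions: if the counter is only updated after a write has already landed, a fast burst can get past the check before the count catches up, so this bounds sustained abuse rather than every individual write.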
Limiting reads is interesting. I think generally the idea is data is cheap enough that most people don't have to worry about it. If you do you could gate access through an API in a Function.
This is a good question. I think a more general feature it's lacking is auditing (as far as I've been able to find), from which you could judge whether a user was abusive and revoke permissions. There are profiling tools, which you might be able to hack into something for auditing if you were so inclined: https://firebase.google.com/docs/database/usage/profile
How would an automatic system even prevent unwanted deletion of database entries or that users access others' data? Maybe I am lacking imagination but without server-side validation the practical use cases seem very limited.
When I last looked at Firebase, it seemed like they bake auth into the database, so in this case a user is limited in what data they can read, like a select where uuid === uuid https://firebase.google.com/docs/database/security/user-secu.... And I assume non-authenticated read requests don't have access to everything...
Cloud firestore supports placing rules on the database to prevent access by users that aren't authorized. You can independently control get, write, update, list, etc. The way the rules work can sometimes be irritating, in particular for list queries, but they do work.
In terms of rate limiting the only one I'm aware of is writes to a single record are rate limited. This impacts normal development when making counters that need to be updated, which leads to an odd "sharded counter" model.
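For readers who haven't met the "sharded counter" model: the idea is to spread increments across several documents so no single one absorbs every write, then sum them on read. A minimal in-memory sketch, where an array of numbers stands in for the shard documents and the class name and shard count are assumptions:

```typescript
// In-memory stand-in for N shard documents belonging to one logical counter.
class ShardedCounter {
  private shards: number[];

  constructor(numShards: number) {
    if (!Number.isInteger(numShards) || numShards < 1) {
      throw new Error("numShards must be a positive integer");
    }
    this.shards = new Array<number>(numShards).fill(0);
  }

  // Each increment touches one randomly chosen shard, so concurrent writers
  // rarely hit the same record.
  increment(by: number = 1): void {
    const index = Math.floor(Math.random() * this.shards.length);
    this.shards[index] = (this.shards[index] ?? 0) + by;
  }

  // Reading the total means reading every shard and adding them up.
  total(): number {
    return this.shards.reduce((sum, n) => sum + n, 0);
  }
}
```

The trade-off is visible in the sketch: writes get cheaper to spread out, while every read of the total now costs one read per shard.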
For a subscription based Chrome extension (https://www.checkbot.io/), I use Firestore, Firebase and Cloud Functions to handle subscription activation and login/authentication. I use Paddle for payments which fires a web hook to a Cloud Function when a purchase is made.
Works great so far for no cost and it hasn't required a single bug fix or any maintenance since launch. Firestore definitely has a lot of surprising caveats though and you need to design your app and data around this to avoid trouble later.
Does anyone have any stories to share about when they outgrew Firestore, what they migrated to and how? I wouldn't be keen to use NoSQL if my data became more complex.
Paddle's pricing seems a bit high at 5% + $0.5. Is there a reason why you opted to use Paddle over Stripe? Curious because I'm similarly working on a Firebase MVP at the moment and looking to integrate a donation feature.
EDIT: nvm, you answered the Stripe vs Paddle question in another reply.
It does seem high but even doing tax returns is annoying enough so I've no desire to get involved with EU VAT and tax issues selling to other countries. This way I can set up payment and it runs itself which is worth the cost and Firebase means I don't have to think about DevOps too. Gumroad is another payment option to consider.
I couldn't find one for Paddle to use but I'm sure I saw a few for Stripe. I wanted to use Paddle because they do EU VAT for you. I could be tempted to open source it though.
The gist of it is when a subscription is purchased a web hook triggers a Firebase Cloud Function that stores the subscription data (i.e. email address + subscription expiry date) in Firestore then when someone logs in via Firebase authentication you check if the email address is linked to a valid subscription. When the next payment is made to the subscription, you bump up the subscription expiry date. I've found there's little that can go wrong once it's up and running.
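The flow described above can be sketched roughly as follows. This is an in-memory approximation under stated assumptions: a `Map` stands in for the Firestore collection, `handlePayment` stands in for the webhook-triggered Cloud Function, and `hasValidSubscription` stands in for the check made at login. None of these names come from Paddle or Firebase.

```typescript
// Hypothetical record shape: email address plus subscription expiry date.
interface Subscription {
  email: string;
  expiresAt: number; // ms timestamp
}

// In-memory stand-in for the collection that holds subscriptions.
const subscriptions = new Map<string, Subscription>();

// Normalise emails so "A@x.com " and "a@x.com" map to the same record.
function normalizeEmail(email: string): string {
  return email.trim().toLowerCase();
}

// Webhook role: called for the first purchase and for every renewal.
// A renewal only ever moves the expiry date forward.
function handlePayment(email: string, paidThrough: number): void {
  const key = normalizeEmail(email);
  const existing = subscriptions.get(key);
  const expiresAt =
    existing === undefined ? paidThrough : Math.max(existing.expiresAt, paidThrough);
  subscriptions.set(key, { email: key, expiresAt });
}

// Login role: is this email linked to a subscription that has not expired?
function hasValidSubscription(email: string, now: number): boolean {
  const sub = subscriptions.get(normalizeEmail(email));
  return sub !== undefined && sub.expiresAt > now;
}
```

Keeping the webhook idempotent (replaying the same payment leaves the expiry unchanged) is a design choice of this sketch, made because webhooks are commonly retried.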
That's really useful, thanks. As is the pointer to Paddle. I think they were missing some features when I last checked them (a long time ago, or I just missed it), but they seem to have everything covered now. Fantastic! Saves some code I've been putting off writing.
Would also love to hear more about how this is done... I'm in the process of setting up Stripe subscriptions on a website, but have been scratching my head over the best way to accomplish everything without setting up my own back-end server.
I'm an engineer at Stripe and thinking about this problem - drop me an email? In profile. Would love to run a few ideas past you (or others frustrated by this).
1. Disappointing that it has gone to GA without providing proper search (after a long beta). Can anybody explain why a service run by the world's #1 search company continues to point users to third-party services if you want to implement a basic text search in a database? I genuinely don't understand that.
2. Feature requests (or complaints) re backup, queries and documentation are not new. Nor are the answers (or excuses) we see here, which revolve around scalability and (until now) being in beta. BUT all users I have ever heard say they use the product for speed & ease of setup & convenience - for MVPs, not for massive googlesque data. It almost feels like the product market fit is not quite right. So, forgive my technical ignorance here, but worst case scenario, why not provide the features with the caveat that they are slow or expensive or won't work above a certain size of db? Isn't half a solution better than no solution?
The database and whole Google/Firebase app suite thing has some strong points. But to be frank, and I'm sorry if I'm being dismissive of hard work and technical wonders here, from the perspective of a customer and outside observer, a number of things smell quite off.
1. Mainly because it's both a different search problem (general DB vs specific to web search) and hard engineering wise given our model; we implement not only the cloud database, but embedded versions for iOS, Android, and Web - not to mention real-time functionality and tailoring it to how our index engine works, etc. While we have a lot of customers and use-cases that don't need Fulltext Search, we totally agree it's important and have done explorations on how we'd deliver something along these lines.
2. Agreed. During the beta program we've delivered the managed export and import service for backups, added array-contains capabilities to queries, and have got close enough to delivering Collection Group queries to mention them as part of GA. For documentation, our tech writing team has done a lot of updates, new pages, and fixes, and we know there is always more to do. Cloud Firestore is definitely used in production and at scale by our customers, and with nearly 1 million databases being created the range of use cases and traffic/load patterns has been vast. Our beta program involved working with a lot of them to improve things like hardening and scalability to ensure we can meet our 5 nines of availability SLA.
"Isn't half a solution better than no solution?" -> In a lot of cases, absolutely not. A half solution that falls over when you tip a certain point of scale can result in extended downtimes, since the solution often ends up being "we need to completely rearchitect this", which isn't easy or quick when your business is out of commission.
"from the perspective of a customer and outside observer, a number of things smell quite off." -> Sorry to hear this, I can only hope the continued hard work from the team will turn you around.
Thank you for the answer. I have to admit it doesn't quite sway me, for reasons such as those below, but thank you.
E.g., yes, I realise they are different search problems, but I'd presume that Google is nonetheless well equipped to handle the document DB one. The only apps I can imagine that couldn't benefit from a search box are games. Anything content-focused or ecommerce-focused needs one, and the majority of utilities benefit too (yes, I can do chat without search, but it certainly benefits from being able to search through chats). Any examples?
Yes, I realise having to do Backend/iOS/Android/Web is hard (as it is for everybody else), but in the on-device cases at least the db is smaller.
I'm sure you do have big users, I didn't mean to imply otherwise, but with my admittedly very limited knowledge I'd still wager that a majority do not see uptime and scalability as the most urgent improvements, but rather those we are discussing. In our case, give us just 2 9s of uptime, and give us the above queries and searches even if 2x as slow and expensive as you'd like them to be, and limited to a db the size of an average relational db, and that would beat the extra 3 9s of uptime and the super scalability any day. Not least because (I don't mean to be rude here, just candid) if we were ever to reach a point where we needed that massive scale and uptime, I'm not sure I'd be keen to trust Google with user data.
To be clear - I like several things about Firebase/Firestore, which is why we use it and why I'm insisting on badgering you here. I just wish I could be completely comfortable with my choice rather than wondering every day if I shouldn't just use something else.
Curious - what do you use? any suggestions of good serverless NoSQL? with the built in auth and extras Google/Firebase offer? What about relational? I've used Back4App, but with it being small and FB having killed Parse, it wasn't right.
Have you looked at Sanity [1]? You get both a fast cloud backend (with a powerful GraphQL-like query language with joins, transactions, object-level security, text matching, real-time change notification, etc.) and a collaborative editing front end (optional, but recommended!). The latter is open source and modular. You can deploy serverless just like Firebase/Firestore. Gatsby is a popular way to serve Sanity.
Disclosure: I've contributed to their database tech, but I don't work for Sanity. Mostly just a fan.
Been loving Firestore! It has been my first real experience w/ NoSQL in an MVP to production-ready quickly. It's been SO easy to experiment with and learn. Community has been great.
We were using RTDB and migrated to Firestore. The querying limitations in RTDB are really painful.
Transitioning was not that difficult. Basically wrote a script in node.js to migrate the data. (our db size is <100M)
Anyone with inside knowledge of Firestore care to comment on the ETA of more advanced query functionality?
We have played with Firestore quite a bit, but rely heavily on the ability to do aggregate queries. Reading all of the documents and performing this on the client side is nowhere near good enough. Nor is triggering functions to update a "count" or "sum" property on a doc.
Edit: Looks like a PM answered on another thread...
"It's a point of internal discussion on scalable ways to achieve this, but nothing we can promise. We definitely see the need for it."
I get that it isn't supported. We have followed the recommendations mentioned in the docs but found the usage of these workarounds to be sorely lacking in performance and reliability.
As a student developing a system for a friendly NGO I must say that having angularfire integrate with Angular made this an instapick because I wanted to get started asap.
The issue now is that it really feels like the docs need more examples. I'd take 4-6 hours to add a new feature and I'd spend them reading the docs. Sometimes I'm not sure if I don't get something or it's a missing feature.
I guess the community is still growing but it's really hard to find answers to some simple problems.
angularfire has an inactive gitter and there isn't one for firebase specifically. It would be really amazing if firebase devs could take some time to answer the community on gitter or something, at least at the beginning, before there are more people who know the answers.
EDIT: Found your link to the google group. I guess I'll ask there then :)
AngularFire lead here, I'd recommend you check out our GDE Jeff Delaney's community here https://angularfirebase.com for AngularFire related content. He has tutorials, lessons, and maintains a very active Slack Channel.
Thanks for the feedback. Feel free to send any feedback on missing examples/tutorials that would have made things easier to our discussion group (https://groups.google.com/forum/#!forum/google-cloud-firesto...). Our tech writers see the feedback and use it to help plan their work.
any chances you and your team are working on a GraphQL access layer for the Firestore?
It feels like nested queries(using refs for relations), basic mutations and (especially) graphql subscriptions would be a great match with Firestore and a big improvement in dev experience, imho.
I have created a GraphQL layer over the top of Firebase and while it was pretty easy to do so using Cloud Functions, I agree that it would be great if Firebase offered this out of the box without needing to do all of the work you currently have to, to use it.
I'm a big fan of firestore and live in Colorado. Are there any community events that you host or recommend that would allow me to talk to you, other people at Google, or other firestore community members?
Hi, I’ve enjoyed using Firestore so far, but one aspect I find very limiting is the fact that you are limited to one Firestore database per Google Cloud project. Are there any plans to change this? Allowing multiple Firestore databases would make setting up staging and testing environments much easier.
Hi there. :) I'm an engineer on the Cloud Firestore team. I can't speak to if or when we'll be able to remove this restriction, but please know that the folks working on this (like me) are very aware of what a pain in the butt this is. I'm sorry that I can't give you a more satisfying answer, but I hope we'll be able to change this soon. Thanks for being a customer!
I love Firestore. I use it pay-as-you-go to push a lot of bytes, even some logs so I can watch them sync to the browser in real time. It scales really well and it is cheap.
I've used Cloud Firestore extensively in some of my personal projects. I also have lots of feedback and suggestions I'd love to share with you if interested. Is there still a way to participate in this program or get in touch with you?
What open source components does Cloud Firestore use?
Can you point towards papers that describe the algorithms and data structures that Firestore uses? What infrastructure does it rely on to perform reliably?
The backend is almost entirely custom, though many common low-level open-source components are used pervasively. For example, for components written in C++ we use Abseil, Google Test, etc.
The SDKs are all open source and make use of other open source components:
You can find language-specific libraries for many popular languages under the GoogleAPIs org on GitHub. To install, go to the de facto package manager for the specific language.
I'm interested in learning about Firestore architecture and how it solves difficult problems to understand and trust it. I'm not interested in client libraries.
Why isn't there a hosted, easy to use relational version of Firestore? I'm using Firestore right now and it's absolutely frustrating in the lack of relations, the lack of accumulation queries (COUNT, MAX, MIN) and in general the limitations of storing what's essentially arbitrary JSON. How is it that to count the number of entries you either need to build in a counter on your own (which can cause race conditions because counters are hard) or manually go through the entire collection and count?
Another option would be a typed JSON db; essentially you could store JSON that corresponds to typed structs a la serde. Wouldn't solve a lot of my problems, but at least I'd have some built-in validation.
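On the "counters are hard" point above: the race in question is the classic lost update, where two clients read the same value and each writes back "value + 1". A minimal sketch, simulated in memory with a hypothetical `version` field used to detect concurrent modification. The guarded write is similar in spirit to what a transaction gives you, but it is an illustration, not Firestore's implementation.

```typescript
// In-memory stand-in for a single counter document. The `version` field is a
// hypothetical device for detecting that someone else wrote in the meantime.
interface CounterDoc {
  count: number;
  version: number;
}

interface Snapshot {
  count: number;
  version: number;
}

function read(doc: CounterDoc): Snapshot {
  return { count: doc.count, version: doc.version };
}

// Naive write: blindly stores "what I read, plus one".
function naiveWrite(doc: CounterDoc, seen: Snapshot): void {
  doc.count = seen.count + 1;
  doc.version += 1;
}

// Guarded write: commits only if nobody wrote since `seen` was read.
function guardedWrite(doc: CounterDoc, seen: Snapshot): boolean {
  if (doc.version !== seen.version) {
    return false; // stale read: the caller must re-read and retry
  }
  doc.count = seen.count + 1;
  doc.version += 1;
  return true;
}

// Retry loop around the guarded write, bounded so it cannot spin forever.
function safeIncrement(doc: CounterDoc, maxAttempts: number = 5): boolean {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (guardedWrite(doc, read(doc))) {
      return true;
    }
  }
  return false;
}
```

If clients A and B both call `read` and then both call `naiveWrite`, the counter goes up by one instead of two. With `guardedWrite`, B's stale write is rejected and has to be retried against the fresh value.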
An earlier commenter noted that one of the features of Firestore is user-level security, i.e. clients can only access their own data.
This makes sense for many apps in a document-based/NoSQL model, and may let you avoid any need for your own back end.
But it makes less sense when you start introducing relations and data models become more complex.
e.g. pretty soon we'll need to join to a lookup table. That presumably means that that lookup table needs to somehow be opened up to all users, while for other tables the user can only access their own data. Starts getting complex real fast.
At what level? Because I know StackExchange runs off of relational databases and they get a solid amount of traffic (with some Redis, but the foundation is relational). Even if relational doesn't scale for Facebook or whatever, if I'm making a piddly little chrome extension or a small website, a simple relational database will scale just fine.
Firebase is designed around syncing, and I imagine that has something to do with it.
If you want a relational database for just one user's data, you could read a SQLite data file over the network when your user opens the app, perform SQL queries over the data in memory, then write the entire database file back across the network when the user hits a "save" button.
But you probably don't want a "save" button, you just want the user's state to be constantly updating back to the server.
And you probably don't want to write back the entire data file every time you save, just the parts that have changed.
So Firebase is optimized to solve these problems, and I bet the data structures they use are not also optimized for supporting all kinds of relational queries.
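The "write back just the parts that have changed" idea can be illustrated with a small diff between two versions of a flat document. This is only a sketch of the general technique, with hypothetical names (`Doc`, `Patch`, `diff`, `applyPatch`); it makes no claim about how Firebase represents or transmits changes internally.

```typescript
// A flat document with primitive field values (a simplifying assumption).
type Doc = Record<string, string | number | boolean | null>;

// What would be sent over the network instead of the whole document.
interface Patch {
  changed: Doc;      // fields that were added or whose value differs
  removed: string[]; // fields that no longer exist
}

// Compute the smallest patch that turns `before` into `after`.
function diff(before: Doc, after: Doc): Patch {
  const changed: Doc = {};
  const removed: string[] = [];
  for (const key of Object.keys(after)) {
    const value = after[key];
    if (value !== undefined && (!(key in before) || before[key] !== value)) {
      changed[key] = value;
    }
  }
  for (const key of Object.keys(before)) {
    if (!(key in after)) {
      removed.push(key);
    }
  }
  return { changed, removed };
}

// Apply a patch on the receiving side without mutating the original.
function applyPatch(base: Doc, patch: Patch): Doc {
  const result: Doc = { ...base, ...patch.changed };
  for (const key of patch.removed) {
    delete result[key];
  }
  return result;
}
```

For a document where one field out of many changed, the patch carries a single field, which is the saving the comment above is pointing at.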
While the move seems to be towards NewSQL databases (Spanner/CockroachDB), the answer to relations in NoSQL is to model the data differently. This can generally resolve a lot of the problems inherent in the need for relations.
If Cloud Spanner is too big, then you'll almost certainly be well served by Cloud SQL (fully managed MySQL and PostgreSQL): http://cloud.google.com/sql/
Firestore is literally built on top of spanner. So yeah, that's the answer. It's definitely pricey at first glance but it's a fixed cost based on nodes, compared to say datastore where you pay for storage+reads+writes. Different use cases.
Having heavily used Firebase Realtime Database (firestore's ancestor), I think I will approach this one very carefully.
Firebase Realtime Database was a nightmare with frequent downtime, sometimes minutes, occasionally an hour. Also almost weekly, all clients sometimes wouldn't get notified of document changes which was crippling for our app.
Firebase Realtime was basically a MySQL instance which Google ended up with when they bought the Firebase ecosystem. Not scalable at all. You see hints about this in some places when they talk about using it in production. I am sure some Google engineer went nuts when he saw what they have to support now.
Firestore is a complete engineered Google solution based on the Google Datastore but with added features from recent years (spanner, cloud functions). This is very obvious if you compare the limits between Datastore and Firestore.
Hi EZ-E - completely understand the concern. Cloud Firestore's architecture is very different and engineered to be an extremely highly available system backed by a 5 nines SLA. It has no scheduled downtime either.
I've been using cloud firestore for 6 months now and haven't had an issue with it. Designing around their rules system has been annoying, and the lack of backups bit me once pretty hard (I built https://firesafe.app as a result), but other than that it's been great.
An export is one component of a backup solution. :) Scheduling is another (https://firebase.google.com/docs/firestore/solutions/schedul...). (Checking restores is critical, too; that's also feasible [managed import into a new cloud project] but requires a bit more legwork.)
I still haven't seen a great reason to go NoSQL over, say, Postgres. I'll think about a good application but then realize that it'll be a PITA to do something slightly different than what I first imagined.
Usually the biggest reason is easier scaling. By taking a different approach to data, it's often easier to scale across data centers etc. The second is the shape of your data. If most requests are simple key/value queries, a much simpler model can work better, more so if you can keep all the data for a given query together (document DBs in particular); this can have performance benefits over a typical normalized database.
Another is when you need read and more specifically write performance that a single system cannot keep up with. When you hit these boundaries, it gets interesting. Sometimes it's just easier to design for such a system up front.
If it's an internal application SQL first is probably fine, if it's SaaS you may want to look at alternatives.
I'd love to use Postgres, but Firebase is really nice in that it provides first class libraries that abstract away the requests and querying, to the point where querying Firestore just feels like calling an async function on your front end. Plus, you don't even have to think about deploying. There's something really great (and risky) about having an always deployed, always ready backend. Also basic user auth is really easy.
If you're doing a long term project, then Postgres makes sense since the deployment/setup costs are one time. But for short projects Firebase is very nice.
All other things being equal, I wouldn't go NoSQL over Postgres.
What makes Firestore interesting to me (not used it in production) is that you can avoid the need for a backend completely. Your normal architecture is Client <-REST-> Backend <-> DB. You can avoid all the deployment and development in the backend by doing the work in the client.
"More features coming soon. We're working on adding some of the most requested features to Cloud Firestore from our developer community, such as querying for documents across collections"
Looking forward to it. If this comes in, pretty much it removes the need for creating top level collections or am I missing anything?
I've been using Firestore in a React app and I love how it has simplified everything. There's no need to have redux or even an in-memory store in the "context". I just wrap my components with a withFSQuery helper and it automatically updates even when the state changes "server-side".
I didn't use any library. Seemed easy enough to just create a few helpers. I rely on Firebase completely for the data. Since Firebase already keeps a local client cache as well as an offline cache, I didn't see any need to maintain separate state in React, outside of the HOC components. They subscribe automatically to the data when the component mounts and unsubscribe when the component is destroyed. Any changes, whether from within the app, from the server, or from another instance of the app, are synced everywhere for free.
The only local state I have at the app level is state that doesn't get stored in the DB.
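The subscribe-on-mount, unsubscribe-on-destroy pattern described above can be sketched without React or Firebase. Here `LiveValue` is a hypothetical stand-in for a live data source whose `subscribe` returns an unsubscribe function, and `Widget` is a hypothetical stand-in for a component's lifecycle; the commenter's actual helpers are not shown in the thread.

```typescript
type Listener<T> = (value: T) => void;

// Hypothetical live data source: pushes the current value and every update.
class LiveValue<T> {
  private value: T;
  private listeners = new Set<Listener<T>>();

  constructor(initial: T) {
    this.value = initial;
  }

  // Registers a listener, delivers the current value immediately, and
  // returns a function that removes the listener again.
  subscribe(listener: Listener<T>): () => void {
    this.listeners.add(listener);
    listener(this.value);
    return () => {
      this.listeners.delete(listener);
    };
  }

  set(value: T): void {
    this.value = value;
    this.listeners.forEach((listener) => listener(value));
  }

  listenerCount(): number {
    return this.listeners.size;
  }
}

// Hypothetical component: holds no data of its own beyond what it last rendered.
class Widget {
  rendered: string[] = [];
  private source: LiveValue<string>;
  private unsubscribe: (() => void) | null = null;

  constructor(source: LiveValue<string>) {
    this.source = source;
  }

  mount(): void {
    this.unsubscribe = this.source.subscribe((value) => {
      this.rendered.push(value);
    });
  }

  unmount(): void {
    if (this.unsubscribe !== null) {
      this.unsubscribe();
      this.unsubscribe = null;
    }
  }
}
```

The point of returning the unsubscribe function from `subscribe` is that the component never needs to remember which listener it registered; it only keeps the one callback to call on teardown.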
Polar is basically a document management and annotation platform. You put all your reading in it and maintain it long term along with annotations, highlights, etc.
Firestore is really nice in that you can target multiple-platforms pretty easily. There are SDKs for basically every platform.
It's definitely not perfect but I'm pretty happy with the decision.
Will there ever be support for aggregate queries? Or queries for non-existent values?
I have a rather large dataset that's tough to scan over and I find the functions/transactions to build my own aggregates not entirely accurate enough since a document has rate limits for writes.
I totally understand it's a complex issue. Is there some way I could subscribe to updates regarding this subject for the future? (Even if it's 2 years away I would enjoy testing any potential developments here.)
Hi there! One thing you might consider is using a function to replicate your Firestore data to BigQuery, depending on the value the latter would add. (Very fast OLAP; I'm a big fan.)
The issue we had with Cloud Firestore is the inability to _really_ query documents through the Firebase console. It’s very basic and there is no third-party tooling available yet.
To run any kind of specific query you’d need to handle that application side. Just something to consider.
This service feels pretty shoehorned into the rest of GCP. There are little connectors everywhere for Firestore specifically, and they don't fit in cleanly with the rest of GCP offerings. Why is Google so bullish on this thing? I must not be seeing the appeal.
Where do you feel it is shoehorned in? Firestore is easy to access from App Engine and Cloud Functions (and GKE/GCE as well) using the built-in service accounts, it has managed export into BigQuery. Would love to know where you found friction when using it with other GCP services.
> Why is Google so bullish on this thing?
While I can't speak for Google as a whole, I'm personally bullish on Firestore for a few reasons:
1) Gives you the same "client-side only" model that Firebase popularized. This makes creating applications much faster as you remove the whole server side component and basically all your ops work.
2) True "serverless" pricing and ops. Because you only pay for reads/writes and not instance time, your costs scale linearly with your app's usage. And you don't have to worry about sharding or other operationally complicated things to scale your database as you grow. The caveat here is you have to structure your data and queries well, otherwise your costs will skyrocket [0].
3) Gives you the same "server side" functionality as Cloud Datastore (which Firestore is basically the next generation of [1]). This means you can use Firestore in place of a traditional NoSQL database like MongoDB.
4) Strong consistency. One of the biggest problems with Cloud Datastore (and most NoSQL databases) is eventual consistency. Datastore worked around this with "Entity Groups" which in my opinion were super clunky to work with and very limiting. Firestore is strongly consistent so you don't have to worry about this even at scale, which is super nice.
5) GCF Triggers. The fact that you can trigger a cloud function when something is written to the database is very powerful, it's basically like a traditional database trigger or stored procedure but you can do anything.
The biggest feature gap for Firestore is the querying abilities. While it is way better than the original Firebase DB, it's nowhere close to a relational database or something like Cloud Spanner. The team is working on it though.
The "real time" stuff is interesting but not really super relevant to the things I like to build.
I mentioned it in a comment above, but one way in which it feels shoehorned in is the fact that you can only have one Firestore per Google Cloud project. Having only one Firestore per project makes setting up test or staging environments a hassle.
It's not internal corporate lingo, it's how businesses that produce products plan the lifecycle of their products, and while it's commonly used in software, it's a very standard term across verticals.
It's technically possible for us to build that, but we're focusing on the most common requests at this time. If anyone has that please let your account reps know (they feed us lots of great feedback for prioritization), or start a discussion on our Google Group: https://groups.google.com/forum/#!forum/google-cloud-firesto...
Yes, this is what we like about RDS/DynamoDB/CosmosDB. Being able to run low-latency replicas (bonus if they accept writes) is important for distributed apps, but the current multi-region deployment adds too much overhead.
We are... We're not massive yet but we have about 250 users who've updated about 20GB since our launch. Still really early as we haven't had a massive launch yet.
I find the limitations of their query language interesting, no way to test for undefined properties in a collection (the client libraries just give up, and I don't see a workaround); and no way to match on inequality rather than equality. This exists in some other odd database engines, but it's frustrating in general use scenarios (where, in this case, I don't exactly see how performance is a limiting factor).
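For the missing inequality match, one workaround commonly suggested for range-indexed stores is to split "not equal to X" into a "less than X" query and a "greater than X" query and merge the results. A minimal in-memory sketch of that idea, where the `User` shape and the helper names are assumptions and the two filters stand in for two separate range queries:

```typescript
// Hypothetical document shape used only for this illustration.
interface User {
  id: string;
  age: number;
}

// Stand-ins for two range queries that an ordered index can answer.
function whereLessThan(rows: User[], age: number): User[] {
  return rows.filter((row) => row.age < age);
}

function whereGreaterThan(rows: User[], age: number): User[] {
  return rows.filter((row) => row.age > age);
}

// "age != X" expressed as the union of the two ranges on either side of X.
// The ranges cannot overlap, so no de-duplication is needed.
function whereNotEqual(rows: User[], age: number): User[] {
  return [...whereLessThan(rows, age), ...whereGreaterThan(rows, age)];
}
```

This costs two queries instead of one, and the merged result is ordered by range rather than by any single sort key, so it is a workaround rather than a replacement for a real inequality operator.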