
Firebase outages and misleading status reporting - sauldcosta
https://medium.com/@scosta/why-firebase-sucks-ce5d2302eb20
======
mattbillenstein
AppEngine had the same problems - seemingly every week some component of the
service would be down for some non-negligible amount of time (laughably it was
often search -- we're talking about Google here).

I've generally found AWS more reliable than GCP - even when GCP isn't having
downtime, you'll occasionally get 503's from their APIs, so you need to wrap
all your calls to them in retries.
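
To make the retry point concrete, here is a minimal sketch of wrapping a call in retries with exponential backoff and jitter. `TransientError` and `call_with_retries` are hypothetical names for illustration, not part of any GCP or AWS client library; a real client would translate an HTTP 503 into whatever exception you choose to retry on.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a 503-style response from a cloud API."""


def call_with_retries(func, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call func(), retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Cap the exponential delay, then pick a random wait up to it
            # ("full jitter") so many clients don't retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

Only retry calls that are safe to repeat (idempotent), otherwise a retried write can be applied twice.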

AWS has had multiple instances of cascading EBS backplane failures, but
outside of that I've found their core services pretty reliable -- 400+ days of
uptime on a lot of VMs in systems I've worked on -- I avoid EBS when I can.

My advice is to keep your stuff simple - PaaS might seem attractive, but, as
you mention, you have so little control when something goes down. Embrace
multi-cloud by using the lowest common denominator of tech available - virtual
machines, DNS, networking, and instance storage if that suits your needs.
Treat VMs as disposable - and make sure you have system, service, and data
redundancy at that level to survive the failure of an entire availability zone
across your application.

~~~
latchkey
AppEngine had some big failures early on, but I (and some friends) built a
$$$$$$$$$$$$ company on AppEngine (and GCP) and couldn't have done it without
it. The stability the last few years has been extremely good. Our base logic
was that we trust Google to hire and train talented DevOps more than we can do
it and it sure sucks carrying a pager.

~~~
mattbillenstein
Snapchat?

If your app is that big, someone is always carrying a pager for when there are
problems. The difference is on PaaS, you can't do a damn thing about it if
it's a problem with the platform.

I've helped multiple companies get off of app engine because even for
companies losing money (startups), it's too unreliable -- and actually very
slow (datastore) if your app is relational. Also, it's very very expensive if
you hit the datastore hard.

~~~
latchkey
Not quite, but first MVP in 3 months and $80m gross revenue in the first year.
Selling t-shirts. We did it with 3 engineers, no devops or qa teams and
definitely no pagers. We had zero downtime and the very rare bugs were fixed
on the next push to master (CI/CD) and real testing.

I'm not saying the datastore is perfect, but using the datastore has well
known and predictable limitations that need to be engineered for. It is
definitely not something you can RTFM later on. Just like any database to be
honest. It is not a relational database. It doesn't do aggregations. It is for
storing data (using Objectify [0]) and memcache is for caching that data.

[0]
[https://github.com/objectify/objectify](https://github.com/objectify/objectify)

------
EZ-E
We had the same exact problem with Firebase Realtime Database. Our product
uses it heavily and is dependent on its latency so we notice anytime an issue
appears.

The unacceptable thing is that not only are outages fairly common, but many
smaller, briefer outages and disruptions are not even reported. For example,
the day after the 2 hour outage mentioned in the article, there was an issue
where writes to the database seemingly succeeded, but the clients listening
for changes did NOT receive the notification that the data they were observing
had been updated, for more than 30 minutes. It wasn't reported in Firebase's
status dashboard.

Google bought Firebase back then, and to replace Firebase Realtime Database,
Google developed Firebase Firestore (now in beta). I suspect that Firebase
Realtime Database isn't receiving much attention these days and that the
service will be closed after some time.

------
xrd
Have to say, having worked in a huge organization with multiple clients
accessing services, I much prefer the firebase solution. You still have
downtime in any polyglot solution and the problem is pretty clear here (it's
firebase database, not one of dozens of legacy layers...). When you own the
entire stack it is amazing how much of the organizational effort goes into
obscuring who is responsible. And the stack is much more opaque.

It really is possible to design a system around firebase with a much smaller
team. You give up control but control is a myth anyway. And Firestore is
actually designed to support offline mode, so I wonder if they neglected to
design for that feature, which might help here.

The unfortunate reality is that we are in a moment where Firestore is beta and
Firebase Database is not supported as it should be. Google should do a better
job of helping people to migrate and explaining the roadmap. I imagine the
writer of this article just doesn't have as much company clout to get that
level of involvement from Google. This was probably an attempt to get that
attention that other higher paying clients can get.

------
pg_bot
If you need to build a product that relies heavily on real time updates, I
would look into using Elixir and Phoenix.[0] They nailed the channel
abstraction which is the main entry point for realtime communication over
websockets. It takes me hours to make scalable realtime applications in what
would normally take me days using other systems. The language may take some
time to get used to, and the ecosystem isn't as mature as other languages, but
what is there is incredibly impressive.

[0]: [https://phoenixframework.org/](https://phoenixframework.org/)

~~~
com2kid
Firebase does a lot more, including a slew of Auth options that make life much
easier.

Add to that the ability to resolve connections dropping out (common on mobile)
and that their libraries have been ported all over the place, and Firebase is
a de facto answer for mobile developers. It can be up and running in less
than 30 minutes for someone who has 0 experience in cloud development.

It is hard to replicate _that_.

~~~
pg_bot
The common use cases for firebase can be easily reproduced with Phoenix.
Phoenix also comes with a handy presence feature that allows you to track
whether someone is currently using the product. (Think which users are present
in a chat room)

I understand the skepticism, but I would highly suggest taking a look and
playing around. It's really, really good plus you get to fully own everything
you build ;)

~~~
PKop
Firebase has this presence feature as well.

Also, "fully owning" everything isn't a selling point for everyone. Some don't
want to own the uptime, manage infrastructure devops etc. Serverless/managed
services have their use cases.

Small teams, individual developers, bootstrapping an app quickly, running a
web app with no servers to manage... Oftentimes these are much more valuable
capabilities than being able to reimplement functionality already available
to you for very low cost.

------
crystaln
Firebase is really awesome. However, these kinds of reliability issues and the
lack of integrity and communication with which Google handles such things are
major reasons I would avoid committing to it. On top of that, Google's history
of overlapping products (Firebase or Firestore?) and discontinuing or
foot-dragging on support makes decisions confusing and commitment harrowing.

Amazon on the other hand has a history of committing to clear product
direction which makes committing to their platforms much easier. Amplify and
AppSync for instance feel like safer choices.

~~~
nslog
The Amplify and AppSync models are also architecturally more scalable as you
don't have one big opaque DB and endpoint in a single region.

------
dotmanish
I stopped using the realtime database once firestore was released in beta, so
I haven't experienced the downtime you have demonstrated in the status graphs,
but Firebase's SLA [1] for realtime database apparently guarantees service
credit for monthly uptime less than 99.95%. To corroborate your observations,
check if you received this credit:

Less than 99.95% but equal to or greater than 99.0%: 10% credit

Less than 99.0%: 30% credit
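
As a rough worked example of those tiers, the sketch below converts downtime minutes into a monthly uptime percentage and looks up the credit. It assumes a 30-day month and that every minute of the outage counts as downtime; the SLA's own definition of downtime may differ, and both function names are made up for illustration.

```python
def monthly_uptime_percent(downtime_minutes, days_in_month=30):
    """Uptime percentage for a month, given total minutes of downtime."""
    total_minutes = days_in_month * 24 * 60
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def sla_credit_percent(uptime_percent):
    """Credit tier from the quoted SLA: 0, 10, or 30 percent."""
    if uptime_percent >= 99.95:
        return 0
    if uptime_percent >= 99.0:
        return 10
    return 30


# A single 2 hour outage in a 30-day month:
uptime = monthly_uptime_percent(120)
print(round(uptime, 2), sla_credit_percent(uptime))  # prints: 99.72 10
```

Under these assumptions, 99.95% allows about 21.6 minutes of downtime in a 30-day month, so a single 2 hour outage would already fall into the 10% tier.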

[1] [https://firebase.google.com/terms/service-level-
agreement/](https://firebase.google.com/terms/service-level-agreement/)

~~~
joeblau
Is Firestore a more reliable version of Firebase's real-time database?

~~~
dotmanish
It's a different database altogether - document-oriented at that.

[https://firebase.google.com/docs/firestore/rtdb-vs-
firestore](https://firebase.google.com/docs/firestore/rtdb-vs-firestore)

------
nslog
Check out the AWS offerings (Amplify + AppSync) if you're rolling off
Firebase: [https://aws-amplify.github.io](https://aws-amplify.github.io)
[https://docs.aws.amazon.com/appsync/latest/devguide/welcome....](https://docs.aws.amazon.com/appsync/latest/devguide/welcome.html)

~~~
jaxondu
Amplify+AppSync client SDK support is pathetic compared with Firebase. No
official support for Flutter, Xamarin and Unity apps.

------
sampl
Not really convinced firebase is “covering it up”.

The official status page breaks down availability by-service with descriptions
of each outage and updates with timestamps.

[https://status.firebase.google.com](https://status.firebase.google.com)

~~~
dahart
> The official status page breaks down availability by-service

That’s part of the problem, actually. I’ve noticed for years that some
Firebase service disruptions go unreported, and it was clear that reporting
individual services was a way to avoid showing the end-to-end summary. It
doesn’t matter that all of Firebase’s servers are up and running, if the end-
to-end service they provide isn’t working.

~~~
Guidii
Firebase offers a variety of individual services, and most apps pick up only
the services they need. So reporting service-by-service makes more sense.

~~~
dahart
That’s true, and beside the point. The problem isn’t reporting individual
services, the problem is giving the impression that uptime for individual
services equals uptime for Firebase as a whole.
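
A back-of-the-envelope illustration of why per-service numbers can flatter the whole: if an app needs several services at once and their failures are independent (a simplifying assumption), the end-to-end availability is the product of the individual ones. The figures below are hypothetical, not Firebase's actual numbers.

```python
from math import prod


def end_to_end_availability(service_availabilities):
    """Availability of an app that needs ALL listed services to be up,
    assuming their failures are independent."""
    return prod(service_availabilities)


# Hypothetical app depending on auth, database, and hosting at 99.9% each.
combined = end_to_end_availability([0.999, 0.999, 0.999])
print(f"{combined:.4%}")  # prints: 99.7003%
```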

------
joeblau
I just built a service/website last weekend using App Engine and Firebase.
After reading these comments and this blog post, I think I might migrate it
over to AWS. I didn't realize that Firebase was so unreliable.

------
romed
Firebase RTDB is basically the legacy product that barely works. Firestore is
the post-acquisition product built on Google tech. It’s a rotten situation. I
noticed the outage mentioned in this post because it took down Ford GoBike
(and Citibike and all the other Motivate/Lyft bike share systems).

------
burtonator
I've been thinking about implementing Firebase as part of Polar:
[https://getpolarized.io/](https://getpolarized.io/)

The idea is that you update your documents (PDF, HTML, etc) into Polar, tag
them, and then we sync them to the cloud. Then when you go to another machine
like work or home your documents are always synchronized.

At first I fell in love with Firebase and was very very excited to start
implementing it.

They've spent a ton of time working on the initial implementation experience.

Their Firebase Auth support was amazingly simple to set up. Same with Firebase
hosting. It's top notch. You can be up and running with CDN hosting with SSL
in like 2 minutes and the firebase tools are exceptional.

Cloud Firestore seems really interesting and easy to set up. It's basically
designed for 'apps'. IE user-facing apps and works pretty well if all the data
is private to the user.

I do struggle with these issues of reliability though. At Datastreamer
([http://www.datastreamer.io/](http://www.datastreamer.io/)) we use Hetzner
and have about a half petabyte stored there.

It's a blog content search engine which we license to other startups so high
availability is critical.

Their infra is amazingly reliable. Very very happy here.

The problem of course is that you then have to manage your own software stack
which of course requires extra effort on your part.

------
ramkalari
How does firestore fare in terms of reliability? I heard it is a cleaner and
more scalable version of Firebase.

------
jondubois
I think that PaaS and BaaS where you don't have access to the back end is a
dead end. It's going to go the way of Windows Server. Open source solutions
will always win in the end when it comes to developers.

~~~
wild_preference
People also balked during the transition away from FTP. SSHing into servers is
precisely the thing you want to get away from whether it's to change code or
to hotfix your nginx.conf or to do a quick apt-get install.

Doesn't mean that we don't need SSH ever, but 99% of the time it's something
we use because we're too lazy to setup automation.

I reckon you're using open-source here to mean self-hosted, but that doesn't
really change anything. For example, the reason every small company I've
worked at didn't have a way to analyze their logs/stderr and correlate them
with other events for debugging was because they didn't, not because they
couldn't.

~~~
anothergoogler
"FTP -> SSH -> proprietary console" does not look like evolution over a
gradient of control to me. I don't understand why you're comparing FTP to SSH
when SSH is lower level than FTP. FTP "throw it on the server and let mod_php
deal with it" deployments were decidedly higher level than SSH-based ones. FTP
deployments were often coupled with GUI-based steps, for instance database
migrations run from a Drupal web app.

------
novaleaf
I think that now, Firebase is built on Google Cloud Datastore. I have used
Datastore in production since 2015, and have had no outages, but if I had to
do it again I think I'd go normal RDB, just because query support is extremely
limited (no full text search) and "schema change == data rebuild" issues.

~~~
dfee
Do you mean that you’d use something like a managed Postgres AND build and run
a backend service that interfaces a web client to that database?

~~~
novaleaf
yeah exactly.

------
ankit219
Yeah, we have suffered too. Initially we were using Firebase Realtime Database
for authentication as well as delivering messages. Messages suffered outages
every now and then (and we suffered more because our backend is in Python
Django and Pyrebase comes with its own set of issues on top of Firebase). When
we found out messages weren't being delivered, we switched to Pusher as a
backup first and then to websockets. Now we use Firebase only for
authentication (via the realtime database) and notification sending, and we
still have a backend/app trigger every time there is an error on Firebase.

I have always wondered what a reliable backup to the realtime db could be. I
haven't found much to date.

~~~
nslog
AWS AppSync and Amplify

------
iamleppert
Serves you right for using a “real-time database” (whatever that is). I’m sure
your chat product feature could have been designed using a flat file as a
datastore and a simple web socket server.

~~~
dang
Please don't be a jerk on Hacker News. The idea here is: if you have a
substantive point to make, make it thoughtfully; if you don't, please don't
comment until you do.

[https://news.ycombinator.com/newsguidelines.html](https://news.ycombinator.com/newsguidelines.html)

