
Go reliability and durability at Dropbox - astdb
https://about.sourcegraph.com/go/go-reliability-and-durability-at-dropbox-tammy-butow
======
tlb
What does "Reliability of 99.9999999999% (twelve 9s)" even mean? Obviously you
have to exclude large classes of user-visible failures (network outage,
account over quota) to achieve that. I don't think they're claiming less than
0.00000000001% chance of a zombie apocalypse/Mad Max/ex Machina/asteroid
impact end-of-times situation. So just what failures are counted?

For comparison, public telephony systems aimed for five 9s. That was usually
expressed as "20 minutes downtime over 40 years, combined hardware and
software budget, for outages affecting more than 32 users." One software crash
requiring human intervention would count for more than 20 minutes, so you were
allowed <1 of these in 40 years system lifetime.
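
As a rough check on the arithmetic behind these "N nines" figures, here is a minimal sketch that converts an availability target into a downtime budget. The function name `downtimeMinutes` and the 365.25-day year are assumptions made for illustration, and it treats availability purely as a fraction of wall-clock time, which may not be how a telephony spec (or Dropbox's durability figure) counts failures.

```go
package main

import (
	"fmt"
	"math"
)

// downtimeMinutes returns the downtime budget, in minutes, that an
// availability of `nines` nines allows over the given number of years.
// It assumes a 365.25-day year; that convention is an assumption here.
func downtimeMinutes(nines int, years float64) float64 {
	unavailability := math.Pow(10, -float64(nines))
	minutesPerYear := 365.25 * 24 * 60
	return unavailability * minutesPerYear * years
}

func main() {
	for _, n := range []int{5, 6, 12} {
		fmt.Printf("%2d nines over 40 years: %.6g minutes\n", n, downtimeMinutes(n, 40))
	}
}
```

Under this simple model, five 9s over 40 years comes to roughly 210 minutes and six 9s to roughly 21 minutes, so the quoted 20-minute budget sits closer to six 9s unless the spec counted outages differently.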

~~~
coldtea
> _For comparison, public telephony systems aimed for five 9s. That was
> usually expressed as "20 minutes downtime over 40 years, combined hardware
> and software budget, for outages affecting more than 32 users." One software
> crash requiring human intervention would count for more than 20 minutes, so
> you were allowed <1 of these in 40 years system lifetime._

And all of that is totally bogus (the "aim", not your information), as no public
telephony system (and surely not in my country) ever had anything close to
that.

A few hours of downtime a few times a year is much more like it, although it
has been getting better over time.

~~~
kuschku
Not really. The German phone and power networks get quite close to this
reliability. I've had less than 30 minutes combined downtime in my life.

~~~
apk17
How would you know? Or at least, how would I know? If, back in the POTS day,
the exchange went down half the night, I wouldn't have noticed.

Nowadays, you can look into your router logs. And Telekom had serious issues
with their VoIP stuff.

Likewise, even the apparently planned outages at my previous location exceeded
the 30 minutes. (It's still good, but not _that_ good.)

~~~
kuschku
> Likewise, even the apparently planned outages at my previous location
> exceeded the 30 minutes. (It's still good, but not that good.)

Yeah, they're starting to do maintenance here now, too (for introducing the
500/200mbps VDSL2), but I've never been with Telekom, and my existing ISPs
never had real issues.

------
kylequest
Here's the actual video:
[https://youtu.be/5doOcaMXx08](https://youtu.be/5doOcaMXx08)

------
didibus
> It’s easy to be productive in Go.

Hum, I'm not sure what this means. Is it saying Go is a productive language,
or just that you'll master Go quickly and reach peak Go productivity quickly?

~~~
barsonme
Both, really. Go's so small you can be quite proficient with it in months. While
it's not as productive as, say, Python (wrt how quickly you can get your code
up and running) it's much quicker than other languages (and nicer to use in
the long run).

~~~
smegel
> months

? Go is tiny. The tutorial takes an afternoon. A C programmer will be
proficient within days.

~~~
IshKebab
They may learn the language in days but not the standard library. That's
always what takes the most time.

~~~
wvh
I agree. And may I add the ecosystem: which third party modules you should or
shouldn't be looking at, and if you should rely on them or roll your own.

~~~
stouset
Then months for best practices, idioms, and understanding the pitfalls around
concurrency and channels, which are nowhere near as "batteries-included" as
people would lead you to believe.

This "hype" around go letting you be productive sooner is honestly just
bullshit. I've been programming for 20 years, with C, C++, Ruby, Perl, Scheme,
Haskell, Rust, Go, Java, and others I've surely forgotten. Sure, I was able to
start writing go within a day. But it was _terrible_ go, and months later I
was still dealing with the consequences of early bad decisions.

This isn't unique to go by any means. But if you go in thinking you're going
to write a project that does some sort of non-trivial task and be able to rely
on that later, you're going to have a bad time.

~~~
greenhouse_gas
But that's true with all those languages too. If you're an experienced Java
programmer moving to Rust, you'll write Java in Rust.

So it's 1 day of learning Go + 1 month of learning style vs 2 weeks of
learning Rust + 1 month of learning style. [1]

[1] Note, I think that Rust has the best community when it comes to getting
beginners up to par, whether in #rust-beginners or on /r/rust. It just is a
hard and not forgiving language.

~~~
stouset
> But that's true with all those languages too.

That is exactly my point, and I said as much in my post.

> So it's 1 day of learning Go + 1 month of learning style vs 2 weeks of
> learning Rust + 1 month of learning style.

More like 1 day vs. 1 week, with six months of ramp-up before you're writing
anything approximating production-quality code. At which point, does it really
matter that you were able to write garbage in one day rather than seven?

~~~
logicallee
Hi. Could you (or anyone else here!) kindly answer this question:

[https://news.ycombinator.com/item?id=14934690](https://news.ycombinator.com/item?id=14934690)

One way to get your correct answer is for me to propose a wrong or partial
answer. How is this as a message to the past (i.e. I am guessing your answer
to the above request):

WRONG/FAKE ANSWER:

"Note to past self: although you can write code that compiles on day 1 and day
2, before considering your code idiomatic and ready to build on top of (or
even deploy to production), DO/LEARN THE FOLLOWING:

1) read and work through all of The Go Programming Language book. This teaches
all idioms.

2) Practice and use Go's testing tools (built-in). Always use its reformat
tool applied on every save from your IDE.

3) Begin using a better error and logging library than the built-in error
passing idiom. Google this.

4) Use a debugger. Google this.

5) Security and versioning with go get is broken: Google this and learn
vendoring with versions. Otherwise your code cannot go on production.

6) You control garbage collection frequency. Learn to set this. In emergencies
disable it entirely to trade memory for latency to gain one or two
milliseconds (approach hard realtime), for example when you are dropping
requests. Then reenable when you can take the (very small) hit. Garbage
collection is very efficient and low-latency.

7) Channels and concurrency (goroutines) do not work as described.
Specifically:

\- This link will solve your problems with channels:
[https://stackoverflow.com/questions/41200505/whats-the-best-...](https://stackoverflow.com/questions/41200505/whats-the-best-practice-to-synchronise-channels-and-wait-groups)

Adopt it. Then they work as described.

For goroutines:

Follow this document -
[https://gist.github.com/pzurek/6642797](https://gist.github.com/pzurek/6642797)

After incorporating the above specific changes, you are ready to commit to
production and build on top of what you want to build.

"

is the above message completely wrong and bullshit? Then please correct it.
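
On item 6 in the list above: the standard library does expose a knob for this, `debug.SetGCPercent` in `runtime/debug`. The sketch below shows only the mechanics of switching the collector off and back on. The helper names `pauseGC` and `gcPercent` are hypothetical, and whether doing this actually buys any latency is an assumption that would need measuring.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// pauseGC effectively disables the garbage collector and returns a function
// that restores the previous GC percent setting. pauseGC is a hypothetical
// helper name, not part of the standard library.
func pauseGC() (restore func()) {
	prev := debug.SetGCPercent(-1) // a negative value effectively turns the collector off
	return func() { debug.SetGCPercent(prev) }
}

// gcPercent reads the current setting. The runtime offers no plain getter,
// so this sets a value and immediately puts the old one back.
func gcPercent() int {
	p := debug.SetGCPercent(-1)
	debug.SetGCPercent(p)
	return p
}

func main() {
	before := gcPercent()
	restore := pauseGC()
	fmt.Println("during:", gcPercent()) // -1 while the collector is off
	restore()
	fmt.Println("restored:", gcPercent() == before)
}
```

With the collector off, the heap grows until it is re-enabled, so this trades memory for fewer collection cycles.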

Your insight and experience are appreciated and I would love to read what you
would write as a message to yourself in the past, to save those lost months.
Thank you.
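
On item 7 in the list above: a minimal sketch of one common way to combine a `sync.WaitGroup` with a channel, which is the topic of the linked Stack Overflow question. This is not taken from that answer; `squareAll` and its workload are placeholders for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll fans out one goroutine per input and sums the results.
// The name and the squaring workload are placeholders for illustration.
func squareAll(inputs []int) int {
	results := make(chan int)
	var wg sync.WaitGroup

	for _, n := range inputs {
		wg.Add(1) // register before the goroutine starts
		go func(n int) {
			defer wg.Done()
			results <- n * n
		}(n)
	}

	// Close the channel only after every sender has finished, so the
	// range loop below terminates and nobody sends on a closed channel.
	go func() {
		wg.Wait()
		close(results)
	}()

	sum := 0
	for r := range results {
		sum += r
	}
	return sum
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3, 4})) // 30
}
```

The design choice here is that the senders never close the channel themselves; a separate goroutine waits on the group and closes it once.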

~~~
stouset
Those months weren't "lost". There's no shortcut to learning idioms, best
practices, and — more importantly — understanding the most natural way to
model a solution in whatever language you're working with.

Sure, read books and blog posts. Test things. Learn debugging tools. But
nothing really substitutes for actually working with a language. Having an
experienced mentor helps, but only if they're advising you on _why_ they chose
a certain approach, or why a critiqued approach you chose was bad.

~~~
logicallee
Essentially I am asking you to be that mentor (to your past self) and share
the email you could write, specific to YOUR particular coding that you were
doing. Think a mentor should include "why"? Then include why.

Remember: I didn't ask for a shortcut, I asked for specific sentences that you
could have sent back in time to prevent this:

> months later I was still dealing with the consequences of early bad
> decisions.

Your sentences can be literally anything, specific to your specific situation,
that could have prevented those bad decisions.

So, let's be clear. Your message to the past you reads:

"Hi - I am you from the future. I was asked to send back a message in time
listing specific things you can do right now, having just learned Go, based on
my having to deal with your code, that can prevent your bad decisions which
you will be dealing with months later. It is impossible to put this into
English words. So, I have no advice for you of any kind. Fuck you, past-me.
And also fuck me - I'll just have to deal with your lack of understanding. I
hope you have found this mentoring by me to be as helpful as I have. I don't
believe in mentoring."

So, that's the new version of your message, based on your review of your own
code and memory of your decisions and learning process at the time.

Well, okay. I guess I accept your viewpoint. (Note: if there's some other
reason you don't want to answer publicly, such as not talking about your
codebase, you can email me at the email in my profile.)

I am looking for specific architectural advice, using your experience as a
case study.

------
0xCMP
Oh, I thought Dropbox was using Rust instead of Go for a lot of things, but
maybe they ended up using both. I can see why they'd have wanted to move to
either Rust or Go, since from what I understand they used to be mostly Python
for everything.

Cool that they use Go a lot.

~~~
0xFFC
Go and Rust are not competing for the same space. They are different languages
for different purposes.

~~~
coldtea
Rust is multi-purpose, and with improved tooling and a little more maturity
(Rust 2.0) I can see it getting all the (network/server) systems and command
line stuff Go does.

(But not vice versa: Go won't be able to handle the no-GC close-to-the-metal
use cases).

~~~
painted
Rust is great, but to be honest, Rust's learning curve is way steeper than
Go's. And this may slow down Rust's domination.

------
mostafah
I have an off-topic question: This is the second company (after GitLab) I see
with an “about” subdomain. Is this a new trend of using “about.x.com” for the
marketing website and “x.com” for the web app? Is there a blog post or
discussion about this?

~~~
sytse
At GitLab we first used www. for the marketing site and the apex for the app
but many people assumed they would have the same content. That is why we
introduced about. Cool to see we might have started a trend.

~~~
tokenizerrr
Wow, never noticed that. When did you change? It would have confused me so
much.

------
justinclift
Anyone know where in the talk it has the mention of "Debugging tools (mostly!)
work well"?

I'm skipping back and forwards through it, but the talk isn't in the same
order as this article which is making it very difficult without watching the
whole talk from start -> end.

Asking because debugging is a pain point I've been having with Go for a few
months, so am surprised to see it described as mostly working well. I'd like
to get my debugging experiences to at least that level of "(mostly) working
well". :D

~~~
ctrlrsf
What are you having trouble debugging? Or what do you think isn't working well
for you?

~~~
justinclift
The two main problems are:

a) Pretty much any platform other than Linux doesn't seem to work (for
debugging). ;)

b) Breakpoints occasionally not firing even when running Linux (Fedora 25 in
this case)

This is with CGO enabled code, which isn't optional as it's from the libraries
we use (no choice).

Using Delve for debugging (through Gogland atm), as we're not yet in
production.

Fairly worried about how things will er... go ;) in prod though when we get to
rolling things out.

With Delve, it's apparently the state of the art with Go debugging. If
something else actually works reliably though, I'm happy to try it. :)

------
apta
> The biggest pain with Go that Tammy identified was in dealing with race
> conditions.

> Data races are the hardest type of bug to debug, spot, fix, etc.

Exactly what Rust aims at preventing. Sad to see that the industry is not
learning.
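
For readers unfamiliar with the pain point being quoted, here is a minimal sketch of the usual Go-side mitigation: guard shared state with a `sync.Mutex` and run the code under the race detector (`go run -race` or `go test -race`). The `counter` type and `countTo` function are made up for illustration and are not from the talk.

```go
package main

import (
	"fmt"
	"sync"
)

// counter guards its value with a mutex so concurrent increments are safe.
type counter struct {
	mu sync.Mutex
	n  int
}

func (c *counter) inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// countTo runs `workers` goroutines that each increment the counter once.
func countTo(workers int) int {
	var c counter
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.inc() // an unguarded c.n++ here would be a data race
		}()
	}
	wg.Wait() // all increments happen before the read below
	return c.n
}

func main() {
	fmt.Println(countTo(1000)) // 1000
}
```

Unlike Rust's compile-time checks, the race detector only reports races on code paths that actually execute during the run.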

~~~
jacquesm
> Sad to see that the industry is not learning.

Sad that Rust advocates are not learning. This sort of comment is what drives
people away from Rust. Stop ramming your stuff down other people's throats. Go
build that exclusively Rust based Dropbox clone that outperforms Dropbox and
show how well Rust performs in that situation.

Rust has trade-offs just like Go has trade-offs. Being honest about the
deficiencies of one's chosen platform is a good thing, it helps to keep you
sharp and to avoid problems associated with those deficiencies.

Besides having an over-zealous community that posts off-topic comments all
over threads that have nothing to do with Rust, Rust has deficiencies too.

Note also that Dropbox is already using Rust in some places.

~~~
innocentoldguy
Is it off-topic though? Go has warts and is plagued by poor design decisions,
just like most software. I think it is worth mentioning those, and offering
recommendations that don't have those issues, so people can make the best
technical decisions when choosing a stack for their projects. It also may help
Go maintainers in fixing legitimate complaints in Go, thus making it better.
I've never used Rust, but I have used Go, and found certain things about it
unappealing, so I'm interested in this information, and I'm also interested in
Go improving.

------
didibus
Does Rust guarantee data-race-free code?

~~~
general_pizza
This explains pretty well:
[https://doc.rust-lang.org/beta/nomicon/races.html](https://doc.rust-lang.org/beta/nomicon/races.html)

------
zzzcpan
Yeah, Go has the worst possible model for concurrency there is - shared memory
multithreading. Hopefully more and more companies will realize how bad this
model really is and start looking into languages with decent concurrency
models, like Erlang and Elixir or at least stick to event loops.

~~~
twic
The worst _possible_ model? Oh my dear fellow, not even close:

[http://catb.org/esr/intercal/ick.htm#Multithreading-using-CO...](http://catb.org/esr/intercal/ick.htm#Multithreading-using-COME-FROM)

~~~
rrdharan
My favorite is still the multithreaded apartment model...

[https://blogs.msdn.microsoft.com/larryosterman/2004/04/28/wh...](https://blogs.msdn.microsoft.com/larryosterman/2004/04/28/what-are-these-threading-models-and-why-do-i-care/)

------
dan-compton
What a terrible article.

~~~
TRManderson
>This post was best-effort live-blogged at the conference

Cut them some slack.

~~~
breakingcups
To promote their own service. While the actual talk will probably be actually
posted online in multiple forms.

------
nolanpro
> She talked at GopherCon 2017 about

Y'all too young to remember OG Gopher
[https://en.wikipedia.org/wiki/Gopher_(protocol)](https://en.wikipedia.org/wiki/Gopher_\(protocol\))

