
Dubsmash: Scaling to 200M Users with 3 Engineers - yarapavan
https://stackshare.io/dubsmash/dubsmash-scaling-to-200-million-users-with-3-engineers
======
gjhgqpqndpe
Sort of interesting just to hear about the ups and downs of companies like
Dubsmash. They were often cited as an example of Berlin's future as a startup
city [1]. They went from 35+ employees to 27 [2] to now 12, as they've stated
in this post. They also moved from Berlin to New York, which seems to imply
they felt the city couldn't offer what they currently need. It looks like
in the process of moving they didn't take many of the employees with them
(maybe this was also a way out of strict German employment rules?). It seems
like a bit of an attempt at a restart (co-founder Roland Grenke appears to be
gone, etc.).

[1] [http://www.wired.co.uk/article/european-startups-2016-berlin](http://www.wired.co.uk/article/european-startups-2016-berlin)
[2] [https://techcrunch.com/2016/11/30/dubsmash-9m/](https://techcrunch.com/2016/11/30/dubsmash-9m/)

~~~
kolmogorov
It also sounds odd that they have 3 engineers out of 12 employees. What do the
other people do? And hopefully they had more than 3 engineers back when they
had 35 employees... but even then, why would they choose to let engineers go
and end up with that tech-to-non-tech ratio?

~~~
tqkxzugoaupvwqr
Dubsmash relies heavily on copyrighted content from big studios (at least it
did when it became popular). I guess most of the staff works hand in hand with
media companies to promote their content inside the app.

------
gaius
I hope those three engineers have meaningful equity and exposure to the
upside, because it sounds like they're working like sled dogs.

~~~
vemv
Thought the same. My wild guess is that those engineers are at their career
peak in terms of energy and ability to deliver glue code, but a few years away
from becoming well-rounded engineers who can live and work sustainably.

------
aerovistae
Three engineers maintain code in Java, Swift (previously Objective-C), Go,
Python (both Django and Flask), and Node.js, are considering Kotlin, and
additionally make use of Celery, RabbitMQ, React, Redux, Apollo, GraphQL,
Postgres, Heroku, AWS, Jenkins, Kubernetes, Redis, DynamoDB, Elasticsearch,
Algolia, Memcached, and more.

I might be an inexperienced engineer by comparison, but I'll be honest, that
sounds absolutely fucking insane. These three people must be geniuses to be
able to use all of that with sufficient mastery to effectively handle 200M
users.

------
pepijndevos
Sometimes I wonder if there are any internet companies (startup or otherwise)
that do customer support. With numbers like that, it's hard to imagine one of
those users getting even one second of attention with any problems they might
have.

~~~
dan_mctree
You can only really do customer support if it makes financial sense, which it
won't unless you make a significant amount of money on your average customer.
Tech companies that don't have sales, but instead take their revenue through
ads or through selling data, are making cents per customer. With average
profit that low, even one in a thousand customers using your support for five
minutes would destroy any chance of profit.
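To make that arithmetic concrete, here's a back-of-envelope sketch; every
figure in it (revenue per user, agent cost, ticket length) is an illustrative
assumption, not anyone's real numbers:

```python
# Back-of-envelope support economics. All numbers are assumptions
# chosen for illustration, not real figures for any company.
revenue_per_user = 0.05       # dollars/user/year ("cents per customer")
agent_cost_per_minute = 0.50  # roughly $30/hour, fully loaded
minutes_per_ticket = 5

cost_per_ticket = agent_cost_per_minute * minutes_per_ticket
# Contact rate at which support alone consumes all per-user revenue:
break_even_contact_rate = revenue_per_user / cost_per_ticket

print(f"cost per ticket: ${cost_per_ticket:.2f}")
print(f"support eats all revenue once {break_even_contact_rate:.1%} "
      f"of users file a ticket")
```

Under these assumptions a single five-minute ticket costs 50x the yearly
revenue of an average user, which is why ad-funded apps push support toward
FAQs and automation.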

------
mooreds
Pretty cool story. All about automating all the things!

Would love to read more about whether they started with microservices or had
an MVP monolith that they then cut parts off of.

------
bambax
> _We since have moved to a multi-way handshake-like upload process that uses
> signed URLs vendored to the clients upon request so they can upload the
> files directly to S3._

How does this work in practice / where can one learn more about this?

~~~
rawnlq
I want to make sure that I understand the security aspect of this.

You can argue that the user can upload anything using the original API anyway.
But in the original case you can do server-side validation before the upload
is proxied. I am thinking of things that are domain-specific, like only
allowing videos that are 6 seconds long or something.

You can move the validation to the client, but the client can be easily
modified. An actual user might not do this, but someone trying to steal your
storage space (for serving malware or something) might?

These signed URLs also seem to expire based on time, so you could potentially
save the URL and upload again later if you allow a generous expiration.
(Again, not really something I see being a huge problem.)

But I guess these aren't really serious issues compared to the cost savings.
Am I missing other ways this can be exploited?

I am looking into the GCS version, not S3, if that matters:
[https://cloud.google.com/storage/docs/access-control/signed-urls](https://cloud.google.com/storage/docs/access-control/signed-urls)

~~~
ryanworl
You would use two buckets in this case. The input bucket gets consumed by
worker processes that do the transcoding (and validation) and then upload into
the output bucket. The output bucket is what you serve to clients (hopefully
with a CDN in front).
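A minimal sketch of that pattern, with plain dicts standing in for the two
buckets and the 6-second rule mentioned above as the validation step (all
names and limits are illustrative):

```python
# Two-bucket pattern: clients write raw uploads to an input bucket; a
# worker validates/transcodes and copies good files to an output bucket,
# which is the only bucket ever served to clients.
# Dicts stand in for S3/GCS buckets; the 6s limit is an example rule.

input_bucket = {}    # key -> (duration_seconds, payload)
output_bucket = {}

def transcode(payload):
    return payload.upper()                 # placeholder for real transcoding

def handle_upload_event(key):
    """Triggered once per object-created event on the input bucket."""
    duration, payload = input_bucket[key]
    if duration > 6:                       # server-side validation
        del input_bucket[key]              # reject: never reaches the CDN
        return False
    output_bucket[key] = transcode(payload)
    del input_bucket[key]
    return True

input_bucket["clip1"] = (5, "ok video")
input_bucket["clip2"] = (30, "junk abusing free storage")
for key in list(input_bucket):
    handle_upload_event(key)

print(sorted(output_bucket))   # only the valid clip survives: ['clip1']
```

The point is that a forged client or a replayed signed URL can only pollute
the input bucket; nothing becomes publicly servable until the worker has
validated it.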

~~~
rawnlq
Thanks! This is a great solution, but none of the tutorials/blogs I read on
pre-signed uploads mentioned it.

Do you have links (or just keywords) to learn more? Will I need to add
something like Cloud Pub/Sub to my stack?
[https://cloud.google.com/solutions/using-cloud-pub-sub-long-running-tasks](https://cloud.google.com/solutions/using-cloud-pub-sub-long-running-tasks)

This is more complicated than I imagined, so I am not sure the cost savings
will still work out (factoring in development time and extra code maintenance
cost).

~~~
ryanworl
No, I don’t, sorry. What I can promise you is that you’ll thank yourself for
implementing it! There is hardly any additional complexity here, because you’d
probably be uploading the derived content somewhere anyway. Now you’re just
putting it in a different place than the source.

You can use whatever queue you’re comfortable with, so long as you can pipe
the upload events from the bucket into it. The pattern I’m outlining is just a
physical separation of buckets that makes access control much harder to screw
up.
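On S3, for example, the "pipe the upload events into a queue" step is just a
bucket notification configuration; the queue ARN and account ID below are
placeholders:

```json
{
  "QueueConfigurations": [
    {
      "Id": "input-bucket-uploads",
      "QueueArn": "arn:aws:sqs:us-east-1:123456789012:upload-events",
      "Events": ["s3:ObjectCreated:*"]
    }
  ]
}
```

GCS has an equivalent in Pub/Sub notifications for Cloud Storage, so the same
two-bucket pattern applies there.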

------
pul
> However, we discovered after some time that the custom Python implementation
> for those workers was dropping up to 5% of the events. This was mostly due
> to the nature of how reading happens with Kinesis: every stream has multiple
> shards (ours up to 50!) and each reading client would use a so-called shard
> iterator to keep track of where it was reading last. Since the used machines
> could always crash, be recycled, or scaled down, we needed to save those
> shard iterators in some serialized format to Redis and share them across
> machines and process boundaries. Since we had so many shards, every once in
> awhile we would skip events and hence lose them.

I've never worked with Kinesis, but in Kafka you'd store offsets specifically
to solve this issue. When one of the members of a consumer group drops out,
the partition (read: shard) is automatically reassigned to another member.
This gives an at-least-once delivery guarantee, which, combined with
idempotent actions, gives effectively-once semantics. No need to lose any
messages. What was the issue that the Dubsmash engineers were solving here?
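That combination can be sketched in a few lines; here an in-memory dict stands
in for the offset store (Redis, DynamoDB, or Kafka's committed offsets), and
the shard name and event ids are made up:

```python
# Offset checkpointing with at-least-once delivery plus idempotent
# processing ("effectively once"). An in-memory dict stands in for the
# real offset store; all names here are illustrative.

checkpoints = {}          # shard_id -> first offset not yet committed
processed_ids = set()     # event ids already applied (idempotency guard)
applied = []

def process(shard_id, events):
    start = checkpoints.get(shard_id, 0)
    for offset in range(start, len(events)):
        event = events[offset]
        if event["id"] not in processed_ids:   # dedupe makes replays safe
            processed_ids.add(event["id"])
            applied.append(event["id"])
        checkpoints[shard_id] = offset + 1     # commit AFTER processing

shard = [{"id": f"e{i}"} for i in range(5)]
process("shard-0", shard[:3])      # worker dies after handling 3 events...
checkpoints["shard-0"] = 2         # ...with only 2 of them checkpointed
process("shard-0", shard)          # replacement replays from offset 2
print(applied)                     # ['e0', 'e1', 'e2', 'e3', 'e4']
```

The replay redelivers `e2` (at-least-once), but the dedupe set means it is
applied exactly once; skipping events, as described in the quoted post, can
only happen if a checkpoint is written *before* the events it covers are
processed.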

~~~
alexatkeplar
With Kinesis, you would just use the Kinesis Client Library
([https://github.com/awslabs/amazon-kinesis-client-python](https://github.com/awslabs/amazon-kinesis-client-python)),
which automatically handles committing the offsets to DynamoDB.

Home-rolling a checkpoint-free event pipeline is a rookie mistake; it's a pity
they didn't come across our Snowplow project (Apache 2.0 event pipeline
running on Kinesis, Kafka, and NSQ:
[https://github.com/snowplow/snowplow/](https://github.com/snowplow/snowplow/)).

------
pinarello
Does anyone know how they handle the copyright of movies and music?

~~~
coob
They don't.

------
karterk
> Although we were using Elasticsearch in the beginning to power our in-app
> search, we moved this part of our processing over to Algolia a couple of
> months ago;

How many records are you storing on Algolia?

~~~
sundev
I'd love to know as well. Algolia's pricing seems extremely expensive, but I'd
also imagine that at 200M users they have sufficient funds to pay for it.

~~~
searchfaster
If you are interested in trying out an alternative, please let me know.

~~~
Redsquare
I would; we run two 128 GB enterprise Algolia clusters, with 200 million
documents, at significant cost.

~~~
searchfaster
Please take a look at [https://searchera.io](https://searchera.io) for a demo.
My personal email is on my profile.

------
ramshanker
Their signup page failed; it wouldn't accept my email. And when I go to
password reset, it says the user/email does not exist.

Facebook login doesn't work either.

------
dominotw
The jobs link from that page,
[https://www.dubsmash.com/jobs/](https://www.dubsmash.com/jobs/), seems to be
dead?

~~~
wkd
Bad link, remove the trailing slash

~~~
sundev
It's the little things :)

------
gagabity
200M users on Heroku must cost a fortune!

------
the_scrivener
I am genuinely curious about the trade-offs, since the bad and the ugly are
not mentioned. Realistically, there are a lot of moving pieces there, and yet
the team of 3 is still treated as an experiment?

