
Building a r/place in a weekend - mxstbr
https://josephg.com/blog/rplace-in-a-weekend/
======
josephg
Thanks for the support everyone! It was a really fun little project. I'm happy
to answer any questions people have.

I'd also like to say that I highly recommend everyone does projects like this
from time to time. I don't know any way to gain programming skills faster than
making small throwaway projects using new tools and techniques.

~~~
thidr0
Any tips for coming up with good projects to do? Something like this is
obviously beyond the scope of most novices.

~~~
popey456963
I'd honestly disagree. Start small, don't expect to build something that works
for 100k users and you'll be fine!

Take the bare bones of this project:

- Canvas.

- WebSockets.

That's literally it. You'll need to know how to draw on a canvas, and how to
send and receive WebSocket messages. You can quite happily keep the current
state of the canvas in an in-memory array, perhaps saving it to a file every
few minutes in case the server crashes. Then, when that's done, you can swap
out your in-memory array for a Redis bitfield and switch the WebSocket
messages from JSON to binary. Both of those should be only a few tens of lines
of changes, but after them you'll be able to support tens of thousands of
simultaneous users with hundreds, if not thousands, of changes per second.

The complex part of this project is the number of users required to use it at
once; lessen that requirement a little and you'll end up with a simple
project.
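
To sketch what that bare-bones version might look like (hypothetically, in Node-flavoured JavaScript; the sizes and names are made up, and a real server would broadcast edits over WebSockets, e.g. with the `ws` package):

```javascript
// Minimal in-memory canvas state for a tiny r/place clone. One byte per
// pixel, holding an index into a small color palette (0 = white). A real
// server would broadcast each accepted edit to connected WebSocket
// clients and periodically write snapshot() to a file.
const WIDTH = 100;
const HEIGHT = 100;

class Canvas {
  constructor() {
    this.pixels = new Uint8Array(WIDTH * HEIGHT);
  }

  // Apply one edit; returns false for out-of-bounds coordinates.
  set(x, y, color) {
    if (x < 0 || x >= WIDTH || y < 0 || y >= HEIGHT) return false;
    this.pixels[y * WIDTH + x] = color;
    return true;
  }

  get(x, y) {
    return this.pixels[y * WIDTH + x];
  }

  // Binary snapshot to send a newly connected client (or save to disk).
  snapshot() {
    return Buffer.from(this.pixels);
  }
}
```

The whole board at this size is 10 KB, which is why keeping it in memory and persisting it occasionally is plenty.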

------
franciscop
> "I'd never actually used kafka before, and I have a rule with projects like
> this that any new technology has to be put in first just in case there are
> unknown unknowns that affect the design."

I have been doing the same intuitively for as long as I can remember, but I
never stopped to notice it or ask why. I wonder what else I've learned by
doing things like this that I now use unconsciously.

~~~
lucb1e
I think Jeff Atwood (known from CodingHorror and StackOverflow) once talked
about "de-risking" a project. I'm not sure if this is a general, well-known
business technique, but of all the business subjects I've had in school, it
was never one of them. It made a lot of sense to me too, and having a term to
throw around helps convince others that it's a good idea.

Teachers typically want to first make up requirements and use-cases, then
functional design, then technical design, then either code and tests or first
tests then code (depending on the teacher)... Basically, you wait till 60-70%
of the work is done to discover design flaws. Later on we had some Agile stuff
as well, but more as a "this also exists" rather than "this is how it's done".
Doing some prototyping and benchmarking to see whether something works at all
was never part of anything.

One subject, exactly one, ever had a performance requirement: 1000
simultaneous users in a multiplayer game. And it had to work over Java RMI
(which makes no sense at all). I was the only person in two classes who pushed
for (and was finally granted) the use of raw sockets. I was the only person
who took this as a challenge and ran thousands of prototype clients on the
school's computing cluster on a Saturday night, so I wouldn't be taking
anyone's compute time. They never even looked at it. But next Wednesday I hand
in the last thing I will ever have to do for that school (unless I have to
resit), and I'm so happy I'm done with their shit and can do my own thing
next. Properly.

~~~
franciscop
I can see, though, how it's arguably more applicable to personal or startup
projects (flexible, able to pivot) than to medium-to-big companies, where
whether something is a good idea to implement at all can matter more than
whether it takes an hour or a week.

And arguably university teaching focuses on those companies. That is why you
have all of these fancy ways of encapsulating dependencies and wrapping them
into oblivion.

Funnily enough, a really similar thing happens in other degrees. I studied
Industrial Engineering and can calculate whatever you want about the
kinematics of a robot arm, but it wasn't until I set out with some friends to
learn how to make one from scratch that we really knew what it was all about.

------
moopling
You could make it so that blank pixels are free to draw on, but redrawing a
pixel takes longer each subsequent time. This would encourage the board to get
filled up even with a small number of users, but would eventually allow things
to be "locked down". It would also work well if you expect users to scale with
time!

For example: first draw = free, 2nd redraw 10 seconds, 3rd redraw 20
seconds... capped at 5 minutes. Not that the actual implementation is that
important.
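
One possible shape for that escalating cost, assuming a doubling schedule that starts from the 10s/20s example above (the exact numbers are illustrative):

```javascript
// Hypothetical escalating per-pixel cooldown: the first draw on a blank
// pixel is free, each subsequent redraw of that pixel costs more time,
// capped at 5 minutes.
const CAP_MS = 5 * 60 * 1000;
const BASE_MS = 10 * 1000;

function cooldownMs(redrawCount) {
  if (redrawCount === 0) return 0; // first draw is free
  // 2nd draw waits 10s, 3rd waits 20s, doubling until the cap.
  return Math.min(BASE_MS * 2 ** (redrawCount - 1), CAP_MS);
}
```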

~~~
josephg
Oh, that's a great idea! That would also encourage people to draw new things
instead of defacing old ones.

One of the first things that happened when the site went up was that someone
started drawing something, and then someone else immediately started spamming
junk pixels over the top of it. Despite being able to draw literally anywhere
else in the world, they thought the best use of their time was to ruin someone
else's creation. It was kind of disappointing to witness.

I might make that change now, actually - the simplest form of it is very easy.
I can just make white pixels cheaper to draw on, with stiffer rate limiting
penalties for everything else. (Which isn't quite what you said, but I think
it's the MVP version of it.)

(Edit: This is implemented now. You can draw over 25 white pixels in each 10
second window, but only 10 colored pixels)
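
A sketch of what that two-tier limit might look like server-side, using sliding-window counting per user (this helper and its shape are assumptions, not the site's actual code):

```javascript
// Two-tier sliding-window rate limit: within each rolling 10-second
// window a user may place up to 25 pixels on white squares but only 10
// on colored ones. Illustrative only.
const WINDOW_MS = 10 * 1000;
const LIMITS = { white: 25, colored: 10 };

class PixelRateLimiter {
  constructor(now = Date.now) {
    this.now = now;           // injectable clock, handy for testing
    this.history = new Map(); // userId -> { white: [ts...], colored: [ts...] }
  }

  // Returns true and records the edit if it's within the limit.
  allow(userId, kind) {
    const t = this.now();
    let h = this.history.get(userId);
    if (!h) this.history.set(userId, (h = { white: [], colored: [] }));
    h[kind] = h[kind].filter((ts) => t - ts < WINDOW_MS); // drop old edits
    if (h[kind].length >= LIMITS[kind]) return false;
    h[kind].push(t);
    return true;
  }
}
```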

------
calcsam
Cached:
[https://webcache.googleusercontent.com/search?q=cache:-I98fR...](https://webcache.googleusercontent.com/search?q=cache:-I98fR6PE7kJ:https://josephg.com/blog/rplace-in-a-weekend/+&cd=1&hl=en&ct=clnk&gl=us)

~~~
josephg
Thanks. My blog is running on the same server as the app itself. It's a small
Linode machine with only 1 CPU core, and nginx isn't keeping up with the
traffic of both sites while fighting for CPU with ghost and kafka.

So much for my big talk about performance numbers. I'm fixing it as fast as I
can.

~~~
lucb1e
Serving a static page shouldn't take much though... Are you using Wordpress?

~~~
josephg
Ghost, and I agree. I think the server might be getting thrashed by bots
hitting sephsplace.

I've spun up a new, much bigger server to handle the load. I'm just waiting
for DNS to propagate and it should start running much smoother.

------
gurgus
This is super awesome work. Well done - you should be proud!

I have to admit, I remember stumbling across your comment when you accepted
the challenge and in my mind I scoffed, thinking you were never going to do
it. Boy was I wrong! Once again, this is super cool.

~~~
josephg
Heh thanks!

I feel like there's two kinds of people who make bold statements like that:
There's young people who are suffering from the Dunning-Kruger effect -
inexperienced but think they're hot shit. Then there's people who've actually
done a lot of hackathon-type events and as a result know what it takes to pull
them off successfully. (Time, caffeine, and a deep familiarity with your tools.)

~~~
firebones
As the one who challenged you in that original thread, what drew me to your
initial comment was the great point that you made: that much of the time and
difficulty in doing something novel is making many of the tough decisions, and
that once those design and technical decisions are made (and revealed), it
seems "obvious" to others, and is judged simple in comparison.

Congratulations on following through, and demonstrating your core premise!

What were the top things that you felt _weren't_ captured by that premise--
for instance, undocumented decisions that you had to discover on your own, or
cases where you made tradeoffs that led to unexpected complexity? Were they
mainly around bot-mitigation?

~~~
josephg
Thanks for saying so!

> that much of the time and difficulty in doing something novel is making many
> of the tough decisions, and that once those design and technical decisions
> are made (and revealed), it seems "obvious" to others, and is judged simple
> in comparison.

Yes - one of the things that drew me to the project was how building this in
an event-sourcing style fits so well here. Doing it that way solves some of
the architecture problems reddit talked about in their blog. It seems obvious
to me that this is a good approach, but obviously not everyone shares that
view!

> What were the top things that you felt weren't captured by that premise--for
> instance, undocumented decisions that you had to discover on your own, or
> cases where you made tradeoffs that led to unexpected complexity? Were they
> mainly around bot-mitigation?

That's a great question, but I didn't spend much time being surprised.

The thing I was most concerned about was kafka, but integrating kafka turned
out to be delightfully easy. I had to write some code to buffer recent
operations in my server for catchup - I wish kafka had an API for that, but it
wasn't hard to work around.
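
That catch-up buffer might look something like this (a hypothetical sketch, with plain sequence numbers standing in for Kafka offsets):

```javascript
// Keeps the most recent operations in memory so a client that holds a
// snapshot at version v can fetch just the operations after v. If the
// client is too far behind, it must re-fetch a full snapshot instead.
class CatchupBuffer {
  constructor(capacity = 1000) {
    this.capacity = capacity;
    this.ops = [];      // oldest first
    this.firstSeq = 0;  // sequence number of ops[0]
  }

  push(op) {
    this.ops.push(op);
    if (this.ops.length > this.capacity) {
      this.ops.shift(); // evict the oldest operation
      this.firstSeq++;
    }
  }

  // Operations with sequence numbers > seq, or null if seq has already
  // been evicted and the client needs a fresh snapshot.
  since(seq) {
    if (seq + 1 < this.firstSeq) return null;
    return this.ops.slice(seq + 1 - this.firstSeq);
  }
}
```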

I think getting notifications working would have been a time sink but I
explicitly removed them from the spec so I wouldn't have to deal with them.

It took me way too long to get kafka actually running through systemd on my
linode. But I've spent enough time with apt-get that I wasn't _surprised_,
just disappointed.

I was surprised how quickly people started drawing smut, and how much time I
needed to spend early on cleaning things up or writing tools to remove large
bot-drawn genitals.

There are still a lot of decisions around rate limiting that I feel uneasy
about. I worry that reddit's 5 minute rule wouldn't work for a little website
like mine. I allow ~1 edit per second. Is that a good idea? I don't know. It's
an expensive experiment to try different values and see what happens because
there's a community involved. And I don't have reddit's huge user base. But
maybe I'm being unnecessarily risk averse by allowing so many edits. Forcing
slow editing is bolder - it requires a longer commitment to draw, but is
probably also much more satisfying to people who create content.

~~~
cicloid
Did you consider using Docker for the provisioning of Kafka?

A couple of days ago I remember reading about how difficult it was to deploy
Oracle on Linux and how Docker made it a breeze. I wonder if Kafka falls into
the same category.

~~~
josephg
Probably. I was bullish on docker in the past, but I'm no longer convinced
it's worth the trouble for small projects. It adds an awful lot of operational
complexity for what is essentially a more complex abstraction around
processes.

I think it's a nice tool for deployment and making reproducible builds, but a
lot of other things become harder through docker - like managing a database's
data, and communication between local processes.

Maybe the tooling has improved in the last few years, but I've gone back to
the raw unix coalface.

~~~
Drdrdrq
>... but a lot of other things become harder through docker - like managing a
database's data, and communication between local processes.

It doesn't have to be this way. If you use shared folders to persist data on
the host, you are in no worse position, persistence-wise, than you would be
with a natively installed app.

I think Docker's focus on orchestration (which makes business sense for them)
is the reason why running DBs in containers got a bad reputation. But really,
if you use shared dirs with the host and view containers as processes, you can
use them for DBs too.

IPC with containers OTOH forces you to architect the system as a bunch of
microservices, which is usually not a bad idea either.

------
reledi
Event Sourcing is a great pattern that could be used for many applications. It
gives you an audit trail for free and lets you rewind to any point in time.

You don't even need to use a dependency like Kafka. We built a tool for
tracking the lifecycle of software we ship; it uses Event Sourcing with
snapshots and is open source:
[https://tech.fundingcircle.com/blog/2016/09/06/shipping-in-f...](https://tech.fundingcircle.com/blog/2016/09/06/shipping-in-fintech/)
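
The core of the pattern fits in a few lines. This is a toy illustration (unrelated to the linked tool's actual code): state is never mutated directly, events are appended to a log, and current state is derived by folding events over the latest snapshot.

```javascript
// Toy event-sourced store. The full log doubles as an audit trail and
// lets you rebuild state as of any point in time.
function apply(state, event) {
  // Pure reducer: (state, event) -> new state. Here state is a Map.
  const next = new Map(state);
  next.set(event.key, event.value);
  return next;
}

class EventStore {
  constructor() {
    this.log = [];
    this.snap = { version: 0, state: new Map() };
  }

  append(event) { this.log.push(event); }

  // Replay only the events recorded after the snapshot was taken.
  currentState() {
    return this.log.slice(this.snap.version).reduce(apply, this.snap.state);
  }

  // State as of the first `version` events -- the "rewind" for free.
  stateAt(version) {
    return this.log.slice(0, version).reduce(apply, new Map());
  }

  takeSnapshot() {
    this.snap = { version: this.log.length, state: this.currentState() };
  }
}
```

Snapshots exist purely as an optimization: without them, replaying the whole log from the start gives the same answer.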

~~~
unixhero
In this context, what do you mean by pattern?

------
drewmate
Would you consider streaming/broadcasting yourself doing a project like this?
I think it would be interesting to pop in and watch you work, especially if
once every couple hours you provided some commentary about your thought
process and what the latest hurdle to overcome is.

------
chungy
That was pretty fun. I missed out on the original until many days after it
happened, but just now I started to recreate the Windows 95 start button in
the bottom left corner... seemed almost daunting to even do that many pixels,
but it took no time at all until other people started collaborating and
finished the button in a time I couldn't imagine at the start of that
miniature journey.

I have no clue who those people are. It's just anarchy and the only thing we
have in common is a canvas :-)

~~~
pending
I was repairing the German flag above; when I saw you wanted to draw a
Windows 95 start button, I moved it up to make space. I was amazed when I saw
that the people building the button helped me move it up a notch :-) PS: I
also did some grey pixels on the button. Great fun indeed.

------
technion
I could well have missed it but - have you got any scripts or config
associated with setting up Kafka as you're using it?

I haven't used it before, and the comments on your writeup make it sound more
approachable than I'd expected.

------
mwcampbell
First of all, great work. It's been a long time since I've done a weekend hack
like this myself.

However, it seems to me that Kafka is unnecessary in this system. It's clear
that, at least in the final version, the system isn't designed to scale beyond
one application server. For one thing, you're storing the ban list on the
local filesystem. So it's definitely not 12-Factor compliant. And you're
storing a local snapshot of the image. So why send the edits out to Kafka,
only to have them come back in to the same process?

~~~
josephg
Even if I'm only using a single machine, using kafka allows me to load balance
across a local cluster of server processes. This is important for nodejs apps
because node is single threaded. And if I skipped kafka I'd still need to
store the edit log somewhere for catchup. I've written that code a few times
now, and I think it's arguably more complicated to implement than just using
kafka directly.

But right now to help deal with load the process is running across 2 machines.
I just had to manually copy the ban list and snapshot database files across.
When the server came up it pulled the snapshot version out of the database
file, caught up from the kafka log and went to work.

Having a nice solution to distribute those files would be lovely - but I made
the whole project start to finish in 2 days. I'm not going for 12 factor
compliance here.

~~~
mwcampbell
Ah, makes sense.

------
wiradikusuma
Honest question: how did you fill out the canvas? You put it live, and then
thousands of people (including those who care enough to set up bots) suddenly
came from your tweets?

~~~
Strom
I imagine quite a few came from the HN thread [1] where he originally made the
bet that he could do it. That's how I found the site. It was 95% blank and I
just decided to draw a bit. Later I came back several times to see how my
drawing had lasted and fixed some minor vandalism. Also had to completely
redraw at one point due to the Merkel bot. [2] Finally, after the blog post
discussing how he built it started to get traction the vandalism rate got so
high that I decided to write a bot to maintain the art for me.

\--

[1]
[https://news.ycombinator.com/item?id=14109158](https://news.ycombinator.com/item?id=14109158)

[2]
[https://twitter.com/josephgentle/status/853312152223965184](https://twitter.com/josephgentle/status/853312152223965184)

~~~
josephg
Someone also posted it to the original r/place subreddit. I think a lot of the
traffic came from there:
[https://www.reddit.com/r/place/comments/65hs9j/inspired_by_r...](https://www.reddit.com/r/place/comments/65hs9j/inspired_by_rplace_a_twitter_user_is_making_his/)

------
youmar
I might be going a bit tangential. I did read the article but I always wonder.
How would I go about learning all the technology involved to build all this..
is there like a guide for people who know the basics of php and mysql. I'm a
civil engineer by trade and only do this as a hobby sometimes.

~~~
lftl
I make my living programming, but I've never had any formal training --
programming used to just be a hobby as well. I learn the best by digging into
what other people have done and seeing how they did it. So I'd suggest taking
a look at the GitHub repo for this project and set up your own instance of it.
Learn how to install and configure the dependencies. Then make a few small
changes that force you to learn how the code works.

------
RichardHeart
Seph's law: Programming is 95% decisions and 5% typing.

You are an inspiration! If people grasp what you've done then you've lowered
the fear people have of copying things. That should lead to more attempts,
failures, improvements, competition, a true catalyst!

~~~
josephg
Aw thanks! And yeah I agree. The way to get good at programming is to make
lots of stuff.

> Seph's law: Programming is 95% decisions and 5% typing.

:) In real projects there's usually the other 95% of the time spent reading
the existing code and figuring out how it works. But that's much harder to fit
into a glib saying.

~~~
RichardHeart
"the other 95% of the time" haha

------
twic
Would the server-side bit of this be a good, or at least fun, benchmark? More
towards the TechEmpower end of the pool than TPC. There's a pretty clearly
defined API, and some fun concurrency and data-handling to do.

------
gricardo99
Just the write-up of what he did would take me more than a weekend.

------
siscia
Great work, thanks for sharing!

One decision that I don't agree on is the choice to send messages in order.

I usually prefer to flood the client with messages, attach to each message a
timestamp (a monotonically increasing integer), and have the client re-order
everything.

It is cheaper from the server's point of view, and the work is done by the
clients.
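
The client side of that scheme might look like this (a hedged sketch of one way to buffer and re-order, not an implementation from the thread):

```javascript
// Client-side re-ordering: each message carries a monotonically
// increasing sequence number; out-of-order arrivals are buffered and
// handed to the renderer in sequence once the gap fills in.
class Reorderer {
  constructor(deliver) {
    this.deliver = deliver;   // callback invoked in sequence order
    this.nextSeq = 0;
    this.pending = new Map(); // seq -> message, buffered out of order
  }

  receive(seq, msg) {
    if (seq < this.nextSeq) return; // duplicate or already delivered
    this.pending.set(seq, msg);
    // Flush every consecutive message starting from nextSeq.
    while (this.pending.has(this.nextSeq)) {
      this.deliver(this.pending.get(this.nextSeq));
      this.pending.delete(this.nextSeq);
      this.nextSeq++;
    }
  }
}
```

Note that if a message is truly lost (rather than late), this buffer waits forever unless it can re-request the missing sequence number from the server.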

Is there any reason why you picked your specific solution?

Just a technical question, that I am very curious about. I guess that there
are concerns that I am overlooking at the moment...

~~~
josephg
That would work, but I don't think it would make anything simpler. You would
essentially be reimplementing parts of TCP on top of whatever unordered
protocol you'd use to make sure your client view is eventually consistent. All
the APIs I used strictly order messages. Websockets simply don't have a UDP-
equivalent, unless you want to send the events over plain long-polled HTTP
requests or something, but that sounds like an inefficient recipe for
disaster.

And if you did that the client would need to be able to track and fetch lost
messages. That adds client complexity and server complexity for the extra
endpoints.

My imagination is haunted by premonitions of bugs. In one ghostly image I see
edits getting silently lost sometimes and not knowing why. Just, sometimes if
I draw a line, on your screen you see the occasional block missing until you
refresh your browser. In another premonition I imagine a packet reordering
system thinking it's missing a single operation and waiting on it forever. To
the user it looks like everything has frozen completely.

We have well-implemented protocols that deliver messages in order. I see no
reason not to use them.

~~~
siscia
Definitely; mine was just a simple question about the design. It is
interesting to see how different people think ;-)

The work to keep everything in order definitely has to be done somewhere. You
are doing it twice: once by actually sending the messages in order, and again
in the TCP stack. Clearly the one in the TCP stack "comes for free" for
anybody reasonable.

Like yours, my mind is also haunted by possible bugs; however, in this case I
prefer to borrow from the Erlang philosophy and embrace possible failure. The
way I model this problem is that the packets usually arrive in the same order
as they were sent, or one close enough to it. It is rare that a single packet
gets lost, but if that happens I want to be able to re-ask for it and not
block the whole rendering.

I would have accepted receiving messages slightly out of order, with a
guarantee covering something like the last 10/20 messages.

Would it require some reimplementation of the TCP stack? For sure! Would it be
the common case? Definitely not! Would it make the architecture more
resilient? I believe so but I may be wrong.

Now, to be clear, your work is amazing! I am really glad that you shared it,
and it is a pleasure to have this kind of technical conversation. Given your
technical and time constraints I would have done just the same.

Just wondering if you have any thoughts on my counter points.

Happy Easter!

------
gspetr
Great work.

I'm curious about the algorithm for bans, once you're done with the project I
hope that you will disclose it.

~~~
josephg
The system isn't fancy. It's just the bare minimum code to stop the specific
forms of abuse I was seeing.

I replied to another comment with details:
[https://news.ycombinator.com/item?id=14125518](https://news.ycombinator.com/item?id=14125518)

------
ralusek
This is absolutely great, and also much closer to how I would have done it as
well. Reddit's implementation seemed like they chose so many wrong tools.

------
socmag
Fantastic job, congrats and Happy Easter, it was very nice following along. A
lot of people were rooting for you.

Someone get this man a chocolate egg, he deserves it!

------
ziikutv
It's down

------
EGreg
What if we had reddit but with arbitrary group activities instead of just
chat?

------
arglebarnacle
It looks like it's down for me. Is anyone else still able to see it?

------
janci
Will it be possible to make this entirely serverless with WebRTC P2P?

------
mschuster91
Hahaha, awesome work. And sorry from Germany for the Nazi morons spamming you.

~~~
waitwhatt
Germany did nothing wrong.

~~~
mschuster91
I highly suspect at least the spamming of distorted/vandalized pictures of
Merkel is the work of German alt-rights (or, to be accurate, neo-Nazis).

Spamming swastikas, though, this one is a known modus operandi of trolls
worldwide.

~~~
andoon
It says "rules: no swastikas", which is just an invitation for people to draw
swastikas.

This reference is better than a swastika though.
[http://i.imgur.com/sFwteao.png](http://i.imgur.com/sFwteao.png)

~~~
josephg
I added that after someone got snippy at me after I deleted their 'art'. This
way expectations are clear about when I'll intercede with my admin powers.

------
stanislavb
Good job!

------
rv11
super awesome!!

------
65827
Jesus, how many more weeks are people going to write 10 articles a day about
this silliness? You'd think it was the first online webpage that did something
besides display hypertext?

