
Show HN: Staffjoy V2, now open source - philip1209
https://github.com/staffjoy/v2
======
philip1209
I announced two weeks ago that we are shutting down Staffjoy [1] and open
sourcing our code. Last week, our primary V1 repo was submitted to HN [2].
Yesterday, I open-sourced our V1 microservice Chomp for computing shift
scaffolds from forecasts [3]. I also published our YC Fellowship
application [4] and pitch decks [5] on our blog.

Today we opened the V2 repo of Staffjoy, which contains a microservice
architecture, detailed here [6]. V2 was a monorepo, meaning that all of the
code is in one repo. This is the largest and most sophisticated of our
repositories.

You can learn about the difference between V1 (autoscheduling of hundreds of
workers) and V2 (Excel replacement with text messages for small businesses) in
our design journey blog post [7].

We are continuing to open source our last repos, which are mainly V1
microservices. Specifically the V1 Cron microservice, V1 mobile applications
(iPhone + Android in React Native), V1 Scheduler microservice (our first
scheduling algorithm), and the V1 Mobius microservice (which solves an
assignment problem) will be open-sourced in the coming days.

If you have any questions - please let me know! I'm providing contract support
for anybody wishing to deploy, modify, or customize Staffjoy code through
Moonlight [8].

[1] Shutdown announcement:
[https://news.ycombinator.com/item?id=13647382](https://news.ycombinator.com/item?id=13647382)

[2] V1 "Suite"
[https://news.ycombinator.com/item?id=13730488](https://news.ycombinator.com/item?id=13730488)

[3] Chomp service: [https://github.com/staffjoy/chomp-decomposition](https://github.com/staffjoy/chomp-decomposition)

[4] YCF Application: [https://blog.staffjoy.com/staffjoys-yc-fellowship-applicatio...](https://blog.staffjoy.com/staffjoys-yc-fellowship-application-5771c8a105cd#.lvnoezlpr)

[5] Pitch decks: [https://blog.staffjoy.com/staffjoys-pitch-decks-that-raised-...](https://blog.staffjoy.com/staffjoys-pitch-decks-that-raised-1-7m-15812018968#.5k08omn51)

[6] V2 Architecture:
[https://blog.staffjoy.com/staffjoys-v2-architecture-9d2fcb40...](https://blog.staffjoy.com/staffjoys-v2-architecture-9d2fcb4015fd)

[7] V1 to V2 Design Journey:
[https://blog.staffjoy.com/staffjoy-v2-ca15ff1a1169#.ttejeklw...](https://blog.staffjoy.com/staffjoy-v2-ca15ff1a1169#.ttejeklwn)

[8] Contract support -
[https://www.moonlightwork.com/staffjoy](https://www.moonlightwork.com/staffjoy)

~~~
bsbechtel
Philip, I'm sure going through all this as a part of your shutdown isn't
easy, but it's really great that you're taking the time to do it. I'm sure
there are many who will benefit from making all of this open source. Many
thanks and best of luck going forward!

~~~
philip1209
Thanks! Please reach out if I can be a resource too.

------
troyk
We share your sentiment that a lot of our user-facing errors could be
limited by moving away from weak typing (our current stack is Ruby). In
hindsight, did Go and protobuf serve you well? We are evaluating Go and
possibly Elixir as a sweet spot between static and dynamic typing.

~~~
philip1209
Yes. I would always use gRPC-gateway in the future. It's an amazing library.
Check out the `protobuf` folder - we basically defined the whole api there in
a language-agnostic way. For client libraries, you can autogenerate gRPC
definitions quickly. For servers, gRPC-gateway provides a Swagger definition
that meant that we always had 100% up to date API docs that include a Postman-
style way to execute calls. The auto-generated API docs sped up front-end
development so much.

I think that access to the docs requires logging in, so here are some
screenshots: [http://imgur.com/a/R0AvB](http://imgur.com/a/R0AvB)
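As a rough illustration of defining the whole API in protobuf (the service and message names below are hypothetical, not taken from the Staffjoy repo), a grpc-gateway-style definition declares the messages, the RPCs, and the HTTP routes that both the gateway and the Swagger docs are generated from:

```protobuf
syntax = "proto3";
package company;

import "google/api/annotations.proto";

message GetWorkerRequest {
  string uuid = 1;
}

message Worker {
  string uuid = 1;
  string name = 2;
}

service CompanyService {
  // grpc-gateway uses this annotation to expose the RPC as a REST
  // endpoint and to emit a Swagger (OpenAPI) definition for it.
  rpc GetWorker(GetWorkerRequest) returns (Worker) {
    option (google.api.http) = {
      get: "/v1/workers/{uuid}"
    };
  }
}
```

From a file like this you can generate gRPC client stubs, the reverse-proxy gateway, and the API docs, all from one source of truth.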

Jason Chen is using Elixir/Phoenix for his new project, by the way, and thinks
that it's the future for Rails developers.

~~~
troyk
gRPC-gateway looks very attractive. Thank you for posting this for the rest of
us to learn from, a brilliant effort for sure.

~~~
philip1209
Its documentation is horrible for getting started. Lmk if you have issues.

------
spacetexas
What are your thoughts on the re-write in hindsight, would you have done it
differently looking back on it? Perhaps have gone for a more 'monolithic'
approach?

~~~
philip1209
Yes, in hindsight I would have used more of a monolithic datastore with fewer
services. Faraday for auth was amazing - we could put anything we wanted
behind it and centralize authentication/authorization. I liked how
configurable Kubernetes was for everything from secrets to environments.
Managing service discovery errors was annoying. The monorepo was a big win - I
would always do this in the future.

------
Dowwie
To what extent did you try to refactor your Python code/architecture prior to
adopting Go for v2? Also, could you share one example of the dynamic typing
related problems? Specific metrics would be helpful to understand v1
limitations. Thanks for considering my question!

~~~
philip1209
I talked about this a little bit more on reddit [1] - basically, running
Python code provided a lot of challenges.

We did a lot of design research prior to starting the V2, and we realized that
we needed to make huge changes in our data normalization. For instance, in V1
we assumed that workers could only have one "role", but we realized that
people often rotate between many jobs. Making this change to the API while
maintaining support for existing users would have been really difficult.

So, we basically started looking at all of the work we would need to do for a
data normalization change, and said "well wait - if we were to build it from
scratch, how long would that take?" That conversation also made us realize
that we could jettison a lot of unneeded features if we started a V2.

We chose Go to address some of the issues that we had with Python, including
difficulty of deployment and operation, and the lack of static typing.

[1]
[https://www.reddit.com/r/Python/comments/5w3o6t/staffjoy_v1_...](https://www.reddit.com/r/Python/comments/5w3o6t/staffjoy_v1_has_been_open_sourced_if_you_want_to/de96ybk/?context=3&utm_content=context&utm_medium=user&utm_source=reddit&utm_name=frontpage)

------
hoodoof
Real-world software can be messy because sometimes you just have to get stuff
done within time constraints.

~~~
philip1209
Yes. Protocol buffers and grpc-gateway made it easy to scaffold out a quick
API, write all the interior logic, and know that there would not be any major
issues, thanks to strong typing. We used Gogoproto to annotate the protobuf
files so that database CRUD is easy too.
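For context, Gogoproto annotations add extra struct tags to the generated Go types so the same message can be scanned from and written to the database directly. A minimal hypothetical sketch (the message and field names are illustrative, not from the repo):

```protobuf
syntax = "proto3";
package accounts;

import "github.com/gogo/protobuf/gogoproto/gogo.proto";

message Account {
  // (gogoproto.moretags) appends struct tags to the generated Go field,
  // here a `db:"..."` tag so a sqlx-style mapper can do CRUD directly.
  string uuid  = 1 [(gogoproto.moretags) = "db:\"uuid\""];
  string email = 2 [(gogoproto.moretags) = "db:\"email\""];
}
```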

------
dragonsh
Having worked with the Flask/Python ecosystem and Golang, do you think that if
you had continued with Python for v2 it might have been better (since you were
already familiar with Python, even though Golang is statically typed)?

Also, do you think Python 3.6's static checking with mypy would have
eliminated the need for Go?

Another question: for algorithmic scheduling, why didn't you use Python
libraries instead of Julia, given that your v1 was written in Python?

~~~
philip1209
No, I was having issues with too much "magic" in Python libraries. I'm
continuing to use Go for my new company.

In particular, running SQLAlchemy was a nightmare (see a reddit comment I made
about it here [1]). I also wanted a more parallel language - I was tired of
having to use queues to do parallel tasks. Go's static typing made everything
easier, from auto-generating Swagger definitions to having usable code
documentation via Godoc. Most of our runtime errors while using Python were
due to type mismatches that could have been caught by a compiler.
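To illustrate the point about parallelism (this is a generic sketch, not Staffjoy code): where a Python app often pushes background work onto an external task queue, Go can fan work out to goroutines in-process:

```go
package main

import (
	"fmt"
	"sync"
)

// process runs one unit of work per input concurrently. Each goroutine
// writes to its own slice index, so no mutex is needed for the results.
func process(ids []int) []string {
	results := make([]string, len(ids))
	var wg sync.WaitGroup
	for i, id := range ids {
		wg.Add(1)
		go func(i, id int) {
			defer wg.Done()
			results[i] = fmt.Sprintf("processed-%d", id) // stand-in for real work
		}(i, id)
	}
	wg.Wait() // block until every goroutine has finished
	return results
}

func main() {
	fmt.Println(process([]int{1, 2, 3})) // [processed-1 processed-2 processed-3]
}
```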

I'm unfamiliar with Python 3.6 static checking. I'll have to look into it.

For the algorithmic scheduling - our algorithms in Julia predate the existence
of a web app. We had customers using the Julia-based algorithm using the
protocol "spreadsheets over email". The website came later, and we ended up
switching the algorithms from Julia to Python after the web-app launch [2].

[1]
[https://www.reddit.com/r/Python/comments/5w3o6t/staffjoy_v1_...](https://www.reddit.com/r/Python/comments/5w3o6t/staffjoy_v1_has_been_open_sourced_if_you_want_to/de96ybk/?context=3)

[2] [https://blog.staffjoy.com/retro-on-the-julia-programming-lan...](https://blog.staffjoy.com/retro-on-the-julia-programming-language-7655121ea341)

~~~
dragonsh
Thanks for the reply and insight.

Why did you have issues with magic in Python libraries? As I understand it,
one of the main design goals of Python and its libraries is "Explicit is
better than implicit" (PEP 20 [1]).

Indeed, for our own project we chose Python instead of Ruby and Ruby on Rails
precisely because of PEP 20 [1].

Your blog post and move to Go made us wonder whether there is anything wrong
with the Python ecosystem.

We tried Go, but dropped it because of too much boilerplate code and the
quality of the libraries. When it comes to interacting with a database,
especially PostgreSQL, Go's speed advantage is not very useful, since most of
our tasks were IO-intensive. Also, the Go database driver for PostgreSQL is
not as mature as psycopg. For CPU-intensive tasks we relied on scientific
Python libraries, which are written in C and worked well with multiprocessing
and asyncio.

Also, regarding the specific race condition you encountered on updates from
multiple requests: we solved it using SQLAlchemy's with_for_update() [2],
which locks the rows at the database level so that transactions are handled
properly. Since the database we were using is PostgreSQL, we wanted the
database to handle ACID compliance rather than implementing it in application
code.

[1][https://www.python.org/dev/peps/pep-0020/](https://www.python.org/dev/peps/pep-0020/)

[2][http://docs.sqlalchemy.org/en/latest/core/selectable.html#sq...](http://docs.sqlalchemy.org/en/latest/core/selectable.html#sqlalchemy.sql.expression.Select.with_for_update)

------
kondro
Interesting to see how other companies' application stacks end up looking.
Thanks for releasing this.

~~~
zackify
Their React app definitely has a lot that could be done better. It is nice
seeing what other people come up with in big orgs and learning from what they
have written. Sometimes I have imposter syndrome to the max, but this sort of
codebase I can actually read.

~~~
zalmoxes
Their Go services definitely read like a very beginner or prototype project.
There are no tests, lots of global state, and little separation of concerns
between request/response handling, business logic, etc.

~~~
philip1209
Yes, it was a rush job. It started off nice, with tests and everything. Then
we had to hit a deadline. I wrote 100% of the backend code - there were no
other authors. Churning out tons of functionality without time to go back and
refactor or improve things as my knowledge grew sucked, plus there were a lot
of other non-code responsibilities at the time. If I had had time, I would
have spent a lot more of it improving this code. For instance, I discovered
gRPC interceptors way too late :-)

I recently started contributing to Buffalo, the Go web library by Mark Bates,
and it addresses a lot of the issues we had, such as managing webpack for
development environments.

Fun fact: I wrote a quick scraper in Go to relearn it before jumping in to the
V2, and Francesc from Google did a code review of it for the first "Just For
Func" episode. That made me realize just how much I had to improve my Go code:
[https://www.youtube.com/watch?v=eIWFnNz8mF4&t=2s](https://www.youtube.com/watch?v=eIWFnNz8mF4&t=2s)

~~~
philip1209
> I really enjoyed that episode :)

> Curious, since it was just you writing the backend, what was the reasoning
> behind doing microservices + react SPA. Both of those require a lot of
> commitment, and coordination. Microservices especially are something I'd be
> more likely to consider with a very large team/many teams instead of a
> single developer.

> Would you choose a microservice architecture again?

For microservices - we were planning integrations with messy systems. Think:
having to poll an API, deal with XML, or do custom auth on lots of different
incoming data endpoints. Microservices made it easy to add new experiments
(like our ical service) without adding complicated logic to the central
datastore. For reference, we already had API keys and signed agreements with
four integration partners (POS, HR, etc.) that we wanted to roll out in a
short period, with more people lining up. We also wanted to add messaging
capabilities beyond SMS quickly, so that's why we architected the bot the way
we did: some providers (like Twilio) had strict sending limits, while others
did not require as strict a queue.
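A rough sketch of that queueing idea (the function and rate here are hypothetical, not from the Staffjoy bot): a rate-limited provider drains its messages behind a ticker, while an unconstrained provider could send directly:

```go
package main

import (
	"fmt"
	"time"
)

// sendWithRateLimit delivers messages at no more than perSecond sends per
// second, the way a worker draining a queue for a rate-capped SMS provider
// would. The actual delivery call is replaced by collecting the message.
func sendWithRateLimit(messages []string, perSecond int) []string {
	sent := make([]string, 0, len(messages))
	ticker := time.NewTicker(time.Second / time.Duration(perSecond))
	defer ticker.Stop()
	for _, m := range messages {
		<-ticker.C // block until the provider's rate limit allows another send
		sent = append(sent, m)
	}
	return sent
}

func main() {
	msgs := []string{"shift reminder", "schedule published"}
	// A high rate keeps the demo fast; a real Twilio-style cap would be lower.
	fmt.Println(sendWithRateLimit(msgs, 100))
}
```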

For react - I didn't pick it, and I'm actually completely unfamiliar with the
library. I'm learning VueJS now, and that would be my choice for future SPAs.
Now that I understand how shared components work, I would have changed a lot
of the system design.

I'm already starting on my next company. It's a monorepo with three folders:
pkg (packages), cmd (any `package main` commands), and static (all JS, SCSS,
etc that gets built by Webpack then wrapped up in a single bindata.go). The
primary app is a monolith. However, I'm making it possible to add additional
services, such as cron jobs in separate containers or a command-line utility,
based on the same protobuf definitions.

~~~
zalmoxes
Thanks for being so transparent and sharing your learnings! Good luck with
your future project.

------
OutsmartDan
Hi Philip, does your app build really take 20 minutes?

~~~
philip1209
The Bazel build system [1] is the open-source version of Google's internal
build system. It's tough to set up, but once you do - it caches all builds,
down to the docker container generation. On changes, it analyzes affected
upstream projects and selectively rebuilds them. So, after an initial build,
rebuilds are blazingly fast. In fact, the simple act of creating a pull
request means that most production builds/deploys come straight from cache.

Here [2] is a screenshot of the actual "master branch test, lint, build, and
deploy 15 containers to kubernetes" job from our Jenkins. It took about 10
minutes on the build box every time.
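For readers who haven't used Bazel, the per-package BUILD files are what make the selective rebuilding possible: each target declares its sources and dependencies, so Bazel knows exactly what a change invalidates. A hypothetical sketch for one Go service (the target names and proto dependency are illustrative):

```python
# BUILD file (Bazel's Starlark syntax) -- hypothetical targets
load("@io_bazel_rules_go//go:def.bzl", "go_binary", "go_library")

go_library(
    name = "account_lib",
    srcs = glob(["*.go"]),
    # Rebuilt only when these sources or this proto target change;
    # everything else comes from the build cache.
    deps = ["//protobuf:account_go_proto"],
)

go_binary(
    name = "account_server",
    embed = [":account_lib"],
)
```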

That being said, in order to make builds work with Bazel, we had to do some
wonky stuff. We committed built JavaScript files, and we had to manually
commit a lot of other generated files (like bindata.go and protobuf outputs).

[1] [http://bazel.build](http://bazel.build)

[2] <I'll drop this in as soon as Imgur/S3 is back online . . . >

~~~
philip1209
[2] [https://i.imgur.com/cpeld6C.png](https://i.imgur.com/cpeld6C.png)

