
Lessons Learned from Building an AI Writing App - jeffshek
https://senrigan.io/blog/how-writeupai-runs-behind-the-scenes/
======
minimaxir
About GPT-2 finetuning: I have a Python package which makes the process
straightforward
([https://github.com/minimaxir/gpt-2-simple](https://github.com/minimaxir/gpt-2-simple)),
along with a Jupyter notebook to streamline it further
([https://colab.research.google.com/drive/1VLG8e7YSEwypxU-noRN...](https://colab.research.google.com/drive/1VLG8e7YSEwypxU-noRNhsv5dW4NfTGce)).
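
The core loop is roughly this (a minimal sketch; the corpus file, step count, and sampling parameters are placeholders):

    import gpt_2_simple as gpt2

    # Download the 124M "small" model once (cached under ./models).
    gpt2.download_gpt2(model_name="124M")

    sess = gpt2.start_tf_sess()

    # Finetune on a plain-text corpus; checkpoints land in ./checkpoint/run1.
    gpt2.finetune(sess, dataset="corpus.txt", model_name="124M",
                  steps=1000, run_name="run1")

    # Sample from the finetuned checkpoint.
    gpt2.generate(sess, run_name="run1", length=200, temperature=0.7)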

~~~
aaron-santos
What a great coincidence. I found gpt-2-simple on Friday and just got it
running in a Flask app on Fargate a few minutes ago. gpt-2-simple made the
process so simple that my biggest problems were infra, not inference.
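
Roughly what that wrapper amounts to, sketched from memory (the route, port, and run name are approximations, not my exact code):

    import gpt_2_simple as gpt2
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Load the finetuned checkpoint once at startup, not per request.
    sess = gpt2.start_tf_sess()
    gpt2.load_gpt2(sess, run_name="run1")

    @app.route("/generate", methods=["POST"])
    def generate():
        prompt = request.json.get("prompt", "")
        text = gpt2.generate(sess, run_name="run1", prefix=prompt,
                             length=200, return_as_list=True)[0]
        return jsonify({"text": text})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)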

Have you heard of any success running it in a Lambda?

~~~
minimaxir
GPT-2 small might be too big/slow for a Lambda (admittedly I'm less familiar
with the AWS stack, more familiar with GCP). In the meantime, I do have it
running on Cloud Run ([https://github.com/minimaxir/gpt-2-cloud-run](https://github.com/minimaxir/gpt-2-cloud-run))
with decent success.

------
sdan
Love the writeup! Not sure if it's just me, but showing how your entire infra
works is just amazing (I wrote something similar on how I did my infra:
[https://sdan.xyz/sd2](https://sdan.xyz/sd2)).

How much do you spend per month on this?

Also, why do you need a GPU? Last time I played around with GPT, you didn't
need a GPU to run inference (you already have the weight files).

Also, are you running 774 or just 355?

~~~
jeffshek
Regarding the diagram - yeah, it's SO much easier to visualize when you see
it. Oddly, it takes a bit of time just to make the boxes fit, though. Huge fan
of Traefik too :)

Most of my fine-tuning was on 355 (774 is too hard to train).

However, the default prompt on [https://writeup.ai](https://writeup.ai) goes
to 774.

GPU inference helps a lot with response speed -- CPU inference is fine if
you're getting a one-way shot of 200 words (no revisions), but if you're
constantly changing and revising, then speed ends up mattering a lot for UX.

EDIT: Looked over your articles, I'm literally floored/amazed you're in high
school and know this much. The whole world is your oyster.

------
solidasparagus
Really cool writeup! I'm surprised you trained with TF and then deployed with
PT. I feel like PT is considered easier to train with, while TF is easier and
faster for setting up production-level serving. I think you might have had to
do less work if you had used TF Serving.

~~~
jeffshek
[https://github.com/huggingface/transformers](https://github.com/huggingface/transformers)
(was) based on PyTorch, so they originally had a lot of the models like
gpt2-small. Because of that influence, PyTorch was probably going to win
there.
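
For reference, the PyTorch side through transformers looks roughly like this (a quick sketch, not my actual serving code; the prompt and sampling parameters are arbitrary):

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    # "gpt2-medium" is the 355M model; point from_pretrained at a local
    # finetuned checkpoint directory instead if you have one.
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
    model = GPT2LMHeadModel.from_pretrained("gpt2-medium").eval()

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    input_ids = tokenizer.encode("The meaning of life is",
                                 return_tensors="pt").to(device)
    with torch.no_grad():
        output = model.generate(input_ids, max_length=100,
                                do_sample=True, top_k=40)
    print(tokenizer.decode(output[0], skip_special_tokens=True))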

The one con about TF Serve (TFX) is packaging the entire model into the
container (so that ends up being 3 GB+?). This was a couple of months ago, so
I might be wrong by now ... It was an area I wasn't very confident I could
handle; TF Serve is really new, so there aren't many guides (and many were
already out of date).

~~~
solidasparagus
TF Serving has been around for several years and is pretty mature (it predates
TFX). You can stick the model in the container, or have the container pull the
model from blob storage (or have no container at all if you really want).

It's too bad you didn't find a good guide - if you have the training dump a
SavedModelBundle at the end, you can have a production-quality serving
microservice up and running in about two lines of code -
[https://www.tensorflow.org/tfx/serving/docker](https://www.tensorflow.org/tfx/serving/docker).
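
And once the container is up, calling it is just an HTTP request. A sketch (the model name and the keys inside "instances" depend on your SavedModel's serving signature, so treat them as placeholders):

    import requests

    # TF Serving's REST API exposes each loaded model at
    # /v1/models/<name>:predict, on port 8501 by default.
    resp = requests.post(
        "http://localhost:8501/v1/models/gpt2:predict",
        json={"instances": [{"input_ids": [464, 3280, 318]}]},
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json()["predictions"])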

But it doesn't really matter since you got it working.

------
throwaway_bad
This was very informative and thoroughly convinced me that I am not good
enough to do ML dev ops myself if I ever want to deploy a model.

Is there an easier way? Shouldn't there be some company who will take my money
to instantly turn my models into a microservice?

~~~
rectangletangle
I'm pretty sure there are more than a few options out there.

Just a quick google search found these:

[https://www.floydhub.com/](https://www.floydhub.com/)

[https://algorithmia.com/product](https://algorithmia.com/product)

(I'm not affiliated with them in any way, and haven't ever used their
services.)

------
hint23
To give some perspective, the setup of
[http://textsynth.org](http://textsynth.org) consists of a single 250 KB C
Linux executable on the server and 150 lines of JavaScript code on the
client, without any dependency on other libraries...

------
Havoc
Nice write-up. :)

Must have cost a fortune if each instance gets its own GPU.

I suspect CUDA via Docker added a fair bit of complexity.

I toyed around with TF2 on a VM, and while it was difficult to get the
versions aligned, it wasn't as troublesome as the Docker setup sounds.

~~~
solidasparagus
All cloud providers have prepackaged VM/Containers with all the versions
aligned and GPU libraries + DL frameworks preinstalled with full GPU support
enabled. They are generally called things like Deep Learning container/VM/AMI.

I always try to use one of those.

~~~
Havoc
>All cloud providers have prepackaged VM/Containers with all the versions
aligned and GPU libraries + DL frameworks preinstalled with full GPU support
enabled.

Ah, didn't know that. I've been spinning up blank *nix boxes, which ends up
being a little fiddly until you find a combo that works.

------
alphagrep12345
Amazing write-up. Do you mind sharing how much this cost you?

------
samirsd
Trying to do this with video style transfer (haven't successfully attached
GPUs yet, so it's sloooowww): [https://www.vcr.plus](https://www.vcr.plus) --
example video:
[https://storage.googleapis.com/vcr_plus/out_6ae00222-9111-46...](https://storage.googleapis.com/vcr_plus/out_6ae00222-9111-46b6-8e3f-5239120d8b0e.mp4)

------
Jack000
Have you tried batch inference on the GPU? In my experience it increases
throughput by an order of magnitude.
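
For anyone who wants to try it, batched sampling with transformers looks roughly like this (a sketch; the prompts, batch size, and sampling parameters are arbitrary, and left padding matters so generation continues from real tokens rather than pad tokens):

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
    tokenizer.padding_side = "left"            # left-pad for decoder-only generation

    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    prompts = ["The meaning of life is",
               "Once upon a time",
               "In a shocking finding,"]
    batch = tokenizer(prompts, return_tensors="pt", padding=True).to(device)

    # Each decoding step runs one forward pass over the whole batch,
    # which is where the throughput win comes from.
    with torch.no_grad():
        out = model.generate(input_ids=batch["input_ids"],
                             attention_mask=batch["attention_mask"],
                             max_length=100, do_sample=True, top_p=0.9,
                             pad_token_id=tokenizer.eos_token_id)

    for text in tokenizer.batch_decode(out, skip_special_tokens=True):
        print(text)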

