
Managing Apt Repos in S3 Using Lambda - daenney
http://webscale.plumbing/managing-apt-repos-in-s3-using-lambda
======
paulddraper
Interesting idea to use Lambda. (Personally, I've used Aptly.)

---

For the Apt transport, I recommend apt-boto-s3
([https://www.lucidchart.com/techblog/2016/06/13/apt-transport-for-s3/](https://www.lucidchart.com/techblog/2016/06/13/apt-transport-for-s3/)),
which I authored. Unlike apt-transport-s3, it:

    
    
      * Works with AWS v4 signatures (required in some regions)
      * Supports If-Modified-Since caching
      * Uses HTTP pipelining
      * Uses standard AWS credential resolution (~/.aws/credentials, IAM roles, etc.)
      * Allows credentials to be specified per-repo in the URL (see the example below)
      * Supports both path and virtual-host styles for S3 URLs
    

And it works with proxies if you use HTTPS_PROXY/HTTP_PROXY environment
variables (though I haven't tested this).
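For illustration, sources.list entries might look like this (bucket name and
keys are made up, and the exact URL syntax is from memory; check the README):

    
    
    # Path style, credentials from the standard AWS resolution chain:
    deb s3://s3.amazonaws.com/my-bucket/apt xenial main
    # Virtual-host style, with per-repo credentials embedded in the URL:
    deb s3://AKIAEXAMPLE:SECRETEXAMPLE@my-bucket.s3.amazonaws.com/apt xenial main
    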

~~~
szinck
Oh cool. The only reason I chose apt-transport-s3 was that it was included in
Ubuntu 16.10. It also uses only the standard Python library, so there's less
bootstrapping before it's usable.

That does look really cool, though. For proxies it would be nice if it
supported the standard apt-get proxy declaration; apt-transport-s3 basically
does that and sets the proxy env vars.
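For reference, the standard declaration is a one-liner in /etc/apt/apt.conf
(proxy host made up):

    
    
    Acquire::http::Proxy "http://proxy.example.com:3128";
    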

You could probably get yours into the official Ubuntu repos if you wanted,
though.

------
leetrout
Great write-up. We use S3 for our Apt & Yum repos at work too, only with
website hosting turned on for the buckets (losing some of the security OP's
solution has, but adding simplicity).

I'd like to point out a tool I'm using now for building and managing our Apt
repos: Aptly ([https://www.aptly.info/](https://www.aptly.info/)).

It's really nice and Just Works™. It even has S3 publishing built in. Oh, and
you can build Apt repos on Debian, CentOS, Mac OS X & FreeBSD (thanks, Go!).

To solve for both Yum & Apt, I wrote a tool in Go that keeps all the metadata
in a JSON file in S3. It's just a map of S3 buckets (the repos) to a list of
S3 URLs for each RPM/deb. Whenever that file gets updated, just like OP, I
send an event from S3 to Lambda. Unlike OP, though, I just have Lambda launch
a task in ECS, and then my repo builds (& S3 syncing) take place in containers
(one Ubuntu container for Apt repos & one CentOS container for Yum).
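A minimal sketch of that Lambda glue layer, assuming a boto3 handler and
made-up cluster/task names:

    
    
    import boto3
    
    ecs = boto3.client("ecs")
    
    def handler(event, context):
        # Fired by the S3 put event on the metadata JSON file;
        # the container does the actual repo build and S3 sync.
        ecs.run_task(
            cluster="repo-builders",          # hypothetical cluster name
            taskDefinition="apt-repo-build",  # hypothetical task definition
            count=1,
        )
    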

~~~
daenney
So, this way be dragons, but... you could just have a small JavaScript or
Python wrapper that calls out to Aptly. As long as you compile Aptly for the
right target, you can bundle it up in the zip that you ship to Lambda; since
it's Go, it'll just run. Then your wrapper takes care of calling Aptly with
the right arguments depending on what event got sent from S3.
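A rough sketch of that wrapper, assuming the aptly binary sits at the zip root
(the repo, distribution, and endpoint names are made up, and the aptly
arguments are illustrative rather than exact):

    
    
    import subprocess
    
    def handler(event, context):
        for record in event["Records"]:
            key = record["s3"]["object"]["key"]
            # Fetch the uploaded .deb to /tmp first (omitted), then add it
            # to the local repo and republish to the S3 endpoint.
            subprocess.check_call(["./aptly", "repo", "add", "myrepo", "/tmp/" + key])
            subprocess.check_call(["./aptly", "publish", "update", "xenial", "s3:mybucket:"])
    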

~~~
leetrout
I actually prefer using the containers. It's easily repeatable anywhere I can
run Docker (it doesn't have to be AWS). I can also avoid any race conditions
by checking to see if the task is already running.

(And in general I'm not a fan of trying to do a lot in Lambda; I like it a lot
more as a glue layer / event handler.)
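That already-running check can be a single ECS API call; a sketch with the
same made-up names as above:

    
    
    import boto3
    
    ecs = boto3.client("ecs")
    
    def build_already_running():
        # Skip launching a new task if one from this family is still active.
        running = ecs.list_tasks(
            cluster="repo-builders",
            family="apt-repo-build",
            desiredStatus="RUNNING",
        )
        return bool(running["taskArns"])
    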

------
helper
I use deb-s3[1], which is sort of like this but handles uploading the package
and updating the Packages index in a single step. deb-s3 can also sign your
Release file, so apt won't complain about unsigned packages.

The post mentioned deb-s3 and dismissed it as more complicated to set up than
this solution. While the Lambda solution is neat, I'm not sure I would
describe it as simple. For now I think I'll stick with deb-s3.

[1]:
[https://github.com/krobertson/deb-s3](https://github.com/krobertson/deb-s3)
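For reference, the whole upload-index-sign cycle is one invocation, something
like this (bucket name and key ID made up; check the README for exact flags):

    
    
    deb-s3 upload --bucket my-apt-bucket --sign=DEADBEEF mypackage.deb
    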

~~~
szinck
True, Lambda setup is not super simple. However, with deb-s3 you have to have
a local copy of the .deb, whereas with what I'm doing the files never leave S3.

------
otterley
If you care about your time, PackageCloud is a hosted solution for managing
dpkg and rpm repositories. It supports repository signing out of the box. I'm
a happy customer.

[https://packagecloud.io/](https://packagecloud.io/)

------
cheez
I love this!

But I have to admit I'm a little confused. Is this mostly for large
repositories? I ship some software on multiple platforms and part of my build
process builds an apt repo (available over HTTP) as follows:

    
    
        # Sign the .debs, rebuild the repo dir, then generate the Apt
        # metadata: the Packages index, a compressed copy, a Release file,
        # a detached Release signature, and the public key for clients.
        run_gpg_command dpkg-sig -g " --yes" \
            --sign builder *.deb && \
        rm -rf repo && \
        mkdir repo && \
        cp *.deb repo && \
        cd repo && \
            (dpkg-scanpackages . /dev/null > Packages) && \
            (bzip2 -kf Packages) && \
            (apt-ftparchive release . > Release) && \
                (run_gpg_command gpg --yes -abs -o Release.gpg Release) && \
                (gpg --export --armour > archive.key)
    

Then I sync it to S3 using aws s3 sync.
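i.e. something like (bucket name made up):

    
    
    aws s3 sync repo/ s3://my-bucket/apt/
    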

What are the problems doing it this way?

~~~
daenney
It's not just for large repos. The idea is that you can just build your
package, drop it in S3, and the rest is taken care of: you don't need to
generate the metadata yourself, like the Packages or Release files.

It does not, however, address signing. It would be interesting to see someone
extend it with an additional bucket storing the signing credentials, with
policies that only allow the Lambda function to get and use them.
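A hypothetical sketch of that extension, with the Lambda role as the only
principal allowed to read the key object (bucket/key names made up):

    
    
    import subprocess
    import boto3
    
    s3 = boto3.client("s3")
    
    def sign_release(release_path):
        # Fetch the ASCII-armoured private key from the locked-down bucket,
        # import it, and produce a detached signature for the Release file.
        obj = s3.get_object(Bucket="repo-secrets", Key="signing-key.asc")
        subprocess.run(["gpg", "--import"], input=obj["Body"].read(), check=True)
        subprocess.run(
            ["gpg", "--yes", "-abs", "-o", release_path + ".gpg", release_path],
            check=True,
        )
    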

~~~
cheez
Seems to me that it's not really beneficial for most small software publishers
in that case. They can just use a variant of the script I posted above, I
suppose.

------
akerl_
Looks really interesting!

I may be about to show my apt inexperience here, but I don't see any explicit
callout of a signing step. Does apt handle signing only at the package level,
such that you'd just sign when you build a single package and then upload
that?

I've written a tool for managing Arch Linux repos in S3
([https://github.com/amylum/s3repo](https://github.com/amylum/s3repo)), which
I run in a container myself so that I can use my GPG key to sign the packages
and the full repo metadata. I'd love to move to something like this, so I
could let AWS handle running the Lambda on my behalf, but I've yet to figure
out a good way to do that with the signing structure necessary.

~~~
leetrout
Not OP, but I left another comment further up:

I also run my Yum & Apt repo builds in containers, and I have Lambda launch
those containers when S3 is updated...

To facilitate signing, I keep my GPG key on the container host machine and
mount that directory as a volume in the container, so the container has
access to the key as well.

For initial provisioning of the container server, I deploy the GPG key as an
encrypted blob and decrypt it on the host.

~~~
akerl_
Yup yup, I do similarly at the moment. I was brainstorming a bit more; I think
if I were to port this to Lambda, I'd probably use a KMS-encrypted S3 object
to store the private key.

That said, it would mean fixing up my original key strategy. At the moment I'm
using a signing subkey of my personal GPG key, since it's just a personal repo
and I control the VM/container where everything happens. By comparison, if I
were dropping it into Lambda/S3, I'd be more motivated to split onto a fully
separate key for the repo.

------
garadox
Very interesting - I've been working on exactly the same approach for a yum
repo, but hadn't solved the race condition issue yet. I might have to "be
inspired" by this post :)

------
omn1
I'm wondering what people are using to _build_ custom deb packages. We
currently have a set of simple shell scripts which first install the
dependencies, then compile some source code and finally use checkinstall to
build a Debian package from that. It works, but I'd much rather have a config
file with all the required steps instead of a buggy shell script. Is there any
better way?

~~~
otterley
fpm:
[https://github.com/jordansissel/fpm/wiki](https://github.com/jordansissel/fpm/wiki)
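For instance, turning a build directory into a .deb is a one-liner (package
name, version, and paths made up):

    
    
    fpm -s dir -t deb -n myapp -v 1.0.0 --prefix /opt/myapp ./build
    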

~~~
omn1
Looks good! Additionally, there's fpm-cookery, which comes pretty close to
what I want: [https://github.com/bernd/fpm-cookery](https://github.com/bernd/fpm-cookery).
Somebody also created a Docker container for it:
[https://github.com/andytinycat/fpm-dockery](https://github.com/andytinycat/fpm-dockery)

Combining this tool with Jenkins, triggering a build whenever I change the
recipe, would be all I need for now.

------
johnnycarcin
I like that this solution takes into account the "race condition" of doing
multiple things at once. To me that was the biggest issue I ran into with
deb-s3, and it's why I wrote deb-simple
([https://github.com/esell/deb-simple](https://github.com/esell/deb-simple)).

------
earless1
My co-worker just set up something similar for RPMs. His solution was Lambda +
ECS: [https://github.com/erumble/s3-repo-sync](https://github.com/erumble/s3-repo-sync)

------
pwelch
This is an awesome write-up.

~~~
szinck
Thanks! And thanks everyone else for all the comments!

