Hacker News
Connecting a Git Repository to Amazon S3 and AWS Services (amazon.com)
140 points by freedomben 11 months ago | 45 comments

If you're planning to use this as a way to deploy your static website to S3, I highly recommend using Netlify[0] instead. I have configured it to automatically publish anything that I put on the `deploy` branch, and it supports Let's Encrypt certificates as well as rewrite rules.

0: https://www.netlify.com

+1. Also way better than surge.sh in this area.

A few months back I created a tool [0] that deploys static content from GitHub to S3 using GitHub push events and AWS Lambda. It was more of a PoC, but it worked quite well for me. During development I was very happy with jgit, which allowed me to stay in the Java stack (even if Java is not the perfect choice for Lambda, imho). There was, and still is, a lot on the to-do list. If you are interested in the sources, take a look.

[0] https://github.com/berlam/github-bucket

Nice! I'm planning to make my Travis build handle the publishing to S3.

From the title of this post, I was expecting something like Dulwich [0], which can store individual objects in a Swift object store.

But instead it is a webhook which reacts to pushes on a repo.

[0] https://github.com/jelmer/dulwich/blob/master/README.swift.m...

Edit: title has been improved in the meantime

I thought it would be about jgit's S3 support[0], or git-remote-helpers[1].

0: http://download.eclipse.org/jgit/docs/jgit-
1: https://git-scm.com/docs/git-remote-helpers

I like the free t2.small EC2 instance, pretty sweet. I've got a private Git server on there, but I'm still figuring out how to actually use Git.

Just to make sure you're aware, your free 750 hours per month will only be available for the first 12 months of your AWS account. According to the AWS website, it's t2.micro instances that have the available free hours, not t2.small. Make sure you're not getting unexpected costs in the billing section.

See https://aws.amazon.com/free/

Yeah, you're right, sorry, I said the wrong one (it's t2.micro). It's funny when you look at it: it seems cheap, like $0.012/hr, but you multiply that by 750 and yeah... I briefly went up to t2.medium and I was like, damn, $30/mo for a server, that's a lot (to me).
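The math behind that reaction is just hourly rate × hours. Here is a minimal sketch using the $0.012/hr rate and 750 hours quoted above. Real on-demand rates vary by instance type and region, so check AWS's pricing pages. `monthly_cost` is a hypothetical helper, not an AWS tool.

```shell
#!/bin/sh
# Hypothetical helper: estimate a monthly bill from an hourly rate.
# Usage: monthly_cost RATE_PER_HOUR HOURS
monthly_cost() {
  # awk handles the floating-point math that POSIX sh arithmetic can't.
  awk -v rate="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", rate * hours }'
}

monthly_cost 0.012 750   # prints 9.00
```

At that rate, 750 hours comes to about $9. The free tier only offsets this during the first 12 months of the account.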

OVH has nicer prices for roughly the same specs, e.g. 2 cores and 8GB RAM at, I think, $16/mo. But being able to detach and scale without losing the server config is really nice.

Yeah, good point about the year. I was surprised when I was charged the $99.99 for Amazon Prime, haha.

Trust me, you don't need scalability. For a personal server, get a VPS from a provider that charges a fixed monthly fee, and don't get one with a shitty CPU. Somewhere like DO/Linode does fine for your use case.

I use OVH right now and I like them, but I recently came across the free AWS tier as well and I'm just messing around. A client insisted on using AWS, so I set one up with CentOS (unrelated to the private Git) and migrated their WP site from DO to AWS EC2. I'm messing with private Git because I can't justify the $7/mo cost on GitHub.

Yeah, I run a VPS too, with multiple domains, but I do like that functions-as-a-service approach.

I unfortunately started with GoDaddy, haha, hilarious. That overpriced $1 economy thing was garbage.

Just my opinions, but yeah... I had some caching problems too; it wouldn't replace the content, ahhhhh. They said it was my fault and to "flush DNS", but I was testing externally with Tor from other countries. Also, I'm using Ubuntu, so no flushing required? I don't know.

Edit: I moved up to t2.medium because I implemented defuse/php-encryption and the encrypting/decrypting took a toll. It didn't help that I had a dumb mistake where cloned data kept stacking up instead of being deleted, so it was getting longer and longer... I know I could use a better language, but PHP is what I know at this time. I was able to drop back down to a t2.small, but damn, MySQL, man... it uses so many resources. With 2GB RAM it idles at like 1.4GB or something, and it just grows. I know you can tweak the settings. The site is built on WordPress; I'm suggesting stripping its functionality out of WordPress, but that's a future improvement.

Edit: Not crapping on WordPress; the plugins/dynamic aspect is cool, and if I thoroughly understood it my shit wouldn't be so janky. I interfaced with an Evernote Sync plugin and was surprised the comments were in Chinese, haha, but it's PHP so I could read the code.

Using memcache(d) was cool, yeah it was a cool job all in all.

Edit: You know what's hilarious? HIPAA-compliant servers. DAMN, that's expensive! I clearly was not using HIPAA servers, as those were $300-$600/mo at least. And it's not just the servers but also the architecture of the site; I clearly expressed that I am not a security expert. AWS does go over the design of separating the interface from the data storage a bit. At least we've got a zero-knowledge design where everything is encrypted... that's kind of a problem in its own way, too. When the key object used for encryption is not known, you've got strings you can't decrypt, and you start getting "failed to decrypt" error messages and you're like "oh shit..." Hopefully that was just an initial config issue.

Had a recent never-ending loop problem, wow... you can watch the server spike to 100% CPU usage in htop and you're like "ah crap, something's wrong."

Regarding private git repo, have you looked into Bitbucket? They offer unlimited free private repos. I've been using them for about half a year, very happy so far.

Haha... research right?

That's funny. I'm curious what the catch is, besides someone else hosting your data. Of course, that's ridiculous; how do you explain using everything else...

Good point, and thanks. I would trust their setup more than mine, and it would speed up the learning process too, but it wasn't bad setting it up.

I've been using Bitbucket's free private repositories for several years and it works great for me. I think the catch is just that you can only have five users on a team, so you need to convert to a paid account if you bring in more people. If it's just your own stuff, then obviously that's not a concern. I think it's just a loss leader to bring people in and try to convert some of them to a paid offering at some point.

Yeah, at this time I'm the only developer on the project where I'd like to use Git.

Thanks a lot. Not having to worry about permissions and setup would speed up the process. I still have to read some guides on Git, but yeah, I appreciate the tip.

I also like that it's not on the same server as production, just because that one has limited resources at this time. Granted, again, it's just me.

...or GitLab, very nice (and free) repo hosting, in-house or online.

+1 for Bitbucket

I've run HIPAA-compliant services in AWS on normal dedicated instances, with encrypted RDS (Postgres) as the data store. Orchestrated for redundancy, and it never cost me anywhere near those numbers!

I am paying 4,63€ for a VPS that would cost me $30 on AWS. If you don't need scalability, always look into traditional (VPS/bare-metal) hosting.

Yeah, paying in advance helps too. I do 1 year for like $40-$50: single core, 10GB storage, 2GB RAM.

Edit: The one for $30/mo on AWS is 2 cores, 4GB RAM, and like 30GB of space. What specs are you getting for that price? For me, roughly that (though the $ is not as strong) is what I mentioned above through OVH.

I'm at Hetzner and I'm using it for some bandwidth-hungry stuff. Hetzner includes 2 TB egress traffic per month which alone would cost me about $60 on AWS. It's easy to forget about the data transfer when looking at AWS/GCP/Azure prices.

If you genuinely need bandwidth, there are providers that offer unlimited bandwidth that HN users have tested:


> notamy: I have an application on OVH (on the USD $3.50/month plan) that pushes/pulls >10TB/month

That entire discussion is recommended for anyone looking for a cheap VPS.

Wait, is data transfer equivalent to bandwidth? I never even look at bandwidth. Sorry if that's obvious; I have seen that "metric" before and it didn't click till now.

For really low-bandwidth stuff (<1GB/month) it won't hit you with AWS. But as soon as you're over that, it gets really expensive really quickly. E.g. 1TB of traffic out of AWS costs you $90 in most regions, I believe, while you often get at least that much included elsewhere.
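Here is a rough way to see how quickly that adds up. This sketch uses the numbers from the comment above: about 1GB/month free, then roughly $0.09/GB (about $90/TB). `egress_cost`, `FREE_GB` and `RATE_PER_GB` are hypothetical names, and the rate is not taken from an AWS price list.

```shell
#!/bin/sh
# Hypothetical helper: rough egress cost estimate.
# Assumptions (from the comment above, not an official price list):
#   - the first FREE_GB per month costs nothing
#   - everything beyond that is billed at a flat RATE_PER_GB
FREE_GB=${FREE_GB:-1}
RATE_PER_GB=${RATE_PER_GB:-0.09}

egress_cost() {
  awk -v gb="$1" -v free="$FREE_GB" -v rate="$RATE_PER_GB" 'BEGIN {
    billable = gb - free
    if (billable < 0) billable = 0
    printf "%.2f\n", billable * rate
  }'
}

egress_cost 0.5    # prints 0.00 (inside the free allowance)
egress_cost 1000   # prints 89.91 (roughly 1TB out)
```

Actual AWS data-transfer-out pricing is tiered by volume and differs by region, so treat this as a ballpark figure.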

For Hetzner as mentioned, once you exceed the included bandwidth (which for most VPSs and servers is 2-3TB at least per month), it costs less than 2 euro/month extra.

This isn't likely to affect you for a personal git repo, but mentioning it as it's one of the things that often burn people with AWS, as the AWS bandwidth rates are crazily high to the point where for high-bandwidth applications even if you want/need to use AWS it's sometimes cheaper to run your own CDN in front of it to reduce bandwidth charges.

My order of preference if I don't need AWS specific services tends to be Hetzner, OVH and Digital Ocean, depending on what additional geographical regions I need (Hetzner is only in Germany) or other service requirements.

AWS is great for specific use-cases as long as you keep a close eye on costs.

Wasn't able to respond to you earlier.

Thanks for the tip, I wasn't even considering bandwidth. I think when I was first setting up EC2 I got a "bad batch", if that's a thing; I should say a bad instance. I couldn't figure out why the TTFB was varying from 3-5 seconds. I tried so many things, and eventually I had to switch. Since I didn't have an Elastic IP, I think, I switched servers (regions) to whatever I was assigned on restart, and it was instant/correct, in the tens-to-hundreds of ms range. Anyway, I'll keep an eye on bandwidth. I don't have users yet, but it's something to keep in mind; thankfully I'm not really serving media, mostly text.

For very fresh comments, HN disables the reply link on the comment overview to deter flame wars. You can click on the comment date to see only the single comment and then the reply link will be visible. Or just wait 2 minutes.

That is interesting, I like it.

I don't know, I am often wrong; I just need to learn how to deal with it. The proper thing is to accept, thank, learn, and move on, haha.

Where from? Care to plug them?

Assuming you like them, trust them, etc. My home "server" was... "reclaimed" recently and until I buy new hardware I really should get my shit up and running again.

Hetzner in Germany. If you want international hosting, look into OVH and Scaleway.

You don't need a private server at all to figure out how to use Git.

No, I was just being cheap. I didn't want to pay for private repos on GitHub, but I do need to know how to use Git: things like push to live (instead of SFTP), separate branches, and allocating resources if you're running Git on the same server that's hosting your site. Apparently that's not bad if you don't have a lot of users/commits.
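For the "push to live" part, one common pattern is a bare repo on the server plus a `post-receive` hook that checks the pushed branch out into the web root. Below is a minimal local sketch, assuming a branch named `main`. The temp directories are hypothetical stand-ins for something like `/var/www/site` and a repo in your home directory on a real server.

```shell
#!/bin/sh
# Sketch of push-to-deploy with a post-receive hook (paths are hypothetical).
set -e
BASE=$(mktemp -d)
BARE="$BASE/site.git"   # the repo you push to
SITE_DIR="$BASE/www"    # stand-in for the web root
WORK="$BASE/work"       # your local working copy

git init --bare --quiet "$BARE"
mkdir -p "$SITE_DIR"

# The hook runs after each push and force-checks-out main into SITE_DIR,
# so the bare repo itself never needs a working tree.
cat > "$BARE/hooks/post-receive" <<EOF
#!/bin/sh
git --work-tree="$SITE_DIR" --git-dir="$BARE" checkout -f main
EOF
chmod +x "$BARE/hooks/post-receive"

# Simulate a developer committing a page and pushing it to main.
git init --quiet "$WORK"
echo '<h1>hello</h1>' > "$WORK/index.html"
git -C "$WORK" add index.html
git -C "$WORK" -c user.name=demo -c user.email=demo@example.com \
  commit --quiet -m "first page"
git -C "$WORK" push --quiet "$BARE" HEAD:refs/heads/main

cat "$SITE_DIR/index.html"   # the pushed page is now in the "web root"
```

On a real server you would push over SSH instead of to a local path. The hook then updates the site on every push to `main`, which replaces the manual SFTP upload step.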

Git is decentralized, you don't need any server anywhere for it. You could just use it locally and email patches around.
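Here is what "email patches around" looks like in practice, as a minimal sketch. One repo exports a commit with `git format-patch`, and another applies it with `git am`. The two temp-directory repos and the `demo` identity are hypothetical; no server and no actual email are involved.

```shell
#!/bin/sh
# Serverless patch workflow: export a commit as a patch file, apply it elsewhere.
set -e
BASE=$(mktemp -d)
ALICE="$BASE/alice"   # hypothetical sender's repo
BOB="$BASE/bob"       # hypothetical receiver's repo

# git with a throwaway identity so commits and `am` work anywhere.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Both sides start from the same history.
g init --quiet "$ALICE"
echo 'v1' > "$ALICE/notes.txt"
g -C "$ALICE" add notes.txt
g -C "$ALICE" commit --quiet -m "initial"
g clone --quiet "$ALICE" "$BOB"

# Alice makes a change and exports her latest commit as a mailbox-format patch.
echo 'v2' > "$ALICE/notes.txt"
g -C "$ALICE" commit --quiet -am "update notes"
g -C "$ALICE" format-patch -1 -o "$BASE/outbox"

# Bob "receives the email" and applies it, keeping the author and message.
g -C "$BOB" am --quiet "$BASE"/outbox/*.patch
g -C "$BOB" log --oneline -1
```

In a real mailing-list workflow, `git send-email` can mail the generated patch files instead of copying them.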

Yes you COULD, but you can also haul water into your home via buckets from the well :-)

But setting up a git server is so easy if you don't care about having a UI like GitHub or GitLab. Git talks SSH, so pretty much just spin up a VPS, get it SSHable, then install git, create a bare repo, and push to that path.

Oh yeah I don't use it that way, but if you just want to learn how to use git setting up a server isn't needed.

Also I agree that emailing patches seems like an archaic way to do things, but that's how Linux handles PRs.


While learning git, if you want a place to push, an SSHable place is still unnecessary. The bare repo can simply be another directory on the same system.
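Here is a minimal sketch of that setup, assuming nothing beyond Git itself. A bare repo in another directory plays the role of the server, and you clone, commit, and push to it as usual. The paths and the `demo` identity are hypothetical.

```shell
#!/bin/sh
# Learning setup: a bare repo in another local directory acts as the "server".
set -e
BASE=$(mktemp -d)
REMOTE="$BASE/practice.git"   # hypothetical path; any directory works
CLONE="$BASE/practice"

git init --bare --quiet "$REMOTE"
git clone --quiet "$REMOTE" "$CLONE"   # may warn that the repo is empty; that's fine

echo 'hello' > "$CLONE/README"
git -C "$CLONE" add README
git -C "$CLONE" -c user.name=demo -c user.email=demo@example.com \
  commit --quiet -m "first commit"
git -C "$CLONE" push --quiet origin HEAD   # push the current branch under the same name

# The "server" now has the commit.
git --git-dir="$REMOTE" log --all --oneline
```

To move the same workflow to the SSH-reachable VPS described earlier in the thread, create the bare repo on the server (`git init --bare practice.git`). Then use an SSH remote such as `user@your-vps:practice.git` (hypothetical host and path) in place of the local directory.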

If you want a Git repo hosted on Amazon, I think you can get 50 Gigs of storage using their CodeCommit service for free. It obviously isn't the same as learning how to set things up on the server for yourself, but it might be useful depending on what you are trying to do.


If you don't want to run your own server, https://aws.amazon.com/codecommit/pricing/ is pretty much free for many use cases

Fun fact about CodeCommit: if you want to set up a mirror of another repository (e.g. GitHub), you'll need an EC2 instance just to handle the mirroring (CodeCommit doesn't support it)[1].

Yeah, no thanks. If I have to spend money on an EC2 instance to run my mirror, I'd rather use that instance to host the mirror itself (no need to spend more money on CodeCommit for my use case). Also, it's really strange that CodeCommit doesn't offer this option; I know GitLab does.
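The job that EC2 instance would have to run is small. Below is a minimal sketch of a one-way mirror using `git clone --mirror`, `git remote update --prune` and `git push --mirror`. Local bare repos serve as hypothetical stand-ins for the GitHub source and the CodeCommit destination; a real setup needs the two remote URLs plus credentials for each.

```shell
#!/bin/sh
# One-way mirror job: every ref in SRC is copied to DEST (paths are hypothetical).
set -e
BASE=$(mktemp -d)
SRC="$BASE/github-stand-in.git"       # hypothetical source repo
DEST="$BASE/codecommit-stand-in.git"  # hypothetical destination repo
MIRROR="$BASE/mirror.git"             # local mirror clone reused between runs

g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Seed the source with one commit so there is something to mirror.
g init --bare --quiet "$SRC"
g clone --quiet "$SRC" "$BASE/seed"
echo 'data' > "$BASE/seed/file.txt"
g -C "$BASE/seed" add file.txt
g -C "$BASE/seed" commit --quiet -m "seed"
g -C "$BASE/seed" push --quiet origin HEAD
g init --bare --quiet "$DEST"

mirror_once() {
  if [ ! -d "$MIRROR" ]; then
    git clone --mirror --quiet "$SRC" "$MIRROR"
  fi
  # Fetch new refs (pruning deleted ones), then push all refs to the destination.
  git -C "$MIRROR" remote update --prune
  git -C "$MIRROR" push --mirror --quiet "$DEST"
}

mirror_once
```

If you run it periodically (e.g. from cron), later runs only fetch and push what changed, because the `--mirror` clone in `MIRROR` is reused.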


I don't know about Lambda functions, but I think you can do all of this with rsync.net, since we have both s3cmd and git in our environment:

  ssh user@rsync.net s3cmd get s3://rsync/mscdex.exe

  ssh user@rsync.net "git clone https://github.com/freebsd/freebsd.git freebsd"

So you could run a cron job that would clone a repo to your rsync.net account and then publish (or backup) that repo to an s3 account.
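Here is a minimal sketch of that cron job. It assumes `git` and `s3cmd` are installed and that s3cmd is already configured with credentials for the target bucket. The script name, arguments and bucket are hypothetical placeholders, not taken from rsync.net's documentation. It uses s3cmd's `sync` with `--delete-removed`, so files removed from the repo also disappear from the bucket.

```shell
#!/bin/sh
# Sketch: keep a clone of a Git repo fresh and publish its files to S3.
# Usage: sync-repo-to-s3.sh REPO_URL WORKDIR S3_URL
#   (script name, repo, directory and bucket are hypothetical placeholders)

sync_repo_to_s3() {
  repo_url=$1 workdir=$2 s3_url=$3
  if [ -d "$workdir/.git" ]; then
    # Cloned on an earlier run: fast-forward to the latest commit.
    git -C "$workdir" pull --ff-only --quiet || return 1
  else
    git clone --quiet "$repo_url" "$workdir" || return 1
  fi
  # Upload changed files, delete ones removed from the repo, skip Git metadata.
  s3cmd sync --delete-removed --exclude '.git/*' "$workdir/" "$s3_url/"
}

if [ "$#" -eq 3 ]; then
  sync_repo_to_s3 "$@"
else
  echo "usage: $0 REPO_URL WORKDIR S3_URL" >&2
fi
```

A crontab entry such as `*/30 * * * * /home/you/sync-repo-to-s3.sh https://github.com/you/site.git /home/you/site s3://your-bucket` (all placeholders) would rerun it every 30 minutes.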

Or whatever.

This allows you to hook your repository directly to Amazon AWS.

You are suggesting the exact opposite.

I'm not suggesting anything. Those two tools (s3cmd and git) are in place and you can do whatever you'd like with them in whatever order.

I just pasted those two example commands in that order because that's what order they are in on our howto page...

Surely the exact opposite is, whenever a new object is written to some S3 bucket, it gets automatically added to a git repo in GitHub…

All of this just to avoid creating a Git connector?!

You should understand what problem they are solving first, then comment.
