Serving a website from a Git repo without cloning it (mediocregopher.com)
110 points by todsacerdoti on Feb 19, 2024 | 53 comments


> It's fairly common to use git repositories as a vehicle for serving websites. The webdev pushes their changes to some branch of a publicly available git repository

This doesn't require the git repo to be public. My go-to "day one" static website deploy strategy is to set up a git remote on some VPS, point nginx at its public html folder, then "deploy" to that with a git push, authed by ssh key.
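Roughly, with hypothetical paths (/srv/site.git for the bare repo, /var/www/site as the nginx root) and main as the branch:

  # on the VPS: a bare repo to push to
  git init --bare /srv/site.git

  # /srv/site.git/hooks/post-receive (chmod +x): check the push out into the web root,
  # which is the directory nginx's root directive points at
  #!/bin/sh
  GIT_WORK_TREE=/var/www/site git --git-dir=/srv/site.git checkout -f main

  # on the dev machine: "deploy" is just a push over ssh
  git remote add prod ssh://user@vps.example.com/srv/site.git
  git push prod main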

You'd be surprised how far you can get with simple tricks like this. I served roughly the first 6 months of what would become a multi-hundred-million-dollar company with this hilariously simple setup, running on a $20/mo DO VPS.


I do the same. My blog on a 9€/mo Contabo VPS survived multiple HN/Reddit hugs.

Lately I've been playing with more configuration options. For example, I have two servers behind a load balancer, weighted heavily toward server 1. Server 2, which hosts an almost identical site but with ads, is only used when server 1's response time goes above 500 ms. So under normal circumstances nobody sees ads on my blog unless I get a huge spike (usually more than 600 concurrent users).


I've been publishing my personal web page like this for more than 15 years now (the repo's log history goes back to early 2009). I use a post-update hook that makes sure on each push that the working directory matches the master branch (so that things pushed to other branches aren't made public), and then the public site is rebuilt (using a Makefile and custom scripts making heavy use of xml2, 2xml, sed, and the coreutils, but that's another story).

I can see why the author of the linked blog post would want to serve directly from the Git repository: it's fun. But I agree that it seems uselessly inefficient compared to (re)generating a static version of the site only when changes occur.

If there is no processing needed, as in the case of the blog post, a simple post-update hook can easily do the trick: go to the public directory served by the web server, set the GIT_DIR environment variable to point to the .git directory of your repository, then run `git checkout main-branch -- .` to get all the static files out of the repository. This has exactly the same external behavior as what is proposed in the linked blog post, but is far more efficient (except, of course, if you have no visitors and update your website very, very often).
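A minimal sketch of such a hook (the paths, branch name, and public directory are placeholders):

  #!/bin/sh
  # post-update hook in the pushed-to repository
  cd /var/www/public || exit 1
  # with GIT_DIR set and no GIT_WORK_TREE, git treats the current directory as the work tree
  export GIT_DIR=/srv/site/.git
  git checkout main-branch -- .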


Oh I really like that idea!

Over at pico.sh we are trying to make it dead simple to host N static sites using common tools: rsync, scp, and sftp.

https://pgs.sh

It really should be as simple as you are describing!


This approach is fine for static websites, but for web applications you should probably block requests to /.git so that people can’t clone your application off of the web.
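With nginx, for instance, a sketch (assuming the repo root doubles as the document root) could be as simple as:

  # refuse any request whose path contains /.git
  location ~ /\.git {
      return 404;
  }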


The .git directory doesn't necessarily need to be in the public directory at all.


It's always a good reminder anyway to tell people to either make the public dir something other than the root of the repo, or to block .git.


If anyone wants to read more about this, search for `GIT_DIR` and `GIT_WORK_TREE` environment variables.


Or, put your files under html/ in your repo, and push to /var/wwwroot (using the common /var/wwwroot/html setup)


If you're just serving static files, wouldn't an S3 website be easier and more robust?


With unbounded running costs... There is something comforting about knowing it's always going to cost you $20.


True, I didn't think about it from the limited-cost perspective. It would take a lot of traffic to get past $20, but it's a possibility, like you say.


Put CF in front of that VPS and it is better than S3.


One can also have CloudFront before S3, IIRC.


That is my go-to setup for static or semi-static sites. In theory you could still see a runaway bill with CloudFront, although I haven't checked the numbers to see what magnitude of traffic would be needed for it to really hurt.

Edit: did a quick check. By my definition of hurt, about 1,000 requests per second sustained for the month, or more than 12 TB of traffic, would hurt. But then again, I'm sure there is some other weird edge case that AWS bills for that could incur more severe costs with less traffic, and that is what I take 0xFF0123 to imply.


Last time I checked it was a bit clunky to get these working with HTTPS.


If you use CloudFront it's easy now. Some years back you either had to do a complicated setup or pay something like $2k for a certificate!


Hmm, yeah, doesn't look too bad: https://repost.aws/knowledge-center/cloudfront-serve-static-... Docs are confusing and you have to be careful to choose the option that's actually end-to-end HTTPS.


It's very easy these days. I use nginx + NixOS for hosting and it was one line to say use HTTPS.


I was talking about hosting a website in an S3 bucket.


What was your multi hundred million dollar company?


Unfortunately, it wasn't my hundred million dollar company.


I was checking whether this would be possible with GitHub, but querying the dumb protocol at https://github.com/<org>/<repo>.git/info/refs just returns the following:

  Please upgrade your git client.
  GitHub.com no longer supports git over dumb-http: https://github.com/blog/809-git-dumb-http-transport-to-be-turned-off-in-90-days


I just SSH in periodically, run a git pull, and it's done. I have a text file with the various things I need to do. It takes less than 2 minutes of work. I'm not yet convinced that automating this process would save me much time.


The cheapest solution I've found for static sites is an S3 bucket with CloudFront sat in front of it. Costs next to nothing.


Cheapest option I found is a Jekyll site deployed out of a GitHub repo (can be private) to Cloudflare S̶i̶t̶e̶s̶ Pages. No cost at all, even with a custom domain.


Cloudflare Pages, I think. I agree though, it costs nothing. They don't even have my card on file.

And nothing is free forever, but my sense is that this is going to stay free longer than most other free options.


Cloudflare Pages is great for static sites! I use it for a Hugo blog, and it "just works" - plus, deployments are really fast, like a few seconds.

It's all just so simple and fast that it reminds me of the days of SFTP'ing files to prod. Aaahhh.


Yep, CF Pages indeed! I agree with your sentiment. It always surprised me how much CF offers under their free tier and wouldn't shock me if they start pulling profitability levers and charging for many of these offerings.


Why add Cloudflare to the mix?


You're right. GitHub offers all of this at no charge, but your repo has to be public. Cloudflare allows you to use a private repository.


Or use GitLab and you can use a private repo


I see a ton of different paid solutions in these comments for hosting static sites. Why not just use GitHub Pages?

I've had stuff on the front page a handful of times and never had a complaint about slowness or errors; it's just a static site.


I have found Firebase Hosting to be quite solid for my static sites.

I usually have a ./sh directory which is like my "control panel" for the project, then I run ./sh/deploy.sh from the root which builds and does a `firebase deploy`.
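For reference, such a script can be little more than this (a sketch; the build command is a placeholder for whatever generates the static output):

  #!/bin/sh
  # ./sh/deploy.sh - build the site, then push it to Firebase Hosting
  set -e
  npm run build                  # placeholder build step
  firebase deploy --only hosting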

Firebase handles HTTPS and seems to have the lowest latency of the CDNs I have tried, even though they all claim to be the "fastest".

I prefer to have an explicit "button" to deploy rather than working with git commits.



It’s hugged to death now so I can’t re-load the page. But I think the author of this post went groveling into the objects/ directory for their contents. That will work fine until the repo is GC’d, and then those loose objects get moved into a packfile.


Why not just use Partial or Shallow Clones? [0]

[0] https://github.blog/2020-12-21-get-up-to-speed-with-partial-...


git clone --depth=1 is fast because it doesn't grab the full commit history, but OP is fetching a specific single file directly.

It is neat; I'm not aware of a git incantation that fine-grained. I hope it gets built into Git directly.
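The closest approximation I can think of is `git archive --remote`, which can pull a single path without a clone, but only if the server has upload-archive enabled (GitHub, for one, doesn't allow it). A sketch with placeholder host and path:

  # fetch one file from a remote, assuming the server permits git-upload-archive
  git archive --remote=ssh://git@example.com/repo.git HEAD path/to/file.html | tar -xO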


Ah, thank you for the clarification -- as far as I know that is indeed unique.

It's a thimbleful-sized shallow clone, one could say.


You can do a shallow clone and partial checkout.


Cool! Why you gotta tease me like this without busting out a one liner? :)

Gemini disagrees with you btw:

Unfortunately, directly combining git shallow clone and git partial checkout to grab just one specific file isn't straightforward. Here's why:

Shallow clone: This limits the downloaded commit history to a specific depth, but it still fetches all files involved in those commits. While reducing data, it wouldn't restrict files solely based on your needs.

Partial checkout: This lets you specify which files to include in your working directory, but it requires a full clone initially.

However, you have a few alternative approaches to achieve your goal:

1. Shallow clone + Sparse checkout:

Use git clone --depth=<commit_depth> --single-branch=<branch> to shallow clone the specific branch and limit history. Create a .git/info/sparse-checkout file containing only the path to the desired file. Run git read-tree -u to update the index based on the sparse-checkout file. This method downloads a limited history and only keeps the specified file in your working directory.

2. Partial clone with server support (limited availability):

Check if the server supports partial clones (currently implemented on Github and some Gitlab self-hosted instances).

Use git clone --filter=blob:none <url> for a "blobless" clone that only contains file content, no history or directory structure.

Add the desired file path to the .git/info/sparse-checkout file and run git read-tree -u as before.

This approach minimizes downloaded data but requires server support and won't work everywhere.
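FWIW, on a reasonably recent Git (2.25 or later) against a server that allows partial clone, something along these lines seems to combine all three; the URL, path, and branch are placeholders:

  git clone --depth=1 --filter=blob:none --no-checkout https://example.com/repo.git
  cd repo
  git sparse-checkout set path/to/dir
  git checkout main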


Because then some process on the server has to fetch/checkout the clone each time there's a push?

Whereas reading directly from the repo gets around that.


But then some process on the server has to fetch multiple files from the repository each time there's a request.

It seems this solution amounts to avoiding a bit of work when it is actually necessary (on website update) by instead doing a lot of work over and over.


FTA:

> This sounds like a lot of steps to serve a single file, but there's two key optimizations which can be made. The first is to cache the root tree's hash in memory, which skips two lookups right at the beginning. The root tree's hash will only change when the latest commit of the branch changes, so it's enough to cache it in memory and have a separate background process periodically re-check the latest commit.

> The second optimization is to cache tree objects in-memory using their hash as a key. The object identified by a hash never changes, so this cache is easy to manage, and by caching the tree objects in memory (perhaps with an LRU cache if memory usage is a concern) all round-trips to the remote server can be eliminated, save for the final round-trip for the file itself.

Also, the "background process periodically re-check the latest commit" seems like a bit of overkill if the repo is local; just caching and checking the mtime of `refs/heads/main` should be enough to decide whether the root tree needs re-reading.
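(For reference, the hash and mtime in question can be read with plumbing, assuming a local repo whose refs/heads/main is a loose ref rather than packed:)

  # the commit the branch points at, and that commit's root tree
  git rev-parse refs/heads/main
  git rev-parse refs/heads/main^{tree}
  # cheap change detection: mtime of the loose ref file (GNU stat)
  stat -c %Y .git/refs/heads/main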


I serve my cloned repos directly and remove access to . files with

  RewriteCond %{THE_REQUEST} ^.*/\.
  RewriteRule ^(.*)$ - [R=404]
in my .htaccess


I guess it wasn't such a good idea to do, huh?

  This site can't be reached
  mediocregopher.com took too long to respond.
  ERR_TIMED_OUT


Congratulations! That means you basically figured out how the clone procedure works and found a way to do it in a partial (though also unsafe) way. But it is a cool idea, nonetheless.

Also check out the Scalar [1] project and its predecessor, GVFS [2], both from Microsoft to manage their monorepo via a VFS layer.

[1]: https://github.com/microsoft/scalar

[2]: https://github.com/microsoft/VFSForGit


Why are you congratulating the author? There's no reason to be condescending.


What is unsafe about it?


I think it means that if you serve it to the public, a hacker might eventually find a way to enumerate your entire .git repo.


Then don't put sensitive material in a repo that you decide to essentially make public and serve to the whole world.


And then what happens?


Stevefan’s configuration of the website apparently



