Show HN: Browse the Steam Christmas sale visually (hydralist.com)
23 points by reitzensteinm on Dec 24, 2014 | 7 comments

I find it pretty neat. I usually get a dozen games on my Steam store page, based on the games in my library (= my supposed taste, as decided by an algorithm). With this I get a whole (and unexpectedly large) picture of the sale. I'm overwhelmed by the amount of data you scrape to build this. 80 GB? Is it because of the videos?

No problem using Opera on W7 here.

It's because it has 80,000 images, multiplied by 14 thumbnail sizes and two file formats. The videos are actually only a small proportion of the total, because I'm currently only doing mp4, in two sizes, split into ten-second parts.
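As a rough sanity check on those numbers, here is a back-of-the-envelope sketch in Python. It assumes "80 GB" means decimal gigabytes and ignores the videos, so the average is only an estimate:

```python
# Back-of-the-envelope check using the figures quoted in this thread.
IMAGES = 80_000             # source images scraped from Steam
SIZES = 14                  # thumbnail sizes per image
FORMATS = 2                 # jpg and webp
TOTAL_BYTES = 80 * 10**9    # "approaching 80 GB", taken as decimal GB (assumption)

file_count = IMAGES * SIZES * FORMATS
avg_kb = TOTAL_BYTES / file_count / 1000  # videos ignored, so this is an upper bound

print(f"{file_count:,} files")
print(f"~{avg_kb:.1f} KB per file on average")
```

That works out to roughly 2.2 million files at a few tens of kilobytes each.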

Serendipity is what I'm going for - I've spent about $60 over ten or more purchases while developing the site. Stuff I just decided I have to have.

Actually, I have a list in my account: Geometry Wars x2, Toy Soldiers Complete, LA Noire, World War 2: Time of Wrath, Rise of Nations, Panzer Corps, Fieldrunners, Rigonauts, Bulletstorm, ShaderTool, Tradewinds x2, Knights & Merchants, Dead Space, Crimsonland, Wings!, Imperium Romanum, Hero Siege, Robocraft. Wowee. Most projects go into the red based on server costs, but not this one :)

I built this over the last couple of months, and recently crunched to get it done in time for the Christmas sale. I made it (mostly).

Technical details are:

* Scrapes all content off of Steam, spits out 14 size variations in both jpg and webp for images. Auto detects webp support (gives a nice ~20% boost, most of which I use for increased quality).

* Size of the static files is approaching 80gb, which is getting tight, because I'm running this on a Hetzner server with a 240gb SSD.

* Written in Clojure + ClojureScript.

* Everything is currently stored in Redis. There's no relational database yet, though I'm planning to use yesql and Postgres when account information etc. is required.

* The entire database is regenerated every ten minutes and then streamed to clients as a collection of static files. There are no server-side queries: it streams the information about every single game to you, though that sometimes takes several minutes. If the dynamic webserver goes offline but nginx stays up during a session, the site will lose no functionality.

* It follows that the filtering is done on the client side. Every time the main database is updated, or the filter is changed, the client applies the current filter to the database to get a collection of games, which are then displayed in random order.

* I started out using Om, but recently switched to Reagent after too many head scratching errors (performance seems better too). I'd expect Om to scale better to a highly complex site, but with something like this with a ton of simple elements, simplicity won out.

* In order to get the most out of the diffing algorithm during render, the main page is stored as a shallow tree of pages of screenshots, rather than as a flat collection of screenshots (which would be more natural). In this way, pages of 100 screenshots that have not been updated can be skipped over in one step, whereas the naive approach would result in each screenshot having its key compared (and skipped).

* ClojureScript has been quite solid (Clojure is a rock), with the exception of a few bizarre errors that have sometimes cropped up, where I have to kill the server cljsbuild auto, do a clean, and then restart. It happens about 1 out of 100 commits.

* While not an issue now since it still fits on one server, scaling out such a large, randomly accessed database of images is going to be an interesting problem. Serving from standard disks, even with large caches (32 GB on even cheap Hetzner servers), will not be sufficient. Caches help a lot with power-law distributions and long tails, but here the accesses will follow a more or less uniform distribution. So the entire collection will have to sit on SSDs fast enough to serve 1 Gbps (probably 100 megabytes per second with overhead) of 10-100 KB files.

* I already hash all image addresses to point to the correct server, and in the future will most likely use a consistent hashing scheme (so that one dead server doesn't completely invalidate the cache on all of them). With further support for video, 4k screenshots etc., I'll be moving the static store onto S3. So a future setup may look like: 4 servers with 1 TB SSDs each and 3 TB in S3, so if one server goes down the 3 TB will remain entirely in cache.
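To illustrate the client-side filtering bullet above, here is a minimal sketch in Python (the site itself is ClojureScript). The record fields, the filter parameters and the function names are all hypothetical; the post doesn't describe the real data layout:

```python
import random

# Hypothetical records; placeholder titles, prices and tags for illustration only.
GAMES = [
    {"title": "Game A", "price": 4.99, "tags": {"strategy"}},
    {"title": "Game B", "price": 19.99, "tags": {"strategy", "rts"}},
    {"title": "Game C", "price": 2.49, "tags": {"shooter"}},
    {"title": "Game D", "price": 7.49, "tags": {"strategy", "tower-defense"}},
]


def apply_filter(games, max_price=None, tag=None):
    """Return the games matching the current filter (None means 'no constraint')."""
    return [
        g for g in games
        if (max_price is None or g["price"] <= max_price)
        and (tag is None or tag in g["tags"])
    ]


def display_order(games, seed=None):
    """Return a shuffled copy, leaving the filtered list itself untouched."""
    shuffled = list(games)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Re-run whenever the database is refreshed or the filter changes.
matching = apply_filter(GAMES, max_price=10, tag="strategy")
visible = display_order(matching, seed=1)
print([g["title"] for g in visible])
```

Since the whole database is already on the client, re-filtering is just a pass over an in-memory list, with no round trip to the server.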
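For the consistent hashing idea in the last bullet, a minimal sketch of one common approach (a hash ring with virtual nodes) might look like this. It's in Python rather than Clojure, the server names and paths are made up, and it isn't the site's actual code:

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is salted per process.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hash ring with virtual nodes; each key goes to the next point clockwise."""

    def __init__(self, servers, vnodes=100):
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def server_for(self, image_path: str) -> str:
        idx = bisect.bisect_right(self._points, _hash(image_path)) % len(self._ring)
        return self._ring[idx][1]


servers = ["img1", "img2", "img3", "img4"]  # hypothetical server names
full = HashRing(servers)
degraded = HashRing([s for s in servers if s != "img3"])  # img3 has died

paths = [f"/thumbs/{n}.webp" for n in range(10_000)]  # hypothetical paths
# Count keys that were NOT on the dead server but got reassigned anyway.
moved = sum(
    1 for p in paths
    if full.server_for(p) != "img3" and full.server_for(p) != degraded.server_for(p)
)
print(moved)
```

Only the keys that lived on the dead server get reassigned, so the caches on the surviving servers stay warm. A plain `hash % number_of_servers` scheme would remap most keys whenever the server count changes.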

If anyone has any games in the current Steam sale, I'm working on a small sponsorship system, and I'm more than happy to swap some promotion space for some Steam keys. I don't yet know if the site will get traction, but if it does, it's going to require YouTube-level bandwidth per user, so I'm looking at ways to make sure it at least breaks even (my day job is still game development).

I'd really love some feedback! I posted this last night, but then deleted it when I realized it really was time for me to sleep.

I'm having trouble figuring out which game each tile is. I'm using IE11 on a Surface Pro 2. When I tap a tile, it slightly expands, but I don't see a title anywhere.

Hmm, sounds like a bug. The game should expand out to an 8x8 square. I'll try to reproduce on a few devices here.

I think it's going to require a mobile version, since it's quite slow on lower-powered platforms.

Getting blank black page on iPhone. Just FYI.

I definitely need a mobile version of the site. It strains a PC, and it is hit and miss whether anything displays on mobile devices.

Probably something based on swipeable pages, rather than infinite scroll.
