
Open sourcing Grid, the Guardian’s new image management service - room271
https://www.theguardian.com/info/developer-blog/2015/aug/12/open-sourcing-grid-image-service
======
jackgavigan
This is a bit of a blast from the past for me.

 _> Unfortunately, the incumbent system was nearing end-of-life, having been
around for over 15 years._

Back in 1999, I was a systems/support engineer at the vendor (Picdar) who
supplied that incumbent system. I spent an evening onsite in the Guardian
newsroom when the system went live. I recall there was an England
(international soccer) game on that night, and we timed how long it took a
picture of the first goal to land in the system (IIRC, Reuters won, with a
picture that arrived something like 10-15 minutes after the goal was scored).

~~~
sdoering
Anecdotes like this are the reason I keep returning to HN. They always shine a
smile over my face.

Greetings from Germany.

~~~
jackgavigan
Well, there's more where that came from...

I really enjoyed my time at Picdar. I joined the company in 1998 and it was my
first exposure to proper enterprise systems. I got to play with Sun
Microsystems servers, fibre-channel storage arrays and HP magneto-optical
jukeboxes that could store >1TB (that was a _lot_ in the late '90s).

I once went to a customer site to check a server that kept having problems and
discovered that it was located at the downstream end of a row of big IBM
boxes. The first IBM box sucked in air, spat it out the other side slightly
warmer, where it got sucked in by the second IBM box, which warmed it up a
little more, and so on until it eventually got blown out onto our poor little
Sun server, like it had a fan heater pointed at it. We got them to move our
server and the problems abated.

My boss (technical co-founder Andy Heather) was a proper hacker. Picdar had
its own SCSI drivers for HP's jukeboxes, its own image format (based on
lossless JPEG, if I recall correctly, but wrapped in a file format that
included the photo's metadata), and they eschewed SQL in favour of their own
in-house database and query language called NACAS (for Named Access Control
And Search). I learnt a huge amount from him, largely because he had infinite
patience.

For a long time, my answer to the question "What's the biggest mistake you've
ever made?" was "I ftp'd a bunch of large files from a production system on
top of themselves, resulting in each file getting truncated to 1024 bytes. The
client hadn't kept backups." The files in question were tar-style media
archives containing preview images for a major newspaper's photo archive. Andy
and another developer had to stay in the office until about midnight to fix my
fuck-up by re-generating the preview images. I went home at about 7pm because
there was nothing I could do. In hindsight, I should have stayed. :-/

Picdar was the first opportunity I had to work with serious software
engineers, one of whom introduced me to the Apple Human Interface Guidelines,
which have shaped my approach to UI/UX ever since.

There was a framed cheque on the wall for >$1m from a large database company.
Said database company had won a contract with one of the big UK newspaper
groups to provide a system that indexed their article archive and allowed them
to do free text searches. Unfortunately, it didn't work too well (they loaded
the article archive in, and ran a test query; after half an hour with no
response, they decided to switch the system off), so the newspaper group asked
Picdar if they could provide an interim solution. They did, it worked (the
same query took just 6 seconds), the "interim" solution became a permanent
solution, and the newspaper group forced the large database company to pay
Picdar for the solution.

One of the reasons they hired me was because I came from an Internet
background. Part of my role early on was to write a strategy white paper
outlining the opportunities the Internet offered for the company. I remember
doing a lot of research into e-commerce and online payments (in the context of
allowing our clients to licence their image archives on the web). I
_completely_ failed to recognise the opportunity to offer image hosting to the
masses or the potential of online advertising. Picdar could have been
Flickr/Picasa/ImageShack/500px. Deeply embarrassing for me, in hindsight.

I once deployed a demo system (running on Linux, with a RAID setup that was a
complete bastard to set up because nobody had documented how the Linux RAID
tools worked) at a big magazine company. For some reason, the technology guy's
office was on the same floor as a bunch of fashion magazines. He was one of
two men on the entire floor (there were about a hundred girls). His office was
glass-walled and there was a constant stream of models walking past, on their
way to castings and photoshoots. That was a memorable couple of days.

In late 1999, I was approached out of the blue and offered the CTO role at a
funded startup. I agonised over leaving but it was an offer I couldn't refuse.
The right decision in hindsight but it was a real wrench at the time.

~~~
sdoering
Wow... ... lost for words. Thanks a lot! Not working that long. I started in
publishing and self-trained to pivot to data/web analytics. Nowadays I work at
a digital agency crunching numbers, advising customers, writing white papers
or trying to (objectively) benchmark different sites using the tools at hand.

But these "war" stories are reminding me of some people that showed me the
first html, gave me the first *nix OS and made me enter the internet at
14.4...

These were the first proper hackers I encountered. I remember one of them,
being 15 at the time, working as the only server admin at a local ISP while
still going to school. One night all our IRC bots just disappeared, showing
him that something was really wrong at his workplace.

We were the only ones left awake, so I offered to fetch him (me being 19
and able to drive) and drive him to work. There I saw for the first time
someone working in parallel on a routing server, compiling a kernel on a local
machine and, just for the sake of it, bringing a local workstation up to date.
I had never seen fingers flying that fast over three different keyboards, while
I had my first experience of a 2 Mbit connection (the total connection that
this provider had).

Well - that made me learn some more things in regards to computers and the net
- but in hindsight, I could have made my way faster.

------
edpichler
Same idea as Thumbor, from Globo (Brazilian TV conglomerate):
[https://github.com/thumbor/thumbor](https://github.com/thumbor/thumbor)

About Grid, it seems the images are stored on FTP. Am I right?

I'm looking for a solution like these, but more flexible, where I could host my
images on AWS, Azure, etc.

~~~
theefer
It has some similarities with Thumbor (incl. cropping and resizing assets,
etc), but the two are quite different. Thumbor goes a bit further than Grid
with features like face recognition, etc, which we are looking at for the
future. Thumbor also supports dynamic resizing, whereas we prefer to generate
static assets and use external services (ImgIx in our case) to handle dynamic
resizing and optimisations.

Grid is more of a media management service, allowing quick search of indexed
metadata, organisation and collaboration using labels, quick upload into the
system, rights management, etc. It also has extensive APIs to allow various
degrees of integration.

Grid stores images in S3 (could be abstracted to any storage system with a
small amount of effort). FTP is only used as a source for ingesting images
into the system, and we're looking to scrap that and replace it with S3-based
ingestion in the near future.

\- Séb, Lead developer on Grid
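
A minimal sketch of what "abstracted to any storage system" could look like. This is not Grid's actual code: the `ImageStore` interface, the two backends and the `ingest` helper are hypothetical names chosen for illustration, and the `originals/` key prefix is an assumption.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict
import tempfile


class ImageStore(ABC):
    """Hypothetical storage interface (illustrative, not Grid's API)."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        """Store the image bytes under the given key."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the image bytes stored under the given key."""


class InMemoryStore(ImageStore):
    """Keeps blobs in a dict; handy for tests."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class LocalDiskStore(ImageStore):
    """Writes each blob to a file under a root directory."""

    def __init__(self, root: Path) -> None:
        self._root = Path(root)
        self._root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Flatten the key so it maps to a single file name.
        return self._root / key.replace("/", "_")

    def put(self, key: str, data: bytes) -> None:
        self._path(key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return self._path(key).read_bytes()


def ingest(store: ImageStore, image_id: str, data: bytes) -> str:
    """Store an uploaded image and return its key (assumed key layout)."""
    key = f"originals/{image_id}"
    store.put(key, data)
    return key


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for store in (InMemoryStore(), LocalDiskStore(Path(tmp))):
            key = ingest(store, "example-id", b"fake image bytes")
            print(type(store).__name__, key, len(store.get(key)))
```

Under this assumption, the ingestion code only depends on `put` and `get`, so an S3-backed class (or one for another cloud provider) could be swapped in without touching the callers.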

~~~
foolinaround
What are Grid's similarities and differences over a Digital Asset Management
tool ( DAM )?

~~~
theefer
As other replies have said, this is functionally a DAM.

It is particularly adapted to the requirements of publishers, in that it
supports a large number of images (we have over 3M currently), can scale to
ingest many new images continuously and quickly (publishers often receive lots of
images from agencies and wires), indexes all the metadata to power a very fast
search, allows collaboration of various roles involved in the use of assets
and production of content, etc.

Unlike most commercial DAMs, which can be quite costly to run and acquire,
Grid is also Open Source. We didn't find any existing DAM (incl Open Source)
that fit our requirements, in particular in terms of ease and speed of use,
powerful Web-based interface, rich APIs, etc.

You will have to review alternatives to know which one is the best fit for
your use case.

Hope this helps clarify what Grid is wrt other DAMs.

Best,

\- Séb, lead developer on Grid

~~~
jackgavigan
To what extent did the old Picdar solution influence the design of Grid?

~~~
theefer
As you can probably see yourself from the screenshot and video, not much.

There may be some subconscious influence from the old system and other image
systems we have used (Lightroom, Picasa, Google Photos, Flickr, etc), but I
can't think of particular features inspired from Picdar.

------
lefthandme
I've recently started using gridFS for a proof-of-concept project (mainly
because I wanted to learn more about it), and was wondering about your choice
of S3 over something like this (or an equivalent).

In my case, were this project to go into production, I'd probably recommend
using S3. Given you have 8TB of assets, I was wondering if, at that scale, S3
still represents a lower TCO, or if you had other reasons for using it?
Obviously you are using Dynamo for your metadata and SNS/SQS, and hence have a
pretty AWS-oriented stack to begin with.

Anyway, just asking out of interest.

------
telekid
Here's a related and similarly interesting look into the New York Times' CMS,
which includes some clever image management tools:

[http://open.blogs.nytimes.com/2014/06/17/scoop-a-glimpse-int...](http://open.blogs.nytimes.com/2014/06/17/scoop-a-glimpse-into-the-nytimes-cms/?_r=0)

------
pjmlp
For a split second I thought about CERN's Grid:

[https://en.wikipedia.org/wiki/Worldwide_LHC_Computing_Grid](https://en.wikipedia.org/wiki/Worldwide_LHC_Computing_Grid)

Don't you love everyone using the same names?

~~~
jamesgorrie
Strangely ours had more to do with Tron & the aesthetic than it did any sort of
computing configuration.

One of the picture editors on it claimed, "it looks a bit like the Grid from
Tron" \- and it stuck.

------
batoure
[fastly]

oh that box, that's the global cdn technology that makes this entire stack
possible. Let's not mention it or discuss how these types of technologies have
made it possible for us to fundamentally change how we do business.

~~~
junto
To be fair they are talking about the technology they have developed
themselves and are now open sourcing.

They are using Fastly as a paid service, so why should they talk about it?
It's just a building block of their delivery solution.

It isn't like they discuss the web server software they use either.

------
h1fra
This kind of article is very interesting. Having a glimpse of what big players
do for their backend can inspire many of us. I'm really enjoying reading these
blog posts.

Also, this product looks very cool and well thought out.

