
Arrays in Postgres - paratrooper
http://craigkerstiens.com/2012/08/20/arrays-in-postgres/
======
jeffdavis
One of the differences with postgres is that it's OK to use interesting data
types. Other systems treat it as though it were somehow wrong.

See my post here: [http://thoughts.davisjeff.com/2009/09/30/choosing-data-types/](http://thoughts.davisjeff.com/2009/09/30/choosing-data-types/)

And no, using arrays is not an automatic violation of first normal form.

~~~
fnl
> And no, using arrays is not an automatic violation of first normal form.

One of the conditions of 1NF is that each row and column contain one and
only one value. While a NULL might be disputable, using multiple values as
shown in the two examples (tags, items/price/quantity) is a clear violation
of 1NF. Only if the array data were truly "one value" (e.g., a vector) would
it not be a violation. (I.e., I am not trying to contradict your statement,
but I think it helps to add a bit of clarification.)

~~~
ocharles
If you never need to query 'inside' the array, then I don't think that's a
violation of 1NF at all. If you treat the entire array as an atomic value,
then I would think that's still 1NF.
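For reference, "querying inside" an array (assuming a hypothetical tags column of type text[]) looks something like:

    select * from posts where 'postgres' = any (tags);

whereas treating the array as an atomic value means only ever comparing or returning the whole thing, e.g. where tags = array['a', 'b'].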

~~~
guan
Even if you do need to query inside the array, how is this different from
something like WHERE x like 'prefix%'? Is that a violation of 1NF too?

~~~
ocharles
Arguably, yes, because you haven't really broken your data down as much as you
need to; but this is a fairly contentious issue - just see
<http://en.wikipedia.org/wiki/First_normal_form#Atomicity>. If you are
frequently needing to query inside data, you may well gain from moving that
'prefix' into a separate column, or more drastically rethinking your data
model if that's appropriate.

~~~
guan
In this example prefix is something the user typed in. So textcolumn LIKE
'%blahblah%'.

------
EzGraphs
There are probably some useful applications for this, but I am not really a
fan of databases supporting complex data types. It's not because of a
commitment to any particular normal form or theoretical construct. It is
because on a project that involves multiple developers over a period of time,
this sort of special functionality provides "surprises" that are not terribly
pleasant.

\- Nonstandard SQL is required

\- Database-specific functions are used

\- Later developers can be confused by the use of a non-standard data type

\- SQL commenting does not happen much in practice, in my experience

\- Array data can be handled using existing SQL constructs, so it is never an
absolute necessity

\- Other languages are better equipped for handling the data types (in my
fuzzy subjective assessment)

My experience is mostly with Oracle, which has been adding various data types
for years (XML, objects, arrays, etc.). I can't think of a specific case where
their use proved to be a real benefit to a project... though the usual
argument is improved performance.

------
jaggederest
Ah yes, the venerable array type. I remember trying to normalize applications
written in VB6 and Access that used arrays with foreign keys instead of many
to many relationships. Good times.

~~~
joevandyk
While I probably wouldn't store things in an array, it's useful to get data
back out in array format sometimes.

    
    
    select array_agg(email_address), home_state
    from users
    group by home_state;

This will give you each home state along with all the email addresses that
belong to it.

~~~
radiowave
Right. A case I've used them for is a recursive query that returns a set of
rows, where each row is some end-point matching the query criteria. The rows
can include arrays representing the list of nodes traversed to reach that
end-point, or some notion of the path cost by hop.
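A sketch of that pattern (table and column names invented here), using a recursive CTE that accumulates the traversal path in an array:

    with recursive walk as (
        select id, array[id] as path, 0 as cost
        from nodes where id = 1
        union all
        select e.dst, w.path || e.dst, w.cost + e.weight
        from edges e join walk w on e.src = w.id
        where e.dst <> all (w.path)  -- skip nodes already on the path
    )
    select * from walk;

The path array doubles as the cycle check, so no extra visited-set bookkeeping is needed.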

------
nwenzel
The MADlib add-on out of UC Berkeley for Postgres and Greenplum uses arrays
for inputs into its algorithms. MADlib.net is worth checking out for some
in-database analytics and machine learning goodness. It's not going to replace
R or Python, but it has potential as part of your toolbelt.

------
LarryMade
Looks like the path to code/data obscurity to me. Why not have properly
labeled fields for all the elements and save yourself the headache of
remembering the fancy trick you implemented years ago?

------
ksherlock
I used Postgres arrays in a project a few years ago -- The university course
catalog was so awful I had to create my own. There was a 7-element array of
integers (one for each day of the week) where each element was a bitmap
indicating if the class was active in that period.

To the DB driver, it was just a string -- { 0, 0, 0, 0, 0, 0, 0} -- so I split
it into an array manually in client code.
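The manual split is simple enough for a flat integer array like that; a minimal sketch in Python (modern drivers such as psycopg2 do this conversion for you, and this does not handle nested arrays or quoted elements):

```python
def parse_int_array(literal):
    """Parse a flat Postgres integer array literal like '{ 0, 0, 0, 0, 0, 0, 0}'."""
    inner = literal.strip().strip('{}')
    if not inner.strip():
        return []
    return [int(x) for x in inner.split(',')]

print(parse_int_array('{ 0, 0, 0, 0, 0, 0, 0}'))  # -> [0, 0, 0, 0, 0, 0, 0]
```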

------
rdegges
Excellent article on arrays. I wish I'd known about these before; there have
been so many times I _should_ have used this instead of serializing /
deserializing JSON myself via TextFields :(

~~~
joevandyk
PostgreSQL 9.2 supports JSON, btw!

You can store json, or, even better, you can write sql that returns an
arbitrary json structure.

So you can have one SQL query return a nested array of hashes of arrays of
hashes... handy if you need to retrieve a lot of different data at once that
doesn't fit into a neat set of rows.
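For instance, 9.2 ships functions like row_to_json and array_to_json; a sketch (reusing the hypothetical users table from the earlier comment):

    select row_to_json(t)
    from (
        select home_state, array_agg(email_address) as emails
        from users
        group by home_state
    ) t;

Each output row is a JSON object with a home_state key and an emails array.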

~~~
einhverfr
What we need now is a function that will take an arbitrary JSON element and
return a data structure associated with it. This shouldn't be too hard using
plv8js...

This could then be used as an input format for object-relational modelling.
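A rough sketch of the idea (function name and signature invented, untested; it takes the document as text and extracts one member):

    create function json_member(doc text, key text) returns text as $$
        var obj = JSON.parse(doc);
        return JSON.stringify(obj[key]);
    $$ language plv8 immutable;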

------
eckyptang
The more you put in the database black box, the more scalability problems you
will encounter in the future. It's more economical to scale out than to buy
bigger database servers.

Please just don't do it.

~~~
kiwidrew
Happily, Postgres keeps improving performance with each release. According to
benchmarks [1], the upcoming version 9.2 will scale up to 64 cores for
read-heavy workloads. Unless your application is going to grow _really fast_
-- keeping in mind that hardware capabilities will continue to improve each
year -- it's just premature optimization to worry about horizontal scaling.
Just upgrade your database server once a year and reap the benefits of
Moore's law.

[1] [http://rhaas.blogspot.com/2012/04/did-i-say-32-cores-how-about-64.html](http://rhaas.blogspot.com/2012/04/did-i-say-32-cores-how-about-64.html)

~~~
eckyptang
I love your optimism. The real world doesn't work like that. The real world
punishes you for every shitty feature you pick and every bad chunk of code.

Our application is 15 years old, and we're on a 64-core machine with 768 GB of
RAM and 35 TB of disk.

We're running at 80% capacity.

Where do we go from here? Yes, we rewrite and scale out for the measly cost of
£450k. That cost would have been avoided with the appropriate due diligence.
That is not a cost anyone wants to swallow.

Then again, I don't work on disposable CRUD applications...

