

Creating PostgreSQL Arrays Without A Quadratic Blowup - drob
http://blog.heapanalytics.com/creating-postgresql-arrays-without-a-quadratic-blowup/

======
danbruc
_A lot of languages handle this idiom more gracefully._

Actually, they don't. You cannot resize a (static) array. What you really want
is a dynamic array, an array list, or whatever you like to call it. In that
case you can double the size of the underlying array whenever you have to
reallocate it, which brings the amortized runtime down, but at the price of
using up to twice the memory you need.

Maybe array_append is to blame, because its mere existence somewhat implies
that you are working with array lists and not with plain arrays.

~~~
drob
Correct -- they handle the idiom more gracefully in the context of an array-
backed list, not an array. I conflated the two, as Postgres arrays don't quite
fit cleanly into either. (For example, evaluating arr[i] is O(i) for Postgres
arrays of variable-width types.)

------
kyllo
_result := array_append(result, ( 'num=>' || i)::HSTORE);_

Yeah, I've never read a line of PLPGSQL before this, but I can tell by the
assignment operator := that this is obviously creating a copy of the array
every time you add an element to the array. This would also cause a quadratic
blowup and/or a memory blowup in any other language I've heard of, not just
PLPGSQL. Anyone who has taken an introductory algorithms & data structures
class really should know better than to write code like this.
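
The cost of that copy-on-append pattern is easy to reproduce outside
PL/pgSQL; a hedged sketch (the function name and counter are made up for
illustration), using a concatenation that copies the whole array each time,
the way the assignment above does:

```python
# If each "append" copies the entire array -- as the := assignment in the
# snippet above does -- building an n-element array copies O(n^2) elements.

def build_with_copies(n):
    copied = 0
    result = []
    for i in range(n):
        result = result + [i]   # full copy each iteration,
                                # like result := array_append(result, ...)
        copied += len(result)
    return result, copied

result, copied = build_with_copies(1000)
# copied == 1 + 2 + ... + 1000 == 500500: quadratic growth in n.
```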

~~~
ScottBurson
But anyone who has taken a graduate-level data structures course should know
how to implement functional sequences with log-time concatenation or better.
There are several known techniques.

So your inference is, in fact, incorrect. The assignment must make a logical
copy, but this does not have to involve making a physical copy.
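
The distinction between a logical and a physical copy can be sketched with the
simplest persistent structure, a cons list (this toy class is illustrative;
real log-time-concatenation sequences use balanced trees, but the sharing idea
is the same):

```python
# A persistent (functional) list: building a new logical sequence shares the
# old nodes instead of physically copying them.

from typing import Any, NamedTuple, Optional

class Node(NamedTuple):
    head: Any
    tail: Optional["Node"]

def prepend(x, lst):
    return Node(x, lst)        # O(1): shares lst, copies nothing

def to_list(lst):
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out

a = prepend(1, None)
b = prepend(2, a)              # a logical copy: a is untouched and shared
assert b.tail is a             # structural sharing, no physical copy
```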

~~~
kyllo
Yes, but you had better be sure that's what your compiler is going to do,
before writing code like `result := array_append(result, new_element)` inside
a loop.

------
ScottBurson
I think highly of Postgres, but this is disappointing. It's now well known how
to implement functional array-like data structures with log-time
concatenation. For a good summary see [0].

I think these techniques should be standard practice in programming language
implementations.

[0] http://stackoverflow.com/questions/3271256/implement-an-immutable-deque-as-a-balanced-binary-tree/3274355#3274355

~~~
empthought
PostgreSQL has array-like data structures with good performance
characteristics; they're called "tables."

Edit: my mistake, PostgreSQL calls them "relations."

------
klodolph
Seems guilty of trying to do imperative programming in a relational database,
then wondering why the performance is not good. Why use an array and not rows
in a table?

~~~
drob
Everything is denormalized into an array so we can evaluate funnel queries
quickly, with no joins.

~~~
batbomb
Did you try clustering with PG (aka an index-organized table in Oracle)?

~~~
SigmundA
My understanding is that PostgreSQL does not have true index-organized tables
(clustered indexes in MSSQL). You can create the table as clustered on an
index, but it will do it only once; if new data is inserted, it is not
clustered, and you have to manually recluster, which exclusively locks the
table. Also, this does not help performance nearly as much in PG, because the
index used to cluster is still a normal index and is used normally, with
indirection. The only performance benefit comes from the fact that the data
being searched for might be on the same page, saving some disk I/O. With a
true clustered index, the table IS the index, and the table is seeked
directly. This also saves disk space and some I/O, since no separate index is
built that redundantly stores the data.

That being said, an array should be even faster than a lookup in a secondary
table with a real clustered index, since the array data would be in the row
data of the main table, incurring only the offset lookup to retrieve it.
There are definitely cases where hierarchical data structures are superior in
performance to normalized ones. You gotta watch out, though, for hierarchical
structures that store keys inline, as they add their own space and I/O
overhead (see shortening key names in MongoDB). Arrays are a good choice,
though; they allow multi-value data inline without key overhead.

------
dsugarman
this is a classic case of an OO programmer programming SQL

~~~
twic
This algorithm is O(n^2) in classic OO languages too!

I think this is a translation of the classic OO "array.add(x)", which would
usually be O(1), into "array = array + x", because that's the closest you can
get in this language. However, mapping that translation back into a language
you know should raise alarm bells, because there it looks like something that
is O(n) and allocates garbage like mad.

So, rather, I think this is classic insufficient paranoia and self-doubt when
doing an unfamiliar new thing.
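
The two idioms twic contrasts can be sketched side by side (function names
here are made up for illustration): the in-place style is amortized O(1) per
append, while the rebinding style copies the whole array on every iteration
and is O(n^2) overall, even though both produce the same result.

```python
# The "array.add(x)" idiom vs. the "array = array + x" idiom.

def inplace(n):
    xs = []
    for i in range(n):
        xs.append(i)      # amortized O(1) per append
    return xs

def copying(n):
    xs = []
    for i in range(n):
        xs = xs + [i]     # O(len(xs)) copy on every iteration
    return xs

assert inplace(100) == copying(100)   # same result, very different cost
```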

