

Friends don't let friends use loops - adamnemecek
http://bonesmoses.org/2014/07/25/friends-dont-let-friends-use-loops/

======
meowface
Generalized: Friends don't let friends closely intermix their general purpose
programming language code with a DSL or query language of another system
they're interfacing with. This also closely applies to engines like
Elasticsearch, or passing off some work to another executable.

------
dragonwriter
There are two different issues that should be considered separately (but
aren't in the article):

1. Using client-side loops of single-statement inserts vs. building a single
multi-row insert statement

2. Using (SQL standard) inserts vs. (db-specific) bulk copy commands

This article compares client side loop executing single-statement inserts
against a bulk copy, but that's not always a good comparison. As well as the
standard-vs-db-specific issue, bulk copy commands often have significant
semantic differences from inserts that go beyond single-row per statement vs.
multiple-row per statement (the latter of which INSERT can do.)

There are very good reasons to avoid client-side looping issuing single-row
inserts when you want to do a mass insert. But jumping to bulk copy has other
warts (that, like the facilities themselves, vary from database to database)
-- for instance, it's a general loose-coupling rule (critical to allowing DB
refactoring) that applications should be blind to whether the relations they
use are implemented as base tables or views, but Postgres's COPY only works
on base tables, not updatable views. (It also doesn't interact with
other Postgres-specific features the way that INSERTs do -- e.g., while
_triggers_ and _constraints_ are applied on COPY, _rules_ are not.)

I'd be interested in seeing a looped-single-row-insert vs. single-command-
multi-row insert performance comparison, but INSERT (of any kind) vs. COPY
comes with considerations besides performance that aren't addressed in the
article. I'm not saying that developers (as opposed to DBAs) should never use
bulk copy commands in applications -- but doing so _isn't_ the generally-
appropriate approach to avoiding loops with single-row inserts from the
application side.
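For concreteness, the two INSERT styles being contrasted might look like the
following sketch (using Python's stdlib sqlite3 so it runs anywhere; the SQL
has the same shape in Postgres, and the table name is made up for
illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_log (id INTEGER, reading REAL)")

rows = [(i, i * 0.5) for i in range(300)]

# Style 1: client-side loop, one statement (and round trip) per row.
for row in rows:
    conn.execute("INSERT INTO sensor_log (id, reading) VALUES (?, ?)", row)

# Style 2: one multi-row INSERT statement, a single round trip.
placeholders = ", ".join(["(?, ?)"] * len(rows))
flat = [v for row in rows for v in row]
conn.execute("INSERT INTO sensor_log (id, reading) VALUES " + placeholders, flat)

count = conn.execute("SELECT COUNT(*) FROM sensor_log").fetchone()[0]
```

Both styles are standard INSERTs with identical semantics (triggers,
constraints, views all behave the same), which is exactly what distinguishes
this comparison from INSERT-vs-COPY.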

------
ultimatedelman
Misleading title? Loops can be really efficient and beneficial, just maybe not
in this very specific circumstance. I clicked on the article thinking it was a
new programming trick on how to avoid loops to do the same work loops do.

SPOILER ALERT: it's not.

~~~
notduncansmith
Yeah, that's along the lines of what I was hoping for too. Maybe something I
could take to my boss to convince him to let us use functional languages like
Clojure at work? Nope. Instead, I got hit with an offensive, sweeping
generalization, backed up by nothing. The author didn't demonstrate that
"developers don't understand databases" -- he only demonstrated that he can
write (intentionally) bad code.

~~~
dragonwriter
It's also kind of weird seeing a DBA advise application developers to use bulk
copy functionality as a general approach to avoid single-statement inserts --
generally, most DBAs I've met want application developers to stay as far away
from bulk copy functionality as possible.

------
bmh100
Solid, simple post. A similar lesson applies to performing numerical
operations on a set of numbers, when linear algebra could be used instead. The
performance differences are huge.

~~~
dagurp
I would love to see an example if you have one.

~~~
bmh100
Check out this example with pies: [http://www.mathsisfun.com/algebra/matrix-multiplying.html](http://www.mathsisfun.com/algebra/matrix-multiplying.html)

Now, you might often see this particular problem in terms of database queries,
and that is fine. But the uses I have seen of linear algebra in machine
learning are essential to achieving performance. There, the algorithms are
often highly iterative, using the same exact source data. To perform such
calculations with a loop is extremely inefficient. Even chasing pointers can
be a significant cost. There is a reason why Fortran code is generally faster
than C for linear algebra out of the box.

Also see:
[http://stackoverflow.com/a/146186](http://stackoverflow.com/a/146186)
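A tiny illustration of that principle (assuming NumPy is available): the same
matrix product computed with an explicit Python loop and with the vectorized
operator, which hands the work to optimized BLAS routines.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
a = rng.random((n, n))
b = rng.random((n, n))

def matmul_loop(x, y):
    # Naive triple loop: O(n^3) operations, each paying interpreter overhead.
    out = np.zeros((x.shape[0], y.shape[1]))
    for i in range(x.shape[0]):
        for j in range(y.shape[1]):
            s = 0.0
            for k in range(x.shape[1]):
                s += x[i, k] * y[k, j]
            out[i, j] = s
    return out

c_slow = matmul_loop(a, b)
c_fast = a @ b  # same numbers, delegated to BLAS; orders of magnitude faster
```

The loop version is already painfully slow at n in the hundreds; the
vectorized form is what makes iterative machine-learning algorithms over the
same source data feasible.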

------
meritt
This is a specific PostgreSQL command. While it is indeed faster it's not very
maintainable and will definitely need rewriting for any other database. For
example, with MySQL, you could write to a temp file and execute a "LOAD DATA
INFILE".

Not entirely seeing the real benefit achieved here though.
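The MySQL route might be sketched roughly like this (the `users` table name
and column layout are hypothetical; only the CSV staging actually runs here,
the final statement is what would be handed to MySQL):

```python
import csv
import tempfile

rows = [(1, "alice"), (2, "bob")]

# Stage the rows in a temp file in the shape LOAD DATA expects.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline=""
) as f:
    csv.writer(f).writerows(rows)  # csv.writer terminates lines with \r\n
    path = f.name

# The statement handed off to MySQL:
sql = (
    f"LOAD DATA INFILE '{path}' INTO TABLE users "
    "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\r\\n'"
)
```

Which illustrates the maintainability point: the fast path is a different,
vendor-specific mechanism on each database.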

~~~
dragonwriter
> This is a specific PostgreSQL command.

You could reduce database roundtrips in a somewhat similar manner by doing a
single-statement multi-row insert (I don't know if the particular DB library
here supports multi-row inserts, but Postgres does, and if it's a FOSS library
the lack of support is fixable) without resorting to the vendor-specific bulk
copy command used in the article (which also has semantic differences from
INSERT which might surprise people who are lured into thinking it is a
semantically-equivalent performance optimization.)

------
cratermoon
I wonder how many times that import has to happen? Don't spend an hour
optimizing a second.

------
snarfy
The amount of N+1 issues, loops, database connections, and lazy loaded
properties in the LINQ code I deal with every day boggles the mind. The little
bits I fix are just spit in the ocean.

