
Mixing DOM and PHP objects for fun & profit - kouiskas
http://dt.deviantart.com/blog/38193586/
======
hackernewz
Sounds like really bad coding taken to another level. At some point in your
program you will know which thumbnails are needed for the page. Instead of
structuring that logic and commenting it well, perhaps even turning it into a
sub-system or library, you pepper the templates with database calls. And then,
you waste more CPU time by scanning the full HTML output of a page multiple
times.

This is like AJAX all on the server side. All the hassle of async processing
with none of the benefits.

~~~
kouiskas
I think you misunderstood the article. The point of that technique is to
greatly reduce the number of database calls. We don't "pepper the templates
with database calls"; on the contrary, we avoid doing them on the spot like a
bad implementation would. We handle the data needs of these objects all at
once, as late as possible (so we can group the data-fetching needs of as many
objects as possible), greatly reducing the number of DB calls needed.

We don't scan the HTML either; this system doesn't have templates, which is
another point of the technique. The data structure holding the output contains
both DOM and objects precisely to avoid a templating syntax, which would be
costly to scan.

The tree that contains intertwined objects and DOM is only traversed once,
when it's echoed. The data resolution is done in passes, but doesn't require
traversing the "output" tree.

~~~
adnam
Ok, so you're not writing "mysql_query()" in the middle of your templates, but
you are letting them decide what data they need. Your view layer is pulling
data down from the model layer. And the whole DOM-Object "intertwining" thing
would worry me: how do you test it, for a start?

~~~
kouiskas
Only some of the data, when it makes sense to use that technique. Most of the
data fetching on our pages is still done the old-fashioned MVC way. You can
keep the best of both worlds by carefully selecting which parts you decide to
defer. The technique is most interesting in situations where the relationship
between controllers and views is complex and "bubbling" a new need for data
inside a view's logic back to its controller is difficult. It makes even more
sense to use when you identify similar needs in view/controller pairs that
are far apart in scope. Like the AJAX example I gave: you can group DB
queries across completely separate AJAX requests that have nothing to do
with each other.

The intertwining part is quite straightforward to sanity check and unit test.
For its output buffering aspect, PHP provides a way to check that you closed
as many levels of buffering as you opened. As for object resolution, you can
easily detect when things go wrong, like attempting to render objects whose
data hasn't been fetched by the resolution passes. It's difficult for me to
prove "how safe" that part of the technique is without going into a lot of
details about our implementation. That tool is so low-level for us that it's
one of the most tested, error-checking areas of our code. When debugging you
can also visualize the tree of intertwined DOM and objects at any stage of
construction or resolution.
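The buffer-balance check mentioned above can be sketched in Python. In PHP you would compare ob_get_level() before and after; BufferStack and its method names here are invented stand-ins that just count nesting.

```python
class BufferStack:
    """Simulates nested output buffers (PHP's ob_start/ob_get_clean)."""
    def __init__(self):
        self.level = 0

    def start(self):
        # analogue of ob_start(): open one more level of buffering
        self.level += 1

    def end(self):
        # analogue of ob_get_clean(): close the innermost buffer
        if self.level == 0:
            raise RuntimeError("closed more buffers than were opened")
        self.level -= 1

    def assert_balanced(self):
        # the sanity check the comment mentions: as many closes as opens
        if self.level != 0:
            raise RuntimeError(f"{self.level} buffer level(s) left open")

buf = BufferStack()
buf.start()            # a component begins capturing its output
buf.start()            # a nested component does the same
buf.end()
buf.end()
buf.assert_balanced()  # passes only if every start() was matched
```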

------
yinrunning
Wow. This is a really ugly workaround for a problem that began when you
started using MVC. I would assume that any savings in db calls is completely
wasted in processing overhead. Even more was wasted just coming up with this
rigmarole. More reasons why MVC is bad juju.

~~~
banksyw
With respect, I think you have missed the point a little. The thing this idea
solves is separating data-fetching optimisation from code layout. Regardless
of whether you use MVC or entirely procedural code, it gets very hard to keep
data access optimal on very complex and highly modular pages without
introducing hacky globals or coupling unrelated parts of the code together
(it should be obvious why that's bad).

For any one page you may be able to change the structure so that it is more
efficient without compromising the code too much but some of those modules
will need to be reused in a completely different context on a different page.

I'm perhaps not explaining this very well, but this idea isn't solving
deficiencies with MVC; it is solving a problem inherent in ANY attempt to
modularise/re-use code. To gain the same data-access efficiency without MVC
would leave your code unreadable, not to mention un-maintainable, with a
complex site and high-frequency changes.

Many sites simply don't need this so yes it would be overkill and of little
benefit for them. I don't think the article was suggesting such a scheme would
work well for everyone. If you do have pages which legitimately need to
make hundreds of queries/cache gets to resolve the data requirements though,
techniques like this can help a lot.

------
tlack
I wish the examples here were more concrete. This could be a powerful
mechanism, beyond the usual tiresome MVC.

~~~
Nycto
I had the same thought, so I whipped up how I would have implemented it with
PHP:

<http://pastebin.com/YD79HaxU>

Caveat: That code is completely untested

~~~
toolate
That looks like a good start, but is missing the multi-pass approach from the
original article.
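A rough Python sketch of what that multi-pass resolution might look like: resolving one kind of placeholder can itself introduce new placeholders, so resolution loops until the tree is quiescent, still batching each kind per pass. The names and the user/avatar dependency are invented for illustration.

```python
class Pending:
    """A node whose data hasn't been fetched yet."""
    def __init__(self, kind, key):
        self.kind, self.key = kind, key

def fetch(kind, keys):
    # Stand-in for one batched query per kind. Note that resolving a
    # "user" emits a new Pending("avatar", ...) node, which is what
    # forces a second pass.
    if kind == "user":
        return {k: ["user:", str(k), Pending("avatar", k)] for k in keys}
    return {k: [f"avatar:{k}"] for k in keys}

def resolve(tree):
    """Run resolution passes until no Pending nodes remain.
    Returns the number of passes taken."""
    passes = 0
    while True:
        pending = [(i, n) for i, n in enumerate(tree) if isinstance(n, Pending)]
        if not pending:
            return passes
        passes += 1
        by_kind = {}
        for i, n in pending:
            by_kind.setdefault(n.kind, []).append((i, n))
        replacements = {}
        for kind, group in by_kind.items():
            rows = fetch(kind, [n.key for _, n in group])
            for i, n in group:
                replacements[i] = rows[n.key]
        new_tree = []
        for i, n in enumerate(tree):
            if i in replacements:
                new_tree.extend(replacements[i])  # splice resolved content in
            else:
                new_tree.append(n)
        tree[:] = new_tree

tree = ["<li>", Pending("user", 7), "</li>"]
print(resolve(tree))   # two passes: users first, then the avatars they introduced
print("".join(tree))
```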

------
adnam
"Datex"? Scary. Not sure I agree that the example given at the beginning of
the article is a big problem (why not just make the model apis more fine-
grained?), and also not sure what this has to do with MVC specifically.

------
mildweed
Do you have any stats on the CPU/memory usage difference using this approach?
Obviously your database/memcache calls are optimized, but I'm curious what the
overall effect is.

~~~
kouiskas
When we deployed it, I started by adding only the overhead (output buffering
and storing the output in a tree to render it later) throughout the entire
codebase. When I benchmarked the before/after of this step, the performance
difference on our pages wasn't measurable; it was within the margin of error.
That means it was all gain from then on, once we started using it to really
save time on the DB and cache.

It's impossible for me to compare right now between using it and not using it
on a given page, because it requires so much rewriting and change in the way
you structure parts of the code that you can't just turn it on or off.

I think the best way to compare would be to fork an open-source framework to
use that technique and then look at the difference in CPU and memory usage to
run the same site. I wish I had time to do that kind of tedious research for
an article, but I don't... My goal was merely to share the idea, I'd be
thrilled if someone picks it up and does an implementation that everyone can
measure.

I developed that technology over months on a constantly shifting closed-source
codebase with 20+ developers committing code daily; that's another reason why
comparing before/after is sometimes difficult to achieve. Deploying that tech
was a massive task in itself, very far from a simple source branch.

If I had to guess, I'd say that PHP memory usage would be increased, but not
by much. After all, you're only storing as much data as your final page HTML
output (generally you want to keep that small) plus some very small objects.
However, if your traditional MVC framework was already doing a lot of output
buffering, there might not be a difference; the buffering is just moved to
this new technique. As for CPU, I don't think it would be noticeable; we're
just adding things to a small tree and then traversing it once.

I think what is most wasteful about our implementation is the little extra
code it creates to deal with datex instead of just echoing content. That's why
I mention in the article that this would be better handled at the language
level, where all the concerns about memory and CPU could be highly optimized,
in addition to benefiting from a lighter syntax.

------
amccloud
So... lazy loading database queries and including partial templates?

