
Automatic Flushing: The Rails 3.1 Plan - ivey
http://yehudakatz.com/2010/09/07/automatic-flushing-the-rails-3-1-plan/
======
judofyr
I did some research on this earlier, and my conclusion was: this is
_very hard_, if not impossible, to implement automatically. The main problem
is that it’s impossible to handle exceptions correctly without making the
whole stack aware of it.

Currently, when an exception occurs, the system can simply change the response
(since the response hasn’t been sent to the client yet, but is only buffered
inside the system). With this approach, a response can be in x different
states: before flushing, after the 1st flushing, … and after the xth flushing.
And after the 1st flushing, the status, headers and some content have been sent
to the client.

Imagine that something raises an exception after the 1st flushing. Then a 200
status has already been sent, together with some headers and some content.
First of all, the system has to make sure the HTML is valid and at least give
the user some feedback. It’s not impossible, but it is still quite a hard
problem (because ERB doesn’t give us any hint of where tags are opened or
closed). The system also needs to take care of all x different states and
return correct HTML in each of them.

Another issue is that we’re actually sending an error page with a 200 status.
This means that the response is cacheable with whatever caching rules you
decided on earlier in the controller (before you knew that an error would
occur). Suddenly you have your 500.html cached all over the place: on the
client side, in your reverse proxy and everywhere else.
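
To make the failure mode above concrete, here is a minimal sketch in plain Ruby. It is not Rails or Rack code: `FakeServer` and `FailingBody` are hypothetical names invented for this illustration, and the "wire" is just an array of strings. It only assumes the behaviour described above, namely that the status and headers are committed before the body has finished rendering.

```ruby
# Hypothetical stand-in for a streaming server (not a real Rack handler):
# it commits the status and headers first, then writes chunks as #each yields.
class FakeServer
  def serve(app)
    wire = []
    status, headers, body = app.call({})
    wire << "HTTP/1.1 #{status}" # the status is committed here
    headers.each { |name, value| wire << "#{name}: #{value}" }
    begin
      body.each { |chunk| wire << chunk }
    rescue StandardError => e
      # Too late for a 500: the only option left is to append to the stream.
      wire << "<p>Something went wrong: #{e.message}</p>"
    end
    wire
  end
end

# A body that fails after its first chunk has already been flushed.
class FailingBody
  def each
    yield "<html><body><h1>Welcome</h1>"
    raise "database went away"
  end
end

app = lambda do |_env|
  headers = { "Content-Type" => "text/html", "Cache-Control" => "public, max-age=3600" }
  [200, headers, FailingBody.new]
end

puts FakeServer.new.serve(app)
```

In this sketch the error text ends up inside a response that already carries a 200 status and the cache header chosen before the failure, which is the situation the comment warns about.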

Let’s not forget that exceptions don’t always render the error page, but do
other things as well. For instance, sometimes an exception is raised to tell
the system that the user needs to be authenticated or doesn’t have permission
to do something. These are often implemented as Rack middlewares, but with
automatic flushing they _also_ need to take care of each of the x states. If
one of them needs to redirect the user, for instance, it can’t change the
status/headers to a 302/Location once the response is in the 1st state, and
therefore needs to inject a <script>window.location='foo'</script> into a
cacheable 200 response.

Of course, the views _shouldn’t_ really raise any exceptions because they
should be dumb. However, in Rails it’s very common to defer the expensive
method calls to the view. The controller sets everything up, but it’s not
until it needs to be rendered that it’s actually called. This increases the
possibility that an exception is raised in the rendering phase.

Maybe I’m just not smart enough, but I just can’t come up with a way to tackle
all of these problems (completely automated) without requiring any changes in
the app.

~~~
raggi
It does require changes in the app, but app authors who need this kind of
performance benefit will be willing to accept that hit. The solution is far
better than the alternatives:

- Allowing users to flush manually (people screw this up real bad)
- Changing the rack spec (allowing for #each on the body to be lazily
  yielded, and terminating on nil or the like)
- Moving to an always async stack (totally kills most users)

Yes, there are plenty of issues with this, and I agree with your concern, but
it is also something which can have a marked effect on performance for users.
It's also worth noting that a well componentised partial can render an error
in place of the partial itself, for example, rendering a page that contains
the whole layout and a single red box of errors (say, a render of the _new
partial can be added to the buffer after a _create fails, instead of rendering
the success box). Yes, that requires some refactoring of the application
(rather than using, for example, the standard 302 approach).

It's also worth noting that a larger class of applications that would find
this actually useful should generally have reasonable test coverage and code
maturity. Whilst this isn't always the case, we also don't protect users from
eval, and other evil tools, in ruby or rails.

~~~
judofyr
From the post:

 _For Rails 3.1, we wanted a mostly-compatible solution with the same
programmer benefits as the existing model, but with all the benefits of
automatic flushing_

And from there he goes on with very specific implementation details and the
only caveat is some API change. This gives the impression that this is
something you can easily enable for any app.

I just want to point out that 100% automatic flushing is pretty much
impossible with the current state of Rack/Rails, and there's still plenty of
work before there's anything _near_ flushing support in Rails.

In addition, everyone should be aware of the trade-off you're making with
flushing (potentially sending 500 responses as 200 OK etc.).

~~~
raggi
Yeah, I had already told Yehuda about the potential concurrency issues for
apps switching over to this, as fibers are stacks and stack switching can lead
to concurrency issues, especially dropping in a re-ordering of the render
chain like this. He said this is another of the reasons it should be optional.

I'm surprised he didn't raise this in the article, but I guess the article was
more about how it could work, than how it will in the final build, or how it
should in all cases.

I've got to disagree partially, or at least present a different point of view
about the 200/500 argument. It could be considered acceptable for the response
to be a 200, as the server has not completely errored out, it is only a
portion of the response that has errored, and at the application level. It
seems that some apps would return a 500 in this case, and then render a page,
suggesting that the server and app are broken. This is really something that
could very quickly turn into a bikeshed discussion, but you can probably see
my point even if you don't agree (I'm not sure I agree in all cases, but it is
food for thought).

As you probably well know, I've been doing the async on rails and other
frameworks game in ruby for about as long as anyone else that's publicly
producing code in this arena (in ruby), and I have to say that this solution
that they came up with is actually far better than most of the other hacks. It
would be really nice if ruby had performant generators for yielding, but
without them, we're left with fewer options. This fiber approach is getting us
pretty close, and I think it's worth exploring, even if it turns out to be a
bad idea for most people.
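
As a rough illustration of the fiber idea discussed here, the sketch below shows how a Fiber can let a template pause at a flush point and hand the text rendered so far to whoever is iterating the body. This is not the implementation from the blog post or from Rails: `FiberBody`, the `out` buffer, and the `flush` lambda are hypothetical names chosen for this example, and the "template" is just a block.

```ruby
# Hypothetical body object: the template runs inside a Fiber and calls
# flush.call whenever the text rendered so far may be sent to the client.
class FiberBody
  def initialize(&template)
    @template = template
  end

  def each
    fiber = Fiber.new do
      buffer = String.new
      flush = lambda do
        # Pause the template and hand the pending text to the caller.
        Fiber.yield(buffer.dup) unless buffer.empty?
        buffer.clear
      end
      @template.call(buffer, flush)
      flush.call # send whatever is left after the template finishes
      nil        # a nil result tells the loop below that rendering is done
    end

    while (chunk = fiber.resume)
      yield chunk
    end
  end
end

events = []
body = FiberBody.new do |out, flush|
  out << "<head><title>Things</title></head>"
  flush.call
  events << "slow query runs"
  out << "<body>rows</body>"
end

body.each { |chunk| events << "sent #{chunk}" }
puts events
```

In this sketch the head is handed over before the slow work starts, while the template itself is still written as straight-line code. Any exception raised after the first `flush.call` surfaces in the middle of `#each`, which is where the status-code concerns raised earlier in the thread come in.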

The Rack API won't really ever change significantly for the better in this
arena. People don't seem to want to lose the simple protocol of #call
returning a tuple, and without either a) making that asynchronous (in as much
as not relying on a return value, but on a call to a response object), or b)
making some significant changes to the contract for the body, we cannot really
provide both simplicity and reasonable levels of granular control of
IO. ryah of node.js fame was writing an IO driven server years ago when I was
first working on the async rack hack that is in thin, zbatery, rainbows and
flow. We had a lot of discussions back then about how best to fit this stuff
into the Rack API. We both desired to use something similar to
Enumerator#next, but this would never fit much better into the existing
setups, and as already noted, is not very performant by default.

As you say, this is not trivial, but I would argue that this means we need to
experiment with more approaches, as neither of the currently presented
solutions (my async api, and this fiber api) is ideal, nor is buffering large
volumes of response data in memory, or taking a bigpipe / highly ajax style
approach.

~~~
judofyr
I see your point and I agree that it's possible to make your app gracefully
handle exceptions. However, there's no way Rails can automate it for you: you
still have to carefully design your app to handle it.

If Rails can help you make it _easier_, that's great; I just don't see how
it's a _mostly-compatible_ solution as Yehuda wrote in the post.

I fully agree that we should explore this option, but at the same time we
should make people aware of the trade-offs and not present it as some setting
you can simply enable by _config.automatic_flushing = true_ (which makes Rails
do all the hard work).

Yehuda didn't even mention the word _exception_ in the blog post, so I wasn't
sure if he was aware of the issues or not.

------
Twisol
So how does this work with Rack? Unless I'm mistaken, you have to return the
body all at once, which entirely negates the benefits here. I don't see Rails
mandating that an asynchronous server be used (i.e. Thin, Mongrel2, etc.), so
I'm rather confused.

~~~
judofyr
In Rack you need to return a body which responds to #each (which yields
strings); it doesn't need to return the body all at once:

    
    
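        # Rack only requires that the body respond to #each and yield strings,
        # so the rendering can happen lazily while the server iterates.
        class Dummy
          def initialize(controller)
            @controller = controller
          end

          # Called by the server; each rendered part is passed on as it is
          # produced instead of being collected first.
          def each
            @controller.render.each { |part| yield part }
          end
        end

        # Inside the controller:
        @body = Dummy.new(self)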

~~~
Twisol
Aaaah, and the work is done within #each and not within #call. I see. The only
issue is if you have a middleware that modifies the output, because unless
you're careful and/or you're doing something extremely minor near the start of
the page, it'll all be processed in the middleware rather than the server. So
the server still gets it all in one piece, and so does the client.
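
The difference between a middleware that drains the body inside #call and one that wraps it can be sketched in plain Ruby. The class names below (`SlowBody`, `BufferingUpcase`, `StreamingUpcase`) are hypothetical and exist only for this illustration; they follow the Rack convention of returning a status, headers, and a body that responds to #each, but do not depend on the rack gem.

```ruby
# A body that logs when each chunk is actually rendered.
class SlowBody
  def initialize(log)
    @log = log
  end

  def each
    %w[head middle tail].each do |part|
      @log << "render #{part}"
      yield part
    end
  end
end

# Buffering middleware: iterates the whole body inside #call, so all of the
# rendering is finished before the server sees anything.
class BufferingUpcase
  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    chunks = []
    body.each { |chunk| chunks << chunk.upcase }
    [status, headers, [chunks.join]]
  end
end

# Streaming middleware: returns a wrapper and transforms chunk by chunk,
# only when the server iterates the body.
class StreamingUpcase
  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    [status, headers, LazyUpcase.new(body)]
  end

  class LazyUpcase
    def initialize(body)
      @body = body
    end

    def each
      @body.each { |chunk| yield chunk.upcase }
    end
  end
end

log = []
app = lambda { |_env| [200, {}, SlowBody.new(log)] }
_status, _headers, body = StreamingUpcase.new(app).call({})
log << "call returned"
body.each { |chunk| log << "write #{chunk}" }
puts log
```

With the streaming wrapper, "call returned" is logged before any rendering happens and the render and write entries alternate. Swapping in `BufferingUpcase` moves all three render entries ahead of "call returned", which is the one-piece behaviour described in the comment above.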

------
briandoll
From Yehuda on twitter: "BTW: Those who have brought up issues with
exceptions/status codes re: flushing, you're right, but it's not specific to
the fiber solution"

------
zbanks
Cool idea. It's one of those "cheap" speed boosts which are always nice to
find/have.

It'd be nice to see this implemented in Django as well...

------
aaronblohowiak
This encourages having SQL queries initiated by the view, after the header has
rendered. This seems antithetical to MVC to me.

~~~
collint
Not really. In Rails, you might have this controller code:

    @things = Thing.where(:it => "good")

And this view code:

    <% for thing in @things %>
      <%= thing.name %>
    <% end %>

But the SQL query doesn't fire in the controller. It gets kicked off in the
view when you "for x in y".

Concerns still wonderfully separated.
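
The deferred behaviour described here can be imitated without a database. The sketch below is not ActiveRecord: `LazyRelation` is a hypothetical stand-in that, under the assumption that a relation only loads its records on first iteration, shows when the "query" actually runs relative to the controller and the view.

```ruby
# Hypothetical stand-in for a lazily loaded relation: building it is cheap,
# and the loader only runs the first time something iterates over it.
class LazyRelation
  include Enumerable

  def initialize(log, &loader)
    @log = log
    @loader = loader
  end

  def each(&block)
    @records ||= begin
      @log << "query executed"
      @loader.call
    end
    @records.each(&block)
  end
end

log = []

# "Controller": declares which data is needed, but runs nothing yet.
things = LazyRelation.new(log) { ["good thing", "better thing"] }
log << "controller finished"

# "View": iterating is what finally triggers the load.
log << "view started"
html = things.map { |name| "<li>#{name}</li>" }.join

puts log.inspect
puts html
```

The log shows the load happening only after the view has started, even though the controller is the place where the data was declared.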

~~~
aaronblohowiak
Not really, in Rails, you might have to do more than just retrieve some
models. For instance, you might have to load up the current user, grab some
stuff from memcache, check with your SSO system to validate the session, and
then retrieve the data pertinent to the current request. Then, you might have
to make some data modifications (which will create transactions and hit your
db.) Finally, the view rendering can begin.

In only the trivial cases can you defer the actual SQL queries from being
performed before the view is rendered.

~~~
dasil003
I don't get your point. Just because you may have to do some things in a
before_filter or whatever doesn't mean you need to do all things there. As far
as accessing stuff out of memcached is concerned, there's no reason that can't
also be deferred to the view.

Claiming that lazy loaded queries are only a benefit for "trivial cases" is a
strawman. It's hugely powerful functionality in ActiveRecord that you can
utilize in many ways, and it would be very hard to implement without low level
support.

Cached attributes can often easily be made available via concise single model
methods that operate transparently without the controller OR the view needing
to know they are cache-backed. Plus, even if you are loading stuff out of
memcached in the controller, it's going to be fast, because that's the whole
point of memcached.

ActiveRecord, meanwhile, normally takes a huge percentage of rendering time.
Being able to defer those queries while still allowing the controller to
declare them is actually a huge combination of performance flexibility and
separation of concerns. Previously, if you wanted to defer them "cleanly",
you'd have to create model methods, but even there you would have to pass
params through somehow or generally do something uglier than what you have to
do now.

~~~
aaronblohowiak
These are my assumptions:

1. The performance goal is to return HTTP & HTML headers as quickly as
possible.

2. Business logic belongs in the models, as triggered by method invocations
from the controller.

3. A good deal of time is spent on business logic and request handling
(authentication / set-up / before_filter stuff). Often, this requires network
I/O to backend systems (or databases). While some data retrieval is necessary
only to render the view (and can thus be deferred gracefully), other times
your application logic depends on the completion of these lengthy requests in
order to complete the desired state modification.

4. Since we want to return the headers as quickly as possible, we can either
a) figure out how to send the headers before the controller or b) figure out
how to delay the processing until after the controller.

---

I think that a) is better than b).

I like views that exclusively take data and format it for output. I like
having the core business logic in the models, and I like having the system
guards and request setup concentrated in the controllers. This way, I can have
an exhaustive understanding of the tasks performed by an action without having
to read through all of the views and their partials.

If we hide some of the processing within model actions and call those model
actions from the view, then I no longer can assume that all major processing /
I/O has happened by simply reading the controller and the model methods it
invokes. Instead, I also have to read all of the views.

This violates the expectations I have about processing times and MVC.

If instead, we could detect the requested format and return immediately with
the headers, then we could perform the overwhelming majority of the processing
while the browser is busy downloading static assets.

~~~
dasil003
There definitely is something that bothers me about using fibers in order to
hide an important optimization detail from the developer, and I agree it would
be better to have explicit control.

However, I see the core team's point that they don't want to completely
overhaul the API in a way that's going to break arguably a majority of
existing apps, and at the same time require the developer to pay attention to
more details than may be necessary.

However, on the topic of breaking expectations about processing times, I
couldn't disagree more. To me, the main benefit of MVC (or any architecture
choice) is in isolating responsibilities, not isolating code execution. It's
not a big mental leap to conceive of an ActiveRecord relation as declaring some
data that's needed that may or may not be actually queried, depending on if
the view needs it. This helps your MVC separation, because it means you don't
need to extract view logic from your template just to make sure the controller
doesn't load something unnecessarily. Lazy execution of this form is a very
powerful optimization technique, just ask any Haskeller.

