
This is very interesting given the more modern fields of reinforcement learning, where debate rages between "model-free" and "model-based" approaches. I had no idea there were actually any proofs around this. However, I have to say the theorem goes against my intuition. If you want to control a heating system, you don't really need to build a model of the heater, which could be very complex, requiring knowledge of how it works internally. Instead, you observe the temperature for a given action and successfully control it without ever knowing how your heater actually works. You do build a model, but it's a model of observations and actions, not of the heater itself. In fact, for the vast majority of things we control, such as a bicycle or fan or oven, people rarely know the model of these very complex systems (in terms of physical principles and internals).


Well, the paper doesn't say that that doesn't work, just that it's not optimal. If you set the aircon to a given temperature, it will cycle on when the temperature dips below the desired value and cycle off when it exceeds it. You're essentially regulating by the error states, which the paper says doesn't minimize H(Z), so while it works it's only 'partially successful'. I guess that means the aircon doesn't keep the temperature rock-solid, but rather lets it fluctuate within an acceptable range. That's a fine result for everyday life, but theoretically, according to this paper, a better outcome is actually achievable.
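The cycling behavior described here is a bang-bang controller with hysteresis. A minimal sketch (all numbers are made up for illustration, not from the thread) shows why the temperature fluctuates in a band around the setpoint instead of holding rock-solid:

```python
def simulate(setpoint=22.0, band=0.5, steps=200):
    """Bang-bang thermostat with hysteresis on a toy room model."""
    temp = 20.0          # room temperature, degrees C
    heating = False
    history = []
    for _ in range(steps):
        if temp < setpoint - band:
            heating = True       # cycle on below the band
        elif temp > setpoint + band:
            heating = False      # cycle off above the band
        # toy dynamics: heater adds heat, room leaks it toward 18 C outside
        temp += (0.5 if heating else 0.0) - 0.1 * (temp - 18.0)
        history.append(temp)
    return history

hist = simulate()
```

After the initial transient, the trace settles into a sustained oscillation inside (roughly) the hysteresis band, which is exactly the "works, but doesn't minimize H(Z)" outcome the comment describes.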


> You do build a model, but it's a model of observations and actions, not of the heater itself. In fact, for the vast majority of things we control, such as a bicycle or fan or oven, people rarely know the model of these very complex systems (in terms of physical principles and internals).

I think you're mistaking the term model for a replica. You don't need a replica, but you do need a model for your relevant tasks.

If the heater needs 30 mins to warm up before it starts pumping heat, you'll need to model that. If it overheats after running for 3hrs straight you'll need to model that too. You don't need every molecule of the physical item replicated but you do need to model all of the relevant behaviors.
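To sketch the point: a sufficient "model" here can be just the two relevant behaviors named above (the 30-minute warm-up and the 3-hour overheat cutoff), with no internals at all. The function below is a hypothetical illustration, not anyone's real heater:

```python
WARMUP_MIN = 30          # minutes of running before any heat is produced
OVERHEAT_MIN = 3 * 60    # minutes of continuous running before safety cutoff

def heat_output(minutes_running):
    """Heat output (0 or 1) as a function of continuous run time.

    This IS the model: two relevant behaviors, zero molecules.
    """
    if minutes_running < WARMUP_MIN:
        return 0.0       # still warming up
    if minutes_running >= OVERHEAT_MIN:
        return 0.0       # overheated, cut off
    return 1.0           # normal operation
```

A regulator that knows only this function can plan around the lag and the cutoff without knowing anything about burners, elements, or pumps.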


This sounds a little circular. If a "model" is defined as a representation of a system of sufficient fidelity to control it, then of course all controllers contain models. Deciding which behaviors are "relevant" would appear to be the key issue here.


You have a point, but I think it is useful to have a unifying general principle to use in analyzing the design of regulators, and having a theoretical basis makes it harder to dismiss.

I can think of a couple of places where it could have avoided accidents (one tragic and one expensive.) The 1974 DC10 crash in Paris was supposed to be impossible because the airplane could not be pressurized unless the cargo door was properly latched, but the mechanism depended on the position of the handle rather than the latching pins. At Three Mile Island, the operators were trained to use the pressurizer water level as a primary indicator of the state of the system, and turned off the emergency cooling feed as a consequence.


A model is in practice mostly an approximation rather than a replica. Think of a water bottle that is about to fall off a table. Your brain does not model the trajectory of the 10^26 atoms it consists of, but it has some abstract representation of bottles and the way bottles propagate through space, approximately by Newton's laws. Not only does this suffice for catching the bottle before it falls; the computational capacity of the brain and the resolution of our sensory organs (and of light itself, for that matter) are also way too limited for us to model the bottle entirely.
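The Newtonian approximation really is tiny. A sketch with an assumed table height (0.75 m is my number, not the comment's) shows the entire "model" needed to judge whether the bottle is catchable:

```python
import math

g = 9.81             # m/s^2, gravitational acceleration
table_height = 0.75  # m, an assumed typical table

# Constant-acceleration kinematics: h = (1/2) g t^2  ->  t = sqrt(2h/g)
t_fall = math.sqrt(2 * table_height / g)
# About 0.4 s to react -- derived with no molecular detail anywhere.
```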


> The peculiarities of cybernetics. Many a book has borne the title “Theory of Machines”, but it usually contains information about mechanical things, about levers and cogs. Cybernetics, too, is a “theory of machines”, but it treats, not things but ways of behaving. It does not ask “what is this thing?” but “what does it do?” Thus it is very interested in such a statement as “this variable is undergoing a simple harmonic oscillation”, and is much less concerned with whether the variable is the position of a point on a wheel, or a potential in an electric circuit. It is thus essentially functional and behaviouristic.

~W. Ross Ashby, "An Introduction to Cybernetics", http://pespmc1.vub.ac.be/ASHBBOOK.html

I worked at a place that had some sort of Just-in-Time heater for the sinks in the bathroom. One of them oscillated too hot too cold too hot too cold, etc. Eventually someone adjusted some setting and it worked properly. That setting was (part of) the "model" of the heater+sink+bathroom system.


For many simple systems a PID-style controller does well enough, but for more complex systems some kind of state-based model tends to work out better, whether the model is built on knowledge of the system or by observing its behavior.
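The PID case can be sketched in a few lines. This is the textbook form (u = Kp*e + Ki*∫e dt + Kd*de/dt) driving a toy first-order plant; gains and plant constants are invented for illustration:

```python
class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)

# Toy first-order plant: temperature leaks toward 18 C, pushed by control u.
pid = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.1)
temp, setpoint = 18.0, 22.0
for _ in range(500):
    u = pid.step(setpoint - temp)
    temp += (u - 0.5 * (temp - 18.0)) * 0.1
```

With these gains the integral term removes the steady-state offset a purely proportional controller would leave, which is why PID "does well enough" for simple plants like this one.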

For the latter, there's an entire field devoted to this problem called "system identification", one popular textbook seems to be online nowadays at http://user.it.uu.se/~ts/sysidbook.pdf
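The simplest version of the system-identification idea (my sketch, not taken from the linked textbook) is fitting a linear difference equation y[t+1] = a*y[t] + b*u[t] to observed input/output data by least squares, i.e. building the model purely from behavior:

```python
# "True" system the identifier never sees inside (assumed parameters).
a_true, b_true = 0.9, 0.5
u = [1.0 if (t // 10) % 2 == 0 else 0.0 for t in range(100)]  # square wave
y = [0.0]
for t in range(99):
    y.append(a_true * y[t] + b_true * u[t])

# Solve the 2-parameter least-squares problem via the normal equations.
syy  = sum(y[t] * y[t]     for t in range(99))
syu  = sum(y[t] * u[t]     for t in range(99))
suu  = sum(u[t] * u[t]     for t in range(99))
sy1y = sum(y[t + 1] * y[t] for t in range(99))
sy1u = sum(y[t + 1] * u[t] for t in range(99))
det = syy * suu - syu * syu
a_hat = (sy1y * suu - sy1u * syu) / det
b_hat = (sy1u * syy - sy1y * syu) / det
```

With noise-free data the estimates recover the true parameters exactly; real identification adds noise models, input design, and order selection on top of this skeleton.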


Just looked up model-based RL to understand the difference and found this article to be surprisingly elucidating:

https://medium.com/@jonathan_hui/rl-model-based-reinforcemen...


I guess you could make it more intuitive by thinking about the system in terms of the units of your feedback (e.g. force on an actuator like a bicycle handlebar, in your example) rather than the deeper underlying physics. In the end it doesn't matter which you use, as long as it works.



