Abstract for jervis_tr115

Cambridge University Engineering Department Technical Report CUED/F-INFENG/TR115

POLE BALANCING ON A REAL RIG USING A REINFORCEMENT LEARNING CONTROLLER

T.T. Jervis and F. Fallside

December 1992

In 1983, Barto, Sutton and Anderson published details of an adaptive controller which learnt to balance a simulated inverted pendulum. This reinforcement learning controller balanced the pendulum as a by-product of avoiding a cost signal delivered to the controller whenever the pendulum fell over. This paper describes their controller learning to balance a real inverted pendulum. As far as the authors are aware, this is the first application of a reinforcement learning controller to a real inverted pendulum with learning taking place in real time.
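
As a rough illustration of the cost signal described above, the following Python sketch returns a penalty only at the moment the pendulum is judged to have fallen, and nothing otherwise. It is a minimal sketch under stated assumptions, not the implementation used in the report: the function name cost_signal, the 12 degree fall threshold and the penalty value of -1.0 are all hypothetical choices made for illustration.

```python
import math

# Hypothetical fall threshold, chosen for illustration only;
# the abstract does not state the limit used on the real rig.
MAX_ANGLE_RAD = math.radians(12.0)


def cost_signal(angle_rad: float) -> float:
    """Return -1.0 if the pendulum angle is past the fall threshold, else 0.0.

    The controller receives no other feedback: balancing emerges only
    from learning to avoid this penalty.
    """
    return -1.0 if abs(angle_rad) > MAX_ANGLE_RAD else 0.0


# A short, made-up trace of pendulum angles (radians) ending in a fall.
angles = [0.0, 0.05, 0.15, 0.25]
signals = [cost_signal(a) for a in angles]
```

The point of the sketch is that the signal is sparse: it is zero for every sample in the trace except the last, so the controller has to work out for itself which earlier actions led to the failure.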

The results show that the controller was able to improve its performance as it learnt, and that the task is computationally tractable. However, the implementation was not straightforward. Although some of the controller's parameters were tuned automatically by learning, others were not and had to be set carefully for control to succeed. This limits the usefulness of this kind of learning controller to small problems, which are likely to be better controlled by other means. Before a learning controller can tackle more difficult problems, a more powerful learning scheme has to be found.

