
Abstract for rummery_tr166

Cambridge University Engineering Department Technical Report CUED/F-INFENG/TR166

ON-LINE Q-LEARNING USING CONNECTIONIST SYSTEMS

Gavin Rummery and Mahesan Niranjan

September 1994

Reinforcement learning algorithms are a powerful machine learning technique. However, much of the work on these algorithms has been developed for discrete finite-state Markovian problems, a setting that is too restrictive for many real-world environments. It is therefore desirable to extend these methods to high-dimensional continuous state-spaces, which requires function approximation to generalise the information learnt by the system. In this report, the use of back-propagation neural networks is considered in this context.

We consider a number of different algorithms based around Q-Learning combined with the Temporal Difference algorithm, including a new algorithm (Modified Connectionist Q-Learning, MCQ-L), and Q(lambda). In addition, we present algorithms for applying these updates on-line during trials, unlike backward replay, which requires waiting until the end of each trial before updating can occur. On-line updating is found to be more robust to the choice of training parameters than backward replay, and also enables the algorithms to be used in continuously operating systems where no end-of-trial conditions occur.
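
As an illustration of the difference between the update rules and of what updating on-line involves, the following is a minimal Python sketch and not the report's implementation. It assumes that the MCQ-L target uses the value of the action actually selected at the next step (the rule later widely known as SARSA), whereas the standard Q-learning target uses the maximum over next actions. It also swaps the back-propagation network for a linear approximator with one weight row per action, so that the gradient is simply the feature vector. The function names, the parameter defaults, and the omission of any special trace handling after exploratory actions are all assumptions made for illustration; the Q(lambda) rule is not attempted.

```python
def td_target(reward, q_next, next_action, gamma, rule):
    """One-step TD target under the two update rules (illustrative)."""
    if rule == "q_learning":
        # Standard Q-learning: bootstrap from the best next action.
        return reward + gamma * max(q_next)
    if rule == "mcql":
        # Assumed MCQ-L rule: bootstrap from the action actually selected.
        return reward + gamma * q_next[next_action]
    raise ValueError(f"unknown rule: {rule}")


def q_values(weights, features):
    """Linear Q estimate for every action: one weight row per action."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def online_update(weights, traces, features, action, reward,
                  next_features, next_action,
                  alpha=0.1, gamma=0.9, lam=0.8,
                  rule="mcql", terminal=False):
    """Apply one on-line TD(lambda)-style update in place; return the TD error.

    Hypothetical helper: weights and traces are lists of lists with shape
    (n_actions, n_features). Nothing is stored for later replay, so the
    update can run after every step of a continuing task.
    """
    q = q_values(weights, features)
    q_next = [0.0] * len(weights) if terminal else q_values(weights, next_features)
    delta = td_target(reward, q_next, next_action, gamma, rule) - q[action]

    # Decay all eligibility traces, then add the gradient of Q(s, a),
    # which for a linear approximator is the feature vector itself.
    for a, row in enumerate(traces):
        for i in range(len(row)):
            row[i] *= gamma * lam
            if a == action:
                row[i] += features[i]

    # Move every weight in proportion to its trace.
    for a, row in enumerate(weights):
        for i in range(len(row)):
            row[i] += alpha * delta * traces[a][i]
    return delta
```

Calling `online_update` once per time step with the current transition is what makes the scheme on-line: the traces carry credit back to earlier state-action pairs without waiting for the end of a trial, in contrast to backward replay.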

We compare the performance of these algorithms on a realistic robot navigation problem, where a simulated mobile robot is trained to guide itself to a goal position in the presence of obstacles. The robot must rely on limited sensory feedback from its surroundings, and make decisions that can be generalised to arbitrary layouts of obstacles.

These simulations show that on-line learning algorithms are less sensitive to the choice of training parameters than backward replay, and that the alternative update rules of MCQ-L and Q(lambda) are more robust than standard Q-learning updates.


(ftp:) rummery_tr166.ps.Z (http:) rummery_tr166.ps.Z
PDF (automatically generated from original PostScript document - may be badly aliased on screen):
  (ftp:) rummery_tr166.pdf | (http:) rummery_tr166.pdf

