ON INFORMATION THEORY AND UNSUPERVISED NEURAL NETWORKS
Mark D. Plumbley
In recent years connectionist models, or neural networks, have been used with some success in problems related to sensory perception, such as speech recognition and image processing. As these problems become more complex, they require larger networks, and this typically leads to very slow training times. Work on these problems has primarily involved supervised models: networks with a `teacher' which indicates the desired output. If unsupervised models could be used in the early stages of such systems, it might become possible to tackle larger and more complex problems than are currently attempted. We might also gain more insight into how sensory data are represented in these systems, which could in turn help our understanding of biological sensory systems.
In contrast to supervised models, unsupervised models are not provided with any teacher input to guide them as to what they should `learn' to perform. In this thesis, an information-theoretic approach to this problem is explored: in particular, the principle that an unsupervised model should adjust itself to minimise the information loss, while in some way producing a simplified representation of its input data as output.
Initially, general concepts about information theory, entropy and mutual information are reviewed, and some systems which use other information-theoretic principles are described. The concept of information loss and some of its properties are introduced, and this concept is related to Linsker's `Infomax' principle. The information loss across supervised learning systems is briefly considered, and various conditions are described for a close match between minimisation of information loss and minimisation of various distortion measures.
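As a reminder of the quantities involved (the notation below follows the textbook convention, which may differ from the thesis's own):

```latex
% Entropy of X, and mutual information between X and Y
% (standard definitions, not necessarily the thesis's notation):
H(X)    = -\sum_{x} p(x) \log p(x)
I(X;Y)  = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)
```

Loosely speaking, minimising the information loss across a network corresponds to keeping the mutual information between input and output as large as the network's constraints allow.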
Next, the information loss across a simple linear network with one layer of processing units is considered. In order to make progress, an assumption must be made about which noise source in the system dominates. When the noise on the input to the network is dominant, a network which performs a type of principal component analysis is optimal. A common framework is derived for various neural network algorithms which find principal components of their input data, and these are shown to be equivalent in an information transmission sense.
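As an illustration of the kind of principal-component-finding algorithm compared in that framework, the following is a minimal sketch of Oja's single-unit Hebbian rule; the data, learning rate, and variable names are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

# Sketch of Oja's single-unit Hebbian rule, one example of the
# principal-component-finding algorithms of the kind compared here.
# Data, learning rate, and seed are illustrative, not from the thesis.

rng = np.random.default_rng(0)

# Zero-mean data whose leading principal component lies along [1, 0].
X = rng.normal(size=(5000, 2)) * np.array([3.0, 0.5])

w = rng.normal(size=2)          # weight vector of the single linear unit
eta = 0.01                      # learning rate

for x in X:
    y = w @ x                   # unit output
    w += eta * y * (x - y * w)  # Oja's rule: Hebbian term with weight decay

# w converges (up to sign) to the unit-norm leading principal component,
# so its projection onto [1, 0] approaches 1 in magnitude.
print(abs(w @ np.array([1.0, 0.0])))
```

The decay term `-eta * y * y * w` is what keeps the weight vector bounded; a plain Hebbian update `eta * y * x` would grow without limit.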
The case of significant output noise, for position- and time-invariant signals, is considered. Given a power cost constraint, the form of the optimum linear filter required to minimise the information loss under this constraint is analysed. This filter changes in a non-trivial manner with varying noise levels, mirroring the way that the response of biological retinal systems changes as the background light level changes. When the output noise is dominant, the optimum configuration can be found by using anti-Hebbian algorithms to decorrelate the outputs. Various forms of network of this type are considered, and an algorithm is derived for a novel Skew-Symmetric Network which employs inhibitory interneurons, suggesting a possible role for cortical back-projections.
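The idea of decorrelating outputs with anti-Hebbian lateral weights can be sketched as follows. This is a generic recurrent decorrelating network in the spirit of Barlow and Foldiak, not the thesis's Skew-Symmetric Network; the input statistics and learning rate are illustrative assumptions.

```python
import numpy as np

# Sketch of anti-Hebbian decorrelation: lateral weights V between the
# outputs are updated in proportion to MINUS the output correlations,
# so correlated outputs inhibit each other until they decorrelate.
# All parameters here are illustrative, not taken from the thesis.

rng = np.random.default_rng(1)

# Correlated two-channel input: the second channel mixes in the first.
A = np.array([[1.0, 0.0],
              [0.8, 0.6]])
X = rng.normal(size=(20000, 2)) @ A.T

n = 2
V = np.zeros((n, n))    # lateral anti-Hebbian weights, zero diagonal
eta = 0.005

for x in X:
    # Settled output of the recurrent network y = x + V y.
    y = np.linalg.solve(np.eye(n) - V, x)
    dV = -eta * np.outer(y, y)     # anti-Hebbian: decrease with correlation
    np.fill_diagonal(dV, 0.0)      # no self-connections
    V += dV

# After learning, the outputs are (approximately) decorrelated.
Y = np.linalg.solve(np.eye(n) - V, X.T).T
print(np.corrcoef(Y.T))
```

The stable state is reached when the output cross-correlations vanish, at which point the expected weight update is zero; this is the sense in which anti-Hebbian learning finds the decorrelating configuration.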
In conclusion, directions for further work are suggested, including the extension of this analysis to systems with various non-linearities, and general problems of the representation of sensory information are discussed.
NB New version created 1Apr98 with fixes for postscript bug in original