Background

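A world point X=[X Y Z 1]^T is projected onto an image plane by a 3 x 4 projection matrix P, which can be expressed in homogeneous co-ordinates as x = PX, where x is the homogeneous image point and the equality holds up to a non-zero scale factor.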

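The matrix P has 12 entries but is defined only up to scale, so it has 11 degrees of freedom. Each known world point with a measured image position gives two linear constraints on P, so P can be computed from six known points.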
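The computation of P from known points can be sketched with a direct linear transform. This is a minimal illustration rather than the method used in this paper: the function names `estimate_projection_matrix` and `project` are hypothetical, the data are assumed noise-free, and no co-ordinate normalisation is applied.

```python
import numpy as np

def estimate_projection_matrix(world_pts, image_pts):
    """Estimate a 3 x 4 matrix P (up to scale) from N >= 6 correspondences.

    world_pts: (N, 3) array of [X, Y, Z]; image_pts: (N, 2) array of [x, y].
    """
    world_pts = np.asarray(world_pts, dtype=float)
    image_pts = np.asarray(image_pts, dtype=float)
    if len(world_pts) < 6 or len(world_pts) != len(image_pts):
        raise ValueError("need at least six matched points")
    rows = []
    for (X, Y, Z), (x, y) in zip(world_pts, image_pts):
        Xh = np.array([X, Y, Z, 1.0])
        zero = np.zeros(4)
        # x = (p1 . Xh) / (p3 . Xh)  ->  p1 . Xh - x (p3 . Xh) = 0
        rows.append(np.concatenate([Xh, zero, -x * Xh]))
        # y = (p2 . Xh) / (p3 . Xh)  ->  p2 . Xh - y (p3 . Xh) = 0
        rows.append(np.concatenate([zero, Xh, -y * Xh]))
    A = np.array(rows)
    # The right singular vector of the smallest singular value
    # minimises |A p| subject to |p| = 1.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)

def project(P, world_pts):
    """Project (N, 3) world points with P and return (N, 2) image points."""
    world_pts = np.asarray(world_pts, dtype=float)
    Xh = np.hstack([world_pts, np.ones((len(world_pts), 1))])
    x = Xh @ P.T
    return x[:, :2] / x[:, 2:3]
```

Six points give twelve equations for the eleven degrees of freedom, so the system is slightly over-determined. With noisy measurements, more points and normalised co-ordinates would usually be preferred.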

If a second camera P' is introduced and the first camera is assumed to be the normalised camera P=[I | 0] then the fundamental matrix is given by:
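A standard way to write this, assuming the second camera is partitioned as $P' = [M \mid e']$ with $M$ a 3 x 3 matrix and $e'$ the epipole in the second image (both symbols are notation assumed here), is sketched below together with the usual forms of the epipolar constraint and the epipole condition:

```latex
\begin{align}
F &= [e']_\times M, \\
x'^T F x &= 0, \\
F e &= 0, \qquad e'^T F = 0.
\end{align}
```

Here $[e']_\times$ denotes the skew-symmetric matrix for which $[e']_\times v = e' \times v$, $x$ and $x'$ are matching points in the first and second images, and $e$ is the epipole in the first image.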

Corresponding points in the two images are related by equation \ref{eq_epipolar_constraint}, which can be expressed geometrically as the epipolar constraint (see figure \ref{fig_stereo}).

The ray projected from a point in the first image will be viewed as a line in the second image. Alternatively, a point in the world and the two camera centres define an epipolar plane which intersects each image plane in an epipolar line, and the projection of the point in the world must lie on that epipolar line. All epipolar lines intersect at the epipole, which is therefore a null vector of the fundamental matrix (equation \ref{eq_epipole}), so the fundamental matrix must have a zero determinant. Since the fundamental matrix has 9 entries, is defined only up to scale, and has a zero determinant, it must have 9 - 1 - 1 = 7 degrees of freedom.
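These properties can be checked numerically. The sketch below assumes the second camera is written as P' = [M | e'] (notation assumed here) and uses the standard construction F = [e']_x M. The camera entries and the world point are hypothetical values chosen only for illustration.

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix such that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def fundamental_from_second_camera(P2):
    """F for the camera pair P = [I | 0], P' = [M | e']."""
    M, e2 = P2[:, :3], P2[:, 3]
    return skew(e2) @ M

# Hypothetical cameras and world point (illustrative values only).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.array([[0.9, -0.1, 0.2, 1.0],
               [0.1, 1.0, 0.0, 0.5],
               [-0.2, 0.1, 0.95, 0.2]])
X = np.array([0.3, -0.4, 5.0, 1.0])

F = fundamental_from_second_camera(P2)
x1, x2 = P1 @ X, P2 @ X

# Epipolar constraint: x'^T F x vanishes for matching points.
epipolar_residual = x2 @ F @ x1

# Epipole in the first image: projection of the second camera centre,
# which for P = [I | 0] is -M^{-1} e'.
e1 = -np.linalg.solve(P2[:, :3], P2[:, 3])
epipole_residual = F @ e1

# F is singular, with rank two.
determinant = np.linalg.det(F)
rank = np.linalg.matrix_rank(F, tol=1e-9)
```

All three residual quantities should vanish to within floating-point error, and the rank should be two, consistent with the count of 7 degrees of freedom.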

If a third camera P'' is introduced, the three camera centres define a plane (the trifocal plane). This plane introduces three constraints on the epipoles (equations \ref{eq_trifocal_plane_1} to \ref{eq_trifocal_plane_3}), which can be seen in figure \ref{fig_trifocal_plane}.
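One common way to state these constraints is that the two epipoles of any camera, seen in the other two images, are a matching pair of points and so satisfy the epipolar constraint between those images. The sketch below checks this for three hypothetical cameras. The helper names and the pseudo-inverse formula F = [e_b]_x P_b P_a^+ are standard choices assumed here, not necessarily the form of the equations referenced above.

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix such that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def camera_centre(P):
    """Homogeneous camera centre: the null vector of the 3 x 4 matrix P."""
    _, _, Vt = np.linalg.svd(P)
    return Vt[-1]

def fundamental_between(Pa, Pb):
    """F such that x_b^T F x_a = 0 for matching points in views a and b."""
    e_b = Pb @ camera_centre(Pa)          # epipole of camera a in view b
    return skew(e_b) @ Pb @ np.linalg.pinv(Pa)

def trifocal_plane_residuals(cameras):
    """Residuals of the three epipole constraints for three cameras."""
    residuals = []
    for a, b, c in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
        C = camera_centre(cameras[c])
        e_ac = cameras[a] @ C             # epipole of camera c in view a
        e_bc = cameras[b] @ C             # epipole of camera c in view b
        F = fundamental_between(cameras[a], cameras[b])
        # Normalise so that the residual is independent of scale.
        F = F / np.linalg.norm(F)
        e_ac = e_ac / np.linalg.norm(e_ac)
        e_bc = e_bc / np.linalg.norm(e_bc)
        residuals.append(float(e_bc @ F @ e_ac))
    return residuals

# Hypothetical cameras (illustrative values only).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.array([[0.9, -0.1, 0.2, 1.0],
               [0.1, 1.0, 0.0, 0.5],
               [-0.2, 0.1, 0.95, 0.2]])
P3 = np.array([[1.0, 0.05, -0.1, -0.8],
               [-0.05, 0.98, 0.1, 0.3],
               [0.1, -0.1, 1.0, 0.6]])
residuals = trifocal_plane_residuals([P1, P2, P3])
```

For consistent cameras all three residuals vanish. Three fundamental matrices estimated independently from image data will not in general satisfy them.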

The 3 x 3 x 3 trifocal tensor \cite{Hartley95} is a more general way of representing the geometry of three views and is expressed in terms of lines (equation \ref{eq_tensor_lines}). A line $\lambda$ in the first view defines a plane through its camera centre. If a line $\lambda'$ or point $p'$ in the second image is projected onto this plane, it defines a line or point in the world. This line or point in the world can then be projected into the third view. Alternatively, contracting the tensor with $\lambda$ defines a homography for lines from the second to the third image (equation \ref{eq_tensor_H}).
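The line relation can be illustrated with a small numerical sketch. It assumes the cameras are written as P = [I | 0], P' = [A | a4] and P'' = [B | b4], and uses the standard construction T_i = a_i b4^T - a4 b_i^T, where a_i and b_i are the columns of A and B. This notation and the function names are assumptions made for illustration, and only the line relation is covered.

```python
import numpy as np

def trifocal_from_cameras(P2, P3):
    """Return T with shape (3, 3, 3), where T[i] = a_i b4^T - a4 b_i^T."""
    A, a4 = P2[:, :3], P2[:, 3]
    B, b4 = P3[:, :3], P3[:, 3]
    return np.stack([np.outer(A[:, i], b4) - np.outer(a4, B[:, i])
                     for i in range(3)])

def transfer_line(T, line2, line3):
    """Line in view one from matching lines in views two and three.

    Component i of the result is line2^T T[i] line3.
    """
    return np.einsum('j,ijk,k->i', line2, T, line3)

def image_line(P, Xa, Xb):
    """Image of the world line through homogeneous points Xa and Xb."""
    return np.cross(P @ Xa, P @ Xb)

# Hypothetical cameras and world points (illustrative values only).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.array([[0.9, -0.1, 0.2, 1.0],
               [0.1, 1.0, 0.0, 0.5],
               [-0.2, 0.1, 0.95, 0.2]])
P3 = np.array([[1.0, 0.05, -0.1, -0.8],
               [-0.05, 0.98, 0.1, 0.3],
               [0.1, -0.1, 1.0, 0.6]])
Xa = np.array([0.3, -0.4, 5.0, 1.0])
Xb = np.array([-0.6, 0.2, 4.0, 1.0])

T = trifocal_from_cameras(P2, P3)
line1 = image_line(P1, Xa, Xb)
line2 = image_line(P2, Xa, Xb)
line3 = image_line(P3, Xa, Xb)
line1_predicted = transfer_line(T, line2, line3)

# Homogeneous lines agree if their cross product vanishes.
mismatch = np.cross(line1 / np.linalg.norm(line1),
                    line1_predicted / np.linalg.norm(line1_predicted))
```

The predicted line agrees with the observed one only up to scale, which is why the comparison uses a cross product of normalised vectors rather than a direct difference.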

The trifocal tensor has 18 (3 x 7 - 3) degrees of freedom and can be computed linearly from 7 point matches. An alternative computation by Quan \cite{Quan94}, using a canonical co-ordinate system, requires only 6 points matched across the three views, but produces one or three solutions. This approach was implemented and analysed in detail by Torr and Zisserman \cite{Torr97}, who used a combination of robust sampling and optimisation to compute the trifocal tensor accurately. An alternative approach by Hartley \cite{Hartley98} adjusts the epipoles to minimise the algebraic error of the linear solution. This paper describes an extension of this work, so that it can be used to compute the geometry of long image sequences.



Accepted as a poster for CVPR 99