10.2. Perspective Projection Model
The previous diagram (Thin Lens Camera Model) accurately depicts the geometry of a camera. However, it does not yield a convenient mathematical model to relate points in the real world to points in the image.
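The perspective projection model allows us to relate object locations in the real world to the corresponding pixel locations in the image.
With the origin of the image coordinate system in the center of the image, the perspective projection equation relates a world location $(X, Y, Z)$ to an image location $(x, y)$, where $f$ is the focal length and $Z$ is the distance from the camera along the optical axis.

$$x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z}$$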
Notice that given a location in the world, it is possible to determine the image location, but not the reverse. An image point defines a ray, which might intersect an object in the world at any distance away. Some knowledge of the geometry of the world is required to determine world locations from an image.
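The one-way nature of the projection can be illustrated with a minimal MATLAB sketch. It assumes the usual pinhole form $x = fX/Z$, $y = fY/Z$ with points expressed in the camera frame; the focal length and point coordinates are hypothetical values chosen for illustration.

```matlab
% Perspective projection of world points (camera frame) onto the image.
% f and the points below are hypothetical values, not from the text.
f = 800;                            % focal length (pixels)
project = @(P) f * P(1:2) / P(3);   % [x; y] = f * [X; Y] / Z

P1 = [0.5; 0.25; 2];                % a world point (m)
P2 = 3 * P1;                        % a farther point on the same ray

p1 = project(P1)                    % image location of P1
p2 = project(P2)                    % identical image location
```

Both points map to the same image location, so the image point alone cannot tell us which one was photographed.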
For film cameras, it is appropriate to think of the focal length $f$ in units of linear distance; but for digital cameras, pixels are a more convenient unit for $f$. When $f$ is given in linear units (usually mm) for a digital camera, the linear distance between pixels, $\rho$, must also be used. The unit of $\rho$ is usually mm/pixel. Divide the focal length (mm) by $\rho$ (mm/pixel) to get the focal length in pixels.
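As a quick worked example of the unit conversion, the sketch below assumes a hypothetical 8 mm lens and 10 micron pixels; the variable names are illustrative only.

```matlab
% Convert a focal length from millimeters to pixels.
f_mm = 8;                 % focal length (mm), hypothetical lens
rho  = 0.01;              % pixel spacing (mm/pixel), i.e. 10 micron pixels
f_pixels = f_mm / rho     % focal length in pixels (about 800)
```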
Another observation from perspective projection is that as $Z$ (distance from the camera) increases, objects in the image get smaller. Thus parallel lines that are not perpendicular to the optical axis appear to come to a point at a long distance away.
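The convergence of parallel lines can be checked numerically. This sketch assumes the pinhole relation $x = fX/Z$ and two hypothetical rails, 2 m apart, that run parallel to the optical axis.

```matlab
% Two parallel rails projected at increasing distances from the camera.
% All values are hypothetical.
f = 800;                        % focal length (pixels)
Z = [5 10 20 40 80 1000];       % distances along the optical axis (m)
X_left = -1;  X_right = 1;      % rails are 2 m apart

x_left  = f * X_left  ./ Z;     % image x coordinate of the left rail
x_right = f * X_right ./ Z;     % image x coordinate of the right rail
gap = x_right - x_left          % separation in pixels shrinks toward zero
```

The separation in the image falls in proportion to $1/Z$, so the rails appear to meet at the center of the image.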
When objects are very far away, $X$, $Y$, and $Z$ can be huge. If the camera is moved, those numbers hardly change. This explains why the moon seems to follow you, why the north star (Polaris) is always north, and why you can tell time from the sun regardless of where you are.
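To put rough numbers on this, the sketch below compares how far a nearby object and the moon shift in the image when the camera moves sideways. It assumes the pinhole relation $x = fX/Z$; the focal length, the tree distance, and the 10 m move are hypothetical, and the Earth-Moon distance is an approximate figure.

```matlab
% Image shift caused by moving the camera sideways.
f = 1000;                 % focal length (pixels), hypothetical
shift = 10;               % sideways camera motion (m), hypothetical
Z_tree = 50;              % distance to a nearby tree (m), hypothetical
Z_moon = 3.844e8;         % approximate Earth-Moon distance (m)

% Both objects start on the optical axis (X = 0). After the move,
% each is offset by 'shift' meters relative to the camera.
dx_tree = f * shift / Z_tree    % shift in the image (pixels)
dx_moon = f * shift / Z_moon    % a tiny fraction of a pixel
```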
10.2.1. When Z is Constant
In industrial robotics applications it is common to position the camera so that $Z$ is constant. For $Z$ to be constant, the optical axis of the camera must be perpendicular to a flat surface being photographed. Images from a camera positioned over a conveyor belt carrying products can meet this scenario, as can a picture taken directly in front of a building.
When $Z$ is constant and known, the ratio $f/Z$ in the perspective projection equation is constant for the whole image, so the size of objects can be measured from the distance (in pixels) between two points on the image.
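The distance between two points, $(x_1, y_1)$ and $(x_2, y_2)$, in an image is:

$$d_{image} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

Then, the distance between the points in the world can be computed.

$$d_{world} = \frac{Z}{f}\, d_{image}$$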
Note: The most accurate results are obtained from points near the optical axis (center) of the image.
The ratio of the distances between points in the image and points in the world is also constant. Thus, if we know the distance between two points in the world, then the distance between two other points can be calculated from an image.
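Both measurement approaches can be sketched in MATLAB. The example assumes the pinhole relation with constant $Z$, so that world distance equals image distance times $Z/f$; the focal length, camera height, image points, and reference object are all hypothetical.

```matlab
% Measure a world distance from an image when Z is constant.
% All values are hypothetical.
f = 800;                  % focal length (pixels)
Z = 1.2;                  % distance from camera to the flat surface (m)
p1 = [100; 50];           % image points (pixels, relative to image center)
p2 = [400; 450];

d_image = norm(p2 - p1)           % distance in the image (pixels)
d_world = d_image * Z / f         % distance in the world (m)

% Alternative: use a reference object of known size instead of f and Z.
ref_image = 250;                  % reference spans 250 pixels in the image
ref_world = 0.375;                % and is known to be 0.375 m long
d_world_ref = d_image * ref_world / ref_image
```

The two results agree here because the reference object was chosen to be consistent with the same $f$ and $Z$.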
10.2.2. Projective Geometry
In the matrices used to relate points in an image to points in the physical world, we use homogeneous coordinates rather than Euclidean coordinates. In one sense, this may seem to be just a convenience that allows matrix multiplication by adding another row to the matrices, but it is really more than that. Homogeneous coordinates are used in the geometry system known as projective geometry, which describes the relationship between the physical world and images taken by cameras.
Projective geometry is concerned with how the world appears to us, whereas Euclidean geometry describes the actual dimensions of objects. The images that our eyes send to the brain are projective views. Our brain then translates the projective images to Euclidean space to give us our intuition about the dimensions of the objects that we see.
Artists were the first to recognize the importance of projective geometry. It has its origins in the early Italian Renaissance, particularly in the architectural drawings of Filippo Brunelleschi (1377–1446) and Leon Battista Alberti (1404–1472), who invented the method of perspective drawing. Italian fresco painter Piero della Francesca (1415–1492) described the mathematical model of projective geometry in his publication De Prospectiva Pingendi in 1478. Later, French mathematician Gérard Desargues (1591–1661) gave us the more rigorous mathematical models and equations of projective geometry that we use today. Two other 17th century mathematicians who contributed to our knowledge of projective geometry were Blaise Pascal and Philippe de La Hire.
Here, we review a few math principles of projective geometry and homogeneous coordinates. Later, we will apply homogeneous coordinates to images.
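Consider a normalized projective system where, per the above diagram, the image (Euclidean) plane is at $\tilde{z} = 1$. Any homogeneous point $(\tilde{x}, \tilde{y}, \tilde{z})$ on the ray that passes through the image plane at point $(x, y)$ represents that same image point. Simply divide by $\tilde{z}$ to convert from homogeneous coordinates to Euclidean coordinates: $x = \tilde{x}/\tilde{z}$ and $y = \tilde{y}/\tilde{z}$.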
To convert a point in Euclidean coordinates to homogeneous coordinates, simply append an additional coordinate of value 1, so $(x, y)$ becomes $(x, y, 1)$. Points along the same ray are scale invariant because any scale factor is removed in the conversion back to Euclidean coordinates.
The 3-tuple homogeneous coordinate points are said to be in two-dimensional projective space ($\mathbb{P}^2$); that is, they correspond to points on a plane in Euclidean space ($\mathbb{R}^2$).
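The conversions in both directions are short enough to sketch in MATLAB. The helper names `e2h` and `h2e` are hypothetical, defined here only for illustration.

```matlab
% Convert between Euclidean and homogeneous coordinates.
e2h = @(p) [p; 1];                       % append a coordinate of value 1
h2e = @(pt) pt(1:end-1) / pt(end);       % divide by the last coordinate

p = [3; 4];                 % Euclidean point on the plane
p_tilde = e2h(p)            % homogeneous form of p
q_tilde = 5 * p_tilde;      % another point on the same ray

h2e(p_tilde)                % recovers the original point
h2e(q_tilde)                % same point: the scale factor is removed
```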
Representing points and lines in homogeneous coordinates allows us to take advantage of some convenient properties.
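Consider two homogeneous points, $\tilde{p}_1$ and $\tilde{p}_2$. The line between the points is the cross product of the points, $\tilde{\ell} = \tilde{p}_1 \times \tilde{p}_2$.
A line in homogeneous coordinates is also a 3-tuple. The point where two lines, $\tilde{\ell}_1$ and $\tilde{\ell}_2$, intersect is also found by a cross product, $\tilde{p} = \tilde{\ell}_1 \times \tilde{\ell}_2$.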
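Both cross product properties can be tried with MATLAB's built-in `cross` function. The four points below are hypothetical, and a line 3-tuple $(a, b, c)$ is read here as the line $ax + by + c = 0$.

```matlab
% Lines and intersections from cross products of homogeneous 3-tuples.
p1 = [0; 0; 1];   p2 = [2; 2; 1];    % points (0,0) and (2,2)
p3 = [0; 2; 1];   p4 = [2; 0; 1];    % points (0,2) and (2,0)

line1 = cross(p1, p2)     % line through p1 and p2, which is y = x
line2 = cross(p3, p4)     % line through p3 and p4, which is x + y = 2

x_tilde = cross(line1, line2);        % homogeneous intersection point
x = x_tilde(1:2) / x_tilde(3)         % Euclidean intersection point
```

The two diagonals cross at $(1, 1)$, as expected.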
See also
Data Analysis study guide for notes on cross products.
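10.2.3. Camera Matrix
Here we describe the perspective projection camera model with matrices, a form that is more useful in software.
In homogeneous coordinates, a point $\tilde{P} = (X, Y, Z, 1)^T$ in the physical world is ideally projected onto an image with the following matrix multiplication, where $f$ is the focal length.

$$\tilde{p} = \begin{pmatrix} \tilde{x} \\ \tilde{y} \\ \tilde{z} \end{pmatrix} = \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$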
Note
The Euclidean coordinates of a point relative to the principal point are $x = \tilde{x}/\tilde{z}$ and $y = \tilde{y}/\tilde{z}$.
This model is appropriate for a continuous (film) image plane. But for digital cameras, we have to divide the image plane into a grid of light sensors (pixels). The dimension of each pixel in this grid is $\rho_w$ wide and $\rho_h$ high. The pixels on a typical digital camera are usually square, about 10 microns ($10 \times 10^{-6}$ meters) on each side.
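In MATLAB, the origin is in the upper left corner of the image, so we can translate the image to shift the origin from the principal point, which is at pixel coordinates $(u_0, v_0)$.

$$\tilde{p} = \begin{pmatrix} 1/\rho_w & 0 & u_0 \\ 0 & 1/\rho_h & v_0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \tilde{P}$$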
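To make the equations even more useful to machine vision applications, we can relate the position and orientation (pose) of the camera to a world coordinate frame. With $R$ and $t$ denoting the rotation and translation of the camera in the world frame, and $\tilde{P}$ the homogeneous world point, the projection is

$$\tilde{p} = \begin{pmatrix} \tilde{u} \\ \tilde{v} \\ \tilde{w} \end{pmatrix} = \begin{pmatrix} 1/\rho_w & 0 & u_0 \\ 0 & 1/\rho_h & v_0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} R & t \\ 0 & 1 \end{pmatrix}^{-1} \tilde{P}$$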
Note
The $u_0$ and $v_0$ terms (the pixel coordinates of the principal point) allow us to use the upper left corner of the image as the origin. The coordinates of a point are again found by converting from homogeneous coordinates to Euclidean coordinates, $u = \tilde{u}/\tilde{w}$ and $v = \tilde{v}/\tilde{w}$.
The product of the first two matrices is typically denoted by the symbol $K$, and we refer to its elements as the intrinsic parameters. All the numbers in these two matrices are functions of the camera itself. It does not matter where the camera is or where it is pointing; they are only a function of the camera. These numbers include the height and width of the pixels on the image plane, the coordinates of the principal point, and the focal length of the lens.
The third matrix describes the extrinsic parameters, which capture where the camera is and how it is oriented. It says nothing about the parameters of the camera itself.
The product of these matrices is called the camera matrix, $C$, so that $\tilde{p} = C\,\tilde{P}$.
The camera matrix can also be found from a calibration procedure, which may be needed if some parameters are not known or if the lens causes any distortion to the image.
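When all of the parameters are known, the camera matrix can be assembled directly in base MATLAB. The sketch below assumes the common factorization into a pixel-scaling matrix, a projection matrix, and the inverse of the camera pose; every numeric value and variable name is hypothetical.

```matlab
% Build a camera matrix from intrinsic and extrinsic parameters.
% All values are hypothetical.
f = 0.008;                          % focal length (m)
rho_w = 10e-6;  rho_h = 10e-6;      % pixel width and height (m)
u0 = 640;  v0 = 512;                % principal point (pixels)
R = eye(3);                         % camera axes aligned with world axes
t = [0.1; 0; 0];                    % camera is 0.1 m along the world x axis

pixel_scale = [1/rho_w 0 u0; 0 1/rho_h v0; 0 0 1];
projection  = [f 0 0 0; 0 f 0 0; 0 0 1 0];
K = pixel_scale * projection;       % intrinsic parameters
T = [R t; 0 0 0 1];                 % camera pose in the world frame
C = K / T;                          % camera matrix, same as K * inv(T)

P = [0.2; 0.1; 2];                  % world point (m)
p_tilde = C * [P; 1];               % homogeneous image point
p = p_tilde(1:2) / p_tilde(3)       % pixel coordinates (u, v)
```

With these values the point lands near pixel $(680, 552)$, which matches a hand calculation of $u = (f/\rho_w)(X - 0.1)/Z + u_0$ and $v = (f/\rho_h)\,Y/Z + v_0$.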
Note
The value of $C_{3,4}$ is often set to 1, in which case $C$ is said to be a normalized camera matrix.
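Because homogeneous coordinates are scale invariant, dividing a camera matrix by its last element does not change where points project. The sketch below assumes a hypothetical camera matrix whose element in row 3, column 4 is nonzero.

```matlab
% Normalize a camera matrix so that C(3,4) = 1.
% C and P_tilde are hypothetical values.
C = [800   0  640  -80;
       0 800  512    0;
       0   0    1    2];
C_norm = C / C(3,4);                  % normalized camera matrix

P_tilde = [0.2; 0.1; 2; 1];           % homogeneous world point
h2e = @(pt) pt(1:2) / pt(3);          % homogeneous to Euclidean

p_original   = h2e(C * P_tilde)       % projection with C
p_normalized = h2e(C_norm * P_tilde)  % same pixel location
```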
The MVTB has objects that model the projection of points onto an image plane of cameras with known parameters.
Example
Given a camera matrix and point in the world as below, find where the point will be in an image.
>> C
C =
   512  -110     1   800
   512   512  -100  1600
     1     1     0     0
>> P
P =
    20
    30
    60
>> P_tilde = [P;1]   % homogeneous coordinates
P_tilde =
    20
    30
    60
     1
>> p_tilde = C*P_tilde
p_tilde =
        7800
       21200
          50
>> p = p_tilde(1:2)/p_tilde(3)   % Cartesian coordinates
p =
   156
   424
The image plane point is at $(156, 424)$.