Finding Points in 3D Camera Coordinates from Points on Image Plane


When a camera views an object on flat ground, the bottom point $P$ of the object is projected onto the camera image plane. The camera coordinate system here is right-handed: the view vector is the $z$-axis, and the $x$-axis points outwards from the screen. Let $\phi\in[0, \frac \pi 2]$ be the tilt angle of the camera; the view vector is parallel to the ground when $\phi=0$ and intersects the ground at a point when $\phi\neq 0$. Lastly, $H$ is the camera height in meters.

1. The ground plane equation

If the $y$-axis of the camera is rotated by $\phi$ around its $x$-axis, it becomes parallel to the normal vector $n$ of the ground. Therefore, $-n$ is

$$\begin{aligned} \left(\begin{array}{ccc} 1 & 0 & 0 \\ 0 & \cos\phi & -\sin\phi \\ 0 & \sin\phi & \cos\phi \end{array}\right) \left(\begin{array}{c} 0 \\ 1 \\ 0 \end{array}\right) = \left(\begin{array}{c} 0 \\ \cos\phi \\ \sin\phi \end{array}\right) = -n \end{aligned}$$

So $n$ is $(0, -\cos\phi, -\sin\phi)^t$. The ground plane is therefore the plane whose normal vector is $n$ and whose distance from the camera origin is $H$, so the ground plane equation is $H-y\cos\phi-z\sin\phi=0$.
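The plane equation can be sanity-checked numerically. The sketch below is only an illustration: the function names and the sample values (a 1.5 m camera height and a 30 degree tilt) are assumptions, not part of the derivation. It builds $n$ from the rotated $y$-axis and confirms that the point directly below the camera satisfies the equation.

```python
import math

def ground_normal(phi):
    """Return n = -(R_x(phi) applied to (0, 1, 0)) = (0, -cos(phi), -sin(phi))."""
    # R_x(phi) applied to the y-axis is the second column of the rotation matrix.
    rotated_y = (0.0, math.cos(phi), math.sin(phi))
    return tuple(-c for c in rotated_y)

def ground_plane_residual(point, phi, H):
    """Evaluate H - y*cos(phi) - z*sin(phi); zero means the point is on the ground."""
    x, y, z = point
    return H - y * math.cos(phi) - z * math.sin(phi)

# Hypothetical values: camera 1.5 m above the ground, tilted by 30 degrees.
phi, H = math.radians(30.0), 1.5
n = ground_normal(phi)
# The point directly below the camera is the origin moved by H along -n.
foot = tuple(-H * c for c in n)
print(ground_plane_residual(foot, phi, H))  # close to 0
```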

2. Finding $P$

$P(x, y, z)$ is the object point on the ground plane, so it satisfies the ground plane equation. Let $P'(u, v, 1)$ be the corresponding point on the image plane and $K$ the intrinsic matrix of the camera. Then $P'$ can be written as

$$\begin{aligned} KP = \left(\begin{array}{ccc} f & 0 & \frac w 2 \\ 0 & f & \frac h 2 \\ 0 & 0 & 1 \end{array}\right) \left(\begin{array}{c} x \\ y \\ z \end{array}\right) = \left(\begin{array}{c} fx + \frac{wz}{2} \\ fy + \frac{hz}{2} \\ z \end{array}\right) \cong \left(\begin{array}{c} \frac{fx}{z} + \frac w 2 \\ \frac{fy}{z} + \frac h 2 \\ 1 \end{array}\right) = \left(\begin{array}{c} u \\ v \\ 1 \end{array}\right) \end{aligned}$$

where $w$ and $h$ are the width and height of the camera image and $f$ is the focal length in pixels. It means $P\left((u-\frac w 2)\frac z f, (v-\frac h 2)\frac z f, z\right)$, where the $z$ component of $P$ is $z=\frac{H-y\cos\phi}{\sin\phi}$ $(\phi\neq 0)$ from the ground plane equation. Consequently, $P$ is as follows.

$$\begin{aligned} y &= \left(v-\frac h 2\right)\frac{H-y\cos\phi}{f \sin\phi} \implies y = \boxed{\frac{\left(v-\frac h 2\right)H}{f \sin\phi+\left(v-\frac h 2\right)\cos\phi}} \\ x &= \left(u-\frac w 2\right)\frac{H-y\cos\phi}{f \sin\phi} \\ &= \left(u-\frac w 2\right)\frac{1}{f \sin\phi} \left(H-\frac{\left(v-\frac h 2\right)H\cos\phi}{f \sin\phi+\left(v-\frac h 2\right)\cos\phi}\right) \\ &= \boxed{\frac{\left(u-\frac w 2\right)H}{f \sin\phi+\left(v-\frac h 2\right)\cos\phi}} \\ z &= \frac{1}{\sin\phi} \left(H-\frac{\left(v-\frac h 2\right)H\cos\phi}{f \sin\phi+\left(v-\frac h 2\right)\cos\phi}\right) = \boxed{\frac{fH}{f \sin\phi+\left(v-\frac h 2\right)\cos\phi}} \end{aligned}$$
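The boxed formulas translate almost directly into code. The following is a minimal sketch, not a reference implementation: the function names, the sample camera parameters, and the decision to reject pixels whose denominator is not positive are all assumptions made for illustration. A reprojection through $K$ serves as a round-trip check.

```python
import math

def pixel_to_ground(u, v, f, w, h, H, phi):
    """Back-project pixel (u, v) to the ground point P = (x, y, z) in camera coordinates."""
    du = u - w / 2.0
    dv = v - h / 2.0
    # Common denominator f*sin(phi) + (v - h/2)*cos(phi) of the boxed formulas.
    denom = f * math.sin(phi) + dv * math.cos(phi)
    if denom <= 0.0:
        # Assumption of this sketch: such pixels lie on or above the horizon,
        # so their rays never reach the ground in front of the camera.
        raise ValueError("pixel does not see the ground")
    scale = H / denom
    return du * scale, dv * scale, f * scale

def project(point, f, w, h):
    """Apply K to a camera-space point and dehomogenize to (u, v)."""
    x, y, z = point
    return f * x / z + w / 2.0, f * y / z + h / 2.0

# Hypothetical setup: 1280x720 image, f = 800 px, camera 1.5 m high, tilted 20 degrees.
f, w, h, H, phi = 800.0, 1280, 720, 1.5, math.radians(20.0)
P = pixel_to_ground(900.0, 500.0, f, w, h, H, phi)
print(P)                    # ground point in meters
print(project(P, f, w, h))  # reprojects to approximately (900.0, 500.0)
```

Computing the shared factor `H / denom` once keeps the three components consistent with each other and makes the single failure condition explicit.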

What if $\phi=0$?

When $\phi=0$, $z=\frac{H-y\cos\phi}{\sin\phi}$ is not defined because its denominator is zero. Instead, the ground plane equation gives $y=H$ in this case. It also implies that $\left(v-\frac h 2 \right)\frac z f=H$ and $z=\frac{fH}{v - \frac h 2}$. Moreover, $x=\frac{\left(u-\frac w 2 \right)H}{v - \frac h 2}$ from $x=\left(u-\frac w 2\right)\frac z f$. Note that this result is the same as the boxed $P$ above evaluated at $\phi=0$.

What if $\phi=0$ and $v=\frac h 2$?

When $\phi=0$ and $v=\frac h 2$, $P$ is not defined because the denominator $v-\frac h 2$ vanishes and $z$ goes to infinity. This makes sense: if the tilt angle is zero, the view vector is parallel to the ground, so there is no intersection between them, or they meet at infinity.
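The untilted case can be compared against the general formulas in a few lines. This is a sketch under stated assumptions: the function names and camera values are hypothetical, and returning `None` for rows on or above $v=\frac h 2$ is a choice made here, not something the derivation prescribes.

```python
import math

def pixel_to_ground_level(u, v, f, w, h, H):
    """Ground point for phi = 0, or None on or above the horizon row v = h/2."""
    dv = v - h / 2.0
    if dv <= 0.0:
        # Assumption: rows at or above the image center never meet the ground.
        return None
    return (u - w / 2.0) * H / dv, H, f * H / dv

def pixel_to_ground_general(u, v, f, w, h, H, phi):
    """General formulas for P, evaluated without any special-casing."""
    dv = v - h / 2.0
    denom = f * math.sin(phi) + dv * math.cos(phi)
    return (u - w / 2.0) * H / denom, dv * H / denom, f * H / denom

# Hypothetical setup: 1280x720 image, f = 800 px, camera 1.5 m above the ground.
f, w, h, H = 800.0, 1280, 720, 1.5
level = pixel_to_ground_level(900.0, 500.0, f, w, h, H)
general = pixel_to_ground_general(900.0, 500.0, f, w, h, H, 0.0)
print(level, general)                                   # the two results agree
print(pixel_to_ground_level(900.0, 360.0, f, w, h, H))  # None: horizon row
```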

3. Finding a point above $P$

Until now, $P$ has always been on the ground plane. But objects are not always on the ground; they can be above it. In this case, the height of the point above the ground must be known. Let $r$ be that height; then one can imagine the ground plane raised by $r$. The point then lies on a new ground plane whose distance from the camera origin is $H-r$, not $H$. Therefore, $P$ above the ground becomes:

$$\begin{aligned} x &= \frac{\left(u-\frac w 2\right) \left(H-r\right)}{f \sin\phi+\left(v-\frac h 2\right)\cos\phi} \\ y &= \frac{\left(v-\frac h 2\right) \left(H-r\right)}{f \sin\phi+\left(v-\frac h 2\right)\cos\phi} \\ z &= \frac{f \left(H-r\right)}{f \sin\phi+\left(v-\frac h 2\right)\cos\phi} \end{aligned}$$
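A short sketch of the raised-plane idea follows. The function names and sample values are hypothetical, and the code assumes the pixel's ray actually reaches the raised plane (positive denominator). It checks that the recovered point lies on the plane at distance $H-r$ from the camera origin.

```python
import math

def pixel_to_point_at_height(u, v, f, w, h, H, phi, r):
    """Back-project pixel (u, v) onto the plane r meters above the ground."""
    du = u - w / 2.0
    dv = v - h / 2.0
    denom = f * math.sin(phi) + dv * math.cos(phi)
    # Same formulas as for the ground, with H replaced by H - r.
    scale = (H - r) / denom
    return du * scale, dv * scale, f * scale

def raised_plane_residual(point, phi, H, r):
    """Evaluate (H - r) - y*cos(phi) - z*sin(phi); zero means the point is r above the ground."""
    x, y, z = point
    return (H - r) - y * math.cos(phi) - z * math.sin(phi)

# Hypothetical setup: camera 1.5 m high, tilted 20 degrees, point 0.4 m above the ground.
f, w, h, H, phi, r = 800.0, 1280, 720, 1.5, math.radians(20.0), 0.4
Q = pixel_to_point_at_height(900.0, 500.0, f, w, h, H, phi, r)
print(Q)
print(raised_plane_residual(Q, phi, H, r))  # close to 0
```

Setting `r = 0` reduces the function to the ground-plane case, which is a convenient way to test both with one code path.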

4. Accuracy on the image plane


Adjacent pixels on the image plane may correspond to ground points that are not close to each other. In other words, points that are close on the image plane can be far apart in camera coordinates. Let $\left(u_1, v_1, 1\right)^t$ and $\left(u_2, v_2, 1\right)^t$ be adjacent pixels, and let $z_1$ and $z_2$ be the $z$ components of the 3D ground points projected onto them.

$$\begin{aligned} z_1 &= \frac{fH}{f \sin\phi+\left(v_1-\frac h 2\right)\cos\phi} \implies v_1 = f \left(\frac{H}{z_1 \cos\phi}-\tan\phi\right) + \frac h 2 \\ z_2 &= \frac{fH}{f \sin\phi+\left(v_2-\frac h 2\right)\cos\phi} \implies v_2 = f \left(\frac{H}{z_2 \cos\phi}-\tan\phi\right) + \frac h 2 \end{aligned}$$

Since $v_1$ and $v_2$ are adjacent rows, $v_2-v_1=1$.

$$\begin{aligned} v_2-v_1 &= \frac{fH}{\cos\phi} \left(\frac{1}{z_2} - \frac{1}{z_1} \right)=1 \implies \frac{1}{z_2} - \frac{1}{z_1} = \frac{\cos\phi}{fH} \end{aligned}$$

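Equivalently, $z_1-z_2 = z_1 z_2\frac{\cos\phi}{fH}$. For ground points at a given depth, this implies that the larger $\phi$, $f$, and $H$ are, the smaller $\vert z_1-z_2 \vert$ is, and hence the smaller the distance between the two 3D points projected onto $\left(u_1, v_1, 1\right)^t$ and $\left(u_2, v_2, 1\right)^t$. The factor $z_1 z_2$ also shows that the gap grows roughly with the square of the depth.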
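To get a feel for the magnitudes, the depth gap between two vertically adjacent rows can be tabulated. The sketch below uses hypothetical camera parameters and row choices; it also checks the identity $\frac{1}{z_2}-\frac{1}{z_1}=\frac{\cos\phi}{fH}$ numerically for each pair of rows.

```python
import math

def depth_at_row(v, f, h, H, phi):
    """z component of the ground point seen at image row v."""
    return f * H / (f * math.sin(phi) + (v - h / 2.0) * math.cos(phi))

def depth_gap(v, f, h, H, phi):
    """Depth difference z1 - z2 between rows v1 = v and v2 = v + 1."""
    return depth_at_row(v, f, h, H, phi) - depth_at_row(v + 1, f, h, H, phi)

# Hypothetical setup: 720-row image, f = 800 px, camera 1.5 m high, tilted 20 degrees.
f, h, H, phi = 800.0, 720, 1.5, math.radians(20.0)
expected = math.cos(phi) / (f * H)
for v in (100, 300, 700):  # rows from near the horizon down to near the image bottom
    z1 = depth_at_row(v, f, h, H, phi)
    z2 = depth_at_row(v + 1, f, h, H, phi)
    # The reciprocal-depth difference should match cos(phi) / (f * H) for every row.
    print(v, z1, depth_gap(v, f, h, H, phi), (1.0 / z2 - 1.0 / z1) - expected)
```

With these values the gap between adjacent rows is much larger for rows near the horizon than for rows near the bottom of the image, which is the accuracy loss described above.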
