# ECE 661: Computer Vision

## Other Course Material (from handouts and lecture)

### Camera Matrix

$P= \begin{bmatrix} \vec{p_1} & \vec{p_2} & \vec{p_3} & \vec{p_4} \end{bmatrix}$ where each of $\vec{p_1}$ through $\vec{p_3}$ is the image of the point at infinity along the $\vec{x}$ through $\vec{z}$ direction, and $\vec{p_4}$ is the image of the world origin.

$\vec{x}=K R \begin{bmatrix} I \Big| -\vec{\tilde{C}} \end{bmatrix} \vec{X}$

$\vec{x}=M \begin{bmatrix} I \big| M^{-1} \vec{p_4} \end{bmatrix} \vec{X}$ where $M = KR$ and $\vec{p_4} = -M\vec{\tilde{C}}$
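
A quick numeric check of this decomposition (a sketch with hypothetical values, not from the handouts): the camera centre can be recovered from $P$ as $\vec{\tilde{C}} = -M^{-1}\vec{p_4}$.

```python
import numpy as np

# Sketch: recover the camera centre from P = [M | p4], where M is the
# left 3x3 block and p4 the last column, so C~ = -M^{-1} p4.
def camera_center(P):
    M, p4 = P[:, :3], P[:, 3]
    return -np.linalg.solve(M, p4)

# Hypothetical example: K = R = I and a camera at (1, 2, 3).
C = np.array([1.0, 2.0, 3.0])
P = np.hstack([np.eye(3), -C[:, None]])
print(camera_center(P))  # recovers (1, 2, 3)
```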

#### Affine Camera

The last row of $P$ is $(0, 0, 0, 1)$

#### Orthographic Camera

Always positioned at infinity $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}$

### Stratified reconstruction hierarchy

See section 2.4: "A hierarchy of transformations" and section 3.4: "The hierarchy of transformations"

1. projectivity
• Straight lines go to straight lines
• planar projectivity has 8 DOF
• 3-space projectivity has 15 DOF (every element of the $4 \times 4$ matrix except for the scale factor)
2. affinity
• parallel lines go to parallel lines
• lines at infinity stay at infinity ("what happens at infinity stays at infinity")
• planar affinity has 6 DOF (4 for arbitrary upper-left matrix, 2 for translation)
• 3-space affinity has 12 DOF (9 for arbitrary upper-left matrix, 3 for translation)
3. similarity (metric reconstruction, aka equi-form)
• preserves right angles
• planar similarity has 4 DOF (an additional DOF over isometry to account for isotropic scaling)
• 3-space similarity has 7 DOF (an additional DOF over isometry to account for isotropic scaling)
4. isometry (euclidean reconstruction)
• preserves euclidean distance
• planar isometry has 3 DOF (1 for rotation and 2 for translation)
• 3-space isometry has 6 DOF (3 for rotation and 3 for translation)

### Conics

#### Dual conic

$C^* = C^{-1}$ (up to scale, when $C$ has full rank)

#### Polar lines

A point $x$ and a conic $C$ define a line $\textbf{l} = C x$.
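
A small sanity check with hypothetical numbers: for a point lying on the conic, the polar line $\textbf{l} = Cx$ is the tangent at that point.

```python
import numpy as np

# Sketch: polar line of a point w.r.t. the unit circle x^2 + y^2 = 1,
# which in matrix form is C = diag(1, 1, -1).
C = np.diag([1.0, 1.0, -1.0])
x = np.array([1.0, 0.0, 1.0])  # the point (1, 0), which lies on the conic
l = C @ x                      # polar line; here the tangent at (1, 0)
print(l)                       # (1, 0, -1), i.e. the vertical line x = 1
```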

### Epipolar geometry

• $F$ is rank 2 with 7 DOF
• $x'^T F x = 0$
• "Most fundamental relationship"
• $l' = F x$
• $l = F^T x'$
• $F e = 0$
• $e'^T F = 0$ (equivalently, $F^T e' = 0$)
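
These null-space relations can be checked numerically. A minimal sketch with a hypothetical rank-2 matrix standing in for $F$ (the epipoles fall out of the SVD):

```python
import numpy as np

# Sketch: the epipoles are the null vectors of F and F^T,
# i.e. the singular vectors for the zero singular value.
def epipoles(F):
    U, S, Vt = np.linalg.svd(F)
    e = Vt[-1]          # right null vector:  F e   = 0
    e_prime = U[:, -1]  # left null vector:   e'^T F = 0
    return e, e_prime

# Hypothetical rank-2 matrix whose right epipole is (0, 0, 1) by construction.
F = np.array([[1.0, 2.0, 0.0],
              [3.0, 4.0, 0.0],
              [5.0, 6.0, 0.0]])
e, e_prime = epipoles(F)
print(F @ e)        # ~ 0
print(e_prime @ F)  # ~ 0
```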

### RANSAC

What is the expression for the probability of getting at least one trial with no outliers given $N$ trials?

• Let $\epsilon$ be the probability that a data element is an outlier
• Let $\omega = 1 - \epsilon$ be the probability that a data element is an inlier
• Let $s$ be the minimum number of data points needed for constructing an estimate
• Probability that all selected elements are inliers: $\omega^s$
• Probability that at least one element is an outlier: $1 - \omega^s$
• Probability all $N$ trials suffer from corrupted estimates: $(1 - \omega^s)^N$
• Probability that at least one trial has no outliers: $\Phi = 1 - (1 - \omega^s)^N = 1 - \left (1 - (1 - \epsilon)^s \right )^N$
• $\epsilon$ depends on the data (usually chosen empirically), $s$ depends on what entity is being estimated
• We try to choose $N$ such that $\Phi \ge 0.99$
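
The required $N$ follows directly by solving $1 - (1 - \omega^s)^N \ge \Phi$ for $N$; a minimal sketch:

```python
import math

# Sketch: smallest N such that 1 - (1 - (1 - eps)^s)^N >= phi.
def ransac_trials(eps, s, phi=0.99):
    w = 1.0 - eps  # inlier probability
    return math.ceil(math.log(1.0 - phi) / math.log(1.0 - w ** s))

# e.g. homography estimation (s = 4 point pairs) with 30% outliers
print(ransac_trials(eps=0.3, s=4))  # → 17
```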

### Binary Images

#### Thresholding (Discriminant Analysis) - Otsu's algorithm

within-class variance
$\sigma_W^2 = \omega_0 \sigma_0^2 + \omega_1 \sigma_1^2$
between-class variance
$\sigma_B^2 = \omega_0 (\mu_0 - \mu_T)^2 + \omega_1 (\mu_1 - \mu_T)^2 = \omega_0 \omega_1 (\mu_1 - \mu_0)^2$
total variance
$\sigma_T^2=\sum_{i=1}^L (i-\mu_T)^2 p_i$
• We wish to maximize the ratio of $\sigma_B^2$ to $\sigma_W^2$.
• A fast implementation uses aggregate results from the previous bin to calculate the next bin.
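
A sketch of the brute-force version on synthetic data (the fast variant mentioned above would update $\omega$ and $\mu$ incrementally rather than re-summing per bin):

```python
import numpy as np

# Sketch of Otsu's method: for each candidate threshold t, split the
# histogram into two classes and maximize the between-class variance
# w0 * w1 * (mu1 - mu0)^2.
def otsu_threshold(image, levels=256):
    hist = np.bincount(image.ravel(), minlength=levels).astype(float)
    p = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, levels):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * p[:t]).sum() / w0
        mu1 = (np.arange(t, levels) * p[t:]).sum() / w1
        var_b = w0 * w1 * (mu1 - mu0) ** 2
        if var_b > best_var:
            best_t, best_var = t, var_b
    return best_t

# Hypothetical bimodal image: dark mode near 50, bright mode near 200.
rng = np.random.default_rng(0)
img = np.concatenate([rng.normal(50, 10, 500), rng.normal(200, 10, 500)])
img = np.clip(img, 0, 255).astype(np.uint8)
t = otsu_threshold(img)
print(t)  # lands between the two modes
```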

### Corner detection

• Why use corners as features to track across image sequences?
• edges suffer from the aperture problem: motion along an edge is locally unobservable, while a corner constrains motion in both directions

Geometric interpretation of eigenvectors of $C$:

• eigenvectors encode edge directions
• eigenvalues encode edge strength
• $\lambda_1 = \lambda_2 = 0$: uniform gray value, and $C$ is a null matrix
• $\lambda_1 > 0$ and $\lambda_2 = 0$: $C$ is rank-deficient, we have an edge
• $\lambda_1 \ge \lambda_2 > 0$, we have a corner.
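
The eigenvalue cases above can be sketched as a small classifier (the threshold and the window matrices below are hypothetical, not from the lecture):

```python
import numpy as np

# Sketch: classify a neighborhood from the eigenvalues of the structure
# tensor C (sums of Ix^2, Ix*Iy, Iy^2 over the window).
def classify(C, tau=1e-6):
    l2, l1 = np.linalg.eigvalsh(C)  # ascending: l1 >= l2 >= 0
    if l1 < tau:
        return "flat"    # lambda1 = lambda2 = 0: uniform gray value
    if l2 < tau:
        return "edge"    # one strong direction: C is rank-deficient
    return "corner"      # two strong directions

# Hypothetical windows:
print(classify(np.array([[10.0, 0.0], [0.0, 0.0]])))  # edge
print(classify(np.array([[10.0, 0.0], [0.0, 8.0]])))  # corner
print(classify(np.zeros((2, 2))))                     # flat
```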

### Edge Finding

#### Roberts Operator

$\begin{bmatrix} 0 & 1 \\ -1 & 0 \\ \end{bmatrix}$

$\begin{bmatrix} 1 & 0 \\ 0 & -1 \\ \end{bmatrix}$
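
Applying the two kernels amounts to taking diagonal pixel differences; a sketch on a hypothetical step edge:

```python
import numpy as np

# Sketch: the two 2x2 Roberts kernels via array slicing, combined into
# a gradient magnitude.
def roberts(img):
    img = img.astype(float)
    g1 = img[:-1, 1:] - img[1:, :-1]   # kernel [[0, 1], [-1, 0]]
    g2 = img[:-1, :-1] - img[1:, 1:]   # kernel [[1, 0], [0, -1]]
    return np.hypot(g1, g2)

# Hypothetical image: left half 0, right half 1 (a vertical step edge).
img = np.zeros((4, 4))
img[:, 2:] = 1.0
mag = roberts(img)
print(mag)  # responses concentrated along the step
```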

#### Canny Edge Detector

Optimality criterion

• Good detection
  • minimize false positives (noise)
  • minimize false negatives (don't miss real edges)
• Good localization
  • detected edges should be as close as possible to true edges
• Single response constraint
  • return only one point for each true edge point (non-maximum suppression; hysteresis thresholding then links strong and weak edges)

• Increasing filter size improves detection at the expense of localization

#### Chain codes / Crack codes

• 2 bits (3 bits) encode direction for 4-connectedness (8-connectedness)
• Crack codes fall on the "cracks" between pixels; pixels are not interpreted as part of the boundary
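
A sketch of 4-connected chain coding, under an assumed direction labeling (0:+x, 1:+y, 2:-x, 3:-y; conventions vary):

```python
# Sketch: 4-connected chain code -- 2 bits per step encode the direction
# between successive boundary pixels.
DIRS = {(1, 0): 0, (0, 1): 1, (-1, 0): 2, (0, -1): 3}

def chain_code(boundary):
    return [DIRS[(x2 - x1, y2 - y1)]
            for (x1, y1), (x2, y2) in zip(boundary, boundary[1:])]

# Hypothetical boundary: walk around a unit square, counter-clockwise.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(chain_code(square))  # → [0, 1, 2, 3]
```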

### Graph Cuts

Recall, "Rayleigh quotient": $\frac{x^T \textbf{A} x}{x^T x}$

Minimizing the normalized cut leads to the generalized eigenproblem $(\textbf{D} - \textbf{W}) y = \lambda \textbf{D} y$, i.e. minimizing $\frac{y^T (\textbf{D} - \textbf{W}) y}{y^T \textbf{D} y}$. The smallest eigenvalue is $0$ with a trivial constant eigenvector, so the eigenvector of the second-smallest eigenvalue is taken as the cut indicator. It is a real-valued relaxation: constraining each $y_i$ to one of two discrete values makes the exact problem NP-hard.
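
A sketch of the relaxed solution on a hypothetical toy graph (the symmetric normalization $\textbf{D}^{-1/2}(\textbf{D}-\textbf{W})\textbf{D}^{-1/2}$ is one standard way to solve the generalized problem):

```python
import numpy as np

# Sketch: second-smallest generalized eigenvector of (D - W) y = lambda D y,
# computed via the symmetric matrix D^{-1/2} (D - W) D^{-1/2}.
def ncut_vector(W):
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = D_inv_sqrt @ (np.diag(d) - W) @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L_sym)  # ascending eigenvalues
    return D_inv_sqrt @ vecs[:, 1]      # skip the trivial constant cut

# Hypothetical graph: two tight triangles joined by one weak edge.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1                 # the weak bridge
y = ncut_vector(W)
print(np.sign(y))  # the sign pattern separates {0,1,2} from {3,4,5}
```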

### Machine Learning/Class Discrimination

#### Entropy

$H(x)=-\sum_{i=1}^n p_i \log_2(p_i)$

#### Conditional Entropy

$H(Y|X) = H(X,Y) - H(X)$
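
The chain rule above can be verified on a small hypothetical joint distribution:

```python
import numpy as np

# Sketch: check H(Y|X) = H(X,Y) - H(X) on a hypothetical joint p(x, y).
def H(p):
    p = p[p > 0]                    # 0 * log 0 is taken as 0
    return -(p * np.log2(p)).sum()

pxy = np.array([[0.25, 0.25],
                [0.50, 0.00]])      # joint distribution over (X, Y)
px = pxy.sum(axis=1)                # marginal of X

H_joint = H(pxy.ravel())            # H(X, Y) = 1.5 bits
H_x = H(px)                         # H(X)    = 1.0 bit
print(H_joint - H_x)                # H(Y|X)  → 0.5
```

Direct check: given $X=0$, $Y$ is uniform ($1$ bit, weight $0.5$); given $X=1$, $Y$ is deterministic ($0$ bits), so $H(Y|X) = 0.5$.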

## Fall 2006 Midterm

### 1

Given the identity $l \cdot (l\times l') = 0$ and the fact that $x = l \times l'$, we know $l^T x = 0$.
Therefore $x$ is on $l$.

Similarly, $l'\cdot(l' \times l) = l'^T x = 0$. Thus $x$ is also on $l'$.

Since $x$ lies on both lines it must be the point of intersection.

### 2

Given the two identities $x \cdot (x \times x') = 0$ and $x' \cdot (x \times x') = 0$ and the line $l = x \times x'$ (note $x' \times x = -l$ is the same line up to scale),

$l^T x = l^T x' = 0$

Thus $l$ passes through both $x$ and $x'$ and is therefore the line joining the two points.

### 3

Start with $l' = H^{-T}l$:
$l'^T = l^T H^{-1}$ (take the transpose of both sides)
$l'^Tx' = l^TH^{-1}x'$ (post-multiply both sides by $x'$)
$l'^Tx' = l^TH^{-1}Hx$ (substitute $x' = Hx$)
$l'^Tx' = l^Tx = 0$
$\therefore$ true

Note that the previous exam asks you to prove a false statement.

Given $x = Hx' \implies x' = H^{-1}x$, test whether $l' = H^{-T}l$ still holds:

$l'^T = l^T H^{-1}$
$l'^Tx' = l^TH^{-1}x'$
$l'^Tx' = l^TH^{-1}H^{-1}x$

Since $H^{-1}H^{-1} \neq I$ in general, $l'^Tx'$ does not reduce to $l^Tx = 0$, so $x'$ need not lie on $l'$.
$\therefore$ false (under this point map, lines instead transform as $l' = H^{T}l$)

### 6

#### Part (a)

$L^* = PQ^\textrm{T} - QP^\textrm{T}$

$\pi = L^*X$

$\pi^\textrm{T} = (L^*X)^\textrm{T} = X^\textrm{T}L^{*T}$

\begin{align} \pi^\textrm{T} X & = X^\textrm{T}L^{*\textrm{T}}X \\ & = -X^\textrm{T}L^* X \\ & = -X^\textrm{T}(PQ^\textrm{T} - QP^\textrm{T})X \\ & = -X^\textrm{T}PQ^\textrm{T}X + X^\textrm{T}QP^\textrm{T}X \end{align}

Each term is a product of the same two scalars, since $X^\textrm{T}P = P^\textrm{T}X$ and $X^\textrm{T}Q = Q^\textrm{T}X$:

$\pi^\textrm{T}X = -(X^\textrm{T}P)(Q^\textrm{T}X) + (X^\textrm{T}Q)(P^\textrm{T}X) = 0$

Therefore $X$ lies on the plane $\pi = L^*X$.

#### Part (b)

$L = AB^\textrm{T} - BA^\textrm{T}$

$X = L\pi$

$X^\textrm{T} = (L\pi)^\textrm{T} = \pi^\textrm{T}L^\textrm{T}$

\begin{align} X^\textrm{T}\pi & = \pi^\textrm{T}L^\textrm{T}\pi \\ & = -\pi^\textrm{T}L\pi \\ & = -\pi^\textrm{T}(AB^\textrm{T} - BA^\textrm{T})\pi \\ & = -\pi^\textrm{T}AB^\textrm{T}\pi + \pi^\textrm{T}BA^\textrm{T}\pi \end{align}

Each term is a product of the same two scalars, since $\pi^\textrm{T}A = A^\textrm{T}\pi$ and $\pi^\textrm{T}B = B^\textrm{T}\pi$:

$X^\textrm{T}\pi = -(\pi^\textrm{T}A)(B^\textrm{T}\pi) + (\pi^\textrm{T}B)(A^\textrm{T}\pi) = 0$

Therefore $X = L\pi$ lies on $\pi$: it is the point where the line $L$ meets the plane $\pi$.

### 8

If the brightness gradients in the $x$ and $y$ directions are thought of as random variables, then $C$ is a scaled version of their covariance matrix.

The eigenvectors of a covariance matrix form the orthogonal basis that yields the highest variance (and, for a Gaussian, the highest entropy) along the axes.

### 9

$P= \begin{bmatrix} \vec{p^1}^T \\ \vec{p^2}^T \\ \vec{p^3}^T \\ \end{bmatrix}$

Each row $p^i$ represents a plane.

$\vec{p^i}^T \textbf{X} = 0$ means that point $\textbf{X}$ lies on plane $p^i$. Thus $\vec{p^3}^T \textbf{X} = 0$ means that $\textbf{X}$ lies on the principal plane, and lying on the planes $p^1$ or $p^2$ means that the projected point $x$ will lie on the $\hat y$ or $\hat x$ image axis, respectively.

### 10

A world point lying on the principal axis projects to the principal point (the image coordinate $(0,0)$ only when the principal point is at the origin). Does this help? Consult p. 158-159

## Fall 2006 Final

### 2

• If $\textbf{X}$ is on $\pi$, then $\textbf{X}^T \pi = 0$.
• If $x$ is on $\textbf{l}$, then $x^T \textbf{l} = 0$.

\begin{align} \textbf{X}^T \pi & = \textbf{X}^T \left ( \textbf{P}^T \textbf{l} \right ) \\ & = \left ( \textbf{P} \textbf{X} \right )^T \textbf{l} \\ & = x^T \textbf{l} \\ \end{align}

### 3

#### Part (a)

Given camera matrix $\vec{x}=K R \begin{bmatrix} I \big| {-\vec{\tilde{C}}} \end{bmatrix} \vec{X}$, a world point $\textbf{X}_\infty = \begin{bmatrix} \textbf{d}^T & 0 \end{bmatrix}^T$, maps as...

\begin{align} x & = \textbf{P} \textbf{X}_\infty \\ & = K R \begin{bmatrix} I \big| {-\vec{\tilde{C}}} \end{bmatrix} \begin{bmatrix} d_1 \\ d_2 \\ d_3 \\ 0 \end{bmatrix} \\ & = K R \textbf{d} \\ & = \textbf{H} \textbf{d} \end{align}

#### Part (b)

Conics transform as $C' = \textbf{H}^{-T} C \textbf{H}^{-1}$. Therefore, the IAC

\begin{align} \omega &= \textbf{H}^{-T} \Omega_\infty \textbf{H}^{-1} \\ &= \textbf{H}^{-T} \textbf{I} \textbf{H}^{-1} \\ &= \textbf{H}^{-T} \textbf{H}^{-1} \\ &= \left ( \textbf{K R} \right ) ^{-T} \left ( \textbf{K R} \right )^{-1} \\ &= \left ( \textbf{R}^T \textbf{K}^T \right )^{-1} \left ( \textbf{K R} \right )^{-1} \\ &= \textbf{K}^{-T} \textbf{R}^{-T} \textbf{R}^{-1} \textbf{K}^{-1} \\ &= \textbf{K}^{-T} \textbf{R} \textbf{R}^{-1} \textbf{K}^{-1} \\ &= \textbf{K}^{-T} \textbf{K}^{-1}\\ &= \left ( \textbf{K} \textbf{K}^{T} \right )^{-1} \end{align}

### 4

This is the theoretical justification of Zhang's method for camera calibration.

#### Part (a)

Assume we have a homography $\textbf{H}$ that maps points $x_\pi$ on a probe plane $\pi$ to points $x$ on the image.

The circular points $I, J = \begin{bmatrix} 1 \\ \pm j \\ 0 \end{bmatrix}$ lie on both our probe plane $\pi$ and on the absolute conic $\Omega_\infty$. Lying on $\Omega_\infty$ of course means they are also projected onto the image of the absolute conic (IAC) $\omega$, thus $x_1^T \omega x_1= 0$ and $x_2^T \omega x_2= 0$. The circular points project as

\begin{align} x_1 & = \textbf{H} I = \begin{bmatrix} h_1 & h_2 & h_3 \end{bmatrix} \begin{bmatrix} 1 \\ j \\ 0 \end{bmatrix} = h_1 + j h_2 \\ x_2 & = \textbf{H} J = \begin{bmatrix} h_1 & h_2 & h_3 \end{bmatrix} \begin{bmatrix} 1 \\ -j \\ 0 \end{bmatrix} = h_1 - j h_2 \end{align}.

We can actually ignore $x_2$: substituting our new expression for $x_1$,

\begin{align} x_1^T \omega x_1 &= \left ( h_1 + j h_2 \right )^T \omega \left ( h_1 + j h_2 \right ) \\ &= h_1^T \omega h_1 - h_2^T \omega h_2 + j \left ( h_1^T \omega h_2 + h_2^T \omega h_1 \right ) \\ &= h_1^T \omega h_1 - h_2^T \omega h_2 + 2 j \, h_1^T \omega h_2 \\ &= 0 \end{align}

where the last step uses the symmetry of the conic, $\omega = \omega^T$, so $h_2^T \omega h_1 = h_1^T \omega h_2$. Separating real and imaginary parts gives the two constraints on $\omega$ per homography:

\begin{align} h_1^T \omega h_1 - h_2^T \omega h_2 &= 0 \\ h_1^T \omega h_2 &= 0 \end{align}

### 8

We prove the Projective Reconstruction Theorem.

Say that the correspondence $x \leftrightarrow x'$ derives from the world point $\textbf{X}$ under the camera matrices $\left ( \textbf{P}, \textbf{P}' \right )$ as

\begin{align} x & = \textbf{P} \textbf{X} \\ x' & = \textbf{P}' \textbf{X} \end{align}.

Say we transform space by a general homography matrix $\textbf{H}_{4 \times 4}$ such that $\textbf{X}_0 = \textbf{H} \textbf{X}$.

The cameras then transform as

\begin{align} \textbf{P}_0 & = \textbf{P} \textbf{H}^{-1} \\ \textbf{P}_0' & = \textbf{P}' \textbf{H}^{-1} \end{align}.

$\textbf{P}_0 \textbf{X}_0 = \textbf{P} \textbf{H}^{-1} \textbf{H} \textbf{X} = \textbf{P} \textbf{X} = x$ and likewise with $\textbf{P}_0'$ still get us the same image points.
