#LyX 1.5.4 created this file. For more info see http://www.lyx.org/
\lyxformat 276
\begin_document
\begin_header
\textclass article
\language english
\inputencoding auto
\font_roman default
\font_sans default
\font_typewriter default
\font_default_family default
\font_sc false
\font_osf false
\font_sf_scale 100
\font_tt_scale 100
\graphics default
\paperfontsize default
\spacing single
\papersize default
\use_geometry true
\use_amsmath 1
\use_esint 1
\cite_engine basic
\use_bibtopic false
\paperorientation portrait
\leftmargin 1cm
\topmargin 1cm
\rightmargin 1cm
\bottommargin 3cm
\secnumdepth 3
\tocdepth 3
\paragraph_separation skip
\defskip medskip
\quotes_language english
\papercolumns 2
\papersides 1
\paperpagestyle default
\tracking_changes false
\output_changes false
\author ""
\author ""
\end_header
\begin_body
\begin_layout Section*
Counsel Of Despair
\end_layout
\begin_layout Standard
Vision is inverse graphics in that it tries to invert the 3D to 2D projection.
Unfortunately this is, strictly, mathematically impossible.
Most computer vision problems are not well posed in that:
\end_layout
\begin_layout Itemize
No solution necessarily exists
\end_layout
\begin_layout Itemize
Solutions are not necessarily unique
\end_layout
\begin_layout Itemize
Solutions may not depend continuously on the data
\end_layout
\begin_layout Section*
Technology
\end_layout
\begin_layout Standard
Spatial resolution is determined by density of CCD array elements and lens
properties.
Luminance resolution, the number of distinguishable grey levels, is determined
by the number of bits per pixel resolved by the digitizer and the SNR of
the CCD array.
\end_layout
\begin_layout Standard
Framegrabbers discretize video signals into byte streams.
\end_layout
\begin_layout Section*
Biological Visual Mechanisms
\end_layout
\begin_layout Standard
Typically neurobiological visual principles inform approaches to machine
vision.
\end_layout
\begin_layout Standard
Neural activity is fundamentally asynchronous and it is rarely possible
to distinguish processing from communication.
\end_layout
\begin_layout Standard
The eye consists of 120 million photoreceptors, of which 6 million are
cones, arranged in regular hexagonal lattices.
Signal flows in the eye occur both longitudinally and laterally.
Despite the number of inputs, there are only 1 million
\begin_inset Quotes eld
\end_inset
output channels
\begin_inset Quotes erd
\end_inset
via the optic nerve, so considerable preprocessing occurs before the brain,
which may be summarized as:
\end_layout
\begin_layout Itemize
Image sampling by photoreceptors
\end_layout
\begin_layout Itemize
Centre-surround comparisons implemented by bipolar cells
\end_layout
\begin_layout Itemize
Temporal differentiation by amacrine cells
\end_layout
\begin_layout Itemize
Separate coding of sustained versus transient image information by different
ganglion cells
\end_layout
\begin_layout Itemize
Initial colour separation by opponent processing channels
\end_layout
\begin_layout Standard
Neurons in the retina can be considered as linear operators or filters,
and their behaviour fully understood.
The signal flow travels along the optic nerve, splits at the optic chiasm
and goes via the thalamus.
This
\begin_inset Quotes eld
\end_inset
relay station
\begin_inset Quotes erd
\end_inset
receives three times as many efferent fibres back from the cortex as it
receives afferent fibres from the eyes.
Ocular dominance columns attempt to integrate the signals from the two
eyes in a way suitable for stereoscopic vision, while simultaneously using
orientation columns to detect structures with preferred orientations.
\end_layout
\begin_layout Standard
The retina-based receptive fields of neurons are determined experimentally,
and enjoy 5 degrees of freedom:
\end_layout
\begin_layout Itemize
Position of the field, horizontally and vertically
\end_layout
\begin_layout Itemize
Size of the field
\end_layout
\begin_layout Itemize
Orientation of excitatory/inhibitory boundaries
\end_layout
\begin_layout Itemize
Phase of the receptive field
\end_layout
\begin_layout Standard
The fields may be closely described as Gabor wavelets.
\end_layout
\begin_layout Standard
The representation of the retina in the brain is retinotopic (adjacent points
in the retina project to adjacent points in a cortical map) but there is
a distortion due to magnification of the fovea by the cortical magnification
factor.
It has been proposed that this accomplishes a log-polar projection for
scale and rotation invariance.
\end_layout
\begin_layout Section*
Mathematical Operations
\end_layout
\begin_layout Standard
Any image can be represented by a linear combination of basis functions
by
\begin_inset Formula $f(x,y)=\sum_{k}a_{k}\Psi_{k}(x,y)$
\end_inset
.
In the case of Fourier,
\begin_inset Formula $\Psi_{k}(x,y)=e^{i(\mu_{k}x+\nu_{k}y)}$
\end_inset
where
\begin_inset Formula $\nu$
\end_inset
and
\begin_inset Formula $\mu$
\end_inset
are vector spatial frequencies that may be resolved into polar coordinates
as
\begin_inset Formula $\omega=\sqrt{\mu^{2}+\nu^{2}}$
\end_inset
and
\begin_inset Formula $\phi=\tan^{-1}(\frac{\nu}{\mu})$
\end_inset
.
The coefficients
\begin_inset Formula $a_{k}$
\end_inset
are computed as the orthonormal projection of the entire image into the
conjugate Fourier component:
\begin_inset Formula $a_{k}=\int_{X}\int_{Y}e^{-i(\mu_{k}x+\nu_{k}y)}f(x,y)dxdy$
\end_inset
.
\end_layout
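The expansion and its coefficient integral can be sketched numerically with the discrete transform; the toy 8x8 "image" below is an arbitrary assumption, and NumPy's FFT plays the role of the projection onto the conjugate basis:

```python
import numpy as np

# Sketch (toy data): represent an image as a linear combination of
# complex-exponential basis functions via the discrete Fourier transform.
rng = np.random.default_rng(0)
f = rng.standard_normal((8, 8))          # a toy 8x8 "image"

# Coefficients a_k: projection of the image onto the conjugate Fourier basis.
a = np.fft.fft2(f)

# Reconstruct the image as the sum of a_k * Psi_k(x, y).
f_rec = np.fft.ifft2(a).real

assert np.allclose(f, f_rec)             # the basis expansion is exact
```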
\begin_layout Standard
Shift theorem:
\begin_inset Formula $f(x-\alpha,y-\beta)\leftrightarrow F(\mu,\nu)e^{-i(\alpha\mu+\beta\nu)}$
\end_inset
, giving translation invariance for the power spectrum of isolated patterns
\end_layout
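The translation invariance of the power spectrum can be checked directly; this sketch uses a circular shift of a 1D toy signal as a stand-in for translating an isolated pattern:

```python
import numpy as np

# Sketch: the shift theorem says translation only changes the Fourier phase,
# so the power spectrum |F|^2 is unchanged by a (circular) translation.
rng = np.random.default_rng(1)
f = rng.standard_normal(64)

f_shifted = np.roll(f, 5)                # circular translation by 5 samples

power = np.abs(np.fft.fft(f)) ** 2
power_shifted = np.abs(np.fft.fft(f_shifted)) ** 2

assert np.allclose(power, power_shifted) # spectra agree despite the shift
```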
\begin_layout Standard
Similarity theorem:
\begin_inset Formula $f(\alpha x,\beta y)\leftrightarrow\frac{1}{\alpha\beta}F(\frac{\mu}{\alpha},\frac{\nu}{\beta})$
\end_inset
\end_layout
\begin_layout Standard
Rotation theorem:
\begin_inset Formula $f(x\cos(\theta)+y\sin(\theta),-x\sin(\theta)+y\cos(\theta))\leftrightarrow F(\mu\cos(\theta)+\nu\sin(\theta),-\mu\sin(\theta)+\nu\cos(\theta))$
\end_inset
, so if we work with our Fourier domain
\begin_inset Formula $(\mu,\nu)$
\end_inset
in log-polar space
\begin_inset Formula $(r=\log(\sqrt{\mu^{2}+\nu^{2}}),\theta=\tan^{-1}(\frac{\nu}{\mu}))$
\end_inset
then size change becomes translation along
\begin_inset Formula $r$
\end_inset
and rotation becomes translation along
\begin_inset Formula $\theta$
\end_inset
, and we can make these immaterial by considering the power spectrum.
\end_layout
\begin_layout Standard
Convolution theorem: if
\begin_inset Formula $h(x,y)=\int_{\alpha}\int_{\beta}f(\alpha,\beta)g(x-\alpha,y-\beta)d\alpha d\beta$
\end_inset
then
\begin_inset Formula $H(\mu,\nu)=F(\mu,\nu)G(\mu,\nu)$
\end_inset
, giving an efficient way to compute Fourier expressions after the application
of filtering
\end_layout
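The convolution theorem can be verified on toy 1D data by comparing a direct circular convolution against the transform-multiply-invert route:

```python
import numpy as np

# Sketch: circular convolution computed directly vs. via the convolution
# theorem H = F * G (pointwise product in the Fourier domain). Toy signals.
rng = np.random.default_rng(2)
f = rng.standard_normal(32)
g = rng.standard_normal(32)

# Direct circular convolution: h[k] = sum_n f[n] g[(k - n) mod N].
h_direct = np.array([np.sum(f * np.roll(g[::-1], k + 1)) for k in range(32)])

# Fourier route: transform, multiply, invert.
h_fft = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real

assert np.allclose(h_direct, h_fft)
```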
\begin_layout Standard
Differentiation theorem:
\begin_inset Formula $\left(\frac{d}{dx}\right)^{m}\left(\frac{d}{dy}\right)^{n}f(x,y)\leftrightarrow(i\mu)^{m}(i\nu)^{n}F(\mu,\nu)$
\end_inset
so in particular
\begin_inset Formula $\nabla^{2}f(x,y)\leftrightarrow-(\mu^{2}+\nu^{2})F(\mu,\nu)$
\end_inset
; notice that this emphasises high frequencies and discards the DC component
\end_layout
\begin_layout Section*
Edge Detection
\end_layout
\begin_layout Standard
This information is useful as:
\end_layout
\begin_layout Itemize
Edges demarcate boundaries and parts of objects
\end_layout
\begin_layout Itemize
Occlusion edges reveal the geometry of the scene
\end_layout
\begin_layout Itemize
Edges may appear in more abstract domains than luminance
\end_layout
\begin_layout Itemize
Velocity fields may be understood as the movement of edges
\end_layout
\begin_layout Itemize
Aligning edges can be used to solve the correspondence problem effectively
\end_layout
\begin_layout Standard
You can find this information computationally by convolving with
\begin_inset Formula $\left[\begin{array}{cc}
-1 & 1\end{array}\right]$
\end_inset
and finding large amplitude or
\begin_inset Formula $\left[\begin{array}{ccc}
1 & -2 & 1\end{array}\right]$
\end_inset
and looking for zero crossings.
In two dimensions either directional or non-directional derivatives may
be employed.
An example discrete isotropic operator is the Laplacian:
\end_layout
\begin_layout Standard
\noindent
\align center
\begin_inset Tabular
\begin_inset Text
\begin_layout Standard
-1
\end_layout
\end_inset

\begin_inset Text
\begin_layout Standard
-2
\end_layout
\end_inset

\begin_inset Text
\begin_layout Standard
-1
\end_layout
\end_inset

\begin_inset Text
\begin_layout Standard
-2
\end_layout
\end_inset

\begin_inset Text
\begin_layout Standard
12
\end_layout
\end_inset

\begin_inset Text
\begin_layout Standard
-2
\end_layout
\end_inset

\begin_inset Text
\begin_layout Standard
-1
\end_layout
\end_inset

\begin_inset Text
\begin_layout Standard
-2
\end_layout
\end_inset

\begin_inset Text
\begin_layout Standard
-1
\end_layout
\end_inset

\end_inset
\end_layout
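The zero-sum property of this Laplacian can be sketched numerically: its taps sum to zero, so it gives no response on a linear intensity ramp (the ramp image below is an arbitrary toy assumption):

```python
import numpy as np

# Sketch: the discrete Laplacian operator above; its taps sum to zero, so it
# is insensitive to constant or linearly ramping brightness.
lap = np.array([[-1, -2, -1],
                [-2, 12, -2],
                [-1, -2, -1]])

assert lap.sum() == 0

# A linear intensity ramp: every 3x3 neighbourhood is annihilated.
img = np.add.outer(np.arange(10), 2 * np.arange(10)).astype(float)

resp = np.array([[np.sum(lap * img[i-1:i+2, j-1:j+2])
                  for j in range(1, 9)] for i in range(1, 9)])

assert np.allclose(resp, 0.0)
```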
\begin_layout Standard
Operators whose coefficients sum to 0 are insensitive to the overall brightness
of a scene, since they have no response to the DC component.
\end_layout
\begin_layout Standard
Logan's theorem says that for 1D signals that are bandlimited to at most
one octave and have no complex zeroes in common with their Hilbert transforms,
the signal can be recovered from just its zero-crossings.
\end_layout
\begin_layout Section*
Multiscale Analysis
\end_layout
\begin_layout Standard
Multiscale analysis may be used with edge detection as non-redundant structure
typically exists in images at all scales.
Marr proposed that the image be convolved with a multiscale family of
isotropic blurred second-derivative filters, retaining only their zero-crossings.
This can be concretely implemented by the operator
\begin_inset Formula $\nabla^{2}\left[G_{\sigma}(x,y)\star I(x,y)\right]=G_{\sigma}(x,y)\star\nabla^{2}I(x,y)=\left[\nabla^{2}G_{\sigma}(x,y)\right]\star I(x,y)$
\end_inset
(with the last being the preferred version).
\end_layout
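A minimal 1D sketch of the last form of the operator: build a Laplacian-of-Gaussian kernel analytically, convolve it with a step edge, and locate the edge at a zero-crossing of the response (the step position and sigma are toy assumptions):

```python
import numpy as np

# Sketch: 1D Laplacian-of-Gaussian (second derivative of a Gaussian) applied
# to a step edge; the zero-crossing of the response marks the edge.
def log_kernel(sigma, half=10):
    x = np.arange(-half, half + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    return (x**2 / sigma**4 - 1 / sigma**2) * g

signal = np.zeros(64)
signal[32:] = 1.0                        # step edge between samples 31 and 32

resp = np.convolve(signal, log_kernel(2.0), mode="same")

# Zero-crossings: positions where the response changes sign.
zc = np.where(np.sign(resp[:-1]) != np.sign(resp[1:]))[0]

assert np.any((zc >= 29) & (zc <= 34))   # a crossing sits at the edge
```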
\begin_layout Standard
The Laplacian-of-Gaussian approach tends to be very noise-sensitive, and more
sophisticated nonlinear detectors have been developed.
Furthermore, it is not clear how to generalize the constraint of one-octave
bandlimiting to 2D signals, and the zeroes of a 2D signal are not countable.
\end_layout
\begin_layout Standard
Causality is the property that edges at lower resolutions must be caused
by edges in the underlying data, and are not artifacts of the blurring
process.
Fingerprint theorems show that the Gaussian blurring operator uniquely
possesses this property.
\end_layout
\begin_layout Standard
A plot showing the evolution of zero-crossings in the image after convolution
with a linear operator, as a function of the scale of that operator, is called
scale-space.
A mapping of the edges in an image across scales is called a scale-space
fingerprint.
\end_layout
\begin_layout Section*
Models
\end_layout
\begin_layout Standard
Active contours are one expression of a model-fitting approach that relies
jointly on a data term (model-input similarity) and a cost term (model
complexity).
Iterative numerical methods (regularization methods) exist that optimize
a functional that is a linear combination of the two terms:
\begin_inset Formula $\operatorname{argmin}_{M}\int((M-I)^{2}+\lambda(M_{xx})^{2})dx$
\end_inset
.
\end_layout
\begin_layout Standard
The family of filters that uniquely achieves the lowest possible conjoint
uncertainty in both the space and Fourier domains is the Gabor wavelets:
\begin_inset Formula $f(x)=e^{i\mu_{0}(x-x_{0})}e^{-\frac{(x-x_{0})^{2}}{\alpha^{2}}}$
\end_inset
,
\begin_inset Formula $F(\mu)=e^{-ix_{0}(\mu-\mu_{0})}e^{-(\mu-\mu_{0})^{2}\alpha^{2}}$
\end_inset
.
Such functions are non-orthogonal and hence the coefficients are hard to
obtain.
When they are parametrized to be self-similar (dilates and translates of
each other) they constitute a wavelet basis, e.g.
\begin_inset Formula $\Psi_{mpq\theta}(x,y)=2^{-2m}\Psi(2^{-m}(x\cos(\theta)+y\sin(\theta))-p,2^{-m}(-x\sin(\theta)+y\cos(\theta))-q)$
\end_inset
.
\end_layout
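The 1D Gabor wavelet above can be sketched directly: a complex exponential carrier under a Gaussian envelope (the centre, frequency and width values are arbitrary assumptions):

```python
import numpy as np

# Sketch: a 1D Gabor wavelet -- complex exponential times Gaussian envelope
# -- with assumed centre x0, carrier frequency mu0 and width alpha.
def gabor(x, x0=0.0, mu0=2.0, alpha=1.5):
    return np.exp(1j * mu0 * (x - x0)) * np.exp(-((x - x0) ** 2) / alpha**2)

x = np.linspace(-8, 8, 1001)
w = gabor(x)

# The modulus |f(x)| is the Gaussian envelope alone (unit-modulus carrier).
assert np.allclose(np.abs(w), np.exp(-(x**2) / 1.5**2))
```

Taking the modulus this way is exactly what the quadrature demodulator network below exploits: the carrier's phase drops out, leaving the local envelope.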
\begin_layout Standard
By taking the modulus of a facial image after convolution with complex-valued
2D Gabor wavelets key features may be detected: this is known as a quadrature
demodulator network.
\end_layout
\begin_layout Section*
Texture
\end_layout
\begin_layout Standard
Texture is a cue to surface shape and image segmentation.
It is defined by the existence of certain statistical correlations across
the image, with an underlying notion of quasi-periodicity.
\end_layout
\begin_layout Standard
The detection of periodicity is best done by Fourier methods.
However, the usual exponential eigenfunctions are globally defined so in
order to recover local information you typically
\begin_inset Quotes eld
\end_inset
window
\begin_inset Quotes erd
\end_inset
the sinusoids.
The optimal set of windowing functions are Gaussians due to their optimal
spatial/spectral localization.
Hence the final basis used is 2D Gabor wavelets.
Edge detection on the modulus of the Gabor coefficients can detect textured
regions.
\end_layout
\begin_layout Standard
Colour is difficult to recover because the wavelengths received depend as much
on the illuminant as upon the spectral reflectances of the surface.
Since
\begin_inset Formula $R(\lambda)=I(\lambda)O(\lambda)$
\end_inset
, some have proposed searching for specular regions in the image where reflected
light would be a faithful estimate for
\begin_inset Formula $I(\lambda)$
\end_inset
.
A more robust approach is Retinex, which works on the basis that the colours
of objects or areas in a scene are determined by their surrounding spatial
context.
A sequence of ratios computed across object boundaries enables the illuminant
to be algebraically discounted.
\end_layout
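The ratio trick can be sketched with toy numbers: under R = I·O, a ratio taken across an object boundary nearly cancels a slowly varying illuminant (the reflectance and illuminant values below are arbitrary assumptions):

```python
import numpy as np

# Sketch: under R = I * O, the ratio of received light across an object
# boundary cancels a slowly varying illuminant, as Retinex assumes.
O = np.array([0.2, 0.2, 0.8, 0.8])       # surface reflectances, two patches
I = np.array([1.0, 1.01, 1.02, 1.03])    # slowly varying illuminant
R = I * O                                # light received at each point

ratio = R[2] / R[1]                      # ratio taken across the boundary
assert abs(ratio - O[2] / O[1]) < 0.05   # close to the reflectance ratio
```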
\begin_layout Section*
Correspondence And Motion
\end_layout
\begin_layout Standard
Stereoscopic disparity results in the images from the left and right eyes
differing.
Matching features between the two images so that this disparity can be used
to infer depth is called the correspondence problem.
\end_layout
\begin_layout Standard
Current algorithms for determining correspondence require large searches
for matching features under a large number of permutations.
A multiscale image pyramid can be used to guide this search at successively
finer scales to improve efficiency.
Once feature correlation has been found,
\begin_inset Formula $d=\frac{fb}{\alpha+\beta}$
\end_inset
where
\begin_inset Formula $f$
\end_inset
is the camera focal length,
\begin_inset Formula $b$
\end_inset
is the base of triangulation and
\begin_inset Formula $\alpha$
\end_inset
and
\begin_inset Formula $\beta$
\end_inset
are the disparities of the projections of the object in the two images
relative to their respective optical axes.
\end_layout
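The triangulation formula is a one-liner once correspondence is solved; the focal length, baseline and disparities below are illustrative assumptions:

```python
# Sketch: depth from disparity, d = f*b / (alpha + beta), with assumed
# focal length, baseline and measured disparities (all in metres).
f = 0.05          # camera focal length
b = 0.10          # base of triangulation
alpha = 0.0004    # disparity of the projection in the left image
beta = 0.0006    # disparity of the projection in the right image

d = f * b / (alpha + beta)
assert abs(d - 5.0) < 1e-9               # object 5 m from the cameras
```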
\begin_layout Standard
For motion vision we need to solve the correspondence problem for two images
coincident in space but acquired with a temporal displacement.
Requirements include the need to infer 3D trajectories, make local velocity
estimates, disambiguate object motion from contour motion and assign more
than one velocity vector to a given region!
\end_layout
\begin_layout Standard
Intensity gradient models assume the time derivative is related to local
spatial gradients due to velocity
\begin_inset Formula $\bar{v}$
\end_inset
:
\begin_inset Formula $\frac{\partial I(x,y,t)}{\partial t}=-\bar{v}\cdot\vec{\nabla}I(x,y,t)$
\end_inset
\end_layout
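The intensity-gradient constraint can be sketched in 1D: for a translating pattern the velocity is the (negated) ratio of temporal to spatial derivatives, evaluated where the spatial gradient is well away from zero (the sinusoidal pattern and velocity are toy assumptions):

```python
import numpy as np

# Sketch: for a translating 1D pattern, dI/dt = -v * dI/dx, so v can be
# read off from the ratio of temporal to spatial derivatives.
v = 0.5
x = np.linspace(0, 2 * np.pi, 400, endpoint=False)
dt = 1e-3
I0 = np.sin(x)                  # pattern at time t
I1 = np.sin(x - v * dt)         # same pattern translated by v*dt

dI_dt = (I1 - I0) / dt          # temporal derivative (finite difference)
dI_dx = np.gradient(I0, x)      # spatial derivative

# Estimate v where the spatial gradient is not near zero.
mask = np.abs(dI_dx) > 0.5
v_est = np.mean(-dI_dt[mask] / dI_dx[mask])
assert abs(v_est - v) < 1e-2
```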
\begin_layout Standard
Dynamic zero-crossing models measure velocity by finding edges and contours
and then applying the time derivative in the vicinity of a zero-crossing:
\begin_inset Formula $\frac{\partial}{\partial t}(\nabla^{2}G_{\sigma}(x,y)\star I(x,y,t))$
\end_inset
\end_layout
\begin_layout Standard
Spatiotemporal correlation models detect motion by observing the most likely
correlation between the time-separated images, realized as a pair of coordinates
from which the velocity can be calculated.
This has been supported somewhat by biological investigation of the visual
system of the fly.
\end_layout
\begin_layout Standard
Spatiotemporal spectral models detect and measure motion purely by Fourier
means, exploiting the fact that motion creates a covariance in the spatial
and temporal spectra of the image
\begin_inset Formula $I(x,y,t)$
\end_inset
where
\begin_inset Formula $F(\omega_{x},\omega_{y},\omega_{t})=\int_{X}\int_{Y}\int_{T}I(x,y,t)e^{-i(\omega_{x}x+\omega_{y}y+\omega_{t}t)}dxdydt$
\end_inset
.
Motion detection occurs by filtering the image sequence in space and time
and observing that tuned spatiotemporal filters whose center frequencies
are coplanar in this space are activated together.
This is a consequence of the spectral coplanarity theorem, which says
that since
\begin_inset Formula $I(x,y,t)=I(x-v_{x}t_{0},y-v_{y}t_{0},t-t_{0})$
\end_inset
,
\begin_inset Formula $F(\omega_{x},\omega_{y},\omega_{t})\neq0$
\end_inset
iff
\begin_inset Formula $\omega_{x}v_{x}+\omega_{y}v_{y}+\omega_{t}=0$
\end_inset
.
The spherical coordinates of the normal of the plane correspond to the
speed and direction of motion.
\end_layout
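A 1D (one space dimension plus time) analogue of the coplanarity theorem can be checked numerically: for a pattern translating at integer velocity v, the spatiotemporal spectrum is non-zero only on the line where omega_x v + omega_t = 0 (the sinusoidal pattern and velocity are toy assumptions):

```python
import numpy as np

# Sketch: a sinusoid translating at v pixels/frame; its 2D (t, x) spectrum
# should concentrate on the line omega_x * v + omega_t = 0.
N, v = 64, 3           # integer velocity keeps np.roll an exact translation
x = np.arange(N)
frames = np.array([np.roll(np.sin(2 * np.pi * 5 * x / N), v * t)
                   for t in range(N)])

F = np.fft.fft2(frames)                  # axes: (omega_t, omega_x)
wt, wx = np.meshgrid(np.fft.fftfreq(N), np.fft.fftfreq(N), indexing="ij")

on_line = np.abs(wx * v + wt) < 1e-9     # the line omega_x v + omega_t = 0
energy = np.abs(F) ** 2

# Essentially all spectral energy lies on that line.
assert energy[~on_line].sum() < 1e-6 * energy.sum()
```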
\begin_layout Section*
Surfaces
\end_layout
\begin_layout Standard
The albedo of a surface is the fraction of the illuminant that is re-emitted
from the surface in all directions.
\end_layout
\begin_layout Standard
Lambertian surfaces are pure matte, i.e.
have no specular component.
\end_layout
\begin_layout Standard
Specular surfaces are locally mirror-like and obey the law of reflection
(angle of incidence equals angle of reflection).
\end_layout
\begin_layout Standard
The reflectance map is a function
\begin_inset Formula $\phi(i,e,g)$
\end_inset
where
\begin_inset Formula $i$
\end_inset
is the illuminant angle,
\begin_inset Formula $e$
\end_inset
is the reflected angle and
\begin_inset Formula $g$
\end_inset
is the angle between the two, which specifies the fraction of incident light
reflected per unit surface area, per unit solid angle in the direction
of the camera.
For Lambertian surfaces,
\begin_inset Formula $\phi(i,e,g)=\cos(i)$
\end_inset
.
For Lunar surfaces,
\begin_inset Formula $\phi(i,e,g)=\frac{\cos(i)}{\cos(e)}$
\end_inset
.
For specular surfaces
\begin_inset Formula $\phi(i,e,g)=\begin{cases}
1 & g=i+e\\
0 & g\neq i+e\end{cases}$
\end_inset
.
Typical surfaces are a blend and are governed by
\begin_inset Formula $\phi(i,e,g)=\frac{s(n+1)(2\cos(i)\cos(e)-\cos(g))^{n}}{2}+(1-s)\cos(i)$
\end_inset
, where
\begin_inset Formula $s$
\end_inset
is the fraction of light emitted specularly and
\begin_inset Formula $n$
\end_inset
is the sharpness of the specular peak.
\end_layout
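The blended reflectance map above is easy to sketch; the specular fraction s and sharpness n below are arbitrary assumptions, and at s = 0 the expression collapses to the Lambertian cos(i) rule as a sanity check:

```python
import numpy as np

# Sketch: blended reflectance map phi(i, e, g) with specular fraction s and
# specular sharpness n (assumed toy values).
def phi(i, e, g, s=0.3, n=20):
    spec = s * (n + 1) * (2 * np.cos(i) * np.cos(e) - np.cos(g)) ** n / 2
    return spec + (1 - s) * np.cos(i)

i, e, g = 0.3, 0.3, 0.6
assert np.isclose(phi(i, e, g, s=0.0), np.cos(i))   # pure matte limit
```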
\begin_layout Section*
Shape Description
\end_layout
\begin_layout Standard
Cues to surface shape are texture, colour, stereo, motion and shading
information.
However, shape recovery is an inherently ill-posed problem as many ambiguous
factors have to be resolved, such as surface reflectance, geometry, material
and illuminant geometry.
\end_layout
\begin_layout Standard
Closed boundary contours can be represented by their curvature map:
\begin_inset Formula $\theta(s)=\lim_{\Delta s\rightarrow0}\frac{\Delta\theta}{\Delta s}=\frac{1}{r(s)}$
\end_inset
where
\begin_inset Formula $r(s)$
\end_inset
is the limiting radius of a circle that best fits the contour at position
\begin_inset Formula $s$
\end_inset
and
\begin_inset Formula $\Delta s$
\end_inset
is the arc length.
This is position and orientation independent, scales easily and represents
mirror symmetry by a sign change.
Additionally, these maps can be expanded with basis functions to generate
a description which is rotation, translation and dilation invariant.
Grammars of such invariant shapes are called codon libraries.
\end_layout
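A curvature map can be sketched numerically for a closed contour; here a circle of radius 2 (an arbitrary assumption), whose curvature map should be constant at 1/r, using periodic central differences so the contour closure is respected:

```python
import numpy as np

# Sketch: curvature map of a closed contour sampled uniformly in parameter t;
# for a circle of radius r the curvature is constant at 1/r.
r, n = 2.0, 400
t = np.linspace(0, 2 * np.pi, n, endpoint=False)
h = t[1] - t[0]
x, y = r * np.cos(t), r * np.sin(t)

def d(u):
    # periodic central difference (the contour is closed)
    return (np.roll(u, -1) - np.roll(u, 1)) / (2 * h)

dx, dy, ddx, ddy = d(x), d(y), d(d(x)), d(d(y))
kappa = np.abs(dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5

assert np.allclose(kappa, 1 / r, atol=1e-3)
```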
\begin_layout Standard
The 2.5-dimensional sketch is a 2-dimensional image with surface normals
assigned to each point in the image domain.
\end_layout
\begin_layout Standard
Solids can also be represented as the unions and intersections of generalized
superquadric objects which are defined by equations of the form
\begin_inset Formula $Ax^{\alpha}+By^{\beta}+Cz^{\gamma}=R$
\end_inset
.
This allows volumetric descriptions of the objects in a scene by just giving
a list of 3D parameters and relations.
\end_layout
\begin_layout Standard
Deformable parametric models fit human recognisable parameters to the models
for the purposes of lossy coding or customization of an avatar.
\end_layout
\begin_layout Section*
Perceptual Psychology
\end_layout
\begin_layout Standard
Recent developments include the idea of a process grammar which models objects
and shapes in terms of their morphogenesis.
\end_layout
\begin_layout Standard
Percepts can be considered as hypotheses: top-down interpretations that
depend greatly on contexts, expectations and other extraneous factors beyond
the stimulus.
\end_layout
\begin_layout Standard
Agnosias are failures of recognition that result from brain injury.
They include things such as the loss of ability to recognise faces but
no other objects, loss of colour vision, loss of ability to see in 3D and
the inability to simultaneously see more than one thing.
\end_layout
\begin_layout Section*
Bayesian Analysis
\end_layout
\begin_layout Standard
Bayesian statistics provide a means for integrating prior information with
empirical information gathered from incoming data.
This is especially relevant in computer vision, where there are many sources
of uncertainty.
The governing equation is
\begin_inset Formula $p(H|D)=\frac{p(D|H)p(H)}{p(D)}$
\end_inset
, with the old posterior iteratively becoming the new prior.
\end_layout
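The iterative prior/posterior cycle can be sketched for a toy binary hypothesis; the likelihood values and the data sequence are arbitrary assumptions:

```python
# Sketch: iterated Bayesian updating -- after each datum the old posterior
# becomes the new prior. Binary hypothesis H with assumed likelihoods.
p_h = 0.5                                # prior P(H)

for datum in [1, 1, 0, 1]:               # toy observation sequence
    p_d_given_h = 0.8 if datum else 0.2      # P(D|H)
    p_d_given_not = 0.3 if datum else 0.7    # P(D|not H)
    p_d = p_d_given_h * p_h + p_d_given_not * (1 - p_h)
    p_h = p_d_given_h * p_h / p_d        # posterior, reused as next prior

assert 0.5 < p_h < 1.0                   # evidence mostly favours H
```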
\begin_layout Standard
Statistical decision theory describes a decision environment that recognises
similarity between
\begin_inset Quotes eld
\end_inset
different
\begin_inset Quotes erd
\end_inset
patterns and differences between
\begin_inset Quotes eld
\end_inset
similar
\begin_inset Quotes erd
\end_inset
patterns:
\end_layout
\begin_layout Standard
\noindent
\align center
\begin_inset Tabular
\begin_inset Text
\begin_layout Standard
\end_layout
\end_inset

\begin_inset Text
\begin_layout Standard
Actually Same
\end_layout
\end_inset

\begin_inset Text
\begin_layout Standard
Decision
\begin_inset Quotes eld
\end_inset
Same
\begin_inset Quotes erd
\end_inset
\end_layout
\end_inset

\begin_inset Text
\begin_layout Standard
Hit
\end_layout
\end_inset

\begin_inset Text
\begin_layout Standard
\begin_inset Formula $\surd$
\end_inset
\end_layout
\end_inset

\begin_inset Text
\begin_layout Standard
\begin_inset Formula $\surd$
\end_inset
\end_layout
\end_inset

\begin_inset Text
\begin_layout Standard
Miss
\end_layout
\end_inset

\begin_inset Text
\begin_layout Standard
\begin_inset Formula $\surd$
\end_inset
\end_layout
\end_inset

\begin_inset Text
\begin_layout Standard
\begin_inset Formula $\times$
\end_inset
\end_layout
\end_inset

\begin_inset Text
\begin_layout Standard
False Alarm
\end_layout
\end_inset

\begin_inset Text
\begin_layout Standard
\begin_inset Formula $\times$
\end_inset
\end_layout
\end_inset

\begin_inset Text
\begin_layout Standard
\begin_inset Formula $\surd$
\end_inset
\end_layout
\end_inset

\begin_inset Text
\begin_layout Standard
Correct Reject
\end_layout
\end_inset

\begin_inset Text
\begin_layout Standard
\begin_inset Formula $\times$
\end_inset
\end_layout
\end_inset

\begin_inset Text
\begin_layout Standard
\begin_inset Formula $\times$
\end_inset
\end_layout
\end_inset

\end_inset
\end_layout
\begin_layout Standard
The criterion for similarity should be set so as to minimize the expected
cost of errors.
If both types of error have the same cost then the criterion lies where
the two error areas under the probability density curves are equal.
You can derive a Receiver Operating Characteristic which plots the hit
rate against the false alarm rate for a range of thresholds.
\end_layout
\begin_layout Standard
The decidability of the signal detection task is defined as
\begin_inset Formula $d'=\frac{\mu_{2}-\mu_{1}}{\sqrt{\frac{1}{2}(\sigma_{2}^{2}+\sigma_{1}^{2})}}$
\end_inset
where
\begin_inset Formula $\mu_{i}$
\end_inset
and
\begin_inset Formula $\sigma_{i}$
\end_inset
are the characteristics of the respective distributions.
\end_layout
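The decidability index is a direct computation; the distribution parameters below are toy assumptions:

```python
import numpy as np

# Sketch: decidability d' from the means and standard deviations of the two
# score distributions (assumed toy values).
def d_prime(mu1, sigma1, mu2, sigma2):
    return (mu2 - mu1) / np.sqrt(0.5 * (sigma1**2 + sigma2**2))

# Unit-variance distributions two means apart give d' = 2.
assert np.isclose(d_prime(0.0, 1.0, 2.0, 1.0), 2.0)
```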
\begin_layout Standard
Bayesian classifiers take into account the prior probabilities of the possible
classifications.
The minimum misclassification criterion is that
\begin_inset Formula $\forall j\neq k.P(x|C_{k})P(C_{k})>P(x|C_{j})P(C_{j})$
\end_inset
where
\begin_inset Formula $C_{i}$
\end_inset
is class
\begin_inset Formula $i$
\end_inset
.
This can be satisfied by assigning an
\begin_inset Formula $x$
\end_inset
to the class with the highest posterior probability.
However, in situations where error costs differ this misclassification
criterion may not be appropriate.
\end_layout
\begin_layout Standard
Discriminant functions are functions
\begin_inset Formula $y_{k}(x)$
\end_inset
associated with each class
\begin_inset Formula $C_{k}$
\end_inset
such that an observation
\begin_inset Formula $x$
\end_inset
is assigned to that class iff
\begin_inset Formula $\forall j\neq k.y_{k}(x)>y_{j}(x)$
\end_inset
.
Decision boundaries between regions are defined by those loci where
\begin_inset Formula $y_{k}(x)=y_{j}(x)$
\end_inset
.
\end_layout
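The minimum-misclassification rule can be sketched with Gaussian class likelihoods acting as the discriminant functions (the class means, spreads and priors are arbitrary assumptions):

```python
import numpy as np

# Sketch: assign x to the class maximising P(x|C_k) P(C_k), with Gaussian
# class likelihoods (assumed parameters) as discriminant functions y_k(x).
def classify(x, means, sigmas, priors):
    scores = [p * np.exp(-(x - m) ** 2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))
              for m, s, p in zip(means, sigmas, priors)]
    return int(np.argmax(scores))        # highest posterior wins

means, sigmas, priors = [0.0, 3.0], [1.0, 1.0], [0.5, 0.5]
assert classify(0.2, means, sigmas, priors) == 0
assert classify(2.9, means, sigmas, priors) == 1
```

The decision boundary sits where the two scores are equal; with equal priors and spreads, that is the midpoint x = 1.5.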
\begin_layout Section*
Face Detection
\end_layout
\begin_layout Standard
The central issue in pattern recognition is the relation between within-class
and between-class variability.
Often there is greater variability in the code for a given face across
changes in the illuminant, angle or expression than for different faces
with these factors constant, which leads to real-world error rates approaching
50%.
\end_layout
\begin_layout Standard
Which variability counts as within-class and which as between-class depends
on the task.
For identifying distinct expressions, variability across expressions is the
between-class signal of interest, while variability across the faces showing
them is unwanted within-class variability.
Conversely, for identifying faces, variability across expressions is
within-class and must be discounted, while variability across faces carries
the signal.
\end_layout
\begin_layout Standard
Face detection is a harder problem than face recognition and current leading
approaches rely just on skin hue!
\end_layout
\begin_layout Standard
Template-matching face recognition algorithms store an array of size-invariant
pictures of faces at a number of pose angles and match on a pixel-by-pixel
basis.
\end_layout
\begin_layout Standard
Eigenfaces work with a Karhunen-Loève Transform of a large database of faces
to define all faces as linear combinations of the
\begin_inset Quotes eld
\end_inset
most likely
\begin_inset Quotes erd
\end_inset
face basis functions.
It is limited since many of the principal components just extract shading
variations and lack invariance to illumination, pose angle and size!
\end_layout
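The Karhunen-Loève step behind eigenfaces can be sketched with PCA via the SVD; the random 10-by-64 "face" matrix is a stand-in for a real database:

```python
import numpy as np

# Sketch: eigenfaces as the Karhunen-Loeve (PCA) basis of a toy face matrix;
# every face is then a linear combination of the basis faces.
rng = np.random.default_rng(3)
faces = rng.standard_normal((10, 64))    # 10 toy "faces", 64 pixels each

mean = faces.mean(axis=0)
U, S, Vt = np.linalg.svd(faces - mean, full_matrices=False)
eigenfaces = Vt                          # orthonormal face basis functions

coeffs = (faces - mean) @ eigenfaces.T   # project each face onto the basis
recon = mean + coeffs @ eigenfaces       # exact when all components are kept

assert np.allclose(faces, recon)
```

In practice only the leading components (largest singular values) are retained, which is where the shading-variation criticism above bites.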
\begin_layout Standard
Wavelets can be used for face recognition as due to their localization they
can track changes in facial expression in a local way: faces are a kind
of texture! To allow for deformations associated with changes in pose angle
or expression, these
\begin_inset Quotes eld
\end_inset
Gabor jets
\begin_inset Quotes erd
\end_inset
are placed on a deformable graph that tolerates distortions relative to
fiducial points, but performance is comparable to that of Eigenfaces.
\end_layout
\begin_layout Standard
Focus today is on modelling faces as three-dimensional objects and fitting
these models to the percepts.
\end_layout
\begin_layout Standard
Motion energy models can be used to extract motion signatures from parts
of faces and classify these as expressions.
\end_layout
\end_body
\end_document