
I am reading a book about support vector machines, and I don't understand some of the math in it.

Consider the training sample $\{(x_{i}, d_{i})\}_{i=1}^{N}$, where $x_{i}$ is the input pattern for the $i$th example and $d_{i}$ is the corresponding desired response.

[...]

Let $w_{0}$ and $b_{0}$ denote the optimum values of the weight vector and bias, respectively. Correspondingly, the optimal hyperplane, representing a multidimensional linear decision surface in the input space, is defined by $$w^{T}_{0} x + b_{0} = 0 $$

The discriminant function $$g(x) = w^{T}_{0} x + b_{0}$$ gives an algebraic measure of the distance from $x$ to the optimal hyperplane.

We can express $x$ as $$x = x_{p} + r \frac{w_{0}}{||w_{0}||}$$ where $x_{p}$ is the normal projection of $x$ onto the optimal hyperplane and $r$ is the desired algebraic distance.

Since, by definition, $g(x_{p}) = 0$, it follows that $$g(x) = w^{T}_{0} x + b_{0} = r||w_{0}||$$

From: Neural Networks and Learning Machines (3rd Edition) p 270

Why can we express x as $x = x_{p} + r \frac{w_{0}}{||w_{0}||}$ ?

Why does $g(x) = r||w_{0}||$ ?

I wonder how I can represent this hyperplane in two dimensions.

At first I thought the equation $w^{T}_{0} x + b_{0} = 0$ would be equivalent to a linear function $ax + b$, but I am not quite sure: in a linear function $a$ is a scalar, whereas in my case $a$ would be $w_{0}$, which is a vector.


1 Answer


$$g(x) = g(x_p + \frac{w_0 r}{||w_0||}) = w_0^T (x_p + \frac{ w_0 r }{||w_0||}) + b_0 = w_0^T x_p + w_0^T \frac{ w_0 r }{||w_0||} + b_0$$

Now observe that $w_0^T x_p + b_0 = g(x_p) = 0$ by construction of $x_p$.

And $$w_0^T w_0 = \langle w_0, w_0 \rangle = ||w_0||^2.$$

Hence:

$$g(x) = w_0^T x_p + b_0 + \frac{||w_0||^2r}{||w_0||} = 0 + r||w_0||=r||w_0||$$
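If you want to convince yourself numerically, here is a minimal sanity check (using NumPy; the values of $w_0$, $b_0$ and $x$ below are arbitrary examples, not taken from the book):

```python
import numpy as np

# Arbitrary example values (not from the book): a hyperplane in R^3.
w0 = np.array([3.0, -1.0, 2.0])   # optimal weight vector
b0 = 0.5                          # optimal bias
x  = np.array([1.0, 2.0, -1.0])   # an arbitrary point

g = w0 @ x + b0                   # discriminant g(x) = w0^T x + b0

# Algebraic distance from x to the hyperplane: r = g(x) / ||w0||
r = g / np.linalg.norm(w0)

# Normal projection of x onto the hyperplane: x_p = x - r * w0/||w0||
x_p = x - r * w0 / np.linalg.norm(w0)

print(np.isclose(w0 @ x_p + b0, 0.0))          # x_p lies on the hyperplane
print(np.isclose(g, r * np.linalg.norm(w0)))   # g(x) = r * ||w0||
```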

About hyperplanes: a hyperplane in an $n$-dimensional space is just an affine subspace of dimension $n-1$. So it is indeed a line in $\mathbb{R}^2$, an ordinary "plane" in $\mathbb{R}^3$, and so on.
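To make that concrete in $\mathbb{R}^2$: writing the components as $w_0 = (w_1, w_2)^T$ and $x = (x_1, x_2)^T$ (illustrative names, not from the book), the equation $w_0^T x + b_0 = 0$ reads $w_1 x_1 + w_2 x_2 + b_0 = 0$. If $w_2 \neq 0$, you can solve for $x_2$:
$$x_2 = -\frac{w_1}{w_2}\, x_1 - \frac{b_0}{w_2},$$
which is an ordinary line with slope $-w_1/w_2$ and intercept $-b_0/w_2$. So the scalar "$a$" of your $ax + b$ intuition is not $w_0$ itself but the ratio $-w_1/w_2$ built from its components.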

Apart from the calculation, it just expresses that if you decompose your $x$ into a component that lies in $P$ (the hyperplane from which $g$ measures the distance) and another component along the direction orthogonal to $P$, then the distance is given entirely by that orthogonal part.

Why can you always do this decomposition? As a simple example (in $\mathbb{R}^2$ with $b_0 = 0$), you can project any vector $x$ onto a line with unit direction vector $u$ just by using the scalar product: $x_p = \langle x, u \rangle u$. Then you can check that $x = x_p + (x - x_p)$ and that $\langle x - x_p, u \rangle = 0$ indeed holds. For an affine hyperplane it is essentially the same; you just need to account for the constant $b_0$. You can see it quite easily in a picture (here, a projection onto a plane):

http://www.math4all.in/public_html/linear%20algebra/images/recta8.1.jpg
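And here is the same projection idea as a short code sketch for the affine case (again with arbitrary example values for $w_0$, $b_0$ and $x$):

```python
import numpy as np

# Arbitrary example hyperplane in R^2: {x : w0^T x + b0 = 0}
w0 = np.array([1.0, 2.0])
b0 = -3.0
u  = w0 / np.linalg.norm(w0)     # unit normal to the hyperplane

x = np.array([4.0, 1.0])         # an arbitrary point to decompose

# Signed distance from x to the affine hyperplane, then project.
r   = (w0 @ x + b0) / np.linalg.norm(w0)
x_p = x - r * u                  # foot of the perpendicular from x

print(np.isclose(w0 @ x_p + b0, 0.0))   # x_p lies on the hyperplane
print(np.allclose(x, x_p + r * u))      # x = x_p + r * u, as claimed

# (x - x_p) is parallel to u, so it is orthogonal to any direction
# lying inside the hyperplane, e.g. the tangent direction t below.
t = np.array([-w0[1], w0[0]])    # a vector along the line w0^T x + b0 = 0
print(np.isclose((x - x_p) @ t, 0.0))
```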

