I am reading a book about support vector machines, and I don't understand some of the math in it.
Consider the training sample $\{(x_{i}, d_{i})\}_{i=1}^{N}$, where $x_{i}$ is the input pattern for the $i$th example and $d_{i}$ is the corresponding desired response.
[...]
Let $w_{0}$ and $b_{0}$ denote the optimum values of the weight vector and bias, respectively. Correspondingly, the optimal hyperplane, representing a multidimensional linear decision surface in the input space, is defined by $$w^{T}_{0} x + b_{0} = 0 $$
The discriminant function $$g(x) = w^{T}_{0} x + b_{0}$$ gives an algebraic measure of the distance from $x$ to the optimal hyperplane.
We can express $x$ as $$x = x_{p} + r \frac{w_{0}}{||w_{0}||}$$ where $x_{p}$ is the normal projection of $x$ onto the optimal hyperplane and $r$ is the desired algebraic distance.
Since, by definition, $g(x_{p}) = 0$, it follows that $$g(x) = w^{T}_{0} x + b_{0} = r||w_{0}||$$
From: Neural Networks and Learning Machines (3rd Edition), p. 270
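To convince myself that the last identity at least holds numerically, I put together a small check (my own NumPy sketch, not from the book; the values of $w_{0}$, $b_{0}$, $x_{p}$ and $r$ are just made up):

```python
import numpy as np

# Made-up optimal weight vector and bias (not from the book).
w0 = np.array([3.0, 4.0])        # ||w0|| = 5
b0 = -6.0

# Pick a point x_p lying on the hyperplane, i.e. w0^T x_p + b0 = 0.
x_p = np.array([2.0, 0.0])       # 3*2 + 4*0 - 6 = 0
assert np.isclose(w0 @ x_p + b0, 0.0)

# Build x by moving an algebraic distance r along the unit normal w0/||w0||.
r = 1.5
x = x_p + r * w0 / np.linalg.norm(w0)

# Discriminant function g(x) = w0^T x + b0.
g_x = w0 @ x + b0

print("g(x)       =", g_x)                     # 7.5
print("r * ||w0|| =", r * np.linalg.norm(w0))  # 7.5, same value
```

The two printed values agree, so the result seems to be true, but I don't see why it holds in general. Specifically: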
Why can we express $x$ as $x = x_{p} + r \frac{w_{0}}{||w_{0}||}$ ?
Why does $g(x) = r||w_{0}||$ ?
I wonder how I can represent this hyperplane in two dimensions.
At first I thought that the equation $w^{T}_{0} x + b_{0} = 0$ would be equivalent to a linear function $ax + b$, but I am not quite sure, since in a linear function $a$ is a scalar, whereas in my case $a$ would be $w_{0}$, which is a vector.
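This is the kind of picture I had in mind (again my own sketch, assuming a two-dimensional weight vector $w_{0} = (w_{1}, w_{2})^{T}$ with $w_{2} \neq 0$, and reusing the same made-up numbers as above):

```python
import numpy as np
import matplotlib.pyplot as plt

# Same made-up values as in the check above.
w0 = np.array([3.0, 4.0])
b0 = -6.0

# In two dimensions, w0^T x + b0 = 0 reads w1*x1 + w2*x2 + b0 = 0.
# Solving for x2 (assuming w2 != 0) gives x2 = -(w1/w2)*x1 - b0/w2,
# which looks like the familiar "a*x + b" with scalar slope and intercept.
x1 = np.linspace(-2.0, 4.0, 100)
x2 = -(w0[0] / w0[1]) * x1 - b0 / w0[1]

plt.plot(x1, x2, label=r"$w_0^T x + b_0 = 0$")
plt.xlabel(r"$x_1$")
plt.ylabel(r"$x_2$")
plt.legend()
plt.axis("equal")
plt.show()
```

Is solving for the second coordinate like this the right way to see the hyperplane as a line in two dimensions?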