
I have a matrix $X$ of size $40 \times 60000$.

While writing the SVM, I need to form the linear kernel $K = X^T X$.

And of course I would get an error "Requested 60000x60000 (26.8GB) array exceeds maximum array size preference".

How is this usually done? The data set is MNIST, so someone must have done this before. In this case $\text{rank}(K) \leq 40$, so I need a way to store $K$ and later pass it to $\texttt{quadprog}$.
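
To be concrete, what I have in mind is keeping $K$ only in factored form and never materializing it, roughly as in the illustrative sketch below; I just don't see how to hand such a representation to $\texttt{quadprog}$:

    % Illustrative only: keep K = X' * X in factored form, since rank(K) <= 40.
    % The 60000 x 60000 matrix K itself is never formed.
    [U, S, V] = svd(X, 'econ');    % U: 40x40, S: 40x40, V: 60000x40
    % Implicitly, K = V * (S .^ 2) * V' (only a 40x40 core in the middle).
    Kv = @(v) X' * (X * v);        % Applies K to a vector without forming K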

  • You can create a stripped SVD by appending some string to the command: [S,V,D] = svd(rand(10,2),'stripped'). Since the singular values will be 0 for all dimensions exceeding the smallest, we will get away with a $40\times40$ diagonal matrix in the middle. Commented Apr 22, 2017 at 8:42
  • Does such a command exist? I know svd(X'*X,'econ'), but then I had the same error; svd(X,'econ') would work though (in fact that's how I reduced the size of X to 40x60000). Commented Apr 22, 2017 at 8:49
  • Yes, 'econ' works too. I don't think it checks the string for content; just the fact that some string is being sent is enough to make it do the compressed version. Commented Apr 22, 2017 at 8:52
  • @GeraltofRivia, I added an answer for any kernel. Why would you need the kernel matrix if you solve the linear case? You can build a much faster solver by hand for that simple case. Commented Sep 26 at 20:09

2 Answers


In Gradient Descent for Primal Kernel SVM with Soft Margin (Hinge) Loss I derived an ADMM-based solver for the primal kernel SVM problem.

In this case, I'd assume the kernel is Gaussian, which, according to Tomaso Poggio, Sayan Mukherjee, Ryan M. Rifkin, Alexander Rakhlin, Alessandro Verri - b (AI Memo 2001-011), does not need a bias term.

So the objective is given by:

$$ \arg \min_{\boldsymbol{\beta}} \frac{\lambda}{2} \boldsymbol{\beta}^{T} \boldsymbol{K} \boldsymbol{\beta} + \sum_{i = 1}^{n} \max \left\{ 0, 1 - {y}_{i} \boldsymbol{k}_{i}^{\top} \boldsymbol{\beta} \right\} $$

Where $\lambda$ is the regularization parameter and $\boldsymbol{k}_{i}$ is the $i$-th row / column of the Kernel Matrix (which is symmetric).

It can be formulated as:

$$ \arg \min_{\boldsymbol{\beta}} \underbrace{\frac{\lambda}{2} \boldsymbol{\beta}^{T} \boldsymbol{K} \boldsymbol{\beta}}_{g \left( \cdot \right)} + h \left( \boldsymbol{z} \right), \; \text{ subject to } \boldsymbol{A} \boldsymbol{\beta} = \boldsymbol{z} $$

Where

  • $h \left( \boldsymbol{z} \right) = \sum_{i} \max \left\{ 0, 1 - {z}_{i} \right\}$ - The Hinge Loss.
  • $\boldsymbol{A} = \operatorname{Diag} \left( \boldsymbol{y} \right) \boldsymbol{K}$ - The ADMM linear operator.

The proximal operators of the two functions are given by (MATLAB sketches follow below):

  • $\operatorname{prox}_{\gamma h \left( \cdot \right)} {\left( \boldsymbol{y} \right) }_{i} = \begin{cases} {y}_{i} + \gamma & \text{ if } \; {y}_{i} < 1 - \gamma \\ 1 & \text{ if } \; 1 - \gamma \leq {y}_{i} \leq 1 \\ {y}_{i} & \text{ if } \; {y}_{i} > 1 \end{cases}$ - The Prox of the Hinge Loss.
  • $\operatorname{prox}_{\rho g \left( \cdot \right)} \left( \boldsymbol{y} \right) = \arg \min_{\boldsymbol{\beta}} \frac{\lambda}{2} \boldsymbol{\beta}^{T} \boldsymbol{K} \boldsymbol{\beta} + \frac{\rho}{2} {\left\| \boldsymbol{A} \boldsymbol{\beta} - \boldsymbol{y} \right\|}_{2}^{2}$ - Solving the regularized Least Squares problem.
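
As a minimal sketch (names are illustrative), the prox of the Hinge Loss is a simple elementwise operation:

    % Elementwise prox of h(x) = sum(max(0, 1 - x)) with step paramGamma (a sketch).
    function [ vX ] = ProxHinge( vV, paramGamma )
        vX = vV;
        vX(vV < 1 - paramGamma)                = vV(vV < 1 - paramGamma) + paramGamma;
        vX((vV >= 1 - paramGamma) & (vV <= 1)) = 1;
    end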

In order to support an arbitrary $\boldsymbol{K}$, one can define it as a Linear Operator, namely by its action on a vector.

The operation of $\operatorname{prox}_{\gamma h \left( \cdot \right)}$ is elementwise and only requires applying the operator to a vector (to form its argument).
The operation of $\operatorname{prox}_{\rho g \left( \cdot \right)}$ can be solved using an iterative solver, which itself only needs products of $\boldsymbol{K}$ with vectors.
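
Concretely, writing $\boldsymbol{v}$ for the prox argument, setting the gradient of the $\operatorname{prox}_{\rho g \left( \cdot \right)}$ objective to zero and using $\boldsymbol{A} = \operatorname{Diag} \left( \boldsymbol{y} \right) \boldsymbol{K}$ with ${y}_{i} \in \left\{ -1, 1 \right\}$ gives the linear system $\left( \lambda \boldsymbol{K} + \rho \boldsymbol{K}^{2} \right) \boldsymbol{\beta} = \rho \boldsymbol{K} \operatorname{Diag} \left( \boldsymbol{y} \right) \boldsymbol{v}$, which only touches $\boldsymbol{K}$ through products with vectors. A sketch with MATLAB's pcg(), assuming a function handle hK(vV) which applies $\boldsymbol{K}$ to a vector (vY are the labels, vV is the prox argument; all names are illustrative):

    % Solve (lambda * K + rho * K^2) * vBeta = rho * K * (vY .* vV) using CG,
    % touching K only through matrix-vector products (function handle hK).
    hOpA  = @(vB) paramLambda * hK(vB) + paramRho * hK(hK(vB));
    vRhs  = paramRho * hK(vY .* vV);
    vBeta = pcg(hOpA, vRhs, 1e-6, 250, [], [], vBeta);  % Warm start from the previous vBeta
    % If K is rank deficient, one may add a tiny ridge term inside hOpA.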

Julia offers some abstractions which might make this feasible.
Yet getting it into a performant form will probably require significant effort.

Remark: It is tricky to do with MATLAB, as you need to work with the Linear Operator defined by $\boldsymbol{K}$, which quadprog() does not support. Hence you'd need to build your own solver.
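
For illustration, a rough skeleton of what such a hand-built solver could look like (scaled form ADMM; a sketch only, assuming the ProxHinge() helper and the function handle hK() sketched above, labels vY in $\left\{ -1, 1 \right\}$, and illustrative parameter names):

    % Scaled form ADMM skeleton for: min g(beta) + h(z) s.t. A * beta = z, A = Diag(y) * K.
    hA    = @(vV) vY .* hK(vV);         % Applies A to a vector
    vBeta = zeros(length(vY), 1);
    vZ    = zeros(length(vY), 1);
    vU    = zeros(length(vY), 1);
    for ii = 1:numIterations
        % beta update: argmin (lambda/2) * b' * K * b + (rho/2) * ||A * b - (vZ - vU)||^2
        hOpA  = @(vB) paramLambda * hK(vB) + paramRho * hK(hK(vB));
        vRhs  = paramRho * hK(vY .* (vZ - vU));
        vBeta = pcg(hOpA, vRhs, 1e-6, 250, [], [], vBeta);
        % z update: prox of the hinge loss with step 1 / rho
        vZ = ProxHinge(hA(vBeta) + vU, 1 / paramRho);
        % Scaled dual update
        vU = vU + hA(vBeta) - vZ;
    end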


Another approach is to use gradient based methods.
In this approach it is assumed we can calculate the matrix-vector product for any vector $\boldsymbol{v}$:

$$ \boldsymbol{K} \boldsymbol{v} $$

This can be done given the Kernel Function $k \left( \boldsymbol{x}_{i}, \boldsymbol{x}_{j} \right)$:

$$ \boldsymbol{u} = \boldsymbol{K} \boldsymbol{v} \Rightarrow {u}_{i} = \sum_{j} k \left( \boldsymbol{x}_{i}, \boldsymbol{x}_{j} \right) {v}_{j} $$

The methods below are limited by how efficiently one can calculate the above.
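
For illustration, a sketch of such a matrix-vector product for a Gaussian kernel, computed block by block so that $\boldsymbol{K}$ is never formed in full (mX holds the samples as columns; paramSigma and the block size are illustrative):

    % Computes vU = K * vV for k(xi, xj) = exp(-||xi - xj||^2 / (2 * paramSigma^2)),
    % one block of rows of K at a time (K is never formed in full).
    function [ vU ] = ApplyKernel( mX, vV, paramSigma, blockSize )
        numSamples = size(mX, 2);           % mX is d x n, samples as columns
        vU         = zeros(numSamples, 1);
        vSqNorm    = sum(mX .^ 2, 1).';     % n x 1 squared norms of the samples
        for firstIdx = 1:blockSize:numSamples
            lastIdx = min(firstIdx + blockSize - 1, numSamples);
            vIdx    = firstIdx:lastIdx;
            % Pairwise squared distances between the block and all samples
            mD = vSqNorm(vIdx) + vSqNorm.' - 2 * (mX(:, vIdx).' * mX);
            vU(vIdx) = exp(-mD / (2 * paramSigma ^ 2)) * vV;
        end
    end

The methods below then only interact with $\boldsymbol{K}$ through a handle such as hK = @(vV) ApplyKernel(mX, vV, paramSigma, 2000);.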

The original objective, formulated as:

$$ \arg \min_{\boldsymbol{\beta}} \frac{\lambda}{2} \boldsymbol{\beta}^{\top} \boldsymbol{K} \boldsymbol{\beta} + \sum_{i = 1}^{n} \max \left\{ 0, 1 - {y}_{i} \left( \boldsymbol{k}_{i}^{\top} \boldsymbol{\beta} + b \right) \right\} $$

is not smooth. It may be solved using the Subgradient Method, yet it will be relatively slow.
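
For reference, a single subgradient step touches $\boldsymbol{K}$ only through hK() above (a sketch only; stepSize, valB for the bias and the other names are illustrative):

    % One subgradient step on the non smooth hinge objective (sketch only).
    vKBeta   = hK(vBeta);                              % K * beta
    vInd     = double(vY .* (vKBeta + valB) < 1);      % Margin violating samples
    vGradB   = paramLambda * vKBeta - hK(vY .* vInd);  % Subgradient w.r.t. beta
    gradBias = -sum(vY .* vInd);                       % Subgradient w.r.t. the bias
    vBeta    = vBeta - stepSize * vGradB;
    valB     = valB  - stepSize * gradBias;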

One can formulate the problem with the Squared Hinge Loss:

$$ \arg \min_{\boldsymbol{\beta}} \frac{\lambda}{2} \boldsymbol{\beta}^{\top} \boldsymbol{K} \boldsymbol{\beta} + \frac{1}{2} \sum_{i = 1}^{n} {\left( \max \left\{ 0, 1 - {y}_{i} \left( \boldsymbol{k}_{i}^{\top} \boldsymbol{\beta} + b \right) \right\} \right)}^{2} $$

The above is a smooth objective, for which one can calculate the Hessian and apply the Newton Method to solve it.
The idea is described in Olivier Chapelle - Training a Support Vector Machine in the Primal.
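
To illustrate that only products with $\boldsymbol{K}$ are needed, here is a sketch of the gradient of the squared hinge objective together with a plain gradient step (Chapelle uses a Newton step instead; hK() and the other names are illustrative):

    % Gradient of (lambda/2) * beta' * K * beta + (1/2) * sum(max(0, 1 - y .* (K * beta + b)).^2)
    vKBeta   = hK(vBeta);
    vM       = max(0, 1 - vY .* (vKBeta + valB));      % Squared hinge margins
    vGradB   = paramLambda * vKBeta - hK(vY .* vM);    % Gradient w.r.t. beta
    gradBias = -sum(vY .* vM);                         % Gradient w.r.t. the bias
    vBeta    = vBeta - stepSize * vGradB;              % A plain gradient step
    valB     = valB  - stepSize * gradBias;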

I implemented it without a predefined kernel matrix.
Namely, whenever the algorithm requires a multiplication by $\boldsymbol{K}$, I used the calculation described above. It is not fast, but it works, potentially, for any number of samples, and it was verified against a Disciplined Convex Programming (DCP) solver.

An optimized variant, with smart selection of a subset of the problem, is given in S. Sathiya Keerthi, Olivier Chapelle, Dennis DeCoste - Building Support Vector Machines with Reduced Classifier Complexity.


The code is available on my StackExchange Code GitHub Repository (Look at the Mathematics\Q2246150 folder).
