
I'm trying to manually implement the basic scikit-learn SVM classifier using the Gram (kernel) matrix $K$. The mathematical formulation is the following:

\begin{align}
\min_{\alpha\in\mathbb{R}^n} \quad & \frac{1}{2} \alpha^T Q \alpha - e^T \alpha \\
\text{s.t.} \quad & y^T \alpha = 0 \\
& 0 \leq \alpha_i \leq C, \quad \forall i = 1,\dots,n \\
\text{with} \quad & Q_{ij} \equiv y_i y_j K_{ij}
\end{align}

Using cvxopt for the convex optimization (see the cvxopt documentation for more information), I compare the dual coefficients of the two approaches, which should be the same, but they differ and I don't know what I have done wrong.

Below is a code sample that reproduces the issue:

import numpy as np
import cvxopt
from sklearn import svm

############### DATA SIMULATION ###############

X = np.array([-5, 1, -9, 2, 0, 4, -0.1])
Y = np.array([-1, 1, 1, 1, 1, 1, -1])
n = X.shape[0]
K = np.outer(X, X)

################# SKLEARN SVM #################

# train the sklearn SVM
C = 100
clf = svm.SVC(kernel='precomputed', C=C)
clf.fit(K, Y)

# retrieve lagrangian multipliers
alpha_sk = np.zeros(n)
alpha_sk[clf.support_] = clf.dual_coef_

################# MANUAL SVM ##################

# QP objective function parameters
P = K * np.outer(Y, Y)
q = - np.ones(n)
G = np.zeros((2*n, n))
G[0:n, :] = - np.eye(n)
G[n:2*n, :] = np.eye(n)
h = np.zeros((2*n, 1))
h[0:n, 0] = 0
h[n:2*n, 0] = C
A = Y.reshape(1, n)
b = np.array([0])

# convert all matrices to cvxopt matrices
P_qp = cvxopt.matrix(P.astype(np.double))
q_qp = cvxopt.matrix(q.astype(np.double))
G_qp = cvxopt.matrix(G.astype(np.double))
h_qp = cvxopt.matrix(h.astype(np.double))
A_qp = cvxopt.matrix(A.astype(np.double))
b_qp = cvxopt.matrix(b.astype(np.double))

# solve
solution = cvxopt.solvers.qp(P_qp, q_qp, G_qp, h_qp, A_qp, b_qp)

# retrieve lagrangian multipliers
alpha_manual = Y * np.array(solution['x']).flatten()
alpha_manual[np.abs(alpha_manual) < 1e-3] = 0

################### RESULTS ###################

print(np.concatenate((alpha_manual.reshape(n, 1), alpha_sk.reshape(n, 1)), axis=1))

Which outputs:

[[ -99.99999907 -100.        ]
 [  35.03082601   39.00000002]
 [  75.37828732   60.99999998]
 [  28.79243995    0.        ]
 [  41.85122172  100.        ]
 [  18.94722123    0.        ]
 [ -99.99999716 -100.        ]]
  • @RodrigodeAzevedo yes $Q$ is positive definite because $K$ is. Do you mean replacing P = K * np.outer(Y, Y) with P = np.multiply(K, np.outer(Y,Y))? I might be wrong but they seem equivalent to me. Commented Feb 22, 2017 at 11:16
  • $\rm K$ may be PD, but $\rm y y^{\top}$ is only PSD. Commented Feb 22, 2017 at 11:29
  • @RodrigodeAzevedo, I tried both P1 = K * np.outer(Y, Y), P2 = np.multiply(K, np.outer(Y, Y)), print(np.abs(P1-P2).sum().sum()) and I get 0, so I'm sure they are the same (see the check sketched after these comments). Commented Feb 22, 2017 at 11:37
  • Struggling with the same issue right now. Did you resolve your confusion? Commented Jul 17, 2023 at 14:38
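A minimal check, using the toy data from the question, of the points raised in the comments: whether the two expressions for P agree, and whether $Q$ is strictly positive definite. This is only a sketch for illustration.

import numpy as np

# Toy data from the question
X = np.array([-5, 1, -9, 2, 0, 4, -0.1])
Y = np.array([-1, 1, 1, 1, 1, 1, -1])
K = np.outer(X, X)                       # linear kernel on 1D data, rank 1

# For NumPy ndarrays, * is already element-wise, so both forms agree
P1 = K * np.outer(Y, Y)
P2 = np.multiply(K, np.outer(Y, Y))
print(np.abs(P1 - P2).max())             # 0.0

# K (and hence Q) is rank 1 here, i.e. positive semi-definite but not positive definite
print(np.linalg.eigvalsh(P1))            # one positive eigenvalue, the rest numerically zero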

1 Answer


You may use a simpler formulation of the Kernel SVM (Primal Kernel SVM):

$$ \arg \min_{\boldsymbol{\beta}, b} \frac{1}{2} \boldsymbol{\beta}^{T} \boldsymbol{K} \boldsymbol{\beta} + C \sum_{i = 1}^{n} \max \left\{ 0, 1 - {y}_{i} \left( \boldsymbol{k}_{i}^{T} \boldsymbol{\beta} + b\right) \right\} $$

Where

  • ${K}_{i, j} = k \left( \boldsymbol{x}_{i}, \boldsymbol{x}_{j} \right)$ - The kernel matrix, with $\boldsymbol{K} \in \mathbb{S}_{+}^{n}$.
  • $\boldsymbol{k}_{i}$ - The $i$-th column / row of $\boldsymbol{K}$.
  • $C$ - The (inverse) regularization factor.

This formulation matches SciKit Learn's implementation.
It can be written and solved using CVXPY.

import numpy as np
import cvxpy as cp
from typing import Tuple

def KernelSVMCVXPY( mK: np.ndarray, vY: np.ndarray, /, *, paramC: float = 1.0 ) -> Tuple[np.ndarray, float]:
    """
    Solves the primal kernel SVM problem using CVXPY.
    Parameters:
      - mK: Kernel matrix of shape (numSamples, numSamples).
      - vY: Class labels of shape (numSamples,).
      - paramC: Regularization parameter.
    Returns:
      - vβ: Optimization variables.
      - b: Bias term.
    """
    numSamples = mK.shape[0]

    # Define the optimization variables
    vβ     = cp.Variable(numSamples)
    paramB = cp.Variable(1)

    # Define the objective function
    hingeLoss = cp.sum(cp.pos(1 - cp.multiply(vY, mK @ vβ + paramB)))
    # hingeLoss = cp.sum([cp.pos(1 - vY[ii] * (mK[ii] @ vβ + paramB)) for ii in range(numSamples)])
    cpObjFun = cp.Minimize(0.5 * cp.quad_form(vβ, mK, assume_PSD = True) + (paramC * hingeLoss))

    # Define the constraints (none are needed for this formulation)
    cpConst = []

    # Define the problem
    oCvxPrb = cp.Problem(cpObjFun, cpConst)

    # Solve the problem
    # oCvxPrb.solve(solver = cp.SCS, verbose = False)
    oCvxPrb.solve(solver = cp.CLARABEL, verbose = False)

    return vβ.value, paramB.value[0].item()
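As a usage sketch, the function above can be compared against SciKit Learn's SVC with a precomputed linear kernel along these lines (the synthetic data and the C value below are placeholders, not the tutorial's data set):

import numpy as np
from sklearn import svm

# Hypothetical synthetic data, just for the sketch
rng = np.random.default_rng(0)
mX  = rng.standard_normal((40, 2))
vY  = np.where(mX[:, 0] + mX[:, 1] > 0, 1.0, -1.0)

# Precomputed linear kernel
mK = mX @ mX.T

# Primal kernel SVM via CVXPY (function defined above)
vB, valB = KernelSVMCVXPY(mK, vY, paramC = 1.0)

# Reference: SciKit Learn's SVC with the same precomputed kernel
oSVC = svm.SVC(kernel = 'precomputed', C = 1.0).fit(mK, vY)

# The decision values on the training set should agree up to solver tolerance
vD1 = mK @ vB + valB
vD2 = oSVC.decision_function(mK)
print(np.abs(vD1 - vD2).max())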

To verify the implementation, I used the data from the tutorial SciKit Learn - Plot classification boundaries with different SVM Kernels.

I always used a precomputed kernel.
For the linear case:

[Figure: decision boundaries for the linear kernel]

The red line shows this implementation, which matches SciKit Learn's SVC class perfectly.

For the RBF kernel one can only plot the decision function value:

[Figures: decision function values for the RBF kernel, CVXPY implementation and SciKit Learn's SVC]

They also match perfectly.
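As a sketch of how the decision values could be evaluated on a grid with a precomputed RBF kernel (the γ value, the grid range, and the data mX, vY are placeholders following the usage sketch above):

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

paramGamma = 0.5                                 # placeholder value

# Train on a precomputed RBF kernel (function defined above)
mK = rbf_kernel(mX, mX, gamma = paramGamma)
vB, valB = KernelSVMCVXPY(mK, vY, paramC = 1.0)

# Evaluate the decision function on a grid
vG = np.linspace(-3.0, 3.0, 201)
mXX, mYY = np.meshgrid(vG, vG)
mGrid = np.column_stack((mXX.ravel(), mYY.ravel()))

# Cross kernel between the grid points and the training samples
mKG = rbf_kernel(mGrid, mX, gamma = paramGamma)  # shape (numGrid, numSamples)
mD  = (mKG @ vB + valB).reshape(mXX.shape)       # decision function values

# mD can be compared to SVC(kernel = 'precomputed', C = 1.0).fit(mK, vY).decision_function(mKG)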

Related to Gradient Descent for Primal Kernel SVM with Soft Margin (Hinge) Loss.


The code is available on my StackExchange Code GitHub Repository (Look at the Mathematics\Q2154648 folder).

