
I'm trying to manually implement the basic scikit-learn SVM classifier using the Gram (kernel) matrix $K$. The mathematical formulation is the following:

\begin{align}
\min_{\alpha\in\mathbb{R}^n} \quad & \frac{1}{2} \alpha^T Q \alpha - e^T \alpha \\
\text{s.t.} \quad & y^T \alpha = 0 \\
& 0 \leq \alpha_i \leq C, \quad \forall i = 1,\dots,n \\
\text{with} \quad & Q_{ij} \equiv y_i y_j K_{ij}
\end{align}

Using cvxopt for the convex optimization (see the cvxopt documentation for more information), I compare the dual coefficients of the two approaches, which should be the same, but they differ and I don't know what I have done wrong.

Below is a code sample that reproduces the issue:

import numpy as np
import cvxopt
from sklearn import svm

############### DATA SIMULATION ###############

X = np.array([-5, 1, -9, 2, 0, 4, -0.1])
Y = np.array([-1, 1, 1, 1, 1, 1, -1])
n = X.shape[0]
K = np.outer(X, X)

################# SKLEARN SVM #################

# train the sklearn SVM
C = 100
clf = svm.SVC(kernel='precomputed', C=C)
clf.fit(K, Y)

# retrieve lagrangian multipliers
alpha_sk = np.zeros(n)
alpha_sk[clf.support_] = clf.dual_coef_

################# MANUAL SVM ##################

# QP objective function parameters
P = K * np.outer(Y, Y)
q = - np.ones(n)
G = np.zeros((2*n, n))
G[0:n, :] = - np.eye(n)
G[n:2*n, :] = np.eye(n)
h = np.zeros((2*n, 1))
h[0:n, 0] = 0
h[n:2*n, 0] = C
A = Y.reshape(1, n)
b = np.array([0])

# convert all matrices to cvxopt matrices
P_qp = cvxopt.matrix(P.astype(np.double))
q_qp = cvxopt.matrix(q.astype(np.double))
G_qp = cvxopt.matrix(G.astype(np.double))
h_qp = cvxopt.matrix(h.astype(np.double))
A_qp = cvxopt.matrix(A.astype(np.double))
b_qp = cvxopt.matrix(b.astype(np.double))

# solve
solution = cvxopt.solvers.qp(P_qp, q_qp, G_qp, h_qp, A_qp, b_qp)

# retrieve lagrangian multipliers
alpha_manual = Y * np.array(solution['x']).flatten()
alpha_manual[np.abs(alpha_manual) < 1e-3] = 0

################### RESULTS ###################

print(np.concatenate((alpha_manual.reshape(n, 1), alpha_sk.reshape(n, 1)), axis=1))

Which outputs:

[[ -99.99999907 -100.        ]
 [  35.03082601   39.00000002]
 [  75.37828732   60.99999998]
 [  28.79243995    0.        ]
 [  41.85122172  100.        ]
 [  18.94722123    0.        ]
 [ -99.99999716 -100.        ]]
  • @RodrigodeAzevedo yes $Q$ is positive definite because $K$ is. Do you mean replacing P = K * np.outer(Y, Y) with P = np.multiply(K, np.outer(Y,Y))? I might be wrong but they seem equivalent to me. Commented Feb 22, 2017 at 11:16
  • $\rm K$ may be PD, but $\rm y y^{\top}$ is only PSD. Commented Feb 22, 2017 at 11:29
  • @RodrigodeAzevedo, I tried both P1 = K * np.outer(Y, Y), P2 = np.multiply(K, np.outer(Y, Y)), print(np.abs(P1-P2).sum().sum()) and I get 0, so I'm sure they are the same (see the check sketched after these comments). Commented Feb 22, 2017 at 11:37
  • Struggling with the same issue right now. Did you resolve your confusion? Commented Jul 17, 2023 at 14:38
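A minimal check, using the toy data from the question, of the points raised in the comments: whether the two expressions for P agree, and whether $Q$ is strictly positive definite. This is only a sketch for illustration.

import numpy as np

# Toy data from the question
X = np.array([-5, 1, -9, 2, 0, 4, -0.1])
Y = np.array([-1, 1, 1, 1, 1, 1, -1])
K = np.outer(X, X)                       # linear kernel on 1D data, rank 1

# For NumPy ndarrays, * is already element-wise, so both forms agree
P1 = K * np.outer(Y, Y)
P2 = np.multiply(K, np.outer(Y, Y))
print(np.abs(P1 - P2).max())             # 0.0

# K (and hence Q) is rank 1 here, i.e. positive semi-definite but not positive definite
print(np.linalg.eigvalsh(P1))            # one positive eigenvalue, the rest numerically zero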

1 Answer


You may use a simpler formulation of the Kernel SVM (Primal Kernel SVM):

$$ \arg \min_{\boldsymbol{\beta}, b} \frac{1}{2} \boldsymbol{\beta}^{T} \boldsymbol{K} \boldsymbol{\beta} + C \sum_{i = 1}^{n} \max \left\{ 0, 1 - {y}_{i} \left( \boldsymbol{k}_{i}^{T} \boldsymbol{\beta} + b\right) \right\} $$

Where

  • ${K}_{i, j} = k \left( \boldsymbol{x}_{i}, \boldsymbol{x}_{j} \right)$ - The kernel matrix, with $\boldsymbol{K} \in \mathbb{S}_{+}^{n}$.
  • $\boldsymbol{k}_{i}$ - The $i$-th column / row of $\boldsymbol{K}$.
  • $C$ - The (inverse) regularization factor.

This formulation matches SciKit Learn's implementation.
It can be written and solved using CVXPY.

import numpy as np
import cvxpy as cp
from typing import Tuple

def KernelSVMCVXPY( mK: np.ndarray, vY: np.ndarray, /, *, paramC: float = 1.0 ) -> Tuple[np.ndarray, float]:
    """
    Solves the primal kernel SVM problem using CVXPY.
    Parameters:
      - mK: Kernel matrix of shape (numSamples, numSamples).
      - vY: Class labels of shape (numSamples,).
      - paramC: Regularization parameter.
    Returns:
      - vβ: Optimization variables.
      - b: Bias term.
    """
    numSamples = mK.shape[0]

    # Define the optimization variables
    vβ     = cp.Variable(numSamples)
    paramB = cp.Variable(1)

    # Define the objective function
    hingeLoss = cp.sum(cp.pos(1 - cp.multiply(vY, mK @ vβ + paramB)))
    # hingeLoss = cp.sum([cp.pos(1 - vY[ii] * (mK[ii] @ vβ + paramB)) for ii in range(numSamples)])
    cpObjFun = cp.Minimize(0.5 * cp.quad_form(vβ, mK, assume_PSD = True) + (paramC * hingeLoss))

    # Define the constraints (none are needed for this formulation)
    cpConst = []

    # Define the problem
    oCvxPrb = cp.Problem(cpObjFun, cpConst)

    # Solve the problem
    # oCvxPrb.solve(solver = cp.SCS, verbose = False)
    oCvxPrb.solve(solver = cp.CLARABEL, verbose = False)

    return vβ.value, paramB.value[0].item()
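As a usage sketch, the function above can be compared against SciKit Learn's SVC with a precomputed linear kernel along these lines (the synthetic data and the C value below are placeholders, not the tutorial's data set):

import numpy as np
from sklearn import svm

# Hypothetical synthetic data, just for the sketch
rng = np.random.default_rng(0)
mX  = rng.standard_normal((40, 2))
vY  = np.where(mX[:, 0] + mX[:, 1] > 0, 1.0, -1.0)

# Precomputed linear kernel
mK = mX @ mX.T

# Primal kernel SVM via CVXPY (function defined above)
vB, valB = KernelSVMCVXPY(mK, vY, paramC = 1.0)

# Reference: SciKit Learn's SVC with the same precomputed kernel
oSVC = svm.SVC(kernel = 'precomputed', C = 1.0).fit(mK, vY)

# The decision values on the training set should agree up to solver tolerance
vD1 = mK @ vB + valB
vD2 = oSVC.decision_function(mK)
print(np.abs(vD1 - vD2).max())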

To verify the implementation, I used the data from the tutorial SciKit Learn - Plot classification boundaries with different SVM Kernels.

I always used a precomputed kernel.
For the linear case:

[Figure: decision boundaries for the linear kernel]

The red line shows this implementation, which matches SciKit Learn's SVC class perfectly.

For the RBF kernel one can only plot the decision function value:

[Figures: decision function values for the RBF kernel, CVXPY implementation and SciKit Learn's SVC]

They also match perfectly.
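As a sketch of how the decision values could be evaluated on a grid with a precomputed RBF kernel (the γ value, the grid range, and the data mX, vY are placeholders following the usage sketch above):

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

paramGamma = 0.5                                 # placeholder value

# Train on a precomputed RBF kernel (function defined above)
mK = rbf_kernel(mX, mX, gamma = paramGamma)
vB, valB = KernelSVMCVXPY(mK, vY, paramC = 1.0)

# Evaluate the decision function on a grid
vG = np.linspace(-3.0, 3.0, 201)
mXX, mYY = np.meshgrid(vG, vG)
mGrid = np.column_stack((mXX.ravel(), mYY.ravel()))

# Cross kernel between the grid points and the training samples
mKG = rbf_kernel(mGrid, mX, gamma = paramGamma)  # shape (numGrid, numSamples)
mD  = (mKG @ vB + valB).reshape(mXX.shape)       # decision function values

# mD can be compared to SVC(kernel = 'precomputed', C = 1.0).fit(mK, vY).decision_function(mKG)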

Related to Gradient Descent for Primal Kernel SVM with Soft Margin (Hinge) Loss.


The code is available on my StackExchange Code GitHub Repository (Look at the Mathematics\Q2154648 folder).

