Another approach is using Gradient based methods.
In this approach it is assumed we can calculate, for some vector $\boldsymbol{v}$, the Matrix Vector product:
$$ \boldsymbol{K} \boldsymbol{v} $$
This can be done using only the Kernel Function $k \left( \boldsymbol{x}_{i}, \boldsymbol{x}_{j} \right)$:
$$ \boldsymbol{u} = \boldsymbol{K} \boldsymbol{v} \Rightarrow {u}_{i} = \sum_{j} k \left( \boldsymbol{x}_{i}, \boldsymbol{x}_{j} \right) {v}_{j} $$
The methods below are limited by how efficiently one can calculate the above.
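To make this concrete, here is a minimal sketch (Python / NumPy, with hypothetical names such as `kernel_matvec`) of computing $\boldsymbol{u} = \boldsymbol{K} \boldsymbol{v}$ directly from the Kernel Function, block by block, without ever forming the full $n \times n$ matrix:

```python
import numpy as np

def kernel_matvec(X, v, kernel, block_size=512):
    """Return K @ v where K[i, j] = kernel(X[i], X[j]), built block by block."""
    n = X.shape[0]
    u = np.zeros(n)
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        # Only a (block_size x n) slice of K is ever held in memory.
        K_block = np.array([[kernel(X[i], X[j]) for j in range(n)]
                            for i in range(start, stop)])
        u[start:stop] = K_block @ v
    return u

# Example kernel: Gaussian (RBF).
rbf = lambda x, z, gamma=0.5: np.exp(-gamma * np.sum((x - z) ** 2))
```

For a specific kernel (such as the RBF above) the inner double loop can of course be vectorized; the point is only that the memory cost is $\mathcal{O} \left( n \cdot \text{block\_size} \right)$ instead of $\mathcal{O} \left( {n}^{2} \right)$.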
The original objective, formulated as:
$$ \arg \min_{\boldsymbol{\beta}} \frac{\lambda}{2} \boldsymbol{\beta}^{\top} \boldsymbol{K} \boldsymbol{\beta} + \sum_{i = 1}^{n} \max \left\{ 0, 1 - {y}_{i} \left( \boldsymbol{k}_{i}^{\top} \boldsymbol{\beta} + b \right) \right\} $$
is not smooth. It may be solved using the Sub Gradient Method, yet convergence will be relatively slow.
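For illustration only, a minimal sketch of a single Sub Gradient step on this objective (the names are hypothetical, and it reuses the matrix free `kernel_matvec` routine sketched above):

```python
import numpy as np

def subgradient_step(beta, b, X, y, lam, kernel, step):
    """One subgradient step on the (non smooth) Hinge Loss objective."""
    Kb = kernel_matvec(X, beta, kernel)      # K @ beta
    margins = y * (Kb + b)                   # y_i (k_i^T beta + b)
    active = margins < 1.0                   # samples with a positive hinge loss
    # A subgradient of the objective with respect to beta and b.
    g_beta = lam * Kb - kernel_matvec(X, active * y, kernel)
    g_b = -np.sum(y[active])
    return beta - step * g_beta, b - step * g_b
```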
One can formulate the problem with the Squared Hinge Loss:
$$ \arg \min_{\boldsymbol{\beta}} \frac{\lambda}{2} \boldsymbol{\beta}^{\top} \boldsymbol{K} \boldsymbol{\beta} + \frac{1}{2} \sum_{i = 1}^{n} {\left( \max \left\{ 0, 1 - {y}_{i} \left( \boldsymbol{k}_{i}^{\top} \boldsymbol{\beta} + b \right) \right\} \right)}^{2} $$
The above is a smooth (differentiable) objective, so one can compute its (generalized) Hessian and apply the Newton Method to solve it.
The idea is described in Olivier Chapelle - Training a Support Vector Machine in the Primal.
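In the spirit of that paper (this is not Chapelle's exact algorithm), here is a minimal sketch of the Newton iteration on the Squared Hinge Loss primal. The names are hypothetical, the bias term is dropped for brevity, and $\boldsymbol{K}$ is assumed precomputed, so it is only suitable for small $n$:

```python
import numpy as np

def newton_squared_hinge(K, y, lam, n_iter=50, tol=1e-8):
    """Generalized Newton iterations on the Squared Hinge Loss primal (no bias)."""
    n = K.shape[0]
    beta = np.zeros(n)
    for _ in range(n_iter):
        t = K @ beta                          # model outputs k_i^T beta
        sv = y * t < 1.0                      # active (margin violating) set
        # Gradient: lam * K beta + sum_{i in sv} k_i (t_i - y_i).
        grad = lam * t + K[:, sv] @ (t[sv] - y[sv])
        # Generalized Hessian: lam * K + sum_{i in sv} k_i k_i^T.
        H = lam * K + K[:, sv] @ K[sv, :]
        # Newton direction (a small jitter keeps the solve well posed).
        d = np.linalg.solve(H + 1e-10 * np.eye(n), grad)
        beta = beta - d
        if np.linalg.norm(d) < tol:
            break
    return beta
```

In Chapelle's formulation the linear system is effectively solved only over the current active set, which is what makes the method efficient; the full space solve above is kept only to keep the sketch short.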
I implemented it without pre computing the Kernel matrix.
Namely, whenever the algorithm requires a multiplication by the Kernel matrix, I use the calculation described above. It is not fast, but it works, potentially, for any number of samples, and the result was verified against a Disciplined Convex Programming (DCP) solver.
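For reference, a sketch of the kind of DCP check one might use (here with CVXPY; the function name is hypothetical, and $\boldsymbol{K}$ is precomputed since this is only meant for small sanity check problems):

```python
import numpy as np
import cvxpy as cp

def solve_squared_hinge_dcp(K, y, lam):
    """Reference solution of the Squared Hinge Loss primal via a DCP solver."""
    n = K.shape[0]
    beta = cp.Variable(n)
    b = cp.Variable()
    margins = cp.multiply(y, K @ beta + b)
    loss = 0.5 * cp.sum_squares(cp.pos(1 - margins))
    # A small jitter keeps quad_form's PSD check happy for a numerically noisy K.
    reg = 0.5 * lam * cp.quad_form(beta, K + 1e-6 * np.eye(n))
    cp.Problem(cp.Minimize(reg + loss)).solve()
    return beta.value, b.value
```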
An optimized variant, with a smart selection of a subset of the basis functions, is given in S. Sathiya Keerthi, Olivier Chapelle, Dennis DeCoste - Building Support Vector Machines with Reduced Classifier Complexity.
The code is available on my StackExchange Code GitHub Repository (Look at the Mathematics\Q2246150 folder).