I am going to give an answer based on the Gaussian kernel $$K(x,y) = \exp(- \gamma \|x-y \|^2).$$ First, note that any positive semi-definite (PSD) kernel is associated with a Hilbert space $H$ into which it implicitly maps points via a feature map $\phi$. This feature map satisfies $$K(x,y) = \langle \phi(x), \phi(y) \rangle$$ for all $x,y \in \mathbb{R}^d$. These Hilbert spaces are in fact reproducing kernel Hilbert spaces (RKHS), on which there is a vast literature; you could take a look at Wikipedia if you want.
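If it helps to see this concretely, here is a minimal numpy sketch (the helper `gaussian_kernel` and the sample data are my own) that builds the Gram matrix $K_{ij} = K(x_i, x_j)$ and checks that it is PSD:

```python
import numpy as np

def gaussian_kernel(X, Y, gamma=1.0):
    """Gram matrix with entries exp(-gamma * ||X[i] - Y[j]||^2)."""
    sq_dists = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))  # 10 points in R^3
K = gaussian_kernel(X, X)

# PSD: all eigenvalues of the symmetric Gram matrix are >= 0
# (up to floating-point noise).
print(np.linalg.eigvalsh(K).min() >= -1e-10)  # True
```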
The Gaussian kernel has the nice property of being universal, which implies that for any two compact, disjoint sets $A,B \subset \mathbb{R}^d$ there exists some $w \in H$ such that $$\operatorname{sgn}(\langle w , \phi(x) \rangle) = 1_A(x) - 1_B(x) \quad \text{for all } x \in A \cup B.$$ Note that this means we have linearly separated $A$ and $B$ in the space $H$, which is what you wanted (just choose $A,B$ as the subsets of your dataset corresponding to whatever labels your points have).
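As a quick sanity check of this separation claim (the cluster locations and the parameters `gamma=1.0`, `C=1e6` are arbitrary choices of mine), a kernel SVM with the Gaussian/RBF kernel should classify two disjoint point sets perfectly on the training data:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
A = rng.normal(loc=-3.0, size=(50, 2))  # points of A, labeled +1
B = rng.normal(loc=+3.0, size=(50, 2))  # points of B, labeled -1
X = np.vstack([A, B])
y = np.hstack([np.ones(50), -np.ones(50)])

# A very large C approximates a hard margin; the decision function is
# sgn(<w, phi(x)>) for some w in H (plus a bias term).
clf = SVC(kernel="rbf", gamma=1.0, C=1e6).fit(X, y)
print(clf.score(X, y))  # expect 1.0: A and B are separated in H
```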
For the Gaussian kernel, one can show that $H$ is infinite-dimensional. A feature map is given by $$\phi(x) = \exp(- \gamma \|x- \cdot \|^2).$$ Note that we map a point to a function (a Gaussian centered at that very point, to be specific)!
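In code, this feature map is naturally a closure: `phi(x)` returns a function, and evaluating that function at a second point recovers the kernel (the reproducing property), so the inner product in $H$ never has to be computed in infinite dimensions. A sketch with toy values of my own:

```python
import numpy as np

gamma = 1.0

def phi(x):
    """Feature map: x is sent to the function t -> exp(-gamma * ||x - t||^2)."""
    return lambda t: np.exp(-gamma * np.sum((x - t) ** 2))

x = np.array([0.0, 0.0])
y = np.array([1.0, 1.0])

# Evaluating phi(x) at y gives K(x, y), and by the reproducing property
# <phi(x), phi(y)>_H = K(x, y) as well.
print(phi(x)(y))                              # exp(-2.0)
print(np.exp(-gamma * np.sum((x - y) ** 2)))  # same value
```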
So why is the Gaussian kernel universal? There are more elegant proofs of this (using e.g. the Stone–Weierstrass theorem), but if you trust me that for $n$ distinct points $x_1,\dots,x_n$, the vectors $v_i = \phi(x_i)$ are linearly independent in the feature space $H$ (which makes sense, because no Gaussian can be written as a linear combination of Gaussians centered at other points), then there is a quick way to see it:
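You can at least check the linear-independence claim numerically: the $v_i$ are linearly independent exactly when the Gram matrix $K_{ij} = \langle \phi(x_i), \phi(x_j) \rangle = K(x_i, x_j)$ is nonsingular. A sketch with arbitrary points:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 2))  # 10 distinct points x_1, ..., x_10

sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-1.0 * sq)  # Gram matrix K[i, j] = <phi(x_i), phi(x_j)>

# The v_i = phi(x_i) are linearly independent iff K has full rank,
# i.e. its smallest eigenvalue is strictly positive.
print(np.linalg.eigvalsh(K).min() > 0)  # True
```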
Take $A \cup B = \{x_1,\dots,x_n\}$ (the finite dataset case). For the $n$ linearly independent vectors $v_1,\dots,v_n$ we may simply define $$f(v_i) = 1_A(x_i) - 1_B(x_i)$$ and extend $f$ to a linear functional on the linear span of $v_1,\dots,v_n$ in the feature space. This span is finite-dimensional, so $f$ is bounded there, and we can extend it to all of $H$ by letting it vanish on the orthogonal complement. By the Riesz representation theorem, there must hence exist $w \in H$ such that $$f(v_i) = \langle w, \phi(x_i) \rangle,$$ and since $f$ takes the value $+1$ on $A$ and $-1$ on $B$, this $w$ separates the two sets as claimed.
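Concretely, since such a $w$ can be taken in the span, $w = \sum_i \alpha_i \phi(x_i)$, the conditions $\langle w, \phi(x_j) \rangle = f(v_j)$ reduce to the linear system $K \alpha = f$ with the nonsingular Gram matrix $K$. A sketch with made-up labels:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 2))          # x_1, ..., x_n
f = rng.choice([-1.0, 1.0], size=10)  # targets 1_A(x_i) - 1_B(x_i)

sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-1.0 * sq)                 # nonsingular Gram matrix

# With w = sum_i alpha_i phi(x_i) we get <w, phi(x_j)> = (K @ alpha)_j,
# so solving K alpha = f pins w down on the training points.
alpha = np.linalg.solve(K, f)

print(np.allclose(K @ alpha, f))      # True: w reproduces the labels exactly
```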