I am implementing total variation deconvolution, and the paper by Wang et al.$^\color{magenta}{\dagger}$ contains a step I do not quite understand. The objective function is, in essence,
$$\min_u\sum_{i=1}^{n^2} \| D_i u \|_2 + f(u)$$
where $D_i u$ is the discrete gradient of a grayscale image $u$ at pixel $i$. They say "at each pixel an auxiliary variable $w_i\in\mathbb{R}^2$ is introduced to transfer $D_iu$ out of the non-differentiable term $\| \cdot \|_2$" and they write
$$\min_{w,u}\sum_i\|w_i\|_2+\frac{\beta}{2}\sum_i\|w_i-D_iu\|_2^2+f(u)$$
However, the Euclidean norm is not differentiable at the origin, so the term $\sum\limits_i\|w_i\|_2$ is still non-differentiable wherever some $w_i = 0$. The authors nevertheless use proximal gradient methods on this formulation.
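For concreteness, here is a minimal sketch of the $w$-update as I understand it. With $u$ held fixed, the split problem separates over pixels, and each $w_i$ minimizes $\|w_i\|_2+\frac{\beta}{2}\|w_i-D_iu\|_2^2$, which has the closed-form two-dimensional shrinkage solution $w_i=\max\{\|D_iu\|_2-1/\beta,\,0\}\,D_iu/\|D_iu\|_2$ (with $w_i=0$ when $D_iu=0$). The function name `shrink2d`, the array layout (one row per pixel), and the sample values are my own assumptions for illustration, not code from the paper.

```python
import numpy as np

def shrink2d(v, beta):
    """Row-wise minimizer of ||w||_2 + (beta/2) * ||w - v||_2^2.

    v has shape (num_pixels, 2); row i plays the role of D_i u.
    """
    norms = np.linalg.norm(v, axis=-1, keepdims=True)
    # Shrink each row's length by 1/beta, clamping at zero.
    # The tiny constant only guards against 0/0 when a row is exactly zero.
    scale = np.maximum(norms - 1.0 / beta, 0.0) / np.maximum(norms, np.finfo(float).tiny)
    return scale * v

# Hypothetical gradient values at three pixels.
v = np.array([[3.0, 4.0], [0.1, 0.0], [0.0, 0.0]])
w = shrink2d(v, beta=2.0)
print(w)
```

Rows whose norm is at most $1/\beta$ are mapped exactly to zero, so the update never needs a derivative of $\|\cdot\|_2$ at the origin.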
$\color{magenta}{\dagger}$ Yilun Wang, Junfeng Yang, Wotao Yin, Yin Zhang, A new alternating minimization algorithm for total variation image reconstruction, SIAM Journal on Imaging Sciences, Volume 1, Issue 3, 2008.