3
$\begingroup$

Given a differentiable function $f: \mathbb{R}^n \rightarrow \mathbb{R}^n$ and a matrix $A \in \mathbb{R}^{n \times n}$, is there a general formula for the following derivative?

$$ \frac{\partial}{\partial x} \left( f(x)^T A f(x) \right) = \, ? \tag{1} $$

I know that

$$ \frac{\partial}{\partial x} x^T A x = x^T(A + A^T) \overset{A = A^T}{=} 2 x^T A \tag{2} $$

and the solution to $(1)$ will probably look similar to $(2)$, but I am stuck here since I am not sure how to apply the chain rule in the matrix case.

Edit: Regarding notation, we have

$$ \frac{\partial }{\partial x}f(x) = \begin{bmatrix} \frac{\partial}{\partial x_1} f_1(x) & \frac{\partial}{\partial x_2} f_1(x) & \cdots & \frac{\partial}{\partial x_n} f_1(x) \\ \frac{\partial}{\partial x_1} f_2(x) & \frac{\partial}{\partial x_2} f_2(x) & \cdots & \frac{\partial}{\partial x_n} f_2(x) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial}{\partial x_1} f_n(x) & \frac{\partial}{\partial x_2} f_n(x) & \cdots & \frac{\partial}{\partial x_n} f_n(x) \end{bmatrix} $$

and

$$ x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} , f(x) = \begin{bmatrix} f_1(x) \\ f_2(x) \\ \vdots \\ f_n(x) \end{bmatrix} $$

$\endgroup$
2
  • $\begingroup$ Is $\partial/\partial x$ the total differential (or Jacobian, whichever you want to call it)? The expression $f(x)^TAf(x)$ is a product, so you can appeal to the product rule (either a general product rule, or entrywise). $\endgroup$ Commented Feb 26, 2019 at 20:56
  • $\begingroup$ @Reveillark I updated the question. I know I could do everything elementwise using the product rule, but I am rather looking for a compact formula in matrix notation, similar to $(2)$. $\endgroup$ Commented Feb 26, 2019 at 21:25

3 Answers

4
$\begingroup$

Given a differentiable vector field $\mathrm v : \mathbb R^n \to \mathbb R^n$ and a matrix $\mathrm A \in \mathbb R^{n \times n}$, let function $f : \mathbb R^n \to \mathbb R$ be defined by

$$f (\mathrm x) := \langle \mathrm v (\mathrm x), \mathrm A \mathrm v (\mathrm x) \rangle$$

whose directional derivative in the direction of $\mathrm y \in \mathbb R^n$ at $\mathrm x \in \mathbb R^n$ is

$$D_{\mathrm y} f (\mathrm x) := \lim_{h \to 0} \frac{f (\mathrm x + h \mathrm y) - f (\mathrm x)}{h} = \cdots = \langle \mathrm y, \mathrm J_{\mathrm v}^\top (\mathrm x) \, \mathrm A \, \mathrm v (\mathrm x) \rangle + \langle \mathrm J_{\mathrm v}^\top (\mathrm x) \, \mathrm A^\top \mathrm v (\mathrm x) , \mathrm y \rangle$$

where matrix $\mathrm J_{\mathrm v} (\mathrm x)$ is the Jacobian of vector field $\rm v$ at $\mathrm x \in \mathbb R^n$. Thus, the gradient of $f$ is

$$\nabla_{\mathrm x} f (\mathrm x) = \mathrm J_{\mathrm v}^\top (\mathrm x) \left( \mathrm A + \mathrm A^\top \right) \mathrm v (\mathrm x)$$
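
The gradient formula above can be checked numerically. The following is only a minimal sketch: the vector field `v`, its hand-computed Jacobian `jacobian_v`, and the random test data are hypothetical examples that do not appear in the question. It compares $\mathrm J_{\mathrm v}^\top (\mathrm x) ( \mathrm A + \mathrm A^\top ) \mathrm v (\mathrm x)$ against central finite differences.

```python
import numpy as np

def v(x):
    # Hypothetical example vector field R^3 -> R^3 (an assumption for illustration)
    return np.array([np.sin(x[0]) * x[1], x[1] ** 2 + x[2], x[0] * x[2]])

def jacobian_v(x):
    # Jacobian of v: rows index components of v, columns index components of x
    return np.array([
        [np.cos(x[0]) * x[1], np.sin(x[0]), 0.0],
        [0.0, 2.0 * x[1], 1.0],
        [x[2], 0.0, x[0]],
    ])

def f(x, A):
    # Scalar function f(x) = <v(x), A v(x)>
    vx = v(x)
    return vx @ A @ vx

def grad_f(x, A):
    # Closed-form gradient: J_v(x)^T (A + A^T) v(x)
    return jacobian_v(x).T @ (A + A.T) @ v(x)

def grad_f_fd(x, A, h=1e-6):
    # Central finite-difference approximation of the gradient
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e, A) - f(x - e, A)) / (2.0 * h)
    return g

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))   # deliberately not symmetric
x0 = rng.standard_normal(3)

# Maximum absolute deviation between the two gradients (should be tiny)
print(np.max(np.abs(grad_f(x0, A) - grad_f_fd(x0, A))))
```

Using a non-symmetric $\mathrm A$ in the check matters, since a symmetric matrix would not distinguish $\mathrm A + \mathrm A^\top$ from $2 \mathrm A$.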

$\endgroup$
0
3
$\begingroup$

I find differential notation helpful here in organizing things. The total derivative is a linear operator, so we introduce its argument and apply the product rule:

$$d(f(x)^TAf(x))=d(f(x)^T)\cdot Af(x)+f(x)^TA\cdot d(f(x))$$

$$d(f(x)^TAf(x))=\left(\frac{df}{dx}\cdot dx\right)^T\cdot Af(x)+f(x)^TA\cdot \left(\frac{df}{dx}\cdot dx\right)$$

Now, the transpose of a $1\times 1$ matrix is itself, so we transpose the first term:

$$d(f(x)^TAf(x)) = f(x)^TA^T\cdot \left(\frac{df}{dx}\cdot dx\right)+f(x)^TA\cdot \left(\frac{df}{dx}\cdot dx\right)$$

$$d(f(x)^TAf(x)) = f(x)^T(A+A^T)\cdot \left(\frac{df}{dx}\cdot dx\right)$$

That is the form we want for the derivative. The total derivative of $f(x)^TAf(x)$ is the row vector

$$f(x)^T(A+A^T)\frac{df}{dx}$$

where $\frac{df}{dx}$ is the matrix of partial derivatives of $f$, written in your question as $\frac{\partial}{\partial x}f(x)$.
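
As a quick sanity check, consider a linear map $f(x) = Bx$, where the constant matrix $B \in \mathbb{R}^{n \times n}$ is an assumption introduced here for illustration only. Then $\frac{df}{dx} = B$, and the formula gives

$$ f(x)^T(A+A^T)\frac{df}{dx} = x^T B^T (A + A^T) B = x^T\left( B^TAB + (B^TAB)^T \right) $$

which is what formula $(2)$ from the question yields when applied to $x^T (B^TAB)\, x$. Taking $B = I$ recovers $(2)$ itself.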

$\endgroup$
1
$\begingroup$

We want to differentiate $f(x)^TAf(x)$, and it is useful to break the problem into pieces.

First we see how to differentiate $g(x,y) = x^TAy$ with $A$ constant. $$ g(x+h,y+k) = (x+h)^TA(y+k) =(x^T+h^T)A(y+k) = x^TAy + h^TAy + x^TAk + h^TAk $$ The last term $h^TAk$ is second order in $(h,k)$, so the linear part gives $Dg_{(x,y)}(h,k) = h^TAy + x^TAk$.

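Now we use the chain rule: $$ D(f(x)^TAf(x))_x(v) = Dg_{(f(x),f(x))}(Df_x(v),Df_x(v)) = Df_x(v)^TAf(x) + f(x)^TADf_x(v) $$ Both terms are scalars, so transposing the first one gives the compact form $f(x)^T(A+A^T)\,Df_x(v)$.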
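
The chain-rule expression can be tested along a single direction $v$. This is a rough sketch under stated assumptions: the map `f`, its derivative `Df`, and the values of `A`, `x`, and `v` are made-up examples, not taken from the question. The directional derivative is compared with a central difference of $t \mapsto f(x+tv)^TAf(x+tv)$ at $t = 0$.

```python
import numpy as np

def f(x):
    # Hypothetical example map R^2 -> R^2 (an assumption for illustration)
    return np.array([x[0] * x[1], np.exp(x[0]) - x[1]])

def Df(x, v):
    # Derivative of f at x applied to the direction v (Jacobian times v)
    J = np.array([[x[1], x[0]],
                  [np.exp(x[0]), -1.0]])
    return J @ v

def q(x, A):
    # The scalar function f(x)^T A f(x)
    return f(x) @ A @ f(x)

def Dq(x, v, A):
    # Chain-rule result: Df_x(v)^T A f(x) + f(x)^T A Df_x(v)
    w = Df(x, v)
    return w @ A @ f(x) + f(x) @ A @ w

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])        # deliberately not symmetric
x = np.array([0.5, -1.0])
v = np.array([0.3, 0.7])

# Central difference along the direction v
h = 1e-6
fd = (q(x + h * v, A) - q(x - h * v, A)) / (2.0 * h)

print(Dq(x, v, A), fd)
```

The two printed numbers should agree to several digits; the same value is obtained from $f(x)^T(A+A^T)Df_x(v)$, since each term in the chain-rule expression is a scalar.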

$\endgroup$
