
I am reading about logistic regression and get confused when we take derivatives with respect to vectors. As an example, the loss function of logistic regression is the negative log-likelihood $$l(\beta)=-\sum_i \log\left( p_i^{y_i} (1-p_i)^{1-y_i}\right),$$ where $p_i=\dfrac{e^{\beta^T x_i}}{1+e^{\beta^T x_i}}$ and $\beta=[\beta_0~\beta_1]^T$.

Now I have some questions. Do we differentiate $l(\beta)$ with respect to $\beta$ or $\beta^T$? If we differentiate with respect to $\beta$ we get a term $x_i^T$, which gives the equation $$\sum_i (y_i-p_i)x_i^T=0 ~~~~~\text{matrix form}~(Y-P)^TX=0~~\text{where}~~X_{n\times 2},\ (Y-P)_{n\times 1},$$ and if we differentiate with respect to $\beta^T$ we get a term $x_i$, which gives $$\sum_i (y_i-p_i)x_i=0 ~~~~~\text{matrix form}~X^T(Y-P)=0~~\text{where}~~X_{n\times 2},\ (Y-P)_{n\times 1},$$ assuming my $\beta$ is a $2\times 1$ vector.

Further, when we take the second derivative for Newton–Raphson, what do we do exactly? Do we take $\dfrac{\partial ^2l}{\partial \beta\, \partial \beta^T}$, $\dfrac{\partial ^2l}{\partial \beta^T \partial \beta}$, or $\dfrac{\partial ^2l}{\partial \beta\, \partial \beta}$? I know that we have to end up with a matrix. Is there some rule or trick to avoid this confusion about matrix dimensions when converting from index form to compact matrix form?
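
To make the computation concrete, here is the derivative written out under the convention I am assuming (the derivative with respect to the column vector $\beta$ is itself a column vector; other texts transpose everything), with $X$ the $n\times 2$ design matrix whose $i$-th row is $x_i^T$: $$\frac{\partial l}{\partial \beta}=-\sum_i\left(\frac{y_i}{p_i}-\frac{1-y_i}{1-p_i}\right)\frac{\partial p_i}{\partial \beta}=-\sum_i (y_i-p_i)\,x_i=-X^T(Y-P),$$ using $\dfrac{\partial p_i}{\partial \beta}=p_i(1-p_i)\,x_i$. The second derivative then comes out as $$\frac{\partial^2 l}{\partial \beta\,\partial \beta^T}=\sum_i p_i(1-p_i)\,x_i x_i^T=X^TWX,\qquad W=\operatorname{diag}\big(p_i(1-p_i)\big),$$ a symmetric $2\times 2$ matrix, so the Newton–Raphson update would be $\beta^{\text{new}}=\beta+(X^TWX)^{-1}X^T(Y-P)$.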


1 Answer


Here are two personal tricks of mine for making derivatives on vector spaces easier to understand.

  1. Partial derivatives with respect to coordinates.

    • The $i$-th coordinate of the gradient is the partial derivative with respect to $x_i$. That's easier to compute.
    • The $(i,j)$ coordinate of the Hessian (the second-derivative matrix) is the partial derivative with respect to $x_i$ and then $x_j$. That's also easier to compute.
    • Everything should be symmetric. That gives us a good way to check for errors (see the first sketch after this list).
  2. Ordinary derivatives of $t \in \mathbb R \mapsto f(t \delta + X_0)$.

    • Since $t$ is a scalar, the function $t \mapsto f(t \delta + X_0)$ is just a standard real function.
    • It's usually easy to compute its derivative.
    • Its derivative at $t=0$ equals $\delta \cdot \nabla f(X_0)$, the directional derivative of $f$ at $X_0$ in the direction $\delta$ (see the second sketch after this list).
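
As a concrete illustration of trick 1, here is a minimal numerical sketch in Python on the question's setting. The data, base point, step sizes, and tolerances are arbitrary choices of mine; `nll` is the negative log-likelihood from the question. It builds the gradient and Hessian one coordinate at a time with finite differences, checks that the Hessian is symmetric, and compares against the closed forms $-X^T(Y-P)$ and $X^TWX$.

```python
import numpy as np

# Minimal sketch of trick 1 (my own arbitrary fake data;
# nll is the negative log-likelihood from the question).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                 # n x 2 design matrix, rows x_i^T
y = rng.integers(0, 2, size=50).astype(float)

def nll(beta):
    # l(beta) = -sum_i [ y_i log p_i + (1 - y_i) log(1 - p_i) ]
    #         = -sum_i [ y_i z_i - log(1 + e^{z_i}) ]   with z_i = beta^T x_i
    z = X @ beta
    return -np.sum(y * z - np.log1p(np.exp(z)))

def grad_fd(f, b, h=1e-5):
    # i-th coordinate of the gradient = partial derivative wrt b_i
    g = np.zeros_like(b)
    for i in range(b.size):
        e = np.zeros_like(b); e[i] = h
        g[i] = (f(b + e) - f(b - e)) / (2 * h)
    return g

def hess_fd(f, b, h=1e-4):
    # (i, j) entry of the Hessian = partial wrt b_i, then wrt b_j
    d = b.size
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei = np.zeros(d); ei[i] = h
            ej = np.zeros(d); ej[j] = h
            H[i, j] = (f(b + ei + ej) - f(b + ei - ej)
                       - f(b - ei + ej) + f(b - ei - ej)) / (4 * h * h)
    return H

beta0 = np.array([0.3, -0.7])
p = 1 / (1 + np.exp(-X @ beta0))
H = hess_fd(nll, beta0)

print(np.allclose(H, H.T, atol=1e-4))                 # symmetry check (trick 1)
print(np.allclose(grad_fd(nll, beta0), -X.T @ (y - p), atol=1e-4))
print(np.allclose(H, X.T @ np.diag(p * (1 - p)) @ X, atol=1e-4))
```

All three checks should print `True`; the first one is exactly the symmetry-based error check mentioned above.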
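And a matching sketch of trick 2, under the same assumed logistic setting (again with arbitrary fake data, base point $X_0$, and direction $\delta$ of my choosing): restrict $f$ to a line, differentiate the resulting ordinary real function at $t=0$, and compare with $\delta \cdot \nabla f(X_0)$.

```python
import numpy as np

# Minimal sketch of trick 2 (same assumed logistic setting as above).
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = rng.integers(0, 2, size=50).astype(float)

def nll(beta):
    z = X @ beta
    return -np.sum(y * z - np.log1p(np.exp(z)))

X0 = np.array([0.3, -0.7])              # base point
delta = np.array([1.0, 2.0])            # an arbitrary direction

# g is an ordinary real function of the scalar t ...
g = lambda t: nll(X0 + t * delta)
h = 1e-5
g_prime_at_0 = (g(h) - g(-h)) / (2 * h)

# ... and its derivative at t = 0 is the directional derivative
# delta . grad f(X0), here with grad nll(beta) = -X^T (Y - P).
p = 1 / (1 + np.exp(-X @ X0))
print(np.allclose(g_prime_at_0, delta @ (-X.T @ (y - p)), atol=1e-4))
```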

I hope these tricks help you build on your knowledge of ordinary and partial derivatives to better understand vector derivatives.

Finally, The Matrix Cookbook can also help you build some familiarity with this; it collects many useful formulas.

Good luck.

