$\begingroup$

Recent linear state-space model papers such as Mamba often use the matrix exponential to discretize the system. They define the system in continuous time and then discretize it so that it can be run like a vanilla RNN or CNN.
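For context, here is a minimal sketch of the zero-order-hold (ZOH) rule that, as I understand the S4/Mamba papers, is used for a system with an input, $d\mathbf{h}/dt = A\mathbf{h} + B\mathbf{x}$: $\bar A = \exp(\Delta A)$ and $\bar B = (\Delta A)^{-1}(\exp(\Delta A) - I)\,\Delta B$. The function `zoh_discretize` and the random matrices are hypothetical. `B_in` stands for the input matrix, which those papers call $B$; it is unrelated to $B = \exp(A)$ later in this question. The block-matrix exponential computes $\bar B$ without having to invert $A$.

```python
import numpy as np
from scipy.linalg import expm, solve

def zoh_discretize(A, B_in, delta):
    """Hypothetical helper: ZOH discretization (assumed S4/Mamba-style rule).

    exp(delta * [[A, B_in], [0, 0]]) = [[exp(delta*A), int_0^delta exp(sA) ds B_in], [0, I]],
    so both discrete matrices come out of a single matrix exponential.
    """
    n, m = B_in.shape
    M = np.zeros((n + m, n + m))
    M[:n, :n] = A
    M[:n, n:] = B_in
    E = expm(delta * M)
    return E[:n, :n], E[:n, n:]   # (A_bar, B_bar)

rng = np.random.default_rng(0)
n, m, delta = 4, 2, 0.1
A = -np.eye(n) + 0.3 * rng.standard_normal((n, n))   # hypothetical state matrix
B_in = rng.standard_normal((n, m))                   # hypothetical input matrix
A_bar, B_bar = zoh_discretize(A, B_in, delta)

# Cross-check B_bar against the closed form (valid when A is invertible)
dA = delta * A
B_bar_closed = solve(dA, (expm(dA) - np.eye(n)) @ (delta * B_in))
print(np.allclose(B_bar, B_bar_closed))
```

The block-matrix form is just one convenient way to get $\bar B$; the closed form works equally well whenever $A$ is invertible.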

Consider the simplest continuous-time autonomous system $d\mathbf{h}(t)/dt = A\mathbf{h}(t)$ (assuming the time constant $\tau = 1$), where $A$ is a matrix and $\mathbf{h}$ is a vector. The discretized version of this system is $\mathbf{h}_t = \exp(A)\mathbf{h}_{t-1}$ (again assuming, for simplicity, a discretization step $\Delta = 1$). The output used to compute the loss is $\mathbf{y}_t = f(C\mathbf{h}_t)$, where $f$ is an activation function, which may simply be linear.
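A minimal NumPy/SciPy sketch of this setup, with hypothetical random $A$, $C$ and $\mathbf{h}_0$, and $f$ taken to be the identity. It rolls the discrete recurrence forward and compares the result with the continuous-time solution $\mathbf{h}(T) = e^{TA}\mathbf{h}_0$. For an autonomous system with $\Delta = 1$ the two agree, which is why $\exp(A)$ is the exact discretization here.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
n, T = 3, 5
A = 0.2 * rng.standard_normal((n, n))   # hypothetical state matrix
C = rng.standard_normal((2, n))         # hypothetical readout matrix
h0 = rng.standard_normal(n)             # hypothetical initial state

A_bar = expm(A)           # discrete transition matrix for Delta = 1
h = h0
for t in range(1, T + 1):
    h = A_bar @ h         # h_t = exp(A) h_{t-1}
y_T = C @ h               # y_t = f(C h_t) with f = identity

# Unrolling gives h_T = exp(A)^T h_0 = exp(T A) h_0,
# i.e. the exact continuous solution of dh/dt = A h at time T.
h_continuous = expm(T * A) @ h0
print(np.allclose(h, h_continuous))
```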

I am having difficulty working out $\partial L / \partial A_{ij}$ mathematically, because the matrix exponential complicates the calculation. One could approximate $\exp(A) \approx I + A$, but I suspect this approximation does not accurately capture the loss landscape. Does anyone have suggestions or references that could help clarify this calculation? I am not very familiar with tensor calculus, so the fault is mine.
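A quick way to see how good $\exp(A) \approx I + A$ is: the neglected terms start at $A^2/2$, so the approximation is only first-order. The sketch below (hypothetical random matrix, NumPy/SciPy assumed) shrinks $A$ by factors of 10 and prints the Frobenius-norm error.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
A0 = rng.standard_normal((4, 4))   # hypothetical base matrix
I = np.eye(4)

errors = []
for scale in [1.0, 0.1, 0.01]:
    A = scale * A0
    err = np.linalg.norm(expm(A) - (I + A))   # Frobenius norm of the approximation error
    errors.append(err)
    print(f"scale={scale:<5}  ||exp(A) - (I + A)||_F = {err:.2e}")
```

Each 10x reduction in scale should shrink the error by roughly 100x, consistent with the $A^2/2$ leading term. The gradient is affected too: under the approximation $\partial(I + A)/\partial A_{ij} = E_{ij}$ (a single 1 at position $(i,j)$), whereas the exact derivative of $\exp(A)$ differs from $E_{ij}$ by terms of order $\lVert A \rVert$. So the approximation is only trustworthy when $\lVert A \rVert$ (or $\lVert \Delta A \rVert$ for a step size $\Delta$) is small.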

EDIT 1

$L$ is a loss function defined as $g(\mathbf{y}^*, \mathbf{y}_t)$, where $\mathbf{y}^*$ is the optimal target. I would like to know how to correctly update the weight $A_{ij}$ with respect to this loss. By the chain rule, $\partial L / \partial A_{ij} = \frac{\partial L}{\partial \mathbf{y}_t} \frac{\partial \mathbf{y}_t}{\partial \mathbf{h}_t} \frac{\partial \mathbf{h}_t}{\partial A_{ij}}$.
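A sketch of how this chain rule can be completed, assuming $\Delta = 1$, an elementwise activation $f$, and writing $E_{ij} = \mathbf{e}_i\mathbf{e}_j^\top$ for the matrix with a single 1 at position $(i,j)$ (my notation). Unrolling the recurrence gives $\mathbf{h}_t = e^{tA}\mathbf{h}_0$, so $A$ enters through every step. The derivative of the matrix exponential then has the standard integral form $\frac{d}{d\varepsilon}e^{X+\varepsilon E}\big|_{\varepsilon=0} = \int_0^1 e^{sX}Ee^{(1-s)X}\,ds$, applied here with $X = tA$ and $E = tE_{ij}$:

```latex
\begin{aligned}
\mathbf{h}_t &= e^{A}\mathbf{h}_{t-1} = e^{tA}\,\mathbf{h}_0,\\
\frac{\partial L}{\partial \mathbf{h}_t} &= C^\top\!\left[f'(C\mathbf{h}_t)\odot \nabla_{\mathbf{y}_t}\, g(\mathbf{y}^*, \mathbf{y}_t)\right],\\
\frac{\partial \mathbf{h}_t}{\partial A_{ij}} &= t\int_0^1 e^{stA}\,E_{ij}\,e^{(1-s)tA}\,\mathbf{h}_0\,ds
  = t\int_0^1 \big(e^{(1-s)tA}\mathbf{h}_0\big)_j\, e^{stA}\mathbf{e}_i\,ds,\\
\frac{\partial L}{\partial A_{ij}} &= \left(\frac{\partial L}{\partial \mathbf{h}_t}\right)^{\!\top}\frac{\partial \mathbf{h}_t}{\partial A_{ij}},\\
\frac{\partial L}{\partial A} &= t\int_0^1 e^{stA^\top}\,\frac{\partial L}{\partial \mathbf{h}_t}\,\mathbf{h}_0^\top\,e^{(1-s)tA^\top}\,ds .
\end{aligned}
```

The last line collects all entries $(i,j)$ at once. Expanding the exponentials inside the integral as power series turns $\int_0^1 e^{sX}Ee^{(1-s)X}\,ds$ into $\sum_{k\ge 0}\frac{1}{(k+1)!}\sum_{j=0}^{k} X^j E X^{k-j}$.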

I am having trouble calculating the $\frac{\partial \mathbf{h}_t}{\partial A_{ij}}$ term.
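A numerical sketch of this term, assuming NumPy/SciPy, $f$ = identity, a squared-error loss $g = \tfrac12\lVert C\mathbf{h}_t - \mathbf{y}^*\rVert^2$, and hypothetical random values for $A$, $C$, $\mathbf{h}_0$ and $\mathbf{y}^*$. With $\Delta = 1$, $\mathbf{h}_t = e^{tA}\mathbf{h}_0$, so $\partial \mathbf{h}_t/\partial A_{ij}$ is the Fréchet derivative of the matrix exponential at $tA$ in the direction $tE_{ij}$ ($E_{ij}$ has a single 1 at $(i,j)$), applied to $\mathbf{h}_0$. `scipy.linalg.expm_frechet` computes that derivative. The code checks one entry against a central finite difference. It also gets the whole gradient in one call from the adjoint identity $\partial L/\partial A = L_{\exp}\big(tA^\top,\, t\,\tfrac{\partial L}{\partial \mathbf{h}_t}\mathbf{h}_0^\top\big)$, where $L_{\exp}(X, E)$ denotes the Fréchet derivative.

```python
import numpy as np
from scipy.linalg import expm, expm_frechet

rng = np.random.default_rng(3)
n, t = 3, 4
A = 0.3 * rng.standard_normal((n, n))   # hypothetical state matrix
C = rng.standard_normal((2, n))         # hypothetical readout matrix
h0 = rng.standard_normal(n)             # hypothetical initial state
y_star = rng.standard_normal(2)         # hypothetical target

def loss(A):
    """Assumed loss: g = 0.5 * ||C h_t - y*||^2 with f = identity and h_t = exp(tA) h_0."""
    h_t = expm(t * A) @ h0
    return 0.5 * np.sum((C @ h_t - y_star) ** 2)

def dh_dAij(A, i, j):
    """dh_t/dA_ij = L_exp(tA, t E_ij) h_0, using SciPy's Frechet derivative of expm."""
    E = np.zeros_like(A)
    E[i, j] = 1.0
    _, L = expm_frechet(t * A, t * E)
    return L @ h0

# dL/dh_t for the squared-error loss, then the chain rule for one entry (i, j)
h_t = expm(t * A) @ h0
dL_dh = C.T @ (C @ h_t - y_star)
i, j = 0, 2
grad_ij = dL_dh @ dh_dAij(A, i, j)

# Whole gradient matrix at once via the adjoint identity
_, grad_full = expm_frechet(t * A.T, t * np.outer(dL_dh, h0))

# Central finite-difference check of the single entry
eps = 1e-6
E_eps = np.zeros_like(A)
E_eps[i, j] = eps
fd = (loss(A + E_eps) - loss(A - E_eps)) / (2 * eps)
print(grad_ij, grad_full[i, j], fd)
```

Computing one Fréchet derivative per entry costs $n^2$ calls. The adjoint form returns the full $n \times n$ gradient for roughly the cost of a single call.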

EDIT 2

I realized my question boils down to this:

Given $B = \exp(A)$, can an element $B_{kl}$ be expressed through elementary operations on the elements $A_{ij}$ of $A$?
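One way to see the structure: each entry is a power series in the $A_{ij}$, $B_{kl} = \sum_{n\ge 0}\frac{(A^n)_{kl}}{n!} = \delta_{kl} + A_{kl} + \tfrac12\sum_m A_{km}A_{ml} + \dots$, so in general $B_{kl}$ is an infinite series rather than a short finite formula. Special cases such as a diagonal $A$, where $B_{kk} = e^{A_{kk}}$, do reduce to closed forms. The hypothetical helper `expm_entry_series` below truncates the series for one entry and compares it with SciPy's `expm` (NumPy/SciPy assumed).

```python
import numpy as np
from scipy.linalg import expm

def expm_entry_series(A, k, l, n_terms=30):
    """Hypothetical helper: approximate B_kl = [exp(A)]_kl by sum_{n < n_terms} (A^n)_kl / n!.

    Every term is a polynomial in the entries A_ij, so the truncated sum is too.
    """
    term = np.eye(A.shape[0])      # A^0 / 0!
    total = term[k, l]
    for n in range(1, n_terms):
        term = term @ A / n        # A^n / n! built from A^(n-1) / (n-1)!
        total += term[k, l]
    return total

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))    # hypothetical matrix
B = expm(A)
approx = expm_entry_series(A, 0, 1)
print(approx, B[0, 1])
```

Differentiating the series term by term gives the entrywise derivative $\partial B_{kl}/\partial A_{ij} = \int_0^1 \big(e^{sA}\big)_{ki}\big(e^{(1-s)A}\big)_{jl}\,ds$, which is the $(k,l)$ entry of the integral formula for the derivative of the matrix exponential.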

Since my original question is not well structured, I may open a new question about the elements of the matrix exponential.

EDIT 3

The distilled version of my question is at this link. Any further updates will be made in the linked post.

I also realized that my original question did not index the elements of the matrix partial derivatives very well. Apologies for that.

$\endgroup$
  • Can you provide a reference? The description is not very clear. Commented Feb 21, 2024 at 6:58
  • You haven't defined $L$ in your question. If you want to differentiate a matrix exponential exactly, here is how it is done: en.wikipedia.org/wiki/… Commented Feb 21, 2024 at 13:55
  • @TedBlack could you please specify which part was not very clear? Regarding discretization, this would work. As for the state-space model, it is not far from a simple linear state-space model. Commented Feb 21, 2024 at 14:43
  • OK, based on this, you want to look at Exercise 13.29 in "Matrix Algebra" by Abadir and Magnus. Unfortunately, there is no closed-form solution; using their notation, $d(e^\mathbf{A})=\sum_{k=0}^\infty \frac{1}{(k+1)!}\sum_{j=0}^k \mathbf{A}^j (d\mathbf{A}) \mathbf{A}^{k-j}$. Commented Feb 21, 2024 at 15:44
  • @TedBlack I will check this out! Commented Feb 21, 2024 at 16:33
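The series quoted from Abadir and Magnus in the comments can be checked numerically. The sketch below truncates $d(e^\mathbf{A})=\sum_{k}\frac{1}{(k+1)!}\sum_{j=0}^k \mathbf{A}^j (d\mathbf{A}) \mathbf{A}^{k-j}$ and compares it with `scipy.linalg.expm_frechet`, which computes the same directional derivative. The helper name `dexp_series`, the truncation length and the random matrices are assumptions for illustration.

```python
import numpy as np
from scipy.linalg import expm_frechet

def dexp_series(A, dA, n_terms=25):
    """Hypothetical helper: truncated series sum_k 1/(k+1)! sum_{j=0}^k A^j dA A^(k-j)."""
    n = A.shape[0]
    powers = [np.eye(n)]
    for _ in range(n_terms):
        powers.append(powers[-1] @ A)     # powers[p] = A^p
    total = np.zeros_like(A)
    fact = 1.0                            # running value of (k+1)!
    for k in range(n_terms):
        fact *= k + 1
        inner = sum(powers[j] @ dA @ powers[k - j] for j in range(k + 1))
        total += inner / fact
    return total

rng = np.random.default_rng(5)
A = 0.5 * rng.standard_normal((3, 3))    # hypothetical matrix
dA = rng.standard_normal((3, 3))         # hypothetical perturbation direction
series = dexp_series(A, dA)
_, reference = expm_frechet(A, dA)
print(np.allclose(series, reference))
```

The series is convenient for reasoning, but its terms grow with $\lVert A \rVert$ before they shrink, so for larger matrices a dedicated routine such as `expm_frechet` is the safer numerical choice.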
