Just as elementary differentiation rules are helpful for optimizing single-variable functions, matrix differentiation rules are helpful for optimizing expressions written in matrix form. This technique is used often in statistics.
Suppose $f$ is a function from $\mathbb{R}^n$ to $\mathbb{R}^m$. Writing $f(\mathbf{x}) = (f_1(\mathbf{x}), \ldots, f_m(\mathbf{x}))$, we define the Jacobian matrix (or derivative matrix) to be

$$\frac{\partial f}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}.$$
Note that if $m = 1$, then differentiating with respect to $\mathbf{x}$ is the same as taking the gradient of $f$, with the partial derivatives arranged in a row.
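For example, if $f(x, y) = (xy, \; x + y^2)$, then

$$\frac{\partial f}{\partial \mathbf{x}} = \begin{bmatrix} y & x \\ 1 & 2y \end{bmatrix},$$

while the scalar function $g(x, y) = x + y^2$ has the single-row derivative $\frac{\partial g}{\partial \mathbf{x}} = \begin{bmatrix} 1 & 2y \end{bmatrix}$.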
With this definition, we obtain the following analogues to some basic single-variable differentiation results: if $A$ is a constant matrix, then

$$\frac{\partial}{\partial \mathbf{x}}(A\mathbf{x}) = A, \qquad \frac{\partial}{\partial \mathbf{x}}(\mathbf{x}^\top A) = A^\top, \qquad \frac{\partial}{\partial \mathbf{x}}(\mathbf{u}^\top \mathbf{v}) = \mathbf{u}^\top \frac{\partial \mathbf{v}}{\partial \mathbf{x}} + \mathbf{v}^\top \frac{\partial \mathbf{u}}{\partial \mathbf{x}},$$

where in the third equation $\mathbf{u}$ and $\mathbf{v}$ are vector-valued functions of $\mathbf{x}$.
The third of these equations is the product rule.
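These identities are easy to sanity-check numerically. The following is a minimal sketch, assuming NumPy; the finite-difference helper `d_dx` and the random test matrix are ours, chosen only for illustration:

```python
import numpy as np

def d_dx(g, x, h=1e-6):
    """Approximate the Jacobian of g at x by central differences.
    Rows index components of the output, columns index components of x."""
    J = np.zeros((np.size(g(x)), np.size(x)))
    for j in range(np.size(x)):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (np.atleast_1d(g(x + e)) - np.atleast_1d(g(x - e))) / (2 * h)
    return J

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

print(np.allclose(d_dx(lambda v: A @ v, x), A))    # d/dx (Ax) = A
print(np.allclose(d_dx(lambda v: v @ A, x), A.T))  # d/dx (x^T A) = A^T

# Product rule with u(x) = x and v(x) = Ax:
# d/dx (x^T A x) = x^T A + (Ax)^T = x^T (A + A^T)
print(np.allclose(d_dx(lambda v: v @ A @ v, x), x @ (A + A.T)))
```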
The Hessian of a function $f : \mathbb{R}^n \to \mathbb{R}$ may be written in terms of the matrix differentiation operator as follows:

$$H(\mathbf{x}) = \frac{\partial}{\partial \mathbf{x}} \left( \frac{\partial f}{\partial \mathbf{x}} \right)^\top.$$
Some authors define $\frac{\partial f}{\partial \mathbf{x}^\top}$ to be $\left(\frac{\partial f}{\partial \mathbf{x}}\right)^\top$, in which case the Hessian operator can be written as $\frac{\partial^2}{\partial \mathbf{x} \, \partial \mathbf{x}^\top}$.
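For instance, with the function $f(x, y) = x^2 y$, the derivative is the row vector $\frac{\partial f}{\partial \mathbf{x}} = \begin{bmatrix} 2xy & x^2 \end{bmatrix}$, and differentiating its transpose recovers the familiar matrix of second partial derivatives:

$$H(\mathbf{x}) = \frac{\partial}{\partial \mathbf{x}} \begin{bmatrix} 2xy \\ x^2 \end{bmatrix} = \begin{bmatrix} 2y & 2x \\ 2x & 0 \end{bmatrix}.$$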
Exercise Let $f : \mathbb{R}^n \to \mathbb{R}$ be defined by $f(\mathbf{x}) = \mathbf{x}^\top A \mathbf{x}$, where $A$ is a symmetric matrix. Find $\frac{\partial f}{\partial \mathbf{x}}$.
Solution. We can apply the product rule to find that

$$\frac{\partial f}{\partial \mathbf{x}} = \mathbf{x}^\top \frac{\partial}{\partial \mathbf{x}}(A\mathbf{x}) + (A\mathbf{x})^\top \frac{\partial \mathbf{x}}{\partial \mathbf{x}} = \mathbf{x}^\top A + \mathbf{x}^\top A^\top = \mathbf{x}^\top (A + A^\top) = 2\mathbf{x}^\top A,$$

where the last step uses the symmetry of $A$.
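As a sanity check, here is a NumPy sketch of our own (the random symmetric matrix and finite-difference approximations are illustrative assumptions). It confirms the derivative numerically, and also confirms that differentiating $(2\mathbf{x}^\top A)^\top = 2A\mathbf{x}$, per the Hessian formula above, yields $H = 2A$:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))
A = (B + B.T) / 2                  # a random symmetric matrix
f = lambda v: v @ A @ v            # f(x) = x^T A x
x = rng.standard_normal(3)
h = 1e-6

# Central-difference derivative of f: should match the row vector 2 x^T A
deriv = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(3)])
print(np.allclose(deriv, 2 * x @ A))          # True

# Hessian via H = d/dx (df/dx)^T: the transposed derivative is the
# linear map x -> 2Ax, and differentiating it gives the constant 2A.
g = lambda v: 2 * A @ v
H = np.column_stack([(g(x + h * e) - g(x - h * e)) / (2 * h) for e in np.eye(3)])
print(np.allclose(H, 2 * A))                  # True
```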
Exercise Suppose $A$ is an $m \times n$ matrix and $\mathbf{b} \in \mathbb{R}^m$. Use matrix differentiation to find the vector $\mathbf{x}$ which minimizes $|A\mathbf{x} - \mathbf{b}|^2$. Hint: begin by writing $|A\mathbf{x} - \mathbf{b}|^2$ as $(A\mathbf{x} - \mathbf{b})^\top (A\mathbf{x} - \mathbf{b})$. You may assume that the rank of $A$ is $n$.
Solution. Following the hint, we expand

$$|A\mathbf{x} - \mathbf{b}|^2 = (A\mathbf{x} - \mathbf{b})^\top (A\mathbf{x} - \mathbf{b}) = \mathbf{x}^\top A^\top A \mathbf{x} - 2\mathbf{b}^\top A \mathbf{x} + \mathbf{b}^\top \mathbf{b},$$

using the fact that $\mathbf{x}^\top A^\top \mathbf{b} = \mathbf{b}^\top A \mathbf{x}$, since both are scalars. To minimize this function, we find its gradient

$$\frac{\partial}{\partial \mathbf{x}}\left(\mathbf{x}^\top A^\top A \mathbf{x} - 2\mathbf{b}^\top A \mathbf{x} + \mathbf{b}^\top \mathbf{b}\right) = 2\mathbf{x}^\top A^\top A - 2\mathbf{b}^\top A
and set it equal to $\mathbf{0}$ to get

$$2\mathbf{x}^\top A^\top A - 2\mathbf{b}^\top A = \mathbf{0} \quad\implies\quad A^\top A \mathbf{x} = A^\top \mathbf{b} \quad\implies\quad \mathbf{x} = (A^\top A)^{-1} A^\top \mathbf{b}.$$
(We know that the $n \times n$ matrix $A^\top A$ has an inverse because its rank is equal to that of $A$, which we assumed was $n$.) Since the Hessian of the objective function is $2A^\top A$, which is positive definite when $A$ has rank $n$, this critical point is indeed the global minimum.
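The result $\mathbf{x} = (A^\top A)^{-1} A^\top \mathbf{b}$ is the familiar least-squares solution, and we can check it against a library routine. Here is a minimal NumPy sketch (the random test problem is our own); note that it solves the normal equations $A^\top A \mathbf{x} = A^\top \mathbf{b}$ with `np.linalg.solve` rather than forming the inverse explicitly, which is generally the numerically preferable approach:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 3
A = rng.standard_normal((m, n))   # a random A has rank n with probability 1
b = rng.standard_normal(m)

# Normal equations from the derivation above: A^T A x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check with NumPy's built-in least-squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_normal, x_lstsq))  # True
```

`np.linalg.lstsq` uses an SVD-based method, which remains stable even when $A^\top A$ is ill-conditioned.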