# Multiplicação e exponenciação de cadeias matriciais


If I have two matrices $A$ and $B$, of dimensions $1000\times2$ and $2\times1000$ respectively, and I want to compute $(AB)^{5000}$, it is more efficient to rewrite the expression as $A(BA)^{4999}B$ and only then evaluate numerically, because $AB$ has dimension $1000\times1000$ but $BA$ has dimension $2\times2$.
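To make the saving concrete, here is a rough flop count for both forms. This is a sketch with a simplified cost model of my own (not from the post): $2pqr$ flops for a $p\times q$ by $q\times r$ product, and roughly $\lceil\log_2 n\rceil$ squarings for an $n$-th power.

```python
import math

def matmul_cost(p, q, r):
    # Flops to multiply a p×q matrix by a q×r matrix (schoolbook algorithm)
    return 2 * p * q * r

# (AB)^5000: one 1000×2 · 2×1000 product, then ~log2(5000) squarings of a 1000×1000 matrix
naive = (matmul_cost(1000, 2, 1000)
         + math.ceil(math.log2(5000)) * matmul_cost(1000, 1000, 1000))

# A(BA)^4999 B: one 2×1000 · 1000×2 product, ~log2(4999) squarings of a 2×2 matrix,
# then the two boundary products A·(...) and (...)·B
clever = (matmul_cost(2, 1000, 2)
          + math.ceil(math.log2(4999)) * matmul_cost(2, 2, 2)
          + matmul_cost(1000, 2, 2)
          + matmul_cost(1000, 2, 1000))

print(naive // clever)  # the rewritten form is thousands of times cheaper
```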

I want to solve a generalized version of this problem. Is there a reasonably efficient (non-brute-force) algorithm for optimizing an expression containing:

• Free matrix variables of known dimensions
• Products of arbitrary subexpressions
• Arbitrary subexpressions raised to natural-number powers

... so that the minimum amount of work is needed to evaluate it numerically, after substituting concrete matrix values for the free matrix variables?

The matrix chain multiplication problem is a special case of my problem.
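For reference, that special case is solved by the classic $O(n^3)$ dynamic program over split points. A minimal Python sketch (the function name is mine, chosen for illustration):

```python
def mcm_cost(dims):
    """Minimum scalar-multiplication cost of A_1 · ... · A_k, where A_i has
    shape dims[i-1] × dims[i]. Classic O(k^3) dynamic program."""
    k = len(dims) - 1
    # cost[i][j] = optimal cost of multiplying the chain A_{i+1} ... A_{j+1}
    cost = [[0] * k for _ in range(k)]
    for span in range(1, k):
        for i in range(k - span):
            j = i + span
            cost[i][j] = min(
                cost[i][m] + cost[m + 1][j] + dims[i] * dims[m + 1] * dims[j + 1]
                for m in range(i, j)
            )
    return cost[0][k - 1]

# The question's example: A is 1000×2, B is 2×1000, so the chain AB has dims [1000, 2, 1000]
print(mcm_cost([1000, 2, 1000]))  # 2000000
```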

Edit:

This is a tentative answer. It seems intuitively correct to me, but I have no proof that it is correct. If it is correct, I am still interested in a proof. (If it is not correct, of course, correct me.)

For each product raised to a power, say $(A_1 A_2 \ldots A_k)^n$, consider every cyclic permutation of the factors:

• $(A_1 A_2 \ldots A_k)^n$
• $A_1 (A_2 \ldots A_k A_1)^{n-1} A_2 \ldots A_k$
• $A_1 A_2 (A_3 \ldots A_k A_1 A_2)^{n-1} A_3 \ldots A_k$
• ...
• $A_1 A_2 \ldots A_{k-1} (A_k A_1 A_2 \ldots A_{k-1})^{n-1} A_k$

... recursively. Each power is to be calculated using exponentiation by squaring (obviously), and all other products are to be calculated using the optimal order returned by the matrix chain multiplication algorithm.
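Exponentiation by squaring itself takes $O(\log n)$ multiplications. A generic sketch (my own code; `matmul2` is a hypothetical helper used only to keep the example self-contained):

```python
def power(mul, x, n, identity):
    """Compute x^n with O(log n) multiplications, given an associative mul."""
    result = identity
    while n > 0:
        if n & 1:          # current bit of n is set: fold the current square in
            result = mul(result, x)
        x = mul(x, x)      # square for the next bit
        n >>= 1
    return result

def matmul2(p, q):
    # Naive 2×2 matrix product over lists of lists
    return [[sum(p[i][t] * q[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]

# Sanity check: powers of the Fibonacci matrix [[1,1],[1,0]]
fib = power(matmul2, [[1, 1], [1, 0]], 10, [[1, 0], [0, 1]])
print(fib[0][1])  # F(10) = 55
```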

Edit:

The idea outlined in my previous edit is still somewhat nonoptimal. The exponentiation-by-squaring algorithm actually evaluates expressions of the form $K A^n$ or $A^n K$, where $K$ isn't necessarily the identity matrix. But my algorithm doesn't consider the possibility of using exponentiation by squaring with $K$ not equal to the identity matrix.

@gnasher729: Sorry, I should've been more explicit. I don't want to brute-force all possibilities, for exactly the same reason you wouldn't want to solve the matrix chain multiplication problem by brute force. I just edited the question accordingly.
pyon

Note that even after you cleverly factor the expression $A(BA)^{4999}B$, it is still cleverer to factor it as $A(BA)^{2(2\cdot 1249+1)+1}B$. The point is, you probably have to mix matrix chain multiplication with some other standard algorithm for fast exponentiation.
Apiwat Chantawibul

@Billiska: Indeed, that's precisely what I want to do: combine matrix chain multiplication and exponentiation by squaring into a single algorithm for the combined problem. But there are some pesky issues. Given $A(BA)^{n-1}B$, how do I prevent the algorithm from further trying $AB(AB)^{n-2}AB$, $ABA(BA)^{n-3}BAB$, and so on?
pyon

We change the basis to the eigenvector basis for the matrix exponentiation, and when all matrices have power 1, we can use matrix chain multiplication.
Deep Joshi

@DeepJoshi Sorry, I find your comment rather terse. But, if I understand your idea correctly, I'm afraid it won't work in the general case, because the dimensions of the eigenspaces of an $n \times n$ matrix needn't add up to $n$. In other words, it isn't always the case that every vector can be expressed as a linear combination of eigenvectors.
pyon

Answers:


Disclaimer: The following method has not been rigorously proven to be optimal. An informal proof is provided.

The problem reduces to finding the most efficient ordering when considering the square of the product.

For example, when looking at $(ABC)^{50}$, we only need to optimally solve $(ABC)^2$, since this expands to $ABCABC$. No useful ordering information is added by concatenating $ABC$ again. The intuition here is that since the problem of optimal ordering can be solved bottom-up, longer orderings consisting of more copies of the same matrices are irrelevant.

Finding the best ordering of $ABCABC$ reduces to the matrix chain multiplication problem. After finding an optimal ordering, apply exponentiation to the triplet (n-tuple in general) in the ordering.

For example, if the optimal ordering for the square is $A(B(CA))BC$, the solution to the initial problem is $A(B(CA))^{49}BC$.

In summary:
1) The first step in solving $(A_1 A_2 \cdots A_n)^m$ is to solve $(A_1 A_2 \cdots A_n)^2$.
2) Solving $(A_1 A_2 \cdots A_n)^2$ is best approached as an instance of the matrix chain multiplication problem.
3) Using the n-tuple ordering $G$ from the solution in (2) gives us the solution to (1) as some flavor of $A_1 \cdot A_2 \cdot G^{m-1} \cdot A_n$ (note that any other groupings from solving (2) should be applied as well).
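The recipe above amounts to pricing each cyclic rotation of the chain and keeping the cheapest. Here is my own rough formulation of that comparison (function names and the flop model are mine; squaring cost is only estimated as $\lceil\log_2 m\rceil$ multiplies of the rotated period's square matrix):

```python
import math

def mcm_cost(dims):
    """Classic O(k^3) matrix-chain DP; A_i has shape dims[i-1] × dims[i]."""
    k = len(dims) - 1
    cost = [[0] * k for _ in range(k)]
    for span in range(1, k):
        for i in range(k - span):
            j = i + span
            cost[i][j] = min(cost[i][m] + cost[m + 1][j]
                             + dims[i] * dims[m + 1] * dims[j + 1]
                             for m in range(i, j))
    return cost[0][k - 1]

def best_rotation(dims, m):
    """Estimated best cyclic rotation r for (A_1...A_k)^m, pricing rotation r
    as: prefix A_1..A_r, then (rotated period)^(m-1), then suffix A_{r+1}..A_k."""
    k = len(dims) - 1
    assert dims[0] == dims[k], "chain must be square to raise it to a power"
    best = None
    for r in range(k):
        if r == 0:
            c = mcm_cost(dims) + math.ceil(math.log2(m)) * dims[0] ** 3
        else:
            # dims of the rotated period A_{r+1}..A_k A_1..A_r (a dims[r]×dims[r] matrix)
            period = dims[r:] + dims[1:r + 1]
            c = (mcm_cost(period)
                 + math.ceil(math.log2(m - 1)) * dims[r] ** 3  # squarings
                 + mcm_cost(dims[:r + 1])                      # prefix chain
                 + mcm_cost(dims[r:])                          # suffix chain
                 + dims[0] * dims[r] * dims[r]                 # prefix · power
                 + dims[0] * dims[r] * dims[k])                # · suffix
        if best is None or c < best[1]:
            best = (r, c)
    return best

# The question's example: (AB)^5000 with A 1000×2 and B 2×1000
print(best_rotation([1000, 2, 1000], 5000)[0])  # 1, i.e. the A(BA)^4999·B form
```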

Informal proof
Considering the simplest case using two matrices, $(AB)^n$, we note that $A$ and $B$ have dimensions $X \times Y$ and $Y \times X$ respectively. Any product using $A$ and $B$ has one of the following dimensions:

$X \times Y$
$Y \times X$
$Y \times Y$
$X \times X$

We have either $X < Y$ or $Y \le X$.

Assumption 1a): $X < Y$
$AB$ has dimension $X \times X$, and this ordering is guaranteed to be optimal from a bottom-up approach. Any other configuration of $A$ and $B$ is either equally good, or worse. Thus, the problem is optimally solved as $(AB)^n$.

Assumption 1b): $Y \le X$
$BA$ has dimension $Y \times Y$. This is the optimal ordering for all products involving $A$ and $B$. Thus, the solution is optimally found as $A(BA)^{n-1}B$.

This concludes the proof, and we have only looked at the two orderings found in $ABAB$, the square problem.

Using more matrices, the argument is similar. Perhaps an inductive proof is possible? The general idea is that solving the MCM for the square finds the optimal sizes for the operations with all involved matrices considered.
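Both branches of the two-matrix case can be sanity-checked numerically under a simple cost model (my own sketch, not part of the original answer; squaring cost is estimated as $2\lceil\log_2 n\rceil$ multiplies):

```python
import math

def pow_cost(d, n):
    # Rough cost of M^n for a d×d matrix via repeated squaring
    return math.ceil(math.log2(n)) * 2 * d ** 3 if n > 1 else 0

def cost_keep(X, Y, n):
    # (AB)^n: one X×Y · Y×X product, then square the X×X result
    return 2 * X * Y * X + pow_cost(X, n)

def cost_rotate(X, Y, n):
    # A(BA)^{n-1}B: BA is Y×Y; add the two boundary products A·(...) and (...)·B
    return 2 * Y * X * Y + pow_cost(Y, n - 1) + 2 * X * Y * Y + 2 * X * Y * X

# Assumption 1a (X < Y): keeping (AB)^n wins
print(cost_keep(2, 1000, 50) < cost_rotate(2, 1000, 50))    # True
# Assumption 1b (Y < X): the rotated form A(BA)^{n-1}B wins
print(cost_rotate(1000, 2, 50) < cost_keep(1000, 2, 50))    # True
```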

Case study:

```julia
julia> a = rand(1000, 2);
julia> b = rand(2, 1000);
julia> c = rand(1000, 100);
julia> d = rand(100, 1000);
julia> e = rand(1000, 1000);

julia> @time (a*b*c*d*e)^30;
  0.395549 seconds (26 allocations: 77.058 MB, 1.58% gc time)

# Here I use an MCM solver to find the optimal ordering for the square problem
julia> using MatrixChainMultiply
julia> matrixchainmultiply("SOLVE_SQUARED", a, b, c, d, e, a, b, c, d, e)
Operation: SOLVE_SQUARED(A...) = begin  # none, line 1:
        A[1] * (((((A[2] * A[3]) * (A[4] * (A[5] * A[6]))) * (A[7] * A[8])) * A[9]) * A[10])
    end
Cost: 6800800

# Use the ordering found; note that exponentiation is applied to the group of 5 elements
julia> @time a*(((((b*c)*(d*(e*a)))^29*(b*c))*d)*e);
  0.009990 seconds (21 allocations: 7.684 MB)

# I also tried using the MCM solver on the full problem directly
julia> @time matrixchainmultiply([30 instances of a,b,c,d,e]);
  0.094490 seconds (4.02 k allocations: 9.073 MB)
```


Can you justify the claim that it's enough to consider $(ABC)^2$? That doesn't seem obvious, to me.
David Richerby

@DavidRicherby I have no proof for this, but it seems intuitive since the problem is cyclic in nature, and each unique cyclic ordering occurs in $ABCABC$; no further terms provide any new cyclic group. My hypothesis is that $(ABC)^n$ is optimally calculated as one of the following: $(ABC)^n$, $A(BCA)^{n-1}BC$ or $AB(CAB)^{n-1}C$. I'll try proving it tomorrow if there is time.
matteyas

@DavidRicherby is the added informal proof of any use?
matteyas

@matteyas: That's more or less what I said in the first edit to my question, right?
pyon

@matteyas OK, so I guess the point is that, because we have a repeating sequence, there are only a fixed number of sizes of intermediate matrices and you've seen all of those by the time you've looked at $ABCABC$.
David Richerby


If you want to calculate the product of $n$ matrices $A_1$ to $A_n$ in the best possible time, you can easily calculate how many operations are needed to compute the product of $A_i$ to $A_j$ for all $1 \le i \le j \le n$ in $O(n^3)$ steps.

This doesn't take into account subexpressions that are raised to a power (if the power is large this might be very inefficient), and it doesn't take into account the opportunity to use fast exponentiation to achieve better speedups, so I suspect this isn't an optimal answer yet.
D.W.