In the previous post I outlined some of the motivations for low-rank matrix completion and described an interesting algorithm for completing rank-1 matrices. In this post I'll be outlining an approach for arbitrary rank matrices. If you haven't read part 1, check it out here!

Generalizing Constraints

In the previous post we used the observation that, under the rank-1 condition, the determinant of every \(2 \times 2\) submatrix is equal to zero to constrain the result. The same idea applies to any \(n \times m\) submatrix with \(\min(n,m)>r\), so intuitively there should be more constraints on the matrix, given that there would be only \(r(n+m)\) degrees of freedom for a matrix with \(nm\) entries. This leads us to a useful observation - a given submatrix should be overdetermined if \(r(n+m) \leq nm\). Equality is also valid here, since we lose a degree of freedom by disregarding a shared scaling constant in the singular vectors.
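As a concrete example of this counting, a rank-2 \(5 \times 5\) submatrix has \(nm = 25\) entries but only about

\[ r(n+m) = 2 \cdot (5+5) = 20 \]

degrees of freedom, so a mostly-known block of this size carries redundant information that can be used to recover missing entries.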

So we should have \(C = 1+nm-r(n+m)\) constraints on a given submatrix, meaning that we can solve for up to \(C\) entries in each submatrix as long as all of the remaining entries are defined. So what are the constraints that enable this solution? Let's consider the characteristic polynomial of an example \(N \times N\) square submatrix \(A\) with eigenvalues \(\lambda_i\):

\[ p_A(\lambda) = det(\lambda I-A) = (\lambda-\lambda_1)(\lambda-\lambda_2)... = \prod_i (\lambda-\lambda_i) \]

A matrix with less than full rank has at least one eigenvalue equal to zero, so we know this polynomial has at least one root at \(\lambda = 0\). Furthermore, an \(N \times N\) submatrix of rank \(r\) has at least \(N-r\) roots at \(\lambda=0\), enabling as many entries to be solved for. If we know how many roots there are, we can write constraints on each derivative of the characteristic polynomial up to the \((C-1)\)th. Fortunately there is a relatively simple expression for the derivative of the characteristic polynomial:

\[ \left. \frac{\partial}{\partial \lambda} det(\lambda I-A) \right|_{\lambda=0} = \sum_i det(-B_i)=0\]

Here \(det(B_i)\) is the principal minor formed by deleting the \(i\)th row and column of \(A\). We know this holds for derivative orders \(d \in [0..C-1]\), so, thanks to linearity, we can pass the derivatives through the summation and get a series of equations:

\[ \begin{align} det(A) &= 0 \\ \sum_i det(-B_i) = \sum_i (-1)^{N-1} det(B_i) &= 0 \\ \sum_{i_1} (-1)^{N-1} \sum_{i_2} det(-C_{i_1,i_2}) = - \sum_{i_1} \sum_{i_2} det(C_{i_1,i_2}) &=0 \\ ... \end{align} \]
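As a quick sanity check of the derivative identity and of these vanishing constraints, here is a minimal numpy sketch (the dimensions and rank are arbitrary choices for illustration, not part of the method itself):

```python
# Numerically check: (1) the derivative of det(lambda*I - A) at lambda = 0
# equals the sum of det(-B_i), and (2) for a rank-(N-2) matrix the first two
# constraints above vanish.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
N = 5

def det_shift(A, lam):
    return np.linalg.det(lam * np.eye(len(A)) - A)

def sum_principal_minors(A, size):
    """Sum of determinants of all size x size principal submatrices of A."""
    return sum(np.linalg.det(A[np.ix_(idx, idx)])
               for idx in combinations(range(len(A)), size))

# 1) The derivative identity holds for any square matrix.
A = rng.standard_normal((N, N))
eps = 1e-6
fd = (det_shift(A, eps) - det_shift(A, -eps)) / (2 * eps)
print(np.isclose(fd, sum_principal_minors(-A, N - 1)))     # True

# 2) For a rank-(N-2) matrix, the 0th and 1st derivative constraints vanish.
L = rng.standard_normal((N, N - 2)) @ rng.standard_normal((N - 2, N))
print(np.isclose(det_shift(L, 0.0), 0.0))                  # det constraint
print(np.isclose(sum_principal_minors(-L, N - 1), 0.0))    # first-derivative constraint
```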

In the equations above, \(C_{i_1,i_2}\) denotes the submatrix created by removing rows and columns \(i_1\) and \(i_2\), so \(C_{i_1,i_2} = C_{i_2,i_1}\). These equations unfortunately increase in complexity beyond the \(2 \times 2\) case, but fortunately they can be solved in a straightforward manner for individual entries. Again, this is thanks to the relatively simple formula for low-rank additive updates to determinants, often called the Matrix Determinant Lemma. Here we construct \(U\) and \(V^T\) so that their product is an all-zero matrix except for the unknown entries \(a_{x_1,y_1}, a_{x_2,y_2},\ldots\) at the proper indices. We also set the unknown entries of \(A\) to zero:

\[ det(A + UV^T) = det(I + V^T A^{-1} U)det(A)\]
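A minimal numerical sketch of this construction (the positions and values below are made up for illustration): build \(U\) and \(V\) so that \(UV^T\) is zero everywhere except at the two unknown positions, zero those positions in the known block, and check the lemma.

```python
# Verify the Matrix Determinant Lemma for a rank-2 update carrying two
# "unknown" entries. The factors could equally be split symmetrically with
# square roots, as in the expression below.
import numpy as np

rng = np.random.default_rng(1)
N = 4
B = rng.standard_normal((N, N))            # a known, invertible submatrix
(x1, y1), (x2, y2) = (0, 2), (3, 1)        # positions of the "unknown" entries
a1, a2 = 0.7, -1.3                         # their hypothetical values
B[x1, y1] = 0.0                            # unknown entries are zeroed in B
B[x2, y2] = 0.0

# U V^T = a1 * e_{x1} e_{y1}^T + a2 * e_{x2} e_{y2}^T
U = np.zeros((N, 2))
V = np.zeros((N, 2))
U[x1, 0], V[y1, 0] = a1, 1.0
U[x2, 1], V[y2, 1] = a2, 1.0

lhs = np.linalg.det(B + U @ V.T)
rhs = np.linalg.det(np.eye(2) + V.T @ np.linalg.solve(B, U)) * np.linalg.det(B)
print(np.isclose(lhs, rhs))                # True (up to floating point error)
```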

So each of the derivative constraints above can be expanded into a polynomial constraint on the unknown entries \(a_{x,y}\), which can then be solved. As an example, consider the second constraint, involving the submatrices \(B\) (dropping the subscript for clarity), in the case where there are two entries \(a_{x_1,y_1}, a_{x_2,y_2}\) to solve for:

\[ det(B + UV^T) = det(I + V^T B^{-1} U) det(B) = det \left(\begin{bmatrix} 1+B_{y_1,x_1}^{-1} a_{x_1,y_1} & B_{y_1,x_2}^{-1} \sqrt{a_{x_1,y_1} a_{x_2,y_2}} \\ B_{y_2,x_1}^{-1} \sqrt{a_{x_1,y_1} a_{x_2,y_2}} & 1+B_{y_2,x_2}^{-1} a_{x_2,y_2} \end{bmatrix} \right) det(B) \]

\[ = \left((1+B_{y_1,x_1}^{-1} a_{x_1,y_1})(1+B_{y_2,x_2}^{-1} a_{x_2,y_2}) - B_{y_1,x_2}^{-1} B_{y_2,x_1}^{-1} a_{x_1,y_1} a_{x_2,y_2} \right) det(B) \]

\[ = \left(1 + B_{y_2,x_2}^{-1} a_{x_2,y_2} + B_{y_1,x_1}^{-1} a_{x_1,y_1} + a_{x_1,y_1} a_{x_2,y_2} (B_{y_1,x_1}^{-1} B_{y_2,x_2}^{-1} - B_{y_1,x_2}^{-1} B_{y_2,x_1}^{-1}) \right) det(B) \]

So in the end the expression is a polynomial that is quadratic in the unknown entries \(a\), and these are the terms that appear in the overall sums described in the constraints above. By considering determinants of increasing size with larger numbers of unknown entries, we see that in general a series of \(C\) constraints on \(C\) unknown variables takes the form of degree-\(C\) polynomials with only "mixing" terms (i.e., no single variable appears with power greater than one). In the two-entry example above this gives a pair of quadratic equations in two variables. In general, for a rank-\(r\) problem one would need to solve a system of \(r\) polynomial equations of degree \(r\), which is NP-hard, but still feasible for small \(r\) (<15). This is of significant use, as many problems, such as global positioning in 2 or 3 dimensions, only require \(r=2\) or \(3\). Additionally, if the matrix is sufficiently sampled, the polynomial system will still be overdetermined, and faster algorithms exist for finding solutions in that case. This coincides quite nicely with the observation in the convex relaxation approach that when you have a little more sampling than is strictly required, and singular vectors satisfying the right coherence property, you can solve the problem exactly in polynomial time.
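To make this concrete, here is a small sketch (using sympy, with made-up hidden positions and a fixed seed; this is not the Part 3 implementation) that builds the \(det = 0\) and first-derivative constraints for a rank-2 \(5 \times 5\) block with two hidden entries and solves the resulting quadratic system:

```python
# Illustrative only: recover two hidden entries of a rank-2 5x5 block from the
# det = 0 and sum-of-principal-minors = 0 constraints derived above.
import numpy as np
import sympy as sp

rng = np.random.default_rng(0)
r, N = 2, 5
X = rng.integers(-3, 4, size=(N, r))
Y = rng.integers(-3, 4, size=(r, N))
A_true = X @ Y                              # exact integer rank-2 matrix
assert np.linalg.matrix_rank(A_true) == r   # re-seed if this draw is degenerate

a1, a2 = sp.symbols("a1 a2")
M = sp.Matrix(A_true.tolist())
M[0, 1], M[2, 3] = a1, a2                   # pretend these entries are unobserved

# Constraint 0: det(M) = 0.
# Constraint 1: the sum of (N-1)x(N-1) principal minors of M is 0.
eq0 = sp.Eq(M.det(), 0)
eq1 = sp.Eq(sum(M.minor(i, i) for i in range(N)), 0)

solutions = sp.solve([eq0, eq1], [a1, a2], dict=True)
print("hidden values:", A_true[0, 1], A_true[2, 3])
print("candidate solutions:", solutions)    # the true pair is among the roots
```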

This approach suffices for square submatrices, but to solve the problem in general we need to consider solvable rectangular submatrices. One way to approach this is to construct a tensor-valued characteristic polynomial using the generalized minor, as described here. For a submatrix \(A \in \mathbb{R}^{n \times m}\) with \(n \lt m\) we can define a similar generalized, tensor-valued characteristic polynomial:

\[ \widetilde{p}_A(\lambda)_{i_1,\ldots,i_{m-n}} = \begin{cases} det\left(\lambda I - A_{[:],\,![i_1,\ldots,i_{m-n}]}\right) & \text{for } i_1 > \ldots > i_{m-n} \\ 0 & \text{otherwise} \end{cases}\]

We can apply the same technique as above to each entry of the generalized polynomial, thus constructing sufficient constraints to solve for the given entries. This effectively captures the observation that each square submatrix of the rectangular submatrix should have rank at most \(r\): each component of the tensor-valued polynomial should have \(n-r\) zeros at \(\lambda=0\).
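As a rough sketch of how this enumeration could look in code (the helper name is an assumption for illustration, not from any existing library): each choice of \(n\) kept columns of a wide \(n \times m\) block is a square submatrix that the earlier constraints apply to, which is exactly what the components of the tensor-valued polynomial index.

```python
# Illustrative helper: enumerate the square submatrices of a wide n x m block
# by choosing which m - n columns to remove, mirroring the index convention
# of the tensor-valued characteristic polynomial above.
from itertools import combinations
import numpy as np

def square_column_selections(block: np.ndarray):
    """Yield (removed_cols, square_submatrix) for a wide block with n < m."""
    n, m = block.shape
    assert n < m, "expected a wide block"
    for removed in combinations(range(m), m - n):
        kept = [j for j in range(m) if j not in removed]
        yield removed, block[:, kept]
```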

There's a bit more work to be done here - in particular, structuring the constraints in the above polynomial equations so that they can be solved algorithmically. That will be expanded on in Part 3, along with the implementation.

Finding Solvable Entries

In the previous post we identified solvable entries by treating the mask of known values as an adjacency graph, and noticed that solvable entries corresponded to paths that were "1-near" to cycles of length 4 - that is, by adding one edge to the path we could construct a cycle of length 4. In the general rank case we can solve for up to \(C = 1+nm-r(n+m)\) different entries in each submatrix of size \(n \times m\). Furthermore, a simple cycle isn't sufficient to identify submatrices that can be completed - rather, we need a "complete bipartite subgraph": a subset of nodes from each of the two graph parts such that every node in one subset is connected to every node in the other.

An example of a complete bipartite graph (From Wikipedia)

Our goal is to identify subgraphs that are "C-near" complete bipartite graphs, and thus can be solved. I expect this problem to be NP-hard due to its connection to finding cliques of a fixed size (the complete bipartite subgraph problem can be turned into finding C-near cliques of a given size by connecting all nodes within the same part), so I'd expect the best worst-case performance to come from a brute-force check running in \(O((nm)^r)\). There are also well-developed approximation algorithms for finding cliques of a fixed size in polynomial time, so this suggests some interesting alternative approaches to identifying conditions under which matrix completion is in \(P\) via this path.
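For completeness, here's a brute-force sketch of that search (names and the size cap are illustrative assumptions; it is exponential in the block size, so only usable for small blocks): scan subsets of rows and columns of the known-entry mask and report blocks whose number of missing entries fits within the solvable budget \(C\).

```python
# Brute-force search for "C-near" complete bipartite blocks in the known-entry
# mask (True = known). Exponential in block size; intended only as a sketch.
from itertools import combinations
import numpy as np

def solvable_blocks(mask: np.ndarray, r: int, max_size: int = 5):
    """Yield (rows, cols, n_missing) for blocks with 1..C missing entries."""
    n_rows, n_cols = mask.shape
    for n in range(r + 1, max_size + 1):
        for m in range(n, max_size + 1):
            C = 1 + n * m - r * (n + m)     # solvable-entry budget for an n x m block
            if C < 1:
                continue
            for rows in combinations(range(n_rows), n):
                for cols in combinations(range(n_cols), m):
                    sub = mask[np.ix_(rows, cols)]
                    n_missing = int((~sub).sum())
                    if 0 < n_missing <= C:
                        yield rows, cols, n_missing

# Example: a 4x4 rank-1 mask with a single unknown entry at (0, 0)
mask = np.ones((4, 4), dtype=bool)
mask[0, 0] = False
print(next(solvable_blocks(mask, r=1, max_size=2)))   # ((0, 1), (0, 1), 1)
```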

Part 3 will focus mostly on the implementation details and will include a Python/numpy implementation, so stay tuned!