 Research
 Open Access
 Published:
Nonconvex matrix completion with Nesterov’s acceleration
Big Data Analyticsvolume 3, Article number: 11 (2018)
Abstract
Background
In matrix completion fields, the traditional convex regularization may fall short of delivering reliable lowrank estimators with good prediction performance. Previous works use the alternation least squares algorithm to optimize the nonconvex regularization. However, this algorithm has high time complexities and requires more iterations to reach convergence, which cannot scale to largescale matrix completion problems. We need to develop faster algorithm to examine large amount of data to uncover the correlations between items for the big data analytics.
Results
In this work, we adopt the randomized SVD decomposition and Nesterov’s momentum to accelerate the optimization of nonconvex matrix completion. The experimental results on four real world recommendation system datasets show that the proposed algorithm achieves tremendous speed increases whilst maintaining comparable performance with the alternating least squares (ALS) method for nonconvex (convex) matrix completions.
Conclusions
Our novel algorithm can achieve comparable performance with previous algorithms but with shorter running time.
Background
The fields of machine learning and mathematical programming are increasingly intertwined. Optimization algorithms lie at the heart of most machine learning approaches [1] and have been widely applied in learning to rank [2, 3], clustering analysis [4], matrix factorization [5], etc.
Matrix completion is the task of filling in the missing entries of a partially observed matrix. Given a partially observed matrix O, matrix completion [6] attempts to recover a lowrank matrix X that approximates O on the observed entries. This approach leads to the following lowrank regularized optimization problem:
where P_{Ω} is the projection of X_{m×n} onto the observed indices Ω and otherwise zeros, and ∥·∥_{F} represents the Frobenius norm of a matrix.
Direct minimization of the matrix rank is NPhard, and a nature convexification of rank(X) is the nuclear norm ∥X∥_{∗} of X. The corresponding surrogate problem can be solved by various optimization algorithms. Candes and Recht [6] realized that convex relaxation is the equivalent to classic semidefinite programming (SDP), where SDP complexity is O(max(m,n)^{4}). Cai et al. [7] gave a firstorder method that approximately solves convex relaxation by the singular value thresholding algorithm (SVT), which takes O(mn^{2}) time to solve SVD in each SVT iteration. The softimpute algorithm utilized a special “sparse + lowrank” structure associated with SVT to efficiently compute the SVD with a reasonable time. The augmented Lagrangian method (ALM) provides a powerful framework to solve convex problems with equality constrains. Lin et al. [8] used ALM to solve matrix completion problems. The fixed point continuation with approximate SVD (FPCA) algorithm [9] is based on the proximal gradient with a continuation technique to accelerate convergence. Toh and Yun [10] proposed an accelerated proximal gradient algorithm on the balanced approach. Tomioka et al. [11] used a dual augmented Lagrangian algorithm for matrix completion, which achieves superlinear convergence and can be generalized to the problem of learning multiple matrices and general spectral regularization.
The traditional nuclear norm regularization may fall short of delivering reliable lowrank estimators with good prediction performance. Many researchers move their strides beyond the convex l_{1}−penalty to the nonconvex form. Mazumder et al. [12] used MC+penalty to demonstrate the performance of the coordinatedescent algorithm. Dimitris et al. [13] developed a discrete extension of modern firstorder continuous optimization methods to find a highquality feasible solution. Zhang [14] proposed the MC+ method for penalized variable selection in highdimensional linear regression, which is brought into a unified view of concave highdimensional sparse estimation procedures [15].
Previous work [16] used the alternating least squares (ALS) procedure to optimize the nonconvex matrix completion. However, this method has high time complexities and requires more iterations to reach convergence, which cannot meet the requirements of big data analytics. Halko et al. [17] presented a modular framework for constructing randomized algorithms to compute the partial matrix decompositions. Nesterov’s momentum [18] makes a step in the velocity direction and then makes a correction to a velocity vector based on the new location. Nesterov’s momentum has been widely used in deep learning [19] and signal processing [20].
In this work, we adopt the randomized SVD decomposition and Nesterov’s momentum to accelerate optimization of nonconvex matrix completion. Randomized SVD decomposition requires very few iterations to converge quickly. In addition, the proposed algorithm has a low running time and fast convergence rate by Nesterov’s acceleration. The experimental results on four real world recommendation system datasets show that the proposed algorithm achieves tremendous speed increases whilst maintaining comparable performance with ALS for nonconvex (convex) matrix completions.
Our contribution is as follows: (1) we introduce Nesterov’s momentum to solve the nonconvex matrix completion problem; (2) we further adopt the random SVD decomposition to accelerate the converge of the algorithm; (3) the experimental results on benchmark datasets shows the effectiveness of our algorithm.
The remainder of the paper is organized as follows: the “Background” section introduces nonconvex matrix completion and the alternating least squares algorithm; the “Nonconvex Matrix Completion by Nesterov’s Acceleration (NMCNA)" section shows the experimental results; the “Discussion” section analyses the time complexities of the algorithms; the “Conclusionsss” section summarizes the conclusions; and the “Methods” gives our novel algorithm called Nonconvex Matrix Completion by Nesterov’s Acceleration (NMCNA).
Nonconvex Matrix Completion
Nonconvex matrix completion uses nonconvex regularizers to approximate the l_{0}−penalty, which leads to the following nonconvex estimation problems:
where σ_{i}(X)(i≥1) are the singular values of X and P(σ_{i}(X);λ,γ) is a concave penalty function of σ on [0,∞]. The parameter (λ,γ) controls the amount of nonconvexity and shrinkage. For the nuclear norm regularized problem with P(σ_{i}(X);λ,γ)=λσ_{i}(X), the softthresholding operator gives a solution of problem (2):
where U and V is the left singular matrix and the right singular matrix of Z, σ is a singular value vector of Z and λ is a threshold.
For the more general spectral penalty function P(σ;λ,γ), we obtain the following similar result [16] as Lemma 1.
Lemma 1
Let Z=Udiag(σ)V^{T} denote the SVD of Z and D_{λ,γ} denote the following thresholding operator on the singular values of Z:
Then, S_{λ,γ}(Z)=Udiag(s_{λ,γ}(σ))V^{T}.
By the separability of the optimization, we can see that it can be converted into the optimization of the single variable
The popular nonconvex function used in the highdimensional regression framework includes the l_{γ} penalty, the SCAD penalty, the MC+ penalty and the logpenalty. In this work, we use the MC+ penalty (λ>0 and γ>1) as a representative:
where the parameter λ and γ) controls the amount of nonconvexity and shrinkage.
We can easily analyze the solution of the thresholding function induced by the MC+ penalty:
As observed in Fig. 1, when γ→+∞, the above threshold operator is consistent with the softthresholding operator λσ. On the other hand, the threshold operator approaches the discontinuous hardthresholding operator σ1(σ≥λ) when γ→1+.
It is critical to solve the SVD efficiently. Mazumder [16] et al. used the alternating least square (ALS) procedure to compute a lowrank SVD.
Matrix Completion with Fast Alternating Least Squares
The ALS method [21, 22] solves the following nonlinear optimization problem:
where λ is a regularization parameter and P_{Ω}(X) is the projection of the matrix X on the matrix Ω. We call it as the maximummargin matrix factorization (MMMF) criterion with nonconvex but biconvex. It can be solved with the alternating minimization algorithm. One variable is fixed (A or B), and we wish to solve (8) for another variable (B or A). We can easily see that this problem decouples into n separate ridge regressions. Further, we can prove that [21] the problem (8) is equivalent to the following problem:
where X≈AB.
It is noticed that (??) or (??) are efficient sparse + lowrank representations for highdimensional problems, which can efficiently store and multiply the matrices. We can then obtain the analytical solution for the linear regression problems. Comparing (8) and (9), we can observe that X=UD^{2}V^{T}=AB^{T} with A=UD and B=VD.
Finally, we use the relative change in Frobenius norm to check for convergence
Results
In this section, we describe the experimental protocols and then compare the effectiveness and efficiency of our proposed algorithm (NMCNA) and the ALSconvex and ALSnonconvex algorithms. All algorithm are implemented in Python language with numpy library and running on Ubuntu system 14.04.
We used four real world recommendation system datasets (as shown in Table 1) from ml100k and ml1m provided by MovieLens ^{1} to compare nonconvex MC+ regularization. Every matrix item is taken from the ratings {1,2,3,4,5}.
For performance evaluation, we use root mean squared error as follows:
where Y and O are the true values and predicted ones, respectively.
For all three settings above, we fix λ_{1}=∥P_{Ω}(Y)∥ and λ_{50}=0.001·λ_{1}. We choose 10 γvalues in a logarithmic grid from 10^{6} to 1.1. We let the true ranks be taken from {3,5,10,20,30,40,50}.
From Fig. 2, we observed that a moderate γ value will obtain the best generalization performance.
Figure 3 shows the variations of the test RMSE with the training RMSE results. The xaxis is the training RMSE, and the yaxis the test RMSE in each subfigure. The rows and columns correspond to the ranks and the datasets.
Figure 4 shows that the running time varies with the iteration times. It is noted that our proposed algorithm runs faster than the ALSconvex and ALSnonconvex algorithms.
Discussion
In Fig. 2, it is more obvious for the large rank k=50 that the choice γ=80 achieves the best test RMSE with 35 iterations for the ml100k/ua dataset. Similar results also hold for the rank k=3.
We can observe from Fig. 3 that the proposed algorithm can converge rapidly, although it has a high training and test RMSE from the beginning. In the end, comparable performance is achieved with ALSconvex and ALSconvex algorithms. The minimum RMSE value does not decrease with increasing ranks.
Figure 4 shows that our algorithm is 18 times faster than other two algorithms on the ml1m/ra.train and ml1m/rb.train datasets.
According to the discussion on the ALS algorithm in [21], the total cost of an iteration for computing SVD is O(2rΩ+mk^{2}+3nk^{2}+k^{3}).
The randomized SVD procedure [17] in the proposed algorithm requires the computation of Y^{∗}Y^{∗T}R (line 5 in Alg. 3), where Y^{∗T}R can be computed as
Assume that X and X^{′} have ranks r and r^{′}; then, it will take O((r+r^{′})Ω) time for constructing P_{Ω}(Y−X^{∗}) and O(Ωk) for computing P_{Ω}(Y−X^{∗})R time. Further, the cost of computing U(diag(s_{λ,γ}(σ))(V^{T}R)) is O(mrk+nrk); then, the total cost is
Although we do not judge the performance advantages of random SVD over ALS for one iteration, the random SVD is shown to require fewer iterations (<5) than ALS in these experiments.
Conclusions
This paper proposes a novel algorithm for nonconvex matrix completion. The proposed algorithm uses randomized SVD decomposition with few iterations. Nesterov’s acceleration further decreases its running time. The experimental results on four realworld recommendation system datasets show that the approach can accelerate the optimization of nonconvex matrix problems but with comparable performance to ALS algorithms. In the future, we will focus on the validation of other nonconvex functions for matrix completion problems.
Methods
Nonconvex Matrix Completion by Nesterov’s Acceleration (NMCNA)
Although softimpute is a proximal gradient method that can be accelerated by Nesterov’s acceleration, the special sparse+lowrank structure will be lost in the ALS algorithm [23]. In this work, we resorted to the randomized partial matrix decomposition algorithm [17] to compute SVD.
Now, we present Nesterov’s acceleration for the proposed nonconvex matrix completion problem. First, define the following sequences:
Given the singular value decomposition of X and \(\bar {X}\): \(\bar {X} = U\Sigma V^{T}\) (Σ is diagonal) and \(\bar {X}^{\prime } = U^{\prime }\Sigma ^{\prime } V^{\prime {T}}\), the proximal gradient algorithm uses the proximal mapping
where s_{λ,γ}(σ) is computed by (7) and X (X^{′}) is the current (previous) predict matrix. By Nesterov’s acceleration, we have
Random SVD Factorization
The problem of computing a lowrank approximation to a given matrix can be divided into two stages: construction of a lowdimensional subspace and computation of the factorization with the restriction to the subspace.
We require a matrix Q here as an approximate basis (see Alg. 3) for the range of the input matrix Z
where Q has orthonormal columns.
In the simplest form, we directly compute the QR factorization of Y=ZΩ=QR, where Ω is a random matrix. Thus, Q is an orthonormal base of the range of Z. However, the singular spectrum of the input matrix may decay slowly. Therefore we alternatively compute the QR factorization of (ZZ^{T})^{q}ZΩ=QR, where q≤5 usually suffices in practice.
Given the orthonormal matrix Q we compute the SVD of B=Q^{T}Z as follows:
then with (24), we immediately obtain the approximate SVD of Z:
Abbreviations
 ALM:

Augmented Lagrangian method
 ALS:

Alternative least squares
 FPCA:

Fixed point continuation with approximate SVD
 MMMF:

Maximummargin matrix factorization
 NMCNA:

Nonconvex matrix Completion by Nesterov’s acceleration
 RMSE:

Root mean squared error SDP: Semidefinite programming
 SVT:

Singular value thresholding
References
 1
Huang KZ, Yang H, King I, Lyu MR. Machine Learning: Modeling Data Locally And Globally. In: Advanced Topics in Science and Technology in China. Berlin Heidelberg: Springer: 2008.
 2
Jin XB, Geng GG, Xie GS, Huang K. Approximately optimizing NDCG using pairwise loss. Inf Sci. 2018; 453:50–65.
 3
Jin XB, Geng GG, Sun M, Zhang D. Combination of multiple bipartite ranking for multipartite web content quality evaluation. Neurocomputing. 2015; 149:1305–14.
 4
Jin XB, Xie GS, Huang K, Hussain A. Accelerating Infinite Ensemble of Clustering by Pivot Features. Cogn Comput. 2018; 6:1–9.
 5
Yang X, Huang K, Zhang R, Hussain A. Learning Latent Features With Infinite Nonnegative Binary Matrix Trifactorization. In: Transactions on Emerging Topics in Computational Intelligence, IEEE: 2018. p. 1–14.
 6
Candès E., Recht B.Exact Matrix Completion via Convex Optimization. Commun ACM. 2012; 55(6):111–9.
 7
Cai JF, Candès EJ, Shen Z. A Singular Value Thresholding Algorithm for Matrix Completion. SIAM J on Optimization. 2010; 20(4):1956–82. https://doi.org/10.1137/080738970.
 8
Lin Z, Chen M, Ma Y. The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted LowRank Matrices. J Struct Biol. 2013; 181(2):116–27. arXiv: 1009.5055.
 9
Ma S, Goldfarb D, Chen L. Fixed point and Bregman iterative methods for matrix rank minimization. Math Program. 2011; 128(1–2):321–53.
 10
Shen Z, Toh Kc, Yun S. An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems. Pacific J Optim; 6:615–640.
 11
Tomioka R, Suzuki T, Sugiyama M, Kashima H. A Fast Augmented Lagrangian Algorithm for Learning Lowrank Matrices. In: ICML. ICML’10. USA: Omnipress: 2010. p. 1087–94.
 12
Mazumder R, Friedman JH, Hastie T. SparseNet: Coordinate Descent With Nonconvex Penalties. J Am Stat Assoc. 2011; 106(495):1125–38.
 13
Bertsimas D, King A, Mazumder R. Best Subset Selection via a Modern Optimization Lens; 2015. arXiv:1507.03133 [math, stat]. arXiv: 1507.03133.
 14
Zhang CH. Nearly unbiased variable selection under minimax concave penalty. Ann Stat. 2010; 38(2):894–942. arXiv: 1002.4734.
 15
Zhang CH, Zhang T. A General Theory of Concave Regularization for HighDimensional Sparse Estimation Problems. Stat Sci. 2012; 27(4):576–93.
 16
Mazumder R, Saldana DF, Weng H. Matrix Completion with Nonconvex Regularization: Spectral Operators and Scalable Algorithms; 2018. arXiv:1801.08227 [stat]. arXiv: 1801.08227.
 17
Halko N, Martinsson PG, Tropp JA. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions; 2009. arXiv:0909.4061 [math]. arXiv: 0909.4061.
 18
Nesterov Y. Gradient methods for minimizing composite functions. Math Program. 2013; 140(1):125–61.
 19
Sutskever I, Martens J, Dahl G, Hinton G. On the Importance of Initialization and Momentum in Deep Learning. In: Proceedings of the 30th International Conference on International Conference on Machine Learning  Volume 28. ICML’13. Atlanta: JMLR.org: 2013. p. 1139–47.
 20
Gu R, Dogandzic A. Projected Nesterov’s ProximalGradient Algorithm for Sparse Signal Recovery. IEEE Trans Sig Process. 2017; 65(13):3510–25.
 21
Hastie T, Mazumder R, Lee J, Zadeh R. Matrix Completion and LowRank SVD via Fast Alternating Least Squares; 2014. arXiv:1410.2596 [stat]. arXiv: 1410.2596.
 22
Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. In: Springer Series in Statistics 2nd edn. New York: Springer: 2009.
 23
Golub GH, Van Loan CF. Matrix Computations (3rd Ed,). Baltimore: Johns Hopkins University Press; 1996.
Acknowledgements
The authors acknowledge supports from their affiliations.
Funding
This work was partially supported by the Fundamental Research Funds for the Henan Provincial Colleges and Universities in the Henan University of Technology (2016RCJH06), Jiangsu University Natural Science Programme Grant 17KJB520041, the National Natural Science Foundation of China (61103138 and 61473236).
Availability of data and materials
The dataset are provided by MovieLens can be downloaded in the website https://grouplens.org/datasets/movielens/.
Author information
Affiliations
Contributions
Conceived the research: JXB. Designed and Performed the experiments: JXB and XGS. Analyzed the data: WQF and ZGQ. Contributed materials/analysis tools: ZGQ. Wrote the paper: JXB,ZGQ and GGG. All authors read and approved the final manuscript.
Corresponding author
Correspondence to XiaoBo Jin.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Received
Accepted
Published
DOI
Keywords
 Matrix completion
 Nonconvex optimization
 Randomized SVD
 Nesterov’s momentum