- Open Access
Latent feature models for large-scale link prediction
© The Author(s) 2016
- Received: 14 May 2016
- Accepted: 21 October 2016
- Published: 1 February 2017
Link prediction is one of the most fundamental tasks in statistical network analysis, and latent feature models have been widely used for it. As large-scale networks become available in various application domains, developing effective models and scalable algorithms has become a new challenge. In this paper, we review recent progress on latent feature models for link prediction in large-scale networks, including nonparametric Bayesian models, which can automatically infer the latent social dimensions, and max-margin models, which can learn strongly discriminative latent features for highly accurate predictions and deal with the imbalance issue in large real networks. We also review progress on scalable algorithms for posterior inference in such models, including stochastic variational methods and MCMC methods with data augmentation.
- Latent feature model
- Social network
- Link prediction
As the pervasiveness and scope of network data (e.g., social networks, biological gene networks, document networks, and citation networks) increase, statistical network analysis has attracted a great deal of attention. Such networks are typically represented as a graph whose vertices represent entities and whose edges represent links or relationships between these entities. Given a network, a very useful query is: which new interactions among entities are likely to occur, given some partially observed information? This problem is known as link prediction. Link prediction is of significant importance in network analysis and has extensive applications, and latent feature models have been widely used for it. Compared with other methods, latent feature models can learn expressive representations from network structures to achieve state-of-the-art prediction performance. However, link prediction faces many challenges as networks become larger, which has motivated the development of effective models and scalable algorithms.
Table 1 Statistics of different networks, where positive links mean relationships between entities in a network
Several challenges arise when analyzing such large networks. First, a large number of vertices and edges increases the computational complexity, which calls for efficient algorithms. Second, the relationships between entities become more complicated. If we represent each entity with a feature vector, we need a higher-dimensional space. That is, both entities and relationships in large-scale networks are harder to describe. Finally, in real networks the positive links are often far fewer than the negative ones, which leads to serious imbalance issues in supervised learning. As shown in Table 1, positive links become much sparser as networks grow larger. Therefore, it is imperative to improve models so that they adapt to large-scale network analysis.
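To make the imbalance issue concrete, the following minimal Python sketch computes the fraction of positive links among all candidate entity pairs. The function name `positive_link_ratio` and the assumption of a fixed average of 10 links per entity are hypothetical choices for illustration only; they are not the statistics reported in Table 1.

```python
def positive_link_ratio(num_nodes, num_positive_links, directed=True):
    """Fraction of entity pairs that are positive links (self-links excluded)."""
    if num_nodes < 2:
        raise ValueError("need at least two entities to form a pair")
    possible_pairs = num_nodes * (num_nodes - 1)  # ordered pairs (i, j), i != j
    if not directed:
        possible_pairs //= 2  # unordered pairs for undirected networks
    return num_positive_links / possible_pairs


# Hypothetical networks in which every entity has 10 outgoing links on average.
ratios = {n: positive_link_ratio(n, 10 * n) for n in (100, 10_000, 1_000_000)}
for n, ratio in ratios.items():
    print(f"{n:>9} entities: positive ratio = {ratio:.6f}")
```

Under this assumption the number of candidate pairs grows quadratically while the number of positive links grows only linearly, so the positive ratio shrinks roughly as 10/N for a network with N entities.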
Link prediction is one of the most fundamental problems in network analysis. For static networks, it is often defined as predicting the missing links from a partially observed network topology (and possibly some attributes as well), while for dynamic networks, it is typically defined as predicting the network structure at the next time t+1 given the structures up to the current time t. Link prediction has significant application value in many areas (see  for a comprehensive survey). In social networking websites like Facebook and LinkedIn, link prediction can be used to predict the existence of friendships between pairs of users. In citation networks, link prediction can be applied to suggest the most likely coauthorships in the near future. In bioinformatics, link prediction offers a cheaper way than laboratory experiments to predict whether two proteins will interact [8, 9]. Moreover, link prediction approaches can be applied to user-item recommendation in collaborative networks and can describe the link structure in document networks.
In the rest of the paper, we first survey some prominent existing approaches for link prediction, followed by recent progress in latent feature models for large-scale link prediction with several experimental results. Finally, we conclude and discuss future work.
A wide range of models have been proposed for link prediction. In this section, we survey some prominent approaches.
Proximity based models
Early work on link prediction focused on designing good proximity (or similarity) measures between nodes, using features related to certain topological properties of the graph. For instance, graph-based models compute a measure for each pair of nodes and rank the unobserved links in descending order. Popular measures include common neighbors, Jaccard’s coefficient, Adamic/Adar, and Katz. These methods are unsupervised and depend heavily on the manually designed proximity measure.
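As an illustration of this family of methods, the sketch below scores unobserved node pairs on a small undirected toy graph with three of the measures named above (common neighbors, Jaccard's coefficient, and Adamic/Adar) and ranks the pairs in descending order. The toy edge list and all function names are assumptions made for this example; Katz is omitted for brevity.

```python
import math
from itertools import combinations


def build_adjacency(edges):
    """Return an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Shared neighbors weighted by the inverse log of their degree."""
    # A shared neighbor touches both u and v, so its degree is at least 2
    # and the logarithm is strictly positive.
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])


def rank_unobserved(adj, score):
    """Rank all node pairs without an observed link by descending score."""
    candidates = [(u, v) for u, v in combinations(sorted(adj), 2)
                  if v not in adj[u]]
    return sorted(candidates, key=lambda pair: score(adj, *pair), reverse=True)


# Hypothetical toy network.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = build_adjacency(edges)
ranking = rank_unobserved(adj, common_neighbors)
print(ranking[0])  # the pair predicted as the most likely missing link
```

No parameters are learned here: swapping the `score` argument changes the ranking, which is exactly the dependence on a hand-designed measure noted above.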
Well-conceived feature models
Supervised learning methods have also been popular for link prediction [15, 16]. These methods learn predictive models on labeled training data with a set of manually designed features that capture the statistics of the network. For example, Hasan et al. and Lichtenwalter et al. identify a set of features and cast the link prediction problem as a classification task. Backstrom and Leskovec use random walks to combine the information from the network structure with node and edge attributes. Although effective domain-specific features can be designed from the graph topology as well as node attributes, this process can be time-consuming, and the resulting features are often restricted to particular application domains.
Latent class models
Latent class models assume that a number of clusters (or classes) underlie the observed entities and that each entity belongs to certain clusters. The observed link between two entities is determined by their cluster assignments (or social roles). The stochastic block model is a representative early work that places a probability distribution over the clusters and reveals a soft clustering of the entities by posterior inference. Later advances use nonparametric techniques, such as the infinite relational model and the infinite hidden relational model, which allow a potentially infinite number of clusters. The mixed membership stochastic block model (MMSB) increases the expressiveness of latent class models by allowing entities to be members of multiple communities, but the number of latent communities must be specified externally. The nonparametric extension of MMSB is the hierarchical Dirichlet process relational model (HDPR), which allows mixed membership in an unbounded number of latent communities.
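The generative idea behind a basic latent class model can be sketched as follows: each entity is assigned one latent class, and each directed link is drawn with a probability that depends only on the classes of its two endpoints. This is a simplified illustration with a fixed number of classes, and it covers sampling only, not posterior inference. The function name `sample_sbm` and its parameters are hypothetical.

```python
import random


def sample_sbm(num_nodes, class_probs, link_probs, seed=0):
    """Sample class assignments and directed links from a simple block model.

    class_probs[c]    -- prior probability that an entity belongs to class c
    link_probs[c][d]  -- probability of a link from a class-c to a class-d entity
    """
    rng = random.Random(seed)
    classes = list(range(len(class_probs)))
    # One latent class per entity.
    z = rng.choices(classes, weights=class_probs, k=num_nodes)
    links = {}
    for i in range(num_nodes):
        for j in range(num_nodes):
            if i != j:  # no self-links
                p = link_probs[z[i]][z[j]]
                links[(i, j)] = int(rng.random() < p)
    return z, links


# Two hypothetical communities with dense within-class and sparse
# between-class links.
z, links = sample_sbm(8, [0.5, 0.5], [[0.9, 0.05], [0.05, 0.9]], seed=1)
print(z, sum(links.values()))
```

Mixed membership models such as MMSB relax the single assignment per entity used here, and the nonparametric variants remove the need to fix the length of `class_probs` in advance.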
We first discuss large-scale network analysis and the challenges it faces, along with the importance and usefulness of the link prediction problem. To tackle the challenges in large-scale link prediction, progress has been made in latent feature models. We review latent feature models, especially LFRM, which imposes an IBP prior to address the problem of an unknown latent dimension. We then introduce two recently improved models, DLFRM and MedLFRM, under RegBayes, together with their efficient inference using stochastic algorithms. The experimental results demonstrate that these improved latent feature models not only have effective and elegant model structures, but also have efficient inference algorithms that can obtain state-of-the-art performance.
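As a reminder of what a latent feature relational model computes, the sketch below assumes the commonly used form in which each entity i has a binary feature vector z_i and the probability of a link from i to j is a sigmoid of the summed interaction weights between their active features, that is, sigmoid(z_i^T W z_j). The function names, the fixed number of features, and the weight values are hypothetical. Under the IBP prior the number of features is inferred rather than fixed, which this sketch does not attempt.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def link_probability(z_i, z_j, W):
    """Link probability sigmoid(z_i^T W z_j) for binary feature vectors."""
    score = 0.0
    for k, active_k in enumerate(z_i):
        if not active_k:
            continue
        for l, active_l in enumerate(z_j):
            if active_l:
                score += W[k][l]  # interaction of feature k of i with feature l of j
    return sigmoid(score)


# Hypothetical weights for three latent features.
W = [[0.5, -1.0, 2.0],
     [0.0, 0.3, 0.0],
     [1.0, 0.5, -0.5]]
print(link_probability([1, 0, 1], [0, 1, 1], W))
```

Because each entity may switch on several features at once, pairs of entities can interact through many feature combinations, which is what distinguishes this representation from a single class assignment.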
Several future directions are worth discussing. LFRM and the extended models introduced in this paper are Bayesian methods, which represent one important class of statistical methods for machine learning. As Bayesian methods can achieve good performance in network analysis, scaling them up to large networks is of great importance. Recently, many advances have been made in big learning with Bayesian methods (see  for a survey). Besides the techniques mentioned in this paper, there are many other methods, such as scalable algorithms and distributed computing. By taking full advantage of big Bayesian learning, we can improve our methods effectively.
Besides Bayesian methods, deep learning is another powerful technique for learning latent features. Deep learning has been widely used in computer vision (e.g., deep convolutional neural networks) and natural language processing (e.g., word embeddings). Recently, a novel approach, DeepWalk, was proposed for network analysis; it combines random walks with deep learning to learn latent representations for entities in networks. How to take advantage of deep learning to solve problems in network analysis, such as link prediction and community detection, is well worth studying.
Moreover, learning latent features in dynamic networks is more complicated, as both the networks and the features change over time. The dynamic relational infinite feature model (DRIFT) is the dynamic extension of LFRM for link prediction, where the latent features for each entity in the network evolve according to a Markov process. Enriching and scaling up such latent feature models for dynamic network analysis is important but also challenging.
The work was supported by the National Basic Research Program (973 Program) of China (No. 2013CB329403), National NSF of China Projects (Nos. 61620106010, 61322308, 61332007), and Tsinghua Initiative Scientific Research Program (No. 20141080934).
Availability of data and materials
JZ developed the overall structure and the main idea and participated in drafting the manuscript. BC carried out the model development and experiments and participated in drafting the manuscript. Both authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Liben-Nowell D, Kleinberg J. The link prediction problem for social networks. In: ACM Conference of Information and Knowledge Management (CIKM). New York: ACM: 2003. http://dl.acm.org/citation.cfm?id=956972.
- Craven M, DiPasquo D, Freitag D, McCallum A, Mitchell T, Nigam K, Slattery S. Learning to Extract Symbolic Knowledge from the World Wide Web. In: Proceedings of the Fifteenth National/Tenth Conference on Artificial Intelligence/Innovative Applications of Artificial Intelligence. Menlo Park: American Association for Artificial Intelligence: 1998. p. 509–516. http://dl.acm.org/citation.cfm?id=295240.295725.
- Cho E, Myers SA, Leskovec J. Friendship and mobility: user movement in location-based social networks. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM: 2011. p. 1082–1090. http://dl.acm.org/citation.cfm?id=2020579.
- Leskovec J, Kleinberg J, Faloutsos C. Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. New York: ACM: 2005. p. 177–87. http://dl.acm.org/citation.cfm?doid=1081870.1081893.
- Lü L, Zhou T. Link prediction in complex networks: A survey. Physica A: Stat Mech Appl. 2011; 390(6):1150–1170.
- Backstrom L, Leskovec J. Supervised random walks: predicting and recommending links in social networks. In: Proceedings of the Fourth ACM International Conference on Web Search and Data Mining. New York: ACM: 2011. p. 635–44. http://dl.acm.org/citation.cfm?id=1935914.
- Miller K, Jordan MI, Griffiths TL. Nonparametric latent feature models for link prediction. In: Advances in Neural Information Processing Systems. Curran Associates, Inc.: 2009. p. 1276–1284. http://papers.nips.cc/paper/3846-nonparametric-latent-feature-models-for-link-prediction.
- Clauset A, Moore C, Newman ME. Hierarchical structure and the prediction of missing links in networks. Nature. 2008; 453(7191):98–101.
- Redner S. Networks: teasing out the missing links. Nature. 2008; 453(7191):47–8.
- Xu M, Zhu J, Zhang B. Nonparametric max-margin matrix factorization for collaborative prediction. In: Advances in Neural Information Processing Systems. Curran Associates, Inc.: 2012. p. 64–72. http://papers.nips.cc/paper/4581-nonparametric-max-margin-matrix-factorization-for-collaborative-prediction.
- Chen N, Zhu J, Xia F, Zhang B. Discriminative relational topic models. IEEE Trans Pattern Anal Mach Intell. 2015; 37(5):973–86.
- Salton G, McGill MJ. Introduction to modern information retrieval. New York: McGraw-Hill, Inc.; 1986.
- Adamic LA, Adar E. Friends and neighbors on the web. Soc Networks. 2003; 25(3):211–30.
- Katz L. A new status index derived from sociometric analysis. Psychometrika. 1953; 18(1):39–43.
- Hasan MA, Chaoji V, Salem S, Zaki M. Link prediction using supervised learning. In: SDM: Workshop on Link Analysis, Counterterrorism and Security. Bethesda, Maryland: 2006. http://www.siam.org/meetings/sdm06/workproceed/Link%2520Analysis/12.pdf.
- Shi X, Zhu J, Cai R, Zhang L. User grouping behavior in online forums. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM: 2009. http://dl.acm.org/citation.cfm?id=1557105.
- Lichtenwalter R, Lussier J, Chawla N. New perspectives and methods in link prediction. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM: 2010. http://dl.acm.org/citation.cfm?id=1835837.
- Nowicki K, Snijders TAB. Estimation and prediction for stochastic blockstructures. J Am Stat Assoc. 2001; 96(455):1077–87.
- Kemp C, Tenenbaum J, Griffiths T, Yamada T, Ueda N. Learning systems of concepts with an infinite relational model. In: the American Association for Artificial Intelligence (AAAI). Boston, Massachusetts: AAAI Press: 2006. http://dl.acm.org/citation.cfm?id=1597600.
- Xu Z, Tresp V, Yu K, Kriegel HP. Infinite hidden relational models. In: International Conference on Uncertainty in Artificial Intelligence (UAI). Arlington, Virginia: AUAI Press: 2006. https://dslpitt.org/uai/displayArticleDetails.jsp?mmnu=1%26smnu=2%26article_id=1291%26proceeding_id=22.
- Airoldi E, Blei D, Fienberg S, Xing E. Mixed membership stochastic blockmodels. J Mach Learn Res (JMLR). 2008; 9:1981–2014. http://dl.acm.org/citation.cfm?id=1442798.
- Kim DI, Gopalan P, Blei DM, Sudderth EB. Efficient online inference for Bayesian nonparametric relational models. In: Advances in Neural Information Processing Systems (NIPS). Curran Associates, Inc.: 2013. http://papers.nips.cc/paper/5072-efficient-online-inference-for-bayesian-nonparametric-relational-models.
- Hoff P, Raftery A, Handcock M. Latent space approaches to social network analysis. J Am Stat Assoc. 2002; 97(460):1090–8.
- Hoff P. Modeling homophily and stochastic equivalence in symmetric relational data. In: Advances in Neural Information Processing Systems (NIPS). Curran Associates, Inc.: 2007. http://papers.nips.cc/paper/3294-modeling-homophily-and-stochastic-equivalence-in-symmetric-relational-data.
- Menon AK, Elkan C. Link prediction via matrix factorization. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Berlin, Heidelberg: Springer Berlin Heidelberg: 2011. p. 437–52. http://link.springer.com/chapter/10.1007%252F978-3-642-23783-6_28.
- Zhu J, Song J, Chen B. Max-margin nonparametric latent feature models for link prediction. arXiv preprint arXiv:1602.07428. 2016. https://arxiv.org/abs/1602.07428.
- Chen B, Chen N, Zhu J, Song J, Zhang B. Discriminative nonparametric latent feature relational models with data augmentation. In: the American Association for Artificial Intelligence (AAAI). Phoenix, Arizona: AAAI Press: 2016. http://aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/12136.
- Antoniak CE. Mixture of Dirichlet process with applications to Bayesian nonparametric problems. Ann Stat; 2(6). http://www.jstor.org/stable/2958336?seq=1%23page_scan_tab_contents.
- Griffiths T, Ghahramani Z. Infinite latent feature models and the indian buffet process. In: Advances in Neural Information Processing Systems (NIPS). MIT Press: 2005. http://papers.nips.cc/paper/2882-infinite-latent-feature-models-and-the-indian-buffet-process.
- Zhu J, Chen N, Xing E. Bayesian inference with posterior regularization and applications to infinite latent SVMs. JMLR. 2014; 15(1):1799–847.
- Polson N, Scott S. Data augmentation for support vector machines. Bayesian Anal. 2011; 6(1):1–23.
- Polson N, Scott J, Windle J. Bayesian inference for logistic models using Polya-Gamma latent variables. J Am Stat Assoc. 2013; 108:1339–1349. http://www.tandfonline.com/doi/abs/10.1080/01621459.2013.829001.
- Welling M, Teh YW. Bayesian learning via stochastic gradient langevin dynamics. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11). New York: ACM: 2011. p. 681–8. http://machinelearning.wustl.edu/mlpapers/papers/ICML2011Welling_398.
- Zhu J. Max-margin nonparametric latent feature models for link prediction. In: International Conference on Machine Learning (ICML). New York: Omnipress: 2012. http://www.icml.cc/2012/papers/.
- Teh YW, Görür D, Ghahramani Z. Stick-breaking construction for the indian buffet process. In: International Conference on Artificial Intelligence and Statistics. JMLR: 2007. p. 556–63. http://jmlr.csail.mit.edu/proceedings/papers/v2/. https://www.mendeley.com/catalog/stickbreaking-construction-indian-buffet-process/.
- Hoffman MD, Blei DM, Wang C, Paisley J. Stochastic variational inference. J Mach Learn Res. 2013; 14(1):1303–47.
- Zhu J, Chen J, Hu W. Big learning with Bayesian methods. arXiv preprint arXiv:1411.6370. 2014. https://arxiv.org/abs/1411.6370.
- Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems. Curran Associates, Inc.: 2012. p. 1097–1105. http://papers.nips.cc/paper/4824-i.
- Bengio Y, Schwenk H, Senécal JS, Morin F, Gauvain JL. Neural probabilistic language models. In: Innovations in Machine Learning. Berlin, Heidelberg: Springer Berlin Heidelberg: 2006. p. 137–86. http://link.springer.com/chapter/10.1007%252F3-540-33486-6_6.
- Perozzi B, Al-Rfou R, Skiena S. Deepwalk: Online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM: 2014. p. 701–10. http://dl.acm.org/citation.cfm?doid=2623330.2623732.
- Foulds JR, DuBois C, Asuncion AU, Butts CT, Smyth P. A dynamic relational infinite feature model for longitudinal social networks. In: International Conference on Artificial Intelligence and Statistics. JMLR: 2011. p. 287–95. http://www.jmlr.org/proceedings/papers/v15/foulds11b.html.
- Denham WW. The detection of patterns in Alyawara nonverbal behavior. PhD thesis, University of Washington, Seattle. 1973.
- Globerson A, Chechik G, Pereira F, Tishby N. Euclidean embedding of co-occurrence data. In: Advances in Neural Information Processing Systems. MIT Press: 2004. p. 497–504. http://papers.nips.cc/paper/2733-euclidean-embedding-of-co-occurrence-data.
- Leskovec J, Kleinberg J, Faloutsos C. Graph evolution: Densification and shrinking diameters. ACM Trans Knowl Discov Data (TKDD). 2007; 1(1):2.