Table 1 Analyzed methods and the maximum data size managed by each one

From: Big data preprocessing: methods and prospects

| Methods | Category | # Features | # Instances | Size (GB) | Framework |
|---|---|---|---|---|---|
| [70] | FS | 630 | 65,003,913 | 305.1196 | Hadoop MapReduce |
| [69] | FS | 630 | 65,003,913 | 305.1196 | Hadoop MapReduce |
| [74] | FS | 1,156 | 5,670,000 | 48.8350 | MPI |
| [60] | FS | 29,890,095 | 19,264,097 | 4.1623 | C++/MATLAB |
| [59] | FS | 100,000 | 10,000,000 | 1.4901 | MapReduce |
| [76] | FS | 100 | 1,600,000 | 1.1921 | Apache Spark |
| [80] | FS | 127 | 1,131,571 | 1.0707 | Hadoop MapReduce |
| [71] | FS | 54,675 | 2,096 | 0.8538 | Hadoop MapReduce |
| [75] | FS | 54 | 581,012 | 0.2338 | Hadoop MapReduce |
| [73] | FS | 20 | 1,000,000 | 0.1490 | MapReduce |
| [77] | FS | n/a | n/a | 0.0976 | Hadoop MapReduce |
| [79] | FS | 256 | 38,232 | 0.0729 | Hadoop MapReduce |
| [68] | FS | 52 | 5,253 | 0.0020 | Hadoop MapReduce |
| [78] | FS | n/a | n/a | 0.0000 | Hadoop MapReduce |
| [67] | FS | n/a | n/a | 0.0000 | Hadoop MapReduce |
| [72] | FS | n/a | n/a | 0.0000 | Hadoop MapReduce |
| [83] | Imbalanced | 630 | 32,000,000 | 150.2037 | Hadoop MapReduce |
| [84] | Imbalanced | 630 | 32,000,000 | 150.2037 | Hadoop MapReduce |
| [90] | Imbalanced | 630 | 16,000,000 | 75.1019 | Apache Spark |
| [89] | Imbalanced | 41 | 4,856,151 | 1.4834 | Hadoop MapReduce |
| [82] | Imbalanced | 41 | 4,000,000 | 1.2219 | Hadoop MapReduce |
| [81] | Imbalanced | 14 | 1,432,941 | 0.1495 | Hadoop MapReduce |
| [86] | Imbalanced | 9,731 | 1,446 | 0.1048 | Hadoop MapReduce |
| [91] | Imbalanced | 14 | 524,131 | 0.0547 | Hadoop MapReduce |
| [87] | Imbalanced | 36 | 95,048 | 0.0255 | Hadoop MapReduce |
| [88] | Imbalanced | 8 | 2,687,280 | 0.0200 | Hadoop MapReduce |
| [93] | Incomplete | 625 | 4,096,000 | 19.0735 | MapReduce (Twister) |
| [92] | Incomplete | 481 | 191,779 | 0.6873 | Hadoop MapReduce |
| [95] | Discretization | 630 | 65,003,913 | 305.1196 | Apache Spark |
| [96] | Discretization | 630 | 65,003,913 | 305.1196 | Apache Spark |
| [94] | Discretization | n/a | n/a | 4.0000 | Hadoop MapReduce |
| [97] | IR | 41 | 4,856,151 | 1.4834 | Hadoop MapReduce |

  1. The methods are grouped by preprocessing task and, within each group, ordered by maximum data size. Methods for which the number of features or instances is not reported have been assigned a size of zero.
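
The source does not say how the Size (GB) column was computed. As a rough sanity check, many rows appear consistent with a dense matrix of 8-byte values, with 1 GB taken as 2^30 bytes. The sketch below illustrates that reading; the 8-byte cell width, the 2^30 divisor, and the helper name `dense_size_gb` are assumptions made for illustration, not something the table states. Not every row follows this pattern (for example, [60] and [59] do not), so treat it as an approximation only.

```python
# Assumption: each cell is stored as one 8-byte (double-precision) value.
BYTES_PER_VALUE = 8
# Assumption: 1 GB is counted as 2**30 bytes.
BYTES_PER_GB = 2 ** 30


def dense_size_gb(n_features: int, n_instances: int) -> float:
    """Approximate size in GB of a dense n_instances x n_features matrix."""
    return n_features * n_instances * BYTES_PER_VALUE / BYTES_PER_GB


# (features, instances, size reported in the table) for three rows.
rows = {
    "[70]": (630, 65_003_913, 305.1196),
    "[76]": (100, 1_600_000, 1.1921),
    "[75]": (54, 581_012, 0.2338),
}

for ref, (n_features, n_instances, reported) in rows.items():
    estimate = dense_size_gb(n_features, n_instances)
    print(f"{ref}: estimated {estimate:.4f} GB, reported {reported:.4f} GB")
```

For these three rows the estimate agrees with the reported size to four decimal places, which supports the dense 8-byte reading for them without confirming it for the table as a whole.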