Table 1 Analyzed methods and the maximum data size managed by each one

From: Big data preprocessing: methods and prospects

| Methods | Category | # Features | # Instances | Size (GB) | Framework |
|---|---|---|---|---|---|
| [70] | FS | 630 | 65,003,913 | 305.1196 | Hadoop MapReduce |
| [69] | FS | 630 | 65,003,913 | 305.1196 | Hadoop MapReduce |
| [74] | FS | 1,156 | 5,670,000 | 48.8350 | MPI |
| [60] | FS | 29,890,095 | 19,264,097 | 4.1623 | C++/MATLAB |
| [59] | FS | 100,000 | 10,000,000 | 1.4901 | MapReduce |
| [76] | FS | 100 | 1,600,000 | 1.1921 | Apache Spark |
| [80] | FS | 127 | 1,131,571 | 1.0707 | Hadoop MapReduce |
| [71] | FS | 54,675 | 2,096 | 0.8538 | Hadoop MapReduce |
| [75] | FS | 54 | 581,012 | 0.2338 | Hadoop MapReduce |
| [73] | FS | 20 | 1,000,000 | 0.1490 | MapReduce |
| [77] | FS | – | – | 0.0976 | Hadoop MapReduce |
| [79] | FS | 256 | 38,232 | 0.0729 | Hadoop MapReduce |
| [68] | FS | 52 | 5,253 | 0.0020 | Hadoop MapReduce |
| [78] | FS | – | – | 0.0000 | Hadoop MapReduce |
| [67] | FS | – | – | 0.0000 | Hadoop MapReduce |
| [72] | FS | – | – | 0.0000 | Hadoop MapReduce |
| [83] | Imbalanced | 630 | 32,000,000 | 150.2037 | Hadoop MapReduce |
| [84] | Imbalanced | 630 | 32,000,000 | 150.2037 | Hadoop MapReduce |
| [90] | Imbalanced | 630 | 16,000,000 | 75.1019 | Apache Spark |
| [89] | Imbalanced | 41 | 4,856,151 | 1.4834 | Hadoop MapReduce |
| [82] | Imbalanced | 41 | 4,000,000 | 1.2219 | Hadoop MapReduce |
| [81] | Imbalanced | 14 | 1,432,941 | 0.1495 | Hadoop MapReduce |
| [86] | Imbalanced | 9,731 | 1,446 | 0.1048 | Hadoop MapReduce |
| [91] | Imbalanced | 14 | 524,131 | 0.0547 | Hadoop MapReduce |
| [87] | Imbalanced | 36 | 95,048 | 0.0255 | Hadoop MapReduce |
| [88] | Imbalanced | 8 | 2,687,280 | 0.0200 | Hadoop MapReduce |
| [93] | Incomplete | 625 | 4,096,000 | 19.0735 | MapReduce (Twister) |
| [92] | Incomplete | 481 | 191,779 | 0.6873 | Hadoop MapReduce |
| [95] | Discretization | 630 | 65,003,913 | 305.1196 | Apache Spark |
| [96] | Discretization | 630 | 65,003,913 | 305.1196 | Apache Spark |
| [94] | Discretization | – | – | 4.0000 | Hadoop MapReduce |
| [97] | IR | 41 | 4,856,151 | 1.4834 | Hadoop MapReduce |

  1. The methods are grouped by preprocessing task and, within each group, ordered by maximum data size, largest first. Methods that report neither the number of features nor the number of instances have been assigned a size of zero
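
The table does not say how the Size (GB) column was computed. As a rough sketch, several rows (for example [70] and [76]) are consistent with a dense matrix estimate of features × instances × 8 bytes, divided by 2^30. Both the 8 bytes per value and the binary gigabyte are assumptions inferred from the numbers, not something the source confirms, and the estimate does not reproduce every row (for example [60] and [88]). The function name `estimated_size_gb` is hypothetical.

```python
# Assumptions (not stated in the source): each value occupies 8 bytes
# (double precision) and 1 GB is taken as 2**30 bytes.
BYTES_PER_VALUE = 8
BYTES_PER_GB = 2**30


def estimated_size_gb(n_features, n_instances):
    """Estimate the dense in-memory size of a dataset in GB.

    Returns 0.0 when either count is unknown (None), mirroring the
    table note that methods without this information get zero size.
    """
    if n_features is None or n_instances is None:
        return 0.0
    return n_features * n_instances * BYTES_PER_VALUE / BYTES_PER_GB


# A few rows taken from the table: (reference, # features, # instances).
rows = [
    ("[70]", 630, 65_003_913),
    ("[76]", 100, 1_600_000),
    ("[78]", None, None),
]

for ref, n_features, n_instances in rows:
    print(ref, round(estimated_size_gb(n_features, n_instances), 4))
```

Under these assumptions the estimates for [70] and [76] agree with the tabulated sizes to four decimal places. Rows where they disagree presumably rely on a different storage format or on sizes reported directly by the original authors.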