Fig. 3 | Big Data Analytics

From: Identification of disease-distinct complex biomarker patterns by means of unsupervised machine-learning using an interactive R toolbox (Umatrix)

Fig. 3

Screenshots of the interface components “iEsomTrain” and “iUstarmatrix” of the interactive “Umatrix” R library. a: The “iEsomTrain” component trains the emergent self-organizing map (ESOM) and displays the resulting U-matrix. Default values are shown. The following user interactions are implemented: ❶ Selection of the number of ESOM training cycles. ❷ Selection of the projection grid as either toroid, where opposite edges are connected, or planar. ❸ Selection of the size of the ESOM. The size should meet three criteria. Firstly, it should not be too small, as it has been shown that in that case SOMs degenerate to a k-means-like clustering [34]. Secondly, it should not be too large, to avoid that each input data point can be represented on the map on a separate neuron with a surrounding area of other neurons interpolating the data space. Thirdly, edge ratios between 1.2 and the golden ratio of approximately 1.6 should be applied, as it has been observed that SOMs perform better if the edge lengths of the map are not equal [35]. However, a SOM with the default size of 4000 (80 × 50) neurons has been successful in many applications. ❹ A prior classification can be loaded from a structured text file. ❺ The display of the U-matrix can be modified visually, for example by changing its size or the diameter of the best matching units, or by making the colors slightly transparent, which enhances the visibility of data structures. ❻ After all parameters have been set, the training of the ESOM is started by pressing the “Train” button; the numerical results can then be saved to a file, and ❼ the interface is finally closed. ❽ The trained U-matrix is shown on the right of the interface panel. ❾ Further parameters, such as the learning rate, can be set in a special expert mode; for details, see the description delivered with the R package. b: The “iUstarmatrix” component calculates the data-density-based P-matrix and displays the U-matrix and the resulting U*-matrix.
The following user interactions are implemented: ❿ The radius of the hyperspheres used for density estimation can be selected based on a suggestion obtained from the probability density distribution of the distances between the data points. This distribution is displayed below as a Pareto density estimation (PDE) [72], with the suggested radius indicated as a magenta line that the user can adjust ⓫. At the top right of the interface, the U*-matrix ⓬ is displayed, which results from the superposition of the data-density-based P-matrix ⓭ with the original data-distance-based U-matrix ⓮. The figure was created using the R software package (version 3.4.0 for Linux; http://CRAN.R-project.org/ [24]) and the R package “Umatrix” (https://cran.r-project.org/package=Umatrix).
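
The grid and matrix notions named in the caption (toroid versus planar grid, edge ratio, distance-based U-matrix, density-based P-matrix) can be illustrated with a minimal Python sketch. This is not the code of the “Umatrix” package: the function names `edge_ratio`, `neighbors`, `u_height` and `p_height` are hypothetical, and the sketch assumes a 4-neighborhood, Euclidean distances, and the mean distance to the grid neighbors as the U-matrix height, any of which may differ from the package's actual implementation.

```python
import math


def edge_ratio(lines, columns):
    """Ratio of the longer to the shorter edge of the map."""
    return max(lines, columns) / min(lines, columns)


def neighbors(row, col, lines, columns, toroid=True):
    """Grid positions directly above, below, left and right of (row, col).

    On a toroid, opposite edges are connected, so indices wrap around.
    On a planar grid, positions outside the map are dropped.
    """
    result = []
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if toroid:
            result.append((r % lines, c % columns))
        elif 0 <= r < lines and 0 <= c < columns:
            result.append((r, c))
    return result


def u_height(weights, row, col, toroid=True):
    """Assumed U-matrix height: mean Euclidean distance between the weight
    vector of a neuron and the weight vectors of its grid neighbors."""
    lines, columns = len(weights), len(weights[0])
    nbrs = neighbors(row, col, lines, columns, toroid)
    total = sum(math.dist(weights[row][col], weights[r][c]) for r, c in nbrs)
    return total / len(nbrs)


def p_height(weight, data, radius):
    """Assumed P-matrix height: number of data points lying inside the
    hypersphere of the given radius around a neuron's weight vector."""
    return sum(1 for point in data if math.dist(weight, point) <= radius)
```

For the default map of 80 × 50 neurons, `edge_ratio(50, 80)` gives 1.6, which lies at the upper end of the range recommended in the caption. On a toroid grid every neuron has four neighbors, whereas on a planar grid corner and edge neurons have fewer, which is why the two grid types can yield different U-matrix heights near the map border.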
