Group Method of Data Handling (GMDH)-type neural network algorithms are self-organizing machine learning algorithms for modelling complex systems. They are used for a range of purposes, including pattern recognition, classification, clustering, approximation of multidimensional processes, and forecasting.

This web-tool enables researchers to perform binary classification via GMDH-type neural network algorithms. Two main algorithms are available: the GMDH algorithm and the diverse classifiers ensemble based on GMDH (dce-GMDH) algorithm. The GMDH algorithm performs classification for a binary response and returns the important variables dominating the system. The dce-GMDH algorithm performs binary classification by assembling classifiers based on the GMDH algorithm.

This web-tool is a web interface to the GMDH2 package in R. The tool also produces a well-formatted table of descriptive statistics for a binary response in different formats (R, LaTeX, HTML). Moreover, it produces a confusion matrix with related statistics, and scatter plots (2D and 3D) with classification labels of the binary classes, to assess prediction performance.

Usage of the web-tool

(i) Load your data set and define the binary response variable using the Data upload tab.

(ii) Obtain descriptive statistics by groups using the Describe data tab.

(iii) Specify the algorithm and its arguments in the Algorithms tab.

(iv) Obtain the confusion matrix and related statistics for the train, validation and test sets in the Results tab. Researchers can also download the predicted probabilities and classes (as csv).

(v) Draw scatter plots (2D and 3D) with classification labels of the binary classes in the Visualize tab to assess prediction performance.

(vi) Load new data for prediction in the New data tab. Researchers can obtain the predicted probabilities and classes in the predictions subtab and download them (as csv) with the download button.
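
As an illustration of the kind of statistics reported in step (iv), the following Python sketch computes a confusion matrix and a few common derived measures from observed and predicted classes. It is not the tool's own code (the tool is built on the GMDH2 package in R); the function name confusion_stats, the choice of measures and the toy labels are assumptions made for this example.

```python
def confusion_stats(observed, predicted, positive=1):
    """Count confusion-matrix cells and derive basic statistics.

    Hypothetical helper for illustration only; not part of GMDH2.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")

    tp = fp = tn = fn = 0
    for obs, pred in zip(observed, predicted):
        if pred == positive and obs == positive:
            tp += 1  # true positive
        elif pred == positive:
            fp += 1  # false positive
        elif obs == positive:
            fn += 1  # false negative
        else:
            tn += 1  # true negative

    def ratio(num, den):
        # Avoid division by zero when a class is absent.
        return num / den if den else float("nan")

    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Toy labels (made up for the example): 1 = positive class, 0 = negative class.
observed = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 1, 1, 0]
stats = confusion_stats(observed, predicted)
print(stats)
```

The same function could be applied separately to the train, validation and test predictions downloaded from the Results tab.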

If the data contain missing values, listwise deletion is applied and a complete-case analysis is performed. The seed number is fixed at 12345 for reproducibility.

The data are divided into three sets: train (60%), validation (20%) and test (20%). The train set is used for model building, the validation set for neuron selection, and the test set to estimate the performance of the methods on unseen data.
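
The preprocessing described above (listwise deletion, a fixed seed, and a 60/20/20 split) can be sketched in Python as follows. This is only an approximation under stated assumptions: the text does not say how the tool assigns rows to sets, so simple random shuffling is assumed here, missing values are represented as None, and the helper names are hypothetical. Python's random generator also differs from R's, so seed 12345 will not reproduce the tool's actual partition.

```python
import random

SEED = 12345  # seed value stated in the text; the resulting split differs from R's


def complete_cases(rows):
    """Listwise deletion: keep only rows with no missing (None) value."""
    return [row for row in rows if all(value is not None for value in row)]


def split_60_20_20(rows, seed=SEED):
    """Shuffle rows reproducibly and split into train/validation/test sets.

    Assumes simple random assignment; the tool's exact procedure is not stated.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)

    n = len(shuffled)
    n_train = (n * 60) // 100
    n_valid = (n * 20) // 100

    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # remainder goes to the test set
    return train, valid, test


# Toy data (made up): rows of (x1, x2, y), two of which contain a missing value.
rows = [(float(i), float(i % 3), i % 2) for i in range(10)]
rows += [(None, 1.0, 0), (2.0, None, 1)]

clean = complete_cases(rows)
train, valid, test = split_60_20_20(clean)
print(len(clean), len(train), len(valid), len(test))
```

Because the seed is fixed, calling split_60_20_20 again on the same rows returns the same three sets.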

Authors

Osman Dag, Department of Biostatistics, Hacettepe University, Ankara, Turkey

Erdem Karabulut, Department of Biostatistics, Hacettepe University, Ankara, Turkey

Reha Alpar, Department of Biostatistics, Hacettepe University, Ankara, Turkey

Merve Kasikci, Department of Biostatistics, Hacettepe University, Ankara, Turkey


Please e-mail osman.dag@outlook.com to report bugs or make requests.

News

Version 1.3 (April 28, 2021)

Minor improvements and fixes.


Version 1.2 (July 23, 2019)

Minor visual improvements.


Version 1.1 (June 5, 2018)

Minor improvements and fixes.


Version 1.0 (May 23, 2018)

The web-tool version of the GMDH2 package has been released.