Methodological approaches based on machine learning in the use of big data in public health: a systematic review
Sapienza University of Rome Department of Public Health and Infectious Diseases Piazzale Aldo Moro 5, 00168, Rome Italy
Publication date: 2023-04-26
Popul. Med. 2023;5(Supplement):A624
Background and objective:
In recent years, there has been growing interest in using massive amounts of data (Big Data) in medical research because of their potential to change the approach to personal care, medical care and public health. However, part of the difficulty in introducing Big Data in public health lies in abandoning classical statistical methodology in favour of methodologies based mainly on machine learning (ML). We performed a systematic review to investigate the ML methodologies used in studies of interest for public health, with particular attention to their development and validation.

Methods:
The search was performed in the PubMed, Scopus, and Web of Science databases. Studies that investigated risk prediction and reported the use of ML methodologies in fields of interest for public health were included. The following data were extracted: study design, target population, data source, type of ML algorithm used, study objective, and methodological approaches used in developing the ML algorithms.

Results:
The search retrieved 26,340 records, of which 26 studies were included. Fifteen studies used Random Forest models, while other types of models were used more sporadically. In the 24 studies that used supervised algorithms, the risk of overfitting, assessed on the basis of the tuning, internal validation, and external validation methods used, was high in 6 studies, low in 15 studies, and minimal in 3 studies.

Conclusions:
Almost a third of the studies used inadequate approaches to the tuning, training, and validation of machine learning algorithms. Only three studies applied appropriate external validation techniques. The use of these methodologies will have to be carefully guided, both in standardizing their development and in assessing their effectiveness, to ensure that their potential brings real benefits to the entire population.
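
To make the distinction between tuning, internal validation, and external validation concrete, the following is a minimal sketch in Python with scikit-learn. It is not the procedure used by the review or by any of the included studies. The synthetic dataset, the hyperparameter grid, and the held-out split that stands in for an external cohort are all assumptions made for illustration; a real external validation would use data from a different source or setting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import (
    GridSearchCV,
    StratifiedKFold,
    cross_val_score,
    train_test_split,
)

# Synthetic data stands in for a real cohort (illustrative assumption).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=4, random_state=0
)

# Set aside data that plays the role of an external cohort. It is never
# touched during tuning or internal validation.
X_dev, X_ext, y_dev, y_ext = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Tuning: hyperparameters are chosen by cross-validation on the inner folds.
# The grid below is a hypothetical example, not a recommendation.
param_grid = {"max_depth": [3, None], "min_samples_leaf": [1, 5]}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=inner_cv,
)

# Internal validation: nested cross-validation, so each outer test fold is
# scored by a model whose tuning never saw that fold.
internal_auc = cross_val_score(
    search, X_dev, y_dev, scoring="roc_auc", cv=outer_cv
)

# Final model: tune and refit on all development data, then evaluate once
# on the held-out set standing in for external data.
search.fit(X_dev, y_dev)
external_auc = roc_auc_score(y_ext, search.predict_proba(X_ext)[:, 1])

print("Internal (nested CV) AUC per fold:", internal_auc.round(3))
print("Chosen hyperparameters:", search.best_params_)
print("Held-out AUC:", round(external_auc, 3))
```

Keeping hyperparameter selection inside the inner folds is what prevents the internal performance estimate from being optimistically biased; reporting a score from the same data that guided tuning is one common route to the overfitting risk discussed above.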
