Category: Statistics

training of machine learning algorithms

The three musketeers

Three components play a central role in the training process of a machine learning algorithm: the loss function, the performance metric, and the validation strategy. Balancing accuracy on the training data against predictive capacity on new data is key to obtaining robust and effective models.
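The interplay of the three can be sketched with a toy example (pure Python, all names hypothetical): mean squared error as the loss minimized during training, mean absolute error as the performance metric reported afterwards, and a simple held-out split as the validation strategy.

```python
import random

random.seed(0)

# Toy data following y = 2x plus a little noise.
xs = [i / 10 for i in range(100)]
ys = [2 * x + random.gauss(0, 0.1) for x in xs]

# Validation strategy: hold out the last 20% of the data.
split = int(0.8 * len(xs))
x_train, y_train = xs[:split], ys[:split]
x_val, y_val = xs[split:], ys[split:]

# Loss function: mean squared error, minimized by gradient descent
# on a one-parameter model y ≈ w * x.
w = 0.0
lr = 0.01
for _ in range(500):
    grad = sum(2 * (w * xi - yi) * xi
               for xi, yi in zip(x_train, y_train)) / len(x_train)
    w -= lr * grad

# Performance metric: mean absolute error, reported on the validation set.
mae_val = sum(abs(w * xi - yi) for xi, yi in zip(x_val, y_val)) / len(x_val)
print(f"fitted slope w ≈ {w:.2f}, validation MAE ≈ {mae_val:.3f}")
```

Note that the quantity optimized (MSE) and the quantity reported (MAE) need not be the same function; the validation split is what tells us whether the trained model holds up on data it never saw.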

Too many paths, no final destination

Contrary to what one might suppose, including a large number of variables in a linear regression model can be counterproductive to its performance, producing overfitting of the data and reducing its capacity for generalization. This is known as the curse of dimensionality.
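A minimal sketch of the effect, assuming NumPy is available: an ordinary least squares fit that includes thirty irrelevant noise variables alongside the one informative variable typically generalizes worse than the one-variable fit, because the extra coefficients soak up noise in the training sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_noise = 40, 200, 30

# One informative variable plus many irrelevant "noise" variables.
def make_data(n):
    X = rng.normal(size=(n, 1 + n_noise))
    y = 3.0 * X[:, 0] + rng.normal(size=n)   # only column 0 matters
    return X, y

X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(n_test)

def ols_test_mse(cols):
    # Least squares fit on the selected columns, evaluated on test data.
    beta, *_ = np.linalg.lstsq(X_tr[:, cols], y_tr, rcond=None)
    pred = X_te[:, cols] @ beta
    return np.mean((pred - y_te) ** 2)

mse_simple = ols_test_mse([0])                     # 1 variable
mse_full = ols_test_mse(list(range(1 + n_noise)))  # 31 variables
print("test MSE, 1 variable  :", round(mse_simple, 3))
print("test MSE, 31 variables:", round(mse_full, 3))
```

With 31 parameters estimated from only 40 observations, the larger model fits the training noise and pays for it on the test set.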

The megapixel trap

Poorly designed charts can manipulate how data are interpreted. The most common errors are described, such as missing axes, manipulated scales, and confusing pie charts, all of which can lead to erroneous conclusions. Learning to detect these errors will improve our ability to analyze and interpret data visually.
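A back-of-the-envelope illustration of one such trick, the truncated y-axis: two values that differ by only 2% can be drawn as bars whose visible heights differ by a factor of three, simply by starting the axis near the smaller value.

```python
# Two values that differ by 2%.
a, b = 100.0, 102.0

# Honest axis starting at 0: bar heights are proportional to the values.
full_ratio = b / a  # visual ratio of the two bars

# Truncated axis starting at 99: heights are measured from the baseline,
# so the 2% difference is inflated into a 3x visual difference.
baseline = 99.0
truncated_ratio = (b - baseline) / (a - baseline)

print(f"honest visual ratio:    {full_ratio:.2f}")
print(f"truncated visual ratio: {truncated_ratio:.2f}")
```

The data are unchanged; only the mapping from value to ink has been distorted.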

Apophenia

Overfitting occurs when an algorithm over-learns the details of the training data, capturing not only the essence of the underlying relationship but also the random noise that is always present. This harms its performance and its ability to generalize when we introduce new data, not seen during training.
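A minimal sketch of the phenomenon (pure Python, all names hypothetical), using nearest-neighbor classification on labels that carry 20% irreducible noise: with k=1 the model memorizes every training point, noise included, so its training error is zero while its error on fresh data is large; averaging over more neighbors smooths the noise away.

```python
import random

random.seed(1)

# Binary labels that depend on x, flipped 20% of the time (label noise).
def sample(n):
    pts = []
    for _ in range(n):
        x = random.random()
        label = x > 0.5
        if random.random() < 0.2:   # irreducible noise
            label = not label
        pts.append((x, label))
    return pts

train, test = sample(200), sample(200)

def knn_predict(x, data, k):
    # Majority vote among the k nearest training points.
    neigh = sorted(data, key=lambda p: abs(p[0] - x))[:k]
    return sum(p[1] for p in neigh) * 2 > k

def error(data_eval, data_fit, k):
    wrong = sum(knn_predict(x, data_fit, k) != y for x, y in data_eval)
    return wrong / len(data_eval)

# k=1 memorizes every noisy training point: zero training error ...
print("k=1  train:", error(train, train, 1), " test:", error(test, train, 1))
# ... while k=15 averages the noise away and generalizes better.
print("k=15 train:", error(train, train, 15), " test:", error(test, train, 15))
```

The telltale sign of overfitting is exactly this gap: near-perfect performance on the data the model has seen, markedly worse performance on data it has not.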

The wisdom of the weirdwoods

Simple decision trees tend to be less accurate than other regression or classification algorithms, and less robust to small modifications of the data with which they are built. Techniques for building decision-tree ensembles are described, such as bootstrap aggregation (bagging) and random forests, which aim to improve the accuracy of predictions and avoid overfitting of models.
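A sketch of the bagging idea under simplifying assumptions (pure Python, with a depth-1 regression tree, a "stump", as the base learner): each stump is fit on a bootstrap resample of the data, and the ensemble averages their predictions, trading the high variance of any single tree for a smoother combined estimate.

```python
import random

random.seed(2)

# Noisy data from y = x^2 on [0, 1.95].
train = [(i / 20, (i / 20) ** 2 + random.gauss(0, 0.05)) for i in range(40)]

def fit_stump(data):
    """Fit a depth-1 regression tree: one split minimizing squared error."""
    best = None
    xs = sorted(set(x for x, _ in data))
    for i in range(1, len(xs)):
        t = (xs[i - 1] + xs[i]) / 2
        left = [y for x, y in data if x <= t]
        right = [y for x, y in data if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - ml) ** 2 for y in left)
               + sum((y - mr) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def fit_bagged(data, n_trees=50):
    # Bagging: fit each stump on a bootstrap resample, average predictions.
    stumps = []
    for _ in range(n_trees):
        boot = [random.choice(data) for _ in data]
        stumps.append(fit_stump(boot))
    return lambda x: sum(s(x) for s in stumps) / len(stumps)

model = fit_bagged(train)
print("f(0.1) ≈", round(model(0.1), 3), " f(1.9) ≈", round(model(1.9), 3))
```

A random forest adds one further ingredient not shown here: at each split, only a random subset of the features is considered, which decorrelates the trees and strengthens the averaging effect.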
