Instructor's Note
Note
The code included in the workshop is split into separate scripts purely for the sake of clarity. Each script can only be executed after the previous ones have been executed.
Note
The examples and exercises included in the workshop are technically very simple. This is because they are not meant as ready-made recipes, but rather as basic material with which fundamental concepts can be presented and discussed.
Suggested itinerary
A simple machine learning model
01-linear_regression.jl can be thought of as a warm-up. A trivial synthetic dataset of pairs of scalars, \(\{ (x_i, y_i) \}_{i=1}^N\), is generated by drawing \(x_i\) from a uniform distribution and computing \(y_i = x_i + \varepsilon_i\), where \(\varepsilon_i\) is normally distributed noise. The dataset is not particularly interesting, but it provides a good chance to introduce the concepts of supervised learning and empirical risk minimization, and to reflect on the fact that, in the wild, the modeler is mostly unaware of which patterns may underlie the data.
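For instructors who want to show the data-generating process on its own, the following is a minimal plain-Julia sketch. It is not the workshop script: the seed, the sample size N, the noise scale σ, and the Uniform(0, 1) range are assumptions chosen only for illustration.

```julia
using Random

# Assumed values, chosen only for illustration
rng = MersenneTwister(1234)   # fixed seed for reproducibility
N = 100                       # number of data points
σ = 0.25                      # standard deviation of the noise

x = rand(rng, N)              # x_i drawn from Uniform(0, 1)
ε = σ .* randn(rng, N)        # ε_i drawn from Normal(0, σ)
y = x .+ ε                    # y_i = x_i + ε_i
```

Because the true relationship is known here, it is easy to contrast this situation with real data, where the pattern behind the observations is hidden from the modeler.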
This dataset is fitted with a linear regression model implemented in DeepPumas. The linear regression model can be seen as the most basic form of a multilayer perceptron, so the connection between the vocabulary and approaches of statistical modeling and those of machine learning can be highlighted. More interestingly, implementing and fitting a linear regression model introduces the DeepPumas-specific preprocess, MLPDomain, and fit.
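The link between linear regression, a single unit with identity activation, and empirical risk minimization can also be made concrete without DeepPumas. The sketch below is plain Julia and does not use preprocess, MLPDomain, or fit; the helper names predict and risk, and all numerical values, are assumptions for illustration.

```julia
using Random, Statistics

rng = MersenneTwister(1234)
N = 200
x = rand(rng, N)
y = x .+ 0.1 .* randn(rng, N)

# A single unit with identity activation computes w * x + b.
predict(w, b, x) = w .* x .+ b

# Empirical risk: mean squared error over the dataset.
risk(w, b) = mean(abs2, predict(w, b, x) .- y)

# For this model, minimizing the risk has a closed-form solution
# (ordinary least squares), obtained here with the backslash operator.
X = hcat(x, ones(N))          # design matrix with an intercept column
w_hat, b_hat = X \ y

println("w ≈ ", round(w_hat; digits = 2), ", b ≈ ", round(b_hat; digits = 2))
```

The fitted slope and intercept should land close to the values used to generate the data (1 and 0), which gives a natural opening to discuss what "learning" means in this setting.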
Capturing complex relationships
02-complex_relationships.jl starts by generating a "more complex" dataset. The new dataset, given by the relationship \(y_i = x_i ^ 2 + \varepsilon_i\), is also quite uninteresting, but taken together, both datasets provide a pretext to discuss the need to match model complexity to the complexity actually required for the task at hand (exercise 2.2).
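The mismatch between a model that is too simple and a quadratic relationship can be illustrated with a few lines of plain Julia. This is only a sketch: polynomial least squares stands in for the neural networks used in the workshop, and the input range [-1, 1], the noise scale, and the helper poly_mse are assumptions.

```julia
using Random, Statistics

rng = MersenneTwister(1234)
N = 200
x = 2 .* rand(rng, N) .- 1                 # uniform on [-1, 1], an assumed range
y = x .^ 2 .+ 0.05 .* randn(rng, N)        # y_i = x_i^2 + ε_i

# Least-squares fit of a polynomial of a given degree; returns the training MSE.
function poly_mse(x, y, degree)
    X = hcat((x .^ d for d in 0:degree)...)   # columns: 1, x, x^2, ...
    β = X \ y
    return mean(abs2, X * β .- y)
end

mse_linear = poly_mse(x, y, 1)       # straight line: too simple for this data
mse_quadratic = poly_mse(x, y, 2)    # matches the complexity of the data

println("linear MSE = ", mse_linear, ", quadratic MSE = ", mse_quadratic)
```

The straight line cannot capture the curvature no matter how well it is fitted, whereas the quadratic model reduces the error to roughly the noise level.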
The DeepPumas MLPDomain, briefly presented earlier, is used to implement a multilayer perceptron with one hidden layer. The occasion should be used to inspect the syntax required to specify the number of layers, the number of units, and the activation functions in an MLPDomain, and to remind the users that these details can be recalled through ?MLPDomain.
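To make the roles of layers, units, and activation functions tangible, a one-hidden-layer perceptron can be written out by hand. The sketch below is plain Julia and deliberately avoids the MLPDomain syntax; the helpers dense and apply, the six hidden units, and the tanh activation are all hypothetical choices for illustration.

```julia
using Random

# Hypothetical helper, not part of DeepPumas: a dense layer stored as a named tuple.
dense(rng, n_in, n_out, act) =
    (W = 0.5 .* randn(rng, n_out, n_in), b = zeros(n_out), act = act)

# Apply one layer to an input vector: activation(W * v + b), elementwise.
apply(layer, v) = layer.act.(layer.W * v .+ layer.b)

rng = MersenneTwister(1234)
hidden = dense(rng, 1, 6, tanh)        # 1 input -> 6 hidden units, tanh activation
output = dense(rng, 6, 1, identity)    # 6 hidden units -> 1 output, identity activation

# The full network is the composition of the two layers.
mlp(v) = apply(output, apply(hidden, v))

y_hat = mlp([0.3])                     # prediction for a single scalar input
n_params = sum(length(l.W) + length(l.b) for l in (hidden, output))
println("number of parameters: ", n_params)
```

Counting parameters by hand (weights plus biases per layer) is a useful exercise before discussing how quickly model complexity grows with the number of units and layers.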
Bias-variance tradeoff
03-bias-variance_tradeoff.jl discusses the training (or fitting) of machine learning models and the bias-variance tradeoff. To showcase examples of so-called underfitted and overfitted models, two main aspects are investigated: the number of epochs (or iterations) for which a machine learning model is fitted and the complexity of the model, along with the relationship between the two.
Exercise 3.1 deals with the number of training epochs. It introduces the ability to pass options to the optimizer through optim_options, in particular optim_options = (; iterations = NUM_ITERATIONS). It shows that, for a machine learning model of complexity reasonably suited to the task at hand, both too few and too many training epochs can be detrimental. Exercises 3.2 and 3.3 further bring the effect of model complexity into the mix.
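The expression (; iterations = NUM_ITERATIONS) is ordinary Julia syntax for a NamedTuple. The sketch below shows that syntax together with a hand-written gradient descent loop whose length is controlled by it. The function fit_line is a hypothetical stand-in and is not DeepPumas's fit; the learning rate and iteration counts are assumed values. It only demonstrates the "too few epochs" side, since this simple model cannot overfit.

```julia
using Random, Statistics

rng = MersenneTwister(1234)
x = rand(rng, 100)
y = x .+ 0.1 .* randn(rng, 100)

# Hypothetical stand-in for a fitting routine; it is not DeepPumas's fit.
function fit_line(x, y; optim_options = (; iterations = 100), lr = 0.5)
    w, b = 0.0, 0.0
    for _ in 1:optim_options.iterations
        r = w .* x .+ b .- y            # residuals at the current parameters
        w -= lr * 2 * mean(r .* x)      # gradient of the MSE with respect to w
        b -= lr * 2 * mean(r)           # gradient of the MSE with respect to b
    end
    return (; w, b, mse = mean(abs2, w .* x .+ b .- y))
end

opts = (; iterations = 5)               # a NamedTuple with a single field
short_run = fit_line(x, y; optim_options = opts)
long_run = fit_line(x, y; optim_options = (; iterations = 500))

println("MSE after 5 iterations:   ", short_run.mse)
println("MSE after 500 iterations: ", long_run.mse)
```

Stopping after a handful of iterations leaves the parameters far from the minimum, which is the underfitting end of the spectrum discussed in the exercise.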
This section may be surprising for some users: if a machine learning model had the right level of complexity and reached "the right solution", how could training it longer possibly do harm? Can different neural networks lead to similarly good solutions? This is a good moment to discuss the basic paradigm in which neural networks operate, including overparameterization, the universal approximation theorem, the existence of many local minima (and most likely the absence of a global one), the importance of careful design and fitting, and the ability to cast a modeling problem as an optimization problem.
Generalization
04-generalization.jl elaborates on the ability of machine learning models to make accurate predictions on unseen data. The section starts by presenting the concept of withheld (or validation, or test) data, as well as its crucial role in training machine learning models. Then, a bias-variance figure is constructed. Time should be taken to inspect and explain the figure in detail, and to draw connections to the previous section.
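The numbers behind a bias-variance figure can be reproduced in miniature with plain Julia. The sketch below is an assumption-laden illustration, not the workshop script: polynomial degree plays the role of model complexity, and the sample sizes, noise level, and helper names features and mse are hypothetical.

```julia
using Random, Statistics

rng = MersenneTwister(1234)
f(x) = x^2                                        # assumed true relationship
x_train = 2 .* rand(rng, 30) .- 1
y_train = f.(x_train) .+ 0.2 .* randn(rng, 30)
x_valid = 2 .* rand(rng, 200) .- 1                # withheld data, never used for fitting
y_valid = f.(x_valid) .+ 0.2 .* randn(rng, 200)

features(x, degree) = hcat((x .^ d for d in 0:degree)...)
mse(X, β, y) = mean(abs2, X * β .- y)

degrees = 0:8
results = map(degrees) do degree
    β = features(x_train, degree) \ y_train       # fit on the training data only
    (; degree,
       train = mse(features(x_train, degree), β, y_train),
       valid = mse(features(x_valid, degree), β, y_valid))
end

for r in results
    println("degree ", r.degree, ": train = ", round(r.train; digits = 4),
            ", validation = ", round(r.valid; digits = 4))
end
```

The training error can only go down as the degree grows, while the validation error is what reveals whether the added flexibility actually helps on unseen data.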
The concept of regularization is introduced, along with the DeepPumas syntax to add regularization to an MLPDomain. This is a good time to describe regularization approaches based on the addition of a penalization term to the loss function, but the existence of other approaches such as early stopping, data augmentation and dropout should be mentioned.
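Penalty-based regularization can be shown in closed form for a linear-in-parameters model. The sketch below is plain Julia and does not use the DeepPumas regularization syntax; it adds an L2 penalty to a deliberately flexible degree-8 polynomial. The penalty weight λ = 0.1, the data sizes, and the helper names loss and ridge are assumptions, and for simplicity the intercept is penalized as well.

```julia
using LinearAlgebra, Random, Statistics

rng = MersenneTwister(1234)
x = 2 .* rand(rng, 20) .- 1
y = x .^ 2 .+ 0.2 .* randn(rng, 20)
X = hcat((x .^ d for d in 0:8)...)     # deliberately flexible degree-8 model

# Penalized loss: mean squared error plus λ times the squared norm of the weights.
loss(β, λ) = mean(abs2, X * β .- y) + λ * sum(abs2, β)

# Closed-form minimizer of the penalized loss above (ridge regression).
ridge(λ) = (X' * X ./ length(y) + λ * I) \ (X' * y ./ length(y))

β_plain = ridge(0.0)                   # no regularization
β_reg = ridge(0.1)                     # assumed penalty weight

println("norm without penalty: ", norm(β_plain))
println("norm with penalty:    ", norm(β_reg))
```

The penalty shrinks the coefficients toward zero, trading a slightly worse fit on the training data for a smoother, less variable model.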
If time allows, introduce the larger concept of model selection, perhaps starting with the concepts of hyperparameter and hyperparameter optimization. At this point of the workshop, good examples of hyperparameters are the learning rate of a gradient descent algorithm, the weight given to the regularization term in a loss function, or the number of layers and of units in a multilayer perceptron. The (arguably blurry) difference between model parameters and hyperparameters deserves attention. Finally, the DeepPumas hyperopt tool for programmatic hyperparameter tuning is demonstrated.
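The idea behind programmatic hyperparameter tuning can be conveyed with a hand-written grid search. The sketch below is plain Julia and is not the hyperopt tool: it tunes the weight of an L2 penalty by comparing errors on withheld data. The grid values, data sizes, and the helpers make_data, features, and ridge are assumptions for illustration.

```julia
using LinearAlgebra, Random, Statistics

# Hypothetical data generator following y = x^2 + noise on [-1, 1].
function make_data(rng, n)
    x = 2 .* rand(rng, n) .- 1
    y = x .^ 2 .+ 0.2 .* randn(rng, n)
    return x, y
end

features(x) = hcat((x .^ d for d in 0:8)...)
ridge(X, y, λ) = (X' * X ./ length(y) + λ * I) \ (X' * y ./ length(y))

rng = MersenneTwister(1234)
x_train, y_train = make_data(rng, 20)
x_valid, y_valid = make_data(rng, 200)
X_train, X_valid = features(x_train), features(x_valid)

# Hyperparameter grid (assumed values). For each candidate λ, the model
# parameters are fitted on the training data and scored on the validation data.
λ_grid = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
valid_mse = [mean(abs2, X_valid * ridge(X_train, y_train, λ) .- y_valid) for λ in λ_grid]

best_mse, best_index = findmin(valid_mse)
best_λ = λ_grid[best_index]
println("selected λ = ", best_λ, " with validation MSE = ", best_mse)
```

Here the coefficients are model parameters, fitted by the optimizer, while λ is a hyperparameter, chosen by an outer loop. That distinction is a convenient starting point for the discussion of where the line between the two is drawn.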
Get in touch
If you have any suggestions or want to get in touch with our education team, please send an email to training@pumas.ai.
License
This content is licensed under Creative Commons Attribution-ShareAlike 4.0 International.