Algobook
- The developer's handbook
Thu Sep 28 2023

Loss functions in machine learning

Loss functions are an essential part of building and training machine learning models. During training, we feed the model with x values (features) and their corresponding y values (labels) that we have defined beforehand. As the data flows through the neural network in the forward pass, the output is the predicted outcome, which is then compared with the true label. The loss function we choose computes the deviation between the prediction and the truth according to a certain formula, and that error signal is passed back through the network in backpropagation, where the optimizer adjusts the weights so the model makes a better prediction in the next training loop (epoch). The idea is that the model should be as well optimized as possible once all the epochs have run.
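To make the loop concrete, here is a minimal sketch in plain Python: a single weight fitted with gradient descent on squared error. All the names and numbers are made up for illustration; a real model would of course have many weights and a framework would compute the gradients for us.

```python
# Minimal sketch: fit y = w * x by gradient descent on squared error.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # labels follow the true relationship y = 2x

w = 0.0    # start with an untrained weight
lr = 0.05  # learning rate

for epoch in range(100):
    # forward pass: predictions for each x
    preds = [w * x for x in xs]
    # loss: mean squared error between predictions and labels
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
    # backward pass: gradient of the loss with respect to w
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    # optimizer step: nudge w in the direction that reduces the loss
    w -= lr * grad

print(round(w, 3))  # → 2.0
```

Each epoch lowers the loss a little, and after enough epochs the weight has converged to the value that minimizes it.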

This means that choosing the right loss function (or building your own) is very important when it comes to building a robust model. In this article, we will take a look at the most common ones and their benefits, and hopefully you (and I) can be more confident in choosing the right loss function for our models.

MSE

Mean squared error is one of the most popular formulas used as a loss function. It is used for regression models, which means models that predict continuous values such as prices, age, income et cetera. MSE takes the mean of the squared differences between the predicted values and the true values. The lower the number, the better.

The formula of MSE is shown below:

MSE = (1/n) * Σ (yᵢ − ŷᵢ)², where yᵢ is the true value, ŷᵢ the prediction, and n the number of samples.
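To see the formula in action, here is a quick hand computation in plain Python with made-up numbers:

```python
# Compute MSE by hand for a small set of predictions.
y_true = [3.0, 5.0, 2.5]
y_pred = [2.5, 5.0, 4.0]

# mean of the squared differences between truth and prediction
n = len(y_true)
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
print(mse)  # ≈ 0.8333
```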

To use MSE in Tensorflow, we can do as follows:

model.compile(loss="mse", optimizer=keras.optimizers.legacy.Adam(learning_rate=0.01), metrics=['mse'])

Or, we can import it as a function object from Tensorflow, as follows:

model.compile(loss=tf.keras.losses.MeanSquaredError(), optimizer=keras.optimizers.legacy.Adam(learning_rate=0.01), metrics=['mse'])

MAE

Mean absolute error is another formula that is used in regression models. MAE is the average of all absolute errors; the formula is shown below.

MAE = (1/n) * Σ |yᵢ − ŷᵢ|

The downside with MAE is that it does not punish large errors in the prediction outcome as hard as MSE does, since MSE amplifies them through the square. On the other hand, if a dataset has a lot of outliers, meaning values that are far off from what we consider normal, MAE might be the better option because it is less dominated by those extreme errors.
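A small comparison makes the difference tangible. With made-up numbers where one prediction misses an outlier badly, the squared term makes MSE explode while MAE stays moderate:

```python
# Compare MSE and MAE on predictions where one error is an outlier.
y_true = [10.0, 12.0, 11.0, 50.0]  # 50.0 is an outlier
y_pred = [10.0, 12.0, 11.0, 12.0]  # the model misses the outlier badly

errors = [t - p for t, p in zip(y_true, y_pred)]
mse = sum(e ** 2 for e in errors) / len(errors)
mae = sum(abs(e) for e in errors) / len(errors)

print(mse)  # → 361.0, the square blows the single outlier up
print(mae)  # → 9.5, the linear penalty is far less dominated by it
```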

In Tensorflow, we can use MAE as below:

model.compile(loss="mae", optimizer=keras.optimizers.legacy.Adam(learning_rate=0.01), metrics=['mae'])

or

model.compile(loss=tf.keras.losses.MeanAbsoluteError(), optimizer=keras.optimizers.legacy.Adam(learning_rate=0.01), metrics=['mae'])

SparseCategoricalCrossentropy

This loss function calculates the crossentropy between the label and the prediction and should be used in classification models. If your labels are integer class indices and you have more than two classes, SparseCategoricalCrossentropy might be the best option for you. An example is predicting which category an image belongs to.

For example, the labels of our image-detecting model could be the integers 0-2, mapping to the classes below.

{ 0: 'animal', 1: 'vehicle', 2: 'building' }

In Tensorflow, it is used as below

model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(), optimizer=keras.optimizers.legacy.Adam(learning_rate=0.01))
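To build some intuition for what the loss itself computes, here is a hand-rolled sketch for a single sample in plain Python. The probabilities and class index are made up for illustration; in practice TensorFlow computes this for you across the whole batch.

```python
import math

# One sample from a 3-class classifier (e.g. animal / vehicle / building).
probs = [0.1, 0.2, 0.7]  # the model's softmax output over the classes
label = 2                # integer class index: this is the "sparse" part

# Sparse categorical crossentropy for one sample:
# the negative log-probability the model assigned to the true class.
loss = -math.log(probs[label])
print(round(loss, 4))  # → 0.3567
```

The closer the model's probability for the true class is to 1, the closer the loss is to 0; a confident wrong prediction gives a very large loss.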

You can read more about SparseCategoricalCrossentropy in the TensorFlow documentation.

Summary

Today we took a look at some of the most common loss functions used in machine learning and deep learning, which scenarios they fit, and how to use them in TensorFlow. Using the correct loss function for the purpose of the model is crucial, and knowing which one is best requires some research and testing. There are also clear differences in use cases where some loss functions should be used instead of others, such as classification models versus regression models.

I hope this article was helpful, even though we covered just three of the many loss functions that exist out there. To see all the loss functions that Keras provides, check out the Keras documentation. If you need to customize the loss function, you can create your own as well - but that is a topic for another article 😃

Thanks for reading, and have a great day.
