Fri Sep 29 2023

Predict heart failure using machine learning

As I am currently studying AI and machine learning, I will post what I learn along the way here on Algobook. So if you are a regular visitor, you may notice that a lot of these kinds of articles and guides are coming out now 😃 I also hope you want to learn together with me. Personally, I think this is an extremely fun world to get into, and I can really see the benefits of using machine learning and AI techniques in my own projects to create great (and fun) features.

In this guide we will use training data from Kaggle and train our model for predicting heart failure.

Prerequisites

In this guide, we will use TensorFlow and Python, along with pandas and scikit-learn. I expect you to have basic Python and TensorFlow knowledge, as well as some fundamental machine learning knowledge.

Load data and create training/test data

The first step in our project is to create the training and test data and split each of them into x and y values, also referred to as features and target. We will also normalize the data so that all features are on the same scale, which usually helps the network train well.

First, download the CSV file (heart_failure_clinical_records_dataset.csv) from Kaggle and add it to the project root.

Load data

Create a file called heart_failure.py and add the following code.

import pandas as pd
from sklearn.model_selection import train_test_split

def create_data():
    data = pd.read_csv("./heart_failure_clinical_records_dataset.csv")
    features = data.drop(["DEATH_EVENT"], axis=1)
    target = data["DEATH_EVENT"]

    num_features = normalize(pd.DataFrame(features))

    x_train, x_test, y_train, y_test = train_test_split(
        num_features, target, test_size=0.2, random_state=1)

    return x_train, x_test, y_train, y_test

Here, we read the CSV file using pandas and create two data sets. Our target (output) is DEATH_EVENT, a binary value: 1 for death and 0 for survival. The other columns are our input. We then normalize the data (the next section implements the function, which goes in the same file) and finally split it using the scikit-learn function train_test_split, which also shuffles the rows.
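To get a feeling for what a shuffled 80/20 split does, here is a rough sketch in plain Python. The simple_split function is a made-up helper for illustration only, not how scikit-learn implements train_test_split, and the ten numbered rows are just stand-in data.

```python
import math
import random


def simple_split(rows, test_size=0.2, seed=1):
    """Shuffle a copy of rows, then cut it into (train, test)."""
    shuffled = list(rows)
    # A fixed seed makes the shuffle repeatable, like random_state does
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for ten patient records
train, test = simple_split(rows)
print(len(train), len(test))
```

Every row ends up in exactly one of the two sets, and running it again with the same seed gives the same split.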

Normalize

We will use min-max normalization to scale every column to the range 0 to 1.

def normalize(df):
    df_norm = (df - df.min()) / (df.max() - df.min())
    return df_norm
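As a small worked example of the same formula, here it is applied to a plain Python list. The min_max helper and the three ages are made up for illustration. The guard for a column where every value is the same is my own assumption, since the formula would otherwise divide by zero.

```python
def min_max(values):
    """Scale a list of numbers to the range 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: map a constant column to 0.0 instead of dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [40, 60, 95]
scaled = min_max(ages)
print(scaled)
```

The smallest value becomes 0, the largest becomes 1, and 60 lands at 20/55, roughly 0.36.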

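If you want, you can use the built-in scaler from scikit-learn for this as well.

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
# fit_transform learns each column's min and max, then scales to the range 0-1.
# Note that it returns a NumPy array rather than a DataFrame.
num_features = scaler.fit_transform(features)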
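One thing worth knowing about scalers with separate fit and transform steps: a common practice is to fit on the training rows only and then reuse those min and max values for the test rows, so that no information from the test set leaks into training. Below is a minimal sketch of that idea in plain Python. TinyMinMaxScaler is a hypothetical stand-in that I wrote for illustration, not the scikit-learn class, and the numbers are made up.

```python
class TinyMinMaxScaler:
    """Toy scaler with the familiar fit/transform shape (illustration only)."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.mins = [min(c) for c in columns]
        self.maxs = [max(c) for c in columns]
        return self

    def transform(self, rows):
        return [
            [(v - lo) / (hi - lo) if hi != lo else 0.0
             for v, lo, hi in zip(row, self.mins, self.maxs)]
            for row in rows
        ]


train_rows = [[40, 100], [60, 300], [80, 200]]
test_rows = [[90, 150]]

scaler = TinyMinMaxScaler().fit(train_rows)  # learn min/max from training rows only
train_scaled = scaler.transform(train_rows)
test_scaled = scaler.transform(test_rows)    # reuse the training min/max
print(test_scaled)
```

Notice that a test value can end up outside 0-1 (here 90 becomes 1.25) when it is larger than anything seen in training. That is expected with this approach.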

Build our model

Now, it is time to create our neural network and train our model.

Create a new file called main.py. We will start by importing TensorFlow and loading the data.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from heart_failure import create_data

x_train, x_test, y_train, y_test = create_data()

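Then, we will build our model. We will create a sequential model with a first Dense layer of 16 neurons that takes the input features, a hidden layer of 4 neurons and an output layer with a single neuron. The first two layers use the "softmax" activation function. The output layer uses "sigmoid" so that the prediction is a probability between 0 and 1, which is what BinaryCrossentropy expects with its default settings. BinaryCrossentropy is our loss function since the target is either 0 or 1. As our optimizer, we will go with the Adam algorithm.

model = keras.Sequential([
    layers.Dense(16, input_shape=(x_train.shape[1],), activation="softmax"),
    layers.Dropout(.2),
    layers.Dense(4, activation="softmax"),
    # Sigmoid keeps the output between 0 and 1, as BinaryCrossentropy expects
    layers.Dense(1, activation="sigmoid")
])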

model.compile(loss=tf.keras.losses.BinaryCrossentropy(), optimizer=keras.optimizers.legacy.Adam(
    learning_rate=0.01), metrics=["accuracy", "mse"])
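To make the loss function a bit less abstract, here is a small sketch of binary cross-entropy for a single prediction, written in plain Python. It is only meant to show the idea. The function names and the eps clipping value are my own choices, not the Keras implementation. The loss expects a probability between 0 and 1, so a raw score is usually passed through a sigmoid first.

```python
import math


def sigmoid(z):
    """Squash any real number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y_true, p, eps=1e-7):
    """Loss for one sample; p is the predicted probability of class 1."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


print(sigmoid(0.0))
print(binary_cross_entropy(1, 0.9))  # confident and right: small loss
print(binary_cross_entropy(1, 0.1))  # confident and wrong: large loss
```

A confident correct prediction gives a loss of about 0.105, while a confident wrong one gives about 2.303, so the model is punished hard for being sure and wrong.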

One note: we are using a dropout rate of 20%. This means that 20% of the outputs from the previous layer are randomly switched off at each training step, which helps reduce overfitting.
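Here is a rough sketch of what a dropout layer does to a list of values, in plain Python. The dropout function is a made-up helper for illustration, not the Keras layer itself. Like Keras, it scales the surviving values by 1 / (1 - rate) during training so that the expected total stays the same, and it does nothing outside of training.

```python
import random


def dropout(values, rate=0.2, training=True, rng=None):
    """Randomly zero out values during training and rescale the rest."""
    if not training:
        return list(values)  # dropout is inactive at prediction time
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


activations = [1.0] * 10
dropped = dropout(activations, rate=0.2, rng=random.Random(0))
print(dropped)
print(dropout(activations, training=False))
```

Each value is either zeroed or scaled up from 1.0 to 1.25, and which ones are dropped changes at every training step.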

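Next, we will do some training. We will also create an EarlyStopping callback that stops training once the validation loss has not improved for 5 epochs, which helps prevent overfitting.

# Monitor a validation metric: the training metrics can keep improving even when the model overfits
callback = keras.callbacks.EarlyStopping(monitor="val_loss", patience=5)
model.fit(x_train, y_train, epochs=500, validation_split=0.2, verbose=1, callbacks=[callback])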
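The patience logic can be sketched in a few lines of plain Python. The stopping_epoch function and the list of loss values are made up for illustration. It assumes a metric where lower is better and counts any epoch that does not beat the best value so far as "no improvement".

```python
def stopping_epoch(history, patience=5):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, value in enumerate(history, start=1):
        if value < best:
            best = value  # new best value: reset the counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


losses = [0.30, 0.25, 0.24, 0.26, 0.27, 0.25, 0.28, 0.29, 0.22]
print(stopping_epoch(losses, patience=5))
```

The best value comes at epoch 3, the next five epochs do not beat it, so training stops at epoch 8 and never gets to see the better value at epoch 9. That is the trade-off to keep in mind when choosing patience.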

And at last, we will evaluate the model.

results = model.evaluate(x_test, y_test, batch_size=32)
print(results)  # [0.4002048373222351, 0.8666666746139526, 0.12480596452951431]

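The numbers we get here are the loss (binary cross-entropy), the accuracy and the MSE, in the same order as we passed them to compile. So our model is 86.7% accurate on the test data. Not too bad, since the data set we are training with is quite small, only about 300 rows. Keep in mind that the weights start out random and the test set is small, so expect your numbers to differ a bit from run to run. But hey, this is just a very simple example and that data is more than enough.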
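To make the output easier to read, we can pair each number with a name. This is a small sketch using the values printed above. The names list is my assumption about the order (loss first, then the metrics in the order given to compile), and the 60 test rows is an assumption too: 20% of roughly 300 rows.

```python
results = [0.4002048373222351, 0.8666666746139526, 0.12480596452951431]

# Assumed order: the loss first, then the metrics as listed in compile()
names = ["loss", "accuracy", "mse"]
report = dict(zip(names, results))

n_test = 60  # assumption: 20% of roughly 300 rows
correct = round(report["accuracy"] * n_test)

print(report)
print(f"{correct} of {n_test} test rows classified correctly")
```

Seen this way, the accuracy corresponds to 52 correct predictions out of 60, which shows how few rows the score is based on.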

Challenge for you: Try out some other loss functions, optimizers etc. and try to get a better result. Also experiment with the number of layers and neurons in the network 😃

Summary

In this article, we built a model for predicting heart failure using data from Kaggle. Kaggle has a lot of great data for machine learning students, so if you are interested, head over and find something you find interesting and create your own models.

Did you try out the challenge of getting better results than me? Contact me with your model design!

Thanks for reading,
