Model Selection: Visualizing Training Performance with Keras Neural Nets (Python)

Griffin Hundley
3 min read · Mar 8, 2021

In this article, I will show how to create a figure that gives insightful feedback on the learning of a neural network for supervised classification. After training a model, it is useful to review the learner’s progress over epochs. Such feedback can answer questions specific to your data, such as how many epochs are needed to optimize performance in future iterations, what the conditions for early stopping should be, and which model is likely to perform better on the holdout data.

In this example, I’m using the keras library for my neural net, and matplotlib for visualization. After fitting a model, the training history can be accessed through the .history attribute of the History object that fit() returns. To start, I’ll define a simple binary output neural net.

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Input
# instantiate a sequential NN
model_1 = Sequential()
# input layer, where no_of_features is the number of input features
model_1.add(Input(shape=(no_of_features,)))
# hidden layer
model_1.add(Dense(32))
model_1.add(Activation('relu'))
# binary output layer
model_1.add(Dense(1))
model_1.add(Activation('sigmoid'))
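
As a quick sanity check (assuming no_of_features has already been set), Keras can print the resulting architecture:

# print layer output shapes and parameter counts
model_1.summary()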

The training data should be split into training and validation sets. In this example, I’ll use an 80/20 split and set a seed for reproducibility. The variable X is a Pandas dataframe containing my features, and the variable y holds the target, in this case a 0 or 1.

from sklearn.model_selection import train_test_split
# X features and y target, split 80/20 into training and validation
X_train, X_val, y_train, y_val = train_test_split(X,
                                                  y,
                                                  test_size=0.2,
                                                  random_state=1)
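
If the classes are imbalanced, a common reason to track precision, train_test_split accepts an optional stratify argument that keeps the 0/1 ratio consistent in both sets. This is a variation on the split above, not part of the original example:

# stratified variant: preserves the class ratio in both splits
X_train, X_val, y_train, y_val = train_test_split(X,
                                                  y,
                                                  test_size=0.2,
                                                  random_state=1,
                                                  stratify=y)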

Compile the model with your desired optimizer and loss function; both can be imported from the respective keras packages. This is also the step where we choose the metrics used for validation and set the learning rate. Importing metrics from the keras.metrics package and giving them a custom name lets us reference them more easily in the .history dictionary later.

from keras.metrics import Precision
from keras.optimizers import Adam
adam = Adam(learning_rate=0.00001)
model_1.compile(optimizer=adam,
                loss='binary_crossentropy',
                metrics=['acc',
                         Precision(name='prec')])

Fit the model with the training data. This is where you set the number of epochs, the batch size, and any callbacks, such as early stopping (a sketch follows the fit below).

# fit the training data to the model
hist = model_1.fit(X_train,
                   y_train,
                   epochs=100,
                   batch_size=32,
                   validation_data=(X_val, y_val))
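
As a minimal sketch of the early-stopping callback mentioned above (the patience value here is an assumption, not from the original):

from keras.callbacks import EarlyStopping
# stop once validation loss has not improved for 5 straight epochs
early_stop = EarlyStopping(monitor='val_loss',
                           patience=5,
                           restore_best_weights=True)
hist = model_1.fit(X_train,
                   y_train,
                   epochs=100,
                   batch_size=32,
                   validation_data=(X_val, y_val),
                   callbacks=[early_stop])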

To visualize the results of training, we reference the hist object returned by fit. Its .history attribute is a dictionary containing, for each epoch, the values of every metric we included in the compile, plus their val_-prefixed validation counterparts. Using matplotlib, a figure can be generated from each of the metrics.

import matplotlib.pyplot as plt
h = hist.history
# X-axis vector, one point per epoch
epochs = range(1, len(h['loss']) + 1)
# Define a figure, in this case subplots with 1 row and 3 columns
fig, (ax1, ax2, ax3) = plt.subplots(1, 3,
                                    figsize=(15, 4))
# Plot 1, Loss
ax1.plot(epochs, h['loss'], 'g.', label='Training loss')
ax1.plot(epochs, h['val_loss'], 'g', label='Validation loss')
ax1.set_title('Training and validation loss')
ax1.set_xlabel('Epochs')
ax1.set_ylabel('Loss')
ax1.legend()
# Plot 2, Accuracy
ax2.plot(epochs, h['acc'], 'r.', label='Training acc')
ax2.plot(epochs, h['val_acc'], 'r', label='Validation acc')
ax2.set_title('Training and validation accuracy')
ax2.set_xlabel('Epochs')
ax2.set_ylabel('Accuracy')
ax2.legend()
# Plot 3, Precision
ax3.plot(epochs, h['prec'], 'b.', label='Training precision')
ax3.plot(epochs, h['val_prec'], 'b', label='Validation precision')
ax3.set_title('Training and validation precision')
ax3.set_xlabel('Epochs')
ax3.set_ylabel('Precision')
ax3.legend()
fig.show()
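
To keep the figure for comparison between runs, it can also be written to disk (the filename is just an example):

fig.tight_layout()                   # prevent label overlap between subplots
fig.savefig('training_history.png')  # arbitrary filename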

By tracking the progress of training, it becomes easier to adjust the architecture. When the training and validation curves diverge sharply, the implication is that the network is overfitting the training data. Regularization methods such as dropout, or changes to the size of the hidden layers, can be tuned with each iteration until the history shows the training and validation predictions converging; a dropout variant of the model above is sketched below.
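As a minimal sketch of the dropout approach (the 0.5 rate is an illustrative assumption; tune it for your data):

from keras.layers import Dropout
model_2 = Sequential()
model_2.add(Input(shape=(no_of_features,)))
model_2.add(Dense(32))
model_2.add(Activation('relu'))
# randomly zero half the hidden activations each step; the rate is a hyperparameter
model_2.add(Dropout(0.5))
model_2.add(Dense(1))
model_2.add(Activation('sigmoid'))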
