TensorFlow can infer labels from a directory structure. For example, a binary image dataset can be organized like this:
data/
  horse-or-human/
    horses/
      horse01.png
      horse02.png
    humans/
      human01.png
      human02.png
  validation-horse-or-human/
    horses/
      horse01.png
      horse02.png
    humans/
      human01.png
      human02.png
- Images under horses/ are labeled as horses.
- Images under humans/ are labeled as humans.

We will use the Horses or Humans dataset, which contains color images of horses and humans in different poses and against different backgrounds.
Create a directory called data in the same directory as the notebook or script. Then download the training and validation zip files:
https://storage.googleapis.com/learning-datasets/horse-or-human.zip
https://storage.googleapis.com/learning-datasets/validation-horse-or-human.zip

The following code assumes the two zip files already exist in ./data/.
import os
import zipfile
os.makedirs("./data", exist_ok=True)
with zipfile.ZipFile("./data/horse-or-human.zip", "r") as zip_ref:
    zip_ref.extractall("./data/horse-or-human")
with zipfile.ZipFile("./data/validation-horse-or-human.zip", "r") as zip_ref:
    zip_ref.extractall("./data/validation-horse-or-human")
If students are working in a fresh environment, the following version downloads the files first:
import os
import urllib.request
import zipfile
os.makedirs("./data", exist_ok=True)
files = {
    "./data/horse-or-human.zip": "https://storage.googleapis.com/learning-datasets/horse-or-human.zip",
    "./data/validation-horse-or-human.zip": "https://storage.googleapis.com/learning-datasets/validation-horse-or-human.zip",
}
for local_path, url in files.items():
    if not os.path.exists(local_path):
        urllib.request.urlretrieve(url, local_path)
with zipfile.ZipFile("./data/horse-or-human.zip", "r") as zip_ref:
    zip_ref.extractall("./data/horse-or-human")
with zipfile.ZipFile("./data/validation-horse-or-human.zip", "r") as zip_ref:
    zip_ref.extractall("./data/validation-horse-or-human")
Before training, we should confirm that the files are where we expect them to be.
import os
train_horse_dir = os.path.join("./data/horse-or-human", "horses")
train_human_dir = os.path.join("./data/horse-or-human", "humans")
validation_horse_dir = os.path.join("./data/validation-horse-or-human", "horses")
validation_human_dir = os.path.join("./data/validation-horse-or-human", "humans")
train_horse_names = os.listdir(train_horse_dir)
train_human_names = os.listdir(train_human_dir)
validation_horse_names = os.listdir(validation_horse_dir)
validation_human_names = os.listdir(validation_human_dir)
print("Training horse images:", train_horse_names[:10])
print("Training human images:", train_human_names[:10])
print("Validation horse images:", validation_horse_names[:10])
print("Validation human images:", validation_human_names[:10])
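Beyond listing filenames, counting the files per class catches an empty or misplaced folder early. The sketch below is a hypothetical helper, `count_images`, run against a throwaway stand-in tree so it executes anywhere; point `base` at ./data to check the real download.

```python
import os
import tempfile

def count_images(base):
    """Count files in each class subdirectory under base."""
    counts = {}
    for split in sorted(os.listdir(base)):
        split_dir = os.path.join(base, split)
        if not os.path.isdir(split_dir):
            continue  # skip stray files such as the downloaded zips
        for label in sorted(os.listdir(split_dir)):
            label_dir = os.path.join(split_dir, label)
            if os.path.isdir(label_dir):
                counts[f"{split}/{label}"] = len(os.listdir(label_dir))
    return counts

# Stand-in tree with 3 dummy files per class; replace `base` with "./data"
# to count the real dataset.
base = tempfile.mkdtemp()
for split in ["horse-or-human", "validation-horse-or-human"]:
    for label in ["horses", "humans"]:
        d = os.path.join(base, split, label)
        os.makedirs(d)
        for i in range(3):
            open(os.path.join(d, f"{label}{i:02d}.png"), "w").close()

for key, n in count_images(base).items():
    print(key, n)
```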
The model below is a CNN for binary image classification. It accepts 100x100 RGB images.
import tensorflow as tf
model = tf.keras.models.Sequential([
    # The input shape is 100x100 pixels with 3 color channels.
    tf.keras.Input(shape=(100, 100, 3)),
    # First convolution block
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Second convolution block
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Third convolution block
    tf.keras.layers.Conv2D(128, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Fourth convolution block
    tf.keras.layers.Conv2D(256, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Flatten the feature maps before using dense layers.
    tf.keras.layers.Flatten(),
    # Dense classification layers
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(256, activation="relu"),
    # One output neuron for binary classification.
    # Values closer to 0 represent one class; values closer to 1 represent the other.
    tf.keras.layers.Dense(1, activation="sigmoid")
])
model.summary()
- Conv2D layers learn image features.
- MaxPooling2D layers reduce the width and height of the feature maps.
- Flatten converts the final feature maps into a vector.
- A sigmoid output is appropriate for binary classification.
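The shrinking shapes reported by model.summary() can be reproduced by hand: a 3x3 convolution with the default "valid" padding trims 2 pixels from each spatial dimension, and a 2x2 max pool halves it, rounding down. A quick sketch of that arithmetic for the 100x100 input:

```python
def shapes_through_stack(size, blocks):
    """Trace the spatial size through conv(3x3, valid) + pool(2x2) blocks."""
    sizes = [size]
    for _ in range(blocks):
        size = size - 2      # 3x3 convolution with "valid" padding
        sizes.append(size)
        size = size // 2     # 2x2 max pooling, rounding down
        sizes.append(size)
    return sizes

sizes = shapes_through_stack(100, blocks=4)
print(sizes)  # [100, 98, 49, 47, 23, 21, 10, 8, 4]

# After the last pool the maps are 4x4 with 256 filters,
# so Flatten produces a vector of 4 * 4 * 256 = 4096 values.
print(sizes[-1] * sizes[-1] * 256)  # 4096
```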
from tensorflow.keras.optimizers import RMSprop
optimizer = RMSprop(learning_rate=0.0001)
model.compile(
    loss="binary_crossentropy",
    optimizer=optimizer,
    metrics=["acc"]
)
- binary_crossentropy is used because this is a two-class problem.
- RMSprop is one possible optimizer.
- The learning rate is set to 0.0001 to make updates smaller and more stable.

An image data generator solves a practical problem: the model needs tensors, but our data is stored as image files in folders.
ImageDataGenerator can load images from labeled directories, resize them, rescale pixel values, and apply random augmentation.
In this example, each image is resized to 100x100, and pixel values are rescaled from the range 0-255 to the range 0-1.
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Start with only rescaling.
# Later, we can uncomment augmentation options to reduce overfitting.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    # rotation_range=40,
    # width_shift_range=0.2,
    # height_shift_range=0.2,
    # shear_range=0.2,
    # zoom_range=0.2,
    # horizontal_flip=True,
    # fill_mode="nearest"
)
train_generator = train_datagen.flow_from_directory(
    "./data/horse-or-human/",
    target_size=(100, 100),
    batch_size=128,
    class_mode="binary"
)
validation_datagen = ImageDataGenerator(rescale=1.0 / 255)
validation_generator = validation_datagen.flow_from_directory(
    "./data/validation-horse-or-human",
    target_size=(100, 100),
    class_mode="binary"
)
- rescale=1.0 / 255 normalizes pixel values.
- target_size=(100, 100) resizes every image.
- batch_size=128 controls how many images are passed through the model at a time.
- class_mode="binary" tells TensorFlow that there are two classes.

Training this model may take a little while because we are now using a larger CNN and real image files.
history = model.fit(
    train_generator,
    steps_per_epoch=8,
    epochs=100,
    verbose=1,
    validation_data=validation_generator
)
- steps_per_epoch=8 means each epoch uses 8 batches from the generator.
- epochs=100 trains for 100 passes through the configured training steps.
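With a generator, an "epoch" is defined by steps_per_epoch rather than by the dataset size. A back-of-the-envelope check (the horse-or-human training set has roughly a thousand images, so these settings cover approximately one full pass per epoch):

```python
batch_size = 128
steps_per_epoch = 8

# Images the model sees per epoch under these settings.
images_per_epoch = batch_size * steps_per_epoch
print(images_per_epoch)  # 1024
```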
import matplotlib.pyplot as plt
plt.plot(history.history["acc"])
plt.plot(history.history["val_acc"])
plt.title("Model Accuracy")
plt.ylabel("Accuracy")
plt.xlabel("Epoch")
plt.legend(["train", "validation"], loc="upper left")
plt.show()
If the training accuracy keeps improving while validation accuracy stalls or drops, that is a warning sign for overfitting.
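One way to make that warning sign concrete is to compute the gap between training and validation accuracy from history.history. The sketch below uses made-up numbers standing in for a real training run:

```python
# Made-up accuracies standing in for history.history after training.
history_like = {
    "acc":     [0.60, 0.75, 0.88, 0.95, 0.99],
    "val_acc": [0.58, 0.70, 0.80, 0.81, 0.80],
}

# Per-epoch gap between training and validation accuracy.
gaps = [round(t - v, 2) for t, v in zip(history_like["acc"], history_like["val_acc"])]
print(gaps)  # [0.02, 0.05, 0.08, 0.14, 0.19] -- a widening gap

# A simple heuristic: a large final gap suggests overfitting.
print("possible overfitting:", gaps[-1] > 0.10)
```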
The following code downloads several images and asks the model to classify them.
import numpy as np
from io import BytesIO
import urllib.request
from tensorflow.keras import utils
from IPython.display import Image, display
base_url = "https://www.cs.wcupa.edu/LNGO/data/horse-human-confuse/"
file_list = ["a-horse.jpg", "horse-jockey.jpg", "horse-hindleg.jpg", "two-man.jpg"]
for image_name in file_list:
    image_url = base_url + image_name
    print(image_url)
    with urllib.request.urlopen(image_url) as response:
        img = utils.load_img(BytesIO(response.read()), target_size=(100, 100))
    display(Image(url=image_url))
    x = utils.img_to_array(img)
    x = x / 255.0
    x = np.expand_dims(x, axis=0)
    prediction = model.predict(x)
    print("Raw prediction:", prediction)
    # prediction has shape (1, 1); index both dimensions for a scalar.
    if prediction[0][0] > 0.5:
        print("Prediction: human")
    else:
        print("Prediction: horse")
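The 0.5 decision boundary used above can be applied in one line. A sketch with made-up sigmoid outputs standing in for model.predict results:

```python
# Made-up raw sigmoid outputs standing in for model.predict results.
raw_predictions = [0.02, 0.47, 0.51, 0.98]

# Apply the 0.5 decision boundary (matching the if/else in the loop above).
labels = ["human" if p > 0.5 else "horse" for p in raw_predictions]
print(labels)  # ['horse', 'horse', 'human', 'human']
```

Note that 0.47 and 0.51 get opposite labels even though the model is barely more confident about one than the other; the raw value carries more information than the thresholded label.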
- Values near 0 mean the model leans toward one class.
- Values near 1 mean the model leans toward the other class.
- 0.5 is a simple decision boundary.

A CNN does not jump directly from pixels to a final class label. Each convolution and pooling layer transforms the image into new feature maps.
Visualizing these feature maps can help us ask:
import matplotlib.pyplot as plt
import numpy as np
import random
from tensorflow.keras.preprocessing.image import img_to_array, load_img
%matplotlib inline
# Create a model that returns intermediate outputs from the trained model.
# Note: model.layers already excludes the Input layer in a Sequential model,
# so we use all of them; this keeps the outputs aligned with layer_names below.
successive_outputs = [layer.output for layer in model.layers]
visualization_model = tf.keras.models.Model(
    inputs=model.inputs,
    outputs=successive_outputs
)
# Prepare a random input image from the training set.
horse_img_files = [os.path.join(train_horse_dir, f) for f in train_horse_names]
human_img_files = [os.path.join(train_human_dir, f) for f in train_human_names]
img_path = random.choice(horse_img_files + human_img_files)
# Uncomment this line to pick the first human image manually.
# img_path = human_img_files[0]
img = load_img(img_path, target_size=(100, 100))
x = img_to_array(img)
x = x.reshape((1,) + x.shape)
x = x / 255.0
successive_feature_maps = visualization_model.predict(x)
layer_names = [layer.name for layer in model.layers]
for layer_name, feature_map in zip(layer_names, successive_feature_maps):
    if len(feature_map.shape) == 4:
        # Only visualize convolution and pooling layers.
        n_features = feature_map.shape[-1]
        n_features = min(n_features, 5)
        size = feature_map.shape[1]
        display_grid = np.zeros((size, size * n_features))
        for i in range(n_features):
            feature = feature_map[0, :, :, i]
            feature -= feature.mean()
            feature /= feature.std() + 1e-8
            feature *= 64
            feature += 128
            feature = np.clip(feature, 0, 255).astype("uint8")
            display_grid[:, i * size : (i + 1) * size] = feature
        scale = 20.0 / n_features
        plt.figure(figsize=(scale * n_features, scale))
        plt.title(layer_name)
        plt.grid(False)
        plt.imshow(display_grid, aspect="auto", cmap="viridis")
        plt.show()
The 1e-8 value avoids division by zero when normalizing a flat feature map.

Rework the visualization code so that it visualizes the intermediate layers for the confusing images used during prediction.
Questions to answer:
Overfitting happens when a model becomes too specialized to the training data.
For image models, this can happen when the training images are too consistent. For example:
A model can have high training accuracy but poor validation accuracy. That is often a sign that the model is memorizing training patterns rather than learning more general features.
Data augmentation creates modified versions of training images during training.
Common augmentation options include:
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode="nearest"
)
Uncomment the augmentation settings in the training ImageDataGenerator.
Questions to answer:
Dropout is another technique for reducing overfitting.
During training, dropout randomly disables a percentage of neuron outputs. This reduces the chance that the model becomes too dependent on a small number of highly specialized neurons.
tf.keras.layers.Dropout(0.2)
The value 0.2 means that 20% of the outputs from that layer are dropped during training.
Example:
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(256, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(64, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
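What Dropout(0.2) does during training can be sketched in a few lines of NumPy: zero out a random 20% of activations, then scale the survivors by 1/(1 - 0.2) so the expected sum is unchanged (Keras applies this "inverted dropout" scaling at training time). This is an illustration of the idea, not the Keras internals verbatim:

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 0.2
activations = np.ones(10)

# Keep each activation with probability 1 - rate;
# scale the survivors by 1 / (1 - rate) to preserve the expected sum.
mask = rng.random(activations.shape) >= rate
dropped = activations * mask / (1.0 - rate)

print(dropped)  # survivors become 1.25, dropped entries become 0.0
```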
Reference: Nitish Srivastava et al. Dropout: A Simple Way to Prevent Neural Networks from Overfitting, 2014
TensorFlow provides many optimizer and loss function options. The important habit is not to memorize every option, but to match the final layer, the loss function, and the label format.
| Problem type | Final layer | Common loss | Label format |
|---|---|---|---|
| Binary classification | Dense(1, activation="sigmoid") | binary_crossentropy | 0 or 1 |
| Multi-class classification with integer labels | Dense(num_classes, activation="softmax") | sparse_categorical_crossentropy | 0, 1, 2, … |
| Multi-class classification with one-hot labels | Dense(num_classes, activation="softmax") | categorical_crossentropy | [1,0,0], [0,1,0], … |
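The losses in the table compute the same quantity over different label encodings; the arithmetic can be worked by hand with math.log (the natural log, which Keras uses):

```python
import math

# Binary cross-entropy for true label y = 1 and predicted probability p.
p = 0.9
bce = -(1 * math.log(p) + (1 - 1) * math.log(1 - p))
print(round(bce, 4))  # 0.1054

# Categorical cross-entropy: one-hot label vs. softmax output.
one_hot = [0, 1, 0]
probs = [0.1, 0.7, 0.2]
cce = -sum(y * math.log(q) for y, q in zip(one_hot, probs))

# Sparse categorical cross-entropy: same loss, label stored as an integer index.
label = 1
scce = -math.log(probs[label])

print(round(cce, 4), round(scce, 4))  # identical values
```

This is why the label format column matters: categorical_crossentropy and sparse_categorical_crossentropy compute the same value, but each expects its own encoding.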
In this activity, students adapt the image-classification workflow to a new dataset: bean leaf disease classification.
The dataset contains 224x224 color images of bean plants from Uganda. The goal is to classify leaves into three categories: healthy, angular leaf spot, and bean rust.
Data downloads:
https://storage.googleapis.com/learning-datasets/beans/train.zip
https://storage.googleapis.com/learning-datasets/beans/validation.zip
https://storage.googleapis.com/learning-datasets/beans/test.zip
import os
import urllib.request
import zipfile
os.makedirs("./data/beans", exist_ok=True)
files = {
    "./data/beans/train.zip": "https://storage.googleapis.com/learning-datasets/beans/train.zip",
    "./data/beans/validation.zip": "https://storage.googleapis.com/learning-datasets/beans/validation.zip",
    "./data/beans/test.zip": "https://storage.googleapis.com/learning-datasets/beans/test.zip",
}
for local_path, url in files.items():
    if not os.path.exists(local_path):
        urllib.request.urlretrieve(url, local_path)
with zipfile.ZipFile("./data/beans/train.zip", "r") as zip_ref:
    zip_ref.extractall("./data/beans/train")
with zipfile.ZipFile("./data/beans/validation.zip", "r") as zip_ref:
    zip_ref.extractall("./data/beans/validation")
with zipfile.ZipFile("./data/beans/test.zip", "r") as zip_ref:
    zip_ref.extractall("./data/beans/test")
Fill in the missing values. The hint from the notebook still applies: the first preprocessing step should normalize pixel values.
from tensorflow.keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(
    # YOUR CODE HERE
)
validation_datagen = ImageDataGenerator(
    # YOUR CODE HERE
)
TRAIN_DIRECTORY_LOCATION = None  # YOUR CODE HERE
VAL_DIRECTORY_LOCATION = None  # YOUR CODE HERE
TARGET_SIZE = None  # YOUR CODE HERE
CLASS_MODE = None  # YOUR CODE HERE
train_generator = train_datagen.flow_from_directory(
    TRAIN_DIRECTORY_LOCATION,
    target_size=TARGET_SIZE,
    batch_size=128,
    class_mode=CLASS_MODE
)
validation_generator = validation_datagen.flow_from_directory(
    VAL_DIRECTORY_LOCATION,
    target_size=TARGET_SIZE,
    batch_size=128,
    class_mode=CLASS_MODE
)
Because this is a three-class classification problem, use class_mode="categorical" and categorical_crossentropy later.
from tensorflow.keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale=1.0 / 255)
validation_datagen = ImageDataGenerator(rescale=1.0 / 255)
TRAIN_DIRECTORY_LOCATION = "./data/beans/train"
VAL_DIRECTORY_LOCATION = "./data/beans/validation"
TARGET_SIZE = (224, 224)
CLASS_MODE = "categorical"
train_generator = train_datagen.flow_from_directory(
    TRAIN_DIRECTORY_LOCATION,
    target_size=TARGET_SIZE,
    batch_size=128,
    class_mode=CLASS_MODE
)
validation_generator = validation_datagen.flow_from_directory(
    VAL_DIRECTORY_LOCATION,
    target_size=TARGET_SIZE,
    batch_size=128,
    class_mode=CLASS_MODE
)
Define a CNN that can learn higher-level image features and then classify the three bean leaf categories.
import tensorflow as tf
model = tf.keras.models.Sequential([
    # YOUR CODE HERE
])
model.summary()
This is not the only correct answer. It is a reasonable starting point.
import tensorflow as tf
model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(128, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(128, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(3, activation="softmax")
])
model.summary()
- The input shape is 224x224x3 because the bean images are color images.
- The final layer uses softmax because this is multi-class classification.
LOSS_FUNCTION = None # YOUR CODE HERE
OPTIMIZER = None # YOUR CODE HERE
model.compile(
    loss=LOSS_FUNCTION,
    optimizer=OPTIMIZER,
    metrics=["accuracy"]
)
LOSS_FUNCTION = "categorical_crossentropy"
OPTIMIZER = tf.keras.optimizers.Adam(learning_rate=0.0001)
model.compile(
    loss=LOSS_FUNCTION,
    optimizer=OPTIMIZER,
    metrics=["accuracy"]
)
- categorical_crossentropy matches one-hot class labels from class_mode="categorical".
- Adam is a common optimizer for CNNs.
NUM_EPOCHS = 20
history = model.fit(
    train_generator,
    epochs=NUM_EPOCHS,
    verbose=1,
    validation_data=validation_generator
)
import matplotlib.pyplot as plt
plt.plot(history.history["accuracy"])
plt.plot(history.history["val_accuracy"])
plt.title("Model Accuracy")
plt.ylabel("Accuracy")
plt.xlabel("Epoch")
plt.legend(["train", "validation"], loc="upper left")
plt.xlim([0, NUM_EPOCHS])
plt.ylim([0.4, 1.0])
plt.show()
Questions to answer:
The original hands-on notebook includes a test data link. A responsible workflow should avoid using the test set while choosing model architecture and hyperparameters. Once the model is selected, use the test set for a final evaluation.
test_datagen = ImageDataGenerator(rescale=1.0 / 255)
TEST_DIRECTORY_LOCATION = "./data/beans/test"
# For final evaluation, do not shuffle the test set.
test_generator = test_datagen.flow_from_directory(
    TEST_DIRECTORY_LOCATION,
    target_size=(224, 224),
    batch_size=128,
    class_mode="categorical",
    shuffle=False
)
test_loss, test_accuracy = model.evaluate(test_generator)
print("Test loss:", test_loss)
print("Test accuracy:", test_accuracy)
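Test accuracy alone hides which classes are being confused with each other. Because the test generator is not shuffled, class predictions (np.argmax over model.predict) line up with test_generator.classes, so a confusion matrix can be tallied; the bookkeeping is sketched below with made-up label indices for the three bean classes:

```python
import numpy as np

# Made-up true and predicted class indices (0, 1, 2 for the three bean classes),
# standing in for test_generator.classes and argmax over model.predict output.
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 1])

num_classes = 3
confusion = np.zeros((num_classes, num_classes), dtype=int)
for t, p in zip(y_true, y_pred):
    confusion[t, p] += 1  # rows: true class, columns: predicted class

print(confusion)
# Diagonal entries are correct predictions, so accuracy is trace / total.
print("accuracy:", np.trace(confusion) / confusion.sum())  # 0.75
```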
- What changed when moving from horse-or-human to beans?
- Why did the final layer change from Dense(1, activation="sigmoid") to Dense(3, activation="softmax")?