[1]. Computers see images as matrices of numbers, which in turn can be converted into colored pixels and displayed on screen for humans to see.
In the convlutions.ipynb notebook, run the cell with the following code:
import numpy as np
import matplotlib.pyplot as plt
# 1. CREATE A 2D MATRIX (Grayscale Image)
# 10x10 matrix: 0 is black, 255 is white
matrix_2d = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 255, 255, 0, 0, 0, 0, 255, 255, 0],
    [0, 255, 255, 0, 0, 0, 0, 255, 255, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 128, 128, 0, 0, 0, 0],
    [0, 0, 0, 0, 128, 128, 0, 0, 0, 0],
    [0, 255, 0, 0, 0, 0, 0, 0, 255, 0],
    [0, 0, 255, 0, 0, 0, 0, 255, 0, 0],
    [0, 0, 0, 255, 255, 255, 255, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
], dtype=np.uint8)
# Convert 2D matrix to a grayscale image plot
plt.figure(figsize=(4, 4))
plt.imshow(matrix_2d, cmap='gray')
plt.title("2D Matrix (Grayscale)")
plt.axis('off')
plt.show()
# 2. CREATE A 3D MATRIX (RGB Color Image)
# 3x3 pixels, 3 channels deep (Red, Green, Blue)
matrix_3d = np.zeros((3, 3, 3), dtype=np.uint8)
# Set pixel colors: [Row, Column] = [Red, Green, Blue]
matrix_3d[0, 0] = [255, 0, 0] # Top-left: Pure Red
matrix_3d[1, 1] = [0, 255, 0] # Center: Pure Green
matrix_3d[2, 2] = [0, 0, 255] # Bottom-right: Pure Blue
matrix_3d[0, 2] = [255, 255, 0] # Top-right: Yellow (Red + Green)
# Convert 3D matrix to a color image plot
plt.figure(figsize=(4, 4))
plt.imshow(matrix_3d)
plt.title("3D Matrix (RGB)")
plt.axis('off')
# Save the plot before plt.show(): once the figure has been shown, savefig may write a blank image
plt.savefig('color_grid.png')
plt.show()
# 3. VICE VERSA: CONVERT AN IMAGE BACK TO A MATRIX
# Read the saved plot image back as a raw matrix
image_matrix = plt.imread('color_grid.png')
# Print the dimensions of the reconstructed matrix (the rendered figure, not the 3x3 grid itself)
print("Reconstructed Matrix Shape (Height, Width, Channels):", image_matrix.shape)
matrix_2d and matrix_3d are converted into images.
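As a small illustration of the same idea in the other direction, the sketch below takes a color matrix apart. The 2x2 array `rgb` is a made-up example, not part of the notebook, and the unweighted channel average is only one simple way to obtain a grayscale version.

```python
import numpy as np

# Hypothetical 2x2 RGB image: red, green, blue and white pixels.
rgb = np.array([
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
], dtype=np.uint8)

# The shape encodes (height, width, color channels).
print("Shape (height, width, channels):", rgb.shape)

# Each channel on its own is a 2D matrix, i.e. a grayscale image.
red_channel = rgb[:, :, 0]
print("Red channel:\n", red_channel)

# Averaging the three channels gives a simple, unweighted grayscale matrix.
gray = rgb.mean(axis=2).astype(np.uint8)
print("Grayscale:\n", gray)
```

Passing `red_channel` or `gray` to `plt.imshow(..., cmap='gray')` would display them in the same way as `matrix_2d`.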
import numpy as np
import matplotlib.pyplot as plt
from skimage import data
# 1. LOAD A WELL-KNOWN IMAGE
# skimage (scikit-image) has built-in iconic test images. We will use a small 64x64 crop.
astronaut_image = data.astronaut() # Original large image
# Slice the matrix to make it a tiny 64x64 pixel image for easy viewing
small_astronaut = astronaut_image[100:164, 200:264]
# 2. DISPLAY THE IMAGE
plt.figure(figsize=(4, 4))
plt.imshow(small_astronaut)
plt.title("64x64 Astronaut Thumbnail")
plt.axis('off') # Hide pixel coordinate axes
plt.show()
# 3. CONVERT IT BACK TO A RAW MATRIX
# skimage already returns a NumPy array; np.array() also converts other image objects (e.g. PIL images)
image_matrix = np.array(small_astronaut)
# Print out the array specifications
print("--- MATRIX SPECS ---")
print("Matrix Type:", type(image_matrix))
print("Matrix Dimensions (Height, Width, Color Channels):", image_matrix.shape)
print("Total number of values:", image_matrix.size)
# Print a tiny slice of the raw matrix data (Top-left 3x3 pixels)
# Each pixel contains 3 values: [Red, Green, Blue]
print("\n--- RAW RGB MATRIX VALUES (Top-Left 3x3 Grid) ---")
print(image_matrix[0:3, 0:3])
import tensorflow as tf
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (val_images, val_labels) = mnist.load_data()
training_images = training_images / 255.0
val_images = val_images / 255.0
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(20, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(training_images, training_labels, validation_data=(val_images, val_labels), epochs=20)
For each training pass (epoch), the training result parameters are: loss, accuracy, val_loss, and val_accuracy.
A convolution takes a small matrix (a filter, or kernel) of size 3x3 or 5x5 and passes it over the image. Depending on the contents of the kernel matrix, it can enhance specific features of the image, such as edges or contours. This makes those features more prominent and easier for the weights and biases of the subsequent neuron layers to pick up.
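To make "passing a kernel over an image" concrete, here is a minimal sketch on a tiny synthetic image. The helper `convolve_valid`, the 5x5 test image and the edge-detecting kernel are all illustrative assumptions, not taken from the notebook.

```python
import numpy as np

def convolve_valid(image, kernel):
    """Slide `kernel` over `image`; return the weighted sum at every position."""
    k = kernel.shape[0]
    out_h = image.shape[0] - k + 1
    out_w = image.shape[1] - k + 1
    output = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            patch = image[r:r + k, c:c + k]        # 3x3 neighborhood
            output[r, c] = np.sum(patch * kernel)  # element-wise product, then sum
    return output

# Hypothetical 5x5 image: two dark columns followed by three bright ones.
image = np.zeros((5, 5))
image[:, 2:] = 10

# Illustrative kernel that responds to intensity changes and ignores flat areas.
kernel = np.array([[0, -1, 0],
                   [-1, 4, -1],
                   [0, -1, 0]])

output = convolve_valid(image, kernel)
print(output)
```

Every row of the result is `[-10, 10, 0]`: the kernel reacts on both sides of the dark-to-bright boundary and returns 0 where the neighborhood is uniform.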
To demonstrate the operation of a CNN, let’s look at a stairwell image.
We will start with the following filter. This filter will emphasize horizontal edges.
\[\begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}\]
import numpy as np
from scipy import datasets
import matplotlib.pyplot as plt

i = datasets.ascent()  # built-in grayscale stairwell image
i = i.astype(np.int16)  # signed type, so negative sums do not wrap around
i_transformed = np.copy(i)
size_x = i_transformed.shape[0]
size_y = i_transformed.shape[1]

filter = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
weight = 1  # scaling factor applied to the filtered value

for x in range(1, size_x - 1):  # skip the one-pixel border
    for y in range(1, size_y - 1):
        convolution = 0.0
        convolution = convolution + (i[x - 1, y - 1] * filter[0][0])
        convolution = convolution + (i[x, y - 1] * filter[1][0])
        convolution = convolution + (i[x + 1, y - 1] * filter[2][0])
        convolution = convolution + (i[x - 1, y] * filter[0][1])
        convolution = convolution + (i[x, y] * filter[1][1])
        convolution = convolution + (i[x + 1, y] * filter[2][1])
        convolution = convolution + (i[x - 1, y + 1] * filter[0][2])
        convolution = convolution + (i[x, y + 1] * filter[1][2])
        convolution = convolution + (i[x + 1, y + 1] * filter[2][2])
        convolution = convolution * weight
        if convolution < 0:  # clamp the result to the valid pixel range 0..255
            convolution = 0
        if convolution > 255:
            convolution = 255
        i_transformed[x, y] = convolution

fig, axes = plt.subplots(1, 2, figsize=(12, 6))
axes[0].imshow(i, cmap='gray')
axes[0].set_title("Original Stairwell")
axes[1].imshow(i_transformed, cmap='gray')
axes[1].set_title("Stairwell with Convolution Applied")
plt.tight_layout()
plt.show()
We display the original and new images next to one another to observe the impact of the filter (lines 33-39).
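The nested loops spell out every multiplication, which is useful for learning but slow on large images. As a sketch only, the same arithmetic can be written with NumPy's `sliding_window_view`. The function names `filter_loop` and `filter_vectorized` and the random 8x8 test image are assumptions made for this comparison.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def filter_loop(image, kernel):
    """Loop version: same arithmetic as the nested loops above."""
    out = np.copy(image)
    for x in range(1, image.shape[0] - 1):
        for y in range(1, image.shape[1] - 1):
            total = 0
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    total += image[x + dx, y + dy] * kernel[dx + 1][dy + 1]
            out[x, y] = min(max(total, 0), 255)  # clamp to 0..255
    return out

def filter_vectorized(image, kernel):
    """Vectorized version: one weighted sum per 3x3 window."""
    out = np.copy(image)
    windows = sliding_window_view(image, (3, 3))  # shape (H-2, W-2, 3, 3)
    sums = (windows * np.asarray(kernel)).sum(axis=(2, 3))
    out[1:-1, 1:-1] = np.clip(sums, 0, 255)       # border pixels stay unchanged
    return out

horizontal_filter = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
rng = np.random.default_rng(0)
test_image = rng.integers(0, 256, size=(8, 8)).astype(np.int16)

same = np.array_equal(filter_loop(test_image, horizontal_filter),
                      filter_vectorized(test_image, horizontal_filter))
print("Loop and vectorized results match:", same)
```

Both functions leave the one-pixel border untouched, as the original cell does, so the two outputs can be compared element by element.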
You can try out another filter to highlight the vertical lines:
\[\begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}\]
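To see why one filter picks up horizontal edges and the other vertical ones, the following sketch applies both to a synthetic image containing a single vertical edge. The helper `response` and the 6x6 test image are illustrative assumptions.

```python
import numpy as np

horizontal_filter = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])
vertical_filter = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])

# Hypothetical test image: dark left half, bright right half (one vertical edge).
image = np.zeros((6, 6), dtype=np.int16)
image[:, 3:] = 100

def response(image, kernel):
    """Weighted sum of each interior 3x3 neighborhood, clamped to 0..255."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2), dtype=np.int16)
    for x in range(1, h - 1):
        for y in range(1, w - 1):
            patch = image[x - 1:x + 2, y - 1:y + 2]
            out[x - 1, y - 1] = np.clip(np.sum(patch * kernel), 0, 255)
    return out

horizontal_response = response(image, horizontal_filter)
vertical_response = response(image, vertical_filter)
print("Horizontal filter:\n", horizontal_response)
print("Vertical filter:\n", vertical_response)
```

The horizontal filter returns zeros everywhere because every row of the image is identical, while the vertical filter saturates at 255 in the two columns next to the edge.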
Run the next code in a cell and observe the outcome. It demonstrates (4, 4) max pooling: each 4x4 block of pixels is replaced by its largest value.
import numpy as np
from scipy import datasets
import matplotlib.pyplot as plt

i = datasets.ascent()
i = i.astype(np.int16)
i_transformed = np.copy(i)
size_x = i_transformed.shape[0]
size_y = i_transformed.shape[1]

# Apply the horizontal-edge filter, exactly as in the previous cell
filter = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
weight = 1
for x in range(1, size_x - 1):
    for y in range(1, size_y - 1):
        convolution = 0.0
        convolution = convolution + (i[x - 1, y - 1] * filter[0][0])
        convolution = convolution + (i[x, y - 1] * filter[1][0])
        convolution = convolution + (i[x + 1, y - 1] * filter[2][0])
        convolution = convolution + (i[x - 1, y] * filter[0][1])
        convolution = convolution + (i[x, y] * filter[1][1])
        convolution = convolution + (i[x + 1, y] * filter[2][1])
        convolution = convolution + (i[x - 1, y + 1] * filter[0][2])
        convolution = convolution + (i[x, y + 1] * filter[1][2])
        convolution = convolution + (i[x + 1, y + 1] * filter[2][2])
        convolution = convolution * weight
        if convolution < 0:
            convolution = 0
        if convolution > 255:
            convolution = 255
        i_transformed[x, y] = convolution

# (4, 4) max pooling: keep the largest value of every 4x4 block
new_x = size_x // 4
new_y = size_y // 4
newImage = np.zeros((new_x, new_y))
for x in range(0, size_x, 4):
    for y in range(0, size_y, 4):
        pixels = []
        for dy in range(4):
            for dx in range(4):
                pixels.append(i_transformed[x + dx, y + dy])
        newImage[x // 4, y // 4] = max(pixels)

fig, axes = plt.subplots(1, 3, figsize=(12, 6))
axes[0].imshow(i, cmap='gray')
axes[0].set_title("Original Stairwell")
axes[1].imshow(i_transformed, cmap='gray')
axes[1].set_title("Stairwell with Convolution Applied")
axes[2].imshow(newImage, cmap='gray')
axes[2].set_title("Downsampled Image")
plt.tight_layout()
plt.show()
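The pooling loops above can also be written as a reshape followed by a maximum over each block. This is a sketch under the assumption of non-overlapping blocks. The function `max_pool` and the 8x8 `demo` array are illustrative names, not part of the notebook.

```python
import numpy as np

def max_pool(image, pool=4):
    """Max pooling over non-overlapping pool x pool blocks."""
    h, w = image.shape
    # Drop trailing rows/columns that do not fill a complete block.
    h, w = h - h % pool, w - w % pool
    blocks = image[:h, :w].reshape(h // pool, pool, w // pool, pool)
    # Axes 1 and 3 index the position inside each block.
    return blocks.max(axis=(1, 3))

# Small demonstration matrix holding the values 0..63 in reading order.
demo = np.arange(64).reshape(8, 8)
pooled = max_pool(demo)
print(pooled)
```

For the 8x8 demonstration matrix the result is the 2x2 matrix `[[27, 31], [59, 63]]`, the largest value of each 4x4 block. Applied to `i_transformed`, the same function should reproduce `newImage`.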
import tensorflow as tf
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (val_images, val_labels) = mnist.load_data()
training_images = training_images.reshape(60000, 28, 28, 1)
training_images = training_images / 255.0
val_images = val_images.reshape(10000, 28, 28, 1)
val_images = val_images / 255.0
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(20, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()
model.fit(training_images, training_labels, validation_data=(val_images, val_labels), epochs=20)
reshape will produce a single 4D array that is 60,000x28x28x1.
Conv2D is the TensorFlow implementation of a convolutional layer for two-dimensional images; it is another layer type that can be added to the model (Line 9).
64 is the number of filters/convolutions we want to use in this layer. This number is chosen arbitrarily but should be on the order of 64.
(3,3) is the size of each convolution, in this case a 3x3 grid.
The activation function is relu in this case (returning x when x is positive, else returning 0).
(28,28,1) is the shape of the input data (for each image).
MaxPooling2D is designed to compress the image while maintaining the content of the features that were highlighted by the convolution (Line 10).
More technical details about CNNs can be found in Visualizing and Understanding Convolutional Networks.
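The layer sizes reported by `model.summary()` can be checked by hand. The sketch below assumes the Keras defaults for this model (no padding and stride 1 for `Conv2D`, non-overlapping 2x2 windows for `MaxPooling2D`). The helper names are illustrative.

```python
def conv_output(size, kernel=3):
    """Width/height after an unpadded, stride-1 convolution."""
    return size - kernel + 1

def pool_output(size, pool=2):
    """Width/height after non-overlapping pooling (partial blocks are dropped)."""
    return size // pool

filters = 64
size = 28
size = conv_output(size)   # first Conv2D:       28 -> 26
size = pool_output(size)   # first MaxPooling2D: 26 -> 13
size = conv_output(size)   # second Conv2D:      13 -> 11
size = pool_output(size)   # second MaxPooling2D: 11 -> 5
flattened = size * size * filters

# Trainable parameters: (kernel height * kernel width * input channels + 1 bias) per filter.
conv1_params = (3 * 3 * 1 + 1) * filters
conv2_params = (3 * 3 * filters + 1) * filters
dense1_params = flattened * 20 + 20
dense2_params = 20 * 10 + 10
total_params = conv1_params + conv2_params + dense1_params + dense2_params

print("Flattened features:", flattened)
print("Parameters per layer:", conv1_params, conv2_params, dense1_params, dense2_params)
print("Total parameters:", total_params)
```

Under these assumptions the flattened vector has 1,600 values and the model has 69,798 trainable parameters, most of them in the second convolution and the first dense layer.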
import matplotlib.pyplot as plt
import tensorflow as tf

fig, axarr = plt.subplots(3, 3, figsize=(10, 10))
LABEL = 9
CONVOLUTION_NUMBER = 1

# Get the first three image indices matching the label
image_nine = [i for i in range(len(val_labels)) if val_labels[i] == LABEL]
selected_images = image_nine[:3]

# Create the activation extraction model (one output per layer of the trained model)
layer_outputs = [layer.output for layer in model.layers]
activation_model = tf.keras.models.Model(inputs=model.inputs, outputs=layer_outputs)

for row_idx, img_idx in enumerate(selected_images):
    input_img = val_images[img_idx].reshape(1, 28, 28, 1)
    activations = activation_model.predict(input_img)

    # Column 0: Original Image (28x28)
    axarr[row_idx, 0].imshow(val_images[img_idx].reshape(28, 28), cmap='gray')

    # Column 1: Layer 0 (Conv2D - 26x26)
    f1 = activations[0]
    axarr[row_idx, 1].imshow(f1[0, :, :, CONVOLUTION_NUMBER], cmap='inferno')

    # Column 2: Layer 1 (MaxPooling2D - 13x13)
    f2 = activations[1]
    axarr[row_idx, 2].imshow(f2[0, :, :, CONVOLUTION_NUMBER], cmap='inferno')

    # Use the same 28x28 axis range in every column so the sizes can be compared
    for col_idx in range(3):
        axarr[row_idx, col_idx].set_xlim(0, 28)
        axarr[row_idx, col_idx].set_ylim(28, 0) # Flipped to keep image right-side up
        axarr[row_idx, col_idx].grid(False)

plt.tight_layout()
plt.show()
The first three validation images whose label matches LABEL are selected.