For the ftagn of it

My first entry going over lesson one was basically a whirlwind tour of how to do image classification. Naturally, a lot of interesting content was left out. Today, I’m just poking at the classifier to see how a few changes affect the results.

First, I mentioned in the first post that the model was trained using ResNet-34. If you didn’t have a chance to look at Deep Residual Learning for Image Recognition, you might not know that ResNet comes in several depths, any of which can be used here. I’m going to give them a shot. You have an idea what the code should look like. Note that for my purposes, I’m using the dataset with nine classes.

In [2]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

from fastai.vision import *
from fastai.metrics import error_rate

bs = 64

# Leaving off most of the infrastructure for making and populating the folders for readability.

path = Path("data/fruit")
classes = ['apples', 'bananas', 'oranges', 'grapes', 'pears', 'pineapples', 'nectarines', 'kiwis', 'plantains']

data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2,
                                 ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)
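As an aside, the other depths from the paper are all available through fastai’s models module (which, as far as I know, re-exports the torchvision ResNet constructors), so trying a different one is just a matter of passing a different constructor to cnn_learner. A quick illustration, not from the original notebook:

from fastai.vision import models

# ResNet variants exposed by fastai (assuming fastai v1, which re-exports the
# torchvision constructors); the deeper ones simply stack more residual blocks.
resnet_variants = [models.resnet18, models.resnet34, models.resnet50,
                   models.resnet101, models.resnet152]

# e.g. learn = cnn_learner(data, models.resnet101, metrics=error_rate)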

Now for the first real difference: training with ResNet-50. If it weren’t apparent from the name, it uses 50 layers instead of 34. That extra depth theoretically gives it a chance to learn more nuance, but it also introduces the risk of overfitting. Here we go!

In [3]:
learn = cnn_learner(data, models.resnet50, metrics=error_rate)
learn.fit_one_cycle(4)
Total time: 01:20

epoch train_loss valid_loss error_rate time
0 1.540670 0.397189 0.143713 00:38
1 0.851569 0.252047 0.071856 00:13
2 0.590861 0.260350 0.077844 00:14
3 0.448579 0.258741 0.083832 00:13

Differences? Here’s what jumps out at me:

  • This network took about twice as long to train, which is understandable given that it has many more layers.
  • The per-epoch training time isn’t uniform: the first epoch took roughly three times as long as the rest.
    • Running the other notebook again revealed that the equivalent ResNet-34 timings weren’t uniform either; the even split I saw before was just a fluke. I’m still curious as to why it varies, though.
  • We see a substantial improvement in training loss (~0.65 vs. ~0.41), and slight improvements in validation loss (~0.37 vs. ~0.32) and error rate (~0.12 vs. ~0.1).

Okay, not bad. On closer inspection, the lesson’s ResNet-50 example uses a resolution of 299 instead of 224 for its input images. Additionally, it halves the batch size and uses the default number of workers (I’m operating under the hypothesis that both changes are for memory reasons). How do those compare?

In [4]:
data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2,
                                 ds_tfms=get_transforms(), size=299, bs=bs//2).normalize(imagenet_stats)
learn = cnn_learner(data, models.resnet50, metrics=error_rate)
learn.fit_one_cycle(4)
Total time: 01:51

epoch train_loss valid_loss error_rate time
0 1.252562 0.303597 0.089820 00:42
1 0.704307 0.329760 0.101796 00:23
2 0.469999 0.270663 0.125749 00:23
3 0.361943 0.265046 0.119760 00:23

With the higher-resolution images, we still see improvement in the training loss, but also some fluctuation in the validation loss and error rate. Granted, my source ran this for eight cycles and saw a similar fluctuation in the initial four. Easily remedied.

In [5]:
learn.fit_one_cycle(4)
Total time: 01:35

epoch train_loss valid_loss error_rate time
0 0.120617 0.297528 0.107784 00:23
1 0.134901 0.270895 0.077844 00:23
2 0.124303 0.296516 0.101796 00:23
3 0.117052 0.297157 0.113772 00:23

I’m seeing training loss of ~0.12, validation loss of ~0.30, and an error rate of ~0.11. What does all of this mean?

Relationship of Training Loss, Validation Loss, and Error Rate

All of this is very abstract, so I decided to do some searching. Andrej Karpathy has a GitHub page that, among other things, includes tips for interpreting these values. Using it as a reference: because our training loss is much smaller than our validation loss, this model is probably overfitting.

In fact, inferring from this advice, we actually want our training loss to be higher than our validation loss. From the looks of it, this learner has been overfitting since epoch 3.
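Rather than eyeballing the printed tables, the loss histories can also be pulled off the learner directly. A minimal sketch, assuming the fastai v1 Recorder API:

# Compare the latest smoothed training loss to the latest validation loss
# (Recorder.losses is recorded per batch, Recorder.val_losses per epoch).
train_loss = float(learn.recorder.losses[-1])
valid_loss = float(learn.recorder.val_losses[-1])
print(f"train {train_loss:.3f} vs. valid {valid_loss:.3f}")

# Plot both curves over the whole run to see where the gap opens up.
learn.recorder.plot_losses()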

Why?

Simply put, loss is a measure of how wrong a prediction is; a loss of 0.0 would mean that every prediction was correct (and fully confident). Going from Karpathy’s post above, why would we want our training loss to be larger than our validation loss? In that situation, it would mean that our training was stringent enough, and our model confident enough, that it performs better than expected when being validated.
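To make that concrete, here is a tiny, hypothetical cross-entropy example (plain PyTorch, not from the lesson): a confident correct prediction yields a loss near zero, while a confident wrong one is heavily penalized.

import torch
import torch.nn.functional as F

target = torch.tensor([0])                         # the true class is class 0
confident_right = torch.tensor([[8.0, 0.0, 0.0]])  # strongly predicts class 0
confident_wrong = torch.tensor([[0.0, 8.0, 0.0]])  # strongly predicts class 1

print(F.cross_entropy(confident_right, target))    # ~0.0007 -- nearly perfect
print(F.cross_entropy(confident_wrong, target))    # ~8.0 -- badly wrong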