Tree Identification – The Data Has Been Doubled!

Because the most obvious way to improve a model is to give it more data to learn from, and we have a tool that can do this easily enough, I want to see what kind of performance boost we get from ~doubling the size of our set. Here, we expand the dataset to 75 categories with ~200 images each, for a raw set of ~15,000 images. Much better!
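As a quick sanity check on those numbers, a folder count is enough. A throwaway sketch, assuming one sub-folder per species under the data directory used later in this notebook:

from pathlib import Path

# Count the files in each per-species folder of the expanded set.
IMG_PATH = Path("data/bark_75_categories")
counts = {d.name: sum(1 for f in d.iterdir() if f.is_file())
          for d in IMG_PATH.iterdir() if d.is_dir()}
print(len(counts), "classes,", sum(counts.values()), "images")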

In [1]:
%matplotlib inline
%reload_ext autoreload
%autoreload 2
In [2]:
from fastai.vision import *
from fastai.datasets import *
from fastai.widgets import *

from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
In [3]:
bs = 16
initial_dims = 224
workers = 2
valid = 0.2
In [4]:
IMG_PATH = Path("data/bark_75_categories")
In [5]:
data = ImageDataBunch.from_folder(IMG_PATH, train=".", valid_pct=valid,
                                  ds_tfms=get_transforms(), bs=bs, size=initial_dims,
                                  num_workers=workers).normalize(imagenet_stats)
In [6]:
data.show_batch(rows=4, figsize=(10, 10))
In [7]:
learn = cnn_learner(data, models.resnet50, metrics=error_rate)
In [8]:
learn.fit_one_cycle(5)
epoch train_loss valid_loss error_rate time
0 4.529002 4.010970 0.875541 03:15
1 3.810512 3.475730 0.826479 03:14
2 3.470911 3.218662 0.792929 03:15
3 3.190804 3.085810 0.769841 03:17
4 2.996430 3.059711 0.763348 03:15
In [9]:
learn.lr_find()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.

Looking at this plot, I’m already unexcited about the prospect of seeing improvements here. The gradient is too flat; the model isn’t really able to tell the difference between a lot of these images. It’s like an eye exam where the doctor keeps asking which of two seemingly identical lenses looks better.

In [10]:
learn.recorder.plot()
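If eyeballing the curve feels unsatisfying, newer fastai v1 releases can also suggest the point of steepest descent; something along these lines (not run here):

learn.recorder.plot(suggestion=True)  # marks the point of minimum gradient on the plot
print(learn.recorder.min_grad_lr)     # the suggested learning rate, set by the call above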

Here we can see numerically that doubling the data hasn’t done much to improve things. The losses and the error rate are still pretty high.

In [11]:
learn.unfreeze()
learn.fit_one_cycle(5, max_lr=1e-4)
epoch train_loss valid_loss error_rate time
0 3.052122 3.101303 0.777417 03:14
1 3.245432 3.101461 0.765873 03:14
2 2.884305 2.891907 0.738456 03:15
3 2.567502 2.761083 0.701659 03:14
4 2.345342 2.728074 0.695887 03:15
In [12]:
learn.save("cpi-0.0_75-categories-1")
In [13]:
data_larger = ImageDataBunch.from_folder(IMG_PATH, train=".", valid_pct=valid,
                                         ds_tfms=get_transforms(), bs=bs, size=initial_dims*2,
                                         num_workers=workers).normalize(imagenet_stats)

Still, it won’t take long to do the progressive upscaling for the sake of comparison.

In [14]:
learn_larger = cnn_learner(data_larger, models.resnet50, metrics=error_rate)
In [ ]:
learn_larger.load("cpi-0.0_75-categories-1")
In [16]:
#learn_larger.fit_one_cycle(4, max_lr=1e-4)
learn_larger.fit_one_cycle(5)
epoch train_loss valid_loss error_rate time
0 2.869457 2.432699 0.651154 04:49
1 2.812652 2.453286 0.652237 04:44
2 2.622365 2.370909 0.632035 04:45
3 2.446855 2.301858 0.615079 04:49
4 2.271797 2.282915 0.616162 04:44
In [18]:
learn_larger.save("cpi-0.0_75-categories-1b")
In [19]:
learn_larger.lr_find()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
In [20]:
learn_larger.recorder.plot()

The model is doing its best, but it is still having a lot of trouble beating a ~60% error rate.

In [21]:
learn_larger.fit_one_cycle(4, max_lr=slice(1e-5, 1e-4))
epoch train_loss valid_loss error_rate time
0 2.264575 2.272218 0.612554 04:44
1 2.260219 2.268570 0.615440 04:45
2 2.219465 2.270867 0.615801 04:45
3 2.199016 2.264873 0.609307 04:49
In [22]:
interp = ClassificationInterpretation.from_learner(learn_larger)

It is important to keep in perspective that we are dealing with a dataset with 75 classes. Random chance would yield an error rate of ~98.667% (quick arithmetic below), so the model is picking up on some important patterns. Here we can see the confusion matrix in all its toddling glory.
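That baseline comes straight from the class count; a throwaway check, assuming data still holds the 75-category bunch:

n_classes = len(data.classes)      # 75 species folders
chance_error = 1 - 1 / n_classes   # error rate of a uniform random guesser
print(f"{chance_error:.3%}")       # ~98.667%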

In [23]:
interp.plot_confusion_matrix(figsize=(24, 24), dpi=60)

A Higher Perspective

Doubling the number of images did not improve results in a meaningful way. There is a chance that I am drastically underestimating the number of images required here, but ~15,000 images is enough that I don’t want to get 5X-10X more without seeing what can be done with this existing set. Let’s try zooming out a bit.

Up until now, we’ve been looking at classification at the species level, and this has a lot of issues. Many species can hybridize, and many of those that don’t are similar enough that they could be confused on casual inspection. Next step: we can bypass a lot of these issues by regrouping the data into taxonomic orders instead of species. There are a lot of explanations for a model that gets cherry tree species mixed up, but not quite as many for one that confuses cherry for pine trees. Let’s do this!
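The regrouping itself is mostly file shuffling. A minimal sketch of the idea, assuming a hand-built species-to-order mapping transcribed from the checklist (the folder names below are placeholders):

import shutil
from pathlib import Path

SRC = Path("data/bark_75_categories")
DST = Path("data/bark_10_orders")

# Hypothetical mapping from species folder name to taxonomic order.
species_to_order = {"black_cherry": "Rosales", "eastern_white_pine": "Pinales"}  # ...and so on

for species_dir in SRC.iterdir():
    order = species_to_order.get(species_dir.name)
    if order is None or not species_dir.is_dir():
        continue
    out_dir = DST / order
    out_dir.mkdir(parents=True, exist_ok=True)
    for img in species_dir.iterdir():
        # prefix with the species name so files from different folders can't collide
        shutil.copy(img, out_dir / f"{species_dir.name}_{img.name}")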

(Pardon the naming and numbering weirdness; I’m stitching a few notebooks together in editing)

Classifying by Taxonomic Order – What Does the Dataset Look Like?

The original dataset was ~15,000 images spread across 75 classes. That is a lot of images, but not so many that the regrouping can’t be done manually with a little patience. The Metroparks checklist provides just enough taxonomic information to go on, and the species I downloaded were covered by 10 orders. The resultant classes are unfortunately unbalanced, and I’ll say more about that later.

In merging classes from the original set, I noticed that there were a large number of redundant images in some of the orders. Given that we’re getting these images from a somewhat blind search on Google, this was to be expected. In all, the new dataset features ~12,000 images across 10 categories, meaning we lost something like 2,000-3,000 images worth of redundant or mislabelled data.
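Catching exact duplicates is easy enough with a content hash; a rough sketch (near-duplicates from resized or re-encoded copies would need something smarter, like a perceptual hash):

import hashlib
from pathlib import Path

seen = {}
for img in Path("data/bark_10_orders").rglob("*"):
    if not img.is_file():
        continue
    digest = hashlib.md5(img.read_bytes()).hexdigest()
    if digest in seen:
        print("duplicate:", img, "matches", seen[digest])
        # img.unlink()  # uncomment to actually delete the redundant copy
    else:
        seen[digest] = img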


In [6]:
data = ImageDataBunch.from_folder(IMG_PATH, train=".", valid_pct=valid,
                                  ds_tfms=get_transforms(), bs=bs, size=initial_dims,
                                  num_workers=workers).normalize(imagenet_stats)
In [7]:
data.show_batch(rows=4, figsize=(10, 10))
In [8]:
learn = cnn_learner(data, models.resnet50, metrics=error_rate)

Is this encouraging? It seems to be learning faster, but we are also dealing with a dataset that has far fewer categories.

In [9]:
learn.fit_one_cycle(5)
epoch train_loss valid_loss error_rate time
0 2.293406 1.954188 0.609133 02:48
1 1.983944 1.719169 0.573104 02:49
2 1.779403 1.547382 0.510683 02:47
3 1.643190 1.500176 0.497696 02:49
4 1.535627 1.479375 0.488060 02:49
In [10]:
learn.lr_find()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.

The similarly flat learning rate plot is giving me a weird feeling about this, too, but we’ll see where it goes.

In [11]:
learn.recorder.plot()
In [13]:
learn.unfreeze()
learn.fit_one_cycle(5, max_lr=slice(1e-5, 5e-5))
epoch train_loss valid_loss error_rate time
0 1.510465 1.453368 0.472979 02:49
1 1.498507 1.409679 0.457478 02:50
2 1.375820 1.350060 0.441558 02:50
3 1.291470 1.344369 0.440721 02:50
4 1.211504 1.334942 0.436531 02:49
In [14]:
learn.save("cpi-0.0-orders-1")
In [15]:
data_larger = ImageDataBunch.from_folder(IMG_PATH, train=".", valid_pct=valid,
                                         ds_tfms=get_transforms(), bs=bs, size=initial_dims*2,
                                         num_workers=workers).normalize(imagenet_stats)
In [16]:
learn_larger = cnn_learner(data_larger, models.resnet50, metrics=error_rate)
In [ ]:
learn_larger.load("cpi-0.0-orders-1")

We see the upscaled version starts off with an error rate similar to the above and doesn’t train as quickly.

In [18]:
learn_larger.unfreeze()
learn_larger.fit_one_cycle(5)
epoch train_loss valid_loss error_rate time
0 1.839270 1.841498 0.596146 04:18
1 2.060362 2.006438 0.692920 04:13
2 1.845123 1.747595 0.587348 04:15
3 1.692970 1.582104 0.516129 04:15
4 1.557250 1.513704 0.513615 04:18
In [19]:
learn_larger.lr_find()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.

But we are seeing some improvement here.

In [20]:
learn_larger.recorder.plot()
In [23]:
learn_larger.fit_one_cycle(5, max_lr=slice(2e-5, 1e-4))
epoch train_loss valid_loss error_rate time
0 1.526821 1.504949 0.497277 04:14
1 1.444442 1.469969 0.486804 04:12
2 1.433790 1.434834 0.477168 04:16
3 1.410849 1.431510 0.462505 04:14
4 1.379590 1.422813 0.459573 04:20

Uh… Let’s do that again.

In [27]:
learn_larger.fit_one_cycle(5, max_lr=slice(2e-5, 1e-4))
epoch train_loss valid_loss error_rate time
0 1.379468 1.441278 0.467114 04:13
1 1.403144 1.421605 0.467114 04:16
2 1.363362 1.403969 0.458735 04:19
3 1.260465 1.386433 0.454964 04:16
4 1.268007 1.363443 0.448680 04:22

Alright, I think we’re coming up on a plateau. Let’s do the honors.

In [28]:
interp = ClassificationInterpretation.from_learner(learn_larger)

I think this is a good illustration of where the baseline confusion matrix falls down. Our classes are imbalanced enough that the visual of a solid diagonal of roughly uniform intensity doesn’t tell the whole story. There are just two big classes, and their visual impression dominates the chart (see the sketch after the plot below).

In [29]:
interp.plot_confusion_matrix(figsize=(24, 24), dpi=60)
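For an imbalanced set like this, a row-normalized matrix or the most_confused shortlist is more honest; both are one-liners in fastai v1 (output not reproduced here):

interp.plot_confusion_matrix(figsize=(24, 24), dpi=60, normalize=True)  # each row sums to 1
interp.most_confused(min_val=10)  # (actual, predicted, count) triples above a threshold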

Results

Alright: on a dataset with 10 plant classes and ~12,000 images, a ResNet50 set up like this gives us ~55% accuracy. Clearly, there is some room for improvement.

But! I did some nosing around and found “How we beat the FastAI leaderboard score by +19.77%…a synergy of new deep learning techniques” for your consideration. I was especially interested in its discussion of the ImageWoof dataset, which concerns the classification of dog breeds. It also has about 10 classes and ~12,000 images, and good performance on it is also on the order of 55% accuracy (or at least, it was before this article came out).

Additionally, dog breeds are kind of… “meant to be distinguishable” is not the right term, but certainly a lot more work went into making them distinct than went into serviceberry trees!

Next Steps

If nothing else, trying to classify by order instead of species has given us a lot of information about the difficulty of the problem at hand. Immediate next steps will be to examine methods of dealing with class imbalance, but I want to do more thinking about what can be done at the species level. The error rate was much higher at that level of classification, but so was the specificity, and I think that’s worth exploring. Early Days!
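As a first crack at the class imbalance, a class-weighted loss is cheap to try. A sketch under the assumption that fastai v1's CrossEntropyFlat passes a weight tensor through to the underlying PyTorch loss:

import torch
from fastai.layers import CrossEntropyFlat

# Rough per-class counts taken straight from the folders (train + valid together).
counts = torch.tensor(
    [sum(1 for f in (IMG_PATH/c).iterdir() if f.is_file()) for c in data.classes],
    dtype=torch.float)
weights = counts.sum() / (len(counts) * counts)             # inverse-frequency weighting
learn.loss_func = CrossEntropyFlat(weight=weights.to(data.device))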