DendroNN, or the Resurrection of the Tree Classification Project

TreeID Dataset

Collecting and processing the tree data was a big learning experience. That’s the encouraging, optimistic way of saying that I made a ton of mistakes that I will never, ever repeat because I cringe at how obvious they feel in retrospect. This is still under active development and has attracted some outside interest, so you can follow along at the GitHub repo. So, what are we looking at here?

How was the data collected?

The beginning of this project was a collection of nearly 5,000 photos taken around my own neighborhood in Cleveland at the end of 2020. Not being an arborist, or even someone who leaves the house much, I hadn’t anticipated how difficult it would be to learn how to identify trees from just bark. Because it was only of personal interest at the time, the project was put on ice.

At the end of 2022, I learned of a couple of helpful projects. The first, and the primary source of my data, is a tree inventory that the City of Pittsburgh paid for a few years ago. The species of each tree, its GPS-tagged location, and other relevant data can be accessed as part of the city’s Burghs Eye View program. It’s worth noting that Burghs Eye View isn’t just about trees, but is an admirable civic data resource in general. The second, and one which I unfortunately didn’t have time to use, is Falling Fruit, which has a bit of a different aim.

Using the Burghs Eye View map, I took a trip to Pittsburgh and systematically collected several thousand photos of tree bark using a normal smartphone camera. The procedure was simple:

  1. Start taking regularly spaced photos from the roots, less than a meter away from the tree bark, and track upward.
  2. When the phone is at arm’s reach, angle it upward to get one more shot of the canopy.
  3. Step to another angle of the tree and repeat, usually capturing about six angles of an adult tree.

Mistake 1 – Images too Large

Problem

The photos taken in Pittsburgh had a resolution of 3000×4000. An extremely common preprocessing technique in deep learning is scaling images down to, e.g., 224×224 or 384×384. Jeremy Howard in the Fast.ai course even plays with this, and developed a technique called progressive resizing: images are scaled to 224×224, a model is trained on them, and then the same model is trained further on images that were scaled to a resolution more like 384×384.

I spent a lot of time trying to make this work, to the point where I started using cloud compute services to handle much larger images, to no avail. Ready to give up, I scoured my notebooks and noticed that some of the trees that the model was confusing more than the others looked very similar when scaled down to that size. It occurred to me that a lot of the important fine details of the bark were probably getting lost in that kind of compression. Okay, but so what?

Solution

Cutting the images up. I suspect this solution is somewhat specific to problems like this one. It doesn’t seem like it would be useful for classification purposes if you were to cut photos of fruit or houses into many much smaller patches. But tree bark is basically just a repeating texture. Even before realizing the next mistake, I noticed improvements by cutting the photos into 500×500 patches, then finally a more drastic improvement by going down to 250×250.

In some ways this gave me a lot more data. If you follow the math, a 3000×4000 image becomes at most 192 usable 250×250 patches. I at first thought it was a little suspicious, but I looked around and doing it this way isn’t without precedent. There is a Kaggle competition for histopathological cancer detection where this technique comes up, for instance.
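The patch arithmetic is easy to check with a short sketch. The helper below is hypothetical (`patch_boxes` is not from the project repo) and assumes non-overlapping patches, with any partial patches at the edges thrown away. The boxes it returns are in the (left, upper, right, lower) form that PIL’s `Image.crop` accepts.

```python
def patch_boxes(width, height, size=250):
    """Return (left, upper, right, lower) boxes for non-overlapping square patches.

    Hypothetical helper for illustration; partial patches at the right and
    bottom edges are dropped.
    """
    boxes = []
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            boxes.append((left, top, left + size, top + size))
    return boxes


boxes = patch_boxes(3000, 4000, size=250)
print(len(boxes))  # 12 columns x 16 rows = 192 patches
```

The count is an upper bound because it says nothing about whether a given patch actually contains usable bark.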

Mistake 2 – Too Much Extraneous Data

Problem

A non-trivial amount of this data got thrown out. At the time, I was working under the impression that the AI would need to take in as much data as it could. What I hadn’t considered was that some kinds of tree data, even some kinds of bark data, would be substantially more useful than others in classifying the trees. To borrow a data science idea, I hadn’t done a principal component analysis, and I wasted a lot of time as a result. Many early training sessions were spent trying to get the model to classify trees based on images that included:

  • Roots and irregular branches
  • Soil and stray leaves
  • Tree canopy that wasn’t specific
  • Excessively blurry images
  • Tree bark that was covered in lichen or moss, damaged, or diseased
  • Bark that was potentially useful, but shot at an angle or a distance that just got in the way

The end result was that it might require an inhumanly large dataset to achieve learning in any meaningful way. Early models could somewhat distinguish photos of oaks and pines in this way, but the results were too poor to be worth reporting.

Solution

Well, I ended up with two solutions.

The first was training a binary classifier to detect usable bark. The first time this came up, I was still working with the 500×500 patches, which worked out to just shy of 200,000 images. Not exactly a manual number, but I’ve never seen an ocean that I didn’t think I could boil. I spent an afternoon sorting 30,000 images, realized I had only sorted 30,000 images, and then realized I had accidentally made a halfway decent training set for a binary classifier.

That classifier sorted the remainder of those images in under an hour.

The second solution was, unfortunately, quite a bit less dramatic. It involved simply going through the original photos, picking out the ones that basically weren’t platonic bark taken at about torso level, and just repeating the process of dividing the rest into patches and feeding those into the usable bark sorter.

So, what does it actually look like?

Alright, we’re getting into a little of the code. First, we need to import some standard libraries. I’ll do some explaining for the uninitiated.

pandas handles data manipulation, here mostly CSV files. If you’ve done any work with neural networks before, you might have seen images being loaded into folders, one per class. I personally prefer CSV files because they make it easier to include other relevant things besides just the specific class of the image. I’ll show you an example shortly.

matplotlib is a library for drawing and annotating charts.

numpy is a library for more easily handling matrix operations.

PIL is a library for processing images. It’s used here mostly for loading purposes, and we can see that because we are loading the Image class from it.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

import os

Something that was really helpful was the use of confidence values. For every patch that the usable bark sorter classified, it also returned its confidence in the classification. Looking at the results, it had a strong inclination to reject perfectly usable bark, but it also wasn’t especially confident about those decisions. We can use that to our advantage by only taking bark that it was very confident about.

In [2]:
def confidence_threshold(df, thresh=0.9, below=True):
    """Return the rows whose confidence is strictly below (or above) thresh."""
    if below:
        return df[df['confidence'] < thresh]
    return df[df['confidence'] > thresh]

Loading the Data

What these files will ultimately look like is still in flux. Today, I’ll be walking you through some older ones that still have basically what we need right now. First, we load the lists of patches that were accepted or rejected by the usable bark sorter. We also do a little cleanup of the DataFrame.

In [49]:
reject_file = "../torso_reject.csv"
accept_file = "../torso_accept.csv"

reject_df = pd.read_csv(reject_file, index_col=0)
accept_df = pd.read_csv(accept_file, index_col=0)

accept_df
Out[49]:
        path                                               confidence
0       dataset0/pittsburgh_torso-250x250/0001/2023010…    0.776984
1       dataset0/pittsburgh_torso-250x250/0001/2023010…    0.999994
2       dataset0/pittsburgh_torso-250x250/0001/2023010…    0.890233
3       dataset0/pittsburgh_torso-250x250/0001/2023010…    0.503523
4       dataset0/pittsburgh_torso-250x250/0001/2023010…    0.999959
...     ...                                                ...
108057  dataset0/pittsburgh_torso-250x250/0079/2023010…    0.998933
108058  dataset0/pittsburgh_torso-250x250/0079/2023010…    0.999988
108059  dataset0/pittsburgh_torso-250x250/0079/2023010…    0.998984
108060  dataset0/pittsburgh_torso-250x250/0079/2023010…    1.000000
108061  dataset0/pittsburgh_torso-250x250/0079/2023010…    0.999999

108062 rows × 2 columns

I goofed up when generating the files by accidentally leaving a redundant “.jpg” in a script. The files have since been renamed, but because we’re using older CSVs, we need to make a quick fix here. Below is a quick helper function that pandas can map over each path in the DataFrame.

In [50]:
def fix_path(path):
    path_parts = path.split('/')

    # Remove the redundant '.jpg' from the second '_'-separated
    # chunk of the file name
    fn = path_parts[-1].split('_')
    fn[1] = fn[1].split('.')[0]
    path_parts[-1] = '_'.join(fn)

    # Point 'dataset0' at the current dataset folder
    path_parts[0] = "../dataset"
    return '/'.join(path_parts)
In [51]:
accept_df['path'] = accept_df['path'].map(fix_path)
reject_df['path'] = reject_df['path'].map(fix_path)

And just to make sure it actually worked okay:

In [52]:
accept_df['path'].iloc[0]
Out[52]:
'../dataset/pittsburgh_torso-250x250/0001/20230106_113732(0)_7_1.jpg'

Alright, so what’s this?

In [53]:
accept_df
Out[53]:
        path                                               confidence
0       ../dataset/pittsburgh_torso-250x250/0001/20230…    0.776984
1       ../dataset/pittsburgh_torso-250x250/0001/20230…    0.999994
2       ../dataset/pittsburgh_torso-250x250/0001/20230…    0.890233
3       ../dataset/pittsburgh_torso-250x250/0001/20230…    0.503523
4       ../dataset/pittsburgh_torso-250x250/0001/20230…    0.999959
...     ...                                                ...
108057  ../dataset/pittsburgh_torso-250x250/0079/20230…    0.998933
108058  ../dataset/pittsburgh_torso-250x250/0079/20230…    0.999988
108059  ../dataset/pittsburgh_torso-250x250/0079/20230…    0.998984
108060  ../dataset/pittsburgh_torso-250x250/0079/20230…    1.000000
108061  ../dataset/pittsburgh_torso-250x250/0079/20230…    0.999999

108062 rows × 2 columns

In [54]:
reject_df
Out[54]:
       path                                               confidence
0      ../dataset/pittsburgh_torso-250x250/0001/20230…    1.000000
1      ../dataset/pittsburgh_torso-250x250/0001/20230…    0.999809
2      ../dataset/pittsburgh_torso-250x250/0001/20230…    0.999998
3      ../dataset/pittsburgh_torso-250x250/0001/20230…    0.997332
4      ../dataset/pittsburgh_torso-250x250/0001/20230…    0.999979
...    ...                                                ...
93149  ../dataset/pittsburgh_torso-250x250/0079/20230…    0.651545
93150  ../dataset/pittsburgh_torso-250x250/0079/20230…    0.999999
93151  ../dataset/pittsburgh_torso-250x250/0079/20230…    0.982236
93152  ../dataset/pittsburgh_torso-250x250/0079/20230…    0.991354
93153  ../dataset/pittsburgh_torso-250x250/0079/20230…    0.897549

93154 rows × 2 columns

Around 108K patches were accepted, and around 93K were rejected. Looking at the confidence column, we see that the model is somewhat uncertain about a few of these. Let’s take a look at that.

First, we need a couple more helper functions. This one opens an image, converts it to RGB, resizes it to something presentable, and converts it to a numpy array.

In [22]:
def image_reshape(path):
    image = Image.open(path).convert("RGB")
    image = image.resize((224, 224))
    image = np.asarray(image)
    return image

Next, it will be helpful to be able to see a few of these patches at once. This next function randomly picks 16 patches and loads them, so that they can be arranged in a 4×4 grid with matplotlib further down.

In [23]:
def get_sample(path_list, n=16):
    print("Generating new sample")
    # Never ask for more patches than the list contains
    n = min(n, len(path_list))
    new_sample = np.random.choice(path_list, n, replace=False)

    samples = []
    paths = []
    for image in new_sample:
        samples.append(image_reshape(image))
        paths.append(image)
    return samples, paths

get_sample() takes a list of paths, so let’s extract those from the DataFrame.

In [24]:
reject_paths = reject_df['path'].values.tolist()
accept_paths = accept_df['path'].values.tolist()
In [26]:
samples, paths = get_sample(reject_paths)
plt.imshow(samples[1])
Generating new sample
Out[26]:
<matplotlib.image.AxesImage at 0x7fb632dda470>

Okay, let’s see a whole grid of rejects!

In [27]:
def show_grid(sample):
    rows = 4
    cols = 4
    img_count = 0
    fig, axes = plt.subplots(nrows=rows, ncols=cols, figsize=(15,15))
    
    for i in range(rows):
        for j in range(cols):
            if img_count < len(sample):
                axes[i, j].imshow(sample[img_count])
                img_count += 1
In [28]:
def new_random_grid(path_list):
    sample, paths = get_sample(path_list)
    show_grid(sample)
    return sample, paths
In [30]:
sample, paths = new_random_grid(reject_paths)
Generating new sample

These are samples of the whole of the rejects list. I encourage you to run this cell a bunch of times if you’re following along at home. You should see a goodly number of things that aren’t remotely bark: bits of streets, signs, grass, dirt, and other stuff like that. You’ll also see a number of patches that are bark, but aren’t very good for training. Some bark is blurry, out of focus, or a part of the tree that I later learned wasn’t actually useful.

A fun thing to try is to see the stuff that the model rejected, but wasn’t very confident about.

In [35]:
reject_df_below_85 = confidence_threshold(reject_df, thresh=0.85, below=True)
reject_df_below_85_paths = reject_df_below_85['path'].values.tolist()
sample, paths = new_random_grid(reject_df_below_85_paths)
Generating new sample

Running this just once or twice usually reveals images that are a bit more… on the edge of acceptability. It’s not always clear why something got rejected. I haven’t ended up doing this yet, but something in the works is using these low-confidence rejects in either training or, more likely, testing of the model. Mercifully, there is enough data that was unambiguously accepted that doing so hasn’t been a pressing need.

Speaking of, we should take a look at what got accepted, too.

In [39]:
sample, paths = new_random_grid(accept_paths)
Generating new sample

As above, these are drawn from all of the accepted bark patches, not just those that have a high confidence. Let’s see what happens when we look at the low-confidence accepted patches.

In [45]:
accept_df_below_65 = confidence_threshold(accept_df, thresh=0.65, below=True)
accept_df_below_65_paths = accept_df_below_65['path'].values.tolist()
sample, paths = new_random_grid(accept_df_below_65_paths)
Generating new sample

Here we see that bias that I was talking about before. Note that when we were looking at the rejected bark, we were looking at patches that were divided by a threshold of 0.85 and were already seeing a lot of patches that could easily be accepted. Here, we are looking at a confidence threshold of 0.65 and are still not seeing many that would definitely be rejected.

The cause of the bias is unknown. I made it a special point of training the usable bark sorter on a roughly even split of acceptable and unacceptable bark. Because this was just a secondary tool for the real project, I haven’t had time to deeply investigate why this might be. I suspect there is some deep information theoretical reason for why this happened, perhaps one that will be painfully obvious to any high schooler once the field is more mature. The important thing now is it’s a quirk of the model that I caught early enough to use.

And what do these patches represent?

Having seen the images we are working with, it might be a good idea to look at what species we’re actually dealing with.

In [55]:
specimen_df = pd.read_csv("../specimen_list.csv", index_col=0)
specimen_df
Out[55]:
     common_name       latin_name          family
id
1    norway_maple      Acer_platanoides    maple
2    norway_maple      Acer_platanoides    maple
3    norway_maple      Acer_platanoides    maple
4    red_maple         Acer_rubrum         maple
5    freeman_maple     Acer_x_freemanii    maple
...  ...               ...                 ...
75   northern_red_oak  Quercus_rubra       beech
76   northern_red_oak  Quercus_rubra       beech
77   white_oak         Quercus_alba        beech
78   bur_oak           Quercus_macrocarpa  beech
79   swamp_white_oak   Quercus_bicolor     beech

79 rows × 3 columns

Okay, I took photos of 79 different trees. It was actually 81, but the GPS signal on the last two was too spotty to match them on the map, and they had to be excluded. How can we break this down?

In [76]:
family_names = specimen_df.family.value_counts()
family_names = family_names.to_dict()
f_names = list(family_names.keys())
f_values = list(family_names.values())

header_font = {'family': 'serif', 'color': 'black', 'size': 20}
axis_font = {'family': 'serif', 'color': 'black', 'size': 15}
plt.rcParams['figure.figsize'] = [10, 5]

plt.bar(range(len(family_names)), f_values, tick_label=f_names)
plt.title("Breakdown of Specimens Collected in Pittsburgh, by Family",
         fontdict=header_font)
plt.xlabel("Family", fontdict=axis_font)
plt.ylabel("Number of Specimens", fontdict=axis_font)
plt.show()

When I started training, it made sense to focus on the family level first. A family will inherently have at least as many images to work with as a species, and usually many more, and I assumed that variation would be smaller within a family. Interestingly enough, at least within this dataset, the difference in the quality of the model at the family and species levels has so far been negligible.

In [80]:
common_names = specimen_df.common_name.value_counts()
common_names = common_names.to_dict()
c_names = list(common_names.keys())
c_values = list(common_names.values())
In [83]:
plt.bar(range(len(common_names)), c_values, tick_label=c_names)
plt.title("Breakdown of Specimens Collected in Pittsburgh, by Species",
         fontdict=header_font)
plt.xlabel("Species (common names)", fontdict=axis_font)
plt.ylabel("Number of Specimens", fontdict=axis_font)
plt.xticks(rotation=45, ha='right')
plt.show()

The Model and Dataset Code

Full code can be found on the GitHub repo, but here are some important parts of the training code. First, because we are using a custom dataset, we need to make a class that will tell the dataloaders what to do. Some of this might need a little bit of explaining.

We have to import some things to make this part of the notebook work. BarkDataset inherits from the Dataset class. To initialize it, we only have to bring a given DataFrame, e.g. accept_df into the class. I’ve shown before that CSVs will let us work with a lot of other data that goes into the support and interpretation of the dataset, but BarkDataset itself only needs two things: the column of all the paths of the images themselves, and the column that defines their labels.

You might be a little confused about the line self.labels = df["factor"].values. The full code converts either the species-level or family-level specimens into a numerical class. For example:

"eastern_white_pine": 0

The label is the 0. When making predictions, we will convert back from this label for clarity to the user, but that isn’t how the model sees it.
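As a minimal sketch of that conversion, the snippet below builds the name-to-number mapping and its inverse from a toy list of common names. The helper name `build_label_maps` and the alphabetical ordering are assumptions for illustration, so the numbers will not necessarily match the ones the full code assigns.

```python
def build_label_maps(names):
    """Map each distinct class name to an integer label, and back again."""
    classes = sorted(set(names))  # alphabetical order is an assumption
    name_to_label = {name: i for i, name in enumerate(classes)}
    label_to_name = {i: name for name, i in name_to_label.items()}
    return name_to_label, label_to_name


# Toy list of common names; the real list comes from the specimen CSV.
names = ["white_oak", "ginkgo", "white_oak", "bur_oak"]
name_to_label, label_to_name = build_label_maps(names)

factors = [name_to_label[name] for name in names]  # what the model sees
decoded = [label_to_name[f] for f in factors]      # what the user sees
print(factors)
print(decoded)
```

Keeping the inverse mapping around is what makes it possible to turn a predicted class number back into a readable name at prediction time.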

After loading the DataFrame, we also define a set of transforms in self.train_transforms. At minimum, this is where we scale images down to 224×224 and normalize them. If you’re wondering, the normalization values are the per-channel means and standard deviations of ImageNet, which are standard in the field.

In addition to these standard changes, transforms also has a wide variety of transforms that facilitate data augmentation. We can use data augmentation to give us more information from a base dataset; we just need to keep in mind to introduce the kinds of variability that would actually occur in the collection of more data.

You’ll notice two other methods in this class: __len__ and __getitem__. The former just returns the number of items in the DataFrame. The latter is where a single image is actually loaded using Image from the PIL library, and the label is matched from that image’s location in the DataFrame. The transforms are then applied to the image, and we get both the loaded image and its label returned in a dictionary.

In [86]:
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from tqdm import tqdm

class BarkDataset(Dataset):
    def __init__(self, df):
        self.df = df
        self.fns = df["path"].values
        self.labels = df["factor"].values

        self.train_transforms = transforms.Compose([
            transforms.ToTensor(),
            transforms.Resize((224, 224)),
            transforms.RandomHorizontalFlip(),
            transforms.RandomRotation(60),
            transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                std=[0.229, 0.224, 0.225]),
            ])

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        # Force three channels so Normalize always sees an RGB image
        image = Image.open(self.fns[idx]).convert("RGB")
        label = self.labels[idx]

        image = self.train_transforms(image)

        return {
            "image": image,
            "label": torch.tensor(label, dtype=torch.long)
        }

Next, we’ll look at the model. The model can be expanded in a lot of ways with a class of its own, but at this stage of the project’s development, we are just starting with the pretrained weights and unchanged architecture as provided by timm.

In [ ]:
import timm
import torch.nn as nn
import torch.optim as optim

# N_CLASSES, device, training(), testing(), and test_df come from the full code
model = timm.create_model("deit3_base_patch16_224", pretrained=True, num_classes=N_CLASSES)
model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

scheduler = optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=100, T_mult=1,
                                                        eta_min=0.00001, last_epoch=-1)
model, history = training(model, optimizer, criterion, scheduler, device=device, num_epochs=100)
score = testing(test_df, model)

Now, I’ve gone through a lot of iterations of the various components here. Just off the top of my head:

  • I first started using an EfficientNet architecture, but got curious to see how one based on vision transformers would compare, and it wasn’t really a contest.
  • I originally used an Adam optimizer. Training with vanilla SGD proved slower, but also gave more consistent results.
  • I’ve settled on cosine annealing as a learning rate scheduler, but have also had intermittent success with CyclicLR and MultiCyclicLR.
  • Most recently, I’ve been wondering if cross-entropy is the right loss function. I quickly replaced accuracy with ROC AUC as the evaluation metric because there are multiple, imbalanced classes in the full dataset. I suspect this also has implications for training, but have so far not found a loss function that works better.

Changing the parameters here is still an active area of my research.
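For anyone curious what the scheduler in the cell above does to the learning rate, here is a small standalone sketch of the cosine annealing formula, using the same values (lr=0.001, eta_min=0.00001, T_0=100). The function name is made up for illustration, it only covers the T_mult=1 case, and it assumes the scheduler is stepped once per epoch. It is meant to mirror the formula, not to stand in for PyTorch’s implementation.

```python
import math


def cosine_warm_restart_lr(epoch, base_lr=0.001, eta_min=0.00001, t_0=100):
    """Learning rate at a given epoch for cosine annealing with warm restarts.

    Sketch for the T_mult=1 case only: every cycle lasts t_0 epochs,
    then the rate jumps back up to base_lr.
    """
    t_cur = epoch % t_0  # epochs since the last restart
    cosine = 1 + math.cos(math.pi * t_cur / t_0)
    return eta_min + 0.5 * (base_lr - eta_min) * cosine


for epoch in (0, 25, 50, 75, 99, 100):
    print(epoch, f"{cosine_warm_restart_lr(epoch):.6f}")
```

With T_0=100 and 100 epochs of training, a run under these assumptions covers exactly one decay from the starting rate toward eta_min, so the "warm restart" part never actually kicks in mid-training.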

Results

This is a confusion matrix of the results so far. It plots the predicted label against the actual label, and makes it a little easier to see where things are getting mixed up. Ideally, there would only be nonzero values along the diagonal.

In [152]:
#!pip install seaborn
import confusion_stuff
import seaborn as sn

matrix = confusion_stuff.matrix

ind = confusion_stuff.individuals
ind = {k: v for k, v in sorted(ind.items(), key=lambda item: item[1])}
ind_keys = ind.keys()
ind_vals = ind.values()

df_cm = pd.DataFrame(matrix)
sn.heatmap(df_cm, cmap="crest")
Out[152]:
<AxesSubplot:>

The average ROC AUC for this model across all 21 species is about 0.80. For reference, 1.0 would be a perfect score, and 0.5 would be random guessing. The graphic is somewhat muddled because there are so many classes, so you can see the scores for individual classes below.

In [153]:
for i, j in zip(ind_keys, ind_vals):
    print(f"{i}: {j}")
common_pear: 0.508
pin_oak: 0.542
red_maple: 0.611
colorado_spruce: 0.69
kentucky_coffeetree: 0.736
chestnut_oak: 0.741
japanese_zelkova: 0.784
northern_red_oak: 0.823
swamp_white_oak: 0.838
scotch_pine: 0.847
sugar_maple: 0.848
callery_pear: 0.857
norway_maple: 0.864
austrian_pine: 0.873
thornless_honeylocust: 0.875
bur_oak: 0.888
eastern_white_pine: 0.888
ginkgo: 0.893
white_oak: 0.912
amur_corktree: 0.921
freeman_maple: 0.922
In [154]:
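# Sort names and scores together (best first) so each bar keeps its own label
ind_sorted = sorted(ind.items(), key=lambda item: item[1], reverse=True)
ind_keys = [name for name, _ in ind_sorted]
ind_vals = [score for _, score in ind_sorted]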
In [155]:
plt.bar(range(len(ind_keys)), ind_vals, tick_label=ind_keys)
plt.title("ROC of Individual Species in Best Model",
         fontdict=header_font)
plt.xlabel("Species", fontdict=axis_font)
plt.ylabel("ROC", fontdict=axis_font)
plt.xticks(rotation=45, ha='right')
plt.show()

Having plotted these, it’s not hard to see where the model is weak.
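As a quick sanity check on the 0.80 figure, the per-species scores printed above can be averaged directly. This sketch assumes the reported average is a plain unweighted mean over the 21 classes, which may not be exactly how the full code computes it.

```python
from statistics import mean

# Per-species scores copied from the printed output above.
scores = {
    "common_pear": 0.508, "pin_oak": 0.542, "red_maple": 0.611,
    "colorado_spruce": 0.69, "kentucky_coffeetree": 0.736,
    "chestnut_oak": 0.741, "japanese_zelkova": 0.784,
    "northern_red_oak": 0.823, "swamp_white_oak": 0.838,
    "scotch_pine": 0.847, "sugar_maple": 0.848, "callery_pear": 0.857,
    "norway_maple": 0.864, "austrian_pine": 0.873,
    "thornless_honeylocust": 0.875, "bur_oak": 0.888,
    "eastern_white_pine": 0.888, "ginkgo": 0.893, "white_oak": 0.912,
    "amur_corktree": 0.921, "freeman_maple": 0.922,
}

macro_average = mean(scores.values())
weakest = sorted(scores, key=scores.get)[:3]

print(f"{len(scores)} species, unweighted mean = {macro_average:.3f}")
print("weakest:", weakest)
```

The three lowest scorers are the classes hovering closest to the 0.5 mark that would indicate random guessing.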

Next Steps

I’m still trying new things and learning a lot from this project. Some things on the horizon:

  • Trying one of the larger DeiT models.
  • Ensemble method: training a number of classifiers with a smaller number of classes and having them vote using their confidence scores.
  • Ensemble method: break one large test image into patches, have models vote on each of the patches, and use the majority, the highest confidence, or some other metric as the prediction.
  • Gathering more data, especially expanding to include more species.
  • Weird idea: information theoretical analysis of the tree bark.

Early days!
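The patch-voting idea from the list above is simple enough to sketch without a model. The two functions below are hypothetical and take a list of (label, confidence) pairs, one per patch of a single test photo. The predictions are made up purely to show how the two voting rules can disagree.

```python
from collections import Counter, defaultdict


def majority_vote(patch_predictions):
    """Pick the label predicted for the most patches (ties: first seen wins)."""
    counts = Counter(label for label, _ in patch_predictions)
    return counts.most_common(1)[0][0]


def confidence_vote(patch_predictions):
    """Pick the label with the largest summed confidence across patches."""
    totals = defaultdict(float)
    for label, confidence in patch_predictions:
        totals[label] += confidence
    return max(totals, key=totals.get)


# Made-up (label, confidence) pairs for the patches of one test photo.
predictions = [
    ("pin_oak", 0.55),
    ("pin_oak", 0.51),
    ("white_oak", 0.97),
    ("pin_oak", 0.52),
    ("white_oak", 0.96),
]

print(majority_vote(predictions))    # pin_oak: 3 of 5 patches
print(confidence_vote(predictions))  # white_oak: 1.93 vs 1.58 summed confidence
```

Which rule is better is an open question here; given the confidence quirks of the usable bark sorter described earlier, it seems worth testing both rather than assuming.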

CartograCLE

Haha, get it? ‘Cause it’s making maps about Cleveland, and…

Anyway, in my post about the maternal mortality case study, I alluded to another project in R that involved USGS data and interactive maps. This was actually inspired by the Great Lakes Data Watershed. I took some time out to learn how to reproduce most of the functionality of one of their dashboards, and I learned a lot in the process. You can follow along with the GitHub repo here, or check out the demo here.

What are we looking at here?

This program calls the USGS REST API to get near real-time data from water monitoring stations around Cuyahoga County, Ohio. It formats that data to get the flow and stage of rivers around the county, and displays that information at the relevant points on the map. Unfortunately, it’s only hosted on GitHub Pages, so it isn’t making regular calls to the USGS. This could be remedied by using real hosting that I am not going to pay for personally. For the curious, I made use of OpenStreetMap to provide the map data, and Leaflet for the interactive map elements.
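The project itself is written in R, but the shape of the request and the formatting step can be sketched in Python to match the rest of the code on this page. Everything specific here is an assumption to check against the current USGS documentation: the endpoint, the parameter names and codes, the county code, and the layout of the JSON response. The sample payload is made up, and no network call is made.

```python
from urllib.parse import urlencode

# Assumed endpoint for the USGS Instantaneous Values service.
BASE_URL = "https://waterservices.usgs.gov/nwis/iv/"


def build_request_url(county_fips="39035"):
    """Build a request for discharge (00060) and gage height (00065) readings."""
    params = {
        "format": "json",
        "countyCd": county_fips,       # assumed code for Cuyahoga County, Ohio
        "parameterCd": "00060,00065",  # assumed codes for flow and stage
        "siteStatus": "active",
    }
    return BASE_URL + "?" + urlencode(params, safe=",")


def latest_readings(payload):
    """Flatten the (assumed) response layout into one row per station and parameter."""
    rows = []
    for series in payload["value"]["timeSeries"]:
        readings = series["values"][0]["value"]
        if not readings:
            continue  # station reported nothing for this parameter
        location = series["sourceInfo"]["geoLocation"]["geogLocation"]
        rows.append({
            "site": series["sourceInfo"]["siteName"],
            "parameter": series["variable"]["variableCode"][0]["value"],
            "latitude": location["latitude"],
            "longitude": location["longitude"],
            "value": float(readings[-1]["value"]),
            "time": readings[-1]["dateTime"],
        })
    return rows


# Made-up payload in the assumed layout, standing in for a real response.
sample_payload = {
    "value": {
        "timeSeries": [
            {
                "sourceInfo": {
                    "siteName": "Example River at Example Bridge",
                    "geoLocation": {
                        "geogLocation": {"latitude": 41.4, "longitude": -81.6}
                    },
                },
                "variable": {"variableCode": [{"value": "00060"}]},
                "values": [
                    {"value": [{"value": "123", "dateTime": "2023-01-01T12:00:00"}]}
                ],
            }
        ]
    }
}

print(build_request_url())
print(latest_readings(sample_payload))
```

Each row carries a latitude and longitude, which is all a map layer needs in order to place a marker and attach the latest flow or stage value to it.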

What’s next?

I have been trying to get into GIS applications for years, and I never before had the skill with R to explore this environment with that kind of tooling. Because this is the first time I’ve used a REST API to get data of this type, I’m thinking about other scientific datasets that have frequent updates. Something that I have been paying particular attention to is Landsat data and its possible applications to precision agriculture. Early Days!

Maternal Mortality Case Study

I recently completed the Google Data Analytics Professional Certificate program available through Coursera. It was a big adventure covering a large swath of material I had never taken the time to look at, chief among them probably getting the hang of R. It’s a programming language I have poked at before, but never really examined in a directed way. Coming largely from a background of languages like Python and JavaScript, R is really, genuinely different.

Anyway, the course calls for the completion of a case study. Instead of using the bike dataset, I opted to look at something that has been hanging over my head for quite some time now: maternal mortality rates in the US. My initial analysis is currently being hosted on GitHub Pages and can be found here. The GitHub repo itself can be found here. As of this writing, the findings are preliminary; this project will be subject to a lot of revision, but I’m happy with this initial draft.

Why, though?

The state of women’s health in the US has never been spectacular, but it seems to be coming to a head in recent months. The recent overturning of Roe v. Wade, declining fertility rates for sociological and economic reasons, and concerns around plastics having pronounced physiological effects are on many people’s minds. Additionally, I know people for whom this is a personal concern.

What did you find?

It’s still a preliminary analysis subject to expansion and review, but it looks like one big culprit in the drastic increase in maternal mortality rates in the US is increasing maternal age. I don’t mean women in their 40s and 50s who are having unsafe geriatric pregnancies; the average age for women dying of maternal mortality conditions is still entirely under 36. In fact, of particular interest was that changes in maternal mortality rates have cleaved around age 40 in recent years, with cases being on the decline among women in those much older cohorts.

Everything after that was beyond the scope of my analysis. There could be any number of reasons why maternal ages are increasing. I posited that a general lack of trust in the economy following the Great Recession of 2008 could be one reason, but climate change, labor situations, and the general cultural disposition to families could also be factors.

Also, while discussing the project with a friend, we got the idea of comparing maternal mortality to mining. After using R to run some statistics from MSHA, we found out about this:

Graph comparing the fatalities per 100,000 of mothers and miners, indicating that a career in mining has been safer from this perspective since 2011

Yeah. Fatalities for people working in the coal mining industry were lower per 100,000 than those for mothers. In my mind, there’s something somber and concerning about looking at the cultural touchstone of dirty and dangerous work, comparing it to something that is quintessentially human, and being surprised at which one is more dangerous. Very generously, this could be characterized as a victory for improvements in safety standards and the continued application of engineering to the well-understood problems of mining, but given the sharp increase in maternal mortality in the US in particular, I’m personally having a hard time conceptualizing it that way.

What’s next?

Broadly, taking this course and doing this case study has given me a new lens on learning programming languages. Early in the process of learning R, my brain constantly thought of how it would solve problems with Python, a tool that I am much more familiar with. R felt kludgy, cumbersome, and unnatural, right up until it didn’t. At some point, R became the natural tool for the job of data analysis, and I started solving problems in a way that I know would be difficult if I were to do the same in Python. From now on, I aspire to learn tools such that they are obviously the perfect fit for their respective jobs.

Less abstractly, I will be doing more projects with R. It allows for a certain fluidity in exploring data that I’m gaining a deep appreciation for, and I’m very curious to see how involved the ecosystem is. For instance, while working on the maternal mortality case study, I took the liberty of learning just a bit about geographic information systems. Within a day, I got the hang of calling the USGS API and applying its data to an interactive map, a project which I will be posting about very shortly. Early Days!

Deploying PyTorch to an Android Environment

I have some inchoate thoughts about approaches to deployable AI products. Too much of the conversation around AI frames it from the angle of AI-as-a-service, something that a customer makes requests of. This is evidenced by reactions to things like DALL-E 2, which tend to center on concern for the possible future of professional artists. A more useful frame of reference is AI-as-an-instrument (AIaaI), i.e. a tool that a user would wield like a pencil. Ideally, it would be something that could amplify arbitrary ideas that a user had. Of course, this would require much more generality than is currently trained into models and likely the ability to interface with a number of other APIs.

Exploring a product from this angle will require leveraging hardware accessible to a normal user, meaning a heavy focus on mobile and edge computing systems. To that end, I’m making a harder pivot to mobile. Fortunately, there is a large and growing body of work being done in the area of mobile deep learning. Before we get started on the really interesting stuff, we have some housekeeping to attend to.

Having done a little work with both, I actually think it will be easier to develop native Android apps instead of using something like React Native. I understand that PyTorch supports both, but the ecosystem for the former seems tighter to me.

Fair warning: because this is covering so much new material, this is going to get long.

GitHub Repo

There is a PyTorch GitHub repo focusing on demos for Android that is worth a look. Today I’ll be going through some of the relevant code in the HelloWorldApp section. Honestly, most of that code. Additionally, I somewhat followed the Quickstart with a HelloWorld Example.

The following is basically notes on things that were confusing or somehow tripped me up in the process of getting HelloWorldApp to run on my phone. If you’re following along and aren’t familiar with Android or Java, I heartily recommend going through the code yourself instead of just downloading and running the repo, especially if you’re most experienced with something like a desktop Python environment. Basically, this picks up from the point of opening Android Studio and starting an empty project.

Gradle Stuff

Gradle is the build tool being used here, and it warrants some attention for anyone coming from the Python landscape or who has used something like CMake.

Looking at the project structure in Android Studio with the Android view, it’s easy to see that there are two build.gradle files, one for the project and one for the module. For future reference, the Project version is placed in the root folder and will define behavior for the entire project. A module, being isolated from the broader project, has its own file to define settings that are only relevant to that module.

settings.gradle

We need to configure this file at the top level. Possibly due to the specifics of my machine, I encountered build errors when starting with the boilerplate file, as follows.

pluginManagement {
    repositories {
        gradlePluginPortal()
        google()
        mavenCentral()
    }
}

dependencyResolutionManagement {
    repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)  
    repositories {
        google()
        mavenCentral()
    }
}

rootProject.name = "HelloWorld"
include ':app'

I initially had a look at how to configure a build focusing on settings.gradle and thought that dependencyResolutionManagement had to be removed, but this felt like poor form and raised questions about what would be done in later projects that actually needed this. Fortunately, this is known to happen in the latest version of Android Studio.

The culprit is a single setting: FAIL_ON_PROJECT_REPOS. With it in place, the build fails whenever repositories are also declared at the project level, which is exactly what this project’s build.gradle does. If we comment this line out, we’re golden.

build.gradle (:app)

The module-level build.gradle file is a little more involved, and I think it’s worth going over it in sections. Worth noting that this is written in Groovy, which is a whole thing on its own and beyond the scope of this article.

plugins {
    id 'com.android.application'
}

Here is where we specify plugins. This project in particular just uses the one. com.android.application is used here because the project is an app. For examples of what else might be listed here, a library would use com.android.library, and a project using Kotlin would add org.jetbrains.kotlin.android. There are rich gradle docs going into more detail about the use of plugins.

android {
    compileSdk 32

    defaultConfig {
        applicationId "com.example.helloworld"
        minSdk 28
        targetSdk 31
        versionCode 1
        versionName "1.0"

        testInstrumentationRunner "androidx.test.runner.AndroidJUnitRunner"
    }

    buildTypes {
        release {
            minifyEnabled false
            proguardFiles getDefaultProguardFile('proguard-android-optimize.txt'), 'proguard-rules.pro'
        }
    }

    compileOptions {
        sourceCompatibility JavaVersion.VERSION_1_8
        targetCompatibility JavaVersion.VERSION_1_8
    }  
}

Here is where the specifics of the Android build happen. Looking at this block, we can see that the application ID is com.example.helloworld, the earliest Android API level that this app can run on is 28, it’s really made for API level 31 and compiled against 32, and that it’s version 1.0.

minifyEnabled might seem a little strange. I ran a search and found this article on shrinking, obfuscating, and optimizing apps. These are things that shrink the build size and make the app harder to reverse engineer. It is disabled by default in Android Studio apps; a developer would need to specify ProGuard rules to enable it, and this is a lightweight example app.

dependencies {
    implementation 'androidx.appcompat:appcompat:1.4.1'
    implementation 'com.google.android.material:material:1.4.0'
    implementation 'androidx.constraintlayout:constraintlayout:2.1.4'
    implementation 'org.pytorch:pytorch_android_lite:1.9.0'
    implementation 'org.pytorch:pytorch_android_torchvision:1.9.0'
    testImplementation 'junit:junit:4.13.2'
    androidTestImplementation 'androidx.test.ext:junit:1.1.3'
    androidTestImplementation 'androidx.test.espresso:espresso-core:3.4.0'
}

The dependencies keyword itself is pretty straightforward, so let’s have a look at what we’re actually using.

Side note: the implementation keyword exists as a way to manage dependencies in chains of libraries and speed up the build time. See Implementation Vs API in Android Gradle plugin 3.0 for more information.

  • AndroidX is the namespace for the Android Jetpack libraries. It’s the replacement for the Android Support Library and used for deployment.
  • AppCompat is a library to make Android apps work across different versions of Android itself.
  • Material Design is the library for Android’s design language.
  • The two PyTorch libraries there provide the actual deep learning functionality that we’ll get to later in this article.
  • testImplementation extends the implementation configuration such that we can use JUnit for tests.

build.gradle (HelloWorld)

With that cleared, we can get into the project-level build.gradle file.

buildscript {
    repositories {
        google()
        jcenter()
    }

    dependencies {
        classpath 'com.android.tools.build:gradle:7.2.0'
    }
}

The buildscript block provides the foundational information for the rest of the build script. Here you can see that we have added

  • google(), which is a shortcut to Google’s Maven repository, and
  • jcenter(), which enables access to a huge number of Java and Android OSS libraries.

These are the places where Gradle looks for the dependencies it needs, and anything that is not already cached locally will be downloaded from them.

allprojects {
    repositories {
        google()
        jcenter()
    }
}

This might look redundant. allprojects is distinct from buildscript in that the latter is for gradle itself, and the former is for the modules being built by gradle.

task clean(type: Delete) {
    delete rootProject.buildDir
}

Here we just define a clean task that deletes the project’s build directory whenever that task is run. If you want to get a better intuition around tasks, Build Script Basics has a lot of coverage there.

Design Stuff

activity_main.xml

Okay, I’m lying a little bit. Before going over the Java code proper, there is some preliminary work that needs to be done in the design. We need to take a look at activity_main.xml located in the res/layout folder, as it is the file that specifies the layout. Android Studio lets you look at the code, the GUI layout, and both side-by-side. Because we are dealing with a relatively simple layout, we’ll just be going over the code in this one.

<?xml version="1.0" encoding="utf-8"?>
<androidx.constraintlayout.widget.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android"
 xmlns:app="http://schemas.android.com/apk/res-auto"
 xmlns:tools="http://schemas.android.com/tools"
 android:layout_width="match_parent"
 android:layout_height="match_parent"
 tools:context=".MainActivity">
  ...
</androidx.constraintlayout.widget.ConstraintLayout>

This can be regarded as a bit of boilerplate. For the curious, ConstraintLayout is the parent container of all the design elements that we will be using. It works well with the GUI layout editor in Android Studio and lets us position controls relative to one another without deeply nesting them.

Being basically a static demonstration, we only need to put two controls inside of this, an ImageView and a TextView. We will explore things like buttons, check boxes, and so on in a later entry.

<ImageView
 android:id="@+id/image"
 android:layout_width="match_parent"
 android:layout_height="match_parent"
 android:scaleType="fitCenter"/>

The image view is where the static image gets displayed. For the sake of simplicity, we will be focusing on the original code and not adding constraints. Those will be part of an upcoming tutorial.

Here, android:id gives the ImageView a name that can be referenced by the activity. layout_width and layout_height are both set to match_parent, indicating that the view will be as large as its parent ConstraintLayout minus the parent’s padding. scaleType being set to fitCenter scales the image uniformly until it fits inside the ImageView and centers it there. Now, all of this can result in some visual quirks that should be explored when we get to the project that we are ultimately aiming for, but it is being done here to give a consistent presentation for the sake of the example.
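For anyone who wants the arithmetic behind fitCenter spelled out, here is a minimal sketch in plain Java. It is not Android’s implementation; the class name FitCenterSketch and the example sizes are made up for illustration, and it only assumes that the image is scaled by one uniform factor and then centered.

```java
// Hypothetical illustration of the math behind scaleType="fitCenter".
// This is not Android code; it only mirrors the idea of "scale uniformly, then center".
final class FitCenterSketch {

    /** Returns {scaledWidth, scaledHeight, leftOffset, topOffset} in pixels. */
    static float[] fitCenter(float imageW, float imageH, float viewW, float viewH) {
        // A single scale factor for both axes keeps the aspect ratio intact.
        float scale = Math.min(viewW / imageW, viewH / imageH);
        float scaledW = imageW * scale;
        float scaledH = imageH * scale;
        // Whatever space is left over is split evenly on both sides.
        float left = (viewW - scaledW) / 2f;
        float top = (viewH - scaledH) / 2f;
        return new float[] {scaledW, scaledH, left, top};
    }

    public static void main(String[] args) {
        // Assumed sizes: a 200x100 image shown in a 400x400 view.
        float[] result = fitCenter(200f, 100f, 400f, 400f);
        System.out.println(java.util.Arrays.toString(result));
    }
}
```

The smaller of the two ratios wins, so the image always fits entirely inside the view, and any leftover space shows up as empty bands on two sides.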

<TextView
 android:id="@+id/text"
 android:layout_width="match_parent"
 android:layout_height="wrap_content"
 android:layout_gravity="top"
 android:text="TextView"
 android:textSize="18sp"
 tools:layout_editor_absoluteX="169dp"
 tools:layout_editor_absoluteY="540dp" />

Next is the TextView, which needs a little more explaining. We see here that layout_height is set to wrap_content. This means that, instead of taking up as much space as possible, the TextView only expands enough vertically to contain its content. Setting layout_gravity to top pushes it to the top of the screen without changing the size.

Funnily enough, if you compare this code to the GitHub repo, this is where it starts to get really apparent that I made it a point of going through this myself instead of directly copying the repo. If you don’t change these settings, you’ll run into a warning about android:text being hardcoded instead of using a @string resource, which… Yeah, fair enough, but I don’t want to do a separate section on strings.xml. At any rate, that’s something that Android Studio adds.

Something else specific to Android Studio is the tools namespace, which is used by the layout editor. Seriously, if you remove the two tools attributes and run this on the phone, it’ll run the same. What’s going on here is that tools is trying to ensure consistency between the XML code and the layout GUI.

Given that the last three attributes deal with layout dimensions, you might be wondering what’s going on with sp and dp. sp stands for scale-independent pixels, and dp stands for density-independent pixels. The functional difference between these is that sp is used for text, which is scaled by user preferences, and dp is used for everything else. Setting textSize to 18sp just makes the text conventionally readable.
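To make the difference concrete, here is a rough sketch of how both units turn into physical pixels. It assumes the classic linear rule, where 1dp is one pixel on a 160 dpi screen and sp additionally multiplies by the user’s font scale. The class and method names are hypothetical, and the platform’s real conversion may differ in its details, so treat this as an approximation of the idea.

```java
// Hypothetical sketch of dp and sp conversion; not an Android API.
final class PixelUnitsSketch {

    // Assumption: 160 dpi is the baseline density at which 1dp equals 1px.
    static final float BASELINE_DPI = 160f;

    /** Density-independent pixels to physical pixels. */
    static float dpToPx(float dp, float screenDpi) {
        return dp * (screenDpi / BASELINE_DPI);
    }

    /** Scale-independent pixels: like dp, but also scaled by the user's font preference. */
    static float spToPx(float sp, float screenDpi, float fontScale) {
        return sp * (screenDpi / BASELINE_DPI) * fontScale;
    }

    public static void main(String[] args) {
        // Assumed device: 480 dpi, with the user's font size set to 1.5x.
        System.out.println(dpToPx(169f, 480f));
        System.out.println(spToPx(18f, 480f, 1.5f));
    }
}
```

With the font scale left at 1.0, sp and dp come out the same, which is why the distinction only becomes visible once a user changes their text size preference.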

Additional Files

Okay, there is some housekeeping that needs to be done. From the repo, you’ll need to copy the files in app/src/main/assets. These are just the test image and the pretrained model. Additionally, you’ll need to copy over ImageNetClasses.java from app/src/main/java/org/pytorch/helloworld, which gives you the labels for the classes in question. This goes right next to MainActivity.java. I would go over it if it were anything more complex than a class containing a string, and it’ll be clear enough on the other side of the main Java material here.

Java Stuff

With that, we are finally ready to get to the meat of the project. The entire focus of the rest of this writeup will be in MainActivity.java.

Imports

First, we’ll have a look at our libraries.

package com.example.helloworld;

This creates the package. These are generally used to avoid name conflicts. It isn’t critical here, but it is a good practice to keep an eye out for.

import androidx.appcompat.app.AppCompatActivity;

import android.os.Bundle;

import android.content.Context;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.util.Log;
import android.widget.ImageView;
import android.widget.TextView;

As we’ll see in a moment, AppCompatActivity here is the base class for our MainActivity. This allows us to use newer features on older devices.

Broadly speaking, the android package contains the set of tools and resources that will be used by the app. android.content provides basic data handling on the device itself. android.graphics deals more in things that are drawn to the screen, and will be needed to handle the images. I think it’s clear that android.util is used here to handle logging, but it also handles a lot of time, string, and numerical data types. Finally, android.widget handles the UI elements such as our ImageView and TextView.

import org.pytorch.IValue;
import org.pytorch.LiteModuleLoader;
import org.pytorch.Module;
import org.pytorch.Tensor;
import org.pytorch.helloworld.ImageNetClasses;
import org.pytorch.torchvision.TensorImageUtils;
import org.pytorch.MemoryFormat;

org.pytorch is the package containing the deep learning components of the app. Notice on line 5 that we have added the ImageNetClasses file copied from the GitHub repo.

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

Finally, we add the java package. java.io just handles the input and output. This might be strange to someone who is only familiar with interpreted languages like Python.

MainActivity

public class MainActivity extends AppCompatActivity {
...
}

This is wrapped around the entirety of our code here. Notice that MainActivity is a class that extends AppCompatActivity. The public keyword declares MainActivity public, meaning that it is visible to all other classes. It contains only two methods. The first is onCreate(), which initializes the activity. The second is assetFilePath(), which is a helper method that lets us open our pretrained model.

onCreate()

First, we’ll have a look at onCreate() in a number of sections:

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);

    Bitmap bitmap = null;
    Module module = null;

@Override here indicates that we are overriding the method from the parent class, in this case AppCompatActivity.

Bundle savedInstanceState is the state of the application in a bundle, typically dynamic and non-persistent. This plays an important part in the application life cycle, for instance allowing us to navigate away from the app and come back to it with its data intact.

setContentView(R.layout.activity_main) is a reference to the activity_main.xml that we looked at earlier. R is a class that contains the definitions for all the resources used by the package, so we’ll be seeing it peppered throughout the rest of this writeup.

It is worth noting for anyone coming from Python that Java doesn’t do duck typing; every variable needs a type. Personal note: I’ve done enough with Python to know this is a part of the language I’m going to love. bitmap will ultimately be our image, and module will be the pretrained model that we downloaded from the repo. These are being set up this way to provide them access outside of the following try catch statement:

try {
    bitmap = BitmapFactory.decodeStream(getAssets().open("poodle.jpeg"));
    module = LiteModuleLoader.load(assetFilePath(this, "model.pt"));
} catch (IOException e) {
    Log.e("PytorchHelloWorld", "Error reading assets", e);
    finish();
}

ImageView imageView = findViewById(R.id.image);
imageView.setImageBitmap(bitmap);

This statement will fail if either of those files is missing. getAssets() opens the file from the context of the environment, and that stream is decoded by BitmapFactory. LiteModuleLoader.load() needs the helper method to get a path to the pretrained model, but we’ll take a look at that in just a moment. I made it a point of looking at the docs, and this doesn’t actually include a counterpart that would be implied by Lite.

Successful completion of the try catch statement then allows the code to call the ImageView and fill it with the bitmap we just opened.
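For readers coming from Python, the reason bitmap and module are declared before the try block is scoping: a variable declared inside the braces of a try is gone once those braces close. Here is a minimal sketch of the same pattern in plain Java, using a hypothetical TryScopeSketch class and an ordinary InputStream in place of the Android assets.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical example of declaring a variable before a try block so it outlives the block.
final class TryScopeSketch {

    static int firstByteOrMinusOne(InputStream in) {
        int value = -1; // declared out here so it is still visible after the try
        try {
            value = in.read(); // returns -1 if the stream is already empty
        } catch (IOException e) {
            System.err.println("Error reading stream: " + e.getMessage());
        }
        // Had value been declared inside the try, this line would not compile.
        return value;
    }

    public static void main(String[] args) {
        System.out.println(firstByteOrMinusOne(new ByteArrayInputStream(new byte[] {42})));
    }
}
```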

final Tensor inputTensor = TensorImageUtils.bitmapToFloat32Tensor(
    bitmap,
    TensorImageUtils.TORCHVISION_NORM_MEAN_RGB,
    TensorImageUtils.TORCHVISION_NORM_STD_RGB,
    MemoryFormat.CHANNELS_LAST
);

Here is where the bitmap gets converted into a tensor that can be used by PyTorch. The keyword final here indicates that inputTensor cannot be reassigned once it is set. TensorImageUtils.bitmapToFloat32Tensor() turns the bitmap into a tensor comprised of 32-bit floating point numbers. The two arguments in the middle are hardcoded arrays of floating point values specifying the mean and standard deviation for the R, G, and B components of a given pixel. Normalizing inputs this way is a convention in machine learning that keeps values in a consistent range during training, and the same normalization has to be applied at inference so that the model sees the kind of data it was trained on.
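As a rough illustration of what that normalization does to a single pixel, here is a sketch in plain Java. It assumes the two constants hold the usual torchvision ImageNet values, written out by hand below, and that each 0 to 255 color value is first scaled to the 0 to 1 range. NormalizeSketch and normalizePixel are hypothetical names, not part of the PyTorch library.

```java
// Hypothetical sketch of per-channel normalization for one RGB pixel.
final class NormalizeSketch {

    // Assumption: the usual torchvision ImageNet statistics, copied by hand.
    static final float[] MEAN_RGB = {0.485f, 0.456f, 0.406f};
    static final float[] STD_RGB = {0.229f, 0.224f, 0.225f};

    /** Takes 0-255 color values and returns the normalized {r, g, b} floats. */
    static float[] normalizePixel(int r, int g, int b) {
        int[] rgb = {r, g, b};
        float[] out = new float[3];
        for (int c = 0; c < 3; c++) {
            float scaled = rgb[c] / 255.0f;                 // bring into the 0..1 range
            out[c] = (scaled - MEAN_RGB[c]) / STD_RGB[c];   // center, then rescale
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(normalizePixel(128, 128, 128)));
    }
}
```

After this step, a mid-gray pixel lands near zero in every channel, while pure black and pure white end up a couple of units to either side.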

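Finally, MemoryFormat.CHANNELS_LAST asks for the tensor’s data to be laid out in memory with the channels last, in contrast to a contiguous tensor. A contiguous tensor for a single image might have a shape of (3, 224, 224), the 3 indicating one channel for each R, G, and B value, and the 224s indicating height and width, with each channel stored as its own block. Channels last instead stores the three channel values of each pixel next to one another, as if the 3 had been moved to the end, a layout that PyTorch supports because it can be faster for convolutional models.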
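One way to picture the two layouts is to ask where a given value sits in the flat buffer that backs the tensor. The sketch below is plain Java with hypothetical names, ignores the batch dimension, and is only meant to show the indexing idea rather than how PyTorch stores anything internally.

```java
// Hypothetical sketch: flat buffer offsets for channels-first and channels-last layouts.
final class MemoryLayoutSketch {

    /** Contiguous (channels-first): all of R, then all of G, then all of B. */
    static int contiguousOffset(int c, int y, int x, int height, int width) {
        return c * height * width + y * width + x;
    }

    /** Channels-last: the R, G, and B values of one pixel sit next to each other. */
    static int channelsLastOffset(int c, int y, int x, int width, int channels) {
        return (y * width + x) * channels + c;
    }

    public static void main(String[] args) {
        int channels = 3, height = 224, width = 224;
        // Where does the green value (c = 1) of the top-left pixel live?
        System.out.println(contiguousOffset(1, 0, 0, height, width));
        System.out.println(channelsLastOffset(1, 0, 0, width, channels));
    }
}
```

In the contiguous layout, the green value of the first pixel sits a full 224×224 plane away from its red value; in channels last, it is the very next element.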

final Tensor outputTensor = module.forward(IValue.from(inputTensor)).toTensor();

There are some moving parts here, but this is where the model tries to predict the image’s class. inputTensor is passed to IValue, short for Interpreter Value. IValue is a Java representation of a TorchScript value. It can be any of a number of supported types, as determined by from(). As we can see here, this one is a Tensor. When the prediction is made, it is also converted to a tensor.

final float[] scores = outputTensor.getDataAsFloatArray();

float maxScore = -Float.MAX_VALUE;
int maxScoreIdx = -1;
for (int i = 0; i < scores.length; i++) {
    if (scores[i] > maxScore) {
        maxScore = scores[i];
        maxScoreIdx = i;
    }
}

scores is an array representing each of the possible classes. The highest score is the model’s prediction for the image’s class. To get that prediction, we loop over that array and set maxScoreIdx to the index where the largest number can be found.

String className = ImageNetClasses.IMAGENET_CLASSES[maxScoreIdx];

TextView textView = findViewById(R.id.text);
textView.setText(className);

className is just the human-readable representation of the prediction, which can be found in the ImageNetClasses file. The TextView is found from its name in the R class, and the name of the prediction is assigned to it.

assetFilePath()

This is a helper method whose result is passed to LiteModuleLoader. Its purpose is to make sure that the model file exists on disk and get its path, or throw an exception if something is wrong with the process.

The first argument here is a Context, the global information about the environment provided by Android. As used by onCreate(), the Context in question is just this, a reference to the current object. For our purposes, assetName is simply the filename of the pretrained model, model.pt.

Unlike the onCreate() method, this uses the throws keyword. This portion indicates that the method might throw an IOException if something goes wrong in our file handling process.

public static String assetFilePath(Context context, String assetName) throws IOException {

    ...
}

Let’s move on to the implementation of this method.

    File file = new File(context.getFilesDir(), assetName);
    if (file.exists() && file.length() > 0) {
        return file.getAbsolutePath();
    }

It’s worth outlining the order of operations here. The File class here represents the pathname that we will try to open. context.getFilesDir() gets the absolute path to the app’s own files directory, and as mentioned before, assetName is the filename of the model we are opening, model.pt.

The if block here just checks that the file exists and is a nonzero length, and will close the method and return the path to be opened by LiteModuleLoader in onCreate() if it’s successful. Simple enough, but what happens if either of these conditions is not met?

    try (InputStream is = context.getAssets().open(assetName)) {
        try (OutputStream os = new FileOutputStream(file)) {
            byte[] buffer = new byte[4 * 1024];
            int read;
            while ((read = is.read(buffer)) != -1) {
                os.write(buffer, 0, read);
            }
            os.flush();
        }
        return file.getAbsolutePath();
    }

These are nested try-with-resources statements. These statements create their respective InputStream and OutputStream and ensure that they are closed when the statements are completed.

What’s going on here? This creates the model file in the event that one does not already exist or the current one has 0 length. An InputStream is opened on the “model.pt” asset, giving us a stream of bytes to read from is. Then a FileOutputStream is created so that we will be able to write data to the file “model.pt” in the app’s files directory. A 4KB byte buffer is created, with each value initially set to 0. In the while loop, each pass reads up to 4KB from is into the buffer and writes the bytes that were actually read to os, stopping when read() returns -1 at the end of the stream. After that, os is flushed and the path to the file is returned.
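The copy loop is easier to poke at outside of Android. Below is a minimal sketch of the same buffered copy in plain Java, with in-memory streams standing in for the asset and the output file. StreamCopySketch, copy(), and roundTrip() are hypothetical names introduced for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-alone version of the buffered copy used in assetFilePath().
final class StreamCopySketch {

    /** Copies everything from in to out in 4KB chunks and returns the byte count. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4 * 1024];
        long total = 0;
        int read;
        // read() reports how many bytes it placed in the buffer, or -1 at end of stream.
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read); // write only the bytes that were actually read
            total += read;
        }
        out.flush();
        return total;
    }

    /** Pushes a byte array through copy() using in-memory streams. */
    static byte[] roundTrip(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            copy(in, out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[10_000]; // stand-in for the model file's bytes
        System.out.println(roundTrip(data).length + " bytes copied");
    }
}
```

The detail worth noticing is the third argument to write(): the last chunk is usually shorter than the buffer, so writing the whole buffer every time would append stale bytes to the file.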

New Beginnings

I now have a framework for experimenting with PyTorch in a mobile environment. Immediate next steps include actually letting a user make selections of images, using an intent to take new images with the phone’s camera, and doing real-time classification with text displayed in the ImageView itself. Early Days!

Hydroponics and IoT, or The Guileless First Steps into Agriculture as Taken by a Humble Millennial

A couple of months after the Situation started and everyone was busy working out and caring for sourdough starters, I tried my hand at gardening and found that it really took the edge off. Of course, the sun and I are old enemies, so it was only a matter of time before I tried doing at least part of the operation indoors. It was a natural step from there to do at least a little of that hydroponically. What could go wrong?

Goals

Your first impressions of hydroponics might be that it’s generally done in carefully controlled conditions and with a good deal of automation. I had the same impression, but that takes money I didn’t want to spend and time that went into troubleshooting fiddly physical components. It’s still not a mature system by any stretch, but it has been running long enough to produce tomatoes.

Basically, this series will be about outfitting an existing hydroponics setup with sensors, and today’s entry is about testing. We have three goals today:

  1. Ensure that sensors are wired into a Raspberry Pi 4 such that we can get readings out of them.
  2. See that we are using the right database technology for our needs.
  3. Ensure that those database entries can be seen from a website. There is the caveat that the database and (currently) the site to view it are being self-hosted from the Raspberry Pi that is gathering the sensor readings, so uptime might be an issue.

You can check out the code here. As of this writing, you can also check out the most recent sensor readings on the live site here, but be aware that the frontend will probably be accessible as a Vercel app later. This is all very preliminary work, and I’m having a ball figuring out how to put everything together.

The Current Setup

There are a few kinds of hydroponics setups. The first kind I’m using has a heavy focus on the ebb and flow method, with two reservoirs (cheap 30-gallon totes) that feed into four beds (honestly, two old plastic drawers and two new under-bed storage bins). One of these systems is arranged in a tower configuration, with water being pumped into the top bed and draining into the ones beneath it through a series of bell siphons.

Ebb and Flow Tower - Reservoir 1

Reservoir 1, in which the beds feed in series. Water is pumped to the top bed, which fills and releases the water to the bottom bed by means of a bell siphon.

The second ebb and flow system runs in parallel, with both beds feeding directly back into the reservoir. The increased modularity seemed like a good idea at the time, but it has turned out to be really finicky given the ad hoc nature of the setup.

Ebb and Flow System Attached to Reservoir 2

Reservoir 2, in which each of the beds feed from the reservoir in parallel.

Our plan here today is to focus on two of five deep water cultures. They feed from similar 30-gallon totes. The plants themselves rest in 3D-printed net pots for support, and their roots rest in a nutrient solution. Oxygen is delivered to them via an air pump.

Deep Water Systems 1-4

Four of the five deep water hydroponic systems.

Because it is still early days and we are mostly interested in setting up monitoring infrastructure today, the plan is to focus on Deep Water 3 and 4.

DW3 and DW4

The deep water cultures are currently entirely growing tomatoes, save for a single pepperoncini plant. The ebb and flow systems have a combination of tomatoes, rosemary, figs, strawberries, and coffee plants (seriously).

The Physical Computing Elements

I’ve finally found a use for some of the sensors I’ve been gradually accumulating for a few years. At the base of this is a Raspberry Pi 4. Unfortunately, Raspberry Pis are being hit hard by semiconductor supply chain issues, so if anyone is looking to follow this blog exactly and you don’t already have one, you have been warned.

Raspberry Pi 4

An exposed Raspberry Pi 4 currently wired into a small collection of sensors.

The rough idea today includes two sensors. First, we have the DHT11 temperature and humidity sensor. When I first ran the code to get this thing running, I thought back to the day I bought this sensor on a whim and felt remorse. Its superior counterpart, the DHT22, was right there. A little reading indicates that the DHT11 is less precise in its readings and has tighter limits on the ranges it can measure. Still, in the environment that this one is meant to operate in, it will do us fine.

DHT-11

A DHT-11 sensor for temperature and humidity.

Then we have the HC-SR04 ultrasonic rangefinder. It might seem like an odd sensor when dealing with plants, but this is actually useful in determining water levels of the reservoirs.

HC-SR04 ultrasonic rangefinder

An HC-SR04 ultrasonic rangefinder to measure the volume in a given reservoir.

Finally, we have an ancient webcam to get at least some visuals on the plants themselves. Having had a chance to look at the image quality before writing this article, it seems likely that it will be replaced.

Webcam on a lampshade

A webcam with an elegant and finely-crafted stand.

The Page

Preliminary page for IoTHhydro

What are we looking at here? Remember the three goals from earlier. We needed to make sure that all of our sensors were wired correctly, that they could feed information into a database, and that we could then recover that information from the database and present it to a user over the internet. Here, we see what amounts to dummy data validating all of these things.

The first element that we get from the database is a timestamp. 22 is the temperature in degrees Celsius. 67 is the percentage of humidity. The two values at the end are simple distances from the ultrasonic rangefinders. They have not been mounted, and when they are, these raw distances will be used to compute the amount of storage in each of the bins. At the bottom is an image taken by the web cam.

It’s a little… Fresh. But the system works!
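Since the raw distances are eventually meant to become water volumes, here is a back-of-the-envelope sketch of that conversion. The project’s sensor code is Python; this sketch happens to be Java, and the arithmetic carries over directly. It assumes the sensor is mounted a known height above the bottom of a roughly rectangular bin, which a real tote with tapered sides only approximates, and every name and dimension below is made up for illustration.

```java
// Hypothetical sketch: turning a rangefinder distance into a water volume.
final class ReservoirLevelSketch {

    /**
     * Liters of water in a rectangular bin.
     * sensorHeightCm is the assumed distance from the sensor to the bottom of the bin;
     * distanceCm is the reading from the sensor down to the water surface.
     */
    static double volumeLiters(double sensorHeightCm, double distanceCm,
                               double lengthCm, double widthCm) {
        // Water depth is whatever the sound pulse did not have to travel through.
        double depthCm = Math.max(0.0, sensorHeightCm - distanceCm);
        return lengthCm * widthCm * depthCm / 1000.0; // 1 liter = 1000 cubic centimeters
    }

    public static void main(String[] args) {
        // Assumed numbers: sensor 40 cm above the bottom, reading 10 cm, bin 50 cm by 30 cm.
        System.out.println(volumeLiters(40.0, 10.0, 50.0, 30.0) + " L");
    }
}
```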

The Software

I learned a ton on this project. A proper examination of everything is beyond the scope of this post and will be covered later, but let’s see some highlights:

  1. The Raspberry Pi interfaces with the sensors through Python.
  2. This portion of the project is self-hosted, and I went through NoIP to get a reasonable hostname.
  3. On the backend, I decided to use Nginx for the web server technology and ExpressJS for the application framework.
  4. Due to my prior experience and relative comfort with it, I opted to stick with MySQL for the database.
  5. Because of my momentum with it, the door is wide open for further work with React.

The Next Moves

Taking a minute to pause and reflect, we can see that we now have the basic sketch for an IoT monitoring system. As such, the project is now in a good place for aggressive expansion.

In case it isn’t obvious by now, the hydroponics setup is a pilot project. As of this writing, it is about as manual as a hydroponics operation can be. Water is swapped out, nutrients are added, and most relevant for this article, a lot of notes and photos are still taken by hand. The most obvious next step is setting up a means by which that information can be added to its own database from the client side, such as a phone. Breaking this down a bit, that means adding post capabilities, probably another page with Express, and very importantly, user authentication.

The current site is entirely static and passive, and it only displays the most recent sensor data. The most obvious direction to go on this front would be adding a page to more thoroughly query the database, and building on that, a page that can handle data visualization.

And, of course, actually doing a meaningful presentation with HTML and CSS would go a long way. Early Days!

Virtual Horse Racing, or The Trials and Tribulations of Learning React

A couple of months ago, I got myself roped into a soon-to-be-released podcast by a friend. We’re calling it Hoffstadter’s Law. The basic premise is that each episode is a game of Nomic, which explores the contractual and evolutionary nature of law and society. As our ideas have become progressively more involved, we’ve gotten to the point where it made sense to start writing code to facilitate new games.

One set of episodes uses the background of gentlemen betting on horse racing. I quickly developed a minimal version of a virtual horse race in Python that suited our purposes well enough, but it had the distinct disadvantage of only being usable in a terminal. Having taken the time to develop GUIs in Python before, I thought it made sense at the time to write it in JavaScript instead. The thinking was that a JS implementation would give me superior control over the interface, and it would have the advantage that my friend would be able to use it online without making any changes to his machine.

The fruit of this endeavor can be played here. It is currently being served through Vercel, and you can see its ongoing development in the connected GitHub repo.

Plot Twist

I’ve never written JavaScript in any meaningful capacity before. I briefly flirted with a CSS tutorial in 2011. People gave earnest advice about flipping houses since the last time I touched HTML. At the start of this project, I understood vaguely that libraries like React existed, but had no context for what they did or why they were useful. That the JavaScript ecosystem had grown so large and byzantine was something I considered, but I decided that it wasn’t worth being daunted by.

From that vantage, the path forward was clear enough: I would take in a bit of HTML, CSS, and JS to write a first draft, translate that into React, then find… Some kind of hosting? People host React apps, right?

Results

There is a lot to reflect on in this project. I didn’t realize at first that the heavy parallelism of learning would be beneficial, and looking back, I would have leaned into it more. Knowing that there were a number of things that had to be done to make anything work made it feel a bit overwhelming at the time, but it also left no time for indecision to bog down the project. If something was unclear, it was easy to focus on something else instead of stopping.

By the same token, if I could give my past self advice, it would be to keep that parallelism going for longer. There came a time when the next best move was to start learning React, which became a critical point of divergence for the project. From then on, it was only a React project; the vanilla JS elements fell by the wayside as development on the React app accelerated. The early assumption that it would be easy to synchronize them later came from my lack of understanding of how differently React approaches project development and structure.

A problem that became clear relatively early in the React portion was uncertainty about the ultimate audience of Horseless Derby. Deciding that it would only be for the podcast and my portfolio, which is its status for the time being, would mean different implementation details than making it a full game.

For instance, at the moment, all user information is stored in localStorage. A full database and user credentialing would have been massive overkill for our purposes. A proper multiplayer mode, with human or AI opponents, would have been even more so, though I am considering adding all of these features to further expand my skills.
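
To make the localStorage approach concrete, here is a minimal sketch in TypeScript of how a player record might be saved and restored. It is not the code from the repo: the PlayerProfile fields, the storage key, and the helper names are all hypothetical, and the in-memory store exists only so the example also runs outside a browser.

```typescript
// Hypothetical shape of a saved player; the real app's fields may differ.
interface PlayerProfile {
  name: string;
  balance: number;
}

// The subset of the Web Storage API this sketch relies on.
// In a browser, window.localStorage satisfies this interface.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// In-memory stand-in so the sketch can run without a browser.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();

  getItem(key: string): string | null {
    const value = this.data.get(key);
    return value === undefined ? null : value;
  }

  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

// Hypothetical key name, chosen for illustration only.
const STORAGE_KEY = "horseless-derby-player";

// Storage only holds strings, so the profile is serialized as JSON.
function saveProfile(store: KeyValueStore, profile: PlayerProfile): void {
  store.setItem(STORAGE_KEY, JSON.stringify(profile));
}

// Returns the fallback when nothing is stored or the stored value is unusable.
function loadProfile(store: KeyValueStore, fallback: PlayerProfile): PlayerProfile {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) {
    return fallback;
  }
  try {
    const parsed = JSON.parse(raw);
    if (
      parsed !== null &&
      typeof parsed === "object" &&
      typeof parsed.name === "string" &&
      typeof parsed.balance === "number"
    ) {
      return { name: parsed.name, balance: parsed.balance };
    }
  } catch {
    // Corrupted JSON: fall through to the fallback below.
  }
  return fallback;
}
```

In a browser, the same helpers would be called as saveProfile(window.localStorage, profile). Everything lives on the player's own machine, which is why this is enough for a podcast prop and not enough for real accounts or multiplayer.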

Importantly and not surprisingly, graphic design is hard. I’m not talking about technical implementation details, even though those also have their challenges. The actual design of an attractive layout is something I want to double down on, and will probably be the main focus of a future project.

Looking at the code, there is some uncertainty about the best way to do CSS styling. The styled-components library, a third-party package for React, lets you attach CSS to a single component. As of this writing, styled-components is used to marked effect, but the result strikes me as messy and overdone. This certainly warrants more study and experimentation going forward.

Conclusion

It wasn’t until this project that I noticed how much anxiety I had around the web development ecosystem. Getting to the other side of it, I see now how unwarranted that was. I’ve been making more progress with other projects in the meantime, and there is something refreshing in knowing that this is another dimension that they can expand into, and that I actually really enjoy this kind of coding. Early Days!