Book Review: Machine Learning Design Patterns

Machine Learning Design Patterns: A good overview and reference book for machine learning topics in production environments.

If you are anything like me, “machine learning” to you means working with algorithms that adapt their function from data. And while this is true, it’s not the complete story when actually working in the field of machine learning.

Yes, picking the right algorithm and creating the appropriate model is important. But data cleaning, optimizing for production, and setting up scalable infrastructure are just as much a part of the day-to-day work on the job.

Along comes Machine Learning Design Patterns, a book that looks at common challenges in practical machine learning. It leaves out model architectures on purpose and instead promises to collect best practices to move machine learning to production.

Each chapter: A (machine learning) design pattern

The book’s title plays on the famous 1994 book by the Gang of Four that popularised the concept of design patterns in software engineering. It comes as no surprise that in Machine Learning Design Patterns, the authors attempt something very similar and identify recurring problems from their field.

Each chapter is structured the same way:

  • What is a common problem in machine learning?
  • What is the recommended solution?
  • What are trade-offs and alternatives?

What’s in the book

Written by three engineers from the Google Cloud AI team, the book covers a breadth of topics:

  • Data Representation
  • Problem Representation
  • Model Training
  • Resilient Serving
  • Reproducibility
  • Responsible AI

The early chapters cover topics frequently occurring in the “data science” section of machine learning: What is an embedding, how to work with imbalanced datasets, how to create proper checkpoints during training.

In the later parts, the authors focus more and more on inference and challenges of automation, repeatability and scalability. I now have an idea what a feature store is and how to bridge data schemas when mixing old and new data sources.

A glance at the table of contents. Each chapter follows (more or less) the same structure.

Examples: Python and SQL code

The book is full of examples and they use different technologies: TensorFlow examples in Python, a BigQuery listing, a Google Cloud SDK API call.

The choice of technologies makes it obvious that the book was written by Googlers. This doesn’t matter in my mind, because the concepts are always explained clearly, so porting them to other platforms or products should be straightforward.

What I liked

The book covers a wide range of topics and helped extend my knowledge of machine learning to areas I am not an expert in: Resilient Serving, Reproducibility and MLOps in general.

The design pattern structure lends itself to keeping this book on the shelf for future reference. Chapters have a clear motivation and are written to the point, so I can see myself looking up a design pattern in the future.

Aside from technical topics, the authors also include three chapters about responsible AI and a (brilliant) section about the ML Life Cycle and the AI Readiness of organisations.

What I didn’t like

I have a few complaints, though.

In places it becomes clear that the idea of extracting design patterns from machine learning approaches works well for some topics, but becomes a bit of a stretch for others. I personally didn’t mind this too much, but it’s not as elegant as the title of the book suggests.

What I did mind is that this first edition is riddled with errors: from figures containing incorrect numbers that don’t align with the text (just annoying) to an explanation of convolution layers that, I think, confused convolution with pooling (potentially misleading).

And a final nitpick: The printed copy I ordered was a monochrome version with low contrast. Many figures were completely indecipherable. A bit disappointing for an O’Reilly book upwards of 40€.

Thin paper, suboptimal graphics. The printed version doesn’t feel very premium.

Conclusion: Great overview to bring ML to production

In conclusion, Machine Learning Design Patterns gives a great overview of common problems you encounter when designing, building and deploying machine learning algorithms.

It will offer valuable content for many in the industry: Data scientists who have never deployed a cloud pipeline, ops experts who are curious about “MLOps” and the product person who wants to understand the constraints and possibilities of modern machine learning development.

My favourite chapter was actually a non-technical one: How to move a team and a whole company from running first ML experiments to becoming an ML-first organisation. This idea ties a lot of the technical and human topics together and it is a topic that excites me personally.

I’ve enjoyed working through this book (together with my data science study group) and it will find a valued place on my bookshelf – to be referenced whenever I encounter one of the problems in the wild again and need a foundational perspective.

Machine Learning Design Patterns, the printed copy. For long-ish reads like this, I personally prefer the physical copy.

How I spy on my cats when I’m not at home

With a second cat moving in this week, I am even more curious than before about what happens at home when no human is around.

There is a range of products that offer pet monitoring. Given I have an unused Raspberry Pi and a spare webcam lying around, I decided to DIY the solution.

Building a pet cam using a Raspberry Pi, a webcam and tailscale

As it turns out, this is easy: Take a Raspberry Pi, connect a webcam and install motion – an open-source tool that exposes the webcam stream over the local network.

To enable remote access, I used Tailscale, which creates a private network for all my devices, no matter where they are located physically.

This guide has great step-by-step instructions which take less than 30 minutes to complete: https://tailscale.com/kb/1076/dogcam/

One thing to note: Pick the right Raspberry Pi. I started off with the model B+ (from 2014) which is a little underpowered and even has issues running the current Raspberry Pi OS smoothly. Luckily, I also had a “Pi 3 Model B” sitting in a drawer which did the job just fine.

My final setup: A headless Raspberry Pi – no keyboard or display attached. Plug in webcam, ethernet and power, and the stream will start automatically.

Save snapshots when motion detected

The motion project comes with some handy features: Whenever motion is detected, it will save a snapshot frame and even short videos. These are stored on the Pi (under /var/lib/motion by default) and include the time of the event.

Even when I’m not watching the webcam stream, these recordings give me a summary of what the furballs were up to while I was gone.

Setting the trap
Spoiler: She ended up taking the bait
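
A summary of the saved snapshots could look something like the sketch below. It counts snapshot files per hour and relies on each file's modification time rather than on motion's file naming, which depends on the configuration. The function name `snapshots_per_hour` and the list of file suffixes are assumptions made for this example; only the directory /var/lib/motion comes from the setup described above.

```python
from collections import Counter
from datetime import datetime
from pathlib import Path


def snapshots_per_hour(directory, suffixes=(".jpg", ".jpeg")):
    """Count snapshot files per hour, based on each file's modification time.

    Returns a dict like {"2024-01-06 14:00": 2}, sorted by hour.
    A missing directory yields an empty dict.
    """
    root = Path(directory)
    if not root.is_dir():
        return {}

    counts = Counter()
    for path in root.iterdir():
        # Only look at image files; videos and other files are skipped.
        if path.is_file() and path.suffix.lower() in suffixes:
            taken = datetime.fromtimestamp(path.stat().st_mtime)
            counts[taken.strftime("%Y-%m-%d %H:00")] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Directory as mentioned in the text; adjust if motion is configured differently.
    for hour, count in snapshots_per_hour("/var/lib/motion").items():
        print(f"{hour}  {count} snapshot(s)")
```

Run on the Pi, this prints one line per hour in which at least one snapshot was saved, which is usually enough to see when the cats were active.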

Tailscale: Pretty cool

I hadn’t used it before, but Tailscale really was perfect for this case: The Raspberry Pi is one device in my virtual network. The other two I’ve added are my laptop and my phone.

Anytime I want to check the webcam stream I simply open the browser and access the Pi’s Tailscale IP, the private address Tailscale assigns to each device in the network.

Accessing the webcam from my laptop

On the iPhone, I added that URL to the home screen so the ominous toilet stream is always in reach.

Accessing the webcam from my iPhone
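
Before opening the browser, it can be handy to check from the laptop whether the Pi's stream is up at all. The sketch below only tests whether a TCP connection to the stream port succeeds. Port 8081 is, as far as I know, motion's default stream port, but treat it as an assumption and check `stream_port` in your motion configuration. The helper names `stream_url` and `is_reachable` are made up for this example, and the host has to be your own Pi's Tailscale address.

```python
import socket
import sys


def stream_url(host, port=8081):
    """Build the URL to open in the browser (port 8081 is an assumed default)."""
    return f"http://{host}:{port}/"


def is_reachable(host, port=8081, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python check_cam.py <tailscale-ip-of-the-pi>")
    else:
        host = sys.argv[1]
        if is_reachable(host):
            print(f"Stream should be available at {stream_url(host)}")
        else:
            print(f"Could not connect to {host}; is the Pi online?")
```

A successful connection only shows that something is listening on that port, not that the camera delivers a picture, so this is a quick first check rather than a full health test.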

Next level: Five eyes

The initial setup was easy. I now have one webcam that I can place anywhere (anywhere the ethernet cable reaches, that is). A set of these would be cool so that I could monitor all movement in the flat.

My ambitions to build the next NSA for cats are limited though, so I’ll probably stick to a single cam and point it at one key location. Right now it’s looking at the litter box and I’m working on a spreadsheet to plot the bowel movements of the little one. Uhm, yeah.

The new oil

Another idea for the next step: Collect the visual data over time and run some vision algorithms. How often do they eat? Do they really sleep 16 hours a day? Who spends more time in each room? All of them cool ideas (which I’ll never implement, let’s be honest).
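
A much simpler first step than real vision algorithms would be to work with the times of the motion events alone. The sketch below merges events that lie close together into one "visit" and counts visits per day, which would roughly answer "how often do they eat?" if the camera pointed at the food bowl. The function names, the two-minute gap and the idea of treating a burst of events as one visit are all assumptions for illustration, not something motion provides.

```python
from datetime import datetime, timedelta


def group_into_visits(timestamps, max_gap=timedelta(minutes=2)):
    """Merge motion events that are close together into (start, end) visits.

    Two consecutive events belong to the same visit if they are at most
    max_gap apart.
    """
    visits = []
    for ts in sorted(timestamps):
        if visits and ts - visits[-1][1] <= max_gap:
            # Extend the current visit to include this event.
            visits[-1] = (visits[-1][0], ts)
        else:
            # Gap too large (or first event): start a new visit.
            visits.append((ts, ts))
    return visits


def visits_per_day(visits):
    """Count visits per calendar day, keyed by ISO date of the visit start."""
    counts = {}
    for start, _end in visits:
        day = start.date().isoformat()
        counts[day] = counts.get(day, 0) + 1
    return counts


if __name__ == "__main__":
    # Made-up event times, purely for demonstration.
    events = [
        datetime(2024, 1, 6, 8, 0, 0),
        datetime(2024, 1, 6, 8, 0, 30),
        datetime(2024, 1, 6, 8, 1, 10),
        datetime(2024, 1, 6, 12, 30, 0),
        datetime(2024, 1, 7, 7, 45, 0),
    ]
    print(visits_per_day(group_into_visits(events)))
```

The event times could come from the modification times of the saved snapshots. Telling the two cats apart would still need actual image analysis.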

This was a quick project. I can now check in on my cats when I’m out and about. A Saturday afternoon well spent.