Software Engineering for Machine Learning Systems

Software Engineering for Machine Learning Systems is a course first taught at Imperial College London in Spring 2024. It covers the engineering concepts required to build and operate robust and trustworthy machine learning systems, while leaving the theory of the models themselves to other courses.

The three-month course is structured around a real-world example from the medical domain. Alongside lectures, students design and train a model to predict acute kidney injury from synthetic blood test results, before working in groups to build a system around that model to alert clinicians in a simulated environment. Throughout the course, we use real data standards (e.g. HL7) and deployment infrastructure (e.g. Kubernetes), rather than simplifications.

When delivering the course at Imperial, we schedule 2 hours a week of lectures together with 2 hours a week of unstructured lab time, in which groups can work on their coursework with support from the lecturer and teaching assistants.

The infrastructure required to deliver the course is available under the Apache license. The resources below are provided under the CC BY-SA 2.0 deed (though photos are covered separately when noted).

Imperial Spring 2024 Slides & Coursework

These lecture slides and coursework specifications were used for the first delivery of the course. The coursework specifications refer to infrastructure (k8s clusters, container registries, etc.) that isn't accessible outside Imperial, so they would need adapting for use in another context.

Week 1: Introduction

Week 2: Training

Week 3: Inference & Testing

Week 4: Deployment

Week 5: Reliability & Monitoring

Week 6: Ethics & Society (featuring guest lecturer Mari)

Week 7: Design

Week 8: Optional competition

During the lab session, run your system in an environment featuring non-trivial failures. The team with the highest score at the end of the session will win amazing prizes.

Credits

This course was inspired by the time I spent working with the amazing engineering teams behind DeepMind Health and Google Maps. Rob Chatley at the Imperial Department of Computing encouraged me to create the course, and taught me everything I know about lecturing. Mari wrote and delivered the ethics content. Cían ensured the course bears some resemblance to clinical reality. Carmen provided insight and thoughts on the supporting infrastructure. Mili and Alberto were the fearless teaching assistants for the first delivery.

Christian Kästner's Machine Learning in Production, taught at CMU, and Chip Huyen's Machine Learning Systems Design, taught at Stanford, were invaluable references while preparing the course.