I am an independent software engineer specialising in building open-source software for scientists, with a focus on large-scale distributed storage and processing.
I currently work on Cubed, an open-source project that I founded for processing large multi-dimensional array datasets, and sgkit, a toolkit for scalable genetics that uses PyData projects like NumPy, Xarray and Zarr.
Previously, I was an early engineer at Cloudera, where I worked on numerous projects in the Apache Hadoop big data ecosystem. I wrote the bestselling book “Hadoop: The Definitive Guide”, published by O’Reilly.