Databricks, the company behind the big data processing and analytics engine Apache Spark, has contributed the open source machine learning platform MLflow to the Linux Foundation. Matei Zaharia, the creator of the Apache Spark and MLflow projects, made the announcement in his keynote at the recent Spark AI Summit 2020, which was held as a global virtual event.
MLflow was created to help data scientists and developers with the complex process of ML model development, which typically includes the steps to build, train, tune, deploy, and manage machine learning models. It manages the entire ML lifecycle, from data preparation to production deployment, including experiment tracking, packaging code into reproducible runs, and model sharing and collaboration, and is designed to work with any ML library.
Zaharia said the move to contribute MLflow to the Linux Foundation is an invitation to the machine learning community to incorporate the best practices for ML engineering into a standard platform that is open, collaborative, and end-to-end. The Linux Foundation provides a vendor-neutral home with an open governance model to help promote adoption of, and contributions to, the MLflow project.
Michael Dolan, VP of strategic programs at the Linux Foundation:
"The steady increase in community engagement shows the commitment data teams have to building the machine learning platform of the future. The rate of adoption demonstrates the need for an open source approach to standardizing the machine learning lifecycle."
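MLflow currently offers four components: MLflow Tracking for recording and querying experiments, MLflow Projects for packaging code into reproducible runs, MLflow Models for packaging and deploying models, and the Model Registry for managing model versions.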
MLflow has built-in integrations with several deep learning and AI frameworks and services, such as TensorFlow, PyTorch, scikit-learn, H2O.ai, Amazon SageMaker, and others. Several organizations also use and contribute to MLflow, including Microsoft, Splice Machine, the University of Washington, and Accenture.