The Importance of Reproducibility in Machine Learning Applications

Written by Alan Estes, Director, DecisivEdge™

When science’s reproducibility crisis hit, its impact went far beyond creating distrust. When other researchers repeated a large number of published experiments, they could not obtain the same results.

For scientists, this meant wasted time and money. For a company, a reproducibility crisis involving machine learning projects would be the equivalent of a “fiscal Armageddon.” But your company can avoid that outcome.

Understanding reproducibility

Reproducibility with respect to machine learning means that you can repeatedly run your algorithm on certain datasets and obtain the same (or similar) results on a particular project. This process encompasses design, reporting, data analysis and interpretation.
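
As a minimal sketch of this idea, assume a toy training routine. The function train_toy_model and its data are hypothetical and stand in for a real model run. Fixing the random seed is one common way to make repeated runs return identical results:

```python
import random


def train_toy_model(data, seed):
    """Hypothetical stand-in for a training run: shuffle, split, 'fit' a mean."""
    rng = random.Random(seed)      # local generator, so no hidden global state
    rows = list(data)              # copy so the caller's data is left untouched
    rng.shuffle(rows)
    cut = int(len(rows) * 0.8)     # 80/20 train/holdout split
    train, holdout = rows[:cut], rows[cut:]
    prediction = sum(train) / len(train)   # the "model" is just the training mean
    error = sum(abs(x - prediction) for x in holdout) / len(holdout)
    return {"prediction": prediction, "holdout_error": error}


data = [float(i) for i in range(100)]   # made-up dataset
first_run = train_toy_model(data, seed=42)
second_run = train_toy_model(data, seed=42)
print(first_run == second_run)  # True: same data and same seed give the same result
```

Real projects usually have more sources of randomness than this (library-level generators, parallel execution, hardware), so a fixed seed is a starting point rather than a guarantee.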

In a perfect world, the inner workings of a machine learning project should be the embodiment of computational transparency. However, it’s not always clear if a machine learning project is reproducible.

Data changes, different software environments or versions and numerous other small variations can cause a project to fail to reproduce its earlier results. Part of this problem stems from a poor approach to machine learning implementations, such as when an outside company develops the project but keeps the client in the dark about it.
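
One way to make such variations visible is to record the environment and a fingerprint of the data alongside every run. The sketch below uses only the Python standard library; the function names, the sample data and the package name are assumptions made for illustration:

```python
import hashlib
import json
import platform


def fingerprint_data(raw_bytes):
    """Return a SHA-256 hex digest; any change to the data changes the digest."""
    return hashlib.sha256(raw_bytes).hexdigest()


def describe_run(raw_bytes, package_versions):
    """Collect the details a later run needs to be compared against this one."""
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "packages": dict(sorted(package_versions.items())),
        "data_sha256": fingerprint_data(raw_bytes),
    }


# Hypothetical inputs for illustration only.
original = b"id,balance\n1,100\n2,250\n"
modified = b"id,balance\n1,100\n2,251\n"   # a single value has changed
packages = {"example-ml-lib": "1.2.3"}     # made-up package name and version

run_record = describe_run(original, packages)
print(json.dumps(run_record, indent=2))
print(fingerprint_data(original) == fingerprint_data(modified))  # False: data changed
```

Storing a record like this with each set of results lets a team check whether a differing result traces back to the data, the software versions, or something else.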

The importance of reproducibility

Reproducibility adds value to any continuous integration or continuous delivery cycle. It allows these activities to proceed smoothly, so in-house changes and deployments for clients become routine and not a nightmare.

Reproducibility helps your teams reduce errors and ambiguity when projects move from development to production. It also supports data consistency, which is hard to verify when no one is sure that the machine learning project's results are actually correct.

Also, a reproducible ML application is naturally built to scale with your business growth. The attention paid to architecting and coding the pipeline properly helps it meet the (hopefully) growing demand for speed and volume of model execution.

Moreover, reproducibility creates trust in and credibility for the ML product. The ML pipeline conveys the assurance that the project was properly designed, built, and deployed. That assurance cascades through project stakeholders (strategy teams, policy teams, operations, design, compliance/risk), giving them confidence in the ML component of a project. The pipeline framework communicates the transparency an organization requires.

Regulatory agencies are implementing rules governing business use of machine learning. Heavily regulated industries already understand the benefits derived from the consistency, reliability and transparency that come from developing reproducible machine learning projects. Previously unregulated industries now find they are suddenly responsible for a tremendous amount of documentation and codebase refactoring.

Building reproducibility into machine learning

Machine learning projects should begin with reproducibility in mind. It should be applied to every aspect of the project. Everything from the software and environment to the development and deployment requires a mindset of reproducibility.

One crucial step in creating reproducible projects is to ensure that documentation starts on day one. The documentation should explain why certain choices were made and record the range of important details needed to successfully execute the project, which is what Philip Stark refers to as “preproducibility.” It should also track proposed hypotheses, experiments and outcomes.
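
Tracking of this kind can be as simple as one structured record per experiment. The following is a rough sketch, not a prescribed format: the ExperimentRecord fields and the sample values are hypothetical, and a team would adapt them to its own process.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ExperimentRecord:
    """One entry in a project log; the field names are illustrative assumptions."""
    hypothesis: str
    experiment: str
    outcome: str
    rationale: str   # why this choice was made


def to_log_line(record):
    """Serialize a record as one JSON line so entries can be appended to a file."""
    return json.dumps(asdict(record), sort_keys=True)


# Placeholder values, not real results.
record = ExperimentRecord(
    hypothesis="(example) a new feature improves holdout performance",
    experiment="(example) retrain with and without the feature, same seed and data",
    outcome="(example) to be filled in after the run",
    rationale="(example) the feature was suggested by the operations team",
)

line = to_log_line(record)
restored = ExperimentRecord(**json.loads(line))
print(restored == record)  # True: the entry survives a round trip through JSON
```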

Team members should also build and deploy with a pipeline mentality. This does not have to mean a single start-to-finish codebase that executes for each hypothesis or production model run, although it can. In practice, it typically takes the form of sequenced modules of code, each performing one step of model development or production scoring: for example, data acquisition, feature engineering, feature reduction (development only), candidate model tuning (development only), and scoring. The theme is that the output from one step is the input to the next. There is no single prescribed “pipelining” methodology or framework; choose a solution that fits your environment. The result is a codebase that can be adapted to production with minimal friction, can be tested and debugged consistently, can scale efficiently, and, most importantly, yields a transparent and reproducible machine learning application.
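
The sequenced-module idea can be sketched in a few lines of plain Python. This is an illustration under simple assumptions, not a recommended framework: the step functions, the sample records and the scoring rule are all made up, and a production scoring run would load a trained model rather than apply a fixed formula.

```python
def acquire_data():
    """Step 1 (data acquisition): hard-coded sample records for illustration."""
    return [{"income": 50.0, "debt": 10.0}, {"income": 80.0, "debt": 40.0}]


def engineer_features(records):
    """Step 2 (feature engineering): derive a debt-to-income ratio per record."""
    return [{**r, "dti": r["debt"] / r["income"]} for r in records]


def score(records):
    """Step 3 (scoring): a made-up rule standing in for a trained model."""
    return [{**r, "score": round(1.0 - r["dti"], 3)} for r in records]


def run_pipeline(steps):
    """Run the first step with no input, then feed each output to the next step."""
    first, *rest = steps
    result = first()
    for step in rest:
        result = step(result)
    return result


# The production path skips the development-only steps (feature reduction, tuning).
PRODUCTION_STEPS = [acquire_data, engineer_features, score]

scored = run_pipeline(PRODUCTION_STEPS)
print([r["score"] for r in scored])  # [0.8, 0.5]
```

Because each step is a separate function with a clear input and output, it can be tested on its own, and a development pipeline can reuse the same steps with extra ones inserted between them.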

Choose experts to drive success

Creating reproducible machine learning projects requires a major shift in thinking.

Considering a machine learning project? Contact me directly to learn how your business can find true value in our services and the help you need to succeed.

About the Author:

For over three decades, Alan Estes has been leveraging data to solve business problems, improve customer experiences, and automate complex tasks. A leader in the Data Science domain, he has successfully developed and deployed numerous machine learning solutions in an open‑source analytical tech stack. Alan has led enterprise projects in a variety of business functions across the financial services product life-cycle.

At DecisivEdge, Alan is responsible for setting the agenda and driving the sustainable growth of the Data Science practice, ensuring that the company’s data science professionals are devising efficient solutions for our clients and deepening client partner relationships.

Prior to joining DecisivEdge, Alan led the Data Science team for the Small Business Banking business at Capital One. He has also worked at Sallie Mae, K2 Financial (a consumer lending start‑up), Bank of New York, and FirstUSA/Chase, and has served as an economist on the Board of Governors of the Federal Reserve System.

Alan graduated from the University of Redlands with a BS in Economics and holds an MA in Economics from Virginia Tech.