Written by Keith Oelkers, Data Science Practice Lead and Zane Murphy, Data Scientist at DecisivEdge™
When discussing data science projects with our prospective clients, we typically ask them about their model validation practices. This helps us gauge the reliability of their existing models as well as their potential to sustain new ones. Working with smaller institutions in the financial services industry, we often find that clients don’t give much thought to validation if their models effectively address the immediate task at hand. In this blog post, we will make the case for the importance of model validation at every scale by discussing the benefits of its implementation and exploring an extreme example of how detrimental unvalidated models can be.
Most institutions in the financial services industry have implemented statistical models to optimize a variety of business processes. Well-built, properly regulated models increase operational efficiency by providing data-backed insights that drive informed business decisions. In contrast, erroneous or unstable models can significantly harm the institutions that implement them, and thereby constitute a form of operational risk. Numerous guidelines have been issued to mitigate this risk, giving rise to a set of standardized practices. Perhaps the most important, yet most overlooked, of these practices is model validation.
What Is Model Validation?
Model validation is the component of modeling concerned with testing that a model's performance is stable and aligned with project goals. This includes justification of the model's assumptions and limitations, review of the build protocol, and ongoing testing and monitoring. Proper validation should be completed by a third party (that is, someone other than the model developer) to eliminate potential bias in the review process, and herein lies the reason it is so often overlooked. Finding analysts with the domain knowledge needed to confidently validate models is difficult, especially under budget and time constraints. For this reason, some institutions skip validation altogether, which can result in unstable or unreliable models.
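As a rough sketch of the kind of ongoing monitoring described above, a validator might compute a population stability index (PSI), a common check of whether the score distribution a model sees in production still resembles the distribution it was developed on. The function below is illustrative (bin counts and thresholds vary by institution, and this is not a prescribed standard):

```python
import math

def population_stability_index(expected, actual, n_bins=10):
    """Compare a model's development-time score distribution ('expected')
    against its current production distribution ('actual').
    Rule of thumb: PSI < 0.1 is stable; PSI > 0.25 signals a
    significant shift that warrants investigation."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / n_bins or 1.0  # guard against a zero-width range

    def bin_fractions(scores):
        counts = [0] * n_bins
        for s in scores:
            idx = min(int((s - lo) / width), n_bins - 1)
            counts[idx] += 1
        # floor empty bins at a tiny fraction so the log term stays finite
        return [max(c / len(scores), 1e-6) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A monitoring job might run this monthly against the development sample; identical distributions yield a PSI near zero, while a drifted portfolio pushes it upward.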
Why Is It Important?
Despite the additional hurdles it presents, model validation benefits all involved parties. Primarily, validation gives the modeling team a simple sanity check by catching oversights, providing additional insights, and verifying that performance is as reported. Validation also incentivizes the modeling team to document the build protocol thoroughly so that the validating team can accurately recreate it. In short, models that are transparently built, thoroughly reviewed, and reproducible perform better than those that are not. Validation provides a return on investment by preventing poor future performance and should therefore be valued far more than it currently is.
Model Validation and the Financial Crisis
A lack of proper validation procedures played a significant role in the financial crisis of 2008, when incorrect assumptions were made about the mortgage-backed securities market for which many models were developed. These assumptions went mostly unchecked, causing models to operate outside of their limitations. This led to inaccurate market predictions, resulting in misinformed purchasing decisions that ultimately threw the economy into a recession. Following the crisis, the Fed and OCC issued the Supervisory Guidance on Model Risk Management (SR 11-7), outlining their recommendations for building models that do not cause, and can withstand, economic turmoil. In particular, the guidance stresses the importance of third-party validation as an unbiased test of model performance.
One of the many standards that arose from the financial crisis is the current expected credit loss (CECL) methodology. CECL requires institutions to estimate lifetime expected losses on their loan portfolios, ensuring that banks reserve adequate funds in case an economic crisis on the scale of 2008 were to recur. Alongside CECL, mid-to-large-sized banks are required to subject their loss-provision models to DFAST and CCAR stress testing. This has given banks an entirely new incentive to have their models validated, as failing a stress test would show that their models are not sufficiently robust or sustainable.
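To make the stress-testing idea concrete, the toy sketch below uses the classic one-period expected-loss decomposition (probability of default × loss given default × exposure at default) and re-runs it under an adverse scenario. The portfolio figures and the 2.5× stress multiplier are invented for illustration; real CECL estimates are lifetime, scenario-driven, and far more involved:

```python
def expected_credit_loss(pd_, lgd, ead):
    """One-period expected loss for a single exposure:
    probability of default x loss given default x exposure at default."""
    return pd_ * lgd * ead

# A hypothetical two-loan portfolio (all figures invented for illustration).
portfolio = [
    {"pd": 0.02, "lgd": 0.45, "ead": 100_000},
    {"pd": 0.05, "lgd": 0.60, "ead": 250_000},
]

# Baseline provision estimate.
baseline = sum(expected_credit_loss(x["pd"], x["lgd"], x["ead"])
               for x in portfolio)

# Adverse scenario: default probabilities scaled up 2.5x (capped at 1.0),
# mimicking the spirit of a stress test on the same provision model.
stressed = sum(expected_credit_loss(min(x["pd"] * 2.5, 1.0), x["lgd"], x["ead"])
               for x in portfolio)
```

A validator would ask whether the reserve implied by `stressed` remains plausible and whether the model's assumptions still hold in that regime, which is precisely the check an unvalidated model never receives.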
Statistical models are becoming increasingly pervasive and complex with the improvement of data warehousing solutions and new analytic techniques. Some larger banks have created a Chief Data Officer (CDO) role charged with transforming company data into a strategic asset. With the current pace of work on machine-learning interpretability, it is reasonable to expect that organizations will begin deploying models now considered "black boxes" within the next decade. More institutions are realizing the value of analyzing their data every day, and it won't be long before model validation is standard practice in institutions of every size and in every sector.
If you are grappling with the issues outlined in this blog, are interested in sharing your experience and perspective with us, or would like to learn more about our insights and capabilities in this space, please reach out to Keith Oelkers directly by emailing email@example.com or calling (302) 299-1570 x410.
If you enjoyed this article, you might also enjoy ‘Data Science for Mid-Tier Banks & Specialty Lenders’.
About the Authors:
Keith Oelkers brings 20 years of experience in data science and statistical modeling. He understands the power of data science and big data and how they can be used to enhance the bottom line. Keith has spent his career building solutions that enable business leaders to make better decisions. He can break down complex statistical terminology and give organizations a clear vision of how statistical modeling and data science should be used.
Zane Murphy is a Data Scientist at DecisivEdge, where he builds statistical models and provides support for the development, configuration, and optimization of the analytics servers. He has several years of experience with statistical programming and data analysis through his work on bioinformatics projects in various labs while in college.
Zane is now focused on data science and analytics in the financial services domain and recently finished development of a loan response model for a direct marketing firm. He has an insatiable appetite for learning and is always seeking a more efficient solution to the problem at hand.
Zane holds a BS in Biology from the California Institute of Technology and is in the process of completing a Big Data Analysis certification course at the University of Delaware.