A large national bank hired DecisivEdge to use natural language processing (NLP) to assign call reasons from unstructured call text and to enable strategies that were previously out of reach. The solution also automated call topic labeling, which had previously relied on inaccurate or missing data supplied by call center agents.

The client required the solution to be data-driven but not model-driven. Using a TF-IDF (term frequency-inverse document frequency) framework, DecisivEdge met these requirements in four steps:

  • Define categories of interest: Created a comprehensive list of hierarchical call reasons based on feedback from several corporate stakeholders.
  • Build a dictionary (lexical resource): Created a rule repository of significant words and phrases, in dictionary form, to capture call reasons.
  • Create conditional logic: Some words and phrases define a call reason on their own; others act as ancillary evidence whose presence adds weight to the categorization.
  • Validate: After classifying call reasons, manually assessed classification performance.
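
The conditional logic described above can be sketched roughly as follows. This is a minimal illustration, not the client implementation: the reason names, phrases, and weights in RULES are hypothetical, as is the rule that ancillary evidence only counts once a defining phrase has matched.

```python
import re

# Hypothetical dictionary: every reason, phrase, and weight below is
# illustrative and not taken from the client's rule repository.
RULES = {
    "Card / Lost or stolen": {
        "defining": {"lost my card": 3.0, "stolen card": 3.0},
        "ancillary": {"replacement": 1.0, "freeze": 1.0},
    },
    "Payments / Late fee": {
        "defining": {"late fee": 3.0},
        "ancillary": {"waive": 1.0, "payment": 0.5},
    },
}


def score_reasons(transcript: str) -> dict[str, float]:
    """Score each call reason for one transcript."""
    # Lowercase, strip punctuation, and pad with spaces so that phrases
    # only match on whole-word boundaries.
    text = re.sub(r"[^a-z0-9 ]+", " ", transcript.lower())
    text = " " + " ".join(text.split()) + " "
    scores = {}
    for reason, rule in RULES.items():
        defining = [w for p, w in rule["defining"].items() if f" {p} " in text]
        if not defining:
            continue  # ancillary evidence alone never assigns a reason
        bonus = sum(w for p, w in rule["ancillary"].items() if f" {p} " in text)
        scores[reason] = sum(defining) + bonus
    return scores


scores = score_reasons("Hi, I lost my card yesterday and need a replacement.")
```

Here the defining phrase "lost my card" triggers the reason, and the ancillary word "replacement" raises its score without being able to assign the reason by itself.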


The data science team at DecisivEdge drew on its natural language processing experience with contact center operations and its financial domain knowledge to develop the solution from existing call transcripts.

The solution combined natural language processing with heuristic logic, mapping transcript text to meaningful call reasons through a lexical resource in dictionary form.

Three components were used to define the call reason logic and keep the solution reliable:

  • Dictionary: Plays a vital role in capturing call reasons. It was created from an extensive study of the call transcripts for each call reason.
  • Keyword identification: TF-IDF, a widely used term-weighting method, determined the importance of each word. The resulting score was also used to promote a word to a keyword.
  • Unigrams and bigrams: Used to identify the phrases that help define a call reason.

All three components were aligned with business understanding and logic.
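
As a rough illustration of keyword identification, the sketch below computes TF-IDF over unigrams and bigrams in plain Python. The sample transcripts, the promotion threshold, and the function names (ngrams, tfidf, promote) are assumptions made for this example; the source does not describe the exact weighting variant or cut-off that was used.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Return all unigrams and bigrams (up to n_max words) in a transcript."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def tfidf(docs):
    """Score every term in every document as tf * log(N / df)."""
    term_lists = [ngrams(d) for d in docs]
    # Document frequency: number of transcripts that contain each term.
    df = Counter(t for terms in term_lists for t in set(terms))
    n_docs = len(docs)
    scores = []
    for terms in term_lists:
        tf = Counter(terms)
        total = len(terms)
        scores.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return scores


def promote(doc_scores, threshold):
    """Promote terms whose score reaches a (hypothetical) threshold."""
    return {t for t, s in doc_scores.items() if s >= threshold}


# Made-up transcripts, for illustration only.
transcripts = [
    "i was charged a late fee on my account",
    "i lost my card and need a new card",
    "please waive the late fee on my card",
]
scores = tfidf(transcripts)
keywords = promote(scores[0], threshold=0.01)
```

A word such as "my" that appears in every transcript scores zero and is never promoted, while a bigram such as "late fee" that appears in only some transcripts scores above zero and becomes a candidate for the dictionary.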

To assess the effectiveness of the outcome, the following issues were addressed:

  1. False positives: Created primary and secondary reasons, each with three levels, to summarize the call reason efficiently.
  2. Lack of context: To account for the importance of surrounding words, we used n-grams to determine the validity of any given match.
  3. Validation: Conducted manually on a randomly selected sample.
  4. Adaptability: Created an object-oriented rule bank that can be updated independently of the operational process.
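
One way to picture such a rule bank is sketched below. It is an assumption-laden illustration: the Rule and RuleBank classes, the JSON layout, and the sample reasons and phrases are all hypothetical. The point is only that rules live in data that can be edited without touching the classification code, and that each call receives a primary and a secondary reason with three levels.

```python
import json
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    levels: tuple   # (level 1, level 2, level 3) of the reason hierarchy
    phrases: tuple  # unigrams or bigrams that indicate this reason

    def hits(self, text: str) -> int:
        """Count how many of this rule's phrases occur as whole words."""
        clean = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
        padded = " " + " ".join(clean.split()) + " "
        return sum(f" {p} " in padded for p in self.phrases)


class RuleBank:
    def __init__(self, rules):
        self.rules = list(rules)

    @classmethod
    def from_json(cls, payload: str) -> "RuleBank":
        """Build the bank from JSON so rules can change without code changes."""
        return cls(
            Rule(tuple(r["levels"]), tuple(r["phrases"]))
            for r in json.loads(payload)
        )

    def classify(self, text: str):
        """Return (primary, secondary) reasons ranked by phrase hits."""
        ranked = sorted(
            ((rule.hits(text), rule.levels) for rule in self.rules),
            key=lambda pair: pair[0],
            reverse=True,
        )
        matched = [levels for count, levels in ranked if count > 0]
        # In this sketch, a call with no matching rule gets None.
        primary = matched[0] if matched else None
        secondary = matched[1] if len(matched) > 1 else None
        return primary, secondary


# Hypothetical rule file contents.
RULES_JSON = """
[
  {"levels": ["Card", "Lost or stolen", "Replacement request"],
   "phrases": ["lost card", "stolen", "replacement card"]},
  {"levels": ["Fees", "Late fee", "Waiver request"],
   "phrases": ["late fee", "waive"]}
]
"""

bank = RuleBank.from_json(RULES_JSON)
primary, secondary = bank.classify(
    "My card was stolen, I need a replacement card, and I saw a late fee."
)
```

Because the bigram "replacement card" must match as a whole phrase, a stray mention of "card" alone does not count as evidence, which is the kind of context check that n-grams provide.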


DecisivEdge tested the new solution and confirmed that it outperformed the existing call reason tagging done by agents.

  • The new solution achieved a perfect match on call reasons for 63% of in-time data and 67% of out-of-time data.
  • The new solution achieved a close match (the existing reason matches the secondary reason) for 25% of in-time data and 21% of out-of-time data.
  • The new solution had a combined error rate of 12% (the remainder after perfect and close matches in both periods), compared with the 30% audited error rate of the legacy system.
  • The lift is also visible in calls with an NA/NULL reason: the new solution assigns at least one reason to every call, whereas the legacy process had an NA/NULL rate of about 27%.
  • Using data from two out-of-time validation periods, we tested the stability of the solution, which recorded consistent performance gains.
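
The perfect-match, close-match, and error rates can be computed along the following lines. This is a sketch under stated assumptions: the record layout (reference reason, predicted primary, predicted secondary), the match_rates function, and the four sample calls are hypothetical and do not reproduce the client data.

```python
import random


def match_rates(sample):
    """Compute match rates for (reference, primary, secondary) records.

    perfect: the reference reason equals the predicted primary reason.
    close:   the reference reason equals the predicted secondary reason.
    error:   everything else.
    """
    n = len(sample)
    perfect = sum(ref == pri for ref, pri, sec in sample)
    close = sum(ref != pri and ref == sec for ref, pri, sec in sample)
    return {
        "perfect": perfect / n,
        "close": close / n,
        "error": (n - perfect - close) / n,
    }


# Made-up records, for illustration only.
calls = [
    ("late fee", "late fee", "card"),
    ("lost card", "lost card", None),
    ("late fee", "payment", "late fee"),
    ("balance", "lost card", "payment"),
]

# Manual validation works on a random sample; the seed is fixed only
# to make the draw repeatable.
sample = random.Random(0).sample(calls, k=3)
sample_rates = match_rates(sample)
all_rates = match_rates(calls)
```

The three rates always sum to one, which is why a 63% perfect match and a 25% close match leave a 12% error rate.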

In addition to these results, DecisivEdge provided the client with several proof-of-concept strategies to further improve the project's ROI.

The new solution is currently in production, and a Tableau dashboard is being developed for daily monitoring.