Mortgage Data and Doc Processing

OCR is Yesterday’s Stale Bread


We used to hear a lot of chatter in the marketplace about “OCR this” and “OCR that” and how it’s the next best thing since sliced bread. Well, it may have been until someone came along and slathered some warm melted Kerrygold butter on it (this is not a paid advertisement) and added a healthy sprinkle of cinnamon – sounds much better, right?

Optical character recognition (OCR) is in fact legacy (“stale” for the sake of the analogy) technology. Sure, it has its purpose: turning images, which we know there are manyyy of in the mortgage industry, into text or readable content. But it cannot efficiently or effectively understand that content without assistance. It is also greatly challenged by variations in the patterns of the documents it is asked to read.

To explain it further, when ‘reading’ loan documents, rule- and template-based OCR tools count on data being found in approximately the same location on every document – which almost never happens. Complicating matters is the wide variation of data patterns found in most loan documents. As a result, these types of tools work best when identifying information on structured documents, like the URLA for instance, leveraging a template or a keyword search to find and extract information.
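To make the limitation concrete, here is a minimal sketch of a keyword-style extraction rule. The label "Loan Amount:", the sample text and the function name are all made up for illustration; this is not how any particular OCR product is implemented.

```python
import re
from typing import Optional

# Hypothetical rule: the loan amount is whatever follows the label "Loan Amount:".
LOAN_AMOUNT_RULE = re.compile(r"Loan Amount:\s*\$?([\d,]+(?:\.\d{2})?)")


def extract_loan_amount(ocr_text: str) -> Optional[str]:
    """Return the amount after the expected label, or None if the label is absent."""
    match = LOAN_AMOUNT_RULE.search(ocr_text)
    return match.group(1) if match else None


# Structured form: the label appears exactly where and how the rule expects it.
structured = "Borrower: Jane Doe\nLoan Amount: $250,000.00\nProperty: 12 Main St"

# Free-form wording: same fact, no matching label, so the rule comes back empty.
unstructured = "I promise to pay the principal sum of $250,000.00 to the Lender."

print(extract_loan_amount(structured))
print(extract_loan_amount(unstructured))
```

The rule does fine on the form-like text and finds nothing in the free-form sentence, even though the same amount is sitting right there.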

OCR doesn’t work so well with low image quality docs, docs with a high degree of variation like a Note, or unstructured docs like gift letters. “Data-picking” on these documents is often incomplete and inaccurate, requiring a human “checker” to capture what the OCR technology missed. Hiring a staff of human checkers of course adds a tremendous amount of cost to the process, not to mention that time is money. And when you attempt to speed through it, accuracy suffers further.

It’s time to consider a fresh approach. Data and document processing technology has evolved well beyond OCR and looks more like warm cinnamon bread these days than a simple standard slice of fluffy white bread. More sophisticated providers, like LoanLogics, are using OCR-lifted content as a foundational layer before more advanced machine learning technology takes over.

Automated Document Recognition (ADR) leverages machine learning, continuously trained by multiple examples of the same document, to classify more documents, more accurately and at great speed, using automation alone. Similarly, in Automated Data Extraction technologies, special programs are developed to pinpoint, extract and structure data from documents using textual analysis, not a keyword search. The result of this automation is lightning-fast, accurate data that can be compared across a variety of sources (including structured and unstructured documents), making thousands of data elements available for a variety of intended use cases.
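As a rough illustration of learning from examples rather than hard-coding a keyword, here is a toy classifier that builds a word profile per document type and picks the closest one. It is a deliberately simplified sketch with made-up training snippets and hypothetical names, not a description of LoanLogics’ ADR or of any production machine learning model.

```python
from collections import Counter

# Made-up training snippets; a real system would learn from many full documents.
EXAMPLES = {
    "note": [
        "borrower promises to pay principal plus interest to the lender",
        "promise to pay the principal sum with interest at a yearly rate",
    ],
    "gift_letter": [
        "this money is a gift and no repayment is expected",
        "we are giving our daughter a gift toward her down payment",
    ],
}


def train(examples):
    """Build one word-count profile per document type from its example texts."""
    return {label: Counter(" ".join(texts).split()) for label, texts in examples.items()}


def classify(profiles, text):
    """Pick the document type whose profile shares the most words with the text."""
    words = Counter(text.lower().split())
    scores = {label: sum((words & profile).values()) for label, profile in profiles.items()}
    return max(scores, key=scores.get)


profiles = train(EXAMPLES)
print(classify(profiles, "the funds are a gift with no repayment required"))
```

No single keyword decides the outcome here; adding more examples of each document type changes the profiles, which is the basic idea behind training on multiple examples of the same document.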

The reason lenders, servicers, investors and insurers WANT the data off the documents in the first place is so they can make informed decisions. If it’s not accurate, what good is it to feed it into:

  • emerging AI-driven tools that can improve the borrower experience,
  • more sophisticated credit modules and underwriting decisions, or
  • processes that help reduce risk at acquisition or sale?

Sure, OCR has its advantages. It’s well priced for what it does but, anecdotally, it only gets you “40%” of the way there. That might explain why, on a recent industry webinar, a doc processing provider said they were using three different OCR engines to get the job done. Three! Something to consider when evaluating vendors this year… will it be sliced bread or something more appealing?

For more insight into how data and doc processing is evolving beyond traditional OCR tools, check out this 2020 LoanLogics MReport article, “The Capture 2.0 Revolution,” and/or watch the webinar “Man vs. Machine: How Machine Learning is Transforming Traditional OCR.”

Melissa DeBlasio

About the Author

Melissa DeBlasio

As part of the LoanLogics Account Management Team, Melissa DeBlasio is responsible for driving customer satisfaction, client retention, and ensuring the overall customer experience exceeds expectations. Prior to her role as Account Manager, Melissa spent five years as product manager for the company's LoanHD® Investor Module for Correspondent Loan Acquisition solution. In this role she was responsible for the product's market research, planning, strategic product decisions, including its roadmap, and implementation.