Enterprise Grade

Background

A financial services firm is struggling to meet strategic goals of growth and customer satisfaction. Management consultants are brought in, and they quickly pinpoint the biggest contributing factor to the problem: the poor performance of information systems used by telephone sales representatives.

More specifically, locating an account in the CRM application is difficult and often takes up to a minute to complete, which is a long time when a customer is waiting on the line. The search function is sluggish and offers no tolerance for typos, often forcing sales representatives to ask customers to spell their name - a terrible way to start a sales pitch or to soothe an angry customer.

Following this discovery, a decision is made to revamp the information systems, starting with the CRM application. An ambitious target is set: sales representatives must be able to locate customer accounts within 10 seconds, and any given search result must come back within one second. The new capabilities must also be rolled out without requiring additional training.

Initial approach

The firm relies heavily on a commercial RDBMS to power its information systems. They decide to engage the database vendor in their effort to speed up the search capabilities of the CRM application.

The vendor sends its best database engineers, who tweak and optimize every bit of the data access layer, sparing no expenses to improve things:

Upgrade the SAN to an all-flash disk array
Double the size of the database cluster
Cache all relevant database tables in memory
Create countless database indexes and materialized views to speed up queries
Optimize traffic patterns on the network card and storage adapters of all physical servers

A staggering amount of money later, the search function still takes over 20 seconds to return results, and still doesn't forgive typos made by users.

The problem with text search

To speed up queries, most RDBMS rely on indexes. The most common type of index is called B-tree.

A B-Tree is a great indexing strategy in general. It stores values or keywords in a tree format, where each node represents the middle point of its two children. Locating a piece of information in that type of tree is much faster than scanning the entire table one item at a time.

There is one important caveat: for this type of index to work, the entire value (or at least the beginning) must be known. For instance, if the goal is to find names that contain "ohn", a B-tree index won't help locate "Johnson" faster than a table scan.

This explains why sales representatives searching for a customer by name in the CRM application experience bad results, regardless of the number of B-tree indexes created by the database engineers.

Solutions in need of a problem

The performance of the CRM application still doesn't meet the requirements and pressure is mounting inside the organization. The database vendor engagement is terminated, and the management team starts looking for other solutions.

As if often happens in large organizations, sharks can smell blood in the water. Pet projects and failed R&D initiatives are suddenly touted as the ideal solution to the problem:

Let's switch to a different database vendor like we've suggested since 2003
Let's buy that shared-nothing database appliance we considered last year
Let's follow the advice of the innovation team and move everything to the cloud
Let's use the Hadoop data lake built by the Business Intelligence division
Let's implement AI like we did in Project Millennial

This is the point where we become involved with the firm.

Fuzzy search to the rescue

To seasoned IT consultants, the requirements for the search function in the CRM application are not outlandish. Sales representatives must be able to:

Get search results within one second
Locate customer accounts within 10 seconds
Use their existing skills, with no additional training

This is a textbook use case of fuzzy search, and as it happens, this type of solution is supported by the existing commercial RDBMS used in the CRM application. It simply hasn't been enabled, which remains a mystery given the small army of database engineers sent by the vendor to solve this specific problem. A runner-up would have been to utilize phonetic search capabilities (such as Metaphone or Soundex), which are also supported by the product.

In addition to leveraging the native capabilities of the existing database, we outline a few alternatives to the client:

An open-source enterprise search platform, such as Apache Solr
A document-oriented NoSQL database that has powerful text indexing capabilities, such as MongoDB
A container-based serverless platform such as OpenWhisk, with custom containers that leverage pattern-matching algorithms such as the Levenshtein distance

Slightly weary at the idea of engaging the database vendor again, the client indicates a preference for the Solr alternative, which is a known platform in the organization. Leaving nothing to chance, we prepare a benchmark to outline the performance of Solr, running a large corpus of real-life queries logged by the CRM application. The results are spectacular - most queries are completed within 50ms - and quite stable, even under heavy load.

Mission accomplished

Since it utilizes standard communication protocols, the client is able to quickly integrate the Solr platform with the CRM application. First reports indicate that customer accounts are now located well within the 10 second limit requirement.

In search of better performance