Recent developments in the Telecom sector have seen operators being mandated by the government to check cases of multiple connections on a single identity proof.
The underlying need is data de-duplication, set in the broader context of identity search: a specialized area where solutions have been around for decades, but where newer technology keeps setting new benchmarks.
Identity Search & Dedupe
Identity search implies finding records in your database to match a search query in the face of the inevitable manual and system errors that exist. In the Telecom context, this means the ability to identify multiple instances of the same individual amongst the mobile subscribers – individuals with multiple connections.
Dedupe is a specific application of identity search where all duplicates in a database are identified – each record in the database is checked for probable matches in the rest of the database and clustered according to the probability of match.
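The clustering idea described above can be sketched in a few lines of Python. This is only an illustration, not a production method: the function names `similarity` and `dedupe` and the 0.85 threshold are assumptions made for this example, and the standard-library `difflib` ratio stands in for a real match-probability score.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Rough similarity between two names, from 0.0 to 1.0 (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def dedupe(records, threshold=0.85):
    """Group records whose similarity to a cluster's first member meets the threshold.

    The threshold is an illustrative assumption, not a recommended value.
    """
    clusters = []
    for rec in records:
        for cluster in clusters:
            if similarity(rec, cluster[0]) >= threshold:
                cluster.append(rec)  # probable duplicate of this cluster
                break
        else:
            clusters.append([rec])  # no probable match found: start a new cluster
    return clusters


subscribers = [
    "Sandeep Mittal",
    "Sandip Mittal",
    "Priya Nair",
    "Sandeep Mital",
    "Priya Nair",
]
for cluster in dedupe(subscribers):
    print(cluster)
```

Comparing every record against every cluster grows quickly with database size, which is one reason naive approaches like this struggle on millions of records.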
Three situations where you need this:
Situation 1: Cleaning up your master database
Everybody with a customer or member database would like the records to be unique. If you have 200,000 records, does that mean 200,000 individuals, or just 150,000? An inflated member base leads to inflated marketing costs and poor ROI on direct campaigns, which makes de-dupe an increasingly critical step.
Situation 2: Merging databases for a campaign
The second situation is one that arises despite a clean master database, when you merge it with a partner database or a bought list for a campaign. Here the need is to check for duplicates in the merged database to prevent reaching out to the same individual twice in the campaign.
Situation 3: Record finding and checking
Apart from the one-off instances where databases need to be cleaned up, it is likely there will be more instances where you need to find a single record in the system – without the benefit of having a unique code to find that record.
For instance, you might want to check each new member at the point of enrolment against your existing database to catch possible overlaps before they even happen.
Error sources and De-dupe
Duplicates, and difficulties in identifying a record in a database, happen simply because errors are unavoidable: either the data on file is erroneous or your search name is incorrect.
Typical errors include different spellings (Sandeep vs. Sandip), missing words (Sandeep vs. Sandeep Mittal), extra words (Sandeep Mittal Mumbai vs. Sandeep Mittal), word order variations (Mittal Sandeep vs. Sandeep Mittal), system-induced errors such as truncations (Sandeep Mit vs. Sandeep Mittal), and other variations that are specific to a language or culture.
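To make these error types concrete, here is a small Python sketch showing that an exact comparison fails on each of them, while an order-insensitive approximate comparison still scores them as close. The helper name `token_similarity` is hypothetical, and the standard-library `difflib` ratio is used only as a convenient stand-in for a proper matching score.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case the name and sort its words so word order no longer matters."""
    return " ".join(sorted(name.lower().split()))


def token_similarity(a: str, b: str) -> float:
    """Order-insensitive similarity between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


pairs = [
    ("Sandeep", "Sandip"),                         # different spelling
    ("Sandeep", "Sandeep Mittal"),                 # missing word
    ("Sandeep Mittal Mumbai", "Sandeep Mittal"),   # extra word
    ("Mittal Sandeep", "Sandeep Mittal"),          # word order variation
    ("Sandeep Mit", "Sandeep Mittal"),             # truncation
]
for a, b in pairs:
    print(f"{a!r} vs {b!r}: exact={a == b}, approx={token_similarity(a, b):.2f}")
```

Every pair fails the exact test, yet the word order pair scores a full 1.0 once the words are sorted, and the others score well above unrelated names.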
What this means is that simple solutions such as finding exact matches in a spreadsheet are completely inadequate (though widely practiced) when it comes to de-dupe or identity search.
Common solutions and limitations
Exact name searches:
Hardly a perfect solution, this approach is likely to throw up very few results, as exact matches in a database are infrequent. Also, an exact match is not necessarily “better” than an approximate one.
Wild card searches:
Wild card searches such as Sandeep* or *Mittal* will help overcome some variations, but still do not take care of a number of error types. The bigger problem is that the result still depends on the user correctly selecting what text to include in the search. This approach also fails to take care of nicknames and abbreviations, whilst throwing up irrelevant results.
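A quick Python sketch using the standard-library `fnmatch` module illustrates both weaknesses. The sample names and the helper `wildcard_search` are made up for this example.

```python
from fnmatch import fnmatchcase

names = [
    "Sandeep Mittal",
    "Sandip Mittal",
    "Mittal Sandeep",
    "Sandeep Mit",
    "Sandeepa Rao",
]


def wildcard_search(pattern, records):
    """Return the records that match a shell-style wildcard pattern, ignoring case."""
    return [r for r in records if fnmatchcase(r.upper(), pattern.upper())]


# Misses the spelling variant and the reversed name, but picks up an unrelated person.
print(wildcard_search("Sandeep*", names))

# Catches the spelling variant and the reversed name, but misses the truncated record.
print(wildcard_search("*Mittal*", names))
```

Neither pattern alone finds all four variants of the same person, and the user has to guess which part of the name is reliable enough to type.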
Keying partial words:
Typing in SNDP MTL or SAN MIT instead of the entire word can help reduce data entry time, but this makes overcoming error and variation difficult.
Text retrieval techniques:
Text retrieval is relatively sophisticated and can throw up good matches, especially when it is combined with phonetic algorithms, but it typically requires large computing resources and is time consuming.
Soundex:
Patented in 1918 by Robert Russell, this method indexes names based on a phonetic algorithm. It converts a name into a letter followed by three digits.
E.g. SANDEEP = S531
Soundex helps convert error-prone data into consistent index codes. However, it is not a perfect system either, as the algorithm needs to be tailored to local nuances.
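For readers who want to see the mechanics, below is a minimal Python sketch of the commonly used American Soundex rules. It is assumed, not confirmed, that this is the variant the article has in mind; the function name `soundex` is chosen for this example.

```python
def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    groups = {
        "BFPV": "1",
        "CGJKQSXZ": "2",
        "DT": "3",
        "L": "4",
        "MN": "5",
        "R": "6",
    }
    codes = {ch: digit for letters, digit in groups.items() for ch in letters}

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]                 # the first letter is kept as-is
    prev = codes.get(letters[0], "")    # its code still suppresses an adjacent repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate letters with the same code
        code = codes.get(ch, "")        # vowels and Y have no code
        if code and code != prev:
            result += code
        prev = code                     # a vowel resets prev, so a repeat after it counts
    return (result + "000")[:4]         # pad with zeros or truncate to four characters


print(soundex("SANDEEP"))  # S531
print(soundex("Sandip"))   # S531
print(soundex("Vikram"), soundex("Bikram"))  # V265 B265
```

Sandeep and Sandip collapse to the same code, which is the intended benefit. The last line hints at the local-nuance problem: because the first letter is kept unchanged, two spellings that differ only in their initial letter receive different codes.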
Other systems:
A number of other de-dupe and identity search methods exist, and newer software not only overcomes the inherent limitations of the methods described above, but also allows complex operations to be carried out on databases running into millions of records.
Winding Up
This article gives a quick overview of the need for de-dupe and identity search, highlighting some of the limitations of popular solutions. If you wish to find out more about solutions capable of handling millions of records with the most powerful identity search algorithms, contact us!