S2.6.1 - SSIS Tutorial - Understanding Matching - Why did this score 84 on Individual Level?

Why did this score 84 on individual level?

Example 1:

Record A
Fullname: MISS L FOLKES
Company: CUTTING EDGE DIRECT
Add1: PHOENIX MILL LONDON ROAD
Add2: FAR THRUPP
Add3: STROUD
Add4: GLOUCS
Postcode: GL5 2UB

Example 2: 

Record B
First Name: Linda
Last name: Foulks
Company: Cutting Edge Direct Ltd
Add1: Phoenix Mill
Add2: London Road
Add3: Stroud
Add4: Gloucs
Postcode: GL5 2UB

And we’re trying to match them on name or company.

The first question we should be asking is whether they're even being compared in the first place. Remember that our matching works as a two-step process: we line the records up on the keys, then score them.

So we have 3 default keys:

Key                          | Record A        | Record B
mkname1 + mkpostout          | fylk + GL5      | fylk + GL5
mkname1 + mkphoneticstreet   | fylk + fymyksmy | fylk + lymtym
mkpostout + mkpostin         | GL5 + 2UB       | GL5 + 2UB

Because the records have the same-sounding last name and the same outbound section of the postcode, we were able to find them as a potential match on the 1st key.

But because they have the same-sounding last name but different phonetic street values (fymyksmy vs lymtym), we weren't able to find them on the 2nd key.

Finally, because both records have the same postcode, we were also able to identify the possible match on the 3rd key (although because we found it on the 1st, we're not going to compare it twice).

So we know it's being compared. Clients often wonder why something isn't scoring when the actual issue is that it wasn't being compared in the first place, so that should be the first thing you check.
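To make that first check concrete, here is a minimal sketch in Python of the "are they even being compared?" test. The key values are copied from the table above. The dictionary layout and the function names are purely illustrative assumptions and are not part of matchIT SQL.

```python
# Key values taken from the article's table; the data layout is hypothetical.
KEYS_A = {
    "mkname1 + mkpostout": ("fylk", "GL5"),
    "mkname1 + mkphoneticstreet": ("fylk", "fymyksmy"),
    "mkpostout + mkpostin": ("GL5", "2UB"),
}
KEYS_B = {
    "mkname1 + mkpostout": ("fylk", "GL5"),
    "mkname1 + mkphoneticstreet": ("fylk", "lymtym"),
    "mkpostout + mkpostin": ("GL5", "2UB"),
}


def shared_keys(keys_a, keys_b):
    """Return the names of the keys on which both records have identical values."""
    return [name for name, value in keys_a.items() if keys_b.get(name) == value]


def will_be_compared(keys_a, keys_b):
    """A pair is only scored if it lines up on at least one key."""
    return bool(shared_keys(keys_a, keys_b))


hits = shared_keys(KEYS_A, KEYS_B)
print(hits)                              # the 1st and 3rd keys line up
print(will_be_compared(KEYS_A, KEYS_B))  # True
```

If the list of shared keys comes back empty for a pair you expected to match, the pair never reaches the scoring step, which is why this is worth checking before looking at weights.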

So now that we know it's being compared, let's find out what it's going to score.

By default, the score is cumulative. We don't work on percentages; we try to look at it more like a human would than like a machine.

There are three main components to the score

Name
Address
Postcode

 

1) Name Scoring

With the name, we want to look at the mknormalizedname. As mentioned previously, we look at it as last/first/middle.

So we have:

Fullname : MISS L FOLKES

Firstname: Linda
Lastname: Foulks

Which normalise to:

FOLKES,L,

And

FOULKS,LYN,

 

There’s a matching matrix XML file that holds many predetermined decisions, so when we look at this from left to right we see this pattern:

 

The last names sound approximately alike.

The first initial is L, which is the first letter of Linda, so we assume that’s equal.

And neither has a middle initial, so this is both empty.

 

So

Last = Sounds Approx
First = Equal
Middle = Both Empty

If we look at the matrix, stored by default in
C:\matchIT SQL\config\matchingMatrices\individualLevel\namematchingmatrix.xml

and navigate through it, we see the pattern it follows and the associated score. So in this case the pattern scores possible, which is equal to 25 points.

Whereas if we had a Will Dayton and a Bill Dayton, it would be Equal, Equal, Both Empty, and score sure, which is 60 points.

Some advanced clients will replace the sure/likely/possible entries in the matrix with decimal values, to get more granularity in their results 
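As a rough illustration of how the matrix lookup behaves, the sketch below models it as a Python dictionary keyed on the (last, first, middle) pattern. It only contains the two patterns and the two point values mentioned above. The real namematchingmatrix.xml has many more entries and its own XML layout, which is not reproduced here, so treat the structure and names as assumptions.

```python
# Points per level, using only the values stated in this article.
LEVEL_POINTS = {"sure": 60, "possible": 25}

# Hypothetical in-memory stand-in for the name matching matrix:
# (last, first, middle) comparison results -> match level.
NAME_MATRIX = {
    ("Sounds Approx", "Equal", "Both Empty"): "possible",
    ("Equal", "Equal", "Both Empty"): "sure",
}


def name_score(last, first, middle):
    """Look up the pattern; return (level, points), or None if this sketch lacks it."""
    level = NAME_MATRIX.get((last, first, middle))
    if level is None:
        return None
    return level, LEVEL_POINTS[level]


print(name_score("Sounds Approx", "Equal", "Both Empty"))  # FOLKES,L, vs FOULKS,LYN,
print(name_score("Equal", "Equal", "Both Empty"))          # Will Dayton vs Bill Dayton
```

Replacing the level names with decimal values, as some advanced clients do, would correspond here to storing a number directly in the matrix instead of a level name.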

 

2) Address

When we score on the address, we look at the address lines as a whole; we don't explicitly match address1 to address1, address2 to address2, etc.

Record A:
Add1: PHOENIX MILL LONDON ROAD
Add2: FAR THRUPP
Add3: STROUD
Add4: GLOUCS

 

Record B:
Add1: Phoenix Mill
Add2: London Road
Add3: Stroud
Add4: Gloucs

We use our own proprietary algorithm that looks across these columns as a whole.

In this case, because Phoenix Mill and London Road are contained in both addresses and the towns and counties are the same, it scores 29. If less than 50% of it matched, we’d end up just throwing it out and not scoring the address at all.

Address scores range from 15-30 by default in the UK, or 20-40 by default in the US.

3) Postcode

In this case both records have the same postcode.

If you look at the weights under postcode, you’ll see that the score for a sure match is 30. You can access these weights from the findmatches or findoverlap tasks; each matching level has its own set of weights.

Sure:

Two records with the same postcode = sure = 30

Likely:

UB7 7PQ matching UR7 7PQ, where only the postin is the same = likely = 20

Possible:

M2 8HG matching M2 = possible = 15
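The three postcode levels can be sketched as a small Python function. The weights (30, 20, 15) are the defaults quoted above. The rules themselves are my reading of the three examples and are assumptions. In particular, "possible" is treated here as "same outbound part where at least one record has no inbound part", and any case the examples don't cover returns no level at all.

```python
# Default weights as quoted in this article.
POSTCODE_WEIGHTS = {"sure": 30, "likely": 20, "possible": 15}


def split_postcode(postcode):
    """Split 'GL5 2UB' into ('GL5', '2UB'); a missing inbound part becomes ''."""
    parts = postcode.upper().split()
    postout = parts[0] if parts else ""
    postin = parts[1] if len(parts) > 1 else ""
    return postout, postin


def postcode_level(a, b):
    """Classify a pair of postcodes; rules are inferred from the examples only."""
    out_a, in_a = split_postcode(a)
    out_b, in_b = split_postcode(b)
    if out_a and out_a == out_b and in_a and in_a == in_b:
        return "sure"      # same full postcode
    if in_a and in_a == in_b:
        return "likely"    # only the inbound part (postin) agrees
    if out_a and out_a == out_b and not (in_a and in_b):
        return "possible"  # outbound part agrees, an inbound part is missing
    return None            # not covered by the article's examples


for pair in [("GL5 2UB", "GL5 2UB"), ("UB7 7PQ", "UR7 7PQ"), ("M2 8HG", "M2")]:
    level = postcode_level(*pair)
    print(pair, level, POSTCODE_WEIGHTS[level])
```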

 

4) The Cumulative Score

So names are based on the matrix, address is based on an algorithm that looks at the address lines as a whole, and the postcode has its own separate rules.

Once we go through all 3 we add up the score

Name = 25
Address = 29
Postcode = 30

Total =  84
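The same arithmetic as a short Python sketch, using the three component scores from this example. The dictionary is just an illustrative way of holding the broken-out component scores, not a matchIT SQL structure.

```python
# Component scores for Record A vs Record B, as worked out above.
component_scores = {"name": 25, "address": 29, "postcode": 30}

# The individual-level score is simply the sum of the components.
total = sum(component_scores.values())
print(total)  # 84
```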

To get more insight into why your matches scored what they did, ensure you’re breaking out the component scores. It's an option in your findmatches/findoverlap task when you’re showing advanced options.
