Support DESK

Follow

S2.7.1 - SSIS Tutorial - What are all these key fields?

Previous Article matchIT SQL Index Next Article

So if you’ve set up a find matches or find overlap, you’ve seen that the keys were chosen for you depending on your matching level.

For starters, you can select a key up top, then view the drop down below
keyfields2.png


You’ll notice it lists both key fields and input fields

The key fields all start with an mk*, whereas the input fields refer back to the fields you mapped back in generate keys,

90% of our clients are fine using the default keys, maybe with a minor tweak such as adding a telephone or email to the keys

So why don’t we take a couple records and see what kind of keys they generate.

First we’re going to start with the name related fields

They consist of

mkNormalizedName – Normalised Name
mkName1 – phonetic last name
mkName2 – phonetic first name
mkName3 – phonetic middle name
mkNameKey – phonetic last name plus initial
mksuffix - Suffix
mkGender – Gender

So lets say we have two records that we’re matching

Record A

Fullname : MISS L FOLKES
Company: CUTTING EDGE DIRECT     
Add1: PHOENIX MILL LONDON ROAD
Add2: FAR THRUPP
Add3: STROUD
Add4: GLOUCS
Postcode: GL5 2UB

Record B:

Firstname: Linda
Lastname: Foulks
Company: Cutting Edge Direct Ltd
Add1: Phoenix Mill
Add2: London Road
Add3: Stroud
Add4: Gloucs
Zip: GL5 2UB

These are obviously the same person, but its not so easy to line up considering the concatenated data in Add1 in the first address and the additional Far Thrupp element, which is why we have the keys.

Mknormalizedname is the normalised version of the name, so even though its first/last and fullname in another, it will be more consistent here.  When we’re matching on name we put the emphasis on the last name, then the first name, then the middle name. we also will convert it to its nickname.

This field is important because when we do the scoring on name, this gives you insight into how the software sees the name.

So MISS L FOLKES becomes

FOLKES,L,

And  

Firstname: Linda
Lastname: Foulks

Becomes 

FOULKS,LYN,

mkName1 -

This is the phonetic of the last name – we follow the same order as the normalised name, that’s why name1 is the last name, because its what we are putting the most emphasis on.

Both have the same phonetic of

fylk

we use our own proprietary algorithm to generate these phonetic keys, the main thing to point out here is the y represents a vowel sound

mkName2 –

For the L, we’re just carrying through the first initial

For Linda we end up with a phonetic first name of lym

For the L, we’re just carrying through the first initial

 

mkName3 –

This is the phonetic of the middle name, in this case though, neither record has a middle name

mkNameKey – phonetic last name plus initial

in this case, the name key is the phonetic of the last name, and the first initial

its fylkL for both records

mksuffix

if we detect a suffix of some sort, that’s going to go here, commonly Jr, Sr,  2nd, 3rd,   etc

mkGender – Gender

This is the inferred gender, we use our names and words lexicon that has a list of common first names and their inferred gender, so we know for sure that Linda is female, and the MISS L FOLKES, the Miss is a common female indicator, so we will return one of four codes
M = Male

F = Female

E = Either  (Sam smith – could be short for either Samantha or Samuel)

X = conflicting ( ie:  Mr Christina smith )

So that covers the name fields

Now for company related fields,

Lets go through the list

mkNormalizedOrganization
mkOrgname1
mkorgname2
mkorgname3
mkorganizationkey

mkNormalizedOrganization

This field is important because when we do the scoring on name, this gives you insight into how the software sees the name.

Let's say we have two companies


The Palmer Air Charters service company

PAC inc

With the first example, by default we see ‘The’ as a Noise type word, and strip it out, we see  ‘service’ and ‘company’ as business type words and strip those out as well.  You can manipulate what we ignore or don’t ignore by reviewing the changes suggested in the company only matching video.

With companies we look at them left to right, as opposed to names where its last, first, middle.

So the first example becomes PALMER,AIR,CHARTERS

In the second example, we just recognise inc. as a business word, and strip that out, seeing the PAC in all caps, by default we assume it’s an acronym, so we simply end up with

PAC,,     with blanks for the 2nd and third words

mkOrgname1

in this case for the first example, it’s the phonetic of palmer

pymy

for the second example, since we recognised it as an acronym, we just pass through the PAC

mkorgname2

y   

in this case the phonetic of Air – which is basically one giant vowel sound

blank for the second example, because we’re ignoring ‘inc.’

mkorgname3

in this case, the phonetic of Charters

Fyty

And just blank for the third example

mkorganizationkey

a concatenation of the first 5 characters of mkorgname1 + first 5 of mkorgname2

This is commonly confused with the normalised organization, if you want your matches to be much tighter, you can use mknormalizedorganization instead  or just mkorgname1 + mkorgname2 separately, 

Address related fields

mkPremise
mkphoneticstreet
mkphonetictown
mkAddresskey
mkflatno
mkpostout
mkpostin

so lets take these addresses

A)

Add1: PHOENIX MILL LONDON ROAD
Add2: FAR THRUPP
Add3: STROUD
Add4: GLOUCS
postcode: GL5 2UB

B)

Add1: Phoenix Mill
Add2: London Road
Add3: Stroud
Add4: Gloucs
Postcode: GL5 2UB

mkpremise

There is no premise in this case but the house number isn’t always in the same spot so we look at the address lines as a whole to identify it

mkPhoneticstreet

Similar to the premise number, the street doesn’t need to be in the first address line to be identified. In this case, it has used the first two words 'Phoenix Mill' because it has identified the Road element and thinks the whole 'Phoenix Mill London Road' is the street, words like road/street/circle are generally noise words. In the second example, because the elements are split across two input fields, it identifies 'London' as the street. Noise words are identified because they are held in names.dat which by default doesn't contain 'Mill'. You can of course add it but since this is a potential Surname, it might be quite dangerous to do so. If you do add words to names.dat, please ensure you perform additional testing of the results to ensure there are no unforeseen consequences.

mkphonetictown

There is a file in C:\matchIT SQL\config\dataFiles\uk called Towns.dat which contains all the post towns in the UK, in this case we’ve identified Stroud because its in the list in Town.dat.

Mkaddresskey

By default, the phonetic street and town can both be up to 8 characters long, if you want to pare that town and be a bit looser, but still want to use a combination of both, then the address key is a concatenation of both, starting the first 4 characters of the phonetic town, then the first 4 characters of the phonetic street.

Mkflatno

A flat, or apartment number

Generally, if there’s some kind of numeric, or alpha numeric number that’s following the street

This one key is not very commonly used except in exact keys, more clients prefer changing settings like ‘must match premise’ and ‘no one empty premise’ to keep different apartments in the same building separate.

Mkpostout

For a UK Postcode “KT22 8DY”, the matchIT API populates mkpostout with “KT22”. Which may also be referred to as the  ‘outward part’

For US addresses, the mkpostout is the first five digits of the zip code

So with a zip of 28401-4273 , the 28401 would be in the mkpostout

 

Mkpostin

For a UK Postcode “KT22 8DY”, the matchIT API populates mkpostin with “8DY”, or the inward part of the full valid postcode

For US addresses, the mkpostin is the last digits of the 9 digit zip code, if just a 5 digit zip the mkpostin would be blank.

So with a zip of 28401-4273 , the 4273 would be in the mkpostin

Optional fields

mkTelAreaCode
mkTelLocalNumber
mkFaxAreaCode
mkFaxLocalNumber

These won’t be available unless you mapped a telephone or fax when you first set up your generatekeys task.  You can ensure these fields are populated by checking the advanced settings under generate keys.

 

In the case of UK phone numbers

For “01372 225 904” or “01372225904” the matchIT

API populates this with “01372”.

For “01372 225 904” or “01372225904” the matchIT

API populates this with “225904”.

 

In the case of US phone and fax numbers

For 914 – 600-7243

The 914  would be the area code part

The 6007243 would be the local number part

 

New Company Fields added with 2.3.1:

For consistency with the Real Time Processing for SQL Server deployment new keys were added, in the output tab of generatekeys if you click advanced, then you can pick additional key columns to output.

Lets start with company fields - for this exercise we'll still use

The Palmer Air Charters service company

PAC inc

 

mkNormalizedOrg1

Unlike the phonetic of the company(mkOrgName1) - this is instead the normalized version - think of this as a parsed out version of mkNormalizedOrganization

So for 'The Palmer Air Charters service company' , it will be 'Palmer' - as we're ignoring 'The' as a noise word so 'Palmer is the first significant word

For 'PAC inc'  it will be 'PAC' as we recognized that as an acronym

mkNormalizedOrg2

This is the Second significant word from mkNormalizedOrganization

So for 'The Palmer Air Charters service company' , it will be 'Air' - as we're ignoring 'The' as a noise word so 'Air' is the Second significant word

For 'PAC inc'  it will be blank as we recognize 'inc' as a noise word

 

mkNormalizedOrg3

This is the Third significant word from mkNormalizedOrganization

So for 'The Palmer Air Charters service company' , it will be 'Charters' - as we're ignoring 'The' as a noise word so 'Charters' is the Third significant word

For 'PAC inc'  it will be blank as there was just the initial acronym

mkOrgAcronym

This is the acronym we've derived from the first 2 or 3 significant words in the company name

For both examples the acronym will be 'PAC'   as we carried through the first letters from 'Palmer Air Charters' and recognised PAC as an acronym already in the other record, this may make it easier to line up some outliers when the address isn't as reliable because of vanity names for cities, which would throw off the addresskey that is generated.

If the company is a single word, an acronym will not be generated for it as a single letter, if a company has more than 3 significant words, an acronym is only generated off the first three.

 

mkAlphabeticOrganizationWordA

This is the first alphabetic Organization word- this may allow you to identify transpositions in company names when the order of the words isn't as consistent for some examples.

for our 'The Palmer Air Charters service company' example it actually is Charters, as air wasn't considered significant as it was less than 4 characters, for 'PAC inc' it was blank as the acronym isn't considered when populating this field.

mkAlphabeticOrganizationWordZ

This is the Last alphabetic Organization word- this may allow you to identify transpositions in company names when the order of the words isn't as consistent for some examples.

For our 'The Palmer Air Charters service company' example it actually is Palmer, as Service wasn't considered significant as it was a Noise type word lookup in the names and words, for 'PAC inc' it was blank as the acronym isn't considered when populating this field as it also falls under the 4 character minimum.

 

 

New Name Fields added with 2.3.1:

 

mkFirstNameIni

The initials of the normalised first name

For  'Mr. W. R Dayton Jr' , as well as for 

Firstname: Bill
Lastname: Deighton

it would be 'W' , as we normalise Bill to Will then take the initial of the normalised version.

mkNormalizedFirstName

This is the normalized first name

For  'Mr. W. R Dayton Jr' , it would be 'W'

For 

Firstname: Bill
Lastname: Deighton

It would be  'Will' , as Bill normalised to Will

 

 

mkNormalizedMiddleName

normalised second name, or mid

For  'Mr. W. R Dayton Jr' , it would be 'R'

For 

Firstname: Bill
Lastname: Deighton

it would be blank

 

mkNormalizedLastName

normalised last name

or  'Mr. W. R Dayton Jr' , it would be 'Dayton'

For 

Firstname: Bill
Lastname: Deighton

it should be 'Deighton'

In this case those examples don't line up, unlike the mkName1, as its using the last name as is instead of a phonetic version.

 

New Other Fields added with 2.3.1:

 

mkEmail

email address with casing standardised

mkEmailUser

user name portion of the email

mkEmailDomain

domain portion of the email

Previous Article matchIT SQL Index Next Article

 

Was this article helpful?
0 out of 0 found this helpful

0 Comments

Please sign in to leave a comment.