Support DESK

Follow

S4.3 - msp_BulkGenerateKeys

Previous Article matchIT SQL Index Next Article 

Input Parameters:

  • Configuration file – the file path of the configuration file to be used when this procedure is run.
  • Datasource ID – specifies the data source to be used within the configuration file, which contains the table and column mapping specifications.

 

This procedure generates the match key field values for all records in the current database (using the connection string specified in the datasource).

The following settings are specific to the match key generation. These settings can be configured through the web UI or directly in the XML configuration file used during execution of the procedure:

 

Setting

Description

dataSources

Specifies the database connection, table and column mappings used to define the dataset being processed.

Only one table can be marked as the keys table using the isKeysTable option.  The keys table that will be populated when this procedure runs will have the name specified in the name attribute of the table marked as the keys table.  The recommendation is to not actually specify a keys table, so that the name of the keys table will be automatically generated (see the suffixes in the outputSettings section of a standard matchIT SQL configuration XML).

If you are using msp_BulkGenerateKeys (recommended) then one table may be added in the data source section to generate a cleaned, standardised version of your input data.  To configure this table, add a table with the isOutputTable property set to true.  This table can be used to proper case your data, parse names and simple address validation – please see the matchIT API setting options described here.

generalSettings->ensureUniqueRefIsClustered

If enabled, then the uniqueRef column specified in the first table mapped in a data source is checked to ensure that it is referenced by a unique clustered index.  (Note that this can be, but is not required to be, a primary key column.)

outputSettings->reports

Specifies whether reporting is enabled, what folder the reports will be produced into, and what report format should be used.

matchITAPISettings->generate->dropExcludedWords

With this property set to True, during the generate step matchIT SQL will flag any records that contain exclusion words in any of the key fields (fields such as addressee, company or the address lines). Such exclusion words include “Deceased”, “Addressee” (indicating a record may be a header record) and any other Exclusion type entries in the NAMES.DAT file. Records are flagged by setting the first character of the mkDataFlags field in the generated keys table to “X”.

Note that the default location of the names.dat file is:

C:\Program Files\Common Files\matchIT API\dat

matchITAPISettings->generate->properCase

If the ProperCase property is set to True, the fields within the configured output table created during key generation (with the exception of premise (i.e. building) numbers and postcodes) will be correctly cased.

matchITAPISettings->generate->considerCasing

If this property is set to True, then matchIT SQL will consider the casing of the incoming data when it is splitting the data up for extracting keys, proper casing, and so forth.

For Example, with consider casing switched on the company:

ABCD Systems Ltd

ABCD would be considered to be an acronym rather than simply a word (i.e. on output without this option, ABCD would be output as Abcd).

matchITAPISettings->generate->specialCaseMac

Where a last name begins with Mac, when formatting salutations, matchIT SQL follows this with a small letter or a capital letter, depending on this property.  A value of True will mean that MACLEAN will be formatted as MacLean.  You can add exceptions to the rule (e.g. Maccabee, Macclesfield, MacKay, Mackie) to the NAMES.DAT file.  If you invariably want to use a lower case letter following Mac, set this property to False.  

NB:  Names beginning Mach are always formatted with a lower case H, e.g. Machin, Machinery.  Names beginning Mc are formatted with a capital letter following, if they are greater than 3 characters long.

matchITAPISettings->generate->variableKeysMaxLength

This specifies the maximum length of various variable-length phonetic keys created.  Such keys are PhoneticLastName, PhoneticFirstName, PhoneticMiddleName, PhoneticOrganizationName1, PhoneticOrganizationName2, PhoneticOrganizationName3, PhoneticStreet, and PhoneticTown.

The default is eight characters.

matchITAPISettings->generate->quality->enabled

By default, quality scoring is disabled and all quality scores are 0.

Enabling this feature allows matchIT SQL to generate quality scores for data fields such as:

Names

Emails

Addresses

Company Names

These results are written to the output table configured in the Datasource being used during the GenerateKeys process.

matchITAPISettings->generate->quality->address->allowBlankPostcode

If disabled (enabled by default) then addresses without a postal code are restricted to a maximum quality score of 1.

matchITAPISettings->generate->quality->email-> webmailFiltering

If enabled (default) then email addresses that use webmail provider (such as Hotmail, Yahoo, & mail.com) domains are restricted to a maximum quality score of 7.

LOW LEVEL ADVANCED SETTINGS

The following settings are low level and most users will not normally need to modify the default settings.

matchITAPISettings->generate->Name->joinMarriedPrefixes

 

With this property set to True, multiple addressees with the same last name will be treated as married e.g. input names of “Mr. John Smith and Ms. Mary Smith” or “Mr John Smith & Mary Smith” would have a Salutation generated of “Mr and Mrs Smith” and a Contact generated of “Mr and Mrs J Smith” or “Mr and Mrs John Smith”.

matchITAPISettings->generate->Name->generateContact

 

With this property set to True, matchIT SQL will generate a contact for the input name. The contact will be structured in same way as you would expect to find its corresponding input name on e.g. the front of an envelope. For example, the input name of “John Smith” or “Mr John Smith” would result in a generated contact of “Mr J Smith”.

An accurate contact value cannot be generated when matchIT SQL is unable to determine the gender of an input name. In this situation, the generated contact would be equal to the input name. e.g. “J Smith” as an input name would result in a generated contact of “J Smith”.

matchITAPISettings->generate->Name->contactFullname

Set this property to True to include the full first name of any incoming name in the CONTACT field; just the initial will be used if the property is False.  For example, if the property is True, and the incoming name is “John Smith”, then the generated contact will be “Mr John Smith”, if it is False, then the contact will be “Mr J Smith”.

 

matchITAPISettings->generate->Name->defaultSalutation

This property determines the default salutation, either where matchIT SQL can't determine one (for example, C Smith or Chris Smith, which could be either Mr or Ms), or where the Prefix supplied doesn’t have a salutation rule.  

If you include the word ”Dear” as at the start of the default salutation (i.e. actually specify "Dear Customer" and not just "Customer",  then all the salutations derived by matchIT SQL will start with the word "Dear" unless the salutation for the type of title (or prefix) specifies "Title" only.  For example, Mr J Smith will result in a salutation of "Dear Mr Smith" whereas The Bishop of Liverpool will result in a salutation of "My Lord".

 

matchITAPISettings->generate->Name->defaultGender

The Default Gender property is the gender to assume when matchIT SQL can’t determine whether the name is male or female e.g. Chris Smith, C Smith.  If you set this property to Male or Female, matchIT SQL will assume it to be male or female accordingly, and develop a salutation using Mr or Ms as the prefix.

matchITAPISettings->generate->Name->UseEquivalentName

If you set the Use Equivalent Name property to True, matchIT SQL replaces the first name with its equivalent from the NAMES.DAT file, if there is an entry for the input first name. This enables, for example, “Tony Smith” and “Anthony Smith” to be picked up as a match. The initial of the original first name is stored in the Record.DataFlags property to enable, for example, “Tony Smith” and “T Smith” to still be matched.  

matchITAPISettings->generate->Name->EnhancedDoubleBarrelledLookup

When enabled, this property will cause an unrecognised middle name to be considered part of a non-hyphenated double-barreled last name (for example, where the full name is John Harrington Jones, the last name will be considered Harrington-Jones because Harrington is not a recognised first name).

matchITAPISettings->generate->Name->processBlankLastName

With this property enabled, a blank lastname will cause extra processing to be performed on other input data to help detect typographical errors.  For example, if a firstname was entered but not a lastname, then it’ll be assumed that the firstname is in fact the lastname and match keys will be generated rather than being left blank.

matchITAPISettings->generate->Name->replaceAndWithAmpersand

By default, matchIT SQL will convert ‘and’ to an ampersand when outputting InputFields.Name.Addressee.  Disabling this property will prevent this behavior.

matchITAPISettings->generate->Name->parseNameElements

When enabled, this will cause input name elements (including prefix, firstnames, and lastname) to be parsed.  If matchIT SQL deems any values to have been entered into an incorrect field (for example, suffixes and qualifications in the lastname field), it will reassign these values into the correct fields.

This property is disabled by default, so that any such incorrect values are not reassigned.

matchITAPISettings->generate->Name->detectInverseNames

With this property enabled, matchIT SQL will attempt to identify addressee names that have been specified with the lastname preceding the firstnames, provided a comma delimiter follows the lastname (for example, “Smith, John” where Smith is the lastname).  Without a comma, a name is assumed to be in standard left-to-right format, with the firstnames preceding the lastname.

matchITAPISettings->generate->Name->parseAsNormalizedName

When enabled, addressee names are assumed to be in a delimited normalised format similar to the NormalizedName value that’s output by during Key Generation.  Currently supported delimiters are spaces, commas, semicolons, and pipes (‘|’).

matchITAPISettings->generate->Address->Extract->premise

This will move or copy premise numbers found in the address lines into a field labeled PREMISE in the table configured as an output table during key generation.

matchITAPISettings->generate->Address->Extract->thoroughfare

This will move or copy address data recognised as the thoroughfare of the address (based on Address type entries found in the NAMES.DAT file) into a field labeled THOROUGHFARE in the table configured as an output table during key generation.

matchITAPISettings->generate->Address->Extract->town

This will move or copy address data recognised as the town or city from the address lines to a field labeled TOWN in the table configured as an output table during key generation.

matchITAPISettings->generate->Address->Extract->postTownsOnly

If this is enabled, together with Extract->Town, then only post towns (i.e. any towns found in the TOWNS.DAT file) will be moved or copied.

matchITAPISettings->generate->Address->Extract->region

This will move or copy US, Canadian or Australian states or provinces, or valid UK counties (or other regions found in the NAMES.DAT file), that are found in the address lines into a field labeled REGION in the table configured as an output table during key generation.

matchITAPISettings->generate->Address->Extract->postcode

This will move or copy UK postcodes, or US zip codes found in the address lines into a field labeled POSTCODE.

Only UK postcodes with an outward half that is valid according to the MAILSORT.DAT file will be extracted.  

matchITAPISettings->generate->Address->Extract->country

This will move or copy valid countries found in the address lines (based on Country type entries found in the NAMES.DAT file) into a field labeled ‘COUNTRY’ in the table configured as an output table during key generation.

matchITAPISettings->generate->Address->abbreviateRegion

Set this property to True if you want matchIT SQL to abbreviate States or Provinces when processing address lines e.g. to change “Pennsylvania” to “PA” within the table configured as an output table during key generation.

matchITAPISettings->generate->Address->upperCaseTown

This applies to UK addresses only.  Set this property to True to convert the post town in the address to capitals within the table configured as an output table during key generation.  

Note that, if the ProperCase property is set to False, then this property is ignored.

matchITAPISettings->generate->Address->verifyPostcode

If set to True, this property verifies and corrects the format of the postcode. Numerics are changed to alphas and vice versa where appropriate.  This feature makes use of the rules concerning the alphanumeric structure of the postcode. E.g. it changes “KT22 BDN” to “KT22 8DN” – it will change 0, 1, 5 and 8 to O, I, S and B, or vice versa, if that makes the postcode alphanumerically correct. matchIT SQL will not verify or correct the format of postcodes that are not in the postcode field.  The cleaned postcodes are output to the table configured as an output table within the datasource.  

matchITAPISettings->generate->Address->defaultThoroughfareLine

This property is used when the matchIT SQL is generating a phonetic address key, for which it needs to know the thoroughfare (e.g. street) and the town in the address. If it cannot locate a thoroughfare in the address, usually because it cannot find a word to indicate one, such as “Street”, then it will be assumed that the thoroughfare is the contents of the address line indicated by this property (if it is greater than zero).  For example, if this property is set to 2, then matchIT SQL will take the contents of address line 2 as the thoroughfare if it cannot find a thoroughfare word in the address.  This property should only be used if the addresses in your data are very rigidly structured.

matchITAPISettings->generate->Address->numOfLinesToScan

This property enables personal names to be extracted from address lines. It can be set to 1 or 2. If set to 1, only the first address line will be scanned for names. If set to 2, both the first and second address lines will be scanned and have names extracted from them if found. Any personal names found can then be used for the generation of Contacts and Salutations.

If either or both of the Organization->Extract->Jobtitle and Organization->Extract->Name properties are used in conjunction with this one, matchIT SQL will not only scan the ADDRESSEE field for job titles and business names, but will also scan the corresponding number of address lines.

 

matchITAPISettings->generate->Address->premiseFirst

When parsing an address, this Boolean property indicates whether to expect the premise or flat number to come first in address lines when the flat is not explicitly specified (e.g. “Flat 5”).

matchITAPISettings->generate->Organization->Extract->jobTitle

This will copy or extract job titles contained within the ADDRESSEE field into a field labeled JOB_TITLE within your output table.

Job Titles are recognised by having a word or string defined as a Job Title in the NAMES.DAT file e.g. Director.

matchITAPISettings->generate->Organization->Extract->name

This will copy or extract any business names contained within the ADDRESSEE field into a field labeled COMPANY within your output table.

Business names are recognised by having a word or string defined as a Business word in the NAMES.DAT file e.g. Ltd.  Care should be taken when using this property, as words like "Bank" can be taken to indicate a Business when this isn't the case (e.g. it may be a last name or part of an address line).  

If you want Extract Company Name processing to be applied also to the first one or two lines of the address, you must Set the property Generate->Address->NumOfLinesToScan to either 1 or 2.

matchITAPISettings->generate->Organization->joinInitials

Set this property to True if you want a group of initials separated by spaces or dots in a company name to be concatenated. For example, if this property is True, then “I B M” and “I.B.M.” will be replaced by “IBM” within the company field of your output table. Note that, if the Generate->ProperCase property is set to False, then this property will have no effect.

 

matchITAPISettings->generate->Organization->useEquivalentName

If this property is set to True, then the equivalent (according to the NAMES.DAT file) of words indicating a business name, such as “Motors” or “Services” are included in the NormalizedOrganization field in the generated keys table and the corresponding phonetic keys. This enables, for example, “Wood Green Cars” to match “Wood Green Motors” well (because “Cars” has an equivalent of “Motors”), but ensures that neither of them match “Wood Green Carpets” well.

If you want tight legal matching turned on so that for example, ‘Wood Green Cars Limited’ will match ‘Wood Green Cars Ltd’ but ‘Wood Green Cars Group’ will not match ‘Wood Green Cars Ltd’, then in addition to setting this option to True, you will also need to modify the ‘matchITAPISettings>datPath’ property to the location of the tight dat files which can be found in a subfolder called ‘Tight’ under each region folder in C:\matchIT SQL\config\dataFiles\... .

If you set this property to True, you should change any words in the NAMES.DAT file that you do want ignored, such as “Ltd” and “Inc” to Noise type so that they are not included in the NormalizedOrganization field. As a rule of thumb, if you are doing business matching on a file that is very geographically concentrated, that is, contains records mostly from the same immediate area, then set the Generate->Organization->UseEquivalentName property to True, otherwise set it to False.

matchITAPISettings->generate->Organization->normalizationTruncation

Disabled by default (i.e. set to 0) If this setting is enabled, and the organization consists of more than four words, then the third element of field NormalizedOrganization within your generated keys table will be truncated to the first N characters of each word after the first two (where N is the value of this setting).

matchITAPISettings->generate->Organization->ignoreParentheses

With this property enabled, any words that are enclosed with parentheses within an organization name will be excluded from the generated phonetic organization keys. This can be useful for records such as Remnel Ltd and Remnel (UK) Ltd, to ensure records with these company names are compared if the phonetic organization keys are being used as part of composite match keys.

matchITAPISettings->generate->Organization->ignoreTrailingPostTown

This property, when enabled, will exclude from the phonetic organization keys any trailing post town (defined in the towns.dat file) or UK county that appears at the end of a company name.

For example, the phonetic organization keys for Handso Ltd and Handso Essex Ltd will be the same to help ensure such records will be compared.

 

1.1.1.1      Match Key Fields

The match key table can contain the following fields:

 

Column Name

Description

(unique ref)

Unique reference for each record – either directly specified for the keys table in the table mappings, or taken from the first table in the mappings.

mkNameKey

Phonetic representation of the name. Optional.

mkOrganizationKey

Phonetic representation of the company name. Optional.

mkAddressKey

Phonetic representation of the address lines. Optional.

mkPhoneticStreet

Phonetic representation of the thoroughfare. Optional.

mkPhoneticTown

Phonetic representation of the town/city. Optional.

mkPostOut

First part of the postal code/zip. Optional.

mkPostIn

Second part of the postal code/zip. Optional.

mkName1

Phonetic representation of the lastname. Optional.

mkName2

Phonetic representation of the firstname. Optional.

mkName3

Phonetic representation of the middle name or initial. Optional.

mkOrgName1

Phonetic representation of the first word of the company name. Optional.

mkOrgName2

Phonetic representation of the second word of the company name. Optional.

mkOrgName3

Phonetic representation of the third word of the company name. Optional.

mkTelAreaCode

Telephone area code. Optional.

mkTelLocalNumber

Telephone local number. Optional.

mkFaxAreaCode

Fax area code. Optional.

mkFaxLocalNumber

Fax local number. Optional.

mkName2Found

Indicates whether the firstname was found in the names.dat file. Optional.

mkNormalizedName

Normalised version of all of the consumer name data.

mkGender

Generated gender based on the name data provided

mkSuffix

Any suffix data extracted from the name fields in the source data.

mkNormalizedOrganization

Normalised version of the Organization name.

mkPremise

Premise data extracted from the address lines. Optional.

mkFlatNo

Sub premise extracted from the address lines. Optional.

mkDataFlags

Flag field generated during key generation. Please see more information about the Dataflags here.

mkMasterPriority

Master priority calculated for the record based on the completeness of the data as defined by matchIT SQL’s Master Priority Matrix. Used during the grouping of matches to help determine the master record.

mkAddressLength

The calculated length of the address data contained in the source data. Used during the grouping of matches to help determine the master record.

 

The optional columns can be configured in the outputSettings->keyColumns node in a configuration file; most are enabled by default, but key columns not required by any match keys can be disabled to help improve performance of key generation and deduplication.

Columns not marked as optional will always be added to the keys table and cannot be disabled.

Previous Article matchIT SQL Index Next Article

 

Was this article helpful?
0 out of 0 found this helpful

0 Comments

Please sign in to leave a comment.