
Running a matching job to identify duplicate records

To create a new processing job, simply choose the Create Job option on the Jobs Control Panel or from your Dashboard. This will open the Create Job Wizard. You can just follow the on-screen instructions, but by way of example, let’s walk through the creation of a new job step by step.

Firstly, you need to choose the type of job you want to run – there are 2 options:

  • Find matches within a single dataset
  • Find matches that overlap two datasets

These represent the 2 kinds of job currently supported. The first option allows you to identify duplicate records within a single file, and it also produces a cleaned output with the matches removed.
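
To make the idea concrete, here is a minimal Python sketch of single-dataset deduplication. It is a simplification and not how matchIT On Demand matches records: it treats two records as duplicates only when a normalised name and postcode are identical, whereas a real matching engine also catches spelling variations. The field names `name` and `postcode` and the functions `normalise` and `dedupe` are hypothetical.

```python
# Simplified illustration of single-dataset deduplication.
# Exact matching on a normalised key only; this is NOT matchIT On Demand's
# matching logic. Field names ("name", "postcode") are hypothetical.


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't block a match."""
    return " ".join(text.lower().split())


def dedupe(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into a cleaned list (first record per key) and the duplicates."""
    seen: set[tuple[str, str]] = set()
    cleaned: list[dict] = []
    duplicates: list[dict] = []
    for record in records:
        key = (normalise(record["name"]), normalise(record["postcode"]))
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
            cleaned.append(record)
    return cleaned, duplicates


records = [
    {"name": "John Smith", "postcode": "AB1 2CD"},
    {"name": "john  smith", "postcode": "ab1 2cd"},
    {"name": "Jane Doe", "postcode": "EF3 4GH"},
]
cleaned, duplicates = dedupe(records)
print(len(cleaned), len(duplicates))  # 2 1
```

The first record seen for each key goes into the cleaned output; every later record with the same key goes into the duplicates list.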

The second option allows you to match the records from one dataset against another. You get a set of matches for records that appear in both datasets, as well as the unique set of records from the second dataset (i.e. records in the second dataset that weren't also found in the first dataset).
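
The same simplification can illustrate the two-dataset job. In this sketch, which again uses exact matching on a hypothetical normalised name-and-postcode key rather than matchIT On Demand's own logic, the first dataset is indexed by key, and each record in the second dataset either pairs with a record from the first or ends up in the unique output. The helpers `match_key` and `overlap` are hypothetical.

```python
# Simplified illustration of matching one dataset against another.
# Exact matching on a normalised key only; NOT matchIT On Demand's logic.
# Field names ("name", "postcode") are hypothetical.


def normalise(text: str) -> str:
    return " ".join(text.lower().split())


def match_key(record: dict) -> tuple[str, str]:
    return (normalise(record["name"]), normalise(record["postcode"]))


def overlap(first: list[dict], second: list[dict]) -> tuple[list[tuple[dict, dict]], list[dict]]:
    """Return (matched pairs found in both datasets, records unique to the second dataset)."""
    # Index the first dataset by key; if it contains duplicates, the last one wins.
    index = {match_key(record): record for record in first}
    matches: list[tuple[dict, dict]] = []
    unique_second: list[dict] = []
    for record in second:
        partner = index.get(match_key(record))
        if partner is not None:
            matches.append((partner, record))
        else:
            unique_second.append(record)
    return matches, unique_second


first = [{"name": "John Smith", "postcode": "AB1 2CD"}]
second = [
    {"name": "JOHN SMITH", "postcode": "ab1 2cd"},
    {"name": "Jane Doe", "postcode": "EF3 4GH"},
]
matches, unique_second = overlap(first, second)
print(len(matches), [r["name"] for r in unique_second])  # 1 ['Jane Doe']
```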

Choose Find matches within a single dataset, and you will then be asked to choose your dataset.

From here you can either upload a completely new dataset or pick an existing one. If you’re using the matchIT On Demand trial option, a couple of demo datasets will already be loaded for you to pick from (these have been provided in the matchIT On Demand trial to make it easy to have an initial play).

Let’s choose Upload a Dataset. Any data uploaded to matchIT On Demand is uploaded over secure FTP and then held on a secure, encrypted datastore within the matchIT On Demand system.

[Image: create_a_job.png]

Choose Browse and select a suitable file for matching (e.g. a file with names, addresses, postcodes etc). Now click Upload.

You will see the progress bar fill in as the file is uploaded. Once it’s fully uploaded, choose Columns. Next, simply tell matchIT On Demand what type of data is in each column; if you don’t see a column type that matches the data, just choose other.
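
Conceptually, the Columns step produces a mapping from each column in your file to a data type, with anything unrecognised falling back to other. A rough sketch of that idea follows; the type labels in `KNOWN_TYPES` and the function `assign_column_types` are hypothetical and are not the exact options shown on screen.

```python
# Hypothetical sketch of the Columns step: every column gets a data type,
# falling back to "other". The labels in KNOWN_TYPES are illustrative, not
# the exact options shown in matchIT On Demand.
KNOWN_TYPES = {"forename", "surname", "address", "postcode", "email", "business name"}


def assign_column_types(headers: list[str], chosen: dict[str, str]) -> dict[str, str]:
    """Map each header to its chosen type, or to "other" if none (or an unknown one) was chosen."""
    types: dict[str, str] = {}
    for header in headers:
        column_type = chosen.get(header, "other")
        types[header] = column_type if column_type in KNOWN_TYPES else "other"
    return types


headers = ["First Name", "Last Name", "Post Code", "Customer Ref"]
chosen = {"First Name": "forename", "Last Name": "surname", "Post Code": "postcode"}
print(assign_column_types(headers, chosen))
# {'First Name': 'forename', 'Last Name': 'surname', 'Post Code': 'postcode', 'Customer Ref': 'other'}
```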

Finally you need to give your dataset a name. You can also add a description, a company (e.g. if your data belongs to a specific client, then you might enter the client name here) and choose whether the data should be shared with other users.

Now let’s configure the job settings. Set the nationality of your data (for example, if most of your data is British, then choose UK). Now choose from the following matching levels (you can select more than one):

  • Individual – identifies contact level matches based on forenames, surname, address, email etc
  • Family – identifies family level matches based on surname, address etc
  • Address – identifies address level matches based on the address data etc
  • Business – identifies business level matches based on the business name and the address data etc
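
One way to picture the difference between the levels is that each one compares a different set of fields. The sketch below is a hypothetical simplification: the field names are made up, comparison is exact equality after normalising, and the Individual level here ignores email and the other data the real level also uses.

```python
# Illustrative only: which fields each matching level might compare.
# Field names and the exact-equality comparison are hypothetical simplifications
# (the Individual level also uses email etc., omitted here).
LEVEL_FIELDS = {
    "individual": ("forename", "surname", "address"),
    "family": ("surname", "address"),
    "address": ("address",),
    "business": ("business_name", "address"),
}


def level_key(record: dict, level: str) -> tuple[str, ...]:
    """Build a normalised comparison key from the fields used at the given level."""
    return tuple(" ".join(str(record.get(field, "")).lower().split())
                 for field in LEVEL_FIELDS[level])


john = {"forename": "John", "surname": "Smith", "address": "1 High St"}
mary = {"forename": "Mary", "surname": "Smith", "address": "1 High St"}
for level in ("individual", "family", "address"):
    print(level, level_key(john, level) == level_key(mary, level))
# individual False
# family True
# address True
```

John and Mary Smith at the same address are different individuals, but they match at the family and address levels, which is why you can select more than one level for the same job.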

Choose Individual and then choose the matching tightness: choose Tight so that we only keep the best, most certain matches. IMPORTANT: only use the Loose tightness setting if you know that your data is good quality, i.e. it doesn't contain a lot of missing first names or dummy values such as Null, N/A etc. Loose will produce the most matches, but you'll want to review the lower-scoring matches manually.
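
A helpful way to think about tightness is as a score threshold: each candidate pair gets a similarity score, and a tighter setting demands a higher score before the pair counts as a match. matchIT On Demand's actual scoring and thresholds aren't described in this article, so this sketch uses `difflib.SequenceMatcher` from the Python standard library as a stand-in score, and the 0.9 and 0.7 cut-offs in `THRESHOLDS` are invented for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical cut-offs: a higher threshold = tighter matching = fewer matches.
THRESHOLDS = {"tight": 0.9, "loose": 0.7}


def similarity(a: str, b: str) -> float:
    """Score two strings between 0.0 (nothing in common) and 1.0 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_match(a: str, b: str, tightness: str) -> bool:
    return similarity(a, b) >= THRESHOLDS[tightness]


for name in ("Jon Smith", "J Smith"):
    print(name, round(similarity(name, "John Smith"), 2),
          is_match(name, "John Smith", "tight"), is_match(name, "John Smith", "loose"))
# Jon Smith 0.95 True True
# J Smith 0.82 False True
```

A lower threshold also explains the warning about dummy values: two records that both contain "N/A" in the same field look identical to a naive score, so a loose setting is more likely to let such false matches through.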

Finally you need to give your job a name. You can also add a description, a company (e.g. if you’re running a job for a client, then you might enter the client name here) and choose whether the job should be shared with other users.

Congratulations, you’re now ready to run your job! Just click ‘Add Job To Queue’. Your job will now be scheduled and run as soon as possible by the matchIT On Demand matching engine. As soon as your job starts running, you will see its status change in the Jobs View and Dashboard. Once it’s complete, you can click on the View results option and you will see a page similar to the following:

[Image: matching_results.png]

You can see that a variety of result files will be available for each match level:

  • Matching Summary
  • XML Statistics
  • Matched File
  • Deduped File
  • Duplicates
  • Matching Groups
  • Matching Pairs
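
It can help to think of Matching Groups and Matching Pairs as two views of the same matches. The sketch below assumes, hypothetically, that the first member of each group is the master record and that a group expands into every pair of its members; the actual layout of these files may differ, and the record IDs are made up.

```python
from itertools import combinations

# Hypothetical Matching Groups content: group ID -> record IDs in that group.
# Assumptions: the first member of each group is the master record, and a
# group expands into every pair of its members.
groups = {1: ["R1", "R4", "R7"], 2: ["R2", "R9"]}


def groups_to_pairs(groups: dict[int, list[str]]) -> list[tuple[str, str]]:
    """Expand each group into all unordered pairs of its members."""
    return [pair for members in groups.values() for pair in combinations(members, 2)]


def split_masters(groups: dict[int, list[str]]) -> tuple[list[str], list[str]]:
    """Separate master records (kept) from the rest of each group (duplicates)."""
    masters = [members[0] for members in groups.values()]
    duplicates = [record for members in groups.values() for record in members[1:]]
    return masters, duplicates


print(groups_to_pairs(groups))
# [('R1', 'R4'), ('R1', 'R7'), ('R4', 'R7'), ('R2', 'R9')]
print(split_masters(groups))
# (['R1', 'R2'], ['R4', 'R7', 'R9'])
```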

Now simply choose which result files you want and click Download Selection.

TIP:

Use the Master Record Identifier field in the Matched File output to apply a filter in Excel, which will allow you to review matches and un-flag records that you subsequently decide to keep. To apply a filter in Excel, simply open the Matched File, click on the Data menu option in the top menu bar and then click on the filter icon. You can now click on the Master column header and choose whether to see the records marked as dupes, the clean records, or all of the records.

[Image: matched_file.png]
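
If you prefer to script this review rather than filter in Excel, a CSV copy of the Matched File can be split the same way. This is a sketch under loud assumptions: it assumes, hypothetically, that the file is saved as CSV, that the column is called `Master`, and that this column is blank for clean records and holds the master record's ID for records flagged as duplicates. Check your own Matched File for the actual column name and values.

```python
import csv
import io

# Hypothetical assumption: the Master column is blank for clean records and
# holds the master record's ID for records flagged as duplicates.
# Check your own Matched File for the actual column name and values.


def split_matched_file(handle, master_column: str = "Master") -> tuple[list[dict], list[dict]]:
    """Return (clean rows, rows flagged as duplicates), like the two Excel filter views."""
    clean: list[dict] = []
    dupes: list[dict] = []
    for row in csv.DictReader(handle):
        if row[master_column].strip():
            dupes.append(row)
        else:
            clean.append(row)
    return clean, dupes


sample = io.StringIO("Id,Name,Master\n1,John Smith,\n2,john smith,1\n3,Jane Doe,\n")
clean, dupes = split_matched_file(sample)
print([row["Id"] for row in clean], [row["Id"] for row in dupes])  # ['1', '3'] ['2']
```

With a real download you would pass an open file instead of the in-memory sample, e.g. `open("matched_file.csv", newline="")` (the filename is hypothetical), and set `master_column` to whatever the header actually says.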