Support DESK

Follow

S2.4.4 - SSIS Tutorial - findoverlap vs findmatches

Previous Article matchIT SQL Index Next Article 

Find Overlap is for matching two separate data sources.  So you would be setting up two generate keys, pointing to your two separate tables.

 

First, let's go over the difference between an overlap vs. an internal dedupe (the findmatches/groupmatches tasks) as this is a common point of confusion.

 

When you’re matching internally (findmatches), you have these 3 records for example

 

1

GILL LAUGHTON

GRETCH LTD

165 CLOCK TOWER ROAD

ISLEWORTH

MIDDX

TW7 6DT

2

MRS J LAWTON

GRETCH GROUP

CLOCK TOWER RD

ISLEWORTH

MIDDX

TW7 6DT

3

JILL LAWTON

GRETECH LTD

CLOCK TOWER ROAD

ISLEWORTH

MIDDLESEX

TW7 6DT

 

These 3 records may not link directly, but if we know 1 matches 2, and 2 matches 3, then we can determine that all 3 go together.  And when we link them we give a base score – which is the weakest link in the set.  When you’re doing an internal dedupe its all about gaining that single customer view.

Now that is good for an internal dedupe.

 

Although when overlapping (findoverlap) it’s a different approach, as opposed to linking records together and choosing the weakest link as the baseline, you want to instead choose the best match

So instead let's say we had two tables

Table1 lets say is your existing customer table, and has 3 separate records (this may indicate that it should be deduped, but that may be a separate project)

Miss G LAUGHTON

GRETCH LTD

165 CLOCK TOWER ROAD

ISLEWORTH

 

 

TW7 6DT

MRS J LAWTON

GRETCH GROUP

CLOCK TOWER RD

ISLEWORTH

MIDDX

TW7 6DT

JILL LAWTON

GRETECH LTD

CLOCK TOWER ROAD

ISLEWORTH

MIDDLESEX

TW7 6DT

And we have table 2 – lets say this is a feed of new sales

GILL LAUGHTON

GRETCH GROUP LTD

165 CLOCK TOWER ROAD

ISLEWORTH

TW7 6DT

John Smith

DARCY CONTROLS

WINCHESTER STREET

BASINGSTOKE,  RG21 7EY

 
Smith, John D'ARCEY CONTROLS 34 WINCHESTER STREET BASINGSTOKE RG21 7EY

Now, we can only link that incoming record to one of our three existing customers. 

So instead what we’ll do is findoverlap , it will compare the 3 customer records against our incoming table.  In this case – we may have found 2 or 3 separate matches, but in this case – it’s about picking the best match, that way we don’t make a new customer record when we shouldn’t.

But we still have the issue that we’re attempting to link 3 new sales, but only caught 1.  So John Smith placed 2 orders, but really should only have 1 account.   So, in this case, you should take the records that were not matched in your overlap, and possibly run them through a dedupe internally as well.   That way we only create one new customer account, instead of 2.   

You shouldn’t rely on just overlap, or just findmatches necessarily. Ideally a combination of both will get you the best end result. It's just a matter of figuring out the order that works best for you.

We do recommend before running an overlap, that at least one of the tables would ideally be deduped internally. Otherwise, you can end up with many to many relationships

Previous Article matchIT SQL Index Next Article

 

Was this article helpful?
0 out of 0 found this helpful

0 Comments

Please sign in to leave a comment.