H6.4.2 - matchIT Hub for Spark - DedupeHive

The DedupeHive application demonstrates how to use the matchIT Hub for Spark ‘Row’ classes to work with a Hive data source, Rows, and Spark Datasets and RDDs.

[Figure: DedupeHive.png]

Configuration

DedupeHive takes a single command-line argument: the name of a configuration file. This is an XML file in the following format:

<?xml version="1.0" encoding="utf-8" ?>
<config>
    <dedupeHive>
        <warehouseLocation>/user/hive/warehouse</warehouseLocation>
        <!-- Define one input for single table Matching -->
        <input>
            <table>input</table>
        </input>

        <!-- Define two inputs for Overlap Matching
        <input>
        </input> -->

        <!-- Output database and table name. -->
        <output>
            <table>matchingPairsSpark</table>
        </output>
        <delimiter>\t</delimiter>
        <licenceFile>./activation.txt</licenceFile>
    </dedupeHive>

    <hub>
        <data>
            <input table="0" columns="|UniqueRef|FullName|Company|Address1|Address2|City|State|Zip" />
            <options>...</options>
        </data>
        <matching>
            <outputs>...</outputs>
        </matching>
        <threads>0</threads>
        <advanced>
            <nationality>USA</nationality>
        </advanced>
    </hub>
</config>

The <dedupeHive> section is specific to this application.

warehouseLocation    Location of the Hive warehouse.
input                Details of an input database table.
table                The name of a database table (used within <input> and <output>).
output               Details of an output database table.
delimiter            The delimiter used when converting records to delimited strings, which are then passed to the underlying matching engine.
licenceFile          A file containing the product activation code.
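
To make the layout of the <dedupeHive> section concrete, the following is a minimal sketch of reading those settings with Python's standard xml.etree.ElementTree module. It is an illustration only and not part of the DedupeHive application: the function name read_dedupe_hive_settings, the returned dictionary keys, and the abbreviated sample string are all hypothetical, and treating the literal text \t as a tab character is an assumption about how the delimiter is interpreted.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the <dedupeHive> section from the sample configuration.
# A raw string keeps the delimiter as the two characters backslash and "t".
SAMPLE_CONFIG = r"""<config>
    <dedupeHive>
        <warehouseLocation>/user/hive/warehouse</warehouseLocation>
        <input>
            <table>input</table>
        </input>
        <output>
            <table>matchingPairsSpark</table>
        </output>
        <delimiter>\t</delimiter>
        <licenceFile>./activation.txt</licenceFile>
    </dedupeHive>
</config>"""


def read_dedupe_hive_settings(xml_text):
    """Return the <dedupeHive> settings as a plain dictionary (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    section = root.find("dedupeHive")
    if section is None:
        raise ValueError("missing <dedupeHive> section")

    # One <input> means single-table matching; two mean overlap matching.
    input_tables = [el.findtext("table") for el in section.findall("input")]
    raw_delimiter = section.findtext("delimiter", default="")

    return {
        "warehouseLocation": section.findtext("warehouseLocation"),
        "inputTables": input_tables,
        "outputTable": section.findtext("output/table"),
        # Assumption: the text "\t" in the file stands for a tab character.
        "delimiter": raw_delimiter.replace("\\t", "\t"),
        "licenceFile": section.findtext("licenceFile"),
        "mode": "overlap" if len(input_tables) == 2 else "single",
    }


settings = read_dedupe_hive_settings(SAMPLE_CONFIG)
```

With the sample above, settings["mode"] is "single" because only one <input> element is defined; adding a second <input> would switch it to "overlap".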

The <hub> section configures the underlying matching engine. Refer to the matchIT Hub documentation for details. The <hub> section must contain the following sub-sections: data, matching, threads, advanced.
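
The requirement on the <hub> sub-sections can be checked before a job is submitted. The sketch below is a hypothetical pre-flight check written with Python's standard library; it is not something the product provides, and the function name missing_hub_sections is invented for this example. It only verifies that the four elements are present, not that their contents are valid for the matching engine.

```python
import xml.etree.ElementTree as ET

# The four sub-sections the <hub> section must contain, per the text above.
REQUIRED_HUB_SECTIONS = ("data", "matching", "threads", "advanced")


def missing_hub_sections(xml_text):
    """Return the names of required <hub> sub-sections that are absent (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    hub = root.find("hub")
    if hub is None:
        # No <hub> element at all, so every required sub-section is missing.
        return list(REQUIRED_HUB_SECTIONS)
    return [name for name in REQUIRED_HUB_SECTIONS if hub.find(name) is None]


complete = "<config><hub><data/><matching/><threads>0</threads><advanced/></hub></config>"
incomplete = "<config><hub><data/><threads>0</threads></hub></config>"

print(missing_hub_sections(complete))
print(missing_hub_sections(incomplete))
```

An empty list means all four sub-sections were found; any names returned identify what still needs to be added to the configuration file.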

Running the sample

Unlike the DedupeTextFile sample, the DedupeHive sample cannot be run out of the box, because it requires a Hive database as its source. Nevertheless, a sampleconfig.xml and a run.sh are provided to demonstrate how to set it up and run it.
