H6.1 - matchIT Hub for Spark - Introduction

Apache Spark™ is an open-source cluster-computing framework: a fast, general-purpose engine for large-scale data processing.

The matchIT Hub for Spark product provides transformation steps that add the deduplication functionality of matchIT Hub to Spark. This allows you to find matches in and across any combination of databases supported by Spark.

Prerequisites

Spark

Download and install Apache Spark from https://spark.apache.org/.

Spark runs on Windows, but Linux is the preferred platform. You can run Spark in its standalone cluster mode, on EC2, on Hadoop YARN, or on Apache Mesos. Spark can access data in HDFS, Cassandra, HBase, Hive, Tachyon, and any Hadoop data source.

Java

Download and install Java from http://java.com/en/download/manual.jsp.

Maven

You can run the sample apps out of the box, as is. If you intend to modify and rebuild them, the easiest way is to use Maven with the supplied pom.xml files.

Download and install Maven from https://maven.apache.org.
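
Once the prerequisites above are installed, it can help to confirm that they are reachable from your shell before running the samples. The following is a minimal sketch only and is not part of matchIT Hub: the class name PrereqCheck and the helper findOnPath are hypothetical, and the sketch assumes that Spark's spark-submit launcher and Maven's mvn launcher have been added to your PATH.

```java
import java.io.File;
import java.util.Optional;

// Hypothetical helper, not shipped with matchIT Hub: reports the running Java
// version and whether the Spark and Maven launchers can be found on PATH.
public class PrereqCheck {

    // Returns the first executable file named `tool` found in the directories
    // listed in `pathValue` (a PATH-style string), or empty if none is found.
    public static Optional<File> findOnPath(String tool, String pathValue) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        boolean windows = System.getProperty("os.name").toLowerCase().startsWith("windows");
        // On Windows the launchers are usually .cmd or .bat scripts.
        String[] suffixes = windows
                ? new String[] {"", ".cmd", ".bat", ".exe"}
                : new String[] {""};
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty PATH entries
            }
            for (String suffix : suffixes) {
                File candidate = new File(dir, tool + suffix);
                if (candidate.isFile() && candidate.canExecute()) {
                    return Optional.of(candidate);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The JVM running this check is, by definition, an installed Java.
        System.out.println("java: " + System.getProperty("java.version"));

        String path = System.getenv("PATH");
        for (String tool : new String[] {"spark-submit", "mvn"}) {
            String where = findOnPath(tool, path)
                    .map(File::getAbsolutePath)
                    .orElse("not found on PATH");
            System.out.println(tool + ": " + where);
        }
    }
}
```

Save this as PrereqCheck.java and, on Java 11 or later, run it with `java PrereqCheck.java`. A "not found on PATH" line means only that the launcher is not on your PATH; the tool may still be installed elsewhere.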
