Cortex - Introduction

Cortex offers a drag-and-drop workflow environment for deduplication. It provides a range of tools that make accessing, cleaning, and outputting data faster and easier, including data preparation tools with Matching and Normalization functionality. This allows you to find matches in and across any combination of data sources, including SQL Server, Excel, and delimited files. If you would like another database connector, please let our support team know; we will introduce additional connectors according to demand.

introduction.png

Requirements

OPERATING SYSTEM

Cortex is compatible with Microsoft Windows 2008/7/10 and Microsoft Windows Server operating systems (2008 or more recent). We strongly recommend using a fully up-to-date and patched operating system, as this will benefit Cortex in terms of robustness, stability, and security.

RAM

Cortex can run entirely in memory. As the volume of data increases, memory requirements also increase. We highly recommend running Cortex on a machine with enough memory to process the data without falling back to disk storage.

As a rough guideline:

  • a machine with 8 GB of RAM should comfortably process 15 million rows;
  • a machine with 16 GB of RAM should comfortably process 30 million rows;
  • a machine with 32 GB of RAM should comfortably process 60 million rows;
  • a machine with 48 GB of RAM should comfortably process 80 million rows.

If overlapping two sources of data, use their summed row counts with these guidelines (for example, 100 million vs. 20 million rows, 120 million in total, would require roughly 80 GB of RAM).

Note that these figures are highly dependent on factors such as:

  • the average size of each row (these figures assume an average row size of 150 bytes);
  • which match keys are used (refer to the Configuration Guide for details on match keys);
  • the amount of duplication in the data.
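The guideline figures above can be turned into a rough estimate by interpolating between them. The Python sketch below is illustrative only and is not part of Cortex. The function name `estimate_ram_gb` is hypothetical, and both the proportional scaling below 15 million rows and the straight-line extension beyond 80 million rows are assumptions. It ignores row size, match keys, and duplication, so treat the result as a starting point.

```python
# Rough guideline points from this article: (rows in millions, RAM in GB).
GUIDELINE = [(15, 8), (30, 16), (60, 32), (80, 48)]


def estimate_ram_gb(*source_rows_millions: float) -> float:
    """Estimate RAM (GB) for one or more sources, using summed row counts."""
    total = sum(source_rows_millions)

    # Assumption: below the first guideline point, scale proportionally.
    first_rows, first_ram = GUIDELINE[0]
    if total <= first_rows:
        return first_ram * total / first_rows

    # Between guideline points, interpolate linearly.
    for (x0, y0), (x1, y1) in zip(GUIDELINE, GUIDELINE[1:]):
        if total <= x1:
            return y0 + (y1 - y0) * (total - x0) / (x1 - x0)

    # Assumption: beyond the last point, extend the final segment.
    (x0, y0), (x1, y1) = GUIDELINE[-2], GUIDELINE[-1]
    return y1 + (y1 - y0) * (total - x1) / (x1 - x0)


if __name__ == "__main__":
    # Overlap of 100 million vs. 20 million rows (120 million in total).
    print(estimate_ram_gb(100, 20))
```

Under these assumptions, the overlap of 100 million vs. 20 million rows comes out at the same 80 GB quoted in the example above.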

Normalization: Note that when an engine is configured for normalization, a row of data added to the engine is discarded immediately after it's processed and output; it is otherwise not retained in RAM. The above RAM requirements are therefore not applicable, and memory usage is minimal.
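To illustrate why memory usage stays low in this mode, the following Python sketch processes rows one at a time and never keeps a processed row. It is a conceptual illustration only, not how Cortex is implemented; `normalize_row` and `normalize_stream` are hypothetical names, and the normalization rule shown (collapsing whitespace and upper-casing) is an assumption chosen for the example.

```python
from typing import Dict, Iterable, Iterator


def normalize_row(row: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical rule: collapse runs of whitespace and upper-case each value."""
    return {key: " ".join(str(value).split()).upper() for key, value in row.items()}


def normalize_stream(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield each normalized row as soon as it is processed.

    No row is stored after it has been yielded, so memory use does not
    grow with the number of rows.
    """
    for row in rows:
        yield normalize_row(row)


if __name__ == "__main__":
    source = iter([{"name": "  john   smith "}, {"name": "Jane  Doe"}])
    for output_row in normalize_stream(source):
        print(output_row)
```

Matching, by contrast, has to compare rows against each other, which is why the RAM guidelines above apply to it.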

 

DISK

Cortex can fall back to storing data on disk, for example if memory usage exceeds a predetermined threshold. This can significantly impact performance, but will allow for processing greater volumes of data. Should disk usage be necessary, then fast disks (such as SSDs) are highly recommended.

 

Activation

The first time you run Cortex, you will be prompted to enter an activation code. Enter your code in the activation window (see below) to activate your installation.

activation.png

Main Window

Run matchIT Cortex. The main window is divided into 5 main areas.

mainwindow.png

Menu Bar

The menu bar contains various application command buttons and a progress bar.

Buttons | Description
NewOpenSaveSaveAs.png | Used to clear the workflow canvas, open a previously saved workflow, save the current workflow, or save the current workflow under a different name.
Activate.png | Used to enter or update an activation code.
Config.png | Used to change application settings, such as the log file and log severity.
About.png | Opens an “About” dialog with version information.
PasteCutCopyDelete.png | Used to edit workflow canvas tools and connections.
RunStop.png | Used to start the current workflow or abort a running job.

 

Cortex Log

Use the Config button to choose whether messages, warnings, and errors are written to a log file, and to set the path of that log file.

Console

The console shows messages. Configuration options let you choose which severity of messages to see, from:

  • Debug
  • Information
  • Warning
  • Error
  • Fatal
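As an illustration of severity-based filtering, the Python sketch below shows only messages at or above a chosen level. It is not Cortex code: the names `Severity` and `visible_messages` are hypothetical, and it assumes that the levels are ordered from least to most severe as listed above and that selecting a level also shows every more severe level.

```python
from enum import IntEnum
from typing import List, Tuple


class Severity(IntEnum):
    # Assumed ordering: least severe first, as in the list above.
    DEBUG = 0
    INFORMATION = 1
    WARNING = 2
    ERROR = 3
    FATAL = 4


def visible_messages(
    messages: List[Tuple[Severity, str]], minimum: Severity
) -> List[Tuple[Severity, str]]:
    """Keep only messages whose severity is at or above `minimum`."""
    return [(severity, text) for severity, text in messages if severity >= minimum]


if __name__ == "__main__":
    log = [
        (Severity.DEBUG, "Loaded tool definitions"),
        (Severity.INFORMATION, "Workflow started"),
        (Severity.WARNING, "Empty column in input"),
        (Severity.ERROR, "Could not open output file"),
    ]
    for severity, text in visible_messages(log, Severity.WARNING):
        print(severity.name, text)
```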

Toolbox

The toolbox contains all the tools you can drag onto the workflow canvas, divided into three categories: Input, Process, and Output.

Input

inputgroup.png

The input category has tools to load data from:

  • Databases (currently, Microsoft SQL Server is supported; if you would like another database connector, please contact our support team and we will introduce additional connectors according to demand);
  • Delimited files (tab-delimited, comma-delimited, etc.);
  • Spreadsheets.

Process

processes.png

The process category has tools for:

  • Matching – deduplicate a single table or overlap two tables;
  • Normalization – produce a normalized version of the input data;
  • Grouping – group previously matched pairs for multiple runs;
  • Union – union two data sources with the same layout;
  • Select – select a subset of the input columns to pass through to downstream components;
  • Sample Groups – filter the Matching groups output to produce a sample of groups for each score.
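For readers who think in code, the Union and Select tools behave much like the two small functions in the Python sketch below. This is a conceptual illustration only, not the Cortex implementation; the function names, the use of dictionaries as rows, and the sample column names are all hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def union(left: List[Row], right: List[Row]) -> List[Row]:
    """Combine two sources; every row must have the same columns in the same order."""
    rows = list(left) + list(right)
    if rows:
        layout = list(rows[0])
        for row in rows:
            if list(row) != layout:
                raise ValueError("both sources must share the same layout")
    return rows


def select(rows: List[Row], columns: List[str]) -> List[Row]:
    """Pass through only the named columns of each row."""
    return [{column: row[column] for column in columns} for row in rows]


if __name__ == "__main__":
    customers = [{"id": 1, "name": "Ann", "city": "Leeds"}]
    prospects = [{"id": 2, "name": "Bob", "city": "York"}]
    combined = union(customers, prospects)
    print(select(combined, ["id", "name"]))
```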

Output

outputgroup.png

The output category has tools to output data to:

  • Databases (SQL Server)
  • Delimited files (tab-delimited, comma-delimited, etc.)
  • Spreadsheets

Workflow Canvas

Tools are dragged from the toolbox onto the workflow canvas, where they can be connected together:

workflowcanvas.png
