Microsoft's free fuzzy matching plug-in
If you’re on a PC and you’re using conventional Office, have we got a deal for you: a tool that helps you quickly find good matches using fuzzy logic, with parameters and options galore. This guide will get you started, using Lead importing and dupe-detection as an example…but this tool is good for any of the objects in SFDC.
The Fuzzy Lookup add-on is available for download here.
Step 1: Setup
Once you've downloaded the Fuzzy Lookup add-on, open Excel. If there is a ribbon for Fuzzy Lookup, continue to step 2. If not, activate the add-on by navigating to Options and Add-ons. (Note: different versions of Excel have slight variations in the UI/navigation sequence. If your version doesn’t match what you see here, Google your version for instructions on how to get these steps done.)
Once in options, navigate to Add-ins (1),
Check (2) to make sure the fuzzy lookup plugin is both installed and inactive, and
Use the dropdown in (3) to manage COM Add-ins.
Hit Okay and check to make sure the Fuzzy Lookup ribbon has appeared.
Step 2: Data input
Create four sheets in a new workbook. Name the first Master Data, the second Subject Data, the third Analysis and the fourth Duplicates.
The Master Data sheet will eventually hold your deduplicated data, the data that will be uploaded into Salesforce. Begin by taking a single data set that has been internally deduplicated and insert it into the Master Data spreadsheet. Add in two columns, one for Row Number and one for Divider. Leave the row number blank for now, include some visual aid in the divider column.
Repeat the process for Subject Data in a second worksheet. Next, normalize the columns of the Master Data with that of the Subject Data (if you’re lucky, they’re identical…but never assume that). In the example, the Subject data has the following headers:
And the Master has the following:
Notice the “Email” in the Master Data is actually a “Website” column, and in the Subject Data the “Website” column has phone numbers, and the “States” column has city names. Make sure the contents of the Master and Subject columns agree, as in this example:
Notice the headers are also in the same columns. If necessary, copy and paste columns in either data set to make sure the two worksheets agree. This will aid in the merge of the data sets later.
Step 3: Table Creation and Analysis
For both Master and Subject worksheets, select the entire data set (but none of the empty rows and including the Divider row) and input Ctrl-L to create a table. Select “My Table has Headers,” and hit OK.
Next, select the first row of the Row Number column, and input “=ROW(”. You can then select Column A Row 2, which should input the text “[@Names]” or something similar. Close the parentheses on the “=ROW(” formula and hit enter. Row numbers should appear in that column for the whole table. This will be useful for reference later.
Navigate to the Analysis tab and place your cursor on A1. Open the Fuzzy Lookup add-on from the ribbon.
Hit the x to clear the default configuration.
Set up a new configuration, matching the columns with the same name, minus the phone numbers and any data columns that were added to the Master sheet (i.e., are blank in Master), and click the configuration button to create the configuration. Do the same with the phone number columns. Select default configuration for the first and PhoneNumber configuration for the second, as in the screenshot below. There are further configurations for Zip Codes, which can be used if applicable.
Click GO to run the fuzzy matcher.
Step 4: Analyze data and dedupe
Select the Similarity header and navigate to Data > Filter. Select the chevron next to Similarity and filter out the 0 values.
After hitting Ok, the data with a similarity of 0 will be filtered. What will be left are the records that the system thinks are duplicated based on the parameters set.
The Similarity column shows the confidence the system has of those records being duplicates.
The left of the Divider row is the Master Dataset and the right is the Subject Dataset.
Merge the datasets from the right of the divider to the left of the divider. Once done, use the Row Number reference on the left of the divider to find the correct row in the Master data set and copy the merged data into the Master dataset in that location.
Then use the Row Number on the right of the divider to find the same row in the Subject dataset, and remove that row, shifting the other cells upward to close the gap and pasting the duplicate row into the Duplicates sheet. In the Subject Data sheet, delete the now-empty row to clear up the blank space.
Finally, copy the deduplicated data from Subject into the merged data of Master (notice the row numbers aren’t being copied).
You will now have a larger Master data set that has been both deduped and merged.
Clean up the Subject datasheet, deleting all the headers and data so it’s ready for a new table.
Repeat steps 2-4 for all the Subject Datasets you have, leaving the merged and deduped Master Data in place. When you run out of datasets, you will have deduplicated data in the Master Sheet that you can then insert into Salesforce using its built-in data Import Wizard.
Join the CIO Australia group on LinkedIn. The group is open to CIOs, IT Directors, COOs, CTOs and senior IT managers.