Data cleansing

Data cleansing is the upkeep of the information that individuals and businesses accumulate over time. Outdated and disorganized data is a serious problem for some Atlanta businesses, so we want to walk through the process and point you to the tools and resources currently on the market. So, let’s dive into data cleansing.

What Is It?

Data cleansing is the process of going through all of the data within a database and either removing or updating information that is incomplete, incorrect, improperly formatted, duplicated, or irrelevant.

Why Is It Important?

Eventually, information becomes outdated, and addresses, names, websites, emails, and other details need to be updated. If all your data is compiled in one place, organizing everything after years of piling up information could take an extensive amount of time. That’s why it’s important to perform data cleansing regularly.

How often you should cleanse depends on a variety of factors, such as how much information you have and how many different places it is stored. It’s also important not to cleanse too often, or you may spend time on it that your typical schedule needs for higher priorities.

Data Cleansing For Individuals

Credit card details, banking information, tax information, birth dates and legal names, mortgage information, and more can be stored in various files on your computer. All this information can become overwhelming very fast, making it increasingly difficult to find the most recent paperwork.

You may have to wade through dozens of old files before you find what you are actually looking for. Disorganization can lead to stress and to lost or damaged documents. Data cleansing ensures you keep only the most recent files and important documents, so you can manage your documents more efficiently. It also helps ensure that you do not have significant amounts of personal information on your computer, which can be a security risk.

Data Cleansing For Businesses

Businesses generally hold on to a lot of personal information, business info, employee info, and customer or client information. Accurate customer information helps you know your audience better and contact customers if needed. Having the newest, most accurate information will help you get the most out of your marketing efforts.

If your information is accessible, then your teams and employees don’t have to search through overcrowded archives, which makes better use of work hours and increases overall productivity. Unlike individuals, businesses must ensure that the personal information they store is kept safe and organized. Otherwise your business’s reputation could be at risk, and in extreme cases you and your business could be exposed to legal risk.

Data Cleansing Steps

Grab some coffee, this might take a while.

Analyze Data

Step one in data cleaning is to put all the data in one place and record it in a document such as a spreadsheet. You can’t assess your data if it’s in individual files spread across your computer; you want your data compiled in a single location so you can assess it as a whole. When making your assessment, you may want to keep certain questions in mind:

  • Does my data seem to make sense?
  • Are there any duplicates, and if so, is that okay?
  • Does numerical data add up and make sense?
  • Are there spelling errors or numbers where there shouldn’t be?
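As a rough illustration, some of the questions above can be checked automatically once the data sits in one place. The sketch below is not a complete audit; the sample records, the field names, and the `profile` helper are all assumptions made up for this example.

```python
from collections import Counter

# Hypothetical sample records; the field names are assumptions for illustration.
records = [
    {"name": "Ann Lee", "email": "ann@example.com", "age": "34"},
    {"name": "Bob Ray", "email": "", "age": "forty"},
    {"name": "Ann Lee", "email": "ann@example.com", "age": "34"},
]


def profile(rows, numeric_fields=("age",)):
    """Count duplicate rows, empty cells, and non-numeric values in numeric fields."""
    # A row counts as a duplicate if every field matches an earlier row exactly.
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    duplicates = sum(n - 1 for n in counts.values())

    # Empty or whitespace-only cells are treated as missing.
    missing = sum(1 for row in rows for value in row.values() if not value.strip())

    # Values in numeric fields that cannot be parsed as numbers.
    non_numeric = 0
    for row in rows:
        for field in numeric_fields:
            value = row.get(field, "").strip()
            if not value:
                continue
            try:
                float(value)
            except ValueError:
                non_numeric += 1

    return {"duplicates": duplicates, "missing": missing, "non_numeric": non_numeric}


report = profile(records)
print(report)
```

A report like this only tells you where to look. Whether a duplicate is acceptable is still a judgment call you make yourself.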

Data Cleaning

Before you start deleting data, it’s wise to create a copy of your spreadsheet and make any changes within the copy instead of the original. This protects you in case you make a mistake. The extra step is worth it for peace of mind and for ensuring your efforts have not created more problems than you started with.
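If your data lives in a file, the copy can be made with a few lines of script. This is a minimal sketch, assuming a single file on disk; the `make_working_copy` name and the `_working` suffix are invented for this example.

```python
import shutil
from pathlib import Path


def make_working_copy(original, suffix="_working"):
    """Copy a file next to the original and return the path of the copy."""
    source = Path(original)
    # For example, contacts.csv becomes contacts_working.csv.
    copy_path = source.with_name(f"{source.stem}{suffix}{source.suffix}")
    if copy_path.exists():
        # Refuse to overwrite an earlier working copy by accident.
        raise FileExistsError(f"{copy_path} already exists")
    shutil.copy2(source, copy_path)  # copy2 also tries to preserve file metadata
    return copy_path
```

Calling `make_working_copy("contacts.csv")` would leave the original untouched and give you a second file to edit.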

Data Functions

Addressing every single error manually will be extremely time consuming and tedious. Making use of functions and developing some skills in applications like Microsoft Excel will help you cleanse data more efficiently. When using functions, there are some other tips you may need to keep your work organized.

Unique IDs

A Unique ID column lets you restore data to its original order after sorting it multiple times. For instance, use column A as the unique identifier and fill it with consecutive numbers starting from 1. It may be simple, but it’s very effective. When you’ve put your Unique IDs into column A, go back to your original paper sheets and write the Unique ID there as well.
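The same idea can be sketched outside a spreadsheet. The rows and column names below are hypothetical, and `add_unique_ids` is a helper written only for this illustration.

```python
# Hypothetical rows; "name" and "city" are illustrative column names.
rows = [
    {"name": "Dana", "city": "Macon"},
    {"name": "Ali", "city": "Atlanta"},
    {"name": "Cory", "city": "Decatur"},
]


def add_unique_ids(data, start=1):
    """Return copies of the rows with a consecutive 'id' field added."""
    return [{"id": number, **row} for number, row in enumerate(data, start=start)]


with_ids = add_unique_ids(rows)

# Sort by name while checking the data...
by_name = sorted(with_ids, key=lambda row: row["name"])

# ...then sort by the ID column to get the original order back.
restored = sorted(by_name, key=lambda row: row["id"])
```

Because the IDs are assigned before any sorting happens, they record the original order no matter how many times the rows are rearranged afterwards.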

Manipulating Data

Use one column per variable. The variables are all the pieces of information that you are observing, measuring, counting and collecting, like age, gender, distance, temperature, etc., that can change as part of the study. Each variable should have its own column, and each column should correspond to just one piece of information. If you’re recording a composite variable made up of 2 or more constituent parts, like Body Mass Index (made up of Height and Weight), then record the parts in separate columns. You can always combine them into a single variable later.
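To show why separate columns are enough, here is a small sketch that derives Body Mass Index from height and weight after the fact. The column names and the units (metres and kilograms) are assumptions for this example, not something the data layout requires.

```python
# Hypothetical measurements stored one variable per column.
# Units are an assumption: height in metres, weight in kilograms.
people = [
    {"id": 1, "height_m": 1.60, "weight_kg": 64.0},
    {"id": 2, "height_m": 2.00, "weight_kg": 80.0},
]


def with_bmi(rows):
    """Return copies of the rows with a derived 'bmi' column added."""
    result = []
    for row in rows:
        # BMI = weight (kg) divided by height (m) squared.
        bmi = row["weight_kg"] / row["height_m"] ** 2
        result.append({**row, "bmi": round(bmi, 1)})
    return result


combined = with_bmi(people)
```

The original height and weight values stay in their own columns, so the derived value can be recomputed or dropped at any time.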

Use row 1 for the variable names. Eventually you’ll need to analyse your data, and you may need to export it to a statistical program. The standard for pretty much all commercial stats programs is for the first row to be reserved for the names of the variables and all other rows for the data.
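Many scripting tools follow the same first-row convention. As a small sketch, Python’s standard `csv.DictReader` treats the first row as the field names; the exported text and its column names below are made up for illustration.

```python
import csv
import io

# Hypothetical export: row 1 holds the variable names, the rest holds data.
export = io.StringIO(
    "id,age,distance_km\n"
    "1,34,5.2\n"
    "2,41,3.8\n"
)

reader = csv.DictReader(export)  # uses the first row as the field names
variable_names = reader.fieldnames
data_rows = list(reader)
```

Each entry in `data_rows` is keyed by the names from row 1, and the values arrive as text that you convert to numbers where needed.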

Every cell should be filled, even if only with placeholder information. It is quite common to use ‘illegal’ numbers as codes to give you information, so where the entries for a variable can only be positive values, we can use negative numbers as codes. If negative numbers aren’t useful, then use letters a, b, c, etc. If you are working in Excel, here are some functions that could help you manage your data.

Excel Functions for Data Analysis

1. Concatenation: =CONCATENATE(Cell 1, Cell 2) – Combines the contents of different cells into one cell.

2. Len: =LEN(A1) – Returns the number of characters in a cell. Use LEN when creating title tags or descriptions that have character limits.

3. Index Match: =INDEX(B:B,MATCH(A1,A:A,0)) – Finds the row in column A that matches the value in A1 and returns the value from the same row of column B. A simplified explanation of the function: =INDEX(Column of Data You Want to Return,MATCH(Common Data Point You Are Trying to Match,Column of Other Data Source That Has Common Data Point,0))

4. Logic Functions: =IF(logical_test,value_if_true,value_if_false) – Simple explanation from Excel: IF(Something is True, then do something, otherwise do something else)

=AND(logical1,logical2,…) – Use this with the IF function to require that two or more logic rules are true together.

=OR(logical1,logical2,…) – Use this with the IF function when any one of several scenarios should count.

=SUMIF(range,criteria,sum_range) – Sums the cells that meet a specified criterion. The criteria range and sum range can be different, but they must be the same size.

=SUMIFS(sum_range,criteria_range1,criteria1,…) – Sums one range based on multiple specified criteria. Each criteria range can be different from the sum range. Note the difference in syntax from SUMIF: the sum range comes first.

=AVERAGEIF(range,criteria,average_range) – Averages the cells that meet a specified criterion. The criteria range and average range can be different, but they must be the same size.

=AVERAGEIFS(average_range,criteria_range1,criteria1,…) – Averages one range based on multiple specified criteria. Each criteria range can be different from the average range. Note the difference in syntax from AVERAGEIF: the average range comes first.

=COUNT(value1,value2,…) – Counts number of cells with numbers in them.

=COUNTA(value1,value2,…) – Just like COUNT, but counts every cell that is not empty, including cells that contain text.

=COUNTBLANK(range) – Returns count of blank cells.

=COUNTIF(range,criteria) – Returns count of cells that follow a specified rule.

=COUNTIFS(criteria_range1,criteria1,…) – Returns count of cells that follow multiple specified criteria.

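5. IfError: =IFERROR(value,value_if_error) – Returns value_if_error when the first argument results in an error; otherwise returns the result of the first argument. For example:

=IFERROR(INDEX(B:B,MATCH(A1,A:A,0)),"No Data")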
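If you ever move the same checks out of Excel and into a script, a few of the functions above have simple counterparts. The sketch below uses plain Python lists; the helper names and sample columns are assumptions made for illustration, and the helpers cover only exact-match criteria. Unlike Excel, these text comparisons are case sensitive.

```python
# Hypothetical two-column sheet: column A holds keys, column B holds values.
column_a = ["apple", "pear", "apple", "plum"]
column_b = [3, 5, 4, 7]


def index_match(key, keys, values, default="No Data"):
    # Similar to IFERROR(INDEX(B:B,MATCH(key,A:A,0)),"No Data"):
    # return the value next to the first exact match, or a default.
    for candidate, value in zip(keys, values):
        if candidate == key:
            return value
    return default


def sum_if(keys, criterion, values):
    # Similar to SUMIF(range,criteria,sum_range) with an exact-match criterion.
    return sum(value for k, value in zip(keys, values) if k == criterion)


def count_if(keys, criterion):
    # Similar to COUNTIF(range,criteria) with an exact-match criterion.
    return sum(1 for k in keys if k == criterion)


def average_if(keys, criterion, values):
    # Similar to AVERAGEIF(range,criteria,average_range).
    # Returns None when nothing matches, where Excel would show an error.
    matched = [value for k, value in zip(keys, values) if k == criterion]
    return sum(matched) / len(matched) if matched else None
```

For example, `index_match("pear", column_a, column_b)` looks up the value beside "pear", and a key that is not present falls back to "No Data" in the same way the IFERROR formula does.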

Data Cleansing Software

Another option is to use data cleansing software if you are unsure of your ability to do it yourself or lack the time. Below is a list of some of the software available on the market, with a brief description of each.


OpenRefine

OpenRefine is a standalone open source application for cleaning data and transforming it into other formats, often referred to as wrangling.

Trifacta Wrangler

Trifacta is a data wrangling tool that allows you to analyze, compute, and manipulate large data sets quickly and effectively.


Drake

Drake is a simple-to-use, text-based data wrangling tool. It is designed especially for data workflow management.

TIBCO Clarity

TIBCO Clarity is the data cleaning component of a larger branch of TIBCO software. It is a single point of contact for massive and disruptive data sources.


WinPure

WinPure is a cheaper alternative to some of the larger data cleaning packages and is designed for the basic user, not just IT professionals.

Data Ladder

Data Ladder is a worldwide data tool set that strives to deliver quality in its product. It is a great resource for enterprise-sized users.

Quadient Data Cleaner

Quadient Data Cleaner is a data profiler that aims to quantify the quality of your data and clean it accordingly.


Cloudingo

Cloudingo mostly aims to target and clean Salesforce data and eliminate those pesky duplicates. In addition, it does all of the traditional data cleaning procedures.


Reifier

Reifier connects information sources and provides a view through which GDPR service requests can be easily handled.

IBM InfoSphere QualityStage

IBM InfoSphere QualityStage is mostly for enterprise clients that need a good clean up across multiple interfaces and servers.

For more tips and tricks for Atlanta businesses, make sure to check out our Facebook advertising strategy article!