• Zev Isert

What is data quality and why is it important?

Updated: Mar 2, 2021

In today’s business landscape, where AI and analytics dominate budgets and businesses use data to back up every decision, data quality is paramount. There is a reason the phrase “garbage in, garbage out” has become commonplace in the vocabulary of CEOs and data scientists: low-quality data can cause data-driven solutions to fail. But before we get into the issues that low-quality data brings, we should first talk about what data quality actually is.

Although certain aspects of data quality are dependent on industry, there are some universal requirements.

Generally, high quality data is:

  1. Complete. Missing values are removed or altered to a domain-specific default.

  2. Unique. Duplicate entries are removed or consolidated into an accurate representation.

  3. Timely. Old, redundant records are removed.

  4. Valid. Records should make sense in the context of the rest of the dataset and the column-specific constraints.

  5. Accurate. Entries should be accurate and not misreported, whether via a human error or a technology error.

  6. Consistent. Record entries and their metadata should be consistent to allow for ease of processing.

Overall, high quality data is clean, easy to process, and accurately represents the underlying system.
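To make these criteria a little more concrete, here is a minimal Python sketch of how the first four might be checked automatically. Everything in it is an assumption made for illustration: the sample records, the field names (`id`, `email`, `age`, `updated`), the one-year staleness threshold, and the 0 to 120 age range are hypothetical, and real rules would depend on your industry and dataset. Accuracy and consistency are left out because they usually require a trusted reference source or agreed-upon formats to compare against.

```python
from datetime import date

# Hypothetical customer records; the field names are illustrative only.
records = [
    {"id": 1, "email": "ana@example.com", "age": 34, "updated": date(2021, 2, 1)},
    {"id": 2, "email": None, "age": 41, "updated": date(2021, 1, 15)},
    {"id": 1, "email": "ana@example.com", "age": 34, "updated": date(2021, 2, 1)},
    {"id": 3, "email": "bo@example.com", "age": -5, "updated": date(2015, 6, 30)},
]


def quality_report(rows, today, max_age_days=365):
    """Count the rows that fail each of four simple quality checks."""
    seen = set()
    report = {"incomplete": 0, "duplicate": 0, "stale": 0, "invalid": 0}
    for row in rows:
        # Complete: no field may be missing.
        if any(value is None for value in row.values()):
            report["incomplete"] += 1
        # Unique: the same (id, email) pair should appear only once.
        key = (row["id"], row["email"])
        if key in seen:
            report["duplicate"] += 1
        seen.add(key)
        # Timely: records older than the threshold count as stale.
        if (today - row["updated"]).days > max_age_days:
            report["stale"] += 1
        # Valid: age must fall inside an assumed plausible range.
        if row["age"] is not None and not 0 <= row["age"] <= 120:
            report["invalid"] += 1
    return report


report = quality_report(records, today=date(2021, 3, 2))
print(report)
```

A report like this only counts problems. Deciding whether to remove, merge, or correct the flagged records is still a domain-specific judgment.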

Now that you know what data quality is, you must understand why it is important. Here are five reasons you should value high quality data in all aspects of your business.

1. Marketing

It can be difficult to ensure the accuracy of customer data, whether you collected it yourself or bought it from a third party. Using a high-quality dataset in your marketing initiatives will allow you to understand your customers far better and to create branding and content that speaks to them.

2. Competitive Advantage

Having better data than competing brands will offer you a competitive advantage. According to The Economist, “the world’s most valuable resource is no longer oil, but data”. Having a higher-quality dataset than your competition will allow you to succeed in an economy based on the insights data offers.

3. Confidence

As many businesses build data-driven decision making into their operations, it is important that leadership can actually trust the data they are using to make decisions. According to Forbes, “84% of CEO’s are concerned about the quality of the data they’re basing their decisions on”. Investing in high-quality data will allow leadership to be confident that they are basing their high-stakes decisions on the correct information.

4. Workplace Productivity

Many data scientists and knowledge workers spend a significant amount of time dealing with errors and inconsistencies in data. According to CrowdFlower, “60% of data scientists spend most of their time cleaning and labelling data. 57% said it was the least enjoyable thing they do.” By investing in tools and processes that ensure high-quality data, you will give your data science team more time for essential tasks, rather than having them spend the majority of their time on menial data cleaning.

Harvard Business Review has also identified inaccurate, low quality data as a barrier to workplace productivity. “Studies show that knowledge workers waste up to 50% of time hunting for data, identifying and correcting errors and seeking confirmatory sources for data they do not trust.” By implementing practices that lead to high quality data, your organization will be able to make massive productivity gains, as workers will be able to spend their time on activities of greater importance than constantly rectifying errors in data.

5. Avoiding Reputation Loss

Decisions based on low-quality, dirty data can lead to bad outcomes that damage your reputation. For example, Amazon trained an experimental hiring algorithm on historical résumé data that came mostly from men, and the resulting tool was reported to be biased against women. As an organization, you want to avoid embarrassing situations like this, and implementing a strategy that values high-quality data is an effective way to do so.