The titans of industry are engaged in AI. They have the financial and personnel resources to keep their data in check – governance, master data management, and quality.

For most companies, however, this is not the case. Much of their data is disparate and siloed, its quality is highly suspect, and they operate without a defined master data or data governance strategy. To implement AI or other advanced technologies, data preparation is Step #1.

At a recent AI conference, I was surprised to learn that 60% of a data scientist’s time is spent on data wrangling and cleansing tasks. That reinforces the fact that companies have traditionally not allocated enough resources to prepare for AI implementations.

Think about it this way…

When teaching a child to read, a parent explains how words and grammar work and that there is an underlying structure to them. The child will learn that ‘cup’ is spelled ‘C-U-P’ and that a period ends a sentence.

Now, imagine the book’s words were jumbled and misspelled, the grammar was incorrect, and the pages were out of order.

Even though you could arduously teach the child to read that book, it may not always make sense, and the resulting understanding could be flawed. Now, what if it wasn’t a child and a book, but rather a program attempting to interpret ungoverned data while operating life support equipment? Would you trust it with your life?

A parent teaching a child to read could be considered a form of supervised learning, one of the main branches of machine learning (ML). Over time, the child’s accuracy and confidence will continue to increase based upon recognition of words, common phraseology, etc. At first, the child may pronounce words incorrectly, but over time and with correction from the parent, the child will make fewer mistakes. However, if the parent teaches the child incorrectly because they were taught incorrectly (their data and governance practices are bad), then the resulting conclusions of the child will be wrong.
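
To make that last point concrete, here is a minimal sketch in Python. Everything in it is a made-up illustration rather than a real system: the toy data, the function names `fit_threshold` and `accuracy`, and the cut-off values 50 and 70 are all assumptions. The "child" is a very simple learner that picks whichever cut-off best reproduces what its "parent" told it. One parent labels the examples correctly; the other applies a rule they were taught incorrectly.

```python
def fit_threshold(xs, labels):
    """Pick the cut-off t that best reproduces the teacher's labels.

    The learner predicts 1 when x >= t and 0 otherwise.
    """
    best_t, best_errors = None, len(xs) + 1
    for t in range(0, 101):
        # Count how often this cut-off disagrees with the teacher.
        errors = sum((x >= t) != bool(y) for x, y in zip(xs, labels))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


def accuracy(t, xs, truth):
    """Share of examples the learned cut-off gets right against the real answers."""
    return sum((x >= t) == bool(y) for x, y in zip(xs, truth)) / len(xs)


xs = list(range(100))

# Hypothetical ground truth: the correct answer is 1 when x >= 50.
truth = [int(x >= 50) for x in xs]

# A teacher who was taught the wrong rule labels with a cut-off of 70 instead.
bad_labels = [int(x >= 70) for x in xs]

good_t = fit_threshold(xs, truth)      # learner taught with correct labels
bad_t = fit_threshold(xs, bad_labels)  # learner taught with incorrect labels

print("taught correctly:  ", good_t, accuracy(good_t, xs, truth))
print("taught incorrectly:", bad_t, accuracy(bad_t, xs, truth))
```

The learner does its job perfectly in both cases: it reproduces exactly what it was taught. The second learner is still wrong on every example between 50 and 69, because the mistake was in the labels and not in the learning.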