
Who should be responsible for your data? The knowledge scientist


How can you build a data-driven culture and spur digital transformation without thinking through who should be responsible for your data? Let’s do that together.

Data engineers and data scientists each occupy critical roles. Data engineers manage the data infrastructure and are in charge of designing, building, and integrating data workflows, pipelines, and ETL (extract, transform, load) processes. Their goal is to provide data for data scientists’ analysis. Data scientists turn data into insights by applying statistics, machine learning, and analytical approaches. Their goal is to answer critical business questions.

Data-driven organizations require reliable, clean data to function. Without it, your AI, machine learning, and analytics are worthless. Unreliable, erroneous, and incomplete data leads to answers that can’t be trusted; hence the saying “garbage in, garbage out.”

Wrangling and cleaning data is therefore crucial, and it is often said to make up 80% of a data scientist’s work. Typically, it is seen as boring, annoying grunt work that people don’t want to do.

However, I think this negative view is at least partly based on a major underappreciation of the significance of such work. Data wrangling and cleaning is not simply about eliminating white spaces, replacing wrong characters, and normalizing dates. Stepping back, these tasks should be viewed in the context of two key objectives:

  1. Understanding the ecosystem of people, data, and tasks in an organization
  2. Communicating and documenting that knowledge in order to generate clean and reliable data

Yes, data wrangling and cleaning can take 80% of a data scientist’s time and energy. This does not mean that 80% is wasted. While these tasks can and should be optimized for efficiency, they are part of the vital knowledge work that should be elevated within a data-driven organization. But who should be doing it?


