Shedding Light Into Dark Data

Karl Aguilar
Dec 21, 2022

With so much data moving across networks, and even more sitting in countless physical and online storage systems and servers, one is bound to stumble upon data that has been left unused and “abandoned” in the vastness of cyberspace. Such data is called “dark data.”

What comprises dark data

All unused, unknown, and untapped data across an organization is considered dark data. It is generated by users’ daily interactions with countless devices and systems online, at a volume that makes it inevitable that much of it gets overlooked. This includes everything from machine data and server log files to unstructured data and data derived from social media.

Dark data can also arise when organizations come to treat their data as obsolete, incomplete, redundant, or locked in a format they cannot access with the tools at their disposal. This lack of resources is a critical challenge, as it is estimated that 12% of dark data is business-critical.

Ultimately, while some of the dark data floating around is considered to be of little or no value to at least one organization or department, it can be highly valuable to another.

How much dark data is out there

While the presence of dark data should not come as a surprise given the wealth of data currently in cyberspace, the sheer amount of it is surprising and, to businesses, perhaps a little concerning. A recent IBM study estimates that over 80% of the data in cyberspace is dark data.

This represents significant untapped potential for at least a portion of that dark data, especially if organizations have the proper tools and knowledge to harness it.

Indeed, dark data may be one of an organization’s biggest untapped resources. Data is increasingly a major organizational asset, and competitive organizations will need to tap into its full value. Further, more stringent data regulations may necessitate complete management of an organization’s data.

Ways to harness dark data

Data capture is the first step in harnessing dark data, but it is arguably also the most difficult. In this step, it is important to know exactly what to look for and where to look, without modifying your systems or deploying an intrusive agent to capture the data.

As such, harnessing dark data is a multi-step process that requires engineering and data science expertise, along with tools, such as a minimal software toolkit, that support three key steps:

Data discovery: Gain visibility into the enterprise’s data landscape and identify useful information for further analysis.
Data classification: Identify the value of a dataset, how it can be useful, and possible security concerns, among others. AI algorithms (e.g., ML, NLP) are helpful in organizing large amounts of data into relevant categories.
Data quality management: Implement a policy-based data quality management procedure to facilitate decisions on how to clean each dataset to maximize its value and/or minimize storage costs. Video and sound analytics, computer vision, ML, and advanced pattern recognition software can be helpful in achieving this objective.
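
To make the first two steps more concrete, here is a minimal sketch in Python of what a read-only discovery and classification pass might look like. It is only an illustration under stated assumptions: "dark" is approximated as "not modified in a given number of days," and the category is guessed from the file extension alone. The names discover_dark_files, summarize, Candidate, and the CATEGORIES mapping are hypothetical and do not come from any particular product. A real toolkit would likely inspect file contents and apply ML or NLP models rather than rely on extensions.

```python
import os
import time
from collections import Counter
from dataclasses import dataclass

# Hypothetical mapping from file extension to a coarse category.
# A real classifier would look at content, not just the extension.
CATEGORIES = {
    ".log": "machine/log data",
    ".csv": "structured data",
    ".json": "structured data",
    ".txt": "unstructured text",
    ".pdf": "unstructured text",
}


@dataclass
class Candidate:
    path: str
    size_bytes: int
    idle_days: float
    category: str


def discover_dark_files(root, min_idle_days=365, now=None):
    """Walk `root` read-only and return files untouched for `min_idle_days`.

    Assumption: "dark" is approximated by last-modified time. Nothing is
    written, moved, or deleted, so the scanned system is not modified.
    """
    now = time.time() if now is None else now
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            idle_days = (now - info.st_mtime) / 86400  # seconds per day
            if idle_days >= min_idle_days:
                ext = os.path.splitext(name)[1].lower()
                category = CATEGORIES.get(ext, "unclassified")
                found.append(Candidate(path, info.st_size, idle_days, category))
    return found


def summarize(candidates):
    """Total bytes of candidate dark data per category."""
    totals = Counter()
    for c in candidates:
        totals[c.category] += c.size_bytes
    return dict(totals)
```

Calling summarize(discover_dark_files("/some/share")) on a directory of your choosing would give a rough per-category picture of how much idle data is sitting there, which can then feed the quality management decisions described in the third step.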

Dark data can reveal valuable insights that can contribute to the growth of the enterprise. Spending considerable time and resources to gather and interpret dark data within the network is a challenging but worthwhile investment.
