What does Dark Data mean? We use this term on our website as if its meaning were obvious. In the most basic terms, Dark Data is information that is difficult to find; in effect, it is sitting in the dark. But there is more to it than that when you apply the term to different industries.
I often use the image of Dark Matter, because I come from a scientific background and it seems like a natural comparison. The existence of Dark Matter in the universe is inferred from its gravitational effects, yet it is very difficult to observe directly. While I was studying Physics in college, one of my professors was researching Dark Matter, so the term has stayed fresh in my mind.
I’ve noticed that, since I started using the term in 2010, a number of people have written about Dark Data and compared it to Dark Matter without mentioning where they first encountered these terms. Here’s a timeline, collected over the last 10+ years, that shows who should be given credit for this term.
- 1981, the New England Journal of Medicine published a Harvard study on failed scientific experiments. The author (as yet unidentified) warned of the risk that scientists would hide their research when the results were not what they wanted. This practice holds back valuable data that the medical industry could use to better understand the true statistical results of a drug being tested. Scientists basically put their lab reports in a drawer and refrained from publishing the results. The author labeled this data “Dark Data”. I believe this is the first use of the term in relation to the Medical and Pharmaceutical industry, or any other industry for that matter. (The exact page of the article has not yet been located at the provided link.)
- 2005, Paul Chin wrote an “Intranet Journal” entry titled “Working With an Organization’s Dark Data”, about Dark Data in corporate networks. While he did mention intentional and unintentional hiding of computer files, his focus was on the Document Management & Information Governance industries.
- 2008, Malcolm Chisholm wrote about Dark Data, comparing it to Dark Matter, in the context of the Information/Data Governance and Document/Data Management industries. I believe he was the first to make the comparison with Dark Matter in these industries.
- 2010-06-09, I posted a presentation titled “Dark Data In Live Forensics” that I had been giving at events. After I saw the term “Dark Data” used in relation to scientific results, I started using it in my presentations and naturally saw the correlation to Dark Matter. At that time, I was unaware of the previous articles. While I wasn’t the first to use “Dark Data” and “Dark Matter” in relation to data, I believe I was the first to “coin the term” for Computer/Digital Forensics & Electronic Discovery.
- 2011-02-04, I wrote a blog entry titled “Dark Data Is Invading Our Lives”, presenting the prior research listed in this timeline.
- 2017-02-07, Deloitte (no author listed) posted an article titled “Dark analytics: Illuminating opportunities hidden within unstructured data”, which opens with the term “dark analytics” and later uses “Dark Data” as if it were already defined. Does this mean that by 2017 Deloitte clients (and prospects) knew this as a standard term? The article explains the difficulty of mining unstructured files and the importance of doing this mining in the Deep Web. Maybe this was the first mention of Dark Data in relation to the Deep Web?
- 2017-09-28, Sony Shetty (Gartner) posted an article titled “How to Tackle Dark Data”, describing the need to deal with Dark Data as a result of “increased data growth” in unstructured files in an enterprise/corporate environment. Basically, it describes the need for Document Management and Information Governance. It provides its own definition, “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes”, but gives no credit to the earlier writers who had already defined this term.
I keep finding later articles claiming that Gartner coined the term “Dark Data” for every industry. Is that what happens? Do the big companies pretend that they came up with everyone else’s ideas first? Before I write a new blog entry, I perform background research to learn more about my topic, collect references, and make sure that I’m not plagiarizing someone else’s work. Hopefully, some of the writers I am referring to will learn to do a better job of that. I realize that many more “Dark Data” articles are being written now, as this has become a popular term among marketing professionals. My intent here is to call attention to the first people to use this term in each industry, so that future writers can more easily get their facts straight.
While I may occasionally touch on higher-level aspects of Dark Data in this blog, I will primarily focus on the technical details of finding Dark Data in unstructured files, file objects, data fields, and the bytes & bits, for use in Computer/Digital Forensics and steganography.
If you have any additional references to add, especially ones that predate the entries in my timeline, feel free to share them in a comment below.