As published in Digital Insurance, “Navigating Breach Recovery Costs to Slash Inefficiencies”
Data mining involves programmatically searching and manually reviewing data. In incident response, it’s crucial for identifying the scope and impact of compromised sensitive data, ultimately determining the notification list.
As if the crisis of the cyber event itself weren’t bad enough, common misconceptions about data mining can set you back even further. Understanding the principles of data mining lets you set better expectations for the process and even prepare for an (inevitable) cyber event.
Not all data is created equal, nor is its mining linear. Just as files and their data vary widely in type, complexity, and context, so does the process of mining them. It is exactly this variance that becomes an unpleasant surprise for those experiencing a cyber incident and involved in the data mining response that follows. In concrete terms: just because one person can review one file in one minute does not mean that a team of 700 people can review one million files (1,008,000 to be exact) in one day.
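The arithmetic behind that figure can be sketched in a few lines of Python. The 700 reviewers and the one-minute file come from the example above; the mix of review times is a purely hypothetical assumption for illustration, not measured data, and the calculation assumes round-the-clock review, as the 1,008,000 figure itself does.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes, assuming nonstop review

def linear_capacity(reviewers: int, minutes_per_file: float = 1.0) -> int:
    """Files a team could review in one day if every file took the same time."""
    return int(reviewers * MINUTES_PER_DAY / minutes_per_file)

def average_minutes(mix: dict[float, float]) -> float:
    """Average review time for a mix of {minutes_per_file: share_of_files}."""
    return sum(minutes * share for minutes, share in mix.items())

def reviewers_needed(files: int, avg_minutes: float) -> float:
    """Reviewers needed to finish `files` in one day at the given average."""
    return files * avg_minutes / MINUTES_PER_DAY

# Hypothetical mix: most files are quick, a few are slow (scans, contracts).
HYPOTHETICAL_MIX = {1.0: 0.70, 5.0: 0.25, 30.0: 0.05}

if __name__ == "__main__":
    files = linear_capacity(700)
    print(f"Linear assumption: {files:,} files per day")  # 1,008,000
    avg = average_minutes(HYPOTHETICAL_MIX)
    print(f"Average minutes per file in the mix: {avg:.2f}")
    print(f"Reviewers needed for the same files: {reviewers_needed(files, avg):.0f}")
```

Even a small share of slow files pulls the average well above one minute, so the headcount needed for a one-day review grows far beyond the linear estimate.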
Many factors contribute to the file/review time ratio. Any given data store holds hundreds of file types, thousands of languages, and endless variations of file formats. The content of the files matters too: legal, financial, and administrative files, to name a few, each call for different handling. Complexity introduced by the context of the data merits an equally varied data mining response. Lastly, the state of the file, for example a low- or high-resolution image, adds another factor to the review process.
These variations are endless, and each added variety gives less reason to expect the file/time ratio in data mining to follow a pattern, least of all a linear one.
To steal a line from Tom Hanks in Forrest Gump, “Life is like a box of chocolates, you never know what you’re gonna get.” You don’t actually know what’s in your data until you look at it closely, as if under a microscope. Across hundreds of engagements, we have often come across clients who assume they know what’s in their drives and data. That couldn’t be farther from the truth. Perhaps it’s an inclination towards optimism (!), but victims of exfiltration too often assume the accessed data did not contain consequential sensitive information. Mining that data proves otherwise, revealing thousands of highly sensitive data elements that were accessed by the threat actor.
You just can’t control what employees save. Whether work-related or personal, all sorts of data accumulate in a business, on purpose or by accident. In addition, mistakes happen simply because we’re human. For example, a folder meant to hold one type of information is accidentally set as the download destination for a vastly different type of information. Files can be misplaced or copied by mistake from one location to another. Even the most meticulous operations present data surprises because humans are involved. Never assume that you know your data unless you’ve actually mined it file by file.
Accuracy is the name of the game in data mining and its resulting report. The greatest risk in any data mining process is allowing human interpretation to drive the findings. The human mind is incredibly biased, and our perceptions guide how we see reality. When reviewing data to identify sensitive elements, human reviewers inadvertently allow their interpretation to skew what ought to be objective extraction.
There is no room for interpretation in data mining for cyber incident response. Counsel and the victim organization need data extraction that is direct, objective, and, where there is doubt, as close to the truth as possible. That is why automated extraction that uses machine learning is far superior to manual review.
In the absence of R&D efforts or talent, many mistake manual review for a more meticulous alternative, but they couldn’t be more mistaken. For example, a human reviewer might see “Tom” when the characters in the electronic file are immutably “Tim”, or read “Neil” when the file says “Niel”. The final report delivered from a post-breach data mining engagement has no place for interpretation; it calls for extraction. Automated data mining is the most objective method and yields the most accurate results.
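The difference between extraction and interpretation can be illustrated with a minimal sketch. It uses a plain regular expression as a simple stand-in for the machine learning extraction described above, not the actual method, and the `extract_after_label` helper and sample record are hypothetical.

```python
import re

def extract_after_label(text: str, label: str) -> list[str]:
    """Return the exact characters that follow each 'label:' in the text."""
    pattern = re.compile(rf"{re.escape(label)}:\s*(\S+)")
    return pattern.findall(text)

# Hypothetical record containing spellings a human might "correct" by eye.
record = "First name: Tim\nFirst name: Niel"

print(extract_after_label(record, "First name"))  # ['Tim', 'Niel']
```

The code copies the stored characters verbatim, so unusual spellings such as “Niel” reach the report exactly as they appear in the file.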
Data volumes and complexity continue to grow exponentially. With cyber incidents increasing at the same pace, data mining best practices are the only effective way to respond and produce the notification list.