The world was awed two years ago when IBM’s Watson defeated Jeopardy! champions Brad Rutter and Ken Jennings. Watson’s brilliant victory reintroduced the potential of machine learning to the public. Ideas flowed, and now this technology is being applied practically in the fields of healthcare, finance and education. Emulating human learning, Watson’s success lies in its ability to formulate hypotheses using models built from training questions and texts.
Three years ago, Army Private First Class Bradley Manning leaked massive amounts of classified information to WikiLeaks and brought to public awareness the significance of data breaches. In response to this and several other highly publicized data breaches, government committees and task forces established recommendations and policies, and invested heavily in cyber technologies to prevent such an event from reoccurring. Surely, we thought, if anyone had the motivation and resources to get a handle on the insider threat problem, it is the government. But, Edward Snowden, who caused the recent NSA breach, has made it painfully obvious how impotent the response was.
Lest we assume this is a just government problem, enormous evidence abounds showing how vulnerable commercial industry is to the insider. We are inundated with a flood of articles describing how malicious insiders have cost private enterprise billions of dollars in lost revenue, so why has no one offered a plausible solution?
The insider threat remains an unmitigated problem for most organizations, not because the technologies do not exist, but rather because the cyber defense industry is still attempting to discover the threat using a rules-based paradigm. Virtually all cyber defense solutions in the market today apply explicit rules, whether they are antivirus programs, firewalls with access control lists, deep packet inspectors, or protocol analyzers. This paradigm is very effective in defending against known malware and network exploits, but fails utterly when confronted with new attacks (i.e. “zero-days”) or the surreptitious insider.
In contrast, acknowledging that it was impossible to build a winning system that relied on enumerating all possible questions, IBM designed Watson to generalize and learn patterns from previous questions and use these models to hypothesize answers to novel questions. The hypothesis with the highest confidence was selected as the answer.
Like Watson, an effective technology to detecting the insider must adaptively learn historical network patterns and then use those patterns to automatically discover anomalous activity. Such anomalous traffic is symptomatic of unauthorized data collection and exfiltration.
Inspired by the WikiLeaks incident, Sphere’s R&D team has investigated machine learning algorithms that construct historical models by grouping users by their network fingerprints. As an example, without any rules or specifications, the algorithms learn that bookkeeping applications transmit a distinctive pattern that enables grouping accountants together, and HR professionals are grouped by the recruiting sites they visit. These behavioral models generalize normal activity and can be used as templates to detect outliers. While users commonly generate some outliers, suspicious users deviate significantly from their cohorts, such as the network administrator that accesses the HR department’s personnel records. Like Watson, the models allow the system to form hypotheses.
Applied to cyber security, every time an entity accesses the network, the algorithms hypothesize if the activity conforms to its model. If it does not conform, that activity is labeled an outlier. Because these methods use a statistical confidence that dynamically balances internal thresholds on network activities (e.g., sources and destinations, direction and amount of data transferred, times, protocols, etc.), it becomes extremely hard for a malicious insider to outsmart. Simply the fact that the system does not reveal its thresholds can have a significant deterrent effect.
A paradigm shift in cyber technologies is happening now. Cyber security professionals agree that preventing data breaches from a malicious insider is a difficult task, and the past suggests that next major breach will not be detected with existing rules-driven cyber defense solutions. Next generation cyber security technology developers must seek inspiration from IBM’s Watson and other successful implementations of machine learning before we can hope to prevail against the insider threat.