PART 2 of 3
Artificial intelligence is a collection of technologies, such as advanced analytics, expert systems, neural networks, machine learning, and more, which are used to drive everything from medical diagnosis systems to natural language processing to cybersecurity. AI is also a marketing term, beloved by vendors, which can be used to simultaneously educate and obfuscate.
One of the most powerful AI techniques used today in both cyber and non-cyber contexts is machine learning, which we shall explore here. (For a broader introduction to AI, and an examination of AI-based analytics and expert systems, see part 1, “It’s a Marketing Mess! Artificial Intelligence vs Machine Learning.”)
What is machine learning? It’s a type of artificial intelligence that can discern patterns based on its own examination of raw data. Let’s turn to a use case provided by Scott Scheferman, Director of Consulting for Cylance. With this and the other use cases in this article, his goal is to paint a general, non-InfoSec picture of how each term is commonly used – in short, to make the term relatable.
Non-security example use case: By leveraging machine learning, we can now predict the gene targets of enhancers (fragments of non-coding DNA) accurately enough to link mutations in enhancers to the genes they target – the first step toward using these connections to treat diseases. (reference) Another startup, Deep Genomics, is using machine learning, genome biology, and precision medicine to invent a new generation of computational technologies that can predict what will happen within a cell when DNA is altered by genetic variation.
From the Gartner Magic Quadrant for Endpoint Protection Platforms: Algorithmic techniques (such as machine learning) are based not on a database or list of known (good or bad) artifacts, but on a computational method that incorporates characteristics of known good and bad. Machine learning discovers a detection equation, based on predefined datasets (known good and known bad), and it is the equation (not a database traversal) that determines the probability that a new event is good or bad. Cylance and Deep Instinct are representative vendors of this trend toward algorithmic approaches to file detection.
Wikipedia offers this description of a related term, supervised learning: the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples.
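The labeled-pairs idea above – and Gartner's point that machine learning discovers an equation rather than traversing a database – can be sketched with the simplest supervised learner, a perceptron. This is a toy illustration, not any vendor's actual method; the "file features" (entropy, packed flag) and labels below are invented.

```python
# A minimal supervised-learning sketch: a perceptron infers a decision
# function (weights w, bias b) from labeled (feature vector, label) pairs.
# The learned equation, not a lookup table, classifies new samples.

def train_perceptron(examples, epochs=20, lr=0.1):
    """Learn w and b so that sign(w.x + b) matches the training labels."""
    n = len(examples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, label in examples:           # label: +1 good, -1 bad
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
            if pred != label:               # update weights only on mistakes
                w = [wi + lr * label * xi for wi, xi in zip(w, x)]
                b += lr * label
    return w, b

def classify(w, b, x):
    """Apply the learned equation to a new, unseen sample."""
    return "good" if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else "bad"

# Invented toy training set: [entropy, packed_flag] -> label
training = [
    ([0.2, 0.0], +1), ([0.3, 0.0], +1),    # known-good files
    ([0.9, 1.0], -1), ([0.8, 1.0], -1),    # known-bad files
]
w, b = train_perceptron(training)
print(classify(w, b, [0.25, 0.0]))  # → good
```

The point of the sketch is that after training, the dataset can be thrown away: only the inferred function remains, which is what distinguishes this approach from signature databases.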
Igor Baikalov, Chief Scientist of Securonix, explains: “Supervised learning is a subset of machine learning. Unsupervised learning doesn’t need a list of known good and known bad, and utilizes techniques such as clustering, anomaly detection, and principal component analysis to learn hidden patterns in data.”
Carson Sweet, co-founder and CTO of CloudPassage, adds: “Machine learning has been used for some time in the security and anti-fraud industries for things like anomaly detection and discovery of aberrant machine and human behaviors. The broad and easy availability of compute power – thanks to cloud computing models – means security practitioners can do more with analytics since they’re not constrained to a security appliance’s limited compute capacity.”
“Machine learning is an excellent tool in the effort to create leverage for the very sparse security talent that enterprises have today,” says Sweet.
A heavier-duty tool in the AI toolbox is deep learning, which goes much further.
Deep Learning (Wikipedia): Deep learning (also known as deep structured learning, hierarchical learning, or deep machine learning) is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using a deep graph with multiple processing layers, composed of multiple linear and non-linear transformations.
Securonix’s Baikalov explains that, “Deep learning employs hierarchical modeling, where each layer deals with progressively complex features using the findings of the previous layer. Thus, while the lowest layer might only recognize a straight or curved edge and its direction, the next layer would use these findings to recognize shapes, such as an oval, rectangle, or triangle, and the layer above that could differentiate between drawings of cars and airplanes.”
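Baikalov's edges-to-shapes progression can be illustrated with hand-set layers. In a real deep network every layer's function is learned from data; here the tiny image and the layer rules are invented purely to show how each layer consumes the previous layer's findings.

```python
# A conceptual sketch of hierarchical (layered) feature extraction.
# Layer 1 finds edges in a tiny binary image; layer 2 uses those edge
# findings to make a higher-level judgement about shape.

def layer1_edges(img):
    """Lowest layer: count horizontal and vertical edges (pixel changes)."""
    h = sum(1 for row in img for a, b in zip(row, row[1:]) if a != b)
    v = sum(1 for r1, r2 in zip(img, img[1:])
            for a, b in zip(r1, r2) if a != b)
    return {"horizontal_edges": h, "vertical_edges": v}

def layer2_shape(edges):
    """Next layer: combine the edge findings into a shape judgement."""
    if edges["horizontal_edges"] > 0 and edges["vertical_edges"] > 0:
        return "closed shape"
    return "no shape"

# An invented 4x4 binary image containing a filled 2x2 square.
img = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(layer2_shape(layer1_edges(img)))  # → closed shape
```

Each function sees only the output of the layer below it, never the raw pixels two levels down – the same division of labor Baikalov describes, scaled down to two layers.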
Baikalov points out that while deep learning has been credited with significant progress in reducing fraud in the finance industry, the task of learning fraudulent behavior in purchasing transactions is a lot easier than detecting insider threats. Since many, if not most, fraudulent transactions are detected by consumers within one or two statement cycles, a financial institution typically has a good, steady volume of well-documented "bad" data that can be traced back in fairly recent transaction logs. This data is then used to train the system to recognize a small number of consistent fraudulent behavior patterns.
What about a more difficult case, say insider fraud? Unlike transactional purchases on credit cards, Baikalov notes that there are relatively few examples of “bad insider” fraud within a financial institution’s logs. The supporting log data may be spread over a multitude of sources and can go back for months, and there's very little consistency between them. This doesn’t give ordinary machine learning enough to work with – and that's where deep learning comes in to detect, and potentially prevent, insider threats. While TTPs (Tactics, Techniques, and Procedures) might differ from one attacker to another, there are some common traits of malicious behavior that we recognize as Threat Indicators and that serve as a foundation for our Predictive Threat Models.
What’s more, Baikalov adds, multiple indicators roll up into specific threats, like Malware Infection or Account Compromise, and this threat layer is further aggregated into Composite Threats, or "kill chains." The better we detect the earlier stages of the attack – the links in the kill chain – the stronger the model's predictive capability to recognize and potentially prevent the later, most damaging stages. Deep learning takes advantage of these hierarchical threat models by first recognizing various threat indicators (behavioral or direct), then determining the probability of specific threats, and finally amplifying risk scores along the kill chain for early detection of cyber-attacks.
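The roll-up Baikalov describes (indicator probabilities feeding threat scores, with detections early in the kill chain amplifying later-stage risk) can be sketched as a simple scoring pass. The stage names, probabilities, and amplification rule below are invented for illustration, not Securonix's actual model.

```python
# A sketch of risk-score amplification along a kill chain: each stage's
# score is boosted in proportion to the strongest evidence seen at any
# earlier stage, so early detections raise later-stage risk.

def chain_risk(stage_scores, boost=1.5):
    """Return per-stage scores amplified by earlier-stage detections."""
    amplified = []
    carried = 0.0                           # strongest earlier evidence so far
    for stage, score in stage_scores:
        adjusted = min(1.0, score * (1 + carried * (boost - 1)))
        amplified.append((stage, round(adjusted, 2)))
        carried = max(carried, adjusted)
    return amplified

# Invented kill-chain stages with raw indicator-derived probabilities.
stages = [
    ("malware infection", 0.8),
    ("account compromise", 0.5),
    ("data exfiltration", 0.3),
]
for stage, risk in chain_risk(stages):
    print(stage, risk)
```

Note how the weak raw signal for exfiltration (0.3) is scored higher once the chain's earlier links have fired, which is the "early detection of the most damaging stages" effect in miniature.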
Machine Intelligence: This term is a carry-over from the early days of machine learning's move toward artificial intelligence.
Non-security example use case: The phrase “machine intelligence” is used all over the place; different people mean different things by it, ranging from ‘a robot’ to ‘neocortex-based learning’ to ‘any system that does AI,’ and everything in between. One can think of it more rigidly as denoting a machine that can learn on its own by observing series of patterns over time, without having to label the data sets, modeled after the way the human brain works. Such systems are sometimes called ‘biological neural networks,’ as they mimic the neocortex and the function of biological neurons.
The neocortex is a part of the brain’s cortex that is associated with sight and hearing, and is considered to be one of the most recently evolved parts of the cortex.
Scheferman builds on this use case, explaining that time plays an important part in the challenge of using machine learning on a network – a challenge that requires the model to scale to much more than ‘near-real-time’ data sets coming off the network.
“Profiles have to be adaptive and updated if not in real time, then at the very least daily – not monthly – to be effective in detecting malicious activity,” says Baikalov.
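The daily-updated profile Baikalov calls for can be sketched as an exponentially weighted moving average baseline that is folded forward each day and used to flag outliers. The traffic numbers, smoothing factor, and alert threshold below are invented toy values, not any product's configuration.

```python
# A sketch of an adaptive behavioral profile: the baseline is updated
# daily (EWMA), so the profile tracks normal drift, while large jumps
# above the current profile raise an alert.

def update_profile(baseline, observed, alpha=0.3):
    """Fold today's observation into the profile (EWMA update)."""
    return (1 - alpha) * baseline + alpha * observed

def is_suspicious(baseline, observed, factor=3.0):
    """Flag observations far above the current profile."""
    return observed > factor * baseline

baseline = 100.0                       # e.g. MB uploaded per day (invented)
for day_mb in [110, 95, 105, 120, 480]:
    if is_suspicious(baseline, day_mb):
        print("alert on", day_mb)      # anomalies are not folded in
    else:
        baseline = update_profile(baseline, day_mb)
```

Because the profile absorbs the ordinary day-to-day variation (110, 95, 105, 120) but excludes flagged days, it adapts daily without being poisoned by the very activity it should catch.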
Intelligent Security (Symantec): And, just to keep people on their toes (possibly with the goal of creating their own marketing term, and in the process confusing matters), there’s the trimmed-down version of the term “intelligence” with the word “security” tacked onto it.
Scheferman reiterates that in cybersecurity, the ultimate constraint is time – in this case, how quickly can you spot an emerging danger? From an AI standpoint – specifically speaking to the endpoint – the question is how far ahead of a threat can one actually get? Milliseconds, hours, weeks, months, years? This is what is meant by predictive AI: the ability to detect and block a threat months or years before it is even conceived by its author. “The entire point of predictive AI is to save the humans from all that heavy lifting,” he says. Symantec and other vendors appear to be using the phrase “intelligent security” to refer to predictive AI, a system that can theoretically block attacks before they are launched.
Reeling it back in to Machine Learning
While different products will employ a variety of methods to help protect networks and endpoints from attack and compromise, machine learning is probably the most prevalent – and relevant – method available on the market. In fact, machine learning arrived on the Gartner Hype Cycle in 2015, replacing “Big Data” and passing the peak of inflated expectations (although not quite as far along the curve as Big Data was in 2014).
Securonix’s Baikalov explains that User Behavior Analytics (UBA) and User and Entity Behavior Analytics (UEBA) are both terms coined by Gartner’s Avivah Litan to describe what she's seen on the market. The "Market Guide for User Behavior Analytics" was published in 2014, and UBA was born. A year later, in the fall of 2015, the "Market Guide for User and Entity Behavior Analytics" was published, and UEBA was born.
Says Gartner in a recent news article: Purely signature-based approaches for malware prevention are ineffective against advanced and targeted attacks. Multiple techniques are emerging that augment traditional signature-based approaches, including… machine learning-based malware prevention using mathematical models as an alternative to signatures for malware identification and blocking.
So, how smart is machine learning compared to AI? Baikalov insists that it is a lot smarter, because science-fiction-style AI – the capability of a machine to imitate intelligent human behavior – doesn't exist.
“Machine learning is a subset of AI, along with knowledge, perception, reasoning, planning, and other good stuff,” says Baikalov. “And there's a lot to learn; as the machine learns something, we say, ‘Well, if the machine can do it, it doesn't require intelligence, and therefore it's not AI.’”
“The core problem with AI is that it's defined relative to human intelligence, which in turn is not well defined,” explains Baikalov. “AI is created by humans, and if the humans don't understand what the intelligence is, how can they program the machine to imitate it? And does AI even need to imitate every aspect of human intelligence?”
Both excellent questions.
Baikalov continues: “Consciousness is one of the characteristics of intelligence. Does a machine learning system feel remorse about producing a false positive? I hope not, but when the analyst points out the mistake, it learns from it and doesn't repeat it the next time. As long as the machine does its job well, do you really care how it feels? (Sorry, Terminator!)”
This completes the tour of machine learning – a type of AI that can learn, on its own, the specific patterns needed to determine that, say, data is good or bad, a network is safe or vulnerable, or a system is currently under attack.
Once you've read parts 1 and 2, you'll certainly want to read the third article in the series, “The Actual Benefits of Artificial Intelligence & Machine Learning,” where we explore how to move beyond the hype and confusion in order to see the real benefits of artificial intelligence and machine learning.