What is Big Data Security Analytics?
Big Data refers to collections of data sets so large and complex that their sheer size, diversity, and complexity make them difficult to process using traditional applications.
Traditional databases are great at processing similar data sets with easy-to-understand relationships.
Big Data specializes in organizing data sets that are unstructured and whose relationships are more difficult to find and correlate. One example is finding the relationships among Twitter hashtags, road construction, and weather, and how they affect retail sales of a specific product. Building and analyzing these types of relationships with traditional databases is very difficult, if not impossible.
The challenge with Big Data is how to capture, process, store, search, share, analyze, and present large and diverse amounts of data in a visually compelling format.
The cyber security industry is now looking to Big Data to marry security incidents with threat intelligence. The result is a newly coined industry term: Big Data Security Analytics.
According to Wikipedia (Source: http://en.wikipedia.org/wiki/Analytics):
“Analytics is the discovery and communication of meaningful patterns in data. Especially valuable in areas rich with recorded information, analytics relies on the simultaneous application of statistics, computer programming and operations research to quantify performance. Analytics often favors data visualization to communicate insight.”
Why the need for new methods of Cyber Defense?
A few years ago, attackers exploited systems mainly through flaws in software. Over the years, attacks grew more sophisticated. They took advantage not only of software flaws but also of business and logic flaws, often combined with social engineering.
Security professionals have long played a cat-and-mouse game with attackers. They patch their systems for vulnerabilities before attackers can exploit them. Once patches have been applied, attackers create variations of their original attack, trying to keep breaching a system through the same flaws. Security devices got smarter; they learned to detect new variants of the same attack, making perimeter and edge systems more difficult to breach.
Defensive tactics have changed, largely due to advanced persistent threats (APTs). APTs are custom threats against an organization that exploit people and business processes, steal financial and intellectual property, and stay undetected in organizations for long periods of time.
Unlike threats of the past, APTs avoid causing obvious damage, so they fly under the radar. Attackers understand that most IT organizations are understaffed and overworked. They take advantage of this by avoiding major changes or denial of service in the organization they are attempting to breach. As one of my favorite quotes from the Superman comics puts it, “True Power is best kept concealed.” – Lex Luthor
APTs require attackers to research an organization’s staff, business processes, and valuable assets. Because these attacks do not rely solely on software flaws, the traditional signature-based solutions that security professionals use are largely ineffective against them. Furthermore, these attacks increasingly rely on social engineering, phishing, and logic flaws within an organization. As a result, attackers have less need to breach perimeter security systems from the outside, because an inside breach has usually already occurred.
Big Data Security Analytics draws deductions from multiple sources of data to determine whether a security threat exists.
The quality of these deductions depends on the accuracy and value of the source data. Big Data Security Analytics uses log sources from network devices, which in most instances are syslog messages from network infrastructure devices. These logs can be captured by systems that store, sign, and analyze them, such as the RSA Log Decoder (a component of RSA’s Security Analytics platform), Splunk, or Alien Vault’s Logger (there are others out there as well, but these are the ones I run into most often). In most cases, collecting and analyzing log events from network devices is a key function of Security Information and Event Management (SIEM) tools.
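To show what these log collectors do first, here is a minimal Python sketch that parses a single BSD-style syslog line into fields. This is the RFC 3164 format that many network devices still emit. It is not how the RSA Log Decoder, Splunk, or any other product works internally. The regular expression is an assumption, and real devices deviate from the format often enough that production parsers need many more patterns.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed RFC 3164 layout: "<PRI>Mmm dd hh:mm:ss HOSTNAME TAG: MESSAGE".
# Single-digit days are space-padded ("Oct  1"), hence "[ \d]\d".
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^:\s]+): "
    r"(?P<message>.*)$"
)


@dataclass
class SyslogEvent:
    facility: int
    severity: int
    timestamp: str
    host: str
    tag: str
    message: str


def parse_syslog(line: str) -> Optional[SyslogEvent]:
    """Parse one BSD-style syslog line; return None if it does not match."""
    m = SYSLOG_RE.match(line.strip())
    if m is None:
        return None
    pri = int(m.group("pri"))
    # PRI packs facility and severity together as facility * 8 + severity.
    return SyslogEvent(
        facility=pri // 8,
        severity=pri % 8,
        timestamp=m.group("timestamp"),
        host=m.group("host"),
        tag=m.group("tag"),
        message=m.group("message"),
    )


event = parse_syslog("<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8")
print(event.facility, event.severity, event.host)  # 4 2 mymachine
```

The sample line is the example message from RFC 3164 itself. In that line, facility 4 is security/authorization and severity 2 is critical.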
In general, SIEM tools excel at collecting logs from network devices in a central location where they can be stored, viewed, and analyzed for threats.
These logs are rich, but they are limited by the number of source devices configured to send them. Administrators must configure every device on the network to send logs to a SIEM tool. Furthermore, some tools do not send syslog messages at all but use other logging methods, which may follow a push or pull model. The cost of most SIEM tools is also based on the total number of events or devices monitored, which can add expense and complexity.
Baselining the security health of organizations
SIEM solutions help baseline the security health of an organization. Most security professionals concentrate on alerts detected by SIEM solutions and ignore all other data. They are essentially trying to find the “signal in the noise.”
These systems work by correlating event information from multiple devices and using predefined rules that disregard irrelevant information, surfacing incidents that are more frequent or present a greater risk based on their behavior.
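A predefined correlation rule can be sketched in a few lines of Python. The rule below is a hypothetical example, not one shipped by any SIEM vendor. It flags a source IP that logs in successfully after at least five failed attempts within five minutes. It also ignores every other event type as irrelevant to that rule. The `Event` type and its `kind` labels are made up for illustration.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Event:
    ts: float        # seconds since epoch
    source_ip: str
    kind: str        # hypothetical labels: "login_failed", "login_success", ...


def brute_force_rule(events, threshold=5, window=300.0):
    """Hypothetical rule: flag a source IP that logs in successfully after
    at least `threshold` failed logins within the last `window` seconds."""
    failures = defaultdict(deque)  # source_ip -> timestamps of recent failures
    incidents = []
    for ev in sorted(events, key=lambda e: e.ts):
        if ev.kind not in ("login_failed", "login_success"):
            continue  # irrelevant to this rule, so disregard it
        recent = failures[ev.source_ip]
        # Forget failures that have aged out of the correlation window.
        while recent and ev.ts - recent[0] > window:
            recent.popleft()
        if ev.kind == "login_failed":
            recent.append(ev.ts)
        elif len(recent) >= threshold:
            incidents.append((ev.source_ip, ev.ts, len(recent)))
            recent.clear()
    return incidents


events = [Event(float(t), "10.0.0.5", "login_failed") for t in range(5)]
events += [Event(7.0, "10.0.0.9", "dns_query"), Event(10.0, "10.0.0.5", "login_success")]
print(brute_force_rule(events))  # [('10.0.0.5', 10.0, 5)]
```

Rules like this are cheap and precise, but they only catch what someone thought to write a rule for.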
On February 12, 2002, Donald Rumsfeld stated, “There are known knowns; there are things we know that we know. There are known unknowns; that is to say, there are things that we now know we don’t know.”
Traditional SIEMs often provide an incomplete picture of the risks facing an organization. That’s because SIEMs only collect information from portions of the IT infrastructure, leaving critical blind spots.
What are these blind spots? One example is verifying whether the laptop being used to administer a sensitive server is actually authorized to do so. What about the user signed on to that laptop? Is he or she authorized to use the laptop and to perform administrative tasks on the server it is connected to?
This same use case can be taken further: what if both the laptop and the signed-in user are legitimate and should have administrative access to the server? Are the tools and commands being used normal, based on the past behavior of that administrator or of others with similar access? Do the tools or commands pose any known threats? Most SIEM tools will not capture and correlate these multiple types of activity. Furthermore, the volume of these events on fast distributed systems, cloud-based services, and active-active data centers makes it impractical for SIEM solutions to write, store, and analyze this information in traditional databases.
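The cross-checks in this use case can be expressed as a simple sketch. Everything here is hypothetical. The inventories cover which laptops may administer which servers, which users may sign on to which laptops, each administrator’s own command history, and the commands tied to known threats. In practice these would come from asset management, directory services, and threat intelligence feeds. A peer-group comparison of commands could be added the same way.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AdminSession:
    server: str
    laptop: str
    user: str
    commands: List[str]


def review_session(session, admin_laptops, laptop_users, command_history, threat_commands):
    """Cross-check one administrative session against hypothetical inventories.

    admin_laptops:   server -> set of laptops allowed to administer it
    laptop_users:    laptop -> set of users allowed to sign on to it
    command_history: user -> set of commands seen in that user's past sessions
    threat_commands: set of commands associated with known threats
    """
    findings = []
    if session.laptop not in admin_laptops.get(session.server, set()):
        findings.append(f"laptop {session.laptop} is not authorized to administer {session.server}")
    if session.user not in laptop_users.get(session.laptop, set()):
        findings.append(f"user {session.user} is not authorized on {session.laptop}")
    usual = command_history.get(session.user, set())
    for cmd in session.commands:
        if cmd in threat_commands:
            findings.append(f"command {cmd!r} matches a known threat")
        elif cmd not in usual:
            findings.append(f"command {cmd!r} is unusual for {session.user}")
    return findings


session = AdminSession("db01", "lt-42", "alice", ["systemctl", "tcpdump"])
print(review_session(
    session,
    admin_laptops={"db01": {"lt-42"}},
    laptop_users={"lt-42": {"alice"}},
    command_history={"alice": {"systemctl", "journalctl"}},
    threat_commands=set(),
))  # ["command 'tcpdump' is unusual for alice"]
```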
Capturing, correlating, and analyzing these activities, assigning a risk to determine whether a threat exists, and reconstructing the events for investigation: this is the essential promise and value of Big Data Security Analytics.
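One simple way to picture “assigning a risk” is a weighted score. Each data source contributes according to how strong its signal is and how much the source itself can be trusted. This echoes the point that deductions are only as good as the accuracy and value of the source data. The weighting scheme below is an illustrative assumption, not a standard formula, and the source names are hypothetical.

```python
def risk_score(signals, source_confidence, weights):
    """Combine per-source signals into a 0-100 risk score.

    signals:           source -> signal strength between 0.0 and 1.0
    source_confidence: source -> trust in that source's accuracy, 0.0 to 1.0
    weights:           source -> how valuable that source is for this use case
    All three mappings are hypothetical tuning inputs.
    """
    total_weight = sum(weights.get(s, 0.0) for s in signals)
    if total_weight == 0:
        return 0.0
    weighted = sum(
        strength * source_confidence.get(s, 0.0) * weights.get(s, 0.0)
        for s, strength in signals.items()
    )
    return 100 * weighted / total_weight


score = risk_score(
    signals={"syslog": 0.5, "netflow": 1.0},
    source_confidence={"syslog": 1.0, "netflow": 0.8},
    weights={"syslog": 1.0, "netflow": 3.0},
)
print(round(score, 1))  # 72.5
```

A score above some organization-specific threshold would then open an investigation. Picking that threshold is a policy decision, not something the arithmetic settles.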
Is it about Full-Packet Capture?
In no way am I saying that you need full-packet capture enabled on your network to determine whether security threats exist. In many instances, NetFlow metadata alone supports analysis that can reveal risks far beyond the capabilities of traditional SIEM products.
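Here is an example of what flow metadata alone can reveal. The sketch below totals the bytes each internal host sends to each external address and flags unusually large transfers, a common hint of data exfiltration. The `Flow` record is a simplified, hypothetical stand-in for a NetFlow export, since real records carry many more fields. The 500 MB threshold is an arbitrary assumption. Python’s `ipaddress` `is_private` check is used as a rough proxy for “internal.”

```python
import ipaddress
from collections import Counter
from typing import NamedTuple


class Flow(NamedTuple):
    src: str
    dst: str
    dst_port: int
    octets: int  # bytes carried by the flow


def large_outbound_transfers(flows, threshold_bytes=500_000_000):
    """Total the bytes each internal host sends to each external address and
    return the pairs whose total exceeds threshold_bytes. Python's is_private
    check stands in for a real definition of "internal" address space."""
    totals = Counter()
    for f in flows:
        src_internal = ipaddress.ip_address(f.src).is_private
        dst_internal = ipaddress.ip_address(f.dst).is_private
        if src_internal and not dst_internal:
            totals[(f.src, f.dst)] += f.octets
    return {pair: total for pair, total in totals.items() if total > threshold_bytes}


flows = [
    Flow("10.0.0.5", "93.184.216.34", 443, 300_000_000),
    Flow("10.0.0.5", "93.184.216.34", 443, 300_000_000),
    Flow("10.0.0.6", "93.184.216.34", 443, 100_000_000),
    Flow("10.0.0.5", "10.0.0.7", 445, 1_000_000_000),  # internal traffic, ignored
]
print(large_outbound_transfers(flows))  # {('10.0.0.5', '93.184.216.34'): 600000000}
```

No packet payloads are needed here. The trade-off is that flow data tells you that something large left the network, not what it contained.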
However, Big Data is all about data. Generally, the more data you capture, have access to, and analyze, the more accurate the threat intelligence picture for an organization will be. The advantage of full-packet capture is session reconstruction, which helps detect and investigate how attackers infiltrated the environment and what they did once inside.
Big Data Security Analytics is about combining multiple sources of data. It provides a platform that automatically ingests threat intelligence from external sources. This offers valuable views of the threat environment outside the enterprise and compares them with the current behavior of events occurring inside the organization.
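A minimal sketch of comparing external threat intelligence with internal activity might look like this. The feed format (one `type,value` indicator per line) and the event dictionaries are hypothetical. Real feeds use richer formats such as STIX, and real events come from DNS, proxy, and firewall logs.

```python
def load_indicators(feed_lines):
    """Parse a hypothetical feed with one "type,value" indicator per line."""
    bad_ips, bad_domains = set(), set()
    for line in feed_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        kind, _, value = line.partition(",")
        if kind == "ip" and value:
            bad_ips.add(value)
        elif kind == "domain" and value:
            bad_domains.add(value.lower())
    return bad_ips, bad_domains


def domain_matches(domain, bad_domains):
    """True if domain is a listed domain or a subdomain of one."""
    return any(domain == d or domain.endswith("." + d) for d in bad_domains)


def match_indicators(feed_lines, internal_events):
    """Return internal events (dicts with "host" plus "dst_ip" and/or "domain")
    that touch an externally reported indicator."""
    bad_ips, bad_domains = load_indicators(feed_lines)
    hits = []
    for ev in internal_events:
        domain = ev.get("domain", "").lower()
        if ev.get("dst_ip") in bad_ips or (domain and domain_matches(domain, bad_domains)):
            hits.append(ev)
    return hits


feed = ["# hypothetical feed", "ip,198.51.100.7", "domain,evil.example"]
events = [
    {"host": "pc1", "dst_ip": "198.51.100.7"},
    {"host": "pc2", "domain": "Mail.Evil.Example"},
    {"host": "pc3", "domain": "good.example"},
]
print([ev["host"] for ev in match_indicators(feed, events)])  # ['pc1', 'pc2']
```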
For example, if multiple people within an organization are targeted by spear-phishing attacks, a traditional email security appliance, such as Cisco’s IronPort Email Security Appliance, could most likely detect and stop the attack.
Big Data Security Analytics could help determine whether the phishing attack originated from a specific Facebook account. It could also reveal whether the attack targeted individuals in your organization who “liked” a specific Facebook page and belonged to your organization’s accounting department. This type of targeted activity is often indicative of an APT attack against an organization.
Furthermore, analytics from full-packet capture could help determine what tool sets the attacker was using and how they attempted to exfiltrate data.
Big Data Security Analytics creates a platform for collecting security data from multiple sources, both internal and external to an organization, that goes far beyond traditional log information. Detection is not based on signatures or static correlation rules. Instead, it relies on dynamic comparisons to normal baseline behaviors for individuals or groups with similar job functions and requirements. Behavior outside the normal baseline flags suspicious activity that may indicate an attacker. This speeds identification of threats that security vendors have not yet categorized. It also helps an organization assess more efficiently the risk that security events pose to it.
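Baseline comparison against a peer group can be sketched with a z-score. This measures how many standard deviations a user’s activity today sits from the historical activity of people in the same job function. It is one simple statistical technique chosen for illustration, not the method any particular product uses. The activity metric, the group assignments, and the threshold of 3 are all assumptions.

```python
from statistics import mean, pstdev


def peer_anomalies(today, history, peer_group, z_threshold=3.0):
    """Flag users whose activity today deviates from their peer group's baseline.

    today:      user -> today's value of some activity metric (hypothetical)
    history:    user -> list of that user's past daily values
    peer_group: user -> group name, e.g. a job function
    """
    baselines = {}
    for user, values in history.items():
        group = peer_group.get(user)
        if group is not None:
            baselines.setdefault(group, []).extend(values)
    flagged = []
    for user, value in today.items():
        values = baselines.get(peer_group.get(user), [])
        if len(values) < 2:
            continue  # not enough history to form a baseline
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # a perfectly flat baseline has no spread to measure against
        z = (value - mu) / sigma
        if abs(z) >= z_threshold:
            flagged.append((user, round(z, 1)))
    return flagged


peer_group = {"alice": "accounting", "bob": "accounting"}
history = {"alice": [10, 12, 11, 9, 13], "bob": [11, 10, 12, 10, 12]}
print(peer_anomalies({"alice": 11, "bob": 40}, history, peer_group))  # [('bob', 24.5)]
```

Pooling history by group gives a new employee a sensible baseline on day one. The cost is that it can hide behavior that is normal for one person but rare for the group.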