Cyber Security – Shifting from Data to Metadata

November 22, 2017

Metadata has been in the news over the last couple of years. The term has been used by businesses and government ministers, but there doesn’t seem to be a consensus about what metadata actually is and how it can be used. Metadata is simply data about data.

“Metadata is summary information about data. Think of a photograph shot with your phone. The content of the photo is the data. The metadata is information about where the photo was taken, the time it was taken, direction and how large the file is,”

Stewart Baker, former general counsel of the NSA, has said, “Metadata absolutely tells you everything about somebody’s life. If you have enough metadata, you don’t really need content.”

For cyber security, metadata is all the information surrounding the packet and the traffic. Specifically, we have at least the source and destination IP addresses, the TCP (or UDP) port numbers, the inbound and outbound packets, the window size, and so on. We can learn a lot from those alone, and more by combining them with the message sequence between two IP addresses. Likewise, we can learn a lot from the metadata associated with a call from a mobile phone. That metadata includes at least the calling party, the called party, cell information, possibly GPS location information, and the call duration.
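As a rough illustration of how little is needed to read these fields, here is a minimal Python sketch that pulls them out of a raw IPv4 packet using only the standard library. The function names `extract_metadata` and `build_sample_packet`, and the sample addresses and ports, are assumptions made for this example and do not describe any particular product.

```python
import socket
import struct


def extract_metadata(packet: bytes) -> dict:
    """Return header metadata for a raw IPv4 packet carrying TCP or UDP."""
    # Fixed 20-byte IPv4 header: version/IHL, TOS, total length, ID,
    # flags/fragment offset, TTL, protocol, checksum, source, destination.
    (ver_ihl, _tos, total_len, _ident, _frag, ttl, proto, _csum,
     src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    ihl = (ver_ihl & 0x0F) * 4  # IPv4 header length in bytes
    meta = {
        "src_ip": socket.inet_ntoa(src),
        "dst_ip": socket.inet_ntoa(dst),
        "ttl": ttl,
        "protocol": proto,
        "total_length": total_len,
    }
    if proto in (6, 17):  # 6 = TCP, 17 = UDP; both start with the two ports
        src_port, dst_port = struct.unpack("!HH", packet[ihl:ihl + 4])
        meta["src_port"] = src_port
        meta["dst_port"] = dst_port
    if proto == 6:
        # The TCP window size sits at byte offset 14 of the TCP header.
        (window,) = struct.unpack("!H", packet[ihl + 14:ihl + 16])
        meta["window"] = window
    return meta


def build_sample_packet() -> bytes:
    """Build a hypothetical IPv4/TCP packet (checksums left at zero)."""
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 40, 1, 0, 64, 6, 0,
        socket.inet_aton("192.0.2.10"),    # documentation-range address
        socket.inet_aton("198.51.100.7"),  # documentation-range address
    )
    # Source port 49152, destination port 25, SYN flag, window 8192.
    tcp_header = struct.pack("!HHIIHHHH", 49152, 25, 0, 0, 0x5002, 8192, 0, 0)
    return ip_header + tcp_header


if __name__ == "__main__":
    print(extract_metadata(build_sample_packet()))
```

Nothing in this sketch touches the payload: every field comes from the headers, which stay readable even when the content is encrypted.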

Consider an IP datagram from an IP address at Company X to one at Company Y. If we see repeated traffic between those two addresses or address ranges, we can surmise that at least one person at each of these companies is communicating. If the communication is over TCP port 25, it is highly likely that email is being exchanged between the two organizations. (Yes, port 25 could be used for something else in order to confuse the observer.) If, in that case, we also have access to the whole IP datagram, and the contents of that datagram are unencrypted, we could read the email messages.
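The Company X and Company Y reasoning can be sketched in a few lines of Python. This is only an illustration: the record layout, the `summarize_flows` function and the small port table are assumptions made for this example, and a port number is a hint about the service, not proof.

```python
from collections import Counter

# Small illustrative subset of well-known port assignments.
WELL_KNOWN_PORTS = {25: "SMTP (email)", 53: "DNS", 80: "HTTP", 443: "HTTPS"}


def summarize_flows(records):
    """Count packets per (source, destination, port) and guess the service."""
    counts = Counter(
        (r["src_ip"], r["dst_ip"], r["dst_port"]) for r in records
    )
    summary = []
    # most_common() lists the busiest conversations first.
    for (src, dst, port), packets in counts.most_common():
        summary.append({
            "src": src,
            "dst": dst,
            "port": port,
            "packets": packets,
            "likely_service": WELL_KNOWN_PORTS.get(port, "unknown"),
        })
    return summary


if __name__ == "__main__":
    # Hypothetical observations between Company X and Company Y.
    observed = [
        {"src_ip": "192.0.2.10", "dst_ip": "198.51.100.7", "dst_port": 25},
        {"src_ip": "192.0.2.10", "dst_ip": "198.51.100.7", "dst_port": 25},
        {"src_ip": "192.0.2.10", "dst_ip": "198.51.100.7", "dst_port": 25},
        {"src_ip": "192.0.2.11", "dst_ip": "198.51.100.9", "dst_port": 443},
    ]
    for row in summarize_flows(observed):
        print(row)
```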

A Need for a New Approach to Cyber Security

The current systems are clearly not effective, with hacks hitting organizations around the world every day. With technology changing and traffic shifting toward encryption, the current systems are becoming obsolete.

According to Gartner and Google, 80% of traffic will be encrypted by 2020 and 60% of attacks will happen over encrypted traffic. However, encrypted traffic is still a black hole in an organization’s security information, and organizations have no tools to detect attacks in it without decrypting the data and compromising privacy.

Over recent years, the amount of information moving through our networks has grown exponentially, and with the advent of the Internet of Things we’ll see the volume of data moving through our networks explode. That creates significant challenges, not just for network engineers but also for security teams.

“The traditional approach to detecting threats in networks has been about finding things we already knew were bad. But that approach has never worked particularly well. It doesn’t scale to all the known ways things can be bad. There are attacks that occur once and we’ll never see again. Therefore, they’ll never end up in a database of known bad behaviour. What we need to do is detect anomalies – differences from what we expect – that point to threats. But we have far too much data and it’s flowing far too fast.”

The answer lies in metadata and Artificial Intelligence: not in the payload, which might be encrypted, but in the destination, the certificate information and exchanges, the packet information, the byte distribution and other information pertaining to that payload. By reconstituting the message sequence between two IP addresses, looking at the metadata associated with a flow of network traffic (SSL and TCP/IP) and applying Artificial Intelligence to that data, it can be easier to tell the difference between legitimate and bad traffic than by trying to examine the detailed contents of every data packet, an approach that generates many false positives and still does not capture the whole picture.
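To give a feel for what detecting "differences from what we expect" can look like on metadata alone, here is a deliberately simple statistical stand-in written in Python. It is not an AI model and not the method of any specific product: it flags values that sit far from the median, using the modified z-score (median and median absolute deviation). The function name `find_anomalies`, the threshold of 3.5 and the sample byte counts are assumptions made for this example.

```python
import statistics


def find_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation. It is less distorted by extreme outliers
    than a mean/standard-deviation z-score.
    """
    median = statistics.median(values)
    deviations = [abs(v - median) for v in values]
    mad = statistics.median(deviations)
    if mad == 0:
        # All typical values are identical; this simple sketch gives up here.
        return []
    return [
        i for i, dev in enumerate(deviations)
        if 0.6745 * dev / mad > threshold
    ]


if __name__ == "__main__":
    # Hypothetical bytes-per-flow for one host; the last flow is unusual.
    bytes_per_flow = [500, 520, 480, 510, 495, 505, 50000]
    print(find_anomalies(bytes_per_flow))
```

A production system would look at many features at once and learn what is normal per device and per time of day, but the principle is the same: the decision is made from metadata, without reading or decrypting the payload.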

The kinds of metadata we’re looking for include transaction volumes, IP addresses, email addresses, new TLS and SSL certificates circulating on the network and the full packet message sequence between devices. These are all pointers to potential breach activity.
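Spotting a certificate that has never been seen on the network before can be sketched as follows. This is a minimal illustration that assumes the certificate bytes (for example, the DER encoding captured from a TLS handshake) are already available; the `CertificateTracker` class is hypothetical and keeps its state only in memory.

```python
import hashlib


class CertificateTracker:
    """Remember certificate fingerprints and report first sightings."""

    def __init__(self):
        self._seen = set()

    def observe(self, cert_bytes: bytes) -> bool:
        """Return True if this certificate has not been observed before."""
        fingerprint = hashlib.sha256(cert_bytes).hexdigest()
        is_new = fingerprint not in self._seen
        self._seen.add(fingerprint)
        return is_new


if __name__ == "__main__":
    tracker = CertificateTracker()
    # Placeholder bytes standing in for real certificate encodings.
    for cert in (b"certificate-A", b"certificate-A", b"certificate-B"):
        print(tracker.observe(cert))
```

A new certificate is not malicious in itself. It is one more signal to weigh alongside the others listed above.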

Metadata is critical to cybersecurity efforts because, until now, it has been impossible to capture, analyze and store, at scale, rich metadata covering every document and communication protocol. Traditional network devices can gather some high-level metadata, but with the right platform you can not only collect more data from packets but also rebuild the message sequence between IP sources and analyse it using Artificial Intelligence to detect abnormal activity or cyber attacks.

Artificial Intelligence in cyber security will increasingly rely on modeling good and bad behavior, made accurate through tuning for context and intent. AI systems that triangulate these models to uncover anomalous behavior and identify data breaches are already gaining broad adoption in the form of SIEM and user behavioral analytics solutions. As traffic speeds grow, ensuring that the models and analytics approaches are informed not only by Big Data but by the right data and real-time data will be key to making threat detection pragmatic and effective in modern networks.

Barac – Real-Time Detection Platform for Normal and Encrypted Traffic

Barac, a London-based startup and part of the Techstars Barclays accelerator, is leading the way in shifting from data to metadata for cyber detection, using Artificial Intelligence to automate the process and improve accuracy.

Metadata and packet information are collected in real time from the different sources by Barac’s agent, a 1 MB piece of software that can be deployed to thousands of devices instantly, and sent to the Barac analytics platform. Those packets are then reconstituted into sequences to understand the messages exchanged between two IP addresses. While reconstituting the messages, Barac’s platform calculates more than 150 features as the packets are collected (average TTL, average TTR (time to respond), etc.) and uses several Artificial Intelligence techniques to instantly detect known patterns of malware and attacks as well as abnormal activity.
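The following Python sketch shows what computing a handful of such per-conversation features might look like. It is not Barac’s implementation: the `Packet` fields, the `flow_features` function and, in particular, the definition of "time to respond" used here (the gap between a packet and the next packet travelling in the opposite direction) are assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Packet:
    timestamp: float  # capture time in seconds
    src_ip: str
    dst_ip: str
    ttl: int
    length: int       # packet size in bytes


def flow_features(packets):
    """Compute a few features for the packets exchanged between two hosts.

    Assumes a non-empty list of packets that all belong to one conversation.
    """
    ordered = sorted(packets, key=lambda p: p.timestamp)
    response_times = []
    for prev, cur in zip(ordered, ordered[1:]):
        # Assumption: a change of direction counts as a response
        # to the previous packet.
        if cur.src_ip == prev.dst_ip and cur.dst_ip == prev.src_ip:
            response_times.append(cur.timestamp - prev.timestamp)
    return {
        "packet_count": len(ordered),
        "avg_ttl": mean([p.ttl for p in ordered]),
        "avg_length": mean([p.length for p in ordered]),
        "avg_time_to_respond": (
            mean(response_times) if response_times else None
        ),
    }


if __name__ == "__main__":
    a, b = "192.0.2.10", "198.51.100.7"  # hypothetical endpoints
    conversation = [
        Packet(0.00, a, b, 64, 100),
        Packet(0.50, b, a, 128, 200),
        Packet(1.00, a, b, 64, 100),
        Packet(1.25, b, a, 128, 200),
    ]
    print(flow_features(conversation))
```

Feature vectors of this kind, computed per conversation, are the sort of input that anomaly detection or classification models can then be trained on.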

By using metadata and Artificial Intelligence, Barac is able to reduce false positives by more than 95% on normal traffic, and our patent-pending platform shows that we can detect attacks with more than 99.997% accuracy, reducing the number of false positives to less than 0.00006% of total traffic. We can tell the difference, with high accuracy, between legitimate and bad traffic for different types of attacks, without examining the detailed contents of every data packet or decrypting the traffic.

This is a game changer for cyber security, where encryption is becoming mainstream, especially in sectors such as IoT devices, telecom operators, financial institutions, smart cars and payments. All of them use encrypted traffic, face very high security risks, and need visibility into that traffic along with real-time information in order to react quickly.