Australia’s defence force is winning the race to patch vulnerable code and detect software bugs before they bite.
In defence, preventing cyber-hacks is crucial, and there are loads of people working behind the scenes to protect potentially susceptible code.
Two of these cybersecurity professionals are Olivier de Vel and Paul Montague from DST’s Cyber and Electronic Warfare Division (CEWD), who have made seriously important advances in tools to focus the attention of security experts.
So how does software vulnerability impact IT security? Paul stresses that software vulnerabilities can be exploited for malicious gain – for example, releasing data, allowing a third party to take control of the software, or just causing it to go offline.
“It’s related but different to malicious software [malware] studies,” he says. “In both cases we’re trying to make sense of the structure of the code but in this case it’s to search for design and coding mistakes. Writing software is a complex job and there are always going to be vulnerabilities.”
And with vulnerability numbers on the up – possibly because of the speed at which commercial software is being updated – it’s become even more important that we protect our software.
“The search for vulnerabilities in software is the modern equivalent of the gold rush. Coming across an iPhone bug that can be exploited to a third party’s advantage can score you upwards of a million dollars,” Olivier says.
The Zerodium website, for example, offers bounties on many types of bugs.
Mathematical line in the sand
As members of CEWD’s Cyberwarfare Operations team, Olivier and Paul have combined their extensive cyber knowledge with their deep learning expertise as academics from Swinburne and Monash universities and Data61, to help crush this burgeoning bug black market.
Specifically, their tools search for vulnerabilities in binary code (the language of ones and zeroes that computer processors use). While there has been research into detecting vulnerabilities in source code – the languages that software engineers write in – it’s usually only the binary version of proprietary code that is available to look at.
So the team developed something called a Maximal Divergence Sequential Auto-encoder for binary code. At the same time, they had to create test data to make sure it was working as intended.
“The creation of a labelled binary dataset of vulnerable and non-vulnerable code snippets was a minor but important part of this work,” explains Olivier. “There’s not much labelled data out there saying ‘this code is vulnerable’ and ‘this is not vulnerable’ – so it is a positive contribution to the art.”
The auto-encoder was trained on the labelled dataset and then tested on a new unseen one. It creates a mathematical line in the sand – code on one side is labelled as ‘probably vulnerable’ and on the other side ‘probably safe’.
“You end up with two classes of code in a high-dimensional space and our aim is to make the two sets as distinct as possible,” says Olivier. “To do that our academic colleagues used a recurrent neural network, a type of deep learning architecture.”
Paul adds that the labelling is not definitive but a ‘this is likely to be vulnerable’ indication. Results can then be sent to an expert auditor or some other sort of system for more detailed investigation.
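To make the ‘line in the sand’ idea concrete, here is a minimal sketch in Python. It is not the team’s Maximal Divergence Sequential Auto-encoder: it swaps in a plain logistic-regression classifier, and the two-number ‘features’ are synthetic stand-ins for whatever a real model would extract from binary code. All names (`train`, `score`, `label`, `cluster`) and the 0.5 threshold are assumptions made for illustration. The sketch trains on labelled examples, scores unseen ones, and ranks them so the likeliest candidates reach an auditor first.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    """Estimated probability that feature vector x belongs to the vulnerable class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(samples, labels, epochs=200, lr=0.1):
    """Fit a linear boundary w.x + b = 0 with stochastic gradient descent."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = score(w, b, x) - y  # gradient of the log-loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def label(p, threshold=0.5):
    """Turn a probability into a hedged label, not a definitive verdict."""
    return "probably vulnerable" if p >= threshold else "probably safe"


# Synthetic, made-up data: two clusters standing in for labelled code snippets.
rng = random.Random(0)


def cluster(centre, n):
    return [(rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)) for _ in range(n)]


train_x = cluster(-2.0, 40) + cluster(2.0, 40)
train_y = [0] * 40 + [1] * 40  # 0 = non-vulnerable, 1 = vulnerable
w, b = train(train_x, train_y)

# Score new, unseen examples and rank them for an expert auditor.
unseen = cluster(-2.0, 3) + cluster(2.0, 3)
ranked = sorted(unseen, key=lambda x: score(w, b, x), reverse=True)
for x in ranked:
    p = score(w, b, x)
    print(f"{label(p):20s} p={p:.2f} features=({x[0]:+.2f}, {x[1]:+.2f})")
```

The output is a probability rather than a yes or no, which matches Paul’s point: the label is an indication of where to look, and the ranking decides what an auditor sees first.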
The cyber-worthiness of software going into battle is just as important as the sturdiness of physical equipment like armour, flares and jet engines: commanders need confidence in the code they rely on. Running the auto-encoder across the various software systems that a platform relies on gives a commander a quantitative way to measure the risk that their code harbours exploitable vulnerabilities.
“There’s plenty of work still to do under this program of work,” says Paul. “We’ve developed many tools, including source-code analysis tools and others that locate functions within binary code. Over the next few months the aim is to bring all these tools together to create an end-to-end solution that takes in a raw binary that we haven’t seen before, break it down into smaller functions, and identify probable locations of vulnerabilities.”
This story was created in partnership with DST.
Author: STEM Contributor
This article was written by a STEM Contributor for Careers with STEM.