Code similarity analysis with r2diaphora

Executive summary

Binary diffing, a technique for comparing binaries, can be a powerful tool to facilitate malware analysis and perform malware family attribution. This blog post describes how AT&T Alien Labs is leveraging binary diffing and code analysis to reduce reverse-engineering time and generate threat intelligence.

Using binary diffing for analysis is particularly effective in the IoT malware world, as most malware threats are variants of open-source malware families produced by a wide range of threat actors. Creating and maintaining static signatures for variations of IoT malware is tedious, because the assembly code often changes across variants and architectures, and text strings are subject to modification. For this reason, AT&T Alien Labs created a new open-source tool, r2diaphora, which ports Diaphora as a plugin for Radare2, and included some use cases in this blog.

What is binary diffing?

Binary diffing (or program diffing) is a process where two files are compared at the instruction level, looking for differences in code. Threat actors can easily transform the assembly code of a program without modifying its actual behaviour, so traditional “line-by-line” diffing isn’t sufficient when looking at malware; a more advanced approach is required.

There are several binary diffing tools publicly available, such as Diaphora, BinDiff, and DarunGrim. Alien Labs is using Diaphora, as we believe it is the most advanced of all the available options. Additionally, Diaphora has the added benefit of being open source, allowing Alien Labs to modify it for our needs.
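To illustrate why a plain textual diff falls short, the sketch below compares two hypothetical x86 snippets that return the same value but use different registers and instructions. The snippets and the mnemonic-only normalization are assumptions chosen for illustration; they are not how Diaphora computes similarity.

```python
import difflib

# Two hypothetical snippets with the same visible result: return 5 in eax.
variant_a = ["xor eax, eax", "add eax, 5", "ret"]
variant_b = ["mov ecx, 0", "add ecx, 5", "mov eax, ecx", "ret"]


def similarity(seq_a, seq_b):
    """Similarity ratio (0.0 to 1.0) between two sequences of items."""
    return difflib.SequenceMatcher(None, seq_a, seq_b).ratio()


def mnemonics(instructions):
    """Keep only the instruction mnemonic, dropping registers and constants."""
    return [line.split()[0] for line in instructions]


text_ratio = similarity(variant_a, variant_b)
mnemonic_ratio = similarity(mnemonics(variant_a), mnemonics(variant_b))
print(f"line-by-line: {text_ratio:.2f}, mnemonics only: {mnemonic_ratio:.2f}")
```

Even this crude abstraction scores the two snippets as more alike than the raw text does, which hints at why diffing tools compare extracted features rather than literal lines.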

How can binary diffing be used to identify malware?

Diaphora works by analyzing each function present in the binary and extracting a set of features from each analyzed function. These features are later used to compare functions across binaries and find matches. If, instead of directly comparing features, we leverage them to build a database of malicious functions (indicators) for identification purposes, we can then begin analyzing incoming binaries and try to find matches among their functions when comparing against the indicator database.

If enough matches are found in the analyzed binaries, we can safely assume the analyzed sample is a malware sample. We can also note which malware family the functions belong to in the indicator database, thus obtaining family attribution for the analyzed samples.
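The attribution idea can be sketched in a few lines. Everything here is hypothetical: the feature hashes, the "ExampleFamily" label, the in-memory dictionary standing in for the indicator database, and the `min_matches` threshold are assumptions for illustration, not r2diaphora's actual schema or logic.

```python
from collections import Counter

# Hypothetical indicator database: function feature hash -> malware family.
INDICATORS = {
    "a1f3": "Gafgyt",
    "b2e4": "Gafgyt",
    "c3d5": "Gafgyt",
    "d4c6": "ExampleFamily",
}


def attribute(function_hashes, min_matches=2):
    """Return (family, hits) when enough functions match, else (None, hits)."""
    hits = Counter(INDICATORS[h] for h in function_hashes if h in INDICATORS)
    if not hits:
        return None, 0
    family, count = hits.most_common(1)[0]
    if count >= min_matches:
        return family, count
    return None, count


# Feature hashes extracted from a hypothetical incoming binary.
sample = ["a1f3", "c3d5", "ffff", "d4c6"]
print(attribute(sample))
```

The threshold matters: a single shared function (for example, a statically linked library routine) should not be enough to label a sample.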

Porting Diaphora to Radare2

Diaphora works as an IDA Pro plugin. In order to work, it needs a valid IDA license and, consequently, valid Hex-Rays licenses for each CPU architecture you may want to decompile. As the cost of these licenses is quite high, Alien Labs looked for a cheaper alternative that the team could leverage.

As such, we decided to port the existing Diaphora to the Radare2 disassembly framework. The ported version of Diaphora, named r2diaphora, is also open source and publicly available.

Radare2 (r2) is an open-source disassembly framework that supports a very wide range of CPU architectures. It also bundles a capable decompiler and supports the Ghidra decompiler as a plugin. As such, r2 is well suited to our goal of porting Diaphora to an open-source disassembler.

Further modifications made to the original Diaphora included swapping the SQLite3 databases for MySQL. This change was made for the malware attribution process described previously, as more than one analyst may be writing to the indicator database. With multiple analysts writing to the database, the SQLite database would need to be shared across team members and allow parallel read/write operations. SQLite databases are not designed for this kind of usage, so the Alien Labs team swapped it for another database engine better suited to the task.

Installation

As r2diaphora uses Radare2 and MySQL, they must be set up prior to its usage. Radare2 should be installed locally, while the MySQL server can be remote or local. Once the environment is set up, you can install it with pip install r2diaphora. This pip package installs three command line utilities: r2diaphora, r2diaphora-db and r2diaphora-bulk.

r2diaphora: The main command line utility, analyzes and compares files.
r2diaphora-db: Performs database administration and configuration.
r2diaphora-bulk: Analyzes binaries in batches.

Further usage options can be obtained with the -h / --help command line option in each of them.

Once the pip package is successfully installed, you can enter your database credentials with r2diaphora-db config -u -p -hs. If you are using bash or a similar shell and do not want your database password to be saved in the shell history, precede the command with a space (in bash this works only when HISTCONTROL includes ignorespace or ignoreboth).

Finally, if you want to use the r2ghidra decompiler, install it with the r2pm -ci r2ghidra command, if it is not installed already.
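Before moving on, it can help to confirm that the command line tools are reachable. The following is a minimal sketch that only checks the PATH; the three r2diaphora names come from the text above, while "r2" is assumed to be the name of the Radare2 executable on your system.

```python
import shutil

# r2diaphora utilities as named above; "r2" is assumed to be Radare2's executable.
REQUIRED_TOOLS = ["r2", "r2diaphora", "r2diaphora-db", "r2diaphora-bulk"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from the list that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools were found.")
```

This does not verify the MySQL connection or the stored credentials; it only catches the most common setup mistake of a tool not being installed or not being on the PATH.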

Usage

As stated previously, r2diaphora lists all available options when executed with the -h flag. Currently, they are the following:

For example, we can execute r2diaphora on some test IoT samples. You can find file hashes in the Related Indicators appendix.

First test – comparing two Sakura (a Gafgyt variant) samples with the same architecture:

r2diaphora 562b4c9a40f9c88ab84ac4ffd0deacd219595ab83ed23a458c5f492594a3a7ef 770363f9fd334c3f3c4ba0e05a2a0d4701f56a629b09365dfe874b2a277f4416

Figure 1. r2diaphora output for Sakura samples with the same architecture.

Note how r2diaphora was able to identify the similarities between the two files. The tool managed to find 40 matches out of 56 possible (71%). Additionally, the similarity ratios for the matched functions are close to 1.0, indicating a very close resemblance between them. Furthermore, the results point towards true positive matches, since the matched functions have the same name and number of basic blocks.

Second test – comparing Sakura samples with different architectures:
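The figures quoted above can be reproduced with a short calculation, and the "same name and number of basic blocks" sanity check is easy to express in code. The match records below are hypothetical, and their field names are assumptions for illustration rather than r2diaphora's real output format.

```python
def match_rate(matched, total):
    """Fraction of functions for which a counterpart was found."""
    return matched / total


def looks_like_true_positive(match):
    """Same function name and same basic-block count on both sides."""
    return (match["name_a"] == match["name_b"]
            and match["blocks_a"] == match["blocks_b"])


# Hypothetical match records (not taken from the figure).
matches = [
    {"name_a": "main", "name_b": "main", "blocks_a": 12, "blocks_b": 12, "ratio": 0.98},
    {"name_a": "fcn_a", "name_b": "fcn_b", "blocks_a": 7, "blocks_b": 9, "ratio": 0.91},
]

print(f"{match_rate(40, 56):.0%} of functions matched")
print([looks_like_true_positive(m) for m in matches])
```

The name check is only meaningful when the binaries are not stripped; for stripped samples, the basic-block count and the similarity ratio carry the weight.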

r2diaphora 17c62e0cf77dc4341809afceb1c8395d67ca75b2a2c020bddf39cca629222161 6ce1739788b286cc539a9f24ef8c6488e11f42606189a7aa267742db90f7b18d

Figure 2. r2diaphora output for Sakura samples with different architectures.

In this case, we see how the number of matches has decreased from the previous test. This was expected, as it is harder to match functions across different architectures. The similarity ratios have also decreased, as the assembly code differs in all the compared functions. However, r2diaphora recognized many similarities between both samples and identified correct matches across the compared files.

Third test – comparing a Sakura sample to a Yakuza (another Gafgyt variant) sample, with the two samples having different architectures:
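Each test in this section passes one pair of files to r2diaphora. When there are more than two samples, the pairwise commands can be generated with a small helper. This is a sketch that only builds the argument lists in the form shown in this post (two positional file paths); the file names are hypothetical, and r2diaphora-bulk may well be the better fit for large batches.

```python
from itertools import combinations


def build_commands(paths):
    """Return one r2diaphora argument list per unordered pair of files."""
    return [["r2diaphora", first, second]
            for first, second in combinations(paths, 2)]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for command in build_commands(["sample_arm", "sample_x86", "sample_mips"]):
        print(" ".join(command))
        # To actually run a comparison, pass the list to
        # subprocess.run(command, check=True).
```

Keeping the command construction separate from execution makes it easy to review what would be run before analyzing live malware samples.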

$ r2diaphora sakura/594a6b2c1e9beac3ad5f84458b71c1b7ec05ee0239808c9a63bc901040e413a3 yakuza/91392f5dbbfd4ad142956983208a484b91ac5e84c4f9a9fcb530a9b085644c93

Figure 3. r2diaphora output for Sakura and Yakuza samples with different architectures.

In this case, note how the number of matches has decreased even further while the ratios have remained mostly steady. This is due to the samples being different variants that apply different modifications to the base Gafgyt source code.

It is also notable that the processCmd function could be matched, albeit with a low ratio. processCmd is the function that parses the commands received from the Command & Control server. The low ratio of this match is due to the variants handling different commands, hence their implementations being different. However, the system was able to match it thanks to a common constant present in both functions.
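A constant-based match of this kind can be sketched as a set intersection. The constant values and the list of "trivial" constants below are hypothetical, and this is only an illustration of the idea, not the heuristic Diaphora or r2diaphora actually implements.

```python
# Values such as 0, 1 and -1 appear in almost every function and carry no signal.
TRIVIAL_CONSTANTS = frozenset({0, 1, -1})


def shared_constants(consts_a, consts_b, ignore=TRIVIAL_CONSTANTS):
    """Constants used by both functions, excluding trivially common values."""
    return (set(consts_a) & set(consts_b)) - ignore


# Hypothetical constants extracted from processCmd in two variants.
variant_one = [0, 1, 0x5EED1234, 80]
variant_two = [1, 0x5EED1234, 443, 23]

common = shared_constants(variant_one, variant_two)
print([hex(value) for value in sorted(common)])
```

A rare shared constant is a weak signal on its own, which is consistent with the low ratio reported for this match: it suggests a candidate pairing that other features may then confirm or reject.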

Conclusion

Code similarity analysis is a powerful tool that can be leveraged to identify and attribute malware. While not flawless, program diffing can bypass many of the weaknesses of static signatures and thus could be used in conjunction with traditional detection methods to build a more robust detection pipeline.

Appendix

Related Indicators (IOCs)

TYPE | INDICATOR | DESCRIPTION
SHA256 | 132948bef56cc5b4d0e435f33e26632264d27ce7d61eba85cf3830fdf7cb8056 | Sakura sample, Arch: ARM, EABI4
SHA256 | 136dbd3cfa947f286b972af1e389b2a44138c0013aa8060d20c247b6bcfdd88c | Sakura sample, Arch: Intel 80386
SHA256 | 17c62e0cf77dc4341809afceb1c8395d67ca75b2a2c020bddf39cca629222161 | Sakura sample, Arch: ARM, EABI4
SHA256 | 19e0f329b5d8689b14d901b9b65c8d4fb28016360f45b3dfcec17e8340e6411e | Sakura sample, Arch: Motorola m68k
SHA256 | 4cc11ffb3681ebced1f9d88e71b70a87e6d4498abca823245c118afead67b6a5 | Sakura sample, Arch: MIPS, MIPS-I version 1
SHA256 | 562b4c9a40f9c88ab84ac4ffd0deacd219595ab83ed23a458c5f492594a3a7ef | Sakura sample, Arch: ARM, EABI4
SHA256 | 594a6b2c1e9beac3ad5f84458b71c1b7ec05ee0239808c9a63bc901040e413a3 | Sakura sample, Arch: x86-64
SHA256 | 5fec87479a8d2fa7f0ed7c8f6ba76eeea9e86c45123173d2230149a55dcd760d | Sakura sample, Arch: MIPS, MIPS-I version 1
SHA256 | 603d14671f97d12db879cc1c7cd6abfa278bf46431ac73aeb6b3a4c4c2b16b9f | Sakura sample, Arch: x86-64
SHA256 | 6b128a64a497eb123f03b77ef45e99e856282dc9620dc26ab38998627a8f3216 | Sakura sample, Arch: Renesas SH
SHA256 | 6ce1739788b286cc539a9f24ef8c6488e11f42606189a7aa267742db90f7b18d | Sakura sample, Arch: Intel 80386
SHA256 | 770363f9fd334c3f3c4ba0e05a2a0d4701f56a629b09365dfe874b2a277f4416 | Sakura sample, Arch: ARM, version 1
SHA256 | 7c8ba5f88b1c4689a64652f0b8f5e3922e83f9f73c7e165f3213de27c5fb4d05 | Sakura sample, Arch: PowerPC
SHA256 | 8090c3a1a930849df42f7f796d42e0211344e709a5ac15c2b4aca8ca41de2cd3 | Sakura sample, Arch: Intel 80386
SHA256 | 94a279397b8c19ec7def169884a096d4f85ce0e21ff9df0be3ce264ef4565ea7 | Sakura sample, Arch: x86-64
SHA256 | 96bb3e5209e083544ea6a78bc6fc4ebc456e135a786d747718d936af3b063298 | Sakura sample, Arch: ARM, EABI4
SHA256 | a079dfd60b55a7d74dd32d49a984bea43665b8b225beceae5b272944889217f6 | Sakura sample, Arch: MIPS, MIPS-I version 1
SHA256 | b6c2f02b1bed62a6b845d5f13d9003f5aa3f6d0da3e62fa48d9822872453de10 | Sakura sample, Arch: Renesas SH
SHA256 | cef15aa60dc2c09fe117e37e07399f0ef89dca9f930ce13ac1e29f8cf63d9a31 | Sakura sample, Arch: Motorola m68k
SHA256 | e984334bbdd1179aadbde949f7c1b0fb02b6c18cb4a56d146150853b18adfa79 | Sakura sample, Arch: MIPS, MIPS-I version 1
SHA256 | 2858982408bf1664b622e830ad83b871749608a7533e94672153ff90caa658a9 | Yakuza sample, Arch: ARM, EABI4
SHA256 | 2b7262cae9e192fa7921f3ec02e0f924b32de3d418842fdad9a51603589a54c7 | Yakuza sample, Arch: Intel 80386
SHA256 | 2faf7437c769abd92347d6f0a77f001523ec41c02d2bf12e3cebf5b950457ba3 | Yakuza sample, Arch: Intel 80386
SHA256 | 4fc23e8409becb028997c2f0f2041e2dc853018b71e009e3d66f33876d5d4e99 | Yakuza sample, Arch: Renesas SH
SHA256 | 6554d5edb401e2def2ef9fbb82b591351d3c8261ce0a20c431470f1c68fa3aea | Yakuza sample, Arch: ARM, version 1
SHA256 | 8005db9431013f094a2114046679ab971e62a8776639d6c2903fcc5d2fe8065c | Yakuza sample, Arch: x86-64
SHA256 | 91392f5dbbfd4ad142956983208a484b91ac5e84c4f9a9fcb530a9b085644c93 | Yakuza sample, Arch: ARM, version 1
SHA256 | b8aadb66183196868a9ff20bebd9c289fbfe2985fb409743bb0d0fea513e9caf | Yakuza sample, Arch: ARM, EABI4
SHA256 | d4f223fc5944bc06e12c675f0664509eeab527abc03cdd8c2fbd43947cc6cbab | Yakuza sample, Arch: ARM, version 1
SHA256 | f64b5f6dd7f222b7568bba9e05caa52f9e4186f9ba4856c8bf1274f4c77c653c | Yakuza sample, Arch: Intel 80386
