Author: Sergio Prado (reposted with permission)
A few days ago I decided to reverse engineer my router’s firmware image with binwalk.
I bought a TP-Link Archer C7 home router. It’s not one of the best, but it’s good enough for my needs.
One thing I always do when I buy a new router is install OpenWRT. Why? Because manufacturers’ firmware is usually of poor quality, is not maintained over time, and is insecure, with many bugs waiting to be exploited. I prefer to trust a well-maintained, open-source software project like OpenWRT.
When installing and configuring OpenWRT, I also downloaded the latest version of the official Archer C7 firmware image provided by TP-Link and decided to analyze it. Just for fun, and to write a little bit about binwalk, one of the best tools for this job!
WHAT IS BINWALK?
Binwalk is an open-source tool for analyzing, reverse engineering and extracting firmware images.
Created in 2010 by Craig Heffner, binwalk is able to scan a firmware image and search for file signatures to identify and extract filesystem images, executable code, compressed archives, bootloader and kernel images, file formats like JPEGs and PDFs, and many more!
You can use binwalk to reverse engineer a firmware image to understand how it works. You can reverse engineer binaries inside filesystem images to look for vulnerabilities. You can extract files from the image and search for backdoor passwords or digital certificates. You can identify opcodes for a variety of CPU architectures.
You can decompress filesystem images to search for specific password files (passwd, shadow, etc) and try to break password hashes. You can perform a binary diff between two or more files. You can perform data entropy analysis to search for compressed data or hardcoded crypto keys. All of this without needing access to source code!
HOW DOES BINWALK WORK?
The main feature of binwalk is its signature scanning. Binwalk can scan a firmware image to search for different embedded file types and file systems.
You know the file command line utility, right?
$ file /bin/bash
/bin/bash: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/l, for GNU/Linux 3.2.0, BuildID[sha1]=12f73d7a8e226c663034529c8dd20efec22dde54, stripped
The file command looks at the header of the file and searches for a signature (magic number) to identify the type of the file. For example, if the file starts with the sequence of bytes 0x89 0x50 0x4E 0x47 0x0D 0x0A 0x1A 0x0A, it knows it’s a PNG file. Wikipedia’s “List of file signatures” page catalogs many common ones.
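As a minimal sketch of that header check (in Python, with a made-up function name `looks_like_png`), the test boils down to comparing the first eight bytes of the data against the PNG signature quoted above. The real file utility does far more than this, of course.

```python
# The eight-byte PNG signature quoted in the text above.
PNG_MAGIC = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])


def looks_like_png(data: bytes) -> bool:
    """Return True if the data starts with the PNG magic number."""
    return data.startswith(PNG_MAGIC)


print(looks_like_png(PNG_MAGIC + b"rest of the file"))  # True
print(looks_like_png(b"\x7fELF and so on"))             # False
```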
Binwalk works the same way. But instead of looking for signatures just at the beginning of the file, binwalk will scan the entire file. In addition, binwalk is able to extract the files found in the image.
Both file and binwalk tools use the libmagic library to identify file signatures. But binwalk additionally supports a list of custom magic signatures to find compressed/archived files, firmware headers, Linux kernels, bootloaders, filesystems, and so on!
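To make the difference concrete, here is a rough Python sketch of scanning a whole blob for signatures at any offset. The four-entry table and the `scan` function are my own illustration, not binwalk’s code. Binwalk’s real signature set is much larger and it validates each hit, while a bare byte match like this one will happily report false positives.

```python
# A tiny, hand-picked signature table (illustrative only).
SIGNATURES = {
    b"\x27\x05\x19\x56": "uImage header",
    b"hsqs": "Squashfs filesystem, little endian",
    b"\x1f\x8b\x08": "gzip compressed data",
    b"\x89PNG\r\n\x1a\n": "PNG image",
}


def scan(data: bytes):
    """Return sorted (offset, description) pairs for every signature hit."""
    hits = []
    for magic, description in SIGNATURES.items():
        offset = data.find(magic)
        while offset != -1:
            hits.append((offset, description))
            offset = data.find(magic, offset + 1)
    return sorted(hits)


# Synthetic "firmware": padding, a squashfs magic, padding, a gzip magic.
blob = b"\x00" * 16 + b"hsqs" + b"\xff" * 12 + b"\x1f\x8b\x08" + b"\x00" * 8
for offset, description in scan(blob):
    print(f"{offset:<10}0x{offset:<10X}{description}")
```

On the synthetic blob this reports the squashfs magic at offset 16 and the gzip magic at offset 32.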
Now let’s have some fun!
Binwalk is supported on several platforms, including Linux, OSX, FreeBSD, and Windows.
Binwalk has a lot of options:
$ binwalk

Binwalk v2.2.0
Craig Heffner, ReFirmLabs
https://github.com/ReFirmLabs/binwalk

Usage: binwalk [OPTIONS] [FILE1] [FILE2] [FILE3] ...

Signature Scan Options:
    -B, --signature              Scan target file(s) for common file signatures
    -R, --raw=<str>              Scan target file(s) for the specified sequence of bytes
    -A, --opcodes                Scan target file(s) for common executable opcode signatures
    -m, --magic=<file>           Specify a custom magic file to use
    -b, --dumb                   Disable smart signature keywords
    -I, --invalid                Show results marked as invalid
    -x, --exclude=<str>          Exclude results that match <str>
    -y, --include=<str>          Only show results that match <str>

Extraction Options:
    -e, --extract                Automatically extract known file types
    -D, --dd=<type:ext:cmd>      Extract <type> signatures, give the files an extension of <ext>, and execute <cmd>
    -M, --matryoshka             Recursively scan extracted files
    -d, --depth=<int>            Limit matryoshka recursion depth (default: 8 levels deep)
    -C, --directory=<str>        Extract files/folders to a custom directory (default: current working directory)
    -j, --size=<int>             Limit the size of each extracted file
    -n, --count=<int>            Limit the number of extracted files
    -r, --rm                     Delete carved files after extraction
    -z, --carve                  Carve data from files, but don't execute extraction utilities
    -V, --subdirs                Extract into sub-directories named by the offset

Entropy Options:
    -E, --entropy                Calculate file entropy
    -F, --fast                   Use faster, but less detailed, entropy analysis
    -J, --save                   Save plot as a PNG
    -Q, --nlegend                Omit the legend from the entropy plot graph
    -N, --nplot                  Do not generate an entropy plot graph
    -H, --high=<float>           Set the rising edge entropy trigger threshold (default: 0.95)
    -L, --low=<float>            Set the falling edge entropy trigger threshold (default: 0.85)

Binary Diffing Options:
    -W, --hexdump                Perform a hexdump / diff of a file or files
    -G, --green                  Only show lines containing bytes that are the same among all files
    -i, --red                    Only show lines containing bytes that are different among all files
    -U, --blue                   Only show lines containing bytes that are different among some files
    -u, --similar                Only display lines that are the same between all files
    -w, --terse                  Diff all files, but only display a hex dump of the first file

Raw Compression Options:
    -X, --deflate                Scan for raw deflate compression streams
    -Z, --lzma                   Scan for raw LZMA compression streams
    -P, --partial                Perform a superficial, but faster, scan
    -S, --stop                   Stop after the first result

General Options:
    -l, --length=<int>           Number of bytes to scan
    -o, --offset=<int>           Start scan at this file offset
    -O, --base=<int>             Add a base address to all printed offsets
    -K, --block=<int>            Set file block size
    -g, --swap=<int>             Reverse every n bytes before scanning
    -f, --log=<file>             Log results to file
    -c, --csv                    Log results to file in CSV format
    -t, --term                   Format output to fit the terminal window
    -q, --quiet                  Suppress output to stdout
    -v, --verbose                Enable verbose output
    -h, --help                   Show help output
    -a, --finclude=<str>         Only scan files whose names match this regex
    -p, --fexclude=<str>         Do not scan files whose names match this regex
    -s, --status=<int>           Enable the status server on the specified port
We are now ready to scan firmware images.
SCANNING A FIRMWARE IMAGE WITH BINWALK
Let’s start by searching for file signatures inside the image (I downloaded this image from TP-Link’s website).
Running binwalk with the --signature parameter will do the job:
$ binwalk --signature --term archer-c7.bin

DECIMAL       HEXADECIMAL     DESCRIPTION
------------------------------------------------------------------------------------------
21876         0x5574          U-Boot version string, "U-Boot 1.1.4-g4480d5f9-dirty (May 20 2019 - 18:45:16)"
21940         0x55B4          CRC32 polynomial table, big endian
23232         0x5AC0          uImage header, header size: 64 bytes, header CRC: 0x386C2BD5, created: 2019-05-20 10:45:17, image size: 41162 bytes, Data Address: 0x80010000, Entry Point: 0x80010000, data CRC: 0xC9CD1E38, OS: Linux, CPU: MIPS, image type: Firmware Image, compression type: lzma, image name: "u-boot image"
23296         0x5B00          LZMA compressed data, properties: 0x5D, dictionary size: 8388608 bytes, uncompressed size: 97476 bytes
64968         0xFDC8          XML document, version: "1.0"
78448         0x13270         uImage header, header size: 64 bytes, header CRC: 0x78A267FF, created: 2019-07-26 07:46:14, image size: 1088500 bytes, Data Address: 0x80060000, Entry Point: 0x80060000, data CRC: 0xBB9D4F94, OS: Linux, CPU: MIPS, image type: Multi-File Image, compression type: lzma, image name: "MIPS OpenWrt Linux-3.3.8"
78520         0x132B8         LZMA compressed data, properties: 0x6D, dictionary size: 8388608 bytes, uncompressed size: 3164228 bytes
1167013       0x11CEA5        Squashfs filesystem, little endian, version 4.0, compression:xz, size: 14388306 bytes, 2541 inodes, blocksize: 65536 bytes, created: 2019-07-26 07:51:38
15555328      0xED5B00        gzip compressed data, from Unix, last modified: 2019-07-26 07:51:41
Now we have a lot of information about the image!
The image uses U-Boot as the bootloader (image header at address 0x5AC0 and compressed bootloader image at address 0x5B00). Based on the uImage header at address 0x13270, we know the CPU architecture is MIPS and the Linux kernel is version 3.3.8. And based on the image found at the address 0x11CEA5, we can see that the rootfs is a squashfs filesystem.
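For the curious, the fields binwalk prints for a uImage header can be decoded by hand. The following Python sketch assumes the legacy U-Boot header layout (seven big-endian 32-bit words, four one-byte codes, and a 32-byte name, 64 bytes in total) and that the header CRC is computed with its own field zeroed. The function name and the demo header are mine. The demo header is synthetic and only borrows a few values from the scan output, so its CRC is not the one binwalk reported.

```python
import struct
import zlib

UIMAGE_MAGIC = 0x27051956
# Big-endian: seven 32-bit words, four single bytes, 32-byte name = 64 bytes.
HEADER = struct.Struct(">7I4B32s")


def parse_uimage_header(raw: bytes) -> dict:
    """Decode a 64-byte legacy uImage header and check its CRC."""
    (magic, hcrc, _timestamp, size, load, entry, dcrc,
     _os, _arch, _type, _comp, name) = HEADER.unpack(raw[:HEADER.size])
    if magic != UIMAGE_MAGIC:
        raise ValueError("not a legacy uImage header")
    # The header CRC is calculated with the CRC field itself set to zero.
    zeroed = raw[:4] + b"\x00" * 4 + raw[8:HEADER.size]
    return {
        "header_crc_ok": (zlib.crc32(zeroed) & 0xFFFFFFFF) == hcrc,
        "image_size": size,
        "load_address": load,
        "entry_point": entry,
        "data_crc": dcrc,
        "name": name.rstrip(b"\x00").decode("ascii", "replace"),
    }


# Synthetic header for demonstration (timestamp and type codes are placeholders).
words = [UIMAGE_MAGIC, 0, 0, 41162, 0x80010000, 0x80010000, 0xC9CD1E38]
raw = HEADER.pack(*words, 0, 0, 0, 0, b"u-boot image")
crc = zlib.crc32(raw) & 0xFFFFFFFF
raw = raw[:4] + struct.pack(">I", crc) + raw[8:]

info = parse_uimage_header(raw)
print(info["name"], info["image_size"], hex(info["load_address"]), info["header_crc_ok"])
```

To try it on the real image, one would read 64 bytes from archer-c7.bin starting at offset 23232 (or 78448 for the kernel) and pass them to `parse_uimage_header`.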
Let’s now extract the bootloader (U-Boot) with the dd command:
$ dd if=archer-c7.bin of=u-boot.bin.lzma bs=1 skip=23296 count=41162
41162+0 records in
41162+0 records out
41162 bytes (41 kB, 40 KiB) copied, 0,0939608 s, 438 kB/s
Since the image is compressed with LZMA, we need to decompress it:
$ unlzma u-boot.bin.lzma
Now we have the U-Boot image:
$ ls -l u-boot.bin
-rw-rw-r-- 1 sprado sprado 97476 Fev 5 08:48 u-boot.bin
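The same carve-and-decompress steps can be sketched in Python with the standard lzma module. The helper names `carve` and `unlzma` are made up for this example, and the demo runs on a synthetic blob so it is self-contained. For the real image, the assumption is that you would read archer-c7.bin into memory and use skip=23296 and count=41162 from the scan output.

```python
import lzma


def carve(blob: bytes, skip: int, count: int) -> bytes:
    """In-memory equivalent of dd with bs=1, skip=<skip> and count=<count>."""
    return blob[skip:skip + count]


def unlzma(data: bytes) -> bytes:
    """Decompress a legacy .lzma ("LZMA alone") stream, like unlzma does."""
    return lzma.decompress(data, format=lzma.FORMAT_ALONE)


# Synthetic stand-in for a firmware image: junk, an LZMA stream, more junk.
payload = b"U-Boot 1.1.4 (synthetic stand-in) " * 4
packed = lzma.compress(payload, format=lzma.FORMAT_ALONE)
blob = b"\xAA" * 100 + packed + b"\xBB" * 50

recovered = unlzma(carve(blob, skip=100, count=len(packed)))
print(recovered == payload)
```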
How about a search for the default value of the bootargs environment variable?
$ strings u-boot.bin | grep bootargs
bootargs
bootargs=console=ttyS0,115200 board=AP152 rootfstype=squashfs init=/etc/preinit mtdparts=spi0.0:128k(factory-uboot),192k(u-boot),64k(ART),1536k(uImage),14464k@0x1e0000(rootfs) mem=128M
The U-Boot bootargs environment variable is used to pass parameters to the Linux kernel. And from the output above we have a better understanding of the device’s flash memory layout.
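To see the layout more clearly, here is a small Python sketch that turns the mtdparts string from the output above into a table of partition offsets and sizes. The parser is my own and only handles the subset of the mtdparts syntax that appears in this string (sizes in k or m, with an optional @offset), not the full kernel syntax.

```python
import re

UNITS = {"": 1, "k": 1024, "m": 1024 * 1024}
# size, optional unit, optional @offset, then the partition name in parentheses
PART_RE = re.compile(r"(\d+)([kKmM]?)(?:@(0x[0-9a-fA-F]+|\d+))?\(([^)]+)\)")


def parse_mtdparts(spec: str):
    """Return (device, [(name, offset, size), ...]) for one mtdparts device."""
    device, _, parts = spec.partition(":")
    layout, cursor = [], 0
    for size, unit, offset, name in PART_RE.findall(parts):
        nbytes = int(size) * UNITS[unit.lower()]
        # An explicit @offset overrides the running position.
        start = int(offset, 0) if offset else cursor
        layout.append((name, start, nbytes))
        cursor = start + nbytes
    return device, layout


spec = ("spi0.0:128k(factory-uboot),192k(u-boot),64k(ART),"
        "1536k(uImage),14464k@0x1e0000(rootfs)")
device, layout = parse_mtdparts(spec)
for name, start, nbytes in layout:
    print(f"{name:<14}0x{start:06x}  {nbytes // 1024:>6} KiB")
```

The partitions are contiguous: the first four end exactly at 0x1e0000, where rootfs begins, and the five together add up to 16 MiB of flash.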
How about extracting the Linux kernel image?
$ dd if=archer-c7.bin of=uImage bs=1 skip=78448 count=1088572
1088572+0 records in
1088572+0 records out
1088572 bytes (1,1 MB, 1,0 MiB) copied, 1,68628 s, 646 kB/s
We can confirm that the image was extracted successfully with the file command:
$ file uImage
uImage: u-boot legacy uImage, MIPS OpenWrt Linux-3.3.8, Linux/MIPS, Multi-File Image (lzma), 1088500 bytes, Fri Jul 26 07:46:14 2019, Load Address: 0x80060000, Entry Point: 0x80060000, Header CRC: 0x78A267FF, Data CRC: 0xBB9D4F94
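The uImage file format is basically the Linux kernel image with an additional header. Binwalk reported the uImage header at offset 78448 and the LZMA data at offset 78520, so the compressed kernel starts 72 bytes into the extracted file (the 64-byte header plus 8 more bytes, which in a Multi-File Image hold the list of component sizes). So let’s skip those 72 bytes to get the final Linux kernel image: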
$ dd if=uImage of=Image.lzma bs=1 skip=72
1088500+0 records in
1088500+0 records out
1088500 bytes (1,1 MB, 1,0 MiB) copied, 1,65603 s, 657 kB/s
The image is compressed, so let’s decompress it:
$ unlzma Image.lzma
Now we have the final Linux kernel image:
$ ls -la Image
-rw-rw-r-- 1 sprado sprado 3164228 Fev 5 10:51 Image
What could we do with the kernel image? We could for example search for strings in the image to find the Linux kernel version string and learn about the environment used to build the kernel:
$ strings Image | grep "Linux version"
Linux version 3.3.8 (leo@leo-MS-7529) (gcc version 4.6.3 20120201 (prerelease) (Linaro GCC 4.6-2012.02) ) #1 Mon May 20 18:53:02 CST 2019
Although this firmware was released in August 2019 (last year, as I write this article), it uses an old Linux kernel (3.3.8) from 2012, compiled with a very old GCC version (4.6), also from 2012!
With the --opcodes option, we can also use binwalk to search for machine instructions and identify the CPU architecture of the image:
$ binwalk --opcodes Image

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
2400          0x960           MIPS instructions, function epilogue
2572          0xA0C           MIPS instructions, function epilogue
2828          0xB0C           MIPS instructions, function epilogue
What about the root filesystem? Instead of manually extracting the image, let’s use binwalk’s --extract option:
$ binwalk --extract --quiet archer-c7.bin
The full root filesystem will be extracted in a subdirectory:
$ cd _archer-c7.bin.extracted/squashfs-root/
$ ls
bin dev etc lib mnt overlay proc rom root sbin sys tmp usr var www
$ cat etc/banner
MM NM MMMMMMM M M $MMMMM MMMMM MMMMMMMMMMM MMM MMM MMMMMMMM MM MMMMM. MMMMM:MMMMMM: MMMM MMMMM MMMM= MMMMMM MMM MMMM MMMMM MMMM MMMMMM MMMM MMMMM' MMMM= MMMMM MMMM MM MMMMM MMMM MMMM MMMMNMMMMM MMMM= MMMM MMMMM MMMMM MMMM MMMM MMMMMMMM MMMM= MMMM MMMMMM MMMMM MMMM MMMM MMMMMMMMM MMMM= MMMM MMMMM, NMMMMMMMM MMMM MMMM MMMMMMMMMMM MMMM= MMMM MMMMMM MMMMMMMM MMMM MMMM MMMM MMMMMM MMMM= MMMM MM MMMM MMMM MMMM MMMM MMMM MMMM MMMM$ ,MMMMM MMMMM MMMM MMM MMMM MMMMM MMMM MMMM MMMMMMM: MMMMMMM M MMMMMMMMMMMM MMMMMMM MMMMMMM MMMMMM MMMMN M MMMMMMMMM MMMM MMMM MMMM M MMMMMMM M M M
---------------------------------------------------------------
For those about to rock... (%C, %R)
---------------------------------------------------------------
Now we can do a lot of things!
We can search for configuration files, password hashes, crypto keys, and digital certificates. We can analyze the binaries to find bugs and vulnerabilities.
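As a starting point for that kind of search, here is a rough Python sketch that walks an extracted root filesystem and flags password files and anything that begins like a PEM key or certificate. The function name, the list of file names and the 4096-byte read limit are arbitrary choices of mine. The directory name squashfs-root matches the extraction shown above.

```python
import os

INTERESTING_NAMES = {"passwd", "shadow"}
PEM_MARKER = b"-----BEGIN "


def find_candidates(root: str):
    """Yield (path, reason) for files worth a closer look under root."""
    for dirpath, _dirs, files in os.walk(root):
        for filename in files:
            path = os.path.join(dirpath, filename)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            if filename in INTERESTING_NAMES:
                yield path, "password file"
                continue
            with open(path, "rb") as handle:
                # Only the first 4096 bytes are inspected, to keep this cheap.
                if PEM_MARKER in handle.read(4096):
                    yield path, "PEM key or certificate"


# Prints nothing if the directory does not exist.
for path, reason in find_candidates("squashfs-root"):
    print(f"{reason}: {path}")
```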
With QEMU’s user-mode emulator and chroot, we can even run the router’s MIPS binaries on our own machine:
$ ls
bin dev etc lib mnt overlay proc rom root sbin sys tmp usr var www
$ cp /usr/bin/qemu-mips-static .
$ sudo chroot . ./qemu-mips-static bin/busybox
BusyBox v1.19.4 (2019-05-20 18:13:49 CST) multi-call binary.
Copyright (C) 1998-2011 Erik Andersen, Rob Landley, Denys Vlasenko
and others. Licensed under GPLv2.
See source distribution for full notice.

Usage: busybox [function] [arguments]...
   or: busybox --list[-full]
   or: function [arguments]...

        BusyBox is a multi-call binary that combines many common Unix
        utilities into a single executable. Most people will create a
        link to busybox for each function they wish to use and BusyBox
        will act like whatever it was invoked as.

Currently defined functions:
        [, [[, addgroup, adduser, arping, ash, awk, basename, cat, chgrp, chmod,
        chown, chroot, clear, cmp, cp, crond, crontab, cut, date, dd, delgroup,
        deluser, dirname, dmesg, echo, egrep, env, expr, false, fgrep, find, free,
        fsync, grep, gunzip, gzip, halt, head, hexdump, hostid, id, ifconfig, init,
        insmod, kill, killall, klogd, ln, lock, logger, ls, lsmod, mac_addr,
        md5sum, mkdir, mkfifo, mknod, mktemp, mount, mv, nice, passwd, pgrep,
        pidof, ping, ping6, pivot_root, poweroff, printf, ps, pwd, readlink,
        reboot, reset, rm, rmdir, rmmod, route, sed, seq, sh, sleep, sort,
        start-stop-daemon, strings, switch_root, sync, sysctl, tail, tar, tee,
        telnet, test, tftp, time, top, touch, tr, traceroute, true, udhcpc,
        umount, uname, uniq, uptime, vconfig, vi, watchdog, wc, wget, which,
        xargs, yes, zcat
Cool! But notice that the BusyBox version is 1.19.4, a very old release that dates back to 2012.
So TP-Link releases a firmware image in 2019 using software (GCC toolchain, kernel, BusyBox, etc) from 2012!
Can you see now why I always install OpenWRT on my routers?
MORE COOL STUFF
Binwalk is also able to perform entropy analysis, printing raw entropy data and generating entropy graphs. The entropy will be high when the bytes in the image look random, and that could mean the image has an encrypted, compressed or obfuscated file, or even a hardcoded crypto key!
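To give an idea of what “entropy” means here, this Python sketch computes the Shannon entropy of each block of data, normalized to a 0.0 to 1.0 scale (the 0.95 and 0.85 default thresholds in binwalk’s help output suggest a similar scale, but that is my reading, not a description of binwalk’s implementation). The 1024-byte block size and the function names are arbitrary.

```python
import math
import os
from collections import Counter


def entropy(block: bytes) -> float:
    """Shannon entropy of a block, normalized to the range 0.0 .. 1.0."""
    if not block:
        return 0.0
    total = len(block)
    bits = sum((n / total) * math.log2(total / n) for n in Counter(block).values())
    return bits / 8  # 8 bits per byte is the maximum


def block_entropy(data: bytes, block_size: int = 1024):
    """Yield (offset, entropy) for consecutive blocks of the data."""
    for offset in range(0, len(data), block_size):
        yield offset, entropy(data[offset:offset + block_size])


# Two blocks of zeros followed by two blocks of random bytes.
sample = b"\x00" * 2048 + os.urandom(2048)
for offset, value in block_entropy(sample):
    print(f"0x{offset:04x}  {value:.2f}")
```

The zero-filled blocks score 0.00, while the random blocks score close to 1.0, which is the kind of jump you would look for at the start of compressed or encrypted data.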
We can also use the --raw option to search for a custom sequence of raw bytes in the image or the --hexdump option to perform a hex dump comparing two or more input files.
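The idea behind a hex dump diff is simple enough to sketch in a few lines of Python. This is only an illustration of the concept for two inputs, with a made-up function name and made-up data. It makes no attempt to reproduce binwalk’s colored output.

```python
def differing_rows(first: bytes, second: bytes, width: int = 16):
    """Yield (offset, row_a, row_b) for each row where the inputs differ."""
    length = max(len(first), len(second))
    for offset in range(0, length, width):
        row_a = first[offset:offset + width]
        row_b = second[offset:offset + width]
        if row_a != row_b:
            yield offset, row_a, row_b


# Two 48-byte inputs that differ in a single byte at offset 20.
old = bytes(range(48))
new = bytearray(old)
new[20] = 0xFF
new = bytes(new)

for offset, row_a, row_b in differing_rows(old, new):
    print(f"0x{offset:08X}  {row_a.hex(' ')}  |  {row_b.hex(' ')}")
```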
Custom signatures can be added to binwalk either through a custom signature file specified on the command line via the --magic option or by adding them to your $HOME/.config/binwalk/magic directory.
You can find more information about binwalk in the official documentation usage page.
There is a binwalk API implemented as a Python module that any Python script can use to perform binwalk scans programmatically. The binwalk command-line utility can be duplicated almost entirely with just two lines of Python code!
With the Python API, you can also create Python plugins to customize and extend binwalk.
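As a hedged sketch of what using that API might look like, the function below collects the results of a signature scan. It assumes the binwalk 2.x Python module is installed and exposes `binwalk.scan()` with `signature` and `quiet` keyword arguments and result objects carrying `offset` and `description`. Check this against the documentation for your version. The wrapper name `scan_signatures` is mine.

```python
def scan_signatures(path: str):
    """Return (offset, description) pairs from a binwalk signature scan."""
    # Imported here so the module is only needed when a scan is requested.
    import binwalk  # third-party module, assumed to be binwalk 2.x

    found = []
    for module in binwalk.scan(path, signature=True, quiet=True):
        for result in module.results:
            found.append((result.offset, result.description))
    return found
```

Calling `scan_signatures("archer-c7.bin")` should then give roughly the same offsets and descriptions as the command-line scan shown earlier.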
So why don’t you download any firmware image from the Internet and try binwalk yourself? I promise you will have a lot of fun!
This blog post originally appeared on #embeddedbits and is reposted here with permission by Sergio Prado.