
Reverse engineering my router's firmware with binwalk - sprado
https://embeddedbits.org/reverse-engineering-router-firmware-with-binwalk/
======
bonyt
This is indeed a cool tool! I've used it before when forensically analyzing a
cell phone, and found interesting things. For example, I found that a web
browser had cached the unencrypted bytes from an HTTP message. Binwalk
identified the gzip header's magic number (1f 8b), and after decompression
there were interesting results.
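
For anyone curious how a signature hit like that turns into recovered data,
here is a minimal Python sketch of the idea. It is not binwalk's actual
implementation: it just scans a blob for the gzip magic bytes (1f 8b) and
keeps the candidates that really inflate. The name `find_gzip_streams` and the
sample data are made up for illustration.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"


def find_gzip_streams(blob):
    """Return (offset, decompressed bytes) for each candidate that inflates."""
    results = []
    pos = blob.find(GZIP_MAGIC)
    while pos != -1:
        # wbits=31 tells zlib to expect a gzip header and trailer.
        inflater = zlib.decompressobj(wbits=31)
        try:
            data = inflater.decompress(blob[pos:])
            if inflater.eof:
                results.append((pos, data))
        except zlib.error:
            pass  # the two bytes matched by chance; not a real gzip stream
        pos = blob.find(GZIP_MAGIC, pos + 1)
    return results


if __name__ == "__main__":
    # Hypothetical sample: padding, one embedded gzip stream, more padding.
    sample = b"\x00" * 16 + gzip.compress(b"GET /index.html HTTP/1.1") + b"\xff" * 8
    for offset, data in find_gzip_streams(sample):
        print(hex(offset), data)
```

A two-byte magic can show up by chance, so the sketch throws away anything
zlib rejects instead of trusting the signature alone.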

Another cool tool I learned about recently is signsrch. It's more for reverse
engineering binaries of software that implements encryption of some type.
It'll find signatures in the binaries of these encryption methods, giving you
a place to look when, for example, reverse engineering a file format that you
suspect is encrypted in some way.

[https://www.oreilly.com/library/view/learning-malware-
analys...](https://www.oreilly.com/library/view/learning-malware-
analysis/9781788392501/4f565d19-d23b-4859-9990-f9724684967c.xhtml)

------
JoeAltmaier
Cool tool! I wrote something for reverse-engineering code, as a consultant
years ago. They had a radio module but the manufacturer had lost the source
code.

So the tool was called Golem. It had tables for defining opcode to assembler
pattern matching, that could be written for any machine (instead of just the
one I was cracking).

It worked iteratively. You ran it over the binary once, it produced arbitrary
labels from jump-points. You could annotate that output by changing the labels
to something human-readable (e.g. Loop-back, Main, TimerISR etc) and add
comments.

The next iteration would read that back in to build a symbol table, rescan the
binary and re-output. But this time it would understand that the symbols were
always on opcode boundaries, distinguish data table from code entry points
(because you marked them) etc. So it would do a better job of staying in sync
with the code.

Once I was done with that project (and had re-compilable source for the radio
module) I put it away and never thought of it again.

~~~
souprock
You were on your way to cloning IDA Pro, Ghidra, Binary Ninja, or Hopper
Disassembler. To varying degrees, sometimes as a pay-extra option, those tools
can produce source code.

~~~
JoeAltmaier
Um. I think they post-dated me! But I didn't go anywhere with it.

~~~
souprock
IDA Pro started as a 16-bit MS-DOS program. It's real old. I'm pretty sure I
was using it back in 1992, when it was already a well-developed program.

Ghidra is old too, although only recently public. It couldn't be older than
Java, which is from 1996.

~~~
JoeAltmaier
Cool. I did mine in 2006. Hey, those have mostly Intel disassemblers. Mine did
any machine code you cared to write a dissector for.

Are they iterative? Can you add human clues/cues so they do a better job the
next time?

~~~
souprock
They are not at all mostly Intel disassemblers, though some of them have
freeware versions (to suppress competition) or time-limited demo versions that
are purposely limited. They are very much designed around humans adding clues:
you can declare function parameters, struct types, enumerations, and the
meaning of various offsets in code. They are interactive GUI tools,
continuously updating automated analysis as the user assists by providing
clues to the analysis engine. Ghidra and Binary Ninja can be simultaneously
multi-user, storing the database on a server for collaboration.

IDA Pro supports dozens of processor architectures. I count about 70, not
including model variations and not including community support.
[https://www.hex-rays.com/products/ida/processors/](https://www.hex-
rays.com/products/ida/processors/)

Ghidra supports "X86 16/32/64, ARM/AARCH64, PowerPC 32/64/VLE, MIPS
16/32/64/micro, 68xxx, Java / DEX bytecode, PA-RISC, PIC 12/16/17/18/24, Sparc
32/64, CR16C, Z80, 6502, 8051, MSP430, AVR8, AVR32, and variants of these
processors."

Binary Ninja officially supports x86, x64, ARMv7, Thumb2, ARMv8, PowerPC,
MIPS, 6502. Community support adds AVR, MSP430, and VMNDH-2k12.

Hopper Disassembler supports "x86{16,32,64}, Dalvik, avr, ARM, java, PowerPC,
Sparc, MIPS"

------
hyper_reality
It's a good article but there are much easier ways to use binwalk than
presented here.

In the first example he uses the "--signature" and "--term" flags; these are
unnecessary. Running binwalk with no flags will produce the same output.

To extract part of the file, he also uses dd with the "skip" and "count"
options painfully calculated. You can just use:

binwalk --dd='.*' img.bin

and it will extract everything that matches the pattern - the pattern above
will extract all found files.

~~~
hiisukun
Just a quick note to be careful extracting what binwalk considers to be
'everything' (such as the pattern above, or a -e for known file types) on
larger files. Sometimes there will be a higher amount of matches than you
might expect (such as in a .pcap file). You could magically extract gigabytes
of data from a 100MB file, which may be unhelpful and takes a long time.

------
ggcdn
A slightly related question for HNers: Is there any easy tool for a non-cs guy
to reverse engineer a binary file containing numbers and text in some specific
format?

I have to work with some old structural analysis software. The material and
element definitions come in an obscure file format ".PF3CMP". I know it
contains text like the material names, and numbers/letters for the material
properties.

Ultimately it's my goal to be able to write these files from matlab or python,
instead of using the horribly clunky user interface. But first I need to know
the structure of the file, and I'm not even sure how to begin figuring that
out.

[0] is what it looks like when opened in a hex editor

[0] [https://imgur.com/a/jvqV3k8](https://imgur.com/a/jvqV3k8)

~~~
Youden
I don't know of any straightforward tools, most people I've seen reverse
engineer a format do it with a hex editor and writing custom scripts. It's not
directly relevant but the best I've seen is this presentation about reverse
engineering the protocol used to communicate within a car:
[https://www.youtube.com/watch?v=KkgxFplsTnM](https://www.youtube.com/watch?v=KkgxFplsTnM)

It uses some techniques that might be relevant, like monitoring different
parts of a file as you make different changes (like accelerating or
decelerating). In your case it might be possible to compare between different
material definitions for example.
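
A rough Python sketch of that comparison, assuming you can save two versions
of the file that differ in a single setting. `changed_ranges` is a made-up
helper, not part of any existing tool; it reports which byte ranges differ so
you know where to look in the hex editor.

```python
def changed_ranges(a, b):
    """Return (start, end) byte ranges, end exclusive, where a and b differ."""
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a new run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        # A length change is reported as one extra trailing range.
        ranges.append((common, max(len(a), len(b))))
    return ranges


if __name__ == "__main__":
    # Hypothetical data standing in for two saved material files.
    before = bytes(range(10))
    after = bytearray(before)
    after[3:5] = b"\xff\xfe"
    after[8] = 0
    print(changed_ranges(before, bytes(after)))
```

This only works well when a change does not shift the rest of the file; if
everything after the first difference changes, the fields are probably
variable-length.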

~~~
ggcdn
Ok thanks, I'll take a look. It's possible for me to generate these files for
each of the various material settings so I can manually 'diff' them, similar
to what you're describing.

~~~
hnick
If there are massive differences with minor changes that can be a clue that
the data is compressed or encrypted in some manner.
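
One way to put a number on that suspicion is byte entropy. A minimal Python
sketch, assuming the whole file fits in memory; `shannon_entropy` is a made-up
name. Values close to 8 bits per byte are consistent with compressed or
encrypted data, while plain text and padded records usually sit well below
that.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    print(shannon_entropy(b"\x00" * 1024))       # constant padding
    print(shannon_entropy(bytes(range(256))))    # every byte value once
```

It is a heuristic, not a proof: a short file or a file with only a small
compressed section can still score low overall.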

A good test would be if you can name/tag/comment items in the file, you can
search for these strings.

~~~
ggcdn
I don’t think there’s compression or encryption. I can search and find the hex
representation of text and values that I expect to be there. I guess I need to
bite the bullet and spend some time tagging the parameters I know, then
figuring out the pattern of padding that is in between.
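
Searching for known numbers can be scripted too. A small Python sketch,
assuming the values are stored as IEEE 754 floats in one of the usual widths
and byte orders (that is a guess about the format, not something known about
.PF3CMP); `find_value` and the sample bytes are invented for illustration.

```python
import struct


def find_value(blob, value):
    """Return {encoding name: [offsets]} for common encodings of a number."""
    candidates = {
        "float32 little-endian": struct.pack("<f", value),
        "float32 big-endian": struct.pack(">f", value),
        "float64 little-endian": struct.pack("<d", value),
        "float64 big-endian": struct.pack(">d", value),
    }
    hits = {}
    for name, needle in candidates.items():
        offsets = []
        pos = blob.find(needle)
        while pos != -1:
            offsets.append(pos)
            pos = blob.find(needle, pos + 1)
        if offsets:
            hits[name] = offsets
    return hits


if __name__ == "__main__":
    # Hypothetical record: a name, some padding, then a stiffness value.
    blob = b"STEEL\x00\x00\x00" + struct.pack("<d", 200000.0) + b"\x00" * 4
    print(find_value(blob, 200000.0))
```

Once a few known values are pinned to offsets, the spacing between them
usually gives away the record size and the padding.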

------
xenocratus
I first found out about binwalk from this YT video on Firmware Reverse
Engineering:
[https://www.youtube.com/watch?v=GIU4yJn2-2A](https://www.youtube.com/watch?v=GIU4yJn2-2A)

Quite a good, short intro into the subject as well!

------
GEBBL
This is amazing! I’ve used binwalk extract for ‘capture the flag’ challenges
but I never really thought about the practical applications of it. Wow! Thank
you

~~~
LeonM
Funny, I always assumed that there would be no application for binwalk other
than for extracting binary firmware images of embedded devices.

Using binwalk for CTF challenges is actually a new insight for me :)

~~~
beefhash
Conversely, it's a convenient tool for obfuscation. You can trigger plausible
false positives all over, while also making sure that there's nothing of
immediate use with binwalk left.

------
ChuckNorris89
_> Although the firmware was released last year (August 2019) as I write this
article, it uses an old Linux kernel version (3.3.8) released in 2012 compiled
with a very old GCC version (4.6) also from 2012!_

This is what happens when you pay peanuts for embedded devs and outsource
development to the cheapest sweatshop you can find so your products can meet a
competitive price point.

Sadly this will not change until there's regulation in place to hold
manufacturers accountable for their massively obvious vulnerabilities since
nobody cares that they're flooding the market with potential botnet hosts when
they're overworked, paid miserably and have a manager constantly breathing
down their neck.

~~~
bluesign
It is mostly down to the drivers for the SoC, not what the devs are paid.

~~~
ChuckNorris89
So how did OpenWRT manage to build firmware with up to date components for it?
The Qualcomm chips inside of it seem fairly modern for such an old kernel.

~~~
prashnts
Note that openwrt has a big community of contributors and not all
devices/features are supported. In contrast the manufacturer firmware is at
least feature complete and easy for regular users to set up.

~~~
rahuldottech
OpenWrt is also free. Both as free software, and free of cost. When you're
paying a manufacturer for a product, surely it's not too much to expect them
to ship with functional software that also happens to be up-to-date and
secure?

~~~
jschwartzi
You can get that, but not at consumer-grade router prices. I have a separate
router that I put behind my stand-alone cable modem. I paid for that separate
router about $200.00. And another $100 for the modem. A wifi access point cost
me another $100.

So it's about $400.00 for a router that has updated firmware(pfSense). Or you
can cheap out and spend only $100.00. This is what you get by doing that.

------
commandlinefan
From the output I see:

      23296         0x5B00          LZMA compressed data, properties: 0x5D, dictionary size:
                                    8388608 bytes, uncompressed size: 97476 bytes
      64968         0xFDC8          XML document, version: "1.0"

So it looks like the size of the bootloader should be 64968 - 23296 = 41672.
But he extracts 41162:

      $ dd if=archer-c7.bin of=u-boot.bin.lzma bs=1 skip=23296 count=41162

Curious if anybody knows why 41162; is this a block-size alignment
requirement?
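
The subtraction generalizes to any list of offsets. A rough Python sketch,
assuming the decimal offsets were copied from binwalk's output by hand;
`region_sizes` and the 131072-byte file size are hypothetical, not values from
the article. It only gives the distance to the next signature, which is an
upper bound on each region, so it does not by itself explain the 41162.

```python
def region_sizes(offsets, file_size):
    """Map each start offset to the distance to the next offset (or to EOF)."""
    ordered = sorted(offsets)
    bounds = ordered + [file_size]
    return {start: end - start for start, end in zip(ordered, bounds[1:])}


if __name__ == "__main__":
    # Offsets quoted above; the total file size is an assumed placeholder.
    for start, size in region_sizes([23296, 64968], 131072).items():
        print(start, hex(start), size)
```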

~~~
mrspeaker
I'm wondering how these values are determined too. I'm "following along at
home" without any idea what I'm doing (though all the files, bytes, and
offsets are matching with the tutorial... Also, if the original author finds
this thread: amazing write-up - got me really interested in the topic!).

At the step where they remove the header with

        dd if=uImage of=Image.lzma bs=1 skip=72

It results in a file that, if I try to uncompress it with `unlzma Image.lzma`,
complains with "Compressed data is corrupt"

I don't know where the magic number "72" comes from. Is it likely that could
be different on my machine (a mac)?

[edit: I think there's something else wrong - if I use `mkimage` to examine
the uImage file I only get:

        mkimage -l uImage
        GP Header: Size 27051956 LoadAddr 78a267ff

Instead of image information]

~~~
syntheticnature
The 72 bytes is from the difference between the uImage header and the lzma
inside, from the post. 0x132b8-0x13270 = 72 (dec).

So you'll need to check what binwalk says about your image; the uImage header
isn't necessarily fixed in size. Also see the comment above about the --dd
switch, though mind the reply to that pointing out you might want to check
what it finds before just letting it write a pile of files.
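
The same arithmetic as a small Python sketch, in case it helps to redo it with
the offsets from your own binwalk run. The two hex offsets are the ones quoted
from the post; `carve` is a made-up helper that mimics dd with bs=1 on data
already read into memory.

```python
def carve(blob, skip, count=None):
    """Equivalent of dd bs=1 skip=... count=... applied to in-memory bytes."""
    end = None if count is None else skip + count
    return blob[skip:end]


header_offset = 0x13270   # where binwalk reported the uImage header
payload_offset = 0x132B8  # where binwalk reported the LZMA data
skip = payload_offset - header_offset
print(skip)  # prints 72

if __name__ == "__main__":
    # Hypothetical stand-in for the extracted uImage file.
    fake_uimage = b"H" * skip + b"compressed payload"
    print(carve(fake_uimage, skip))
```

If your offsets differ from the article's, recompute the skip from your own
output instead of reusing 72.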

------
0xquad
Given the TERMS OF USE under TP-Link's privacy policy [ [https://www.tp-
link.com/us/about-us/privacy/](https://www.tp-link.com/us/about-us/privacy/) ]
it seems like they consider it illegal to do any of this. Their terms, along
with the "we don't even pretend to care about your privacy rights" attitude
have made me question any further purchase of TP-Link products.

Relevant quotes: "By using the Products or Services in any way, you agree to
the Terms. " "Also, modifying, translating, adapting, or otherwise creating
derivative works and improvements, decompiling, decoding, reverse engineering,
disassembling, or otherwise reducing the code used in any software in
connection with the Services into a readable form in order to examine the
source code or construction of such software and/or to copy or create other
products based (in whole or in part) on such software, is prohibited."

~~~
rblatz
How does that jibe with the GPL code they are shipping?

------
leeoniya
glad i flashed latest dd-wrt beta on my archer-c7 v5 :D. though my wan-facing
device runs OPNSense.

i actually prefer to run Tomato, but archer c7 is not broadcom :(

can anyone offer advice about dd-wrt vs openwrt (considering trying openwrt).

~~~
josteink
Latest version of OpenWRT (19) runs noticeably better on this device, with
better HW offloading support and based on a nearly mainline, modern Linux
kernel and a brand new device-tree for the Atheros SoC.

What reasons do you have to stay on dd-wrt?

~~~
leeoniya
> What reasons do you have to stay on dd-wrt?

mostly that i've used it before. can i gui-flash to openwrt from dd-wrt? i've
done tftp flashes before but they're pretty fiddly with getting the stupid
30-30-30 or whatever timing right. also i think these routers try to "pull"
from a tftp server rather than having you push to one that they bootstrap -
i've never been able to get the "pull" variant to work.

would be hell of a lot easier if the router could be booted into something
like android's (arm's?) fastboot or flashmode mode so i can just push an
image.

~~~
SpikedCola
Going from dd-wrt to openwrt should be as simple as a firmware flash from the
web gui, and an nvram reset. Worst case, you can flash a "revert to stock"
image from ddwrt to go back to factory, then flash openwrt as if the device
was factory.

Openwrt also has a handy failsafe built into a lot of models. It boots a
stripped down http server where you can upload recovery firmware.

Used to swear by dd-wrt, now I prefer openwrt.

------
fr0ster
I'm trying to repeat the steps from the article. After running

    dd if=uImage of=Image.lzma bs=1 skip=72

I try to unpack the LZMA file with

    unlzma Image.lzma

and get this message:

    unlzma: Image.lzma: Compressed data is corrupt

Does that mean I downloaded a corrupted zip file from the TP-Link site? How
can I extract the kernel image? Binwalk says this about Image.lzma:

    0    0x0    LZMA compressed data, properties: 0x6D, dictionary size: 8388608 bytes, uncompressed size: 3164228 bytes
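
The fields binwalk prints there come from the first 13 bytes of the file. A
minimal Python sketch of decoding them, assuming the legacy .lzma ("LZMA
alone") layout: one properties byte, a 4-byte little-endian dictionary size,
and an 8-byte little-endian uncompressed size. `parse_lzma_alone_header` is a
made-up helper name. It only reads the header and says nothing about whether
the payload that follows is intact.

```python
import struct


def parse_lzma_alone_header(header):
    """Decode the 13-byte header of a legacy .lzma stream into a dict."""
    if len(header) < 13:
        raise ValueError("need at least 13 bytes")
    props, dict_size, uncompressed_size = struct.unpack("<BIQ", header[:13])
    # The properties byte packs three values: props = (pb * 5 + lp) * 9 + lc
    return {
        "properties": props,
        "lc": props % 9,
        "lp": (props // 9) % 5,
        "pb": props // 45,
        "dict_size": dict_size,
        "uncompressed_size": uncompressed_size,
    }


if __name__ == "__main__":
    # Header rebuilt from the values binwalk reported above.
    header = bytes([0x6D]) + struct.pack("<IQ", 8388608, 3164228)
    print(parse_lzma_alone_header(header))
```

For the header quoted above, 0x6D decodes to lc=1, lp=2, pb=2, whereas the
more common 0x5D is lc=3, lp=0, pb=2.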

~~~
fr0ster
I don't understand how I can unpack Image.lzma if "unlzma Image.lzma" doesn't
work but "binwalk -e Image.lzma" works correctly?

------
josteink
Did I read the blog wrong, or was the stock firmware also based on an OpenWRT
kernel?

That would be pretty hilarious if it was true.

~~~
fencepost
I'm pretty sure a lot of stock firmware is based on OpenWRT or used to be,
though I'm pretty sure most of them lag well behind the current version. I
haven't paid much attention for a while, but I think a lot were based on
Kamikaze which is more than 10 years old now.

For the vendors with access to closed-source drivers and chipset info they can
likely support devices not supported on the open source packages.

Edit: Per Wikipedia, "Qualcomm's QCA Software Development Kit (QSDK) which is
being used as a development basis by many OEMs is an OpenWrt derivative"

It also notes Ubiquiti's wireless router firmware as being derived from
OpenWRT, but I thought I remembered discussion of Ubiquiti being derived from
a different open source distribution - unless perhaps the routers and wireless
devices don't share a code base.

~~~
josteink
That's pretty cool. I didn't know that.

Looking into the equivalent firmware[1] for my Archer C7 v2, I didn't find any
OpenWRT bits though. I was honestly a little bit disappointed.

I guess the difference between hardware revisions might be more fundamental
than I assumed.

    
    
        DECIMAL       HEXADECIMAL     DESCRIPTION
        --------------------------------------------------------------------------------------------------------
        0             0x0             TP-Link firmware header, firmware version: 1.-15188.3, image version: "",
                                      product ID: 0x0, product version: -956301310, kernel load address: 0x0,
                                      kernel entry point: 0x80002000, kernel offset: 16384512, kernel length:
                                      512, rootfs offset: 855873, rootfs length: 1048576, bootloader offset:
                                      15204352, bootloader length: 0
        71520         0x11760         Certificate in DER format (x509 v3), header length: 4, sequence length: 64
        98560         0x18100         U-Boot version string, "U-Boot 1.1.4 (Mar  5 2018 - 13:57:29)"
        98736         0x181B0         CRC32 polynomial table, big endian
        131584        0x20200         TP-Link firmware header, firmware version: 0.0.3, image version: "",
                                      product ID: 0x0, product version: -956301310, kernel load address: 0x0,
                                      kernel entry point: 0x80002000, kernel offset: 16252928, kernel length:
                                      512, rootfs offset: 855873, rootfs length: 1048576, bootloader offset:
                                      15204352, bootloader length: 0
        132096        0x20400         LZMA compressed data, properties: 0x5D, dictionary size: 33554432 bytes,
                                      uncompressed size: 2451644 bytes
        1180160       0x120200        Squashfs filesystem, little endian, version 4.0, compression:lzma, size:
                                      9878520 bytes, 789 inodes, blocksize: 131072 bytes, created: 2018-03-05
                                      06:16:10
    
    

[1] [https://static.tp-
link.com/2018/201806/20180611/Archer%20C7(...](https://static.tp-
link.com/2018/201806/20180611/Archer%20C7\(EU\)_V2_180305.zip)

~~~
mjevans
The BOM can vary quite a lot between 'revisions', using your product as an
example...

[https://openwrt.org/toh/tp-link/archer-c7-1750](https://openwrt.org/toh/tp-
link/archer-c7-1750) (Scroll down to the Info Links table and the Wikidevi
Info column)

v1 to v2 upgrades the Flash (8MB to 16MB) and uses a slightly different AN+AC
wifi chip. v2 and v3 seem pretty similar at a glance. v4 is rated at 12V 2A
rather than 2.5A, and uses a completely different BGN (2.4GHz) chip and also a
different ethernet chip/switch. v5 is lower power still at 1.5A, but it's less
obvious where that change happened due to lack of pictures. A guess based on
the simpler antenna list is that it uses fewer antennas.

------
andrewshadura
Another similar tool to look at is Hachoir.

------
Relys
If you like binwalk, you might want to check out the commercial product,
Centrifuge[1], that the developers are working on (I know the CSO).

[1] [https://www.refirmlabs.com/centrifuge-
platform/](https://www.refirmlabs.com/centrifuge-platform/)

------
tasubotadas
I am really surprised that firmware images are not just .tar.gz files renamed
to .bin :/. That's how I would have implemented a distribution of new
firmware.

~~~
josteink
And how do you partition boot-loaders, kernels, and rootfs and such in that
tar.gz?

Embedded device will be hard coded to look at a fixed point and start booting
from there, there’s no UEFI. How will you ensure boot-loaders get unpacked
precisely where they need to be?
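
To make the "fixed point" idea concrete, here is a rough Python sketch of how a
flat image could be assembled. The layout, offsets, and `build_flat_image` are
all hypothetical, not TP-Link's actual format; gaps are filled with 0xFF on
the assumption that this is what erased flash reads back as.

```python
def build_flat_image(parts, layout, image_size, fill=0xFF):
    """Place each named part at its fixed offset and pad the gaps with `fill`."""
    image = bytearray([fill]) * image_size
    ordered = sorted(layout.items(), key=lambda item: item[1])
    for index, (name, offset) in enumerate(ordered):
        blob = parts[name]
        end = offset + len(blob)
        # A part may not run into the next slot or past the end of the image.
        limit = ordered[index + 1][1] if index + 1 < len(ordered) else image_size
        if end > limit:
            raise ValueError(f"{name} overflows its slot")
        image[offset:end] = blob
    return bytes(image)


if __name__ == "__main__":
    # Made-up layout: the boot code expects each part at exactly these offsets.
    layout = {"bootloader": 0x00, "kernel": 0x20, "rootfs": 0x40}
    parts = {"bootloader": b"BOOT", "kernel": b"KERNEL", "rootfs": b"ROOT"}
    print(build_flat_image(parts, layout, 0x60).hex())
```

The point of the sketch is that position carries the meaning: there are no
file names to look up, so whatever reads the image needs nothing beyond the
offsets.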

And that doesn’t even touch the idea of having a _router_ understand a file
system before any firmware code is loaded.

Routers really are quite different from PCs.

~~~
bshipp
True enough, but I think they used to be even more unique and over time
they've become more like PCs.

One of these days I'm going to log in to the admin interface and find candy
crush installed.

~~~
vlovich123
They're "like PCs" in the sense that the instruction set of the CPUs has
caught up and in theory you can attach more complicated peripherals. However,
unless your embedded product has MMC flash attached (for many applications it
doesn't due to cost + physical size) you're SOL for the following reasons:

1. For M4s your storage is typically some kind of SPI flash which doesn't act
like the traditional desktop flash you're dealing with. You have to manually
specify the address you're reading/writing & you have to do it on block
boundaries (multiple KB). You're generally looking at 8-64MB.

2. For M0 your storage is typically flash built-in with potentially even more
restrictions.

3. These devices have _very_ little RAM. Decompression means you have to have
a way of enforcing constraints on the amount of space you'll need. Aside from
the space needed regularly for decompression you may need to buffer the
decompressed content in-memory to align with block boundaries.

All of this means development time, increased costs & risk for something you
may not be able to pull off.
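
The buffering in point 3 can be sketched in a few lines. This is Python purely
to show the bookkeeping (real firmware would do this in C), and zlib is a
stand-in because no particular compression scheme is named above.
`stream_to_blocks`, the 4096-byte block size, and the 512-byte input chunk are
assumptions for illustration.

```python
import zlib


def stream_to_blocks(compressed, write_block, block_size=4096, chunk_size=512):
    """Inflate piece by piece, handing out block_size-aligned writes.

    Output is capped at block_size per call, so the pending buffer stays
    small. The final write may be shorter than block_size.
    """
    inflater = zlib.decompressobj()
    pending = bytearray()
    for start in range(0, len(compressed), chunk_size):
        data = compressed[start:start + chunk_size]
        while data:
            # max_length limits how much output a single call may produce.
            pending += inflater.decompress(data, block_size)
            data = inflater.unconsumed_tail
            while len(pending) >= block_size:
                write_block(bytes(pending[:block_size]))
                del pending[:block_size]
    pending += inflater.flush()
    for start in range(0, len(pending), block_size):
        write_block(bytes(pending[start:start + block_size]))


if __name__ == "__main__":
    original = bytes(range(256)) * 100
    blocks = []
    stream_to_blocks(zlib.compress(original), blocks.append)
    print([len(block) for block in blocks])
```

Even in this toy version the decompressor's own window memory comes on top of
the pending buffer, which is the RAM cost the comment is pointing at.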

If your vendor actually internally compresses their image then great but
generally they don't for all the same reasons (+ sometimes this is touching
ROM code in the chip).

