
Python for Reverse Engineering 1: ELF Binaries - xrisk
https://icyphox.sh/blog/python-for-re-1/
======
saagarjha
> I’m not sure why it uses puts here? I might be missing something; perhaps
> printf calls puts.

It’s because you passed printf a constant string with no format specifiers
(and a trailing newline), so the compiler decided the printf call was not
worth making and used puts instead.

~~~
Icyphox
Thanks! I’d actually figured that out a little while after publishing.

------
billfruit
In general though, dealing with binary data in Python isn't particularly
intuitive. Also, many Python tutorials and books fail to mention how to
manipulate binary data. I feel that is one of the places where the standard
library is not that rich.

~~~
civility
I disagree. The struct and array (not Numpy) modules are pretty great at
cutting up binary data. You provide a format string and it just works.
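
As a rough illustration of the format-string approach, here is a minimal sketch that unpacks the fixed fields of a 64-bit little-endian ELF header with `struct`. The helper name `parse_elf64_header` and the synthetic sample header are hypothetical, made up for this example; the field order follows the ELF64 header definition.

```python
import struct

# ELF64 file header, little-endian: e_ident[16], e_type, e_machine, e_version,
# e_entry, e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
# e_shentsize, e_shnum, e_shstrndx.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")


def parse_elf64_header(data):
    """Return a few header fields from the first 64 bytes of `data`."""
    fields = ELF64_HEADER.unpack_from(data, 0)
    e_ident = fields[0]
    if e_ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return {
        "type": fields[1],
        "machine": fields[2],
        "entry": fields[4],
        "shnum": fields[12],
    }


# Synthetic header for demonstration only (not taken from a real binary).
ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
sample = ELF64_HEADER.pack(ident, 2, 0x3E, 1, 0x401000, 64, 0, 0, 64, 56, 0, 64, 0, 0)
header = parse_elf64_header(sample)
```

For a real file you would pass the first 64 bytes read in binary mode instead of `sample`.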

~~~
billfruit
I thought the format string is unintuitive if there are nested binary
structures or if there are arrays of nested binary structures.

~~~
civility
I do wish they were combined. It would be nice to handle arrays of structs and
structs of arrays more gracefully, and it's unfortunate how the format strings
almost (but not really) agree with each other.
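
For the arrays-of-structs case specifically, one possible workaround is `struct.Struct.iter_unpack`, which walks a buffer of back-to-back records. The sketch below assumes the ELF64 symbol-table entry layout (`st_name`, `st_info`, `st_other`, `st_shndx`, `st_value`, `st_size`); the names `Sym` and `parse_symbols` and the sample table are hypothetical.

```python
import struct
from collections import namedtuple

# One ELF64 symbol entry, little-endian: 4 + 1 + 1 + 2 + 8 + 8 = 24 bytes.
SYM = struct.Struct("<IBBHQQ")
Sym = namedtuple("Sym", "name info other shndx value size")


def parse_symbols(table):
    """Decode a buffer holding consecutive symbol entries into Sym tuples."""
    # iter_unpack requires len(table) to be a multiple of SYM.size.
    return [Sym._make(fields) for fields in SYM.iter_unpack(table)]


# Two made-up entries packed back to back, for demonstration only.
table = SYM.pack(1, 0x12, 0, 1, 0x401000, 42) + SYM.pack(7, 0x11, 0, 2, 0x404000, 8)
symbols = parse_symbols(table)
```

This still does not cover truly nested layouts; those need one `Struct` per level and manual offset bookkeeping.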

And so long as I'm asking for ponies, it would also be nice if they handled
complex numbers gracefully.

------
hultner
Is it just me, or is the scrolling on this site horribly broken? Shame, because
the content looks great.

~~~
bhargav
Default behaviour seems to be overridden. I read the article and would
recommend you look past the scrolling. If you are on an iDevice, reader mode
will help!

Edit: Spelling

------
RayDonnelly
If you haven't seen it, also check out Project LIEF. It is very good indeed. We
use it for a lot of post-build binary verification in the conda ecosystem.

Windows, macOS and Linux are all supported.

[https://lief.quarkslab.com/](https://lief.quarkslab.com/)

------
Icyphox
Hi, I’m the author of this post. Feel free to ask questions, if any.

~~~
matmann2001
Hey. In your C code, you write to memory beyond what you malloc'd. You
malloc'd 9 bytes for 'pw', but later do "pw[9] = '\0'", which accesses the
10th byte, which doesn't belong to you.

~~~
blattimwind
malloc allocates aligned memory [1], so technically it's correct that he
writes past the allocated memory, but technically it's also impossible for
that write to fail _or_ for that write to overwrite something else.

[1] bonus point: for what kind of alignment? (The minimum is quite well
specified, for C standards)

~~~
spieglt
[https://www.gnu.org/software/libc/manual/html_node/Aligned-M...](https://www.gnu.org/software/libc/manual/html_node/Aligned-Memory-Blocks.html)

"The address of a block returned by malloc or realloc in GNU systems is always
a multiple of eight (or sixteen on 64-bit systems)."

I was about to say, "what if they're on a 32-bit system and so were only
allocated one 8-byte block?" but then realized that since they'd requested 9
bytes, they'd be given two 8-byte blocks, or one 16-byte block on a 64-bit
system. Is that right?

~~~
spieglt
Well, I guess alignment doesn't say anything about how large of a block is
allocated.... And this is the clearest source I can find, which says 32 bytes.
[https://prog21.dadgum.com/179.html](https://prog21.dadgum.com/179.html)
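
The 32-byte figure can be roughly reproduced with some arithmetic. The sketch below is an assumption about how glibc rounds a request up to a chunk size (request plus one size word of header, rounded up to the alignment, with a minimum chunk of four words); the function names are hypothetical and other allocators will behave differently. Writing to `pw[9]` is still undefined behavior in C regardless of what the allocator happens to hand back.

```python
def chunk_size(request, size_sz, alignment):
    """Approximate glibc chunk size for a malloc(request) call.

    size_sz is sizeof(size_t); alignment is the malloc alignment.
    This mirrors glibc's rounding as an assumption, not a guarantee.
    """
    mask = alignment - 1
    min_size = (4 * size_sz + mask) & ~mask
    return max(min_size, (request + size_sz + mask) & ~mask)


def usable_size(request, size_sz, alignment):
    """Bytes available to the caller: the chunk minus its size-word header."""
    return chunk_size(request, size_sz, alignment) - size_sz


# malloc(9) on a 64-bit system (8-byte size_t, 16-byte alignment).
chunk_64 = chunk_size(9, 8, 16)
usable_64 = usable_size(9, 8, 16)

# malloc(9) on a 32-bit system (4-byte size_t, 8-byte alignment).
chunk_32 = chunk_size(9, 4, 8)
usable_32 = usable_size(9, 4, 8)
```

Under these assumptions the 10th byte falls inside the usable region on both configurations, which is why the bug goes unnoticed in practice.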

~~~
blattimwind
> Well, I guess alignment doesn't say anything about how large of a block is
> allocated

It tells you where something can't be, and because virtual memory is allocated
in whole pages the "padding" so to speak will always be accessible.

There's also the obvious truism that if you can access something in a cache
line, all addresses in the cache line are safe to access. (Vectorized
algorithms frequently implicitly rely on this for short reads, IOW there is no
way reading a 128 or 256 bit vector can fault if just reading the first lane
would not fault).

~~~
saagarjha
> Vectorized algorithms frequently implicitly rely on this for short reads

This is _extremely_ processor-dependent and you should not be writing C if
you’re relying on this.

~~~
blattimwind
> This is extremely processor-dependent

No, it's not.

> you should not be writing C if you’re relying on this.

Luckily you are in no position to tell anyone what they should or shouldn't
do.

~~~
saagarjha
Sorry, I misunderstood the context of that statement and thought you were
talking about vectorized algorithms exploiting out-of-bounds reads in general,
which is pretty dependent on the processor as to when it will work (depending
on how page boundaries and cache lines are set up). And I didn't really mean
my statement about using C in the prescriptive way you seem to have taken it:
I was merely trying to say that you should probably be using assembly in this
case, because you are relying on details of your processor that your compiler
is likely to be unaware of and may penalize you for. For example, the
vectorized string routines in libSystem _do_ overshoot the end of the string
because they use pcmpeqb, and they are written in assembly because they rely on
alignment guarantees that are difficult to express in C. Plus it guarantees
vectorization ;)

~~~
blattimwind
Ah, true, it is my turn to apologize then for interpreting your post in a
rather uncharitable way.

------
monocasa
Neat!

You can see some similar code I wrote in Rust here:
[https://github.com/monocasa/exeutils](https://github.com/monocasa/exeutils)

~~~
Icyphox
Nice. I’ve been planning to rewrite `readelf(1)` in Nim, I’ll check out your
code for some pointers :)

~~~
monocasa
Word, you should check out the backing library I wrote too then.

[https://github.com/monocasa/exefmt](https://github.com/monocasa/exefmt)

------
qaq
Wonder why security topics never get much interest on HN. It's a huge industry
with a ton of VC funding going to security startups.

~~~
rhexs
For one, the article seems to be impossible to read on an iPhone via safari.

~~~
kiddico
It seems to break in a different way every time I reload the page.

~~~
Icyphox
Mind telling me which model? Could be my piss-poor CSS acting up at that
resolution.

