Libtree: Ldd as a tree saying why a library is found or not (github.com/haampie)
315 points by fanf2 10 months ago | 55 comments



Does this copy that unexpected behavior where ldd will actually execute some libraries it's interrogating?

https://catonmat.net/ldd-arbitrary-code-execution


Recent (like 5+ years) versions of "ldd" will not invoke the target binary either.

See: https://manpages.debian.org/unstable/manpages/ldd.1.en.html#...


Does this mean that ldd is secure by default now?


From a quick scan of the code, on mobile, it doesn't look like it does.

This seems to actually manually parse the ELF file and recursively parse any dependencies.

Pretty cool.
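
A minimal sketch of that kind of manual parsing, assuming a 64-bit little-endian ELF image. `read_needed` is a hypothetical helper written for illustration, not libtree's code. It walks the program headers to find PT_DYNAMIC, collects the DT_NEEDED entries, and looks their names up in the string table that DT_STRTAB points to. A real tool would then resolve each name to a file and repeat the process on it.

```python
import struct

PT_LOAD, PT_DYNAMIC = 1, 2
DT_NULL, DT_NEEDED, DT_STRTAB = 0, 1, 5


def read_needed(data):
    """Return the DT_NEEDED names of an ELF64 little-endian image (sketch)."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles ELF64 little-endian files")

    # Program header table location and shape, read from the ELF header.
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)

    loads, dynamic = [], None
    for i in range(e_phnum):
        (p_type, _flags, p_offset, p_vaddr, _paddr,
         p_filesz, _memsz, _align) = struct.unpack_from(
            "<IIQQQQQQ", data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            loads.append((p_vaddr, p_offset, p_filesz))
        elif p_type == PT_DYNAMIC:
            dynamic = (p_offset, p_filesz)
    if dynamic is None:
        return []  # no dynamic segment, e.g. a statically linked binary

    # Each dynamic entry is a (tag, value) pair of 8 bytes each.
    needed_offsets, strtab_vaddr = [], None
    dyn_off, dyn_size = dynamic
    for pos in range(dyn_off, dyn_off + dyn_size, 16):
        tag, val = struct.unpack_from("<qQ", data, pos)
        if tag == DT_NULL:
            break
        if tag == DT_NEEDED:
            needed_offsets.append(val)
        elif tag == DT_STRTAB:
            strtab_vaddr = val
    if strtab_vaddr is None:
        return []

    # DT_STRTAB is a virtual address; map it back to a file offset.
    strtab_off = None
    for vaddr, offset, filesz in loads:
        if vaddr <= strtab_vaddr < vaddr + filesz:
            strtab_off = offset + (strtab_vaddr - vaddr)
            break
    if strtab_off is None:
        raise ValueError("string table is not inside any PT_LOAD segment")

    names = []
    for rel in needed_offsets:
        start = strtab_off + rel
        end = data.index(b"\0", start)  # names are NUL-terminated
        names.append(data[start:end].decode())
    return names
```

Run on the bytes of a dynamically linked binary, this should give the same names that `readelf -d` lists as NEEDED.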


I wrote something vaguely similar for python and we could never get around that issue https://github.com/google/importlab/issues/69


Or use objdump, which prints the data as it is encoded in the ELF file, including the list of libraries the dynamic loader will search for.



Having to recursively and iteratively hunt missing dependencies with ldd can be pretty tedious, so this seems like a good improvement. Will try it next time I get an obscure "not found" error.


Basically the Linux CLI version of depends.exe.


Note that that particular tool doesn't work very well for modern Windows versions anymore.

Use https://github.com/lucasg/Dependencies instead (even though that isn't exactly up-to-date either...)

If you have Visual Studio installed (and have selected the 'x64/x86 build tools (latest)' in the installer), `dumpbin /dependents` from a VS Developer Command Prompt remains the most reliable option.


Yep, my thoughts exactly.

[1] https://www.dependencywalker.com/


In case anyone else is wondering what the colors mean (I couldn't find this in the manpage/README):

Magenta: In exclude list (only shown with -v[v[v]])

Blue: Seen before (so you can spot which dependencies appear multiple times)


What does "why a library is found or not" mean? It is either in the LD_LIBRARY_PATH or it isn't? The screenshot doesn't make it clear to me what this refers to.


I don't know what it means in the context of this tool, but library searching is much more complicated than a single environment variable.

There are the system search path, runpath, rpath, and LD_LIBRARY_PATH, to name just a few of the distinct ways directories get searched for libraries.

Libraries are also usually linked just by the short name, i.e. foo.so, but they can also be dynamically linked to the full path of the library.

Side note, as a general rule if you can avoid setting the LD_LIBRARY_PATH you'll be much better off. It's not always possible, but setting it puts it at the top of the search for all executions. Even if something dynamically links the full path to a library, LD_LIBRARY_PATH will take precedence. It completely flattens the search.


On top of that, libraries can be linked by their SONAME and not the file name. (This is rarely done however.)


The SONAME is the filename used for runtime linking. On the filesystem it may or may not be a symlink, but that does not concern the linker. It may not be (and usually isn't) the same filename used when building the binary. This is because the SONAME is meant to specify a specific ABI version of the library.


I thought dynamic linking always uses the soname


Static linking uses the DT_SONAME of a linked library to create an entry in the DT_NEEDED table. The name in the DT_NEEDED table is used by the dynamic loader to search for a matching filename to load in. So in a very indirect sense the dynamic loader always uses the SONAME that was supplied at build time, but it also uses a search algorithm to find that filename at load time.


A library need not be in LD_LIBRARY_PATH in order to be found.

The point of the title is to say that libtree makes it easy to find the paths from an executable to all its direct and indirect dependencies, one of the uses of which is to help figure out what's up with missing dependencies.

In reality if you're using a packaging system then you likely won't have missing dependencies, so you'd use libtree for other reasons.


> It is either in the LD_LIBRARY_PATH or it isn't

Or RPATH, which is evaluated at load time for each library. The bigger point is that dependencies form a graph (which can be displayed as a tree) and it's useful to know why a library wasn't found because of the library that required it.


LD_LIBRARY_PATH is not the only way libraries are found. The loader considers several other sources to find libraries. Assuming a sane setup, this will be a combination of ordinary ELF fields in each binary that gets loaded and other paths known to the loader. Relying on LD_LIBRARY_PATH can be very short-sighted on systems with many versions of the same library or multiple libraries with the same name. The loader must search the paths in LD_LIBRARY_PATH for each binary in order, and it will pick the first library that matches (unless you have other higher-priority paths set by other means). That may not be the one you really want. This can lead to unexpected errors at runtime. The better way is to set RPATH for the affected binaries to the location that has the libraries that it requires.

It is possible to end up loading multiple versions of the same library as well, if your environment and RPATH settings are not consistent. This tool will help you figure out if you have problems and why.


> It is either in the LD_LIBRARY_PATH or it isn't?

at minimum you have RPATH, RUNPATH, LD_LIBRARY_PATH.

this tool is based on ldd and thus also (presumably) resolves DYLD_LIBRARY_PATH, DYLD_FALLBACK_FRAMEWORK_PATH, DYLD_FALLBACK_LIBRARY_PATH, @executable_path, @loader_path and @rpath.


Nope, only elf, no macho support.


See `man ld.so` for the search behavior.

Note that there are inconsistencies between glibc and musl when it comes to rpath and runpath.
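
For anyone who doesn't want to read the whole man page, the glibc order it documents can be sketched roughly like this. It is a simplified model, not the loader's code: `resolve`, its parameters and the `exists` callback are all made up for illustration, the ld.so cache is modelled as a plain dict, and the default directories are an assumption (they differ between distros). It ignores `$ORIGIN` expansion, secure-execution mode, the rpath of parent objects, and the musl differences mentioned above.

```python
def resolve(name, *, rpath=(), runpath=(), ld_library_path=(),
            cache=None, defaults=("/lib", "/usr/lib"),
            exists=lambda path: False):
    """Return (path, reason) for a DT_NEEDED name, or (None, "not found")."""
    if "/" in name:
        # A name containing a slash is used as a path; no search happens.
        return (name, "direct path") if exists(name) else (None, "not found")

    steps = []
    if not runpath:
        # DT_RPATH is only consulted when the object has no DT_RUNPATH.
        steps.append(("rpath", rpath))
    steps.append(("LD_LIBRARY_PATH", ld_library_path))
    steps.append(("runpath", runpath))

    for reason, dirs in steps:
        for directory in dirs:
            candidate = f"{directory}/{name}"
            if exists(candidate):
                return candidate, reason  # first match wins

    # Stand-in for /etc/ld.so.cache: a mapping from name to path.
    if cache and name in cache and exists(cache[name]):
        return cache[name], "ld.so.cache"

    for directory in defaults:
        candidate = f"{directory}/{name}"
        if exists(candidate):
            return candidate, "default path"
    return None, "not found"
```

The returned reason is the "why" part: the same name can resolve to different files depending on which step matches first.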


I think it meant to say that it shows how it found the dependency: LD_LIBRARY_PATH, rpath, etc.


this is so useful; I usually read sections with readelf to figure out what the real requirements are


Is LD_DEBUG=libs not enough?


This is not a "static" evaluation of the library dependencies, it's a loader debug flag. So it's not quite the same thing.


True! Conversely, libtree will miss dynamically loaded dependencies though (e.g. a lot of stuff in the Python world).


Not sure if it's a bug, but I get different libraries from ldd and libtree for the vim example (e.g. linux-vdso.so.1 appears top of the list on ldd but not at all on libtree)


linux-vdso.so.1 isn't a real library you'll find anywhere in the file system and it's not referenced within the ELF file, so libtree doesn't know about it. Instead it is mapped into the address space of a newly started process automatically by the kernel. It's an optimization feature to avoid syscall overhead for functions like gettimeofday. See https://man7.org/linux/man-pages/man7/vdso.7.html


If you compile the Linux kernel you will have in the build directory the actual .so file(s) which are ultimately embedded in the kernel. They don't ever get installed as conventional libraries (since they are not), but they are otherwise real shared dynamic ELF binaries. If you're curious you can make a copy:

1. grab the vdso offset XX in memory of, for example, the running shell

    gawk -n -vFS=- '/\[vdso\]/{printf("%i",("0x"$1)/4096)}' /proc/$$/maps

2. extract that page

    dd if=/proc/$$/mem of=linux-vdso.so bs=4096 count=1 skip=XX

where XX is the offset from step 1

3. check with

    nm -D linux-vdso.so

    objdump -ad -j .text linux-vdso.so
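
The offset calculation in step 1 can also be sketched in Python. This assumes the usual `/proc/<pid>/maps` line layout (`start-end perms offset dev inode name`) and a 4096-byte page; `vdso_range` is a hypothetical helper name. It also returns the length of the mapping in pages, since the mapping may span more than one.

```python
def vdso_range(maps_text, page_size=4096):
    """Return (first_page, page_count) of the [vdso] mapping, or None."""
    for line in maps_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == "[vdso]":
            # The first field looks like "7ffd2a5f2000-7ffd2a5f4000" (hex).
            start, end = (int(part, 16) for part in fields[0].split("-"))
            return start // page_size, (end - start) // page_size
    return None
```

Fed the contents of `/proc/self/maps`, the two numbers correspond to the `skip=` and `count=` arguments of the dd command above.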


Oh, so it's like kernel32.dll on the Windows side of things?


linux-vdso.so is mostly about sharing a data page so syscalls can be optimised away. It's not for all the kernel API entry points (and it's userland specific so the VDSO varies with the libc version according to which features it optimises in this way). For the specific example above gettimeofday() - each process can read the clock value straight out of the shared VVARS page mapped into its space, no expensive context switch.


No, kernel32.dll is a real library that is referenced in the import section of executables and libraries that need it.


Well, yes, it is a real file, but it is always mapped (at least it used to be) into every process' memory space, even if you don't import it explicitly, and it provides a user-space layer above the low-level syscalls.


i wrote a horrible little script to use ldd recursively on nixos to figure out what I need to include to run a closed source binary, will give this one a go if I ever try that again.


Shouldn't it print a DAG instead of a tree?


Seems a bit harder to print a DAG in a terminal in a readable way, even if the output is shorter.


Something like the output of `git log --graph` maybe? The tree unfolding is probably more than sufficient, though, given the size of typical DT_NEEDED lists.


It prints a DAG as a tree: every line is an edge, identical nodes are repeated. With `-vvv` it prints all possible paths as a tree.
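
To illustrate the idea (this is a toy sketch, not libtree's actual code or output format): given a hypothetical `deps` mapping from each library to its direct dependencies, every edge becomes one line, and a node that was already printed is either marked and left unexpanded, or expanded again when `expand_repeats` is set. The names `render_tree`, `deps` and `expand_repeats` are all invented here, and the graph is assumed to be acyclic.

```python
def render_tree(deps, root, expand_repeats=False):
    """Render a dependency DAG as a list of tree lines, one per edge."""
    lines = [root]
    seen = set()

    def walk(node, prefix):
        children = deps.get(node, [])
        for index, child in enumerate(children):
            last = index == len(children) - 1
            repeated = child in seen
            marker = " (seen before)" if repeated else ""
            lines.append(f"{prefix}{'`-- ' if last else '|-- '}{child}{marker}")
            seen.add(child)
            if expand_repeats or not repeated:
                # Continue the vertical bar only while siblings remain.
                walk(child, prefix + ("    " if last else "|   "))

    walk(root, "")
    return lines


deps = {"vim": ["libm.so.6", "libc.so.6"], "libm.so.6": ["libc.so.6"]}
print("\n".join(render_tree(deps, "vim")))
```

The shared node shows up once per edge that points at it, which is why the output is longer than a DAG drawing but still easy to read line by line.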


Anyone know of something similar for MacOS?


https://github.com/ReverseApple/dgraph but IIRC it still has some @rpath and its Apple extended family counterpart issues. PRs welcome!


Used this tool before for double-checking our FOSS compliance, it's pretty neat


libvorbis/ogg as a vim dependency, interesting


It supports playing audio files; got added quite a few years back. :help sound-functions.


I love this joke:

curl -Lfs https://raw.githubusercontent.com/haampie/libtree/master/lib... | ${CC:-cc} -o libtree -x c - -std=c99 -D_FILE_OFFSET_BITS=64


The first suggestion under the `# Install` header is to use a prebuilt binary like:

https://github.com/haampie/libtree/releases/download/v3.1.1/...

The only verification offered is a sha256sum in the README. There's no way to confirm what source code or compiler was used to produce the binary, and no way to verify that the binary was produced by that GitHub user or that the README sha256 I was served was untampered with.

I would propose that the 'unsafe' curl method is in fact probably safer than the recommended-first choice in the README.
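
For reference, the check the README asks for amounts to something like the sketch below (`sha256_matches` is a name made up for this example). Note what it does and doesn't establish: it shows the downloaded bytes match the published digest, and nothing about who published that digest or how the binary was built.

```python
import hashlib
import hmac


def sha256_matches(data, expected_hex):
    """Return True if the SHA-256 of `data` equals the given hex digest."""
    actual = hashlib.sha256(data).hexdigest()
    # Normalise whitespace and case so a pasted digest still compares equal.
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

In practice `data` would be the bytes of the downloaded release file and `expected_hex` the value copied from the README.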


yeah, I went with the safer method of `sudo dnf install libtree-ldd`


The original joke was to define an alias

    alias libtree='curl -LfSs https://raw.githubusercontent.com/haampie/libtree/master/libtree.c | ${CC:-cc} -o /tmp/libtree -x c - -std=c99 -D_FILE_OFFSET_BITS=64; /tmp/libtree'
so any time you invoke `libtree` it downloads the latest version, compiles it, and runs it.

With a stable internet connection, it is still faster than pax-utils' lddtree, which is written in Python.

C is an excellent scripting language.


The joke that kills lol


[flagged]


C++, Nim, Rust, Zig would probably work as well.


I wonder if this implementation is exploitable by malformed libraries.


[flagged]


Usually people compile a "hello world" exe, and implicitly extrapolate that size to all programs of all sizes.

...which is then followed by arguments about how libc's own size should be compared to other libs that need to be linked statically and do a different amount of work.


Fun little fact about gcc's lib sizes (no libc): https://gitlab.archlinux.org/archlinux/packaging/packages/gc...



