Fq: Jq for Binary Formats (github.com/wader)
670 points by ingve on June 3, 2023 | 114 comments



fqing finally. It always seemed strange that there wasn’t any central database of binary parsers that everyone could contribute to. Nearly every file format is fully documented, but none of the docs are programmatic.

I was trying to rip some sounds from a wii game called Rhythm Heaven, and it’s ridiculous how primitive the tech is. By that I mean the programming community’s tech. If you want to extract some assets, you’d better be running Windows, and you’ll need to download some random exe from mediafire made by Jared, a 13yo that coded the extractor in C in his spare time. This is only a very slight exaggeration; Windows being a requirement isn’t.

Hopefully projects like this will standardize all binary formats once and for all.

Actually, this is a good opportunity to ask: how would one contribute a binary parser to fq? If I wrote one for wii sound files, can I just submit a PR or is there some other process?

EDIT: https://github.com/wader/fq/blob/master/doc/dev.md documents the development side of things. I’m more interested in the project itself — if someone puts in the work to make a decoder for an obscure binary format, will it get merged (assuming it’s high quality) or is this only for popular formats?


There is the 010 Editor, at heart a cross-platform scriptable hex editor with a template language [1]. It has a central template repository [2] as well as templates around the internet (e.g. 3, 4).

But it being a paid tool means there are fewer template contributions from 13 year olds, which if we are all honest make up the majority of unpaid open source contributions - they simply have more spare time.

1: https://www.sweetscape.com/010editor/

2: https://www.sweetscape.com/010editor/repository/templates/

3: https://github.com/tge-was-taken/010-Editor-Templates/tree/m...

4: https://wiki.redmodding.org/cyberpunk-2077-modding/modding-k...


Do 13 year olds care about licenses and pay for the software they use? I certainly didn't when I was 13. And 010 editor is about as well protected as WinRAR.

I have a hard time imagining a 13 year old using that particular tool buying it. In fact there is a good chance it is used by the crackers themselves.


010editor's templating language looks very interesting (and something I had been trying to accomplish with my own mixed-bag of tools when reverse-engineering). I suppose as a hobbyist, the price is a hard one to get over...


What is wrong with contributions from 13 year olds?


There aren't enough of them, because the tool costs money, and 13-year-olds can't spend money on software as readily as adults.


I don’t think it was implied that there’s anything bad, I read it as the opposite.


Being a long-time personal friend of the author I can assure you the more obscure the better :-) His interest in esoteric things and solutions is "well documented" if you browse around his github repos.

Some examples:

https://github.com/wader/jqjq https://github.com/wader/catgolf


I am RE-ing some binary file formats for games. Should those be contributed as well?


Wonderful, thank you so much! Such a cool project.


fun fact: fq is kind of the spiritual successor of https://github.com/wader/flac.tcl, you can see traces of it in flac_frame.go in fq, it was used to prototype some things :)


Kaitai Struct is the better way to go for an ecosystem solution, but the tooling could certainly use improvements. In addition to 010 Editor, there's also KDE's Okteta. It does not have a lot of good OSDs and the OSD format/scripting for specifying formats is a little anemic (I'd like to help improve it if I can find time...) but it's very serviceable and a decent open source alternative to what 010 has. Shameless plug, I made a decent Windows EXE/PE OSD for Okteta. (It's even got a bit of support for NE16 executables, just for fun.)

This entire genre of tools has been a long time point of interest for me. In addition to making a couple OSDs and contributing some tiny improvements to Kaitai, I also have my own binary schema library for Go, restruct, which, biased or not, remains my favorite way to poke at arbitrary formats, since it's really easy to sketch stuff out and read and write to files quickly. It's basically Go's encoding/binary but with struct tags for more advanced things.


> This entire genre of tools has been a long time point of interest for me.

Same for me! It turned into a long journey and I am working on a solution that I am very happy about.

Somewhere mid-journey I learned about Kaitai Struct and lost a bit of steam seeing it was similar. But I think my offering is superior, with a simpler template format, less programming required, and a nice CLI app.

I have yet to announce it publicly, but I've been meaning to for so long already.

If you would like to check it out I would be happy!

You can use it to map / view content from any format there is a template for. A lot of common formats are already included and you can extend it using your own templates.

https://github.com/martinlindhe/feng


As someone who had a great time with Kaitai, may I suggest that you write an interface so that fq can be used with any format that Kaitai understands (and any that people add in future).


The functionality that I am personally interested in from a binary parsing framework like Kaitai is generating an encoder implementation in addition to a decoder one. In other words, given a description of a binary format, I would like to be able to construct an instance of a class whose memory layout matches the format. For instance, if the format has an int n, then an array `a` of size `n`, and then a double `d`, it would be awesome to be able to construct a corresponding object with fields `n`, `a` and `d` and when I change `n`, then the size of `a` changes accordingly. And then, if I pass a pointer to this object to the decoder, it would be able to parse it correctly, as if the memory representation of the object came from some external buffer.
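
A minimal Python sketch of the layout described above, assuming a little-endian 32-bit int `n`, `n` 32-bit ints in `a`, and a 64-bit float `d`. The `Record` class and its field widths are hypothetical and not taken from Kaitai or fq. Deriving `n` from `a` is one way to keep the two from disagreeing, and round-tripping through `encode`/`decode` shows the same description driving both directions.

```python
import struct
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    """Hypothetical record: int32 n, then n int32 values, then a float64 d."""
    a: List[int] = field(default_factory=list)
    d: float = 0.0

    @property
    def n(self) -> int:
        # n is derived from a, so the count and the array cannot disagree.
        return len(self.a)

    @n.setter
    def n(self, value: int) -> None:
        # Changing n resizes a: truncate, or pad with zeros.
        self.a = self.a[:value] + [0] * max(0, value - len(self.a))

    def encode(self) -> bytes:
        # "<" means little-endian with no padding between fields.
        return struct.pack(f"<i{self.n}id", self.n, *self.a, self.d)

    @classmethod
    def decode(cls, buf: bytes) -> "Record":
        (n,) = struct.unpack_from("<i", buf, 0)
        a = list(struct.unpack_from(f"<{n}i", buf, 4))
        (d,) = struct.unpack_from("<d", buf, 4 + 4 * n)
        return cls(a=a, d=d)
```

Setting `r.n = 5` on a record with three elements pads `a` with two zeros, and `Record.decode(r.encode())` gives back an equal record.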


Kaitai support for serialization has been a long time issue. It's obviously non-trivial given that it has at least one case that doesn't exist today (instantiating a structure without loading any existing data.)

That said, it does exist in some form.

https://doc.kaitai.io/serialization.html

Restruct supports serialization. I use it all the time to hack on proprietary formats.


It seems strange, because it's not reality! Forensic tools like FTK and Autopsy have had a plug-in framework for these forever, speaking as a former contributor to the former. There's also Kaitai Struct.

I'm sure other communities have popped up that I haven't heard of, too. There's lots of interest in unifying forensic parsing under open work.


I'm working on something: an open template format for binary file formats. It is usable today as a universal file extractor, with some bugs and limitations.

Check it out at https://github.com/martinlindhe/feng


> It always seemed strange that there wasn’t any central database of binary parsers that everyone could contribute to.

Fully agree on the need for such a database. The problem is that it's been tried, but lacked traction. The latest one I've seen is Kaitai format files[1], that can be used in visualizers or to auto-generate parsers.

[1] https://formats.kaitai.io/


Thanks for the reminder about Kaitai, I’ve been meaning to look at it specifically for something but forgot what it was called.


I used to be part of the romhacking community back in the 2000s and, because everyone used Windows, open source/FOSS wasn't even known to most people. The culture of Windows programmers is way more focused on freeware/binaries.

Still waiting to this day for FuSoYa to release the source code of Lunar Magic.

About a central database of binary parsers, I've been wanting this for ages too. The closest I ever found was augeas, but that's for configuration files.


> About a central database of binary parsers, I've been wanting this for ages too. The closest I ever found was augeas, but that's for configuration files.

I'm working on something: an open template format for binary file formats. It is usable today as a universal file extractor, with some bugs and limitations.

Check it out at https://github.com/martinlindhe/feng


Yeah that’s a mysterious one. Such an incredible achievement, and it enabled so, so much creativity, and free (as in beer) to all as far as I know. I hope FuSoYa does open source it someday.


> I was trying to rip some sounds from a wii game called Rhythm Heaven, and it’s ridiculous how primitive the tech is.

This seems like a pretty niche need with only some hobbyists motivated enough to work on it. Is there a broader application than your use case? Otherwise I think that's why the existing software for this isn't great.


Yeah, I didn’t mean to sound entitled. I only meant I was excited for projects like fq to shake things up. When I was writing a parser for the sound file I was thinking “hmm… this really feels like duplicate work.”

On the other hand, it’s surprising to me that “grab sounds from a wii game” is so niche! My gut felt like it would be slightly more complicated than unzipping a tarball, but my gut didn’t expect it to be a programming challenge worthy of a small competition.


It’s all fun and games until Nintendo sends in a cease and desist notice


Seriously. Nintendo is an evil corporation. I was about to write “close to,” but they crossed the line when they sent someone to prison and garnished his wages for the rest of his life.


Not just someone but dude was not ripping or cracking stuff for fun. He made business out of it and what’s worse he added ransomware to scam his “customers”.

POS deserved all of it if not more.


If he added ransomware, that’s a bit different. Making a business out of it isn’t that bad (think about it in terms of people going to prison for assault vs merely making some money), but the ransomware would be.

Still, garnishing someone’s wages the rest of his life seems out of proportion. But I admit it’s harder to defend someone that made a livelihood out of holding peoples’ data hostage.


The punishment was severely out of proportion to the actual crime. He was definitely being made into an example.


I really don't think protecting IP produced at company expense is "evil." That's their prerogative, and people knowingly violating agreements/ToS are playing with fire.


There is QuickBMS[1], which covers quite a few game related formats.

[1] http://aluigi.altervista.org/quickbms.htm


> It always seemed strange that there wasn’t any central database of binary parsers that everyone could contribute to.

file utility with /etc/magic database


Yeah, that was my thought as well, although I don't know that file/magic get into much structure beyond identifying the file format.
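
To illustrate the identification-only approach mentioned above, here is a rough Python sketch of magic-number matching. The `MAGIC` table and `identify` function are hypothetical and far simpler than the real magic database; the point is that a signature match names the format but says nothing about the fields inside it.

```python
# (offset, signature bytes, description); a tiny hypothetical table.
MAGIC = [
    (0, b"\x89PNG\r\n\x1a\n", "PNG image"),
    (0, b"\x7fELF", "ELF executable"),
    (0, b"PK\x03\x04", "ZIP archive"),
    (0, b"fLaC", "FLAC audio"),
    (257, b"ustar", "tar archive"),
]


def identify(data: bytes) -> str:
    """Return a description for the first signature found at its offset."""
    for offset, magic, name in MAGIC:
        if data[offset:offset + len(magic)] == magic:
            return name
    # Nothing matched: report it as unidentified data.
    return "data"
```

A decoder in the fq or Kaitai sense starts where this stops: once the format is known, it goes on to describe every field.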


I miss the ResEdit days of the Macintosh (i.e. Mac OS 7-8-9). You could see/steal/modify/hack the visual assets of most binaries. You could remap keyboard shortcuts, modify menus, etc.


Well, app wrappers make this even simpler: just poke inside with a file browser.


And them being signed (and those signatures being validated) makes it highly impractical.


What the technology giveth, the policy taketh away...


Hi, sorry for the delay, on vacation. There is no process really more than convincing me :) and i think i will accept any decoder that is for a format used in public, standardized or proprietary.

I do want to add some kind of runtime format support and i'm working on adding kaitai support but it's not ready yet, it's not an easy thing to do :) but i've made very good progress. Ideally it will be something like: fq -d format.ksy <query> file

Subscribe or keep an eye on this issue for updates https://github.com/wader/fq/issues/627

And feel free to ask any questions!


Binwalk is my go-to for that kind of thing usually.


Try unblob sometime, it’s a more modern, maintained alternative (not a fork). A company called OneKey that do some firmware security stuff maintain it, and generally it’s pretty good.


Like Wireshark dissectors but more general


That's also my experience. People releasing their tools on obscure forums, usually without source code and no version control. Almost entire communities that haven't heard of GitHub. Though usually the tools work in wine... but are next to useless. GUI only and very clumsy to use. So many clicks for each single file to extract, no batch operations, no command line interface. A big WTF. How can you do any modding with those tools?


It's hard to not read this as elitist and entitled, to be honest. There are many people who know that GitHub exists but don't want to use it, and they also don't care for you having their source code. It's a choice they make. This entitlement is interesting because the solution obviously is to just make your own tools and not be a lazy library/tool hunter all the time.


If they do less work, i.e. not write a GUI, but just provide a command line tool you can suddenly automate things and it all gets much easier to use.

Anyway, I then wrote my own tools and put them on GitHub. And documented the reverse engineered file formats. And then got pull requests from people that want to help!

I was just voicing my bewilderment at this other culture. Don't understand why one would want to live like that.


I love that this includes a section in the README about other tools that are similar or related to fq. Every open source project should list its competitors.


Agreed. I try to do the same with my stuff. Honest, transparent market research / placement improves your offering, rather than diminishes it.


Thanks for the kind words! without the other tools i don't think fq would even exist, they inspired me and showed what was possible.


I don’t even necessarily see listing alternatives as competition. Often alternatives solve overlapping, but different problems.


I wouldn't even say they are necessarily competitors. More like a flathead screwdriver to complement your Phillips in most of those cases


Not only that, the README has a section called _Hopes_, the second bullet point being

> Inspire people to create similar tools.


Thanks for the kind words! they inspired me so i hope i can inspire others


Right? Not doing so is either disingenuous or amateurish


FOSDEM presentation by the author earlier this year: https://fosdem.org/2023/schedule/event/bintools_fq/


Awesome.

It would be good if some form of externally pluggable binary format specification is doable in the future. As far as I can see, if the binary format is not supported OTB, you can't use this tool.


Kaitai Struct might be a good choice for that: https://kaitai.io/


I've found Kaitai struct to be an absolute joy to use on things like EEPROM dumps from car computers.


Has anyone found anything incredible for arbitrary UTF-8 data?


Honestly, VIM.


Thanks for the link, that’s really cool!


http://www.jemarch.net/poke might be interesting in that case.


I second poke. It's an amazing tool for debugging in general.

It's relatively rare to look into standardized binary formats (you'll likely look directly into a library at that point), unless you're writing a writer/parser/decoder yourself and need to double-check the output.

When developing with general binary data in mind, poke is much more useful.


Hi, i'm working on runtime kaitai support and have made good progress but it's quite a big task. Keep an eye on https://github.com/wader/fq/issues/627 if you're interested.


I suppose OTB means “off the bat?” Non-native speaker here, and my web search turned up nothing.


I assume in this case it’s “Out (of) The Box”



Thanks! Macroexpanded:

Fq: Jq for Binary Formats - https://news.ycombinator.com/item?id=29657094 - Dec 2021 (81 comments)


This looks beautiful. When there's more data I bet it's going to be a great tool to hook up to LLMs. "Draw the first frame of this video using fq."


This is what I like about PowerShell. It passes objects via the pipeline and if you need to query or filter something, you don't need to learn millions of different tools (jq, xmlstarlet, etc.) - just use programming language features for everything.


What about GNU Poke? http://www.jemarch.net/poke


That looks very similar. Any idea what the differences are?


Hi fq author here and also an acquaintance of the gnu poke author. One difference is that fq is focused on decoding and querying while poke might be more focused on editing and runtime modeling of formats. But we have both inspired each other's thinking.

Here is a video of me demoing fq and talking to jose (gnu poke author) about it https://www.youtube.com/watch?v=GJOq_b0eb-s he also organized the FOSDEM binary tools track.


No idea. But i know poke has a rather big community


I've written a parser for Java class files (which works for any JVM language as they all compile to the same bytecode format). It was surprisingly easy! Maybe that could be useful to analyse class files in fq??!


Hi fq author here! please do if you want, would be a great addition.


I'd also like to throw https://github.com/WerWolv/ImHex in the mix here.


Did I miss something or is there no Ubuntu or Debian installer?

I certainly know how to download a file and add it to my path (or put it in my personal bin directory) but sure would be nice to have a super simple installer.

LOL - please do not make a snap or whatever the hell the "cool kids" use. I certainly wouldn't want to advocate continued use of that pattern for utility functions.


This has got to be one of the pettiest gripes I have heard in a while. It's already built, you literally just extract and run:

https://github.com/wader/fq/releases/tag/v0.6.0


You can always submit a PR that adds releases support for Debian or RPM packages or whatever format you like.

I might consider doing that next week, just realised I have no idea how an RPM is made, and I haven’t thought about making a Debian package in years.


Since this is written in Go, it's almost trivial to use fpm [1] to generate a variety of packages. Alternatively you can use nfpm [2] if you don't want to have to deal with installing Ruby & a gem.

[1]: https://github.com/jordansissel/fpm [2]: https://github.com/goreleaser/nfpm


Hi author here. Releases support as in packaging .deb and .rpm myself for each release?


It's already in stock Debian.


It is available in the main repos in Bookworm and Sid. Bookworm will become the new stable version in a couple months.


> in a couple months

Or, a few days... time flies.


I’m currently trying to make sense of a binary file that is used by a proprietary program to import data.

The file is generated on a server out of my control but I’m able to see that some kind of key is being sent alongside the data. (To encrypt it?)

How would one approach something like this? Where could I look for freelancers who are able to help with this?


Do you have access to the program that reads the data? If so, you can use a debugger to step through the parser for the file, even if symbols are stripped [1]. You can breakpoint on syscalls, such as when the file gets opened [2] and then step through and look around memory for the decrypted version. If you have an idea of what the file should contain you can probably identify patterns this way.

I'm not an expert on this topic at all though.

[1] Of course you then have less information but it's still possible to see the assembly while the file gets parsed. See for example,

http://felix.abecassis.me/2012/08/gdb-debugging-stripped-bin...

[2] https://sourceware.org/gdb/onlinedocs/gdb/Set-Catchpoints.ht...


For this kind of task, using low-level debugger tools is probably better. Rizin[1][2]/Cutter[3][4] could help. We also have GSoC participant this year who works hard on improving debuginfo and debugging support[5]. I personally also like Binary Ninja, they recently made their debugger stable enough[6].

[1] https://rizin.re/

[2] https://github.com/rizinorg/rizin

[3] https://cutter.re/

[4] https://github.com/rizinorg/cutter

[5] https://rizin.re/posts/gsoc-2023-announcement/

[6] https://binary.ninja/2023/05/03/3.4-finally-freed.html#debug...


The most straightforward way will be to reverse engineer the program that imports the data.

Look for reverse engineer freelancers. Many of them in the video game space.


If the ‘j’ in `jq` stands for ‘JSON’, what does the ‘f’ stand for?


It stands for the sound you make when you cat a binary file.


Good question. Maybe 'format' or just 'file'?


Hi author here, i think format mostly but also i thought about fq as file(1) on steroids in the beginning :)


I'm surprised, it’s been a day now and I haven’t seen a RIIR alternative, wow


Too bad that jq has such a shitty, convoluted syntax.

Could have definitely chosen one of the many alternatives to jq that also worked with JSON but did so in a much clearer and more elegant way.


Hi author here. Slide 3 from my FOSDEM talk explains my own very subjective reasons for choosing jq https://fosdem.org/2023/schedule/event/bintools_fq/attachmen...

To sum up: Very CLI friendly syntax, generators/backtracking makes working with tree structures very convenient.

But now after writing lots of jq i even think writing multi line programs in jq is quite nice :) but i can recommend IDE help, see end of slide 17
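
For readers who have not used jq, the point about generators and tree structures can be approximated in Python. This is only a loose analogy, not how jq or fq is implemented: `walk` is a hypothetical helper that yields every value in a nested structure, roughly like jq's `..` operator, and the `tree` data is made up.

```python
def walk(node):
    """Yield node and then every value nested inside it, depth first."""
    yield node
    if isinstance(node, dict):
        for value in node.values():
            yield from walk(value)
    elif isinstance(node, list):
        for value in node:
            yield from walk(value)


# Made-up decoded structure with "size" fields at different depths.
tree = {"frames": [{"size": 10, "sub": {"size": 3}}, {"size": 7}]}

# Collect every "size" regardless of where it sits in the tree.
sizes = [n["size"] for n in walk(tree) if isinstance(n, dict) and "size" in n]
print(sizes)  # → [10, 3, 7]
```

In jq the same query would be something like `.. | objects | select(has("size")) | .size`, which fits on a command line without any helper function.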


Commenting to follow because I’m curious what alternatives you mean. I thought a lot of people liked jq and I only just finally got around to installing it, so if there’s a much better way I’d like to hear it.


Can you name an alternative to jq with better syntax?


I prefer a SQL-like format. It’s not as complete but it covers most of the day-to-day use cases. Take a look at https://github.com/dcmoura/spyql (I am the author). Congrats on fq!


Thanks! i actually experimented a bit with an SQL-like interface for a while, dump things into sqlite and use that as query engine. Problem usually was that file formats tend to be a mix of array and tree structures more than relational and at least standard SQL is not great for that. Maybe some graph-SQL dialect could work?
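
A small sketch of the "dump things into sqlite" idea, to show where the friction comes from. This is not fq's experiment; the `flatten` helper, the `fields` table, and the sample data are all hypothetical. A decoded tree is flattened into (path, value) rows, and the tree structure then has to be recovered by pattern matching on path strings.

```python
import sqlite3


def flatten(node, path=""):
    """Yield (path, leaf value) pairs for a nested dict/list structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten(value, f"{path}[{index}]")
    else:
        yield path, node


# Made-up decode result: a header plus an array of frames.
decoded = {"header": {"version": 1}, "frames": [{"size": 10}, {"size": 7}]}

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE fields (path TEXT, value)")
db.executemany("INSERT INTO fields VALUES (?, ?)", flatten(decoded))

# The array/tree shape survives only inside the path string, so the
# query falls back to LIKE patterns instead of relational joins.
total = db.execute(
    "SELECT SUM(value) FROM fields WHERE path LIKE '.frames[%].size'"
).fetchone()[0]
print(total)  # → 17
```

Simple aggregates work, but anything that relates a node to its parent or siblings needs string manipulation on `path`, which is presumably the kind of awkwardness being described.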


There is XPath 3.1


Could this be used as a library in golang?


Hi author here. There is no stable API at the moment and it depends a bit on what you want to do. If you mean writing your own private decoders, it is possible but i can't guarantee the API won't change, see this old twitter thread https://twitter.com/TimMattison/status/1600871136027627521 If you mean using existing decoders and accessing the result, i think you probably can in theory. That is kind of how the interp package in fq is implemented: it implements a jq interface + some fq bells and whistles using the decode value structure.


Thanks for the response! I meant the existing decoders -- I'll see if I could figure that out.


no problem! feel free to reach out if you have any questions


Oh nice! We got a public gqui now


Looks like not really. The proto support is pretty basic. It can’t print floats and doubles and doesn’t parse groups or packed fields. It doesn’t use a descriptor database so it can only print the field number, not its name, and it can only differentiate nested messages if the user calls ‘|protobuf’ on what is otherwise considered a string.


Hi author here. Yeap currently only decodes the "wire" format. Actually it can decode using a schema but not parsed from a proto file, this is used by other formats using protobuf as a subformat then they pass a schema internally using go types. But proper proto schema support would be nice.
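
As background for the schema discussion above, here is a minimal Python sketch of decoding the protobuf wire format without a descriptor. It is not fq's decoder; `read_varint` and `decode_wire` are hypothetical helpers. Each key is a varint holding `(field_number << 3) | wire_type`, so field numbers are recoverable but field names are not, and a length-delimited payload could equally be a string, raw bytes, a nested message, or a packed repeated field.

```python
def read_varint(buf: bytes, pos: int):
    """Decode a base-128 varint at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, pos
        shift += 7


def decode_wire(buf: bytes):
    """Return (field_number, wire_type, raw_value) tuples for a message."""
    pos = 0
    fields = []
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit, left as raw bytes
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited, meaning unknown without schema
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit, left as raw bytes
            value, pos = buf[pos:pos + 4], pos + 4
        else:  # groups (wire types 3 and 4) are not handled in this sketch
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((number, wire_type, value))
    return fields
```

For example, `decode_wire(b"\x08\x96\x01\x12\x07testing")` yields field 1 as the varint 150 and field 2 as the bytes `b"testing"`; whether those bytes are text or a nested message is exactly what a schema would have to say.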


Hi fq author here. What is gqui? some internal google thingy?


LOVE the name.


Reminds me of this emphatically best middle school

Fuqing #1 Middle School

https://g.co/kgs/HPtt5n


I didn't even think of that until you mentioned it!


the name kind of sucks, why 'f' q? F for ... 'FU' ('teenage snigger'). It should be 'bq', binary query, after 'jq' json query.

Cool project nonetheless. The comment about 'programmatic documentation' of binary formats is very interesting, maybe some kind of 'binary description markup' could be part of this?


No, FQ!!


In my mind the ‘f’ here stands for ‘f yeah’



