Show HN: T2b – A wicked-powerful text macro language for building binary files (github.com/thosakwe)
117 points by thosakwe on June 12, 2018 | 52 comments



Interesting. I have several suggestions though:

- the hex modifier both turns the mode on and turns it off, which will be confusing in a longer file

- it would be good if prefixes such as 0x or bx were handled to temporarily override the current setting

- using "get var" to actually output something is weird; I'd use "put"

- add a way to handle endianness
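As a point of reference for the endianness suggestion, Python's standard `struct` module selects byte order with a prefix character in the format string (`<` little-endian, `>` big-endian). A minimal sketch; the helper name `emit_u32` is hypothetical and this is not how t2b itself works:

```python
import struct


def emit_u32(value: int, big_endian: bool = False) -> bytes:
    """Pack an unsigned 32-bit integer with an explicit byte order."""
    # '<' = little-endian, '>' = big-endian; both use standard sizes with no padding.
    fmt = ">I" if big_endian else "<I"
    return struct.pack(fmt, value)


print(emit_u32(0x11223344).hex())                   # 44332211
print(emit_u32(0x11223344, big_endian=True).hex())  # 11223344
```

A per-value flag like this avoids the global-mode problem discussed further down the thread.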


That's true, I'll get around to patching t2b today or tomorrow to get in the fixes that everyone's suggested.

This tool was hacked together quickly, and you can tell by using it that it's not close to perfect yet.


Another one: the ability to include the output of an external command.
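A sketch of what such an include-command directive might do, written in Python with `subprocess.run`. The function name `include_command` is hypothetical; t2b has no such command as of this thread:

```python
import subprocess
import sys
from typing import Sequence


def include_command(argv: Sequence[str]) -> bytes:
    """Run an external command and return its raw stdout bytes."""
    # check=True raises CalledProcessError on a non-zero exit status,
    # so a failing command cannot silently splice in an empty section.
    result = subprocess.run(list(argv), stdout=subprocess.PIPE, check=True)
    return result.stdout


# Example: capture the output of a child Python process as bytes.
chunk = include_command([sys.executable, "-c", "import sys; sys.stdout.write('hi')"])
print(chunk)  # b'hi'
```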


Trying not to be a downer, but I wouldn't think of writing such a tool. I'd use Perl's pack or Python's struct.

Of course there might be some value in something less than a full language.
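For comparison, a minimal Python `struct` sketch that emits the kind of bytes a short t2b script would: a byte, a 32-bit integer, and a NUL-terminated string. The field values are made up for illustration, and little-endian order is an assumption:

```python
import struct

# '<' little-endian, 'B' = u8, 'I' = u32, '6s' = six raw bytes ("hello" plus NUL).
payload = struct.pack("<BI6s", 0x7F, 1234, b"hello\x00")
print(payload.hex())  # 7fd204000068656c6c6f00
```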


Was going to say I agreed with the sentiment. I didn't know about `struct` or `pack`, but Elixir (& Erlang) have similar "special form" binary structures.

Though, having read the docs for both `pack` and `struct`, their format strings are about as obtuse as a regex. And the lack of individual bit support in `struct` is limiting.

Come to think of it, as other comments have already said, this could be a handy CLI streaming tool. Adjusting binary output on the CLI turns out to be a pain...


For Python, suitcase (https://pypi.org/project/suitcase/) makes dealing with structured binary data easier.



Perl's pack is not very easy to understand, though. If you have an intuitive tutorial, do share.



Cool idea! Clear and nicely organized too :)

Wouldn’t it be more readable and less error-prone to have separate commands for turning hex mode on and off? Such as `hex on` and `hex off`?


I think it might be more intuitive as an explicit stream modifier: hex, decimal, etc. No reason to turn anything “off”, which is actually the activation of a different mode.

Edit: What I mean is

  hex
  00
  dec
  00
  utf8
  00


Definitely, that's probably the best way, considering all the other commands are "stream-y" as-is.


Why have hex mode toggles at all? Just use 0xa1b2, and so on. 0x... -> hex, 0... -> octal -- invent a syntax for binary.


0 for octal was a huge mistake. Most sane languages use 0o now, or just forget octal. How often do you use octal anyway? Like, once a year for file permissions?


OK, sure. As long as we don't have to toggle number bases...


syntax for binary: 0b11010011


Believe it or not, I didn't even consider that. That's definitely a better way to do it.
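The prefix-per-literal approach is easy to prototype. In Python, `int(text, 0)` already infers the base from `0x`, `0o`, and `0b` prefixes and, in line with the point about `0` for octal, rejects C-style leading-zero literals such as `017`. A sketch; `parse_literal` is a hypothetical name:

```python
def parse_literal(token: str) -> int:
    """Parse an integer literal whose base is chosen by its prefix."""
    # Base 0: infer the base from the prefix (0x, 0o, 0b), otherwise decimal.
    return int(token, 0)


for tok in ["0xa1b2", "0o17", "0b11010011", "42"]:
    print(tok, parse_literal(tok))
# 0xa1b2 41394
# 0o17 15
# 0b11010011 211
# 42 42
```

With per-literal prefixes there is no mode to toggle, so moving a line around cannot change how it is read.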


Was the first thing that stood out to me as well. Toggled modes like this are also quite annoying when shuffling around sections of the file. Copy one hex too many and suddenly your numbers in another part are all wrong. And due to the toggle you'll have a hard time finding the place where it all went wrong.

As far as I could tell, it's been more or less the only mode. At that point I'd probably get rid of it, because global state (which is what modes are) has to be remembered at all times, and in this case it's invisible until you see the output. The suggestion to use different operators or modifiers for individual numbers was a good one in that direction, IMO.


In addition, global modes lead to hairy things like

    if ...
       hex off
    endif


It surprised me that an octal mode wasn't available, or a binary one. But that's a minor gripe.

The only other thing I could see being useful is the ability to define "structs", for binary headers, etc.

But agreed, this is a neat tool.
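Until t2b has structs, one way to get a reusable header definition is Python's `struct.Struct` paired with a `NamedTuple`. The header layout below (4-byte magic, u16 version, u32 record count) is invented for illustration, not a real format:

```python
import struct
from typing import NamedTuple


class Header(NamedTuple):
    magic: bytes  # 4-byte file signature
    version: int  # u16
    count: int    # u32


# Compile the layout once: little-endian, 4 raw bytes, u16, u32 (10 bytes, no padding).
HEADER = struct.Struct("<4sHI")


def pack_header(h: Header) -> bytes:
    return HEADER.pack(*h)


def unpack_header(data: bytes) -> Header:
    return Header(*HEADER.unpack_from(data))


raw = pack_header(Header(b"DEMO", 1, 3))
print(HEADER.size, raw.hex())  # 10 44454d4f010003000000
print(unpack_header(raw))      # Header(magic=b'DEMO', version=1, count=3)
```

Defining the layout once gives both emission and parsing from the same description.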


In addition to "pack" and "struct", for prior art (or inspiration), look at any number of structured fuzzing frameworks (Peach might be a good example); constructing arbitrary binary formats from a recipe is how you build a structured fuzzer. I built one a few years ago that pursued a sort of DOM model, with id's and classes and path strings.

In the other direction, given the ability to spit out one binary character from the shell, you can bootstrap all the rest of this in pure shell script. I cheated and gave myself int{8..64} and did a whole ASN.1/DER in shell script. The shell has better control flow than a custom language (but the custom language is a little simpler).
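For anyone trying the same exercise, the fiddliest part of DER is the length field. A Python sketch (rather than shell) of the definite-length rule: short form for lengths below 128, otherwise `0x80 | n` followed by the length in n big-endian bytes, using the minimum number of bytes as DER requires. `der_length` is a hypothetical helper name:

```python
def der_length(n: int) -> bytes:
    """Encode a DER definite length."""
    if n < 0x80:
        return bytes([n])  # short form: the length fits in one byte
    # Long form: minimal big-endian length bytes, preceded by 0x80 | byte count.
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


print(der_length(5).hex())    # 05
print(der_length(200).hex())  # 81c8
print(der_length(300).hex())  # 82012c
```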


I did a KSH DER codec once too! It started out as an attempt to decode OIDs that were all over a codebase, and we had no tools to decode them, and I wanted to see if they were correct (indeed, they were not all correct). Then I extended that script a bit to other types. 'Twas a fun little side project.

With an FFI (which Bash has) one could do this much more easily than it was then (about 15 years ago, I think).
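Decoding OIDs comes down to base-128 arithmetic: each subidentifier is spread over 7-bit groups with the high bit marking continuation, and the first subidentifier packs the first two arcs as 40·X + Y. A Python sketch that decodes the content octets of a DER OBJECT IDENTIFIER (without the `06` tag and length byte); `decode_oid` is a hypothetical name:

```python
def decode_oid(content: bytes) -> str:
    """Decode DER OBJECT IDENTIFIER content octets to dotted notation."""
    subids = []
    value = 0
    pending = False
    for byte in content:
        value = (value << 7) | (byte & 0x7F)  # accumulate 7 bits per byte
        pending = bool(byte & 0x80)           # high bit set: more bytes follow
        if not pending:
            subids.append(value)
            value = 0
    if pending or not subids:
        raise ValueError("truncated or empty OID")
    # First subidentifier encodes 40*X + Y, where X is 0, 1, or 2.
    first = subids[0]
    x = min(first // 40, 2)
    arcs = [x, first - 40 * x] + subids[1:]
    return ".".join(str(a) for a in arcs)


print(decode_oid(bytes.fromhex("2a864886f70d")))  # 1.2.840.113549
```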


Length-prefixed strings would be useful for some file formats. You'd need to be able to pick a size for the length-prefix, such as u8,u16,etc.

Also some ability to length-prefix the output of a macro might be useful for tag-length-value-like formats.
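A sketch of a length-prefixed string with a selectable prefix width, using Python's `struct`. The width names mirror the `u8`/`u16`/`u32` types mentioned above; little-endian output and the helper name `prefixed` are assumptions. Because it takes arbitrary bytes, the same helper would also cover prefixing the output of a macro:

```python
import struct

# Prefix type name -> struct format (little-endian, unsigned).
PREFIX_FORMATS = {"u8": "<B", "u16": "<H", "u32": "<I"}


def prefixed(data: bytes, prefix: str = "u16") -> bytes:
    """Return data preceded by its length, encoded with the chosen width."""
    fmt = PREFIX_FORMATS[prefix]
    # struct.pack raises struct.error if the length does not fit the width.
    return struct.pack(fmt, len(data)) + data


print(prefixed(b"abc", "u8").hex())   # 03616263
print(prefixed(b"abc", "u16").hex())  # 0300616263
```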

For Unicode, ability to write out UTF-16 or UTF-32 would be useful, also CESU-8 and Java's Modified UTF-8.

Also, for a challenge, think about how you could write macros to generate ASN.1 formats such as BER. You'd probably need some more smarts in your macro language to handle such a task gracefully.

Also, how does one control the endianness of output? Maybe you need u16le and u16be, or some kind of general endianness modifier.

Output of IEEE floating point might also be useful in some applications.
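For the encoding and floating-point suggestions, Python's built-in codecs and `struct` show what the output bytes would look like. This sketch covers UTF-16, UTF-32, and IEEE 754 floats only; CESU-8 and Java's Modified UTF-8 are not covered here:

```python
import struct

text = "h\u00e9"  # "hé"
# The -le/-be codec names emit no byte-order mark; plain "utf-16" would prepend one.
print(text.encode("utf-16-le").hex())  # 6800e900
print(text.encode("utf-32-be").hex())  # 00000068000000e9

# IEEE 754 binary32 ('f') and binary64 ('d'), little-endian.
print(struct.pack("<f", 1.5).hex())  # 0000c03f
print(struct.pack("<d", 1.5).hex())  # 000000000000f83f
```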


> Also some ability to length-prefix the output of a macro might be useful for tag-length-value-like formats.

Exactly. TLV (tag-length-value) support is really what I expected to see. They are hairy to get right manually.


I think you could achieve something like that with a macro.

    macro tlv_string s
    begin
        i32 (get CONSTANT_STRING_TAG_FOO)
        i32 (len s)
        str s
        u8 0
    endmacro

By the way, there's no `len` command at this point, so that's something I'll have to add in eventually to get something like TLV to work.
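Until a `len` command exists, the layout of the `tlv_string` macro (i32 tag, i32 length, string bytes, trailing NUL) can be sketched in Python. The tag value `7` and little-endian byte order are assumptions for illustration; like the macro, the length counts the string bytes but not the NUL:

```python
import struct

CONSTANT_STRING_TAG_FOO = 7  # hypothetical tag value


def tlv_string(s: str, tag: int = CONSTANT_STRING_TAG_FOO) -> bytes:
    data = s.encode("utf-8")
    # i32 tag, i32 length of the string bytes, the bytes, then a NUL terminator.
    return struct.pack("<ii", tag, len(data)) + data + b"\x00"


print(tlv_string("hi").hex())  # 0700000002000000686900
```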


> t2b always writes to stdout. To output to a file, simply use a pipe (|).

Shouldn't this be use output redirection (ie. >)?


Yes, haha. I was tired. There's a PR open to fix that, I'll merge it in.


Cool. That patch used weird language, though ("a greater-than sign"). I think the people who don't already know how to write to a file might appreciate knowing what this is actually called (output redirection) so they can look it up.


I use fasm for this very use case. It has a wonderful macro language that can be used to programmatically generate binary data on top of being able to emit instructions. It's really cool to see other languages for generating binary files, as it's an interesting niche to fill.


FlatAssembler is amazing. I remember people making fractal images using only the preprocessor [1], JVM bytecode compilers, and a lot of other cool stuff. It's also a powerful editor: you can abuse the 'file', 'load', and 'store' directives to modify existing files.

[1] https://board.flatassembler.net/topic.php?t=2265


Agreed. Fresh IDE has a great sample that generates a Mandelbrot image when you "compile" the source file.


Nice. I once did something similar for parsing and generating binary files from a single specification: https://abau.org/hannah


Not sure if I will use it, but I like it. This is as close to raw bytes as it gets, despite some HLL-like features.

This to me is real hacking, saying what the heck and writing a tool for oneself instead of looking around and getting lost in the multitude of 'mature' options that exist. I am sure hacking this was quicker, and more fun, than browsing all the available options, picking one, installing the tools, learning to use them and getting the desired result.

Now, how long till someone decides to bootstrap this - use it to hack a binary that compiles it?


Oh definitely, this was a fun one to write. I got so caught up in it that I completely forgot the original project I wanted to use it for, which was a small VM.


Looks good. I can see this being a great intermediate format for hacks etc.

Alternatively you could use some kind of fluent API to write binary data. Depends on your use case of course.
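A fluent API for this might look something like the Python sketch below, where each method appends bytes and returns the writer so calls can be chained. The `BinaryWriter` class and its method names are hypothetical, loosely modeled on t2b's command names:

```python
import struct


class BinaryWriter:
    """Accumulate bytes through chained method calls."""

    def __init__(self, byteorder: str = "<"):
        self._buf = bytearray()
        self._order = byteorder  # '<' little-endian, '>' big-endian

    def _pack(self, code: str, value: int) -> "BinaryWriter":
        self._buf += struct.pack(self._order + code, value)
        return self  # returning self is what makes the calls chainable

    def u8(self, v: int) -> "BinaryWriter":
        return self._pack("B", v)

    def u16(self, v: int) -> "BinaryWriter":
        return self._pack("H", v)

    def u32(self, v: int) -> "BinaryWriter":
        return self._pack("I", v)

    def text(self, s: str, encoding: str = "utf-8") -> "BinaryWriter":
        self._buf += s.encode(encoding)
        return self

    def getvalue(self) -> bytes:
        return bytes(self._buf)


data = BinaryWriter().u8(1).u16(0x0203).text("ok").getvalue()
print(data.hex())  # 0103026f6b
```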


Thanks! Some sort of API would be pretty cool... I'll add that if this takes off.

At the very least, the function to execute T2B scripts could be exposed in a simple header.


Neat! I could use this for generating mixed binary/ASCII payloads for network protocol testing. One issue, though -- it seems like trailing whitespace in the input isn't handled correctly. It seems to be picked up by the command processor and treated as a duplicate of the previous command. Amusingly, I was able to diagnose this using t2b by piping its output back through itself:

   c:\mingw\MSYS64\usr\bin\echo -n hex u32 20786568 u32 36203875 u8 31 u8 0a u32 34203875 u8 31 | t2b

   Output: hex u8 61[LF]u8 41

   c:\mingw\MSYS64\usr\bin\echo -n hex u32 20786568 u32 36203875 u8 31 u8 0a u32 34203875 u8 31 | t2b | t2b

   Output: aA

   c:\mingw\MSYS64\usr\bin\echo -n hex u32 20786568 u32 36203875 u8 31 u8 0a u32 34203875 u8 31 u8 0a | t2b

   Output: hex u8 61[LF]u8 41[LF]

   c:\mingw\MSYS64\usr\bin\echo -n hex u32 20786568 u32 36203875 u8 31 u8 0a u32 34203875 u8 31 u8 0a | t2b | t2b

   Output: aAA

I don't know very much C++ yet, so I'm not sure of the best way to fix this.
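The hex values in those commands are just ASCII text packed four bytes at a time. Assuming `u32` is written little-endian (as on the x86 machine used here), this Python sketch rebuilds the first-stage script, then runs it through a toy interpreter that understands only `hex` and `u8`. The toy is hypothetical and is not t2b's code; it only illustrates that splitting on whitespace makes a trailing newline harmless, so both inputs should produce `aA`:

```python
import struct

# Rebuild the first-stage script, assuming u32 values are written little-endian.
script = (
    struct.pack("<I", 0x20786568)    # b"hex "
    + struct.pack("<I", 0x36203875)  # b"u8 6"
    + b"\x31\x0a"                    # b"1\n"
    + struct.pack("<I", 0x34203875)  # b"u8 4"
    + b"\x31"                        # b"1"
)
print(script)  # b'hex u8 61\nu8 41'


def run_toy(src: bytes) -> bytes:
    """Toy interpreter for `hex` and `u8 <hex byte>` only (treats values as hex)."""
    tokens = src.split()  # bytes.split() with no argument drops trailing whitespace
    out = bytearray()
    i = 0
    while i < len(tokens):
        if tokens[i] == b"hex":
            i += 1
        elif tokens[i] == b"u8":
            out.append(int(tokens[i + 1], 16))
            i += 2
        else:
            raise ValueError(f"unknown token {tokens[i]!r}")
    return bytes(out)


print(run_toy(script), run_toy(script + b"\n"))  # b'aA' b'aA'
```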


"It's now feasible to write a machine code compiler in shell. Hooray. Not sure why you would ever do that to yourself, though."

For more trustworthy bootstrapping. It's why I included shell compilers on this site:

https://bootstrapping.miraheze.org/wiki/Main_Page


The problem with this kind of site and all those Awesome-XYZ lists is that it could take years to review each item.

I'm not saying building these kinds of lists is a bad thing, but I haven't found an efficient way to make use of lists like this and avoid diverting my focus.


I doubt that, given it took me weeks to find, review, and put up half those links. It would take some time, though. If nothing else, check out projectoberon.com and/or the amber slides:

https://speakerdeck.com/nineties/creating-a-language-using-o...


That's a very nifty project. I always wanted to do an incremental compilation like that.


Excellent idea. Throw in ELF support and you’ll never need a linker for C code again.
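To give a sense of what ELF support involves, here is a Python sketch of just the 64-byte ELF64 file header for a little-endian x86-64 executable. The entry address is a placeholder, and a loadable file would still need at least one program header and the code itself:

```python
import struct

# ELF64 header: e_ident[16], then 2+2+4+8+8+8+4+2*6 bytes = 64 bytes total ('<' adds no padding).
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")

e_ident = (
    b"\x7fELF"
    + bytes([2, 1, 1, 0])  # ELFCLASS64, little-endian data, EV_CURRENT, System V ABI
).ljust(16, b"\x00")        # zero-pad e_ident to 16 bytes

header = ELF64_HEADER.pack(
    e_ident,
    2,         # e_type: ET_EXEC
    0x3E,      # e_machine: EM_X86_64
    1,         # e_version: EV_CURRENT
    0x400078,  # e_entry: placeholder entry address
    64,        # e_phoff: program headers would start right after this header
    0,         # e_shoff: no section header table
    0,         # e_flags
    64,        # e_ehsize: size of this header
    56,        # e_phentsize: size of one ELF64 program header
    1,         # e_phnum
    0,         # e_shentsize: unused, no section headers
    0,         # e_shnum
    0,         # e_shstrndx
)
print(len(header), header[:4])  # 64 b'\x7fELF'
```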


I really wanted something like this when embedding serial numbers and keys in ROMs. At the time I was mostly working in C and PHP, so I was imagining something closer to a template language, but something like this would have given me a good intermediate step where I could have used PHP to render this text format and then run that to produce the necessary bits on disk.
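The same stamping step can also be scripted directly. A Python sketch that writes a serial number and key into a copy of a ROM image; the offsets, field sizes, and 256-byte blank image are hypothetical, since the comment doesn't describe the actual layout:

```python
import struct

SERIAL_OFFSET = 0x10  # hypothetical location of a u32 serial number
KEY_OFFSET = 0x14     # hypothetical location of a 16-byte key
KEY_SIZE = 16


def stamp_rom(template: bytes, serial: int, key: bytes) -> bytes:
    """Return a copy of the ROM image with serial and key written in place."""
    if len(key) != KEY_SIZE:
        raise ValueError("key must be exactly 16 bytes")
    rom = bytearray(template)
    struct.pack_into("<I", rom, SERIAL_OFFSET, serial)  # little-endian u32
    rom[KEY_OFFSET:KEY_OFFSET + KEY_SIZE] = key          # same-length slice: size unchanged
    return bytes(rom)


blank = bytes(256)  # stand-in for a real ROM image
image = stamp_rom(blank, 1001, bytes(range(16)))
print(image[SERIAL_OFFSET:SERIAL_OFFSET + 4].hex())  # e9030000
```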

A decade late for me, but still looks cool.


It would be great if I could specify the number of bits per field like in Elixir/Erlang/C++ bit fields and then also specify what the extra bits should be set to if the total number of bits isn't divisible by 8. I know that document strings are helpful but writing it into the code would be much more robust.
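Bit-width fields with an explicit fill value for the final partial byte could look like the Python sketch below. MSB-first bit order is an assumption, and `BitWriter` is a hypothetical name:

```python
class BitWriter:
    """Pack fields of arbitrary bit width, most significant bit first."""

    def __init__(self):
        self._acc = 0    # accumulated bits as one integer
        self._nbits = 0  # number of bits accumulated so far

    def write(self, value: int, width: int) -> "BitWriter":
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        self._acc = (self._acc << width) | value
        self._nbits += width
        return self

    def finish(self, pad_bit: int = 0) -> bytes:
        """Pad to a whole byte with pad_bit (0 or 1) and return the bytes."""
        pad = -self._nbits % 8  # bits needed to reach a byte boundary
        fill = (1 << pad) - 1 if pad_bit else 0
        acc = (self._acc << pad) | fill
        return acc.to_bytes((self._nbits + pad) // 8, "big")


# 3-bit version, 1-bit flag, 6-bit length = 10 bits, padded with 1s to 16 bits.
data = BitWriter().write(0b101, 3).write(1, 1).write(0b000011, 6).finish(pad_bit=1)
print(data.hex())  # b0ff
```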


There is already https://flatassembler.net/ which uses macros to emit ELF and PE headers.


> .idea

What's the purpose of checking this stuff into version control?


It's shared project configuration. If you're using the same IDE as the author (CLion in this case) you get all the code style settings, run/debug configurations and shared project config/settings without any setup required. If you're not using CLion just ignore the files, they don't do any harm.


Yep. It helps me a lot, because sometimes I switch between my Mac and Windows boxes. Especially in this case where I had to verify/release the build on two platforms.

The only downside is that not everyone uses the same editor, so in projects with more contributors, it can quickly create bloat.


How does it compare to https://kaitai.io ?


Well, for starters, Kaitai looks like it's much more mature, and it probably is, considering that t2b is just something I hacked together yesterday.

Also, it seems like Kaitai is for parsing structures, whereas t2b is for emitting data.


If you use the prebuilt binaries, does the GPL apply?


Not as good as https://github.com/oriansj/mescc-tools but certainly a good start.



