
Show HN: T2b – A wicked-powerful text macro language for building binary files - thosakwe
https://github.com/thosakwe/t2b
======
yoz-y
Interesting. I have several suggestions though:

\- the hex modifier does both turning mode on and off, which will be confusing
in a longer file

\- it would be good if prefixes such as 0x or bx were handled to temporarily
override the current setting

\- using "get var" to actually output something is weird, I'd use "put"

\- add a way to handle endianness

~~~
thosakwe
That's true, I'll get around to patching t2b today or tomorrow to get in the
fixes that everyone's suggested.

This tool was hacked together quickly, and you can tell by using it that it's
not close to perfect yet.

------
emmelaich
Trying not to be a downer, but I wouldn't think of writing such a tool. I'd
use Perl's _pack_ or Python's _struct_.

Of course there might be some value in something less than a full language.
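
For readers who have not used it, here is a minimal sketch of the Python `struct` route mentioned above. The header layout (magic, version, payload length) is invented for illustration and has nothing to do with t2b's own format.

```python
import struct

# Hypothetical file header, invented for this example:
#   4-byte magic, u16 version, u32 payload length, all little-endian.
payload = b"hello"
header = struct.pack("<4sHI", b"DEMO", 1, len(payload))
blob = header + payload

# "<" selects little-endian with no padding; ">" would select big-endian.
magic, version, length = struct.unpack_from("<4sHI", blob)
```

The format string is terse, but it is also the whole specification of the header, which is roughly the trade-off being discussed here.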

~~~
elcritch
Was going to say I agreed with the sentiment. I didn't know about `struct` or
`pack`, but Elixir (& Erlang) have a similar "special form" for binary structures.

Though reading both `pack` and `struct` the formats are about as obtuse as a
regex. And the lack of individual bit support in `struct` is limiting.

Come to think of it, as other comments have already said, this could be a handy CLI
streaming tool. Dealing with adjusting binary output on the CLI turns out to
be a pain...

~~~
rcfox
For Python, suitcase
([https://pypi.org/project/suitcase/](https://pypi.org/project/suitcase/))
makes dealing with structured binary data easier.

~~~
unscaled
Or Construct:
[https://construct.readthedocs.io/en/latest/](https://construct.readthedocs.io/en/latest/)

------
edudobay
Cool idea! Clear and nicely organized too :)

Wouldn’t it be more readable and less error-prone to have separate commands
for turning hex mode on and off? Such as `hex on` and `hex off`?

~~~
abtinf
I think it might be more intuitive as an explicit stream modifier: hex,
decimal, etc. No reason to turn anything “off”, which is actually the
activation of a different mode.

Edit: What I mean is

    
    
      hex
      00
      dec
      00
      utf8
      00

~~~
thosakwe
Definitely, that's probably the best way, considering all the other commands
are "stream-y" as-is.

~~~
cryptonector
Why have hex mode toggles at all? Just use 0xa1b2, and so on. 0x... -> hex,
0... -> octal -- invent a syntax for binary.

~~~
IshKebab
0 for octal was a huge mistake. Most sane languages use 0o now, or just forget
octal. How often do you use octal anyway? Like, once a year for file
permissions?

~~~
cryptonector
OK, sure. As long as we don't have to toggle number bases...
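
The prefix-based approach from this subthread can be sketched in Python, where `int(text, 0)` infers the base from the literal's prefix. This only illustrates the idea; it is not how t2b parses numbers, and `parse_literal` is a hypothetical helper name.

```python
def parse_literal(text: str) -> int:
    """Parse an integer literal, inferring the base from its prefix."""
    # Base 0 tells int() to honour 0x (hex), 0o (octal) and 0b (binary)
    # prefixes; a literal with no prefix is read as decimal.
    return int(text, 0)

values = [parse_literal(t) for t in ("0xa1b2", "0o17", "0b1010", "42")]

# A bare leading zero is rejected rather than silently read as octal.
try:
    parse_literal("017")
    legacy_octal_accepted = True
except ValueError:
    legacy_octal_accepted = False
```

With per-literal prefixes there is no mode to toggle, and the ambiguous `017` form fails loudly instead of changing meaning.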

------
tptacek
In addition to "pack" and "struct", for prior art (or inspiration), look at
any number of structured fuzzing frameworks (Peach might be a good example);
constructing arbitrary binary formats from a recipe is how you build a
structured fuzzer. I built one a few years ago that pursued a sort of DOM
model, with id's and classes and path strings.

In the other direction, given the ability to spit out one binary character
from the shell, you can bootstrap all the rest of this in pure shell script. I
cheated and gave myself int{8..64} and did a whole ASN.1/DER in shell script.
The shell has better control flow than a custom language (but the custom
language is a little simpler).

~~~
cryptonector
I did a KSH DER codec once too! It started out as an attempt to decode OIDs
that were all over a codebase, and we had no tools to decode them, and I
wanted to see if they were correct (indeed, they were not all correct). Then I
extended that script a bit to other types. 'Twas a fun little side project.

With an FFI (which Bash has) one could do this much more easily than it was
then (about 15 years ago, I think).

------
skissane
Length-prefixed strings would be useful for some file formats. You'd need to
be able to pick a size for the length-prefix, such as u8,u16,etc.

Also some ability to length-prefix the output of a macro might be useful for
tag-length-value-like formats.

For Unicode, ability to write out UTF-16 or UTF-32 would be useful, also
CESU-8 and Java's Modified UTF-8.

Also, for a challenge, think about how you could write macros to generate
ASN.1 formats such as BER. You'd probably need some more smarts in your macro
language to handle such a task gracefully.

Also, how does one control the endianness of output? Maybe you need u16le and
u16be, or some kind of general endianness modifier.

Output of IEEE floating point might also be useful in some applications.
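
As a rough illustration of the requests above (a selectable length-prefix width, explicit endianness, other Unicode encodings, IEEE floats), here is a sketch using Python's `struct`. The helper name `length_prefixed` and the `u8`/`u16`/`u32` labels are assumptions made for this example, not t2b features. CESU-8 and Modified UTF-8 are not covered.

```python
import struct

# struct format codes for unsigned length prefixes of each width.
_PREFIX_CODES = {"u8": "B", "u16": "H", "u32": "I"}

def length_prefixed(text: str, prefix: str = "u16", big_endian: bool = False,
                    encoding: str = "utf-8") -> bytes:
    """Encode text and prepend its byte length as an unsigned integer."""
    data = text.encode(encoding)
    order = ">" if big_endian else "<"
    # struct.pack raises struct.error if the length does not fit the prefix.
    return struct.pack(order + _PREFIX_CODES[prefix], len(data)) + data

le = length_prefixed("hi", "u16")                         # little-endian length
be = length_prefixed("hi", "u16", big_endian=True)        # big-endian length
wide = length_prefixed("hi", "u8", encoding="utf-16-le")  # UTF-16 payload
f32 = struct.pack(">f", 1.0)  # IEEE 754 single precision, big-endian
```

Note that the prefix counts encoded bytes, not characters, which matters as soon as the encoding is not ASCII-compatible.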

~~~
childintime
> Also some ability to length-prefix the output of a macro might be useful for
> tag-length-value-like formats.

Exactly. TLV (tag-length-value) support is really what I expected to see. They
are hairy to get right manually.

~~~
thosakwe
I think you could achieve something like that with a macro.

    
    
        macro tlv_string s
        begin
            i32 (get CONSTANT_STRING_TAG_FOO)
            i32 (len s)
            str s
            u8 0
        endmacro
    

By the way, there's no `len` command at this point, so that's something I'll
have to add in eventually to get something like TLV to work.
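
For comparison, a Python sketch of the same idea, including the nested case where a length prefix has to cover the output of other records. The tag values, the little-endian byte order, and the decision to count the trailing NUL in the length are all assumptions of this sketch, not t2b behaviour.

```python
import struct

STRING_TAG = 1      # hypothetical tag values, invented for this sketch
CONTAINER_TAG = 2

def tlv(tag: int, value: bytes) -> bytes:
    """Emit a little-endian i32 tag, an i32 length, then the value bytes."""
    return struct.pack("<ii", tag, len(value)) + value

def tlv_string(text: str) -> bytes:
    # Like the macro above, the string is followed by a NUL byte; counting
    # that terminator in the length is a choice made by this sketch.
    return tlv(STRING_TAG, text.encode("utf-8") + b"\x00")

# Nesting: the outer length can only be computed once the inner records
# have been fully encoded.
record = tlv(CONTAINER_TAG, tlv_string("ab") + tlv_string("c"))
```

The nested case is the hairy part: a streaming macro would need to buffer its body before it can emit the length that precedes it.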

------
cup-of-tea
> t2b always writes to stdout. To output to a file, simply use a pipe (|).

Shouldn't this say to use output redirection (i.e. >)?

~~~
thosakwe
Yes, haha. I was tired. There's a PR open to fix that, I'll merge it in.

~~~
cup-of-tea
Cool. That patch used weird language, though ("a greater-than sign"). I think
the people who don't already know how to write to a file might appreciate
knowing what this is actually called (output redirection) so they can look it
up.

------
emily-c
I use fasm for this very use case. It has a wonderful macro language that can
be used to programmatically generate binary data on top of being able to emit
instructions. It's really cool to see other languages for generating binary
files as it's an interesting niche to fill.

~~~
tov_objorkin
FlatAssembler is amazing. I remember people making fractal images using only
preprocessor [1], JVM bytecode compilers and a lot of other cool stuff. Also
it is a powerful editor, you can abuse 'file', 'load', 'store' directives to
make existing file modifications.

[1]
[https://board.flatassembler.net/topic.php?t=2265](https://board.flatassembler.net/topic.php?t=2265)

------
abau_org
Nice. I once did something similar for parsing and generating binary files
from a single specification:
[https://abau.org/hannah](https://abau.org/hannah)

------
vinayms
Not sure if I will use it, but I like it. This is as close to raw bytes as it
gets despite some HLL like features.

This to me is real hacking, saying what the heck and writing a tool for
oneself instead of looking around and getting lost in the multitude of
'mature' options that exist. I am sure hacking this was quicker, and more fun,
than browsing all the available options, picking one, installing the tools,
learning to use them and getting the desired result.

Now, how long till someone decides to bootstrap this - use it to hack a binary
that compiles it?

~~~
thosakwe
Oh definitely, this was a fun one to write. I got so caught up in it that I
completely forgot the original project I wanted to use it for, which was a small
VM.

------
brad0
Looks good. I can see this being a great intermediate format for hacks etc.

Alternatively you could use some kind of fluent API to write binary data.
Depends on your use case of course.

~~~
thosakwe
Thanks! Some sort of API would be pretty cool... I'll add that if this takes
off.

At the very least, the function to execute T2B scripts could be exposed in a
simple header.

------
memeslayer
Neat! I could use this for generating mixed binary/ASCII payloads for network
protocol testing. One issue, though -- it seems like trailing whitespace in
the input isn't handled correctly. It seems to be picked up by the command
processor and treated as a duplicate of the previous command. Amusingly, I was
able to diagnose this using t2b by piping its output back through itself:

    
    
       c:\mingw\MSYS64\usr\bin\echo -n hex u32 20786568 u32 36203875 u8 31 u8 0a u32 34203875 u8 31 | t2b
    
       Output: hex u8 61[LF]u8 41
    
       c:\mingw\MSYS64\usr\bin\echo -n hex u32 20786568 u32 36203875 u8 31 u8 0a u32 34203875 u8 31 | t2b | t2b
    
       Output: aA
    
       c:\mingw\MSYS64\usr\bin\echo -n hex u32 20786568 u32 36203875 u8 31 u8 0a u32 34203875 u8 31 u8 0a | t2b
    
       Output: hex u8 61[LF]u8 41[LF]
    
       c:\mingw\MSYS64\usr\bin\echo -n hex u32 20786568 u32 36203875 u8 31 u8 0a u32 34203875 u8 31 u8 0a | t2b | t2b
    
       Output: aAA
    

I don't know very much C++ yet, so I'm not sure of the best way to fix this.
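
The expected output of the first two commands can be cross-checked outside t2b with a short Python sketch. It assumes `u32` is written little-endian, which is what the reported output suggests. It models only the intended behaviour, not the trailing-whitespace bug.

```python
import struct

# The arguments from the echo command above, as (command, hex value) pairs.
script = [("u32", 0x20786568), ("u32", 0x36203875), ("u8", 0x31),
          ("u8", 0x0A), ("u32", 0x34203875), ("u8", 0x31)]

# Assumption: u32 is written little-endian, u8 as a single byte.
formats = {"u32": "<I", "u8": "B"}
first_pass = b"".join(struct.pack(formats[cmd], value) for cmd, value in script)

# Second pass: read the generated text as "hex" followed by u8 commands.
tokens = first_pass.decode("ascii").split()
second_pass = bytes(int(tokens[i + 1], 16)
                    for i, tok in enumerate(tokens) if tok == "u8")
```

Here `first_pass` is the text `hex u8 61`, a line feed, then `u8 41`, and `second_pass` is `aA`, matching the first two outputs reported above.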

------
nickpsecurity
"It's now feasible to write a machine code compiler in shell. Hooray. Not sure
why you would ever do that to yourself, though."

For more trustworthy bootstrapping. It's why I included shell compilers on
this site:

[https://bootstrapping.miraheze.org/wiki/Main_Page](https://bootstrapping.miraheze.org/wiki/Main_Page)

~~~
emmanueloga_
The problem with this kind of site and all those Awesome-XYZ lists is that it
could take _years_ to review each item.

I'm not saying building these kinds of lists is a bad thing, but I haven't
found an efficient way to make use of lists like this and avoid diverting my
focus.

~~~
nickpsecurity
I doubt that given it took me weeks to find, review, and put up half those
links. It would take some time, though. If nothing else, check out
projectoberon.com and/or the amber slides:

[https://speakerdeck.com/nineties/creating-a-language-using-o...](https://speakerdeck.com/nineties/creating-a-language-using-only-assembly-language)

------
adiusmus
Excellent idea. Throw in ELF support and you’ll never need a linker for C code
again.

------
rmgraham
I really wanted something like this when embedding serial numbers and keys in
ROMs. At the time I was mostly working in C and PHP, so I was imagining
something closer to a template language, but something like this would have
given me a good intermediate step where I could have used PHP to render this
text format and then ran that to produce the necessary bits on disk.

A decade late for me, but still looks cool.

------
Adamantcheese
It would be great if I could specify the number of bits per field like in
Elixir/Erlang/C++ bit fields and then also specify what the extra bits should
be set to if the total number of bits isn't divisible by 8. I know that
document strings are helpful but writing it into the code would be much more
robust.
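
A sketch of what that could mean, in Python: fields are given as (value, width) pairs, packed most-significant-bit first, and the final partial byte is filled with a chosen padding bit. The function name `pack_bits` and the MSB-first ordering are assumptions of this example, not anything t2b provides.

```python
def pack_bits(fields, pad_bit=0):
    """Pack (value, width) pairs MSB-first, padding the last byte with pad_bit."""
    acc = 0      # all bits so far, held as one big integer
    nbits = 0    # number of bits currently held in acc
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        acc = (acc << width) | value
        nbits += width
    # Fill up to the next byte boundary with the requested padding bit.
    pad = -nbits % 8
    acc = (acc << pad) | (((1 << pad) - 1) if pad_bit else 0)
    return acc.to_bytes((nbits + pad) // 8, "big")

# Three fields totalling 9 bits, so 7 padding bits are needed.
zero_padded = pack_bits([(0b101, 3), (0b11, 2), (0b0110, 4)])
one_padded = pack_bits([(0b101, 3), (0b11, 2), (0b0110, 4)], pad_bit=1)
```

Making the padding explicit in the call, rather than in a comment, is the robustness point raised above.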

------
labdsf
There is already [https://flatassembler.net/](https://flatassembler.net/)
which uses macros to emit ELF and PE headers.

------
M_Bakhtiari
> .idea

What's the purpose of checking this stuff into version control?

~~~
chris_overseas
It's shared project configuration. If you're using the same IDE as the author
(CLion in this case) you get all the code style settings, run/debug
configurations and shared project config/settings without any setup required.
If you're not using CLion just ignore the files, they don't do any harm.

~~~
thosakwe
Yep. It helps me a lot, because sometimes I switch between my Mac and Windows
boxes. Especially in this case where I had to verify/release the build on two
platforms.

The only downside is that not everyone uses the same editor, so in projects
with more contributors, it can quickly create bloat.

------
benatkin
How does it compare to [https://kaitai.io](https://kaitai.io) ?

~~~
thosakwe
Well, for starters, Kaitai looks like it's much more mature, and it probably is,
considering that t2b is just something I hacked together yesterday.

Also, it seems like Kaitai is for parsing structures, whereas t2b is for
emitting data.

------
iffycan
If you use the prebuilt binaries, does the GPL apply?

------
z29LiTp5qUC30n
Not as good as [https://github.com/oriansj/mescc-tools](https://github.com/oriansj/mescc-tools), but certainly a good start.

