This is a great idea! There's an exceedingly large amount of junk in a typical H...

leroman · 2024-07-23T19:59:26 1721764766

This is some great feedback, thanks!

1. Some links are crazy, with lots of arguments and tracking parameters in them, so they get very long. Refification turns them into a numbered "ref[n]" scheme, and you also get a map of ref[n] -> URL for the reverse translation. In my experience it saves a lot. It's also optional, so you can choose when to use this feature.
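Here is a rough sketch of the refification idea, written in TypeScript for illustration. The function names (`refifyLinks`, `unrefifyLinks`), the 0-based `ref[n]` numbering, and the choice to rewrite only inline `](url)` link targets are assumptions, not the library's actual implementation. Repeated URLs reuse the same ref, and the returned map is what makes the reverse translation possible.

```typescript
// Hypothetical sketch: shorten long link targets to "ref[n]" placeholders
// and keep a map so the original URLs can be restored later.
// Names and numbering are illustrative, not the library's API.

type RefMap = Map<string, string>; // "ref[n]" -> original URL

function refifyLinks(markdown: string): { text: string; refs: RefMap } {
  const refs: RefMap = new Map();
  const seen = new Map<string, string>(); // URL -> "ref[n]", so duplicates share one ref
  // Match the target of inline Markdown links/images without a title: ](url)
  const text = markdown.replace(/\]\(([^)\s]+)\)/g, (_match, url: string) => {
    let ref = seen.get(url);
    if (ref === undefined) {
      ref = `ref[${refs.size}]`;
      refs.set(ref, url);
      seen.set(url, ref);
    }
    return `](${ref})`;
  });
  return { text, refs };
}

// Reverse translation: swap each known "ref[n]" back to its URL.
function unrefifyLinks(text: string, refs: RefMap): string {
  return text.replace(/\]\((ref\[\d+\])\)/g, (match, ref: string) => {
    const url = refs.get(ref);
    return url === undefined ? match : `](${url})`;
  });
}
```

Links with a title (`[a](url "title")`) are left untouched by this regex; a fuller version would likely operate on the AST rather than on Markdown text.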

2. I tried to keep it domain-specific (not reinventing HTML), so it's mostly Markdown components, with some flexibility to add HTML elements (img, footer, etc.).

3. I'm not sold on replacing the switch; it's very useful there because of the many fall-through cases. I find it maintainable, but if you can point me to a specific issue there, that would help.
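To illustrate the fall-through argument, here is a hypothetical tag-to-node mapping (the `NodeKind` names and the tag groupings are invented for this sketch, not taken from the library). Stacking `case` labels groups HTML aliases that produce the same Markdown node in one place, whereas a flat lookup table would need one key per tag.

```typescript
// Hypothetical node kinds; the library's real ones may differ.
type NodeKind = "heading" | "bold" | "italic" | "code" | "list" | "container" | "text";

// Empty stacked cases fall through to the shared return, so aliases stay grouped.
function kindForTag(tagName: string): NodeKind {
  switch (tagName.toLowerCase()) {
    case "h1":
    case "h2":
    case "h3":
    case "h4":
    case "h5":
    case "h6":
      return "heading";
    case "b":
    case "strong":
      return "bold";
    case "i":
    case "em":
      return "italic";
    case "code":
    case "kbd":
    case "samp":
      return "code";
    case "ul":
    case "ol":
      return "list";
    case "div":
    case "section":
    case "article":
    case "main":
      return "container";
    default:
      return "text"; // anything unrecognized is treated as plain text here
  }
}
```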

4. There are some built-in functions to traverse and modify the AST. At the end of the day it's just JSON, so you could leverage the types and write your own logic to process it; as long as it conforms to the format, you can always serialize it, as you mentioned.
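A minimal sketch of custom traversal over a JSON AST, assuming a simplified, hypothetical `AstNode` shape (the library's real types and its built-in helpers are not reproduced here). The callback can replace a node or drop it by returning `null`, and the result serializes with plain `JSON.stringify`.

```typescript
// Hypothetical, simplified node shape for illustration only.
interface AstNode {
  type: string;
  content?: string | AstNode[]; // text leaf or nested children
  src?: string;
}

// Depth-first transform: fn may return a replacement node, or null to drop it.
function transform(nodes: AstNode[], fn: (node: AstNode) => AstNode | null): AstNode[] {
  const out: AstNode[] = [];
  for (const node of nodes) {
    const mapped = fn(node);
    if (mapped === null) continue; // node (and its subtree) removed
    const next: AstNode = { ...mapped };
    if (Array.isArray(mapped.content)) {
      next.content = transform(mapped.content, fn); // recurse into children
    }
    out.push(next);
  }
  return out;
}
```

For example, `transform(ast, (n) => (n.type === "image" ? null : n))` strips every image node at any depth before the tree is serialized.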

5. The AST is recursive, so it's not flat. It sounds like you want to either write your own AST-to-Semantic-Markdown implementation or plug into the existing one, so I'll keep this in mind for the future.
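Since the tree is recursive, a custom AST-to-Markdown pass is naturally a recursive function. This sketch assumes a hypothetical `MdNode` shape and an `overrides` map as the plug-in point; it is one possible design, not how the existing serializer is structured.

```typescript
// Hypothetical recursive node shape for illustration.
interface MdNode {
  type: "heading" | "paragraph" | "text" | "link" | "list" | "listItem";
  level?: number; // heading level
  text?: string; // text leaf content
  href?: string; // link target
  children?: MdNode[];
}

// A renderer gets the node plus a thunk that renders its children.
type Renderer = (node: MdNode, renderChildren: () => string) => string;

const defaultRenderers: Record<MdNode["type"], Renderer> = {
  heading: (n, kids) => `${"#".repeat(n.level ?? 1)} ${kids()}\n\n`,
  paragraph: (_n, kids) => `${kids()}\n\n`,
  text: (n) => n.text ?? "",
  link: (n, kids) => `[${kids()}](${n.href ?? ""})`,
  list: (_n, kids) => `${kids()}\n`,
  listItem: (_n, kids) => `- ${kids()}\n`,
};

// Walk the tree; an override for a node type takes precedence over the default.
function render(
  nodes: MdNode[],
  overrides: Partial<Record<MdNode["type"], Renderer>> = {},
): string {
  return nodes
    .map((node) => {
      const fn = overrides[node.type] ?? defaultRenderers[node.type];
      return fn(node, () => render(node.children ?? [], overrides));
    })
    .join("");
}
```

Passing `{ link: (_n, kids) => kids() }` as overrides, for instance, keeps link text but drops the URLs, without touching any other node type.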

6. Sounds cool but out of scope at the moment :)

7. Would this feature help with scraping, i.e. pointing the LLM at a specific element? The part I'm missing is how you would code this in advance. There could be some metadata tag you add that gets carried through the pipeline and attached to the generated elements on the other side in some way.
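One way the metadata idea could look, purely as a sketch: a hypothetical `data-llm-*` attribute convention is read off the source element, carried on the generated node as `meta`, and surfaced on the output side as an HTML comment. `SourceElement` is a stand-in for a DOM element so the example runs outside a browser; none of these names come from the library.

```typescript
// Minimal stand-in for a DOM element (hypothetical).
interface SourceElement {
  tagName: string;
  attributes: Record<string, string>;
  text: string;
}

interface OutNode {
  type: string;
  content: string;
  meta?: Record<string, string>; // metadata carried through the pipeline
}

const META_PREFIX = "data-llm-"; // hypothetical attribute convention

// Input side: copy any data-llm-* attributes onto the generated node.
function toNode(el: SourceElement): OutNode {
  const meta: Record<string, string> = {};
  for (const name of Object.keys(el.attributes)) {
    if (name.startsWith(META_PREFIX)) {
      meta[name.slice(META_PREFIX.length)] = el.attributes[name];
    }
  }
  const node: OutNode = { type: el.tagName.toLowerCase(), content: el.text };
  if (Object.keys(meta).length > 0) node.meta = meta;
  return node;
}

// Output side: emit the metadata as an HTML comment, which Markdown renderers hide.
function toMarkdown(node: OutNode): string {
  const meta = node.meta;
  if (!meta) return node.content;
  const pairs = Object.keys(meta).map((key) => `${key}=${meta[key]}`);
  return `${node.content} <!-- ${pairs.join(" ")} -->`;
}
```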