Show HN: Generate high-res images of code samples with Chrome Headless (github.com)
93 points by mplewis 8 months ago | 89 comments

What's wrong with pygments? I've been using it to get highlighted code as latex, html, whatever, since forever. I'm all for reinventing deficient wheels, but is there anything wrong with this?

    pygmentize -f png -l python -o test.png test.py

Spinning up a browser for it seems like an odd choice. At least the author acknowledges that.

100% agreed.

Just fwiw, apparently (i.e., I discovered in the last 10 mins) pygmentize detects file extensions, so

    $ pygmentize -o test.png test.py

is equivalent.

Hey, thanks for the suggestion. I've used Pygments in the past and didn't even think to check if it had a PNG output mode.

Here's an example of Pygments C++ output: http://imgur.com/a/7TbMb

And the same file through Prism.js + Puppeteer: https://github.com/mplewis/src2png/blob/master/docs/arduino....

Pygments does a great job of generating detailed PNGs. The default settings aren't quite what I'm looking for, but they look like they could be tweaked to increase the resolution and change the color scheme.

I'm embedding my examples in a Keynote deck, so I'm picky about font family, size, resolution, etc. I bet I could get similar results by writing a custom script for Pygments. Thanks for sharing!

If you're copying code into a Keynote presentation, you can use Pygments' RTF output mode, copy it to the clipboard, and just paste it into a text box in Keynote. That way, because it's actual text, it's smooth regardless of output resolution. Here's an example I just pulled from my shell history:

    pygmentize -f rtf -O "fontface=Fira Code,fontsize=26" -l prolog familyinteract.pl | pbcopy

(That was for pasting code into Pages; for Keynote you'll probably want a bigger font size.)
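
If the font size needs to change from slide to slide, the same pipeline can be wrapped in a small script. This is only a sketch: the function names are made up for illustration, and it assumes `pygmentize` and the macOS `pbcopy` tool are on the PATH. The flags are the ones from the command above.

```python
import subprocess


def rtf_command(src, lexer, fontface="Fira Code", fontsize=26):
    """Build the pygmentize call that renders src as RTF (hypothetical helper)."""
    options = f"fontface={fontface},fontsize={fontsize}"
    return ["pygmentize", "-f", "rtf", "-O", options, "-l", lexer, src]


def copy_highlighted(src, lexer, **style):
    """Render src to RTF and hand the bytes to pbcopy (macOS only)."""
    rtf = subprocess.run(
        rtf_command(src, lexer, **style), check=True, capture_output=True
    ).stdout
    # pbcopy reads the data to place on the clipboard from stdin.
    subprocess.run(["pbcopy"], input=rtf, check=True)
```

For example, `copy_highlighted("familyinteract.pl", "prolog", fontsize=40)` would put a larger rendering on the clipboard for Keynote.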

When the only tool you have is a hammer...

The interesting bit here would be to tie pygmentize into git. I want to be able to generate all the code samples for a deck in one go even as I make changes to the underlying files. At some point this breaks down but managing line numbers for simple changes shouldn't be required.

Couldn't you just write a script that does the pygmentize, and alias a git command to it?
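
A rough sketch of what such a script might look like, assuming `git` and `pygmentize` are both on the PATH. The function names and the output naming scheme are invented for illustration; the `-f png` and `-o` flags are the ones quoted earlier in the thread.

```python
import subprocess
from pathlib import Path


def pygmentize_command(src: Path, out_dir: Path) -> list[str]:
    """Build the pygmentize call that renders one source file to a PNG."""
    # Flatten the path so files with the same name in different
    # directories do not overwrite each other's images.
    out = out_dir / ("_".join(src.parts) + ".png")
    # No -l flag: pygmentize picks the lexer from the file extension.
    return ["pygmentize", "-f", "png", "-o", str(out), str(src)]


def tracked_files(pattern: str) -> list[Path]:
    """List files known to git that match a pathspec such as '*.py'."""
    result = subprocess.run(
        ["git", "ls-files", "--", pattern],
        check=True, capture_output=True, text=True,
    )
    return [Path(line) for line in result.stdout.splitlines() if line]


def render_all(pattern: str, out_dir: Path) -> None:
    """Render every tracked file matching the pattern into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in tracked_files(pattern):
        subprocess.run(pygmentize_command(src, out_dir), check=True)
```

Saved under a name like `render_snippets.py` (hypothetical) with a one-line call to `render_all("*.py", Path("slides"))` at the bottom, it could be hooked up as a git alias with something like `git config alias.snippets '!python render_snippets.py'`. It regenerates whole files only, so it does not address the line-number bookkeeping mentioned above.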

If I want to scratch my left ear, I would use my left hand. Why in the world would I do it with my right foot?

What happened to the simple screenshot?

This seems to be the flow:

    +----------+        +----------+          +-------------+
    |          |        |          |          |             |
    |   NodeJS +------->+   POI    +--------->+ Puppeteer   +----+
    |          |        |          |          |             |    |
    +----------+        +----------+          +-------------+    |
    +----------+        +----------+          +-------------+    |
    |          |        |          |          |             |    |
    |   NodeJS <--------+   POI    <----------+ Puppeteer   +<---+
    |          |        |          |          |             |
    +----------+        +----------+          +-------------+

* drawn with http://asciiflow.com/

I can't tell if you're trying to be funny with your usage of asciiflow to render something you could've rendered like this: NodeJS -> POI -> Puppeteer -> Puppeteer -> POI -> NodeJS

I wanted to automate a lot of screenshots of highlighted code at once, so I wrote a script to take input files and create output files. This was the first workflow that worked for me, so I went with it.

In the future I'd like to simplify the workflow a lot, including cutting out the dev server.

Automation, I don't think anyone is suggesting you should do this by hand.

Oh my god, asciiflow.com is great. Thanks for sharing.

The next step is to start a virtual machine to run NodeJS to "improve" on this workflow.

Nah, this is perfect for containers. You could use kubernetes to manage it all.

For true webscale generation use AWS autoscaling. Render tens of screenshots in a matter of minutes!

It's all serverless now

I'm glad the dev included this summary:

  Oh god, this is horrifying. You have built a monster and it is made of JavaScript.
  Yes it is. Yes I have.

Nice work. I certainly think this is a nice pet project where the author learned a ton; however, this looks too convoluted for my personal taste.

In my case, I use an Alfred workflow that formats the code snippet and then I can just paste the formatted code into Keynote/Powerpoint. It supports several color themes and it's super fast https://github.com/importre/alfred-hl

There's also highlight, which is available in most Linux distributions and via Homebrew for OS X, and can output RTF and a host of other formats. Piping its output to pbcopy will even put that RTF on the OS X clipboard.

Vim can do this

1. Open your script/code in the Vim editor

2. Enable syntax & set the required color scheme

    :syntax on
    :colorscheme darkblue

3. Print the file in PS file format

    :hardcopy >/tmp/filename.ps

4. Convert the PS file to PDF format

    $ ps2pdf /tmp/filename.ps

5. Now you can open filename.pdf
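
The steps above could also be run non-interactively. The sketch below is an assumption-laden illustration, not part of the original recipe: the function names are hypothetical, it assumes a Vim build with PostScript printing support plus `ps2pdf` on the PATH, and it assumes the file path contains no spaces.

```python
import subprocess
from pathlib import Path


def hardcopy_commands(src: Path, colorscheme: str = "darkblue") -> list[list[str]]:
    """Return the Vim and ps2pdf invocations for one source file."""
    ps = src.with_suffix(".ps")
    pdf = src.with_suffix(".pdf")
    vim = [
        "vim",
        "-c", "syntax on",                   # step 2: enable highlighting
        "-c", f"colorscheme {colorscheme}",  # step 2: pick the color scheme
        "-c", f"hardcopy > {ps}",            # step 3: print to a PostScript file
        "-c", "quit",                        # leave Vim so the script continues
        str(src),
    ]
    # step 4: convert the PostScript file to PDF
    return [vim, ["ps2pdf", str(ps), str(pdf)]]


def render_pdf(src: Path) -> None:
    """Run both commands in order, stopping at the first failure."""
    for cmd in hardcopy_commands(src):
        subprocess.run(cmd, check=True)
```

Calling `render_pdf(Path("filename.py"))` would leave `filename.ps` and `filename.pdf` next to the source file.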

You can also do :TOhtml, then open the html file with your browser and print to pdf. A bit simpler imo.

Or convert the ".ps" file into a ".jpeg" instead?

4. Using Ghostscript:

    gs -sDEVICE=jpeg -dJPEGQ=100 -dNOPAUSE -dBATCH -dSAFER -r300 -sOutputFile=filename.jpg filename.ps

On a Mac you can use pstopdf (or just open the PS file in Preview).

Hmm. I would just usually screenshot the code snippet in my IDE. Seems like an easier solution.

However, congrats on assembling your "Lovecraftian amalgamation of software." :)

Thanks! I'm not super happy with the big pile of JS I now own, but this did save me a bunch of time I used to spend setting up Atom, picking the right colors, zooming in, loading all the text files...

I'm at a loss here, why do people need images of syntax-highlighted code? Text inside images cannot be easily copied.

Nice work. But I'm not looking forward to trying to select text only to drag an image

Thanks! I built this for Keynote presentations specifically. I wanted to be able to write code samples as snippet files, then generate a batch of images I can drop into my presentations.

If you're posting code on a website, you're right – you should prefer <pre><code> and prism.js or highlight.js :)

I like the idea too, but :) would Chrome Headless be capable of producing SVG somehow? Aside from usability, I'm thinking about the indexability/SEO aspect of the material. SVG with text would be excellent for online presentations without a lecturer, PDFs, and books, and might even be partially editable.

I agree, that would also benefit me enough to install a JS monster on my computer, if I could generate SVG or PDF from highlighted code. My current solution is taking Screenpresso screenshots.

My first thought was publication-ready pseudocode or code snippets for posters

This would be nice, however, if you were making a printed document like a book.

I've been doing the following for my keynote slides.

   1. Write in Emacs org-mode code blocks
   2. Export to HTML (requires htmlize)
   3. Copy and paste the HTML from Safari to Keynote

The Emacs theme determines the syntax highlighting. Org-mode controls the HTML export (font family, font size, line spacing), with optional line numbers or output from executions. Copy-pasting from Safari to Keynote preserves all the formatting.

Does anyone know a tool that can help us draw simple diagrams like the one in the README? https://github.com/mplewis/src2png/blob/master/docs/foreach_...

`mscgen` [1] was a popular one back in the day. I think `graphviz` can do it too. Since sequence diagrams are part of UML, most UML tools can do those.

[1] https://en.wikipedia.org/wiki/MscGen

`mscgen` is still my go-to tool for message sequence charts

I tend to like https://www.draw.io/ for that sort of thing (there is a chrome app if you want to run offline).

LucidChart (https://www.lucidchart.com/) is better than draw.io IMO

Close! I started with that tool, but moved to the FOSS version: https://bramp.github.io/js-sequence-diagrams/

Projects like this are always interesting, although not for the problem they try to solve (I ain't downloading >500mb of deps just to take a screenshot of code) but for discovering what the trendiest technologies are right now.

Thanks to this I've learnt about poi, yarn, the Fira Code font...

I've been using Fira Code for a while. I think it's beautiful, but my colleagues sometimes wonder if I'm really showing them real code right now, because the ligatures look like it was rendered for display.

Why "Show HN" but not publish to NPM? Seems neat, just publish it.

Also, ever try this technique? https://gist.github.com/jimbojsb/1630790

I've used it with success before.

Thank you! I don't want to publish it yet because I want to clean it up a lot first – ditch the dev server, clean up the build pipeline, add some actual CLI arguments, etc.

I'll check out that app on brew, it looks like a cool approach. Thanks!

You can just publish a 0.1, 0.2, etc. Then when you are ready just publish a 1.0 version.

Publish then iterate!

I would like a way to generate a syntax highlighted pdf from either a git repo or a PR, for printing purposes. This seems like it could be adapted in that direction.

Definitely – Puppeteer has a method to generate PDFs.

Images of code are the worst. Completely unusable.

He's using them for a Keynote presentation. I think it's probably okay.

Not really. For instance, if you put the PDF file of the presentation online, then search engines will not be able to index this code easily, humans will not be able to cut and paste the code easily, etc.

Protip: Copy syntax highlighted code from VSCode and paste it straight into MS Word. Retains coloring and formatting, looks great.

Wouldn't it be better as a PDF so that it can be selected? But somehow still retain the color / formatting.

My workflow:

One time setup

    $ python -m venv ve
    $ source ve/bin/activate
    $ pip install pygments WeasyPrint

    $ pygmentize -f html src.py | weasyprint - out.pdf

Both pygmentize and weasyprint have many options to play with. I find a set and create a shell script.

This also does pngs, etc:

    $ pygmentize -f html src.py | weasyprint - out.png

Can’t you do it directly using Pygments’ ImageFormatter?


Apparently so! I don't generate images myself, so I didn't know.

    $ pygmentize -o out.png src.py 

I use vim for writing out html:

I found it to create the prettiest syntax-highlighted code. But I'm biased because it produces HTML using my Vim color scheme, which I'm partial to :)

You can script vim too by specifying a list of commands in a file then running it:

  $ vim -S myscript some_file

I built this to generate PNGs to drop into a Keynote presentation, so PDFs don't quite work for my use case.

It seems like it should be pretty easy to print a PDF from Puppeteer though – they expose a #pdf method.


Just curious why you'd want to use code screenshots vs text in a presentation. Did it have to do with missing fonts or something like that?

Not OP, but syntax highlighting and other niceties are probably going to disappear within most presentation software.

Yep – I can't copy my code out of Atom and keep its highlighting in Keynote, unfortunately.

sorry for the minirant

This is something I find really frustrating in computing. We can do and are constantly doing such awesome and incredible things with computers. But at the same time transferring (copy-pasting) text from one application to another remains a challenge. And a reasonable solution for that is to bring in the 600lb gorilla that is a modern web browser to render fixed-width text into a bitmap so that it can be embedded into a presentation. It just feels so wrong, like the whole field is actually rotten inside under the shiny surface, while the state of the art blazes forward a thousand miles away from the real world. I, as a member of this community, feel utterly powerless to actually make the situation noticeably better, partly because so much relies on interoperability, and partly because of the massive inertia that modern software carries.

I want out.

Clipboards support storing different content for different MIME types at the same time. Atom would have to put text/html (or some other colored markup) in addition to text/plain into the clipboard and then Keynote would have to know what to insert. The infrastructure for interoperability is there, the endpoints just need to actually support it.

EDIT: Apparently, the problem is (or was in 2014) that Keynote doesn't support HTML and Chrome (Atom is Electron-based, right?) doesn't support RTF. Well worth a read: https://apple.stackexchange.com/a/124167

Using Windows or OS X clipboard works perfectly fine, the author just uses applications that don't work as they should together.

I can easily do the same with Notepad++ and Powerpoint, just need to copy-paste in RTF.

There is a simple solution for that, and it exists on Android.

On Android all text formatting is compatible between apps nicely, and can easily be copied between apps.

Why not just do a screenshot of the code from Atom?

That doesn't scale well beyond doing a handful of them, I would think.

For code heavy presentations, I've seen a few people use Jupyter notebooks to solve this problem. Then just intermingle the rest of the presentation as markdown cells around the code cells.

Exactly, and to get the screenshot exactly as I want takes a couple minutes of setup – pick a theme, pick a font size, resize window – plus a few more if I have to stitch scrolled pages together.

You can embed PDFs in Keynote presentations.

You can use source-highlight to go to docbook or latex(color), and from either of those, can directly go to PDF.

I like how people come up with new use cases for headless Chrome.

But for his use case ('presentation') there are tons of presentation libs/frameworks in HTML/JS/CSS which support code embeddings with syntax highlighting.

Same quality but much easier workflow since code stays editable.

You assume he uses HTML/JS/CSS for presentation. What if it's Keynote/Powerpoint?

If he presents code, then an HTML/CSS/JS system should be preferred, since syntax highlighting is easier. That's something you can't get done automatically with the alternatives you mentioned.

Interesting idea! I'm not sure about the implementation though. I mean, couldn't you just skip the whole Poi/live-reloading steps and just output an HTML file with some <script> and <style> tags? Why bother with the live-reloading/Poi complexity?

Batch-shotting without reloading chrome each time?

This is a step backward from my current workflow which uses vector graphics.

I'd love to see your current workflow if you're willing to share!

Sorry, I can't easily do that. But I can say it uses LaTeX under the hood.

Seems to want to use Fira Code for some reason. I'd expect it to let the user choose which font should be used but, if not, at least don't require one with those ridiculous ligatures.

Why are they ridiculous?

You might want to check sharp to remove the imagemagick dependency, I'm pretty sure sharp supports edge trimming in one way or another. Great project!

Thanks! I will check out sharp.

I will just leave it here: http://instaco.de/

An over-engineered solution for what is super easy with Powerpoint and Notepad++.

Yeah, I was thinking the same. His potential solutions section is missing:

* Insert in Text-Editor or IDE with Syntax highlighting
* Take a screengrab

I mean it's still nice in terms of being able to script the generation if you have code that updates frequently and you want to keep the pictures current.

I've used Emacs' htmlize to good effect in the past.
