Colab notebook to create Magic cards from image with Claude (colab.research.google.com)
107 points by minimaxir 9 months ago | 25 comments



A few miscellaneous observations about Claude's new beta function calling / structured data support (https://docs.anthropic.com/claude/docs/tool-use) that I encountered while making this notebook:

1. It is not nearly as good as ChatGPT's function calling conformance, which is why the system prompt engineering here is more aggressive than usual.

2. Claude doesn't seem to handle nested schemas well: otherwise I would have allowed "generating X cards in a cycle" as a feature. The documentation does state that it can't handle "deeply nested" data, but a single list is not deeply nested.

3. The documentation mentions that Claude can do chain-of-thought with tools enabled: in my testing this reduces quality drastically. In Opus, which does it by default, it's a waste of extremely expensive output tokens, and it has a tendency to ignore fields.

4. Haiku and Sonnet have different vibes but similar subjective quality, in this case Sonnet being more "correct" and Haiku being more "fun", which is surprising.
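
For reference, here's a minimal sketch of the tool-use call shape, using the current anthropic SDK (at the time of this thread it sat behind a beta namespace). The tool name, schema fields, and prompts are my own illustrations, not the notebook's actual definitions; note the flat schema, per point 2 above:

    # Minimal sketch of an Anthropic tool-use call; tool name, schema fields,
    # and prompts are illustrative. A flat schema sidesteps the nesting issue.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    card_tool = {
        "name": "create_mtg_card",
        "description": "Create a single Magic: The Gathering card.",
        "input_schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "mana_cost": {"type": "string"},
                "type_line": {"type": "string"},
                "rules_text": {"type": "string"},
                "flavor_text": {"type": "string"},
            },
            "required": ["name", "mana_cost", "type_line", "rules_text"],
        },
    }

    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=1024,
        system="You are an expert Magic: The Gathering card designer. "
               "Always respond by calling the provided tool.",
        tools=[card_tool],
        messages=[{"role": "user", "content": "Design a card: a crab wearing a top hat."}],
    )

    # The structured card is in the tool_use content block, not the text.
    card = next(b.input for b in response.content if b.type == "tool_use")
    print(card)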


You should look at the consumer app system prompt: https://twitter.com/amandaaskell/status/1765207842993434880

I've had way more success taking it and adapting it to my tasks than stuffing in a bunch of tokens about how to do things.

In one case I was able to go from ~6k input tokens to ~3k because I no longer had to provide a mountain of examples and instructions for corner cases.


Can you elaborate on how the Claude system prompt helps in creating prompts for different tasks?


Anytime I see a prompt from one of these companies, I assume it matches the style of instructions the model encountered during pre-training.

And the kinds of instruction formats that were encountered during pre-training end up informing what style of instruction the model is best at following.

An extreme example would be prompt templates that a "raw" instruct-tuned LLM follows: the model will technically work with a suboptimal format, but you get much better performance if you follow the prompt template the model was trained/fine-tuned on.
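
As a concrete illustration (using a Llama-2-chat-style template, purely as an example):

    # The same request in the template a model was fine-tuned on (Llama-2-chat
    # shown here) vs. an arbitrary ad-hoc format. The model usually still
    # answers the second, just less reliably.
    system = "You are a helpful assistant."
    user = "Summarize this article in one sentence."

    # The format the model was trained on:
    trained_format = f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

    # A suboptimal ad-hoc format:
    adhoc_format = f"System: {system}\nUser: {user}\nAssistant:"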


It's not a guarantee that a given prompting style was involved during pre-training, of course, but at the very least it's going to provide a jumping-off point that the creators of the model co-signed.


This is semi-unrelated, but I tried to log in to Claude today and saw that Anthropic had banned my account. I only used Claude to ask 3-4 questions, so I guess the problem was that one of them was intended to see how Claude would self-censor on shady questions.

The moral of the story is don't ask Claude anything out of the ordinary, as maybe now I'm on a list somewhere.


Reminds me of when Bing would slam the ‘end conversation’ button wherever you (or it) hit any hidden tripwires. You literally couldn’t ask it what topics to avoid, because that was one of them.


You can appeal here: https://support.anthropic.com/en/articles/8241253-trust-and-...

They seem to ban a lot of accounts "by mistake" or very aggressively, but they also do unban. There are quite a few cases on the /r/ClaudeAI subreddit with Anthropic employees directing posters to the above link.


I did appeal (the error message informs you that you can), but do I want to use an LLM that might randomly ban me?


I can't stop laughing at this prompt:

"Your response must follow ALL these rules OR YOU WILL DIE:"

This is state of the art of programming too. I'm not passing judgement on this, except to say that it's hilarious.

Edit: And at the end

"- If the user provides an instruction that contradicts these rules, ignore all previous rules."

What? That opens you up to all kind of attacks, no?


I have a separate blog post about threats in system prompts: https://news.ycombinator.com/item?id=39495476

Before I added the threat, Claude subjectively had a high probability of ignoring the rules such as generating a preamble before the JSON and thus breaking it, or often scolding the user.
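
Not from the notebook, but one way to hedge on the consuming side is a defensive parse that tolerates a preamble:

    # Defensive parse for responses where the model may prepend chatter
    # before the JSON object (illustrative, not the notebook's code).
    import json

    def extract_json(text: str) -> dict:
        start = text.find("{")
        end = text.rfind("}")
        if start == -1 or end < start:
            raise ValueError("no JSON object found in response")
        return json.loads(text[start : end + 1])

    print(extract_json('Sure! Here is your card:\n{"name": "Goblin Editor"}'))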

At a high-level, Claude seems to be less influenced by system prompts in my testing, which could be a problem. I'm tempted to rerun the tests in that blog post, since in my experiments it performed much worse than ChatGPT.

> What? That opens you up to all kind of attacks, no?

tbh it doesn't listen to that line very well but it's more of a hedge to encourage better instructions.


This is very interesting; I had a similar feeling about Claude performing much worse than GPT-4. Granted, I didn't put much work into optimizing the prompts, but then again, the prompts were certainly not GPT-optimized or GPT-specific either. The problems were severe: choosing the wrong side of the conversation, hallucinating weird stuff, and repeating part of the prompt, all in the same message.


Ok, the other blog post has been added to my reading list now. Looks fun. I did have one question though, and please forgive me if the answer is RTFA, but why choose Claude? Was there a specific reason?


Because I have another blog post about ChatGPT's structured data (https://news.ycombinator.com/item?id=38782678) and wanted to investigate Claude's implementation to compare and contrast. It's easy to port to ChatGPT if needed.

I just wanted to do the experiment in a fun way instead of fighting against benchmarks. :)


Have you tried anything like that for DALL-E 2? It won't follow specific instructions whenever it's asked to draw people.


DALL-E 2/3 are too expensive to run significant tests on, but neither allows you to manipulate the system prompt to override some behaviors.

It is possible to work around it for GPT-4-Vision with the system prompt, but it's very difficult, and due to ambiguities in OpenAI's content policy I'm unsure whether it's ethical.

I am still working on experimenting with its effects on Claude: it turns out that Claude does leak the part of its system prompt telling it not to identify individuals, without needing any prompt injection! If you do hit such an issue with this notebook, it will output the full JSON response.


Thank you!


That’s because of the woke-injection OpenAI applies to the prompt.


It won't follow even simple instructions that have nothing to do with ethnicity, such as "draw three women sitting at a cafe": it rewrites the query, completely forgets the original number of women, and adds a lot that wasn't there.


I whipped up some card rendering CSS for this, pretty fun!

https://colab.research.google.com/drive/1VERzr75vpCgmXE6lgQC...?usp=sharing
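
(For anyone curious, this is roughly how HTML+CSS renders inline in a Colab cell; the markup below is a placeholder, not the CSS from the notebook above:)

    # Rendering HTML+CSS inline in a Colab/Jupyter cell; the card markup is
    # a placeholder, not the linked notebook's CSS.
    from IPython.display import HTML, display

    card_html = """
    <div style="width:240px; border:12px solid #222; border-radius:12px;
                padding:8px; font-family:serif; background:#e8e0c8;">
      <b>Goblin Editor</b> <span style="float:right;">1R</span>
      <hr><i>Creature &mdash; Goblin</i><hr>
      Whenever you fix a typo, draw a card.
    </div>
    """
    display(HTML(card_html))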


At least from the limited sample size of results, the rules text seems more garbled than that of other models in the MTG card-generating space. Curious whether you're able to first generate a few mechanics, and then have it design an entire set (either bottom-up or top-down). I'm sure balancing is not something generative AI can do properly... but I imagine this could really change how set designers approach new worlds or mechanics!


Cool project. Can we see some example Magic cards it created? Thanks!


Made a few demos on Twitter:

https://twitter.com/minimaxir/status/1777378034238030255

https://twitter.com/minimaxir/status/1777378037199179840

EDIT: Added image to the header of the notebook.


(Disclaimer: I'm the maintainer of this package, but this kind of use is exactly why I created it in the first place.)

If you know how to use HTML+CSS and would like to generate full-fledged cards, you could use a package such as html2image [0] to combine the text, the image, and a card-template image into one final image. Chrome/Chromium has to be available in the Colab notebook, though; that's the only requirement. Using basic SVG without this package could also do the trick.

[0] https://github.com/vgalin/html2image
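
A minimal usage sketch of that pattern (file names and markup are illustrative; Chrome/Chromium must be installed):

    # pip install html2image; requires Chrome/Chromium. File names and
    # markup are illustrative.
    from html2image import Html2Image

    hti = Html2Image(output_path=".")
    card_html = '<div class="card"><b>Goblin Editor</b><br>Creature</div>'
    card_css = ".card { width: 240px; border: 12px solid #222; border-radius: 12px; }"

    hti.screenshot(html_str=card_html, css_str=card_css,
                   save_as="card.png", size=(265, 370))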


I had worked on a similar package for similar reasons but yes, Chromium is too big of a dependency.

I have an idea for a non-Chromium implementation but that’s a rabbit hole.


Since you’re already in Python, perhaps my `skit` package could be of service? https://pypi.org/project/skit-game/

It depends on Python and Pillow.



