It's available on IBM WatsonX, but the Prompt Lab may still report "model unavailable". This is caused by overeager guardrails. They can be turned off, but the German translation of that option is broken too: look for "KI-Guardrails auf" in the upper right.
Setting up AWS so you can try it via the Amazon Bedrock API is a hassle, so I made a step-by-step guide: https://ndurner.github.io/amazon-nova. It's 14+ steps!
This is a guide for the casual observer who wants to try things out, given that getting started with other AI platforms is so much more straightforward. It's all open source and transparently hosted, which should address any remaining concerns of someone interested in exactly that.
The most common way for an AWS account to be hacked, by far, is mishandling of AWS IAM user credentials. AWS has even gone so far as to provide multiple warnings in the AWS console that you should never create long-lived IAM user credentials unless you really need to do so and really know what you are doing (aka not a “casual observer who wants to try things out”).
This blog post encourages you to do this known dangerous thing, instructs you to bypass these warnings, and then paste these credentials into an untrusted app that is made up of 1000+ lines of code. Yes, the 1000+ lines of code are available for a security audit, but let’s be real: the “casual observer who wants to try things out” is not going to actually review all (if any) of the code, and likely not even realize they should review it.
Kudos to you for wanting to be helpful, but the instructions in this blog ("do this dangerous thing, but trust me it's okay, and then do this other dangerous thing, but trust me it's okay") are exactly what nefarious actors would ask of unsuspecting victims, too, and following such blog posts is a practice that should not be encouraged.
Sharing your IAM credentials is like sharing your password. Just don't do it, regardless of the intentions. Even if this one doesn't steal anything, it sets a precedent that makes people think it's OK and leaves them easier targets in the future. Besides, Bedrock already has a console, so what's the point of using your UI?
If you're already in the AWS ecosystem or have worked in it, it's no problem. If you're used to "make OpenAI account, add credit card, copy/paste API key", it can be a bit daunting.
AWS does not use the exact same authn/authz/identity model or terminology as other providers, and for people familiar with other models, it's pretty non-trivial to adapt to. I recently posted a rant about this to https://www.reddit.com/r/aws/comments/1geczoz/the_aws_iam_id...
Personally I am more familiar with directly using API keys or auth tokens than AWS's IAM users (which are more similar to what I'd call "service accounts").
If you're looking for a generative AI model API only, I think Nova is not for you. If you want to build that capability into your cloud application, it follows exactly the model you expect and already have: you just add a new policy/role/whatever for whichever piece of it is going to use Nova.
Setting up Azure LLM access is a similarly hellish process. I learned after several days that I had to look at the actual endpoint URL to determine how to set the "deployment name", "version", etc.
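For illustration, this is the kind of URL dissection I mean - the resource and deployment names below are hypothetical, but the path layout matches Azure OpenAI endpoints:

    from urllib.parse import urlparse, parse_qs

    # Hypothetical endpoint: the path segment after /deployments/ is the
    # "deployment name", the query string carries the API "version".
    url = ("https://my-resource.openai.azure.com/openai/deployments/"
           "my-gpt4-deployment/chat/completions?api-version=2024-02-01")

    parsed = urlparse(url)
    deployment = parsed.path.split("/deployments/")[1].split("/")[0]
    api_version = parse_qs(parsed.query)["api-version"][0]
    print(deployment, api_version)  # -> my-gpt4-deployment 2024-02-01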
Nice! FWIW, the only Nova model I see in the Hugging Face Space is us.amazon.nova-pro-v1:0. I cloned the repo and added the other Nova options in my clone, but you might want to add them to yours. (I would do a PR, but... I'm lazy and it's a trivial PR :-)).
I'm so confused about the value prop of Bedrock. It seems like it wants to be guardrails for implementing RAG with popular models, but it's not the least bit intuitive. Is it actually better than setting up a custom pipeline?
The value I get is:
1) one platform, largely one API, several models (minimal sketch below),
2) includes Claude 3.5 "unlimited" pay-as-you-go,
3) part of our corporate infra (SSO, billing, ... corporate discussions are easier to have)
I'm using none to very little of the functionality they have added recently: not interested in RAG, not interested in Guardrails. Just Claude access, basically.
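To make the "one API, several models" point concrete, here's a minimal boto3 sketch using the Bedrock Converse API - the region and the exact model IDs are assumptions that vary by account:

    import boto3

    # Assumes model access has already been granted in this region
    client = boto3.client("bedrock-runtime", region_name="us-east-1")

    def ask(model_id: str, prompt: str) -> str:
        # Same call shape regardless of the model vendor behind it
        response = client.converse(
            modelId=model_id,
            messages=[{"role": "user", "content": [{"text": prompt}]}],
        )
        return response["output"]["message"]["content"][0]["text"]

    # Swap models without touching the calling code:
    for model_id in ("anthropic.claude-3-5-sonnet-20240620-v1:0",
                     "us.amazon.nova-pro-v1:0"):
        print(ask(model_id, "Say hello in one sentence."))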
However, I found that Whisper is thrown off by background music in a podcast - and will not recover. (That was with the mlx-community/whisper-large-v3-mlx checkpoint; OP uses distil-whisper-large-v3.) I concluded for myself that Whisper might be meant to be used in larger processing pipelines that handle this - can someone provide insights about that? The podcast I tried it on was https://www.heise.de/news/KI-Update-Deep-Dive-Was-taugen-KI-....
I use a noise filter pass (really just https://github.com/richardpl/arnndn-models/blob/master/bd.rn... and some speech band filtering after) before doing any processing in Whisper. It's worked well for me on dirty audio (music in the background, environmental noise, etc.). When there is music, you either almost can't hear it at all, or you'll only hear particularly clear parts featuring singing.
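For reference, a minimal sketch of that pre-pass as I understand it - the model filename, cutoff frequencies, and file names are my assumptions, not the commenter's exact settings:

    import subprocess

    # arnndn applies an RNNoise-style model (here: bd.rnnn from the repo
    # linked above - filename assumed); highpass/lowpass roughly isolate
    # the speech band before handing the audio to Whisper.
    subprocess.run([
        "ffmpeg", "-i", "podcast.mp3",
        "-af", "arnndn=m=bd.rnnn,highpass=f=100,lowpass=f=4000",
        "-ar", "16000", "-ac", "1",  # 16 kHz mono, as Whisper expects
        "clean.wav",
    ], check=True)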
I use this trick for book announcements on Amazon: some ambitiously announced books never actually get released, so I am not a fan of buying before the release. With LLM support, I can quickly add the release date given by Amazon to my calendar. The file download feature of my Workbenches helps with that: https://ndurner.github.io/chatbots-update
Some paid Slack plans allow exports from their platform. Inside the ZIP, there will be a JSON file that lists the channels - channels.json, if I am not mistaken. However, I don't know about "categories" - this could be client-side configuration that is not actually part of the export. If it's not, I would perhaps take screenshots (overlapping is OK) and feed them to some LLM/VLM for extraction. I am confident that GPT-4o or Claude 3.5 Sonnet will work, and Gemini 1.5 Pro should also work.
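If you end up with such an export, a minimal sketch for pulling the channel list out of the ZIP (file names assumed):

    import json
    import zipfile

    # "export.zip" is a placeholder for the downloaded Slack export archive
    with zipfile.ZipFile("export.zip") as zf:
        channels = json.loads(zf.read("channels.json"))

    for channel in channels:
        print(channel["name"])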
With gpt-4-32k viewed as deprecated and generally only available through Microsoft Azure until mid next year, this development may be reassuring to some users. Tentative pricing from the website: In: $6.00 / 1M tokens, Out: $18.00 / 1M tokens.
Different approach, which also works with Llama 3.0 8B: have the LLM write the tool invocation of choice in Python syntax, then parse the LLM response with Python's ast module.
Code (a PoC-quality hackathon demo) here: https://github.com/ndurner/aileen2.
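The core idea, as a minimal sketch (not lifted from the repo - the function and argument names are made up):

    import ast

    def parse_tool_call(llm_output: str):
        """Parse a single call expression like get_weather(city="Bonn")."""
        tree = ast.parse(llm_output.strip(), mode="eval")
        call = tree.body
        if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
            raise ValueError("LLM response is not a plain function call")
        # literal_eval accepts only constants, so no LLM code gets executed
        args = [ast.literal_eval(a) for a in call.args]
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
        return call.func.id, args, kwargs

    print(parse_tool_call('get_weather(city="Bonn", unit="C")'))
    # -> ('get_weather', [], {'city': 'Bonn', 'unit': 'C'})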
However, commenters around here noted that these have likely not been fine-tuned to correlate with accuracy - at least for plain-text LLM uses. Would be interested in hearing findings for MLLM use-cases!