Is that why Amazon's product search is terrible? Because it's more profitable for them when I scroll through 5 pages of junk than if I can navigate immediately to the thing I want?
It’s fascinating that Amazon Web Services has so many overlapping and competing services for the same objective. Efficiency/small footprint was never their approach :D
For example, look how many different types of database they offer (many achieve the same objective but different instantiation)
As others said, the product isn't the model, it's the API-based token usage. Happily selling whatever model you need, with easy integrations from the rest of your AWS stack, is the entire point.
This is a digression, but I really wish Amazon would be more normal in their product descriptions.
Amazon is rapidly developing its own jargon such that you need to understand how Amazon talks about things (and its existing product lineup) before you can understand half of what they're saying about a new thing. The way they describe their products seems almost designed to obfuscate what they really do.
Every time they introduce something new, you have to click through several pages of announcements and docs just to ascertain what something actually is (an API, a new type of compute platform, a managed SaaS product?)
Amazontalk: We will save you costs
Human language: We will make a profit while you think you're saving costs
Amazontalk: You can build on <product name> to analyze complex documents...
Human language: There is no product, just some DIY tools.
Amazontalk: Provides the intelligence and flexibility
Human language: We will charge your credit card in multiple obscure ways, and we'll be smart about it
Yeah but even then they won't describe it using the same sort of language that everyone else developing these things does. How many parameters? What kind of corpus was it trained on? MoE, single model, or something else? Will the weights be available?
It doesn't even use the words "LLM", "multimodal" or "transformer" which are clearly the most relevant terms here... "foundation model" isn't wrong but it's also the most abstract way to describe it.
"Foundation model" is not Amazon lingo, though, but pretty standard industry term at this point. If you're doing any sort of AI in prod, you know what it means.
> How many parameters? What kind of corpus was it trained on?
It's rare for the leading model providers to answer these questions.
As someone who applies these models daily, I agree with the dead comment from meta_x_ai. Your questions are interesting/relevant to a person developing these models, but less important to the average person utilizing these models through Bedrock.
Once upon a time there were (and still are) mainframes (and SAP is similar in this respect). These insular systems came with their own tools, their own ecosystem, their own terminology, their own certifications, etc. And you could rent compute & co on them.
If you think of clouds as cross-continent mainframes, a lot more things make sense.
No audio support: The models are currently trained to process and understand video content solely based on the visual information in the video. They do not possess the capability to analyze or comprehend any audio components that are present in the video.
This is blowing my mind. gemini-1.5-flash accidentally knows how to transcribe amazingly well, but it is -very- hard to figure out how to use it well, and now Amazon comes out with a Gemini Flash-like model that explicitly ignores audio. It is so clear that multi-modal audio would be easy for these models, but it is like they are purposefully holding back releasing/supporting it. This has to be a strategic decision not to attach audio, probably because the margins on ASR are too high to give up to a cheap LLM. I can only hope Meta drops a multi-modal audio model soon to force the issue.
They also announced speech to speech and any to any models for early next year. I think you are underestimating the effort required to release 5 competitive models at the same time.
'better' is always a loaded term with ASR. Gemini 1.5 Flash can transcribe for $0.01/hour of audio and gives strong results. If you want timing and speaker info you need to use the previous version and a -lot- of prompt tweaking, or else it will hallucinate the timing info. Give it a try; it may be a lot better for your use case.
Setting up AWS so you can try it via Amazon Bedrock API is a hassle, so I made a step-by-step guide: https://ndurner.github.io/amazon-nova. It's 14+ steps!
This is a guide for the casual observer who wants to try things out, given that getting started with other AI platforms is so much more straightforward. It's all open source, with transparent hosting, catering to any remaining concerns someone interested in exactly that may have.
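For reference, once those setup steps are done, the actual call is short. Here is a minimal sketch, stdlib-only, with the boto3 call left as a comment since it needs installed credentials; the model ID is the one mentioned elsewhere in this thread, and the request shape follows Bedrock's Converse API:

```python
import json

# Model ID as listed elsewhere in the thread; treat as an assumption,
# availability depends on your region and model-access settings.
MODEL_ID = "us.amazon.nova-pro-v1:0"

# The Converse API expects a list of turns, each with a role and
# a list of content blocks.
def build_messages(prompt: str) -> list:
    return [{"role": "user", "content": [{"text": prompt}]}]

messages = build_messages("Summarize this document in one sentence.")

# The actual call needs boto3 and configured credentials, roughly:
#   client = boto3.client("bedrock-runtime", region_name="us-east-1")
#   resp = client.converse(modelId=MODEL_ID, messages=messages)
#   print(resp["output"]["message"]["content"][0]["text"])
print(json.dumps(messages, indent=2))
```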
The most common way for an AWS account to be hacked, by far, is mishandling of AWS IAM user credentials. AWS has even gone so far as to provide multiple warnings in the AWS console that you should never create long-lived IAM user credentials unless you really need to do so and really know what you are doing (aka not a “casual observer who wants to try things out”).
This blog post encourages you to do this known dangerous thing, instructs you to bypass these warnings, and then paste these credentials into an untrusted app that is made up of 1000+ lines of code. Yes, the 1000+ lines of code are available for a security audit, but let’s be real: the “casual observer who wants to try things out” is not going to actually review all (if any) of the code, and likely not even realize they should review it.
I give kudos to you for wanting to be helpful, but the instructions in this blog (“do this dangerous thing, but trust me it’s okay, and then do this other dangerous thing, but trust me it’s okay”) are exactly what nefarious actors would ask of unsuspecting victims too, and following such blog posts is a practice that should not be generally encouraged.
Sharing your IAM credentials is like sharing your password. Just don't do it, regardless of the intentions. Even if this one doesn't steal anything, it sets a precedent that will make people think it's OK and make them easier targets in the future. Besides, Bedrock already has a console, so what's the point of using your UI?
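One quick sanity check before pasting a key anywhere: AWS encodes the credential type in the access key ID prefix (`AKIA` for long-lived IAM user keys, `ASIA` for temporary STS-issued keys). A small sketch:

```python
def is_long_lived_key(access_key_id: str) -> bool:
    """True if the key looks like a long-term IAM user credential.

    AWS access key IDs start with AKIA for long-lived IAM user keys
    and ASIA for temporary credentials issued by STS.
    """
    return access_key_id.startswith("AKIA")

# Prefer temporary credentials (e.g. via `aws sso login` or
# `aws sts get-session-token`) over anything starting with AKIA.
print(is_long_lived_key("AKIAIOSFODNN7EXAMPLE"))  # example key ID from the AWS docs
```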
If you're already in the AWS ecosystem or have worked in it, it's no problem. If you're used to "make OpenAI account, add credit card, copy/paste API key" it can be a bit daunting.
AWS does not use the exact same authn/authz/identity model or terminology as other providers, and for people familiar with other models, it's pretty non-trivial to adapt to. I recently posted a rant about this to https://www.reddit.com/r/aws/comments/1geczoz/the_aws_iam_id...
Personally I am more familiar with directly using API keys or auth tokens than AWS's IAM users (which are more similar to what I'd call "service accounts").
If you're looking for a generative AI model API only, I think Nova is not for you. If you want to build that capability into your cloud application, it uses exactly the model you expect and have, and you just add a new policy/role/whatever for whatever piece of it's going to use Nova.
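For that "add a new policy" step, a least-privilege sketch of the policy document (the action names are the documented Bedrock invoke actions; the wildcard resource ARN is a placeholder you would scope down):

```python
import json

# Minimal IAM policy granting invoke access to Bedrock foundation models.
# The Resource ARN here is a broad placeholder; in practice, scope it to
# a specific region and model ID.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream",
            ],
            "Resource": "arn:aws:bedrock:*::foundation-model/*",
        }
    ],
}
print(json.dumps(policy, indent=2))
```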
Setting up Azure LLM access is a similar hellish process. I learned after several days that I had to look at the actual endpoint URL to determine how to set the “deployment name” and “version” etc.
Nice! FWIW, The only nova model I see on the HuggingFace user space page is us.amazon.nova-pro-v1:0. I cloned the repo and added the other nova options in my clone, but you might want to add them to yours. (I would do a PR, but... I'm lazy and it's a trivial PR :-)).
I'm so confused about the value prop of Bedrock. It seems like it wants to be guardrails for implementing RAG with popular models, but it's not the least bit intuitive. Is it actually better than setting up a custom pipeline?
The value I get is:
1) one platform, largely one API, several models,
2) includes Claude 3.5 "unlimited" pay-as-you-go,
3) part of our corporate infra (SSO, billing, ... corporate discussions are easier to have)
I'm using none to very little of the functionality they have added recently: not interested in RAG, not interested in Guardrails. Just Claude access, basically.
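The "one API, several models" point is concrete: with the Converse API the request body stays identical and only the model ID string changes. A sketch (the model IDs are examples and may differ by region/version):

```python
# Same request shape, different model: only the modelId changes.
MODELS = {
    "claude": "anthropic.claude-3-5-sonnet-20240620-v1:0",
    "nova": "us.amazon.nova-pro-v1:0",
}

def build_request(model_key: str, prompt: str) -> dict:
    return {
        "modelId": MODELS[model_key],
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
    }

# With boto3 this would be: client.converse(**build_request("claude", "..."))
for key in MODELS:
    print(key, build_request(key, "Hello")["modelId"])
```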
Just means it's better at one specific task than the others, which has always been the case. For each of Sonnet, GPT and Gemini I can readily name a task they are individually the best at. At the same time the consensus that Sonnet 3.5 is overall the currently strongest model remains correct, and that's what most people care about. Additionally most people do tasks that all of the models perform similarly at, or they can't be bothered to optimize every task by using the best model for that one task. Which makes sense since not a single cloud provider has all three of them. Now this one will likely be AWS-exclusive too.
Benchmarks are way too easy to game. There's no shortage of models that "beat GPT-4" according to some benchmark or another, that are obviously nowhere even close when you try them on novel tasks.
They missed a big opportunity by not offering EU-hosted versions.
That's a big thing for compliance. All LLM providers reserve the right to save (for up to 30 days) and inspect prompts for their own compliance.
However, this means that company data is potentially stored out-of-cloud. This is already problematic, even more so when the storage location is outside the EU.
I really wish they would left-justify instead of center-justify the pricing information so I'm not sitting here counting zeroes and trying to figure out how they all line up.
> The Nova family of models were trained on Amazon’s custom Trainium1 (TRN1) chips, NVidia A100 (P4d instances), and H100 (P5 instances) accelerators. Working with AWS SageMaker, we stood up NVidia GPU and TRN1 clusters and ran parallel trainings to ensure model performance parity
Does this mean they trained multiple copies of the models?
Models like this are experimentally pretrained or tuned hundreds of times over many months to optimize the datamix, hyperparams, architecture, etc. When they say "ran parallel trainings" they are probably referring to parity tests that were performed along the way (possibly also for the final training runs). Different hardware means different lower-level libraries, which can introduce unanticipated differences. Good to know what they are so they can be ironed out.
Part of it could also be that they'd prefer to move all operations to the in-house trn chips, but don't have full confidence in the hardware yet.
Def ambiguous though. In general reporting of infra characteristics for LLM training is left pretty vague in most reports I've seen.
Different models have different strengths and weaknesses, especially here in the early days when models and their capabilities progress several times per year. The apps, programs, and systems based on models need to know how to exploit their specific strengths and weaknesses. So they are not infinitely interchangeable. Over time some of that differentiation will erode, but it will probably take years.
AWS having customers using its own model probably improves AWS's margins, but having multiple models available (e.g. Anthropic's) improves their ability to capture market share. To date, AWS's efforts (e.g. Q, CodeWhisperer) have not met with universal praise. So for at least for the present, it makes sense to bring customers to AWS to "do AI" whether they're using AWS's models or someone else's.
I don't think there will be one model that will rule them all, unless there is a breakthrough. If things continue on the same path, I think Amazon, Microsoft and Google will be the last ones standing, since they can provide models from all the major LLM players.
1. A company the size of Amazon has enough resources and unique internal data no one else has access to that it makes sense for them to build their own models. Even if it's only for internal use
2. Amazon cannot beat Anthropic at this game. Anthropic is far ahead of them in terms of performance and adoption. Building these models in-house doesn't mean it's a bad idea to also invest in Anthropic
Not sure if this was the goal, but it does work well from a product perspective that Nova is a super-cheap model that is comparable to everything BUT Claude.
They really should've tried to generate better video examples, those two videos that they show don't seem that impressive when you consider the amount of resources available to AWS. Like what even is the point of this? It's just generating more filler content without any substance. Maybe we'll reach the point where video generation gets outrageously good and I'll be proven wrong, but right now it seems really disappointing.
Right now when I see obviously AI generated images for book covers I take that as a signal of low quality. If AI generated videos continue to look this bad I think that'll also be a clear signal of low quality products.
When marketing talks about price delta and not quality of the output, it is DOA. For LLMs, quality is the more important metric, and Nova would be playing catch-up with the leaderboard forever.
Maybe. The major models seem to be about tied in terms of quality right now, so cost and ease of use (e.g. you already have an AWS account set up for billing) could be a differentiator.
Using LLMs via Bedrock is 10x more painful than using the direct APIs. I could see cost consolidation via the cloud marketplace as a play, but I don't see Amazon's own LLM initiatives ever taking off. They should just close those shops and buy one of the frontier models (while it's still cheap)
The major models are not tied in terms of quality. GPT-4 and GPT-o1 still beat everyone else by a significant margin on tasks that require in-depth reasoning. There's a reason why people just don't go for the cheapest option, whatever the benchmarks say.
Exactly. Citing cost is an AWS play which worked during the early days of cloud, so they are trying to stick to those plays. They don't work in the AI world. No one wants a faster/cheaper model that gives poor results (besides, the cost of frontier models keeps coming down, so these are just dead initiatives IMO).
On LLMs, my experience with Claude has been much better than with OpenAI models (though my use case is more on code generation)
For more complicated stuff, I did some experiments using LLMs to drive high-level AI decisions in video games. Basically, it gets a data schema and a question like "what do you do next?", and can query the schema to retrieve the info that it thinks it needs to give the best answer to that. GPT-4 and GPT-o1 especially are consistently the best performers there, both in terms of richness of queries they produce, and how they make use of them.
There's also a bunch of interesting examples along the same lines here: https://github.com/cpldcpu/MisguidedAttention. Although I should note that even top OpenAI models have troubles with much of this stuff.
https://github.com/fairydreaming/farel-bench is another interesting benchmark because it's so simple, and yet look at the number disparity in that last column! It's easy to scale, too.
Unfortunately, we're still at the point in this game where even seemingly trivial and unrelated minor changes in the prompt (e.g. slightly rewording it, and even capitalization in some cases) can have large effect on quality of output, which IMO is a tell-tale sign when the model is really operating in a "stochastic parrot" mode more so than any kind of actual reasoning. Thus benchmarks can be used as a way to screen out the poorly performing models, but they cannot reliably predict how well a model will actually do what you need it to do.
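One cheap way to probe for that "stochastic parrot" brittleness is to re-run the same benchmark item under trivially equivalent prompt variants and measure how much the answers diverge. A minimal perturbation generator (the variants chosen here are just illustrative):

```python
def perturb(prompt: str) -> list:
    """Return trivially equivalent surface variants of a prompt.

    If a model's answers diverge across these, it is reacting to
    surface form rather than meaning.
    """
    return [
        prompt,
        prompt.lower(),
        prompt.upper(),
        prompt + " ",                  # trailing whitespace
        "Please answer: " + prompt,    # neutral framing prefix
    ]

variants = perturb("What is the capital of France?")
print(len(variants))  # 5 variants to send to the model under test
```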
It's really amusing how bad Amazon is at writing and designing UI. For a company of their size and scope it's practically unforgivable. But they always get away with it.
At best, you can conclude that outdated product design doesn't always ruin a business (clearly). But you can't conclude the inverse (that investing in modern product design doesn't ever help a business).
That's a great point. Further, there are many sizeable businesses built on top of AWS where they deliver the abstractions with compression that earns them their margin.
Case in point: tell me, from the point of view of the user, how many steps it takes to deploy a NextJS/React ecosystem website with Vercel and with AWS, start to finish.
I think they have plenty of competition in the cloud computing space. It seems fair to say that their strategy of de-prioritizing UI/UX in favor of getting features out the door more quickly and cheaply has benefitted them.
However, I don't think it's fair to say that this trade-off always wins out. Rather, they've carved out their own ecological niche and, for now, they're exploiting it well.
Oh I'm sure. The ACM UI was impossible to use for years if you wanted to find certificates; they improved it, but it will never have the same level of functionality that the API gives you, and that's the bread and butter.
Imagine a native desktop app that let you build a UI with very basic elements, à la Visual Basic, and behind each of those elements is an associated AWS CLI command. Such that "aws s3 ls" attached to a list element would render an account's buckets.
The AWS APIs are so expansive, a product like this could offer a complete replacement for the default web console and maybe even charge for it. Does anyone know if such a solution exists? Perhaps some more generic "shell-to-ui" application? If not, I'm interested in building one if anybody would like to contribute.
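A first cut of that shell-to-UI idea only needs a table from UI element to CLI command plus a parser for the command's output. A sketch, assuming `aws s3 ls` prints its human-readable default of one "DATE TIME bucket-name" line per bucket:

```python
import subprocess

# UI element -> shell command backing it.
WIDGETS = {
    "bucket_list": "aws s3 ls",
}

def parse_s3_ls(output: str) -> list:
    """Extract bucket names from `aws s3 ls` default output."""
    names = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            names.append(parts[-1])  # last column is the bucket name
    return names

def render_bucket_list() -> list:
    """Run the backing command and return rows for a list widget."""
    out = subprocess.run(
        WIDGETS["bucket_list"].split(), capture_output=True, text=True
    ).stdout
    return parse_s3_ls(out)

sample = "2024-01-02 03:04:05 my-bucket\n2024-06-07 08:09:10 logs-bucket"
print(parse_s3_ls(sample))  # ['my-bucket', 'logs-bucket']
```

A generic version would keep a parser per command and feed the rows to whatever widget toolkit renders the UI.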
It makes more sense to conclude that if you have market dominance, you can get away with a lot, especially since we see this time and again in other matters, not just UI.
That's what happens when you ask SWEs to design.
To fix this, Amazon would need to do extensive UX research and make incremental changes until the UI no longer looks the same but is more usable. Because users hate sudden change.
What do you think is comparable but better? I think you’re really seeing that they have a large organization with a lot of people working on different complex products, which makes major changes much harder to coordinate, and their market skews technical and prioritizes functionality higher than design so there isn’t a huge amount of pressure.
"Jeff Bezos is an infamous micro-manager. He micro-manages every single pixel of Amazon's retail site. He hired Larry Tesler, Apple's Chief Scientist and probably the very most famous and respected human-computer interaction expert in the entire world, and then ignored every goddamn thing Larry said for three years until Larry finally -- wisely -- left the company."
Why was this being downvoted? It’s first-party evidence that substantiates the claims of the parent comment, and adds interesting historical context from major industry players
https://artificialanalysis.ai/leaderboards/models seems to suggest Nova Lite is half the price of 4o-mini, and a chunk faster too, with a bit of quality drop-off. I have no loyalty to OpenAI, if it does as well as 4o-mini in the eval suite, I'll switch. I was hoping "Gemini 1.5 Flash (Sep)" would pass muster for similar reasons, but it didn't.
I'd say, people that need it. Which could be the same for all the other models out there.
To create one model that is great at everything is probably a pipe dream. Much like creating a multi-tool that can do everything, but can it? I wouldn't trust a multi-tool to take a wheel nut off a wheel, but I would find it useful if I suddenly needed a cross-head screw taken out of something.
But then I also have a specific crosshead screwdriver that is good at just taking out cross-head screws.
Use the right tool for the right reason. In this case, there may be a legal reason why someone might need to use it. It might be that this version of a model can create something that another model can't. It might be that, because you are already within AWS, for cost reasons it makes sense to use this model rather than something else.
So yeah, I am sure it will be great for some people, and terrible for others... just the way things go!