How Amazon Web Services is Making Generative AI More Affordable
Las Vegas, USA | Mon Dec 09 2024
Amazon Web Services (AWS) is stepping up its game in the world of generative AI with some new features for Amazon Bedrock, its managed service for large language models (LLMs). The big news? They're adding prompt caching and intelligent prompt routing to help businesses cut costs.
Imagine you're at a big office with lots of colleagues, and everyone's asking the same question over and over. You wouldn't want to pay for each answer individually, right? That's what caching is for. It saves the answers to repeated questions so you don't pay every time. AWS says this can slash costs by up to 90% and return answers up to 85% faster. Adobe tested the feature and saw response times drop by 72%.
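To make the idea concrete, here's a minimal sketch of a response cache in Python. It's purely illustrative, not how Bedrock implements the feature (AWS caches frequently reused parts of prompts inside the service itself), and `call_model` is a hypothetical stand-in for whatever LLM API you use.

```python
import hashlib

# Toy in-memory cache: prompt hash -> answer. Illustrative only,
# not AWS's implementation.
_cache = {}

def cached_completion(prompt: str, call_model) -> str:
    # Key on a hash of the prompt so repeated questions hit the cache.
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key in _cache:
        return _cache[key]       # cache hit: no model call, no extra cost
    answer = call_model(prompt)  # cache miss: pay for one model call
    _cache[key] = answer
    return answer

# Usage: the second identical question is answered from the cache.
fake_model = lambda p: f"answer to: {p}"
print(cached_completion("What is our refund policy?", fake_model))
print(cached_completion("What is our refund policy?", fake_model))  # cached
```

The second call returns instantly without touching the model, which is where the cost and latency savings come from.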
Now, let's talk about intelligent prompt routing. Think of it like a smart traffic cop directing cars to the right lanes. It sends each incoming query to whichever model offers the best balance of cost and performance: a small language model predicts how each candidate model will handle the query and picks the best fit, no human needed.
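Here's a rough sketch of what a router like this might look like. The model names and the length-based heuristic are invented for illustration; AWS describes using a small language model to predict each model's performance, not a simple rule like this one.

```python
# Illustrative sketch only, not AWS's router. Model IDs are hypothetical.
CHEAP_MODEL = "family-lite-v1"   # small, inexpensive model
STRONG_MODEL = "family-pro-v1"   # larger, pricier model

def needs_strong_model(prompt: str) -> bool:
    # Stand-in for the small predictor model AWS describes: here,
    # a crude heuristic that treats long prompts as harder.
    return len(prompt.split()) > 50

def route(prompt: str) -> str:
    # Send easy queries to the cheap model, hard ones to the strong one.
    return STRONG_MODEL if needs_strong_model(prompt) else CHEAP_MODEL

print(route("What's the capital of France?"))  # -> family-lite-v1
```

The payoff is that simple questions never burn money on the expensive model, while hard ones still get routed to it.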
AWS isn't the first to think of LLM routing; other startups and open-source projects offer it too. But AWS believes its edge is that it routes queries automatically, with minimal human input. For now the feature is limited to models in the same family, though AWS plans to expand it and give users more control in the future.
Lastly, AWS is launching a marketplace for Bedrock. It's meant to support smaller, specialized models that only have a few users. Unlike the rest of Bedrock, where infrastructure is handled automatically, users will manage their own infrastructure here. AWS will offer around 100 of these models at launch, with more on the way.
https://localnews.ai/article/how-amazon-web-services-is-making-generative-ai-more-affordable-ffd9ec9f