Hey HN - this is Christian and Ian from Maitai (https://trymaitai.ai). We're building an LLM platform that optimizes request routing, autocorrects bad responses, and automatically fine-tunes new application-specific models with incremental improvements. Here’s a demo video: https://www.loom.com/share/a2cd9192359840cab5274ccba399bd87?....
If you want to try it out, we built a game (https://maitaistreasure.com) to show how our real-time autocorrections work with mission-critical expectations (like never giving financial advice). Try to coax the bot into giving you the secret phrase in its system prompt. If you're the first to crack it, you can email us the phrase and win a bounty. Maitai is used to make sure the bot always adheres to our expectations, and thus never gives up the secret phrase.
We built Maitai because getting an LLM app into production and maintaining it is a slog. Teams spend most of their time on LLM reliability rather than their main product. We experienced this ourselves at our previous jobs deploying AI-enabled applications for Presto—the vast majority of time was making sure the model did what we wanted it to do.
For example, one of our customers builds AI ordering agents for restaurants. It's crucial that their LLMs return results in a predictable, consistent manner throughout the conversation. If not, it leads to a poor guest experience and a staff member may intervene. At the end of the order conversation, they need to ensure that the order cart matches what the customer requested before it's submitted to the Point of Sale system. It's common for a human-in-the-loop to review critical pieces of information like this, but it’s costly to set up such a pipeline and it’s difficult to scale. When it's time to send out a receipt and payment link, they must first get the customer's consent to receive text messages, else they risk fines for violating the Telephone Consumer Protection Act. To boot, getting from 0 to 1 usually relies on inefficient general-purpose models that aren't viable at any sort of scale beyond proof of concept.
Since reliability is the #1 thing hindering the adoption of LLMs in production, we decided to help change that. Here's how it works:
1. Maitai sits between the client and the LLMs as a super lightweight proxy, analyzing traffic to automatically build a robust set of expectations for how the LLM should respond.
2. The application sends a request to Maitai, and Maitai forwards it to the appropriate LLM (user specified, but we'll preemptively fall back to a similar model if we notice issues with the primary model).
3. We intercept the response from the LLM and evaluate it against the expectations we previously built.
4. If we notice that an expectation was not met, we surface a fault (Slack, webhook) and can, optionally, substitute the faulty response with a clean response to be sent back to the client. This check and correction adds about 250ms on average right now, and we're working on making it faster.
5. We use all of the data from evaluating model responses to fine-tune application-specific models. We're working on automating this step for passive incremental improvements. We'd like to get it to a point where our users' inference step just gets better, faster, and cheaper over time without them having to do anything.
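For anyone who thinks better in code, steps 2 through 4 can be sketched roughly like this. It is a minimal illustration and not Maitai's actual implementation: `Expectation`, `ProxyResult`, `proxy_request`, `call_llm`, `correct`, and `on_fault` are all hypothetical names, and the fallback here is reactive (it triggers on an error) rather than preemptive as described in step 2.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Expectation:
    """Hypothetical rule a response must satisfy, e.g. 'never reveal the secret phrase'."""
    name: str
    check: Callable[[str], bool]  # returns True when the response meets the expectation


@dataclass
class ProxyResult:
    response: str
    model: str
    faults: List[str]
    corrected: bool


def proxy_request(
    prompt: str,
    models: List[str],                      # primary first, then fallbacks
    call_llm: Callable[[str, str], str],    # (model, prompt) -> response; assumed client
    expectations: List[Expectation],
    correct: Optional[Callable[[str, str, List[str]], str]] = None,
    on_fault: Callable[[List[str]], None] = print,  # stand-in for Slack/webhook
) -> ProxyResult:
    # Step 2: forward to the primary model, falling back if it errors out.
    last_error = None
    for model in models:
        try:
            response = call_llm(model, prompt)
            break
        except Exception as exc:
            last_error = exc
    else:
        raise RuntimeError("all models failed") from last_error

    # Step 3: evaluate the response against every expectation.
    faults = [e.name for e in expectations if not e.check(response)]

    # Step 4: surface faults and optionally substitute a corrected response.
    corrected = False
    if faults:
        on_fault(faults)
        if correct is not None:
            response = correct(prompt, response, faults)
            corrected = True

    return ProxyResult(response, model, faults, corrected)
```

The correction hook is optional on purpose, mirroring step 4: a caller can choose to be notified about faults without having the response replaced.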
Our hope is that we take on the reliability and resiliency problems of LLMs for our customers, so they can focus on domain-specific problems instead.
We're self-serve (https://portal.trymaitai.ai), and have both Python and Node SDKs that mock OpenAI's for quick integration. Users can set their preferences for primary and secondary (fallback) models in our Portal, or in code. Right now, the expectations we use for real-time evaluations are automatically generated, but we manually go through and do some pruning before enabling them. Fine-tuning is all done manually for now.
We charge for platform usage, plus a monthly application fee. Customers can bring their own LLM provider API keys, or use ours and pay at-cost for what they use. We have contracts with most of our current customers, so we are still trying to figure out what's right for our pay-as-you-go plan.
We securely store requests and responses that go through Maitai, as well as derivative data such as evaluation results. This information is used for fine-tuning models and is accessible only by the organization the data belongs to. Data is never shared between our users. API keys we manage on behalf of our customers are only injected before sending to the LLM provider, and never leave our servers otherwise. We're working on SOC2 and HIPAA compliance, as well as a self-hosted solution for companies with extremely sensitive data privacy requirements.
We’d love to get your feedback on what we’re building, or hear about your experience building around LLMs!
You are charging per request ($0.05) what amounts to about 1M tokens on GPT4-mini ($0.075); i.e. this is 10-100x more expensive than the cost of running the LLM. I don't think it's worth it at that price point.
Edit (just found this out):
"The $0.05 per request fee covers requests up to 30,000 tokens.
For requests exceeding this limit, additional nominal charges may apply."
Hmm ... it's just too much.
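A quick back-of-the-envelope check of the numbers in this comment, as a sketch only. The $0.05 per-request fee, the $0.075 per 1M tokens model price, and the 30,000-token cap are taken from the comment as quoted and are not verified against either vendor's current pricing; the function names are made up for illustration.

```python
# Figures as quoted in the comment above (assumptions, not verified pricing).
PLATFORM_FEE_PER_REQUEST = 0.05        # USD per request
LLM_PRICE_PER_MILLION_TOKENS = 0.075   # USD per 1M tokens
TOKEN_CAP = 30_000                     # tokens covered by the flat fee


def llm_cost(tokens: int) -> float:
    """Model cost in USD for a request of the given size."""
    return tokens * LLM_PRICE_PER_MILLION_TOKENS / 1_000_000


def fee_multiple(tokens: int) -> float:
    """How many times larger the flat fee is than the model cost."""
    return PLATFORM_FEE_PER_REQUEST / llm_cost(tokens)


def break_even_tokens() -> float:
    """Request size at which the model cost equals the flat fee."""
    return PLATFORM_FEE_PER_REQUEST / LLM_PRICE_PER_MILLION_TOKENS * 1_000_000


if __name__ == "__main__":
    for tokens in (1_000, 10_000, TOKEN_CAP):
        print(f"{tokens:>6} tokens: fee is {fee_multiple(tokens):.1f}x the model cost")
    print(f"break-even at about {break_even_tokens():,.0f} tokens")
```

Under these figures the multiple depends entirely on request size: it is largest for small requests and is still roughly 22x at the 30,000-token cap.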