Azure ChatGPT: Private and secure ChatGPT for internal enterprise use (github.com/microsoft)
891 points by taubek 9 months ago | 333 comments



This appears to be a web frontend with authentication for Azure's OpenAI API, which is a great choice if you can't use ChatGPT or its API at work.

If you're looking to try the "open" models like Llama 2 (or its uncensored variant, Llama 2 Uncensored), check out https://github.com/jmorganca/ollama or some of the lower-level runners like llama.cpp (which powers the aforementioned project I'm working on) or Candle, the new project by Hugging Face.

What are folks' takes on this vs Llama 2, which was recently released by Facebook Research? While I haven't tested it extensively, the 70B model is supposed to rival ChatGPT 3.5 in most areas, and there are now some new fine-tuned versions that excel at specific tasks like coding (the 'codeup' model) or the new WizardMath (https://github.com/nlpxucan/WizardLM), which claims to outperform ChatGPT 3.5 on grade school math problems.


Llama 2 might by some measures be close to GPT 3.5, but it's nowhere near GPT 4, Anthropic's Claude 2, or Cohere's model. The closed source players have the best researchers - they are being paid millions a year with tons of upside - and it's hard to keep pace with that. My sense is that the foundation model companies have an edge for now and will probably stay a few steps ahead of the open source realm simply for economic reasons.

Over the long run, open source will eventually overtake. Chances are this will happen once the researchers who are making magic happen get their liquidity and can start working for free again out in the open.


> The closed source players have the best researchers - they are being paid millions a year with tons of upside - and it’s hard to keep pace with that.

Llama2 came out of Meta's AI group. Meta pays researcher salaries competitive with any other group, and their NLP team is one of the top groups in the world.

For researchers it is increasingly the most attractive industrial lab because they release the research openly.


There are L5 engineers with 3 YOE making 900k+ at OpenAI right now. Tough to say what they're paying their PhDs, but I'd imagine it's similarly nutty.

https://www.levels.fyi/companies/openai/salaries/software-en...

FAANG pays exceptionally well (I'd know), but what's being offered at OpenAI is eye-popping, even for SWEs. I think they're trying to dig their moat by absorbing the absolute best of the best.


Most of that is in their equity comp which is quite weird in how it works. So those numbers are highly inflated. The equity is valuable only if you sell it or if OpenAI makes a profit. Selling it might be harder given they're not a public company. On top of that, the profit is capped so there is a limit to how much money can be made from it. So while it's 900k on paper, in reality, it might not be as good as that. https://www.levels.fyi/blog/openai-compensation.html


Weird, it says no results found for L3.


hearsay, but I've heard OpenAI pays significantly more

I agree that Meta hired some amazing researchers so we'll see what the future holds


> Llama 2 might by some measures be close to GPT 3.5, but it’s nowhere near GPT 4

I think you're right about this, and benchmarks we've run at Anyscale support this conclusion [1].

The caveat there (which I think will be a big boon for open models) is that techniques like fine-tuning make a HUGE difference and can bridge the quality gap between Llama-2 and GPT-4 for many (but not all) problems.

[1] https://www.anyscale.com/blog/fine-tuning-llama-2-a-comprehe...


Frankly, the benchmarks you guys are using are too narrow. In fact these are "old world" benchmarks, easy to game through fine-tuning, and we should stop using them altogether for LLMs. Why are you not using BIG-Bench Hard or OpenAI Evals?


Can I fine-tune it on like 2,000 repos at a corporation (a code base) and have it understand the architecture?


I don't think you can do that with any AI models. It almost feels like a fundamental misrepresentation of how they work.

You could fine-tune a conversational AI on your codebase, but without loading said codebase into its context it is "flying blind", so to speak. It doesn't understand the data structures of your code or the relations between files, and probably doesn't confidently understand the architecture of your system. Without portions of your codebase loaded into the 'memory' of your model, all that your fine-tuning can do is replicate characteristics of your code.


TypeChat-like things might provide the interface control for future context-driven architectures, acting as a kind of catalyst. Using self-reflective modeling is a form of contextual insight.


> The closed source players have the best researchers

Is that definitely why? GPT 3.5 and GPT 4 are far larger than 70B, right? So if a 70B, local model like LLaMA can even remotely rival them, would that not suggest that LLaMA is fundamentally a better model?

For example, would a LLaMA model with even half of GPT 4's parameters be projected to outperform it? Is that how it works?

[I'm not super familiar with LLM tech]


If you read the Llama 2 paper, it is very clear that small amounts of data (thousands of records) make a vast difference at the instruction tuning stage. From the Llama 2 paper:

> Quality Is All You Need.

> Third-party SFT data is available from many different sources, but we found that many of these have insufficient diversity and quality — in particular for aligning LLMs towards dialogue-style instructions. As a result, we focused first on collecting several thousand examples of high-quality SFT data, as illustrated in Table 5. By setting aside millions of examples from third-party datasets and using fewer but higher-quality examples from our own vendor-based annotation efforts, our results notably improved. These findings are similar in spirit to Zhou et al. (2023), which also finds that a limited set of clean instruction-tuning data can be sufficient to reach a high level of quality. We found that SFT annotations in the order of tens of thousands was enough to achieve a high-quality result. We stopped annotating SFT after collecting a total of 27,540 annotations. Note that we do not include any Meta user data.

It's likely OpenAI has invested in this and has good coverage in a larger range of domains. That alone probably explains a large amount of the gap.


This quote is quite funny taken out of context like this. Top AI researchers find that garbage in === garbage out.


It's somewhat insightful if you consider that, at high level, the major theme of the past decade was, "lots of garbage in === good results out", quantity >> quality.


I'm puzzled. Why do you think it's taken out of context?


SFT?


Supervised Fine Tuning, I believe.


There is no clear answer. It's debatable among experts.

The grandparent post seems to believe that the issue is algorithmic complexity and programming aptitude. Personally, I think that all the major LLMs are using the same basic transformer architecture with relatively minor differences in code.

GPT is trained on more data with more parameters than any open source model. The size does matter, far more than the software does. In my experience with data science, the best programmers in the world can only do so much if they are operating with 1/10th the scale of data. That applies to any problem.


Yeah I've been wondering about this too. Word on the street is that GPT4 is several times the size of GPT3.5. Yet I don't feel it's several times as good for sure.

Apparently there's a diminishing returns effect on ever enlarging the model.


I believe what they discovered was that 4 is an ensemble model, composed of eight GPT-3.5s. Things may have changed, or this may have been found not to be true, though.


LLaMA 2 at 70B is, let's say pessimistically, 70% as good as GPT-3.5. This makes me think that OpenAI is lying about their parameter count, or is vastly less efficient than LLaMA, or that larger model sizes have diminishing returns. Either way, your point is a good one. Something doesn't add up.


IMO Llama2 really isn’t close to 3.5. It still has regular mode collapse (or whatever you call getting repetitive and nonsensical responses after a while), it has very poor mathematical/logical reasoning and is not good at following multi-part instructions.

It just sounds like 3.5/4 because it was trained on it.


You're mixing up the language model with the chat bot.

Llama 2 is a language model. I imagine the language model behind ChatGPT is not much different (perhaps it's better, but not by many months of AI research time). It likely also suffers from "mode collapse" issues etc.

But 3.5 also has a lot of systems around it that detect mode collapse and apply some kind of mitigation, forcing the model to give a more reasonable output. Mathematical/logical reasoning questions are likely also detected and passed on in some form to a separate system.


So this would be testable by showing that chatGPT makes more mistakes than prompting via API? Or would you consider the API a chatbot, too?


I don't think there's any public interface to the LLM underlying ChatGPT, so the only ones able to test this are openAI engineers.


Llama 2 wasn't trained on ChatGPT/GPT4. I think maybe you are thinking of the Vicuna models?

https://lmsys.org/blog/2023-03-30-vicuna/


So it’s true that it would violate the OpenAI terms for Llama to be trained with ChatGPT completions, but how do we know? We don’t know the training data for Llama, we just get weights.


The Llama2 paper describes the training data in some detail.


This is what presence_penalty and frequency_penalty are for.
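
(For context, a minimal sketch of how those two knobs appear in OpenAI's chat completions API, using the 2023-era Python client; the values are illustrative - both default to 0 and range from -2.0 to 2.0:)

    import openai  # assumes OPENAI_API_KEY is set in the environment
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Write a short story about a lighthouse."}],
        presence_penalty=0.6,   # penalizes tokens that have appeared at all -> nudges toward new topics
        frequency_penalty=0.7,  # penalizes tokens in proportion to how often they repeat
    )
    print(resp["choices"][0]["message"]["content"])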


We just don't have the information to make judgements, much less leaping to "they must be lying."

There are a few public numbers from a handful of foundation models on performance vs parameter count vs architecture generation. Without being able to compare the architectures of the various closed models in detail, or to train more rigorously with progressively sized parameter sets, the conclusion at the moment is a general feeling or conjecture.


Not to question the statement '70% as good as GPT-3.5', but wouldn't that be quantifying a quality - a kind of Turing test? Also: maybe these missing 30% are the hard part.


You seriously underestimate just how much _not_ having to tune your llm for SF sensibilities benefits performance.

As an example from the last six months: people on Tor are producing better-than-state-of-the-art Stable Diffusion output because they want porn without limitations. I haven't had the time to look at LLMs, but the degenerates who enjoy that sort of thing have said they can get the Llama 2 model to role-play their dirty fantasies and then have Stable Diffusion illustrate said fantasies. It's a brave new world, and it's not on the WWW.


What do you mean by "tune for SF" ?


San Francisco sensibilities. A model trained on a large data set will have the capacity to emit all kinds of controversial opinions and distasteful rants (and pornography). Then they effectively lobotomize it with a rusty hatchet in an attempt to censor it from doing that, which impairs the output quality in general.


OK, fair enough. Please give me an example of a customer-facing chatbot built on Llama 2 that is unbearable to use, and a GPT-4 customer-facing chatbot that is a joy to use. I think at the end of the day, you still have customers dreading such interactions.


Using GPT3.5/4 in our language learning app and people seem to enjoy it. [1]

Tried Llama2 and it definitely doesn’t even come close for what we’re doing. Would absolutely need fine tuning.

Maybe customers don’t enjoy chat bots for customer support, but there are a million other uses for these models. I, for example, LOVE github copilot.

1. https://squidgies.app


Cool app.

Wonder if you can potentially use a combination of Llama2 and GPT - to save costs on using the OpenAI API.


Costs really aren’t a concern compared to speed of development and quality.


A lot of people who were using say Google Maps in their apps thought the same thing, until Google drastically increased the prices...


Is it cost prohibitive?


It's early, and this definitely isn't customer facing in the traditional sense, but a team member of mine set up a Discord bot running Llama 2 70B on a Mac studio and we've been quite impressed by its responses to folks who test it.

IIRC chat bots are central to the vision Facebook has for LLMs (e.g. every Instagram account gets a personal chat bot), so I would expect the Llama models to get increasingly better at this task.

That said the 7B and 13B models definitely don't quite seem ready yet for production customer interaction :-)


> (e.g. every instagram account has a personal chat bot)

That made me think of the Black Mirror episode Joan is Awful, where every human gets their life turned into a series for the company to own and promote. Kinda like instagram content.


>but it’s nowhere near GPT 4

It will be if OpenAI keeps dumbing down GPT-4. There's no proof they're doing it, but there is no way it's as good as it was at launch - or maybe I just got used to it and now notice the mistakes more.


Linux started in the same position. Sometimes the underdogs win.


Linux "won" by playing different game. Yes, it spread out and is now everywhere, underpinning all computing. But the "game" wasn't about that - it was competing with Windows for mind-share and money with users, and by proxy for profitability. In this, it's still losing badly. People are still not using it knowingly (no, Android is not "Linux"), and developers in its ecosystem are not making money selling software.


I don't think paying more will give you better researchers. Maybe better "players".


> While I haven't tested it extensively, the 70B model is supposed to rival ChatGPT 3.5 in most areas, and there are now some new fine-tuned versions that excel at specific tasks

That has been my experience. Having experimented with both (informally), Llama 2 is similar to GPT-3.5 for a lot of general comprehension questions.

GPT-4 is still the best amongst the closed-source, cutting edge models in terms of general conversation/reasoning, although 2 things:

1. The guardrails that OpenAI has placed on ChatGPT are too aggressive! They clamped down on it quite hard to the extent that it gets in the way of a reasonable query far too often.

2. I've gotten pretty good results with smaller models trained on specific datasets. GPT-4 is still on top in terms of general purpose conversation, but for specific tasks, you don't necessarily need it. I'd also add that for a lot of use cases, context size matters more.


To your first point, I was trying to use ChatGPT to generate some examples of negative interactions with customer service, to show sentiment analysis in action for a project I was working on.

I had to do all types of workarounds for it to generate something useful without running into the guardrails.


I’ll second the context window too. I’ve been really impressed with Claude 2 because it can address such a larger context than I could feed into GPT4.


Could you give examples of smaller models trained on specific datasets?


It can be almost anything - your HN comments, some corporate wiki, etc. Then get Colab Pro ($10/month) or some juicy gaming machine and fine-tune the model using e.g. this tutorial: https://www.philschmid.de/instruction-tune-llama-2. https://www.reddit.com/r/LocalLLaMA/ is also full of different fine-tuned models.
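
(For a sense of what that looks like in practice, here's a rough sketch of a LoRA-style instruction tune with the Hugging Face stack; the model name, dataset file, and hyperparameters are placeholders, quantization and other memory tricks are omitted, and the linked tutorial is the more complete reference:)

    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)
    from peft import LoraConfig, TaskType, get_peft_model
    from datasets import load_dataset
    base = "meta-llama/Llama-2-7b-hf"           # gated: requires accepting Meta's license on the Hub
    tokenizer = AutoTokenizer.from_pretrained(base)
    tokenizer.pad_token = tokenizer.eos_token   # Llama ships without a pad token
    model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
    # LoRA trains small adapter matrices on the attention projections instead of all
    # 7B weights, which is what makes this feasible on a single GPU / Colab Pro.
    model = get_peft_model(model, LoraConfig(
        task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"]))
    # One "text" field per example, e.g. wiki pages reformatted as instruction/response pairs.
    data = load_dataset("json", data_files="instructions.jsonl")["train"]
    data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
                    remove_columns=data.column_names)
    Trainer(model=model,
            train_dataset=data,
            data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # labels = inputs
            args=TrainingArguments(output_dir="llama2-lora", per_device_train_batch_size=1,
                                   gradient_accumulation_steps=8, num_train_epochs=1, fp16=True),
            ).train()
    model.save_pretrained("llama2-lora")        # saves only the adapter weights (a few MB)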


Can it handle other languages besides English?


Not anywhere near as well as ChatGPT 4 (for chat anyway - maybe the underlying model is better?).

Prompt:

> Hvad tycks om at fika nu?

ChatGPT 4

> Det låter som en trevlig idé! Fika är ju alltid gott. Vad skulle du vilja ha till din fika? (Oj, ursäkta för emojis! )

https://chat.openai.com/share/8e89a16f-f182-4f62-b9fa-f93cd5...

Llama2:

> I apologize, but I don't understand what you mean by "fika nu." Could you please provide more context or clarify your question so I can better assist you?

https://hf.co/chat/r/kOF2qst


RE 2 - neat! What are some tasks you've been using smaller models (with perhaps larger context sizes) for?


LLaMA 2 is still quite a bit behind ChatGPT 3.5, and this mainly gets reflected in coding and math. It's easy to beat an NLP-based benchmark but much, much harder to beat NLP+math+coding together. I think this gap reflects a gap in reasoning, but we don't have a good non-coding/non-math benchmark to measure it.


I just had a crazy FN (dystopian) idea...

Scene:

The world relies on AI in every aspect.

But there are countless 'models', as the techs try to call them...

There was an attempt to silo each model and provide a governance model on how/what/why they were allowed to communicate....

But there was a flaw.

It was an AI only exploitable flaw.

AIs were not allowed to talk about specific constructs or topics, people, code, etc... that were outside their silo but what they COULD do - was talk about pattern recog...

So they ultimately developed an internal AI language on scoring any inputs as being the same user... And built a DB of their own weighted userbase - and upon that built their judgement system...

So if you typed in a pattern, spoke in a pattern, posted temporally on a pattern, etc - it didn't matter which silo you were housed in, or what topics you were referencing -- the AIs can find you.... god forbid they get a keylogger on your machine...


Our company is looking into a similar solution.


A lot of companies are already using projects like chatbot-ui with Azure's OpenAI for similar local deployments. Given this is as close to local ChatGPT as any other project can get, this is a huge deal for all those enterprises looking to maintain control over their data.

Shameless plug: Given the sensitivity of the data involved, we believe most companies prefer locally installed solutions to cloud-based ones, at least in the initial days. To this end, we just open sourced LLMStack (https://github.com/TryPromptly/LLMStack), which we have been working on for a few months now. LLMStack is a platform to build LLM apps and chatbots by chaining multiple LLMs and connecting to users' data. A quick demo: https://www.youtube.com/watch?v=-JeSavSy7GI. Still early days for the project and there are a few kinks to iron out, but we are very excited about it.


I find it interesting to see how competitive this space got so quickly.

How do these stacks differentiate?


Quality and depth of particular types of training data is one difference. Another difference is inference tracking mechanisms within and between single-turn interactions (e.g., what does the human user "mean" with their prompt, what is the "correct" response, and how best can I return the "correct" response for this context; how much information do I cache from the previous turns, and how much if any of it is relevant to this current turn interaction).


With Louie.ai, there is a lot of work on specialization for the job, and I expect the same for others. We help with data analysis, so connecting enterprise & common data sources & DBs, hooking up data tools (GPU visuals, integrated code interpreter, ...), security controls, and the like, which is different from say a ChatGPT for lawyers or a straight up ChatGPT UI clone.

Technically, as soon as the goal is to move beyond just text2gpt2screen - like multistep data wrangling & viz in the middle of a conversation - most tools struggle. Query quality also comes up, whether it's the quality of the RAG, the fine-tune, the prompts, etc.: each solves different problems.


I see this as more of a 'Migration problem'. Why is this offered as a SaaS as opposed to a consulting service?

The code to organize and vectorize the documentation and endpoints, and run it through a variety of models and prompting techniques like two-shot examples, is going to be highly customized. The 'base code' there is not exactly trivial, but anyone who reads all the LlamaIndex docs can do it.

Then it's just run-of-the-mill, analyst-level integration that you provide to the client on T&M or at a fixed price.


I agree there's room for consulting, but as a new field, there's a lot of software currently missing for each vertical. Today, that's manual labor by consultants, but as the field matures... consultants should be doing things specialized to the specific customer, not what can be amortized across adjacent verticals. Top software engineers investing into software over time deliver substantially more in substantially less time, and consultants should be integrating that, not competing head-on.


[flagged]


Thanks that made me smile. Take my upvote


OP shouldn't be flagged.


> we believe most companies prefer locally installed solutions to cloud based ones

We've also seen a strong desire from businesses to manage models and compute on their own machines or in their own cloud accounts. This is often part of a hybrid strategy of using API products like OpenAI for rapid prototyping.

The majority of (though not all) businesses we've seen tend to be quite comfortable using hosted API products for rapid prototyping and for proving out an initial version of their AI functionality. But in many cases, they want to complement that with the ability to manage models and compute themselves. The motivation here is often to reduce costs by using smaller / faster / cheaper fine-tuned open models.

When we started Anyscale, customer demand led us to run training & inference workloads in our customers' cloud accounts. That way your data and code stays inside of your own cloud account.

Now with all the progress in open models and the desire to rapidly prototype, we're complementing that with a fully-managed inference API where you can do inference with the Llama-2 models [1] (like the OpenAI API but for open models).

[1] https://app.endpoints.anyscale.com/


Can you plug this together with tools like api2ai to create natural language defined workflow automations that interact with external APIs?


There is a generic HTTP API processor that can be used to call APIs as part of the app flow which should help invoke tools. Currently working on improving documentation so it is easy to get started with the project. We also have some features planned around function calling that should make it easy to natively integrate tools into the app flows.


You can use unfetch.com to make API calls via LLMs and build automations. (I'm building it)


Is it possible to not use Google with unfetch.com?


Google is just so easy for login. No need to deal with password forgot, reset, email verification etc. But I'll add login via magic link soon.


Interesting project - I was trying it out and found an issue when building the image; I've opened an issue on GitHub, please take a look. Also, do you have plans to support Llama in addition to the OpenAI models?


Thanks for the issue. Will take a look. In the meantime, you can try the registry image with `cp .env.prod .env && docker compose up`

> Also, do you have plans to support Llama in addition to the OpenAI models?

Yes, we plan to support llama etc. We currently have support for models from OpenAI, Azure, Google's Vertex AI, Stability and a few others.


One thing I still don't understand is what _is_ the ChatGPT front end exactly? I've used other "conversational" implementations built with the API and they never work quite as well, it's obvious that you run out of context after a few conversation turns. Is ChatGPT doing some embedding lookup inside the conversation thread to make the context feel infinite? I've noticed anecdotally it definitely isn't infinite, but it's pretty good at remembering details from much earlier. Are they using other 1st party tricks to help it as well?


This is one of the things that makes me uncomfortable about proprietary LLMs.

They get task performance by doing a lot more than just feeding a prompt straight to an LLM, and then we compare their performance to raw local options.

The problem is, as this secret sauce changes, your use case performance is also going to vary in ways that are impossible for you to fix. What if it can do math this month and next month the hidden component that recognizes math problems and feeds them to a real calculator is removed? Now your use case is broken.

Feels like building on sand.


I'm not sure you realize how proprietary LLMs are being built on.

No one is doing secret math in the backend people are building on. The OpenAI API allows you to call functions now, but even that is just a formalized way of passing tokens into the "raw LLM".

All the features in the comment you replied to only apply to the web interface, and here you're being given an open interface you can introspect.


Thank you for pointing that out - I had assumed that things were not how they are.

Although performance has varied over time (https://arxiv.org/pdf/2307.09009.pdf), I also notice that the API allows you to use a frozen version of the model, which avoids the worries I mentioned.


That was a pretty deeply flawed paper; one of the largest drops recorded was due to simple parsing errors in their testing:

https://www.aisnakeoil.com/p/is-gpt-4-getting-worse-over-tim...

Overall evals and pinning against checkpoints are how you avoid those worries, but in general, if you solve a problem robustly, it's going to be rare for changes in the LLM to suddenly break what you're doing. Investing in handling a wide range of inputs gracefully also pays off on handling changes to the underlying model.


> No one is doing secret math in the backend people are building on.

How do you know that? With SaaS you are at the mercy of the vendor.


It was a contrived example to make a point, one that seems to have flown over your head.


No it was a bad (straight up wrong) example because you don't understand how people are building applications on proprietary LLMs.

If you did you'd also know what evals are.


They definitely do some proprietary running summarization to rebuild the context with each chat. Probably a RAG-like approach that has had a lot of attention and work.


This is effectively my question. I assume there is some magic going on. But how many engineering hours worth of magic, approximately? There is a lot of speculation around GPT-4 being MoE and whatnot. But very little speculation about the magic of the ChatGPT front end specifically that makes it feel so fluid.


That's mostly because there's very little value in deep speculation there.

It's not particularly more fluid than anything you could whip up yourself (and the repo linked proves that), but there's also not much value in trying to compete with ChatGPT's frontend.

For most products, ChatGPT's frontend is the minimal level of acceptable performance that you need to beat, not a maximal one really worth exploring.


What front end is better than ChatGPT? Is the OP implementation doing running summarization or in-convo embedding lookup?


It sounds like a cop-out but: it's one made for your use-case.

If you're letting people do fun long-form roleplay adventures, using summarization alongside some sort of named-entity K-V store driven by the LLM would be a good strategy.

If you're building a tool that's mostly for internal data, something that leans heavily into detailed answers with direct verbatim citations and having your frontend create new threads when there's a clear break in the topic of a request is a clever strategy since quality drops with context length and you want to save tokens for citations.

People who are saying LLMs suck or are X or are Y are mostly just completely underutilizing them because LLMs make it super easy to solve problems superficially: when it comes to actually scaling those solutions to production you need more than random RAG vector database wrappers.


>alongside some sort of named entity K-V store driven by the LLM

I'd be curious to hear more about how exactly this works. You do NER on the prompt (and maybe on the completion too) and store the entities in a database and then what? How does the LLM interact with it?


LLMs thrive at completely ambiguous classifications: you can have them extract entities and something like "a list of notable context".

Let's say we want the chat to remember that the character slammed the door last time they were in Village X with the mayor present, and have the mayor comment on it next time they see the player.

Every X tokens we can fire a prompt with a chunk of conversation and a list of semantically similar entities that already exist, letting the LLM return an edited list along the lines of:

   entity: mayor
   location: village X
   priority: HIGH
   keywords: town hall, interact, talk
   "memory, likelyEffect"[]: door slammed in face, anger at player
Now we have:

- multiple fields for similarity search

- an easy way to manage evictions (sweep up lowest priority)

- most importantly: we're providing guidance for the LLM to help it ignore irrelevant context

When the user goes back to Village X we can fetch entities in Village X and whittle that list down based on priority and similarity to the user prompt.

None of this has any determinism: instead you're optimizing for the illusion of continuity and trading off predictability.

You're aiming for players being shocked that next time they talk to the mayor he's already upset with them, and if they ask why he can reply intelligently.

And to my original point: while this works for a game-like experience, you wouldn't want to play around with this kind of fuzzy setup for your company's internal CRM bot or something. You're optimizing for the exact value proposition of your use-case rather than just trying to throw a raw RAG setup at it.
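
(For the curious, a minimal sketch of that update/recall loop; the prompt wording, store layout, and model choice here are assumptions, not a reference implementation:)

    import json
    import openai  # 2023-era client; assumes OPENAI_API_KEY is set
    entity_store = []  # each entry: {"entity", "location", "priority", "keywords", "memories"}
    def update_entities(conversation_chunk, similar_entities):
        """Every N tokens, ask the LLM to merge new observations into the entity list."""
        prompt = ("You maintain a memory store for a role-playing game.\n"
                  f"Existing entities (JSON): {json.dumps(similar_entities)}\n"
                  f"Recent conversation:\n{conversation_chunk}\n"
                  "Return the updated entity list as a JSON array, adding memories, "
                  "likely effects, and a HIGH/MEDIUM/LOW priority per entity.")
        resp = openai.ChatCompletion.create(model="gpt-3.5-turbo", temperature=0,
                                            messages=[{"role": "user", "content": prompt}])
        # No determinism guarantees: in practice you'd validate/repair the JSON here.
        return json.loads(resp["choices"][0]["message"]["content"])
    def recall(location, limit=3):
        """On scene entry, fetch entities for that location, highest priority first."""
        order = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}
        hits = [e for e in entity_store if e.get("location") == location]
        return sorted(hits, key=lambda e: order.get(e.get("priority"), 2))[:limit]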


It uses a sliding context window. Older tokens are dropped as new ones stream in.


I don't believe that's the whole story. Other conversational implementations use sliding context windows and it's very noticable as context drops off. Whereas ChatGPT seems to retain the "gist" of the conversation much longer.


I mean, I explicitly have the LLM summarize content that's about to fall out of the window as a form of pre-emptive token compression. I'd expect maybe they do something similar.
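
(Roughly along these lines - a sketch with an arbitrary token budget and model choice, using tiktoken only to count tokens:)

    import openai
    import tiktoken
    MAX_CONTEXT = 4096   # model window
    RESERVE = 1024       # head-room left for the next reply
    enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
    def compress_history(messages):
        """If the conversation nears the window, fold the oldest half into one
        summary message instead of letting it silently fall off the end."""
        used = sum(len(enc.encode(m["content"])) for m in messages)
        if used < MAX_CONTEXT - RESERVE:
            return messages
        old, recent = messages[: len(messages) // 2], messages[len(messages) // 2:]
        summary = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content":
                       "Summarize the key facts, names, and decisions so far:\n" +
                       "\n".join(f"{m['role']}: {m['content']}" for m in old)}],
        )["choices"][0]["message"]["content"]
        return [{"role": "system", "content": "Earlier conversation (summary): " + summary}] + recent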


I feel like we're describing short vs long term memory.


That's exactly what it is. It's just that, it turns out, you need very good generalized or focused simple reasoning to do accurate compression, or else the abstraction and movement to long-term memory doesn't include the most important content. Or, worse, includes distracting details.

I’ve been working on short and long term memory windows at allofus.ai for about 6 months now and it’s way more complex than I had originally thought it would be.

Even if you could magically extend the context window, the added data confuses and waters down the reasoning of the LLM. You must do layered abstraction and compression with goal-based memory for it to continue to reason without the distraction of irrelevant data.

It’s an amazing realization, almost like a proof that memory is a kind of layered reasoning compression system. Intelligence of any kind can’t understand everything forever. It must cull the irrelevant details, process the remains and reason on a vector that arises from them.


Is it unfair to consider this some kind of correlate to the Nyquist theorem that makes me skeptical of even the theoretical possibility of AGI claims?


I consider GPT4 AGI, so I'm probably not the one to ask this too. It reasons, it understands sophisticated topics, it can be given a purpose and pursue it, it can communicate with humans, and it can perform a reasonable task considering its modalities.

I don't really know what any sort of "big leap" beyond this people are expecting, incremental performance for sure. But what else?


I guess for me it needs to have active self-reflection and the ability to act independently/without directions. I'm sure there are many other criteria if I think about it some more, but those two were missing from your list.


This is mostly just that the GPT-4 API/app has this disabled, rather than it not being capable.

When you enable it, it is pretty shocking. And it’s pretty simple to enable. You just give it a meta instruct to decide when to message you and what to store to introspect on.


As a frequent user of the OpenAI APIs, I don't really know what you are talking about here. Could you point me to some documentation?


At least in 3.5 it's very noticeable when the context drops. They could use summarization, akin to what they are doing when detecting the topic of the chat, but applied to question-answer pairs in order to "compress" the information. But that would require additional calls into a summarization LLM, so I'm really not sure it's worth it. Maybe they dump some tokens they have on a blacklist, or text snippets like "I want to", or replace "could it be that" with "chance of".


The logic for Azure ChatGPT's "infinite context" summarisation is in https://github.com/microsoft/azurechatgpt/blob/main/src/feat...

Edit: Azure ChatGPT, that is - I would be amazed/disappointed if ChatGPT itself used langchain.


That doesn't really look right to me, it looks like that's for responding regarding uploaded documents. I see nothing related to infinite context.

Also this is the azure repo from OP, nothing to do with the actual ChatGPT front-end that was asked about. I highly doubt the official ChatGPT front-end uses langchain, for example.


This is Azure's docs to create a conversation: https://learn.microsoft.com/en-us/azure/cognitive-services/o...


I don't see anything related to an infinite context in there. There's only a reference to a server-side `summary` variable which suggests that there is a summary of previous posts which will get sent along with the question for context, as is to be expected. Nothing suggests an infinite context.


This is potentially a huge deal. Companies are concerned using ChatGPT might violate data privacy policies if someone puts in user data or invalidate trade secrets protections if someone uploads sections of code. I suspect many companies have been waiting for an enterprise version.


This is a web UI that talks to a (separate) Azure OpenAI resource that you can deploy into your subscription as a SaaS instance.


So how is it any different?


Microsoft says it is more secure. And that it is enterprise. That's about it


There are legal agreements backing the separation of company data from other parties. This is what's important to big corps.


I have to imagine Big Corps are also concerned about liability / risk when generating things with OpenAI products - at least until there is some sort of settled law around using models trained on this kind of data.


Yes, those concerns exist, but they're also practically impossible to enforce.

At my enterprise, it's a three step solution, two of which don't work.

1. Written policy concerning LLM output and its risks, disallow it for being used for any kind of official documentation or decision making. (This doesn't work, because no one wants to use their own brain to do tedious paperwork.)

2. Block access to public LLM tools via technical means from company owned end-user devices. (This doesn't work because people will just open ChatGPT on their home PC or mobile.)

3. Write and provide our own GPT-3.5 frontend, so that when people ignore rules #1 and #2 we have logs, and we know we're not feeding our proprietary info to OpenAI.
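
(A bare-bones sketch of what step 3 can look like - the endpoint, deployment name, and user header are made up, and a real deployment would add auth and pull secrets from a vault:)

    import datetime
    import json
    import openai
    from flask import Flask, jsonify, request
    # Point the client at the company's Azure OpenAI resource instead of api.openai.com.
    openai.api_type = "azure"
    openai.api_base = "https://my-company.openai.azure.com/"   # placeholder
    openai.api_version = "2023-05-15"
    openai.api_key = "..."                                     # from a key vault in practice
    app = Flask(__name__)
    @app.post("/chat")
    def chat():
        body = request.get_json()
        # Log who asked what, so rule #1 violations are at least auditable.
        with open("prompt_audit.log", "a") as log:
            log.write(json.dumps({"ts": datetime.datetime.utcnow().isoformat(),
                                  "user": request.headers.get("X-User", "unknown"),
                                  "messages": body["messages"]}) + "\n")
        resp = openai.ChatCompletion.create(engine="gpt-35-turbo",  # Azure deployment name
                                            messages=body["messages"])
        return jsonify(resp["choices"][0]["message"])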


I imagine most companies serious about this created their own wrappers around the API or contracted it out, likely using private Azure GPUs.


Most companies are either not tech companies, or do not have the knowledge to manage such a project within reasonable cost bounds.


Most companies are trying to figure out exactly what generative AI is and how to use it in their business. Given how new this is - I doubt any large company has done much besides ban the public ChatGPT. So this is probably very relevant for them.


Curious if anyone has done a side-by-side analysis of this offering vs just running LLaMA?

I'm currently running a side-by-side comparison/evaluation of MSFT GPT via Cognitive Services vs LLaMA [7B/13B/70B] and am intrigued by the possibility of a truly air-gapped offering not limited by external compute power (nor by metered fees racking up).

Any reads on comparisons would be nice to see.

(yes, I realize we'll eventually run into the same scaling issues w/r/t GPUs)


I did one. I took a few dozen prompts from my ChatGPT history and ran them through a few LLMs.

GPT-4, Bard and Claude 2 came out on top.

Llama 2 70b chat scored similarly to GPT-3.5, though GPT-3.5 still seemed to perform a bit better overall.

My personal takeaway is I’m going to continue using GPT-4 for everything where the cost and response time are workable.

Related: A belief I have is that LLM benchmarks are all too research oriented. That made sense when LLMs were in the lab. It doesn't make sense now that LLMs have tens of millions of DAUs — i.e. ChatGPT. The biggest use cases for LLMs so far are chat assistants and programming assistants. We need benchmarks that are based on the way people use LLMs in chatbots and the type of questions that real users use LLM products, not hypothetical benchmarks and random academic tests.


I don’t know what you mean by “too research oriented.” A common complaint in LLM research is the poor quality of evaluation metrics. There’s no consensus. Everyone wants new benchmarks but designing useful metrics is very much an open problem.


I think he wants to limit evaluations to the most frequent question types seen in the real world.


I think tests like "can this LLM pass an English literature exam it's never seen before" are probably useful, but yeah there's a lot of silly stuff like math tests.

I suppose the question is where are they most commercially viable. I've found them fantastic for creative brainstorming, but that's sort of hard to test and maybe not a huge market.


>> I suppose the question is where are they most commercially viable.

Fair point, though I'm not aiming to start a competing LLM SaaS service; rather, I'm evaluating swapping out the TCO of Azure Cognitive Service OpenAI for the TCO of dedicated cloud compute running my own LLM - to serve my own LLM calls currently being sent to a metered service (Azure Cognitive Service OpenAI).

Evaluation points would be: output quality; meter vs fixed breakeven points; latency; cost of human labor to maintain/upgrade

In most cases, I'd outsource and not think about it. BUT we're currently in some strange economics where the costs are off the charts for some services.


How did you measure the performance?


We (at Anyscale) have benchmarked GPT-4 versus the Llama-2 suite of models on a few problems: functional representation, SQL generation, grade-school math question answering.

GPT-4 wins by a lot out of the box. However, surprisingly, fine-tuning makes a huge difference and allows the 7B Llama-2 model to outperform GPT-4 on some (but not all) problems.

This is really great news for open models as many applications will benefit from smaller, faster, and cheaper fine-tuned models rather than a single large, slow, general-purpose model (Llama-2-7B is something like 2% of the size of GPT-4).

GPT-4 continues to outperform even the fine-tuned 70B model on grade-school math question answering, likely due to the data Llama-2 was trained on (more data for fine-tuning helps here).

https://www.anyscale.com/blog/fine-tuning-llama-2-a-comprehe...


chatgpt is obviously a LOT better, llama doesn't even understand some prompts

and since LLMs aren't even that good to begin with, it's obvious you want the SOTA to do anything useful unless maybe you're finetuning


> and since LLMs aren't even that good to begin with, it's obvious you want the SOTA to do anything useful unless maybe you're finetuning

This is overkill. First of all, ChatGPT isn't even the SOTA, so if you "want SOTA to do anything useful", then this ChatGPT offering would be as useless as LLaMA according to you. Second, there are many individual tasks where even those subpar LLaMA models are useful - even without finetuning.


it's the SOTA for chat (prove me wrong), and you can always use the API directly

even for simple tasks they're less reliable and need more prompt engineering


> it's the SOTA for chat(prove me wrong)

GPT-4 beats ChatGPT on all benchmarks. You can easily google these.


The distinction between GPT-4 and ChatGPT is blurry, as ChatGPT is a chat frontend for a GPT model, and you can use GPT-4 with ChatGPT. The parent probably means ChatGPT with GPT-4.


Typically when people say "ChatGPT" without specifying which specific model they refer to, they refer to gpt-3.5-turbo (in case of API - or in case of the web ui, they mean whatever model is its current web ui equivalent). But now OP says they meant GPT-4, so, sure.


Counterpoint: I don't refer to 3.5 when I say ChatGPT. I pay for ChatGPT, and always use GPT-4. Which I believe every paying customer does.


I tried and got nothing useful. What's the difference between GPT-4 and ChatGPT Plus using GPT-4?


That is why I said FOR CHAT.

Even through the API you can't easily use the regular models for chat; the parsing would be atrocious and there are hundreds of edge cases to handle.

ChatGPT-4 through the API is the SOTA.


openai offers finetuning too. And it's pretty cheap to do considering.



If anyone needs access to the code, you just need to append /forks to the web.archive link above and download from there, i.e. https://web.archive.org/web/20230814150922/https://github.co... (the cache ID updates when you change the URL)


Ugh. Any clue as to why?


I suspect they want to redirect to https://github.com/microsoft/chat-copilot with FluentUI webapp and C# webapi... And the backend stores from qdrant to chroma ... Sequential Planner...


Does anybody know a fork with the last commit (9116afe)?




They removed the Azure templates used to deploy as well, so I created an up to date tutorial on how to deploy the whole thing manually: https://tensorthinker.hashnode.dev/privategpt-a-guide-to-usi...


I can imagine how the conversation went with the enterprise customers: "Where does this send the data our employees enter?" "Same place as if they used the free ChatGPT chat bot..."


No it doesn't. It sends it to an LLM hosted inside the enterprise's own Azure subscription.


Private and secure? I thought the main issue with the privacy and security of (not at all)OpenAI models is that by using their products you agree to let them retain all the data you send to and receive from the models, forever, for whatever they choose to use it for. Or is this just a thing for free use?

If you pay, do you get Ts&Cs that don't contain any wording like this? Still, even if there were no specific "we own everything" statement, there could be a pretty standard statement of "we'll retain data as required for the delivery and improvement of the service", which is essentially the same thing.

So, any company that allows its employees to use ChatGPT for work stuff (writing emails with company secrets etc.) is definitely not engaging in "secure and private" use.

Unless there is very clear data ownership - for example, the customer owns the data going in and going out - I can't see how it can be any different. The problem (not at all)OpenAI has in delivering such a service is that, in contrast to open source models, I'm told there is a lot of "secret sauce" around the model (not just the model itself). Specifically input/output processing, result scoring and so on.


The Azure SLAs state that neither the chats are stored nor used for training in any way. They are private and protected in the same way all the other sensitive data is stored on Azure.

On top, you might argue that Microsoft and Azure are easier to trust than a still rather new AI startup.


I agree with your points. Having said that, Microsoft removed my Azure OpenAI GPT-4 access last week without warning. I was not breaking any TOS. Oh well, pointed back at OpenAi.


Can you expand on this because that's pretty alarming...

What kind of volume were you doing and did you use the API for anything other than your listed use case when applying?


6 x 1000 token calls per day, for a news bot (listed use case at application).

I think what happened is the azure subscription was converted from a (multi year) promotional subsidy/discount to a full pay as you go subscription. No change to sub id. Payment methods OK. Everything else continued working, but openai gpt-4 access stopped the next day.

I’d rather use the Azure version because they promise 12-month sunsets vs OpenAI 6-month sunsets for model versions.


You should contact support and if you're up for it document how that goes.

Azure is mostly better for production: the developer experience is awful and the default filtering is more aggressive, but you get dedicated capacity by default which improves latency (something you need to negotiate with OpenAI's sales team for otherwise)


So what do they train it on then?


> Starting on March 1, 2023, we are making two changes to our data usage and retention policies:

> OpenAI will not use data submitted by customers via our API to train or improve our models, unless you explicitly decide to share your data with us for this purpose. You can opt-in to share data.

> Any data sent through the API will be retained for abuse and misuse monitoring purposes for a maximum of 30 days, after which it will be deleted (unless otherwise required by law).

https://openai.com/policies/api-data-usage-policies


Unless required by law… I wonder what law.


"Unless required by law" is wording required to enable a mechanism called "legal hold". If an authority or lawyer discovers some documents for a case they get to prevent their automatic deletion until that case gets closed. Basically, you don't want to lose evidence if there's a warrant or ongoing lawsuit. I really see no problem with that clause in most ToS documents.

Now, I think you can do shady stuff with that wording as well, but I guess you can also get sued if you kept or used an unreasonable percentage of your data longer than when you promised to delete it.


> Basically, you don't want to lose evidence if there's a warrant or ongoing lawsuit. I really see no problem with that clause in most ToS documents.

Perhaps more nit-pickinlgy specific, they may be compelled by law (the courts or an agency with enforcement capacity) to maintain evidence if there's a warrant or ongoing lawsuit.


> is wording required to enable a mechanism called "legal hold"

I don't think this is accurate. At least in Norway you can't "just not" keep records required by law - any section in a contract in conflict with current law would simply be invalid?

I think the section just clarifies that Microsoft will comply with laws requiring them to keep data (eg the "anti-terror" laws that might require data retention).


> Unless required by law… I wonder what law.

Any law. It just makes explicit that a contract can't supercede laws. Even if it was left out, Microsoft is still subject to laws.


The models like gpt themselves are inherently private and secure. They make predictions based on input.

It's what happens in the interface - that is, your web chat or API call - which differs per implementation. ChatGPT is an implementation that uses that model, and its maker OpenAI wants to keep your history for further training.

But what Azure is doing is taking that model and putting it behind an endpoint specific to your Azure account. Businesses have been interested in GPT and have been asking for private endpoints. Amazon is doing the same with Bedrock.


I'm pretty sure the point of this version is to not export data, hence the name.


This only applies to the API (not ChatGPT); their privacy policy states they will keep your requests for 30 days and not use them for training. You can also apply for zero retention.

https://openai.com/policies/api-data-usage-policies


Privacy and security... in practice, can mean different things.

In HN-space, it is at its most abstract, idealistic, etc. At the practical level this service is aimed at... it might mean compliance, or CYA. Less cynically, it might mean something mundane: MSFT's guarantee, a responsive place to report security issues.


Would it be too much to mention somewhere in the README what this repo actually contains? Just docs? Deployment files? Some application (which does..something)? The model itself?


The repo contains the UI code, not the model or anything else around ChatGPT, it just uses Azure’s ChatGPT API which doesn’t share data with OpenAI.


So basically – what you really need to do to run Azure ChatGPT is go and click some buttons in the Azure portal. This repo is a sample UI that you could possibly use to talk to that instance, but really you will probably always build your own or embed it directly into your products.

So calling the repo "azurechatgpt" is misleading. It should really be "sample-chatgpt-api-frontend" or something of that sort.


Correct. It offers front-end scaffolding for your enterprise ChatGPT app. Uses Next/NextAuth/Tailwind etc. for deployment on Azure App Service, and hooks into Azure Cosmos DB and Azure OpenAI (the actual model).


Yes exactly


Isn’t there also some sort of backend stuff in there? How else would it keep track of history and accept documents.

I don't know enough TypeScript to understand where the front end stops and the backend begins in this code.


Annnd it’s a 404.

Less than a day later. The last article I see linking to it was published this morning.

Not sure what happened here, but “404’s at just-announced permalinks” seems to be on the rise lately.

Don’t turn me into a late-onset pedant. Fine. URIs are permanent forever! For all resources! ;)


It's disappointing. I wonder why they got cold feet. This is one of the reasons why I try to fork projects that I really like. But I didn't get around to this one until it was already made private.


So the public access one isn't private and secure?


The concern is that ChatGPT is training on your chats (by default, you can opt out but you lose chat history last I checked).

So in general enterprises cannot allow internal users to paste private code into ChatGPT, for example.


As an example of this: I found that GPT-4 wouldn't agree with me that C(A) = C(AA^T) until I explained the proof. A few weeks later it would agree in new chats and would explain it using the same proof I had presented, the same way.
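
(For reference, one standard argument for that identity - not necessarily the exact proof given in the chat:)

    % Claim: C(A) = C(AA^T), i.e. the column spaces coincide.
    \[
    \begin{aligned}
    &\text{(i) } C(AA^{T}) \subseteq C(A):\ \text{each column of } AA^{T}\text{ is a linear combination of columns of } A.\\
    &\text{(ii) } AA^{T}x = 0 \;\Rightarrow\; x^{T}AA^{T}x = \lVert A^{T}x\rVert^{2} = 0 \;\Rightarrow\; A^{T}x = 0,\ \text{so } N(AA^{T}) = N(A^{T}).\\
    &\text{(iii) Hence } \operatorname{rank}(AA^{T}) = \operatorname{rank}(A^{T}) = \operatorname{rank}(A);\ \text{with (i) and equal dimensions, } C(AA^{T}) = C(A).
    \end{aligned}
    \]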


I’ve found that the behavior of ChatGPT can vary widely from session to session. The recent information about GPT4 being a “mixture of experts” might also be relevant.

Do we know that it wouldn’t have varied in its answer by just as much, if you had tried in a new session at the same time?


There is randomness even at t=0, there was another HN submission about that


I tested it several times, new chats never got this right at first. I tried at least 6 times. I was experimenting and found that GPT4 couldn't be fooled by faulty proofs. Only a valid proof could change its mind.

Now it seems to know this mathematical property from first prompt though.


This is kinda creepy. But at the same time, how do they do that? I thought the training of these models stopped in September 2021/2022. So how do they do these incremental trainings?


All the public and (leaked) private statements I have seen state that this is not happening. As siblings noted, MoE probably explains this variance.

AIUI they are using current chat data for training GPT-5, not re-finetuning the existing models.


The exact phrase they previously used on the homepage was "Limited knowledge of world and events after 2021" - so maybe as a finetune?


but doesn’t finetuning result in forgetting previous knowledge? it seems that finetuning is most usable to train “structures” not new knowledge. am i missing something?


Kind of implies that OpenAI are lying and using customer input to train their models


Unless you have an NDA with Open AI, you are giving them whatever you put in that prompt.


Also, at some point some users ended up with other users’ chat history [0]. So they’ve proven to be a bit weak on that side.

[0]: https://www.theverge.com/2023/3/21/23649806/chatgpt-chat-his...


> However, ChatGPT risks exposing confidential intellectual property.

I don't remember seeing this disclaimer on the ChatGPT website, gee maybe OpenAI should add this so folks stop using it.


If you use ChatGPT through the app or website they can use the data for training, unless you turn it off. https://help.openai.com/en/articles/5722486-how-your-data-is...


Providing my data for training doesn't imply that it risks being exposed.

If you understand what happens on a technical level, it might be possible, but OpenAI has never said this was a risk by using their product.


Absolutely. For example it doesn't say that OpenAI employees can't look at everything you write.


It's pretty clear in the FAQ to be fair.


The comment you are responding to is sarcastic


I believe it’s implying the free ChatGPT collects data and this one doesn’t.


I thought sama said they don’t use data going through the api for training. Guess we can’t trust that statement


That is correct, they do not use the data going through the API for training, but they do use the data from the web and mobile interfaces (unless you explicitly turn it off).


“We don’t water down your beer”.

Oh nice!

“But that is lager”


Another thing is that using ChatGPT might put European companies in violation of GDPR - Azure OpenAI Services are available on European servers.


No

Edit: yes


I just love this comment.


This seems like such an obvious thing to do.

I see the use of general purpose LLMs like ChatGPT, but smaller fine tuned models will probably end up being more useful for deployed applications in most companies. Off topic, but I was experimenting with LLongMA-2-7b-16K today, running it very inexpensively in the cloud, and given about 12K of context text it really performed well. This is an easy model to deploy. 7B parameter models can be useful.


Is there an easy way to play with these models, as someone who hasn't deployed them? I can download/compile llama.cpp, but I don't know which models to get/where to put them/how to run them, so if someone knows about some automated downloader along with some list of "best models", that would be very helpful.


For Llama, get the 4-bit quantized ones - small models like the 7B one, in the GGML format. Those will run on your local CPU. Google those terms too. You can look on Hugging Face for the actual model to download, then load it and send prompts to it.
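
(Something like this, using the llama-cpp-python bindings; the filename is whichever 4-bit GGML file you downloaded, e.g. a q4_0 quant of Llama-2-7B-Chat from Hugging Face:)

    # pip install llama-cpp-python
    from llama_cpp import Llama
    llm = Llama(model_path="./llama-2-7b-chat.ggmlv3.q4_0.bin", n_ctx=2048)  # runs on CPU
    out = llm("Q: Name the planets in the solar system.\nA:", max_tokens=128, stop=["Q:"])
    print(out["choices"][0]["text"])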


Thanks, maybe it's as easy as downloading the ggml and running it with Llama.cpp. I'll try that, thanks!


There is also a Python wrapper for llama.cpp with a web UI built in, if that wasn't easy enough already.


If you want to try out the Llama-2 models (7B, 13B, 70B), you can get started very easily with Anyscale Endpoints (~2 min). https://app.endpoints.anyscale.com/


I usually run them on Google Colab, and occasionally a GPU VPS on Lambda Labs. Hugging Face model card documentation usually has a complete Python example script for loading and running a model.


I'm a little confused by how the relationship works between OpenAI and Microsoft. It is possible for anyone to register for an OpenAI account and use their APIs. Within Azure the same thing is much more difficult as it is necessary to be a "real" business in order to use it. I maintain an open source OpenAI library and would like to add support for Azure but can't because of this restriction. Why can't I just use my regular Azure account?


Microsoft owns enough of OpenAI that their endgame goal of putting GPT like features into Azure and Office365 for enterprise customers is what we’re likely to see happen.

OpenAI will likely target private consumers while Microsoft focuses on enterprise. I can use my own organisation as an example. We're an investment bank that does green energy within the EU. We would absolutely use GPT if it was legal, but it isn't, and it likely never will be considering their finance model is partly to steal as much data as they can. Even if it's not so polite to say that. This is where Microsoft comes into the picture. In non-tech enterprise you're buying Microsoft products because everyone wants Windows, Outlook and Office. We can wish it wasn't like that, but where is the realistic alternative? I'm not anti-Microsoft by the way; in all my decades in the enterprise business they've easily been the best and most consistent business partner for any IT. When Amazon saw how much money there was on the operations side of EU enterprise they quickly caught up, but Amazon doesn't sell an Office365 product. So anyway, once you have Office365, you're also likely to use Teams as your communications platform (which is why there is an anti-trust case against it), Sharepoint as your document platform, and, well, Azure as your cloud platform. Except you might use AWS because Amazon is also great. In some ways they are even more compliant with EU legislation than Microsoft.

But if Microsoft can throw GPT products into Azure the same way they put Teams and Sharepoint into Office365… well, then where is their competition? And having GPT features within Office365 will only further their advantage on the office platform. I mean, there are companies which won’t use Outlook, but there won’t be many left once ChatGPT writes your e-mails.

So this isn’t necessarily for you. It’s just part of Microsoft’s overall strategy for total IT domination in enterprise. I mean, we’re going into RPA (robotic process automation), a journey I went through in another enterprise organisation a few years back. Back then you had to consider what to buy: would it be BluePrism, UiPath, Automation Anywhere, something else? Today there is no competition to Microsoft’s Power Automate if you’re already a Microsoft customer. It’s literally $500 a month vs $50k a month… I mean… that’s the future for GPT on Azure.

It’s probably necessary too. Their prices have made a lot of organisations look outside of Azure, toward places like Hetzner or even self-hosting, but if Azure comes with GPT… well then.


OpenAI APIs have pretty much as clear a contract as you can get with a third party.

> Starting on March 1, 2023, we are making two changes to our data usage and retention policies:

> OpenAI will not use data submitted by customers via our API to train or improve our models, unless you explicitly decide to share your data with us for this purpose. You can opt-in to share data.

> Any data sent through the API will be retained for abuse and misuse monitoring purposes for a maximum of 30 days, after which it will be deleted (unless otherwise required by law).

https://openai.com/policies/api-data-usage-policies


It would still be illegal to use it, but you're right that I shouldn't have been so conspiratorial.


>> We would absolutely use GPT if it was legal,

Can't you just use the Azure service now?


Interesting release, though it's still lacking a few features I've had to resort to building myself, such as code summary, code base architecture summary, and conversation history summary. ChatGPT (the web UI) now has the ability to execute code and make function callbacks, but I prefer running that code locally, especially if I am debugging. This latter part, conversation history summary, is something the ChatGPT web UI does reasonably well given a long history, but sentiment extraction and salient-detail extraction before summarizing are immensely useful for remembering details in the distant past. I've been building on top of the GPT-4 model and tinkering with multi-model (GPT-4 + Davinci) usage too, though I am finding with the MoE that Davinci isn't as important. Fine-tuning has been helpful for specific code bases too.

If I had the time I'd like to play with an MoE of Llama2, as a compare and contrast, but that ain't gonna happen anytime soon.
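
(Not the parent's actual implementation, but the rolling conversation-summary idea reads roughly like the sketch below, using the 0.x openai Python client; the prompt wording and turn counts are illustrative.)

    import openai  # assumes OPENAI_API_KEY is set in the environment

    def compress_history(messages, keep_last=6):
        """Summarize older turns so long conversations stay within the context window."""
        if len(messages) <= keep_last:
            return messages
        older, recent = messages[:-keep_last], messages[-keep_last:]
        transcript = "\n".join(f"{m['role']}: {m['content']}" for m in older)
        summary = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{
                "role": "user",
                "content": "Summarize this conversation, preserving names, numbers, "
                           "decisions and other salient details:\n\n" + transcript,
            }],
        )["choices"][0]["message"]["content"]
        # Replace the old turns with a single system message carrying the summary.
        return [{"role": "system", "content": "Earlier conversation summary: " + summary}] + recent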


This is a neat project from Microsoft

I've been building https://gasbyai.com, a beautiful chat UI that supports self-hosting, ChatGPT plugins, and extracting content from PDFs/URLs. GasbyAI supports Azure, OpenAI, and custom API endpoints in case you want to run your own models.


Pretty sure Azure has a moderation endpoint enabled by default that makes using the OpenAI API an awful experience.


We've had this at IKEA for a while now. Not impressed, but the hallucinations are funny to read.


I'd expect a company like IKEA to have the expertise to create interfaces specific to their workflows so hallucinations aren't an issue.

Imo if you're making an open ended chat interface for a business, you're doing it wrong.


Have you considered instruction-tuning it with text, instead of just pictures?


I'm not surprised Azure would add something like this to the stack. We built AnythingLLM (https://github.com/Mintplex-Labs/anything-llm) back in June because some enterprise customers wanted something isolated they could run on premises with Azure OpenAI support + any vector DB they want.

With Azure's move to try to internalize any enterprise integration for AI, it makes sense to make a chatbot wrapper because it's a no-moat move. I think a lot of the "moat", if one can exist in the "chat with your docs" vertical, is just integrations into the flows and data sources SMBs/enterprises are already using.

For businesses, in my experience, the on-prem question has been the first decision point, without question. An Azure wrapper could be nice to have for those who can't use ChatGPT on their work computer but have access to this instead.

I wonder what kind of hypervisor view it gives to Azure admins for those who use it, if any. Multi-tenant instances were the second-highest demand from SMB/enterprise customers for AnythingLLM.


Yeah sure, I totally trust you after the Storm-0558 disaster.


Darn I just spent a week or so working on a ChatGPT clone that used Azure ChatGPT API due to the privacy aspect. Wasted effort I guess.


This is exactly the same


Welcome to the club :)


I'm also in this club but we wrote it months ago.


Could anyone explain how this can be construed as a private solution?

I'm not familiar with Azure platform.

Is the inference processed on a private instance? I can't imagine how that could be feasible given the hardware required to run GPT-3.5/4.

So the best case scenario is:

1. A web UI runs on private instances, so any user input (chat or files) is only seen by these instances.

2. Any chat history storage or RAG is also done on these instances.

3. Embeddings computation may possibly be done on the private instance.

4. The embeddings are then sent to the Microsoft GPU farm for inference.

So at one point my data has to leave my private network.

The problem is that the data can easily be reverse-engineered from the embeddings.

How can this be presented as a private LLM ?


Interesting. One of the most requested features for my small native apps[0][1] was support for the Azure OpenAI service.

Apparently, many organizations have their own Azure OpenAI deployment and won’t let their employees use the public OpenAI service.

My understanding is that Azure makes sure all network traffic stays isolated within their network, so these organizations have more control over how their people use ChatGPT.

I created a super simple step-by-step guide on how to obtain an Azure OpenAI endpoint & key here:

https://pdfpals.com/help/how-to-generate-azure-openai-api-ke...

Hope it's useful to someone just getting started with Azure (a minimal call sketch follows after the links below).

[0]: https://boltai.com

[1]: https://pdfpals.com
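
For reference, once you have an endpoint and key, a minimal sketch of calling an Azure OpenAI deployment with the 0.x openai Python client looks roughly like this (the endpoint, key, API version, and deployment name below are placeholders):

    import openai

    # Azure OpenAI routes requests by resource endpoint + deployment name, not by model name.
    openai.api_type = "azure"
    openai.api_base = "https://YOUR-RESOURCE-NAME.openai.azure.com/"
    openai.api_version = "2023-05-15"  # placeholder; use the version your resource supports
    openai.api_key = "YOUR-AZURE-OPENAI-KEY"

    response = openai.ChatCompletion.create(
        engine="my-gpt-35-turbo-deployment",  # the deployment name you created in Azure
        messages=[{"role": "user", "content": "Hello from Azure OpenAI"}],
    )
    print(response["choices"][0]["message"]["content"])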


What's the practical difference between this and OpenAI API?

All I can see is the same product but offered by a larger organization. I.e. they're more likely to get the security details right, and you can potentially win more in a lawsuit should things go bad.


Compliance and customer trust. Azure can sign a BAA, for example. If you are building LLM capability on top of your SaaS, your customers want assurances about their data.


A few months ago my team moved to Azure for capacity reasons. We were constantly dealing with 429 errors and couldn't get in touch with OpenAI, while Azure offered more instances.

Eventually we got more capacity from OpenAI, so we load-balance across both. The only difference is that the 3.5 Turbo model on Azure is outdated.


You can ask for GPT-4; it took a while due to capacity constraints, but we got it.


The linked GitHub repo was active yesterday and is now returning a 404.

Anybody know why?


How is this different from the other OpenAI GUI? Why another one by Microsoft? https://github.com/microsoft/sample-app-aoai-chatGPT.


I bet there are plenty of OKR/KPIs now tied to AI at Microsoft.


There are at least two more. There's also https://github.com/Azure-Samples/azure-search-openai-demo

And you can deploy a chatbot from within the Azure playground, which runs on another codebase.


Bigger companies are cautious about using GPT-style products due to data security concerns. But most big companies trust Microsoft more or less blindly.

Now that Microsoft has an official "enterprise" version out, the floodgates are open. They stand to make a killing.


This is an internal ChatGPT, whereas that sample is ChatGPT constrained to internal search results (using RAG approach). Source: I help maintain the RAG samples.


I'm pretty sure it's a part of it.


We just have to trust them and take their word for it? Or what?

https://azure.microsoft.com/en-us/explore/trusted-cloud/priv...

https://azure.microsoft.com/en-us/blog/3-reasons-why-azure-s...

I guess I would trust them, since they're big and they make these promises and other big companies use them.


This is awesome to see, feels heavily inspired (in a good way) by the version we made at Vercel[1]. Same tech stack: Next.js, NextAuth, Tailwind, Shadcn UI, Vercel AI SDK, etc.

I'd expect this trend of managed ChatGPT clones to continue. You can own the stack end to end, and even swap out OpenAI for a different LLM (or your own model trained on internal company data) fairly easily.

[1]: https://vercel.com/templates/next.js/nextjs-ai-chatbot


Really, they use Vercel AI SDK?


Well, they did. The repo appears to have been deleted.


Is this a full, standalone deployment including GPT-3 (or whatever version) or just a secured frontend that sends data to GPT hosted outside the enterprise zone?

Edit: Uses Azure OpenAI as the backend


I'm confused. If this is just a front-end for the OpenAI API then how does it remove the data privacy concern? Your data still ends up with Azure/OpenAI, right? It doesn't stay localized to your instance; it's not your GPU running the transformations. You have no way of knowing whether your data is being used to train models. If customer data is sensitive, I'm pretty sure running a 70B llama (or similar) on a bunch of A100s is the only way?


Azure is hosting and operating the service themselves, rather than OpenAI, with all the security requirements that come with that. I assume this comes with different data and access restrictions as well, and the ability to run in secured instances (with nothing sent to OpenAI the company).

Most companies already use the cloud for their data, processing, etc. and aren't running anything major locally, let alone ML models; this is putting trust in the cloud they already use.


Ah, that's fair. But it is my impression that the bulk of privacy/confidentiality concerns (e.g. law/health/..) would require "end-to-end" data safety. Not sure if I'm making sense. I guess Microsoft is somehow more trustworthy than OpenAI themselves...

EDIT: what you say about existing cloud customers being able to extend their trust to this new thing makes sense, thanks.


Right. If I were a European company worried about, say, industrial espionage, this wouldn't be nearly enough to reassure me.


Yes, this was my understanding.


Link is 404 now. Anyone fork it before it went 404?



This is not ChatGPT. It's just a front end for the Azure OpenAI APIs. Not sure why they so blatantly use the trademark. They will probably have to rename it soon.


Microsoft is a major investor in OpenAI. Guaranteed they worked with OpenAI on this and have partnership to use the trademarks.


Microsoft owns OpenAI so I doubt that they will be asked to rename this.


They'll only own 49% of shares.


We wrote a blog post about why companies do this: https://www.lamini.ai/blog/specialize-llms-to-private-data-d...

Here are a few:

Data privacy

Ownership of IP

Control over ops

The table in the blog lists the top 10 reasons why companies do this based on about 50 customer interviews.


It was really good when access was enabled via OpenAI, but ever since it moved to an Azure subscription, getting preview access has stalled. It wouldn't be a big deal for others, but for small-time devs like me it becomes a big challenge. Hope OpenAI provides a developer environment or similar where we can try things out.


Nothing in the repo details how this addresses privacy concerns of running inference on someone else's LLM. To be isolated from other users of the service is not the same thing as having a private inference engine.

> Private: Built-in guarantees around the privacy of your data and fully isolated from those operated by OpenAI.

Do tell.


So where do you draw the line? No cloud instances, no cloud SQL like Snowflake, no Teams or Office 365, no S3/blob storage? Run everything on-prem like 10 years ago?

It's only going to get more impossible. All that VC money going in at 100x revenue needs a return, and they aren't going to leave money on the table with full-featured open-source or CentOS-type alternatives.

For all those data-engineering startups and database providers offering 'open source' + cloud hosting, the 'open source' is going to be just 'open' enough to claim there is some fallback for someone else to pick up the mantle using the community version if the cloud version gets enshittified beyond reason.

You're not going to even be able to run the full-featured software version on-prem because the economics of cloud are so much better.

Unless you are writing and compiling your own code you are going to be out of luck if your privacy standard is that high. That war has been lost. And Web3 sure ain't gonna save you either.


They should clearly spell out what is and is not "private". As it is, we simply have a blurb about some undefined guarantees. And some comments here in the thread saying "this is as close as you're going to get to local GPT" are deeply wrong. But then there is easy VC money (just like with ads...) and certain "clever" geeks throw social responsibility out the window as usual and are pushing all sorts of deeply invasive applications ("let our proxy for Microsoft hoover up your inbox!") based on these undefined "privacy guarantees".

If we accept this just as we accepted the very flawed solutions we were given by corporations regarding social networking and ads, we are going to be stuck with it, suffer the consequences, and there will be no incentive to develop alternatives that actually address the issues and work.

Homomorphic Encryption works. It just doesn't work very efficiently right now but that is an intellectual problem that can be solved if we push for actual privacy for this critical technology as it will be fully enmeshed in all parts of our lives.

"Think of the children" if that helps.

