B2B SaaS in the AI era is impossible?
15 points by nicelir1996 8 months ago | 21 comments
The ultimate application for an enterprise inevitably involves access to all the internal data for the AI to do its magic.

However, companies are extremely afraid of having any data leaving their company to be sent to some AI APIs or stored in some other databases, which makes almost any sale impossible. Which is stupid because they already have all their data, like Confluence, Notion, Slack, Zendesk, etc., on external servers.

How can a startup convince B2B clients to use an AI product as SaaS and not a weird self-hosted/self-deployed monstrosity that looks more like consulting than SaaS?




In my experience, as a small business, it's almost impossible to convince any large company to use your product no matter what it is or how good it is; it doesn't matter if it's self-hosted, SaaS, enterprise or open source...

The problem is that you're a small shrimp and nobody who has any power in the big company knows who you are, they don't trust you. Not to mention that big company directors and senior employees are often best buds with their current SaaS software providers and so there will always be insiders who will object whenever someone proposes to switch providers to your solution because they don't want to hurt their buddies... And they love to keep that revolving door open for themselves so that they can switch jobs to the SaaS provider's company and get an even bigger salary than they're getting now (as thanks for helping the provider to keep the contract all these years by repeatedly pushing for platform lock-in, even though it was against the previous company's interest).

Also, most big tech companies have enough money coming in from their market monopoly that they really don't care about optimizing stuff. Nobody cares that your solution is 10x simpler and 10x more efficient in terms of computational resources. It's a marginal cost to them.


We got this question from a prospective customer literally yesterday.

As I understand it, when using the Azure OpenAI Service under an enterprise plan, the data is contractually siloed and not used for retraining the general model.


Consultant here, all of our clients that do something with GenAI are on Azure for exactly that reason. Also, Microsoft has European data centers, sending stuff to the US is a big no-no for pretty much all of our clients.


Contractual obligations mean nothing to hackers, nor do they withstand the pressure on future CEOs who will eventually have to find a way to make the next quarter look good (many T&Cs have a clause that says “we can change this at any time without notice”).


If we are assuming enterprise contractual obligations don't matter, there are much richer locations from which to harvest information -- Teams, SharePoint, Outlook, etc.


Why won't this follow the same pattern as with the 3rd-party providers? You make a good point that they already have their data on Notion/Slack/Zendesk etc., where they have no control over the databases (just promises embedded in a Privacy Policy).

Maybe the worry is that a model open to consumers will be trained on their data, and that data might then leak out to anyone with an API key?


Are they worried about you (the startup) or about OpenAI?

If it's the former, there's nothing new, startups have fought this battle for a while.

If the reluctance is around OpenAI, there are several options: Azure OpenAI Service is OK for most companies that are on Azure. And if not, then you can use open source models, which you can fine-tune for the specific task.


For the leadership of large companies it is a matter of (sometimes probably mistaken) trust. They trust Microsoft, Amazon, ... enough to use their cloud services. In general, such trust does not exist with small startups. I do not think that is specific to the AI era, though.


Trust is one thing but it is also a matter of legal and fiduciary responsibility.

They simply *are not able* to do it for legal reasons. When a cloud provider contract is signed it is chock full of liability and CYA language to make sure that if the provider makes a boo-boo and the data gets exposed/lost/damaged/etc. they will get hauled into court and driven bankrupt.

The reason for this is often the law - e.g. something like payroll or personal information (e.g. HR) is heavily, heavily regulated by law and exposure or loss of such data would get the company in hot water with the regulator. So any service provider must guarantee that this won't happen.

The other reason are contractual relationship with their customers - customer information is heavily NDAed, you can't just share it with whoever you want or even upload it into any sort of cloud. E.g. we have at work restrictions even on things like which geographical location the data can be uploaded to and which not, even on the same cloud provider! The customers don't want to take any chances that e.g. some government will want to "take a peek" - and then e.g. share the information with their competitor. This did happen before.

Good luck to a startup with no name and that can be gone tomorrow with this. Simply isn't going to happen.


> The other reason are contractual relationship with their customers - customer information is heavily NDAed, you can't just share it with whoever you want or even upload it into any sort of cloud. E.g. we have at work restrictions even on things like which geographical location the data can be uploaded to and which not, even on the same cloud provider! The customers don't want to take any chances that e.g. some government will want to "take a peek" - and then e.g. share the information with their competitor. This did happen before.

So how does this work then with MS Office 365 (uploading for example all emails to MS servers will also have some customer data exposed) or running your ERP in a cloud (like all ERP providers are preaching to their customers)? If clauses in contracts are enough to keep you out of legal trouble the same should be true independently of the size of the provider.


You have to be extremely careful and conscious about what you are able to upload and what you are not.

E-mails are generally OK as long as they don't contain customer info (most don't). The moment you get a data file from the customer (e.g. a CAD drawing, spreadsheet, some analysis, etc.) that is protected by a restrictive NDA, you must not even upload it to the corporate OneDrive to provide it to a colleague. It has to go through an on-premises server - or you have to send someone with an (encrypted) USB stick. And access is strictly controlled, only people who need it will have it.

This stuff is taken extremely seriously - a security breach leaking customer data because someone carelessly uploaded a confidential file where they shouldn't have could cost you millions in both lost customers and huge lawsuits.

>If clauses in contracts are enough to keep you out of legal trouble the same should be true independently of the size of the provider.

The problem is that you are not only after the clause in the contract to "keep you out of legal trouble" and suing the provider for money should anything go wrong. You actually want to be 99.9999% sure that nothing bad happens in the first place, that you have almost 100% uptime and the data are safe, not only to legally cover your backside.

Why? Because your own customers (or government) would haul your arse to court otherwise. No amount of compensation you get out of the service provider will fix it should it come to that. You could easily go bankrupt or to prison here.

Small startups have no chance to compete in this area with giants like Microsoft or Amazon. E.g. we are in the EU and are legally forbidden from storing some data in US hosted servers. So both Microsoft and Amazon (and also Google) have complied and have EU datacenters for this reason - and you can explicitly specify (both technically and contractually) which instances your e-mail or files are allowed to be stored in and which ones not.
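To make the "which instances are allowed to store which data" idea concrete, here is a rough sketch of a deny-by-default residency check. The data classes, region names and the `may_store` helper are all made up for illustration; this is not any cloud provider's actual API, and a real setup would enforce this in the provider's own policy tooling and in the contract.

```python
# Hypothetical data classes mapped to the regions where they may be stored.
# Region names are illustrative placeholders, not real provider identifiers.
ALLOWED_REGIONS = {
    "general": {"eu-west", "eu-central", "us-east"},
    "hr": {"eu-west", "eu-central"},   # regulated data stays in the EU
    "customer-nda": set(),             # on-premises only, no cloud region allowed
}


def may_store(data_class: str, region: str) -> bool:
    """Return True only if this data class is explicitly allowed in the region."""
    # Unknown data classes get an empty allow-list, so they are denied by default.
    return region in ALLOWED_REGIONS.get(data_class, set())
```

Something like `may_store("hr", "us-east")` would then be rejected before an upload ever happens, while `may_store("hr", "eu-west")` passes. The point of the deny-by-default lookup is that anything nobody has classified yet is treated as not uploadable.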

A US startup with no infrastructure of its own, only renting servers/compute from e.g. Amazon? How exactly are they going to ensure this, regardless of what is in the contract with me, when I have no influence on how they structure their own contracts with their cloud providers?

Esp. when that company is here today but may not be tomorrow - goes bust, gets bought out by a competitor, etc. No amount of legalese will protect you here when you have nobody left to sue.

Unfortunately in this area the deck is stacked against startups and small companies sky-high, even if you aren't trying to convince them to give you all their confidential data and only trying to sell much more pedestrian services - e.g. payment services or something like payroll management.


That is the theory - only in my experience the reality is quite different. Employees do not bother to classify incoming emails as confidential or internal. So everything gets sent to the cloud.

Also, having servers in the EU helps to fulfill some regulations, but it certainly cannot prevent your customer data from getting exposed.


You are missing the point. Sensitive customer data never touch e-mail, period. So there is nothing to "send to the cloud".

That's a matter of organizing the work and training your staff, along with appropriate technical measures where required. Not some hypothetical theory or matter of whether someone bothers to classify something or not.

If it is confidential to the degree that it cannot leave company premises then e-mail is automatically taboo and the file never gets transferred anywhere by mail (or any sort of cloud service).

Moreover, access is controlled and only the people who need it get it - along with a strictly worded notice that it is under such and such NDA and must not leave the premises or be shared/uploaded outside of the company.

If the company you work for doesn't do it like this, then they either don't deal with such sensitive materials or they don't care - and then they will end up in court or bankrupt (or both) sooner rather than later.

Having servers in the EU is not about "not getting data exposed" or only a matter of some sort of regulatory compliance (even though that is important too - e.g. whenever GDPR is involved). Without that minimal guarantee that e.g. NSA or Boeing or Ford or some other major US competitor of some of our customers won't get to see their information (happened before, industrial espionage at a state level is a thing) they wouldn't even talk to us about sharing data with us. On top of that data is obviously encrypted too.

Data/IP protection is a process, there is no single magic thing that you do and are set. Leaks, whether intentional or not can and will happen - e.g. it is difficult to 100% prevent a disgruntled employee from walking out of the door with some sensitive files on a USB stick or a phone even if you institute a completely draconian regime at the workplace (which is counterproductive).

It is about having a process in place to mitigate against and minimize both such occurrences and their impact.


It's the common "your data will be used to improve the model that we also offer for sale to your competitors" clause. It's difficult for business people to get on board with the idea that their internal data might benefit competitors.


We're building B2B SaaS that has AI at its core. Some of the customers asked exactly this ("we want your thing to be in house, so that our data never leaves our infra").

The solution is fairly straightforward - prepare custom models that can be deployed on customer side. It is not even something that is hard to achieve with all the great open source models out there. Yes, it's going to be costly, but it can be done.


Is this concern hypothetical or something you actually had in (potential) customer conversations?


Interesting question. I guess it depends on the jurisdiction but also culture.

Given that OpenAI is SOC 2 compliant, etc., it might not be such a big deal: https://trust.openai.com/


> Which is stupid because they already have all their data, like …, on external servers.

No, the companies that actually care about their data do not. This is just the typical SV perspective where everyone imagines the cloud is the only thing that exists.


Aren't there dedicated certifications for that? Company maturity model, threat model, data handling procedures verified by a trusted third party?


>to do its magic

...which is what, exactly?


Maybe build products that customers actually want and not seek what problem you can solve with whatever hype bandwagon you got yourself onto?

Otherwise it is trying to convince the customer that they are idiots and didn't drink enough of the hype Kool-Aid. Not a good start for building a business and a customer relationship, IMO.

>The ultimate application for an enterprise inevitably involves access to all the internal data for the AI to do its magic

That's complete nonsense in more ways than one. It is the same hype BS as "web3", "blockchain" and other fads people tried to sell before. It completely ignores what the technology actually can and cannot do and what its sensible applications are. And it also totally ignores what the customers actually need to get work done.

Feeding all internal data to a 3rd-party AI (presumably LLM) provider is a complete non-starter, no matter what kind of resulting pie in the sky handwavy magic you will try to sell to anyone.

Would you feed your employee data there? Your payroll? Your customer information? With the unknown risks that it could be regurgitated to your competitors, made public or errors introduced due to hallucinations?

Probably not, right? Why do you think other companies should be convinced to do this?

And that doesn't touch legal restrictions, with laws outright prohibiting the company from doing this in many cases - e.g. employment data are highly confidential and heavily protected/regulated in Europe. The same applies to confidential customer data that is typically heavily NDAed.

>Which is stupid because they already have all their data, like Confluence, Notion, Slack, Zendesk, etc., on external servers.

There is a big difference between having e.g. HR data in the cloud and having something like developer chat/ops or customer support there. Ask your lawyer, they will explain it to you right away.

In addition, all these cloud contracts come with heavy heavy liability language where the cloud provider guarantees security and confidentiality - or they get their asses hauled to court immediately should anything go wrong. Are you able to do that as a startup? Likely not.

Sensible LLM applications are highly customized on-premises systems for querying locally generated and locally available information. If you can't build or sell that - well, sucks to be you?

Or look for other, more meaningful, applications of AI - classification problems, defect detection, outlier detection, language translations, speech recognition, etc. That is what neural networks are actually good at and what can be sensibly deployed.

Obviously, all the above applies if you are trying to build a business that will actually deliver some value to the customers and not only to the shareholders/founders/VCs financing until the company is sold off. If you are trying to do the latter then investors love meaningless buzzword salads and the latest hype.



