Crucially, I want to understand the license that applies to the search results. Can I store them, can I re-publish them? Different providers have different rules about this.
We work with search providers and ensure that we have zero data retention policies in place.
The search results are yours to own and use. You are free to do what you want with them. Of course, you are bound by the local laws of the legal jurisdiction you are in.
OK, so it looks like you aren't willing to share which providers you are working with. Can you share the rationale for not sharing that information instead?
We have relationships with many providers and I don't want to be seen as promoting or not promoting a specific provider. Some decent privacy-preserving vendors: Brave, Exa, Parallel Web Systems, DuckDuckGo, etc.
We will continue to monitor what works in order to improve output quality and results. Sometimes a combination of providers could yield even better results. If I name one combination right now, then realize another combination is better and make changes, I wouldn't need to broadcast it each time or risk misrepresenting the feature, which is to have amazing search and research capabilities that can augment models for a superior output.
Exa: https://exa.ai/assets/Exa_Labs_Terms_of_Service.pdf "You may not [...] download, modify, copy, distribute, transmit, display, perform, reproduce, duplicate, publish, license, create derivative works from, or offer for sale any information contained on, or obtained from or through, the Services, except for temporary files that are automatically cached by your web browser for display purposes"
Many of the things I want to do with a search API are blocked by these rules! So I need to know which rules I am subject to.
IANAL, but if Ollama says "you can do with the results whatever you want", then they would be the ones liable for any breach of TOS.
That's admittedly a pretty foolish behaviour on their part and doesn't instill trust in Ollama as a service provider, but you as the end-user should be in the clear.
It's pretty wild that Brave's terms of service state as much, considering their search API is entirely derived from storing the results of other search systems. https://support.brave.app/hc/en-us/articles/4409406835469-Wh.... Aka Brave is blocking exactly what it does to Bing and Google.
My nightmare scenario is that I build my own crucial database of information partially derived from a search API... and then later get into legal trouble which forces me to delete that data, which is now intermingled with other information I've collected.
Yes - it's important to me that I understand the source of the data I've collected and if that source results in restrictions on what I can do with that data.
Especially when I'm building databases that I want other organizations to be able to use.
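One way to guard against that intermingling problem is to record provenance on every row, so that anything derived from a given source can be found and deleted later. Here is a minimal sketch of the idea using Python's built-in sqlite3 module. The table layout, the function names, and the source labels such as "search_api_x" are all hypothetical, made up for illustration; they don't correspond to any real provider or to anything Ollama does.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store where every row records where it came from."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts ("
        " id INTEGER PRIMARY KEY,"
        " content TEXT NOT NULL,"
        " source TEXT NOT NULL,"   # hypothetical label for the upstream source
        " license_note TEXT)"      # free-text reminder of the applicable terms
    )
    return conn


def add_fact(conn, content, source, license_note=None):
    """Insert one record together with its provenance."""
    conn.execute(
        "INSERT INTO facts (content, source, license_note) VALUES (?, ?, ?)",
        (content, source, license_note),
    )
    conn.commit()


def purge_source(conn, source):
    """Delete everything that came from one source; return the number of rows removed."""
    cur = conn.execute("DELETE FROM facts WHERE source = ?", (source,))
    conn.commit()
    return cur.rowcount


def count_facts(conn):
    """Return how many records are currently stored."""
    return conn.execute("SELECT COUNT(*) FROM facts").fetchone()[0]
```

With this in place, being forced to drop one provider's data becomes a single `purge_source(conn, "search_api_x")` call instead of an audit of the whole database. It only helps if the source is recorded at insert time, and it does not cover data that has already been merged or summarized into other rows.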
Fun fact: many geocoding APIs have restrictions on what you can do with the data you get back from that geocoder - including how long you can store it and whether you are allowed to re-syndicate to other people. That's one of the reasons I like OpenCage: https://opencagedata.com/guides/how-to-compare-and-test-geoc...
This information is very useful to the open source community. What's the rationale for not "building in public"? Is Ollama turning its back on the open source community? Also, why should we believe Ollama web search is better than my locally run SearXNG server?
Oh yes! That is why I want to provide the names of the providers we use. I do believe in building in the open. The web search functionality has a very generous free tier (it is behind Ollama's free account to prevent abuse), so you can give it a try and compare it to running a SearXNG server locally.
On running the search functionality locally -- we considered it and gave it a try, but had trouble with result quality and with websites blocking Ollama for running a crawler. Using a hosted API, we can get results for users much faster. I'd want us to revisit this at some point. I believe in having the power of local.
I'd be curious about the legal position with respect to the EU AI Act, which killed the Bing API (Microsoft switched to Grounding with Bing Search, which rephrases the content).
Yes, ephemeral queries must not retain any data, but there are also other rules; for instance, it is forbidden for commercial services (and Ollama has a pricing model?).
I don't know where I stand on the issue but it's interesting Facebook has been known to block PB links while Google seemed to refuse requests to do the same
Instead of turning this into an academic debate about copyright, a more practical thing to do is to examine the terms and conditions of whatever API you are using. Because if you are going to end up in a conflict with a search API provider, those probably spell out pretty clearly what the provider wants to allow or not and what you are agreeing to by using their API.
Caching is a problem with many geocoding APIs (which I happen to be familiar with) and a good reason to prefer e.g. OpenCage over the Google or Here geocoders: unlike most geocoder terms and conditions, OpenCage actually encourages you to cache and store things, because it's all open data. The Here geocoder requires you to tell them how much data you store and will try to charge you extra for the privilege of storing and keeping data around, because it's their data and the conditions under which they license it to you limit what you can and cannot do. Search APIs are very similar. Technically, geocoding is a form of search (given a query, return a list of stuff).
It is strange to launch this type of functionality with not even a privacy policy in place.
It makes me wonder if they’ve partnered with another of their VC’s peers who’s recently had a cash injection, and they’re being used as a design partner/customer story.
Exa would be my bet. YC backed them early, and they’ve also just closed an $85M Series B. Bing would be too expensive to run freely without a Microsoft partnership.
Get on that privacy notice soon, Ollama. You’re HQ’d in CA, you’re definitely subject to CCPA. (You don’t need revenue to be subject to this, just being a data controller for 50,000 Californian residents is enough.)