Hacker News

I'd love to know what search engine provider they're using under the hood for this. I asked them on Twitter and didn't get a reply (yet) https://twitter.com/simonw/status/1971210260015919488

Crucially, I want to understand the license that applies to the search results. Can I store them, can I re-publish them? Different providers have different rules about this.





We work with search providers and ensure that we have zero data retention policies in place.

The search results are yours to own and use. You are free to do what you want with them. Of course, you are bound by the laws of your jurisdiction.


OK, so it looks like you aren't willing to share which providers you are working with. Can you share the rationale for not sharing that information instead?

We have relationships with many providers, and I don't want to be seen as promoting, or not promoting, a specific provider. Some decent privacy-preserving vendors: Brave, Exa, Parallel Web Systems, DuckDuckGo, etc.

We will continue to monitor what works well to improve output quality and results. Sometimes a combination of providers yields even better results. If I name one combination right now, then realize another combination is better and make changes, I don't want to have to broadcast it each time or risk misrepresenting the feature, which is to have amazing search and research capabilities that can augment models for superior output.


The reason I care about this is that different providers have different rules about how I can use the results.

Brave: https://api-dashboard.search.brave.com/terms-of-service "Licensee shall not at any time, and shall not permit others to: store the results of the API or any derivative works from the results of the API"

Exa: https://exa.ai/assets/Exa_Labs_Terms_of_Service.pdf "You may not [...] download, modify, copy, distribute, transmit, display, perform, reproduce, duplicate, publish, license, create derivative works from, or offer for sale any information contained on, or obtained from or through, the Services, except for temporary files that are automatically cached by your web browser for display purposes"

Many of the things I want to do with a search API are blocked by these rules! So I need to know which rules I am subject to.


IANAL, but if Ollama says "you can do with the results whatever you want", then they would be the ones liable for any breach of TOS.

That's admittedly a pretty foolish behaviour on their part and doesn't instill trust in Ollama as a service provider, but you as the end-user should be in the clear.


It's pretty wild that Brave's terms of service state as much, considering their search API is entirely derived from storing the results of other search systems. https://support.brave.app/hc/en-us/articles/4409406835469-Wh.... In other words, Brave forbids exactly what it does to Bing and Google.

(IANAL) You can normally safely ignore such things.

My nightmare scenario is that I build my own crucial database of information partially derived from a search API... and then later get into legal trouble which forces me to delete that data, which is now intermingled with other information I've collected.

So now we don't just have data, but data-obtained-by-a-particular-process? If you have a database, should it matter how it was gathered?

Yes - it's important to me that I understand the source of the data I've collected and if that source results in restrictions on what I can do with that data.

Especially when I'm building databases that I want other organizations to be able to use.

Fun fact: many geocoding APIs have restrictions on what you can do with the data you get back from that geocoder - including how long you can store it and whether you are allowed to re-syndicate to other people. That's one of the reasons I like OpenCage: https://opencagedata.com/guides/how-to-compare-and-test-geoc...


I agree with you in spirit, but that’s not an answer you can apply when there’s someone else’s money at stake.

This information is very useful to the open source community. What's the rationale for not "building in public"? Is Ollama turning its back on the open source community? Also, why should we believe Ollama web search is better than my locally run SearXNG server?

Oh yes! That is why I want to provide the names of the providers we use. I do believe in building in the open. The web search functionality has a very generous free tier (it is behind Ollama's free account to prevent abuse), so you can try it out and compare it with running a SearXNG server locally.

On making the search functionality local: we considered it and gave it a try, but had trouble with result quality and with websites blocking Ollama's crawler. Using a hosted API, we can get results to users much faster. I'd want us to revisit this at some point. I believe in having the power of local.


How much is the generous free tier? I couldn't find it on the website.

I believe it's free.

> I'd want us to revisit this at some point. I believe in having the power of local

Thanks! Please do!


DuckDuckGo isn't a provider, it's just Bing wearing a duck hat.

I'd be curious about the legal position under the EU AI Act, which kills the Bing API (Microsoft is switching to Grounding with Bing, which rephrases the content).

Yes, ephemeral queries must not retain any data, but there are other rules too; for instance, use is forbidden for commercial services (and doesn't Ollama have a pricing model?).


You can say you're training an AI model and do whatever you want with it.

The "Zuckerberg defence".

It's OK to pirate a massive amount of books if you're not reading or sharing, but rather just training an AI.


I don't know where I stand on the issue, but it's interesting that Facebook has been known to block PB links while Google seemed to refuse requests to do the same.

What are peanut butter links?

I'm guessing Pirate Bay

Oh, I don't recall seeing anyone share Pirate Bay links; why not just share the magnet URI?

Or is it about sharing the domains of mirrors?


Yes

And by the way, I prefer Google's approach in this particular case.

Zuckerberg strikes me as far too adaptive, too fair-weather.


You should ask if search results are even copyrightable, if they are just a list of links.

Instead of turning this into an academic debate about copyright, a more practical thing to do is to examine the terms and conditions of whatever API you are using. If you do end up in a conflict with a search API provider, those terms probably spell out pretty clearly what the provider allows and what you agreed to by using their API.

Caching is a problem with many geocoding APIs (which I happen to be familiar with) and a good reason to prefer e.g. OpenCage over the Google or HERE geocoders: unlike most geocoder terms and conditions, OpenCage actually encourages you to cache and store results, because it's all open data. The HERE geocoder requires you to tell them how much data you store and will try to charge you extra for the privilege of keeping it around. That's because it's their data, and the conditions under which they license it to you limit what you can and cannot do. Search APIs are very similar; technically, geocoding is a form of search (given a query, return a list of stuff).


It is strange to launch this type of functionality with not even a privacy policy in place.

It makes me wonder if they’ve partnered with another of their VC’s peers who’s recently had a cash injection, and they’re being used as a design partner/customer story.

Exa would be my bet. YC backed them early, and they’ve also just closed an $85M Series B. Bing would be too expensive to offer for free without a Microsoft partnership.

Get on that privacy notice soon, Ollama. You’re HQ’d in CA, so you’re definitely subject to the CCPA. (You don’t need revenue to be subject to it: buying, selling, or sharing the personal information of 100,000 or more California consumers or households is enough.)

https://oag.ca.gov/privacy/ccpa

I can imagine the reaction if it turns out the zero-retention provider backing them ended up being Alibaba.



