Hacker News

I know there was a downloadable version of Wikipedia (not that large). Maybe soon we'll have a lot of data stored locally and exposed via MCP, so the AIs can do "web search" locally.

I think 99% of web searches lead to the same 100-1k websites. A local copy of those would probably be only a few GB, though this raises copyright concerns.
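As a rough sketch of what a local "web search" tool could look like, here is a tiny full-text index over locally stored pages using SQLite's FTS5, the kind of function an MCP server could expose to a model. The page data and function name are invented for illustration, not from any real MCP server.

```python
import sqlite3

# Hypothetical sketch: index a couple of locally stored pages with SQLite's
# built-in FTS5 full-text search, standing in for a local copy of the
# most-visited sites. All page content here is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE pages USING fts5(title, body)")
conn.executemany(
    "INSERT INTO pages (title, body) VALUES (?, ?)",
    [
        ("Ada Lovelace", "Ada Lovelace published the first algorithm for a machine."),
        ("Alan Turing", "Alan Turing formalised computation with the Turing machine."),
    ],
)

def local_search(query: str, limit: int = 5) -> list[str]:
    """Return page titles ranked by FTS5 relevance -- the tool an LLM would call."""
    rows = conn.execute(
        "SELECT title FROM pages WHERE pages MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [title for (title,) in rows]

print(local_search("algorithm"))  # answered entirely from local storage
```

The point is just that the retrieval step needs no network at all once the corpus is on disk.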





The mostly static knowledge content from sites like Wikipedia is already well represented in LLMs.

LLMs call out to external websites when something isn’t commonly represented in training data, like specific project documentation or news events.


That's true, but the data is only approximately represented in the weights.

Maybe it's better to have the AI only "reason", and somehow instantly access precise data.
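One way to picture "reason, then fetch precise data" is a model that never trusts facts baked approximately into its weights, and instead calls an exact lookup tool. This is a minimal sketch; the fact table and `lookup` helper are hypothetical.

```python
# Hypothetical sketch: the model does the reasoning, but every concrete
# value comes from exact local storage rather than approximate recall
# from the weights. The constants table is invented for illustration.
FACTS = {
    "speed_of_light_m_per_s": 299_792_458,
    "seconds_per_day": 86_400,
}

def lookup(key: str) -> int:
    """Exact retrieval: the precise stored value, or a clear miss."""
    if key not in FACTS:
        raise KeyError(f"no local fact for {key!r}")
    return FACTS[key]

# The model reasons "light travels for one day, how far?" and fetches the
# precise operands instead of half-remembering them.
distance_m = lookup("speed_of_light_m_per_s") * lookup("seconds_per_day")
```

A miss raises instead of returning a plausible-looking guess, which is the behavior you want when the weights would otherwise confabulate.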


What use cases will gain from this architecture?

Data processing, tool calling, agentic use. Those are also the main use cases outside of chatting.


