Hacker News | pjsousa79's comments

One thing that seems to be missing in most discussions about "context" is infrastructure.

The dream system for AI agents is probably something like a curated data hub: a place where datasets are continuously ingested, cleaned, structured and documented, so agents can query it to obtain reliable context.

Right now most agents spend a lot of effort stitching context together from random APIs, web scraping, PDFs, etc. The result is brittle and inconsistent.

If models become interchangeable, the real leverage might come from shared context layers that many agents can query.


I'm currently working on building this layer. It's an interesting problem even with AI agents removed from the picture: a context layer can be just as useful for humans and for deterministic programs. I view it as a data structure sitting on top of your entire domain; that data structure's query interface, plus some basic tools, should be enough to bootstrap non-trivial agents imo. I think the data structure best suited to this problem is a graph, with the different types of data each represented as graphs.
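A minimal sketch of what that could look like, assuming a toy in-memory graph (all names and the `ContextGraph` interface are invented for illustration, not a real library):

```python
# Hypothetical sketch: a domain "context layer" as a typed graph that both
# humans and agents query through one interface. Names are illustrative.
from collections import defaultdict

class ContextGraph:
    def __init__(self):
        self.nodes = {}                 # node_id -> {"type": ..., "props": {...}}
        self.edges = defaultdict(list)  # node_id -> [(relation, target_id), ...]

    def add_node(self, node_id, node_type, **props):
        self.nodes[node_id] = {"type": node_type, "props": props}

    def add_edge(self, src, relation, dst):
        self.edges[src].append((relation, dst))

    def neighbors(self, node_id, relation=None):
        """One-hop traversal, optionally filtered by relation type."""
        return [dst for rel, dst in self.edges[node_id]
                if relation is None or rel == relation]

g = ContextGraph()
g.add_node("order:1", "Order", total=99.0)
g.add_node("cust:7", "Customer", name="Ada")
g.add_node("ticket:3", "SupportTicket", status="open")
g.add_edge("cust:7", "PLACED", "order:1")
g.add_edge("cust:7", "OPENED", "ticket:3")

# An agent (or a plain deterministic program) gathers context with one call:
print(g.neighbors("cust:7"))            # ['order:1', 'ticket:3']
print(g.neighbors("cust:7", "OPENED"))  # ['ticket:3']
```

The point of the sketch is that the same traversal call serves a human debugging session, a batch job, or an agent tool, which is why the layer is useful even before any LLM enters the picture.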

Stitching API calls together is analogous to representing relationships between entities, and that's ultimately why I think graph databases have a chance in this space. As a domain grows, the relationships usually grow at a higher rate than the nodes, so you want a query language that is optimized for traversing relationships between things. This is where the pattern-matching approach of ISO GQL, inspired by Cypher, is more token-efficient than SQL. The problem is that our foundation models have seen far, far more SQL, so there is a training gap, but I would bet that if the training data were equally abundant we'd see better performance on Cypher than on SQL.
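To make the token-efficiency point concrete, here is a 2-hop traversal in SQL (runnable via sqlite3; the `knows` schema is invented), with the Cypher equivalent shown as a comment. Each additional hop costs another self-join in SQL but only changes a bound in Cypher:

```python
# Illustration of the token-efficiency claim: a 2-hop traversal needs one
# self-join per hop in SQL, but is a single pattern in Cypher.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE knows (src TEXT, dst TEXT);
    INSERT INTO knows VALUES ('ann','bob'), ('bob','cat'), ('bob','dan');
""")

# SQL: friends-of-friends of 'ann' -- one extra JOIN for every extra hop.
rows = conn.execute("""
    SELECT k2.dst
    FROM knows k1
    JOIN knows k2 ON k1.dst = k2.src
    WHERE k1.src = 'ann'
""").fetchall()
print(sorted(r[0] for r in rows))  # ['cat', 'dan']

# The same query in Cypher (not executed here) is one pattern, and going
# to N hops only changes the '*2' bound, not the query's shape:
#   MATCH (a {name:'ann'})-[:KNOWS*2]->(x) RETURN x
```

For an LLM generating queries, the Cypher form stays roughly constant in length as hop count grows, while the SQL form grows linearly, which is where the token gap comes from.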

I know there are GraphRAG and hybrid approaches involving vector embeddings and graph embeddings, but maybe we also need to reduce API calls to semantic graph queries over their respective domains, so we end up with one giant graph we can scavenge for context.


This resonates strongly. We've been working on exactly this problem with ArcadeDB — a multi-model database that natively supports graphs, documents, key-value, time-series, and vector search in a single engine. (https://arcadedb.com)

The insight about relationships growing faster than nodes is spot on, and it's why we think the graph model is the natural fit for context layers. But in practice, you also need documents, vectors, and sometimes time-series data alongside the graph. Forcing everything into a single model (or stitching together multiple databases) creates friction that kills agent workflows.

On the GQL/Cypher vs SQL point — agreed on token efficiency. We support both SQL (extended with graph capabilities) and Cypher-style syntax, and the difference in prompt size for traversal queries is dramatic. An N-hop relationship query that takes 5+ lines of SQL JOINs is a single readable line in a graph query language. For LLM-generated queries, that's not just an aesthetic win — it directly reduces error rates and token costs.

Re: GraphRAG — we've seen the same convergence. Vector similarity to find the right neighborhood, then graph traversal for structured context. Having both in one engine (ArcadeDB supports vector indexing natively) means you avoid the API orchestration overhead you mention. One query, one database, full context.
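That convergence pattern can be sketched in a few lines: embed the query, use vector similarity to pick an entry node, then walk the graph for structured context. Everything here is a toy stand-in (2-d "embeddings", invented node names); a real system would use a model and a proper index:

```python
# Sketch of the GraphRAG pattern described above: vector search to find the
# right neighborhood, then graph traversal for structured context.
import math

embeddings = {                       # node -> toy embedding
    "billing_docs":  (0.9, 0.1),
    "api_reference": (0.1, 0.9),
}
graph = {                            # node -> linked context nodes
    "billing_docs":  ["refund_policy", "invoice_schema"],
    "api_reference": ["auth_guide"],
}

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))

def retrieve(query_vec, hops=1):
    # 1) vector similarity: nearest node to the query embedding
    seed = max(embeddings, key=lambda n: cosine(embeddings[n], query_vec))
    # 2) graph traversal: pull the structured neighborhood around the seed
    context, frontier = [seed], [seed]
    for _ in range(hops):
        frontier = [m for n in frontier for m in graph.get(n, [])]
        context.extend(frontier)
    return context

# A billing-flavored query lands on billing_docs, then expands one hop:
print(retrieve((0.8, 0.2)))  # ['billing_docs', 'refund_policy', 'invoice_schema']
```

Doing both steps in one engine, as described above, removes the orchestration seam between the vector store and the graph store.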

The training gap for graph query languages is real but closing fast. As more agent frameworks adopt graph-based context, the flywheel will kick in.


Data should not be ingested. Data should originate from the same environment you want to activate it in. That means you need to build a system from the ground up for your searches, your document creation, etc., so that this data is native to your system and then easily referenced in your commands to the LLM interface.

The best examples of this are probably CrewAI and Alibaba CoPaw. CoPaw has a demo up.


I put together a small, SQL-backed dashboard comparing the S&P 500 with several "Main Street" indicators: retail sales, construction spending, bank credit, and industrial production (manufacturing, mining, utilities).

The question I wanted to sanity-check was simple: markets seem to be pricing a strong AI-driven growth narrative — do broad macro indicators show a similar acceleration?

The dashboard is fully grounded in public data, and every chart is traceable to its underlying SQL query and datasets.
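For a sense of the kind of query behind such a chart, here is one common technique for making a market index and a macro series comparable: rebase each series to 100 at a shared start date. The schema and numbers below are invented for illustration, not the dashboard's actual data:

```python
# Hypothetical rebasing query: index every series to 100 at its first month
# so that levels with very different scales can share one chart.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE obs (series TEXT, month TEXT, value REAL);
    INSERT INTO obs VALUES
        ('sp500',  '2024-01', 4800), ('sp500',  '2024-02', 5040),
        ('retail', '2024-01',  700), ('retail', '2024-02',  707);
""")

rows = conn.execute("""
    SELECT o.series, o.month,
           ROUND(100.0 * o.value / f.value, 1) AS indexed
    FROM obs o
    JOIN (SELECT series, value FROM obs WHERE month = '2024-01') f
      ON o.series = f.series
    ORDER BY o.series, o.month
""").fetchall()
print(rows)
# [('retail', '2024-01', 100.0), ('retail', '2024-02', 101.0),
#  ('sp500',  '2024-01', 100.0), ('sp500',  '2024-02', 105.0)]
```

After rebasing, a divergence between the lines directly answers the question above: a market series pulling away from the indexed macro series is visible at a glance.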

I’m not pushing a conclusion here — genuinely curious how others interpret the divergence (or lack thereof), and whether there are indicators you’d add or remove.


In 2019, ambient air pollution claimed the lives of young children at alarming rates in several countries. Here are the ten countries with the highest number of deaths per 100,000 children under 5 due to ambient air pollution:

1. Nigeria – 18.95
2. Chad – 18.10
3. Sierra Leone – 12.02
4. Mali – 10.56
5. Guinea – 9.90
6. Niger – 9.64
7. Cote d'Ivoire – 9.04
8. Central African Republic – 8.79
9. Cameroon – 8.69
10. Burkina Faso – 8.68

These numbers highlight how air pollution isn't just an urban problem — it's a public health crisis in low-income countries where children are the most vulnerable.

Source: Baselight analysis using data from Our World in Data, originally supplied by the World Health Organization (WHO). https://baselight.app/u/pjsousa/query/top-10-countries-with-...


Not that this isn’t terrible, but those numbers look really low. Surely malnutrition and violence must be a hundred times more likely to kill them?

Not trying to say we shouldn’t consider this, but it seems like there’s bigger fish to fry first (assuming we can’t fry them all at the same time).


>Not that this isn’t terrible, but those numbers look really low. Surely malnutrition and violence must be a hundred times more likely to kill them?

They don't seem low at all to me. And a quick search suggests that malnutrition probably causes fewer deaths [1] (note that it's counted for all people here, not just under 5).

And in places like India and SEA, where malnutrition and violence are less of a problem, air pollution stands out even more.

[1] https://ourworldindata.org/grapher/malnutrition-death-rates


Take Nigeria: the capital alone is circa 6 million people. The rate is per 100,000 children under 5, not per 100,000 residents, so only the under-5 share of those 6 million counts; even so, that is hundreds of children dying from air pollution alone, every year, in a single city.

That is properly fucked up for children under 5. They start with absolutely clean lungs, and the damage compounds so much that they die from it. Think about all the other age groups, which have their own horrific numbers.


In 2022, all-cause under-5 mortality in Nigeria was 117 per 1,000.

Not per 100k, per 1000.

That means that for every child that dies from air pollution, ~600 more die from some other cause.

To be fair, most causes seem related to terrible living conditions, so everything probably improves or degrades together.

