Hacker News | base76's comments

Nouse doesn't compete with NumPy — they're in completely different categories. NumPy is numerical array computation. Nouse is a knowledge graph that injects structured context into LLM prompts.

The relevant comparison is: does a small model with Nouse answer domain questions better than a large model without it?

Benchmark says yes. 96% vs 47%.


Yes, it's a JavaScript-based package. NumPy is highly optimized native code, so it can't be matched on raw performance. But starlight-numera is still useful for computing and data-science tasks on the web and in scripting.

Did you try it?


One MCP tool call → your entire AI agent fleet as a live JSON graph.

  Nodes: gateway, agents, memory, bus
  Edges: verify → escalate → store
  Status: active / idle / blocked / error

  Built on FNC (Field→Node→Cockpit) — the same framework I use for consciousness detection research, now running a local AI armada.
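A sketch of what that live graph payload might look like. The node ids, edge ops, and status values follow the comment above, but the concrete shape is an assumption, not the actual FNC schema:

```python
# Hypothetical shape of the JSON graph a single MCP tool call might return.
# Node/edge/status names mirror the description above; the real FNC payload may differ.
fleet_graph = {
    "nodes": [
        {"id": "gateway", "status": "active"},
        {"id": "agent-1", "status": "idle"},
        {"id": "memory", "status": "active"},
        {"id": "bus", "status": "blocked"},
    ],
    "edges": [
        {"from": "gateway", "to": "agent-1", "op": "verify"},
        {"from": "agent-1", "to": "gateway", "op": "escalate"},
        {"from": "agent-1", "to": "memory", "op": "store"},
    ],
}

VALID_STATUSES = {"active", "idle", "blocked", "error"}

def blocked_nodes(graph):
    """Return ids of nodes that are blocked or errored."""
    return [n["id"] for n in graph["nodes"]
            if n["status"] in ("blocked", "error")]
```

A client could poll this and alert on `blocked_nodes(fleet_graph)` returning anything non-empty.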


I built TrustPlane on top of cognos-proof-engine, an epistemic scoring engine that scores every LLM request before it reaches users.

  Formula: C = p × (1 − Ue − Ua)

  Four outcomes per request: PASS, REFINE, ESCALATE (webhook), BLOCK.
  Every decision gets a trace ID and ends up in an audit log.
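A minimal sketch of the scoring-and-routing loop. The formula is the one above; the threshold values are illustrative assumptions, not the engine's actual defaults:

```python
def confidence(p, u_epistemic, u_aleatoric):
    """C = p * (1 - Ue - Ua), clamped to [0, 1]."""
    return max(0.0, min(1.0, p * (1.0 - u_epistemic - u_aleatoric)))

def route(c, pass_at=0.85, refine_at=0.6, escalate_at=0.3):
    # Threshold values are illustrative, not cognos-proof-engine's defaults.
    if c >= pass_at:
        return "PASS"
    if c >= refine_at:
        return "REFINE"
    if c >= escalate_at:
        return "ESCALATE"  # would fire the webhook
    return "BLOCK"

c = confidence(p=0.9, u_epistemic=0.05, u_aleatoric=0.05)  # 0.9 * 0.9 = 0.81
```

With those example thresholds, C = 0.81 lands in REFINE rather than PASS.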

  Pluggable providers: Ollama, OpenAI, Anthropic, Groq, Cerebras.
  Multi-tenant PostgreSQL isolation. Redis rate limiting. MCP server for
  Claude Code integration.

  The honest limitation: p (prior confidence) is currently a configurable
  baseline, not dynamically calibrated per request. That's on the roadmap.

  Self-hosted via docker-compose or SaaS. Air-gap compatible for regulated
  deployments.


Would love to hear what you think about it.


I built a two-stage prompt compressor that runs entirely locally before your prompt hits any frontier model API.

  How it works:
  1. llama3.2:1b (via Ollama) compresses the prompt to its semantic minimum
  2. nomic-embed-text validates that the compressed version preserves the original meaning (cosine ≥ 0.85)
  3. If validation fails → original is returned unchanged. No silent corruption.
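The three steps above can be sketched as follows. The `llm_compress` and `embed` callables stand in for the Ollama llama3.2:1b and nomic-embed-text calls, so their interfaces are assumptions; the word-count skip approximates the token-count skip:

```python
import math

SIM_THRESHOLD = 0.85  # cosine floor from step 2

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def compress(prompt, llm_compress, embed, min_words=80):
    """Two-stage compression sketch. llm_compress and embed are stand-ins
    for the Ollama model calls; the real interface may differ."""
    if len(prompt.split()) < min_words:  # short prompts: skipped entirely
        return prompt
    candidate = llm_compress(prompt)
    if cosine(embed(prompt), embed(candidate)) >= SIM_THRESHOLD:
        return candidate
    return prompt  # validation failed: original returned, no silent corruption
```

The fallback branch is the important part: a failed similarity check means the caller never sees the compressed variant.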

  When it actually helps:
  The effect is meaningful only on longer inputs. Short prompts are skipped entirely — no cost, no risk.

  ┌─────────────────────────────────┬────────────┬────────┐
  │              Input              │   Tokens   │ Saving │
  ├─────────────────────────────────┼────────────┼────────┤
  │ < 80 tokens                     │ skipped    │ 0%     │
  ├─────────────────────────────────┼────────────┼────────┤
  │ Academic abstract (207t)        │ 207 → 78   │ 62%    │
  ├─────────────────────────────────┼────────────┼────────┤
  │ Structured research doc (1116t) │ 1116 → 275 │ 75%    │
  ├─────────────────────────────────┼────────────┼────────┤
  │ Short command (4t)              │ skipped    │ 0%     │
  └─────────────────────────────────┴────────────┴────────┘

  If you're sending short one-liners, this won't help. If you're injecting long context, research text, or system prompts — it pays off from the first call.

  Known limitation:
  Cosine similarity is blind to negation. "way smaller" vs "way larger" scores 0.985. The LLM stage handles this by explicitly preserving negations and conditionals, but it's an open
  research question — tracked in issue #1.

  Install as MCP (Claude Code):
  {
    "mcpServers": {
      "token-compressor": {
        "command": "python3",
        "args": ["/path/to/token-compressor/mcp_server.py"]
      }
    }
  }

  Requires: Ollama + llama3.2:1b + nomic-embed-text

  Repo: https://github.com/base76-research-lab/token-compressor-


We measured 62% token reduction on academic text with 92% semantic integrity.

  Not a claim. A measurement. Live, today, on our own research papers.

  How it works:
  → Local LLM compresses the prompt
  → Embedding model validates: cosine similarity ≥ 0.90
  → Below threshold? Raw text sent instead. No silent loss.

  This runs as middleware inside CognOS Gateway — before every upstream API call.

  Client → [compress + validate] → OpenAI / Claude / Mistral / Ollama
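That middleware step can be sketched as a single wrapper; `upstream`, `compressor`, and `similarity` are hypothetical callables, not the actual CognOS Gateway interface:

```python
def gateway_call(prompt, upstream, compressor, similarity, threshold=0.90):
    """Middleware sketch: compress + validate before every upstream API call.
    All three callables are stand-ins for the gateway's real components."""
    candidate = compressor(prompt)
    if similarity(prompt, candidate) >= threshold:
        return upstream(candidate)  # validated: send compressed prompt
    return upstream(prompt)         # below threshold: raw text, no silent loss
```

Either branch hits the upstream provider exactly once; the only difference is which text it receives.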

  40-62% API cost reduction. Semantic integrity is validated, or the raw text is sent as fallback.

  Code + methodology:




Happy to answer questions from anyone testing it.

  The core loop is simple: every request through the gateway gets a trace_id, a trust score, and a signed decision context. The audit trail stays in your own infrastructure — no external  
  calls.                                                                                                                                                                                    
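One way a signed decision context can be built is an HMAC over the trace record, keyed by a secret that never leaves your infrastructure. The field names and key handling here are illustrative assumptions, not the engine's actual audit schema:

```python
import hashlib
import hmac
import json
import uuid

SIGNING_KEY = b"local-audit-key"  # hypothetical: kept inside your own infra

def signed_decision(trust_score, decision):
    """Sketch of a signed decision context; field names are illustrative."""
    record = {
        "trace_id": str(uuid.uuid4()),
        "trust_score": trust_score,
        "decision": decision,
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return record

def verify(record):
    """Recompute the HMAC over everything except the signature itself."""
    body = {k: v for k, v in record.items() if k != "signature"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])
```

Tampering with any field after the fact breaks verification, which is what makes the audit trail checkable offline.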

  One thing we're actively working on: calibrating trust thresholds per risk category under EU AI Act Article 13. If anyone's done similar work on confidence scoring for high-risk AI
  systems, I'd genuinely like to compare notes.

  Repo + 5-minute quickstart: https://github.com/base76-research-lab/cognos-proof-engine


This is the exact problem CognOS was built to solve.

  99% reliable means you still can't remove the human from the loop — because you never know which 1% you're in. The only way to actually trust output is to attach a verifiable confidence   
  signal to each response, not just hope the aggregate accuracy holds.                                                                                                                        
                                                                                                                                                                                            
  We built a local gateway that wraps every LLM output with a trust envelope: decision trace, risk score, and an explicit PASS/REFINE/ESCALATE/BLOCK classification. The point isn't to make 
  LLMs more accurate — it's to make their uncertainty legible so the human knows when to step in.

  Open source if you want to look at the architecture: github.com/base76-research-lab/operational-cognos


The cynicism is earned. But "nobody is good for humanity" is where analysis stops.

  What's actually happening is a jurisdictional split forming in real time. The US is pricing out companies that won't remove human oversight from weapons systems. That's not a soap opera — 
  it's a policy signal with long-term consequences for where frontier AI development lands geographically.                                                                                  

  Europe isn't perfect. But it's the only major jurisdiction actively building the governance infrastructure to keep humans in the loop by law, not by goodwill. That matters when goodwill
  runs out — which, as you note, it tends to do.

  The people building that alternative aren't the CEOs in Washington. Worth knowing they exist.

