Hacker News | jascha_eng's comments

Hmm, I like the idea of providing a unified interface for all LLMs to interact with outside data. But I don't really understand why this is local-only. It would be a lot more interesting if I could connect this to my GitHub in the web app and Claude automatically has access to my code repositories.

I guess I can do this for my local file system now?

I also wonder: if I build an LLM-powered app and currently simply do RAG and then inject the retrieved data into my prompts, should this replace it? Can I even integrate this in a useful way?
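(The pattern I mean, sketched with a hypothetical `retrieve` helper and prompt shape, not any specific library's API:)

```python
# Sketch of the usual RAG flow: retrieve relevant chunks, then
# inject them into the prompt before calling the model.
def build_prompt(question, retrieve):
    chunks = retrieve(question)          # e.g. top-k vector-store hits
    context = "\n\n".join(chunks)
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("How do refunds work?",
                      lambda q: ["Refunds take 5 business days."])
```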

The use case of running on your machine with your specific data seems very narrow to me right now, considering how many different context sources and use cases there are.


I'm honestly happy with them starting local-first, because... imagine what it would look like if they did the opposite.

> It would be a lot more interesting if I could connect this to my GitHub in the web app and Claude automatically has access to my code repositories.

In which case the "API" would be governed by a contract between Anthropic and Github, to which you're a third party (read: sharecropper).

Interoperability on the web has already been mostly killed by the practice of companies integrating with other companies via back-channel deals. You are either a commercial partner, or you're out of the playground and no toys for you. Them starting locally means they're at least reversing this trend a bit by setting a different default: LLMs are fine to integrate with arbitrary code the user runs on their machine. No need to sign an extra contract with anyone!


We're definitely interested in extending MCP to cover remote connections as well. Both SDKs already support an SSE transport with that in mind: https://modelcontextprotocol.io/docs/concepts/transports#ser...

However, it's not quite a complete story yet. Remote connections introduce a lot more questions and complexity—related to deployment, auth, security, etc. We'll be working through these in the coming weeks, and would love any and all input!


Will you also create some info on how other LLM providers can integrate this? So far it looks like it's mostly a protocol for integrating with Anthropic models and the desktop client. That's not what I thought of when I read "open source."

It would be a lot more interesting to write a server for this if this allowed any model to interact with my data. Everyone would benefit from having more integration and you (anthropic) still would have the advantage of basically controlling the protocol.


OpenAI has Actions which is relevant for this too: https://platform.openai.com/docs/actions/actions-library

Here's one for performing GitHub actions: https://cookbook.openai.com/examples/chatgpt/gpt_actions_lib...


Note that both Sourcegraph's Cody and the Zed editor support MCP now. They offer other models besides Claude in their respective applications.

The Model Context Protocol initial release aims to solve the N-to-M relation of LLM applications (MCP clients) and context providers (MCP servers). The application is free to choose any model it wants. We carefully designed the protocol such that it is model-independent.
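To make the model independence concrete, here's an illustrative sketch: MCP messages are JSON-RPC 2.0, so any application can talk to any server regardless of which model sits on top. The `tools/list` method name matches the spec; the toy server and its single tool are made up for illustration.

```python
import json

# A toy MCP-style server: answers a JSON-RPC "tools/list" request.
# No model in sight -- the protocol only moves context and tools.
def handle(msg):
    if msg["method"] == "tools/list":
        return {"jsonrpc": "2.0", "id": msg["id"],
                "result": {"tools": [
                    {"name": "read_file",
                     "description": "Read a file from the local disk"}]}}
    return {"jsonrpc": "2.0", "id": msg["id"],
            "error": {"code": -32601, "message": "Method not found"}}

request = json.loads('{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}')
response = handle(request)
```

Any client that speaks this wire format can discover and call the same tools, which is the N-to-M point above.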


"LLM applications" just means chat applications here though, right? This doesn't seem to cover use cases of more integrated software, like a typical documentation RAG chatbot.

Local only solves a lot of problems. Our infrastructure does tend to assume that data and credentials are on a local computer - OAuth is horribly complex to set up and there's no real benefit to messing with that when local works fine.

> It would be a lot more interesting if I could connect this to my GitHub in the web app and Claude automatically has access to my code repositories.

From the link:

> To help developers start exploring, we’re sharing pre-built MCP servers for popular enterprise systems like Google Drive, Slack, GitHub, Git, Postgres, and Puppeteer.


Yes but you need to run those servers locally on your own machine. And use the desktop client. That just seems... weird?

I guess the reason for this local focus is that it's otherwise hard to provide access to local files, which is a decently large use case.

Still it feels a bit complicated to me.


For me it's complementary to OpenAI's custom GPTs, which are non-local.

How much slower is it to attach an external drive here?


It has Thunderbolt 5 ports, but the only drives capable of using them aren't widely available yet and cost just as much as a base model Mac mini.

But it should offer essentially the same speeds as an average internal M.2 drive, it seems.

https://www.owc.com/solutions/envoy-ultra


Only the Mac mini with the M4 Pro has Thunderbolt 5.


In typical average everyday use, not much at all if you get a high quality external SSD.

Source: Me, with my M1 Mac mini using a Samsung T7 connected via USB.

Things I use frequently are on the 256GB internal SSD, such as Office 365 and Xcode. Huge things like games that aren't a huge deal if they take a few more seconds to load are offloaded to the external. The only inconvenience this setup has caused me is that I have to periodically uninstall old iOS simulators from Xcode to keep enough free space available for OS updates.


That's a bit too simple. There are way fewer people producing quality content "for fun" than people who aim, or at least eventually hope, to make money from it.

Yes, a few sites take this too far and ruin search results for everyone. But taking the possibility away would also cut the amount of content produced by a lot.

YouTube, for example, had some good content before monetization, but there are a lot of great documentary-like channels now that simply wouldn't be possible without ads. There is also clickbait trash, yes, but I'd rather have both than neither.


Demonetizing the web sounds mostly awesome. Good riddance to the adtech ecosystem.


The textual web is going the way of cable TV - pay to enter. And now streaming. "Alms for the poor..."

But, like on OTA TV, you can get all the shopping channels you want.


Not to be the downer, but who pays for all the video bandwidth, and who pays for all the content hosting? The old web worked because it was mostly a public good, paid for by governments and universities. At current web scale, that's not coming back.

So who pays for all of this?

The web needs to be monetized, just not via advertising. Maybe it's microtransactions, maybe subscriptions, maybe something else, but this idea of "we get everything we want for free and nobody tries to use it for their own agenda" will never return. That only exists for hobby technologies. Once they are mainstream they get incorporated into the mainstream economic model. Our mainstream model is capitalism, so it will be ever present in any form of the internet.

The main question is how people/resources can be paid for while maintaining healthy incentives.


No one paid you to write that?


Except I also pay my network provider to run the infrastructure

I think you forgot that


It costs the Internet Archive $2/GB to store a blob of data in perpetuity, and their budget for the entire org is ~$37M/year. I don't disagree that people and systems need to be paid, but the costs are not untenable. We have Patreon, we have subscriptions to your run-of-the-mill media outlets (NY Times, Economist, WSJ, Vox, etc.); the primitives exist.
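A back-of-envelope check on those figures: at $2/GB stored forever, the entire annual budget corresponds to roughly 18.5 PB of permanent storage per year.

```python
# Rough arithmetic on the figures above, nothing more.
cost_per_gb = 2                 # dollars, one-time, per GB
annual_budget = 37_000_000      # dollars
gigabytes = annual_budget / cost_per_gb
petabytes = gigabytes / 1_000_000   # 18.5 PB
```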

The web needs patrons, contributions, and cost allocation, not necessarily monetization and shareholder capitalism, where there is a never-ending shuffle of IP and org ownership to maximize returns (unnecessarily, imho). How many times was Reddit flipped until its current CEO juiced it for IPO and profitability? Now it is a curated forum for ML training.

I (as well as many other consumers of this content) donate to APM Marketplace [1] because we can afford it and want it to continue. This is, in fits and starts, the way imho. We piece together the means to deliver disenshittification (aggregating small donations, large donations, grants, etc).

(Tangentially, APM Marketplace has recently covered food stores [2] and childcare centers [3] that have incorporated as nonprofits because a for-profit model simply will not succeed; food for thought at a meta level as we discuss economic sustainability and how to deliver outcomes in unconventional ways.)

[1] https://www.marketplace.org/

[2] https://www.marketplace.org/2024/10/24/colorados-oldest-busi...

[3] https://www.marketplace.org/2024/08/22/daycare-rural-areas-c...


> There is way fewer people producing quality content "for fun" than people that aim or at least eventually hope to make money from it...But taking the possibility away would also cut the produced content by a lot.

…is that a problem? Most of what we actually like is the stuff that's made "for fun", and even if not, killing off some good stuff while killing off nearly all the bad stuff is a pretty good deal imo.


Agreed. The entire reason why search is so hard is because there's so much junk produced purely to manipulate people into buying stuff. If all of that goes away because people don't see ads there anymore, search becomes much easier to pull off for those of us who don't want to stick to the AI sandbox.

There's a slight chance we could see the un-Septembering of the internet as it bifurcates.


That's super interesting. I've been removing a lot of the redundant comments from the AI results, but adding new, more explanatory ones that make it easier for both AI and humans to understand the code base makes a lot of sense in my head.

I was big on writing code that is easy for humans to read, but making it easy for AI to read hasn't been a large concern of mine.


I'm trying to prevent anyone from ever dropping a table in production again or executing a delete without a where clause.

https://github.com/kviklet/kviklet

Essentially a PR-review flow for production access, which allows you to enforce a second-pair-of-eyes workflow. I was always a bit scared when I was on call and had all the power at my fingertips to ruin everyone's day. I think this helps alleviate the risk of human error significantly. It also helps with compliance, of course.
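The core check is simple in spirit. This is not Kviklet's actual code, just a sketch of the kind of guard a review flow can enforce before a statement reaches production (real tooling would parse the SQL properly rather than match strings):

```python
def needs_review(sql: str) -> bool:
    """Flag statements that should require a second pair of eyes:
    DROP TABLE, and DELETE without a WHERE clause."""
    s = " ".join(sql.lower().split())   # normalize whitespace/case
    if s.startswith("drop table"):
        return True
    if s.startswith("delete") and " where " not in s:
        return True
    return False
```

Anything flagged gets held until a second engineer approves, instead of executing immediately.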


Run mypy in strict mode and make that check required.
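For example, assuming a pyproject.toml-based project, the config is one stanza:

```toml
# pyproject.toml -- enable all of mypy's strict checks
[tool.mypy]
strict = true
```

Then run `mypy .` in CI and mark that job as a required status check.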


Making up a new name, really? I have an open-source project that I am thinking about monetizing. In the end, as an author, you actually have to choose a sensible license setup for that project.

Whether this then counts as open source or not really only matters for marketing purposes, and a new term will not have the same effect there. If I put "fair source" on my landing page, it just means I have to explain more.

People who really care about the license will read the license. People who don't really care will be fine with generic terms, imo, and don't need the classification.


This is trivial, but the problem is that consistency especially is not a binary choice. Heck, even non-distributed systems can give you varying consistency guarantees; that's the whole point of isolation levels in most RDBMSs.

It's good to visualize that there is a trade-off to be made. However, that trade-off does not have to be binary. You can get a bit more consistency for a little less partition tolerance or availability. All of Designing Data-Intensive Applications is about those trade-offs.


Consistency in CAP and Consistency in ACID have entirely separate meanings.


The C in ACID has a different definition of Consistency, yet the combination of guarantees given by ACID should also imply Consistency in the distributed system sense, right? I.e. a distributed database that claims to be ACID cannot sacrifice consistency [edit: at least for some isolation levels].


Isolation (the I in ACID) is more closely related to the notion of consistency in the distributed system community.


Consistency in the case of CAP refers to linearizability.
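A toy illustration of what linearizability rules out (the store and its lagging replica are invented for this sketch): once a write has completed, every subsequent read must return it, so a stale replica read like the one below is a violation.

```python
# A "replicated" register where reads go to an asynchronously
# updated replica. Linearizability forbids the stale read below.
class LaggyStore:
    def __init__(self):
        self.primary = {}
        self.replica = {}           # updated asynchronously

    def write(self, key, value):
        self.primary[key] = value   # acked before replication finishes

    def sync(self):
        self.replica.update(self.primary)

    def read(self, key):
        return self.replica.get(key)   # reads served by the replica

store = LaggyStore()
store.write("x", 1)        # write completes...
stale = store.read("x")    # ...but this read misses it (returns None)
store.sync()
fresh = store.read("x")    # after replication: 1
```

An ACID database on a single node never exhibits this, which is why the two notions of "consistency" are so easy to conflate.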


Still working on Kviklet: https://github.com/kviklet/kviklet

Trying to make production database access easy while avoiding dropping a table in production.

Experimenting with websockets right now, so you don't have to create formal requests like a pull request anymore, but can instead have more fluent database-access sessions while another engineer watches over your virtual shoulder.


This is so cool haha

