More

tartakovsky · 2024-02-26T15:20:57

ollama is MIT licensed unless i am misreading

wokwokwok · 2024-02-27T07:06:39

Look more closely at the software.

It does not just magically conjure LLM model files out of thin air.

Where do those models come from?

https://github.com/ollama/ollama/issues/2390

The registry is not open source.

You think I’m being unfair?

https://github.com/ollama/ollama/issues/914#issuecomment-195...

(Paraphrased)

>> How do I run my own registry?

> email us, let’s talk.

tartakovsky · 2024-02-14T00:13:55

What is your goal? if d1, d2, d3, etc is the dataset over which you're trying to optimize, then the goal is to find some best performing d_i. In this case, you're not evaluating. You're optimizing. Your acquisition function even says so: https://rentruewang.github.io/bocoel/research/

And in general if you have an LLM that performs really well on one d_i then who cares. The goal in LLM evaluation is to find a good performing LLM overall.

Finally, it feels that your Abstract and other snippets sound like an LLM wrote them.

Good luck.

doubtfuluser · 2024-02-14T06:11:10

I disagree that the goal in „evaluation is to find a good performing LLM overall“. The goal in evaluation is to understand the performance of an LLm (on average). This approach actually is more about finding „areas“ where the LLm does not behave well and where the LLm behaves well (by the Gaussian process approximation) This is indeed an important problem to look at. Often you just run an LLm evaluation on 1000s of samples, some of them similar and you don’t learn anything new from the sample „what time is it, please“ over „what time is it“.

If instead you can reduce the number of samples to look at and automatically find „clusters“ and their performance, you get a win. It won’t be the „average performance number“, but it will give you (hopefully) understanding which things work how well in the LLm.

The main drawback in this (as far as I can say after this short glimpse at it) is the embedding itself. Only if the distance in the embedding space really correlates with performance, this will work great. However we know from adversarial attacks, that already small changes in the embedding space can result in vastly different results

tartakovsky · 2024-02-06T02:23:58

What is this exactly in plain english, please?

tartakovsky · 2023-12-25T10:38:17

A happy person is not in a particular set of circumstances, but rather has a particular set of attitudes.

tartakovsky · 2023-10-18T01:16:48

The webpage discusses a Bayesian approach to experimentation, focusing on interpreting and extrapolating experimental results, mainly in a tech environment aiming to maximize user retention. It addresses challenges like the inference problem, the extrapolation problem, the explore-exploit problem, and a culture problem within tech companies around misuse of experiments. The author suggests providing decision-makers with benchmark statistics to help them estimate true effects of different policies, and discusses a model of experimentation dealing with observed and true effects along with the noise in experiments 1 .

tartakovsky · 2023-09-17T17:52:54

Great question. A legend or brief description of the underlying logic / heuristic would be helpful.

breckenedge · 2023-09-17T22:30:06

The heuristic is likely the result of an ML algorithm, so the underlying logic may not make much sense to us.

tartakovsky · 2023-07-09T06:13:24

Not secure... NET::ERR_CERT_COMMON_NAME_INVALID Subject: *.safezone.mcafee.com

Issuer: McAfee OV SSL CA 2

Expires on: Aug 3, 2023

Current date: Jul 8, 2023

PEM encoded chain: -----BEGIN CERTIFICATE----- MIIGfzCCBWegAwIBAgIQKt9VNrFtaozA1bILX1OcfzANBgkqhkiG9w0BAQsFADBk MQswCQYDVQQGEwJVUzELMAkGA1UECBMCQ0

tartakovsky · 2023-07-05T21:54:01

What is “human intent”?

cubefox · 2023-07-06T00:36:21

Preference utilitarianism?

tartakovsky · 2023-06-20T22:52:28

Why do you not think it’s featured? Has a16z funded many of those companies, lol? And somehow, rejected milvus?

SheepHerdr · 2023-06-20T23:26:31

The guys at Milvus raised a total of $113M according to Crunchbase, second only to Pinecone, which is funded by a16z. You're not going to highlight the main competitor of one of your portfolio companies.

gk1 · 2023-06-21T13:04:16

The post mentions six alternatives to Pinecone. The reality is Milvus isn't as relevant today as it was a year ago. I'm from Pinecone but I'll give credit where it's due: Weaviate, Chroma, and Qdrant completely lapped Milvus in the open-source space. That's why they got mentioned and Milvus didn't.

tartakovsky · 2023-06-14T00:12:26

Same idea here? Larger models do a better job forgetting their training data and dropping their semantic priors. Perhaps another way of thinking through this is that larger models learn new information and drop old information faster. https://arxiv.org/abs/2303.03846

Isn't that interesting? The idea of "mental liquidity", or "strong opinions weakly held"? https://news.ycombinator.com/item?id=36280772

indus · 2023-06-14T00:34:35

Wouldn’t this be the equivalent of ranking? I thought LLM are not supposed to get influenced by freshness.

marcosdumay · 2023-06-14T01:58:58

By the freshness of training with some data?

Well, aren't they? I believe any kind of reinforcement learning is supposed to be biased into the last training set.