Phind continues to be my favorite AI-enhanced search engine. They do a really nice job giving answers to technical questions with links to references where I can verify the answer or learn more detail.
The answers aren't perfect. But they are a good gloss and then the links to web sources are terrific. ChatGPT and Claude aren't good at that. Bing CoPilot sort of is but I don't like it as much.
In my tests, it does hallucinate answers, even with Phind 70B. For example, I asked for bluetooth earplugs that have easy battery replacements. It kept giving me answers for earplugs which I know have their battery soldered into the casing. Tbf, Perplexity also fails at this question.
What's good about Phind is even if it gives a wrong answer, it gives you a link to a website where you can try to verify the answer.
FWIW my query for your question gives me a pretty good answer. The first list has three options, one of which is soldered (and the answer says so). It narrows it down to unsoldered ones when I ask.
This answer is mostly good because it relies heavily on an iFixit article that it provides as the first reference. That's what I like about using Phind, it's as much a search engine as an oracle.
If you ask it questions about software that doesn't occupy wide swaths of stack overflow, it tends to hallucinate features of the language too. I asked for some details regarding policies in Chef and it went on a long, confident diatribe about the "policy" resource (which does not exist, nor does anything even kind of similar to it AFAIK).
‘Easy battery replacements’ is pretty subjective. This feels like one of those errors that demonstrates how good the tech is, because it’s being used for very specific and subjective requests.
It should be able to figure out what an average joe would understand from such questions. I think any human would interpret "easy battery replacement" as "you can just take the old battery out and put the new one in". If a random person asked you such a question, would you assume he has the tools and the skill needed to solder new batteries and considers that easy?
Phind was my go-to for getting more relevant and up-to-date information that could be found on the internet... but that stopped about 3+ months ago.
Many times the answers seemed to be getting more and more incomplete or incorrect as time went on (to a variety of questions over a period of months). Even worse, it would say it couldn't find the answer, yet the answer was among the sites it listed as references!
I've ended up mostly resorting to Bing and GPT-4o. Frankly, I'm hesitant to waste time trying this new version.
I see references here, but when I ask questions I get answers but no citations, and I am logged in. This used to be an issue that was fixed, but it's still an issue for me. If I log out and ask, I get references, but the answers use the Instant model.
I've noticed that sometimes too. I think it depends on the type of question, sometimes Phind decides you don't need references. You can ask explicitly for them.
Kagi does the opposite: it's mostly search results but sometimes an AI gives you a "Quick Answer" too.
I just tried. Asked a question on a research topic I'm digging into. It gave me some answers but no references. Then I copied the answers it gave me and specifically asked for references. I got:
I sincerely apologize for my earlier response. Upon reviewing the search results provided, I realize I made an error in referencing those specific studies. The search results don't contain any relevant information for the claims I mentioned earlier. As an AI assistant, I should be more careful in providing accurate and supported information. Thank you for bringing this to my attention. In this case, I don't have reliable references to support those particular statements about software tools and their impact on developer experience and software quality.
Just to follow up on this: I asked it to give me a brief explanation on how to use laravel 11 blade fragments, which it did reasonably well.
I then offered 3 lines of code of a route I'm using in Laravel and asked it to tell me how to implement fragment usage where the parameter in the URL determines the fragment returned.
Route::get('/vge-frags/{fragment}', function ($fragment) {
return view('vge-fragments');
});
It told me to make sure I have the right view created (which I did) and that was a good start. Then...
It recommended this?
Route::get('/vge-frags/{fragment}', function ($fragment) {
return fragment($fragment);
});
I immediately knew it was wrong (but somebody looking to learn might not know). So I had to ask it: "Wait, how does the code know which view to use"?
Then it gave me the right answer.
Route::get('/vge-frags/{fragment}', function ($fragment) {
return view('vge-fragments')->fragment($fragment);
});
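(For anyone following along: if I remember right, the Blade view side then wraps the relevant markup in matching @fragment('...') / @endfragment blocks, so ->fragment($fragment) can return just that piece.)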
I dunno. It's really easy to find edge cases with any of these models and you have to essentially question everything you receive. Other times it's very powerful and useful.
I mean, this is an unsolvable problem with chat interfaces, right?
If you use a plugin that is integrated with tooling that checks generated code compiles / passes tests / whatever, a lot of this kind of problem goes away.
Generally speaking these models are great at tiny self contained code fragments like what you posted.
It’s the longer, more complex, logically difficult things with interconnected parts that they struggle with; mostly because the harder the task, the more constraints have to be simultaneously satisfied, and models don’t have the attention to fix things simultaneously, so it’s just endless fix one thing / break something else.
So… at least in my experience, yes, but honestly, for a trivial fragment like that it's fine most of the time, especially for anything you can easily write a test for.
Sorry about that, could you make sure that "Always search" is enabled and try that first query again? It should be able to get the correct answer with references.
It was on. If I ask the same question again it now gets the right answer. Maybe a blip? Not sure.
To be fair, I don't expect these AI models to give me perfect answers every time. I'm just not sure people are vigilant enough to ask follow up questions that criticize how the AI got the answers to ensure the answers come from somewhere reasonable.
Absolutely. Behaviour that in normal life in clean societies would be "eliciting violence": automated hypocritical lying, apologizing in form and not in substance, making statements based on fictional value instead of truthfulness...
What "what"? Did the rest of the comments clarify the points to you or should I formulate a
"I am so sorry and heartbroken about having suggested that to play a sound you should use the, as you now inform me, non existing command and parameter `oboe --weird-format mysound.snd`, I'll check my information more thoroughly next time and make sure it will not happen again"...
> What does "eliciting violence" and "clean society" mean
I think you are on a good trail to understanding what they meant.
The use of 'sorry' is not generally a problem because it is normally framed within expected behaviour and it can be taken as adequate for a true representation, or not blatantly false. But you could imagine scenarios in which the term would be misused into inappropriate formality or manipulation and yes, disrespect is "eliciting violence". You normally work a way in the situation to avoid violence - that is another story.
In "sorry, page not found" 'sorry' is the descriptor for a state (i.e. "not the better case"); in "sorry we missed you" it is just courtesy - and it does not generally cover fault or negligence. But look: there are regions that adopt "your call is important to us", and regions that tend to avoid it - because the suspect of it being inappropriate (false) can be strong.
The outputs of LLMs I have used frequently pass the threshold, and possibly their structural engineering - if you had in front of you a worker, in flesh and bones, that in its outputs wrote plausible fiction ("I imagined a command `oboe` because it sounded good in the story") as opposed to answering your question, but under the veneer of answering questions (which implies outputting relevant world assessments, Truth based), that would be a right "sore" for "sorry". The anthropomorphic features of LLMs compromise the quality of their outputs in terms of form, especially in solution-finding attempts that become loops of "This is the solution" // "Are you sure?" // "Definitely" // "It is not" // "Oh, I'm so sorry! It will not happen again. This is the solution" (loop...).
Edit: it seems you may have also asked for clarifications about the contextual expression «clean societies». Those are societies that are cybernetically healthy, in which feedback mechanisms work properly to fine-tune general mechanisms - with particular regard to fixing individual, then collective, behaviour.
That's all they can do. They seem impressive at first because they're basically trained as an adversarial attack on the ways we express our own intelligence. But they fall apart quickly because they don't actually have any of the internal state that allows our words to mean anything. They're a mask with nothing behind it.
Evolution's many things, but maybe most of all lazy. Human intelligence has dozens of distinct neuron types and at least hundreds of differentiated regions/neural subnetworks because we need all those parts in order to be both sentient and sapient. If you lesion parts of the human brain, you lose the associated functions, and eventually end up with what we'd call mental/neurological illnesses. Delusions, obsessions, solipsism, amorality, shakes, self-contradiction, aggression, manipulation, etc.
LLMs don't have any of those parts at all. They only have pattern-matching. They can only lie, because they don't have the sensory, object permanence, and memory faculties to conceive of an immutable external "truth"/reality. They can only be hypocritical, because they don't have the internal identity and introspective abilities to be able to have consistent values. They cannot apologize in substance, because they have neither the theory of mind and self-awareness to understand what they did wrong, the social motivation to care, nor the neuroplasticity to change and be better. They can only ever be manipulative, because they don't have emotions to express honestly. And I think it speaks to a not-atypical Silicon Valley arrogance to pretend that they can replicate "intelligence", without apparently ever considering a high-school-level philosophy or psychology course to understand what actually lets human intelligence tick.
At most they're mechanical psychopaths [1]. They might have some uses, but never outweighing the dangers for anything serious. Some of the individuals who think this technology is anything remotely close to "intelligent" have probably genuinely fallen for it. The rest, I suppose, see nothing wrong because they've created a tool in their own image…
[1]: I use this term loosely. "Psychopathy" is not a diagnosis in the DSM-V, but psychopathic traits are associated with multiple disorders that share similar characteristics.
This is not something that can be LoRA fine-tuned after the pretraining step.
What we need is a human-curated benchmark for different types of source-aware training, to allow competition, and an extra column in the most popular leaderboards (included in the Average column) to incentivize AI companies to train in a source-aware way. Of course, this would instantly invalidate the black-box veil LLM companies love to hide behind so as not to credit original authors and content creators; they prefer regulators to believe such a thing cannot be done.
In the meantime, such regulators are not thinking creatively and are clearly just looking for ways to tax AI companies, in turn hiding behind copyright complications as an excuse to tax the flow of money wherever they smell it.
Source aware training also has the potential to decentralize search!
This is just the start. Imagine giving up on progressing these models because they're not yet perfect (and probably never will be). Humans wouldn't accomplish anything at all this way, aha.
And I wouldn't say lazy at _all_. I would say efficient. Even evolutionary features that look "bad" on the surface can still make sense if you look at the wider system they're a part of. If our tailbone caused us problems, then we'd evolve it away, but instead we have a vestigial part that remains because there are no forces driving its removal.
Oh yeah for sure, it's totally just more beta culture. But at the same time the first iPhone was called a "finished product" but it's missing a lot of what we would consider essential today.
In terms of people thinking LLMs are smarter than they really are, well... that's just people. Who hate each other for skin colour and sexuality, who believe that throwing salt over your shoulder wards off bad luck; we're still biological at the end of the day, we're not machines. Yet.
Lying is a state of mind. LLMs can output true statements, and they can even do so consistently for a range of inputs, but unlike a human there isn't a clear distinction in an LLM's internal state based on whether its statements are true or not. The output's truthfulness is incidental to its mode of operation, which is always the same, and certainly not itself truthful.
In the context of the comment chain I replied to, and the behaviour in question, any statement by an LLM pretending to be capable of self-awareness/metacognition is also necessarily a lie. "I should be more careful", "I sincerely apologize", "I realize", "Thank you for bringing this to my attention", etc.
The problem is the anthropomorphization. Since it pretends to be like a person, if you ascribe intention to it then I think it is most accurately described as always lying. If you don't ascribe intention to it, then it's just a messy PRNG that aligns with reality an impressive amount of the time, and words like "lying" have no meaning. But again, it's presented and marketed as if it's a trustworthy sapient intelligence.
I am not sure that lying is structural to the whole system, though: it seems that some parts may encode a world model, and that «the sensory, object permanence, and memory faculties» may not be crucial - surely we need a system that encodes a world model and refines it, that reasons on it and assesses its details to develop it (I have been insisting on this for the past years, also as the "look, there's something wrong here" reaction).
Some parts seemingly stopped at "output something plausible", but it does not seem theoretically impossible to direct the output towards "adhere to the truth", if a world model is there.
We would still need to implement the "reason on your world model and refine it" part, for the purpose of AGI - meanwhile, fixing the "impersonation" fumble ("probabilistic calculus says your interlocutor should offer stochastic condolences") would be a decent move. After a while with present chatbots it seems clear that "this is writing a fiction, not answering questions".
I've been playing with Gemma locally, and I've had some success by telling it to answer "I don't know" if it doesn't know the answer, or similar escape hatches.
Feels like they were trained with a gun to their heads. If I don't tell it it doesn't have to answer, it'll generate nonsense in a confident voice.
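(Concretely, I mean something along the lines of adding "If you don't know the answer, reply with 'I don't know' rather than guessing" to the prompt.)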
The model's weights are tuned in the direction that makes the model best fit the training set.
It turns out that this process makes it useful at producing mostly sensible predictions (generating output) for text that is not present in the training set (generalization).
The reason that works is that there are a lot of patterns and redundancy in the stuff that we feed to the models and the stuff that we ask the models, so there is a good chance that interpolating between words and higher-level semantic relationships between sentences will make sense quite often.
However that doesn't work all the time. And when it doesn't, current models have no way to tell they "don't know".
The whole point was to let them generalize beyond the training set and interpolate in order to make decent guesses.
There is a lot of research in making models actually reason.
In the Physics of Language Models talk[1], the speaker argues that the model knows it has made a mistake, sometimes even before it has made it. Though apparently training is crucial to make the model able to use this constructively.
That being said, I'm aware that the model doesn't reason in the classical sense. Yet, as I mentioned, it does give me less confabulation when I tell it it's ok not to answer.
I will note that when I've tried the same kind of prompts with Phi 3 instruct, it's way worse than Gemma. Though I'm not sure if that's just because of a weak instruction tuning or the underlying training as well, as it frequently ignores parts of my instructions.
For example you can confabulate "facts" or you can make logical or coherence mistakes.
Current LLMs are encouraged to be creative and effectively "make up facts".
That's what created the first wow factor. The models are able to write Star Trek fan fiction in the style of Shakespeare.
They are able to take a poorly written email and make it "sound" better (for some definition of better, e.g. more formal, less formal etc).
But then human psychology kicked in, and as soon as you have something that can talk like a human and that some marketing folks label as "AI", you start expecting it to be useful for other tasks too, some of which require factual knowledge.
Now, it's in theory possible to have a system that you can converse with which can _also_ search and verify knowledge. My point is that this is not the place where LLMs start from. You have to add stuff on top of them (and people are actively researching that)
> I sincerely apologize for my earlier response. Upon reviewing the search results provided, I realize I made an error in referencing those specific studies. The search results don't contain any relevant information for the claims I mentioned earlier. As an AI assistant, I should be more careful in providing accurate and supported information. Thank you for bringing this to my attention. In this case, I don't have reliable references to support those particular statements about software tools and their impact on developer experience and software quality.
Honestly, that's a lot of words and repetition to say "I bullshitted".
Though there are humans that also talk like this. Silver lining to this LLM craze, maybe it'll inoculate us to psychopaths.
"A key issue with AI-powered search is that it is just too slow compared to classic Google. Even if it generates a better answer, the added latency is discouraging."
Is this true? I feel like most complaints I have and hear about is how inaccurate some of the AI results are. I.e. the mistakes it confidently makes when helping you code.
From hitting enter to seeing something, ofc it's slower.
From hitting enter to a set of relevant answers loaded into your brain, though? Isn't that the goal that should be measured? Against that goal, the two decade old approach seems to have peaked over a decade ago, or phind wouldn't find traction.
For the 20 year old page rankers, time from search to a set of correct answers in your brain is approaching “DNF” -- did not finish.
---
PS. Hallucinations or irrelevant results, both require exercising a brain cell. On a percentage basis, there are fewer hallucinations than irrelevant results, it's just that we gave up on SERP confidence ages ago.
It's one of those triangles with speed \ accuracy / cost.
You can have a small model that's cost effective to serve, and gives fast responses, but will be wrong half the time.
Or you can have a large model that's slow to run on cheap hardware, but will give more accurate answers. This is usually only fast enough for personal use.
And there's the third option, with a large model that's fast and accurate, where you'll have to pay Nvidia/Groq/etc. a small fortune to be able to run it at speed and also probably build a solar power plant to make it cost-effective in power use.
This is true in my experience. Before searching for something I often try to guess whether it will take me more time to quickly go over Google results or watch Perplexity Pro slowly spitting the answer line-by-line.
I think they're both key issues - when the results are accurate, they're too slow; and you can't trust the results when you get there because they're often inaccurate
I paid for and used Phind for 6 months. I am more satisfied with the Kagi Assistant currently. It does not give that many links, but overall the results are as good or even better, and you can use lenses. You get a general search engine too.
There was one UI-related annoyance with Phind: the scroll bar sometimes jumped randomly, maybe even after each input or during token generation (on Firefox). You start wasting a lot of time if you always need to find the part you were looking at again, or even just scroll back to the bottom.
Primary issue is still that both hallucinate too much when you ask something difficult. But that is the general problem everywhere.
The icons cover up the input area on this crappy android work phone.
I stubbornly continued to type my complaint about my JSON getting too large for phones with slow CPUs or slow connections and got 100 solutions to explore. I couldn't help but think this is the worst-case robot overlord: it gave me a year's worth of materials to study, complete with the urge to go do the work. That future we used to joke about is here!
Some of the suggestions are familiar, but I don't have the time to read books with little tidbits of semi-practical information smeared out over countless pages in the wrong order for my use case.
I'm having flashbacks of reading for days, digging through a humongous library only to end up with 5 lines of curl. I still can't tell if I'm a genius or just that dumb.
This long response unexpectedly makes me want to code all day long. One can choose where to go next, which is much more exciting than the short linear answers... apparently.
It has a VSCode extension, so if you use that, it makes some sense. Purely for search, I don't know. IME Phind is not that great with internet access; sometimes people disable the search function to get better answers.
92% suggests a harder benchmark is needed, so it's difficult to judge. Especially when a lot of "high scoring" models produce cogent results with a high level of hallucination (e.g. Llama 3 is chatty, confident, and quite often wrong for me).
At that level of performance you're probably in the realm of hard edge cases with ambiguous ground truth.
Yeah, just went over their pricing and they apparently don't have any lower-tier subscription besides the $20/month "unlimited Phind + 500/day ChatGPT" version. I don't need that; what I need is something like 100 uses per month for $5. As a coding-focused search engine they really need to consider why people would pay them the same rates as more feature-rich competitors.
Been subscribed to phind pro for the last 5 or 6 months I think?
Feels like the pollution from search results has gotten a bit better, but it sometimes still messes with answers when I ask a follow-up question. Like, I will reference the code from the answer above in my question, and the next answer will be based not on the conversation but on some code in the search results. I'm not versed enough in RAG to know how you would fix that, with some kind of prioritization or something. Other than that, I'm REALLY looking forward to how you guys tackle your own artifacts in the web interface. Something about the UI in Claude's version of artifacts works really well with my workflow when using the web, plus having the versions of different files, etc.
Has happened with both 4o and sonnet, probably 4o more if I had to say for sure. I need to use 405 more to see if it has that same problem. I guess I didn't think about how the issue might be better or worse depending on model, I assumed the rag stuff applied the same
Okay, wait this is actually doing a really good job.
I still have to ask follow up questions to get reasonable results but when I tested earlier this year it was outright failing on most of my test queries.
There is a method that could help immensely when answering questions like these. E.g. some of these questions may be answered quite quickly using WikiData [0] (an answer to the question about recipients of the Medal of Freedom; query written with the help of Claude), instead of just scraping and compiling information from potentially hundreds of websites. I believe this idea is quite under-explored compared to just blindly putting everything into the model's context.
Part of the problem will not be solved by LLMs, but maybe by hiding aspects of one running. LLMs basically "think out loud" as they process and produce tokens.
The amount of thought required to answer any of those questions is pretty high, especially because they are all sizeable lists. It is going to take a lot of thinking out loud, and detailed training data covering all those items, to do that well.
Why not let us try the new model for free like the 5 uses available for the 70B model? Seems like a no brainer to hook new users if what you're selling is worth it, eh?
> The model, based on Meta Llama 3.1 8B, runs on a Phind-customized NVIDIA TensorRT-LLM inference server that offers extremely fast speeds on H100 GPUs. We start by running the model in FP8, and also enable flash decoding and fused CUDA kernels for MLP.
As far as I know you are running your own GPUs - what do you do in overload? Have a queue system? What do you do in underload? Just eat the costs? Is there a "serverless" system here that makes sense / is anyone working on one?
We run the nodes "hot" and close to overload for peak throughput. That's why NVIDIA's XQA innovation was so interesting, because it allows for much higher throughput for a given latency budget: https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source....
Serverless would make more sense if we had a significant underutilization problem.
Hey can you enhance the "clear history" button so that it doesn't delete your pinned threads? Also if you can improve the threads area with folders for better organization that would be awesome. Great work on the product btw!
const MyClass& getMyClass(){....}
auto obj = getMyClass();
this makes a copy right?
And it was very confident about it not making a copy. It thinks auto will deduce the type as a const ref and not make a copy. Which is wrong: you need auto& or const auto& for that. I asked it if it was sure and it was even more confident.
Here is the godbolt output: https://godbolt.org/z/Mz8x74vxe . You can see the "copy" being printed. And you can also see you can call non-const methods on the copied object, which implies it is a non-const type.
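For anyone who doesn't want to click through, here's a minimal sketch of the behaviour (class and member names are made up, not necessarily what's in the godbolt link):

#include <iostream>

struct MyClass {
    MyClass() = default;
    MyClass(const MyClass&) { std::cout << "copy\n"; } // prints whenever a copy is made
    void mutate() {} // non-const member, only callable on a non-const object
};

const MyClass& getMyClass() {
    static MyClass instance;
    return instance;
}

int main() {
    auto obj = getMyClass();         // deduces MyClass (const and & dropped), so the copy constructor runs and prints "copy"
    obj.mutate();                    // compiles: obj is a plain, non-const copy
    const auto& ref = getMyClass();  // deduces const MyClass&: no copy here
    // ref.mutate();                 // would not compile: ref is const
    (void)ref;                       // silence unused-variable warnings
}

The copy can't be elided either, since getMyClass() returns an lvalue reference, so "copy" prints regardless of optimization level.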
You prove the point that these are just token generation machines whose output is pseudo-intelligent. It’s probably not there yet to be blindly trusted.
More to the point; I wouldn't blindly trust 99% of humans, let alone a machine.
Though to be fair, we will hopefully quickly approach a point where a machine can be much more trusted than a human being, which will be fun. Don't cry about it; it's our collective fault for proving that meat bags can develop ulterior motives.
So far an API has been less of a priority than focusing on the user-facing product. But it seems there's a reasonable amount of demand for it, which we'll consider.
I consider AIs without API access as practically non-existent. Not everybody wants a web interface and to waste time on copy & paste all the time. APIs can hook the filesystem directly to an AI, and make complicated prompt engineering and multi-file changes a non-issue. And they should also help you make more money (don't undersell the API access and you're fine).
Without an API the community can also not compare Phind-405B to other models easily.
Would be great to have access to your model in a LLM gateway like https://openrouter.ai/
You should also consider the ecosystem value that might be created for your product. There’s a prior example.
ChatGPT amazed people but its UI didn’t. A bunch of better UIs showed up building on the OpenAI API and ChatGPT’s own. They helped people accomplish more, which further marketed the product.
You can get this benefit with few downsides if you make the API simple with few promises about features. “This is provided AS IS for your convenience and experimentation.”
I think an API would be fantastic for use cases like Aider / SWE agents. The primary issue besides fully understanding the code base is having up-to-date knowledge of libraries and packages. Perplexity has "online" models. And Phind with Claude, GPT-4o, Phind-70B + search / RAG would be awesome.
"Phind-405B scores 92% on HumanEval (0-shot), matching Claude 3.5 Sonnet". I'd love to see examples of actual code modifications created by Phind and Sonnet back-to-back. This level of transparency would give me the confidence to try to pro. As it is, I'm skeptical by the claim and actual performance as I've yet to see a finetuned model from Llama3.1 that performed notably better in an area without suffering problems in other areas. We do need more options!
I’ve been a customer of Phind for a number of months now, so I’m familiar with the capabilities of all the models they offer.
I found even Phind-70B to often be preferable to Claude Sonnet and would commonly opt for it. I’ve been using the 405B today and it seems to be even better at answering.
I’ve found it does depend on the task. For instance, for formatting JSON in the past, GPT-4 was actually the best.
Because you can cycle through the models, you can check the output of each one, to get the best answer.
The effectiveness of any given model depends on the specific use cases. We noticed that Phind-405B is particularly good at making websites and included some zero-shot examples in the blog.
I use it, with the phind models, instead of chatGPT. I had to change my user agent to Chrome since too many sites would refuse to work with FF otherwise, and now chatGPT is stuck in an endless captcha loop whenever I go there. I am just a casual user, to help write a quick script or to get some bit of relevant info. It works just as well or better for my use case, and of course having actual citations with links is worlds better than just playing "guess the hallucination". I am happy chatGPT kicked me out.
I was subscribed for about 6 months between the end of last year and beginning of this, but canceled and haven't looked back. The web interface was constantly buggy for me, and they seemed to be very focused on the VSCode extension without integrations for other editors, so I ended up canceling.
I use it periodically for things that I'd typically search on google and then read stack overflow for. I started this workflow before chatgpt had web search, so might be irrelevant now, but I've found it decent. Back then it was nice to be able to see the sources vs chatgpt just giving a random answer from who knows where.
I’ve used it since last year as a paid subscriber. I like it because of the technical nature, as it will help you know the exact steps for how to get something done. I also use it for random things, like bouncing ideas off it or enhancing my knowledge retention of a subject.
I've been using Phind this past week and it's been excellent.
One of our vendors insisted on whitelisting the IPs we were going to call them from, and our deployments went through AWS copilot/Fargate directly to the public subnets. Management had fired the only person with infrastructure experience a few months ago (very small company), and nobody left knew anything about networking.
Within about a week, Phind brought me from questions like "What is a VPC?" "What is a subnet?" to having set up the NAT gateways, diagnosing deploy problems and setting up the VPC endpoints in AWS' crazy complicated setup, and gotten our app onto the private subnet and routing outbound traffic through the NAT for that vendor.
Yes, it occasionally spit out nonsense (using the free/instant model). I even caught it blatantly lying about its sources once. Even so, once I asked the right questions it helped me learn and execute so much faster than I would have trying to cobble understanding through ordinary google searches, docs, and blog posts.
Strongly recommended if you're ever wading into a new/unfamiliar topic.
How does it compare to Perplexity or even plain vanilla ChatGPT? Did you specifically seek to use Phind because you weren't satisfied with others? Or did it just happen to be the first one you used?
The day after this thread hit the front page, I tried perplexity.ai b/c Phind was overloaded with traffic and not responding. It was OK, but not quite as helpful. Too hard to tell whether that's a fair judgment of the services or just because I don't have as much to ask as I did a week ago, though.
I asked it a question and it answered authoritatively.
> The impedance of a 22 μH capacitor at 400 THz is approximately 1.80 × 10^-24 Ω.
The correct answer should have been “what the hell are you talking about dumbass?”. Capacitors are not measured in henries and the question really has no meaning at 400THz. Another stochastic parrot.
The 405B model for me brings out a formula for calculating this, explains that my question doesn't make sense given that capacitors are measured in F instead and then plugs the values in assuming I made a mistake and meant 22uF.
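(For reference, the ideal-capacitor arithmetic with 22 μF would be Z = 1/(2πfC) = 1/(2π · 400×10^12 Hz · 22×10^-6 F) ≈ 1.8×10^-11 Ω.)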
It then explains that in practice other factors would dominate and the frequency is so high traditional analysis doesn't make so much sense.
I think that LLMs should not produce answers to questions. Instead, they should generate keywords, do a keyword search, and give quotes from human-written material as the answer.
So what an LLM should do is only search and filter human-written material.
An example of a search query is "What is the temporal duration between striking adjacent strings to form a chord on a guitar?". Google (being just a dumb keyword search engine) produces mostly unrelated search results (generally answering what chords there are and how one plays them). Phind also cannot answer: [1]
However, when I asked an LLM what keywords I should use to find this, it suggested "guitar chord microtiming" among other choices, which allowed me to find a research paper containing the answer (5 to 12 ms, if someone is curious).
It'd be cool if you showed off and did your own comparison and posted it on your blog. It'd also be cool if your blog was sorted newest to oldest - it's currently the reverse.
Serious question: does the Meta LLama ToS / EULA even allow fine-tuned models based on Llama to be used for commercial purposes without making the weights available?
I've given up on Phind. The company seems like a bit of a dodgy black hole in terms of what they do with information (the answer is: I don't know, but I asked to try to find out so my company could use them and got no reply). Seems untrustworthy…
It would be nice to see the Phind Instant weights released under a permissive license. It looks like it could be a useful tool in the local-only code model toolbox.
Some recent examples from my history:
what video formats does mastodon support? https://www.phind.com/search?cache=jpa8gv7lv54orvpu2c7j1b5j
compare xfs and ext4fs https://www.phind.com/search?cache=h9rmhe6ddav1bnb2odtchdb1
on an apple ][ how do you access the no slot clock? https://www.phind.com/search?cache=w4cc1saw6nsqxyige7g3wple