In my opinion it's a weak take that only got so upvoted and commented on hacker news because it has the existential and universal logical quantifier symbols in the title and also because it uses the chiasmus rhetoric device both in the title and at the end of the article.
The argument is that several technologies don't work 'out of the box' and you have to tweak their settings for each problem that you face, and that this means it's a scam. For example you have to change prompts in LLMs or change hyperparameters in other machine learning solutions. This argument is some combination of not insightful and not true.
> The argument is that several technologies don't work 'out of the box' and you have to tweak their settings for each problem that you face, and that this means it's a scam.
That's not what the argument is. The operative keyword in the argument is the word "sell", not the quantifiers themselves. If you're not "selling" it under that premise and you're open about the nuance, then it's not a scam.
Intentional or not, your representation of the argument is a straw man.
You might argue that this is semantics, but it's not. It's literally the point that is driving real execs' desire to replace actual humans with GPT, right now.
I'm happy to have learned about the chiasmus thing, I hadn't seen that before. In general, better writing, including the use of rhetorical devices, is what gets attention. It's better than some clickbait. Personally I thought the use of symbols was a distraction (spurred some useless comments) but maybe that did attract people.
My take is that the argument boils down to saying that overfitting is a thing in LLMs too, but it comes in at the prompting phase. That's almost obviously correct, but I hadn't seen it voiced explicitly before.
The addition of "scam" in the title was a bit of hyperbole, which is probably responsible for a lot of the views and responses.
Anyway, it seems to have triggered a lot of very reactive and defensive takes, which isn't surprising but also isn't really necessary.
In a vacuum yes, but the rhetorical device in question isn't really a standard clickbaity one is it, it's more literary. It's also logical for good writing to attract more views.
I wonder if there is a name for this. Antimetabole comes to mind but that usually requires two clauses. I suppose it is a "chiastic pattern" which is a more broad definition, which I believe also applies to smaller patterns although I have only seen it used for larger patterns like chapters.
Yes. I initially thought that the post makes a subtle point along the lines of
"if you can show that a system can solve all problems in a domain for a parameter choice, that doesn't mean it can solve all problems in the domain for the same parameter choice".
However, this is clearly trivially true, and it doesn't mean that it may not be advantageous to map a problem from problem space to system parameter space. Hence, I fail to see how this would be a scam in general.
but the cello isn't being spun by the musical instrument press (and cottage industry of self-appointed advisors to wealthy royal music patrons) as a self-playing enchanted wonder capable of performing any piece, real or imagined, with limitless creative and technical skill and requiring little more guidance than vague requests hummed out of tune by its handler.
Playing Cello is a profession, not a Hobby. Nobody wakes up in the morning thinking: I need to play some Cello. It is only needed as part of an orchestra. Whatever else you learned about this is mostly an invention of the media.
Amateur musicians outnumber professionals, on most instruments. There are entire amateur orchestras.
I'm an amateur cellist. I took cello lessons as a kid, and played in a community orchestra. I also played in a lot of smaller (so called "chamber") ensembles, and learned some solo cello pieces. I never expected to pursue cello as a career.
At the start of the pandemic, I got out my old student cello and started playing it again, strictly for my own pleasure. I will never get good enough to play classical music for money. I have found some jam sessions where my cello is welcome among folk musicians, and have also begun learning to play jazz on the cello.
It sure is a hobby, but one that's not for everybody. Instant gratification? Forget it. Endless struggle? You bet. Most amateur cellists learned the instrument as kids, but that still accounts for a lot of people.
Having seen my daughter progress from a 4 year old beginner on the cello to eventually playing in a state orchestra and now teaching it, I guess my take is that fundamentally it is a beautiful instrument. It is designed to resonate close to the human voice. But like the voice, there is an infinite amount of variation - on where to press the fingerboard, how to move the bow. Just like learning to speak and sing with articulation takes years of experience. She became extremely proficient, but just giving her a well made cello didn't achieve anything much. It was only with many hours of coaching and practice refining the input based on the output.
Fretless string instruments like the cello and the violin are notoriously hard to even get a good note out of. Beginners just learning to play tend to produce a sound not unlike a cat being strangled.
Playing a cello takes real skill. Even producing a single good note takes practice. Whereas on a piano, producing a good note is trivial. The skill comes in playing a complex piece requiring 10 fingers.
So I guess the author of the article expected these new AI systems to be more like a piano, but he got a cello instead.
If I understand correctly, the meat of the argument is "that is a system for every (∀) task, there exists (∃) a setting that gives the correct answer for that one task."
My understanding of this (correct me if I'm wrong) is that the scam is convincing users that GPT-X can do anything with say, the correct prompts.
This argument misses the mark for me. It's not that it solves all the problems, it's that the problems it does solve are economically impactful. Significantly economically impactful in some cases- obvious examples of call centers and first-line customer support.
> Significantly economically impactful in some cases- obvious examples of call centers and first-line customer support.
Is it that obvious?
Yesterday I had a trivial but uncommon issue with my pharmacy. I reached out to them online - their chatbot was the only channel available. I tried, over the course of 20 minutes and 3 restarted sessions, to communicate an issue that a human would have been able to respond to in 30 seconds. Eventually I just gave up and got the prescription filled elsewhere.
No doubt this pharmacy saved money by cutting support staff. I just think it's easy to see these solutions and cost savings without bothering to look at how much of a frustrating experience it can be for a customer.
I’d be surprised if something like a pharmacy managed to adopt a tech that quickly. In my experience non-tech industries often take quite a while to adopt.
I have seen plenty of chatbots used by my IT company and 3rd party suppliers I deal with. They really just turn what used to be phone tree to something text based. Pretty basic keyword search and recipes from my experience - that I usually like to escape to a human as soon as I can. I welcome a proper conversational AI chatbot that actually gets stuff done.
How are you gonna answer things like pricing? Issue with pharmacies is the super complicated and super secretive pricing structure. A good UI can solve this if they want to drop the secrecy.
I don't think so, I assume it's some more dated ML approach -- my point is moreso that it's not obvious that it's a good solution for this problem yet.
The argument is more nuanced. Importantly, the article is not making a judgement on the value of GPT, generally (at least not explicitly). It is arguing against one specific narrative.
That narrative goes like this:
Step 1: give me a specific task T from a class of tasks C.
Step 2: I’ll show you that I can formulate a prompt to solve task T.
Step 3: therefore, if you engineer prompts well enough, you can find a single prompt that will solve any task in class C.
The argument is that Step 3 is a non sequitur and that’s the scam.
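To make the gap between Steps 1-2 and Step 3 concrete, here is a toy sketch. The tasks, prompts and the table of which prompt solves which task are all made up for illustration; the only point is that "for every task some prompt works" can hold while "one prompt works for every task" fails.

```python
# Toy illustration only: tasks, prompts and solved_pairs are hypothetical.
tasks = ["t1", "t2", "t3"]
prompts = ["p1", "p2", "p3"]

# Each prompt was tuned for exactly one task and solves only that one.
solved_pairs = {("p1", "t1"), ("p2", "t2"), ("p3", "t3")}


def solves(prompt, task):
    """Return True when this prompt gets this task right (by assumption)."""
    return (prompt, task) in solved_pairs


# Steps 1-2: for every task there exists some prompt that solves it.
forall_exists = all(any(solves(p, t) for p in prompts) for t in tasks)

# Step 3: there exists a single prompt that solves every task.
exists_forall = any(all(solves(p, t) for t in tasks) for p in prompts)

print(forall_exists)  # True
print(exists_forall)  # False
```

Swapping the order of `all` and `any` is the whole difference, which is why demonstrating the first says nothing about the second.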
The question isn’t “is GPT useful” so much as “can products be built on top of GPT?”
More formally, the system specifies a set S of predicates that it can evaluate based on configurations (prompts etc) applied to the general method.
You have a particular predicate p that you want to evaluate, in some larger space of “appropriate” predicates P, then usually machine learning gives you some claim that,
∀p in P. ∃s in S. p(x) → s(x)
Call this last bit “weak prediction,” p is not easily computable or else you would not use machine learning but machine learning can compute any s in S and if we find this s then we can use s(x) as evidence that p(x) by Bayes theorem or so. Machine learning has never affirmed strong prediction, in particular there have always been ways to maliciously modify; you look at the details of the machine s, you have s(x,y) “I classify an x as a y” and s(x + Δx, y'), “I classify an x + Δx as a completely different y',” where humans literally cannot tell the difference between x and x + Δx, the alterations are in the “noise” of the data.
The “scam” is that you then tell people to hand calculate a subset X ⊆ p, so p(x) for any x in it, and then sell this as,
∃s in S. ∀x in X. s(x)
What's the problem? I think the claim hints that there are a few:
1. Selection bias in claimed accuracy. You generate candidates s¹, s², s³, ... and analyze their accuracy over X to pick one, say the one that gets 96% of X right. The accuracy of the selected solution sⁿ is not 96%, that's lying to yourself. The proper way to use this is to partition X into X¹ + X², use data X¹ to select sⁿ, then evaluate its actual accuracy on X². (This is how I originally read the article.) In particular there is a loss of p(x) from the right hand side of the new expression, suggesting “alignment” lapses, e.g. the machine learning algorithm that appeared to “learn” to identify tanks but actually was identifying clouds in the sky.
2. Selection bias in terms of problems solved, researchers generate problems p¹, p², p³ ... in P and we hear about pⁿ having solution sⁿ with 99% accuracy, this gives us an unrepresentative idea of what the success rates look like in P overall. (This is how I read your take, and meshes better with the final comments in the post.)
3. This broader context looks suspiciously like a problem where you can drive the false positive rate arbitrarily low by raising the false negative rate arbitrarily high, which roughly might explain the tendency of ChatGPT et al. to hallucinate.
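Point 1 can be sketched with a small simulation. Everything here is an assumption for illustration: the labels are random bits, and every candidate "setting" is a pure coin flipper with no skill at all. Picking the best candidate on the selection set X¹ and then scoring it on a held-out X² shows how the selected score is inflated by the selection itself.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

n_select, n_holdout, n_candidates = 50, 50, 200

# True labels p(x) for the selection set X1 and the held-out set X2.
labels_select = [rng.randint(0, 1) for _ in range(n_select)]
labels_holdout = [rng.randint(0, 1) for _ in range(n_holdout)]


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Each candidate guesses at random on both sets, so none has real skill.
candidates = []
for _ in range(n_candidates):
    preds_select = [rng.randint(0, 1) for _ in range(n_select)]
    preds_holdout = [rng.randint(0, 1) for _ in range(n_holdout)]
    candidates.append((preds_select, preds_holdout))

# Select the candidate that looks best on X1 only.
select_accs = [accuracy(c[0], labels_select) for c in candidates]
best = max(range(n_candidates), key=lambda i: select_accs[i])

selected_acc = select_accs[best]                           # optimistic number
holdout_acc = accuracy(candidates[best][1], labels_holdout)  # honest number
mean_acc = sum(select_accs) / n_candidates

print("accuracy on X1 of the selected candidate:", selected_acc)
print("accuracy on X2 of the same candidate:   ", holdout_acc)
print("mean accuracy on X1 over all candidates: ", mean_acc)
```

The selected candidate's score on X¹ is a maximum over many noisy scores, so it sits above the average by construction; only the X² number says anything about how it would do on new inputs.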
#1 it’s not clear that this ∃ ∀ construction is a fair representation of what is being ‘sold’ by GPT-x
#2 it’s also not clear what this proposed inverted formulation (∀ ∃) that describes what the author thinks GPT actually is even means. For every setting there exists a task that it answers? Does that even make sense?
there exists a setting which will work for all your test points.
The caveat of the author is (I think) that if you have a task, you collect a set of points (questions) on which you will test this task. Then you tune your setting (prompt) to start working for your test point (questions).
After that procedure, you do not know if that prompt solves the original task. You might have overfitted to your test points.
And by repeatedly doing this overfitting for various tasks, you are not gathering evidence that a good setting truly exists for all tasks
You can go as far as claiming that it's true for your colleagues as well! They've solved their piece of the job so far, but what evidence do you have that they will keep solving them in future? It's all just speculation!
Informally, "this thing can solve all your problems" vs "for each of your problems there is a thing that can solve it".
I suppose the argument is that LLMs are not a solution to any problem, they are a complex tool which might be used to find a solution, with non-zero effort.
As an example of non-zero effort: I spent a fair amount of time the other day trying to get chatGPT to advise how to effectively deal with a grey squirrel problem. It was more interested in telling me that squirrels should be treated humanely, to the extent that it suggested doing things that are illegal in my country (releasing a captured grey squirrel). I asked why and it told me all animals had a right to dignity and respect. I couldn't resist getting side tracked by this. After some light trolling I asked it about how it had come to hold these values and it told me that as an LLM it didn't have values, but then restated its position anyway.
In the end I got some more sense out of it with a new prompt where I specifically said I was interested in effective, legal methods of control without any moralising.
If you're concerned, I haven't killed any squirrels and almost certainly won't.
Right, but this seems like strawmanning to me. The vast majority of useful technology ever developed has been "a complex tool which might be used to find a solution, with non-zero effort".
The complaint here seems to be about the existence of marketing.
Since my blog post is linked, I wanted to clarify something. While this appears to be the broad message, I don't think the author intended to imply this about my post specifically, but I still feel the need to point out the following in my prompt eng blog post (linked by the OP)[1]:
> To start, you must have a problem you're trying to build a solution for. The problem can be used to assess whether prompting is the best solution or if alternative approaches exist that may be better suited as a solution. Engineering starts with not using a method for the method's sake, but driven by the belief that it is the right method.
Crucially there is not a program for every problem. Many (presumably "Almost all" in a mathematical sense) problems are Undecidable and so a program can't do that.
> There exists a program for every problem you have, you just have to find the code.
That proposition is that of general programming languages.
The compiler's proposition is that it will take your programs and faithfully per the language specification produce object code suited for its intended deployment platform.
So
> Are [general purpose programming languages] a scam as well?
Well, kinda. Most languages come with a hype brigade and before you know it, darling of the day is "the only rational choice" for writing. Also, many b.s. about solving problem x whereas they only hand the problem over to something else, sometimes the programmer. So (mildly) yeah.
Not at the moment, but there was a time when some thought that we should be able to produce a general purpose programming language, to solve every problem. One size fits all.
It took more than 40 years to realize that was never going to be.
Your claim doesn't make any sense. The stored program electronic computers are a realisation of the concept from Turing, but Turing's work is based on Kurt Gödel's work, and the whole point of what Gödel showed was that no, you can't do that.
There's a period of maybe a couple of decades at the start of the 20th century where Whitehead and Russell have used logic to prove that 1 + 1 = 2, and it seems like maybe all of mathematics can be placed on a firm foundation, and then Gödel shows that oops, no, actually a logical system can't prove some true things, or, it can contradict itself. No firm foundations for us.
Thanks for pointing that out, I'll add it to the list of words I'm supposed to use to sound smart.
However, I did find the main argument compelling, through my own waste of time (yes I learned it the hard way) I've come to acknowledge that the tradeoff for prompting GPT in order to get a valuable answer, is just not worth it.
However, it does seem that most of the people are intrigued by "it can answer anything with the right prompt" promise, and they devote a lot of time in order to fulfill it.
You cannot trust GPT, you cannot rely on it and it can't replace anyone, until it learns to prompt itself.
I think what this article really implies is that - we the humans are doing the quality assurance for the GPT answers, if we take that out of the equation, and we don't give it quality prompts, etc.
I'm no linguist but it seems to me that that word doesn't really work for that definition. It sounds like it should pertain to hidden form, not hidden similar form, like say, cryptoisomorphic.
You're right there. It's amusing to me because I suspect it is difficult or impossible to meaningfully define "obvious". In my head, it matches "cryptozoology", where the "crypto" denotation kind of includes "pseudoscience".
But also "hidden shape" has a loose implication of "hidden equivalent shape".
Sometimes you hope to get a system that works without constant tweaking. Once you've adjusted it properly, it should work without adjusting the settings. So there's one setting that works for all the inputs you care about.
But instead you have a system that you have to adjust all the time. For every input, you can get it to work. But you're never done tweaking.
This is actually fine if you're expecting the user to do some work. Consider doing a Google search, seeing what you get, and then modifying your query when you didn't get what you want.
It sucks if you're hoping to go into fully-automatic mode. You didn't get that, you got an interactive application. (Or maybe the tweaks aren't reasonable for the user to make themselves, so you got nothing?)
So the scam, if there is one, is selling something as fully-automated when it's not. It's a tool that requires a person to work with a machine, and you're never going to be able to take the human out of the loop. (Let's say you're hoping for a driverless car but you never quite get there.)
Often, it's not the end of the world. An interactive tool can be very valuable! But you need to set your expectations appropriately.
> Also, as they leave the hyper-parameter selection fully to the users, they become not falsifiable. If they didn’t work, it is because you didn’t pick the right hyper-parameters or training procedures.
The author says that the builders of these systems are also the ones that run the scam. And they deliberately build it in a fashion that enables the scam. Seriously?
This is a gripe with sales and marketing. A tale as old as time.
The article provides abbreviated definitions that might be less confusing to follow than the comprehensive Wikipedia page. But, I agree that calling them out a bit more clearly in the article would have been helpful.
I understand this as question on how these symbols are pronounced (the names are in the article if you are curious): actually "for all/every <...>" and "there is/exists (at least one) <...>".
Example: ∀ x∈ℝ\{0} ∃y∈ℝ : x*y=2
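A rough way to check this example by machine, as a sketch only: the handful of sample values below stands in for the nonzero reals (an assumption, since a program cannot enumerate ℝ), and exact fractions are used to avoid rounding. The witness y = 2/x depends on x, which is exactly what the ∀ ∃ order allows; a single y for all x does not exist.

```python
from fractions import Fraction

# A finite sample standing in for the nonzero reals (assumption: rationals only).
xs = [Fraction(-3), Fraction(-1, 2), Fraction(1), Fraction(7, 5)]

# Forall-exists: the witness y may depend on x, here y = 2 / x.
witnesses = {x: Fraction(2) / x for x in xs}
forall_exists = all(x * y == 2 for x, y in witnesses.items())

# Exists-forall: one y would have to work for every x at once.
# Only the witnesses need checking, since any valid y must equal 2 / x for each x.
exists_forall = any(all(x * y == 2 for x in xs) for y in witnesses.values())

print(forall_exists)  # True
print(exists_forall)  # False
```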
"For all values x from the real numbers excluding zero, there is a value y from the real numbers so that x*y equals 2."
He, imo correctly, puts them both in the category of extra degrees of freedom that can allow the user to overfit and get results that appear more impressive than the underlying reality about how the model has generalized.
But generating new prompts for GPT is incredibly cheap! You just rephrase and ask again.
That there are prompts which generate impressive results with GPT is the point. Because anyone can generate prompts - and get impressive results.
Whereas hyper parameter tuning is expensive. A system that generates good results with the right tuning doesn’t tell you much about your ability to use it to generate good results, because it will be hard to try lots of different tuning approaches to discover if you can get a useful result.
Think about when you use GPT to generate prompts, something that seems to be growing common. Once you have a pipeline like that, changing the (meta)prompt can be expensive. You designed the pipeline thinking you only have to come up with the meta-prompt once, and it would work from now on. But you find you have to keep tuning your Kubernetes/upgrading your dependencies/tweaking your meta prompt.
"Scam" is overblown, but I think OP is right to warn of a possible future issue. It's an issue endemic to all software, so not something worth calling out recent AI advances in particular for. But it seems something we should all be trying to get better about. Monitor the ongoing costs of the systems we create. Do they really pay for themselves or are we waiting for a Godot that never arrives?
That's not an insight, it's a misunderstanding. Overfitting is only applicable relative to claims of statistical performance.
And in any event, there are lots of systems with fewer degrees of freedom (or in the case of deep learning, more generalization potential) than the training data, that are not at particular risk of being overfit, and there are measures and tests to mitigate the risk of overfitting. It's not some inherent characteristic of "systems".
ChatGPT itself is in on the scam. The way I think about it is that ChatGPT is already superhuman at bullshitting, and many people want to give it credit for being more capable than it really is.
It is interesting to ask whether it is the “most likely word” heuristic that leads to this behavior (e.g. never saying anything that startles people) or RLHF training systematically teaching it to say what people want to hear.
GPT-4 is very useful to me right now, for small programming projects. Though it can't write an entire program for a nontrivial project, it is good at the things I hate doing, like figuring out regular expressions and SQL commands. I smile broadly at the fact that I may never have to write either of those things again. And GPT-4 knows of the existence of countless software libraries and modules that I've never heard of. It doesn't always use them correctly, but just alerting me to their existence is tremendously helpful. It can usually answer questions about APIs correctly. I have no idea what impact LLMs will have on the world as a whole, but they will clearly revolutionize coding.
I can walk and chew bubble gum at the same time: on one hand, yes, there's certainly a lot of Kool-Aid being drunk by the AI folks. Even on HN, I constantly argue with people that genuinely think LLMs are some kind of magical black box that contain "knowledge" or "intelligence" or "meaning" when in reality, it's just a very fancy Markov chain. And on the other hand, I think that language interfaces are probably the next big leap in how we interact with our computers, but more to the point of the article:
> To conclude: one must have different standards for developing systems than for testing, deploying, or using systems.
In my opinion, you unfortunately will never (and, in fact could never) have reliable development and testing standards when designing purely stochastic systems like, e.g., large language models. Intuitively, the fact that these are stochastic systems is why we need things like hyper-parameters, fiddling with seeds, and prompt engineering.
The Microsoft Research "Sparks of AGI" paper spends 154 pages describing behaviors of GPT-4 that are inconsistent with the understanding of it being a "fancy Markov chain": https://arxiv.org/abs/2303.12712
I expect that the reason people are constantly arguing with you is that your analysis does not explain some easily testable experiences, such as why GPT-4 has the ability to explain what some non-trivial and unique Python programs would output if they were run, despite GPT-4 not having access to a Python interpreter itself.
> non-trivial and unique Python programs would output if they were run, despite GPT-4 not having access to a Python interpreter itself
Trivially explained as "even a broken clock is right twice a day." I skimmed the paper, as it was linked here on HN iirc. First, it was published by Microsoft, a company that absolutely has a horse in this race (what were they supposed to say? "The AI bot our search engine uses is dumb?"). Second of all, I was very interested in their methodology, so I fully read the first section, which is woefully hand-wavy, a fact with which even the authors would agree:
> We acknowledge that this approach is somewhat subjective and informal, and that it may not satisfy the rigorous standards of scientific evaluation.
The paper, for instance, is amazed that GPT knows how to draw a unicorn in TikZ, but we already know it was trained on the Pile, which includes all Stack Exchange websites, which happens to include answers like this one[1]. So to make the argument that it's being creative, when the answer (or, more charitably, something extremely close to it) is literally in the training set, is just disingenuous.
> Trivially explained as "even a broken clock is right twice a day."
Trivial, vacuous, and wrong. It is not plausible to correctly predict the output of a serious Python program by coincidence. See the detailed examples in the paper -- one of which was pseudocode, not Python -- to see how silly this claim sounds.
> First, it was published by Microsoft, a company that absolutely has a horse in this race
Firstly, this is evidence that you are neither very familiar with Microsoft, nor with academic or industrial research labs. Microsoft Research (the affiliation of the authors) is practically a different company, and run more like a university lab. It is somewhat deeply insulting to their (often still primarily academic) researchers to suggest that they would publish a misleading puff piece to benefit the commercial arm.
Secondly, while the paper describes itself as qualitative, you can reproduce the major claims yourself (and I have).
> familiar with Microsoft, nor with academic or industrial research labs
The idea that Microsoft Research would publish anything remotely damaging to Microsoft is beyond naïve. I mean, one of their core tenets is "Ensure that Microsoft products have a future," but okay.
You're one of the people that will be yelling to everyone else in a potential future "We're only seemingly oppressed! It's just a parlor trick that they've turned most of humanity into paperclips, sci-fi authors wrote about this already!"
I don't think GPT-4 is magic, but unless the unicorn is literally an exact replica of something from its training set, it clearly has "knowledge", and it's weird that you'd try to deny that.
Do you think the card catalog down at the local library is sentient?
Maybe that's not enough data, though!
Is the card catalog for NY Public Library sentient?
Maybe that's still too local.
Is Google sentient?
Everyone with a clue would admit these are all examples of "knowledge"
It's the same parlor trick, with fancier algos. It's not intelligence. It won't produce a human-level AI.
Period.
It's not the amount of data, it's what it does with that data. Try typing "Tell me a story about a unicorn arguing with people on Hacker News" into a card catalog and tell me how good it is at storytelling. Typing that into GPT-4 might not win any literary awards, but it obviously understands what you meant and does a passable job.
> but unless the unicorn is literally an exact replica of something from its training set, it clearly has "knowledge", and it's weird that you'd try to deny that
Speaking of weird, that's a very weird definition of knowledge. Because in that case, then almost any data transformation operation implies knowledge. MS Word thesaurus? Knowledge. Search and replace? Knowledge. Markov chain[1]? Knowledge.
Knowledge is one of those vague words like consciousness that has people arguing past one another. However, yes, an interactive thesaurus has "knowledge" of a very limited sort. And LLMs have much more "knowledge", and are able to synthesize novel things from that knowledge.
You can argue "it's only seemingly got knowledge!" all you want, as everyone else enjoys increasingly capable AI.
Good point. While it seems obvious to me that LLMs can never be anything more than fancy Markov chains, in my experience it seems the majority of human "logic" does not operate much differently. Very rare to encounter someone who is able to think or speak critically. Most regurgitate canned responses based on keywords.
I'm gonna respond to you, because I think you like GPT4 and I do too (even if the only use I trust for now is "Summarize this **lot of text/research article** for me in less than 200 words", which is already great for a knowledge hoarder like me)
You can think against yourself; an LLM has trouble doing so. Also, they fail spectacularly when asked to do real-life arithmetic: "I have to buy two baguettes at one euro, then five chocolatine, croissants and raisin bread at 1.40, 1.20 and 1.60 respectively, how much should I take with me?" when in my head, I just know within seconds that it'll be between 20 and 25 (and in fact it's 23; I took random numbers but they are quite easy to add).
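For reference, the sum can be checked in a few lines. This assumes the reading that gives 23: two baguettes at one euro each, plus five of each of the three pastries. Prices are kept in cents so no floating-point rounding sneaks in.

```python
# Prices in euro cents to avoid floating-point rounding.
baguette, chocolatine, croissant, raisin_bread = 100, 140, 120, 160

# Assumed reading: two baguettes, and five of each of the three pastries.
total_cents = 2 * baguette + 5 * (chocolatine + croissant + raisin_bread)
total_euros = total_cents / 100

print(total_euros)  # 23.0
```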
> You should take 23 euros with you to purchase all the items.
Are you sure you're using GPT-4 and not 3.5? GPT-4 is incomparably more competent compared to GPT-3.5 on logical tasks like this (trust me, I've had it solve much more complicated questions than this), and you aren't using GPT-4 on chat.openai.com unless you're paying for it and deliberately picking it when creating a new chat.
Edit: Here's an example of a more complicated question that GPT-4 answered correctly on the first try: https://i.imgur.com/JMC7jsw.png
Funnily enough, this was also a problem that a friend posed to me while trying to challenge the reasoning ability of GPT-4. As you can see (cross-reference it if you like), it nailed the answer.
The rare humans who don't speak any language (or animals, for that matter) can still think, which shows that thought is more than manipulating language constructs.
Well, for one, humans are obviously at least more than a fancy Markov chain because we have genetically hard-wired instincts, so we are, in some sense, "hard-coded" if you forgive my programming metaphor. Hard-coded to breed, multiply, care for our young, seek shelter, among many other things.
Markov chains, like any algorithm, are hard-coded. And just as evolution hard-codes our genes, supervised learning (and in the future reinforcement learning) hard-codes LLMs and other AI models.
>contain "knowledge" or "intelligence" or "meaning" when in reality, it's just a very fancy Markov chain
These are not mutually exclusive. If you have a Markov chain that 100% of the time outputs "A cat is an animal", then it has knowledge that a cat is an animal.
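As a sketch of what such a chain looks like, here is a hypothetical first-order Markov chain over words in which every state has exactly one successor with probability 1. The transition table is invented for this example; it is not how any real language model is stored.

```python
import random

# Hypothetical chain: each word has a single successor with probability 1.
transitions = {
    "<start>": {"A": 1.0},
    "A": {"cat": 1.0},
    "cat": {"is": 1.0},
    "is": {"an": 1.0},
    "an": {"animal": 1.0},
    "animal": {"<end>": 1.0},
}


def generate(rng):
    """Walk the chain from <start> until <end>, sampling each next word."""
    word, out = "<start>", []
    while True:
        options = transitions[word]
        word = rng.choices(list(options), weights=list(options.values()))[0]
        if word == "<end>":
            return " ".join(out)
        out.append(word)


sentence = generate(random.Random())
print(sentence)  # A cat is an animal
```

Because every transition has probability 1, the sampling step never has a real choice to make, so the output is the same on every run regardless of the seed.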
Knowledge is awareness of information. "Awareness" is a quagmire because a lot of people believe that 'true' awareness requires possessing a sort of soul which machines can't possess.
I think the important part is information, the matter of 'awareness' can simply be ignored as a philosophical/religious disagreement which will never be resolved. What's important is: Does the system contain information? Can it reliably convey that information? In which ways can it manipulate that information?
"Awareness of information" describes belief. Knowledge is justified, true belief (you can believe things you don't actually know/don't have justification for, and you can be made aware of information you don't believe). If you're dismissive of philosophy and then ask epistemological questions, you'll miss out on a lot of good pondering people have done on the subject, and end up reinventing some of it without encountering the criticism of those ideas.
2. "Belief" isn't any less of a quagmire than "awareness." Materialists and dualists will never agree on whether machines can have 'belief' or 'awareness', so discussions between the two will always be fruitless.
If you're asking explicitly epistemological questions, knowledge as in "do you have knowledge of the events of last night" is probably not the definition you want. You probably want the definition as in, "what is knowledge, what do I know, and how do I know it?" (Note the definition I used is also there.)
You're asking questions and then declaring the answers impossible to determine, I don't really see the point. You don't really avoid the question of belief in the line of questioning you propose. It just gets implicitly shifted into the observer.
Personally I don't care about whether this paradigm will ever reconcile with that one, I care about which I think is most appropriate to a given problem space.
> You're asking questions and then declaring the answers impossible to determine
If you mean to say that I've asked whether machines can know or believe, you're wrong. I have not asked whether it's possible for machines to 'know' or 'believe'. What I have asserted, not asked, is that these questions are a waste of your time, because the divide between materialists and dualists will never be bridged. The root of the disagreement is an irreparable philosophical divide, essentially a religious disagreement.
To reiterate for clarity, these are the questions which I said are relevant: "Does the system contain information? Can it reliably convey that information? In which ways can it manipulate that information?" I haven't declared these questions impossible to answer. On the contrary, these are questions for engineers, not philosophers or theologians. They are mundane, practical questions:
The system is a spreadsheet: Does it contain information? Yes, assuming it isn't blank. Even a fraudulent spreadsheet contains information, false as it may be. Can it convey that information? Yes, given appropriate spreadsheet software and a user who knows how to use it. Can it manipulate that information? Certainly, a spreadsheet can sort, sum, etc.
The system is an AI: Can it contain information? Yes, plenty of information is fed into them during training. Can it reliably convey that information? That depends on the degree of reliability you desire. Can it manipulate that information? Yes, numerous kinds of manipulations have been demonstrated. The reliability of information conveyance and the manner of manipulations which are possible are important questions for any engineer who is thinking about creating or employing such a system. The answers to these questions are not impossible to determine.
But can an AI "know" things? Pointless question, like asking if a submarine can "swim". Important questions about submarines include: How deep can it go? How fast can it go? How quiet is it? These are questions for which empirical answers can be determined. Whether a submarine can "swim" is a pointless question; all it does is interrogate how much anthropocentric baggage the word "swim" has. Maybe that's an interesting question to linguists, poets or philosophers, but it isn't an important question to engineers trying to solve real problems.
I know swimming submarines are cliche so here's another: Can a seat-belt hug you? That's a stupid question for poets or linguists who want to interrogate the anthropocentric implications of the word 'hug'. Can a seat-belt restrain you? That's a useful question for automotive engineers who want to build a car.
The heart of the issue is always the same: is a perfect simulation actually the same thing as what it simulates? I would argue yes, especially when the definition of knowledge or intelligence is already so fuzzy, but some people will probably always disagree.
If it requires interpretation, then it is the "yes+human" system that has the knowledge.
What really happened here is that a human wrote down, "a cat is an animal," and then another human read it, understood it, and believed it. And so the knowledge moved from one human to another. `yes` was only a conduit for that information to travel through.
If something was a conduit for knowledge wouldn't it make sense that it would have at some point contained the knowledge? The knowledge is stored into it by one human and extracted by another human.
Absolutely you can store knowledge in text. You can store quite a bit of knowledge in a book for instance, but the book doesn't have any beliefs and doesn't know anything. Whether a sufficiently complex Markov chain or ANN can have beliefs, I don't know but I'm skeptical that these ANNs do in particular.
Its ability to produce text containing true statements isn't sufficient evidence to conclude that it has beliefs, and it's easy to find cases where it contradicts itself (e.g., if you play around with the wording you can find a prompt where it tells you that solving the trolley problem is a matter of harming the fewest people but proposes a solution that harms the most people). I take that as an indication it's primarily regurgitating text and rearranging the prompt rather than applying knowledge (which, to be clear, is useful for a number of tasks).
I think there has always been a great deal of Stone Soup in the software world. We are often promising solutions that will work as long as all the ingredients are supplied in the form of processes delivered by the customer.
Strangely though the very promise of the magic stone does allow solutions (soup) to emerge.
> Convince the user that it is their job to find a instantiation or setting of this control to make the system work for their tasks.
As opposed to convincing the user that it is their job to brief a suitably qualified contractor or employee to make a company perform the required work?
An operating system does many things.
A general purpose language when contrasted with a domain specific language.
A human programmer thinking that they're a wizard that can solve all problems.
However, I think I get the sentiment: it's a little bit like snake oil and alternative medicine. They promise to solve all your problems.
What's interesting to think about is who is to "blame". The developers of the product, the salespeople of the product, the wild imagination of the buyer thinking the product can solve everything, or maybe the wild imagination of the developers behind the buyer. However, if the seller can set off a wild imagination in the buyer, maybe they've successfully scammed them.
No. But I'm old enough to remember how the people that bought the first IBM PCs complained that the computers were useless. Of course they needed much more than the very expensive machines: software, training, peripherals, backups...
They were scammed by the vendors, even if the product was not a scam.
It is difficult to conclude this is true, and it will be even more difficult in the future because tech like "AI" has the ability to almost completely saturate the amount of data any human can ingest. The relationship will soon be symbiotic, with everything showing the extent of our progress... in the same manner we can date movies by the kinds of phones they use. Many plots will become stale, many worldviews will be condensed to something that supports "AI" and the offending branches will be snipped. The only way to really forget this limitation is to be myopic enough to disregard everything else. With the way the internet is going, I'm sure one day these LLMs will be heralded as "free" and "open" media, their fuzzy recollections the only records we will have of the past, and extensive use will essentially morph civilization in their own image.
This has always been present in subfields of AI. For example, in classical computer vision, one had to figure out specific parameters working just for a single image or video scene by hand. Machine learning can in theory at least make these parameters learnable at the cost of complexity.
>To conclude: one must have different standards for developing systems than for testing, deploying, or using systems. Or: testing on your training data is a common way to cheat, but so is training on your test data.
Isn't this already a solved problem? Every reasonable paper on ML separates their test data from their validation data already.
That in no way prevents overfitting through hyperparameter optimization / graduate student descent. All the common benchmarks, by definition of being a common benchmark, are susceptible to overfitting.
That's why you split it 3-way into train, validation, and test datasets, to ensure you didn't use too many hyperparameters and overfit to the validation data.
The idea is that you do all your hyperparameter optimization with the validation data and then only run through the test data once before you submit your paper.
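For reference, the usual convention can be sketched like this in plain Python: fit on train, choose hyperparameters on validation, and touch test once at the very end. The function names and the 70/15/15 fractions are assumptions for illustration, not anything taken from the essay:

```python
import random


def three_way_split(items, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then cut into disjoint train / validation / test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def pick_hyperparameter(candidates, score_on_validation):
    """Return the candidate with the best validation score.

    `score_on_validation` is a hypothetical callback that trains on the
    train split with the given setting and scores it on the validation
    split. The test split is never consulted here.
    """
    return max(candidates, key=score_on_validation)


train, val, test = three_way_split(range(100))
```

Note that this only protects a single experiment. If the whole field keeps re-running against the same public test set, the selection pressure the parent comments describe still applies.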
In the second part of the essay, the writer mentions the statistician's fallacy.
"Adding more hyperparameters leads to overfitting."
This is not true with large neural networks anymore and is part of the magic.
To be fair, on the pure logic side (forgetting about what the link is really about, i.e. selling scams), it is true that any ∀ ∃ can be turned into a ∃ ∀. It's called skolemization.
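Spelled out, with $P(t, s)$ as my own shorthand for "setting $s$ makes the system work on task $t$" (the symbols are assumptions, not the article's notation), the rewrite is:

```latex
\forall t \,\exists s \; P(t, s)
\quad\Longleftrightarrow\quad
\exists f \,\forall t \; P\bigl(t, f(t)\bigr)
```

The catch is that the existential on the right ranges over functions $f$ from tasks to settings, not over a single setting, so this is a second-order statement (and the left-to-right direction needs the axiom of choice in general). It is not the same claim as $\exists s \,\forall t \; P(t, s)$, which would say one fixed setting works for every task. In this reading, $f$ is whoever or whatever does the per-task tuning.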
It is a superficially clever argument. It's not actually a clever argument because it elides the existence of "but easier" or "but faster" as mechanisms for valid business models.
He is making a substantive point, but you are rejecting him out of hand due to terminology which you feel signals he's outside a clique. Who is the one without an adult argument?
It's a good point. I'd consider it (overfitting) a pitfall or common mistake in ML rather than the only mode. I'd agree that most ML models and almost all state-of-the-art models are overfit to the point of being useless, but that's not an inevitability.
The argument is that several technologies don't work 'out of the box' and you have to tweak their settings for each problem that you face, and that this means it's a scam. For example you have to change prompts in LLMs or change hyperparameters in other machine learning solutions. This argument is some combination of not insightful and not true.