
> Ironically, Tao's post convinces me that AI, though amazing, isn't really the solution. Better UX and data quality is.

incredible what mental hoops people will jump through to disqualify AI!

the reality is that a lot of things have a terrible UI with unorganized data. that's why this tool is so amazing - because it doesn't matter anymore.




To reinforce this point, we have known for a very long time that better UX and data quality have innumerable benefits. Remember back when everybody was excited about Web 2.0 and all the amazing mashups that would soon become possible?

If you are producing data, then exposing it in a nice programmable format is an extra cost and generally provides you no benefit. It usually hurts you if people stop visiting your site and see fewer of your ads!

This is "really" a problem of incentives. It is usually not possible to capture any of the positive externalities of exposing your data. So maybe we could convince everybody in the world to switch to using different browsers with a native micropayment system; that might incentivize everybody to release all data as clean machine-readable tuples.

What I'm saying is, the phrase "Better UX and data quality" ignores just how hard that solution really is. It turns out training an LLM over most of the internet is _easier_ than global coordination.


i.e. semantic web was dead on arrival.


But ChatGPT can crawl a semantic web and use it without us knowing.

I have asked a LangChain bot for Wikidata IDs for specific places and links to their pages, told it to read the pages and then answer facts about the places, and got very good results instead of made-up numbers.

Wikidata links to FIPS codes, OSM IDs, and GeoNames, and that gives us an opening to link against the cool datasets from Flickr, Foursquare, and others who have created gazetteers.

To me, the Semantic Web was dead on arrival because of its UX, but now a semi-smart agent can help us get past the UX problems and jump from plain text to JSON output.
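As a rough illustration of the linking (not the bot above), here's a minimal sketch that pulls a place's external identifiers out of Wikidata over plain HTTP. The property IDs are my assumptions about Wikidata's schema: P402 = OpenStreetMap relation ID, P1566 = GeoNames ID, P901 = FIPS 10-4 code.

  import requests

  # Assumed Wikidata external-id properties (see lead-in above).
  PROPS = {"P402": "osm_relation", "P1566": "geonames", "P901": "fips"}

  def external_ids(qid: str) -> dict:
      resp = requests.get(
          "https://www.wikidata.org/w/api.php",
          params={"action": "wbgetclaims", "entity": qid, "format": "json"},
      )
      claims = resp.json()["claims"]
      ids = {}
      for prop, name in PROPS.items():
          for claim in claims.get(prop, []):
              snak = claim["mainsnak"]
              if "datavalue" in snak:  # skip novalue/somevalue snaks
                  ids[name] = snak["datavalue"]["value"]
      return ids

  print(external_ids("Q90"))  # Q90 = Paris

From there you have stable keys to join against the Flickr/Foursquare-style gazetteers, no scraping required.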


I think it's a great tool, but I think a generous interpretation of the OP is that it's solving problems that wouldn't arise in the context of a better overall system/process.

A woodworking example is that a planer is a great tool that helps you make nice flat surfaces. But, to a certain extent, it's a downstream fix that wouldn't be necessary if a carpenter were using a better overall process. I.e., if their upstream process for cutting/ripping wood made nice flat surfaces to begin with, the awesomeness of the planer becomes moot. (Apologies to the legitimate woodworkers if this analogy is off.)

Where tools like GPT become invaluable is when you have no control over those upstream processes but still need to get the job done. But leveraging a tool for a downstream fix when upstream fixes are possible is usually a less-good approach to creating good systems.


You mean it "only" solves problems in the real world and not in the ideal world?


That's not what I mean, unless you assume that you have no control over other elements of the process.

To torture the woodworking analogy, your assumption is that the carpenter has no control over ripping the boards. In some instances that may very well be the case, but there will also be instances where the carpenter does have influence over creating the boards, or even wholesale control over ripping them. In those cases, using a planer to fix poorly ripped boards may not be the best approach.


How often do you actually have control over the entire process of anything? Even in website development, unless you're going to handle raw TCP sockets, you're going to build on top of someone else's tools one way or another. And in the business world, you almost always have to deal with other teams, other people, other priorities. Even when you run a company, not all of your employees and partners can always do things exactly the way you want. Having a flexible tool that works in the real world, on real-world data, on top of real-world processes, is incredibly valuable.


I think you're engaging in some dichotomous thinking. I'm not making the claim you'd have to have "control over the entire process". What I'm cautioning against is just looking to the tail end of a process and assuming that's where you need to add leverage.

Even in your examples, yes, you have to work with other teams. "Control" doesn't mean you have dictatorial control over those teams. But it does sometimes mean you have to build relationships, leverage what you can, and explain the value to those that do have some modicum of control. The idea that we just throw our hands up and jump to workarounds is often an excuse for taking the short-term easy path at the expense of a better long-term solution.


I like your planer analogy, though it is indeed off for woodworkers, but in a way that's rather nuanced, so it doesn't inhibit getting the idea across to normal people who don't know a lot about the intricacies of working with wood.

There are a couple of reasons why it would be hard to change the upstream process to not necessitate planers. The main one is that logs are typically ripped into boards when the wood is still green, and in the process of drying, boards change shape and dimensions: they bow, cup, warp, and shrink, and you might still need a planer to bring them back to flatness and to desired final thickness.


I guess the only way to change it would be to rip them when they are green, dry them to a decently low moisture content, and then plane them. It's still a bit off, since wood is never static and the differential in moisture between the seller's shop and your shop can also change the wood. You really do need to dimension it after it has stabilized in your shop for a spell.


Nope.

The cost of implementing a better process for all carpenters is significantly higher than that of all carpenters still using the bad process plus _one_ AI being able to clean it up for the carpenter, plumber, translator, developer (you name it, you got it).

And that's without even getting into laziness, corposlowness, etc.


This is a misunderstanding of how processes typically work, for a couple of reasons. For one, it assumes the "cost of implementing a better process for all carpenters is significantly higher". This may be the case for Tao, where he has limited control over the inputs, but probably not the case for the woodworking analogy, for a variety of reasons. You are essentially advocating for "rework" to fix problems, which is considered an unnecessary waste in process design.

"Corposlowness" is just another name for "bad processes". It supports the claim rather than negates it. Using AI to overcome bad bureaucracy makes it a workaround, not an idealized process. What often happens when implementing workarounds rather than good processes is that the workaround can create bloat and waste of its own and overtime, not really fix the problem. Like hiring more administrators for a large organization, they can take on a life of their own, eventually becoming divorced from the problem they were intended to solve.

Again, I'm not saying that AI is misapplied in Tao's case. I'm just cautioning that it's not a panacea for bad processes. In many ways, it can be misused as a band-aid for bad processes, just like creating excess inventory is a band-aid for bad quality control.


you have better explained my own point, haha


i still don't get it. i only understand libraries of congress and cars.


> incredible what mental hoops people will jump through to disqualify AI!

There are high peaks and troughs in AI buzz right now.

Yes, on the one hand you've the but-can-it-dance crowd.

On the other hand, Terence Tao on GPT-4. I mean, I'm not weird for expecting the story here to be either about GPT-4 helping Terence Tao with some difficult newfangled proof, or about Tao talking about the math behind large language models. Instead this boils down to

GPT4 even does the work of some of the smartest mathematicians in the world[1]

1: by parsing some web pages and PDFs for their meetings


> GPT4 even does the work of some of the smartest mathematicians in the world[1]

> 1: by parsing some web pages and PDFs for their meetings

Like that old joke about the guy who impressed people by claiming he had helped a brilliant mathematician solve a problem that had stumped him. And the punchline is something like "yeah, and it only took me a few minutes, all I had to do was replace his timing belt."


The insight, I think, is that if smart mathematicians can use it to do drudge work, they then have more time available to do smart maths.


I wonder why he doesn't have a secretary. Isn't he like one of the top mathematicians in the world? Why the heck are they having him do drudge work?


Profs don't get secretaries. There's usually one admin assistant for the entire dept. and they're busy with applications for admissions, scholarships, grants, etc.


Or grad students, for that matter. The notion that Tao should have to monkey around with Excel in person is just bonkers.


He'd be a seriously bad professor if he wasted the time of the people he was supposed to be teaching with copying badly formatted data for a talk.


mathematicians do not do math all day. there is probably a point of diminishing returns where doing more math is not useful.


The first time you copy and paste your disorganized data into ChatGPT and then into a spreadsheet, it's fun because you know how much longer it took before.

The hundredth time you do it you're going to be like "why is this so f'in annoying still."

Today's interface to language models is subpar for a lot of applications. Lots of room to improve that. A tool can be both amazing and still be just another step on the road to something truly seamless - just like how mailing/faxing paper forms around and filling out tables by pen and pencil now looks "tedious" next to doing it on a computer.
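For instance, one way past the copy-paste loop is to call the model programmatically. A minimal sketch, assuming the openai Python package and an API key in the environment; the model name, prompt, and file name are illustrative, not anything from the thread:

  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  def messy_text_to_csv(raw: str) -> str:
      # Ask the model to restructure the pasted mess into CSV in one shot.
      response = client.chat.completions.create(
          model="gpt-4",
          messages=[
              {"role": "system",
               "content": "Convert the user's text into CSV. Output only the CSV."},
              {"role": "user", "content": raw},
          ],
      )
      return response.choices[0].message.content

  # Hypothetical usage: feed it a file of scraped pages, get CSV back.
  with open("speaker_pages.txt") as f:
      print(messy_text_to_csv(f.read()))

Wrap that in a shell alias or editor macro and the hundredth time is no more annoying than the first.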


> the reality is that a lot of things have a terrible UI with unorganized data. that's why this tool is so amazing - because it doesn't matter anymore.

how naive. how do you know it's right? ah, you have to manually do the calculation anyway to confirm - which is what Tao ended up saying in a reply to someone asking as much.

AI is great, but it's not a silver bullet, since its correctness can never be 100% under the current LLM framework.


How naive to think regularly structured spreadsheets are the answer to this problem. There is a famous joke about "all your budgeting spreadsheets being useless when you discover you have a formula error in one cell that propagates throughout the whole spreadsheet".

You have to do those same confirmation calculations anyway when you use a spreadsheet. In my experience the utility of something like what ChatGPT can do is still unparalleled.


I'm convinced people like you are in for such a rude awakening.

I don't understand how you can hold this position with AI considering it's only the beginning.


Ensuring correctness by a human is existential to the work itself, not a mere annoyance. Yes, AI can do it, but the more work you give to the AI, the less control you will have over the quality of the work.

This AI future you're wanting is an Idiocracy-like world where nobody knows how anything works and everything is in decay.


> considering it's only the beginning.

Some of us are just burned out on the hype cycles and prefer not to count our chickens before they hatch.


I think we're all in for a rude awakening regarding what societal changes AI actually makes, but that's neither here nor there.

You're making the same mistake you're accusing them of making - assuming to know the future at the beginning. You're assuming that fixing these issues will prove to be trivial or at least inevitable. Sure, recent progress has been swift, but if you recall, it was damn-near stagnant for decades. Some were even claiming we were in the middle of an "AI Winter" and could not see the spring!

Based on all currently-available evidence, the current techniques that we utilize for generative AI are unreliable, in terms of accuracy of derived facts. It will require either a different or complementary approach to iron that out, or we're going to have to start seeing some _very interesting_ emergent properties from scaling higher. This stuff could show up tomorrow, or it might never show up at all! But the _current_ LLM framework does not look like it can do what we're looking for here, not reliably, certainly not 100% reliably.


ChatGPT is impressive, but I do not think it is a silver bullet, even for data cleansing and processing tasks. I would use your words and say that people who think ChatGPT solves the problem of data cleansing once and for all are the ones who will be in for a rude awakening.


I can't wait for 3D printers to take over manufacturing or self-driving cars to self-drive. Still waiting for the metaverse. Or crypto to take over banking. We are not even at those points for AI. You might see the future... let's get a 3D printer in every home first.


What's the use case of a 3D printer in every home? Why is that even relevant in comparison to how useful AI can be? The use cases are already there, today.


Ask those people who rode the hype wave 10 years ago. They had grand plans to make Star Trek replicators happen a few centuries before they were predicted.

I suspect the satire whooshed over your head.


Can’t wait for the internet, web, cell phones, or social networks to change society… AI is closer to these than the others.


AI only has to be as reliable as a human. The task is ultimately trivial and clearly within the capability of GPT-4. It is easy to statistically verify whether GPT-4's correctness is at least as high as a human's. Of course Tao did not do this here, but it will be done.
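A minimal sketch of the kind of verification being described - spot-check model output against human-verified answers and estimate an accuracy interval. The function names, sample size, and normal approximation are my choices for illustration:

  import math
  import random

  def accuracy_interval(model_answers, verified_answers, sample_size=100):
      """Estimate accuracy on a random sample, with a rough 95% interval."""
      idx = random.sample(range(len(verified_answers)), sample_size)
      correct = sum(model_answers[i] == verified_answers[i] for i in idx)
      p = correct / sample_size
      margin = 1.96 * math.sqrt(p * (1 - p) / sample_size)  # normal approx.
      return p, (max(0.0, p - margin), min(1.0, p + margin))

  # Hypothetical usage: run the same check on a human transcriber's output
  # and compare the two intervals.
  # acc, (lo, hi) = accuracy_interval(gpt4_extractions, hand_checked_values)

If the model's interval sits at or above the human baseline, "as reliable as a human" stops being a matter of opinion.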


Right, but at the very least we can strive for consistency and efficiency, if only to save cost and energy. ChatGPT may save time, but not energy or money, assuming your idea is to just totally rely on LLMs for data processing (which is why it "doesn't matter" for you anymore).

If somehow, magically, we had had a natural language interface to a computer's operations from the beginning of computing, we would still have arrived at particular standards/specs for data formats. There would still be something like XML/JSON/CSV. Indeed, I'd wager there would still, at a certain point, be some kind of high-level programming (or otherwise formal) language adopted, to answer the particular clumsiness of natural language itself [1].

Putting aside any issues of reliability, it's simply not sustainable (economically, environmentally) to put all our work into this stuff, even if it does shine with one-off tasks like this.

1. https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...


And to the extent that it does matter, it has already been shown that in many areas the best UI is simple human-readable text.


AI is the better UX! I've been using ChatGPT to do all sorts of things--proofreading, generating "diffs" on unstructured documents, getting text web-ready by substituting HTML-entities--that I already have tools for but use so infrequently that I waste time looking up exactly how I need to specify the request. Using ChatGPT, I can just use English. It's amazing. And frustrating, because now I want this built in to everything, and it's not ... yet.


And AI can be used to make improvements to existing UIs - some of which are just so damn stupid.

If you don't like this tool, don't use it! If you like it, use it!


GPT = a new UI: more conversational, with easier transformation of intent into commands.

I think a lot of people miss that because it's being shown in search integrations and companion apps.


> a lot of things have a terrible UI with unorganized data. that's why this tool is so amazing - because it doesn't matter anymore.

Do you imagine a future where machines output data that can barely be read by humans, and can only be managed through the help of AI? Honest question.


This is an advertisement for something like the semantic web, not AI or language models.


And let me know when the semantic web becomes successful after 2+ decades of trying. The semantic web was always pretty much doomed to failure because it imposed a large cost on the content creator, who themselves get little benefit out of the structure.


The bigger reason why it was a failure is not that it imposes a cost in terms of work on the content creator; it imposes a lack of control on the part of the party that considers themselves the 'owner' of the data. If every webpage that is right now plastered with ads and various junk to get you to subscribe/form a relationship/amplify the content were just the structured data, and a default way to render that data were present in every browser, then most content monetization strategies would fail. Follow the money, not the effort.


Nail on the head. But can you imagine what it would have been like if hakia had been a thing, instead of the SEO-spam, ad-infested Web that Google and Co. gave us?


That's a hard question. I really have no idea, but I would have loved a browser that takes in structured data and displays it in a way that I control any day over the junk that the web has become.


> becomes successful after 2+ decades trying

Rich pro-AI argument.


Fair point. Still, the semantic web is dead because we already solved the problem with a better solution, which is open APIs.

The idea that everything would work great if only all of our data was structured and easily parseable everywhere just leads me to ask "Do you not interact with humans on a regular basis?"


Actually, I know a guy who's been working on the semantic web for like two decades, and his project is finally gaining some traction right now precisely because he can now leverage AI to sort it out and turbocharge his application.


I'm very skeptical. It seems to go against the grain.


What do you mean? The "semantic web" has been around since XML was touted as the data silver bullet, promising structured and semantically organised data 25 years ago. Countless startups tried: XML databases, digital asset management, countless oh-so solutions... until today.


Not to mention the elimination of pointless work.

> I needed to gather various statistics on the speakers at the previous ICM

Why did this work need to be done? Are even the people who say they want this data actually going to use it for something productive? Is there something revelatory in this data?


He was the chairman of the ICM structure committee. Having stats on how previous ICMs were organized is obviously useful to him. It's not "people who want this data" who ordered him to gather it (not that anyone could order Tao to do anything); he wanted the data to use it himself. He had the choice between sifting manually through dozens of webpages or automating the task.



