I’ve been quizzing ChatGPT about its knowledge of the oil industry - it has no idea what it’s talking about.
For example, I ask it to simulate a discussion between a flow engineer and a geologist, and it doesn’t seem to understand they work together regularly.
Not like there’s a Stack Overflow for this sort of thing.
You need to prompt it with more specific knowledge; if you're generic in your prompts, it will be generic in its responses. Inject some field-specific terminology, be esoteric in your queries, etc.
Its latent representation is vast; I find you can get much higher-quality and more interesting responses if the prompts are tuned appropriately.
This is at best going to slightly improve the output. I asked about something hyper-specific and also extremely trivial: transit wayfinding queries specific to the NYC subway. All the semantics are services and stations. A service is nothing more than a list of stations it stops at, and a station is nothing more than a list of services which stop at it. E.g. if I tell it "I'm on the F, how do I get to Times Sq", it should cross-reference the service/station lists, find the commonality, and tell me "switch to the A at W 4th".
No context about anything in the outside world, or even about the physical nature and purpose of commuting, is needed to comprehend and answer these questions. There is no external context or sophisticated logic beyond just matching those sorts of lists to each other. It's a wholly closed system of highly standardized tokens.
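To make that concrete: the whole query is a list intersection. A minimal sketch, where the station lists are small hand-typed fragments for illustration rather than the complete routes:

```python
# Each service is just a list of stations; a transfer point is any
# station that appears on both lists. Fragments only, not full routes.
F_STOPS = ["179 St", "W 4 St", "Broadway-Lafayette St", "Delancey St"]
A_STOPS = ["125 St", "59 St-Columbus Circle", "42 St-Port Authority", "W 4 St"]

def common_stops(service_a: list[str], service_b: list[str]) -> list[str]:
    """Stations served by both lines, i.e. valid transfer points."""
    return [stop for stop in service_a if stop in service_b]

print(common_stops(F_STOPS, A_STOPS))  # -> ['W 4 St']
```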
I ask it what stops the F and A have in common. 125th St is on the list. I know that's wrong. I ask it to list all the services that go to 125th. The F is not on the list (correctly so). I point out the inconsistency between the two outputs. It says sorry and does nothing. I tell it to remake the list of stops the F and A have in common. It is now missing a few stations from before, and has added new incorrect results.
This is just a snippet. I went in circles with it for probably an hour, playing whack-a-mole with its inability to correctly recall more than 10 true details in a row. These were not obtuse, esoteric, or even logically complicated queries. Nor was it ambiguous stuff open to interpretation. Nor was it even something that you'd need a real meatspace body to comprehend, like the feeling of the sun on a summer day. This should have been a language model's bread and butter.
You're right, it seems particularly bad at navigation. I led with some general queries on the NYC system, asked it for the common stops for service lines F and A, and it also hallucinated. The wikis for the service lines have pretty complete representations here, so this data should be suitably represented in its training corpus.
I'd be interested in how a multimodal LLM such as PaLM-E (trained on maps, etc.) fares in these sorts of queries.
I'm not hopeful that "seeing the map" (multimodal training) will make the difference everyone is hoping for. The transit map is going to be the exact same information as the lists, but coded in the learned design language of colored lines and dots. The words-only version should work just as well or better because it shortcuts the implicit OCR problem of trying to make it learn off the map. Indeed transit maps other than NY are often abstracted and have nothing to do with the underlying geography. So abstract representations (such as lists of words) should be fit for purpose.
Here's another one that fails spectacularly: the digits 0-9 drawn as an ASCII 7-segment display. It gets it mostly correct, but it throws in a few erroneous non-numbers and repeated/disordered/forgotten digits. Asking it for ASCII drawings of simple objects can really go off the rails quickly.
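For reference, the correct target output is tiny and fully mechanical; a sketch using the usual three-row encoding:

```python
# Ground truth for the prompt: digits 0-9 as three-row ASCII
# seven-segment displays.
DIGITS = {
    "0": [" _ ", "| |", "|_|"],
    "1": ["   ", "  |", "  |"],
    "2": [" _ ", " _|", "|_ "],
    "3": [" _ ", " _|", " _|"],
    "4": ["   ", "|_|", "  |"],
    "5": [" _ ", "|_ ", " _|"],
    "6": [" _ ", "|_ ", "|_|"],
    "7": [" _ ", "  |", "  |"],
    "8": [" _ ", "|_|", "|_|"],
    "9": [" _ ", "|_|", " _|"],
}

def render(digits: str) -> str:
    """Render a string of digits as one three-row banner."""
    return "\n".join(
        "".join(DIGITS[d][row] for d in digits) for row in range(3)
    )

print(render("0123456789"))
```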
The failure mode is very consistent. When a prompt forces it to be specific and accurate on an unambiguous topic for 10 or more line items, it will virtually always hallucinate at least one or two, especially if the topic is too simple to hide behind a complex answer. Even if it's learned not to hallucinate 90% of the time, and even if that's good enough to pass at first glance, within a list of 10 things it only has a 35% chance of not hallucinating any of them.
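That last figure is just the per-item accuracy compounded, treating items as independent:

```python
# Chance that none of 10 items is hallucinated, assuming 90%
# per-item accuracy and independence between items:
print(0.9 ** 10)  # 0.3486784401, i.e. ~35%
```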
For what it's worth, it did very well on law questions. Try as I might, it refused to accept that there's a legal category known as "praiseworthy homicide". Though I suspect this has less to do with the underlying model and more to do with OpenAI paying special attention to profitable classes of queries.
I'm sorry to say, I think the problem may be more intrinsic to the current approach to AI. Frankly, LLMs work unreasonably well, beyond my expectations, but they're making up for the lack of a real first-principles theory of cognition with absurd amounts of parameters and training.
Here’s the thing though: when you keep having to “prompt it”, aren’t you really doing the work yourself, coaxing a bullshit answer good enough that you’ll believe it? It could still be useful, I'm not sure.
A minimal knowledge of the field would be helpful.
But I feel that if one is inquisitive enough, they'd know when exploring a topic to ask broad questions of the sort:
- what are the common types of problems encountered in X field
- what are the main roles and responsibilities of personnel in X field
- list some open questions in the field of X
After these open-ended questions one can dive into each of them with:
- what are some common solutions to X, compare and contrast X vs Y
The risk of seeing hallucinated info is always there. But as one example: I was exploring the types of space-time metrics and asking ChatGPT which tensors are most suited for which scenario (and also which coordinates are best for which scenario, such as moving from Schwarzschild to Eddington-Finkelstein coordinates to deal with the event horizon near black holes), and the individual wiki pages said generally the same things.
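For concreteness, this is the standard textbook coordinate change in question (units G = c = 1); the point is that the horizon blow-up in the first line is a coordinate artifact that disappears in the second:

```latex
% Schwarzschild metric in (t, r) coordinates; the dr^2 term blows up at r = 2M:
\[
  ds^2 = -\Bigl(1 - \tfrac{2M}{r}\Bigr)\,dt^2
         + \Bigl(1 - \tfrac{2M}{r}\Bigr)^{-1} dr^2
         + r^2\, d\Omega^2
\]
% Switch to the ingoing null coordinate v = t + r_*, with the tortoise
% coordinate r_* = r + 2M \ln\lvert r/2M - 1 \rvert; substituting gives the
% ingoing Eddington-Finkelstein form, perfectly regular at r = 2M:
\[
  ds^2 = -\Bigl(1 - \tfrac{2M}{r}\Bigr)\,dv^2 + 2\,dv\,dr + r^2\, d\Omega^2
\]
```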
I was able to ask ChatGPT in very naïve "idk please explain like im 5" ways and it was able to very patiently rephrase dense jargon into surprisingly understandable explanations.
I'm sure if an expert in GR were to drill down deeper into the details, they would find inaccuracies in ChatGPT's responses, but for the inquisitive information-sponge user, a dive through ChatGPT to get a general feel for a topic is insanely useful. After that, one could dive into the appropriate books/papers/primary sources to start the real learning once they find what they're interested in.
It's sort of like asking how an expert would respond if all you said was "simulate a discussion between a flow engineer and a geologist". Without providing any further context or hints about what you want, I wouldn't be surprised to get a generic response. But if you sort of know what you want to ask, they'd have more to work with and be able to better answer your query.
> Simulate a discussion between a flow engineer and a geologist in the planning phases of a drill operation. Use field-specific terminology and aim for accuracy, the scenario will be used to teach a petroleum engineering class focused on deformation monitoring.
> Flow Engineer: Good morning, I'm the flow engineer for this drill operation. Nice to meet you.
> Geologist: Hi, I'm the geologist in charge of the geological analysis for this project.
> Flow Engineer: Great, I understand we need to work together to ensure the success of this drill operation. Can you tell me more about the geological features of the area we're drilling in?
> Geologist: Sure. We're drilling in an area with significant deformation, caused by tectonic activity. There are a variety of rock formations, including sandstone, shale, and limestone. There's also a possibility of encountering fault zones, which could pose a risk to the drilling operation.
> Flow Engineer: Thanks for the information. We'll need to take into account the deformation and rock properties when designing the wellbore trajectory. Have you conducted any geomechanical analysis?
> Geologist: Yes, we've conducted a rock mechanics study to determine the stress and strain characteristics of the formations in the area. Based on the results, we recommend using a multilateral well design with a horizontal section to avoid encountering the fault zones and minimize the impact of deformation.
> Flow Engineer: That's helpful. We'll need to ensure that our drilling fluid properties are appropriate for the formations we'll be encountering. Can you tell me more about the porosity and permeability of the rock formations?
> Geologist: The porosity and permeability of the sandstone formations vary, but in general, they're relatively high, which means we'll need to be careful when selecting our drilling fluid to avoid fluid loss into the formation. The shale formations are less porous and permeable, which means we'll need to use a different type of fluid to avoid damaging the formation.
> Flow Engineer: Thanks for the information. We'll take that into account when selecting our drilling fluid. We'll also need to monitor deformation during the drilling process. Do you have any recommendations for deformation monitoring tools we should use?
> Geologist: Yes, we recommend using acoustic emission sensors to monitor rock fracturing and deformation, as well as tilt sensors to monitor any changes in the wellbore inclination. We also recommend conducting periodic borehole imaging to ensure that the wellbore is remaining stable.
> Flow Engineer: Excellent. We'll ensure that we incorporate those tools into our drilling plan. Thank you for your time and expertise. I'm looking forward to working with you on this project.
Oh yeah - and now I change this, and suddenly the thing is fun and slightly unreal:
"Simulate a discussion between a project manager and a public health officer in the planning phases of a drill operation. Use field-specific terminology and aim for accuracy, the scenario will be used to teach a petroleum engineering class focused on public relations."
GPT:
> Project Manager: Good morning, I'm glad to have you here to discuss the planning phases of our upcoming drill operation. As a project manager, my priority is to ensure that all aspects of the drill operation are conducted in a safe and efficient manner.
> Public Health Officer: Good morning. As a public health officer, my role is to ensure that the drill operation does not pose a risk to public health and the environment. I'll be happy to discuss our strategies for achieving that goal.
> Project Manager: Great. One of the first things we need to do is to identify potential health hazards associated with the drill operation. As you know, drilling involves the use of various chemicals, which can pose a health risk if not handled properly.
> Public Health Officer: Absolutely. We'll need to carefully review the chemicals being used and ensure that they are being handled in accordance with proper safety protocols. We'll also need to develop emergency response plans in case of accidental spills or leaks.
> Project Manager: Agreed. Another concern is air quality. The drilling process can generate dust and emissions, which can have adverse effects on air quality.
> Public Health Officer: That's right. We'll need to monitor air quality throughout the operation to ensure that it meets the standards set by the Environmental Protection Agency. We'll also need to take measures to minimize dust and emissions, such as using dust control equipment and properly maintaining all equipment.
> Project Manager: And what about water quality? We'll be drilling in close proximity to bodies of water, and we need to ensure that we don't contaminate them.
> Public Health Officer: Yes, that's a critical concern. We'll need to make sure that all wastewater and other fluids are properly contained and disposed of. We'll also need to monitor water quality regularly to ensure that there are no leaks or spills.
> Project Manager: It sounds like we have a lot of work to do to ensure that this drill operation is safe and environmentally responsible.
> Public Health Officer: Indeed. But by working together and following proper protocols and procedures, we can ensure that the operation is successful without posing a risk to public health or the environment.
I guess there was some confusion about the roles: at the default temperature the roles swap (if you rerun it, you get different results, with the two speakers essentially swapping lines...).
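If you want reruns to be more repeatable, you can pin the sampling temperature through the API rather than the chat UI; a minimal sketch assuming the openai Python client, with a placeholder model name and prompt:

```python
# A sketch only: the model name and prompt are placeholders, not what
# was actually run above. temperature=0 makes sampling near-greedy, so
# reruns diverge far less (though determinism still isn't guaranteed).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    temperature=0,
    messages=[{
        "role": "user",
        "content": "Simulate a discussion between a project manager and a "
                   "public health officer in the planning phases of a drill "
                   "operation.",
    }],
)
print(resp.choices[0].message.content)
```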
EDIT: and a third try actually arrives at some really nice hallucinations:
> Project Manager: Good morning, Dr. Smith. Thank you for taking the time to meet with me today. We are in the planning phases of a drill operation and we want to ensure that we are following all necessary public health protocols. Can you provide some guidance on what we should consider?
> Public Health Officer: Absolutely, it's my pleasure to help. The first thing you should consider is ensuring that all workers on-site are up-to-date with their immunizations, particularly for diseases such as tetanus and hepatitis.
> Project Manager: Understood. We require all our workers to have up-to-date immunizations before starting work, but we will double-check to ensure that all workers have the necessary immunizations.
> Public Health Officer: That's great to hear. In addition to immunizations, you should also consider providing training on proper hygiene practices, such as hand-washing and avoiding contact with contaminated materials.