Show HN: BBC “In Our Time”, categorised by Dewey Decimal, heavy lifting by GPT (genmon.github.io)
688 points by genmon on March 8, 2023 | 171 comments
I'm a big fan of the BBC podcast In Our Time -- and (like most people) I've been playing with the OpenAI APIs.

In Our Time has almost 1,000 episodes on everything from Cleopatra to the evolution of teeth to plasma physics, all still available, so it's my starting point to learn about most topics. But it's not well organised.

So here are the episodes sorted by library code. It's fun to explore.

Web scraping is usually pretty tedious, but I found that I could send the minimised HTML to GPT-3 and get (almost) perfect JSON back: the prompt includes the Typescript definition.

At the same time I asked for a Dewey classification... and it worked. So I replaced a few days of fiddly work with 3 cents per inference and an overnight data run.

My takeaway is that I'll be using LLMs as a function call way more in the future. This isn't "generative" AI, more "programmatic" AI perhaps?

So I'm interested in what temperature=0 LLM usage looks like (you want it to be pretty deterministic), at scale, and what a language that treats that as a first-class concept might look like.
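To make the "programmatic AI" idea concrete: a minimal sketch of what each extraction call looks like, assuming the openai Python library's Completion endpoint and text-davinci-003. The cut-down schema here is illustrative (the real prompt asks for more fields):

    import json
    import openai  # openai Python library's Completion interface (assumed)

    PROMPT = """Extract the episode data from the notes below.

    Return valid JSON conforming to this Typescript type:
    {"description": string, "dewey_decimal": {"code": string, "label": string}}

    Episode notes (Markdown):
    {notes}

    Valid JSON:"""

    def extract(notes: str) -> dict:
        # temperature=0 so the same notes give back (almost) the same JSON every run
        resp = openai.Completion.create(
            model="text-davinci-003",
            prompt=PROMPT.replace("{notes}", notes),
            temperature=0,
            max_tokens=512,
        )
        return json.loads(resp["choices"][0]["text"])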




Finally, my interest in LLMs is piqued!

Seems like everyone has been getting excited around the search or code-generation use cases ... or simply trying to make it say naughty things (boring, not interested, wake up in a few more years), but this is eye opening.

The idea of this as a "universal coupler" is fascinating, and I think I agree with the author that we are probably standing at an early-90s-web moment with LLMs as a function call (the technology is kinda-there and mostly-works, and people are trying out a lot of ideas ... some work, some don't).

My mind is racing. Thanks for the epiphany moment.


We are months away from being able to do this with images too.

All the pieces are there, and multi-modal (smallish) large language + image models are already being used in research labs; eg MS Kosmos-1[1]. Check out the visual IQ test results in the paper.

Kosmos-1 is only 1.6B parameters. When that or similar models scale out to 50B+ params they will be pretty amazing.

[1] https://arxiv.org/abs/2302.14045


Linked article / brief mentions interleaving of text and images as essential for certain kinds of learning; that particular intersection reminds me of the profoundly compelling multimodal book by Nick Sousanis, "Unflattening". Highest possible recommendation.

https://en.m.wikipedia.org/wiki/Unflattening


I had that book but lost it in LA. This is the page I remembered the most and have shown to the most people:

https://twitter.com/nsousanis/status/245176914900299776?lang...


That is the truly mind-blowing thing that's coming.

I am currently creating reference images for a game which I expect within 6-12 months I'll be able to feed into a multimodal ChatGPT to create 3D assets out of the 2D pics.

We'll be able to conjure up worlds at a whim - so start imagining them already!


Mark gets a lot of flak for the metaverse, but I can imagine world design where you start in a blank room and describe what you want around you, and it appears. Like the Matrix loading white room. And with voice recognition and eye tracking (and brain scans), how close are we to “you have to use your hands? It’s like a baby’s toy.”


This was always the coolest capability of Star Trek's holodecks, not the projector technology. Super cool that we might just get there within the lifetime of someone who watched the show in the 80s.

My favorite Star Trek episode has always been "Identity Crisis". Not because it's one of the good ones, it's pretty clunky. But it contains a fantastic 5-7 minute long montage featuring Geordi La Forge interacting with computers (by touch, voice and on the holodeck) to solve a murder mystery, analyzing and live-manipulating 3D "holo footage" to discover a vital clue. Whoever imagined that sequence is the hero of my childhood and perhaps the reason I became a software engineer, doing an oddball mix of HMI and systems engineering.

There's so much in that sequence. The free mixing of different input modes, the complementary collaboration between a human and an AI system, carrying state with you from room to room. Analyzing and generating. Following instructions and making suggestions. Powerful inference, precision of control.

We're getting close now!



Thought-to-space?


YES!!! That's exactly what's coming. I'm getting a head start by envisioning a rich world and creating pictorial references for it along with text descriptions. But it's a tiny head start: once the "holodeck" technology comes along, we'll all be able to create anything with just natural language and correcting anything the model got wrong. ITERATIVE CHISELING YO!!!

By the way, I am still using SD 1.5 inpainting.ckpt - thank you Runway for releasing it, it's perfect for my needs and abilities. I never even tried SD 2 and later ones - heard they worked completely differently and I'm too busy creating to be relearning.


Bit of a tangent but I also predict an explosive demand for parametric AI-first design and simulation software.

OpenSCAD works right now in the sense that the produced source is valid, but asking for even something well documented e.g. an AR-15 lower receiver does not impress.


My previous boss (we were in the civil construction industry) has been saying for years that we are on the cusp of an industry-shattering breakthrough with generative engineering.

We should be able to, by now, input terrain maps, account for local waterways & rainfall, and Autodesk's Brobdingnagian suite of tools should be able to spit out a whole new suburban package, engineered to local country/city engineering standards.

The main thing that's holding it back is the complexity of engineering standards, and its implementation in the software. But all that'll take is time & money.


I'm about to start work in this space. Some companies are getting close to full building automation, Augmenta seems to me to be closest / have the best approach. [0]

Although the complexity of building standards is a challenge, the speed and quality of various generative algorithms are still major limitations.

[0] https://www.augmenta.ai/


Thanks for linking augmenta. I was wondering if you know of other similar companies? I'm an experienced structural engineer who's just completed an MSc in Comp.Sci. I want to combine the two disciplines and so far I've only been looking for consultants who utilise parametric design (Rhino + Grasshopper).


No problem!

I don’t know of any other multidiscipline generative design companies, but for process plant design there is OptiPlant and PipeStream. There are a lot in the architecture space but these are mostly for high level conceptual design.

In my experience many structural engineers code and use simple parametric models. Finding a company using actual generative design algorithms is still rare but there are definitely engineering teams out there who value comp.sci skills.


I did not realize it could do that. I asked it to generate a cube with a hole through the center and it really struggled to succeed. Interesting concept though!


ChatGPT has a pretty good handle on G-code (the machine code that drives CNC machines and 3D printers). A machinist friend of mine and I got it to write G-code to create a shape from a block of material, adding holes and arrises and so on. To top it off, it wrote some decent poetry about 2 mates discussing G-code over a few beers.


Fusion 360 has a generative AI design extension already. Never used it, I’m too much a poor to use something like that.


There is a free hobby plan for Fusion 360 that I use for my personal projects.


I thought it doesn’t have the generative AI extension? When I looked at the plan comparison it was a part of the commercial.


This is where I see the Semantic Web, semantic interoperability, and ontology alignment spaces coming to life. Who needs to standardize data when you can stick an AI in the middle of two services to do the translation for you? And then suddenly the interoperability of the metaverse doesn't seem so insurmountable. I made a post last summer that speculated on this, but I didn't think this future was so near. Or even possible really.

https://en.wikipedia.org/wiki/Semantic_Web

https://en.wikipedia.org/wiki/Semantic_interoperability

https://en.wikipedia.org/wiki/Ontology_alignment

https://metaverse-research-590.web.app/posts/communication/


I remember having that epiphany in 2017, reading the original Google transformer press release. I remember thinking “they taught computers to understand! And someday soon I’ll be able to translate anything into anything.” And not like translating “languages” but translating contexts and concepts.

LLMs really shine when you try and combine two previously untouched things or map one bit of knowledge to a common pattern. E.g. “summarize Mary Shelley's Frankenstein as if it was an episode synopsis of Magic School Bus.” “Now do the same for Dracula.”

I remember my mind racing so much that day. With all the ideas of things it would unlock. And I still managed to under imagine just how many different types of knowledge, and different layers of knowledge and language, they would be able to encode into one compact dataset.


LLMs will be used to discover "not so obvious" connections in science. I think we will see the first scientific discoveries made by LLMs very soon (in fact, such work is probably already happening). Expect major breakthroughs in physics, math, cosmology, biology, etc. published in Nature and authored by LLMs.


Cranking that hype level to the next stage eh?


It’s awesome for formatting and structuring.

Copy and paste a bunch of styled JS components -> get back out a single CSS sheet

Paste in a markdown document -> get out the same thing in HTML

Fun stuff like that.


Ah yes, you now too can have the wonders of pandoc - now on sale! Retail pandoc price of 20MB, now selling for only 3TB with the exclusive offer of ChatGPT!


I don't want to be mean, but this seems like the famous Dropbox/rsync comment.

The value here is how easy it is, and the fact that a generalised model can take the place of a (well-engineered) specialised tool.


Some unlinked features...

If you put the Dewey division in the URL, the directory auto-opens. e.g. here are episodes about prehistoric life (my current jumping-off point)

https://genmon.github.io/braggoscope/directory#560

There's a visual map of episodes. After principal component analysis of the episode embedding vectors, the two most significant components are used as the x,y coordinates:

https://genmon.github.io/braggoscope/map.html

(it's not super useful tbh -- e.g. the Manhattan Project and the Cambrian Explosion have the same x,y... presumably because they are both about explosions?)

Many episodes have a reading list, and these are all linked to Google Books (so you can purchase/check out from a library), e.g. this episode page

https://genmon.github.io/braggoscope/2022/10/20/the-fishtetr...

There are ~4,600 books, and I have ~88% coverage on getting a Google Books page from the original data. Any ideas about what to do with this big list of academic-recommended books v welcome!


This is great. I'll certainly be thinking of "classification" uses for ChatGPT in the future.

Thinking out loud: Add the experts, not just the reading lists. They're a jumping-off point into academic-paper-space. What have they published? In what journals? Who have they collaborated with?


Here’s a powerful use - content moderation. Today we literally traumatize content moderators with the dregs of the human mind. ChatGPT is fairly good at classifying content on many dimensions, including the ones it actively screens for. Regardless of how you personally feel about content moderation, I would be happy to see humans not have to be actively involved in it and face the traumas they must live with, wherever the moderation happens. I’m sure it’ll get things wrong, but humans do too.


It's good at it because OpenAI outsources the bad stuff to Kenya https://time.com/6247678/openai-chatgpt-kenya-workers/

People still have to look at this horrendous stuff.


To bootstrap, yes. But once you’ve built the machine, the people aren’t necessary.


Automating your moderation policies is a cop-out. It isolates you even further from the peons whose lives you're controlling.

European humans have the right for any automated decision affecting them to be reviewed by a human. We do not want a world where a tiny handful of people, getting fewer by the day if it's Twitter, have untrammeled control over what billions of people can say, see or do. We can't let businesses do what they like, change their policies on a whim and have it instantaneously applied by automatons, if it's bringing misery to millions of people.

Remember those old-fashioned things called laws? Courts? Judges? Legislators? The things we used to use for deciding what people could/couldn't do, before tiny companies with billions of users applied their own de-facto laws with no oversight, governance or accountability. Move fast and break things!


1) You can have human-reviewed appeals and still have automated moderation.

2) A provider of a service is not a provider of a public commons, nor does confining the service to a specific type of discourse prevent anyone from doing anything in their life. You don’t have the right to go to a movie theatre and use it to practice your singing recital during the movie - the theatre owner can and will kick you out. This doesn’t trample on your rights - you can still practice singing, just not there. Likewise you can still spew racism and talk about child sexual abuse in your home all you want. But no one is obliged to provide you a platform for that.

3) I see nothing superior about a capricious and biased human judging my content vs a capricious and biased model judging my content. The human, however, will be traumatized by the “millions of people” made miserable by their inability to spew hate and share child porn on Facebook.

4) We still do depend on judges and laws. Today the judges and laws say you not only have the right but the obligation to create a relatively safe space. In your European utopia, speech and thought are even more limited by law than in the US, but in both locales forum providers are held to a standard. Right now in the US there’s a major case in front of the Supreme Court which will determine whether these companies can be held liable directly for failing to sufficiently moderate - if it passes, they’ll be even more obliged to “purify” their platforms of anything that anyone might feel compelled to sue them over. But even now all providers have to conform to the laws and judgements of the lands they operate in, and those lands say it’s not only OK to moderate, but mandatory. So your final point really, truly, makes no sense.


I'm sure I remember Zuckerberg talking about this very thing in one of the Congress interviews he had.

I don't disagree with you, the content producers are always going to outnumber the moderators by a massive margin. It makes reasonable moderation very difficult.


OpenAI have a content moderation endpoint for this too.


Excellent! I'd love to see a script that sends a list of "descriptions" (1-100 words) to ChatGPT and directly gives you back a ready-made (embedding vectors closeness) map in a (textual) graph/chart format (like your above map or your plot https://interconnected.org/more/2023/02/in_our_time-PCA-plot...)


It turns out that "closeness" is usually hard to visualise/explore when you're dealing with a 1,000-dimensional space... and PCA has the failures mentioned above.

It's weird -- it's locally useful to navigate, and at a high level kinda useful, but only if you squint and don't look at the problems. So I feel like a fisheye visualisation would be appropriate? That's something that I'm exploring in other projects.


I wouldn't necessarily reach for PCA. No reason to think that the first two principal components necessarily encode anything particularly interesting. If you want to lay out each point in 2D in a way which keeps similar points nearby, something like t-SNE is worth a try - visualizing embeddings is what it was invented for.
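As a rough sketch with scikit-learn, assuming the embeddings are already sitting in a NumPy array with one row per episode:

    import numpy as np
    from sklearn.manifold import TSNE

    # "episode_embeddings.npy" is a placeholder -- however you've stored the vectors
    embeddings = np.load("episode_embeddings.npy")   # shape (n_episodes, n_dims)

    coords = TSNE(
        n_components=2,
        perplexity=30,     # worth tuning for ~1,000 points
        init="pca",
        random_state=0,
    ).fit_transform(embeddings)

    # coords[:, 0] and coords[:, 1] are the x, y positions for the map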


Excellent, new to me and I'll give it a go, thanks!

I gravitate to PCA for terrible reasons (undergrad so it's what I think of first) and like you say, it's beguiling yet disappointing, the components rarely have any human meaning.


I'd suggest trying t-SNE [1] instead; you'll be losing almost all of the variance by projecting onto the first two eigenvectors produced by PCA.

[1] http://karpathy.github.io/2014/07/02/visualizing-top-tweeps-...


TSNE or UMAP (as others have mentioned) is good but also take a look at the tensorflow projector. You can host your own and/or pass custom data to it.


The word game 'semantle' (https://semantle.com/) is a nice way to get an intuitive grasp of how unintuitive closeness in a high-dimensional space is, as you're required to guess a word based on its semantic similarity (according to the classic word2vec metric) to your previous guesses.



Love the plot. For high-dimensional embeddings, try UMAP for a better result, and maybe t-SNE too if you're interested in clusters.


Love the visual map. What does color mean? Any way to do a 3rd PC, and put the visualization in a cube one can toy around with?


Colour is the 3rd component -- I wanted to see the difference between overlapping episodes.

As for the 3D plot... here you go!

https://interconnected.org/more/2023/03/in_our_time-PCA-3D-p...

Basic PCA + Plotly is actually in OpenAI's official Python library (in `embedding_utils`) -- this plot is just the output from that.


:D this made my day.


Your PCA is awesome. The horizontal axis seems to go from people (high-level nature?) on the left to physics (low-level nature?) on the right, while the vertical axis seems to go from individuals and particles (small things?) at the bottom, up to civilizations and the universe (big things?) at the top. Maybe the Manhattan Project and Cambrian Period (not explosion) are together because they're both big things in their fields, and physics and evolution are near each other horizontally?


I love this project!!

Ever since my partner and I discovered In Our Time a few years back, it’s been our go-to podcast to listen to together. Part of the allure is that the archive is so vast, but that makes it hard to browse.

My partner made her own archive of In Our Time here, if you’re interested: https://shelby.cool/melvyn/

She used Wikipedia to find and categorize each episode. I also really like that she indexed episodes by guest, too. Certain guests are REALLY good and have been on many episodes.

Super excited to see someone else make an archive; we’ll definitely be exploring yours!


That's really impressive.

I've had a mild idea for ages for a sort of annotated In Our Time. Listening to the podcast on a webpage, as the text rolls by, links could appear to explain or give background to the item or person being discussed.

SMIL is the multimedia markup language for that sort of thing. Generally, if one thinks of something, there's someone on the Internet who has already had that idea.

Additional: I think the BBC is very careful about transcriptions. They've sold a book of the transcripts of several episodes, but it would be a great way to go through a subject.


> She used Wikipedia to find and categorize each episode.

This is a really clever use of an existing dataset. I clicked through before reading this and was stunned by how thorough the tag set was. Even more obscure things like "Alumni of Magdalen College, Oxford" have multiple episodes. I'm going to keep this in mind on future projects for sure.


No way! This is incredible. That h1 "Hello," <3

Is the tagging manual? It's really good.


It's definitely not manual, here are the scripts she used to generate the archive: https://github.com/shelbywilson/melvyn/tree/main/scripts


Nice project OP. I also love In Our Time.

Some favourite episodes off the top of my head:

* Wilfred Owen - https://www.bbc.co.uk/programmes/m001df48

* The Evolution of Crocodiles - https://www.bbc.co.uk/programmes/m000zmhf

* The May Fourth Movement - https://www.bbc.co.uk/programmes/m001282c

* The Valladolid Debate - https://www.bbc.co.uk/programmes/m000fgmw

* Gerard Manley Hopkins - https://www.bbc.co.uk/programmes/m0003clk

* Henrik Ibsen - https://www.bbc.co.uk/programmes/b0b42q58

* Wuthering Heights - https://www.bbc.co.uk/programmes/b095ptt5

And finally, in which three mathematicians heroically attempt to explain asymptotic analysis to (septuagenarian novelist and cultural broadcaster) Melvyn:

* P v NP - https://www.bbc.co.uk/programmes/b06mtms8


Great list! Some personal faves in return, in no particular order:

- The evolution of teeth https://www.bbc.co.uk/programmes/m0003zbg

- The fish-tetrapod transition https://www.bbc.co.uk/programmes/m001d56q

- The late Devonian extinction https://www.bbc.co.uk/programmes/m000sz7x

- The American West https://www.bbc.co.uk/programmes/p00548gg

- Metamorphosis (Ovid) https://www.bbc.co.uk/programmes/p00546p6

- Politeness https://www.bbc.co.uk/programmes/p004y29m

- The Bronze Age collapse https://www.bbc.co.uk/programmes/b07fl5bh

- Doggerland https://en.wikipedia.org/wiki/Doggerland


Also going to chime in to recommend the Gin Craze episode https://www.bbc.co.uk/programmes/b084zk6z


One thing I am slightly wary about, after listening to quite a few episodes, is that Melvyn Bragg seems to be a bit too patriotic oftentimes. It's nothing really obvious, but you hear a lot of "we" when talking about positive aspects of English history, and a lot of focus on English innovations compared to other countries. Maybe I'm wrong though.


Seems pretty reasonable for a national broadcaster, no? Also British encompasses more than just the English, unless you specifically mean he’s excluding the other Home Nations…


Sorry, I didn't mean English more than British (the distinction being a bit too subtle for me to make). It's "par for the course" I guess for a national broadcaster, but:

* That doesn't mean it's not less than ideal.

* It's something one might want to bear in mind, when listening, since it might introduce some kind of a bias to the whole thing (maybe the other comment about number of episodes about european vs african history is a good argument in that direction)


Wrt bias it's actually a refreshing change for some of us to hear the voice of a traditional (old-school) leftist, as they tend to be more interested in actual social problems rather than the made-up ones that infest modern discourse, and - perhaps - shift the centre so much that plain common-sense is now perceived as "bias".


I am also an english patriot so that doesn't bother me.


He's a Labour peer in the House of Lords. They're not known as a group for rampant patriotism.

My criticism of him is how much he struggles with anything of a maths or scientific nature.


Good to know, it might be me putting such thought in his mouth then!


I really enjoyed the recent Superconductivity episode. Especially hearing Melvyn say 'Good god' half way through.


I like the way they keep the very end of where the producer comes into the studio and asks the guests politely if they want coffee or tea. Something very satisfyingly British about that.


I clicked on the May Fourth movement because I thought it was odd that BBC would make a documentary about the origins of https://en.wikipedia.org/wiki/Star_Wars_Day.


I wanted to see which speakers had been on the most episodes.

I went to https://genmon.github.io/braggoscope/guests and opened the Firefox DevTools console and ran this:

    guests = Array.from(document.querySelectorAll('ul a.text-blue-500.underline')).map(el => ({
      name: el.innerText, count: parseInt(el.nextElementSibling.textContent.slice(1, -1), 10)
    }))
Then this:

    guests.sort((a, b) => b.count - a.count)  // sort by appearance count, descending
    console.log(JSON.stringify(guests, null, 2)) 
The top few were:

    [
      {
        "name": "Simon Schaffer",
        "count": 24
      },
      {
        "name": "Angie Hobbs",
        "count": 23
      },
      {
        "name": "Martin Palmer",
        "count": 22
      },
      {
        "name": "Steve Jones",
        "count": 21
      },
      {
        "name": "Paul Cartledge",
        "count": 20
      }


I think Angie Hobbs will take it! Looks like she is referred to with a few different names in the episode notes and I’ll need to merge them manually (will be 25 appearances)

(btw thanks for datasette. I use SQLite as an intermediary db and datasette was invaluable for exploring and refining queries.)


Any chance you might share the intermediary DB? Even just including the binary database file on GitHub somewhere would be neat, since I could play around it by doing https://lite.datasette.io/?url=URL-to-your-db-file-on-GitHub


Lots of great contributors to the show over the years. My favourite is Steve Jones for his huge intellect, humility, knowledge and listenability - yeah, it's the right word for someone like him. I've listened to a lot of episodes (some many times over). I'll just mention The Migration of Birds as a sound listen.

However, Melvyn is an "interesting" catalyst but still great to fall asleep to. Long may it continue.

I believe there is a sister program with a younger female presenter. Name escapes me...


In my memory, Frank Close has been on just about every one, but I think that's because I mostly listen to the ones about physics and astronomy. https://en.wikipedia.org/wiki/Frank_Close


Ah, now that's interesting, having attended Schaffer's lectures at uni many years ago, I wasn't sure if it was my imagination that I seemed to hear his mellifluent drone coming out of the wireless more and more often these days... at last a stat to back up my hunch!


> Web scraping is usually pretty tedious, but I found that I could send the minimised HTML to GPT-3 and get (almost) perfect JSON back: the prompt includes the Typescript definition.

Could you share the prompt? Or, if OP can't share, does anyone have ideas for a prompt to do something like this?


+1, and the OP mentions a wrapper to handle invalid JSON from GPT-3. I’d be interested in that too

OP’s other write up:

https://interconnected.org/home/2023/02/07/braggoscope



The prompt is probably simple, but the bigger challenge is that even the minified HTML of a typical web page would be more than the 4k GPT token limit.


I extract the main content div, which includes various other divs and assorted HTML cruft from 25 years of content management systems.

Then convert that to Markdown, which GPT groks happily, and it preserves the right balance of discarding meaningless structure but preserving some semantics (italics, headings, etc).

The best tool I've found for that process is aaronsw's html2text, amazing that it's still so valuable after all these years.
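As a sketch, the conversion step is only a few lines (the exact options are a matter of taste rather than what I literally ran):

    import html2text

    h = html2text.HTML2Text()
    h.ignore_links = True    # links add tokens without adding much meaning here
    h.ignore_images = True
    h.body_width = 0         # don't hard-wrap lines

    # content_div_html is the extracted main-content div, as a string
    markdown = h.handle(content_div_html)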


Thanks for explaining — very helpful!


I see several very dubious classifications. Shouldn’t the Great Stink be under civil engineering rather than agriculture, and why is Plato’s Atlantis under computer science?


Now that's an interesting regression! I don't remember seeing it there before.

(The worst I've noticed before has been Lawrence of Arabia under History of the Ancient World. Very much 20th century really.)

Several other classifications are arguable -- which I think shows one of the limitations of this technique: it's not possible to iterate + improve.

So instead I've been wondering about using the embeddings of each episode synopsis, and comparing to the embeddings of Dewey subdivisions. I should be able to tune the results better that way.
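Something like this, as a sketch -- assuming text-embedding-ada-002 via the openai Python library's Embedding endpoint, and a hand-made dewey_divisions dict mapping codes to labels:

    import numpy as np
    import openai  # openai Python library's Embedding interface (assumed)

    def embed(text: str) -> np.ndarray:
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
        return np.array(resp["data"][0]["embedding"])

    # dewey_divisions is hypothetical, e.g. {"560": "Fossils and prehistoric life", "940": "History of Europe"}
    division_vecs = {code: embed(label) for code, label in dewey_divisions.items()}

    def classify(synopsis: str) -> str:
        v = embed(synopsis)
        # cosine similarity against every division label; highest wins
        sims = {
            code: float(np.dot(v, d) / (np.linalg.norm(v) * np.linalg.norm(d)))
            for code, d in division_vecs.items()
        }
        return max(sims, key=sims.get)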

There's also a technique from Google called CAVs (Concept Activation Vectors) that I'm intrigued about trying -- would love to hear if anybody has experience using this

https://arxiv.org/abs/1711.11279


FYI you can get dewey decimals programmatically by hacking the OCLC API: http://classify.oclc.org/classify2/api_docs/classify.html

this says you need an API key but I think I found some way to call this without one… You might be able to improve classification by either incorporating the dewey decimals of the books mentioned on a podcast or fine-tuning a model based on known book titles (or maybe there are book summaries somewhere) to known dewey decimals from OCLC.


There's also an episode about aliens under 000.


> At the same time I asked for a Dewey classification... and it worked.

May I ask - how do you know exactly?

Specifically - with what level of accuracy? And how is it assessed?

At first glance it does seem to be highly accurate - except where it isn't:

    650 Management and public relations (1)
    - Caxton and the Printing Press 18 Oct, 2012
Which is not to knock what you're doing - because overall the performance does seem to be quite impressive (especially considering the effort invested).

Still - there's that "last mile".

So it would be nice to know how the performance compares to that of other topic modeling approaches, and to what extent LLMs actually provide a boost.


I love In Our Time, a real BBC gem.

I've been meaning to pull all the audio for a while and this has inspired me.

A fun thing to do would be to pass through Whisper, a great corpus to play with.


I've been considering this, but my assumption is that it would be tripped up by the specialist words.

I wonder... is there a way to "prime" Whisper (e.g. with the embedding of the episode synopsis) so that it "listens out" for words related to a particular topic? I haven't looked, but this would be neat!


In my experience, Whisper does a great job even with specialized terminology. It won't catch everything, but I think it will exceed your expectations. One of the hardest things about Whisper is choosing which model to use; they offer a variety of sizes, and sometimes the smaller ones do better than the larger ones. It's worth trying a few different models and deciding what is best for each particular application.

I will also say that I've personally been unimpressed with the new "large-v2" model, even though it supposedly scores better. The original "large-v1" model seems to work better than the "large-v2" model in the audio clips I've been testing Whisper against, but results will vary. In general, I find I'm really happy with what "small.en" and "medium.en" will emit, and they're much faster than the large models. (The ".en" models are specialized to English, and usually perform better for strictly English input, whereas the non-".en" models are trained on multiple languages.)


I'd still like the ability to prime Whisper. I used it to transcribe a podcast episode I appeared on recently and one of the fixes I had to make was that ChatGPT came out as "chat GPT" every time it was mentioned: https://simonwillison.net/2023/Mar/7/kqed-forum/#kqed-forum

Update: turns out this exists already: https://platform.openai.com/docs/guides/speech-to-text/promp...


I recorded myself saying a few sentences from that transcript, then fed it through different Whisper models. "small.en" and "large-v1" both generated "chat GPT", "large-v2" generated "chat-gpt", but somehow "medium.en" correctly generated "ChatGPT".

This was the same audio sample fed through each of those four models, with no "prompting" as you're discussing.

If I add "--initial_prompt ChatGPT", then all four models are able to get the spelling correct.

Regardless, I don't think "chat GPT" versus "ChatGPT" is a huge deal. There will always be some level of uncertainty and ambiguity in the transcript, and even books written by humans always have a few typos get past multiple stages of copy editing. Perfection is virtually unachievable, but you can always scroll through the transcript and make some edits after the fact, if desired. Maybe some future model will magically eliminate all typos.


Yeah it wasn't a big problem for me - I had to do a bunch of other tidy-ups on the transcript anyway to add things like the name of the person who was speaking.

I cleaned that bit up with a bulk replace of "chat GPT" with "ChatGPT" in VS Code.


I haven't tried it, but the OpenAI docs mention priming being available on the Whisper model:

    prompt
    string
    Optional

    An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.
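
So presumably something along these lines would work (untested sketch using the openai Python library's Audio endpoint; the prompt string is just illustrative vocabulary):

    import openai  # openai Python library's Audio interface (assumed)

    with open("episode.mp3", "rb") as audio_file:
        transcript = openai.Audio.transcribe(
            "whisper-1",
            audio_file,
            # prime the model with names and terms you expect to hear
            prompt="In Our Time, Melvyn Bragg, Cambrian explosion, trilobites",
        )
    print(transcript["text"])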


Oh that's really neat! I hadn't seen that: https://platform.openai.com/docs/guides/speech-to-text/promp...


I hadn't noticed that, very much appreciated.


Whisper worked great on an old speech by Winston Churchill. Haven’t looked into the time limit.

API used and results can be found here: https://techtldr.com/transcribing-speech-to-text-with-python...


In Our Time has been a life-long companion, one of the things that makes me proud of the BBC.


Have you read the New Yorker articles about it?

On _In Our Time_ -> https://www.newyorker.com/culture/podcast-dept/escape-the-ne...

Profile of Melvyn Bragg -> https://www.newyorker.com/culture/the-new-yorker-interview/t...

Both well worth your time.


That New Yorker piece summarises it nicely. I can’t stand the over-produced stuff on NPR, for example. In Our Time has no politics, no hook to get you listening for the next episode. It’s just a lovely small window into academia. Above all, it makes things simple without dumbing them down.


"It’s just four intelligent people in a studio, discussing complex topics that are, as a friend of mine once said of Bragg’s openers, aggressively uncommercial."

Nicely put by the New Yorker.


I emailed you but did something similar and posted to Show HN a while back -

https://weekend-collection.s3.amazonaws.com/Catalog+-+Feb+17...


Just to make sure that others see this, I found your clustering technique super interesting and very effective, and something I want to try myself. Your technical writeup is here:

https://weekendcollection.substack.com/p/technical-details

(Recursive coarse clustering as opposed to one-shot fine clustering.)


thanks!


This is a really interesting use-case

Applying "transformations" or classifying data in this way without having to set up a lot of detail-work seems like a real labor-saver/multiplier.


My company has gone years wanting our product catalog to have structured data around our products, but not going through the tedium of extracting it all. About an hour of prompt tweaking and it can pull, normalize, summarize and output valid JSON from all of our products, basically pulling it out of a big unstructured HTML blob.


I've started thinking about LLMs as a "universal coupling", if that makes sense? It's wild to be able to conceive of APIs to plain text, and natural language queries on structured APIs, but that's what we've got.

My mind was really opened by Nat Friedman's work in GPT for browser automation: https://github.com/nat/natbot

And of course using langchain/ReACT.

So different from ChatGPT and (imo) way more intriguing.

mentioned in this blog post: https://interconnected.org/home/2023/02/07/braggoscope


This is so cool! Thanks for sharing.

Taxonomies are inherently limited -- I love this portion of the talk "Everything is Miscellaneous" about Melvil Dewey: https://www.youtube.com/watch?v=x3wOhXsjPYM&t=1206s

Surely some things fit equally well in more than one category.

Have you considered asking for best two or three categories, and placing the episodes in multiple locations? Or would that be too noisy?


I found that Dewey was borderline acceptable — more categories seemed to degrade it, though admittedly I didn’t spend much time prompt tuning. I also tried tags and these weren’t reliable (not at a consistent “scale” from episode to episode).

I suspect I’ll need a more mechanical approach, long term.


Yeah, I've found that ChatGPT is excellent at filling out XML templates too:

    Fill out the following XML template to describe a room in a fantasy RPG dungeon:
    
    <room width="" length=""><furniture><object name="" description=""/></furniture><npcs><npc name="" description=""/></npcs><items><item name="" description=""/></items></room>
This yields, for example:

    <room width="10" length="15">
      <furniture>
        <object name="torch" description="A wooden torch mounted on the wall, casting flickering light around the room."/>
        <object name="table" description="A large wooden table in the center of the room, covered with maps, scrolls, and dusty tomes."/>
      </furniture>
      <npcs>
        <npc name="Goblin" description="A small, green-skinned creature with a wicked grin and a rusty dagger in hand."/>
        <npc name="Orc" description="A towering figure with rippling muscles, wielding a massive axe and snarling menacingly."/>
      </npcs>
      <items>
        <item name="Potion of Healing" description="A small vial filled with a glowing green liquid, said to restore vitality when consumed."/>
        <item name="Sword of Flame" description="A gleaming sword with a blade wreathed in flames, capable of burning foes to a crisp."/>
      </items>
    </room>
So you get the best of both worlds, really clever procedural generation of content but also a tight structure - I've yet to see a syntax error out of it.


Wait. Is temperature=0 "pretty deterministic", or is it deterministic?


In the strict definition, it's deterministic: you get the same response for the same prompt, each time (given the exact same model).

But the prompt is parameterised. The bulk of the prompt is requesting a list of guests and speakers to be extracted, and the episode synopsis is appended as the "parameter". And I've noticed that the variation of the parameter changes what the overall prompt returns... so it might start being less reliable at responding with valid JSON, for example.

So it's instance-deterministic but, across a range of parameters, class-fuzzy, if that makes sense?


Temperature=0 is not perfectly deterministic for the OpenAI API.

It's mentioned briefly in the OpenAI text completion guide: https://platform.openai.com/docs/guides/completion/introduct...

If you have two possible tokens with probability 40% and 30%, you'll always get the 40% token at T=0. But if you have two possible tokens at 40% and 39.99%, you may get the 39.99% token on occasion, even if at T=0. (Numbers illustrative.)


Where does the inherent randomness come from?


It's injected into the model, you can deviate from no randomness to get "creative".


Isn't the temperature the "injected" randomness? The way I imagine it, somewhere in the model you would do

input += (Math.random() - 0.5) * Coefficient * temperature

so setting temperature to 0 would mean no randomness. On thinking further about why there is inherent randomness I believe it is from a lack of associativity in floating point operations. They obviously do A LOT of parallel floating point operations.
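
A tiny illustration of that lack of associativity with plain Python floats:

    a, b, c = 0.1, 0.2, 0.3
    print((a + b) + c)                  # 0.6000000000000001
    print(a + (b + c))                  # 0.6
    print((a + b) + c == a + (b + c))   # False

If parallel reductions on the GPU sum in a different order from run to run, the last bits of the logits can differ, which is enough to flip a near-tie between tokens.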


On a related note, has anyone noticed the growing use of "pretty" as a hedge against ever being wrong? Also, "not", as in "not the fastest" or "not the hottest".

It could be 46C/115F outside and you say "wow, today is unbearably hot", to which someone retorts "nah, it's fine. It's not the hottest day". That's pretty good hedging. You can make infinite technically correct statements this way without ever saying anything meaningful


> Also, "not", as in "not the fastest" or "not the hottest".

Mid twenties Hiberno-English speaker here and that's always been a fairly common form of hedging, and I've not noticed an increase.


fairly, lol


It’s my understanding that 0 is completely deterministic, except for when the model is updated, which does happen occasionally.


@haolez I just tested w/ davinci-003 w/ temperature to 0

Prompt: https://imgur.com/YtQ4fbf --

Reveal the question marks in an interesting way:

The dog goes ????????????? --

With temperature == 0, it consistently ("pretty deterministically"?) generated "woof!"

ex. The dog goes woof!


Their docs say (somewhat recently updated, I think) that even 0 is not perfectly deterministic in all cases (though it's very close). Some people had previously observed this and speculated that it was some kind of floating point roundoff issue, when two outputs have almost identical scores.


GPUs often do floating point computations in nondeterministic order for performance reasons, so you would get small differences even for exactly the same input. It is probably that.


Well technically everything is deterministic.


Hey Matt! Great to see this on HN, we’ve been chatting away about this on a private slack group! Super cool work as ever. Stay awesome.


But isn't classification a simple problem, solved a long time ago with simple ML techniques such as SVMs and neural networks? I remember having done a project for an ML class at uni where I used an SVM. It took movies and classified them into genres. Training data was IMDB comments.


How long did it take you? This tech turns days of work into hours or minutes of work.


A week or something.


It is an interesting approach and I appreciate it. However, an alphabetical list would be more useful for me when I am interested in topics and might not know where they are classified.


I always assumed the podcast was current with the show but it seems like it's about 4 weeks behind.

e.g. I have Tycho Brahe on March 2 in my podcast app, rss shows it released Feb 2.


The BBC is pushing its Sounds app by lagging the podcast. Disappointing but understandable I guess given they want the opportunity to suggest other programmes.


That's recent. Everybody wants a walled garden.


Some episodes that have subtitles are titled incorrectly (only using the subtitle). For example, this episode [1] is the first of a pair of episodes called The Written World, but it's only shown as Episode 1.

[1]: https://genmon.github.io/braggoscope/2012/01/02/episode-1.ht...


Yesterday I had quite a lot of success giving it some JSON and asking for a Python script to make an Anki deck from it. The programmatic use case is very magical.


I'm not sure this one about sewage is correctly classified. It's in agriculture, presumably because it mentions fertilizing fields. It probably fits better under engineering. https://genmon.github.io/braggoscope/2022/12/29/the-great-st...


Yea, LLMs for data extraction will be huuuuuge. I had so many data projects in the past which boiled down to: damn, the data is too unstructured.


> Web scraping is usually pretty tedious, but I found that I could send the minimised HTML to GPT-3 and get (almost) perfect JSON back: the prompt includes the Typescript definition.

This is the part that sounds amazing to me. I haven't done web scraping since forever, but it was tag matching hell even on a consistent data source.

And that it hands you back JSON is just unfair.


This is amazing, and exactly the kind of thing I’ve been hoping for to help us find the gems in the endless stream of excellent content.

What would it take to go deeper on this and narrow down to, say, single-sentence intervals? For example finding everything about a particular character in the Ramayana, or every statement about NPR itself?


The BBC offers pretty exhaustive RSS feeds (https://podcasts.files.bbci.co.uk/b006qykl.rss) so I'm not exactly sure what ChatGPT even did here (Maybe the Dewey classification? Which is of dubious usefulness).


There's more on the About page but in summary: extracts the synopsis, guests (name, affiliation), and reading list (title, author, publisher, year) as structured data. Not massively hard with a bit of web scraping, but tedious and results in brittle code -- this took 20 minutes to write the prompt plus 3 cents per episode.

https://genmon.github.io/braggoscope/about

(GPT-3 not ChatGPT for the model.)


Have you considered making the repo public so other folks (well, certainly me!) could extend and adapt the idea to other use cases? It's just very exciting work and it would be fun to see the code.


For comparison, two links for the (currently) latest episode: the first from the BBC, the second from genmon.

https://www.bbc.co.uk/programmes/m001jkzg

https://genmon.github.io/braggoscope/2023/03/02/megaliths.ht...


Also a big fan of the show -- cool project!


Here come the real AI use cases! Nice work!

AI generated content is the initial "ooh, money" wave that will be the future's basic spam to deal with.

Difference between AI and web3 is that web3 didn't have a real use case to follow the gold rush phase :x


So ChatGPT was used in prepping the data but isn’t used dynamically, correct?

You may be able to create a bot that chats with you and suggests episodes, etc. in real time. Or lets you ask questions TO an episode. And other things.


I wonder what the standard approach is when the LLM does not return valid JSON data? Do you skip the input data altogether, or use the parsing error to generate valid JSON?


The general rule is: be minimally liberal in what I receive :)

A parse error kicks off a recovery process where, in theory, we could run any number of rules. In practice the only problems are unescaped quotes in strings, or mismatched quotes (start with `"` and terminate with `'`)
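
Roughly this shape, as a sketch rather than the real code:

    import json
    import re

    # a string value that opens with " but closes with ' -- one of the failure
    # modes seen in practice; swap the stray ' for " (crude, real rules are fussier)
    MISMATCHED_QUOTE = re.compile(r'''"([^"\n]*)'(\s*[,}\]])''')

    def parse_with_recovery(text: str) -> dict:
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            fixed = MISMATCHED_QUOTE.sub(r'"\1"\2', text)
            # unescaped quotes inside strings would need a second rule
            return json.loads(fixed)  # let it raise if recovery didn't help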


I think it’s crazy to think that we scraped all this data from the web to train this crazy LLM to be able to interpret the web and scrape more data from it.


What a beautiful thing.


90 episodes on the History of Europe and 3 on the History of Africa.

Blimey


To be fair, the podcast is produced by Europeans. Still, a British history podcast should spend more time reckoning with its colonial history, and they have done at least one very, very good episode on it.


The Dewey system is famously stuck in its time and culture. Look at religion: there are multiple top-level categories for Christian-religion-related things, then there's "Other religions" for everything else. At this point, it's embarrassing and insulting.


Well spotted!! My degree was in African Studies and I remember one of the first lectures talked about a famous Oxford historian (Hugh Trevor Roper?) who said something like - there is no history in Africa, only the wild gyrations of savage tribes..... Made me very sad.


There's a lot of Egypt episodes missing from that category, for example.


This is very cool.

Would love to try it on The Rest is History podcast.


This is pretty cool. If you like In Our Time, I would also suggest the BBC World Service equivalent, The Forum. https://www.bbc.co.uk/sounds/brand/p004kln9

It's also a shame the BBC delays the In Our Time podcast by a month unless you use their shite BBC Sounds app (which I think only exists to make the Spotify app look good in comparison).


Tea under Economics and coffee under Technology > Home and family management... some... interesting decisions in here.


Should be arranged in chronological order. That way you can listen to the entire history of the world.


I love it!

"In Our Time" is the first podcast I listened to, way back in 2004, and I still listen to every episode.


No longer does Tony the Pony need to parse XHTML with regex. Who knew GPTs would be the actual solution?


Something like this for Essential Mix episodes with tracklists and genre would be mindblowing


In Our Time is one of the best podcasts out there. Are there any similar ones?


Similar to In Our Time is another BBC radio programme called The Forum (from BBC World Service) which explores world history, culture and ideas. Every week an eclectic topic is discussed with three experts in a lively, informative and stimulating discussion. Highly recommended:

https://www.bbc.co.uk/programmes/p004kln9/episodes/downloads


> Web scraping is usually pretty tedious, but I found that I could send the minimised HTML to GPT-3 and get (almost) perfect JSON back

What does this even mean?? You collected all the minimised html snippets then fed them to GPT-3?? How does this save any time??


Love it - also a huge fan of In our Time.

V2 would be:

- Transcribe through Whisper

- Semantic search


What's the prompt?


One of the prompts is

  Extract the description and a list of guests from the supplied episode notes from a podcast.

  Also provide a Dewey Decimal Classification code and label for the description

  Return valid JSON conforming to the following Typescript type definition:

  {
    "description": string,
    "guests": {"name": string, "affiliation": string | null}[]
    "dewey_decimal": {"code": string, "label": string},
  }

  Episode synopsis (Markdown):

  {notes}

  Valid JSON:
(And the completion tends to be JSON, but not always.)


I like the Typescript definition, rather than the example JSON that I normally use.


Credit where it's due: I was working with structured data as JSON for the completion, and the Typescript definition hugely increased reliability. I took that from helpful advice (on Twitter) from Noah Brier, who afaik came up with the approach:

https://brxnd.substack.com/p/the-prompt-to-rule-all-prompts-...


One interesting side effect of the Typescript interface approach is that it doesn't tokenize well: https://twitter.com/m1guelpf/status/1630015536632569857


From what I understand:

- Typescript definition (turn raw html markup >> well-formed JSON)

- Dewey Decimal classification/score

So probably combination of those two


How perfect is almost perfect? How do you correct errors?


This is amazing, thanks for sharing!


This is great.



