Essentially, I wrote a small browser extension that takes the content of LinkedIn, Twitter, and YouTube posts/titles and filters them out based on whether they are clickbait, low effort, etc.
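A minimal sketch of how such a filter could work, assuming a prompt-a-local-model-per-post approach; the prompt wording, labels, and fail-open behavior here are illustrative, not the extension's actual code:

```python
def build_prompt(title):
    """Build a classification prompt for a local LLM (illustrative wording)."""
    return (
        "Classify the following social media post title as exactly one of: "
        "CLICKBAIT, LOW_EFFORT, OK.\n"
        f"Title: {title}\n"
        "Label:"
    )

def parse_label(completion):
    """Extract the first recognized label from the model's completion."""
    for label in ("CLICKBAIT", "LOW_EFFORT", "OK"):
        if label in completion.upper():
            return label
    return "OK"  # fail open: show the post if the model's answer is unclear
```

The hide/show decision then just keys off the parsed label, so a flaky model answer degrades to showing the post rather than hiding it.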
The thing that made the initial ChatGPT refreshing was the lack of ads; it wasn't trying to sell you anything. This obviously will not continue; commercial pressures will direct AI efforts toward being a better ad pusher.
So the AI of the social media sites will end up trying to get the crap past your local AI filters, in a big AI arms race :)
I would say, bring it on! Nothing will make it past my phi-2 or mistral-7B-v0.1 ^^, at least for now.
I think what this could lead to is a homogenization of the content-serving layer, since all you'd really need is to get content past the user's filters, which follow them from one site to the other, with the display layer becoming less relevant (and less differentiating). But let's see, exciting times.
That's awesome, I want to do something similar: categorize the content in social media, so I can choose what to see when I want. Sometimes I want to avoid politics, sometimes I'm ok with it, for example. Sometimes I want to see only content about game development.
What's your plan with your project, will you turn it into a product for others, open source it, or neither? I would love it if it was either of the former!
Thank you very much for the supportive words; I've been getting lots of positive feedback on this. The end-of-year workload, though, means I need to be mindful of my time. I think one of the first two options will be the way to go.
I'll post an update here as I always do with my small projects.
It has to be the auto-playing Tomb Raider agent, where LLMs were used to give Lara self-awareness. I've never seen anything like it.
It starts off with some classical computer vision shenanigans to understand the character movement and map layout, and to create the 'desire' to explore. Then the LLM is given images, sound descriptions, and prior thoughts as input, letting Lara remark on the situation, which feels very surreal and, at least for me, very unexpected. E.g. she hears the wolves howl and wonders how they survived in this environment. Or makes meta-remarks on game music changes.
Worth pointing out that most of that video is fake[1]. Though it and its debunking video are still a great example of how to make entertaining fictional content with a little help from AI. It probably won't be too long before somebody builds something like that for real; similar AI mods[2] for Skyrim are already out.
I'm attempting to create a frequency list of words for language learners. (In Japanese.)
Commonly, these lists are based on just which word appears in the text at the "surface" level. However, words commonly have multiple "senses," or nuances of meaning, in which they are used. Dictionaries list these senses, but it has traditionally been hard to disambiguate which sense a word is used in, given a usage in text.
LLMs make this feasible, so I'm attempting to create a word sense/usage frequency list.
Consider using fastText's word vectors. They cover a bunch of languages, come pre-sorted by frequency, and are sufficient for basic word sense work. Perhaps use an LLM to automate some of the disambiguation.
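Since the published fastText `.vec` files are plain text (a `vocab_size dim` header line, then one word plus its vector per line, in descending frequency order), pulling out the top-N words needs no special library; the file name below is just an example of fastText's published naming:

```python
def top_words(vec_path, n=1000):
    """Read the n most frequent words from a fastText .vec file."""
    words = []
    with open(vec_path, encoding="utf-8") as f:
        next(f)  # skip the "vocab_size dim" header line
        for line in f:
            words.append(line.split(" ", 1)[0])  # word is the first token
            if len(words) >= n:
                break
    return words

# e.g. top_words("cc.ja.300.vec", 5000) for the Japanese common-crawl vectors
```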
That’s a great idea. I hope it can be done for other languages, too.
I used to help prepare study materials for Japanese learners of English. The other editors and I would try to adjust the vocabulary to keep it at an appropriate level for the target learners. Word-frequency lists provided some guidance, but they showed only how often words appeared in the surveyed texts, not the meanings in which they were used. The word “medium,” for example, might have a fairly high frequency, but could we expect the learners to know the meanings “a substance through which a force travels” or “someone who claims to have the power to receive messages from dead people”?
A similar problem was with multiword idioms. The verb “make” is one of the most common words in English, but how common are “make it,” “make do,” “make up,” “make away with,” or “make out”? Ten years ago, I was unable to find any reliable answers. We had to rely on our gut feelings.
Good luck with your project. LLMs should be a big help.
Thank you! Yep, multi-word idioms are tough. How do you quantify whether a phrase is just the "sum" of its words, or whether there is some additional meaning, some "idiomness," to it? I haven't thought a lot about that yet, but it's a problem I need to solve for this.
If you’d like to discuss these issues, feel free to get in touch. My website URL is on my profile page. I’m not a programmer or expert on natural language processing, but I have worked on over a dozen Japanese-English and English-Japanese dictionaries and enjoy thinking about such problems.
Basically, I have a big corpus of text (novels, as I'm interested in getting the learners to read) and a dictionary. I annotate the words using the dictionary, then give the LLM the text context, the target word, and the possible dictionary definitions as input, and let it select or score which definitions could be considered to "apply" given the context. Finally, I tally the counts.
The disambiguated senses are provided by the dictionary. Does that answer your question?
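If I understand the pipeline correctly, it can be sketched roughly like this; the `ask_llm` scoring step is stubbed out, and the function names and prompt wording are illustrative, not the author's actual code:

```python
from collections import Counter

def disambiguate(context, word, senses, ask_llm):
    """Ask an LLM which dictionary sense of `word` applies in `context`.

    `ask_llm(prompt) -> str` is any completion function; it is expected
    to answer with the number of the chosen sense.
    """
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(senses, 1))
    prompt = (
        f"Context: {context}\n"
        f"Target word: {word}\n"
        f"Dictionary senses:\n{numbered}\n"
        "Answer with the number of the sense that applies:"
    )
    reply = ask_llm(prompt)
    digits = "".join(c for c in reply if c.isdigit())
    idx = int(digits) - 1 if digits else 0
    return senses[idx] if 0 <= idx < len(senses) else senses[0]

def tally(occurrences, ask_llm):
    """occurrences: iterable of (context, word, senses) tuples."""
    counts = Counter()
    for context, word, senses in occurrences:
        counts[(word, disambiguate(context, word, senses, ask_llm))] += 1
    return counts
```

A scoring variant would ask for a weight per sense instead of a single number, at the cost of more parsing.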
How about the highest frequency phrases and variations?
As a language learner, I've found high-frequency word lists not to be that useful. A word is too atomic a unit, devoid of context. Memorizing word lists doesn't lead to speaking a language, but learning phrases often does. Even better is to learn phrases within a context, like a restaurant or a lecture.
LLMs might actually add value here. Word frequencies are simply statistical counts, but finding common phrases is a more complicated problem, and the LLM's structure (attention) might actually be the solution.
(I actually asked ChatGPT-4 about this today. I asked it to tell me the highest-value phrases I should learn if I'm in a restaurant. I also asked it to break down phrases for me, give me a lesson on conjugations, etc.)
Ah, yeah, totally! The whole point of this exercise is to ascend from the level of "words" to the level of "units of meaning." These commonly consist not of single words but of phrases.
Also, you are absolutely correct that learning "atomic units" in isolation is not good practice. What I'm thinking here is to build some tools to collect the data for the "what." The "how" of the learning needs to happen in context.
I've recently been experimenting with training LLMs on the personal corpus of a dear family friend who passed recently, with the intent to eventually embed the device in his tombstone up north so that people can come and commune with him.
He was a well-known tarot reader, mystic and Haskeller in the northern Finnish community; without his help it's very likely I would have been deported from the country before I could get my passport sorted out. We came up with this plan together before he passed mostly out of a really weird shared sense of humor.
I was overwhelmed by the pace of AI news and papers coming out, so I built an automated HN news monitoring service that delivers relevant news straight to my inbox or my RSS feed: https://www.kadoa.com/hacksnack
It uses LLMs to extract, summarize, and tag the front page articles and classify the different perspectives in the comments.
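On the prompting side, the summarize-and-tag step could look roughly like this; the tag set, prompt wording, and reply format are made up for illustration and are not the actual kadoa.com pipeline:

```python
ALLOWED_TAGS = ("LLMs", "hardware", "security", "startups")  # illustrative tag set

def tagging_prompt(title, article_text, tags=ALLOWED_TAGS):
    """Build one prompt asking for a summary and tags in a fixed reply shape."""
    return (
        "You label Hacker News articles.\n"
        f"Allowed tags: {', '.join(tags)}\n"
        f"Title: {title}\n"
        f"Article: {article_text[:4000]}\n"  # truncate to stay within context
        "Reply as two lines:\n"
        "SUMMARY: <one sentence>\n"
        "TAGS: <comma-separated subset of the allowed tags>"
    )

def parse_tags(reply, allowed=ALLOWED_TAGS):
    """Keep only tags from the allowed set, ignoring case and spacing."""
    for line in reply.splitlines():
        if line.upper().startswith("TAGS:"):
            raw = {t.strip().lower() for t in line.split(":", 1)[1].split(",")}
            return [t for t in allowed if t.lower() in raw]
    return []
```

Constraining the reply to a fixed shape and then filtering against a closed tag set keeps hallucinated tags out of the feed.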
Maggie Appleton shared some interesting ideas a few months ago[1]. I especially find the "Branches" concept interesting: just the idea of exploring multiple paths from a starting point in parallel.
Shit... this is exactly what I've been thinking of for the past few months! Those examples with the UI solve so many problems, and I just love the ideas!
I use GPT-4 to generate a poem as a reward for solving the daily puzzle at https://squareword.org The poem is based on words from the puzzle.
It usually manages to create a reasonably coherent and amusing poem from up to 10 completely random words, something I would struggle to do myself. People tell me they enjoy them, although some of the poems turn out a bit odd haha.
We’re playing with embodied LLMs that can externalise thoughts in a virtual environment. The idea is to help facilitate knowledge work.
It’s not our main area of interest, but it’s been interesting to experiment with how human/machine and machine/machine interactions work in real-time when you limit how fast agents can move or write. It's much easier to engage in a dialogue with agents that can't create / move tens of sticky notes and graphics faster than you can create one.
Note on that page: This model is uncensored. I have filtered the dataset to remove alignment and bias. This makes the model more compliant. You are advised to implement your own alignment layer before exposing the model as a service. It will be highly compliant to any requests, even unethical ones. Please read my blog post about uncensored models. https://erichartford.com/uncensored-models You are responsible for any content you create using this model. Enjoy responsibly.
Why can't the bullet points just be used as is? Either they contain enough signal, or they don't, and an LLM won't help anyway.
I fear everything will be expanded by LLMs soon. "Write an email, three paragraphs, about X," instead of just sending X directly. Then the receiver gets a wall of text and uses an LLM to distill it back to X' before reading. Just hope too much didn't get lost in the inverse compression through the LLM.
I use copilot in emacs, and running "git commit -v" puts the diff in my emacs(client) buffer with copilot on and it's not terrible at describing the changes.
A lot of times it'll even guess the JIRA ticket number from the diff or the branch name.
Currently GitButler only generates commit messages in this way (with some config options for style e.g. semantic commits). With that said, generating PR descriptions is something I was tinkering with this morning.
For me: JSON and YAML formatting and analysis. ChatGPT is pretty decent at the following real work tasks, which I used to use less robust tooling for:
- pretty-printing and indenting a "json-like" string (e.g. a Python object str) from a log, or JSON with typos (extra commas, wrong quotes, imbalanced brackets…), with a summary of the errors at the end.
- giving a verbal description (numerically listed) of the changes between two commits of a YAML file, especially when the order has changed, making git diff hard to read.
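For the narrow case where the "json-like" string is actually a valid Python repr (single quotes, `None`, no typos), a non-LLM baseline still works; anything genuinely malformed is where the LLM earns its keep:

```python
import ast
import json

# A Python repr pulled from a log: not valid JSON (quotes, None).
raw = "{'status': 'ok', 'retries': 3, 'meta': None}"

data = ast.literal_eval(raw)          # safely evaluate Python literals, no code execution
print(json.dumps(data, indent=2))     # None becomes null, quotes become double quotes
```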
Well, it's part chat bot, so I don't know if it meets your criteria. But we're using them for a LOT of things behind the scenes to help kids find content they love that their parents approve of.
[HelloWonder.ai](https://hellowonder.ai)
The front end looks like a chat bot, but on the backend we're using LLMs to find, parse, rate, classify, and rephrase content on the fly for individuals.
Me too, initially in a chat with GPT-4 [0] and then in a (private for now) wrapper that sends me a text message when analysis is complete, sums up the day's meals, and compares to my total calories burned per Apple Watch.
Very cool! :) Just went over the article, and this is close to how I use it. I implemented it in an iPhone app and added some RAG tricks. Let me know if you want to try it out.
Yes, from descriptions. Also, I often have a rough idea of how many calories something has. So one of the main features is that you can say: "Protein shake with 140 cals and 35 grams of protein, remember as P1," and then whenever I have the same thing again I just type P1.
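A toy version of that shortcut feature might look like this; the regex and reply strings are illustrative sketches, not the app's actual parsing:

```python
import re

aliases = {}  # alias name -> (calories, protein grams)

def handle(message):
    """Either register a new alias or expand an existing one."""
    m = re.match(
        r".*?(\d+)\s*cals.*?(\d+)\s*grams of protein.*remember as (\w+)",
        message, re.IGNORECASE,
    )
    if m:
        cals, protein, name = int(m.group(1)), int(m.group(2)), m.group(3).upper()
        aliases[name] = (cals, protein)
        return f"Saved {name}: {cals} kcal, {protein} g protein"
    key = message.strip().upper()
    if key in aliases:
        cals, protein = aliases[key]
        return f"Logged {cals} kcal, {protein} g protein"
    return "Unrecognized entry"
```

In practice the "unrecognized" branch is presumably where the LLM takes over and estimates the macros from the free-text description.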
I have an iPhone app I have been using for half a year (and lost 10 kg); if there is interest, write me an email (in bio) and I might release it then ;)
Thankfully humans are great at pattern matching and it's trivial for me to "vet this ideation":
LLMs are notorious for getting subtleties wrong, and in legal agreements like terms of employment the subtleties are often of material import. Therefore this is a bad idea.
If you don't want to read/don't understand the terms of your job offer then pay a lawyer. Asking JobOfferGPT is just asking for trouble.
Reading it yourself (at whatever level you are capable of and can tolerate), followed by asking an LLM to highlight any areas of the terms that are non-standard, may cause concern, could be restrictive, or might cost you later, could help identify subtleties you might have missed.
Certainly it’d seem no worse and possibly better than just reading the terms, especially as a layperson.
Honeycomb is an OpenTelemetry tool that has a complicated search UI. They also have a text box you can use to have it query your data for you; it basically just drives the filtering and group-by UI. It's really cool because it makes the UI simpler to use; worst case, it might set the wrong filter.
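That "drive the UI" pattern is attractive because the LLM only has to emit a small structured spec the existing UI already understands. A sketch of what consuming such a spec could look like, with a made-up schema (this is not Honeycomb's actual format):

```python
import json

def apply_spec(rows, spec_json):
    """Apply an LLM-emitted filter/group-by spec to in-memory rows."""
    spec = json.loads(spec_json)
    # Keep rows matching every equality filter.
    rows = [r for r in rows
            if all(r.get(f["field"]) == f["equals"] for f in spec.get("filters", []))]
    # Count surviving rows per group-by key.
    groups = {}
    for r in rows:
        key = tuple(r.get(g) for g in spec.get("group_by", []))
        groups[key] = groups.get(key, 0) + 1
    return groups

rows = [
    {"service": "api", "status": 500},
    {"service": "api", "status": 200},
    {"service": "web", "status": 500},
]
spec = '{"filters": [{"field": "status", "equals": 500}], "group_by": ["service"]}'
result = apply_spec(rows, spec)
```

Because the spec is validated data rather than executable code, a wrong LLM answer degrades to a wrong filter, exactly the failure mode described above.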