The Library of Babel was a library that contained every possible combination of letters to form a 410-page book. Or something like that. It made me wonder: what if you made a content honeypot full of nothing but random text and a chatbot vacuumed it up? Does its data vacuum have a garbage detector?
The very worst that would happen is that you make someone's training run slightly less efficient. If your data is truly random garbage, the model won't be able to make any predictions about it and thus it will not distort performance. All training data is noisy to an extent, and you've just fed it pure noise.
However, it has become clear that effective LLM training is in large part a matter of careful curation of high-quality training data. Random gibberish is trivially detectable (by LLMs themselves, if nothing else), so it's unlikely that your "honeypot" will ever make it into someone's training run.
Even if you carefully crafted some more subtle poison data, it would still make up only a small fraction of the training set. The worst-case scenario is most likely that the LLM learns to recognize your particular style of poison and will happily recreate it if prompted appropriately (while otherwise remaining unaffected); more likely, your poison data is simply swamped.
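As a rough illustration of why uniformly random text is cheap to flag, here is a minimal sketch using character-level Shannon entropy. It assumes the honeypot is made of uniformly random letters and spaces, and the 4.4 bits-per-character threshold is an illustrative guess, not what any real training pipeline uses; those rely on much richer quality filters.

```python
import math
import random
import string
from collections import Counter

ALPHABET = string.ascii_lowercase + " "


def char_entropy(text: str) -> float:
    """Shannon entropy (bits per character) over lowercase letters and spaces."""
    chars = [c for c in text.lower() if c in ALPHABET]
    if not chars:
        return 0.0
    total = len(chars)
    counts = Counter(chars)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_gibberish(text: str, threshold: float = 4.4) -> bool:
    # Uniformly random letters+space approach log2(27) ~= 4.75 bits/char,
    # while ordinary English prose typically lands around 4.0-4.2.
    return char_entropy(text) > threshold


rng = random.Random(0)  # fixed seed so the "noise" is reproducible
noise = "".join(rng.choice(ALPHABET) for _ in range(5000))
prose = (
    "all training data is noisy to an extent and you have just fed it pure noise "
    "however it has become clear that effective training is in large part a matter "
    "of careful curation of high quality training data"
)

print(f"noise: {char_entropy(noise):.2f} bits/char, gibberish={looks_like_gibberish(noise)}")
print(f"prose: {char_entropy(prose):.2f} bits/char, gibberish={looks_like_gibberish(prose)}")
```

A poisoner who strings together random real words would defeat this particular check, which is where model-based filters come in.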
So... I think it has already been happening (people attempting to poison some sources for a variety of reasons). I was doing a mini fun project on HN aliases (attempting to derive or guess a user's age based on nothing but their alias), and I came across a number of profiles with bios clearly intended to mess with bots one way or another. Some have fun instructions. Some have contradictory information. Some are the length of a short bedtime story. I am not judging. I just find it interesting. It has vibes of a certain book about a rainbow.
The idea itself is kinda simple, but kinda hard, because it relies on how the language we use gives us away.
For example, the references we make (The Simpsons, Star Trek, you name it) and the language we use (gee whiz, yeet, gyatt) when building an online persona tend to say something about our self-image, and from those one can, to some extent, determine the likely generation.
The reference itself may not automatically mean much, but if it is present in an alias, it likely had an impact on a younger person (how many of the new generation jump on an old show?), so Mr. Robot would have an exposure range of 2015 to 2019. If that hypothesis holds, one can attempt to guess an individual's age from that work, because 1) we know what year it is now and 2) we know when the work was made, which allows for some minor inference.
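The inference above can be sketched as an age window. This is a minimal sketch, assuming a hypothetical "formative" exposure window of ages 12 to 25 (my own placeholder; the comment doesn't specify one) and using Mr. Robot's 2015 to 2019 run as the example. The current year is passed in explicitly so the result is reproducible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    name: str
    first_year: int  # year the work first appeared
    last_year: int   # year it ended (or stopped being widely watched)


def estimate_age_range(ref: Reference, current_year: int,
                       min_age: int = 12, max_age: int = 25) -> tuple[int, int]:
    """Return (youngest, oldest) plausible age today for someone whose alias
    cites `ref`, assuming they encountered it between min_age and max_age."""
    # Oldest candidate: was max_age when the work first appeared.
    oldest_birth_year = ref.first_year - max_age
    # Youngest candidate: was min_age when the work ended.
    youngest_birth_year = ref.last_year - min_age
    return current_year - youngest_birth_year, current_year - oldest_birth_year


mr_robot = Reference("Mr. Robot", 2015, 2019)
print(estimate_age_range(mr_robot, current_year=2025))
```

The window is wide by design; narrowing min_age/max_age or combining several references in one alias is what would tighten it.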
Naturally, some aliases are more elaborate than others. Some are written backwards and/or reference a popular show or a popular sci-fi author. Some are anagrams (and, I discovered today, require additional datasets to tag properly, so that is another thing I will need to dig up from somewhere). To complicate things further, some aliases use references that are ambiguous or belong to more than one category (Tesla being one of them).
The original approach was to just throw everything into an LLM and see what it came up with, but the results were somewhat uneven, so I decided to start from scratch and do normal analysis (language, references, how digits are used, and so on; it is still amazing how well that last one seems to work).
Sadly, it is still a work in progress (I was hoping for a quick project, but I am kinda getting into it), and I probably won't touch it until next weekend, since the coming week promises to be challenging.
Unfortunately, this means your particular alias ended up as:
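Tagging reversed and anagrammed aliases against a word list could be sketched like this. It is a minimal sketch: WORDS is a tiny hypothetical stand-in for the "additional datasets" mentioned above, and the sorted-letter signature is a standard anagram trick, not necessarily what the project uses.

```python
def letters_only(text: str) -> str:
    return "".join(c for c in text.lower() if c.isalpha())


def signature(word: str) -> str:
    """Sorted letters: two strings are anagrams iff their signatures match."""
    return "".join(sorted(letters_only(word)))


def build_index(words: set[str]) -> dict[str, set[str]]:
    index: dict[str, set[str]] = {}
    for w in words:
        index.setdefault(signature(w), set()).add(w)
    return index


def tag_alias(alias: str, words: set[str],
              index: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    core = letters_only(alias)
    rev = core[::-1]
    tags: set[str] = set()
    if rev != core and rev in words:
        tags.add("reversed")
    # Exclude the alias itself and its plain reversal, which trivially share a signature.
    matches = index.get(signature(core), set()) - {core, rev}
    if matches:
        tags.add("anagram")
    return tags, matches


WORDS = {"asimov", "listen", "tesla", "neo"}  # hypothetical stand-in dataset
INDEX = build_index(WORDS)

for alias in ("vomisA", "Silent42", "Loughla"):
    print(alias, tag_alias(alias, WORDS, INDEX))
```

Ambiguous references like Tesla would additionally need the lookup to return several categories rather than one.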
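The digit analysis is the easiest part to illustrate. Below is a minimal sketch of one plausible heuristic, pulling candidate years out of trailing digits; the 1940 lower bound and the trailing-digits-only rule are my assumptions, not the project's actual rules.

```python
import re


def year_candidates(alias: str, current_year: int, earliest: int = 1940) -> list[int]:
    """Plausible years encoded in the trailing digits of an alias (heuristic)."""
    m = re.search(r"(\d+)$", alias)
    if m is None:
        return []
    digits = m.group(1)
    if len(digits) == 4:
        year = int(digits)
        return [year] if earliest <= year <= current_year else []
    if len(digits) == 2:
        yy = int(digits)
        # Two digits are century-ambiguous: "85" -> 1985, "07" -> 2007.
        return [c + yy for c in (1900, 2000) if earliest <= c + yy <= current_year]
    return []


for alias in ("jsmith1985", "neo_85", "kid07", "elite1337", "Loughla"):
    print(alias, year_candidates(alias, current_year=2025))
```

A digit string can just as easily be a reference (42, 1337) as a year, which is why this signal has to be weighed against the reference lookups.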
Alias | category | is_random | length | is_anagram | generic_signal
Loughla | Mixed Case | 0 | 7 | FALSE | FALSE
(The remaining fields were empty; basically, I couldn't put a finger on you :D) If you can provide me with an approximate age, it would help with my testing, though :D
edit: This being HN, the vast majority of references are technology-related.
I have a separate, not fully implemented section for semi-random aliases; it revolves around our tendency to use default settings and common tools to generate them. So far, the only thing I have been able to show with it is that this is not uncommon, but it offers no clear proxy for age, so it seems like a dead end.
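The project's code isn't shown, but the category and length columns in that row could plausibly come from something like the sketch below. Only "Mixed Case" appears in the original output; the other category labels are invented for illustration, and is_random, is_anagram and generic_signal are left out.

```python
def case_category(alias: str) -> str:
    """Classify an alias by the case of its letters (labels partly hypothetical)."""
    letters = [c for c in alias if c.isalpha()]
    if not letters:
        return "No Letters"
    if all(c.islower() for c in letters):
        return "Lower Case"
    if all(c.isupper() for c in letters):
        return "Upper Case"
    return "Mixed Case"


def feature_row(alias: str) -> dict:
    return {
        "alias": alias,
        "category": case_category(alias),
        "length": len(alias),
    }


print(feature_row("Loughla"))
```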
I am so sorry to see this. The absolute last thing we need is yet another fossil-fuel-guzzling vehicle. Please stop what you are doing immediately, go back to the drawing board, and help save our planet by inventing a craft that uses solar power to charge the batteries it flies on. I am sure I will get downvoted to hell for this. But tough shit. The planet is choking, and we simply do not need more people burning fossil fuel for fun and pleasure. Stop. Please. Stop and rethink this.
I am old enough to remember when people said TV was a passing fad.
And the radio.
And the printing press.
And the telegraph.
And the written word. I mean come on you lazy shlubs, memorize Beowulf like we had to back in my day.
OK, I am not actually that old.
My point is that with every technology invented to improve or expand the ability of humans to communicate, there have been detractors and naysayers predicting the inevitable doom of said technology.
I am still waiting for that whole writing things down instead of memorizing them thing to finally go out of style.
I'm not sure the examples you bring up make the point you're trying to make. For most practical intents and purposes, the printed press is but a small shadow of its former self. Pretty much all outlets focus on digital, and many have stopped printing altogether. Radio is the same: as a fraction of the population, its listener numbers are hitting record lows. Most people listen to Spotify, Apple Music, YouTube, podcasts, etc., not radio. I doubt I even need to mention traditional TV. Point being, all of these technologies exist and people do consume them, yes, but compared to their former glory they're all practically dead.
And yet they persist.
And continue to evolve.
And TV, now streaming over the internet.
The way humans communicate evolves.
And so will the internet, and social media and all the rest to come.
Think of it like this...
Radio didn't die, it evolved, into streaming music.
TV didn't die, it evolved into streaming TV.
The printed medium did not die, it evolved into HTML and web pages, a fancier form of typesetting.
The telegraph didn't die, it evolved into digital communications.
See, it's not that things die and go away, it is a process of improving how humans interact.
Some may find it difficult, or maybe the isolation is a problem, and then there is another evolution.
It will not stop, it will evolve to the next step.
These aren't the laments of a dying internet. They're the laments of a person mourning a time and place that will never come back but without the social awareness to realize that. That's the trouble with most of these kinds of laments.
New media exploration is new, fresh, and chaotic. The kids on Discord channels and those watching streamers and VTubers have this same energy. The old guys looking for mailing lists are sneaking a peek in between looking after their kids, doing their household chores, and finishing work. The vibes are off because of the audience.
And bad for kids. Go back and you can read about how dreadful it is that some people are letting their children read novels. What sort of person would do that?
But the article isn't really about that, to the extent it's really about anything except the author's need to feel very, very smart. It's a vague gesture at how "over" it they are, for any value of "it". Best to pat them on the head condescendingly and then move on.
The person who wrote the article used the internet for all the sub-references. Had it not been for the internet, this person most likely would not have known all the things they mentioned. I don't know if they are reading this, but it would be an interesting question to put to them.
All these technologies have something in common, however: they get co-opted for misinformation. I think a lot of the fear about "new media", whatever form it might take, is simply reactionary from a media literacy perspective. The adult of the radio age might understand that one can have some media literacy with the radio: not believe everything they hear, not waste their time on it, etc. But their kids, who are watching TV all day, didn't get that lesson in school (clearly the case, since they're watching TV all day instead of playing like a normal kid has for millennia), and since they are kids they don't understand nuance, so it's simpler to put a foot down and say "shut that damn TV off."
I think better lessons in media literacy would help in a lot of situations like this; however, there is a very strong incentive in our world to prevent a high degree of media literacy from taking root, as it would obviate a lot of the methods used to control subsets of the population.
I have been lucid dreaming for many years, and about five years ago I started keeping my tablet on the nightstand; when I wake up, if I remember the dream, which I generally do, I use speech-to-text to record it. If I don't do that, I often forget them. I have categories. The movie type is third person, and it plays out like I am watching a movie. A ride-along dream is first person: I feel like I am riding along with someone else, and I know it's not me, but I know their thoughts. The me dream is when I am me in the dream. Some of them repeat multiple times over several nights. In all cases, I know I am dreaming. I started turning some of them into short stories. No idea where some of them come from. Often it seems like I just tuned into someone else's story, but the subconscious is a funny thing.
I have a very similar experience. A worthy distinction is that in my dreams I'm either a) fully aware it's a dream but have no control or b) fully in control of myself without realising it's a dream.
Only once was I both aware and in control, but it didn't last long.
My grandfather wrote something one could only describe as a manifesto while he was in prison in the 1930s for moonshining, which he was doing to save the family farm during the Depression. He cited Spinoza multiple times, which is when I first discovered Spinoza. He included it in his memoir, and it really influenced me in a lot of ways. So, thanks to the poster for putting this up.
That may be the fastest performance of The Flight of the Bumblebee on organ, but that's it. It is fast, but what's the fastest piece? In the context of this article, it is about length, so any organ piece under 53 seconds would be faster (and there are plenty). You could also look at notes per second, but I'm not sure this one would win. There are some crazy dense passages in the organ literature.
When I was working on my master's degree in Electrical Engineering way back in 1978, when dinosaurs roamed the Earth :), I had a prof who was really amazing. He was a Comp Sci prof, and he gave me a copy of GEB (Gödel, Escher, Bach), and it changed my life. OK, I'm probably overstating a bit, but that first-edition copy is one of my prized possessions. Even to this day, every once in a while I pull it out and re-read a chapter.
A few years back, I had a condo on Lake Washington (east of Seattle), and a buddy of mine is a microwave engineer. We were hanging out one day on my deck with our laptops, and he mentioned how amazing it was that there were so many WiFi signals. He said he had a 4-foot dish from a 2 GHz microwave link in the junk pile at work that they had replaced with a newer 11 GHz link. He went and got it, and we put a USB WiFi dongle on the feed horn, set the dish up on a tripod, and it was amazing: we could point it across the lake and pick up all these WiFi hotspots, some of which were open. It just shows that a big antenna on one end really helps with the link budget. Of course, the beamwidth is super narrow, so it's pretty finicky to position. I am not sure of the exact distance, but it had to be a mile or two.
In Australia, people are ripping out those old 11 GHz TV dishes (when they remember they still have them on their roof) now that DVB-T free-to-air is somewhat decent in cities and online streaming has taken off.
It’s not entirely unrealistic to get several, replace the LNBs with WiFi adapters, and build a multi-directional internet-sponging setup.
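The link-budget point can be put in rough numbers. This is a back-of-the-envelope sketch using the textbook parabolic-dish gain formula, the common 70 λ/D beamwidth rule of thumb, and free-space path loss. The 55% aperture efficiency, the 2.437 GHz channel, the ~2 dBi bare-dongle antenna, and the 2-mile distance are all assumptions, not measurements from the story; a dongle stuck on a feed horn built for 2 GHz would realistically deliver less than this ideal gain.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def dish_gain_dbi(diameter_m: float, freq_hz: float, efficiency: float = 0.55) -> float:
    """Ideal parabolic dish gain G = efficiency * (pi * D / wavelength)^2, in dBi."""
    wavelength = C / freq_hz
    return 10 * math.log10(efficiency * (math.pi * diameter_m / wavelength) ** 2)


def beamwidth_deg(diameter_m: float, freq_hz: float) -> float:
    """Rule-of-thumb half-power beamwidth of a dish: about 70 * wavelength / D degrees."""
    return 70 * (C / freq_hz) / diameter_m


def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss 20 * log10(4 * pi * d * f / c), in dB."""
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / C)


DISH_D = 4 * 0.3048    # 4-foot dish, in metres
FREQ = 2.437e9         # assumed: 2.4 GHz WiFi, channel 6
DIST = 2 * 1609.344    # assumed: upper end of "a mile or two", in metres
DONGLE_DBI = 2.0       # assumed: typical small USB dongle antenna

gain = dish_gain_dbi(DISH_D, FREQ)
print(f"dish gain  ~{gain:.1f} dBi (vs ~{DONGLE_DBI:.0f} dBi for the bare dongle)")
print(f"beamwidth  ~{beamwidth_deg(DISH_D, FREQ):.1f} degrees")
print(f"path loss  ~{fspl_db(DIST, FREQ):.1f} dB over {DIST / 1000:.1f} km")
```

Under these assumptions the dish adds roughly 25 dB over the bare dongle. Since free-space loss grows by about 6 dB per doubling of distance, that is worth well over a tenfold increase in range, and a beam only about 7 degrees wide explains why aiming was finicky.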
Meanwhile I just installed three Ubiquiti APs in a ~64sqm (~700 sqft) apartment because fuck if reinforced concrete walls and thick doors are going to stand in the way of us having good WiFi coverage at home.