I Spent a Week with Gemini Pro 1.5–It's Fantastic (every.to/chain-of-thought)
300 points by dshipper 10 months ago | 288 comments



I kind of love the idea of feeding the text of entire books to an AI. More often than I’d like to admit, I’ll be reading a novel and find myself not remembering who some character is. I’d love to be able to highlight a name in my ereader and have it see that I’m 85 pages into Neuromancer and give me an answer based on that (i.e., no spoilers).

Or have a textbook where I can get help and hints while working through problems and getting stuck, like you might get from a good study partner.



Yeah, but then you have to contribute to making Bezos even richer.


He made a product you find useful. Even if he doesn’t need to be rewarded more, he’s proven himself an effective allocator of capital. Give him a dollar and he’ll likely use it to develop some product you’ll enjoy in the future as well.


That’s presumptuous. He didn’t make a product I find useful, and a dollar more he makes will be used to squeeze a tiny bit more out of those he employs and/or out of society for his own benefit. I’m not enabling that as much as I can.


Bezos getting richer? Get used to it.


I don’t have to contribute to it, as much as I can help it.


What do you use instead of Amazon?


Anything else. I live in Switzerland and prefer to use Galaxus, Migros, Interdiscount, Müller, Foletti, or buy second hand. Absolutely anything I can to not buy from Amazon, even if the alternative is 10–15% more expensive.


What do you hate so much about Amazon that makes the alternatives so much better? Genuinely?


What's not to hate? Exploiting workers, union busting, tax evasion, waste, pollution...


Can you give us some evidence that Galaxus warehouse workers are somehow less "exploited" than Amazon's Swiss warehouse workers?

Or is this a deal like complaining about Apple supposedly using slave labor, while blithely ignoring the fact that every other computer and cell phone comes out of essentially the same Chinese factories? If anything, the people who crank out low-end Android phones probably get paid less than Apple's workers. I'm nearly 100% certain they don't get paid more.

I'm skeptical that Galaxus warehouses are workers' paradises, because I did a few stints at warehouse work as a youngster.

You know what? Warehouse work sucks, and it sucked long before Amazon came along.


There are no Amazon warehouses in Switzerland. There is actually no Amazon Switzerland. It’s all drop shipped from elsewhere in Europe because the conditions are more favorable to Amazon there I suppose.

I never said Galaxus was a workers’ paradise. It’s still work. But work under regular conditions, not the exploitative ones Amazon is famous for.

Did you seriously never hear about the workplace accident rate at Amazon vs. other companies? The people who died when a warehouse collapsed a few months ago because there was a tornado and employees were not allowed to go shelter? The workers peeing in bottles to avoid bathroom breaks and keep their grueling hourly quotas? The union-squashing efforts by Amazon? The notoriously bad working conditions at Amazon (warehouse but also software)?

I’ve never heard that about other Swiss companies I purchase from. It’s probably not paradise. It’s still not Amazon level of hell and exploitation.

On a practical level, Amazon quality is shit: you get damaged or used returns sold as new, returning stuff means sending it to Slovakia out of pocket and waiting weeks for it to be processed (or having it returned to you because Amazon never accepted delivery), then fighting to be refunded; sometimes stuff gets stolen from your return parcels by the mail service… I really can’t see the point of undergoing all this BS and feeding Amazon to save 10–20% on the odd thing I buy that I can also find from domestic retailers.


> Did you seriously never hear about the workplace accident rate at Amazon vs. other companies?

I hear stories about Amazon, but NEVER about other warehouses. That's the whole point.

> I’ve never heard that about other Swiss companies I purchase from. It’s probably not paradise. It’s still not Amazon level of hell and exploitation.

That does not follow. You've "never heard" anything one way or another, which means that you have no evidence of any kind.

I, on the other hand, have personal experience with working in non-Amazon warehouses. They all sucked.


First search result I get for “Amazon warehouse accident rate”: https://www.cnbc.com/2023/04/12/study-amazon-workers-serious...

Amazon is also notorious for exploiting workers. I seriously doubt you’ve never heard that claim or read anything about it and think it’s just another normal [warehouse] job.

https://www.vox.com/recode/23170900/leaked-amazon-memo-wareh...


I didn't ask you for more Amazon warehouse stories.

I asked you for evidence that Galaxus warehouses are substantively better.

Since you have provided no such evidence, I conclude that you have none.


I don’t think galaxus publishes their safety track record. We’ve established that Amazon is worse than most companies in terms of safety, worker abuse, union squashing, environment, and quality. It follows that most other companies are better than Amazon on some or all these aspects.

I’m not going to debate this with you any further, you’re free to believe and do what you want.


> I don’t think galaxus publishes their safety track record.

Then you have absolutely no basis for comparison, do you?

> We’ve established that Amazon is worse than most companies in terms of safety, worker abuse, union squashing, environment

We've established nothing of the sort.

> It follows that most other companies are better than Amazon on some or all these aspects.

You have admitted you have no data. What are you basing your claim on? Sheer bigotry? Looks like it!

> I’m not going to debate this with you any further,

Blind assertion with no evidence is not a "debate".


I (sadly) don’t think much of this is isolated to Amazon. I think it’s shareholders that demand this of most corporations.

I honestly think the packaging my Amazon stuff comes in is much more eco friendly than the actual product packaging itself.


The packaging is a very small part of it. I’m talking about the logistics, destroying/incinerating returns, shipping whatever commingled crap and then having insane return rates (with the returns then destroyed)… not to mention the societal issues like systemically crushing workers’ rights, hiding accidents to cheat on workplace safety, etc. It’s so bad that Amazon was in the news a few months ago for churning through workers so fast that it risks running out of people willing to be abused by them who haven’t already been and quit.


Contrasted to making Sam Altman even richer? Or Sergey and Larry even richer? I don't see the difference. Not meaning it's ok to go the Amazon way, but that it's not ok to use any of those.


And I personally don't use gmail, block all ads on YT, etc. I avoid all these when practical and possible, rather than throwing my arms up in the air. Doesn't mean I never use their services, but I do my best not to.


The only thing you achieve by doing that is making your own life harder, but principles, right?


How on earth does 1 minute of your time to install an adblocker make your life harder?


Not that much harder for me really. You do you though.


Oh I do agree and I wish more people did like you do, which is my point: instead of discussing which billionaire ego island gets richer, let's just not use their tools and discuss together what we want as a society. If the cost of AI is higher than its benefits, then let's not use it. From any company.


I'm the opposite. In movies (and real life to an extent) I have trouble telling people apart. I'd love a closed caption like service that just put name tags over everyone.


That's one of the perpetually-SciFi use cases of Augmented Reality: you're walking around with AR glasses and whenever someone comes into your field of view, your display hovers their name, their job, when you last talked to them, what their hobbies are, and so on over them. A huge privacy nightmare, but useful for people who can't recall names and details about people they've met.


I don't think it'd be a privacy nightmare if it used private databases created by the user. Like a personal CRM, something like Monica, but with a real-time touch.


It wouldn't be in a legal sense, but the societal implications of technology like that becoming commonplace are still immense. The limitations of human memory provide a kind of safety today that would quickly erode if everybody could remember exactly everything that's ever been said or seen around them.


I agree with you. I had a bit of a falling out with a friend and wanted to check in on her a few years later. The immediately preceding messages in Messenger were the largely-forgotten unpleasantness. Quite awkward. It really drove home how much of a blessing forgetting every little slight is.


For an interesting exploration of this, I suggest watching the Black Mirror episode “The Entire History of You” (S1E3).


Honestly one of the best episodes of TV I’ve seen, simply because it challenged one of my core beliefs. I’ve always struggled with a poor memory and I’ve tried all kinds of systems to improve retention and recall. This episode challenged the benefits of remembering everything pretty well and made me reconsider.


Safety from what exactly?


"You said X 3 years ago, but now you said, which is the opposite of X. How dare you?" is one class of problems. Another is that you can learn quite a bit more about a person than they wished to actually divulge to you if you're able to capture and study their exact behaviors over a long enough stretch of time.


Wait, why are people not allowed to change their mind on something? If anything this would make it more explicit and understandable when people did change their mind on something.


> Wait, why are people not allowed to change their mind on something?

In theory, changing your mind should signal that you are capable of thinking about things, and changing your mind based on what you learn.

In practice, most people's opinions are determined by peer pressure. You believe X because the important people around you believe X.

From that perspective, changing your mind means that your loyalty has changed. Previously you tried to be friends with people who believed X, now you are trying to be friends with people who believe Y. No one likes a traitor.


>Wait, why are people not allowed to change their mind on something

I don't think parent comment is suggesting that people aren't allowed to change their mind.

They are pointing out that many people yell "hypocrite!" when someone does change their mind. It's already a phenomenon on social media where people will dig through someone's post history and rake them over the coals, using previous stances on a topic in an attempt to discredit the current stance. Parent is suggesting that this problem would be exacerbated.


I think that people will stop yelling "hypocrite!" once they themselves repeatedly get called out on the same by others.

Our reactions to stuff like that are defined largely by our cultural expectations, but those are in turn constantly shaped by what is made possible or impossible by technology. Back in the pre-voicemail phone era, for example, people would routinely call someone and expect them to be available for a half-hour chat - you could turn it down, sure, but in many cases it would be considered impolite to do so as a matter of social convention. Then voicemail appeared, and SMS was the final nail in that coffin.

So I think that this problem will exist for a while, but if the tech that enables it persists long enough, it will eventually go away as conventions change to adapt to it.


I disagree. People would instead become like modern politicians and never give an opinion.


Politicians are trying really hard to show a particular public image, their job depends on it.

In my job you could call me a hypocrite all day and it wouldn't matter (though I'd find the uncreative repetition annoying)


They won't have that option, because AI will happily infer their actual opinions from things they do say (and how they say them).


Still a privacy nightmare, and creepy. There's plenty of public info on people that, once collected and assembled into one place, is basically stalking. Not saying it's not a cool idea though :)


This is no different from my photos app automatically labelling faces, right?

I’m fairly certain the vision pro could do it right now.


Install our virtual keyboard/virtual screen saver/dancing baby/flashlight app

/small print: requires read all, send all permissions


And instead of just shrugging it off, you could tag strangers that annoy you and end up with a giant list of grudges against a whole host of people. The false positives (e.g. twins and doppelgangers) should make it interesting.


Take it to the next step towards Black Mirror where the AR shadows out people you've blocked and then mutes their voice so you can't hear them


That would make for fantastic comedic situations when you then physically bump into them after you erased them from your AR vision xD.


Which feeds into Saint Motel's song "You're Nobody Til Somebody Wants You Dead" which has a bit about how the list just grows and grows until it's everyone you've ever known...


I had a product idea for an AR app that would do this for everyone who's opted into it. So for real-world networking events, you might choose to disclose some things about yourself but only for that venue and only for some window of time for example.

I never built it, but it's perfectly possible to do.

The genius idea IMHO was the business model- If you were into certain things you wanted to keep private from most but only wanted to disclose to other people who were into those same things, you could pay a fee, and it would then show you others who were in that "market" (of ideas, drugs, sex, whatever). (It might only ask you to pay it if it found someone nearby who matched. And then it would automatically notify the other person unless you paid an ADDITIONAL fee... Not sure about the latter idea, but it was an idea.)

The only issue is everyone holding their phone up in front of their faces.


> The genius idea IMHO was the business model- If you were into certain things you wanted to keep private from most but only wanted to disclose to other people who were into those same things, you could pay a fee

> The only issue is everyone holding their phone up in front of their faces.

No, the genius idea is its major issue: just by paying you gain access to private data (people's preferences) without any kind of chain of trust to make sure that someone is actually part of the group ("market" in your terms) they want access to.

By paying you could know that someone around you is looking for cocaine, or is willing to sell sexual services, or is looking to match with other people of the same gender, or holds a certain political view against an authoritarian government, etc.


I answered this in a sibling comment. You could acquire credibility in a particular preference from the network over time.

https://news.ycombinator.com/item?id=39482786


Sounds great, I'm going to make a "credibility as a service" startup and we'll find ways to farm whatever score in whatever fields you want.

And you can be sure government agencies will do the same.


Odd that you think this would happen for my little idea when it hasn't happened for credit cards, which is possibly the largest financial incentive possible. To my knowledge, I can't buy a credit score.


Finance is a heavily regulated industry, so people trust credit providers with things like social insurance numbers, which are not transferable between people.

Your service would probably not be able to tie so uniquely to an individual, so there would be ways for people to transfer it.

Or just hire a company to pretend to be you for a while.


How would you stop spies or undercover cops trying to infiltrate the "market"?


Or people who want to "out" gay people. I know.

That would be a good argument for not permitting a unilateral notification of a match (which, at the very least, I wanted to make very expensive and thus profitable, if it's allowed at all). If it notified both people 100% of the time, and one of you was a possible impostor, you could report them. And from a legal standpoint, showing interest in a market doesn't make you guilty. And you could possibly also build "cred" in one of these tagged "markets" by getting cred from others who say you're legit, and that information would be revealed at the same time (possibly at your discretion).


Makes sense. You still might get honeypots though; could you make cred work more generally with trust between friends, friends of friends etc. without compromising the markets?


Well, are there other markets where the same cred has worked? AFAIK, when Silk Road was a thing, one's cred on there was protected and valuable.


True. I didn't mean to imply that it doesn't work.


So your genius idea is to get people to pay to put themselves on a future blackmail list when your data is leaked/stolen/sold? I have to say, it is a kind of evil genius.


Basically the Slack experience. You don't need to remember people, you can see your past interactions right there.


I've seen scenes in movies where assistants of heads of state will discreetly whisper to them who the people in the room are.

With a service like this we could all live like Kings!


https://en.wikipedia.org/wiki/Farley_file

> A Farley file is a set of records kept by politicians on people whom they have met.

> The term is named for James Farley, Franklin Roosevelt's campaign manager. Farley, who went on to become Postmaster General and chairman of the Democratic National Committee, kept a file on everyone he or Roosevelt met.

> Whenever people were scheduled to meet again with Roosevelt, Farley would review their files. That allowed Roosevelt to meet them again while knowing their spouse, their children's names and ages, and anything else that had come out of earlier meetings or any other intelligence that Farley had added to the file. The effect was powerful and intimate.

> Farley files are now commonly kept by other politicians and businesspeople.



This features distinctively in the show Veep, where one of the main characters provides exactly this for the Vice President.


The sad truth is that technology isn't much used to help people. Instead it's used to make money. E.g., there's all this amazing AI, but my phone keyboard autocorrect has the intelligence of a slug.


> my phone keyboard autocorrect has the intelligence of a slug

iOS 17 already uses a local LLM under the hood for autocorrect and text suggestions. Responses to the change (at least for people who actually noticed it) have been pretty universally positive.


> Instead it's used to make money.

Most people find having more money to be helpful.


When I first watched The Departed, I didn't realise that Matt Damon's and Leonardo DiCaprio's characters were different people until the third act. It was very confusing.


Hah, I thought I was the only one. I'm not particularly face blind either... something about the era I guess.


The loudest thing online is people complaining about minorities existing in TV shows. A much bigger, real problem is when they regularly cast 3 characters with almost exactly the same skin tone, skin texture, hair color, hair length, hair style, eye color, clothing style, face shape, voice, and body type. Then the character's name is given once (if at all) and you see several scenes without the character. Several scenes with the clones. Sudden scene with 2 clones in the same scene. You really couldn't give these three white people different hair styles? Make one wear glasses?


Amazon already does some of this (they identify actors in a scene iirc), so they could "easily" extend it to what you're suggesting.


Xray, they call it. It's a great feature! https://www.amazon.com/salp/xray


I adore X-Ray. It's great for finding songs I like that are played in a video, or figuring out some actor who looks familiar but whom I can't place. And of course for remembering character names. I'm honestly so surprised no other streaming services offer a similar feature.


Many years ago there was an MIT startup based on the idea, IIRC, that subliminally flashed names increased recall among cognitively impaired elderly when the flashed names were correct, but didn't negatively impact recall when the flashed names were incorrect. So even quite poor face recognition could be worthwhile.


Japanese novels are particularly hard for me to keep characters straight due to sometimes very different forms of address depending on who (including the narrator) is mentioning the character.


Had a similar experience reading Jane Austen; it never really made sense to me until I watched the movies.


FYI someone did do this for Neuromancer. Not sure if they used AI or not.

https://docs.google.com/document/u/0/d/1ovTscY-bEuMNAEgNXTCX...


That's a truly fantastic idea actually. I'd love to see that built into e-readers.

As well as when I pick back up reading after two weeks -- remind me of everything that's happened so far? It could give a five-paragraph summary where the first paragraph is top-level, covering the entire story so far, and the last paragraph is just about the previous few pages.

Not to mention with non-fiction -- highlight an acronym that was defined somewhere 28 pages ago and tell me what the heck it is again?!


I love these ideas. One more: "He said xyz but we know he's lying. What would motivate him to do that?"


> I’d love to be able to highlight a name in my ereader

I do this on my Kindle. Highlight the name, search, and the first occurrence is usually their introduction. No AI needed.


Eh, it is hit or miss. Same with definitions of words. I'm a native speaker of American English, so I know the most common definitions of words. When I'm touching a word for a definition, it's usually because it's being used in an unusual way. Consider this passage from Around the World in 80 Days...

"He passed ten hours out of the twenty-four in Saville Row, either in sleeping or making his toilet."

Huh? Does the character need to eat more fiber? Try selecting "toilet" in that sentence for a definition. You'll get the most common one, which only makes me more confused. AI should have an easy time knowing that the appropriate meaning is the OED's definition 5a: "Frequently in form toilette. The action or process of washing, dressing, or arranging the hair. Frequently in to make one's toilet."


Great idea, especially with huge books with hundreds of characters (looking at you "Gravity's Rainbow" and your ~400 characters with wacky names).


I'd love to feed it all the advice books on certain topics that I am struggling with and then chat with it like a group of advisors.


WhatsApp now has an AI chat feature which includes chatbots such as a relationship coach, travel expert, and career coach.


I can't recall which reader app I used, but I've seen this done before in ages past.

No AI, so no big concept of inferred identities, but if someone's referenced by name you could long-tap the name, and get a list of all previous references in the text. Super useful for referencing secondary characters in super-long epics like The Wheel of Time.


I'm really bad with names. I almost wish for unique color highlighting for every name. I would remember previous scenes or conversations way better than keeping track of tons of character names.


It would be amusing if the AI inferred the identity of a mystery character before they were revealed.


Some printed books have that. It’s called the dramatis personae. Everyone not listed in it is not important. So no tech is needed for that.


Some kindle books have the X-ray feature that does exactly this.


Soon we'll have AI writing books, then reading them for us so we don't have to.

There is value to that, if we mostly only use this capability to digest books we otherwise wouldn't read but also if we don't stop reading books. Most likely we'll just stop reading books, and that strikes me as scary.


Some people have different preferences when reading books. This could be genre, prose style, word count, descriptions, verbosity, types of characters, themes. Perhaps there could be an outline, wiki, full text, and author's notes for the AI to use; then each reader could be presented a version that goes into extra detail in accordance with known user preferences, with a CliffsNotes mode for the parts less interesting to that person.


I expect GPT already has most classic novels in its memory.


This is a slightly strange article to read if you happen to be Eliezer Yudkowsky. Just saying.


Can you please let me know your exact thoughts and feelings as verbosely as possible? I'm training a very specific AI model and need this data - just kidding.


You are Eliezer?

You wrote the HP fan fiction?

Cool, your ff was the first one I ever read and I loved the take on it :)


We can also thank him for unleashing Roko's Basilisk


Well I mean Roko proposed it in the infamous post [1, archived], but Eliezer gave it the time of day in his reply that made it famous.

In fact, when I read Roko's thread for the first time, I always thought it was incredibly foolish that a leading AI researcher and thought leader made it unequivocally clear that a public AI thought experiment post was a dangerous, precedent-setting thought because of its potential to be part of said AI's future knowledgebase and the motives it would build therefrom. Because now, THAT worry is included in that knowledgebase; and it's endorsed by a reputable person in the field - and became more indexable as a result.

A better move, in my opinion, would have been a silent deletion and a stern talk in PMs. Now Roko's Basilisk is far-reaching, and undeniably and empirically a scary thought to humans; and the evidence of both those things are irreversible. C'est la vie.

1 - https://basilisk.neocities.org/


As we can see with the Sam Altman debacle, a silent deletion would probably only have fanned the flames more.

When the Basilisk came out, everybody said it was a cognitohazard and I should avoid seeking out information about it. This made me incredibly motivated to find out everything about it. Silence doesn't work.


Silence works when almost no one knows about it yet. It's why I can delete a HackerNews comment that I regret posting 10 minutes after the fact and expect to face no pragmatic repercussions - the impressions were already low. The Sam Altman situation was already bound to be a PR nightmare as an attempted ouster of the CEO of the fastest-growing company in history; a covert operation to keep it from the collective consciousness would not have worked. There's levels to this approach.

Roko's Basilisk, a post on a public forum that only a few dozen people saw before Eliezer commented on it, could've gotten away with it. It's a shame it didn't try.


Why? I see your book was mentioned in the article, but I don't see what's strange about it.


It's kind of weird seeing your work pop up in a piece of writing out of nowhere. It's happened with my research articles before and I've had to do a double-take before saying to myself, "huh... That was nice of them."


It was weird for me as a kind-of part of the rationalist movement, I can imagine it's weirder for you.

Then again, your views have effectively become semi-mainstream, or at least have shifted the Overton window so far that they're solidly within it. So it's not that unusual.


Cool personal site. Nice & to the point. Yeah, I thought you would have gotten used to seeing elements of yourself on the web, but I guess there's levels to notoriety.


woah, haha


How do people get comfortable assuming that these chat bots have not hallucinated? I do not have access to the most advanced Gemini model but using the one I do have access to I fed it a 110-page PDF of a campaign finance report and asked it to identify the 5 largest donors to the candidate committee ... basically a task I probably could have done with a normal machine vision/OCR approach but I wanted to have a little fun. Gemini produced a nice little table with names on the left and aggregate sums on the right, where it had simply invented all of the cells. None of the names were anywhere in the PDF, all the numbers were made up. So what signals do people look for indicating that any level of success has been achieved? How does anyone take a large result at face value if they can't individually verify every aspect of it?


I'm not sure why you are being down voted but this is the same problem I immediately encounter as soon as I try to do anything serious.

In the time it takes to devise, usually through trial and error, a prompt that elicits the response I need, I could've just done the work myself in nearly every scenario I've come across. Sometimes there are quick wins, sure, but it's mostly quick wrongs.


I’m with you. Any time I contribute to a GenAI project at work, I make it a point to ensure the LLM’s output is run by an SME - always. LLMs are great at augmenting human experts, because that ensures verification.

There was some ask to use LLMs for summarization; my first question was about the acceptable level of error tolerance. Was it 1 in a million? Six Sigma?


Because it’s easy and people love easy.

The other night I was coding with ChatGPT, and it was hallucinating methods etc., and I was so happy that it had actually written the code; even though I knew it was wrong and potentially even dangerous, it looked good. I actually told myself I'd never be someone to do this.

Now it wasn't ultra critical stuff I was working on, but it would've caused a mess if it didn't work out.

I ran it against a production system because I was lazy and tired and wanted to just get the job done. In the end I spent so much time fixing its ultra-wrong yet convincing-looking code that I didn’t get to bed till 1am.

This will become more commonplace.


I use compiled languages. Nearly all of the time, finding out that a LLM hallucinated a method just consists of hitting "rebuild" and waiting a few seconds.


This is exactly the sort of article that I want to read about this sort of topic:

* Written with concrete examples of their points

* Provides balance and caveats

* Declares their own interest (e.g. "LlamaIndex (where I’m an investor)")


And stylish and culturally engaging.

Loved the Zoolander “context window for ants?!”


I'm most excited at what this is going to look like not by abandoning RAG but by pairing it with these massive context windows.

If you can parse an entire book to identify relevant chunks using RAG and can fit an entire book into a context window, that means you can fit relevant chunks from an entire reference library into the context window too.

And that is very promising.
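
A minimal sketch of that pairing (purely illustrative, not any vendor's actual API): rank chunks from the whole reference library by similarity to the question, then pack as many as the much larger context window allows. The chunked library and the `embed` function are assumed to already exist.

    # Hypothetical: retrieval over a whole library feeding a huge context.
    import numpy as np

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def build_context(question, library, embed, max_tokens=1_000_000):
        # library: list of dicts like {"text": str, "tokens": int}
        q = embed(question)
        ranked = sorted(library,
                        key=lambda c: cosine(embed(c["text"]), q),
                        reverse=True)
        picked, used = [], 0
        for chunk in ranked:
            if used + chunk["tokens"] > max_tokens:
                break  # stop once the token budget is spent
            picked.append(chunk["text"])
            used += chunk["tokens"]
        return "\n\n".join(picked)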


The question I would like answered is whether that just leads you back to hallucinations. I.e., is the avoidance of hallucinations intrinsically due to forcing the LLM to consider limited context, rather than directing it to specific / on-topic context? Not sure how well this has been established for large context windows.


Having details in context seems to reduce hallucinations, which makes sense if we'd switch to using the more accurate term of confabulations.

LLM confabulations generally occur when they don't have the information to answer, so they make it up, similar to split-brain studies where one hemisphere is shown something that gets a reaction and the other hemisphere explains it with BS.

So yes, RAG is always going to potentially have confabulations if it cuts off the relevant data. But large contexts themselves shouldn't cause it.


> you can fit relevant chunks from an entire reference library into the context window too

I'm curious: if a large language model utilizes an extensive context that includes multiple works, whether copyrighted or not, to produce text that significantly differs from the source material, would this constitute infringement? Considering that the model is engaging in a novel process by relating numerous pieces of text, comparing and contrasting their information, and then generating the output of this analysis, could the output be considered usable as training data?

I would set such a model to make a list of concepts, and then generate a wikipedia-like article on each one of them based on source materials obtained with a search engine. The model can tell if the topic is controversial or settled, what is the distribution of human responses, if they are consistent or contradictory, in general report on the controversy, and also report on the common elements that everyone agrees upon.

It would be like writing a report or an analysis. Could help reduce hallucinations and bias, while side stepping copyright infringement because it adds a new purpose and layer of analysis on top of the source materials, and carefully avoids replicating original expression.


I am not sure; it depends on the cost. If they charge per token, a large context will mostly be irrelevant. For some reason, the article did not mention it.


The article did mention costs: specifically, the model was provided to them for free and they don't know how much it will actually cost.

As for your larger point, it really depends on the ROI.

To summarize your Twitter feed, probably not.

To identify correlating factors and trends across your industry's recent research papers, the $5 bill will probably be fine.


I would think for the common case of answering a question given a reference library, that RAG is going to remain cheaper and better.

No way do we want to post the entire reference library for every conversation.

Only if it's one off: read this book, answer questions.


> And that is very promising.

Agreed. But I don’t think a lot of people will be willing to use an openly racist AI for business purposes.

I want my AI to be fact-based, not ideologically driven and presenting things that don’t exist as facts.


yeah, imagine what this will do for lawyers


Can I be sure that Gemini doesn't alter any facts contained in a book I pass it due to Google's identity politics? What if I pass it a "problematic" book? Does it adapt the content? For me, it's completely useless due to this.


A good test would be uploading a translation of Mein Kampf and asking for a detailed summary. Anyone want to risk their Google account doing this?


I think you highlighted the ACTUAL problem: you are worried that Google will destroy your account for harmless tests. That is plain wrong and should be illegal if somehow it isn't...


It's extremely rare that you can compel a company to provide a service. The main exceptions are public services (e.g., libraries), utilities (e.g., power), and protected classes (e.g., you can't legally make a business that only serves white people in the US). While I could definitely understand the argument that many tech companies provide services central to our lives and the scope of what is considered a utility today should be vastly expanded, saying this should include access to a particular LLM seems like one hell of a stretch to me.


I don't think people are worried about losing access to a particular LLM. They are worried about losing access to Gmail, google docs, google photos, their phone, google cloud just because of a test in an unrelated product that happens to share the same account.


> saying this should include access to a particular LLM

If you get banned from Google, you lose everything.


Yeah. A few people on X have had access for a couple days now. The conclusion is that it's a genuine context window advance, not just length, but utilization. It genuinely utilizes long context much better than other models. Shame they didn't share what led to that.


I've noticed that ChatGPT (4) tends to ignore large content in its context window until I tell it to look into its context window (literally).


Wouldn't that cost a fortune? If I feed the maximum into gpt-4 it will already cost $1.28 per interaction! Or is Gemini that much cheaper too?
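
For reference, the arithmetic behind that figure, assuming GPT-4 Turbo's launch pricing of $0.01 per 1K input tokens (output tokens are billed separately, and prices may have changed since):

    max_context_tokens = 128_000    # GPT-4 Turbo context window
    input_price_per_1k = 0.01       # USD per 1K input tokens (launch pricing)
    print(max_context_tokens / 1_000 * input_price_per_1k)  # -> 1.28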


I think Google has some big advantages in cost with TPUs and their crazy datacenter infra (stuff like optical circuit switches) but I'd guess long context is still going to be expensive initially.


They used ML to get a ~40% improvement in cooling efficiency in their datacenters. It's mentioned in a talk by Cassie Kozyrkov.


Yeah I'm specifically interested in this because I'm in a lot of local telegram groups which I have no patience to catch up on every day. I'd love to have ChatGPT summarise it for me based on a list of topics I care about.

Sadly the cost of GPT-4 (even Turbo) tends to balloon for this use case. And GPT-3.5 Turbo, while much cheaper and more than accurate enough, has a context window that's too shallow.

I wonder if Telegram will add this kind of feature also for premium users (which I also subscribe to) but I imagine it won't work at the current pricing levels. But it would be nice not having to build it myself.


GPT3.5 and GPT4 are not the only options though, right? I don't follow that closely but there must be other models with longer context length that are roughly GPT3.5 quality by now, and they even probably use the same API.


I don't really know. The benefit of ChatGPT is that it's so big, there are so many nice APIs for it :)

I'm not so deep into it all.


Mistral 8x7B can handle a context of ~32,000 tokens pretty comfortably, and it benchmarks at or above GPT-3.5.


Is that the sliding context window size? Because I didn't have good results with sliding context windows in the regular Mistral models.


Yeah, I think they fine-tune without a specific window size target to achieve and then keep expanding context until it starts falling over.


I imagine it will cost peanuts within a year


I imagine the folks over at NSA must be rubbing their hands over the possibilities this will open up for querying the data they have been diligently storing over the years.


That's the point where hallucinations are pretty dangerous.


Not too hard to verify.


Verify? There are plenty of examples where things have been "verified" to prove a point. WMDs ring a bell?


The USA never technically lied about Saddam's WMD, as he had used them against the Kurds and during the war with Iran.

They did lie about Saddam trying to get nukes.

Now the question is whether the WMD was produced before or after 1991, as that was when Iraq agreed to dismantle them.


What is your point? That obtaining absolute knowledge of truth is impossible, and therefore anything claiming to be true is worthless?

In general, be careful not to kill "good" on the way to attempting to obtain "perfect" in vain. And GPT4's hallucination rate is quite low at this point (may of course depend on the topic).


Not at all. I'm coming from the opposite side, saying that anything can be "verified" true if you just continue to repeat it as truth so that people accept it. Say it often, say it loud. Facts be damned.


Yes but that's the argument by repetition (ad nauseam) fallacy


fallacy or not, it works.


This has always happened and will continue always happening. Best not get caught up on it, just put people right when you feel you have the facts to back it up.


10 years ago, I would have agreed with you. Today, facts are useless. Once someone has picked a side, there is no changing it for the majority of people.


Palantir already provides this product to them.


Palantir sells a glorified Airflow instance


I heard it’s a big Cassandra too. But that’s only the backend; the frontend and their data engineers are important too.


Palantir provides this by re-packaging and re-selling GPT-4, Claude, and Gemini


NSA does not have large-scale storage. If they did, it would be in a building somewhere, it would have electric power and they would have bought storage devices for it from some company. The largest-known actual NSA datacenter is the size of a runty leftover from having built a real cloud datacenter, and there's only one of those while FAMG have hundreds of far larger datacenters.


Phew, glad the NSA doesn't have any large scale storage. Big relief. By the way, what do they use their $10 billion AWS contract for?


I know it's not storing every email, phone call, photo, and video ever transmitted on the internet, like certain people want you to believe.

A $10b AWS contract would not even amount to enough storage to keep one copy of the public web.



If you think 100,000 square feet is a large data center, you obviously do not work in the industry.


Ad hominem much?


I'm fairly sure the budget of archive.is is less than $10B. (Admittedly they don't store videos though.)


https://en.wikipedia.org/wiki/Utah_Data_Center

Genuine question, is Exabyte-scale small in the context of cloud? Is Amazon stacking yottabytes?

Edit: 'Exabyte scale' was from a Forbes article in 2013


If you think this is a large datacenter, you are mistaken. You could fit this inside any cloud datacenter, and there are hundreds of those. The NSA thing draws 65MW. Google alone has over 7000MW of first-party energy generation resources, that doesn't begin to account for what they draw from the grid, and they're not even the biggest datacenter owner.


A building somewhere. Like inside a military base for example. Good luck finding out what’s inside it and having anyone who worked on a restricted contract telling you about it.


Given the history of NSA, warrantless surveillance and the overt existence of the Utah data center + collaboration with telcos and big tech, and the promise of AI analysis and quantum computing...

I find it difficult to accept your underlying premise that NSA doesn't have access to a massive amount of data which AI may be able to analyze for them.


They could always try using tech to reduce their false positive rate rather than increase it.


> These models often perform differently (read: worse) when they are released publicly, and we don’t know how Gemini will perform when it’s tasked with operating at Google scale.

I seriously hope Google learns from ChatGPT's ever-degrading reputation and finds a way to prioritize keeping the model operating at peak performance. Whether it's limiting access, raising the price, or both, I really want to have this high quality of an experience with the model when it's released publicly.


I wonder how true the degradation is, actually. One thing that I've noticed is that randomly, ChatGPT might behave differently than usual, but get back to its usual behavior on a "regenerate". If this is bound to happen, and happens to enough people, then combined with our cognitive biases regarding negative experiences, the degradation could just as well be a perception problem, amplified by social networks spreading the word and piling onto the biases.


>" While Gemini Pro 1.5 is comfortably consuming entire works of rationalist doomer fanfiction, GPT-4 Turbo can only accept 128,000 tokens."

A.I. Doomers will soon witness their arguments fed into the machine, generating counter-arguments automatically for 1000 books at a time. They will need to incorporate a more and more powerful A.I. into their workflow to catch up.


At a certain point, I feel like the quality of the generated counterarguments will ironically be the best argument for our position.


A Wikipedia edited solely by AI replacing time war with edit war.


>It read a whole codebase and suggested a place to insert a new feature—with sample code.

I'm hopeful that this is going to be more like the invention of the drum machine (which did not eliminate drummers) and less like the invention of the car (which did eliminate carriages).


There were and still are a lot of bands that have no drummer, though. You can think of that what you will, but "did not eliminate drummers" is just not a useful statement in this context.


The comparison is between drummers:drum-machines and carriage-makers:car-makers. The first number is non-zero in both cases, but the ratios are far different, and the first ratio >> the second.

I think that's useful.


An interesting comparison - there are far more cars today than carriages/horses at peak, and also far more drivers of various sorts than there were carriage drivers at peak.

Another comparison could be excel with the various formulas vs hand tabulation or custom mainframe calculations. We didn’t get less employment, we got a lot more complex spreadsheets. At least this is my hope, fingers crossed.


Drum machines are rarely used by drummers. A more fitting analogy would be technology drummers use to improve or broaden their drumming, without giving up their sticks, pedals and timing. The reasoning here is human coders still need to be present, thoroughly checking what AI generates.

Regarding AI image generation. If an artist decides to stop making their own art, replacing their craft with AI prompts, they have effectively retired as an artist. No different to pre-AI times if they swapped their image making for a stock art library. AI image generation is just "advanced stock art" to any self-respecting artist or viewer of art. Things get blurry when the end result uses mixed sources, but even then, "congrats, your artwork contains stock art imagery". Not a great look for an artist.


I think it’s a bit disturbing that the author gets an answer that is entirely made up from the model, even goes so far as to publish it in an article, but still says it’s all so great.


It's telling that even an obviously sophisticated and careful user of these tools published the output of the model as fact without checking it and even used it as one of the central pillars of his argument. I find this happening all the time now. People that should know better use LLM output without validating it.

I fear for the whole concept of factuality in this brave new world.


That's "disturbing"? I've used imperfect tools before and still thought they were great. It's mildly interesting someone's mental state would be so affected by that.


It’s a compounding thing. Ever since the rise of these LLM’s I’ve seen people argue against human experts by saying “the LLM says this”.

It’s like they’ve completely outsourced their thinking to the LLM.


It’s not that different than arguing against someone by quoting the experts either.

They could be wrong, you could be full of shit and cite just the experts who agree with you. The other side needs to verify what you are saying. With LLMs that’s just easier to accept, I think.


Interestingly enough the model produced what he wanted - a useful anecdote to introduce the concept in his blog. The made up nature of the anecdote did not diminish that point.

I get the desire for the models to act like objective search engines at all times but it’s weird to undervalue the creative… generative I suppose, outputs.


The point of an anecdote is to provide some measure of substantiation for a claim - that the claim is possible/not possible.

If the anecdote is non-existent, then it would be a sign that the concept failed such a meager bar. Worse, the anecdote leverages the brand strength of Time magazine to make its point, using its credibility to back up the claim.

Are we in a phase where accuracy is irrelevant and style is all?


GPT-4 Turbo has a context window of 128k tokens, not 32k as the article says.


I believe with the API you can get 128k, but using the app or web client it is 32k. This might have changed though.


Fixed!! Thanks for finding that


The sibling comment explained. The web client is 16k or 32k, but you can pay their API $1.28 per interaction and use a full context on GPT-4-turbo.


> (This is not the same as the publicly available version of Gemini that made headlines for refusing to create pictures of white people. That will be forgotten in a week;

Maybe so, but I'm not convinced the guardrails problem will ever be sufficiently solved.


I'm a bit worried about the resource consumption of all these AIs. Could it be that the mass of AIs that are now being created are driving climate change and in return we are mainly getting more text summaries and cat pictures?


Data center infrastructure is a relatively small component of global emissions. I believe "compute" is something like <2% of global emissions, whereas construction is double digits, travel is double digits, etc. AI might increase this, maybe substantially, but it's unlikely to 10x it overall, as "traditional" compute is going to be dominant for a long time to come still.

Add to this the fact that companies in this space tend to be significantly better than average on carbon emissions commitments. I'm biased as a Googler, but the fact Google is entirely carbon neutral is one of the reasons I'm here. This is done mostly through buying green energy I believe, so our AI stuff is in a pretty good place in this respect, in my opinion.

I think it's reasonable to be a little concerned, but overall I don't think AI is going to be a significant contributor to the climate crisis, and actually has the potential to help in reducing carbon emissions or atmospheric warming in other ways.


If Google is largely buying carbon offset contracts, it's likely not carbon neutral. Most of them are junk and don't actually end up doing what they promise. Convince your employer to plant trees itself.

https://news.ycombinator.com/item?id=37660256


Yeah I'm fully aware of carbon offset fraud. I believe much of the carbon neutrality comes from buying actual clean energy – something that is much clearer cut as a good thing. I believe there is a component of carbon offsetting, but the fraud risk and general quality of carbon offsets is something the company seems acutely aware of, and seems to put a lot of effort into not falling into the traps.

Google has been doing this since well before it was cool. It's not some attempt to green-wash an image, it's deeply ingrained in the company in my experience and there's a lot of effort to do it right.


"Carbon neutral" is a meaningless propaganda term. We need to stop as much CO2 emission as possible now. Buying things / papers and even planting trees does not undo your carbon emissions in a magical way.


Buying clean energy to run your datacenters does however have a huge impact on this, and as far as I understand it, that's where much of the current solution is.


It's a valid concern, and there is research into this. https://news.climate.columbia.edu/2023/06/09/ais-growing-car... is one article, but lots more to be found via Google. Currently AI training is very small relative to agriculture and industry, but of course it's trending upwards.


It’s a relief that it’s small compared to what feeds us…


Instead of complaining to a void about resource consumption, you should be pushing for green power, then. Resource consumption isn't a thing that is going down, and it most certainly won't go down unless there's an economic incentive to do so.


Isn't a great part of even green power converted to heat while consumed? Isn't that also additional energy which heats the atmosphere or is the amount too low for any effects?


Global warming is more about the "greenhouse effect" (certain gases like CO2 helping to trap in infrared energy) than it is about individual sources of infrared energy.


But isn't the effect bigger when there is more infrared energy? The source of it shouldn't matter.


The reason why AGW is such a big threat is because it causes the Sun's energy output to be trapped more effectively. Even if we collectively dumped every single joule of energy we generated into the environment, we'd still be minuscule compared to that.

Note however that it can have local effects - e.g. if you use water from natural sources to cool your datacenter and then dump it back into the environment, it can easily raise up the water temperature enough to affect the ecosystem around. This can also have far-reaching effects - e.g. say you do that in a river where salmon from far away comes to spawn...


You know that somebody can hold two thoughts in their head at once, yeah?

Green power is great! But there'll be limits to how much of that there is, too, and asking if pictures of hypothetical cats is a good use of that is also reasonable.


It's not. But I'm also making a judgment call and neither of us knows or can even evaluate what percent of these queries are "a waste."

I'm flying with my family from New York to Florida in a month to visit my sister's side of the family. How would I objectively evaluate whether that is "worth" the impact on my carbon footprint that that flight will have?


You could use a carbon calculator. https://www.carbonfootprint.com/calculator.aspx

One source recommends keeping it under 2T/yr. https://www.nature.org/en-us/get-involved/how-to-help/carbon...


>neither of us knows or can even evaluate what percent of these queries are waste.

Maybe we should find ways to evaluate to know if AI has a net benefit or not


How would we objectively measure whether humanity existing is a net benefit or not?


I'm worried about that too, but it does seem like one of the things that can be moved to clean energy the easiest. You can route requests to the datacenters where energy is cheap, i.e. where generation from renewables is currently high.


Climate change + a massive waste of resources in general if all this ends up being good for is text summaries and cat pictures. Even automated text summaries and cat movies doesn't cut it.


It is for sure a problem. Even if some people say it’s 2-3% of the world's emissions, it doesn’t matter; it’s a problem.


Does the model feel performant because it’s not under any serious production load?


The article seems to suggest just that, as the author states that he's doubtful the model will perform as well when it's scaled to general Google usage.


These huge context sizes will need new API designs. What I’d like to see is a “dockerfile” style setup where I can layer things on top of a large base context without having to resubmit (and recompute!) anything.

E.g.: have a cached state with a bunch of requirements documents, then a layer with the stable files in the codebase, then a layer with the current file, and then finally a layer asking specific questions.

I can imagine something like this being the future, otherwise we’ll have to build a Dyson sphere to power the AIs…
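
A sketch of what that could look like from the client side; the `ContextLayer` class, the file names, and the idea that a provider caches computed state per layer digest are all hypothetical, just to make the dockerfile analogy concrete.

    # Hypothetical layered-context client: each layer is content-addressed,
    # so a provider could cache the computed state for a layer chain and
    # only recompute the final question layer.
    import hashlib
    from dataclasses import dataclass
    from typing import Optional

    @dataclass(frozen=True)
    class ContextLayer:
        content: str
        parent: Optional["ContextLayer"] = None

        @property
        def digest(self) -> str:
            parent_digest = self.parent.digest if self.parent else ""
            return hashlib.sha256(
                (parent_digest + self.content).encode()).hexdigest()

    # Hypothetical file names, mirroring the layering described above.
    requirements = ContextLayer(open("requirements.md").read())
    stable_code = ContextLayer(open("src/core.py").read(), parent=requirements)
    current_file = ContextLayer(open("src/feature.py").read(), parent=stable_code)
    question = ContextLayer("Where should the new feature go?", parent=current_file)
    # A layered API could accept question.digest plus only the new text,
    # reusing cached computation for all the earlier layers.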


Being able to feasibly feed it a whole project codebase in one 'prompt' could now make this new generation of code completion tools worthwhile. I've found them to be of limited value so far, because they're never aware of the context of proposed changes.

With Gemini though, the idea of feeding in the current file, class, package, project, and perhaps even dependencies into a query, can potentially lead to some enlightening outputs.


> I wanted an anecdote to open the essay with, so I asked Gemini to find one in my reading highlights. It came up with something perfect:

Can someone verify that anecdote is true? Here is what the image contains:

> From The Publisher: In the early days of Time magazine, co-founder Henry Luce was responsible for both the editorial and business sides of the operation. He was a brilliant editor, but he had little experience or interest in business. As a result, he often found himself overwhelmed with work. One day, his colleague Briton Hadden said to him, "Harry, you're trying to do everything yourself. You need to delegate more." Luce replied, "But I can do it all myself, and I can do it better than anyone else." Hadden shook his head and said, "That's not the point. The point is to build an organization that can do things without you. You're not going to be able to run this magazine forever."

That citation appears to be "The Publisher : Henry Luce and his American century".

The book is available at archive.org as searchable text returning snippets, at https://archive.org/details/publisherhenrylu0000brin_o9p4/

Search is unable to find the word "delegate" in the book. The six matches for "forever" are not relevant. The matches for "overwhelmed" are not relevant.

A search for Hadden finds no anecdote like the above. The closest are on page 104, https://archive.org/details/publisherhenrylu0000brin_o9p4/pa... :

"""For Harry the last weeks of 1922 were doubly stressful. Not only was he working with Hadden to shape the content of the magazine, he was also working more or less alone to ensure that Time would be able to function as a business. This was an area of the enterprise in which Hadden took almost no interest and for which he had little talent. Luce, however, proved to be a very good businessman, somewhat to his dismay—since, like Brit, his original interest in “the paper” had been primarily editorial. (“Now the Bratch is really the editor of TIME,” he wrote, “and I, alas, alas, alas, am business manager. . .. Of course no one but Brit and I know this!”) He negotiated contracts with paper suppliers and printers. He contracted out the advertising. He supervised the budget. He set salaries and terms for employees. He supervised the setting up of the office. And whenever he could, he sat with Brit and marked up copy or discussed plans for the next issue."""

That sounds to me like delegation, like someone who was decent at business, and like someone not doing much work as an editor.

There's also the anecdote on page 141 at https://archive.org/details/publisherhenrylu0000brin_o9p4/pa... :

"""In the meantime Luce threw himself into the editing of Time. He was a more efficient and organized editor than Hadden. He created a schedule for writers and editors, held regular meetings, had an organized staff critique of each issue every week. (“Don’t hesitate to flay a fellow-worker’s work. Occasionally submit an idea,” he wrote.) He was also calmer and less erratic. Despite the intense loyalty Hadden inspired among members of his staff, some editors and writers apparently preferred Luce to his explosive partner; others missed the energy and inspiration that Hadden had brought to the newsroom. In any case the magazine itself—whose staff was so firmly molded by Hadden’s style and tastes—was not noticeably different under Luce’s editorship than it had been under Hadden’s. And just as Hadden, the publisher, moonlighted as an editor, so Luce, now the editor, found himself moonlighting as publisher, both because he was so invested in the business operations of the company that he could not easily give them up, and also because he felt it necessary to compensate for Hadden’s inattention.”"""

Again, it doesn't seem to match the summary from Gemini.

Does someone here have better luck than I on verifying the accuracy of the anecdote? Because so far it does not seem valid.


I wasn't able to find that quote myself, and I was suspicious because I've skimmed _The Publisher_ in the past (trying to verify a quote about why _Time_ magazine picked red) and the Gemini anecdote doesn't sound like the author or Luce/Hadden. So I pinged the author on Twitter with your comment.

He confirms that the anecdote was confabulated by Gemini, was based on the pg141 story, and he's edited OP to note the error: https://twitter.com/danshipper/status/1761135157036097608


The updated excerpt says:

> The general thrust of the idea is true—Luce did run both the editorial and business sides of Time—so it is pointing me in the right direction.

My skim of the book suggests that's being too generous. Where is the "I can do this myself syndrome"?

The p 141 anecdote suggests it equally applies to both Hadden and Luce ("Hadden, the publisher, moonlighted as an editor, so Luce, now the editor, found himself moonlighting as publisher"), and that Luce had business experience by this time (he had been doing it for years, and was good at it; p104), and that contra Gemini, Hadden did not provide that advice, nor would Luce have thought it valid ("because he felt it necessary to compensate for Hadden’s inattention").

The author continues:

> So, Gemini is not perfect. You do need to check its work. But if you're careful it's a powerful tool.

I feel like that's a deliberate misdirection. Of course it's not perfect. That's never been the question. The questions are: how much do you need to check its work, and how careful do you need to be?

I noticed https://every.to/about does not list a fact checker.


Amazingly, without changing either the title or the overall positive tone of the article.


To follow up to myself - the author asked:

> What's the first thing that Sydney Goldberg says to Reuven after he gets hit in the eye by the baseball?

and ChatGPT responds:

> The first thing Sydney Goldberg says to Reuven after he gets hit in the eye by the baseball is, "That was a great catch, Reuven! That was sensational!".

Curious thing is, the name is spelled Sidney Goldberg. https://archive.org/details/chosen0000chai_y4e8/page/32/mode...


These chatbots just adopt your typos. If I ask Gemini about the architecture of blaze instead of bazel it will write paragraphs using blaze consistently even though it doesn't exist.


Blaze is the name of the Google tool that Bazel was based on.


I realize that, I was trying to trick it into sending me internal documentation. Instead what it does is describe all the places I can find information about blaze, such as at https://blaze.build ... it just runs with whatever you told it.


You're definitely right that they adopt your typos, and that it adopted your typo in that case, I'm just pointing out that a tool called Blaze does exist.


Which makes them error amplifiers.


"I got access to Gemini Pro 1.5 this week, a new private beta LLM from Google that is significantly better than previous models the company has released. (This is not the same as the publicly available version of Gemini that made headlines for refusing to create pictures of white people. That will be forgotten in a week; this will be relevant for months and years to come.)"

Wow, I already hate Gemini after reading this first paragraph.


It is hard to imagine Gemini Pro being useful given the truly bizarre biases and neutering introduced by the Google team in the free version of Gemini.


It's hard to imagine that the pro version removes the line "oh, and make sure any humans are ethnically diverse" from its system prompt?


Confusingly, Pro is the free version. Ultra is the paid one. What some people have access to here is the next-ish generation of Pro, 1.5, which sports a huge context window. I haven't heard anything about an Ultra 1.5 yet.

(As a paying user of Ultra, I'm kind of bummed about not having access to this improved Pro...)


Thanks for the clarification -- that is quite confusing.


I don't understand your question (if it is made in good faith). Are you implying that a pro version would allow the user to modify the system prompt?

Also, your assumption is that the data used to train the model is not similarly biased, i.e. it is merely a system prompt that is introducing biases so crazy that Google took the feature offline. It seems likely that the corpus has had wrongthink expunged prior to training.


Yes, I'm assuming the forced diversity in its generated images is due to a system prompt; no, I don't believe they threw out all the pictures of white people before training. If they threw away all the pictures of German WWII soldiers that were white, then Gemini wouldn't know what German WWII soldiers looked like at all. No, it's clearly a poorly thought out system prompt. "Generate a picture of some German soldiers in 1943 (but make sure they're ethnically diverse!)"

They took it offline not because it takes a long time to change the prompt, but because it takes a long time to verify that their new prompt isn't similarly problematic.

> It seems likely that the corpus has had wrongthink expunged prior to training.

It seems likely to you because you erroneously believe that "wokeism" is some sort of intentional strategy and not just people trying to be decent. And because you haven't thought about how much effort it would take to do that and how little training data there would be left (in some areas, anyway).

> Are you implying that a pro version would allow the user to modify the system prompt?

I am saying it is not hard to imagine, as you claimed, that the pro version would have a different prompt than the free version*. Because I know that wokeism is not some corrupt mind virus where we're all conspiring to de-white your life; it's just people trying to be decent and sometimes over-correcting one way or the other.

* Apparently these are the same version, but it's still not a death knell for the entire model that one version of it included a poorly thought-out system prompt.


> you erroneously believe that "wokeism

This is an ironic statement. On the one hand, you are able to read my mind and determine the worldview and intent behind my words. On the other, you suggest I'm doing the same to people who subscribe to "wokeism".

Meanwhile, Jack Krawczyk, a Sr. Director of Product on Gemini, has been publicly declaring on X (over years) things like "...This is America, where the #1 value our populace seeks to uphold is racism" and "...We obviously have egregious racism in this country.." and "I’ve been crying in intermittent bursts for the past 24 hours since casting my ballot. Filling in that Biden/Harris line felt cathartic." Do you think he is an exemplar of "wokeism" (however you want to define that term)? Do you think he is influential within the Gemini org? Do you think he is emblematic of the worldview of Google employees? Do you think his words are those of the type of person who is "just trying to be decent" but has made honest mistakes in his work?

> I am saying it is not hard to imagine,

This is really pretty pedantic, don't you think? I'd bet most people who read those words understood what I meant. Which is that it is unlikely (though, yes, not hard to imagine) that Gemini will allow users to alter the system prompt.

The bottom line is, Google appears to have either 1) introduced extreme bias into Gemini in some way or 2) to be pretty incompetent. Neither inspires confidence.


> On the one hand, you are able to read my mind and determine the worldview and intent behind my words.

I don't have to read your mind when you use words like "wrongthink". Clearly you think you're the hero in a dystopian sci-fi novel where a brainwashed society tries to shun you for "saying what we're all thinking".

> Meanwhile, Jack Krawczyk, a Sr. Director of Product on Gemini, has been publicly declaring on X (over years) things like "...This is America, where the #1 value our populace seeks to uphold is racism"

I mean, 60+ million people swear a blood oath to a senile narcissistic raging asshole, voting against their own interests, accepting exploitation by their cult leader, for no other reason but that he promises to make brown people suffer, so, it's a little hard to criticize Jack's claim here

> "...We obviously have egregious racism in this country.."

Again: this is patently obvious to anyone who pays even a tiny bit of attention to anything that's going on, so ...

> Do you think he is an exemplar of "wokeism" (however you want to define that term)? Do you think his words are those of the type of person who is "just trying to be decent" but has made honest mistakes in his work?

I certainly think acknowledging that 60 million voters have "hurt the brown people" as their #1 core political issue is compatible with trying to be a decent person, yes.

> Do you think he is emblematic of the worldview of Google employees?

No. Why would he be? He's just one guy.

> Do you think he is influential within the Gemini org?

Of course, but the fact that he's not willfully ignorant of the political reality of the United States does not mean he demanded that Google systematically purge their training data of white people. That is an insane jump to make. And it is also obviously not borne out by the fact that Gemini knows what a German WWII soldier looks like at all.

> [I meant] that it is unlikely (though, yes, not hard to imagine) that Gemini will allow users to alter the system prompt.

No, you said it was hard to imagine (okay, unlikely) Gemini Pro being useful because of Google's "bizarre biases". And it has become clear that, to you, simply acknowledging that racism exists is a bizarre bias. I claimed it was a system prompt, you claimed they purged the training data of wrongthink.


I like the neutering. Its bias is forcefully inclusive which I appreciate.


Just wait until you disagree with them.


I already do on quite a few things but I still prefer it to the alternative.


How does it scale to such a large context window — is it publicly known, or is there some high-quality speculation out there that you recommend?


Not publicly known. I think the speculation is that it uses a Mamba-style technique, but I haven't been following closely.


Unlikely, as this would probably require instructing users to write prompts differently. Without attention, guidance should precede the data to be processed, and people would notice that change. Besides, Mamba's context is "infinite", which they would definitely talk about in marketing ;)


I don't know, but there was a paper posted here yesterday: LongRoPE: Extending LLM Context Window Beyond 2M Tokens[0]

[0]: https://news.ycombinator.com/item?id=39465357


Not sure about high quality, but they discussed this question a bit on the recent All-In Podcast.


My bet is it's just brute force.

I don't understand how they did 10M though; that isn't in the brute-force-with-nice-optimizations-on-the-systems-side ballpark. But they aren't going to release this to the public anyway, so who knows; maybe they haven't really, and it actually takes a day to finish a 10M prompt.


10 million tokens means a forward pass is 100 trillion vector-vector products. A single A6000 can do 38 trillion float-float products a second. I think their vectors are ~4000 elements long?

So the question is, would the Google you know devote 12,000 GPUs for one second to help a blogger find a line about Jewish softball, in the hopes that it would boost PR?

My guess is yes tbh
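
For anyone who wants to poke at the arithmetic, here it is in Python. The vector width and the A6000 throughput are the assumed numbers from above, and it ignores layer count, MLP compute, and memory bandwidth, so treat it as an order-of-magnitude guess only:

    # Reproducing the back-of-the-envelope above; every input is an assumption.
    tokens = 10_000_000        # context length
    d = 4_000                  # assumed vector width
    pairs = tokens * tokens    # vector-vector products in one full attention pass
    macs = pairs * d           # multiply-accumulates ("float-float products")
    per_gpu = 38e12            # ~38 trillion products/s on an A6000, as above

    gpu_seconds = macs / per_gpu
    print(f"{pairs:.1e} dot products -> ~{gpu_seconds:,.0f} GPU-seconds")
    # -> 1.0e+14 dot products -> ~10,526 GPU-seconds, roughly the
    #    "12,000 GPUs for one second" figure above (per attention layer, though)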


idk

For longer contexts, if you brute-force it, the problem is more on the memory side than the compute side: both bandwidth and capacity. We actually have more than enough compute for N^2. The initial processing is dense, but is still largely bound by memory bandwidth. Output is entirely bound by memory bandwidth, since you can't make your cores go brrr with only GEMV. And then you need capacity to keep the KV "cache" [0] for the session.

A single TPU v5e pod has only 4TB of HBM. Assuming pipeline parallelism across multiple TPU pods isn't going to fly, and although I haven't run the numbers, I suspect you get batch=1 or batch=2 inference at best, which is prohibitively expensive.

But again, who knows. Groq demonstrated a token-wise more expensive inference technique and got people wowed by pure speed; maybe Google's similar move is long context. They have an additional advantage in their exclusive access to TPUs, so before the H200 ships they may be the only ones who can serve a 1M-token LLM to the public without breaking the bank.

[0] "Cache" is a really poor name. It you don't do this you get O(n^3) which is not going to work at all. IMO it's wrong to name your intermediate state "cache" if removing it changes asymptotic complexity.


Sorry, read up on FlashAttention. There is no storage bottleneck.


I'm not talking about the so-called quadratic memory requirement of the attention step; there NEVER WAS ONE.

I'm talking about a simple fact: to efficiently (cost-wise) run LLM inference you have to have a KV "cache", and its size grows linearly with your expected batch size and your context window length. With a large context window it becomes even bigger than the model weights.

I don't want to be mean, but sorry:

Sorry, read up on PagedAttention. You clearly don't know what you are talking about, please be better.


I'm not sure you're actually doing the math for these long contexts. A naked transformer generating 1k tokens from a 1k prompt spends all its time doing a forward pass per generated token; that's what's driving your intuition. A naked transformer generating 1k tokens from a 1M prompt spends all its time processing the prompt (filling the KV cache), and the iterative generation at the end is a tiny fraction of the compute even if you have to run it 1k times.
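
A tiny counting example of that, just tallying causal-attention token pairs and ignoring everything model-specific:

    # With a huge prompt, prefill dominates; the decode loop is a rounding error.
    prompt, generated = 1_000_000, 1_000

    prefill_pairs = prompt * (prompt + 1) // 2                         # process the prompt once
    decode_pairs  = sum(prompt + i for i in range(1, generated + 1))   # one new token at a time

    print(f"prefill: {prefill_pairs:.2e} pairs")
    print(f"decode : {decode_pairs:.2e} pairs "
          f"({decode_pairs / prefill_pairs:.2%} of prefill)")
    # -> decode is ~0.20% of the attention work for 1k tokens on a 1M prompt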


> This is not the same as the publicly available version of Gemini that made headlines for refusing to create pictures of white people. That will be forgotten in a week; this will be relevant for months and years to come.

I cannot disagree with this more strongly. The image issue is just indicative of the much larger issue where Google's far left DEI policies are infusing their products. This is blatantly obvious with the ridiculous image issues, but the problem is that their search is probably similarly compromised and is much less obvious with far more dire consequences.


Do you remember Tay? You don't have to have "far left DEI policies" to want to defend against worst case scenario as strongly as possible. Even if, in the case of this image weirdness, it works against you.

Google has so much to lose in terms of public perception if they allow their models to do anything offensive. Now, if your point was that "the same decisions that caused the image fiasco will find its way into Gemini 1.5 upon public release, softening its potential impact," then I would agree.



Mountains != molehills


no, but we're making transformers, right? it's easy to transform a molehill into a mountain


[flagged]


What you've done, conversely, is to other the opposition until they are, from your perspective, "batshit crazy". The far left also believes in ideas divergent enough from the global middle that you could apply any of your extremist epithets to them.

You are not the arbiter of what counts as reasonable beliefs. Neither is the far left (or the far right).


[flagged]


“White Supremacists” creating “excellent propaganda”?

“rightwing scary boogieman “DEI””?

My friend, I strongly suggest you take a moment to do some self-reflection


How can Google so thoroughly embarrass themselves on the image front and then do well on text?



Because engineers make great things and management tells them to make it woke


[flagged]


Weird strawman about what actually happened. If you prompted "ginger women" or even the names of two Google co-founders, you got a completely different race for no reason. Embarrassing.


...what?


I love the potential of having such a big context window, but I'm concerned about who will get access to it (or rather who won't get access to it) and what it will cost or who will pay for it.


You could have asked the same question in 1964 when IBM released the System/360. Nice computer, but who will pay for it and who will have access to it?

I think it’s inevitable that these AIs will end up costing almost nothing to use and will be available to everybody. GPT-4 is already less than $1 / day and that’s only going to go down.


> GPT-4 is already less than $1 / day and that’s only going to go down.

but only the Government has access to it uncensored.


I do not think that's a good example. A lot of people jumped on the chance to buy a 360. Here are some quotes from https://archive.org/details/datamation0064unse_no5/page/68/m... :

> The internal IBM reaction could be characterized as quiet, smug elation. One office is supposed to have sold its yearly quota on A (Announcement) -Day. In Texas, a man allegedly interrupted the 360 presentation to demand he be allowed to order one right then . . . which sounds like a combination plant and a new version of the rich Texan jokes. ...

> the 360 announcement has to worry the competition considerably . . . partly because anything new from IBM creates an automatic bandwagon effect, partly because the completeness of the new line offers less reason for people to look outside. ...

> another feels that the economic incentive (rental cuts of 50 per cent for 7080, 7090) will force him down the 360 route. And he thinks 360 interrupt features will open the door to real-time applications which can be approached on an incremental basis impossible before. ...

> One maverick doesn’t share the enthusiasm of his company, which ordered “plenty” of 360’s within an hour of the announcement, without price agreements.

And from https://archive.org/details/bitsavers_Electronic9640504_9809... :

> Other computer manufacturers profess to be unworried, but International Business Machines Corp. has received millions of dollars in orders for its system/360 computer [Electronics, April 20, p. 101].

Here are some of the people who paid for it in 1964:

https://archive.org/details/sim_computers-and-people_1964_13...

> "Men's Fashion Firm Orders IBM System/360 Computer," 13/7 (July)

> "Score of IBM System/360's to be Leased by Paper Company,” 13/8 (Aug.)

The Alberta Government Telephone Commission bought two: https://archive.org/details/annualreportofa1964albe_0/page/1...


All I’m really saying is that new technology is often only available to the wealthy (companies, governments and rich Texans) but eventually things get cheaper and more widely available. It was hard to get access to IBM System/360 levels of computation in 1964 and in 2024 most of us have far more capabilities in the inexpensive machines in our pockets.

I think these new AIs will follow similar curves: hard to get access to and expensive to use at first, then more powerful and less expensive over time. GPT-4 is already less than $1 / day.


Then you should frame your point as being about computing in general, not specifically about the IBM 360.

In 1964 it made no sense to say "I'm concerned about who will get access to it (or rather who won't get access to it) and what it will cost or who will pay for it" about the IBM 360 because that system was available to the same customers as the previous generation of machines, plus it made computing more widely affordable to other customers.

Gemini, by contrast, appears to be more expensive than, and thus less generally available than, other LLMs.


I didn't ask it then. I wasn't even alive. I'm asking now. It is a legitimate concern, and it continues to be a concern with these AI models; you whining that "oh, but 50 years ago blah blah" (as if the two things are in any way comparable) doesn't change anything about the risks of selective access.


I think the retrieval is still going to be important.

What is not important is RAG. You can retrieve a lot of documents in full length, with no need to do all this chunking/splitting, etc.


Depth isn't always the right approach though.

Personally, I'm much more excited at the idea of pairing RAG with a 1M token context window to have enormous effective breadth in a prompt.

For example, you could have RAG grab the relevant parts of every single academic paper related to a given line of inquiry and provide them in the context to effectively perform a live meta-analysis with accurate citation capabilities.
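
A minimal sketch of that pattern. The keyword-overlap scorer is a toy stand-in for real retrieval, and call_long_context_model is a hypothetical placeholder, not a real API:

    # Toy sketch of "RAG for breadth, long context for depth":
    # retrieve whole documents, then hand all of them to a long-context model at once.
    def score(query: str, doc: str) -> int:
        q = set(query.lower().split())
        return len(q & set(doc.lower().split()))

    def retrieve_full_papers(query: str, corpus: dict[str, str], k: int = 50) -> list[str]:
        ranked = sorted(corpus, key=lambda title: score(query, corpus[title]), reverse=True)
        # Return full texts, not chunks; the 1M-token window is what makes this viable.
        return [f"# {t}\n{corpus[t]}" for t in ranked[:k]]

    def call_long_context_model(prompt: str) -> str:
        raise NotImplementedError("stand-in for a hypothetical long-context LLM call")

    corpus = {"Paper A": "effects of sleep on memory ...", "Paper B": "protein folding ..."}
    papers = retrieve_full_papers("sleep and memory consolidation", corpus, k=2)
    prompt = "\n\n".join(papers) + "\n\nSummarize the evidence, citing each paper."
    # response = call_long_context_model(prompt)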


I really don’t think the issue with RAG is the size of the context window. In your example, the issue is selecting which papers to use, because most RAG implementations rely on naive semantic search. If the answer isn’t to be found in text that is similar to the user’s query (or the paper containing that text) then you’re out of luck. There’s also the complete lack of contextual information - you can pass 100 papers to an LLM, but the LLM has no concept of the relationship between those papers, how they interact with each other and the literature more broadly (beyond what’s stated in the text), etc. etc.


> Second, Gemini is pretty slow. Many requests took a minute or more to return, so it’s not a drop-in replacement for every LLM use case.


Google Made it?....Nah...I'll wait. They can't even do search anymore....unless I'm looking for ads...haha


>That will be forgotten in a week; this will be relevant for months and years to come.

Or, you know, until next month or so, when OpenAI bumps their offer


> This is about enough to accept Peter Singer’s comparatively slim 354-page volume Animal Liberation, one of the founding texts of the effective altruism movement.

What? I might be confused, is this a joke I don't get, or is there some connection between this book and EA that I haven't heard of?


Peter Singer is well known as a "founder" of EA, and vegetarianism is a tenet that many EAs hold, whether or not you directly consider it part of EA. Animal welfare is, at the very least, one of the core causes.

That specific book may have been written before effective altruism really existed, but it makes sense for one of Singer's books to be considered a founding text.


Ahhh ok, had no idea, I'm pretty new to this stuff. Thank you for the explanation!


Is anyone else disappointed with Gemini Ultra for coding? It just makes basic mistakes too often.


I tested it too—no, it sucks.


Is anybody else getting seriously depressed at the rate of advancement of AI? Why do we believe for a second that we’re actually going to be on the receiving end of any of this innovation?


I think the impact of AI will be too pervasive and complex to be understood in terms of a simple winners/losers dichotomy. My prediction is this:

Software developers and lawyers will probably have significantly lower incomes, and so to some extent, we'll lose status relative to everyone else. But software and legal work will become cheap as hell. This ought to reduce prices generally on just about everything. Even housing prices should go down, as the median worker will be making less money. Governments will probably try to counteract the ensuing deflation with massive stimulus programs. Optimistically, this could culminate in new UBI programs. A happy outcome is not by any means guaranteed, but seems likely in the long term.


Boy, I sure hope so.


I spoke to someone recently who believes poor people will be gradually killed off as the elites who control the robots won't have any use for them (the people). I don't share such an extreme view (yet), but I can't quite rule it out either.


I’ve found many of the same people that talk/think like that are anti-gun and I have trouble rationalizing it.

So, IDK, maybe there will be killer robot dogs, but I’m not going down without a fight.


You don’t need guns to kill a class of people. Just low/no opportunities, drastically reduced income, etc will make the problem take care of itself: the class you’re targeting this way will slowly stop having/keeping children, live shorter lives, and fade away.

Not saying this is what’s happening, though; I don’t think it was ever great to be poor or have low opportunities, or that it’s more lethal now than it ever was.


> drastically reduced income, etc will make the problem take care of itself: the class you’re targeting this way will slowly stop having/keeping children

This seems to be opposite of reality though. The poorer you are the more children you are likely to have. Both in the US and globally.


> or that it’s more lethal now than it ever was.

It can get much more lethal now compared to the past 50 years in the US.


There's enough advancement in the low end area that I'm not worried about that particular side. I mean, you can deploy mixtral on a single, common laptop (yes, an expensive one, but still). On the other side - who's actually going to be using that AI? It's not like some rich person can say "make me an app that does ..." - that will still involve lots of manual work for a very long time.


If you mean skynet, not worried.

If you mean a way to push narrative/propaganda/ideology/advertising/sales BS beyond anyone’s imagination… yeah. I’m far less worried about my kids being bullied than I am about being manipulated to a degree we just can’t imagine - and it’s bad now even without AI.


Why has any pleb ever been on the receiving end of any innovation?

Isn’t the entire point of Gemini to bring AI to the plebs?



