For coding it is still 10x worse than GPT-4. I asked it to write a simple database sync function and it gave me tons of pseudocode like `// sync object with best practices`. When I asked it for real code, it forgot tons of key aspects.
Because they're ultimately training data simulators and not actually brilliant artificial programmers, we can expect Microsoft-affiliated models like GPT-4 and beyond to be much stronger at coding, since they have unmediated access to GitHub content.
So it's most useful to look at other capabilities and opportunities when evaluating LLMs with a different heritage.
Not to say we shouldn't evaluate this one for coding or report our evaluations, but we shouldn't be surprised that it's not leading the pack on that particular use case.
GitHub's full public scrape is available to anyone. GPT-4 was trained before the Microsoft deal, so I don't think it's because of GitHub access. And GPT-4 is significantly better at everything compared to the second-best model in each field, not just coding.
Someone doesn't get good at programming from low-quality learning sources. Also, it's a poor comparison, because models are not people; you might as well complain about how NPCs in games behave because they fail at problems real people can solve.
We are both substrate that has been aggressively optimized for a task, with a lot of side benefits. NPCs are not optimized at all; they are coded using symbolic rules and deterministic behavior.
Zero chance private github repos make it into openai training data, can you imagine the shitshow if GPT-4 started regurgitating your org's internal codebase?
Agreed, but I do find GPT-4 has been increasing the amount of pseudocode recently. I think they are A/B testing me. I find myself asking it how much energy it wasted giving me replies that I then have to tell it to fix. Which is of course a silly thing to do, but maybe someone at OpenAI is listening?
That can't be, because I can ask it a simple question whose answer is maybe one sentence, and it repeats the question and then provides a whole novel. So it's still burning a ton of tokens.
Yeah, but to be honest it's been a pain the last few days to get GPT-4 to write full pieces of code of more than 10-15 lines. I have to re-ask many times, and at some point it forgets my initial specifications.
Earlier in the year I had ChatGPT 4 write a large, complicated C program. It did so remarkably well, and most of the code worked without further tweaking.
Today I have the same experience everyone here is describing: the thing fills in placeholder comments to skip over the more difficult regions of the code, and routinely forgets what we were doing.
Aside from all the recent OpenAI drama, I've been displeased as a paying customer that their products routinely make their debut at a much higher level of performance than they show once they've been in production for a while.
One would expect the opposite unless they're doing a bad job planning capacity. I'm not diminishing the difficulty of what they're doing; nevertheless, from a product perspective this is being handled poorly.
Definitely degraded. I recommend being more specific in your prompting. Also, if you have threads with a ton of content, they will get slow as molasses. It sucks, but starting a fresh context each day helps. I create text expanders for common prompts / resetting context,
e.g.:
Write clean {your_language} code. Include {whatever_you_use} conventions to make the code readable. Do not reply until you have thought out how to implement all of this from a code-writing perspective. Do not include `/..../` or any filler commentary implying that further functionality needs to be written. Be decisive and create code that can run instead of writing placeholders. Don't be afraid to write hundreds of lines of code. Include file names. Do not reply unless it's a full-fledged, production-ready code file.
These models are black boxes with unlabeled knobs. A change that makes things better for one user might make things worse for another user. It is not necessarily the case that just because it got worse for you that it got worse on average.
Also, the only way for OpenAI to really know if a model is an improvement or not is to test it out on some human guinea pigs.
My understanding is they reduced the number of ensembles feeding GPT-4 so they could support more customers. I want to say they cut it from 16 to 8. Take that with a grain of salt; it comes through the rumor telephone.
Are you prompting it with instructions about how it should behave at the start of a chat, or just using the defaults? You can get better results by starting a chat with "you are an expert X developer, with experience in xyz and write full and complete programs" and tweak as needed.
Yep, I'm still able to contort prompts to achieve something usable; however, I didn't have to do that at the beginning, and I'd rather pay $100/mo to not have to do so now.
OpenAI just had to pause signups after DevDay because of capacity issues. They also switched to making users pay in advance for usage instead of billing them afterward.
They aren't switching anything with payments. Bad rumor amplified by social contagion and a 100K:1 ratio of people talking about it to people building with it.
I'm not really sure what ChatGPT+ is serving me. There was a moment when it was suddenly blazing fast; that was around the time Turbo came out. Of late, it's been randomly either super slow or super fast.
Try using the playground with a more code-specific system prompt, or even put the key points, or the whole spec, into the system prompt. I see better performance compared to the web UI.
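If you want the same trick outside the playground, here's a minimal sketch using the OpenAI Python SDK's chat completions endpoint; the model name, prompt wording, and example question are my own placeholders:

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are an expert Python developer. Write complete, runnable code. "
    "No placeholder comments, no 'fill in the rest' stubs."
)

def ask(question: str) -> str:
    # The system message steers the whole exchange, so the coding rules
    # only have to be stated once instead of in every user message.
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder; use whatever model you have access to
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask("Write a function that syncs rows between two SQLite tables."))
```

The "custom instructions" setting in the web UI is the closest equivalent if you want to stay out of the API.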
This has been exactly my experience for at least the last 3 months. At this point, I am wondering whether paying that 20 bucks is even worth it anymore, which is a shame, because when GPT-4 first came out, it remembered everything in a long conversation and self-corrected based on my modifications.
Since I don't use it every day, I just pay for API access directly, and it costs me a fraction of that. You can trivially make your own ChatGPT frontend (and from what people write, you could have GPT write most of the code for it, although that's never been my experience).
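For what it's worth, the frontend really can be tiny. A bare-bones terminal version, assuming the same OpenAI Python SDK as above (model name is a placeholder):

```python
# pip install openai
from openai import OpenAI

client = OpenAI()
history = []  # the entire "frontend" state: the conversation so far

while True:
    user_input = input("you> ")
    if user_input.strip() in ("quit", "exit"):
        break
    history.append({"role": "user", "content": user_input})
    reply = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=history,
    ).choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    print(f"gpt> {reply}")
```

Since the API bills per token, a light user pays only for what they actually send, rather than a flat $20/mo.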
Definitely noticed it being "lazy", in the sense that it will give an outline for the code and then literally put in comments telling me to fill out the rest: basically pseudocode. I have to assume they are trying to save on output tokens to reduce resource usage when they can get away with it.
Even when I literally ask it for code, it will often not give me code, and will instead give me a high-level overview or pseudocode until I ask again for actual code.
It's pretty funny that my second message is often "that doesn't look like any programming language I recognize. I tried running it in Python and got lots of errors".
"My apologies, that message was an explanation of how to solve your problem, not code. I'll provide a concrete example in Python."
I had one chat with ChatGPT 3.5 where it would tell me the correct options (switches) for a command, and then a couple of weeks later it was telling me this (in the same chat, FWIW):
> As of my last knowledge update in September 2021, the XY framework did not have a --abc or --bca option in its default project generator.
Except: you can feed it an entire programming language manual, plus all the docs for all the modules you want to use, and _then_ it's stunningly good, beating GPT-4 by that same 10x.
I gather the pricing is $8 per million input tokens [1], so if your language's manual is the size of a typical paperback novel (roughly 100k tokens), that'd be about $0.80 per question. And presumably you pay that again for every follow-up question too.
Sounds like a kinda expensive way of doing things, to me.
Claude? No, I've requested access many times, but radio silence.
OpenAI? I use ChatGPT A LOT for coding, as some mixture of pair programmer and boilerplate generator; it works generally well for me. On the API side I use it heavily for other work, where it's more directed and I have a very high acceptance rate.
Can you just tell it to focus on a particular language and have it go find the manuals?
If it's so easy to add manuals, maybe they should just build an option that does it for you.
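Until someone builds that option, prepending the docs yourself is the whole trick. A minimal sketch, assuming the Anthropic Python SDK's messages endpoint; the model name, file name, and task are all placeholders:

```python
# pip install anthropic
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Stuff the whole manual into the context. With a 100k+ token context
# window a paperback-sized manual fits, but you pay for it on every request.
manual = open("manual.txt").read()

response = client.messages.create(
    model="claude-2.1",  # placeholder; any long-context model works
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": (
            f"Here is the language manual:\n\n{manual}\n\n"
            "Using only the options documented above, show me how to "
            "generate a new project with the default generator."
        ),
    }],
)
print(response.content[0].text)
```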
Am I the only one who thinks Claude 2 is not bad for programming questions? I don't think it's the best one for programming, but I don't think it's bad either. I have received very good responses from Claude 2 on Python and SQL multiple times.
I find all of them, GPT-4 or not, just suck, plain and simple. They are good for only the most trivial stuff; any time the complexity rises even a little bit, they all start hallucinating wildly, and it becomes very clear they're nothing more than word-salad generators.
I have built large-scale distributed GPU DNN systems (96 GPUs per job) and worked on very advanced codebases.
GPT-4 massively sped up my ability to create these.
It is a tool, and it takes a lot of time to master. It took me around 3-6 months of daily use to actually figure out how. You need to go back and try to learn it properly; it's easily 3-5x my work output.