For coding it is still 10x worse than GPT-4. I asked it to write a simple database sync function and it gave me tons of pseudocode like `// sync object with best practices`. When I asked it for real code, it forgot tons of key aspects.
Because they're ultimately training data simulators and not actually brilliant artificial programmers, we can expect Microsoft-affiliated models like GPT-4 and beyond to be much stronger at coding, since they have unmediated access to GitHub content.
So it's most useful to look at other capabilities and opportunities when evaluating LLMs with a different heritage.
Not to say we shouldn't evaluate this one for coding or report our evaluations, but we shouldn't be surprised that it's not leading the pack on that particular use case.
GitHub's full public scrape is available to anyone. GPT-4 was trained before the Microsoft deal, so I don't think it's because of GitHub access. And GPT-4 is significantly better at everything compared to the second-best model in each field, not just coding.
Someone doesn't get good at programming from low-quality learning sources. Also, it's a poor comparison, because models are not people; you might as well complain about how NPCs in games behave because they fail at problems real people can solve.
We are both substrate that has been aggressively optimized for a task, with a lot of side benefits. NPCs are not optimized at all; they are coded using symbolic rules and deterministic behavior.
Zero chance private github repos make it into openai training data, can you imagine the shitshow if GPT-4 started regurgitating your org's internal codebase?
Agreed, but I do find GPT-4 has been increasing the amount of pseudocode recently. I think they are A/B testing me. I find myself asking it how much energy it wasted giving me replies that I then have to tell it to fix. Which is of course a silly thing to do, but maybe someone at OpenAI is listening?
That can't be, because I can ask it a simple question whose answer is maybe one sentence, and it repeats the question and then provides a whole novel. So it's still burning a ton of tokens.
Yeah, but to be honest it's been a pain the last few days to get GPT-4 to write full pieces of code of more than 10-15 lines. I have to re-ask many times, and at some point it forgets my initial specifications.
Earlier in the year I had ChatGPT 4 write a large, complicated C program. It did so remarkably well, and most of the code worked without further tweaking.
Today I have the same experience everyone here is describing: the thing fills in placeholder comments to skip over the more difficult regions of the code, and routinely forgets what we were doing.
Aside from all the recent OpenAI drama, I've been displeased as a paying customer that their products routinely make their debut at a much higher level of performance than they show once they've been in production for a while.
One would expect the opposite unless they're doing a bad job planning capacity. I'm not diminishing the difficulty of what they're doing; nevertheless, from a product perspective this is being handled poorly.
Definitely degraded. I recommend being more specific in your prompting. Also, if you have threads with a ton of content, they will get slow as molasses. It sucks, but starting a fresh context each day helps. I create text expanders for common prompts / resetting context,
e.g.:
Write clean {your_language} code. Include {whatever_you_use} conventions to make the code readable. Do not reply until you have thought out how to implement all of this from a code-writing perspective. Do not include `/..../` or any filler commentary implying that further functionality needs to be written. Be decisive and create code that can run instead of writing placeholders. Don't be afraid to write hundreds of lines of code. Include file names. Do not reply unless it's a full-fledged, production-ready code file.
These models are black boxes with unlabeled knobs. A change that makes things better for one user might make things worse for another user. It is not necessarily the case that just because it got worse for you that it got worse on average.
Also, the only way for OpenAI to really know if a model is an improvement or not is to test it out on some human guinea pigs.
My understanding is they reduced the number of ensembles feeding GPT-4 so they could support more customers. I want to say they cut it from 16 to 8. Take that with a grain of salt; it comes through the rumor telephone.
Are you prompting it with instructions about how it should behave at the start of a chat, or just using the defaults? You can get better results by starting a chat with "you are an expert X developer, with experience in xyz and write full and complete programs" and tweak as needed.
Yep, I'm still able to contort prompts to achieve something usable; however, I didn't have to do that at the beginning, and I'd rather pay $100/mo to not have to do so now.
OpenAI just had to pause signups after DevDay because of capacity issues. They also switched to making users pay in advance for usage instead of billing them afterward.
They aren't switching anything with payments. Bad rumor amplified by social contagion and a 100K:1 ratio of people talking about it to people building with it.
I'm not really sure what ChatGPT+ is serving me. There was a moment when it was suddenly blazing fast; that was around the time Turbo came out. Of late, it's been randomly either super slow or super fast.
Try using the playground with a more code-specific system prompt, or even put the key points, or the whole spec, into the system prompt. I see better performance compared to the web UI.
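If you want the same trick outside the playground, here's a minimal sketch using the OpenAI Python SDK's chat completions endpoint; the model name, prompt wording, and example question are my own placeholders:

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are an expert Python developer. Write complete, runnable code. "
    "No placeholder comments, no 'fill in the rest' stubs."
)

def ask(question: str) -> str:
    # The system message steers the whole exchange, so the coding rules
    # only have to be stated once instead of in every user message.
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder; use whatever model you have access to
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask("Write a function that syncs rows between two SQLite tables."))
```

The "custom instructions" setting in the web UI is the closest equivalent if you want to stay out of the API.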
This has been exactly my experience for at least the last 3 months. At this point, I am wondering whether paying that 20 bucks is even worth it anymore, which is a shame, because when GPT-4 first came out, it remembered everything in a long conversation and self-corrected based on my modifications.
Since I don't use it every day, I just pay for API access directly, and it costs me a fraction of that. You can trivially make your own ChatGPT frontend (and from what people write, you could have GPT write most of the code for it, although that's never been my experience).
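For what it's worth, the frontend really can be tiny. A bare-bones terminal version, assuming the same OpenAI Python SDK as above (model name is a placeholder):

```python
# pip install openai
from openai import OpenAI

client = OpenAI()
history = []  # the entire "frontend" state: the conversation so far

while True:
    user_input = input("you> ")
    if user_input.strip() in ("quit", "exit"):
        break
    history.append({"role": "user", "content": user_input})
    reply = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=history,
    ).choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    print(f"gpt> {reply}")
```

Since the API bills per token, a light user pays only for what they actually send, rather than a flat $20/mo.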
Definitely noticed it being "lazy", in the sense that it will give an outline for the code and then literally put in comments telling me to fill out the rest: basically pseudocode. I have to assume they are trying to save on output tokens to reduce resource usage when they can get away with it.
Even when I literally ask it for code, it will often not give me code, and will instead give me a high-level overview or pseudocode until I ask again for actual code.
It's pretty funny that my second message is often "that doesn't look like any programming language I recognize. I tried running it in Python and got lots of errors".
"My apologies, that message was an explanation of how to solve your problem, not code. I'll provide a concrete example in Python."
I had one chat with ChatGPT 3.5 where it would tell me the correct options (switches) for a command, and then a couple of weeks later it was telling me this (in the same chat, FWIW):
> As of my last knowledge update in September 2021, the XY framework did not have a --abc or --bca option in its default project generator.
Except: you can feed it an entire programming language manual, plus all the docs for all the modules you want to use, and _then_ it's stunningly good, beating GPT-4 by that same 10x.
I gather the pricing is $8 per million input tokens [1], so if your language's manual is the size of a typical paperback novel (roughly 100k tokens), that'd be about $0.80 per question. And presumably you pay that again for every follow-up question too.
Sounds like a kinda expensive way of doing things, to me.
Claude? No, I've requested access many times, but radio silence.
OpenAI? I use ChatGPT A LOT for coding, as some mixture of pair programmer and boilerplate generator; it works generally well for me. On the API side I use it heavily for other work, where it's more directed and I have a very high acceptance rate.
Can you just tell it to focus on a particular language and have it go find the manuals?
If it's so easy to add manuals, maybe they should just build an option that does it for you.
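Until someone builds that option, prepending the docs yourself is the whole trick. A minimal sketch, assuming the Anthropic Python SDK's messages endpoint; the model name, file name, and task are all placeholders:

```python
# pip install anthropic
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Stuff the whole manual into the context. With a 100k+ token context
# window a paperback-sized manual fits, but you pay for it on every request.
manual = open("manual.txt").read()

response = client.messages.create(
    model="claude-2.1",  # placeholder; any long-context model works
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": (
            f"Here is the language manual:\n\n{manual}\n\n"
            "Using only the options documented above, show me how to "
            "generate a new project with the default generator."
        ),
    }],
)
print(response.content[0].text)
```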
Am I the only one who thinks Claude 2 is not bad for programming questions? I don't think it's the best one for programming, but I don't think it's bad either. I have received very good responses from Claude 2 on Python and SQL multiple times.
I find all of them, GPT-4 or not, just suck, plain and simple. They are good for only the most trivial stuff; any time the complexity rises even a little bit, they all start hallucinating wildly, and it becomes very clear they're nothing more than word-salad generators.
I have built large-scale distributed GPU DNN systems (96 GPUs per job) and worked on very advanced codebases.
GPT-4 massively sped up my ability to create these.
It is a tool, and it takes a lot of time to master. It took me around 3-6 months of daily use to actually figure out how. You need to go back and try to learn it properly; it's easily 3-5x my work output.