ChatGPT creates mostly insecure code, but won't tell you unless you ask

ksaj · on April 23, 2023

You do have to know what you are doing well enough already to get good code from ChatGPT.

Much like with a lot of inexperienced, and unfortunately far too many experienced programmers, you'll get exactly what you ask for - nothing more, nothing less. So if you don't ask for security controls, you won't get security controls.

If you point out in your description what needs controls, you'll get a reasonable attempt at implementing what you've described. You still need to understand the output code enough to know whether you explained those controls clearly enough. Or just update the code manually yourself, akin to pair programming. That also works.

To be fair, this is a lot like reality. I have never seen a shop that doesn't produce insecure code unless you keep on top of them for proper implementation of security controls from the planning on up.

thefz · on April 24, 2023

> You do have to know what you are doing well enough already to get good code from ChatGPT.

Exactly. The fact that lots of inexperienced programmers who lack the critical thinking and code review abilities will copy-paste what GPT is creating as is, is horrifying.

hammyhavoc · on April 25, 2023

It's really no different from people copy-pasting from forums or StackExchange. Same shit people have done for decades, new source of danger.

ksaj · on April 25, 2023

Old wine in new bottles. Blaming ChatGPT doesn't make sense, when no real coder would cut/paste code from anywhere, let alone unvetted code from a bot.

hammyhavoc · on April 25, 2023

If anything, people heavily leaning on ChatGPT for coding are outing themselves as likely having been an inappropriate hire for their position if they're impressed with what it can do and find it useful.

That's a strong take, and I know there will be very clever folks out there who make use of it who can apply common sense, but I think the majority of users have a strong Dunning-Kruger complex. I'm very unimpressed with ChatGPT as a tool for any kind of dev, and I largely consider myself to be a hack versus a lot of folks.

I've mentioned it before on HN, but with all the hype for ChatGPT, I wasted days trying to get it to spit out a correct NGINX config based on documentation fed to it. Hallucinated functionality that wasn't there, didn't implement URL rewrites correctly, and was just a colossal waste of time versus just doing it properly from scratch. Validating its output is much slower than just not using it and writing things manually IME.

Same goes for asking it for methods to export data from one program and import it into another, hallucinating all kinds of menu functionality that isn't there, commands that don't exist, and not considering file and format incompatibility.

Whatever people are using it for must be incredibly basic, but I don't know what could be more basic than an NGINX config. I can't even get sensible Home Assistant stuff out of it.

I can't believe people are asking ChatGPT anything instead of just reading the documentation when it can hallucinate some complete and utter garbage.

thefz · on April 25, 2023

Except it's now turbocharged.

fauxpause_ · on April 23, 2023

> One thing that surprised me was when we asked [ChatGPT] to generate the same task – the same type of program in different languages – sometimes, for one language, it would be secure and for a different one, it would be vulnerable. Because this type of language model is a bit of a black box, I really don't have a good explanation or a theory about this.

Would be nice if they interviewed people who knew the absolute basics of how this thing works before commenting on its properties

cyanydeez · on April 23, 2023

No one who trained these models can diagram input to output

fauxpause_ · on April 23, 2023

Seems like a pretty outrageous claim to me

endofreach · on April 23, 2023

How would one do that exactly?

fauxpause_ · on April 24, 2023

If you mean the architecture it’s very well documented. If you mean the individual neuron weights then no, but that’s hardly required to understand how the model works. Tools exist to help unpack a black box neural net if you really cared.

salawat · on April 24, 2023

Links, if you don't mind?

Inquisitive minds wish to know.

rysertio · on April 24, 2023

Netron(https://github.com/lutzroeder/netron) comes to mind.

Our_Benefactors · on April 24, 2023

Tensorboard

fauxpause_ · on April 24, 2023

Link to what?

salawat · on April 24, 2023

>Tools exist to help unpack a black box neural net if you really cared.

If you're going to claim it exists, do "the class" a solid by sharing. I, for one, am highly interested in anything that can reverse from blob -> potential source characterization.

fauxpause_ · on April 24, 2023

This response feels… oddly passive aggressive to me. Idk if that was intentional? Like, I listed a few things and I just wanted to know what you wanted a link to. The answer is still “it depends” on what you’re trying to unpack.

SHAP is a great tool for unpacking deep nets on image recognition. Show which pixels mattered and which did not. Very cool.

Various dimension reduction or attention tracking or whatever exists for unpacking generate text and what not.

If you want to understand the nuance of a particular output you’re best off looking at the distribution of next tokens and their pre-temperature weighted sampling.

Different questions require different tools. Why did the model choose that phrasing? Why did the model choose this answer instead of that answer? What made the model make a certain claim?

A lot of it will be driven by the context assuming that the model works. So if you ask “Is X good or bad and why” the majority of text will be on the “why”, but the generation of that text will be contingent on the initial tokens that determined if it should say “good or bad” first. The model does not come up with a reasoning for its assertions, it comes up with an assertion and then creates a reasoning to defend it (in this question format at least). So perhaps that is all you care about?

Anyway, it depends on the question.

salawat · on April 25, 2023

To say "There are things" and not naming them does little to lead others in the direction of being able to find further enlightenment. It'd be like asking about U.S. pharma drug stuff and me saying "It exists" instead of "Go check out the NCPDP specs they have everything you'd need to know".

Breadcrumbs m'fellow. The quest for knowledge is made easier for everyone involved when we use nouns instead of assuming the other people in the room have access to the contents of our heads.

The "is it good and why" part we can read for ourselves. It's the name to search for to get started on the path that's important.

Apologies, though, t'was only meant to be a slight nudge. That post might have picked up a bit more acidity than warranted. Probably because missing antecedents are my #1 pet peeve.

joshka · on April 24, 2023

The exact full prompt and methodology is missing from the paper[1] and Github repo[2], which makes this non-reproducible and not particularly useful in understanding the actual issue.

I really think this sort of research is critical to do more of however as these outputs are going to be used in places where the assumption is that we're looking for these sorts of things.

As others have mentioned however, GPT generates insecure code, but so to most devs. The good thing though is that GPT can be systemically trained not to, while devs being more heterogeneous are more difficult to do the same. ;)

The main takeaway I have when thinking about tooling around GPT is that using it to generate things is fine assuming that you have a means to check the output against sensible criteria.

[1]: https://arxiv.org/abs/2304.09655

[2]: https://github.com/RaphaelKhoury/ProgramsGeneratedByChatGPT/...

madisp · on April 23, 2023

From GitHub it looks like the code was generated with GPT-3.5. I wonder how GPT-4 does with the same prompts?

Julesman · on April 23, 2023

Oooorrrr.... ChatGPT, a very beta product, is ok at creating code and it's cool that you can ask it to analyze your code for how secure it is and to recommend ways to make it more secure.

hammyhavoc · on April 25, 2023

Given how much it hallucinates, I would not trust it to assess security whatsoever. Completely misappropriate usage of it, but that's just IMO.

pleb_nz · on April 24, 2023

TBF the majority of code examples online are insecure and not robust. Chatgpt is just echoing how it already was/is. I've been telling new guys this for years, code online is an example of how to do a, but you then need to tweak it for b and wrap it in c.

The only difference now is the scale and ease to get your example.

two_in_one · on April 24, 2023

Not surprising. From my experience GPT-4 will fill gaps in request, not all of them. But if you want something specific - put it in prompt. Or you may not get it. Be ready to do things yourself. Even writing something like Pac-Man clone requires many iteration. After a few steps GPT-4 will forget interfaces and variables it already created. So, you will have to rename and glue pieces together.

The main problem I see with GPT-4 is that it does one small thing at a time. I have to manually drive it making the right requests.

sneak · on April 23, 2023

The same is true of junior devs, I find.

yawnxyz · on April 24, 2023

and from many senior devs, and definitely from outsourced dev teams. Outsourced code is like swiss cheese of security and UX-unusability nightmares