Prompting only works if the concept has critical mass in the LLM's training corpus. I agree the model should offer alternatives (e.g. prefacing the comment with a 'Nit:' tag), but bear in mind that code-review nit-picking isn't exactly a well-researched area of study. And to your point, since the model was likely trained predominantly on code, the corpus probably lacks much signal about comment sentiment.