In your experience, what is a better way to remove the effect of personal feelings from performance reviews? How can you scale that, and make it fair? Is it even desirable, given that an unpleasant but productive person may hurt the morale of a team?
I've never had to manage "at scale", but I've spent the last several years of my career with one foot in management and the other in actual development. I've also been the person reporting to managers who lack introspection.
Ultimately, I think that trying to create the perfect algorithm for evaluating performance is futile. You can't really remove the effect of personal feelings, because they play a huge part in group dynamics. In the end you are not trying to optimize for a single person; you want to optimize your team (or organization).
My experience has been that if you just listen to your team and watch them, it becomes very obvious who is adding value, who is neutral, and who is a net negative. As long as your team isn't larger than a dozen or so people, the team will naturally coalesce around the people who get stuff done, and they will tell you when someone isn't doing their part (sometimes in honest words, delivered in a sarcastic or joking tone to soften the feeling of guilt). The whole thing can be very opaque: I know of a case where one person seemed to get far less done than everyone else on the team (she completed fewer projects and committed less code), and yet she was very highly regarded. One might think she was simply well liked despite being unproductive, but after she left for a different position the productivity of the team declined significantly. It turned out she was so well respected because, while she didn't write a lot of code, she had a knack for spotting problems early (problems that were never seen outside the team because they were fixed early) and for coordinating decision making between different people and projects. After she left there was a brief surge in productivity, followed by a disastrous slog as development got mired in bugs and component rework caused by poorly communicated specifications.
In the same team, much later, there was an analogous incident with a developer who appeared to be one of the most productive people on the team. He was also well liked and often took part in team activities, but there was a constant undertone of joking about how much difficulty and complexity he introduced into the code base. When I stepped in, did a thorough code review, and looked at his parts of the project, I realized that all the joking had been pointing at a very real problem. The code was nightmarish.
After he was moved away from development and back into devops, the team's productivity slumped briefly while everyone worked to refactor or replace his parts of the project, and then it ended up much higher than before, even with one fewer person.
None of these anecdotes will be new to anyone who's done development before; we all know about net-negative-producing programmers and about people who catalyze work more than they do it themselves. I bring them up as a reminder of why it's so hard to have any objective measure, and of how telling it can be to watch how the other team members react to a person.
Well, that works when you manage a small team, but if you were in charge of, say, Google Search or Microsoft Excel, how would you go about managing hundreds or thousands of developers? You can't even spend time with the several hundred team leads, let alone every developer, so managing by feel becomes impossible. With thousands of engineers joining and leaving, you need some kind of order in the chaos...
Personally, I only take on problems where I know I can keep the team small and in constant contact. We work remotely and prefer Haskell, both of which select for people who are capable of managing themselves and working in that kind of setup. But I don't think it would scale to the size of Google or Facebook. Maybe WhatsApp.