How to objectively measure developer productivity

One of the big challenges of management is ensuring that people add the right amount of value for their salary. We can all carry someone a little, or for a short period, if they are struggling or if a particular task doesn't suit their skillset. But it is patently not fair if someone is paid significantly more than someone else and yet is too slow or produces too much low-quality output. This affects the overall velocity, but it also affects team morale.

Having a consistent and well-communicated performance mechanism (even if it is low-contact) is important so that people know what is expected of them. As someone once said, how can anybody meet expectations when they don't know what they are?

Unfortunately, measuring performance is much easier in some businesses than in others. In a coffee shop, performance might simply mean that you turn up when on duty, you are polite and you make the drinks as quickly as everyone else. Also, the salaries are low enough that you don't necessarily have to be 100% on it. What about development? It is like being a doctor: you are expected to know a certain amount for your grade but you can't necessarily be judged directly on results. It is also an issue in a market that is not flooded with suitable candidates: the experience or training required even for entry-level jobs demands dedicated people who might have to spend a number of years committed to a cause. You can't just grab someone off the street and get them coding on day 1 (or can you?).

What is performance?

Performance, in the most general sense, is a set of easy-to-describe but hard-to-measure metrics:

Quality: Does the developer produce code that looks neat, reads well and makes sense with few opportunities for bugs due to the way the code is structured and tested? Does the code perform well by default without entering the esoteric world of premature optimisation?

Speed: Can the developer quickly understand or debug a problem, produce a design (even if only mentally) and then implement it? Or do they often get bogged down because they don't grasp the problem or they obsess over low-importance factors?

Design: Can the developer either produce a suitable architecture/design for the problem they are solving or quickly spitball something with colleagues so that they don't go down a poorly thought-out route? Do they understand standard patterns and principles so they can smell if something seems incorrect or overly complicated?

General ability: Are they able to comfortably take on the more complex tasks? Are they a go-to person for the most difficult problems? Does their name often get used in a complimentary way: "Johnny could nail that in a few minutes"? Are they somebody who seems to add more value to the team than they take away?

Subjective judgment

We will all have subjective judgements about people and, although these might be fair, they are easy to game, can be biased by personality and are not accurate enough to make decisions like salary reviews easy to justify.

That is not to say they have no use. If handled carefully, they can be really useful starting points for one-to-ones, and they can be a very rough way of working out whether somebody angling for a pay rise ticks the same boxes as someone else who is already on the higher pay. If you spell out what the factors are, you can score each one from, say, 1 to 5, or even just have a tick box for each statement ("This person is a go-to person for any problem") and then add them up. In a one-to-one, you could take one or two of the items that are not ticked and communicate them fairly, especially if they are also backed up by stats, e.g. "Your comments per code review are quite high. Perhaps we can look through some of these comments and see if there are any common issues, and then go through the next month aiming to get zero comments about those specific items".
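As a rough illustration of the tick-box approach, here is a minimal sketch in Python. The statements in CHECKLIST and the function name score are made up for the example; you would substitute whatever your team has actually spelled out.

```python
# Hypothetical checklist statements; replace with your own agreed wording.
CHECKLIST = [
    "Is a go-to person for any problem",
    "Comfortably takes on the more complex tasks",
    "Takes feedback well and turns it into action",
    "Rarely goes down a poorly thought-out design route",
]


def score(ticked):
    """Return (number of ticked statements, list of unticked statements)."""
    unknown = set(ticked) - set(CHECKLIST)
    if unknown:
        raise ValueError(f"Not on the checklist: {sorted(unknown)}")
    unticked = [s for s in CHECKLIST if s not in ticked]
    return len(CHECKLIST) - len(unticked), unticked


# Example: two statements ticked for one person.
total, to_discuss = score({CHECKLIST[0], CHECKLIST[1]})
print(f"Score: {total}/{len(CHECKLIST)}")
print("Possible one-to-one topics:", to_discuss[:2])
```

The unticked list is the useful part: it gives you the one or two items to raise in the next one-to-one rather than just a number.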

What we have to do, though, is identify where there might be a personality issue and either consciously look past it or work out whether something about it is related to performance. For example, if someone is really prickly and doesn't take feedback well, the response might be "Oh yeah, don't worry about Sami, she is just a little nervous". But it might be important in your definition of performance that someone can take feedback well and that it turns into a discussion and action.

Objective measurement

You can measure various aspects of a developer's output but we all know that these measures are highly imperfect. What you should also know, however, is that this doesn't make them valueless. In most cases, if you take enough data points, you should be able to provide a rough comparison of people on the same tier (junior, dev, senior). For example, if, all other things being equal, one senior dev completes twice as many tasks over 6 months as another senior dev, the speed difference can be inferred to be a performance difference.

At the same time, most things also trade off. Perhaps someone does 100 tasks but has 6 comments per code review whereas someone else does 50 tasks with 0.5 comments per code review.
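One way to see this trade-off side by side is to express each person's numbers relative to the median of their own tier. The sketch below is only an illustration: the DevStats fields, the names and the third developer are invented, and the first two rows reuse the figures from the example above.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class DevStats:
    name: str
    tier: str             # e.g. "junior", "dev", "senior"
    tasks_done: int
    review_comments: int  # comments received on their code reviews
    reviews: int          # number of code reviews of their work

    @property
    def comments_per_review(self):
        return self.review_comments / self.reviews if self.reviews else 0.0


def relative_to_tier(stats, tier):
    """Map name -> (tasks ratio, comments-per-review ratio) vs the tier median."""
    peers = [s for s in stats if s.tier == tier]
    if not peers:
        return {}
    med_tasks = median(s.tasks_done for s in peers)
    med_cpr = median(s.comments_per_review for s in peers)
    return {
        s.name: (
            s.tasks_done / med_tasks if med_tasks else 0.0,
            s.comments_per_review / med_cpr if med_cpr else 0.0,
        )
        for s in peers
    }


# Hypothetical data over the same period, all on the same tier.
team = [
    DevStats("fast_dev", "senior", tasks_done=100, review_comments=600, reviews=100),
    DevStats("careful_dev", "senior", tasks_done=50, review_comments=25, reviews=50),
    DevStats("typical_dev", "senior", tasks_done=80, review_comments=160, reviews=80),
]

for name, (tasks, comments) in relative_to_tier(team, "senior").items():
    print(f"{name}: {tasks:.2f}x median tasks, {comments:.2f}x median comments/review")
```

The two ratios are deliberately not combined into a single score, since how much speed is worth against quality remains a judgement call.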

How do we work out how to measure these things? Easy: ask yourself how you judge them now and how you could measure them better than you currently do. Engineering is supposed to be self-learning, although we do need to balance the quality of the numbers against the amount of effort needed to get them.

How do we judge quality? In the absence of anything else, measures of quality include comments in code reviews (although not all will be about quality and not all will be problems, maybe just questions). We also measure quality with bugs found during testing and support tickets raised due to bugs in production - although these can't always be traced to a specific developer and might fall into the gaps between the work that multiple devs do.

So how do we measure these? We need a way to trace problems back to the developer. Comments in PRs are a rough measure but, since we are comparing people with each other rather than against an arbitrary number, we get a comparative measurement over time and we should be able to see people who have lots more comments than everyone else, or even far fewer.

Ideally, we also need each story to be owned by a single dev. Although tasks can be carried out by others, really, the dev doing the front-end work should own the story so that any bugs found in testing are attributed to the lack of testing by that dev. Of course, if a bug has a clear cause elsewhere, e.g. a database column is not wide enough for a field, we could attribute it to whoever decided the column size. We needn't do this for every single thing but, again, over time, we can build up a pattern of people who are often injecting bugs.
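The attribution rule described above (the bug goes to the story owner unless a more specific culprit is known) can be sketched as follows. The record layout, story ids and dev names are hypothetical; in practice the data would come from whatever ticketing system you use.

```python
from collections import Counter

# Hypothetical mapping of story id -> the single dev who owns the story.
story_owner = {"S-1": "dev_a", "S-2": "dev_b", "S-3": "dev_a"}

# Hypothetical bug records: (month, story id, explicit culprit or None).
bugs = [
    ("2024-01", "S-1", None),
    ("2024-01", "S-2", None),
    ("2024-02", "S-3", None),
    ("2024-02", "S-2", "dev_c"),  # e.g. column width was decided by dev_c
    ("2024-02", "S-9", None),     # story with no known owner
]


def bugs_per_dev(bugs, story_owner):
    """Count bugs per dev; an explicit culprit overrides the story owner."""
    counts = Counter()
    for _month, story, culprit in bugs:
        dev = culprit or story_owner.get(story)
        if dev is None:
            continue  # cannot be traced to anyone, so leave it unattributed
        counts[dev] += 1
    return counts


print(bugs_per_dev(bugs, story_owner).most_common())
```

Bugs that cannot be traced are skipped rather than guessed at, which matches the point that not every single thing needs attributing for a pattern to emerge over time.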

Some other metrics can be measured qualitatively. For example, we should be able to see if people always pick the easy tasks, and a question like whether somebody is a "go-to person" probably has a simple answer of no, sometimes or yes.

What to do with the numbers

We have to accept that we will not get perfect salary parity between developers, for the simple reason that it is a judgement call whether a slightly slower person with better quality is worth the same as a faster person with some quality issues. Also, not all bugs and comments are equal. Maybe somebody is great at the main story but gets picked up on small things like code tidiness.

Remember that, for the most part, this is an exercise in comparing salaries and not an absolute mechanism to be applied rigidly. People work differently in different teams. Some people go through stuff for a time that makes them perform worse. Some people are super positive and receptive but don't produce great code, although that often feels better than a cranky person who produces better code!

So you can and probably should use the data:

To guide one-to-ones and help people improve areas of weakness
In yearly performance reviews to guide pay rises (this only comes after 11 monthly meetings where you have tried to help them improve!)
To help establish ways you can measure new recruits. If you know how you would judge some code in the office, why not get a candidate to write something much more basic to see how fast they are, how high quality the code is and how well they take feedback?
To help you as a manager to highlight areas where visibility is poor or where processes (ideally automated ones) are missing.

Ultimately, you are hopefully trying to produce a well-oiled machine where designs quickly become deployed in a low-risk way, while also producing a workplace where people know that they make a difference and that they are valued. Any issues around lack of performance harm these goals, and that is why we want to carefully consider how we deal with them.