Richard Lilford and David Rosser consider the value and accuracy of measuring individual surgical performance

The government has announced it will publish results for individual surgeons. The measured outcomes of surgery are dependent on two things: the quality of the surgery itself (“the signal”) and all the other factors that affect the outcome of interest (“the noise”).

‘Outcomes should not be used to judge surgeons if those outcomes are not sensitive to surgical practice’

The crucial factor determining the usefulness of any kind of league table is the ratio of the signal to the noise. When the signal is large relative to the noise, league tables can be expected to provide an accurate measure of surgical performance; when the signal is small relative to the noise, the league tables will be less accurate and a point is reached where they contain virtually no information at all.

It is reasonable to assume, for example, that damage to an important nerve during surgery to the head and neck is highly preventable; well over 20 per cent of such injuries could be avoided. That places the outcome on the steep part of the graph in the figure. Therefore, league tables have a good chance of identifying poor surgeons who are predisposed to sever nerves.

Other outcomes, however, depend much less on the quality of surgery. For example, the recurrence rate of a bowel tumour is influenced by numerous factors beyond the control of surgeons, who can therefore make only a small contribution to this outcome; bowel cancer deaths are likely to appear on the flat part of the graph.

‘League tables can provide false reassurance and, if overinterpreted, will mask poor performance’

False reassurance

Outcomes should not be used to judge surgeons if those outcomes are not sensitive to surgical practice. When the surgeon has a large influence on outcome, it is plausible to use outcomes as a measure of quality. However there are other factors to consider.

We cannot be sure what proportion of bad outcomes is preventable. While it seems reasonable to surmise that the more technically difficult the operation, the more the outcome will depend on technical skill, it is difficult to be certain this reaches the steep part of the curve in the graph. As such, complication or death rates from major surgery – such as heart operations in children or removal of the pancreas – are more likely to be informative about surgical practice than complications from less demanding procedures, such as hysterectomy or varicose vein repair.

However, statistical adjustment for case mix does not remove all sources of bias. Surgeons attract different caseloads, and the best surgeons are often given the trickiest cases. It is a mistake to think case mix adjustment eliminates the influence of all confounding factors, leaving only quality of care as the cause of variation. In fact, case mix adjustment can sometimes even exaggerate the very bias it is designed to counteract.

We must also consider random variation in any measurement. The number of operations of a given type a surgeon carries out can be rather small, yielding wide confidence limits. As a result, league tables can provide false reassurance and, if overinterpreted, will mask poor performance.
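The effect of a small caseload on confidence limits is easy to demonstrate. The sketch below uses the standard Wilson score interval for a proportion; the 5 per cent complication rate and the caseloads are illustrative assumptions, not figures from the article.

```python
import math

def wilson_ci(events, n, z=1.96):
    """95% Wilson score confidence interval for an event proportion."""
    p = events / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# The same observed 5% complication rate, at very different caseloads
small = wilson_ci(events=1, n=20)
large = wilson_ci(events=20, n=400)
print(f"n=20:  {small[0]:.1%} to {small[1]:.1%}")
print(f"n=400: {large[0]:.1%} to {large[1]:.1%}")
```

With only 20 cases, the interval spans roughly 1 to 24 per cent, so a genuinely poor performer can sit comfortably inside it; with 400 cases it narrows to roughly 3 to 8 per cent.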

Cherry picking

As practice improves, league tables will become victims of their own success and less useful diagnostically; as surgeons with the worst rates improve or desist, variance between them decreases and the information content of outcome rates diminishes.

‘There are lots of real problems with the idea of investigating surgeons according to their outcome rates’

Furthermore, the risk of sanction may discourage surgeons from operating on the patients who are most ill, yet these are often the very cases where surgery can have the biggest impact. In the particular case of heart surgery, league tables have not resulted in progressively lower risk cases being selected over time but, with no counterfactual, we do not know what would have happened had there been no league table. Moreover, there is evidence from a recent systematic review that "cherry picking" is a risk overall.

Therefore, there are many real problems with the idea of investigating surgeons according to their outcome rates, or even with picking your surgeon on this basis. On the other hand, there are occasions when the signal really does stand out from the noise and where harm can be prevented by the league table approach. Moreover, concern generated by high profile cases is hard to ignore; indeed, it would be wrong to do so.

We offer five suggestions to improve care:

  1. Investigate the investigators by making it the medical director’s job to look at the figures and probe the explanations. The medical director can triangulate the figures with other data. If a surgeon is an outlier, and anaesthetists and theatre sisters corroborate technical incompetence, that is one thing; however, if the outlier turns out to be an acknowledged virtuoso surgeon who attracts the most difficult cases for that reason, that is another thing altogether.
  2. Encourage and sponsor surgeons to create collegiate quality improvement programmes based on prospective clinical data collation, rather than less accurate hospital wide systems using data entered by coding clerks. Such systems are run by surgeons for surgeons and, while the data are publicly reported, the results feed back into the hospitals’ quality improvement programmes in a systematic way. The idea is sustained collaborative action rather than a defensive reaction to an overbearing regulator. A recent systematic review suggested such systems are effective in improving outcomes.
  3. Keep a special lookout for extreme variations – surgeons who cross the three standard deviation limit as a sign of "special cause variation" really do warrant a closer look.
  4. Keep an eye on outcomes by unit, not just by surgeon, to align incentives.
  5. Perhaps above all, do not wait for a surgeon to acquire statistical outlier status before taking action if there are other compelling indications of incompetence. Managers who follow the Bayesian paradigm and look for patterns in data of different sorts are much less likely to under- or overreact than those who follow a narrow "frequentist tradition". Numbers are not necessarily more informative than tacit knowledge in this arena.
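The three standard deviation rule in suggestion 3 can be sketched as a simple control-limit check. The baseline rate and caseload below are hypothetical, and a real implementation would use exact binomial or funnel-plot limits rather than this normal approximation.

```python
import math

def three_sd_limits(baseline_rate, n_cases):
    """Approximate 3-SD control limits for an event rate, assuming a
    binomial model around the unit-wide baseline rate (normal approximation)."""
    sd = math.sqrt(baseline_rate * (1 - baseline_rate) / n_cases)
    lower = max(0.0, baseline_rate - 3 * sd)
    upper = min(1.0, baseline_rate + 3 * sd)
    return lower, upper

# Hypothetical example: 4% baseline complication rate, 150 cases, 14 complications
lower, upper = three_sd_limits(baseline_rate=0.04, n_cases=150)
observed = 14 / 150
flag = observed > upper
print(f"3-SD limits: {lower:.1%} to {upper:.1%}; observed {observed:.1%} "
      f"-> {'special cause: investigate' if flag else 'within common cause variation'}")
```

Crossing the upper limit is a trigger for a closer look, in line with suggestion 5, not an automatic verdict of incompetence.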

The government has good intentions but lacks finesse. It should enlist surgeons to the cause and sponsor collaboration, rather than performance managing surgeons from the centre. Evidence suggests surgeons can manage their affairs much better than the government if they are given the resources.

The regulators should not jump on individual surgeons in a draconian fashion but check the medical director is doing his or her investigative job.

Richard Lilford is professor of clinical epidemiology at Birmingham University, David Rosser is medical director of University Hospitals Birmingham Foundation Trust