
Given enough training images, computers compete with medics on diagnosis

Not a fully realistic test, but still an impressive result.

The new face of medicine?

While there are still many visual tasks where humans perform better than computers, computers are catching up. Part of the reason for computers' progress has been the development of what are called "deep neural networks," which chain together multiple layers of analysis. These have significantly boosted computers' performance in a variety of visual challenges.

The latest example of this progress comes in a rather significant field: medical diagnosis. A group of Stanford researchers has trained one of Google's deep neural networks on a massive database of images that show skin lesions. By the end, the neural network was competitive with dermatologists when it came to diagnosing cancers using images. While the tests done in this paper don't fully represent the challenges a specialist would face, it's still an impressive improvement in computer performance.

Deep neural networks may sound like a jargonish buzzword, but they're inspired in part by how we think the brain works. The brain's visual system uses different clusters of neurons to extract specific features of a visual scene, and that information is gradually integrated to create a picture. The neural network used here, GoogLeNet Inception v3, has a similar architecture. You can view it as a long assembly line, except that any stage of the assembly line may have multiple image analyses operating in parallel. Periodically, these parallel tracks are merged, and the results are then split apart again. According to a diagram of the system included in the new paper, information can be processed by as many as 70 individual stages before reaching the end of the system (or as few as 33, if it goes down alternate paths).
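To make the "parallel tracks that get merged" idea concrete, here's a minimal sketch of an Inception-style block in PyTorch. The branch sizes are illustrative only and are not the ones used in Inception v3; the point is simply that several analyses run side by side on the same input and their outputs are combined before the next stage.

```python
# Minimal sketch of an Inception-style block: several branches process the
# same input in parallel, and their outputs are merged (concatenated)
# before being handed to the next stage. Layer sizes are illustrative.
import torch
import torch.nn as nn

class InceptionStyleBlock(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        # Parallel tracks that each look at the input differently.
        self.branch1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1),
            nn.Conv2d(16, 24, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1),
            nn.Conv2d(16, 24, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 16, kernel_size=1),
        )

    def forward(self, x):
        # Merge the parallel results along the channel dimension.
        outs = [self.branch1(x), self.branch3(x),
                self.branch5(x), self.branch_pool(x)]
        return torch.cat(outs, dim=1)

block = InceptionStyleBlock(in_ch=3)
print(block(torch.randn(1, 3, 299, 299)).shape)  # torch.Size([1, 80, 299, 299])
```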

We don't know the precise features any one stage focuses on, nor do we know the values it assigns to matching features in an image. We simply know that these values are strengthened and weakened based on the successes and failures that occur as the system is trained.
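As a rough sketch of what "strengthened and weakened" means in practice, here's a single training step in PyTorch using a stand-in model and made-up data: the loss measures how wrong the prediction was, and the optimizer nudges each weight in whichever direction reduces that error.

```python
# One training step, schematically: score the prediction against the known
# answer, then adjust each weight slightly to reduce the error.
import torch
import torch.nn as nn

model = nn.Linear(10, 3)                 # stand-in for one stage's weights
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

features = torch.randn(8, 10)            # a batch of (fake) image features
labels = torch.randint(0, 3, (8,))       # the "correct" answers

loss = loss_fn(model(features), labels)  # how wrong was the model?
loss.backward()                          # which way should each weight move?
optimizer.step()                         # strengthen/weaken accordingly
```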

And in the case of Inception v3, it comes pre-trained for image recognition, having been fed a catalog of nearly 1.3 million images before even being asked to do anything medical.
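The paper's own training pipeline isn't reproduced here, but the general transfer-learning recipe it describes looks roughly like this when sketched with torchvision: load a pre-trained Inception v3 and replace its final layer with one sized for skin-disease categories. The nine-class output used below is just a placeholder borrowed from one of the groupings mentioned later in the article.

```python
# Sketch of transfer learning on Inception v3 (torchvision >= 0.13 API).
import torch.nn as nn
from torchvision import models

# Load Inception v3 with weights pre-trained on a large general-purpose
# image catalog, then swap the final layer so it outputs skin-disease
# classes instead of everyday object categories.
net = models.inception_v3(weights="IMAGENET1K_V1")

num_classes = 9   # placeholder: e.g. the nine-way grouping used in one test
net.fc = nn.Linear(net.fc.in_features, num_classes)
if net.AuxLogits is not None:  # the auxiliary classifier also needs resizing
    net.AuxLogits.fc = nn.Linear(net.AuxLogits.fc.in_features, num_classes)
```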

For the medical images, the authors relied on Stanford's extensive records of skin diseases. In all, they arranged more than 2,000 individual disorders into a tree-like structure based on their relatedness. So, for example, all the inflammatory problems ended up on one branch of the tree and all the cancers on another. These were further subdivided until the branching reached individual diseases. Inception was then given the tree and a set of nearly 130,000 images of these disorders and was trained to properly identify each. That's over 100 times as many images as were used for training in the largest previous study of this sort.
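Here's a toy illustration of that tree structure, assuming a simple nested-dictionary representation. The real taxonomy covers more than 2,000 disorders; the handful of diseases below are only examples, but they show how a fine-grained leaf can be rolled back up to its top-level branch at test time.

```python
# Toy sketch of a tree-like disease taxonomy. Each leaf can be traced back
# to its first-level branch, which is how fine-grained labels roll up into
# coarse classes like "malignant" or "benign."
taxonomy = {
    "skin disease": {
        "inflammatory": {
            "psoriasis": {},
            "eczema": {},
        },
        "benign lesions": {
            "seborrheic keratosis": {},
            "melanocytic nevus": {},
        },
        "malignant lesions": {
            "melanoma": {},
            "basal cell carcinoma": {},
        },
    }
}

def top_level_branch(tree, disease, path=()):
    """Return the first-level branch (e.g. 'malignant lesions') for a leaf."""
    for name, subtree in tree.items():
        new_path = path + (name,)
        if name == disease:
            return new_path[1] if len(new_path) > 1 else new_path[0]
        found = top_level_branch(subtree, disease, new_path)
        if found:
            return found
    return None

print(top_level_branch(taxonomy, "melanoma"))  # -> 'malignant lesions'
```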

The authors then tested the basic classification system against two dermatologists, using a new set of images where the diagnosis had been confirmed by biopsy. On the most basic level of classification—benign, malignant, or a type called "non-neoplastic"—the accuracy of the neural network was over 70 percent while the doctors were in the 60s. When asked for a more detailed classification among nine categories, the neural network had an accuracy of about 55 percent, which is similar to the numbers put up by the dermatologists.
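The scoring itself is straightforward. Here's a sketch of the idea with invented labels rather than the paper's data: each image has a biopsy-confirmed answer, each classifier (network or doctor) supplies a prediction, and accuracy is the fraction that match.

```python
# Accuracy against biopsy-confirmed labels; the toy data below are made up.
def accuracy(predictions, biopsy_labels):
    correct = sum(p == t for p, t in zip(predictions, biopsy_labels))
    return correct / len(biopsy_labels)

biopsy  = ["malignant", "benign", "benign", "non-neoplastic", "malignant"]
network = ["malignant", "benign", "malignant", "non-neoplastic", "malignant"]

print(f"accuracy: {accuracy(network, biopsy):.0%}")  # 80% on this toy set
```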

For a further test, the team put Inception up against 21 dermatologists, asking them to determine whether an image showed a benign or malignant lesion. Here, the neural network edged out most of the individual doctors and consistently did a bit better than their average.
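For a yes-or-no malignancy call like this, performance is typically summarized as sensitivity (the fraction of malignant lesions caught) and specificity (the fraction of benign lesions correctly left alone). A sketch of that calculation, with invented data rather than anything from the study:

```python
# Sensitivity and specificity for a binary benign/malignant call.
def sensitivity_specificity(predicted_malignant, truly_malignant):
    pairs = list(zip(predicted_malignant, truly_malignant))
    tp = sum(p and t for p, t in pairs)           # caught malignant lesions
    fn = sum(not p and t for p, t in pairs)       # missed malignant lesions
    tn = sum(not p and not t for p, t in pairs)   # benign correctly cleared
    fp = sum(p and not t for p, t in pairs)       # benign flagged as malignant
    return tp / (tp + fn), tn / (tn + fp)

truth = [True, True, False, False, True, False]   # invented ground truth
calls = [True, False, False, False, True, True]   # invented predictions
sens, spec = sensitivity_specificity(calls, truth)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")
```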

Before you conclude that doctors are obsolete, however, remember that neither they nor the algorithm did especially well when simply handed an image of an arbitrary skin disease and asked to identify it, rather than being asked for a yes-or-no malignancy call. And in real practice, the doctors have considerable advantages in cases like that: they can examine the lesion from multiple angles, feel it and the surrounding tissue to get a sense of its texture and density, order additional tests, and weigh their own uncertainty. Unlike Inception, they're not limited to looking at images.

But that's not to say this trained version of Inception has no utility. Although the first batch of dermatology-diagnosis apps wasn't especially effective, there are very good reasons they exist. Compared to a detailed doctor's exam, taking a photo is quick and inexpensive, and having Inception classify it is nearly instantaneous. So it would be entirely possible to use a neural network either as a first line of screening or as a basic diagnostic tool for people who don't have easy access to medical care.
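As a sketch of what that screening step could look like, here's a minimal inference pass. It assumes a stock torchvision Inception v3 as a stand-in for the fine-tuned network and a hypothetical photo file name; in a real deployment the model would be the one trained on the skin-lesion images.

```python
# Classify a single photo. "lesion_photo.jpg" is a hypothetical file name,
# and the stock pre-trained model stands in for the fine-tuned network.
import torch
from PIL import Image
from torchvision import models, transforms

net = models.inception_v3(weights="IMAGENET1K_V1")
net.eval()                                 # inference mode: plain logits out

preprocess = transforms.Compose([
    transforms.Resize(342),
    transforms.CenterCrop(299),            # Inception v3 works on 299x299 crops
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

photo = preprocess(Image.open("lesion_photo.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    probabilities = torch.softmax(net(photo), dim=1)
print(probabilities.argmax(dim=1).item())  # index of the most likely class
```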

Plus the neural network is likely to keep improving, which is unlikely to be the case for a group of expert doctors.

Nature, 2017. DOI: 10.1038/nature21056
