Semantic Segmentation: Finding stuff in photos

Which pixels belong to which objects?

There are a handful of different ways that computers can look at images, and one useful distinction is “stuff” versus “things”: “things” are countable objects (a person, a car), while “stuff” covers amorphous regions (vegetation, sky, road). Semantic segmentation assigns a class label to every pixel, which makes it a good fit for measuring “stuff” — how many pixels in an image belong to people, cars, vegetation, etc.
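To make the per-pixel idea concrete, here's a minimal sketch with NumPy. The class indices and names are made up for illustration; real models define their own label maps.

```python
import numpy as np

# Hypothetical class indices for a tiny 4x4 "image".
CLASSES = {0: "background", 1: "person", 2: "car", 3: "vegetation"}

# A semantic-segmentation model outputs one class index per pixel,
# i.e. a 2-D array the same height and width as the input image.
mask = np.array([
    [3, 3, 0, 0],
    [3, 3, 0, 2],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
])

# Counting pixels per class answers "how much of the image is X?"
counts = {CLASSES[i]: int((mask == i).sum()) for i in CLASSES}
print(counts)  # {'background': 3, 'person': 4, 'car': 5, 'vegetation': 4}
```

Note that the mask says nothing about *instances* — the two “person” blobs could be one person or two. That's the job of instance or panoptic segmentation.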

Use cases

Semantic segmentation is great at answering questions like “how much land is covered by XXX?” from satellite imagery. In the example below, the model has been trained to identify vegetation.
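The land-coverage question reduces to counting pixels of the target class in the predicted mask. A quick sketch (the vegetation class index here is an assumption, not a standard):

```python
import numpy as np

VEGETATION = 3  # hypothetical class index for "vegetation"

def coverage(mask: np.ndarray, class_id: int) -> float:
    """Fraction of pixels assigned to class_id in a segmentation mask."""
    return float((mask == class_id).mean())

# 2x4 toy mask: 3 of 8 pixels are vegetation.
mask = np.array([[3, 3, 0, 0],
                 [3, 0, 0, 0]])
print(coverage(mask, VEGETATION))  # 0.375
```

If the satellite image has a known ground resolution (say, metres per pixel), multiplying this fraction by the imaged area gives an absolute coverage estimate.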

Try it out

Here’s a great demo that identifies vegetation. Scroll down and click the examples to try it.


State of the art

If you poke around paperswithcode, the state of the art really depends on the benchmark you’re looking at. The headline metric is usually mean intersection-over-union (mIoU), and it ranges from around 60 on ADE20K val to almost 90 on Cityscapes.
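For reference, mIoU computes the intersection-over-union of prediction and ground truth for each class, then averages across classes. A minimal sketch:

```python
import numpy as np

def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Mean IoU over classes that appear in either pred or target."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:  # skip classes absent from both masks
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x3 masks with three classes.
pred   = np.array([[0, 0, 1], [1, 1, 2]])
target = np.array([[0, 0, 1], [1, 2, 2]])
print(round(mean_iou(pred, target, 3), 3))  # 0.722
```

Because mIoU averages over classes rather than pixels, rare classes count as much as common ones — one reason scores on the 150-class ADE20K sit so far below those on the 19-class Cityscapes.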

The state-of-the-art models appear to be InternImage-H and BEiT-3.