In this paper, we study the problem of unsupervised object segmentation from single images. We do not introduce a new algorithm, but instead systematically investigate the effectiveness of existing unsupervised models on challenging real-world images. We first introduce four complexity factors to quantitatively measure the distributions of object- and scene-level biases in appearance and geometry for datasets with human annotations. With the aid of these factors, we empirically find that, not surprisingly, existing unsupervised models catastrophically fail to segment generic objects in real-world images, even though they easily achieve excellent performance on numerous simple synthetic datasets, due to the vast gap in objectness biases between synthetic and real images. By conducting extensive experiments on multiple groups of ablated real-world datasets, we ultimately find that the key factors underlying the colossal failure of existing unsupervised models on real-world images are the challenging distributions of object- and scene-level biases in appearance and geometry. Because of this, the inductive biases introduced in existing unsupervised models can hardly capture the diverse object distributions. Our results suggest that future work should exploit more explicit objectness biases in the network design.
Given an RGB image, we first convert it to grayscale, then compute its horizontal and vertical gradients. To avoid interference from the background, we discard the gradient along the object boundary. The final score is the average of the remaining inner gradient.
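The steps above can be sketched as follows. This is a minimal illustration, not the paper's reference implementation: the function name, the one-pixel 4-neighbour erosion used to drop boundary gradients, and the luminance weights are our assumptions.

```python
import numpy as np

def object_color_gradient(image, mask):
    """Average color gradient strictly inside an object.

    image: (H, W, 3) float RGB array in [0, 1]
    mask:  (H, W) boolean object mask
    """
    # Grayscale via standard luminance weights (an assumption).
    gray = image @ np.array([0.299, 0.587, 0.114])
    # Vertical and horizontal gradients.
    gy, gx = np.gradient(gray)
    grad = np.sqrt(gx ** 2 + gy ** 2)
    # Erode the mask by one pixel so boundary gradients are dropped
    # (a simple 4-neighbour erosion; real code might use scipy).
    inner = mask.copy()
    inner[1:, :] &= mask[:-1, :]
    inner[:-1, :] &= mask[1:, :]
    inner[:, 1:] &= mask[:, :-1]
    inner[:, :-1] &= mask[:, 1:]
    if not inner.any():
        return 0.0
    return float(grad[inner].mean())
```

A uniformly colored object yields a score of 0, while a textured or shaded object yields a positive score.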
Given the binary mask of an object shape, we first find the smallest convex polygon (the convex hull) that encloses the object. The factor value is computed as 1 - (object area / convex hull area).
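A self-contained sketch of this computation, under our own assumptions (the function names and the pixel-centre hull approximation are ours; the hull is built with Andrew's monotone chain and its area with the shoelace formula):

```python
import numpy as np

def convex_hull_area(points):
    """Area of the convex hull of 2D points (Andrew's monotone chain
    followed by the shoelace formula); points is an (N, 2) array."""
    pts = sorted(map(tuple, np.asarray(points, float)))
    if len(pts) < 3:
        return 0.0
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]
    x = np.array([p[0] for p in hull])
    y = np.array([p[1] for p in hull])
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

def object_shape_concavity(mask):
    """1 - (object area / convex hull area) for a binary mask."""
    obj_area = float(mask.sum())
    hull_area = convex_hull_area(np.argwhere(mask))
    # Pixel centres slightly underestimate the rasterised hull, so
    # clamp to keep the score in [0, 1]; convex shapes score 0.
    return max(0.0, 1.0 - obj_area / max(hull_area, obj_area))
```

A convex shape (e.g. a filled square) scores 0; an L-shape or other concave mask scores above 0.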
Given an image consisting of multiple objects, we first compute the average RGB color of each object. We then average the Euclidean distance in RGB space over every pair of objects. The factor value is computed as 1 - normalized average distance.
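A minimal sketch of this factor; the function name and the normalisation by the unit-cube diagonal (the largest possible distance in [0, 1] RGB space) are our assumptions:

```python
import numpy as np
from itertools import combinations

def inter_object_color_similarity(image, masks):
    """image: (H, W, 3) RGB in [0, 1]; masks: list of (H, W) bool masks."""
    # Mean RGB color per object.
    means = [image[m].mean(axis=0) for m in masks]
    # Average pairwise Euclidean distance in RGB space.
    dists = [np.linalg.norm(a - b) for a, b in combinations(means, 2)]
    # Normalise by the RGB-cube diagonal so the distance lies in [0, 1].
    norm = np.mean(dists) / np.sqrt(3.0)
    return 1.0 - float(norm)
```

Identically colored objects give a similarity of 1; a pure-black object next to a pure-white one gives 0.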
We compute the diagonal length of the bounding box of each object. The average variation of these diagonals is normalized to obtain the final factor value.
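One plausible reading of this factor, sketched below; the function name, the use of the standard deviation as the "variation", and the normalisation by the image diagonal are all our assumptions:

```python
import numpy as np

def inter_object_shape_variation(masks, image_shape):
    """masks: list of (H, W) bool masks; image_shape: (H, W)."""
    diags = []
    for m in masks:
        rows, cols = np.nonzero(m)
        h = rows.max() - rows.min() + 1
        w = cols.max() - cols.min() + 1
        # Diagonal length of the object's bounding box.
        diags.append(np.hypot(h, w))
    # Normalise the spread of diagonals by the image diagonal
    # (an assumed normalisation) so the value stays in [0, 1].
    img_diag = np.hypot(*image_shape)
    return float(np.std(diags) / img_diag)
```

Objects of identical bounding-box size yield 0; mixing small and large objects yields a positive value.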
Remove the color gradient inside each object such that: Object Color Gradient is effectively reduced; Inter-object Color Similarity remains similar.
Make the shape of each object convex such that: Object Shape Concavity is effectively reduced; Inter-object Shape Variation remains similar.
Replace the texture of all objects with distinctive textures such that: Object Color Gradient remains similar; Inter-object Color Similarity is effectively reduced.
Rescale all objects such that: Object Shape Concavity remains similar; Inter-object Shape Variation is effectively reduced.
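As an illustration of how such an ablation works, the first one (removing the color gradient inside each object) can be sketched by flattening every object to its mean color. The function name is our own; this is one simple way to realise the ablation, not necessarily the paper's exact procedure:

```python
import numpy as np

def remove_object_color_gradient(image, masks):
    """Replace each object's pixels with its mean RGB color.

    Object Color Gradient drops to ~0 inside every object, while
    each object's average color, and therefore Inter-object Color
    Similarity, is unchanged.
    """
    out = image.copy()
    for m in masks:
        out[m] = image[m].mean(axis=0)
    return out
```

The other three ablations follow the same pattern: modify one property per object (shape convexity, texture, scale) while holding the paired factor fixed.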
@inproceedings{,
  title={},
  author={},
  booktitle={NeurIPS},
  year={2022},
}
This page takes inspiration from http://imagine.enpc.fr/~monniert/DTIClustering/.