Facebook announced DINO, an algorithm developed in collaboration with Inria that enables the training of transformers, a type of machine learning model, without labeled training data. The company claims DINO sets a new state of the art among methods that train on unlabeled data and yields a model that can discover and segment objects in an image or video without ever being trained on a segmentation objective.
Segmenting objects is used in tasks ranging from swapping out the background of a video chat to teaching robots to navigate a factory. But it’s considered among the hardest challenges in computer vision because it requires an AI to truly understand what’s in an image.
Transformers enable AI models to selectively focus on parts of their input, allowing them to reason more effectively. While initially applied to speech and natural language processing, transformers have since been adopted for computer vision problems such as image classification and object detection.
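That selective focus comes from an attention mechanism. As a minimal sketch (in PyTorch, with illustrative names and toy dimensions, not code from either project), scaled dot-product self-attention re-expresses each token of the input, such as an image patch in a vision transformer, as a weighted mix of all the tokens:

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Minimal scaled dot-product self-attention over a token sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v        # queries, keys, values
    scores = q @ k.T / (k.shape[-1] ** 0.5)    # pairwise relevance, scaled
    weights = F.softmax(scores, dim=-1)        # how strongly each token attends to each other token
    return weights @ v                         # each output is a weighted mix of the values

# Toy usage: 4 tokens (e.g. image patches) of dimension 8.
dim = 8
x = torch.randn(4, dim)
w_q, w_k, w_v = (torch.randn(dim, dim) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([4, 8])
```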
DINO works by matching the output of a model over different views of the same image. In doing so, it can effectively discover object parts and shared characteristics across images. Moreover, DINO can connect categories based on visual properties, for example, cleanly separating animal species in a structure that resembles the biological taxonomy.
[Image caption: Facebook’s DINO system can segment images in an unsupervised fashion.]
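To make that matching concrete, here is a hedged sketch of the self-distillation recipe DINO builds on, with toy linear networks standing in for real backbones: a student sees one augmented view, a gradient-free teacher sees another, the student is trained to match the teacher’s centered, sharpened output, and the teacher is then updated as an exponential moving average of the student:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
dim, out_dim = 16, 32
student = torch.nn.Linear(dim, out_dim)      # stand-in for a real backbone + projection head
teacher = torch.nn.Linear(dim, out_dim)
teacher.load_state_dict(student.state_dict())
for p in teacher.parameters():
    p.requires_grad_(False)                  # the teacher receives no gradients

center = torch.zeros(out_dim)                # running center keeps outputs from collapsing

def dino_loss(s_out, t_out, tau_s=0.1, tau_t=0.04):
    # teacher output is centered and sharpened, then used as the target distribution
    t = F.softmax((t_out - center) / tau_t, dim=-1)
    s = F.log_softmax(s_out / tau_s, dim=-1)
    return -(t * s).sum(dim=-1).mean()       # cross-entropy between the two views

# two random perturbations of the same batch stand in for augmented crops
x = torch.randn(8, dim)
view1 = x + 0.1 * torch.randn_like(x)
view2 = x + 0.1 * torch.randn_like(x)

loss = dino_loss(student(view1), teacher(view2))
loss.backward()                              # (an optimizer step would go here)

with torch.no_grad():                        # the teacher tracks the student via EMA
    m = 0.996
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(m).add_((1 - m) * ps)
    center = 0.9 * center + 0.1 * teacher(view2).mean(dim=0)
```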
Facebook claims that DINO is also among the best at identifying copies of an image, even though it wasn’t designed for that task. That suggests DINO-based models could one day be used to help identify misinformation or copyright infringement.
Facebook also today detailed a new machine learning approach called PAWS that ostensibly achieves better classification accuracy than previous state-of-the-art semi-supervised methods. Notably, it also requires an order of magnitude (4 to 12 times) less training, making PAWS a potential fit for domains where there aren’t many labeled images, like medicine.
Residing between supervised and unsupervised learning, semi-supervised learning works with data that is only partially labeled, often with the majority of examples lacking labels. The ability to learn from limited labeled data is a key benefit, because data scientists spend the bulk of their time cleaning and organizing data.
PAWS achieves its results by leveraging a portion of labeled data in conjunction with unlabeled data. Given an unlabeled training image, PAWS generates two or more views of the image using random data augmentations and transformations. It then trains a model to make the representations of these views similar to one another.
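As an illustrative sketch of that recipe (simplified relative to the released implementation; the encoder and hyperparameters here are placeholders), PAWS-style training pseudo-labels each view with a soft nearest-neighbor vote over a small labeled support set, then trains one view’s assignment to match a sharpened version of the other’s:

```python
import torch
import torch.nn.functional as F

def snn(z, support_z, support_y, tau=0.1):
    # soft nearest-neighbor: each embedding gets a class distribution that is
    # a similarity-weighted mix of the labels of the support samples
    z = F.normalize(z, dim=-1)
    support_z = F.normalize(support_z, dim=-1)
    weights = F.softmax(z @ support_z.T / tau, dim=-1)
    return weights @ support_y

torch.manual_seed(0)
in_dim, emb_dim, n_classes = 8, 16, 4
encoder = torch.nn.Linear(in_dim, emb_dim)   # stand-in for a real backbone

# the small labeled portion: a support set with one-hot labels
support_x = torch.randn(20, in_dim)
support_y = F.one_hot(torch.randint(0, n_classes, (20,)), n_classes).float()

# two random perturbations of the same unlabeled batch stand in for augmented views
x = torch.randn(8, in_dim)
view1 = x + 0.1 * torch.randn_like(x)
view2 = x + 0.1 * torch.randn_like(x)

support_z = encoder(support_x)
p1 = snn(encoder(view1), support_z, support_y)
with torch.no_grad():                        # the target assignment gets no gradient
    p2 = snn(encoder(view2), support_z, support_y)
    p2 = p2 ** 4                             # sharpening discourages collapse to uniform
    p2 = p2 / p2.sum(dim=-1, keepdim=True)

# cross-entropy: view 1's assignment should match view 2's sharpened assignment
loss = -(p2 * p1.clamp_min(1e-8).log()).sum(dim=-1).mean()
loss.backward()
```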
“With DINO and PAWS, the AI research community can build new computer vision systems that are far less dependent on labeled data and vast computing resources for training,” the Facebook statement continued. “We hope that our experiments will show the community the potential of self-supervised systems trained on [visual transformers] and encourage further adoption.”
Both DINO and PAWS are available in open source.