Natural language understanding (NLU) and language translation are key to a range of important applications, including identifying and removing harmful content at scale and connecting people across different languages worldwide. Although deep learning–based methods have accelerated progress in language processing in recent years, current systems still struggle on tasks for which large volumes of labeled training data are not readily available. Recently, Facebook AI has achieved impressive breakthroughs in natural language processing (NLP) using semi-supervised and self-supervised learning techniques, which leverage unlabeled data to improve performance beyond that of purely supervised systems.
We took first place in several languages in the Fourth Conference on Machine Translation (WMT19) competition using a novel kind of semi-supervised training. We’ve also introduced a new self-supervised pretraining approach, RoBERTa, that surpassed all existing NLU systems on several language comprehension tasks. These systems even outperform human baselines in several cases, including English-German translation and five NLU benchmarks.
Across the field, NLU systems have advanced so rapidly that they’ve hit a ceiling on many existing benchmarks. To continue advancing the state of the art, we partnered with New York University (NYU), DeepMind Technologies, and the University of Washington (UW) to develop a brand-new benchmark made up of tasks that we hope will push research further, along with an accompanying leaderboard and PyTorch toolkit.