The 2020 data and AI landscape

“Some noteworthy developments:

The year of NLP

Transformers, which have been around for some time, and pre-trained language models continue to gain popularity. Transformers are the model of choice for NLP because they permit much higher rates of parallelization and thus training on larger data sets.

Google rolled out BERT, the NLP system underpinning Google Search, to 70 new languages.

Google also released ELECTRA, which performs similarly on benchmarks to language models such as GPT and masked language models such as BERT, while being much more compute efficient.

We are also seeing adoption of NLP products that make training models more accessible.

And, of course, the GPT-3 release was greeted with much fanfare. This is a 175-billion-parameter model out of OpenAI, more than two orders of magnitude larger than GPT-2.”
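
To make the parallelization point about Transformers concrete, here is a minimal sketch of self-attention in plain Python. It is an illustration under simplifying assumptions, not how any production model is written: there are no learned projections (queries, keys and values are all the raw token vectors), and the helper names `attend_one`, `self_attention` and `recurrent_states` are made up for this example. The thing to notice is that each output position is computed from the inputs alone, so the positions could all be processed at once, whereas the recurrent loop has to wait for the previous step.

```python
import math

def softmax(scores):
    # Numerically stable softmax over one list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend_one(i, tokens):
    # Output for position i: a weighted average of all token vectors.
    # It reads only the inputs, never another position's output.
    d = len(tokens[0])
    q = tokens[i]
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)]

def self_attention(tokens):
    # Every call to attend_one is independent, so this loop could be
    # run in any order or in parallel (in practice, as matrix products).
    return [attend_one(i, tokens) for i in range(len(tokens))]

def recurrent_states(tokens, decay=0.5):
    # Contrast: a toy recurrent update. Step t needs the state from
    # step t - 1, so the loop is inherently sequential.
    state = [0.0] * len(tokens[0])
    states = []
    for x in tokens:
        state = [decay * s + xi for s, xi in zip(state, x)]
        states.append(state)
    return states

# Hypothetical 2-dimensional "embeddings" for three tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens)
sequential = recurrent_states(tokens)
```

Real implementations express the same computation as a few large matrix multiplications, which is what lets GPUs handle all tokens together.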

The Next Generation Of Artificial Intelligence

“Three emerging areas within AI that are poised to redefine the field, and society, in the years ahead:
  • Unsupervised Learning – an approach to AI in which algorithms learn from data without human-provided labels or guidance.
  • Federated Learning – Rather than requiring one unified dataset to train a model, federated learning leaves the data where it is, distributed across numerous devices and servers on the edge. Instead, many versions of the model are sent out—one to each device with training data—and trained locally on each subset of data.
  • Transformers – Transformers’ great innovation is to make language processing parallelized: all the tokens in a given body of text are analyzed at the same time rather than in sequence.”
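
The federated learning description above can be sketched in a few lines of Python. This is a toy example under stated assumptions: the model is a single weight `w` in `y = w * x`, the per-device data is invented, and the server combines the locally trained versions by a sample-weighted average (the approach used by the FedAvg algorithm). The excerpt itself does not say how the local versions are merged, so treat that step as an assumption. The function names `local_train` and `federated_round` are hypothetical.

```python
def local_train(w, data, lr=0.01, epochs=5):
    # Gradient descent on mean squared error for y = w * x,
    # using only the data held on this one device.
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(global_w, devices):
    # Send the current global weight to every device, train locally,
    # then average the returned weights, weighted by sample count.
    # The raw (x, y) pairs never leave their device.
    total = sum(len(d) for d in devices)
    local_ws = [local_train(global_w, d) for d in devices]
    return sum(w * len(d) for w, d in zip(local_ws, devices)) / total

# Hypothetical data: three devices, each holding pairs from y = 3x.
devices = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]

w = 0.0
for _ in range(50):
    w = federated_round(w, devices)

print(w)  # should end up close to the true slope of 3
```

Only model weights travel between the devices and the server here, which is the property the excerpt highlights. Real systems typically add secure aggregation, client sampling and much larger models on top of this loop.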