Transformers
Pros:
constant path length between any two tokens
(contrary to RNNs, where the first token was far from the last one)
parallelization
Cons:
self-attention: quadratic in time and space in the sequence length (scaling is an issue)
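The quadratic cost comes from the attention score matrix, which compares every token with every other token. A minimal sketch (not from the notes; the function and weight names are illustrative) of scaled dot-product self-attention:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (n, d) token embeddings; Wq/Wk/Wv: (d, d) hypothetical projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # (n, n) matrix: quadratic in sequence length n
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V  # (n, d) output

n, d = 8, 4
rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)
```

Doubling n quadruples the size of `scores`, which is why long contexts are expensive for vanilla Transformers.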
History
1951 (Shannon) [Statistical] 3-gram model: a big lookup table P(word | two prev. words) computed on all available texts.
result sample: a bunch of nonsense, but the words come from the same "space"
2011 (Sutskever) [Neural] RNN
result sample: still nonsense, but it has a "flow", we can read it as a correct sentence.
2016 (Jozefowicz) [Neural] LSTM
result sample: a bit of sense; some samples, even long ones, might be interpreted in some intelligent way.
2018 (Liu, Saleh) Transformers
result sample: sentences make sense, though the logic might be wrong; nonsense still occurs.
2019 (Radford) GPT-2
result sample: the model can make consistent sentences, coherent across many paragraphs, forming stories; nonsense still occurs.
2020 (Brown) GPT-3
result sample: fully makes sense; able to flow across many paragraphs; inherits the style of the prompt text (e.g. poetic aspects).