Embeddings

An embedding layer is a trainable lookup table.

Keras doc: Embedding turns positive integers (indexes) into dense vectors of fixed size.

Why Embeddings?

  1. One-hot encoded (OHE) vectors are high-dimensional and sparse: each vector is as long as the vocabulary, len(one-hot vector) = len(vocab) (see the sketch after this list)

  2. Similar words learn similar vectors (e.g. queen ≈ king - man + woman)
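
A minimal sketch of point 1 (PyTorch assumed; the vocabulary and embedding sizes are made up for illustration): a one-hot vector for a single word has one entry per vocabulary word, while its embedding is a short dense, trainable vector.

import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, emb_size = 10_000, 64                       # hypothetical sizes
word_id = torch.tensor([42])                            # one token id

one_hot = F.one_hot(word_id, num_classes=vocab_size)    # shape (1, 10000), all zeros except a single 1
dense = nn.Embedding(vocab_size, emb_size)(word_id)     # shape (1, 64), dense trainable values

print(one_hot.shape, dense.shape)                       # torch.Size([1, 10000]) torch.Size([1, 64])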

text = "deep learning is very deep and will be deeper"

tokens = ["deep", "learn", "be", "very", "deep", "and", "be", "be", "deep"]

vocab = { 'deep' : 1, 'learn' : 2, 'be' : 3, 'very' : 4, 'and' : 5 }

sentence = [1, 2, 3, 4, 1, 5, 3, 3, 1]

emb('deep') = [.32, .02, .48]
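
A rough PyTorch sketch of this toy example (not from the original notes): since the toy vocab is 1-indexed, num_embeddings is set to 6 so that index 5 is a valid row; the printed vector is random at initialization, not necessarily [.32, .02, .48].

import torch
import torch.nn as nn

vocab = {'deep': 1, 'learn': 2, 'be': 3, 'very': 4, 'and': 5}
sentence = torch.tensor([1, 2, 3, 4, 1, 5, 3, 3, 1])

# Largest index is 5, so at least 6 rows are needed; row 0 is unused here
# (it is commonly reserved for padding).
emb = nn.Embedding(num_embeddings=6, embedding_dim=3)

print(emb(torch.tensor(vocab['deep'])))   # a trainable 3-dim vector, e.g. tensor([0.32, 0.02, 0.48])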


nn.Embedding(5, 3)

vocab_size = 5, emb_size = 3

emb = nn.Embedding(vocab_size, emb_size)


For an input x of n token ids (a 1-D tensor of length n), where every id is in [0, vocab_size - 1]:

emb(x) is of size (n x emb_size).
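
A quick shape check under those assumptions (the toy sentence re-indexed from 0 so every id is a valid row index):

import torch
import torch.nn as nn

vocab_size, emb_size = 5, 3
emb = nn.Embedding(vocab_size, emb_size)

# The toy sentence re-indexed from 0 so that every id is in [0, vocab_size - 1].
x = torch.tensor([0, 1, 2, 3, 0, 4, 2, 2, 0])   # n = 9 token ids
print(emb(x).shape)                             # torch.Size([9, 3]) -> (n, emb_size)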