Language modeling

Goal: predict the next word given the context.

The simple model:

  • probability of the next word = the word's relative frequency in the training data

  • Independence assumption: each word is predicted independently of its context
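
A minimal sketch of the simple model above, assuming whitespace tokenization and a toy corpus. The corpus and the names train_unigram and predict_next are made up for illustration and are not from these notes. It estimates each word's probability as its relative frequency and, because of the independence assumption, always predicts the same most frequent word regardless of context.

```python
from collections import Counter


def train_unigram(tokens):
    """Estimate P(word) as count(word) / total number of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def predict_next(probs):
    """Return the most probable word; the context is ignored entirely."""
    return max(probs, key=probs.get)


# Toy training corpus (an assumption for illustration only).
corpus = "the cat sat on the mat".split()
probs = train_unigram(corpus)

print(predict_next(probs))      # the
print(probs.get("dog", 0.0))    # 0.0, "dog" never appeared in training
```

The last line shows the drawback in practice: a word that never appeared in training gets probability zero.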

Drawbacks:

❌ Words not seen in training get zero probability.

  • 💡 Split unseen words into smaller parts that were seen in training
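
One possible way to split an unseen word into parts is a greedy longest-match against a vocabulary of known pieces, falling back to single characters when nothing matches. This is only a sketch under those assumptions: the vocabulary below and the name split_into_parts are hypothetical, and the notes do not say which splitting method is intended.

```python
def split_into_parts(word, vocab):
    """Greedily split `word` into the longest pieces found in `vocab`.

    Falls back to a single character when no known piece matches,
    so every word can be split into something.
    """
    parts = []
    i = 0
    while i < len(word):
        # Try the longest candidate piece first, then shorter ones.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                parts.append(word[i:j])
                i = j
                break
        else:
            # No known piece starts here: emit one character.
            parts.append(word[i])
            i += 1
    return parts


# Hypothetical vocabulary of pieces seen in training.
vocab = {"un", "seen", "see", "word", "s"}

print(split_into_parts("unseen", vocab))  # ['un', 'seen']
print(split_into_parts("words", vocab))   # ['word', 's']
```

Each resulting part can then be given its own probability, so the whole word no longer gets zero.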

FAQ

Why do we model words, not characters?

Character-level models might have problems with grammar, white space, and decoding.