Language modeling
Predict the next word given the context:
The simple model:
P(next word) = the word's relative frequency in the training data
Independence assumption: the next word does not depend on the context
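A minimal sketch of the simple model in Python, assuming whitespace tokenization and a toy corpus. The names `train_unigram` and `predict_next` and the corpus itself are illustrative assumptions, not part of the original notes.

```python
from collections import Counter


def train_unigram(tokens):
    """Estimate P(word) as the word's relative frequency in the training tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def predict_next(model, context=None):
    """Return the most probable word.

    The context argument is ignored on purpose: under the independence
    assumption the prediction is the same whatever came before.
    """
    return max(model, key=model.get)


# Toy training corpus (hypothetical), split on whitespace.
corpus = "the cat sat on the mat".split()
unigram = train_unigram(corpus)
```

With this corpus, `predict_next(unigram, context=["the", "cat"])` and `predict_next(unigram)` return the same word, which shows how little the simple model uses the context.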
Drawbacks:
❌ Words not seen in training get zero probability of appearing.
💡 Split unseen words into parts.
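The drawback and the workaround can be sketched as follows. This is only an illustration: the part vocabulary is hypothetical, and greedy longest-match splitting with a single-character fallback is an assumed strategy, since the notes do not say how the parts are chosen.

```python
from collections import Counter


def word_probability(word, counts):
    """Relative-frequency estimate; a word never seen in training gets 0.0."""
    total = sum(counts.values())
    return counts.get(word, 0) / total


def split_into_parts(word, part_vocab):
    """Greedily split a word into the longest known parts, left to right.

    If no known part matches at a position, fall back to a single character,
    so every word can be split.
    """
    parts = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, down to a single character.
        for j in range(len(word), i, -1):
            if word[i:j] in part_vocab or j == i + 1:
                parts.append(word[i:j])
                i = j
                break
    return parts


# Toy training counts and a hypothetical vocabulary of word parts.
counts = Counter("we walk and we play".split())
part_vocab = {"walk", "play", "ing", "ed"}

unseen_probability = word_probability("walking", counts)
walking_parts = split_into_parts("walking", part_vocab)
```

Here "walking" never occurs in the training text, so its word-level probability is zero, yet it splits into parts that can each be modeled.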
FAQ
Why do we model words, not characters?
Character-level models might have problems with grammar, white space, and decoding.