Entropy machine learning

10/5/2023

The NLP book I have been reading has the following statement about the relationship between information entropy and language models: the essential point is that if a model captures more of the structure of a language, then the entropy of the model should be lower. In other words, we can use entropy as a measure of the quality of our models.

Suppose we have a machine that spits out $2$ characters, A and B, one by one, and the designer of the machine makes A and B equally probable. I tried to model it through experiment. During an initial experiment, I watched the machine spit out a character sequence and tried two models for it. The second model is obviously better than the first one, yet the entropies of the two model distributions do not rank them that way. My point is that, due to the arbitrariness of the model being tried, we cannot blindly say that a smaller entropy indicates a better model.

After I re-digested the mentioned NLP book, I realized that what I calculated is actually the entropy of the language model's own distribution. It cannot be used to evaluate the effectiveness of a language model. To evaluate a language model, we should measure how much surprise it gives us on real sequences in that language. For each real word encountered, the language model gives a probability $p$, and we use $-\log(p)$ to quantify the surprise. We then average the total surprise over a long enough sequence.

So, in the case of a 1000-letter sequence with 500 A's and 500 B's, the average surprise given by the 1/3-2/3 model will be:

$$\frac{500 \cdot (-\log_2 \tfrac{1}{3}) + 500 \cdot (-\log_2 \tfrac{2}{3})}{1000} \approx 1.085 \text{ bits per character,}$$

while the correct 1/2-1/2 model will give:

$$\frac{500 \cdot (-\log_2 \tfrac{1}{2}) + 500 \cdot (-\log_2 \tfrac{1}{2})}{1000} = 1 \text{ bit per character.}$$

So we can see that the 1/3-2/3 model gives more surprise, which indicates it is worse than the correct model. Only when the sequence is long enough will the average mimic the expectation over the 1/2-1/2 distribution; if the sequence is short, it won't give a convincing result. (A short sketch of both calculations is given at the end of this post.)

I didn't mention cross-entropy here, since I think that jargon is too intimidating and not very helpful for revealing the root cause.

I'd like to share a bit more about my understanding of the surprise mentioned above. The word surprise here is actually a synonym for information gain.
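For anyone who wants to check the numbers, here is a minimal sketch of both calculations: the entropy of each model's own distribution, and the average surprise each model gives on the real sequence. It is not code from the original post; the function names and the choice of log base 2 are assumptions, while the two models and the 1000-letter sequence with 500 A's and 500 B's are the ones discussed above.

```python
# A minimal sketch (not from the original post) of the two calculations above.
# The 1/3-2/3 and 1/2-1/2 models and the 1000-letter sequence come from the
# post; the function names and the use of log base 2 are my own assumptions.
import math

def model_entropy(model):
    """Entropy of the model's own distribution: -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in model.values())

def average_surprise(sequence, model):
    """Average surprise -log2(p(c)) the model assigns per character of a real sequence."""
    return sum(-math.log2(model[c]) for c in sequence) / len(sequence)

wrong_model = {"A": 1 / 3, "B": 2 / 3}    # an arbitrary model of the machine
correct_model = {"A": 1 / 2, "B": 1 / 2}  # the machine's true distribution

# Entropy of the model distribution itself: the wrong model comes out smaller,
# which is exactly the trap described above.
print(f"entropy of 1/3-2/3 model: {model_entropy(wrong_model):.3f} bits")    # ~0.918
print(f"entropy of 1/2-1/2 model: {model_entropy(correct_model):.3f} bits")  # 1.000

# Average surprise on a real sequence from the machine (500 A's and 500 B's):
# now the correct model surprises us less, so it is the better one.
sequence = "A" * 500 + "B" * 500
print(f"surprise under 1/3-2/3 model: {average_surprise(sequence, wrong_model):.3f} bits/char")    # ~1.085
print(f"surprise under 1/2-1/2 model: {average_surprise(sequence, correct_model):.3f} bits/char")  # 1.000
```

Replacing the 1000-letter sequence with a much shorter one shows the noisiness mentioned above: the letter counts deviate from an exact 50/50 split, and the comparison between the two models becomes less convincing.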