An Alternate Formulation of The Mutual Information Statistic Which Yields a More Realistic Measure, Leading to a More Precise and Dependable Model of Natural Language Translation Probabilities Among Parallel Corpora
DOI:
https://doi.org/10.26821/IJSHRE.13.01.2025.130104
Keywords:
Mutual Information, Machine Translation, Parallel Corpus, Natural Language, Probabilities, Model Parameters
Abstract
Mutual Information (MI) is an important statistic that quantifies the relation between two probability distributions in terms of the information shared across the two parallel linguistic corpora under consideration. It is a significant conceptual model that underlies techniques for exploring the correspondence between parallel corpora for similarity and translation. The traditional formulation of Mutual Information is useful for modelling parallel corpus instances and drawing inferences for the corresponding translations. Its inherent shortcoming, however, is a loss of generality: because it relies on specific corpus instances, the inferred model parameters lack the precision and reliability needed for general use. This work attempts to address that shortcoming by introducing a novel formulation of (True) Mutual Information that is robust against this loss of generality. The formulation rests on the observation that the (True) MI statistic can be obtained as a measure of the deviation of the individual probabilities of the distribution from critical values (as will be elaborated), which are obtained by computing the Takens embedding of the distribution. The resulting measure is robust in generality because the Takens embedding captures how the distribution tends to move in time, giving a truer picture that is independent of the particular corpus instances chosen.
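As a rough illustration of the two standard ingredients named in the abstract, the sketch below computes the traditional Mutual Information from a table of co-occurrence counts and builds delay-coordinate (Takens-style) vectors from a sequence. It is not the (True) MI formulation proposed in this work, whose critical values are not specified here. The function names `mutual_information` and `delay_embedding`, the count-table input, and the parameters `dim` and `tau` are assumptions made for the example.

```python
import math


def mutual_information(joint_counts):
    """Traditional MI in bits from a 2-D table of co-occurrence counts.

    joint_counts[i][j] is assumed to hold how often source item i was
    aligned with target item j in one particular parallel corpus.
    """
    total = sum(sum(row) for row in joint_counts)
    row_sums = [sum(row) for row in joint_counts]
    col_sums = [sum(col) for col in zip(*joint_counts)]

    mi = 0.0
    for i, row in enumerate(joint_counts):
        for j, count in enumerate(row):
            if count == 0:
                continue  # a zero cell contributes nothing to the sum
            p_xy = count / total
            p_x = row_sums[i] / total
            p_y = col_sums[j] / total
            mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


def delay_embedding(series, dim, tau):
    """Delay-coordinate vectors (x[i], x[i + tau], ..., x[i + (dim - 1) * tau]).

    This is the standard construction behind a Takens embedding; how the
    paper applies it to a probability distribution is not shown here.
    """
    n_vectors = len(series) - (dim - 1) * tau
    return [
        tuple(series[i + k * tau] for k in range(dim))
        for i in range(n_vectors)
    ]


if __name__ == "__main__":
    # Hypothetical counts: two source words perfectly aligned with two target words.
    print(mutual_information([[5, 0], [0, 5]]))
    # Hypothetical probability estimates observed over successive corpus samples.
    print(delay_embedding([0.10, 0.12, 0.11, 0.15, 0.14], dim=2, tau=1))
```

Because the count table comes from a single corpus instance, the MI value changes whenever the corpus does, which is the dependence on specific instances that the abstract describes.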