Higher-order derivative test: Difference between revisions
en>Gwen-chan m Reverted edit(s) by 59.177.164.201 identified as test/vandalism using STiki |
en>Addbot m Bot: Migrating 1 interwiki links, now provided by Wikidata on d:q5757974 |
||
Line 1: | Line 1: | ||
'''Statistical machine translation''' ('''SMT''') is a [[machine translation]] [[paradigm]] where translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual [[text corpora]]. The statistical approach contrasts with the rule-based approaches to [[machine translation]] as well as with [[example-based machine translation]].{{fact|date=September 2013}} | |||
The first ideas of statistical machine translation were introduced by [[Warren Weaver]] in 1949,<ref>W. Weaver (1955). Translation (1949). In: ''Machine Translation of Languages'', MIT Press, Cambridge, MA.</ref> including the ideas of applying [[Claude Shannon]]'s [[information theory]]. Statistical machine translation was re-introduced in 1993 by researchers at [[IBM]]'s [[Thomas J. Watson Research Center]]<ref>P. Brown, S. Della Pietra, V. Della Pietra, and R. Mercer (1993). The mathematics of statistical machine translation: parameter estimation. ''Computational Linguistics'', '''19(2)''', 263-311.</ref> and has contributed to the significant resurgence in interest in machine translation in recent years. Nowadays it is by far the most widely studied machine translation method. | |||
==Basis== | |||
The idea behind statistical machine translation comes from [[information theory]]. A document is translated according to the [[probability distribution]] <math>p(e|f)</math> that a string <math>e</math> in the target language (for example, English) is the translation of a string <math>f</math> in the source language (for example, French). | |||
The problem of modeling the probability distribution <math>p(e|f)</math> has been approached in a number of ways. One intuitive approach is to apply [[Bayes Theorem]], that is <math>p(e|f) \propto p(f|e) p(e)</math>, where the [[translation model]] <math>p(f|e)</math> is the probability that the source string is the translation of the target string, and the [[language model]] <math>p(e)</math> is the probability of seeing that target language string. This decomposition is attractive as it splits the problem into two subproblems. Finding the best translation <math>\tilde{e}</math> is done by picking up the one that gives the highest probability: | |||
:<math> \tilde{e} = arg \max_{e \in e^*} p(e|f) = arg \max_{e\in e^*} p(f|e) p(e) </math>. | |||
For a rigorous implementation of this one would have to perform an exhaustive search by going through all strings <math>e^*</math> in the native language. Performing the search efficiently is the work of a [[machine translation decoder]] that uses the foreign string, heuristics and other methods to limit the search space and at the same time keeping acceptable quality. This trade-off between quality and time usage can also be found in [[speech recognition]]. | |||
As the translation systems are not able to store all native strings and their translations, a document is typically translated sentence by sentence, but even this is not enough. Language models are typically approximated by [[smoothed n-gram model|smoothed ''n''-gram model]]s, and similar approaches have been applied to translation models, but there is additional complexity due to different sentence lengths and word orders in the languages. | |||
The statistical translation models were initially [[word]] based (Models 1-5 from [[IBM]] [[Hidden Markov model]] from Stephan Vogel<ref>S. Vogel, H. Ney and C. Tillmann. 1996. HMM-based Word Alignment in StatisticalTranslation. In COLING ’96: The 16th International Conference on Computational Linguistics, pp. 836-841, Copenhagen, Denmark.</ref> and Model 6 from Franz-Joseph Och<ref name="H. Ney. 2003">F. Och and H. Ney. (2003). A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics, 29(1):19-51</ref>), but significant advances were made with the introduction of [[phrase]] based models.<ref>P. Koehn, F.J. Och, and D. Marcu (2003). Statistical phrase based translation. In ''Proceedings of the Joint Conference on Human Language Technologies and the Annual Meeting of the North American Chapter of the Association of Computational Linguistics (HLT/NAACL)''.</ref> Recent work has incorporated [[syntax]] or quasi-syntactic structures.<ref>D. Chiang (2005). A Hierarchical Phrase-Based Model for Statistical Machine Translation. In ''Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05)''.</ref> | |||
==Benefits== | |||
The most frequently cited benefits of statistical machine translation over rule-based approach are: | |||
* '''Better use of resources''' | |||
**There is a great deal of natural language in machine-readable format. | |||
**Generally, SMT systems are not tailored to any specific pair of languages. | |||
**Rule-based translation systems require the manual development of linguistic rules, which can be costly, and which often do not generalize to other languages. | |||
* '''More natural translations''' | |||
**Rule-based translation systems are likely to result in Literal translation. While it appears that SMT should avoid this problem and result in natural translations, this is negated by the fact that using statistical matching to translate rather than a dictionary/grammar rules approach can often result in text that include apparently nonsensical and obvious errors. | |||
==Shortcomings== | |||
* Corpus creation can be costly for users with limited resources. | |||
* The results are unexpected. Superficial fluency can be deceiving. | |||
* Statistical machine translation do not work well between languages that have significantly different word orders (e.g. Japanese and European languages). | |||
* The benefits are overemphasized for European languages. | |||
==Word-based translation== | |||
In word-based translation, the fundamental unit of translation is a word in some natural language. Typically, the number of words in translated sentences are different, because of compound words, morphology and idioms. The ratio of the lengths of sequences of translated words is called fertility, which tells how many foreign words each native word produces. Necessarily it is assumed by information theory that each covers the same concept. In practice this is not really true. For example, the English word ''corner'' can be translated in Spanish by either ''rincón'' or ''esquina'', depending on whether it is to mean its internal or external angle. | |||
Simple word-based translation can't translate between languages with different fertility. Word-based translation systems can relatively simply be made to cope with high fertility, but they could map a single word to multiple words, but not the other way about{{Citation needed|date=March 2009}}. For example, if we were translating from English to French, each word in English could produce any number of French words— sometimes none at all. But there's no way to group two English words producing a single French word. | |||
An example of a word-based translation system is the freely available [[GIZA++]] package ([[GPL]]ed), which includes the training program for [[IBM]] models and HMM model and Model 6.<ref name="H. Ney. 2003"/> | |||
The word-based translation is not widely used today; phrase-based systems are more common. Most phrase-based system are still using GIZA++ to align the corpus{{Citation needed|date=March 2009}}. The alignments are used to extract phrases or deduce syntax rules.<ref>P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, E. Herbst. 2007. Moses: Open Source Toolkit for Statistical Machine Translation. ACL 2007, Demonstration Session, Prague, Czech Republic</ref> And matching words in bi-text is still a problem actively discussed in the community. Because of the predominance of GIZA++, there are now several distributed implementations of it online.<ref>Q. Gao, S. Vogel, "Parallel Implementations of Word Alignment Tool", Software Engineering, Testing, and Quality Assurance for Natural Language Processing, pp. 49-57, June, 2008</ref> | |||
==Phrase-based translation== | |||
In phrase-based translation, the aim is to reduce the restrictions of word-based translation by translating whole sequences of words, where the lengths may differ. The sequences of words are called blocks or phrases, but typically are not linguistic [[phrase]]s, but [[phraseme]]s found using statistical methods from corpora. It has been shown that restricting the phrases to linguistic phrases (syntactically motivated groups of words, see [[syntactic categories]]) decreases the quality of translation<ref>Philipp Koehn, Franz Josef Och, Daniel Marcu: Statistical Phrase-Based Translation (2003)</ref> | |||
==Syntax-based translation== | |||
Syntax-based translation is based on the idea of translating [[syntax (linguistics)|syntactic]] units, rather than single words or strings of words (as in phrase-based MT), i.e. (partial) [[parse tree]]s of sentences/utterances. The idea of syntax-based translation is quite old in MT, though its statistical counterpart did not take off until the advent of strong stochastic parsers in the 1990s. Examples of this approach include [[Data-oriented parsing|DOP]]-based MT and, more recently, [[synchronous context-free grammar]]s. | |||
==Hierarchical phrase-based translation== | |||
Hierarchical phrase-based translation combines the strengths of phrase-based and syntax-based translation. It uses phrases (segments or blocks of words) as units for translation and uses [[synchronous context-free grammar]]s as rules(syntax-based translation).<ref>David Chiang, Adam Lopez, Nitin Madnani, Christof Monz, Philip Resnik, Michael Subotin (2005) The Hiero Machine Translation System: Extensions, Evaluation, and Analysis</ref> Chiang et al. (2005) introduces Hiero as an example for this idea. | |||
==Challenges with statistical machine translation== | |||
{{Expand section|date=May 2012}} | |||
Problems that statistical machine translation have to deal with include: | |||
=== Sentence alignment === | |||
In parallel corpora single sentences in one language can be found translated into several sentences in the other and vice versa. Sentence aligning can be performed through the [[Gale-Church alignment algorithm]]. | |||
=== Statistical anomalies === | |||
Real-world training sets may override translations of, say, proper nouns. An example would be that "I took the train to Berlin" gets mis-translated as "I took the train to Paris" due to an abundance of "train to Paris" in the training set. | |||
=== Data dilution === | |||
A common anomaly caused when attempting to construct a new statistical model (engine) to represent a distinct terminology (for a specific corporate brand or domain). Training sets used from alternative sources to the specific brand to compensate for a limited quantity of brand-specific corpora may ‘dilute’ brand terminology, choice of words, text format and style. [http://www.machinetranslation.net/quick-guide-to-machine-translation/statistical-machine-translation-data-dilution-effect Data dilution] is a statistical anomaly unique to a subset of natural language and has shown a negative impact on Machine Translation adoption for commercial use. Various solutions exist that augment statistical MT and optimize translated text to resemble more accurately brand/domain-specific choice of terminology, words and style. | |||
=== Idioms === | |||
Depending on the corpora used, idioms may not translate "idiomatically". For example, using Canadian Hansard as the bilingual corpus, "hear" may almost invariably be translated to "Bravo!" since in Parliament "Hear, Hear!" becomes "Bravo!". | |||
<ref>W. J. Hutchins and H. Somers. (1992). ''An Introduction to Machine Translation'', 18.3:322. ISBN 978-1-2362830-5</ref> | |||
=== Different word orders === | |||
Word order in languages differ. Some classification can be done by naming the typical order of subject (S), verb (V) and object (O) in a sentence and one can talk, for instance, of SVO or VSO languages. There are also additional differences in word orders, for instance, where modifiers for nouns are located, or where the same words are used as a question or a statement. | |||
In [[speech recognition]], the speech signal and the corresponding textual representation can be mapped to each other in blocks in order. This is not always the case with the same text in two languages. For SMT, the machine translator can only manage small sequences of words, and word order has to be thought of by the program designer. Attempts at solutions have included re-ordering models, where a distribution of location changes for each item of translation is guessed from aligned bi-text. Different location changes can be ranked with the help of the language model and the best can be selected. | |||
=== Out of vocabulary (OOV) words === | |||
SMT systems store different word forms as separate symbols without any relation to each other and word forms | |||
or phrases that were not in the training data cannot be translated. This might be because of the lack of training data, changes in the human domain where the system is used, or differences in morphology. | |||
==See also== | |||
*[[Apptek|AppTek]] | |||
*[[Asia Online]] | |||
*[[KantanMT]] | |||
*[[Cache language model]] | |||
*[[Example-based machine translation]] | |||
*[[Google Translate]] | |||
*[[Machine translation]] | |||
*[[Moses (machine translation)]], free software | |||
*[[Language Weaver|SDL Language Weaver]] | |||
*[[Duolingo]] | |||
==References== | |||
{{reflist}} | |||
==External links== | |||
* [http://www.statmt.org/ Statistical Machine Translation] — includes introduction to research, conference, corpus and software listings. | |||
* [http://www.statmt.org/moses/ Moses: a state-of-the-art open source SMT system] | |||
* [http://www.asiaonline.net/ Asia Online Language Studio Platform] | |||
* [http://www.machinetranslation.net/ A Quick Guide to Machine Translation] | |||
* [http://www-nlp.stanford.edu/links/statnlp.html Annotated list of statistical natural language processing resources] — Includes links to freely available statistical machine translation software. | |||
* [http://code.google.com/p/giza-pp/ GIZA++: Word Alignment Tool] | |||
* [http://geek.kyloo.net/software/doku.php/mgiza:overview MGIZA++/PGIZA++ Parallel Implementations of GIZA++] | |||
* [http://www.cunei.org/ Cunei] — an open source platform for data-driven machine translation that combines the approaches of [[SMT]] and [[EBMT]] | |||
* [http://olanto.org/software Olanto] — an open source platform for statistical machine translation | |||
* [https://github.com/daormar/thot Thot] — an open source SMT tool including interactive machine translation and incremental learning | |||
* [http://sishitra.iti.upv.es/ SiShiTra] — A hybrid machine translation engine for Spanish-Catalan translation] | |||
* [http://www.safaba.com/machine-translation/machine-translation-technologies/statistical-machine-translation/ Statistical MT - Overview] | |||
* [http://prhlt.iti.upv.es/content.php?page=software.php GREAT] — Giati and Refx Enhanced via Annotation Techniques] | |||
* [http://www.garuda.dikti.go.id/jurnal/detil/id/0:168667/q/pengarang:Tanuwijaya%20hansel/offset/0/limit/15 Garuda DIKTI] — an open national journal | |||
* [http://jiki.cs.ui.ac.id/index.php/jiki/article/view/122 JIKI NATIONAL] — an open national journal | |||
{{Approaches to machine translation}} | |||
{{DEFAULTSORT:Statistical Machine Translation}} | |||
[[Category:Machine translation]] | |||
[[Category:Statistical natural language processing]] |
Revision as of 11:08, 15 March 2013
Statistical machine translation (SMT) is a machine translation paradigm where translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. The statistical approach contrasts with the rule-based approaches to machine translation as well as with example-based machine translation.Template:Fact
The first ideas of statistical machine translation were introduced by Warren Weaver in 1949,[1] including the ideas of applying Claude Shannon's information theory. Statistical machine translation was re-introduced in 1993 by researchers at IBM's Thomas J. Watson Research Center[2] and has contributed to the significant resurgence in interest in machine translation in recent years. Nowadays it is by far the most widely studied machine translation method.
Basis
The idea behind statistical machine translation comes from information theory. A document is translated according to the probability distribution that a string in the target language (for example, English) is the translation of a string in the source language (for example, French).
The problem of modeling the probability distribution has been approached in a number of ways. One intuitive approach is to apply Bayes Theorem, that is , where the translation model is the probability that the source string is the translation of the target string, and the language model is the probability of seeing that target language string. This decomposition is attractive as it splits the problem into two subproblems. Finding the best translation is done by picking up the one that gives the highest probability:
For a rigorous implementation of this one would have to perform an exhaustive search by going through all strings in the native language. Performing the search efficiently is the work of a machine translation decoder that uses the foreign string, heuristics and other methods to limit the search space and at the same time keeping acceptable quality. This trade-off between quality and time usage can also be found in speech recognition.
As the translation systems are not able to store all native strings and their translations, a document is typically translated sentence by sentence, but even this is not enough. Language models are typically approximated by smoothed n-gram models, and similar approaches have been applied to translation models, but there is additional complexity due to different sentence lengths and word orders in the languages.
The statistical translation models were initially word based (Models 1-5 from IBM Hidden Markov model from Stephan Vogel[3] and Model 6 from Franz-Joseph Och[4]), but significant advances were made with the introduction of phrase based models.[5] Recent work has incorporated syntax or quasi-syntactic structures.[6]
Benefits
The most frequently cited benefits of statistical machine translation over rule-based approach are:
- Better use of resources
- There is a great deal of natural language in machine-readable format.
- Generally, SMT systems are not tailored to any specific pair of languages.
- Rule-based translation systems require the manual development of linguistic rules, which can be costly, and which often do not generalize to other languages.
- More natural translations
- Rule-based translation systems are likely to result in Literal translation. While it appears that SMT should avoid this problem and result in natural translations, this is negated by the fact that using statistical matching to translate rather than a dictionary/grammar rules approach can often result in text that include apparently nonsensical and obvious errors.
Shortcomings
- Corpus creation can be costly for users with limited resources.
- The results are unexpected. Superficial fluency can be deceiving.
- Statistical machine translation do not work well between languages that have significantly different word orders (e.g. Japanese and European languages).
- The benefits are overemphasized for European languages.
Word-based translation
In word-based translation, the fundamental unit of translation is a word in some natural language. Typically, the number of words in translated sentences are different, because of compound words, morphology and idioms. The ratio of the lengths of sequences of translated words is called fertility, which tells how many foreign words each native word produces. Necessarily it is assumed by information theory that each covers the same concept. In practice this is not really true. For example, the English word corner can be translated in Spanish by either rincón or esquina, depending on whether it is to mean its internal or external angle.
Simple word-based translation can't translate between languages with different fertility. Word-based translation systems can relatively simply be made to cope with high fertility, but they could map a single word to multiple words, but not the other way aboutPotter or Ceramic Artist Truman Bedell from Rexton, has interests which include ceramics, best property developers in singapore developers in singapore and scrabble. Was especially enthused after visiting Alejandro de Humboldt National Park.. For example, if we were translating from English to French, each word in English could produce any number of French words— sometimes none at all. But there's no way to group two English words producing a single French word.
An example of a word-based translation system is the freely available GIZA++ package (GPLed), which includes the training program for IBM models and HMM model and Model 6.[4]
The word-based translation is not widely used today; phrase-based systems are more common. Most phrase-based system are still using GIZA++ to align the corpusPotter or Ceramic Artist Truman Bedell from Rexton, has interests which include ceramics, best property developers in singapore developers in singapore and scrabble. Was especially enthused after visiting Alejandro de Humboldt National Park.. The alignments are used to extract phrases or deduce syntax rules.[7] And matching words in bi-text is still a problem actively discussed in the community. Because of the predominance of GIZA++, there are now several distributed implementations of it online.[8]
Phrase-based translation
In phrase-based translation, the aim is to reduce the restrictions of word-based translation by translating whole sequences of words, where the lengths may differ. The sequences of words are called blocks or phrases, but typically are not linguistic phrases, but phrasemes found using statistical methods from corpora. It has been shown that restricting the phrases to linguistic phrases (syntactically motivated groups of words, see syntactic categories) decreases the quality of translation[9]
Syntax-based translation
Syntax-based translation is based on the idea of translating syntactic units, rather than single words or strings of words (as in phrase-based MT), i.e. (partial) parse trees of sentences/utterances. The idea of syntax-based translation is quite old in MT, though its statistical counterpart did not take off until the advent of strong stochastic parsers in the 1990s. Examples of this approach include DOP-based MT and, more recently, synchronous context-free grammars.
Hierarchical phrase-based translation
Hierarchical phrase-based translation combines the strengths of phrase-based and syntax-based translation. It uses phrases (segments or blocks of words) as units for translation and uses synchronous context-free grammars as rules(syntax-based translation).[10] Chiang et al. (2005) introduces Hiero as an example for this idea.
Challenges with statistical machine translation
Problems that statistical machine translation have to deal with include:
Sentence alignment
In parallel corpora single sentences in one language can be found translated into several sentences in the other and vice versa. Sentence aligning can be performed through the Gale-Church alignment algorithm.
Statistical anomalies
Real-world training sets may override translations of, say, proper nouns. An example would be that "I took the train to Berlin" gets mis-translated as "I took the train to Paris" due to an abundance of "train to Paris" in the training set.
Data dilution
A common anomaly caused when attempting to construct a new statistical model (engine) to represent a distinct terminology (for a specific corporate brand or domain). Training sets used from alternative sources to the specific brand to compensate for a limited quantity of brand-specific corpora may ‘dilute’ brand terminology, choice of words, text format and style. Data dilution is a statistical anomaly unique to a subset of natural language and has shown a negative impact on Machine Translation adoption for commercial use. Various solutions exist that augment statistical MT and optimize translated text to resemble more accurately brand/domain-specific choice of terminology, words and style.
Idioms
Depending on the corpora used, idioms may not translate "idiomatically". For example, using Canadian Hansard as the bilingual corpus, "hear" may almost invariably be translated to "Bravo!" since in Parliament "Hear, Hear!" becomes "Bravo!". [11]
Different word orders
Word order in languages differ. Some classification can be done by naming the typical order of subject (S), verb (V) and object (O) in a sentence and one can talk, for instance, of SVO or VSO languages. There are also additional differences in word orders, for instance, where modifiers for nouns are located, or where the same words are used as a question or a statement.
In speech recognition, the speech signal and the corresponding textual representation can be mapped to each other in blocks in order. This is not always the case with the same text in two languages. For SMT, the machine translator can only manage small sequences of words, and word order has to be thought of by the program designer. Attempts at solutions have included re-ordering models, where a distribution of location changes for each item of translation is guessed from aligned bi-text. Different location changes can be ranked with the help of the language model and the best can be selected.
Out of vocabulary (OOV) words
SMT systems store different word forms as separate symbols without any relation to each other and word forms or phrases that were not in the training data cannot be translated. This might be because of the lack of training data, changes in the human domain where the system is used, or differences in morphology.
See also
- AppTek
- Asia Online
- KantanMT
- Cache language model
- Example-based machine translation
- Google Translate
- Machine translation
- Moses (machine translation), free software
- SDL Language Weaver
- Duolingo
References
43 year old Petroleum Engineer Harry from Deep River, usually spends time with hobbies and interests like renting movies, property developers in singapore new condominium and vehicle racing. Constantly enjoys going to destinations like Camino Real de Tierra Adentro.
External links
- Statistical Machine Translation — includes introduction to research, conference, corpus and software listings.
- Moses: a state-of-the-art open source SMT system
- Asia Online Language Studio Platform
- A Quick Guide to Machine Translation
- Annotated list of statistical natural language processing resources — Includes links to freely available statistical machine translation software.
- GIZA++: Word Alignment Tool
- MGIZA++/PGIZA++ Parallel Implementations of GIZA++
- Cunei — an open source platform for data-driven machine translation that combines the approaches of SMT and EBMT
- Olanto — an open source platform for statistical machine translation
- Thot — an open source SMT tool including interactive machine translation and incremental learning
- SiShiTra — A hybrid machine translation engine for Spanish-Catalan translation]
- Statistical MT - Overview
- GREAT — Giati and Refx Enhanced via Annotation Techniques]
- Garuda DIKTI — an open national journal
- JIKI NATIONAL — an open national journal
Template:Approaches to machine translation
- ↑ W. Weaver (1955). Translation (1949). In: Machine Translation of Languages, MIT Press, Cambridge, MA.
- ↑ P. Brown, S. Della Pietra, V. Della Pietra, and R. Mercer (1993). The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19(2), 263-311.
- ↑ S. Vogel, H. Ney and C. Tillmann. 1996. HMM-based Word Alignment in StatisticalTranslation. In COLING ’96: The 16th International Conference on Computational Linguistics, pp. 836-841, Copenhagen, Denmark.
- ↑ 4.0 4.1 F. Och and H. Ney. (2003). A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics, 29(1):19-51
- ↑ P. Koehn, F.J. Och, and D. Marcu (2003). Statistical phrase based translation. In Proceedings of the Joint Conference on Human Language Technologies and the Annual Meeting of the North American Chapter of the Association of Computational Linguistics (HLT/NAACL).
- ↑ D. Chiang (2005). A Hierarchical Phrase-Based Model for Statistical Machine Translation. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05).
- ↑ P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, E. Herbst. 2007. Moses: Open Source Toolkit for Statistical Machine Translation. ACL 2007, Demonstration Session, Prague, Czech Republic
- ↑ Q. Gao, S. Vogel, "Parallel Implementations of Word Alignment Tool", Software Engineering, Testing, and Quality Assurance for Natural Language Processing, pp. 49-57, June, 2008
- ↑ Philipp Koehn, Franz Josef Och, Daniel Marcu: Statistical Phrase-Based Translation (2003)
- ↑ David Chiang, Adam Lopez, Nitin Madnani, Christof Monz, Philip Resnik, Michael Subotin (2005) The Hiero Machine Translation System: Extensions, Evaluation, and Analysis
- ↑ W. J. Hutchins and H. Somers. (1992). An Introduction to Machine Translation, 18.3:322. ISBN 978-1-2362830-5