In [[natural language processing]], '''semantic compression''' is a process of compacting a lexicon used to build
a textual document (or a set of documents) by reducing language heterogeneity, while maintaining text [[semantics]].
As a result, the same ideas can be represented using a smaller set of words.
 
Semantic compression is a [[lossy compression]], that is, some data is being discarded, and an original document
cannot be reconstructed in a reverse process.
 
==Semantic compression by generalization==
Semantic compression is typically achieved in two steps, using [[frequency list|frequency dictionaries]] and a [[semantic network]]:
# determining cumulative term frequencies to identify the target lexicon,
# replacing less frequent terms with their hypernyms ([[generalization]]) from the target lexicon.<ref>[http://dx.doi.org/10.1007/978-3-642-12090-9_10 D. Ceglarek, K. Haniewicz, W. Rutkowski, Semantic Compression for Specialised Information Retrieval Systems], Advances in Intelligent Information and Database Systems, vol. 283, p. 111-121, 2010</ref>
 
Step 1 requires assembling word frequencies and
information on semantic relationships, specifically [[hyponymy]]. Moving upwards in the word hierarchy,
the cumulative concept frequency is calculated by adding the sum of the hyponyms' frequencies to the frequency of their hypernym:
<math>\operatorname{cumf}(k_{i}) = f(k_{i}) + \sum_{j} \operatorname{cumf}(k_{j})</math> where the sum runs over the hyponyms <math>k_{j}</math> of <math>k_{i}</math>.
Then, the desired number of words with the highest cumulative frequencies is chosen to build the target lexicon.
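
A minimal sketch of this step in Python, assuming the term frequencies and the hyponym–hypernym relation are already available as plain dictionaries (the names <code>frequencies</code>, <code>hypernym_of</code> and <code>target_size</code> are illustrative and not taken from the cited work):

<syntaxhighlight lang="python">
from collections import defaultdict

# Illustrative data: raw term frequencies and a (hyponym -> hypernym) relation,
# e.g. extracted from a corpus and a semantic network.
frequencies = {"insect": 10, "wasp": 4, "bee": 7, "animal": 2}
hypernym_of = {"wasp": "insect", "bee": "insect", "insect": "animal"}

def cumulative_frequencies(frequencies, hypernym_of):
    """Propagate each term's frequency up to all of its ancestors, so that
    cumf(k_i) = f(k_i) + sum of cumf over the direct hyponyms k_j of k_i."""
    cumf = defaultdict(int, frequencies)
    for term, freq in frequencies.items():
        ancestor = hypernym_of.get(term)
        while ancestor is not None:
            cumf[ancestor] += freq
            ancestor = hypernym_of.get(ancestor)
    return dict(cumf)

cumf = cumulative_frequencies(frequencies, hypernym_of)

# Keep the terms with the highest cumulative frequencies as the target lexicon.
target_size = 2
target_lexicon = set(sorted(cumf, key=cumf.get, reverse=True)[:target_size])
print(cumf)            # {'insect': 21, 'wasp': 4, 'bee': 7, 'animal': 23}
print(target_lexicon)  # e.g. {'animal', 'insect'}
</syntaxhighlight>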
 
In the second step, compression mapping rules are defined for the remaining words, so that every occurrence
of a less frequent hyponym is treated as its hypernym in the output text.
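
Continuing the illustrative data from the sketch above, step 2 can be sketched as walking up the hierarchy from each out-of-lexicon term until a term from the target lexicon is found:

<syntaxhighlight lang="python">
def compression_rule(term, hypernym_of, target_lexicon):
    """Map a term to its nearest ancestor in the target lexicon."""
    current = term
    while current is not None and current not in target_lexicon:
        current = hypernym_of.get(current)
    return current if current is not None else term  # no ancestor found: keep the term

def compress(tokens, hypernym_of, target_lexicon):
    return [compression_rule(t, hypernym_of, target_lexicon) for t in tokens]

print(compress(["bee", "wasp", "insect"], hypernym_of, target_lexicon))
# -> ['insect', 'insect', 'insect']
</syntaxhighlight>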
 
;Example
 
The fragment of text below has been processed by semantic compression. Words in bold have been replaced by their hypernyms.
 
<blockquote>They are both '''nest''' building '''social insects''', but '''paper wasps''' and honey '''bees''' '''organize''' their '''colonies'''
in very different '''ways'''. In a new study, researchers report that despite their '''differences''', these insects
'''rely on''' the same network of genes to guide their '''social behavior'''. The study appears in the Proceedings of the
'''Royal Society B''': Biological Sciences. Honey '''bees''' and '''paper wasps''' are separated by more than 100 million years of
'''evolution''', and there are '''striking differences''' in how they divvy up the work of '''maintaining''' a '''colony'''.</blockquote>
 
The procedure outputs the following text:
 
<blockquote>They are both '''facility''' building '''insect''', but '''insect''' and honey '''insects''' '''arrange''' their '''biological groups'''
in very different '''structure'''. In a new study, researchers report that despite their '''difference of opinions''', these insects
'''act''' the same network of genes to '''steer''' their '''party demeanor'''. The study appears in the proceeding of the
'''institution bacteria''' Biological Sciences. Honey '''insects''' and '''insect''' are separated by more than hundred million years of
'''organic process''', and there are '''impinging difference of opinions''' in how they divvy up the work of '''affirming''' a '''biological group'''.</blockquote>
 
==Implicit semantic compression==
A natural tendency to keep natural language expressions concise can be perceived as a form of implicit semantic compression: omitting words that carry little meaning, or redundant meaningful words (especially to avoid [[pleonasm]]s).<ref>[http://dx.doi.org/10.3115/990100.990155 N. N. Percova, On the types of semantic compression of text], COLING '82 Proceedings of the 9th Conference on Computational Linguistics, vol. 2, p. 229-231, 1982</ref>
 
==Applications and advantages==
In the [[vector space model]], compacting the lexicon leads to a reduction of [[curse of dimensionality|dimensionality]], which results in lower
[[computational complexity]] and a positive influence on efficiency.
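
As a rough, self-contained illustration of this effect (the mapping rules and documents below are toy examples, not from the cited work), replacing hyponyms with hypernyms before vectorisation shrinks the vocabulary and hence the number of dimensions in a bag-of-words representation:

<syntaxhighlight lang="python">
# Toy illustration: mapping hyponyms to hypernyms reduces the vocabulary size,
# i.e. the dimensionality of the resulting term-document vectors.
mapping = {"bee": "insect", "wasp": "insect", "ant": "insect"}  # illustrative rules

documents = [["bee", "colony"], ["wasp", "nest"], ["ant", "colony"]]
compressed = [[mapping.get(t, t) for t in doc] for doc in documents]

vocab_before = {t for doc in documents for t in doc}
vocab_after = {t for doc in compressed for t in doc}
print(len(vocab_before), len(vocab_after))  # 5 dimensions before, 3 after
</syntaxhighlight>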
 
Semantic compression is advantageous in information retrieval tasks, improving their effectiveness (in terms of both precision and recall).<ref>[http://dl.acm.org/citation.cfm?id=1947662.1947683 D. Ceglarek, K. Haniewicz, W. Rutkowski, Quality of semantic compression in classification], Proceedings of the 2nd International Conference on Computational Collective Intelligence: Technologies and Applications, vol. 1, p. 162-171, 2010</ref> This is due to more precise descriptors (reduced effect of language diversity – limited language redundancy, a step towards a controlled dictionary).
 
As in the example above, it is possible to display the output as natural text (by re-applying inflection and adding stop words).
 
==See also==
* [[Text simplification]]
* [[Lexical substitution]]
* [[Information theory]]
* [[Quantities of information]]
 
==References==
<references/>
 
==External links==
* [http://semantic.net.pl/semantic_compression.php Semantic compression on Project SENECA (Semantic Networks and Categorization) website]
 
[[Category:Information retrieval]]
[[Category:Natural language processing]]
[[Category:Quantitative linguistics]]
[[Category:Computational linguistics]]
