Main Page: Difference between revisions

A '''biaxial nematic''' is a spatially homogeneous [[liquid crystal]] with three distinct optical axes. This is to be contrasted with a simple [[nematic]], which has a single preferred axis around which the system is rotationally symmetric. The [[symmetry group]] of a biaxial nematic is <math>D_{2h}</math>, i.e. that of a rectangular right parallelepiped, having three orthogonal <math>C_2</math> axes and three orthogonal mirror planes. In a frame co-aligned with the optical axes, the second-rank [[order parameter]] [[tensor]] of a biaxial nematic has the form
:<math>
Q=
\begin{pmatrix}
-\frac{1}{2}S+T & 0 & 0 \\
0 & -\frac{1}{2}S-T & 0 \\
0 & 0 & S \\
\end{pmatrix}
</math>

where

<math>S</math> is the standard nematic scalar order parameter

<math>T</math> is a measure of the biaxiality.

{{machine learning bar}}
In [[machine learning]] and [[statistics]], '''feature selection''', also known as '''variable selection''', '''attribute selection''' or '''variable subset selection''', is the process of selecting a subset of relevant features for use in model construction.
The central assumption when using a feature selection technique is that the data contains many ''redundant'' or ''irrelevant'' features. Redundant features are those which provide no more information than the currently selected features, and irrelevant features provide no useful information in any context.
Feature selection techniques are a subset of the more general field of [[feature extraction]]. Feature extraction creates new features from functions of the original features, whereas feature selection returns a subset of the features.
Feature selection techniques are often used in domains where there are many features and comparatively few samples (or data points). The archetypal case is the use of feature selection in analysing [[DNA microarray]]s, where there are many thousands of features, and a few tens to hundreds of samples. Feature selection techniques provide three main benefits when constructing predictive models:
:* improved model interpretability,
:* shorter training times,
:* enhanced generalisation by reducing [[overfitting]].
Feature selection is also useful as part of the data analysis process, as it shows which features are important for prediction, and how these features are related.

==Introduction==
A feature selection algorithm can be seen as the combination of a search technique for proposing new feature subsets, along with an evaluation measure which scores the different feature subsets.
The simplest algorithm is to test each possible subset of features finding the one which minimises the error rate. This is an exhaustive search of the space, and is computationally intractable for all but the smallest of feature sets.
The choice of evaluation metric heavily influences the algorithm, and it is these evaluation metrics which distinguish between the three main categories of feature selection algorithms: wrappers, filters and embedded methods.<ref>http://jmlr.csail.mit.edu/papers/v3/guyon03a.html</ref>

Wrapper methods use a predictive model to score feature subsets. Each new subset is used to train a model, which is tested on a hold-out set. Counting the number of mistakes made on that hold-out set (the error rate of the model) gives the score for that subset. As wrapper methods train a new model for each subset, they are very computationally intensive, but usually provide the best performing feature set for that particular type of model.

Filter methods use a proxy measure instead of the error rate to score a feature subset. This measure is chosen to be fast to compute, whilst still capturing the usefulness of the feature set. Common measures include the [[pointwise mutual information]],<ref name="textcat"/> [[Pearson product-moment correlation coefficient]], inter/intra class distance or the scores of [[Statistical hypothesis testing|significance tests]] for each class/feature combination.<ref name="textcat">{{cite conference |last1=Yang |first1=Yiming |first2=Jan O. |last2=Pedersen |title=A comparative study on feature selection in text categorization |conference=ICML |year=1997}}</ref><ref>{{cite journal |last1=Forman |first1=George |title=An extensive empirical study of feature selection metrics for text classification |journal=Journal of Machine Learning Research |volume=3 |year=2003 |pages=1289–1305}}</ref>
Filters are usually less computationally intensive than wrappers, but they produce a feature set which is not tuned to a specific type of predictive model. Many filters provide a feature ranking rather than an explicit best feature subset, and the cut off point in the ranking is chosen via [[Cross-validation (statistics)|cross-validation]].


Embedded methods are a catch-all group of techniques which perform feature selection as part of the model construction process. The exemplar of this approach is the [[Least squares#Regularized versions|LASSO]] method for constructing a linear model, which penalises the regression coefficients, shrinking many of them to zero. Any features which have non-zero regression coefficients are 'selected' by the LASSO algorithm. One other popular approach is the Recursive Feature Elimination algorithm, commonly used with [[Support Vector Machines]] to repeatedly construct a model and remove features with low weights. These approaches tend to be between filters and wrappers in terms of computational complexity.

The first report of a biaxial nematic appeared in 2004<ref>
{{cite journal
|last1=Madsen |first1=L. A.
|last2=Dingemans |first2=T. J.
|last3=Nakata |first3=M.
|last4=Samulski |first4=E. T.
|year=2004
|title=Thermotropic Biaxial Nematic Liquid Crystals
|journal=[[Physical Review Letters]]
|volume=92 |pages=145505
|doi=10.1103/PhysRevLett.92.145505 |pmid=15089552 |bibcode=2004PhRvL..92n5505M
|issue=14
}}</ref><ref>
{{cite journal
|last1=Prasad |first1=V.
|last2=Kang |first2=S.-Woong
|last3=Suresh |first3=K. A.
|last4=Joshi |first4=Leela
|last5=Wang |first5=Qingbing
|last6=Kumar |first6=Satyendra
|year=2005
|title=Thermotropic Uniaxial and Biaxial Nematic and Smectic Phases in Bent-Core Mesogens
|journal=[[Journal of the American Chemical Society]]
|volume=127 |pages=17224
|doi=10.1021/ja052769n
|issue=49
}}</ref>  based on a [[boomerang]] shaped [[oxadiazole]] '''bent-core mesogen'''. The biaxial nematic phase for this particular compound only occurs at temperatures around 200 °C and is preceded by as yet unidentified [[smectic]] phases.


[[Image:Biaxialnematic.gif|center|Biaxial nematic boomerang liquid crystal]]
In statistics, the most popular form of feature selection is [[stepwise regression]].  It is a [[greedy algorithm]] that adds the best feature (or deletes the worst feature) at each round. The main control issue is deciding when to stop the algorithm.  In machine learning, this is typically done by [[Cross-validation (statistics)|cross-validation]].  In statistics, some criteria are optimized.  This leads to the inherent problem of nesting. More robust methods have been explored, such as [[branch and bound]] and piecewise linear network.


==Subset selection==
Subset selection evaluates a subset of features as a group for suitability. Subset selection algorithms can be broken up into wrappers, filters and embedded methods. Wrappers use a search algorithm to search through the space of possible features and evaluate each subset by running a model on the subset. Wrappers can be computationally expensive and carry a risk of overfitting to the model. Filters are similar to wrappers in the search approach, but instead of evaluating against a model, a simpler filter is evaluated. Embedded techniques are embedded in and specific to a model.

It is also found that this material can segregate into [[Chirality (chemistry)|chiral]] domains of opposite handedness.<ref>
{{cite journal
|last1=Görtz |first1=V.
|last2=Goodby |first2=J. W.
|year=2005
|title=Enantioselective segregation in achiral nematic liquid crystals
|journal=[[Chemical Communications]]
|pages=3262
|doi=10.1039/B503846D
|issue=26
}}</ref> For this to happen the boomerang shaped molecules adopt a helical superstructure.


Many popular search approaches use [[greedy algorithm|greedy]] [[hill climbing]], which iteratively evaluates a candidate subset of features, then modifies the subset and evaluates if the new subset is an improvement over the old. Evaluation of the subsets requires a scoring [[Metric (mathematics)|metric]] that grades a subset of features. Exhaustive search is generally impractical, so at some implementor (or operator) defined stopping point, the subset of features with the highest score discovered up to that point is selected as the satisfactory feature subset. The stopping criterion varies by algorithm; possible criteria include: a subset score exceeds a threshold, a program's maximum allowed run time has been surpassed, etc.

In one azo bent-core mesogen a thermal transition is found from a uniaxial N<sub>u</sub> to a biaxial nematic N<sub>b</sub> mesophase,<ref>
{{cite journal
|last1=Prasad |first1=V.
|last2=Kang |first2=S.-W.
|last3=Suresh |first3=K. A.
|last4=Joshi |first4=L.
|last5=Wang |first5=Q.
|last6=Kumar |first6=S.
|year=2005
|title=Thermotropic Uniaxial and Biaxial Nematic and Smectic Phases in Bent-Core Mesogens
|journal=[[Journal of the American Chemical Society]]
|volume=127 |pages=17224
|doi=10.1021/ja052769n
|issue=49
}}</ref> as predicted by theory and simulation.<ref>
{{cite journal
|last1=Bates |first1=M.
|last2=Luckhurst |first2=G.
|year=2005
|title=Biaxial nematic phases and V-shaped molecules: A Monte Carlo simulation study
|journal=[[Physical Review E]]
|volume=72 |pages=051702
|doi=10.1103/PhysRevE.72.051702
|bibcode = 2005PhRvE..72e1702B
|issue=5 }}</ref> This transition is observed on heating from the N<sub>u</sub> phase with [[Polarizing optical microscopy]] as a change in [[Schlieren texture]] and increased light transmittance and from [[x-ray diffraction]] as the splitting of the nematic reflection. The transition is a [[Phase transition|second order transition]] with low energy content and therefore not observed in [[differential scanning calorimetry]]. The positional order parameter for the uniaxial nematic phase is 0.75 to 1.5 times the mesogen length and for the biaxial nematic phase 2 to 3.3 times the mesogen length.


[[Image:Biaxialnematic2005.png|center|600px|Azo bent-core mesogen thermal transitions in °C: K 82.8 Sy 93.4 Sx 104.3 Sc 118.5 Nb 149 Nu 176.5 I]]
Alternative search-based techniques are based on [[targeted projection pursuit]] which finds low-dimensional projections of the data that score highly: the features that have the largest projections in the lower dimensional space are then selected.


Another strategy towards biaxial nematics is the use of mixtures of classical rodlike mesogens and disklike [[discotic]] mesogens.  The biaxial nematic phase is expected to be located below the minimum in the rod-disk phase diagram. In one study<ref>
{{cite journal
|last1=Apreutesei |first1=D.
|last2=Mehl |first2=G. H.
|year=2006
|title=Completely miscible disc and rod shaped molecules in the nematic phase
|journal=[[Chemical Communications]]
|pages=609
|doi=10.1039/b512120e
|issue=6
}}</ref> a miscible system of rods and disks is actually found although the biaxial nematic phase remains elusive.

Search approaches include:

* Exhaustive
* Best first
* [[Simulated annealing]]
* [[Genetic algorithm]]
* [[Greedy algorithm|Greedy]] forward selection
* [[Greedy algorithm|Greedy]] backward elimination
* [[Targeted projection pursuit]]
* Scatter Search<ref>F.C. Garcia-Lopez, M. Garcia-Torres, B. Melian, J.A. Moreno-Perez, J.M. Moreno-Vega. Solving feature subset selection problem by a Parallel Scatter Search, ''European Journal of Operational Research'', vol. 169, no. 2, pp. 477–489, 2006.</ref>
* Variable Neighborhood Search<ref>F.C. Garcia-Lopez, M. Garcia-Torres, B. Melian, J.A. Moreno-Perez, J.M. Moreno-Vega. Solving Feature Subset Selection Problem by a Hybrid Metaheuristic. In ''First International Workshop on Hybrid Metaheuristics'', pp. 59–68, 2004.</ref>
 
Two popular filter metrics for classification problems are [[correlation]] and [[mutual information]], although neither is a true [[metric (mathematics)|metric]] or 'distance measure' in the mathematical sense, since they fail to obey the [[triangle inequality]] and thus do not compute any actual 'distance'; they should rather be regarded as 'scores'.  These scores are computed between a candidate feature (or set of features) and the desired output category.  There are, however, true metrics that are a simple function of the mutual information;<ref>Alexander Kraskov, Harald Stögbauer, Ralph G. Andrzejak, and [[Peter Grassberger]], "Hierarchical Clustering Based on Mutual Information", (2003) ''[http://arxiv.org/abs/q-bio/0311039 ArXiv q-bio/0311039]''</ref> see [[mutual information#Metric|here]].
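
The distinction between a score and a distance can be illustrated with a small sketch. It assumes discrete variables and uses the variation of information, H(X,Y) − I(X;Y), as one well-known example of a true metric built from mutual information; the source does not say which metric it has in mind, and the function names and toy data here are illustrative only.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def variation_of_information(xs, ys):
    """H(X,Y) - I(X;Y); zero when each variable determines the other."""
    return entropy(list(zip(xs, ys))) - mutual_information(xs, ys)

x = [0, 0, 1, 1]
y = [0, 1, 0, 1]  # statistically independent of x in this toy sample

mi_self = mutual_information(x, x)        # a score: largest for identical variables
vi_self = variation_of_information(x, x)  # a distance: zero for identical variables
vi_xy = variation_of_information(x, y)
print(mi_self, vi_self, vi_xy)
```

Mutual information is largest when a variable is compared with itself, which is the opposite of how a distance behaves, whereas the variation of information is zero in that case.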
 
Other available filter metrics include:
 
* Class separability
** Error probability
** Inter-class distance
** Probabilistic distance
** [[Entropy (Information theory)|Entropy]]
* Consistency-based feature selection
* Correlation-based feature selection
 
==Optimality criteria==
 
There are a variety of optimality criteria that can be used for controlling feature selection.  The oldest are [[Mallows's Cp|Mallows's ''C<sub>p</sub>'']] statistic and the [[Akaike information criterion]] (AIC). These add a variable if its [[Student's t-test|''t''-statistic]] is bigger than <math>\sqrt{2}</math>.
 
Other criteria are the [[Bayesian information criterion]] (BIC), which uses <math>\sqrt{\log{n}}</math>; [[minimum description length]] (MDL), which asymptotically uses <math>\sqrt{\log{n}}</math>; Bonferroni / [[Risk Inflation Criterion|RIC]], which use <math>\sqrt{2\log{p}}</math>; maximum dependency feature selection; and a variety of new criteria motivated by the [[false discovery rate]] (FDR), which use something close to <math>\sqrt{2\log{\frac{p}{q}}}</math>.
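
To give a feel for how these thresholds compare, the following sketch evaluates them for one set of values. The text does not define the symbols, so the reading used here is an assumption: ''n'' is taken as the number of samples, ''p'' as the number of candidate features and ''q'' as the number of features already selected. The function name and the example values are hypothetical.

```python
from math import log, sqrt

def t_statistic_thresholds(n, p, q):
    """t-statistic cut-offs implied by each criterion.

    Assumed meanings (not stated in the text): n = samples,
    p = candidate features, q = features already selected.
    """
    return {
        "AIC / Mallows's Cp": sqrt(2),
        "BIC (and MDL, asymptotically)": sqrt(log(n)),
        "Bonferroni / RIC": sqrt(2 * log(p)),
        "FDR-motivated": sqrt(2 * log(p / q)),
    }

thresholds = t_statistic_thresholds(n=100, p=1000, q=10)
for name, value in thresholds.items():
    print(f"{name}: {value:.3f}")
```

For these example values the criteria become progressively stricter from AIC to BIC to the FDR-motivated rule to Bonferroni / RIC; the ordering depends on ''n'', ''p'' and ''q''.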
 
==Minimum-redundancy-maximum-relevance (mRMR) feature selection==
Peng ''et al.''<ref>{{cite journal |last1=Peng |first1=H. C. |last2=Long |first2=F. |last3=Ding |first3=C. |title=Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy |journal=IEEE Transactions on Pattern Analysis and Machine Intelligence |volume=27 |issue=8 |pages=1226–1238 |year=2005 |doi=10.1109/TPAMI.2005.159 |pmid=16119262}} [http://penglab.janelia.org/proj/mRMR/index.htm Program]</ref> proposed an mRMR feature-selection method that can use mutual information, correlation, or distance/similarity scores to select features. For example, with mutual information, relevant features and redundant features are considered simultaneously. The relevance of a feature set ''S'' for the class ''c'' is defined by the average value of all mutual information values between the individual feature ''f<sub>i</sub>'' and the class ''c'' as follows:
 
<math> D(S,c) = \frac{1}{|S|}\sum_{f_{i}\in S}I(f_{i};c) </math>.
 
The redundancy of all features in the set ''S'' is the average value of all mutual information values between the feature ''f<sub>i</sub>'' and the feature ''f<sub>j</sub>'':
 
<math> R(S) = \frac{1}{|S|^{2}}\sum_{f_{i},f_{j}\in S}I(f_{i};f_{j})</math>
 
The mRMR criterion is a combination of two measures given above and is defined as follows:
 
<math>\mathrm{mRMR}= \max_{S}
\left[\frac{1}{|S|}\sum_{f_{i}\in S}I(f_{i};c) -
\frac{1}{|S|^{2}}\sum_{f_{i},f_{j}\in S}I(f_{i};f_{j})\right].</math>
 
Suppose that there are ''n'' full-set features. Let ''x<sub>i</sub>'' be the set membership [[indicator function]] for feature ''f<sub>i</sub>'', so that ''x<sub>i</sub>''=1 indicates presence and ''x<sub>i</sub>''=0 indicates absence of the feature ''f<sub>i</sub>'' in the globally optimal feature set. Let ''c<sub>i</sub>=I(f<sub>i</sub>;c)'' and ''a<sub>ij</sub>=I(f<sub>i</sub>;f<sub>j</sub>)''. The above may then be written as an optimization problem:
 
<math>\mathrm{mRMR}= \max_{x\in \{0,1\}^{n}}
\left[\frac{\sum^{n}_{i=1}c_{i}x_{i}}{\sum^{n}_{i=1}x_{i}} -
\frac{\sum^{n}_{i,j=1}a_{ij}x_{i}x_{j}}
{(\sum^{n}_{i=1}x_{i})^{2}}\right].</math>
 
It may be shown that mRMR feature selection is an approximation of the theoretically optimal maximum-dependency feature selection that maximizes the mutual information between the joint distribution of the selected features and the classification variable. However, since mRMR turns a combinatorial problem into a series of much smaller problems, each of which involves only two variables, the estimation of joint probabilities is much more robust. In certain situations the algorithm may underestimate the usefulness of features as it has no way to measure interactions between features. This can lead to poor performance<ref>Brown, G., Pocock, A., Zhao, M.-J., Lujan, M. (2012). "Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection", In the Journal of Machine Learning Research (JMLR). [http://www.jmlr.org/papers/volume13/brown12a/brown12a.pdf]</ref> when the features are individually useless, but are useful when combined (a pathological case is found when the class is a [[parity function]] of the features). Overall the algorithm is more efficient (in terms of the amount of data required) than the theoretically optimal max-dependency selection, yet produces a low redundancy feature set.
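
The parity case can be checked numerically. The sketch below assumes discrete features and estimates mutual information from empirical counts; it evaluates the relevance-minus-redundancy score defined above on a two-feature XOR problem. The function names and the toy data are illustrative and are not taken from the mRMR program cited above.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information (bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def mrmr_score(features, target):
    """Relevance D(S,c) minus redundancy R(S) for a list of feature columns."""
    k = len(features)
    relevance = sum(mutual_information(f, target) for f in features) / k
    # The double sum runs over all ordered pairs, including i == j.
    redundancy = sum(mutual_information(fi, fj)
                     for fi in features for fj in features) / k ** 2
    return relevance - redundancy

f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
c = [a ^ b for a, b in zip(f1, f2)]  # class is the parity (XOR) of the features

individual = [mutual_information(f, c) for f in (f1, f2)]
joint = mutual_information(list(zip(f1, f2)), c)
score = mrmr_score([f1, f2], c)
print(individual, joint, score)
```

Each feature alone carries no information about the class, so the pairwise score is negative, even though the two features jointly determine the class completely.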
 
The mRMR method has also been combined with wrapper methods, so that a wrapper method can be used at a smaller cost. mRMR is also related to the correlation-based feature selection described below. It may also be seen as a special case of some generic feature selectors.<ref name="docs.google">Nguyen, H., Franke, K., Petrovic, S. (2010). "Towards a Generic Feature-Selection Measure for Intrusion Detection", In Proc. International Conference on Pattern Recognition (ICPR), Istanbul, Turkey. [https://www.researchgate.net/publication/220928649_Towards_a_Generic_Feature-Selection_Measure_for_Intrusion_Detection?ev=prf_pub]</ref>
 
==Correlation feature selection==
The Correlation Feature Selection (CFS) measure evaluates subsets of features on the basis of the following hypothesis: "Good feature subsets contain features highly correlated with the classification, yet uncorrelated to each other".<ref>M. Hall 1999, [http://www.cs.waikato.ac.nz/~mhall/thesis.pdf Correlation-based Feature Selection for Machine Learning]</ref> <ref>Senliol, Baris, et al. "Fast Correlation Based Filter (FCBF) with a different search strategy." Computer and Information Sciences, 2008. ISCIS'08. 23rd International Symposium on. IEEE, 2008. [http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=4717949]</ref> The following equation gives the merit of a feature subset ''S'' consisting of ''k'' features:
 
<math> \mathrm{Merit}_{S_{k}} = \frac{k\overline{r_{cf}}}{\sqrt{k+k(k-1)\overline{r_{ff}}}}.</math>
                       
Here, <math> \overline{r_{cf}} </math> is the average value of all feature-classification correlations, and <math> \overline{r_{ff}} </math> is the average value of all feature-feature correlations. The CFS criterion is defined as follows:
 
<math>\mathrm{CFS} = \max_{S_k}
\left[\frac{r_{c f_1}+r_{c f_2}+\cdots+r_{c f_k}}
{\sqrt{k+2(r_{f_1 f_2}+\cdots+r_{f_i f_j}+ \cdots
+ r_{f_k f_1 })}}\right].</math>
 
The <math>r_{cf_{i}}</math> and <math>r_{f_{i}f_{j}}</math> variables are referred to as correlations, but are not necessarily [[Pearson product-moment correlation coefficient|Pearson's correlation coefficient]] or [[Spearman's rank correlation coefficient|Spearman's ρ]]. Dr. Mark Hall's dissertation uses neither of these, but uses three different measures of relatedness, [[minimum description length]] (MDL), [[Mutual Information#Normalized variants|symmetrical uncertainty]], and [[Relief (feature selection)|relief]].
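
The effect of the merit formula can be seen in a small sketch. It takes the correlations as given numbers, so it is agnostic about which relatedness measure produced them; the function name and the example values are hypothetical.

```python
from math import sqrt

def cfs_merit(r_cf, r_ff):
    """Merit of a subset of k features.

    r_cf: k feature-classification correlations.
    r_ff: the pairwise feature-feature correlations (empty when k == 1).
    """
    k = len(r_cf)
    mean_cf = sum(r_cf) / k
    mean_ff = sum(r_ff) / len(r_ff) if r_ff else 0.0
    return k * mean_cf / sqrt(k + k * (k - 1) * mean_ff)

single = cfs_merit([0.6], [])
uncorrelated_pair = cfs_merit([0.6, 0.6], [0.0])
redundant_pair = cfs_merit([0.6, 0.6], [0.9])
print(single, uncorrelated_pair, redundant_pair)
```

In this example, adding a second feature raises the merit when the two features are uncorrelated and leaves it only slightly above the single-feature value when they are strongly correlated with each other.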
 
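Let ''x<sub>i</sub>'' be the set membership [[indicator function]] for feature ''f<sub>i</sub>'', and let ''a<sub>i</sub>'' and ''b<sub>ij</sub>'' stand for the correlations <math>r_{cf_{i}}</math> and <math>r_{f_{i}f_{j}}</math>; then the above can be rewritten as an optimization problem: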
 
<math>\mathrm{CFS} = \max_{x\in \{0,1\}^{n}}
\left[\frac{(\sum^{n}_{i=1}a_{i}x_{i})^{2}}
{\sum^{n}_{i=1}x_i + \sum_{i\neq j} 2b_{ij} x_i x_j }\right].</math>
 
The combinatorial problems above are, in fact, mixed 0–1 [[linear programming]] problems that can be solved by using [[branch-and-bound algorithm]]s.<ref>Hai Nguyen, Katrin Franke, and Slobodan Petrovic, Optimizing a class of feature selection measures, Proceedings of the NIPS 2009 Workshop on Discrete Optimization in Machine Learning: Submodularity, Sparsity & Polyhedra (DISCML), Vancouver, Canada, December 2009. [https://www.researchgate.net/publication/231175763_Optimizing_a_Class_of_Feature_Selection_Measures?ev=prf_pub]</ref>
 
==Regularized trees==
The features selected by a decision tree or a tree ensemble have been shown to be redundant. A recent method called regularized trees<ref>H. Deng, G. Runger, "[https://sites.google.com/site/houtaodeng/publications/FSRegularizedTrees.pdf?attredirects=0 Feature Selection via Regularized Trees]", Proceedings of the 2012 International Joint Conference on Neural Networks (IJCNN), IEEE, 2012</ref> can be used for feature subset selection. Regularized trees penalize splitting the current node on a variable that is similar to the variables selected at previous tree nodes. Regularized trees need to build only one tree model (or one tree ensemble model) and are thus computationally efficient.
 
Regularized trees naturally handle numerical and categorical features, interactions and nonlinearities. They are invariant to attribute scales (units) and insensitive to outliers, and thus require little data preprocessing such as normalization. Regularized random forest (RRF) ([http://cran.r-project.org/web/packages/RRF/index.html RRF]) is one type of regularized tree. The guided RRF is an enhanced RRF which is guided by the importance scores from an ordinary random forest.
 
==Embedded methods incorporating feature selection==
* [[Random multinomial logit]] (RMNL)
* Sparse regression, LASSO
* [http://arxiv.org/abs/1201.1587 Regularized trees] e.g. regularized random forest implemented in the [http://cran.r-project.org/web/packages/RRF/index.html RRF] package
* [[Decision tree learning|Decision tree]]
* [[Memetic algorithm]]
* Auto-encoding networks with a bottleneck-layer
* Many other [[machine learning]] methods applying a [[Pruning (algorithm)|pruning]] step.
 
==Software for feature selection==
 
Standard [[:Category:Data analysis software|data analysis software systems]] such as [[Scilab]], [[NumPy]] and [[R (programming language)|the R language]] are often used for feature selection. Other software systems are tailored specifically to the feature-selection task:
 
* [[Weka (machine learning)|Weka]] – freely available and [[open source|open-source]] software in Java.
* [[Feature Selection Toolbox|Feature Selection Toolbox 3]] – freely available and [[open source|open-source]] software in C++.
* [[RapidMiner]] – freely available and [[open source|open-source]] software.
* [[Orange (software)|Orange]] – freely available and [[open source|open-source]] software (module [http://www.ailab.si/orange/doc/modules/orngFSS.htm orngFSS]).
* [http://sites.google.com/site/tooldiag/ TOOLDIAG Pattern recognition toolbox] – freely available C toolbox.
* [http://penglab.janelia.org/proj/mRMR/ minimum redundancy feature selection tool] – freely available C/Matlab codes for selecting minimum redundant features.
* [http://web.archive.org/web/20110718043215/http://links.cse.msu.edu:8000/members/matt_gerber/index.php/Machine_learning_software A C# Implementation] of greedy forward feature subset selection for various classifiers (e.g., LibLinear, SVM-light).
* [http://www.ipipan.eu/staff/m.draminski/files/dmLab170.zip MCFS-ID] (Monte Carlo Feature Selection and Interdependency Discovery) is a Monte Carlo method-based tool for feature selection. It also allows for the discovery of interdependencies between the relevant features. MCFS-ID is particularly suitable for the analysis of high-dimensional, ill-defined transactional and biological data.
* [http://cran.r-project.org/web/packages/RRF/index.html RRF] is an R package for feature selection and can be installed from R. RRF stands for Regularized Random Forest, which is a type of Regularized Trees. By building a regularized random forest, a compact set of non-redundant features can be selected without loss of predictive information. Regularized trees can capture non-linear interactions between variables, and naturally handle different scales, and numerical and categorical variables.


==See also==
* [[Chromonic]]
* [[Cluster analysis]]
* [[Liquid crystal]]
* [[Dimensionality reduction]]
* [[Liquid crystal display]]
* [[Feature extraction]]
* [[Liquid crystal polymer]]
* [[Data mining]]
* [[Lyotropic liquid crystal]]
 
* [[Plastic crystallinity]]
* [[Smart glass]]
 
* [[Thermochromics]]
==References==
 
{{Reflist|30em}}
 
==Further reading==
 
{{Refbegin}}
* [http://featureselection.asu.edu/featureselection_techreport.pdf Tutorial Outlining Feature Selection Algorithms, Arizona State University]
* [http://jmlr.csail.mit.edu/papers/special/feature03.html JMLR Special Issue on Variable and Feature Selection]
* [http://www.springer.com/west/home?SGWID=4-102-22-33327495-0&changeHeader=true&referer=www.wkap.nl&SHORTCUT=www.springer.com/prod/b/0-7923-8198-X Feature Selection for Knowledge Discovery and Data Mining] (Book)
* [http://jmlr.csail.mit.edu/papers/volume3/guyon03a/guyon03a.pdf An Introduction to Variable and Feature Selection] (Survey)
* [http://ieeexplore.ieee.org/iel5/69/30435/01401889.pdf Toward integrating feature selection algorithms for classification and clustering] (Survey)
* [http://library.utia.cas.cz/separaty/2010/RO/somol-efficient%20feature%20subset%20selection%20and%20subset%20size%20optimization.pdf Efficient Feature Subset Selection and Subset Size Optimization] (Survey, 2010)
* [http://www.ijcai.org/papers07/Papers/IJCAI07-187.pdf Searching for Interacting Features]
* [http://www.autonlab.org/icml_documents/camera-ready/107_Feature_Subset_Selec.pdf Feature Subset Selection Bias for Classification Learning]
* Y. Sun, S. Todorovic, S. Goodison, [http://plaza.ufl.edu/sunyijun/PAMI2.htm Local Learning Based Feature Selection for High-dimensional Data Analysis], ''IEEE Transactions on Pattern Analysis and Machine Intelligence'', vol. 32, no. 9, pp.&nbsp;1610–1626, 2010.
{{Refend}}


==External links==
* [http://featureselection.asu.edu/software.php Feature Selection Package, Arizona State University (Matlab Code)]
* [http://www.clopinet.com/isabelle/Projects/NIPS2003/ NIPS challenge 2003] (see also [[NIPS]])
* [http://paul.luminos.nl/documents/show_document.php?d=198 Naive Bayes implementation with feature selection in Visual Basic] (includes executable and source code)
* [http://penglab.janelia.org/proj/mRMR/index.htm Minimum-redundancy-maximum-relevance (mRMR) feature selection program]
* [http://mloss.org/software/view/386/ FEAST] (Open source Feature Selection algorithms in C and MATLAB)


[[Category:Phases of matter]]
[[Category:Model selection]]
[[Category:Crystallography]]
[[Category:Dimension reduction]]
[[Category:Liquid crystals]]

Revision as of 09:24, 13 August 2014

In machine learning and statistics, feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features for use in model construction.

The central assumption when using a feature selection technique is that the data contains many redundant or irrelevant features. Redundant features are those which provide no more information than the currently selected features, and irrelevant features provide no useful information in any context.

Feature selection techniques are a subset of the more general field of feature extraction. Feature extraction creates new features from functions of the original features, whereas feature selection returns a subset of the features.

Feature selection techniques are often used in domains where there are many features and comparatively few samples (or data points). The archetypal case is the use of feature selection in analysing DNA microarrays, where there are many thousands of features, and a few tens to hundreds of samples. Feature selection techniques provide three main benefits when constructing predictive models:

  • improved model interpretability,
  • shorter training times,
  • enhanced generalisation by reducing overfitting.

Feature selection is also useful as part of the data analysis process, as it shows which features are important for prediction, and how these features are related.

Introduction

A feature selection algorithm can be seen as the combination of a search technique for proposing new feature subsets, along with an evaluation measure which scores the different feature subsets. The simplest algorithm is to test each possible subset of features finding the one which minimises the error rate. This is an exhaustive search of the space, and is computationally intractable for all but the smallest of feature sets. The choice of evaluation metric heavily influences the algorithm, and it is these evaluation metrics which distinguish between the three main categories of feature selection algorithms: wrappers, filters and embedded methods.[1]
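
The exhaustive search described above can be sketched in a few lines. The error function here is a hypothetical stand-in for training and evaluating a real model, and the feature indices are arbitrary; the point is only that every non-empty subset is scored, which means 2<sup>n</sup> − 1 evaluations for n features.

```python
from itertools import combinations

def exhaustive_search(n_features, error_rate):
    """Score every non-empty subset and return (best_subset, best_error)."""
    best_subset, best_error = None, float("inf")
    evaluations = 0
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            evaluations += 1
            err = error_rate(subset)
            if err < best_error:
                best_subset, best_error = subset, err
    return best_subset, best_error, evaluations

def toy_error(subset):
    # Hypothetical error rate: features 0 and 2 each reduce the error,
    # and every feature in the subset adds a small penalty.
    useful = {0, 2}
    return 0.5 - 0.2 * len(useful & set(subset)) + 0.01 * len(subset)

best, err, n_evals = exhaustive_search(4, toy_error)
print(best, n_evals)
```

With only four features this costs 15 evaluations; the count doubles with every additional feature, which is why the approach is intractable for all but the smallest feature sets.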

Wrapper methods use a predictive model to score feature subsets. Each new subset is used to train a model, which is tested on a hold-out set. Counting the number of mistakes made on that hold-out set (the error rate of the model) gives the score for that subset. As wrapper methods train a new model for each subset, they are very computationally intensive, but usually provide the best performing feature set for that particular type of model.
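
A minimal sketch of the wrapper idea follows. It assumes a nearest-centroid classifier as the predictive model, chosen only because it fits in a few lines of plain Python; the data set and function names are invented for illustration. A new model is trained for every candidate subset and scored by its number of hold-out mistakes.

```python
from itertools import combinations

def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def holdout_errors(subset, train, test):
    """Train a nearest-centroid model on `subset`; count hold-out mistakes."""
    def project(x):
        return [x[i] for i in subset]

    by_class = {}
    for x, label in train:
        by_class.setdefault(label, []).append(project(x))
    centroids = {label: centroid(rows) for label, rows in by_class.items()}

    def predict(x):
        p = project(x)
        return min(centroids,
                   key=lambda lab: sum((a - b) ** 2
                                       for a, b in zip(p, centroids[lab])))

    return sum(predict(x) != label for x, label in test)

# Toy data: feature 0 separates the classes, feature 1 is noise.
train = [([0.0, 5.0], 0), ([0.2, 1.0], 0), ([1.0, 4.0], 1), ([0.8, 0.0], 1)]
test = [([0.1, 0.5], 0), ([0.9, 4.5], 1), ([0.3, 4.0], 0), ([0.7, 1.0], 1)]

scores = {s: holdout_errors(s, train, test)
          for size in (1, 2) for s in combinations(range(2), size)}
best = min(scores, key=scores.get)
print(scores, best)
```

In this toy example the noisy feature hurts the model even when combined with the informative one, so the wrapper selects feature 0 alone.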

Filter methods use a proxy measure instead of the error rate to score a feature subset. This measure is chosen to be fast to compute, whilst still capturing the usefulness of the feature set. Common measures include the pointwise mutual information,[2] Pearson product-moment correlation coefficient, inter/intra class distance or the scores of significance tests for each class/feature combinations.[2][3] Filters are usually less computationally intensive than wrappers, but they produce a feature set which is not tuned to a specific type of predictive model. Many filters provide a feature ranking rather than an explicit best feature subset, and the cut off point in the ranking is chosen via cross-validation.
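
As a sketch of a filter, the following ranks features by the absolute value of their Pearson correlation with the target, without training any model. The column names and data are hypothetical, and the choice of where to cut the ranking (by cross-validation, as described above) is left out.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson product-moment correlation; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx and vy else 0.0

def rank_features(columns, target):
    """Return feature names sorted by |correlation| with the target, and the scores."""
    scores = {name: abs(pearson(col, target)) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True), scores

target = [0, 0, 1, 1, 1, 0]
columns = {
    "a": [0.1, 0.2, 0.9, 0.8, 0.7, 0.3],  # tracks the target closely
    "b": [5, 1, 4, 2, 4, 2],              # weakly related
    "c": [1, 1, 1, 1, 1, 1],              # constant, carries no information
}
ranking, scores = rank_features(columns, target)
print(ranking)
```

The output is a ranking rather than a subset, which matches the behaviour of many filters described above.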

Embedded methods are a catch-all group of techniques which perform feature selection as part of the model construction process. The exemplar of this approach is the LASSO method for constructing a linear model, which penalises the regression coefficients, shrinking many of them to zero. Any features which have non-zero regression coefficients are 'selected' by the LASSO algorithm. One other popular approach is the Recursive Feature Elimination algorithm, commonly used with Support Vector Machines to repeatedly construct a model and remove features with low weights. These approaches tend to be between filters and wrappers in terms of computational complexity.
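The way an L1 penalty shrinks coefficients exactly to zero can be illustrated with a small coordinate-descent sketch. It assumes one common formulation of the LASSO objective, (1/2n)·||y − Xw||² + λ·||w||₁, with centred data and no intercept; the function names and data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink `value` towards zero by `threshold`, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho, z = 0.0, 0.0
            for row, target in zip(X, y):
                # Prediction using every coefficient except the j-th one.
                pred_without_j = sum(w[k] * row[k] for k in range(p) if k != j)
                rho += row[j] * (target - pred_without_j)
                z += row[j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return w

# Toy data: y depends strongly on column 0, weakly on column 1, not on column 2.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.05, 1.95, -1.95, -2.05]
w = lasso_coordinate_descent(X, y, lam=0.1)
selected = [j for j, coef in enumerate(w) if coef != 0.0]
print(w, selected)
```

In this example the weak coefficient on column 1 falls below the penalty and is set to zero, so only column 0 is selected.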

In statistics, the most popular form of feature selection is stepwise regression. It is a greedy algorithm that adds the best feature (or deletes the worst feature) at each round. The main control issue is deciding when to stop the algorithm. In machine learning, this is typically done by cross-validation. In statistics, some criterion is optimized. This leads to the inherent problem of nesting: a feature that has been added (or deleted) in an earlier round is never reconsidered. More robust methods have been explored, such as branch and bound and piecewise linear network.
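The forward variant of this greedy procedure can be sketched as below. The `score` callable stands in for whichever criterion is being optimized (a cross-validated score, for instance); `toy_score` and its numbers are assumptions made for the example.

```python
def forward_selection(n_features, score):
    """Greedy forward selection: add the best feature until nothing improves.

    `score` maps a frozenset of feature indices to a number (higher is better).
    """
    selected = frozenset()
    best = score(selected)
    while len(selected) < n_features:
        candidates = [(score(selected | {j}), j)
                      for j in range(n_features) if j not in selected]
        candidate_score, j = max(candidates)
        if candidate_score <= best:
            break  # stopping rule: no single addition improves the criterion
        selected, best = selected | {j}, candidate_score
    return sorted(selected), best

# Hypothetical criterion: each feature contributes a fixed gain,
# and every selected feature costs 0.1.
GAINS = {0: 0.5, 1: 0.3, 2: 0.0, 3: 0.05}

def toy_score(subset):
    return sum(GAINS[j] for j in subset) - 0.1 * len(subset)

selected, best = forward_selection(4, toy_score)
print(selected, best)
```

Because a feature is never removed once added, the subsets visited are nested inside one another.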

Subset selection

Subset selection evaluates a subset of features as a group for suitability. Subset selection algorithms can be broken up into wrappers, filters and embedded methods. Wrappers use a search algorithm to search through the space of possible features and evaluate each subset by running a model on the subset. Wrappers can be computationally expensive and have a risk of overfitting to the model. Filters are similar to wrappers in the search approach, but instead of evaluating against a model, a simpler filter is evaluated. Embedded techniques are embedded in and specific to a model.

Many popular search approaches use greedy hill climbing, which iteratively evaluates a candidate subset of features, then modifies the subset and evaluates if the new subset is an improvement over the old. Evaluation of the subsets requires a scoring metric that grades a subset of features. Exhaustive search is generally impractical, so at some implementor (or operator) defined stopping point, the subset of features with the highest score discovered up to that point is selected as the satisfactory feature subset. The stopping criterion varies by algorithm; possible criteria include: a subset score exceeds a threshold, a program's maximum allowed run time has been surpassed, etc.
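A minimal hill-climbing sketch with two of the stopping criteria mentioned above (a score threshold and an evaluation budget standing in for a run-time limit) is given below. The neighbourhood used here, toggling a single feature in or out, is an assumption of this example, as are the names `hill_climb` and `toy_score`.

```python
def hill_climb(n_features, score, start=(), threshold=float("inf"), max_evals=1000):
    """Greedy hill climbing over feature subsets.

    Stops at a local optimum, when the score reaches `threshold`,
    or when `max_evals` subsets have been scored.
    """
    current = frozenset(start)
    current_score = score(current)
    evals = 1
    improved = True
    while improved and current_score < threshold and evals < max_evals:
        improved = False
        for j in range(n_features):
            neighbour = current ^ {j}  # toggle feature j in or out
            neighbour_score = score(neighbour)
            evals += 1
            if neighbour_score > current_score:
                current, current_score = neighbour, neighbour_score
                improved = True
                break
            if evals >= max_evals:
                break
    return sorted(current), current_score, evals

def toy_score(subset):
    # Hypothetical score: negative distance from the ideal subset {1, 3}.
    return -len(set(subset) ^ {1, 3})

print(hill_climb(4, toy_score, start=(0,)))
print(hill_climb(4, toy_score, start=(0,), threshold=-1))
```

The second call stops early because the threshold is met before the best subset is reached, which illustrates that the returned subset is only the best one found up to the stopping point.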

Alternative search-based techniques are based on targeted projection pursuit which finds low-dimensional projections of the data that score highly: the features that have the largest projections in the lower dimensional space are then selected.

Search approaches include:

Two popular filter metrics for classification problems are correlation and mutual information, although neither is a true metric or 'distance measure' in the mathematical sense, since they fail to obey the triangle inequality and thus do not compute any actual 'distance'; they should rather be regarded as 'scores'. These scores are computed between a candidate feature (or set of features) and the desired output category. There are, however, true metrics that are a simple function of the mutual information.[6]
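For discrete data, the mutual information score between a candidate feature and the class can be estimated from empirical frequencies. The sketch below uses plug-in estimates and base-2 logarithms; both choices, and the toy data, are assumptions for illustration.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    total = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n
        total += p_xy * log2(p_xy / ((count_x[x] / n) * (count_y[y] / n)))
    return total

labels = [0, 0, 1, 1]
informative = [0, 0, 1, 1]  # identical to the class labels
irrelevant = [0, 1, 0, 1]   # independent of the class labels
print(mutual_information(informative, labels))
print(mutual_information(irrelevant, labels))
```

A feature that determines a balanced binary class scores one bit, while an independent feature scores zero.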

Other available filter metrics include:

  • Class separability
    • Error probability
    • Inter-class distance
    • Probabilistic distance
    • Entropy
  • Consistency-based feature selection
  • Correlation-based feature selection

Optimality criteria

There are a variety of optimality criteria that can be used for controlling feature selection. The oldest are Mallows's Cp statistic and the Akaike information criterion (AIC). These add variables if the t-statistic is bigger than <math>\sqrt{2}</math>.

Other criteria are the Bayesian information criterion (BIC), which uses <math>\sqrt{\log n}</math>; minimum description length (MDL), which asymptotically uses <math>\sqrt{\log n}</math>; Bonferroni / RIC, which use <math>\sqrt{2\log p}</math>; maximum dependency feature selection; and a variety of new criteria that are motivated by the false discovery rate (FDR), which use something close to <math>\sqrt{2\log\frac{p}{q}}</math>.
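To give a feel for how these criteria differ in strictness, the sketch below evaluates the commonly quoted t-statistic thresholds: √2 for AIC and Mallows's Cp, √(log n) for BIC, and √(2 log p) for Bonferroni / RIC. Reading n as the number of samples and p as the number of candidate features is an assumption of this example, as is the function name.

```python
from math import log, sqrt

def t_statistic_thresholds(n_samples, n_candidates):
    """Threshold on |t| above which each criterion would add a variable."""
    return {
        "AIC / Mallows's Cp": sqrt(2.0),
        "BIC": sqrt(log(n_samples)),
        "Bonferroni / RIC": sqrt(2.0 * log(n_candidates)),
    }

thresholds = t_statistic_thresholds(n_samples=100, n_candidates=1000)
for name, value in thresholds.items():
    print(f"{name}: add a variable if |t| > {value:.3f}")
```

For 100 samples and 1000 candidate features the thresholds increase from AIC to BIC to RIC, so the later criteria admit fewer variables.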

Minimum-redundancy-maximum-relevance (mRMR) feature selection

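Peng et al.[7] proposed an mRMR feature-selection method that can use mutual information, correlation, or distance/similarity scores to select features. For example, with mutual information, relevant features and redundant features are considered simultaneously. The relevance of a feature set <math>S</math> for the class <math>c</math> is defined by the average value of all mutual information values between the individual feature <math>f_i</math> and the class <math>c</math> as follows:

<math>D(S,c) = \frac{1}{|S|}\sum_{f_{i}\in S} I(f_{i};c)</math>.

The redundancy of all features in the set <math>S</math> is the average value of all mutual information values between the feature <math>f_i</math> and the feature <math>f_j</math>:

<math>R(S) = \frac{1}{|S|^{2}}\sum_{f_{i},f_{j}\in S} I(f_{i};f_{j})</math>

The mRMR criterion is a combination of the two measures given above and is defined as follows:

<math>\mathrm{mRMR} = \max_{S}\left[\frac{1}{|S|}\sum_{f_{i}\in S} I(f_{i};c) - \frac{1}{|S|^{2}}\sum_{f_{i},f_{j}\in S} I(f_{i};f_{j})\right]</math>

Suppose that there are <math>n</math> full-set features. Let <math>x_i</math> be the set membership indicator function for feature <math>f_i</math>, so that <math>x_i=1</math> indicates presence and <math>x_i=0</math> indicates absence of the feature <math>f_i</math> in the globally optimal feature set. Let <math>c_i=I(f_i;c)</math> and <math>a_{ij}=I(f_i;f_j)</math>. The above may then be written as an optimization problem:

<math>\mathrm{mRMR} = \max_{x\in\{0,1\}^{n}}\left[\frac{\sum_{i=1}^{n} c_{i}x_{i}}{\sum_{i=1}^{n} x_{i}} - \frac{\sum_{i,j=1}^{n} a_{ij}x_{i}x_{j}}{\left(\sum_{i=1}^{n} x_{i}\right)^{2}}\right]</math>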

It may be shown that mRMR feature selection is an approximation of the theoretically optimal maximum-dependency feature selection that maximizes the mutual information between the joint distribution of the selected features and the classification variable. However, since mRMR turns a combinatorial problem into a series of much smaller problems, each of which involves only two variables, the estimation of joint probabilities is much more robust. In certain situations the algorithm may underestimate the usefulness of features as it has no way to measure interactions between features. This can lead to poor performance[8] when the features are individually useless, but are useful when combined (a pathological case is found when the class is a parity function of the features). Overall the algorithm is more efficient (in terms of the amount of data required) than the theoretically optimal max-dependency selection, yet produces a low-redundancy feature set.
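The following sketch evaluates the relevance-minus-redundancy criterion for a given subset and selects features with an incremental greedy search, a scheme often paired with mRMR. The greedy search is an assumption of this example, not something specified in the text above, and the mutual information values are made-up numbers rather than estimates from data.

```python
def mrmr_criterion(subset, c, a):
    """Average relevance minus average pairwise redundancy of `subset`.

    c[i] is the mutual information between feature i and the class;
    a[i][j] is the mutual information between features i and j.
    """
    size = len(subset)
    relevance = sum(c[i] for i in subset) / size
    redundancy = sum(a[i][j] for i in subset for j in subset) / size ** 2
    return relevance - redundancy

def greedy_mrmr(c, a, k):
    """Pick k features one at a time, trading relevance against redundancy."""
    remaining = set(range(len(c)))
    selected = []
    while remaining and len(selected) < k:
        def gain(j):
            if not selected:
                return c[j]
            return c[j] - sum(a[j][s] for s in selected) / len(selected)
        best = max(sorted(remaining), key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical values: features 0 and 1 are both relevant but nearly
# duplicate each other; feature 2 is less relevant but not redundant.
c = [0.9, 0.85, 0.5]
a = [[1.0, 0.95, 0.1],
     [0.95, 1.0, 0.1],
     [0.1, 0.1, 1.0]]
chosen = greedy_mrmr(c, a, k=2)
print(chosen, mrmr_criterion(chosen, c, a))
```

Feature 1 is skipped despite its high relevance because it adds little beyond feature 0. Only pairwise quantities are ever used, which is also why interactions among three or more features go unnoticed.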

The mRMR method has also been combined with wrapper methods, so that a wrapper method can be utilized at a smaller cost. It can be seen that mRMR is also related to the correlation-based feature selection below. It may also be seen as a special case of some generic feature selectors.[9]

Correlation feature selection

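The Correlation Feature Selection (CFS) measure evaluates subsets of features on the basis of the following hypothesis: "Good feature subsets contain features highly correlated with the classification, yet uncorrelated to each other".[10][11] The following equation gives the merit of a feature subset <math>S</math> consisting of <math>k</math> features:

<math>\mathrm{Merit}_{S_{k}} = \frac{k\overline{r_{cf}}}{\sqrt{k+k(k-1)\overline{r_{ff}}}}</math>

Here, <math>\overline{r_{cf}}</math> is the average value of all feature-classification correlations, and <math>\overline{r_{ff}}</math> is the average value of all feature-feature correlations. The CFS criterion is defined as follows:

<math>\mathrm{CFS} = \max_{S_{k}}\left[\frac{r_{cf_{1}}+r_{cf_{2}}+\cdots+r_{cf_{k}}}{\sqrt{k+2\left(r_{f_{1}f_{2}}+\cdots+r_{f_{i}f_{j}}+\cdots+r_{f_{k}f_{k-1}}\right)}}\right]</math>

The <math>r_{cf_{i}}</math> and <math>r_{f_{i}f_{j}}</math> variables are referred to as correlations, but are not necessarily Pearson's correlation coefficient or Spearman's ρ. Dr. Mark Hall's dissertation uses neither of these, but uses three different measures of relatedness: minimum description length (MDL), symmetrical uncertainty, and relief.

Let <math>x_i</math> be the set membership indicator function for feature <math>f_i</math>; then the above can be rewritten as an optimization problem:

<math>\mathrm{CFS} = \max_{x\in\{0,1\}^{n}}\left[\frac{\sum_{i=1}^{n} r_{cf_{i}}x_{i}}{\sqrt{\sum_{i=1}^{n} x_{i}+2\sum_{i<j} r_{f_{i}f_{j}}x_{i}x_{j}}}\right]</math>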

The combinatorial problems above are, in fact, mixed 0–1 linear programming problems that can be solved by using branch-and-bound algorithms.[12]
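For a handful of features the CFS criterion can simply be evaluated on every subset. The sketch below assumes the merit takes the form k·r̄_cf / √(k + k(k−1)·r̄_ff), where r̄_cf and r̄_ff are the average feature-classification and feature-feature correlations; the correlation values are made up for illustration, and brute-force enumeration stands in for the branch-and-bound solver.

```python
from itertools import combinations
from math import sqrt

def cfs_merit(subset, r_cf, r_ff):
    """Merit of a subset: high class correlation, low inter-feature correlation."""
    k = len(subset)
    mean_cf = sum(r_cf[i] for i in subset) / k
    pairs = list(combinations(subset, 2))
    mean_ff = sum(r_ff[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / sqrt(k + k * (k - 1) * mean_ff)

def best_cfs_subset(r_cf, r_ff):
    """Brute-force maximisation of the merit over all non-empty subsets."""
    n = len(r_cf)
    subsets = [s for size in range(1, n + 1)
               for s in combinations(range(n), size)]
    return max(subsets, key=lambda s: cfs_merit(s, r_cf, r_ff))

# Hypothetical correlations: features 0 and 1 are strongly inter-correlated.
r_cf = [0.7, 0.6, 0.5]
r_ff = [[1.0, 0.9, 0.1],
        [0.9, 1.0, 0.1],
        [0.1, 0.1, 1.0]]
best = best_cfs_subset(r_cf, r_ff)
print(best, cfs_merit(best, r_cf, r_ff))
```

Here the subset of features 0 and 2 scores higher than the full set, because adding feature 1 increases the denominator more than the numerator.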

Regularized trees

The features from a decision tree or a tree ensemble are shown to be redundant. A recent method called regularized tree[13] can be used for feature subset selection. Regularized trees penalize using a variable similar to the variables selected at previous tree nodes for splitting the current node. Regularized trees need to build only one tree model (or one tree ensemble model) and are thus computationally efficient.

Regularized trees naturally handle numerical and categorical features, interactions and nonlinearities. They are invariant to attribute scales (units) and insensitive to outliers, and thus require little data preprocessing such as normalization. Regularized random forest (RRF) is one type of regularized tree. The guided RRF is an enhanced RRF which is guided by the importance scores from an ordinary random forest.

Embedded methods incorporating feature selection

Software for feature selection

Standard data analysis software systems such as SciLab, NumPy and the R language are often used for feature selection. Other software systems are tailored specifically to the feature-selection task:

  • Weka – freely available and open-source software in Java.
  • Feature Selection Toolbox 3 – freely available and open-source software in C++.
  • RapidMiner – freely available and open-source software.
  • Orange – freely available and open-source software (module orngFSS).
  • TOOLDIAG Pattern recognition toolbox – freely available C toolbox.
  • minimum redundancy feature selection tool – freely available C/Matlab codes for selecting minimum redundant features.
  • A C# Implementation of greedy forward feature subset selection for various classifiers (e.g., LibLinear, SVM-light).
  • MCFS-ID (Monte Carlo Feature Selection and Interdependency Discovery) is a Monte Carlo method-based tool for feature selection. It also allows for the discovery of interdependencies between the relevant features. MCFS-ID is particularly suitable for the analysis of high-dimensional, ill-defined transactional and biological data.
  • RRF is an R package for feature selection and can be installed from R. RRF stands for Regularized Random Forest, which is a type of Regularized Trees. By building a regularized random forest, a compact set of non-redundant features can be selected without loss of predictive information. Regularized trees can capture non-linear interactions between variables, and naturally handle different scales, and numerical and categorical variables.

See also

References

  1. http://jmlr.csail.mit.edu/papers/v3/guyon03a.html
  2. (citation text not recoverable from source)
  3. (citation text not recoverable from source)
  4. F.C. Garcia-Lopez, M. Garcia-Torres, B. Melian, J.A. Moreno-Perez, J.M. Moreno-Vega. Solving feature subset selection problem by a Parallel Scatter Search, European Journal of Operational Research, vol. 169, no. 2, pp. 477–489, 2006.
  5. F.C. Garcia-Lopez, M. Garcia-Torres, B. Melian, J.A. Moreno-Perez, J.M. Moreno-Vega. Solving Feature Subset Selection Problem by a Hybrid Metaheuristic. In First International Workshop on Hybrid Metaheuristics, pp. 59–68, 2004.
  6. Alexander Kraskov, Harald Stögbauer, Ralph G. Andrzejak, and Peter Grassberger, "Hierarchical Clustering Based on Mutual Information", (2003) ArXiv q-bio/0311039
  7. H. Peng, F. Long, and C. Ding. "Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226–1238, 2005.
  8. Brown, G., Pocock, A., Zhao, M.-J., Lujan, M. (2012). "Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection", Journal of Machine Learning Research (JMLR).
  9. Nguyen, H., Franke, K., Petrovic, S. (2010). "Towards a Generic Feature-Selection Measure for Intrusion Detection", In Proc. International Conference on Pattern Recognition (ICPR), Istanbul, Turkey.
  10. M. Hall 1999, Correlation-based Feature Selection for Machine Learning
  11. Senliol, Baris, et al. "Fast Correlation Based Filter (FCBF) with a different search strategy." Computer and Information Sciences, 2008. ISCIS'08. 23rd International Symposium on. IEEE, 2008.
  12. Hai Nguyen, Katrin Franke, and Slobodan Petrovic, Optimizing a class of feature selection measures, Proceedings of the NIPS 2009 Workshop on Discrete Optimization in Machine Learning: Submodularity, Sparsity & Polyhedra (DISCML), Vancouver, Canada, December 2009.
  13. H. Deng, G. Runger, "Feature Selection via Regularized Trees", Proceedings of the 2012 International Joint Conference on Neural Networks (IJCNN), IEEE, 2012