java.lang.Object
weka.filters.Filter
weka.filters.SimpleFilter
weka.filters.SimpleBatchFilter
weka.filters.unsupervised.attribute.InterquartileRange
public class InterquartileRange
A filter for detecting outliers and extreme values based on interquartile ranges. The filter skips the class attribute.
Outliers:
Q3 + OF*IQR < x <= Q3 + EVF*IQR
or
Q1 - EVF*IQR <= x < Q1 - OF*IQR
Extreme values:
x > Q3 + EVF*IQR
or
x < Q1 - EVF*IQR
Key:
Q1 = 25% quartile
Q3 = 75% quartile
IQR = Interquartile Range, difference between Q1 and Q3
OF = Outlier Factor
EVF = Extreme Value Factor
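The inequalities above can be read as a small decision rule. The following is a minimal sketch of that rule only, not the filter's actual implementation: the class `IqrClassifier`, the enum `Tag`, and the method `classify` are hypothetical names, and the sketch assumes EVF >= OF and that Q1 and Q3 have already been computed.

```java
/** Hypothetical helper illustrating the thresholds; not part of the Weka API. */
final class IqrClassifier {
    enum Tag { NORMAL, OUTLIER, EXTREME }

    /**
     * Classifies x using the documented inequalities.
     * of = Outlier Factor, evf = Extreme Value Factor (assumed evf >= of).
     */
    static Tag classify(double x, double q1, double q3, double of, double evf) {
        double iqr = q3 - q1; // IQR = Q3 - Q1
        // Extreme values: x > Q3 + EVF*IQR or x < Q1 - EVF*IQR
        if (x > q3 + evf * iqr || x < q1 - evf * iqr) {
            return Tag.EXTREME;
        }
        // Outliers: Q3 + OF*IQR < x <= Q3 + EVF*IQR
        //       or  Q1 - EVF*IQR <= x < Q1 - OF*IQR
        if (x > q3 + of * iqr || x < q1 - of * iqr) {
            return Tag.OUTLIER;
        }
        return Tag.NORMAL;
    }

    public static void main(String[] args) {
        // Q1 = 10 and Q3 = 20 give IQR = 10. With OF = 3 and EVF = 6 the
        // upper thresholds are 50 (outlier) and 80 (extreme value).
        System.out.println(classify(15, 10, 20, 3, 6)); // NORMAL
        System.out.println(classify(60, 10, 20, 3, 6)); // OUTLIER
        System.out.println(classify(95, 10, 20, 3, 6)); // EXTREME
    }
}
```

A value exactly on the extreme-value threshold (80 in this example) still counts as an outlier, matching the `<=` in the outlier definition.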
-D Turns on output of debugging information.
-R <col1,col2-col4,...> Specifies the list of columns on which to base outlier/extreme value detection. An instance is tagged as an outlier/extreme value if it qualifies as one in at least one of those attributes. 'first' and 'last' are valid indexes. (default none)
-O <num> The factor for outlier detection. (default: 3)
-E <num> The factor for extreme values detection. (default: 2*Outlier Factor)
-E-as-O Tags extreme values also as outliers. (default: off)
-P Generates Outlier/ExtremeValue pair for each numeric attribute in the range, not just a single indicator pair for all the attributes. (default: off)
-M Generates an additional attribute 'Offset' per Outlier/ExtremeValue pair that contains the multiplier by which the value is off the median: value = median + 'multiplier' * IQR. Note: implicitly sets '-P'. (default: off)
Thanks to Dale for a few brainstorming sessions.
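The 'Offset' relation given for -M, value = median + 'multiplier' * IQR, can be solved for the multiplier. The sketch below shows only that arithmetic. The class `OffsetMultiplier` and its method `of` are hypothetical names, and rejecting a zero IQR is an assumption made here, not documented behavior of the filter.

```java
/** Hypothetical helper illustrating the 'Offset' arithmetic; not part of the Weka API. */
final class OffsetMultiplier {
    /** Solves value = median + multiplier * IQR for multiplier. */
    static double of(double value, double median, double iqr) {
        if (iqr == 0.0) {
            // Assumption of this sketch: a zero IQR is treated as an error.
            throw new IllegalArgumentException("IQR must be non-zero");
        }
        return (value - median) / iqr;
    }

    public static void main(String[] args) {
        // median = 15, IQR = 10: a value of 45 lies 3 IQRs above the median.
        System.out.println(of(45, 15, 10)); // 3.0
        // A value of 0 lies 1.5 IQRs below the median.
        System.out.println(of(0, 15, 10));  // -1.5
    }
}
```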
Field Summary | |
---|---|
static int | NON_NUMERIC: indicator for non-numeric attributes |
Constructor Summary | |
---|---|
InterquartileRange() | |
Method Summary | |
---|---|
java.lang.String | attributeIndicesTipText(): Returns the tip text for this property. |
java.lang.String | detectionPerAttributeTipText(): Returns the tip text for this property. |
java.lang.String | extremeValuesAsOutliersTipText(): Returns the tip text for this property. |
java.lang.String | extremeValuesFactorTipText(): Returns the tip text for this property. |
java.lang.String | getAttributeIndices(): Gets the current range selection. |
Capabilities | getCapabilities(): Returns the Capabilities of this filter. |
boolean | getDetectionPerAttribute(): Gets whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false"). |
boolean | getExtremeValuesAsOutliers(): Gets whether extreme values are also tagged as outliers. |
double | getExtremeValuesFactor(): Gets the factor for determining the thresholds for extreme values. |
java.lang.String[] | getOptions(): Gets the current settings of the filter. |
double | getOutlierFactor(): Gets the factor for determining the thresholds for outliers. |
boolean | getOutputOffsetMultiplier(): Gets whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that lists the multiplier by which the value is off the median: value = median + 'multiplier' * IQR. |
java.lang.String | getRevision(): Returns the revision string. |
java.lang.String | globalInfo(): Returns a string describing this filter. |
java.util.Enumeration | listOptions(): Returns an enumeration describing the available options. |
static void | main(java.lang.String[] args): Main method for testing this class. |
java.lang.String | outlierFactorTipText(): Returns the tip text for this property. |
java.lang.String | outputOffsetMultiplierTipText(): Returns the tip text for this property. |
void | setAttributeIndices(java.lang.String value): Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used). |
void | setAttributeIndicesArray(int[] value): Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used). |
void | setDetectionPerAttribute(boolean value): Sets whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false"). |
void | setExtremeValuesAsOutliers(boolean value): Sets whether extreme values are also tagged as outliers. |
void | setExtremeValuesFactor(double value): Sets the factor for determining the thresholds for extreme values. |
void | setOptions(java.lang.String[] options): Parses a list of options for this object. |
void | setOutlierFactor(double value): Sets the factor for determining the thresholds for outliers. |
void | setOutputOffsetMultiplier(boolean value): Sets whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that lists the multiplier by which the value is off the median: value = median + 'multiplier' * IQR. |
Methods inherited from class weka.filters.SimpleBatchFilter |
---|
batchFinished, input |
Methods inherited from class weka.filters.SimpleFilter |
---|
debugTipText, getDebug, setDebug, setInputFormat |
Methods inherited from class weka.filters.Filter |
---|
batchFilterFile, filterFile, getCapabilities, getOutputFormat, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, numPendingOutput, output, outputPeek, toString, useFilter, wekaStaticWrapper |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public static final int NON_NUMERIC
Constructor Detail |
---|
public InterquartileRange()
Method Detail |
---|
public java.lang.String globalInfo()
globalInfo in class SimpleFilter
public java.util.Enumeration listOptions()
listOptions in interface OptionHandler
listOptions in class SimpleFilter
public void setOptions(java.lang.String[] options) throws java.lang.Exception
-D Turns on output of debugging information.
-R <col1,col2-col4,...> Specifies the list of columns on which to base outlier/extreme value detection. An instance is tagged as an outlier/extreme value if it qualifies as one in at least one of those attributes. 'first' and 'last' are valid indexes. (default none)
-O <num> The factor for outlier detection. (default: 3)
-E <num> The factor for extreme values detection. (default: 2*Outlier Factor)
-E-as-O Tags extreme values also as outliers. (default: off)
-P Generates Outlier/ExtremeValue pair for each numeric attribute in the range, not just a single indicator pair for all the attributes. (default: off)
-M Generates an additional attribute 'Offset' per Outlier/ExtremeValue pair that contains the multiplier by which the value is off the median: value = median + 'multiplier' * IQR. Note: implicitly sets '-P'. (default: off)
setOptions in interface OptionHandler
setOptions in class SimpleFilter
Parameters: options - the list of options as an array of strings
Throws: java.lang.Exception - if an option is not supported
See Also: SimpleFilter.reset()
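As an illustration of the array shape that setOptions(java.lang.String[]) parses, the sketch below assembles the documented flags into a String[]. It does not call Weka. The class `IqrOptionsSketch` and the method `buildOptions` are hypothetical, and passing `null` to request the documented default of 2 * Outlier Factor is a convention of this sketch only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical builder for the documented flags; not part of the Weka API. */
final class IqrOptionsSketch {
    static String[] buildOptions(String range, double outlierFactor,
                                 Double extremeValuesFactor,
                                 boolean extremeAsOutliers,
                                 boolean perAttribute,
                                 boolean offsetMultiplier) {
        List<String> opts = new ArrayList<>();
        opts.add("-R");
        opts.add(range);
        opts.add("-O");
        opts.add(Double.toString(outlierFactor));
        // Documented default for -E is 2 * Outlier Factor.
        double evf = (extremeValuesFactor != null)
                ? extremeValuesFactor
                : 2.0 * outlierFactor;
        opts.add("-E");
        opts.add(Double.toString(evf));
        if (extremeAsOutliers) opts.add("-E-as-O");
        if (perAttribute) opts.add("-P");
        if (offsetMultiplier) opts.add("-M"); // documented to imply -P
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] options = buildOptions("first-last", 3.0, null, false, true, false);
        System.out.println(String.join(" ", options));
        // -R first-last -O 3.0 -E 6.0 -P
    }
}
```

With Weka on the classpath, an array like this would be the argument handed to setOptions on an InterquartileRange instance.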
public java.lang.String[] getOptions()
getOptions in interface OptionHandler
getOptions in class SimpleFilter
public java.lang.String attributeIndicesTipText()
public java.lang.String getAttributeIndices()
public void setAttributeIndices(java.lang.String value)
Parameters: value - a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1.
Throws: java.lang.IllegalArgumentException - if an invalid range list is supplied
public void setAttributeIndicesArray(int[] value)
Parameters: value - an array containing indexes of attributes to work on. Since the array will typically come from a program, attributes are indexed from 0.
Throws: java.lang.IllegalArgumentException - if an invalid set of ranges is supplied
public java.lang.String outlierFactorTipText()
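The two index setters above differ in base: setAttributeIndices takes a 1-based range string, while setAttributeIndicesArray takes 0-based indexes. The sketch below shows how one maps to the other. It is a simplified, hypothetical parser (`IndexConversionSketch.toZeroBased`), not Weka's own range handling, and it assumes only comma-separated indexes, `a-b` ranges, and the keywords 'first' and 'last'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical converter from a 1-based range string to 0-based indexes. */
final class IndexConversionSketch {
    static int[] toZeroBased(String rangeList, int numAttributes) {
        List<Integer> out = new ArrayList<>();
        for (String part : rangeList.split(",")) {
            String[] ends = part.trim().split("-");
            if (ends.length < 1 || ends.length > 2) {
                throw new IllegalArgumentException("Invalid range: " + part);
            }
            int lo = parseOneBased(ends[0], numAttributes);
            int hi = (ends.length == 2) ? parseOneBased(ends[1], numAttributes) : lo;
            if (lo > hi) {
                throw new IllegalArgumentException("Invalid range: " + part);
            }
            for (int i = lo; i <= hi; i++) {
                out.add(i - 1); // shift from 1-based to 0-based
            }
        }
        int[] result = new int[out.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = out.get(i);
        }
        return result;
    }

    private static int parseOneBased(String token, int numAttributes) {
        String t = token.trim();
        int index;
        if (t.equals("first")) {
            index = 1;
        } else if (t.equals("last")) {
            index = numAttributes;
        } else {
            try {
                index = Integer.parseInt(t);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("Invalid index: " + token, e);
            }
        }
        if (index < 1 || index > numAttributes) {
            throw new IllegalArgumentException("Index out of range: " + token);
        }
        return index;
    }

    public static void main(String[] args) {
        // With 8 attributes, "first,3-5,last" selects 1, 3, 4, 5 and 8 (1-based).
        System.out.println(Arrays.toString(toZeroBased("first,3-5,last", 8)));
        // [0, 2, 3, 4, 7]
    }
}
```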
public void setOutlierFactor(double value)
Parameters: value - the factor.
public double getOutlierFactor()
public java.lang.String extremeValuesFactorTipText()
public void setExtremeValuesFactor(double value)
Parameters: value - the factor.
public double getExtremeValuesFactor()
public java.lang.String extremeValuesAsOutliersTipText()
public void setExtremeValuesAsOutliers(boolean value)
Parameters: value - whether or not to tag extreme values also as outliers.
public boolean getExtremeValuesAsOutliers()
public java.lang.String detectionPerAttributeTipText()
public void setDetectionPerAttribute(boolean value)
Parameters: value - whether or not to generate indicator attribute pairs for each numeric attribute.
public boolean getDetectionPerAttribute()
public java.lang.String outputOffsetMultiplierTipText()
public void setOutputOffsetMultiplier(boolean value)
Parameters: value - whether or not to generate the additional attribute.
public boolean getOutputOffsetMultiplier()
public Capabilities getCapabilities()
getCapabilities in interface CapabilitiesHandler
getCapabilities in class Filter
See Also: Capabilities
public java.lang.String getRevision()
public static void main(java.lang.String[] args)
Parameters: args - should contain arguments to the filter: use -h for help