- description: sddsoutlier does outlier elimination of rows from SDDS tabular data. An “outlier” is a data point that is statistically unlikely or else invalid.
- example: Eliminate “bad” beam-position-monitor readouts from PAR x BPM data, where a bad readout is one that is more than three standard deviations from the mean:
sddsoutlier par.bpm par.bpm1 -columns=P?P?x -stDevLimit=3
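The selection in this example can be pictured with a short Python sketch. It is a conceptual model, not the sddsoutlier implementation. The assumptions are that the table is a plain dict of lists, that Python's `fnmatch` stands in for SDDS wildcard matching, and that a single pass with the population standard deviation is used. The function name `stdev_kept_rows` and the data are hypothetical. Each matched column is tested independently, and a row is dropped if it is an outlier in any of them.

```python
import fnmatch
import statistics


def stdev_kept_rows(table, pattern, limit):
    """Return the indices of rows that survive a -stDevLimit-style cut.

    table:   dict mapping column name -> list of floats (equal lengths).
    pattern: wildcard pattern selecting columns (fnmatch syntax assumed).
    limit:   allowed number of standard deviations from the column mean.
    """
    n_rows = len(next(iter(table.values())))
    is_outlier = [False] * n_rows
    for name, values in table.items():
        if not fnmatch.fnmatchcase(name, pattern):
            continue  # column not selected for outlier analysis
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)  # population SD (assumption)
        for i, v in enumerate(values):
            # A row is dropped if it is an outlier in ANY selected column.
            if sd > 0 and abs(v - mean) > limit * sd:
                is_outlier[i] = True
    return [i for i in range(n_rows) if not is_outlier[i]]


# Hypothetical readouts: row 20 is bad in P1P1x, row 3 is bad in P1P2x.
table = {
    "P1P1x": [0.0, 1.0] * 10 + [100.0],
    "P1P2x": [5.0] * 3 + [50.0] + [5.0] * 17,
    "P1P1y": [0.0] * 5 + [1e6] + [0.0] * 15,  # not matched by P?P?x
}
print(stdev_kept_rows(table, "P?P?x", 3))
```

Because P1P1y is not matched by `P?P?x`, its bad value in row 5 does not remove that row; only rows 3 and 20 are dropped.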

Fit a line to readout P1P1x vs. P1P2x, then eliminate points too far from the line:

sddspfit par.bpm -pipe=out -columns=P1P2x,P1P1x | sddsoutlier -pipe=in par.2bpms -column=P1P1xResidual -stDevLimit=2

Same, but refit and redo outlier elimination based on the improved fit:

sddspfit par.bpm -pipe=out -columns=P1P2x,P1P1x \
| sddsoutlier -pipe -column=P1P1xResidual -stDevLimit=2 \
| sddspfit -pipe -columns=P1P2x,P1P1x \
| sddsoutlier -pipe=in par.2bpms -column=P1P1xResidual -stDevLimit=2
- synopsis:
sddsoutlier [-pipe=[input][,output]] [inputFile] [outputFile] [-columns=listOfNames] [-excludeColumns=listOfNames] [-stDevLimit=value] [-absLimit=value] [-absDeviationLimit=value] [-minimumLimit=value] [-maximumLimit=value] [-chanceLimit=value] [-invert] [-verbose] [-noWarnings] [{-markOnly | -replaceOnly={lastValue | nextValue | interpolatedValue | value=number}}]

- files: inputFile contains column data that is to be winnowed using outlier elimination. If inputFile contains multiple pages, they are treated separately. outputFile contains all of the array and parameter data, but only those rows of the tabular data that pass the outlier elimination. Warning: if outputFile is not given and -pipe=output is not specified, inputFile will be overwritten.
- switches:
- -pipe[=input][,output] — The standard SDDS Toolkit pipe option.
- -columns=listOfNames — Specifies a comma-separated list of column names, which may contain wildcards. Outlier analysis and elimination are applied to the data in each of the specified columns independently; a row eliminated by the analysis of any one of these columns does not appear in the output. If this option is not given, all columns are included in the analysis.
- -excludeColumns=listOfNames — Specifies a comma-separated list of column names, which may contain wildcards, that are to be excluded from outlier analysis.
- -stDevLimit=value — Specifies the number of standard deviations by which a data point from a column may deviate from the average for the column before being considered an outlier.
- -absLimit=value — Specifies the maximum absolute value that a data point from a column may have before being considered an outlier.
- -absDeviationLimit=value — Specifies the maximum absolute value by which a data point from a column may deviate from the average for the column before being considered an outlier.
- -minimumLimit=value, -maximumLimit=value — Specify the minimum and maximum values, respectively, that data points may have without being considered outliers.
- -chanceLimit=value — Specifies a lower limit on the probability of seeing a data point, as a means of identifying outliers. Gaussian statistics are used to determine the probability that each point would be seen in sampling a Gaussian distribution a given number of times (equal to the number of points in each page). If this probability is less than value, the point is considered an outlier. Using a larger value therefore results in elimination of more points.
- -invert — Specifies that only outlier points should be kept.
- -markOnly — Specifies that outlier points should be marked rather than deleted. This is done by creating a new column, IsOutlier, in the output file that contains 1 if the row has an outlier and 0 otherwise. If IsOutlier is already present in the input file, rows with a value of 1 are treated as outliers and essentially ignored in processing. Hence, successive invocations of sddsoutlier in a data-processing pipeline make use of the results of previous invocations, even though -markOnly leaves the marked rows in place. Note: if -markOnly is not given, the presence of IsOutlier in the input file has no effect.
- -replaceOnly={lastValue | nextValue | interpolatedValue | value=number} — Specifies replacing outliers rather than removing them. lastValue (nextValue) specifies replacing with the previous (next) value in the column. interpolatedValue specifies interpolating a new value from the previous and next values, with row number as the independent quantity. value=number specifies replacing outliers with number.
- -verbose — Specifies that informational printouts should be provided.
- -noWarnings — Specifies that warnings should be suppressed.
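The limit switches (-stDevLimit, -absLimit, -absDeviationLimit, -minimumLimit, -maximumLimit) can be read as independent tests on each value, as in the following Python sketch. The function name and keyword arguments are hypothetical, and two details are assumptions rather than documented behavior: a value counts as an outlier if it fails any one of the supplied tests, and all comparisons are strict.

```python
def is_outlier(v, mean, sd, stdev_limit=None, abs_limit=None,
               abs_deviation_limit=None, minimum=None, maximum=None):
    """Apply -stDevLimit, -absLimit, -absDeviationLimit, -minimumLimit and
    -maximumLimit style tests to one value; None means the test is unused."""
    if stdev_limit is not None and sd > 0 and abs(v - mean) > stdev_limit * sd:
        return True  # too many standard deviations from the mean
    if abs_limit is not None and abs(v) > abs_limit:
        return True  # absolute value too large
    if abs_deviation_limit is not None and abs(v - mean) > abs_deviation_limit:
        return True  # absolute deviation from the mean too large
    if minimum is not None and v < minimum:
        return True  # below the allowed minimum
    if maximum is not None and v > maximum:
        return True  # above the allowed maximum
    return False


# Mean 0, SD 1: a value of 5 fails an absolute limit of 4
# but passes a six-standard-deviation limit.
print(is_outlier(5.0, 0.0, 1.0, abs_limit=4.0))    # True
print(is_outlier(5.0, 0.0, 1.0, stdev_limit=6.0))  # False
```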
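The -chanceLimit description does not give the exact formula, so the following Python sketch uses one plausible reading as an assumption, not as sddsoutlier's actual computation. A point |z| standard deviations from the mean is exceeded by a single Gaussian sample with probability erfc(|z|/√2). The chance of seeing at least one such point in N samples, where N is the number of points in the page, is 1 − (1 − p)^N. A point is flagged when that chance falls below the limit. The function name `chance_outliers` is hypothetical.

```python
import math


def chance_outliers(values, chance_limit):
    """Flag values whose chance of being seen is below chance_limit.

    Assumed model: a point |z| standard deviations from the mean is exceeded
    by one Gaussian sample with probability erfc(|z|/sqrt(2)); the chance of
    seeing at least one such point in N samples (N = number of points in the
    page) is 1 - (1 - p_single)**N.
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    flags = []
    for v in values:
        if sd == 0:
            flags.append(False)  # constant column: nothing is unlikely
            continue
        z = abs(v - mean) / sd
        p_single = math.erfc(z / math.sqrt(2))
        p_any = 1.0 - (1.0 - p_single) ** n
        flags.append(p_any < chance_limit)
    return flags


data = [0.0, 1.0] * 10 + [100.0]
print(chance_outliers(data, 1e-3))  # only the last point is flagged
```

Raising `chance_limit` can only flag more points, matching the statement that a larger value eliminates more points.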
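The -markOnly bookkeeping, and the way IsOutlier lets one invocation build on another, can be modeled in a few lines of Python. This is a sketch under stated assumptions: rows are plain dicts, each pass uses the population standard deviation of the rows not yet marked, and the helpers `mark_outliers` and `keep_rows` (the latter mimicking -invert) are hypothetical, not part of the SDDS toolkit.

```python
def mark_outliers(rows, column, limit):
    """-markOnly-style pass: set IsOutlier on each row instead of deleting.

    Rows that arrive with IsOutlier == 1 are left out of the mean and SD
    and stay marked, so a second pass builds on the first.
    """
    active = [r[column] for r in rows if r.get("IsOutlier", 0) != 1]
    mean = sum(active) / len(active)
    sd = (sum((v - mean) ** 2 for v in active) / len(active)) ** 0.5
    for r in rows:
        already = r.get("IsOutlier", 0) == 1
        new = sd > 0 and abs(r[column] - mean) > limit * sd
        r["IsOutlier"] = 1 if (already or new) else 0
    return rows


def keep_rows(rows, invert=False):
    """Keep non-outliers by default; keep only outliers when invert is True."""
    return [r for r in rows if (r["IsOutlier"] == 1) == invert]


rows = [{"x": v} for v in [0.0, 1.0] * 10 + [10.0, 1000.0]]
mark_outliers(rows, "x", 3)  # first pass marks only the 1000.0 row
mark_outliers(rows, "x", 3)  # with 1000.0 ignored, 10.0 now stands out
print([i for i, r in enumerate(rows) if r["IsOutlier"] == 1])  # [20, 21]
```

The second pass catches the value 10.0 only because the first pass's mark removed 1000.0 from the statistics, which is the pipeline behavior the -markOnly description relies on.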
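The four -replaceOnly modes can be sketched in Python as follows, with the mode strings taken from the switch description. Two behaviors are assumptions, not documented: "previous" and "next" are taken to mean the nearest non-outlier neighbors, and an outlier with no such neighbor (for example, the first row under lastValue) is left unchanged. The function name `replace_outliers` is hypothetical.

```python
def replace_outliers(values, is_outlier, mode, number=None):
    """Return a copy of values with flagged entries replaced.

    mode: "lastValue", "nextValue", "interpolatedValue", or "value".
    """
    n = len(values)
    good = [i for i in range(n) if not is_outlier[i]]
    out = list(values)
    for i in range(n):
        if not is_outlier[i]:
            continue
        # Nearest non-outlier rows before and after row i (assumption).
        prev = max((j for j in good if j < i), default=None)
        nxt = min((j for j in good if j > i), default=None)
        if mode == "value":
            out[i] = number
        elif mode == "lastValue" and prev is not None:
            out[i] = values[prev]
        elif mode == "nextValue" and nxt is not None:
            out[i] = values[nxt]
        elif mode == "interpolatedValue" and prev is not None and nxt is not None:
            # Linear interpolation with the row index as the independent variable.
            t = (i - prev) / (nxt - prev)
            out[i] = values[prev] + t * (values[nxt] - values[prev])
    return out


v = [1.0, 2.0, 99.0, 4.0, 5.0]
flags = [False, False, True, False, False]
print(replace_outliers(v, flags, "interpolatedValue"))  # [1.0, 2.0, 3.0, 4.0, 5.0]
```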

- see also:
- author: M. Borland, ANL/APS.