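- description: sddscorrelate computes correlation coefficients and correlation significances
between columns of data. The correlation coefficient between columns i and j is defined as
$$C_{ij} = \frac{\langle x_i x_j \rangle - \langle x_i \rangle \langle x_j \rangle}{\sqrt{\left(\langle x_i^2 \rangle - \langle x_i \rangle^2\right)\left(\langle x_j^2 \rangle - \langle x_j \rangle^2\right)}},$$
where angle brackets denote an average over the rows of a page.
If Cij = 1, then the variables are perfectly correlated, whereas if Cij = -1, they are perfectly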
anticorrelated. The correlation significance is the probability that the observed correlation
coefficient could happen by chance if the variables were in fact uncorrelated. Hence, a very
small correlation significance means that the variables are probably correlated.
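The two quantities described above can be illustrated with a short Python sketch. It is not the code used by sddscorrelate: the coefficient is the standard linear correlation coefficient, and the significance is estimated here by a permutation test (shuffling one column to mimic uncorrelated data), which is an assumption chosen for clarity rather than the program's actual method. The function names and the sample data are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if either column is constant.
    return sxy / math.sqrt(sxx * syy)

def permutation_significance(x, y, trials=2000, seed=0):
    """Estimate the chance of seeing |r| at least this large with uncorrelated data.

    Assumption: shuffling y destroys any real relationship with x, so the
    fraction of shuffles that match or beat the observed |r| approximates
    the correlation significance.
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    return hits / trials

# Hypothetical data: y_correlated follows x closely, y_anticorrelated is -x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_correlated = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
y_anticorrelated = [-v for v in x]

r = pearson(x, y_correlated)
sig = permutation_significance(x, y_correlated)
print(f"r = {r:.4f}, significance ~ {sig:.4f}")
print(f"anticorrelated r = {pearson(x, y_anticorrelated):.4f}")
```

A coefficient close to 1 combined with a significance close to 0 is the signature of a genuine correlation; for uncorrelated data the significance would instead be spread over the whole range from 0 to 1.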
- examples:
sddscorrelate par.bpm par.cor -column='*x'
sddscorrelate par.bpm par.cor -column='*x' -withOnly=P1P1x
- synopsis:
sddscorrelate [-pipe=[input][,output]] [inputFile] [outputFile]
[-columns=columnNames] [-excludeColumns=columnNames]
[-withOnly=columnName] [-rankOrder]
[-stDevOutlier[=limit=factor][,passes=integer]]
- files: inputFile is an SDDS file containing two or more columns of data. For each page of the file,
outputFile contains the correlation coefficients and significance for every possible pairing of
variables requested. outputFile also contains three string columns: Correlate1Name,
Correlate2Name, and CorrelatePair. These are respectively the name of the first column in
the analysis, the name of the second column in the analysis, and a string of the form
Name1.Name2.
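As a rough illustration of the output layout, the following Python sketch enumerates the pairings and builds the three string columns for each. The order of the pairs, and which column appears first when -withOnly is given, are assumptions; the column names other than P1P1x are hypothetical.

```python
from itertools import combinations

def correlation_pairs(columns, with_only=None):
    """Return one record per pairing, with the three string columns filled in."""
    if with_only is not None:
        # Assumption: the -withOnly column is paired with every other column.
        pairs = [(with_only, name) for name in columns if name != with_only]
    else:
        # Every possible pairing of distinct columns, each pair listed once.
        pairs = list(combinations(columns, 2))
    return [
        {
            "Correlate1Name": first,
            "Correlate2Name": second,
            "CorrelatePair": f"{first}.{second}",
        }
        for first, second in pairs
    ]

columns = ["P1P1x", "P1P2x", "P2P1x"]  # hypothetical matches for '*x'
all_rows = correlation_pairs(columns)
only_rows = correlation_pairs(columns, with_only="P1P1x")
for row in all_rows:
    print(row["CorrelatePair"])
```

With N selected columns this yields N(N-1)/2 pairings in the general case and N-1 pairings when one column is fixed.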
- switches:
- -pipe=[input][,output] — The standard SDDS Toolkit pipe option.
- -columns=columnNames — Specifies the names of columns to be included in the analysis.
A comma-separated list of optionally wildcard-containing names may be given.
- -excludeColumns=columnNames — Specifies the names of columns to be excluded from
the analysis. A comma-separated list of optionally wildcard-containing names may be
given.
- -withOnly=columnName — Specifies that one of the variables for each correlation will
be the named column.
- -rankOrder — Specifies computing rank-order correlations rather than standard
correlations. Rank-order correlation is generally considered more robust, since it depends
only on the ordering of the values and is therefore less sensitive to outliers.
- -stDevOutlier[=limit=factor][,passes=integer] —
Specifies standard-deviation-based outlier elimination on each pair of columns prior to
computation of the correlation coefficient. Any pair of values is ignored if one or both
values are outliers relative to the column from which they come. The limit qualifier
specifies the allowed deviation from the mean in standard deviations; the default is
1. The passes qualifier specifies how many times the outlier elimination (including
recomputation of the mean and standard deviation) is performed; the default is 1.
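The -rankOrder and -stDevOutlier switches can be pictured with the Python sketch below. It is a minimal illustration under stated assumptions, not the program's implementation: rank-order correlation is taken to be the linear coefficient applied to ranks (with tied values sharing their average rank), and the outlier filter uses the sample standard deviation. All function names and data are hypothetical.

```python
import math
import statistics

def pearson(x, y):
    """Linear correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid = (i + j) / 2 + 1  # average rank of the tied group
        for k in range(i, j + 1):
            result[order[k]] = mid
        i = j + 1
    return result

def drop_outlier_pairs(x, y, limit=1.0, passes=1):
    """Discard (x, y) pairs in which either value is an outlier in its column.

    Each pass recomputes the mean and standard deviation from the pairs
    that survived the previous pass.
    """
    pairs = list(zip(x, y))
    for _ in range(passes):
        if len(pairs) < 2:
            break
        xs = [p[0] for p in pairs]
        ys = [p[1] for p in pairs]
        mean_x, sd_x = statistics.mean(xs), statistics.stdev(xs)
        mean_y, sd_y = statistics.mean(ys), statistics.stdev(ys)
        pairs = [
            (a, b) for a, b in pairs
            if abs(a - mean_x) <= limit * sd_x and abs(b - mean_y) <= limit * sd_y
        ]
    return pairs

# Rank order: y grows monotonically but not linearly with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]
linear_r = pearson(x, y)
rank_r = pearson(ranks(x), ranks(y))

# Outlier elimination: the last x value is far from the rest of its column.
bx = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]
by = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
kept = drop_outlier_pairs(bx, by, limit=2.0, passes=1)
print(linear_r, rank_r, kept)
```

In the first case the rank-order coefficient reaches 1 even though the linear coefficient does not, because the relationship is monotonic. In the second, the whole pair containing the outlying x value is dropped, even though its y value is unremarkable.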
- see also:
- author: M. Borland, ANL/APS.