description:sddscorrelate computes correlation coefficients and correlation
significance between column data. The correlation coefficient between
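columns i and j is defined as
\[ C_{ij} = \frac{\langle x_i x_j \rangle - \langle x_i \rangle \langle x_j \rangle}{\sqrt{\left(\langle x_i^2 \rangle - \langle x_i \rangle^2\right)\left(\langle x_j^2 \rangle - \langle x_j \rangle^2\right)}} \]
where the angle brackets denote an average over the rows of a page. If
$C_{ij} = 1$, then the variables are perfectly correlated, whereas if
$C_{ij} = -1$, they
are perfectly anticorrelated.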
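As an illustration only, the following Python sketch computes a correlation coefficient as the covariance of two columns divided by the product of their standard deviations, written in terms of row averages. It is not taken from the sddscorrelate source; the function names and sample data are hypothetical.

```python
from math import sqrt

def mean(values):
    """Average of a list of numbers."""
    return sum(values) / len(values)

def correlation(x, y):
    """Correlation coefficient built from row averages (moment form)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length columns with at least 2 rows")
    mx, my = mean(x), mean(y)
    mxy = mean([a * b for a, b in zip(x, y)])
    var_x = mean([a * a for a in x]) - mx * mx
    var_y = mean([b * b for b in y]) - my * my
    if var_x <= 0 or var_y <= 0:
        raise ValueError("a constant column has no defined correlation")
    return (mxy - mx * my) / sqrt(var_x * var_y)

# Hypothetical sample data: y rises with x, z falls with x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
z = [8.0, 6.0, 4.0, 2.0]
print(correlation(x, y))  # perfectly correlated case
print(correlation(x, z))  # perfectly anticorrelated case
```

The moment form mirrors the definition directly; for data with a large mean and a small spread, subtracting the means before multiplying is numerically safer.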
The correlation significance is the probability that the observed correlation coefficient could happen
by chance if the variables were in fact uncorrelated. Hence, a very small correlation significance
means that the variables are probably correlated.
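To make the idea of "happening by chance" concrete, the sketch below estimates a significance by exhaustive permutation: it reorders one column in every possible way and counts how often the magnitude of the coefficient is at least as large as the observed one. This is a stand-in chosen for illustration, not the formula sddscorrelate uses (which this text does not state); the two-sided comparison, the function names, and the data are all assumptions. It is practical only for a handful of rows.

```python
from itertools import permutations
from math import sqrt

def pearson(x, y):
    """Standard correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def permutation_significance(x, y):
    """Fraction of reorderings of y whose |r| is at least the observed |r|."""
    observed = abs(pearson(x, y))
    tolerance = 1e-12  # guards against round-off when comparing equal values
    hits = total = 0
    for shuffled in permutations(y):
        total += 1
        if abs(pearson(x, shuffled)) >= observed - tolerance:
            hits += 1
    return hits / total

# Hypothetical data: one nearly linear column, one scattered column.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
strong = [2.1, 3.9, 6.2, 8.0, 9.9]
weak = [3.0, 1.0, 4.0, 2.0, 3.5]
print(permutation_significance(x, strong))  # small: unlikely by chance
print(permutation_significance(x, weak))    # large: easily produced by chance
```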
examples:
Find the correlations among beam-position-monitor x values in par.bpm:
sddscorrelate par.bpm par.cor -column='*x'
Find the correlations of these readouts with one specific readout only:
files:inputFile is an SDDS file containing two or more columns of data. For each page of
the file, outputFile contains the correlation coefficients and significance for
every possible pairing of variables requested. outputFile also contains three string
columns: Correlate1Name, Correlate2Name, and CorrelatePair. These are
respectively the name of the first column in the analysis, the name of the second column in
the analysis, and a string of the form Name1.Name2.
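The following Python sketch shows how the three string columns could be formed for a set of selected columns. It assumes that each unordered pair appears once, that a column is not paired with itself, and that pairs follow the order of the columns in the file; none of this is confirmed by the text above. Python's fnmatch is used as a stand-in for SDDS wildcard matching, and the column names are hypothetical.

```python
from fnmatch import fnmatchcase
from itertools import combinations

def select_columns(available, patterns):
    """Keep columns matching any pattern, preserving file order."""
    return [name for name in available
            if any(fnmatchcase(name, p) for p in patterns)]

def output_rows(selected):
    """One row per unordered pair, with the three string columns."""
    return [
        {"Correlate1Name": a, "Correlate2Name": b, "CorrelatePair": f"{a}.{b}"}
        for a, b in combinations(selected, 2)
    ]

columns = ["Time", "P1x", "P1y", "P2x", "P3x"]  # hypothetical column names
rows = output_rows(select_columns(columns, ["*x"]))
for row in rows:
    print(row["CorrelatePair"])
```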
switches:
-pipe=[input][,output] -- The standard SDDS Toolkit pipe option.
-columns=columnNames -- Specifies the names of columns to be included in the analysis.
A comma-separated list of optionally wildcard-containing names may be given.
-excludeColumns=columnNames -- Specifies the names of columns to be excluded from the
analysis. A comma-separated list of optionally wildcard-containing names may be given.
-withOnly=columnName -- Specifies that one of the variables for each correlation will be
the named column.
-rankOrder -- Specifies computing rank-order correlations rather than standard correlations.
This is considered more robust than standard correlations.
-stDevOutlier[=limit=factor][,passes=integer] -- Specifies standard-deviation-based
outlier elimination on each pair of columns prior to computation of the correlation coefficient.
Any pair of values is ignored if one or both values are outliers relative to the column from which they come.
The limit qualifier specifies the allowed deviation from the mean in standard deviations; the
default is 1. The passes qualifier specifies how many times the outlier elimination (including
recomputation of the mean and standard deviation) is performed; the default is 1.
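The rank-order and outlier-elimination steps described above can be sketched in Python as follows. This is a minimal illustration, not the program's code. It assumes that tied values share the average of their ranks, that the sample (n-1) standard deviation is used, and that the mean and standard deviation are recomputed on each pass from the pairs that remain. All function names are hypothetical. A rank-order correlation is then the standard coefficient computed on the ranks instead of the raw values.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def drop_outlier_pairs(x, y, limit=1.0, passes=1):
    """Drop (x, y) pairs where either value lies more than `limit`
    standard deviations from the mean of its own column."""
    pairs = list(zip(x, y))
    for _ in range(passes):
        if len(pairs) < 2:
            break
        keep = [True] * len(pairs)
        for col in (0, 1):
            data = [p[col] for p in pairs]
            mean = sum(data) / len(data)
            stdev = sqrt(sum((v - mean) ** 2 for v in data) / (len(data) - 1))
            for i, v in enumerate(data):
                if abs(v - mean) > limit * stdev:
                    keep[i] = False  # the whole pair is ignored
        pairs = [p for p, ok in zip(pairs, keep) if ok]
    return [p[0] for p in pairs], [p[1] for p in pairs]

# Hypothetical data: the last x value is far from the rest.
print(average_ranks([10, 20, 20, 30]))
print(drop_outlier_pairs([1, 2, 3, 4, 50], [2, 4, 6, 8, 10], limit=1.5))
```

Filtering pairs rather than single values keeps the two columns aligned row by row, which the correlation computation requires.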