- description:

sddsprocess operates on the data columns and parameters of an existing SDDS data set and creates a new data set. The program supports filtering and matching operations on both tabular data and parameter data, definition of new parameters and columns in terms of existing ones, units conversions, scanning of string data to produce numeric data, composition of string data from other data types, statistical and waveform analyses, and other operations.

- examples:
Compute the square roots of the beta functions, which are the beam-size envelopes:
sddsprocess APS.twi -define=column,sqrtBetax,"betax sqrt" -define=column,sqrtBetay,"betay sqrt"

Compute the horizontal beam size, given by the equation $\sigma_x = \sqrt{\epsilon_x \beta_x + (\sigma_\delta \eta_x)^2}$:

sddsprocess APS.twi -define=parameter,epsx,8.2e-9,units=nm -define=parameter,sigmaDelta,1e-3 -define=column,sigmax,"epsx betax * sigmaDelta etax * sqr + sqrt",units=m
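The postfix (rpn) expression in the command above can be hard to read at first. The following is a minimal Python sketch of how such an expression is evaluated on a stack. It is not the rpn calculator used by sddsprocess: the function `eval_rpn`, its small operator set, and the sample values for `betax` and `etax` are assumptions made for illustration only.

```python
import math

def eval_rpn(expression, variables):
    """Evaluate a postfix expression with a small, illustrative operator set."""
    binary = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    unary = {"sqr": lambda a: a * a, "sqrt": math.sqrt}
    stack = []
    for token in expression.split():
        if token in binary:
            b = stack.pop()          # the top of the stack is the second operand
            a = stack.pop()
            stack.append(binary[token](a, b))
        elif token in unary:
            stack.append(unary[token](stack.pop()))
        elif token in variables:
            stack.append(variables[token])   # column or parameter value
        else:
            stack.append(float(token))       # numeric literal
    return stack.pop()

# Hypothetical values for one row; epsx and sigmaDelta are taken from the example.
row = {"epsx": 8.2e-9, "sigmaDelta": 1e-3, "betax": 10.0, "etax": 0.1}
sigmax = eval_rpn("epsx betax * sigmaDelta etax * sqr + sqrt", row)
print(sigmax)
```

Reading the expression left to right, `epsx betax *` and `sigmaDelta etax * sqr` are computed first, then added, and the square root is taken last.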

- synopsis:
sddsprocess [-pipe[=input][,output]] [inputFile] [outputFile] options

- files: inputFile is an SDDS file containing data to be processed. If no options are given, it is copied to outputFile without change. Warning: if no output filename is given, and if an output pipe is not selected, then the input file will be replaced.
- switches:
- Data winnowing: Any number of the following may be used. They are applied in the
order given. Note that -match and -test are the most time intensive; thus, if several
types of winnowing are to be applied, these should be used last if possible.
- -filter={column | parameter},rangeSpec[,rangeSpec[,logicOp...]] —
Specifies winnowing inputFile based on numerical data in parameters or columns. A
range-spec is of the form name,lower-value,upper-value[,!] , where ! signifies
logical negation. A page passes a given filter by having the named parameter
inside (or outside, if negation is given) the specified range, where the endpoints are
considered inside. A tabular data row passes a given filter in the analogous fashion,
except that the value from the named column is used.
One or more range specifications may be combined to give an accept/reject status by employing the logic operations & (logical and) and | (logical or). For example, to select rows for which A is on [0, 1] and B is on [10, 20], one would use -filter=column,A,0,1,B,10,20,&.
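The accept/reject logic described for -filter can be sketched in Python as follows. This is only an illustration of the documented behavior (inclusive endpoints, optional negation, and logic operations), not the program's code. The names `in_range` and `passes_filter` are hypothetical, and the sketch assumes that each logic operation combines the running result with the range specification that precedes it.

```python
def in_range(value, lower, upper, negate=False):
    """Range test with inclusive endpoints; negate corresponds to the ! flag."""
    inside = lower <= value <= upper
    return not inside if negate else inside

def passes_filter(row, first_spec, more_specs=()):
    """first_spec is (name, lower, upper, negate); more_specs holds (spec, op) pairs."""
    name, lower, upper, negate = first_spec
    result = in_range(row[name], lower, upper, negate)
    for (name, lower, upper, negate), op in more_specs:
        test = in_range(row[name], lower, upper, negate)
        # "&" is logical and; anything else is treated as "|" (logical or).
        result = (result and test) if op == "&" else (result or test)
    return result

# Equivalent in spirit to -filter=column,A,0,1,B,10,20,&
rows = [{"A": 0.5, "B": 15.0}, {"A": 1.0, "B": 25.0}, {"A": 2.0, "B": 10.0}]
kept = [r for r in rows
        if passes_filter(r, ("A", 0, 1, False), [(("B", 10, 20, False), "&")])]
print(kept)
```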

- -timeFilter={column | parameter},[before=YYYY/MM/DD@HH:MM:SS][,after=YYYY/MM/DD@HH:MM:SS][,invert] — Specifies a date range, in YYYY/MM/DD@HH:MM:SS format, to apply to time parameters or columns. The invert option causes the filter to be inverted, so that the data that would otherwise be kept is removed and vice versa. For example, to keep data between 8:30 AM on January 2, 2003 and 9:20 PM on February 6, 2003, the option would be -timeFilter=column,Time,before=2003/2/6@21:20,after=2003/1/2@8:30, assuming that the time data is in the column Time.
- -match={column | parameter},matchTest[,matchTest,logicOp] — Specifies
winnowing inputFile based on data in string parameters or columns. A match-test is
of the form name=matchingString[,!], where the matching string may include the
wildcards * (matches zero or more characters) and ? (matches any one character).
If the first character of matchingString is ’@’, then the remainder of the string is taken to be the name of a parameter or column. In this case, the match is performed to the data in the named entity. For column-based matching, this is done row-by-row. For parameter-based matching, it is done page-by-page.

In addition, if instead of = one uses =+, then matching is case-insensitive. The plus sign is intended to be mnemonic, as the case-insensitive matching results in additional matches.

The use of several match tests and logic is done just as for -filter. For example, to match all the rows for which the column Name starts with ’A’ or ’B’, one could use -match=column,Name=A*,Name=B*,|. (This could also be done with -match=column,Name=[AB]*.)
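The wildcard examples above can be tried out with Python's `fnmatch` module, which supports `*`, `?`, and bracketed character sets. This is a stand-in for illustration only: sddsprocess uses its own matching code, which may differ in details. The helper `match_rows` and the sample names are hypothetical.

```python
from fnmatch import fnmatchcase

def match_rows(names, patterns, case_insensitive=False):
    """Keep names that match any of the patterns (patterns joined by logical or)."""
    kept = []
    for name in names:
        for pattern in patterns:
            if case_insensitive:
                # Rough analogue of using =+ instead of = in a match test.
                hit = fnmatchcase(name.lower(), pattern.lower())
            else:
                hit = fnmatchcase(name, pattern)
            if hit:
                kept.append(name)
                break
    return kept

names = ["Alpha", "Beta", "Gamma", "beta2"]
print(match_rows(names, ["A*", "B*"]))                  # two tests joined with |
print(match_rows(names, ["[AB]*"]))                     # single bracketed pattern
print(match_rows(names, ["B*"], case_insensitive=True))
```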

- -numberTest={column | parameter},name[,invert] — Specifies testing the values in a string column (parameter) to see if they can be (or cannot be, if invert is given) converted to numbers. If not, the corresponding row (page) is deleted.
- -test={column | parameter},test[,autostop][,algebraic] — Specifies winnowing of inputFile based on a test embodied in an rpn expression. The expression, test, may use the names of any parameters or columns. If autostop is specified, the processing of the data set (or data page) terminates when the parameter-based (or column-based) expression is false.
- -clip=head,tail[,invert] — Specifies the number of data points to clip from the head and tail of each page. If invert is given, the clipping retains rather than deletes the indicated points.
- -fclip=head,tail[,invert] — Specifies the fraction of data points to clip from the head and tail of each page. If invert is given, the clipping retains rather than deletes the indicated points.
- -sparse=interval[,offset] — Specifies sparsing of each page with the indicated interval. That is, only every intervalth row starting with row offset is copied to the output. The default value of offset is 0.
- -sample=fraction — Specifies random sampling of rows such that approximately the indicated fraction is kept. Since the random number generator is seeded with the system clock, the selected rows will generally differ from run to run.
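The row-selection rules for -clip, -fclip, and -sparse can be sketched with Python list slicing. This is an illustration of the descriptions above, not the program's implementation. The function names are hypothetical, and the conversion of a fraction to a row count in `fclip` (truncation toward zero) is an assumption.

```python
def clip(rows, head, tail, invert=False):
    """Delete head rows from the start and tail rows from the end of a page."""
    n = len(rows)
    if invert:
        # Retain the indicated head and tail rows instead of deleting them.
        return rows[:head] + rows[max(n - tail, head):]
    return rows[head:max(n - tail, 0)]

def fclip(rows, head_fraction, tail_fraction, invert=False):
    """Like clip, but head and tail are given as fractions of the page length."""
    n = len(rows)
    # Assumption: fractional counts are truncated toward zero.
    return clip(rows, int(head_fraction * n), int(tail_fraction * n), invert)

def sparse(rows, interval, offset=0):
    """Keep every interval-th row, starting with row offset."""
    return rows[offset::interval]

page = list(range(10))
print(clip(page, 2, 3))
print(clip(page, 2, 3, invert=True))
print(sparse(page, 3, 1))
```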

- rpn calculator initialization:

- -rpnDefinitionsFiles=filename... — Specifies a list of comma-separated filenames to be read in as rpn definitions files. By default, the file named in the RPN_DEFNS environment variable is read.
- -rpnExpression=expression[,repeat][,algebraic] — Specifies an rpn expression to be executed. If repeat is not specified, then the expression is executed before processing begins. If repeat is specified, the expression is executed just after each page is read; it may use values of any of the numerical parameters for that page. This option may be given any number of times.

- Scanning from, editing, printing to, and executing string columns and parameters:

- -scan={column | parameter},newName,sourceName,sscanfString[,definitionEntries] — Specifies creation of a new numeric column (parameter) by scanning an existing string column (parameter) using a sscanf format string. The default type of the new data is double; this may be changed by including a definitionEntry of the form type=typeName. With the exception of the name field, any valid namelist command field and value may be given as part of the definitionEntries. If sourceName contains wildcards, then newName must contain at least one occurrence of the string “%s”. In this case, for each name that matches sourceName, an additional element is created, with a name created by substituting the name for “%s” in newName.

- -edit={column | parameter},newName,sourceName,edit-command — Specifies
creation of a new string column (parameter) called newName by editing an existing
string column (parameter) sourceName using an emacs-like editing string. For details
on editing commands, see SDDS editing.
If sourceName contains wildcards, then newName must contain at least one occurrence of the string “%s”. In this case, for each name that matches sourceName, an additional element is created, with a name created by substituting the name for “%s” in newName.

- -reedit={column | parameter},name,edit-command — Like -edit, except that the element name must already exist. Each value is replaced by the value obtained from applying edit-command.
- -print={column | parameter},newName,sprintfString,sourceName[,sourceName...][,definitionEntries] — Specifies creation of a new string column (parameter) by formatted printing of one or more elements from other columns (parameters). The sprintfString is a C-style format string such as might be given to the routine sprintf. With the exception of the name field, any valid namelist command field and value may be given as part of the definitionEntries.
- -reprint — Identical in syntax and function to -print, except that if newName already exists, it is overwritten. No error or warning is issued.
- -format={column | parameter},newName,sourceName[,stringFormat=sprintfString][,doubleFormat=sprintfString][,longFormat=sprintfString] — Reformats string data in different ways depending on the type of data the string contains. Each string is separated into tokens at space boundaries. Each token is separately formatted, either as a long integer, a double-precision floating point number, or a string, depending on what the token appears to be. The formatting is done using the specified format strings; the default format strings are %ld for longs, %21.15e for doubles, and %s for strings.
- -system={column | parameter},newName,commandName[,definitionEntries] — Specifies creation of a new string column (parameter) by executing the contents of an existing string column (parameter) in a subprocess. The first line of output from the subprocess is acquired and placed in the new column (parameter). If commandName contains wildcards, then newName must contain at least one occurrence of “%s”. In this case, for each name that matches commandName, an additional element is created, with a name created by substituting the name for “%s” in newName.
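The token-by-token behavior described for -format can be sketched in Python, whose `%` operator accepts C-style format strings such as %ld and %21.15e. This is a rough illustration under stated assumptions: the functions `format_token` and `reformat` are hypothetical, Python's own `int` and `float` parsing stands in for however the program decides what a token "appears to be", and rejoining the tokens with single spaces is also an assumption.

```python
def format_token(token, long_format="%ld", double_format="%21.15e",
                 string_format="%s"):
    """Format one token as a long, a double, or a string, in that order of preference."""
    try:
        return long_format % int(token)
    except ValueError:
        pass
    try:
        return double_format % float(token)
    except ValueError:
        return string_format % token

def reformat(text, **formats):
    """Split at spaces, format each token separately, and rejoin with single spaces."""
    tokens = [t for t in text.split(" ") if t]
    return " ".join(format_token(t, **formats) for t in tokens)

print(reformat("12 3.5 abc"))
print(reformat("7 run", long_format="%05ld"))
```

Note that Python's parsers accept some tokens (for example "nan" or "1_000") that a C-based tool may treat differently.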

- Creation and modification of numeric columns and parameters:

- -convertUnits={column | parameter},name,newUnits,oldUnits,factor — Specifies units conversion for the column or parameter name (which may contain wildcards). The factor entry is the factor by which the values must be multiplied to convert them to the desired units. It is an error if oldUnits does not match the original units of the column or parameter. Eventually, the factor entry will be made optional by inclusion of conversion information in the program. This option may be given any number of times.
- -define={column | parameter},name,equation[,select=matchString][,exclude=matchString][,editSelection=editCommand][,definitionEntries][,algebraic] — Specifies creation of a new column or parameter using an rpn expression to obtain the values. For parameters, any parameter value may be obtained by giving the parameter name in the expression. For columns, one may additionally get the value of any column by giving its name in the expression; the expression given for -define=column is essentially specifying a vector operation on columns with parameters as scalars. By default, the type of the new data is double. This and other properties of the new column or parameter may be altered by giving definitionEntries, which have the form fieldName=value; fieldName is the name of any namelist command field (except the name field) for a column or parameter, as appropriate. This option may be given any number of times.

Using the select qualifier, it is possible to use a single -define option to specify many instances of new column definitions. If select is given, the input is searched for all the column names matching matchString. These are then optionally edited using the editCommand specified with editSelection. The resulting strings are then substituted one at a time into name and equation, replacing all occurrences of “%s”. For example, suppose a file contained a number of column-pairs of the form PrefixV1 and PrefixV2; to take the difference of each pair, one could use

-define=column,%sDiff,%sV1 %sV2 -,select=*V1,edit=%/V1//
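The expansion performed by the select qualifier in the example above can be sketched in Python. This is an illustration only: `expand_define` and the column names are hypothetical, `fnmatchcase` stands in for the program's wildcard matching, and a plain string replacement stands in for the edit command %/V1//, which removes the text V1.

```python
from fnmatch import fnmatchcase

def expand_define(columns, name_template, equation_template, select, strip):
    """Return one (name, equation) pair per column that matches the select pattern."""
    definitions = []
    for column in columns:
        if not fnmatchcase(column, select):
            continue
        # Stand-in for the edit command: delete the text given by strip.
        edited = column.replace(strip, "")
        definitions.append((name_template.replace("%s", edited),
                            equation_template.replace("%s", edited)))
    return definitions

columns = ["AlphaV1", "AlphaV2", "BetaV1", "BetaV2"]
for name, equation in expand_define(columns, "%sDiff", "%sV1 %sV2 -", "*V1", "V1"):
    print(name, "=", equation)
```

Each selected column thus yields one new column definition, as if a separate -define option had been given for each prefix.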

sddsprocess permits read access to individual elements of a column of data using the rpn array feature. For each column, an array named &ColumnName is created; the ampersand is to remind the user that the variable &ColumnName is the address of the start of the array. To get the first element of a column named Data, one would use 0 &Data [. This will function only within or following a -define=column or -redefine=column operation. It is an error to attempt to access data beyond the bounds of an array.

The number of rows and the current page and row numbers are pre-loaded into the rpn calculator memory according to the following table.

| Quantity | rpn memory |
| --- | --- |
| Page number | i_page |
| Page number | table_number |
| Row number | i_row |
| Number of rows | n_rows |

For example, to add a column of index numbers to a file, use the option -define=col,Index,i_row,type=long.

- -redefine — This option is identical to -define except that the column or parameter already exists in the input. The equation may use the previous values of the entity being redefined by including the column name in the expression.
- -evaluate={column | parameter},name,source[,definitionEntries] — Specifies creation of a new column or parameter name containing values from evaluation of the equation stored in a string column or parameter source. The source string is an rpn expression in terms of the other column and parameter values.
- -cast={column | parameter},newName,oldName,newType — This option allows casting data from one numerical data type to another. It is much faster than trying to do the same operation using -define. The string newType may be any of double, float, long, short, or character.
- -process=mainColumnName,analysisName,resultName[,default=value][,description=string][,symbol=string][,weightBy=columnName][,functionOf=columnName[,lowerLimit=value][,upperLimit=value]][,head=number][,tail=number][,fhead=fraction][,ftail=fraction][,topLimit=value][,bottomLimit=value][,position][,offset=value][,factor=value][,match=columnName,value=match-value] — This option may be given any number of times. It specifies creation of a new parameter resultName by processing column mainColumnName using analysis mode analysisName. The column must contain numeric data, in general, except for a few analysis modes that take any type of data (see below). mainColumnName may contain wildcards, in which case the processing is applied to all matching columns containing numeric data. resultName may have a single occurrence of the string “%s” embedded in it; if so, mainColumnName is substituted. If wildcards are given in mainColumnName, then “%s” must appear in resultName; in this case, the name of each selected column is substituted. Similarly, if the description field is supplied, it may contain an embedded “%s” for which the column name will be substituted. If the processing fails for any reason, the value given by the default parameter is substituted; if no value is specified, the value is equal to the maximum double-precision value on the system.

Recognized values for analysisName are:

- average, rms, sum, standardDeviation, mad — The arithmetic average, the rms average, the arithmetic sum, the standard deviation, and the mean absolute deviation. All may optionally be weighted.
- median, drange, qrange — The median value, i.e., the value which is both above and below 50% of the data points; the decile-range, which is the range excluding the smallest and largest 10% of the values; the quartile-range, which is the range excluding the smallest and largest 25% of the values.
- percentile, prange — These compute percentiles and percentile ranges, as defined by the percentlevel qualifier. For percentile, the value returned is the value of the column corresponding to the given percentlevel. For prange, the value returned is the span of the values in the column encompassing the given central percentage of the data; for example, percentlevel=50 would give the quartile range.
- minimum, maximum, spread, smallest, largest — The minimum value, maximum value, spread in values, smallest value (minimum absolute value), and largest value (maximum absolute value). For all except spread, the position and functionOf qualifiers may be given to obtain the value in another column when mainColumnName has the extremal value; the functionOf qualifier may name a string column.
- first, last — The values in the first and last rows of the page. Will accept non-numeric data.
- pick — The first value within the filter. Will accept non-numeric data.
- count — The number of values in the page.
- baselevel, toplevel, amplitude — Waveform analysis parameters from histogramming the signal amplitude. baselevel is the baseline, toplevel is the height, and amplitude is height above baseline.
- risetime, falltime, center — The rise and fall times from the 10%-90% and 90%-10% transitions. center is the midpoint between the first 50% rising edge and the first following 50% falling edge after rising above 90% amplitude. Requires specifying an independent variable column with functionOf.
- fwhm, fwtm, fwha, fwta — Full-widths of the named column as a function of the independent variable column specified with functionOf. The letters ’h’ and ’t’ specify Half and Tenth amplitude widths, while ’m’ and ’a’ specify Maximum value or Amplitude over baseline.
- zerocrossing — Zero-crossing point of the column mainColumnName as a function of the column named with functionOf.
- sigma — The standard deviation over the square-root of the number of points. This is an estimate of the uncertainty in the mean value.
- slope, intercept, lfsd — The slope and intercept of a linear fit. The functionOf qualifier must be given to specify the quantity to fit against. lfsd is the Linear-Fit-Standard-Deviation, which is the standard deviation of the fit residuals.
- gmintegral — The integral of the quantity with respect to the quantity named with the functionOf qualifier. The integral is performed using the Gill-Miller method, which works well for non-equispaced values of the independent variable.
- correlation — Pearson’s correlation coefficient of the quantity and the column of which it is (nominally) a function (as declared with the functionOf qualifier).
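To make the extremal analysis modes concrete, here is a small Python sketch of minimum, maximum, smallest, largest, and spread, including the position lookup through a functionOf column. It follows the wording of the list above and is not the program's code. The names `extremal` and `spread` are hypothetical, and returning the absolute value for smallest and largest is one reading of the definitions given.

```python
def extremal(values, mode, function_of=None, position=False):
    """Return the extremal value, or the function_of entry in the same row if position is set."""
    keys = {
        "minimum": lambda i: values[i],
        "maximum": lambda i: -values[i],
        "smallest": lambda i: abs(values[i]),
        "largest": lambda i: -abs(values[i]),
    }
    index = min(range(len(values)), key=keys[mode])
    if position:
        # Value of the independent column in the row where the extremum occurs.
        return function_of[index]
    if mode in ("smallest", "largest"):
        return abs(values[index])   # assumption: the absolute value is reported
    return values[index]

def spread(values):
    """Difference between the maximum and minimum values."""
    return max(values) - min(values)

signal = [1.0, -5.0, 3.0]
time = [0.0, 0.1, 0.2]
print(extremal(signal, "maximum"), extremal(signal, "maximum", time, position=True))
print(extremal(signal, "largest"), extremal(signal, "largest", time, position=True))
print(spread(signal))
```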

Qualifiers for this switch are:

- description=string, symbol=string — Specify the description and symbol fields for the new parameter.
- weightBy=columnName — Specifies the name of a column to weight values from column mainColumnName by before computing statistics.
- functionOf=columnName — Specifies the name of a column that mainColumnName is to be considered a function of for computing widths, zero-crossings, etc.
- topLimit=value, bottomLimit=value — Specifies winnowing of rows so that only those with mainColumn values above the topLimit or below the bottomLimit are included in the computations.
- lowerLimit=value, upperLimit=value — If functionOf is given, specifies winnowing of rows so that only rows for which the independent column data is above the lowerLimit and/or below the upperLimit are included in computations.
- head=number, fhead=fraction — Specifies taking the head of the data prior to processing. head gives the number of points to keep, while fhead gives the fraction of the points to keep. If number or fraction is less than 0, then the head points are deleted and the other points are kept. If head and tail are both used, head is performed first.
- tail=number, ftail=fraction — Specifies taking the tail of the data prior to processing. tail gives the number of points to keep, while ftail gives the fraction of the points to keep. If number or fraction is less than 0, then the tail points are deleted and the other points are kept. If head and tail are both used, head is performed first.
- position — For minimum, maximum, smallest, and largest analysis modes, specifies that the results should be the position at which the indicated value occurs. This position is the corresponding value in the column named with functionOf.
- offset=value, factor=value — Specify an offset and factor for modifying data prior to processing. By default, the offset is zero and the factor is 1. The equation is x → f * (x + o).
- match=columnName, value=match-value — Specify the match column and the match value (which may contain wildcards).
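The offset, factor, head, and tail qualifiers can be sketched in Python as a preprocessing step. This is a minimal illustration of the rules stated above: x → f * (x + o), head applied before tail, and negative counts deleting rather than keeping points. The function names are hypothetical, and the sketch covers only the count-based head and tail qualifiers, not fhead and ftail.

```python
def take_head(values, number):
    """Keep the first number points; a negative number deletes them instead."""
    return values[:number] if number >= 0 else values[-number:]

def take_tail(values, number):
    """Keep the last number points; a negative number deletes them instead."""
    n = len(values)
    if number >= 0:
        return values[max(n - number, 0):]
    return values[:max(n + number, 0)]

def preprocess(values, offset=0.0, factor=1.0, head=None, tail=None):
    """Apply x -> factor * (x + offset), then head, then tail."""
    data = [factor * (x + offset) for x in values]
    if head is not None:
        data = take_head(data, head)   # head is performed first
    if tail is not None:
        data = take_tail(data, tail)
    return data

print(preprocess([1, 2, 3, 4, 5], head=4, tail=2))
print(preprocess([1, 2, 3], offset=1, factor=2))
```

Because the offset and factor act on each value independently, applying them before or after the head and tail selection gives the same result in this sketch.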

- Miscellaneous:
- -ifis={column | parameter | array},name[,name...], -ifnot={column | parameter | array},name[,name...] — These options allow conditional execution. If any entity that is named under an ifis option is not present, execution aborts. If any entity that is named under an ifnot option is present, execution aborts.
- -description=[text=string][,contents=string] — Specifies the description fields for the SDDS dataset. Use of this feature is discouraged, as these fields are not manipulated by any tools. Use of string parameters is suggested instead.
- -summarize — Specifies that a summary of the processing be printed to the screen.
- -verbose — Specifies that informational printouts be provided during processing.
- -noWarnings — Specifies suppression of warning messages.
- -delete={columns | parameters | arrays},matchingString[,...], -retain={columns | parameters | arrays},matchingString[,...] — These options specify wildcard strings to be used to select entities (i.e., columns, parameters, or arrays) that will respectively be deleted or retained (i.e., that will not or will appear in the output). The selection is performed by determining which input entities have names matching any of the strings. If retain is given but delete is not, only those entities matching one of the strings given with retain are retained. If both delete and retain are given, then all entities are retained except those that match a delete string without matching any of the retain strings.
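The interaction between -delete and -retain can be sketched in Python. This is an illustration of the rules stated above, with `fnmatchcase` standing in for the program's wildcard matching; the function `select_entities` and the sample names are hypothetical. The delete-only case is not spelled out in the text, so the sketch assumes that entities matching a delete string are simply removed.

```python
from fnmatch import fnmatchcase

def select_entities(names, delete=(), retain=()):
    """Return the names that would appear in the output."""
    def matches(name, patterns):
        return any(fnmatchcase(name, p) for p in patterns)

    if retain and not delete:
        # Only retain given: keep just the matching entities.
        return [n for n in names if matches(n, retain)]
    # Delete given: drop names matching a delete string,
    # unless they also match one of the retain strings.
    return [n for n in names if not matches(n, delete) or matches(n, retain)]

names = ["x", "xp", "y", "yp", "t"]
print(select_entities(names, retain=["x*"]))
print(select_entities(names, delete=["*p"]))
print(select_entities(names, delete=["*p"], retain=["xp"]))
```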

- author: M. Borland, H. Shang, R. Soliday ANL/APS.