Input File Formats

Sample Tracker accepts input from laboratory result files in various formats, some of which are based on the standard interchange format (SIF) concept. You may implement variations on the SIF layout, provided that the basic structure is adhered to. Currently, these implementations include fixed format, Comma Separated Values (CSV) and Extensible Markup Language (XML) files.

File Structure

A typical lab results file consists of two sections: a header section and a data section.

Header Section

The header section consists of a number of lines (records), containing general information relating to the sample receipt (such as the despatch identifier, lab job number, date and comments) as well as details of the analyses performed (i.e. analytes, method codes, units of measure and detection limits).

The requirements for this section are as follows:

Despatch number, lab job number, receipt date, comments, etc. (if present) must appear in the header section of the file
Element codes must appear in consecutive fields on a single record
Lab method codes (one for each element) must appear in consecutive fields on a single record
Units of measure must appear in consecutive fields on a single record
Lower detection limits must appear in consecutive fields on a single record
Upper detection limits must appear in consecutive fields on a single record

The rows containing element, method, units and lower detection limit are compulsory.

In Sample Tracker, analytical details are grouped together in collections known as "combos" (element/method combinations). Each combo consists of an element code, a lab method code, a units of measure code, a lower detection value and (optionally) an upper detection value. Two combos are considered to be identical if the element, lab method, units and lower detection limit are the same.

Data Section

This part of the file comprises a number of records containing the actual analytical results. Each row must contain at least the sample identifier, followed by the results (in the same order as the corresponding combos defined in the element header row). The data section can also contain additional fields, such as sample tag qualifiers.

Format Layouts

No further configuration is required if you are planning to use standard, fixed format SIF files to import analytical results. However, if necessary you may implement variations on the standard SIF layout by defining a new format.

The idea behind a format definition is to tell Sample Tracker where to locate important data in the input file. In other words, a format definition provides various points of reference to be used by Sample Tracker when it decodes the input records.

Customised format definitions are stored in the Sample Tracker database. Two database tables are used for this purpose:

ST_XS_FORMAT

This table contains a list of all your customised format definitions.

Column	Description
FORMAT_ID	An abbreviation used for this format.
FORMAT_SEQ	Display order (e.g. in format lists).
FORMAT_DESCRIPTION	A short description of the format.
FORMAT_TYPE	Either SIF, CSV or XML. The CSV option is used for files that contain the same basic data as a SIF file, but with the field values separated by commas (i.e. not in a fixed format). The XML option is used for XML files. 1
FILE_MASK	A filter used in file dialog boxes (e.g. *.SIF for SIF files).

Although it is possible to define XML content in other ways, the XML data definition is included in the Sample Tracker format tables for the sake of consistency. The XML implementation is described in more detail later. Note that the XML option is not available in version 7.1.

ST_XS_FORMAT_LAYOUT

This table stores column definitions for the fields that could be present in a customised SIF file format.

Column	Description
FORMAT_ID	Format abbreviation, as stored in ST_XS_FORMAT
FIELD_ID	The system name by which this field is known to Sample Tracker.
DESCR	Description of the field contents (used for display purposes in the receipt and receipt report).
SHEET_ID	In Version 7 and upwards, this field is used to store a default value for fields that do not appear in the input file. (See discussion regarding tag qualifiers).
FIELD_SEQ	Display order (e.g. in field lists).
FIELD_ROW	The row number, identifying the data record on which this field appears (where 1 is the first record in the file). A value of zero indicates that the field is not present in the input file. (See note under FIELD_COL.)
FIELD_COL	In a fixed format SIF, this value refers to the actual starting position for the field on the input record (with 1 indicating the first character). In a CSV format, this value refers to the field index in the record, where 1 is the first field, 2 is the second field, etc. (separated by commas). A value of zero indicates that the field is not present in the input file. However, if the value is zero and the value of FIELD_ROW is greater than zero, then a default value will be used for this field (as stored in SHEET_ID).
FIELD_LEN	In a fixed format SIF, this refers to the actual width of this field in the input record. (This value is not used for CSV files.)

The value in the FIELD_ID column of the format layout refers to the system name of a field. This is a predefined name by which the field is known to Sample Tracker. For example, Sample Tracker looks for the field "DESPATCH" when it needs to retrieve the value of the despatch identifier. Similarly, System fields usually have a specific meaning or function, therefore it is important that they be correctly identified.

The value of FIELD_ROW has particular importance in the case of the SAMPLEID field. This row is assumed to be the start of the data section (in other words, the first record in the file containing analytical results.) Consequently, the row values for the SAMPLEID and RESULTV fields should be the same. All rows preceding the data row are assumed to fall in the header section.

The following system field names are recognised by Sample Tracker:

Name	Description	Section
DESPATCH	Despatch identifier	Header
LABJOBNO	Lab job number	Header
DATERECV	Date of report (string in the format ddmmyy)	Header
PERSON	Person processing the receipt*	Header
DELIVERY	Delivery method*	Header
INVOICE	Invoice number*	Header
AMOUNT	Invoice amount*	Header
CURRENCY	Currency of Invoice	Header
COSTCODE	Cost code*	Header
ELEMENT	First element code	Header
UNITS	First units code	Header
DETECT	First lower detection value	Header
UDETECT	First upper detection value*	Header
METHOD	First method code	Header
COMMENTS	First comment	Header
SAMPLEID	First sample tag	Data
RESULTV	First result value	Data

These fields do not appear in a standard SIF file, but can be configured for customised SIF formats. It is possible to define addition fields to cater for special cases, such as sample tag qualifiers.

You do not need to define a format layout if you intend to use only standard SIF files. For Sample Tracker purposes, a Standard SIF File is one for which the format layout is as follows:

System Name	FIELD_ROW	FIELD_COLUMN	FIELD_WIDTH
DESPATCH	2	1	6
LABJOBNO	1	1	4
DATERECV	2	21	6
ELEMENT	2	27	8
UNITS	3	27	8
DETECT	4	27	8
METHOD	5	27	8
COMMENTS	6	3	80
SAMPLEID	8	1	16
RESULTV	8	27	8

Here is an example showing the top section of a standard SIF file, as well as the corresponding layout as it would appear in the ST_XS_FORMAT_LAYOUTtable:

Here is an example of a CSV implementation of the standard SIF format:

Extended format definitions

The above examples implement only the standard system fields. It is possible to extend the format layout for the header section to include some of the other fields recognised by Sample Tracker, e.g. upper detection limits.

It is also possible to extend the field set for the data section. However, the important difference here is that any fields defined in the data section are assumed to be related to the analytical results and will therefore be stored in the results table in the Sample Tracker database.

The data section is assumed to start at the row on which the first sample tag appears, therefore any field that has the same FIELD_ROW value as the SAMPLE_ID field will be treated as data.

One of the most popular extensions in the data section is the use of Sample Tag Qualifiers.