Skip to contents

Read a clinical-dataset specification into a validated artoo_spec, dispatching on the file extension: artoo's native JSON (the inverse of write_spec()), a Pinnacle 21 (P21) Excel workbook, or a native Define-XML 2.0/2.1 document. The returned spec is the lingua franca the rest of artoo applies and serialises.

Usage

read_spec(path, datasets = NULL, on_duplicate = c("error", "first", "warn"))

Arguments

path

The specification file to read. <character(1)>: required. A .json (native) or .xlsx / .xls (P21) file.

Requirement: reading a P21 workbook needs the readxl package.

datasets

Read only these datasets. <character> | NULL. NULL (default) reads the whole spec. Otherwise the spec is scoped to the named datasets before validation, so one broken sheet elsewhere in a workbook cannot block the dataset you are working on. An unknown name aborts listing what the file defines.

on_duplicate

Policy for a variable defined more than once. <character(1)>. A workbook row duplicated within one dataset makes the spec ambiguous; the finding is reported with its source location (sheet and row numbers for Excel). One of:

  • "error" (default) abort, naming each duplicate's rows.

  • "first" keep the first definition of each, dropping the rest with a message.

  • "warn" keep the first definition and warn (artoo_warning_spec).

Value

A validated artoo_spec. Inspect it with spec_datasets() / spec_variables(), check it with validate_spec(), or persist it with write_spec().

Details

Three formats, one validator. A .json file is read as artoo native JSON; a .xlsx / .xls file is read as a P21 workbook; a .xml file is read as Define-XML 2.x. Either way the result is built through artoo_spec(), so type canonicalisation and cross-slot integrity checks are identical regardless of source.

Define-XML ingestion (needs the xml2 package). ItemGroupDefs become datasets (keys derived from the ItemRef KeySequence), ItemRef + ItemDef pairs become variables, CodeLists become codelists (def:ExtendedValue = "Yes" marks an extended term), MethodDefs / CommentDefs / leaves become the supporting slots, and ValueListDefs land in the value-level slot with their where-clauses rendered as readable text.

Note: an ExternalCodeList (MedDRA, ISO-3166) names a dictionary, not an enumerable membership list; it is dropped, and variables that referenced it carry no codelist. Define-XML v1.0 (the 2005 model) is refused with guidance.

P21 ingestion. Sheets are located by a tolerant alias match (case-, space-, and spelling-variant insensitive). Datasets and Variables are required; Codelists and ValueLevel are optional (the latter becomes the spec's value-level slot). Every cell is read as text, then the dataset and codelist foreign keys are forward-filled to recover merged cells (which the Excel reader returns as NA on continuation rows). A key that cannot be resolved aborts with artoo_error_spec rather than being silently dropped.

See also

Inverse: write_spec() serialises a spec to native JSON.

Build / inspect: artoo_spec(), spec_datasets(), spec_variables(), validate_spec().

Examples

# ---- Example 1: round-trip a spec through native JSON ----
#
# write_spec() and read_spec() are inverses on the JSON path: the spec
# that comes back is identical to the one written.
spec <- artoo_spec(cdisc_sdtm_datasets, cdisc_sdtm_variables, codelists = cdisc_codelists)
path <- tempfile(fileext = ".json")
write_spec(spec, path)
back <- read_spec(path)
identical(back, spec)
#> [1] TRUE

# ---- Example 2: scope the read to one dataset ----
#
# `datasets =` reads just the domain you are working on — validation is
# scoped with it, so a problem elsewhere in the workbook cannot block
# this dataset.
dm_spec <- read_spec(path, datasets = "DM")
spec_datasets(dm_spec)
#> [1] "DM"
head(spec_variables(dm_spec, "DM")[, c("variable", "label", "data_type")])
#>   variable                             label data_type
#> 1  STUDYID                  Study Identifier    string
#> 2   DOMAIN               Domain Abbreviation    string
#> 3  USUBJID         Unique Subject Identifier    string
#> 4   SUBJID  Subject Identifier for the Study    string
#> 5  RFSTDTC Subject Reference Start Date/Time    string
#> 6  RFENDTC   Subject Reference End Date/Time    string