Build and validate an artoo_spec from dataset, variable, and codelist
tables. Each table is coerced to a plain data frame, missing optional
columns are filled with typed NAs, every variable type is canonicalised
to the CDISC dataType vocabulary, and cross-slot integrity (dataset and
codelist references) is checked before the object is returned. The spec
is the lingua franca the rest of artoo reads, applies, and serialises.
Usage
artoo_spec(
datasets = NULL,
variables = NULL,
codelists = NULL,
study = NULL,
values = NULL,
methods = NULL,
comments = NULL,
documents = NULL,
standard = NULL
)
Arguments
- datasets
Dataset-level metadata table.
<data.frame>: required. One row per dataset; must carry a dataset column. Optional columns label, class, structure, and keys are filled with NA when absent.
- variables
Variable-level metadata table.
<data.frame>: required. One row per variable; must carry dataset, variable, and data_type. The data_type column is canonicalised to a CDISC dataType (e.g. "text" becomes "string").
Requirement: every dataset value must appear in datasets.
- codelists
Controlled-terminology terms.
<data.frame> | NULL. Must carry codelist_id and term when supplied.
Interaction: every codelist_id referenced by variables must resolve here.
- study
Study-level metadata.
<data.frame> | NULL. A single row of named study fields. Well-known fields are canonicalised to study_name, study_description, and protocol_name (aliases such as StudyName or studyid resolve automatically); other fields pass through verbatim. A standard field, when present, is consumed into @standard.
- values
Value-level (VLM) metadata.
<data.frame> | NULL.
- methods
Derivation methods.
<data.frame> | NULL. The Define-XML method definitions that variables reference by method_id; must carry method_id when supplied. Completeness (e.g. that a referenced method has a description) is checked by validate_spec(), not here.
- comments
Comment definitions.
<data.frame> | NULL. Referenced by comment_id; must carry comment_id when supplied.
- documents
Document references.
<data.frame> | NULL. Referenced by document_id; must carry document_id when supplied.
- standard
The CDISC standard the spec implements.
<character(1)> | NULL. E.g. "ADaMIG 1.1" or "SDTMIG 3.2". When NULL (the default), it is resolved from datasets$standard or study$standard; if it is absent everywhere, @standard is NA.
Restriction: all sources must agree on one value; conflicting standards abort with artoo_error_spec.
Value
A validated artoo_spec object. Inspect it with
spec_datasets() / spec_variables(), or check it with
validate_spec().
Details
Coerce, then validate. Each table is first coerced to a plain data
frame (a tibble is accepted and demoted); known columns are cast to
their storage mode and absent optional columns are added as typed NA,
so every downstream reader can trust the schema. Validation runs only
after coercion, on the completed slots.
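A minimal base-R sketch of the coercion step, assuming the optional dataset columns listed under Arguments (label, class, structure, keys) all store character data. fill_optional_columns() and dataset_optional are hypothetical names for illustration, not artoo's internals. The point of the typed NA is that a reader can treat keys as character whether or not the source supplied it.

```r
# Hypothetical helper sketching "coerce, then validate" for one table.
# Not artoo's implementation; the schema below is an assumption.
fill_optional_columns <- function(tbl, optional) {
  # Demote a tibble (or any data-frame-like object) to a plain data frame.
  tbl <- as.data.frame(tbl)
  for (col in names(optional)) {
    proto <- optional[[col]]  # a typed NA, e.g. NA_character_
    if (col %in% names(tbl)) {
      # Present: cast the column to the declared storage mode.
      tbl[[col]] <- as.vector(tbl[[col]], mode = typeof(proto))
    } else {
      # Absent: add a column of typed NAs, one per row.
      tbl[[col]] <- rep(proto, nrow(tbl))
    }
  }
  tbl
}

# Assumed schema: every optional dataset column stores character data.
dataset_optional <- list(
  label     = NA_character_,
  class     = NA_character_,
  structure = NA_character_,
  keys      = NA_character_
)

ds <- data.frame(dataset = c("DM", "AE"),
                 label   = c("Demographics", "Adverse Events"))
ds <- fill_optional_columns(ds, dataset_optional)
str(ds)
```

Validation would then run on ds, which now has every column the schema names.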
Type canonicalisation. variables$data_type is mapped through the
closed CDISC dataType vocabulary (string, integer, decimal,
float, double, boolean, date, datetime, time, URI). Common
SAS / P21 spellings resolve automatically ("text", "Char",
"integer (8)", ...); an unrecognised token aborts with
artoo_error_type.
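The mapping can be sketched as a lookup against the closed vocabulary plus a small alias table. canonicalise_type() and type_aliases are hypothetical, and the aliases cover only the spellings quoted above plus a few obvious ones; artoo's own table is likely larger. The sketch raises a condition of class artoo_error_type with base R's errorCondition(), so a caller can catch it by class.

```r
# Closed CDISC dataType vocabulary, as listed above.
cdisc_types <- c("string", "integer", "decimal", "float", "double",
                 "boolean", "date", "datetime", "time", "URI")

# Hypothetical alias table (lower-cased source spelling -> CDISC dataType).
type_aliases <- c(text = "string", char = "string", character = "string",
                  int = "integer", bool = "boolean")

canonicalise_type <- function(x) {
  # Drop a trailing length such as " (8)", then normalise case and spacing.
  key <- tolower(trimws(sub("\\s*\\(\\s*\\d+\\s*\\)\\s*$", "", x, perl = TRUE)))
  # Try the vocabulary itself first (case-insensitively), then the aliases.
  out <- cdisc_types[match(key, tolower(cdisc_types))]
  use_alias <- is.na(out)
  out[use_alias] <- unname(type_aliases[key[use_alias]])
  # Anything still unmatched is outside the vocabulary: abort by class.
  unknown <- unique(x[is.na(out)])
  if (length(unknown) > 0) {
    stop(errorCondition(
      paste0("Unrecognised data type(s): ", paste(unknown, collapse = ", ")),
      class = "artoo_error_type"
    ))
  }
  out
}

canonicalise_type(c("text", "Char", "integer (8)", "Date", "URI"))
```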
Cross-slot integrity. Construction fails (artoo_error_spec) if a
variable names a dataset absent from datasets, or references a
codelist_id absent from codelists.
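Both reference checks reduce to set differences. The sketch below assumes variables carries an optional codelist_id column in which NA means "no codelist"; check_cross_slot() is a hypothetical helper, not artoo's code. It collects every problem before aborting once with a condition of class artoo_error_spec.

```r
# Hypothetical helper sketching the cross-slot checks; not artoo's code.
# Assumes `variables$codelist_id` is optional and NA means "no codelist".
check_cross_slot <- function(datasets, variables, codelists = NULL) {
  problems <- character()

  # Every dataset named by a variable must exist in `datasets`.
  missing_ds <- setdiff(unique(variables$dataset), datasets$dataset)
  if (length(missing_ds) > 0) {
    problems <- c(problems, paste0("unknown dataset(s): ",
                                   paste(missing_ds, collapse = ", ")))
  }

  # Every non-missing codelist_id must exist in `codelists`.
  if ("codelist_id" %in% names(variables)) {
    refs <- unique(variables$codelist_id[!is.na(variables$codelist_id)])
    known <- if (is.null(codelists)) character() else codelists$codelist_id
    missing_cl <- setdiff(refs, known)
    if (length(missing_cl) > 0) {
      problems <- c(problems, paste0("unknown codelist_id(s): ",
                                     paste(missing_cl, collapse = ", ")))
    }
  }

  if (length(problems) > 0) {
    stop(errorCondition(paste(problems, collapse = "; "),
                        class = "artoo_error_spec"))
  }
  invisible(TRUE)
}

dm <- data.frame(dataset = "DM")
dm_vars <- data.frame(dataset = c("DM", "DM"), variable = c("SEX", "AGE"),
                      codelist_id = c("CL.SEX", NA))
sex_cl <- data.frame(codelist_id = c("CL.SEX", "CL.SEX"), term = c("F", "M"))
check_cross_slot(dm, dm_vars, sex_cl)
```

Reporting all missing references in one condition saves a round trip per broken row when fixing a large workbook.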
One spec, one standard. An artoo_spec carries exactly one CDISC
standard, stored as the scalar @standard property. The constructor
resolves it from the standard argument, a standard column in
datasets (the P21 workbook shape), and a standard field in study
(the Define-XML shape); those columns are consumed, so @standard is
the single home. More than one distinct value aborts with
artoo_error_spec; scope the source to one standard (e.g.
read_spec(path, datasets = ...)) instead of mixing.
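The resolution rule amounts to collecting every non-missing candidate from the three sources and requiring at most one distinct value. resolve_standard() is a hypothetical sketch of that rule only; it does not show the standard columns being consumed afterwards.

```r
# Hypothetical sketch of resolving the single @standard value; not artoo's code.
resolve_standard <- function(standard = NULL, datasets = NULL, study = NULL) {
  # Collect candidates from all three sources; character() keeps the type
  # stable when every source is NULL.
  candidates <- c(
    character(),
    standard,
    if (!is.null(datasets) && "standard" %in% names(datasets))
      as.character(datasets$standard),
    if (!is.null(study) && "standard" %in% names(study))
      as.character(study$standard)
  )
  # Ignore missing and empty values, then de-duplicate.
  candidates <- unique(candidates[!is.na(candidates) & nzchar(candidates)])

  if (length(candidates) == 0) return(NA_character_)  # absent everywhere
  if (length(candidates) > 1) {
    stop(errorCondition(
      paste0("Conflicting standards: ", paste(candidates, collapse = ", ")),
      class = "artoo_error_spec"
    ))
  }
  candidates
}

resolve_standard(datasets = data.frame(dataset = c("DM", "AE"),
                                       standard = "SDTMIG 3.2"))
```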
One study vocabulary. Well-known study fields are canonicalised to
the CDISC ODM GlobalVariables names, snake_cased: study_name,
study_description, protocol_name. Source spellings resolve
automatically (StudyName, studyid, ...); fields the vocabulary does
not know pass through verbatim. Aliases that disagree on a value abort
with artoo_error_spec.
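A sketch of the alias step, assuming aliases are matched case-insensitively and that studyid maps to study_name (an assumption; the text above says only that it resolves automatically). study_aliases and canonicalise_study() are hypothetical. Fields outside the table keep their original names, and two aliases carrying different values abort with class artoo_error_spec.

```r
# Hypothetical alias table: lower-cased source spelling -> canonical name.
# Mapping studyid to study_name is an assumption made for this sketch.
study_aliases <- c(
  studyname = "study_name", study_name = "study_name", studyid = "study_name",
  studydescription = "study_description",
  study_description = "study_description",
  protocolname = "protocol_name", protocol_name = "protocol_name"
)

canonicalise_study <- function(study) {
  stopifnot(is.data.frame(study), nrow(study) == 1)
  fields <- as.list(study)
  key <- tolower(names(fields))

  # Known spellings get the canonical name; unknown fields keep theirs.
  target <- names(fields)
  known <- key %in% names(study_aliases)
  target[known] <- study_aliases[key[known]]

  out <- list()
  for (i in seq_along(fields)) {
    name <- target[[i]]
    value <- fields[[i]]
    # Two aliases for the same field must carry the same value.
    if (!is.null(out[[name]]) && !identical(out[[name]], value)) {
      stop(errorCondition(
        paste0("Aliases disagree on study field '", name, "'"),
        class = "artoo_error_spec"
      ))
    }
    out[[name]] <- value
  }
  # check.names = FALSE keeps verbatim field names such as "Sponsor Name".
  as.data.frame(out, check.names = FALSE)
}

canonicalise_study(data.frame(StudyName = "PILOT01", studyid = "PILOT01",
                              Sponsor = "Example Sponsor"))
```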
See also
Inspect: spec_datasets(), spec_variables(), spec_codelists(),
spec_keys(), spec_study().
Check: validate_spec(). Predicate: is_artoo_spec().
Examples
# ---- Example 1: build a spec from the bundled CDISC-pilot tables ----
#
# `cdisc_sdtm_datasets` and `cdisc_sdtm_variables` hold the CDISC pilot SDTM
# metadata in the shape artoo_spec() expects; the constructor
# canonicalises every type and checks cross-slot integrity.
spec <- artoo_spec(cdisc_sdtm_datasets, cdisc_sdtm_variables, codelists = cdisc_codelists)
spec_datasets(spec)
#> [1] "DM"
# ---- Example 2: a focused spec for a single dataset ----
#
# Slice the bundled tables to one dataset (DM) to build a smaller spec.
dm_ds <- cdisc_sdtm_datasets[cdisc_sdtm_datasets$dataset == "DM", ]
dm_var <- cdisc_sdtm_variables[cdisc_sdtm_variables$dataset == "DM", ]
dm_spec <- artoo_spec(dm_ds, dm_var, codelists = cdisc_codelists)
head(spec_variables(dm_spec, "DM")[, c("variable", "label", "data_type")])
#> variable label data_type
#> 1 STUDYID Study Identifier string
#> 2 DOMAIN Domain Abbreviation string
#> 3 USUBJID Unique Subject Identifier string
#> 4 SUBJID Subject Identifier for the Study string
#> 5 RFSTDTC Subject Reference Start Date/Time string
#> 6 RFENDTC Subject Reference End Date/Time string