Compare a data frame to one dataset's specification and report where they
diverge. This is the data-conformance check at the end of the artoo
workflow (spec -> apply_spec -> check_spec): it reuses the metadata the
spec already carries (variables, types, lengths, codelists, keys). It is
distinct from validate_spec(), which checks the spec's own internal
integrity rather than the data. Both report findings keyed to the same open
rule catalog.
Usage
check_spec(
x,
spec,
dataset,
decode = c("none", "to_decode", "to_code"),
checks = NULL
)Arguments
- x
The data frame to check.
<data.frame>: required. Typically the output ofapply_spec(), but any frame works.- spec
The specification to check against.
<artoo_spec>: required.- dataset
The dataset whose rules apply.
<character(1)>: required.Restriction: must name a dataset in
spec(seespec_datasets()).- decode
Which codelist column membership is checked against.
<character(1)>. One of"none"(default),"to_decode", or"to_code".- checks
Which conformance dimensions to evaluate.
<artoo_checks> | NULL. WhenNULL(default) every dimension runs; build a control withartoo_checks()to disable some.
Value
A findings data frame with columns check, dimension,
severity ("error", "warning", or "note"), dataset, variable,
and message, one row per divergence. Zero rows means the data conforms.
Details
Findings, not enforcement. check_spec() never modifies data; it
returns every divergence it finds. apply_spec() runs it and decides what
to do via its conformance argument (warn, abort, off). The dimensions
checked are: missing variables (split into mandatory, an error, and
permissible, a warning), extra variables (data column the spec does not
declare), type mismatch, ISO 8601 validity of character date/datetime/time
values (CDISC partials pass; "12NOV2019" does not), fractional values
and 32-bit overflow under an integer dataType (both would corrupt data
at coercion), character length overflow, the hard 200-byte XPORT v5 / FDA
character limit, codelist membership, label drift against the spec, key
uniqueness, and displayFormat validity.
Decode-aware membership. decode selects which codelist column the
data is checked against, matching apply_spec()'s decode step:
"none"/"to_code" check against the codelist terms, "to_decode"
against the decodes. apply_spec() threads its own decode through, so
a decoded column is not wrongly flagged.
Fatal vs informational coercion checks. Only integer_fraction and
integer_overflow carry error severity: they mark data an integer
dataType cannot hold without loss, which apply_spec() refuses to coerce
(its on_coercion_loss governs that gate). type_mismatch is a note: a
column stored more widely than the spec declares (an integer-valued
double, for instance) coerces cleanly, so it is informational, not a
blocker.
See also
apply_spec() which runs this; check_study() for the same
check across a whole study; artoo_checks() to select dimensions;
validate_spec() for spec integrity.
Examples
# ---- Example 1: a conformed frame surfaces only the genuine gaps ----
#
# apply_spec() coerces and orders to spec but never fabricates a variable
# the data lacks; checking the result reports the permissible variables
# this extract never derived (here, six) instead of hiding them as empty
# columns.
adsl <- apply_spec(cdisc_adsl, adam_spec, "ADSL", conformance = "off")
#> 6 variables the spec declares are absent from the data (not added):
#> `TRTDURD`, `DISONDT`, `EOSSTT`, `DCSREAS`, `EOSDISP`, and `MMS1TSBL`.
nrow(check_spec(adsl, adam_spec, "ADSL"))
#> [1] 12
# ---- Example 2: raw data surfaces divergences ----
#
# Checking a raw frame with an undeclared column flags the extras.
raw <- cdisc_adsl
raw$NOTASPEC <- 1
head(check_spec(raw, adam_spec, "ADSL")[, c("check", "variable", "severity")])
#> check variable severity
#> 1 missing_permissible TRTDURD warning
#> 2 missing_permissible DISONDT warning
#> 3 missing_permissible EOSSTT warning
#> 4 missing_permissible DCSREAS warning
#> 5 missing_permissible EOSDISP warning
#> 6 missing_permissible MMS1TSBL warning