Run the ordered, transactional artoo pipeline that turns a raw analysis
data frame into one conformed to its specification and carrying
artoo_meta. This is the middle of the workflow (spec -> apply_spec ->
read_/write_): the conformed frame is ready for any write_*() codec, and
the metadata it now carries makes that write lossless. The input is never
mutated; if any step aborts, the call leaves your data untouched.
Arguments
- x
The raw data frame to conform.
<data.frame>: required.- spec
The specification to conform to.
<artoo_spec>: required.- dataset
The dataset whose rules apply.
<character(1)>: required. Must name a dataset inspec.- conformance
What to do with conformance findings.
<character(1)>. One of:"warn"(default) runcheck_spec(), attach the findings (read them withconformance()), warn on any error-severity finding."abort"abort withartoo_error_conformanceon any error-severity finding."off"skip the check entirely.
Note: this governs only the findings disposition — what is reported. Pipeline errors are a different category and abort under every setting, including
"off": an unknown dataset, and lossy coercion (artoo_error_type). Lossy coercion has its own governed gate,on_coercion_loss, not this argument; when the spec is the problem, retype it withset_type().- na_position
Where missing key values sort.
<character(1)>. One of"first"(default) or"last"."first"matches SASPROC SORT(and the FDA submission convention) by ordering missings before present values;"last"matches R'sorder()and the pandas/Polars default. Both are lossless; pick the one your comparison target uses.- extra
What happens to undeclared columns.
<character(1)>. An "extra" is a column ofxthe spec does not declare — typically a derivation temporary. One of:"keep"(default) extras ride along after the declared columns, reported by theextra_variablefinding."drop"the returned frame carries exactly the spec's columns; the drop is announced (artoo_message_apply), and that message is the audit trail of what was removed.
Interaction: the drop runs before the check, so
conformance()reports only the columns the returned frame keeps. Underconformance = "abort"an error-severity finding still aborts (those findings arise only on spec-declared columns, which the drop never touches) and the input is never mutated, so the trim cannot mask a failure.Note:
"keep"is the default deliberately. artoo is a lossless carrier, so the metadata step never silently discards a column; extras are surfaced every run (theextra_variablefinding, and a warning underconformance = "warn"), making"drop"a conscious opt-in.- on_coercion_loss
What to do when coercion would lose data.
<character(1)>. The governed gate for anintegerdataType whose data truncates (fractions) or overflows (R's 32-bit range). One of:"error"(default) abort withartoo_error_typebefore any value is touched, refusing to damage the data."keep"skip coercion for the offending column, leaving it at its wider source type. The values are preserved and the mismatch is reported as aninteger_fraction/integer_overflowfinding (read withconformance()), never silently truncated.
Interaction: independent of
conformance."error"aborts even underconformance = "off"; under"keep"the finding it leaves is surfaced byconformance = "warn"(the default) and suppressed by"off".Tip:
"keep"is the iterate stance (preserve the data, flag the spec);"error"is the submission stance. To fix the spec itself, seeset_type()andrepair_spec().
Value
A conformed <data.frame> carrying artoo_meta (read it with
get_meta()) and, unless conformance = "off", the findings frame
conformance() reads back. Hand it to any write_*() codec.
Details
Ordered pipeline. Four fixed steps run in order: coerce each column
to its CDISC dataType, reorder columns to the spec, sort rows by the
dataset keys, then stamp the metadata. A spec variable the data lacks is
never fabricated as an empty column: artoo is a lossless carrier, not a
deriver. It is reported instead, an informational heads-up at apply time
plus a missing_variable finding (when mandatory) or missing_permissible
(when not), and left absent, so the conformed frame carries only the
columns the data actually had.
Extras are kept by default. A column the spec does not declare
survives the pipeline (ordered after the declared ones), is reported
by the extra_variable conformance finding, and round-trips through
every write_*() codec with metadata inferred from its R class —
membership reported, never enforced by silent destruction. Keeping is
the default because artoo is lossless by construction: a
metadata-application step that silently discarded columns would break
that contract, so trimming data is always an explicit, announced choice
rather than a default side effect.
extra = "drop" opts in to trim-to-spec (the returned frame carries
exactly the spec's columns): the undeclared columns are removed before
the check, so the findings describe exactly the returned frame (a dropped
column is never reported as extra_variable), and the drop itself is
always announced (artoo_message_apply) as the audit trail of what was
removed — even under conformance = "off".
Lossless or abort, your call. A coercion that would damage values —
an integer dataType truncating fractions or overflowing R's 32-bit
range — aborts with artoo_error_type before any value is touched, under
the default on_coercion_loss = "error". This gate is independent of
conformance: conformance = "off" does not bypass it. When the data
(not the spec) is right, set on_coercion_loss = "keep": the column
keeps its wider source type and the divergence is reported as an
integer_fraction / integer_overflow finding, never silently
truncated. When the spec is wrong, retype it with set_type() (or
repair_spec() from the findings). The error abort carries the offending
rows as data: cnd$variables is a data frame with columns
variable, data_type, n, and reason ("truncated" /
"overflowed"), so a pipeline can collect every mismatch in one
tryCatch(..., artoo_error_type = function(cnd) cnd$variables) pass.
The NA-introduction warning (artoo_warning_coercion) carries the same
frame with reason = "na_introduced", and a conformance = "abort"
failure carries the complete findings frame as cnd$findings.
Values are never translated. Coded variables keep their submission
values (SEX stays "M"); codelist translation is its own verb,
decode_column().
See also
Check: check_spec() for the findings; conformance() to read them
back.
Fix the spec: set_type() to retype a variable the data disagrees
with, repair_spec() to apply every integer fix from a findings frame.
Translate: decode_column() for codelist value mapping.
Metadata: get_meta() / set_meta() for what the stamp attaches.
Examples
# ---- Example 1: conform ADSL, then read its metadata ----
#
# The bundled adam_spec describes ADSL; the raw frame is coerced,
# ordered, sorted, and stamped with the CDISC metadata get_meta() reads
# back. Variables the spec declares but this extract never derived are
# reported (not added), readable via conformance().
adsl <- apply_spec(cdisc_adsl, adam_spec, "ADSL")
#> 6 variables the spec declares are absent from the data (not added):
#> `TRTDURD`, `DISONDT`, `EOSSTT`, `DCSREAS`, `EOSDISP`, and `MMS1TSBL`.
#> ℹ See `conformance(x)` for the findings.
get_meta(adsl)@dataset$records
#> [1] 60
# ---- Example 2: extras are kept and reported, or dropped on request ----
#
# By default a column outside the spec rides along (reported by the
# extra_variable finding) and still writes losslessly; extra = "drop"
# trims to the spec, announced and still reported. DM is SDTM, so it
# conforms against the bundled sdtm_spec.
raw <- cdisc_dm
raw$DERIVED <- seq_len(nrow(raw))
dm <- apply_spec(raw, sdtm_spec, "DM")
#> 1 variable the spec declares is absent from the data (not added):
#> `BRTHDTC`.
#> ℹ See `conformance(x)` for the findings.
findings <- conformance(dm)
findings[findings$check == "extra_variable", c("variable", "message")]
#> variable message
#> 2 RFXSTDTC Column 'RFXSTDTC' is not declared in the spec.
#> 3 RFXENDTC Column 'RFXENDTC' is not declared in the spec.
#> 4 RFICDTC Column 'RFICDTC' is not declared in the spec.
#> 5 RFPENDTC Column 'RFPENDTC' is not declared in the spec.
#> 6 DTHDTC Column 'DTHDTC' is not declared in the spec.
#> 7 DTHFL Column 'DTHFL' is not declared in the spec.
#> 8 ACTARMCD Column 'ACTARMCD' is not declared in the spec.
#> 9 ACTARM Column 'ACTARM' is not declared in the spec.
#> 10 DMDTC Column 'DMDTC' is not declared in the spec.
#> 11 DMDY Column 'DMDY' is not declared in the spec.
#> 12 DERIVED Column 'DERIVED' is not declared in the spec.
trimmed <- apply_spec(raw, sdtm_spec, "DM", extra = "drop")
#> 1 variable the spec declares is absent from the data (not added):
#> `BRTHDTC`.
#> ℹ See `conformance(x)` for the findings.
#> Dropped 11 undeclared variables: `RFXSTDTC`, `RFXENDTC`, `RFICDTC`,
#> `RFPENDTC`, `DTHDTC`, `DTHFL`, `ACTARMCD`, `ACTARM`, `DMDTC`, `DMDY`,
#> and `DERIVED`
"DERIVED" %in% names(trimmed)
#> [1] FALSE