
An artoo_spec is artoo’s single source of truth: the variables, CDISC data types, lengths, labels, controlled-terminology codelists, and sort keys for exactly one CDISC standard. Read one from the metadata you already have, inspect it as plain data frames, fix it in R when the data disagrees, and write it back. The spec is the contract every later step honors.

1. Read a spec

read_spec() ingests a specification from Define-XML 2.x, a Pinnacle 21 workbook, or artoo’s own native JSON, and returns an artoo_spec. The bundled ADaM spec also ships as a P21 workbook, so this runs as-is:

p21 <- system.file("extdata", "adam-spec.xlsx", package = "artoo")
spec <- read_spec(p21)
spec
<artoo_spec>
Study: CDISC-Sample
Standard: ADaMIG 1.1
Datasets:  2
Variables: 104
Codelists: 30
Methods: 54
Comments: 22
Documents: 9
Spec for: ADSL, ADAE

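A Define-XML file or workbook can cover more than you need (several standards, or datasets beyond the one you are working on) and can repeat definitions. Scope the read to the datasets you want with datasets, and choose which repeated definition to keep with on_duplicate: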

read_spec("define.xml", datasets = "ADSL", on_duplicate = "first")
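The effect of scoping plus a duplicate policy is easier to see on a toy table. The sketch below is plain base R, not artoo's implementation: defs is a made-up metadata table with a repeated AGE row, and we assume on_duplicate = "first" means the first definition wins.

```r
# Hypothetical variable-level metadata; ADSL.AGE is (deliberately) defined twice.
defs <- data.frame(
  dataset   = c("ADSL", "ADSL", "ADSL", "ADAE"),
  variable  = c("USUBJID", "AGE", "AGE", "AESEQ"),
  data_type = c("string", "integer", "float", "integer"),
  stringsAsFactors = FALSE
)

wanted <- "ADSL"                                  # like datasets = "ADSL"
scoped <- defs[defs$dataset %in% wanted, ]        # drop datasets we did not ask for

# Keep the first row for each dataset/variable pair (assumed "first" policy).
scoped <- scoped[!duplicated(scoped[, c("dataset", "variable")]), ]
scoped
```

Swapping duplicated(...) for duplicated(..., fromLast = TRUE) would keep the last definition instead.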

2. Inspect with the spec_* accessors

Each accessor returns a plain data frame (or character vector), so the spec slots straight into ordinary base R work: filtering, joining, summarising. The datasets a spec covers:

[1] "ADSL" "ADAE"

The variable table is the one you reach for most; here, four columns of it:

spec_variables(spec, "ADSL")[, c("variable", "label", "data_type", "length")] |>
  head()
  variable                            label data_type length
1  STUDYID                 Study Identifier    string     12
2  USUBJID        Unique Subject Identifier    string     11
3   SUBJID Subject Identifier for the Study    string      4
4   SITEID            Study Site Identifier    string      3
5  SITEGR1              Pooled Site Group 1    string      3
6      ARM       Description of Planned Arm    string     20
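Because this is an ordinary data frame, base R covers filtering, joining, and summarising directly. The sketch below rebuilds the six rows printed above by hand so it runs without artoo; the observed table of widths measured from data is hypothetical.

```r
# The six ADSL rows printed above, rebuilt by hand.
vars <- data.frame(
  variable  = c("STUDYID", "USUBJID", "SUBJID", "SITEID", "SITEGR1", "ARM"),
  data_type = rep("string", 6),
  length    = c(12, 11, 4, 3, 3, 20),
  stringsAsFactors = FALSE
)

# Filter: character variables declared longer than 10.
wide <- vars$variable[vars$data_type == "string" & vars$length > 10]

# Summarise: longest declared length per data type.
longest <- aggregate(length ~ data_type, data = vars, FUN = max)

# Join: compare declared lengths with (hypothetical) widths observed in data.
observed <- data.frame(variable = c("USUBJID", "ARM"), max_width = c(11, 24))
cmp <- merge(vars[, c("variable", "length")], observed, by = "variable")
too_long <- cmp$variable[cmp$max_width > cmp$length]   # values that would not fit
```

Here wide is STUDYID and USUBJID, and too_long flags ARM, whose made-up observed width of 24 exceeds its declared length of 20.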

The sort keys that apply_spec() will order by, and the controlled terminology a coded variable is bound to:

spec_keys(spec, "ADSL")
[1] "STUDYID" "USUBJID"
  codelist_id order  term decode extended comment_id
1   CL.AGEGR1    NA   <65   <NA>       NA       <NA>
2   CL.AGEGR1    NA 65-80   <NA>       NA       <NA>
3   CL.AGEGR1    NA   >80   <NA>       NA       <NA>
4  CL.AGEGR1N     1     1    <65       NA       <NA>
5  CL.AGEGR1N     2     2  65-80       NA       <NA>
6  CL.AGEGR1N     3     3    >80       NA       <NA>

spec_standard(), spec_study(), spec_methods(), spec_comments(), and spec_documents() expose the remaining slots the same way.
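The sort keys and codelist terms shown above are enough to order a dataset and screen it against controlled terminology in base R. This is a sketch of the idea, not how apply_spec() or artoo's checks work internally; the adsl records, including the out-of-codelist value "80+", are made up.

```r
# Hypothetical subject-level records, deliberately out of key order.
adsl <- data.frame(
  STUDYID = c("S1", "S1", "S1"),
  USUBJID = c("S1-003", "S1-001", "S1-002"),
  AGEGR1  = c("65-80", "<65", "80+"),
  stringsAsFactors = FALSE
)
keys <- c("STUDYID", "USUBJID")  # as printed by spec_keys(spec, "ADSL")

# Order by each key in turn; unname() keeps column names from being
# mistaken for order()'s own arguments.
sorted <- adsl[do.call(order, unname(as.list(adsl[keys]))), ]

# Flag values that are not terms of CL.AGEGR1 (terms copied from the output above).
agegr1_terms <- c("<65", "65-80", ">80")
bad <- sorted$AGEGR1[!sorted$AGEGR1 %in% agegr1_terms]
```

After sorting, USUBJID runs S1-001, S1-002, S1-003, and bad holds the single non-conforming value "80+".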

3. Fix it in place

When the data disagrees with the spec, fix the spec with a single call rather than reaching into its internals. set_type() retypes a variable; because the spec is immutable, it returns an updated copy, so assign the result back:

spec <- set_type(spec, "ADSL", AGE = "float")
v <- spec_variables(spec, "ADSL")
v$data_type[v$variable == "AGE"]
[1] "float"
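The copy semantics follow R's usual copy-on-modify rule. The sketch below mimics that calling pattern on a plain data frame; set_type_df() is a hypothetical helper, not part of artoo, and the variable table is invented.

```r
# Invented stand-in for a spec's ADSL variable table.
vars <- data.frame(variable  = c("AGE", "TRTDURD"),
                   data_type = c("integer", "integer"),
                   stringsAsFactors = FALSE)

# Hypothetical helper with set_type()'s shape: name = type pairs in `...`.
set_type_df <- function(vars, ...) {
  types <- c(...)
  unknown <- setdiff(names(types), vars$variable)
  if (length(unknown) > 0) {
    stop("unknown variable(s): ", paste(unknown, collapse = ", "))
  }
  vars$data_type[match(names(types), vars$variable)] <- unname(types)
  vars  # modifying the argument made a copy; the caller's object is untouched
}

updated <- set_type_df(vars, AGE = "float")
```

vars still says AGE is "integer" while updated says "float", which is why the article reassigns spec <- set_type(...).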

When check_spec() has already found integer type mismatches (its integer_fraction and integer_overflow findings), repair_spec() fixes all of them at once from the findings frame. Here there is nothing to repair, so the spec comes back unchanged:

findings <- check_spec(cdisc_adsl, spec, "ADSL")
spec <- repair_spec(spec, findings)
No "integer_fraction" or "integer_overflow" findings to repair.
ℹ The spec is returned unchanged.
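To make the integer_fraction finding concrete, here is a base R sketch of the kind of test it names: a variable declared integer whose data hold non-whole values, repaired by retyping it as float, as set_type() did for AGE above. find_integer_fraction(), dat, and declared are hypothetical; check_spec() and repair_spec() may apply different rules and return different columns.

```r
# Hypothetical data and declared types.
dat <- data.frame(AGE = c(63, 71.5, 80), TRTDURD = c(10, 20, 30))
declared <- c(AGE = "integer", TRTDURD = "integer")

# Hypothetical checker: integer-declared columns containing non-whole values.
find_integer_fraction <- function(data, declared) {
  ints <- intersect(names(declared)[declared == "integer"], names(data))
  has_fraction <- vapply(ints, function(v) {
    x <- data[[v]]
    any(!is.na(x) & x != trunc(x))
  }, logical(1))
  data.frame(variable = ints[has_fraction],
             issue    = rep("integer_fraction", sum(has_fraction)),
             stringsAsFactors = FALSE)
}

findings <- find_integer_fraction(dat, declared)

# Repair every finding in one pass by retyping to float.
repaired <- declared
repaired[findings$variable] <- "float"
```

With no findings, the final assignment indexes zero names and leaves repaired equal to declared, matching the "returned unchanged" behaviour above.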

4. Write it back

write_spec() is the inverse of read_spec() for each format: native JSON is fully lossless, while the P21 workbook is the interchange form. Round-trip a corrected spec through JSON:

out <- tempfile(fileext = ".json")
write_spec(spec, out)
identical(spec_standard(read_spec(out)), spec_standard(spec))
[1] TRUE

Because the two verbs are inverses, format conversion is one composition:

read_spec("define.xml") |> write_spec("spec.xlsx")
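The round-trip check in the JSON example compares only spec_standard(); a stricter test compares every slot. The sketch below shows that pattern on a plain named list, with saveRDS() and readRDS() standing in for write_spec() and read_spec(); the slots object is hypothetical.

```r
# Hypothetical spec-like object: a named list of the plain values the
# spec_* accessors return.
slots <- list(
  standard  = "ADaMIG 1.1",
  keys      = c("STUDYID", "USUBJID"),
  variables = data.frame(variable = c("STUDYID", "USUBJID"),
                         length   = c(12L, 11L))
)

path <- tempfile(fileext = ".rds")
saveRDS(slots, path)   # stand-in for write_spec()
back <- readRDS(path)  # stand-in for read_spec()

# Compare slot by slot so a failure names the slot that changed.
same <- vapply(names(slots), function(s) identical(slots[[s]], back[[s]]),
               logical(1))
all(same)
```

Keeping the per-slot result in same, rather than a single identical() call, tells you which part of the spec did not survive the round trip.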

Where to next