
Specifications: The Single Source of Truth
A herald_spec is the single source of truth for a
clinical submission. It drives every downstream operation: metadata
decoration, Define-XML generation, conformance validation, and
submission packaging. Build it once; use it everywhere.
The spec slots
A herald_spec holds up to eleven data frames, each
corresponding to a tab in a Pinnacle 21 specification workbook:
| Slot | What it holds | Required? |
|---|---|---|
| ds_spec | Dataset-level info: label, class, structure, keys | Yes |
| var_spec | Variable-level info: label, type, length, format, order | Yes |
| value_spec | Value-level metadata (Where/Comment fields) | No |
| codelist | Controlled terminology: code/decode pairs | No |
| study | Study-level metadata: protocol, sponsor | No |
| dictionaries | Medical dictionaries (MedDRA, WHODrug) | No |
| methods | Derivation methods for ADaM | No |
| comments | Reviewer guide comments | No |
| documents | Supplemental documents | No |
| arm_displays | ADaM Results Metadata display definitions | No |
| arm_results | ADaM Results Metadata analysis results | No |
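Because ds_spec and var_spec are both required and share a dataset column, it can be useful to confirm that they agree before building a spec. The following is a minimal base R sketch under stated assumptions: check_slot_consistency() is a hypothetical helper, not a herald function, and the two small data frames are made up for illustration.

```r
# Hypothetical helper (not part of herald): cross-check the two required slots.
check_slot_consistency <- function(ds_spec, var_spec) {
  # Datasets that have variable rows but no dataset-level row
  missing_ds <- setdiff(unique(var_spec$dataset), ds_spec$dataset)
  # Datasets declared at dataset level but with no variable rows
  empty_ds <- setdiff(ds_spec$dataset, unique(var_spec$dataset))
  list(missing_in_ds_spec = missing_ds, without_variables = empty_ds)
}

# Illustrative inputs only
ds_spec <- data.frame(
  dataset = c("DM", "AE"),
  label = c("Demographics", "Adverse Events"),
  stringsAsFactors = FALSE
)
var_spec <- data.frame(
  dataset = c("DM", "DM", "VS"),
  variable = c("STUDYID", "USUBJID", "VSTESTCD"),
  stringsAsFactors = FALSE
)

issues <- check_slot_consistency(ds_spec, var_spec)
issues$missing_in_ds_spec  # "VS": has variables but no ds_spec row
issues$without_variables   # "AE": declared but has no variables
```

Both vectors are empty when the two slots are consistent.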
Building a spec programmatically
For small studies or tests, build the spec inline. This is the most self-contained approach and works without any external files.
spec <- herald_spec(
  ds_spec = data.frame(
    dataset = c("DM", "AE"),
    label = c("Demographics", "Adverse Events"),
    keys = c("STUDYID, USUBJID", "STUDYID, USUBJID, AESEQ"),
    structure = c("One record per subject", "One record per subject per AE"),
    stringsAsFactors = FALSE
  ),
  var_spec = data.frame(
    dataset = c("DM", "DM", "DM", "DM", "AE", "AE", "AE", "AE", "AE"),
    variable = c("STUDYID", "USUBJID", "AGE", "SEX",
                 "STUDYID", "USUBJID", "AESEQ", "AETERM", "AESTDTC"),
    label = c("Study Identifier", "Unique Subject Identifier", "Age", "Sex",
              "Study Identifier", "Unique Subject Identifier",
              "Sequence Number of AE", "Reported Term for the Adverse Event",
              "Start Date/Time of AE"),
    data_type = c("text", "text", "integer", "text",
                  "text", "text", "integer", "text", "text"),
    length = c(12L, 11L, 8L, 1L, 12L, 11L, 8L, 200L, 19L),
    order = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L),
    stringsAsFactors = FALSE
  ),
  codelist = data.frame(
    codelist_id = c("SEX", "SEX", "RACE", "RACE", "RACE"),
    term = c("M", "F", "WHITE", "BLACK", "ASIAN"),
    decoded_value = c("Male", "Female", "White", "Black or African American", "Asian"),
    stringsAsFactors = FALSE
  )
)
spec
#>
#> ── herald_spec ──
#>
#> • Datasets: 2
#> • Variables: 9
#> • Codelists: 2
#> Datasets: "DM" and "AE"
The print() method gives a quick summary. Use
summary() for slot-level detail:
summary(spec)
#>
#> ── herald_spec summary ──
#>
#> study: 0 rows x 0 cols
#> ds_spec: 2 rows x 4 cols
#> var_spec: 9 rows x 6 cols
#> value_spec: NULL
#> codelist: 5 rows x 3 cols
#> dictionaries: NULL
#> methods: NULL
#> comments: NULL
#> documents: NULL
#> arm_displays: NULL
#> arm_results: NULL
Inspecting a spec
List datasets
spec_datasets(spec)
#> [1] "DM" "AE"
Variable metadata for one dataset
spec_vars(spec, "AE")
#> dataset variable label data_type length order
#> 1 AE STUDYID Study Identifier text 12 1
#> 2 AE USUBJID Unique Subject Identifier text 11 2
#> 3 AE AESEQ Sequence Number of AE integer 8 3
#> 4 AE AETERM Reported Term for the Adverse Event text 200 4
#> 5 AE AESTDTC Start Date/Time of AE text 19 5
Codelist entries
spec_codelist(spec, "SEX")
#> codelist_id term decoded_value
#> 1 SEX M Male
#> 2 SEX F Female
Study slot
# The study slot is empty in this example, so this returns NULL
spec_study(spec, "protocol")
#> NULL
Slot access: @ vs $
herald_spec is an S7 object. Both @ and
$ access slots, but they behave differently in the IDE:
spec$ds_spec # works, but no IDE autocomplete
#> dataset label keys structure
#> 1 DM Demographics STUDYID, USUBJID One record per subject
#> 2 AE Adverse Events STUDYID, USUBJID, AESEQ One record per subject per AE
spec@ds_spec # works AND triggers autocomplete — use this
#> dataset label keys structure
#> 1 DM Demographics STUDYID, USUBJID One record per subject
#> 2 AE Adverse Events STUDYID, USUBJID, AESEQ One record per subject per AE
spec$codelist
#> codelist_id term decoded_value
#> 1 SEX M Male
#> 2 SEX F Female
#> 3 RACE WHITE White
#> 4 RACE BLACK Black or African American
#> 5 RACE ASIAN Asian
spec@codelist
#> codelist_id term decoded_value
#> 1 SEX M Male
#> 2 SEX F Female
#> 3 RACE WHITE White
#> 4 RACE BLACK Black or African American
#> 5 RACE ASIAN Asian
Tip: Use @ for slot access in scripts and the console. IDEs (RStudio, Positron) autocomplete @ slot names; $ bypasses the autocomplete mechanism for S7 objects.
Reading specs from files
Pinnacle 21 Excel (real-world workflow)
# Reads all tabs automatically — ds_spec, var_spec, value_spec, codelist, etc.
spec <- read_spec("path/to/specification.xlsx")
read_spec() detects the file type from the extension:
.xlsx triggers the P21 Excel parser, .xml triggers the Define-XML
parser, and .json triggers the herald JSON parser.
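The extension-based dispatch described above can be pictured with a small base R sketch. This is not herald's implementation: detect_spec_format() and the format labels it returns are hypothetical, and treating the extension case-insensitively is an assumption made here for illustration.

```r
# Hypothetical helper (not herald's implementation): map a file extension
# to the kind of parser that read_spec() is described as selecting.
detect_spec_format <- function(path) {
  # Assumption: compare extensions case-insensitively
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    xlsx = "p21_excel",
    xml  = "define_xml",
    json = "herald_json",
    # Any other extension is rejected with an explicit error
    stop("Unsupported spec file extension: '", ext, "'", call. = FALSE)
  )
}

detect_spec_format("path/to/specification.xlsx")
detect_spec_format("define.XML")
```

Failing loudly on an unknown extension, rather than guessing a parser, keeps a mistyped path from being read with the wrong format.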
Define-XML round-trip
Generate Define-XML from a spec, then read it back:
if (requireNamespace("xml2", quietly = TRUE)) {
  xml_path <- tempfile(fileext = ".xml")
  write_define_xml(spec, xml_path, validate = FALSE)
  spec2 <- read_spec_define(xml_path)
  # Variable metadata is preserved
  nrow(spec2$var_spec)
  spec2$ds_spec$label
}
#> [1] "Demographics" "Adverse Events"
Herald JSON round-trip
JSON is ideal for version control — store your spec alongside your code.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  json_path <- tempfile(fileext = ".json")
  write_spec(spec, json_path)
  spec3 <- read_spec(json_path)
  identical(spec3$var_spec$variable, spec$var_spec$variable)
  identical(spec3$codelist$term, spec$codelist$term)
}
#> [1] TRUE
Validating the spec itself
validate_spec() checks the spec structure before you
touch any data. It runs the DD-prefix rules (Define-XML conformance
rules) against the spec.
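As an illustration of what a spec-level rule does, the sketch below flags variables with a missing label using base R only. It is not how herald implements its DD rules: check_missing_labels() is a hypothetical helper, the input data frame is made up, and the returned columns are a simplified stand-in for a findings table.

```r
# Hypothetical rule check (not herald's implementation): flag variables
# whose label is missing or blank, returning one finding row per problem.
check_missing_labels <- function(var_spec) {
  bad <- is.na(var_spec$label) | trimws(var_spec$label) == ""
  data.frame(
    dataset = var_spec$dataset[bad],
    variable = var_spec$variable[bad],
    message = rep("Variable label is missing.", sum(bad)),
    stringsAsFactors = FALSE
  )
}

# Illustrative input: AGE has no label, SEX does
var_spec <- data.frame(
  dataset = c("DM", "DM"),
  variable = c("AGE", "SEX"),
  label = c(NA_character_, "Sex"),
  stringsAsFactors = FALSE
)

findings <- check_missing_labels(var_spec)
findings
nrow(findings)  # one finding, for AGE in DM
```

A clean spec yields a zero-row data frame, which makes the result easy to test with nrow().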
# Introduce a deliberate error: variable with no label
bad_spec <- herald_spec(
  ds_spec = data.frame(dataset = "DM", label = "Demographics",
                       stringsAsFactors = FALSE),
  var_spec = data.frame(
    dataset = "DM", variable = "AGE", label = NA_character_,
    data_type = "integer", length = 8L,
    stringsAsFactors = FALSE
  )
)
result <- validate_spec(bad_spec)
result
#>
#> ── herald validation ──
#>
#> Datasets checked: 3
#> ℹ Spec checks only -- no conformance rules evaluated
#> Findings: 0 reject, 6 high, 0 medium, 0 low
#>
#> ── Reject / High Impact
#> ✖ [DD0006] [High] datasets.dataset: Dataset name is missing. Each row in the Datasets sheet must have a dataset name.
#> ✖ [DD0006] [High] variables.dataset: Dataset name is missing. Each row in the Datasets sheet must have a dataset name.
#> ✖ [DD0007] [High] datasets.label: Dataset label is missing. Description is required for all ItemGroupDef in regulatory submissions.
#> ✖ [DD0021] [High] variables.variable: Variable name is missing. Each row in the Variables sheet must have a variable name.
#> ✖ [DD0022] [High] datasets.label: Variable label is missing. Description is required for all ItemDef corresponding to Variable definitions in regulatory submissions.
#> ✖ [DD0028] [High] variables.data_type: Text variable length exceeds 200 characters. SAS v5 Transport files restrict variable lengths to 200 characters.
result$findings
#> rule_id impact dataset variable row value expected
#> 1 DD0006 High datasets dataset 1 DM <NA>
#> 2 DD0006 High variables dataset 1 DM <NA>
#> 3 DD0007 High datasets label 1 Demographics <NA>
#> 4 DD0021 High variables variable 1 AGE <NA>
#> 5 DD0022 High datasets label 1 Demographics <NA>
#> 6 DD0028 High variables data_type 1 integer <NA>
#> message
#> 1 Dataset name is missing. Each row in the Datasets sheet must have a dataset name.
#> 2 Dataset name is missing. Each row in the Datasets sheet must have a dataset name.
#> 3 Dataset label is missing. Description is required for all ItemGroupDef in regulatory submissions.
#> 4 Variable name is missing. Each row in the Variables sheet must have a variable name.
#> 5 Variable label is missing. Description is required for all ItemDef corresponding to Variable definitions in regulatory submissions.
#> 6 Text variable length exceeds 200 characters. SAS v5 Transport files restrict variable lengths to 200 characters.
Before vs After
| Task | metacore | herald |
|---|---|---|
| Create spec object | metacore::metacore(ds_spec, var_spec, value_spec, ...): 6+ separate data frames with strict R6 class requirements | herald_spec(ds_spec, var_spec, ...): plain data frames, no R6 ceremony |
| Read P21 Excel | metacore::spec_to_metacore("spec.xlsx") | read_spec("spec.xlsx") |
| Access variable labels | metacore$var_spec %>% filter(dataset == "DM") %>% pull(label) | spec_vars(spec, "DM")$label |
| Check codelist | metacore$codelist %>% filter(code_id == "SEX") | spec_codelist(spec, "SEX") |
| Write to JSON | (not available) | write_spec(spec, "spec.json") |
| Validate spec | (not available) | validate_spec(spec) |
What to read next
- vignette("herald"): 5-minute end-to-end workflow
- vignette("metadata-helpers"): apply_spec() and individual operations
- vignette("define-xml"): Define-XML 2.1 generation in depth
- vignette("validation"): dataset conformance checking