
herald is a single R package for FDA, PMDA, and EMA regulatory submissions. It replaces metacore + xportr + Pinnacle 21 with one coherent tool: read spec → apply metadata → write XPT or Dataset-JSON → generate Define-XML → validate → package. Pure R, no SAS, no Java, no compiled code.

What herald replaces

Task                          Old way                          herald
Spec object                   metacore::metacore()             herald_spec()
Set variable labels           xportr::xportr_label()           apply_spec() or set_label()
Coerce types to spec          xportr::xportr_type()            apply_spec() or coerce_types()
Write XPT                     xportr::xportr_write()           write_xpt()
Generate Define-XML           Pinnacle 21 Enterprise (GUI)     write_define_xml()
Validate against FDA rules    Pinnacle 21 Community (Java)     validate()
Package submission            Manual SOP, 5+ packages          submit()
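The granular helpers in the right-hand column boil down to ordinary type and attribute manipulation. Here is a minimal base-R sketch of spec-driven type coercion in the spirit of coerce_types(); only the helper names in the table come from the package, and the coerce_to_spec() function below is purely illustrative:

```r
# A toy variable spec, shaped like the var_spec used later in this vignette.
var_spec <- data.frame(
  variable  = c("USUBJID", "AGE"),
  data_type = c("text", "integer"),
  stringsAsFactors = FALSE
)

# Coerce each column of `df` to the type named in the spec.
coerce_to_spec <- function(df, var_spec) {
  for (i in seq_len(nrow(var_spec))) {
    v <- var_spec$variable[i]
    df[[v]] <- switch(var_spec$data_type[i],
      text    = as.character(df[[v]]),
      integer = as.integer(df[[v]]),
      df[[v]]  # leave unknown types untouched
    )
  }
  df
}

raw <- data.frame(USUBJID = 1015, AGE = "63", stringsAsFactors = FALSE)
out <- coerce_to_spec(raw, var_spec)
```

After the call, USUBJID is character and AGE is integer, regardless of how the raw data arrived.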

Installation

# From GitHub (pre-CRAN)
pak::pak("vthanik/herald")

The 5-minute workflow

Build a small SDTM DM dataset and take it all the way from raw data to a validated, FDA-compliant XPT in five steps.

Step 1 — Define your spec

A herald_spec is the single source of truth for your submission. It holds dataset definitions, variable metadata, codelists, and study information.

spec <- herald_spec(
  ds_spec = data.frame(
    dataset = "DM",
    label   = "Demographics",
    keys    = "STUDYID, USUBJID",
    stringsAsFactors = FALSE
  ),
  var_spec = data.frame(
    dataset   = c("DM", "DM", "DM", "DM", "DM"),
    variable  = c("STUDYID", "USUBJID", "AGE", "SEX", "RACE"),
    label     = c("Study Identifier", "Unique Subject Identifier",
                  "Age", "Sex", "Race"),
    data_type = c("text", "text", "integer", "text", "text"),
    length    = c(12L, 11L, 8L, 1L, 200L),
    order     = c(1L, 2L, 3L, 4L, 5L),
    stringsAsFactors = FALSE
  )
)

spec
#> 
#> ── herald_spec ──
#> 
#> • Dataset: 1
#> • Variables: 5
#> Datasets: "DM"

Step 2 — Build your data

dm <- data.frame(
  STUDYID = rep("CDISCPILOT01", 5L),
  USUBJID = c("01-701-1015", "01-701-1023", "01-701-1028",
              "01-701-1033", "01-701-1034"),
  AGE     = c(63L, 64L, 71L, 74L, 77L),
  SEX     = c("F", "M", "M", "F", "F"),
  RACE    = rep("WHITE", 5L),
  stringsAsFactors = FALSE
)

Step 3 — Apply the spec

apply_spec() does six things in one call: scaffold missing variables, drop unspecified columns, coerce types, set all labels and formats, order columns per spec position, and sort rows by key variables.

dm <- suppressMessages(apply_spec(dm, spec, "DM"))

# Labels are now set on every column
attr(dm$AGE, "label")
#> [1] "Age"
attr(dm,     "label")       # dataset label
#> [1] "Demographics"
attr(dm,     "herald.sort_keys")
#> [1] "STUDYID" "USUBJID"
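Under the hood these are plain R attributes, which is why downstream tools can read them back without any hidden state. A base-R sketch of the same bookkeeping (the attribute names match the output above; the two-column data frame is just an illustration):

```r
dm <- data.frame(
  USUBJID = c("01-701-1015", "01-701-1023"),
  AGE     = c(63L, 64L),
  stringsAsFactors = FALSE
)

# What apply_spec() effectively records: a variable label, a dataset
# label, and the sort keys, all as ordinary attributes.
attr(dm$AGE, "label")            <- "Age"
attr(dm, "label")                <- "Demographics"
attr(dm, "herald.sort_keys")     <- c("STUDYID", "USUBJID")

attr(dm$AGE, "label")
#> [1] "Age"
```

Because nothing lives outside the data frame itself, the object can be saved, passed between functions, or inspected with str() like any other R value.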

Step 4 — Write XPT

write_xpt() reads the attributes set by apply_spec() and writes a valid SAS V5 transport file. It returns the data frame invisibly.

tmp_dir  <- file.path(tempdir(), "herald_vignette_gs")
dir.create(tmp_dir, showWarnings = FALSE)
xpt_path <- file.path(tmp_dir, "dm.xpt")

write_xpt(dm, xpt_path)

# Verify round-trip
dm2 <- read_xpt(xpt_path)
attr(dm2$AGE,  "label")    # variable label after round-trip
#> NULL
attr(dm2,      "label")    # "Demographics"
#> [1] "Demographics"
nrow(dm2)
#> [1] 5

Step 5 — Validate

validate() runs two passes: built-in spec conformance checks, then optional CDISC CORE / FDA / PMDA rule sets.
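The first pass, spec conformance, amounts to comparing the data against the spec's metadata. A hand-rolled base-R version of one such check, specified variables that are missing from the data, gives the flavor (herald's actual checks and finding severities are not shown here):

```r
var_spec <- data.frame(
  dataset  = "DM",
  variable = c("STUDYID", "USUBJID", "AGE", "SEX", "RACE"),
  stringsAsFactors = FALSE
)

# Data that forgot RACE.
dm <- data.frame(
  STUDYID = "CDISCPILOT01",
  USUBJID = "01-701-1015",
  AGE     = 63L,
  SEX     = "F",
  stringsAsFactors = FALSE
)

# One conformance check: every specified variable must exist in the data.
missing_vars <- setdiff(var_spec$variable, names(dm))
missing_vars
#> [1] "RACE"
```

A real rule engine layers many such checks, attaches a severity to each finding, and aggregates them into the summary shown below.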

result <- validate(tmp_dir, spec = spec, rules = NULL)
result
#> 
#> ── herald validation ──
#> 
#> Datasets checked: 1
#>  Spec checks only -- no conformance rules evaluated
#> Findings: 0 reject, 0 high, 5 medium, 0 low
result$summary
#> $reject
#> [1] 0
#> 
#> $high
#> [1] 0
#> 
#> $medium
#> [1] 5
#> 
#> $low
#> [1] 0
#> 
#> $total
#> [1] 5

No reject or high severity findings, so nothing here would block a submission; the five medium findings from the spec-only pass are still worth reviewing before filing.

The herald layer cake

herald is organized into six layers, each building on the one below:

Layer 5  submit(path)         — packaging: manifest, reports, define
Layer 4  validate(path)       — conformance: spec + FDA/PMDA/CDISC rules
Layer 3  write_define_xml()   — Define-XML 2.1 generation
Layer 2  apply_spec()         — metadata: labels, types, order, sort
Layer 1  herald_spec()        — spec: datasets, variables, codelists
Layer 0  read_xpt / write_xpt — XPT and Dataset-JSON I/O

You can enter at any layer. If you have a spec, apply_spec() handles all of Layer 2. If you just need XPT I/O, use Layer 0 directly.