
A Define-XML file is required for FDA and PMDA submissions that include standardized clinical datasets, and 2.1 is the current CDISC version of the standard. It describes dataset structure, variable metadata, codelists, derivation methods, and (through the Analysis Results Metadata extension) analysis results in a machine-readable XML document built on CDISC ODM 1.3. herald generates Define-XML 2.1 from a herald_spec with a single function call.

Building a multi-dataset spec

library(herald)

# TRUE when xml2 is not installed; the "Peek at the XML" code needs xml2
skip_xml <- !requireNamespace("xml2", quietly = TRUE)

spec <- herald_spec(
  ds_spec = data.frame(
    dataset   = c("DM", "AE"),
    label     = c("Demographics", "Adverse Events"),
    class     = c("SPECIAL PURPOSE", "EVENTS"),
    structure = c("One record per subject",
                  "One record per subject per adverse event"),
    keys      = c("STUDYID, USUBJID", "STUDYID, USUBJID, AESEQ"),
    stringsAsFactors = FALSE
  ),
  var_spec = data.frame(
    dataset   = c("DM","DM","DM","DM", "AE","AE","AE","AE"),
    variable  = c("STUDYID","USUBJID","AGE","SEX",
                  "STUDYID","USUBJID","AESEQ","AETERM"),
    label     = c("Study Identifier","Unique Subject Identifier","Age","Sex",
                  "Study Identifier","Unique Subject Identifier",
                  "Sequence Number of AE","Reported Term for the Adverse Event"),
    data_type = c("text","text","integer","text",
                  "text","text","integer","text"),
    length    = c(12L,11L,8L,1L, 12L,11L,8L,200L),
    order     = c(1L,2L,3L,4L, 1L,2L,3L,4L),
    mandatory = c("Yes","Yes","No","No", "Yes","Yes","Yes","Yes"),
    origin    = c("Assigned","Assigned","CRF","CRF",
                  "Assigned","Assigned","Derived","CRF"),
    stringsAsFactors = FALSE
  ),
  codelist = data.frame(
    codelist_id   = c("SEX","SEX"),
    term          = c("M","F"),
    decoded_value = c("Male","Female"),
    stringsAsFactors = FALSE
  )
)
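Because each dataset's keys are stored as a comma-separated string in ds_spec while the variables live in var_spec, it is easy for the two to drift apart. The sketch below cross-checks them before anything is written. missing_keys() is a hypothetical helper, not part of herald, and the demo deliberately lists AEDECOD as an AE key without defining it so that the check has something to report.

```r
# Hypothetical helper (not part of herald): for each dataset, list key
# variables named in ds_spec$keys that have no matching row in var_spec.
missing_keys <- function(ds_spec, var_spec) {
  out <- list()
  for (i in seq_len(nrow(ds_spec))) {
    key_str <- ds_spec$keys[i]
    # Skip datasets with no keys recorded (column absent, NA, or empty)
    if (is.null(key_str) || is.na(key_str) || !nzchar(key_str)) next
    ds   <- ds_spec$dataset[i]
    keys <- trimws(strsplit(key_str, ",", fixed = TRUE)[[1]])
    gone <- setdiff(keys, var_spec$variable[var_spec$dataset == ds])
    if (length(gone) > 0) out[[ds]] <- gone
  }
  out
}

# Demo data: AEDECOD is listed as a key but never defined in var_demo
ds_demo <- data.frame(
  dataset = c("DM", "AE"),
  keys    = c("STUDYID, USUBJID", "STUDYID, USUBJID, AESEQ, AEDECOD"),
  stringsAsFactors = FALSE
)
var_demo <- data.frame(
  dataset  = c("DM", "DM", "AE", "AE", "AE"),
  variable = c("STUDYID", "USUBJID", "STUDYID", "USUBJID", "AESEQ"),
  stringsAsFactors = FALSE
)

missing_keys(ds_demo, var_demo)
#> $AE
#> [1] "AEDECOD"
```

Assuming herald keeps the keys column on spec$ds_spec, missing_keys(spec$ds_spec, spec$var_spec) on the spec above should return an empty list, since every key it names is defined.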

Generating Define-XML

xml_path <- tempfile(fileext = ".xml")

# validate = FALSE skips validation at write time; validate_spec_define() is shown below
write_define_xml(spec, xml_path, validate = FALSE)

# Confirm the file was written
file.exists(xml_path)
#> [1] TRUE
file.info(xml_path)$size
#> [1] 4642

The generated file is ODM 1.3 XML with Define-XML 2.1 namespace extensions. It includes:

  • an <ODM> root containing <Study> and <MetaDataVersion>
  • an <ItemGroupDef> for each dataset (with SASDatasetName, label, and key variables, which Define-XML expresses as KeySequence on each <ItemRef>)
  • an <ItemDef> for each variable (with Name, label, DataType, Length, and origin)
  • a <CodeList> for each controlled terminology codelist
  • <def:leaf> elements whose xlink:href points to the data files
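In Define-XML, key variables are not a single attribute on the dataset: each key variable's <ItemRef> inside the <ItemGroupDef> carries a KeySequence number giving its position in the key. The sketch below shows that mapping for the AE keys above using plain sprintf(). item_refs() is a hypothetical illustration, not herald's writer; the IT.<dataset>.<variable> OID pattern is borrowed from the row names read_spec_define() returns later in this article, and herald's real output may carry additional attributes.

```r
# Illustrative only: build the <ItemRef> lines of an ItemGroupDef, adding
# KeySequence for variables that appear in the comma-separated key string.
item_refs <- function(dataset, variables, order, mandatory, keys) {
  key_vars <- trimws(strsplit(keys, ",", fixed = TRUE)[[1]])
  key_seq  <- match(variables, key_vars)          # NA for non-key variables
  key_attr <- ifelse(is.na(key_seq), "",
                     sprintf(' KeySequence="%d"', key_seq))
  sprintf('<ItemRef ItemOID="IT.%s.%s" OrderNumber="%d" Mandatory="%s"%s/>',
          dataset, variables, order, mandatory, key_attr)
}

refs <- item_refs("AE",
                  variables = c("STUDYID", "USUBJID", "AESEQ", "AETERM"),
                  order     = 1:4,
                  mandatory = c("Yes", "Yes", "Yes", "Yes"),
                  keys      = "STUDYID, USUBJID, AESEQ")
cat(refs, sep = "\n")
#> <ItemRef ItemOID="IT.AE.STUDYID" OrderNumber="1" Mandatory="Yes" KeySequence="1"/>
#> <ItemRef ItemOID="IT.AE.USUBJID" OrderNumber="2" Mandatory="Yes" KeySequence="2"/>
#> <ItemRef ItemOID="IT.AE.AESEQ" OrderNumber="3" Mandatory="Yes" KeySequence="3"/>
#> <ItemRef ItemOID="IT.AE.AETERM" OrderNumber="4" Mandatory="Yes"/>
```

The order of names in the keys string is what determines KeySequence, so "STUDYID, USUBJID, AESEQ" and "USUBJID, STUDYID, AESEQ" describe different keys.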

Peek at the XML

doc  <- xml2::read_xml(xml_path)
# Namespace prefixes: Define-XML 2.1 (d) and ODM 1.3 (o)
ns   <- c(d = "http://www.cdisc.org/ns/def/v2.1",
          o = "http://www.cdisc.org/ns/odm/v1.3")
# Dataset names from ItemGroupDef elements
igds <- xml2::xml_find_all(doc, ".//o:ItemGroupDef", ns = ns)
xml2::xml_attr(igds, "Name")
#> [1] "DM" "AE"

Rendering to HTML

write_define_html() produces a self-contained HTML document in the same format reviewers see in the CDISC Define Viewer. It needs no external stylesheet or other dependencies.

html_path <- tempfile(fileext = ".html")

write_define_html(spec, html_path)
file.exists(html_path)
#> [1] TRUE
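One way to spot-check the "no external stylesheet" claim on any HTML file is to scan it for <link rel="stylesheet"> and <script src=...> tags. external_refs() below is a hypothetical helper using a rough regular-expression heuristic rather than a real HTML parser; the demo writes a tiny inline-styled file so the block runs on its own.

```r
# Hypothetical helper (not part of herald): return any tags that pull in an
# external stylesheet or script. Regex heuristic only, not an HTML parser.
external_refs <- function(path) {
  html    <- paste(readLines(path, warn = FALSE), collapse = "\n")
  pattern <- "<link[^>]*rel=[\"']?stylesheet[^>]*>|<script[^>]*src=[^>]*>"
  regmatches(html, gregexpr(pattern, html, ignore.case = TRUE))[[1]]
}

# Demo file with only inline CSS
demo <- tempfile(fileext = ".html")
writeLines(c("<html><head><style>td { padding: 2px; }</style></head>",
             "<body><table><tr><td>DM</td></tr></table></body></html>"), demo)
external_refs(demo)
#> character(0)
```

Calling external_refs(html_path) on the file written above should likewise return character(0) if the output is fully self-contained.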

Reading Define-XML back

read_spec_define() parses an existing Define-XML 2.1 file back into a herald_spec. Use it to migrate metadata from existing submissions or to verify that write_define_xml() produced what you expect.

spec2 <- read_spec_define(xml_path)

spec2
#> 
#> ── herald_spec ──
#> 
#> Study: "UNKNOWN"
#> • Datasets: 2
#> • Variables: 8
#> • Codelist: 1
#> Datasets: "DM" and "AE"
spec2$ds_spec[, c("dataset", "label")]
#>   dataset          label
#> 1      DM   Demographics
#> 2      AE Adverse Events
spec2$var_spec[spec2$var_spec$dataset == "AE",
               c("variable", "label", "data_type")]
#>               variable                               label data_type
#> IT.AE.STUDYID  STUDYID                    Study Identifier      text
#> IT.AE.USUBJID  USUBJID           Unique Subject Identifier      text
#> IT.AE.AESEQ      AESEQ               Sequence Number of AE   integer
#> IT.AE.AETERM    AETERM Reported Term for the Adverse Event      text

What survives the round-trip

Field                    Preserved?
Dataset labels           Yes
Variable labels          Yes
Data types               Yes
Lengths                  Yes
Origin                   Yes
Mandatory                Yes
Key variables            Yes
Codelists                Yes
Methods (derivations)    Yes, if present in the written spec
ARM displays/results     Yes, if present in the written spec
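To check these claims on your own study, compare var_spec before and after the round-trip. roundtrip_diff() is a hypothetical helper that joins on dataset and variable, so the IT.* row names on the read-back copy do not matter; it assumes both data frames share the columns you ask about. The demo frames are made up, with one length deliberately changed.

```r
# Hypothetical helper (not part of herald): report rows whose values differ
# between two var_spec-like data frames, per column, joined on dataset + variable.
roundtrip_diff <- function(before, after, cols) {
  stopifnot(all(cols %in% names(before)), all(cols %in% names(after)))
  m <- merge(before, after, by = c("dataset", "variable"),
             suffixes = c(".before", ".after"), all = TRUE)
  diffs <- lapply(cols, function(col) {
    b <- as.character(m[[paste0(col, ".before")]])
    a <- as.character(m[[paste0(col, ".after")]])
    # A value differs if only one side is missing, or both exist and disagree
    differs <- is.na(a) != is.na(b) | (!is.na(a) & !is.na(b) & a != b)
    if (any(differs)) m[differs, c("dataset", "variable")] else NULL
  })
  names(diffs) <- cols
  Filter(Negate(is.null), diffs)   # keep only columns with differences
}

before <- data.frame(dataset = c("AE", "AE"), variable = c("AESEQ", "AETERM"),
                     label = c("Sequence Number of AE",
                               "Reported Term for the Adverse Event"),
                     length = c(8L, 200L), stringsAsFactors = FALSE)
after  <- data.frame(dataset = c("AE", "AE"), variable = c("AESEQ", "AETERM"),
                     label = c("Sequence Number of AE",
                               "Reported Term for the Adverse Event"),
                     length = c(8L, 100L), stringsAsFactors = FALSE,
                     row.names = c("IT.AE.AESEQ", "IT.AE.AETERM"))

roundtrip_diff(before, after, cols = c("label", "length"))
```

Here only the length column is reported, flagging AE.AETERM. With the objects above, something like roundtrip_diff(spec$var_spec, spec2$var_spec, c("label", "data_type", "length")) should return an empty list if the round-trip is lossless, assuming spec$var_spec is accessible the same way spec2$var_spec is.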

Validating the spec against Define-XML rules

validate_spec_define() checks a herald_spec against Define-XML conformance rules (rule IDs prefixed DD). It catches issues such as missing required metadata before you submit; result$summary counts the findings by severity.

result <- validate_spec_define(spec)
result$summary
#> $reject
#> [1] 0
#> 
#> $high
#> [1] 61
#> 
#> $medium
#> [1] 0
#> 
#> $low
#> [1] 0
#> 
#> $total
#> [1] 61
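A summary shaped like result$summary above is easy to turn into a pass/fail gate, for example in a CI job that should stop before a define.xml with blocking findings is published. define_gate() is a hypothetical helper, and treating reject and high as blocking is an assumption to adjust to your own process.

```r
# Hypothetical helper (not part of herald): decide pass/fail from a list of
# severity counts such as list(reject = 0, high = 61, medium = 0, low = 0).
define_gate <- function(summary, block_on = c("reject", "high")) {
  counts <- vapply(block_on, function(sev) {
    n <- summary[[sev]]
    if (is.null(n)) 0 else as.numeric(n)   # absent severity counts as 0
  }, numeric(1))
  list(pass = all(counts == 0), blocking = counts[counts > 0])
}

# Same counts as the summary printed above
summary_demo <- list(reject = 0L, high = 61L, medium = 0L, low = 0L, total = 61L)
gate <- define_gate(summary_demo)
gate$pass
#> [1] FALSE
```

In a script, if (!define_gate(result$summary)$pass) stop("blocking Define-XML findings") halts the run until the high-severity findings are resolved.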

ADaM Analysis Results Metadata (ARM)

For ADaM submissions, herald supports the Analysis Results Metadata (ARM) 1.0 extension. Add arm_displays and arm_results slots to your spec:

adsl_spec <- herald_spec(
  ds_spec = data.frame(
    dataset = "ADSL", label = "Subject-Level Analysis Dataset",
    stringsAsFactors = FALSE
  ),
  var_spec = data.frame(
    dataset   = c("ADSL","ADSL","ADSL"),
    variable  = c("STUDYID","USUBJID","AGE"),
    label     = c("Study Identifier","Unique Subject Identifier","Age"),
    data_type = c("text","text","integer"),
    length    = c(12L,11L,8L),
    stringsAsFactors = FALSE
  ),
  arm_displays = data.frame(
    display_name        = "Table 14.1.1",
    display_description = "Summary of Demographics",
    display_title       = "Table 14.1.1 Summary of Demographic and Baseline Characteristics",
    stringsAsFactors    = FALSE
  ),
  arm_results = data.frame(
    display_name     = "Table 14.1.1",
    result_key       = "R.AGE.MEAN",
    parameter_oid    = "ADSL.AGE",
    # ARM 1.0 terms: AnalysisReason vs. AnalysisPurpose
    analysis_reason  = "SPECIFIED IN SAP",
    analysis_purpose = "PRIMARY OUTCOME MEASURE",
    stringsAsFactors = FALSE
  )
)

adsl_spec
#> 
#> ── herald_spec ──
#> 
#> • Dataset: 1
#> • Variables: 3
#> • ARM: 1 display, 1 result
#> Datasets: "ADSL"
arm_xml <- tempfile(fileext = ".xml")

write_define_xml(adsl_spec, arm_xml, validate = FALSE)
file.exists(arm_xml)
#> [1] TRUE
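Two easy mistakes in ARM metadata are a result that points at a display name that does not exist and swapped reason/purpose values. check_arm() below is a hypothetical pre-flight check. The term lists are the AnalysisReason and AnalysisPurpose values given in the ARM 1.0 specification as we understand it; both codelists are extensible, so an unlisted value is flagged for review rather than treated as an error.

```r
# ARM 1.0 AnalysisReason / AnalysisPurpose terms (extensible codelists)
arm_reason_terms  <- c("SPECIFIED IN PROTOCOL", "SPECIFIED IN SAP",
                       "DATA DRIVEN", "REQUESTED BY REGULATORY AGENCY")
arm_purpose_terms <- c("PRIMARY OUTCOME MEASURE", "SECONDARY OUTCOME MEASURE",
                       "EXPLORATORY OUTCOME MEASURE")

# Hypothetical helper (not part of herald): return one message per issue.
check_arm <- function(displays, results) {
  orphan      <- !results$display_name %in% displays$display_name
  odd_reason  <- !results$analysis_reason %in% arm_reason_terms
  odd_purpose <- !results$analysis_purpose %in% arm_purpose_terms
  c(sprintf("%s: display '%s' not in arm_displays",
            results$result_key[orphan], results$display_name[orphan]),
    sprintf("%s: analysis_reason '%s' not a listed term",
            results$result_key[odd_reason], results$analysis_reason[odd_reason]),
    sprintf("%s: analysis_purpose '%s' not a listed term",
            results$result_key[odd_purpose], results$analysis_purpose[odd_purpose]))
}

# Demo: reason and purpose values swapped into the wrong fields
displays <- data.frame(display_name = "Table 14.1.1", stringsAsFactors = FALSE)
results  <- data.frame(display_name = "Table 14.1.1", result_key = "R.AGE.MEAN",
                       analysis_reason = "PRIMARY OUTCOME MEASURE",
                       analysis_purpose = "Analysis", stringsAsFactors = FALSE)
check_arm(displays, results)
```

The demo returns two messages, one for each field; with analysis_reason = "SPECIFIED IN SAP" and analysis_purpose = "PRIMARY OUTCOME MEASURE" it returns character(0).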

P21 Excel → Define-XML workflow

The typical production workflow reads a Pinnacle 21 Excel spec and generates Define-XML in one pipeline:

spec <- read_spec("path/to/study_spec.xlsx")
write_define_xml(spec, "sdtm/define.xml")
write_define_html(spec, "sdtm/define.html", define_xml = "sdtm/define.xml")

No GUI, no Java, no license: one herald_spec object flows straight through to the submission deliverables.
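If the same pipeline runs for several deliverables (for example SDTM and ADaM), the three calls can be wrapped in one function. build_define() is a hypothetical sketch that uses only the herald functions shown above, creates the output directory if needed, and returns the two output paths.

```r
# Hypothetical wrapper (not part of herald) around read_spec(),
# write_define_xml() and write_define_html() as used above.
build_define <- function(spec_path, out_dir) {
  if (!requireNamespace("herald", quietly = TRUE)) {
    stop("the herald package is required")
  }
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  xml_out  <- file.path(out_dir, "define.xml")
  html_out <- file.path(out_dir, "define.html")
  spec <- herald::read_spec(spec_path)
  herald::write_define_xml(spec, xml_out)
  herald::write_define_html(spec, html_out, define_xml = xml_out)
  invisible(c(xml = xml_out, html = html_out))
}
```

A call such as build_define("path/to/study_spec.xlsx", "sdtm") then reproduces the three-line pipeline above.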

Before vs After

Task                        Old way                                      herald
Generate Define-XML         Pinnacle 21 Enterprise (GUI, license, Java)  write_define_xml(spec, "define.xml")
Render to HTML              P21 Enterprise or separate XSLT tool         write_define_html(spec, "define.html")
Parse existing Define-XML   Manual XML parsing or P21                    read_spec_define("define.xml")
Validate Define-XML         P21 Validator (Java)                         validate_spec_define(read_spec_define("define.xml"))
ARM 1.0 support             P21 Enterprise only                          arm_displays + arm_results slots