
Reads dataset files from a path and checks them against a specification and/or conformance rules. Returns a herald_validation object with structured findings.

Usage

validate(
  path = NULL,
  datasets = NULL,
  format = "xpt",
  spec = NULL,
  config = NULL,
  rules = NULL,
  standard = NULL,
  version = NULL,
  define_xml = NULL,
  ignore_spec_checks = NULL,
  files = NULL,
  ct_path = NULL
)

Arguments

path

Character. Path to a directory containing dataset files, or to a single .xpt or .json file. Ignored when files is provided.

datasets

Character vector of dataset names to validate, e.g. c("DM", "AE"). Case-insensitive. NULL (default) validates all files matching format in path. Ignored when path is a single file or when files is provided.

format

"xpt" (default) or "json". File type to read from path when it is a directory.

spec

A herald_spec object, a path to a spec file (.xlsx, .xml, or .json), or NULL (default), in which case spec checks are skipped.

config

Optional. A herald-rules submission config identifier (e.g. "fda-sdtm-ig-3.3", "pmda-adam-ig-1.1"). When provided, the matching pre-built rule profile bundled with herald-rules is loaded, and it takes precedence over rules. When NULL, a config is auto-selected from standard and version (assuming the FDA authority) if a matching bundled config exists.

rules

Optional. Either a character shortcut ("fda", "pmda", "core", "all") or a list of herald_rule objects. Used only when config is not provided and auto-selection finds no match.
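The resolution order between config, auto-selection, and rules can be pictured as a short decision function. This is a minimal sketch under stated assumptions: `choose_rule_source()` and the `bundled` lookup table (with its `authority`, `standard`, `version`, and `id` columns) are hypothetical stand-ins rather than herald's internals, and the real matching of identifiers to standard and version may differ.

```r
# Hypothetical sketch of the documented precedence:
# explicit `config` > auto-selected FDA config > `rules`.
# `bundled` stands in for herald-rules' bundled configs; its columns are
# assumptions made for this illustration only.
choose_rule_source <- function(config = NULL, rules = NULL,
                               standard = NULL, version = NULL,
                               bundled) {
  # 1. An explicit config always wins, so `rules` is never consulted.
  if (!is.null(config)) {
    return(list(source = "config", id = config))
  }
  # 2. Otherwise look for an FDA config matching standard + version.
  if (!is.null(standard) && !is.null(version)) {
    hit <- bundled$id[bundled$authority == "fda" &
                        bundled$standard == tolower(standard) &
                        bundled$version == version]
    if (length(hit) > 0) {
      return(list(source = "auto", id = hit[[1]]))
    }
  }
  # 3. Fall back to `rules` (a shortcut string or a list of rule objects).
  if (!is.null(rules)) {
    return(list(source = "rules", id = rules))
  }
  # 4. Nothing was selected in this sketch.
  list(source = "none", id = NULL)
}

# Illustrative lookup table; the pairings are assumptions.
bundled <- data.frame(
  id        = c("fda-sdtm-ig-3.3", "pmda-adam-ig-1.1"),
  authority = c("fda", "pmda"),
  standard  = c("sdtmig", "adamig"),
  version   = c("3.3", "1.1"),
  stringsAsFactors = FALSE
)

choose_rule_source(standard = "sdtmig", version = "3.3", rules = "core",
                   bundled = bundled)$id
```

Because config is checked first, supplying both config and rules never consults rules; rules only matters when no config is given and no FDA profile matches standard and version.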

standard

Character. CDISC standard: "sdtmig", "adamig", or "sendig". When spec is provided, the standard is read from the standard column of the spec's dataset sheet. Without a spec, this argument is required for anchor auto-detection; when neither spec nor standard is supplied, anchor detection is skipped.

version

Character. Standard version, e.g. "3.4" for SDTMIG 3.4 or "1.1" for ADaMIG 1.1. When spec is provided and its standard column includes a version (e.g. "SDTMIG 3.3"), the version is extracted automatically.
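To illustrate the automatic extraction, here is a minimal base R sketch that splits a value such as "SDTMIG 3.3" into a lowercase standard name and a version string. `parse_standard()` and its pattern are assumptions for illustration; herald's own parsing may accept other layouts.

```r
# Hypothetical helper: split a spec "standard" cell such as "SDTMIG 3.3"
# into a lowercase standard name and a version string.
parse_standard <- function(x) {
  pattern <- paste0("^[[:space:]]*([[:alpha:]]+)[[:space:]]+",
                    "([[:digit:]]+(\\.[[:digit:]]+)*)[[:space:]]*$")
  m <- regmatches(x, regexec(pattern, x))[[1]]
  if (length(m) == 0) {
    # No recognisable "<NAME> <version>" pair: leave both unknown.
    return(list(standard = NA_character_, version = NA_character_))
  }
  # m[1] is the whole match, m[2] the name, m[3] the version.
  list(standard = tolower(m[2]), version = m[3])
}

parse_standard("SDTMIG 3.3")
```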

define_xml

Character. Path to a Define-XML 2.1 file. Stored in the output context; used for cross-checks in future releases.

ignore_spec_checks

Character vector of spec checks to skip. Default NULL runs all spec checks: "presence", "labels", "types", "lengths", "dataset_label", "codelist", "common". Example: ignore_spec_checks = c("lengths", "codelist") skips those two. Spec checks are silently skipped when spec = NULL.
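The effect of ignore_spec_checks amounts to a set difference over the seven documented check names. The sketch below uses a hypothetical helper, `active_spec_checks()`; stopping on an unrecognised name is a choice made here, not documented herald behaviour.

```r
# Check names as documented for `ignore_spec_checks`.
all_spec_checks <- c("presence", "labels", "types", "lengths",
                     "dataset_label", "codelist", "common")

# Hypothetical helper: which spec checks would run for a given call.
active_spec_checks <- function(ignore_spec_checks = NULL, spec = NULL) {
  # No spec means no spec checks at all.
  if (is.null(spec)) {
    return(character(0))
  }
  # Choice made for this sketch: reject names that are not real checks.
  unknown <- setdiff(ignore_spec_checks, all_spec_checks)
  if (length(unknown) > 0) {
    stop("Unknown spec check(s): ", paste(unknown, collapse = ", "))
  }
  setdiff(all_spec_checks, ignore_spec_checks)
}

active_spec_checks(c("lengths", "codelist"), spec = "spec.xlsx")
```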

files

Optional. Named list or character vector of explicit file paths to load. Allows selecting specific datasets from different directories.

  • Named list: list(ADAE = "/path/adae.xpt", ADSL = "/shared/adsl.xpt") — list names become dataset names.

  • Unnamed character vector: c("/path/adae.xpt", "/shared/adsl.xpt") — dataset names inferred from file basenames (uppercased, extension stripped).

When files is provided, path and datasets are ignored. Cross-dataset rules (anchor detection) fire when two or more files are loaded.
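The naming rule for files can be reproduced with base R. This is a sketch of the documented behaviour only; `infer_dataset_names()` is hypothetical, and names from a named list are returned unchanged because the documentation does not say they are uppercased.

```r
# Hypothetical helper mirroring the documented naming rule for `files`.
infer_dataset_names <- function(files) {
  nms <- names(files)
  # Named list (or named vector): the names are used as dataset names.
  if (!is.null(nms) && all(nzchar(nms))) {
    return(nms)
  }
  # Unnamed: uppercase the basename and strip the extension.
  toupper(tools::file_path_sans_ext(basename(unlist(files))))
}

infer_dataset_names(c("/path/adae.xpt", "/shared/adsl.xpt"))
infer_dataset_names(list(ADAE = "/path/adae.xpt", ADSL = "/shared/adsl.xpt"))
```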

ct_path

Optional character. Path to a custom controlled terminology file (.xlsx or .csv, NCI EVS column layout). When provided, the custom CT is merged on top of the bundled CDISC CT for this call only. To register CT for the entire session, use register_ct().
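One way to read "merged on top" is an overlay: custom entries replace bundled entries with the same key, and new custom entries are appended. The sketch below assumes hypothetical key columns `codelist` and `term` (real NCI EVS files use different column names) and is not herald's implementation.

```r
# Hypothetical sketch of an "overlay" merge. The key columns `codelist`
# and `term` are assumptions; real NCI EVS files use other column names.
overlay_ct <- function(bundled, custom) {
  key <- function(df) paste(df$codelist, df$term, sep = "\t")
  # Drop bundled rows that the custom file redefines, then append custom rows.
  keep <- !(key(bundled) %in% key(custom))
  rbind(bundled[keep, , drop = FALSE], custom)
}

bundled_ct <- data.frame(codelist = c("CL1", "CL1"), term = c("M", "F"),
                         label = c("Male", "Female"),
                         stringsAsFactors = FALSE)
custom_ct  <- data.frame(codelist = c("CL1", "CL1"), term = c("F", "U"),
                         label = c("Female (sponsor)", "Unknown"),
                         stringsAsFactors = FALSE)
overlay_ct(bundled_ct, custom_ct)
```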

Value

A herald_validation object with:

findings

Data frame of issues.

summary

List with counts by impact level (reject, high, medium, low, total).

datasets_checked

Character vector of dataset names validated.
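The summary counts can be thought of as a tabulation of findings by impact. This sketch assumes the findings data frame has an `impact` column holding one of the four levels; that column name and `summarise_impact()` are hypothetical.

```r
# Hypothetical sketch: derive the documented summary counts from a findings
# data frame, assuming an `impact` column with one of the four levels.
summarise_impact <- function(findings) {
  impact_levels <- c("reject", "high", "medium", "low")
  counts <- table(factor(findings$impact, levels = impact_levels))
  out <- as.list(as.integer(counts))
  names(out) <- impact_levels
  # Total is simply the number of findings rows.
  out$total <- nrow(findings)
  out
}

findings <- data.frame(impact = c("high", "low", "high"),
                       stringsAsFactors = FALSE)
str(summarise_impact(findings))
```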

Controlled Terminology

Herald uses the CT package bundled with the installed version of herald (inst/rules/ct/). To use a newer CT release, call fetch_ct(), which downloads it to the user cache. The validation report always shows which CT version and source (bundled or fetched) was used.

Rule IDs

HRL-VAR-001

Variable in spec but missing from data.

HRL-VAR-002

Variable in data but not in spec.

HRL-LBL-001

Variable label mismatch.

HRL-TYP-001

Variable type mismatch.

HRL-LEN-001

Character variable exceeds spec length.

HRL-DS-001

Dataset label mismatch.

HRL-CL-001

Variable value not found in spec codelist.
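Two of these checks, HRL-LEN-001 and HRL-CL-001, reduce to simple vector comparisons. The base R sketch below is illustrative only: `check_variable()` and its output columns are hypothetical, not herald's findings schema, and treating missing or empty values as non-violations is an assumption.

```r
# Hypothetical base R sketch of two checks named above:
# HRL-LEN-001 (value longer than the spec length) and
# HRL-CL-001 (value not in the spec codelist).
check_variable <- function(values, var, spec_length, codelist = NULL) {
  present <- !is.na(values) & nzchar(values)
  # XPT lengths count bytes, so measure bytes rather than characters.
  too_long <- present & nchar(values, type = "bytes") > spec_length
  rule  <- rep("HRL-LEN-001", sum(too_long))
  value <- values[too_long]
  if (!is.null(codelist)) {
    # Missing or empty values are not treated as codelist violations here.
    not_in_cl <- present & !(values %in% codelist)
    rule  <- c(rule, rep("HRL-CL-001", sum(not_in_cl)))
    value <- c(value, values[not_in_cl])
  }
  data.frame(rule_id = rule, variable = rep(var, length(rule)),
             value = value, stringsAsFactors = FALSE)
}

check_variable(c("M", "F", "X", "UNKNOWN", NA, ""), var = "SEX",
               spec_length = 1, codelist = c("M", "F", "U"))
```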

Examples

# Validate a minimal XPT written to a temp directory
tmp_dir <- tempdir()
xpt_path <- file.path(tmp_dir, "dm.xpt")

dm <- data.frame(
  STUDYID = "STUDY001",
  USUBJID = "001-001",
  stringsAsFactors = FALSE
)
write_xpt(dm, xpt_path, dataset = "DM")
result <- validate(xpt_path)
result
#> 
#> ── herald validation ──
#> 
#> Datasets checked: 1
#>  Spec checks only -- no conformance rules evaluated
#> Findings: 0 reject, 0 high, 0 medium, 0 low

# Remove the temporary file (on.exit() has no useful effect at top level)
unlink(xpt_path)