herald reads and writes SAS V5 and V8 transport (XPT) files entirely
in base R: no SAS license, no haven, no compiled code. The binary
format is implemented directly with readBin()/writeBin(), making every
byte auditable. This matters in regulated environments where code
provenance is a GxP requirement.
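To make the "every byte auditable" point concrete, here is a minimal base R sketch that writes and re-reads a single 80-byte record with writeBin()/readBin(). It is not herald's own code: the record text follows the publicly documented first header record of a V5 transport file, and the file is a throwaway temp file.

```r
# Illustration only: this is not herald's implementation.
# First 80-byte record of a V5 transport file, per the published layout.
header <- paste0(
  "HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!",
  strrep("0", 30L),
  strrep(" ", 2L)
)
stopifnot(nchar(header) == 80L)

# Write the record as raw bytes
path <- tempfile(fileext = ".xpt")
con <- file(path, "wb")
writeBin(charToRaw(header), con)
close(con)

# Read the same 80 bytes back and inspect them
con <- file(path, "rb")
first_record <- readBin(con, what = "raw", n = 80L)
close(con)

length(first_record)             # 80
rawToChar(first_record[1:13])    # "HEADER RECORD"
```

Because the bytes are produced and consumed by plain R calls, a reviewer can compare them directly against the format specification.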
Writing XPT files
write_xpt() takes a data frame, a file path, and
optional metadata:
dm <- data.frame(
STUDYID = rep("CDISCPILOT01", 3L),
USUBJID = c("01-701-1015", "01-701-1023", "01-701-1028"),
AGE = c(63L, 64L, 71L),
SEX = c("F", "M", "M"),
stringsAsFactors = FALSE
)
xpt_path <- file.path(tempdir(), "dm.xpt")
write_xpt(dm, xpt_path, label = "Demographics")
file.info(xpt_path)$size # always a multiple of 80 bytes
#> [1] 1440

write_xpt() returns the input data frame invisibly,
enabling pipes:
dm |> write_xpt("dm.xpt") |> write_json("dm.json")

V5 vs V8 transport format
| Feature | V5 (default) | V8 |
|---|---|---|
| Variable name length | 8 characters | 32 characters |
| Variable label length | 40 characters | 256 characters |
| Dataset label length | 40 characters | 256 characters |
| FDA submission requirement | ✓ Required | Not yet accepted |
| Long ADaM names (PARAMCD etc.) | ✓ Fine (≤8 chars) | Required for longer names |
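The name and label limits in the table can be checked before choosing a version. The sketch below uses a hypothetical helper, fits_v5(), which is not part of herald; it assumes variable labels are stored in each column's "label" attribute and only tests the 8-character name and 40-character label limits.

```r
# Hypothetical helper (not a herald function): TRUE when every variable
# name is at most 8 characters and every variable label at most 40.
fits_v5 <- function(df) {
  labels <- vapply(
    df,
    function(col) {
      lbl <- attr(col, "label", exact = TRUE)
      if (is.null(lbl)) "" else as.character(lbl)
    },
    character(1)
  )
  all(nchar(names(df)) <= 8L) && all(nchar(labels) <= 40L)
}

short_df <- data.frame(AGE = c(63L, 64L))
wide_df <- data.frame(LONGVARNAME01 = c(1.5, 2.0))

fits_v5(short_df)  # TRUE
fits_v5(wide_df)   # FALSE: the name has 13 characters
```

A check like this could decide whether `version = 8L` is needed at all.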
# V8 for extended variable names
long_df <- data.frame(LONGVARNAME01 = c(1.5, 2.0))
xpt_v8 <- tempfile(fileext = ".xpt")
write_xpt(long_df, xpt_v8, version = 8L)

Setting metadata before writing
The cleanest approach is to set metadata explicitly before writing. herald’s metadata helpers use tidy evaluation for concise syntax:
dm2 <- dm
# Set variable labels
dm2 <- set_label(dm2,
STUDYID = "Study Identifier",
USUBJID = "Unique Subject Identifier",
AGE = "Age",
SEX = "Sex"
)
# Set SAS display formats
dm2 <- set_format(dm2, AGE = "8.")
# Set SAS storage lengths (auto-computed if omitted)
dm2 <- set_length(dm2, STUDYID = 12L, USUBJID = 11L, AGE = 8L, SEX = 1L)
# Set dataset-level label
dm2 <- set_dataset_label(dm2, "Demographics")
# Inspect what was set
get_metadata(dm2)
#> variable label format informat length type
#> 1 STUDYID Study Identifier <NA> <NA> 12 character
#> 2 USUBJID Unique Subject Identifier <NA> <NA> 11 character
#> 3 AGE Age 8. <NA> 8 numeric
#> 4 SEX Sex <NA> <NA> 1 character

Now write_xpt() reads all these attributes automatically.

Reading XPT files
read_xpt() returns a data frame with all metadata
preserved as attributes.
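Metadata stored as attributes can be inspected with base R alone. The following sketch builds a small data frame by hand rather than reading a file, and assumes the "label" attribute name used elsewhere in this vignette; dm_demo and var_labels are illustrative names.

```r
# Toy data frame with hand-set attributes (stands in for a read result)
dm_demo <- data.frame(
  USUBJID = c("01-701-1015", "01-701-1023"),
  AGE = c(63, 64),
  stringsAsFactors = FALSE
)
attr(dm_demo$USUBJID, "label") <- "Unique Subject Identifier"
attr(dm_demo$AGE, "label") <- "Age"
attr(dm_demo, "label") <- "Demographics"

# Collect variable labels into a named character vector
var_labels <- vapply(
  dm_demo,
  function(col) {
    lbl <- attr(col, "label", exact = TRUE)
    if (is.null(lbl)) NA_character_ else lbl
  },
  character(1)
)

var_labels
attr(dm_demo, "label")  # dataset-level label
```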
Date and datetime columns
SAS stores dates as days since 1960-01-01 and datetimes as seconds since 1960-01-01. herald converts automatically in both directions.
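The epoch arithmetic itself is simple. This base R sketch shows the conversion in both directions; it illustrates the 1960-01-01 origin only and is not herald's internal code.

```r
sas_epoch <- as.Date("1960-01-01")

# Date -> SAS date value (days since 1960-01-01)
visit_date <- as.Date("2014-03-15")
sas_days <- as.numeric(visit_date - sas_epoch)
sas_days  # 19797

# POSIXct -> SAS datetime value (seconds since 1960-01-01 00:00:00 UTC)
visit_time <- as.POSIXct("2014-03-15 08:30:00", tz = "UTC")
sas_seconds <- as.numeric(visit_time) -
  as.numeric(as.POSIXct("1960-01-01", tz = "UTC"))

# And back again
as.Date(sas_days, origin = sas_epoch)
as.POSIXct(sas_seconds, origin = "1960-01-01", tz = "UTC")
```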
events <- data.frame(
STUDYID = "CDISCPILOT01",
USUBJID = "01-701-1015",
DT = as.Date("2014-03-15"),
DTM = as.POSIXct("2014-03-15 08:30:00", tz = "UTC"),
stringsAsFactors = FALSE
)
xpt_dt <- tempfile(fileext = ".xpt")
write_xpt(events, xpt_dt, dataset = "EVENTS")
events2 <- read_xpt(xpt_dt)
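# Compare the round-tripped date. identical() also checks storage type and
# attributes, so FALSE does not by itself mean the date value changed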
identical(events$DT, events2$DT)
#> [1] FALSE
# POSIXct round-trips (timezone may normalize to UTC)
as.numeric(events$DTM) == as.numeric(events2$DTM)
#> [1] TRUE

To store ISO 8601 character dates (common in SDTM, for example AESTDTC and RFSTDTC), leave them as character columns. herald does not coerce character strings.
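One reason to keep such columns as character is that SDTM date/time variables may hold partial dates, which a Date column cannot represent. A small base R sketch with made-up values:

```r
# ISO 8601 values as they might appear in a --DTC variable (made-up data)
dtc <- c("2014-03-15", "2014-03", "2014")

# Forcing them into Date loses the partial values
parsed <- as.Date(dtc, format = "%Y-%m-%d")
is.na(parsed)  # FALSE TRUE TRUE

# Left as character, every value survives unchanged
nchar(dtc)     # 10 7 4
```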
Character encoding
herald supports all SAS encoding identifiers. The default
"wlatin1" is correct for FDA SDTM and ADaM submissions.
| Encoding | encoding = | When to use |
|---|---|---|
| Western Latin-1 | "wlatin1" (default) | FDA SDTM / ADaM |
| Latin-1 | "latin1" | European studies |
| UTF-8 | "utf-8" | Unicode content |
| Shift-JIS | "shift-jis" | PMDA Japanese submissions |
| EUC-JP | "euc-jp" | Legacy Japanese |
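The encoding choice changes how many bytes a string occupies, which matters for fixed-width character storage. The sketch below uses base R's iconv() with its own encoding names ("UTF-8", "latin1"), not the SAS identifiers from the table, and makes no claim about how herald maps between the two.

```r
x <- "caf\u00e9"  # "café", held as UTF-8 in R

utf8_bytes <- charToRaw(enc2utf8(x))
latin1_bytes <- iconv(x, from = "UTF-8", to = "latin1", toRaw = TRUE)[[1]]

length(utf8_bytes)    # 5: the accented letter takes two bytes
length(latin1_bytes)  # 4: one byte per character
```

A storage length sized for Latin-1 content can therefore be too short for the same text in UTF-8.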
# PMDA submission with Japanese site names
write_xpt(dm, "dm.xpt", encoding = "shift-jis")

Round-trip fidelity
dm_full <- set_label(dm,
STUDYID = "Study Identifier",
USUBJID = "Unique Subject Identifier",
AGE = "Age",
SEX = "Sex"
)
dm_full <- set_dataset_label(dm_full, "Demographics")
xpt_rt <- file.path(tempdir(), "dm_rt.xpt")
write_xpt(dm_full, xpt_rt)
dm_rt <- read_xpt(xpt_rt)
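# Character values come back identical
identical(dm_rt$STUDYID, dm_full$STUDYID)
#> [1] TRUE
# identical() is strict about storage type; AGE was written as integer, so
# FALSE here may reflect a type change on read rather than a different value
identical(dm_rt$AGE, dm_full$AGE)
#> [1] FALSE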
# Labels round-trip
attr(dm_rt$STUDYID, "label") == attr(dm_full$STUDYID, "label")
#> [1] TRUE
attr(dm_rt, "label") == attr(dm_full, "label")
#> [1] TRUE

Before vs After
| Feature | haven | herald |
|---|---|---|
| Pure R (no compiled C) | No | Yes |
| Auto-compute lengths | No — you must set | Yes — computes from data |
| Dataset-level label | No | label = parameter |
| Sort by key variables | No | Reads herald.sort_keys attr |
| Return value | invisible(file) | invisible(x), pipeable |
| V8 support | Yes | Yes |
| Date/datetime | Partial | Full round-trip |
| Factor columns | Silently converts | Errors loudly — no surprises |
| Encoding map | Limited | Full SAS encoding table |
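Since the table says factor columns raise an error, one option is to convert them explicitly before writing. This is a base R sketch with made-up data, not a herald helper; note that as.character() drops other attributes, so any labels would need to be set after the conversion.

```r
df <- data.frame(
  SEX = factor(c("F", "M", "M")),
  AGE = c(63, 64, 71)
)

# Convert only the factor columns, leaving everything else untouched
is_factor <- vapply(df, is.factor, logical(1))
df[is_factor] <- lapply(df[is_factor], as.character)

vapply(df, class, character(1))
```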
What to read next
- vignette("dataset-json"): Dataset-JSON v1.1 as an alternative to XPT
- vignette("metadata-helpers"): apply_spec() applies all metadata in one call
- vignette("submission-workflow"): submit() calls write_xpt() automatically
