Serialize a data frame to an Apache Parquet (.parquet) file, storing the
data natively while preserving the full artoo_meta as a CDISC-shaped
sidecar in the file's key-value metadata. This is the emit end of the artoo
workflow (spec -> apply_spec -> write_parquet) and a thin wrapper over
write_dataset() with format = "parquet". Requires the lightweight
nanoparquet package.
Usage
write_parquet(
  x,
  path,
  encoding = NULL,
  on_invalid = c("error", "replace", "ignore"),
  compression = "snappy"
)
Arguments
- x
The dataset to write.
<data.frame>: required. Typically the output of apply_spec(), carrying artoo_meta.
- path
Destination .parquet path.
<character(1)>: required.
- encoding
Source charset to record.
<character(1)> | NULL. The parquet bytes are always written as UTF-8 (the format's STRING type is UTF-8 by spec); encoding only records the data's original charset in the artoo_meta, so a later write_xpt() can reproduce the source bytes. NULL (default) leaves the recorded encoding untouched.
Tip: any SAS or IANA spelling listed by artoo_encodings() is accepted.
- on_invalid
Policy for values that are not valid UTF-8.
<character(1)>: default "error". One of "error" (abort with artoo_error_codec), "replace" (substitute ? and warn with artoo_warning_encoding), or "ignore" (drop the invalid bytes). See write_json() for when this fires; parquet STRING bytes are UTF-8 by spec, exactly like Dataset-JSON.
- compression
Column compression codec.
<character(1)>: default "snappy". One of:
  - "snappy" (default): fast, the parquet ecosystem default.
  - "gzip": smaller files, slower.
  - "zstd": the best size/speed trade-off where supported.
  - "uncompressed": raw pages.
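To make the on_invalid policy more concrete, the sketch below uses base R only to show what a value that is not valid UTF-8 looks like, and how it can be transcoded before writing when the source charset is known. It is an illustration under stated assumptions, not artoo's implementation: the sample bytes are made up, and the two iconv() calls at the end are only rough analogues of "replace" and "ignore", whose exact behavior in artoo may differ.

```r
# A Latin-1 encoded "café": the lone byte 0xE9 is not valid UTF-8.
# (Made-up sample value for illustration.)
latin1_value <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))

# This is the kind of value the on_invalid policy is about.
validUTF8(latin1_value)
#> [1] FALSE

# If the source charset is known, transcoding up front avoids the problem.
utf8_value <- iconv(latin1_value, from = "latin1", to = "UTF-8")
validUTF8(utf8_value)
#> [1] TRUE

# Rough base-R analogues of the lenient policies (assumption: artoo's own
# substitution may differ in detail, and iconv() behavior can vary by platform).
replaced <- iconv(latin1_value, from = "UTF-8", to = "UTF-8", sub = "?")  # like "replace"
dropped  <- iconv(latin1_value, from = "UTF-8", to = "UTF-8", sub = "")   # like "ignore"
```

Transcoding from the known source charset keeps the original character, whereas the lenient policies lose it, which is one reason "error" is a sensible default.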
Details
Metadata where plain Parquet has none. A bare nanoparquet/arrow file
drops labels, formats, and codelists; write_parquet() embeds the complete
artoo_meta as a single Dataset-JSON-shaped string under the
metadata_json key, so read_parquet() restores every CDISC attribute.
The same string is what a .json file or an rds carries, so conversion
between any two formats stays lossless. A reader without artoo still opens
the data and can see the metadata_json block.
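The last point can be checked with nanoparquet alone. The sketch below is self-contained and does not call artoo: it writes a small made-up data frame with a placeholder JSON string under the metadata_json key, then reads the file's key-value metadata back the way a reader without artoo might. The data, the variable names, and the JSON payload are assumptions for illustration; the string artoo actually writes is Dataset-JSON-shaped and considerably richer.

```r
# Made-up data and a placeholder payload; not artoo's actual metadata schema.
df <- data.frame(USUBJID = c("01-001", "01-002"), AGE = c(34L, 57L))
meta_json <- '{"name":"ADSL","label":"Subject-Level Analysis Dataset"}'

path <- tempfile(fileext = ".parquet")

# nanoparquet stores a named character vector as file-level key-value metadata.
nanoparquet::write_parquet(df, path, metadata = c(metadata_json = meta_json))

# Any parquet reader can open the data itself ...
plain <- nanoparquet::read_parquet(path)

# ... and inspect the key-value metadata, which has one row per key.
kv <- nanoparquet::read_parquet_metadata(path)$file_meta_data$key_value_metadata[[1]]
stored <- kv$value[kv$key == "metadata_json"]

# The stored string should match what was written.
identical(stored, meta_json)
```

Filtering on the key rather than taking the first row matters here, because writers may add their own entries (such as an Arrow schema) to the same key-value block.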
See also
read_parquet() for the inverse; write_dataset() for the
generic dispatcher.
Examples
spec <- artoo_spec(cdisc_adam_datasets, cdisc_adam_variables, codelists = cdisc_codelists)
# ---- Example 1: write a conformed dataset to Parquet ----
#
# apply_spec() attaches the metadata; write_parquet() stores the data
# natively and the metadata as a CDISC-shaped sidecar.
adsl <- apply_spec(cdisc_adsl, spec, "ADSL", conformance = "off")
path <- tempfile(fileext = ".parquet")
write_parquet(adsl, path)
# ---- Example 2: round-trip and confirm the metadata survived ----
#
# Reading it back yields an identical artoo_meta.
back <- read_parquet(path)
identical(get_meta(back)@columns, get_meta(adsl)@columns)
#> [1] TRUE