Serialize a data frame to the newline-delimited variant of CDISC
Dataset-JSON v1.1 (.ndjson): line 1 carries the complete metadata block,
every following line one row array. The streaming end of the artoo
workflow (spec -> apply_spec -> write_ndjson) for datasets too large for
the array-form .json file; a thin wrapper over write_dataset() with
format = "ndjson".
Usage
write_ndjson(
x,
path,
on_invalid = c("error", "replace", "ignore"),
created = NULL,
strict = FALSE
)Arguments
- x
The dataset to write.
<data.frame>: required. Typically the output ofapply_spec(), carryingartoo_meta.- path
Destination
.ndjsonpath.<character(1)>: required. A.ndjson.gzpath writes gzip-compressed bytes.- on_invalid
Policy for values that are not valid UTF-8.
<character(1)>: default "error". One of"error"(abort withartoo_error_codec),"replace"(substitute?and warn withartoo_warning_encoding), or"ignore"(drop the invalid bytes). Seewrite_json()for when this fires.- created
Creation timestamp.
<POSIXct(1)> | NULL.NULL(default) stamps the current time intodatasetJSONCreationDateTime; freeze it for byte-stable output.- strict
Suppress the
_artooextension block.<logical(1)>: default FALSE. Seewrite_json(): the same extension semantics apply to the metadata line.
Details
Bounded memory, both directions. The writer streams slabs of
per-column JSON literals and read_ndjson() parses slab-sized line
batches, so a multi-million-row dataset never materializes a whole rows
array the way the .json codec must. A .ndjson.gz path gzips the stream
transparently.
See also
read_ndjson() for the inverse; write_json() for the
array-form file; write_dataset() for the generic dispatcher.
Examples
spec <- artoo_spec(cdisc_adam_datasets, cdisc_adam_variables, codelists = cdisc_codelists)
# ---- Example 1: write a conformed dataset as NDJSON ----
#
# apply_spec() attaches the metadata; write_ndjson() streams the metadata
# line and one row per line.
adsl <- apply_spec(cdisc_adsl, spec, "ADSL", conformance = "off")
path <- tempfile(fileext = ".ndjson")
write_ndjson(adsl, path)
readLines(path, n = 2)[2]
#> [1] "[\"CDISCPILOT01\",\"01-701-1015\",\"1015\",\"701\",\"701\",\"Placebo\",\"Placebo\",0,\"Placebo\",0,19725,19906,182,0,0,63,\"<65\",1,\"YEARS\",\"WHITE\",1,\"F\",\"HISPANIC OR LATINO\",\"Y\",\"Y\",\"Y\",\"Y\",\"Y\",\"Y\",null,null,null,25.100000000000001,\"25-<30\",147.30000000000001,54.399999999999999,16,18382,43.899999999999999,\">=12\",19718,\"2014-01-02\",\"2014-07-02\",12,19906,\"COMPLETED\",\"Completed\",23]"
# ---- Example 2: gzip the stream via the file extension ----
#
# A .ndjson.gz path compresses transparently; read_ndjson() inflates it.
gz <- tempfile(fileext = ".ndjson.gz")
write_ndjson(adsl, gz)
nrow(read_ndjson(gz))
#> [1] 60