Read a newline-delimited CDISC Dataset-JSON v1.1 (.ndjson) file back to
a data frame, restoring the full artoo_meta from its metadata line and
realizing SAS date/datetime/time variables to R Date / POSIXct /
hms::hms. Rows are parsed in bounded slabs, and n_max stops the
line loop early, so a partial read of a huge file never parses the tail.
A thin wrapper over read_dataset() with format = "ndjson".
Arguments
- path
Source
.ndjsonpath.<character(1)>: required. A gzip stream (.ndjson.gz) is inflated transparently. A file whose first line is not the Dataset-JSON metadata object aborts withartoo_error_codec.- col_select
Variables to read.
<character> | NULL.NULL(default) reads every column; otherwise a vector of variable names. Columns return in file order (not the requested order) and theartoo_metais filtered to match. Works on every format: parquet narrows columns natively, the rest filter after decode.Note: an unknown name is a
artoo_error_input, never a silent drop.- n_max
Maximum records to read.
<numeric(1)>: default Inf. Caps the row count; the returnedartoo_metareports the rows actually read. xpt v8 bounds the disk read; the other formats cap after decode.- encoding
Source charset of the file bytes.
<character(1)> | NULL.NULL(default) reads UTF-8, as Dataset-JSON requires. Pass an IANA or SAS charset name (e.g."windows-1252") only to read a non-conformant file a producer wrote in that charset; each line is transcoded to UTF-8 on read, preserving the boundedn_maxstreaming.
Value
A <data.frame> carrying artoo_meta (read it with
get_meta()).
See also
write_ndjson() for the inverse; read_json() for the
array-form file; read_dataset() for the generic dispatcher.
Examples
spec <- artoo_spec(cdisc_adam_datasets, cdisc_adam_variables, codelists = cdisc_codelists)
# ---- Example 1: round-trip a conformed dataset through NDJSON ----
#
# The variable labels, types, and keys survive the round-trip.
adsl <- apply_spec(cdisc_adsl, spec, "ADSL", conformance = "off")
path <- tempfile(fileext = ".ndjson")
write_ndjson(adsl, path)
back <- read_ndjson(path)
identical(get_meta(back)@columns, get_meta(adsl)@columns)
#> [1] TRUE
# ---- Example 2: a bounded partial read of the first rows ----
#
# n_max stops the line loop as soon as enough rows are in.
head_rows <- read_ndjson(path, n_max = 5)
get_meta(head_rows)@dataset$records
#> [1] 5