Read an Apache Parquet (.parquet) file back to a data frame, restoring the
artoo_meta from its metadata_json sidecar and realizing SAS
date/datetime/time variables to R Date / POSIXct / hms::hms. A
Parquet file written by another tool (with no artoo sidecar) reads back as a
bare data frame. This is a thin wrapper over read_dataset() with
format = "parquet". Requires the lightweight nanoparquet package.
Arguments
- path
Source .parquet path.
<character(1)>: required.
- col_select
Variables to read.
<character> | NULL. NULL (default) reads every column; otherwise a vector of variable names. Columns return in file order (not the requested order) and the artoo_meta is filtered to match. Works on every format: parquet narrows columns natively, the rest filter after decode.
Note: an unknown name is an artoo_error_input, never a silent drop.
- n_max
Maximum records to read.
<numeric(1)>: default Inf. Caps the row count; the returned artoo_meta reports the rows actually read. xpt v8 bounds the disk read; the other formats cap after decode.
- encoding
Source charset of the string columns.
<character(1)> | NULL. NULL (default) reads the UTF-8 bytes parquet stores. Pass a charset name only to read a foreign file whose string columns hold that charset's bytes; they are transcoded to UTF-8 on read.
Tip: any SAS or IANA spelling listed by artoo_encodings() is accepted.
Value
A <data.frame> carrying artoo_meta when the file recorded it
(read it with get_meta()); otherwise a plain data frame.
See also
write_parquet() for the inverse; read_dataset() for the
generic dispatcher.
Examples
spec <- artoo_spec(cdisc_adam_datasets, cdisc_adam_variables, codelists = cdisc_codelists)
# ---- Example 1: round-trip a conformed dataset through Parquet ----
#
# The variable labels, types, and keys survive the round-trip.
adsl <- apply_spec(cdisc_adsl, spec, "ADSL", conformance = "off")
path <- tempfile(fileext = ".parquet")
write_parquet(adsl, path)
back <- read_parquet(path)
get_meta(back)@columns$STUDYID$label
#> [1] "Study Identifier"
# ---- Example 2: the metadata names the dataset and row count ----
#
# The restored artoo_meta exposes the dataset-level attributes.
get_meta(back)@dataset$records
#> [1] 60
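# ---- Example 3: narrow the read with col_select and n_max ----
#
# A minimal sketch of the col_select and n_max arguments documented above,
# reusing `path` and `back` from Example 1. The column names are taken from
# the file itself rather than assumed. Per the argument notes, the columns
# should come back in file order even though they are requested in reverse,
# and at most n_max rows should be read.
cols <- names(back)[1:2]
sub <- read_parquet(path, col_select = rev(cols), n_max = 10)
names(sub)
nrow(sub)
# ---- Example 4: a Parquet file written by another tool ----
#
# A sketch of the "bare frame" case from the description: the file is written
# directly with nanoparquet, so it has no artoo sidecar. The toy data frame
# and the `foreign` path are illustrative only.
foreign <- tempfile(fileext = ".parquet")
nanoparquet::write_parquet(data.frame(x = 1:3, y = c("a", "b", "c")), foreign)
bare <- read_parquet(foreign)
is.data.frame(bare)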