artoo 0.1.1
CRAN release: 2026-06-24
Initial CRAN release. artoo is a lightweight, lossless, CDISC-native reader and writer for clinical-trial datasets, built around one canonical metadata model (artoo_meta) so that conversion between any two supported formats is lossless by construction. Pure R and lightweight, with no external SAS or Java runtime.
Formats
Reads and writes SAS XPORT (v5 and v8), CDISC Dataset-JSON v1.1, NDJSON, Apache Parquet, and RDS. Every codec carries the full
artoo_meta— labels, CDISC data types, lengths, SAS display formats, controlled-terminology references, and sort keys — so any-to-any conversion preserves the complete metadata. For Parquet the metadata rides as ametadata_jsonsidecar; a file written by another tool with no sidecar degrades gracefully to a bare frame rather than an error.Generic
read_dataset()/write_dataset()dispatch on the file extension, withread_xpt()/write_xpt()and the matchingread_json(),read_ndjson(),read_parquet(), andread_rds()pairs as direct entry points. Cross-cuttingencoding,checks, andcreatedarguments flow through....Partial reads (
col_select,n_max) on every reader; gzip-transparent JSON and NDJSON; multi-member SAS XPORT libraries viaxpt_members()plusread_xpt(member = ).Numeric fidelity is exact end to end: a
decimalvalue is exchanged as a string at IEEE round-trip precision,integervalues beyond R’s 32-bit range stay numeric rather than overflowing, andNaN/ infinite values are rejected as invalid CDISC numerics. Rows are sorted in C-locale (byte) order, so a written file is deterministic across locales and matches SASPROC SORTfor ASCII keys.Encodings follow the IANA and SAS standards: the readers and writers accept a charset name in either the SAS or R spelling (see
artoo_encodings()), character columns are transcoded to UTF-8 and NFC-normalized on read, and theon_invalid = c("error", "replace", "ignore")policy governs invalid bytes uniformly across every writer.
Specifications
artoo_spec()builds the canonical metadata model from a Pinnacle 21 Excel workbook, a Define-XML 2.0 / 2.1 file, or a native artoo JSON spec.read_spec()/write_spec()dispatch on the file extension:.xlsxwrites a Pinnacle 21 workbook (Define-XML to P21 is one composition), and the native JSON form is the lossless interchange that round-trips a spec identically.The spec is single-standard by construction:
@standardis resolved once from the explicit argument or the source, and study-level fields are canonicalized to the CDISC ODM vocabulary. Accessors includespec_standard(),spec_variables(),spec_codelists(),spec_methods(), andspec_comments().set_type()returns a spec with one or more variables retyped through the CDISC vocabulary;repair_spec()retypes every variable acheck_spec()run flags as fractional or out-of-range under anintegerdata type, so a frame the original spec would refuse coerces after one call.
Conform and check
apply_spec(x, spec, dataset, conformance = , na_position = )coerces each column to its CDISC data type, orders the columns and sorts the rows by the spec’s keys, and stamps theartoo_meta.extra = c("keep", "drop")controls whether undeclared columns survive;on_coercion_loss = c("error", "keep")governs a coercion that would lose data. The pipeline never silently fabricates or drops a column: an undeclared column is reported and kept, a declared-but-absent column is reported and left absent.check_spec()validates a data frame against its spec across conformance dimensions toggled byartoo_checks();check_study()runs it over a whole study and returns one stacked findings frame;conformance()reads the findings back off a stamped frame.validate_spec()checks a spec for internal consistency against a bundled rule catalog, with no external dependency.decode_column()translates coded values to or from their codelist decodes;sync_meta()reconciles a stamped frame’s metadata after manual edits.
Inspect
-
members()is the format-neutral inventory of the dataset(s) a path holds, one row per dataset, dispatched through the codec registry.columns()is the SAS PROC CONTENTS / Universal Viewer variable pane over a stamped frame, a plain data frame, or a file path.get_meta()/set_meta()read and attach theartoo_meta.
Errors
- Every condition artoo raises carries a three-level class chain —
artoo_<severity>_<kind>,artoo_<severity>,artoo_condition— so a handler can catch a specific kind, a whole severity, or every artoo condition. The data-protection conditions attach their evidence as data (cnd$variables,cnd$findings) for programmatic inspection.
Data
- Bundled demo specs
adam_spec(ADaMIG 1.1) andsdtm_spec(SDTMIG 3.1.2), built reproducibly from the official CDISC Define-XML 2.1 release examples and shipped also as Pinnacle 21 workbooks underinst/extdata/. Demo datasets come from the PHUSE Test Data Factory; the constructor tablescdisc_adam_datasets/cdisc_adam_variables,cdisc_sdtm_datasets/cdisc_sdtm_variables, and the sharedcdisc_codelistsbuild a spec by hand. Every bundled dataset conforms to its bundled spec, gated at build and test time.
Documentation
- An introductory
vignette("artoo")plus task-oriented web articles (specifications; conform and validate; formats and lossless conversion; recipes), and a pkgdown reference site.