public class Parquet
extends java.lang.Object
Reads and writes parquet files using the standard hadoop parquet library. One aspect that may be confusing: when writing files, the parquet system decides when to end a record batch, so a single dataset may end up as a single parquet file with many record batches.
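The batching behaviour described above can be illustrated with a toy sketch. It is not the parquet library's logic. `RecordBatchSketch`, `batchSizes`, and the row-count threshold `maxRowsPerBatch` are hypothetical names made up for this illustration, and a fixed row count is assumed here only to keep the example small (the real writer ends a record batch, which parquet calls a row group, based on how much data it has buffered).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: real parquet writers end a record batch (row group) based on
// buffered data size, not on a fixed row count as assumed here.
class RecordBatchSketch {
    // Splits totalRows into consecutive batch sizes of at most maxRowsPerBatch.
    // maxRowsPerBatch is a hypothetical stand-in for the writer's own decision.
    static List<Integer> batchSizes(int totalRows, int maxRowsPerBatch) {
        if (totalRows < 0 || maxRowsPerBatch <= 0) {
            throw new IllegalArgumentException(
                "totalRows must be >= 0 and maxRowsPerBatch must be > 0");
        }
        List<Integer> sizes = new ArrayList<>();
        for (int remaining = totalRows; remaining > 0; remaining -= maxRowsPerBatch) {
            sizes.add(Math.min(remaining, maxRowsPerBatch));
        }
        return sizes;
    }

    public static void main(String[] args) {
        // One 10-row dataset written to one file, with batches capped at 4 rows.
        System.out.println(batchSizes(10, 4)); // prints [4, 4, 2]
    }
}
```

The point of the sketch is that the caller hands over one dataset and gets one file, but the number of batches inside that file is decided by the writer rather than by the caller.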
Note that in the required dependencies I exclude org.slf4j/slf4j-log4j12, the slf4j binding for log4j. tmd comes with logback-classic by default, which is less featureful but far less of a security disaster than log4j. If you have a setup that already uses a different slf4j backend, then you should exclude logback-classic from tmd’s dependencies.
You must disable debug logging; otherwise the parquet system is unreasonably slow. See the logging section of the parquet namespace.
Required dependencies:
org.apache.parquet/parquet-hadoop {:mvn/version "1.12.0"
                                   :exclusions [org.slf4j/slf4j-log4j12]}
org.apache.hadoop/hadoop-common {:mvn/version "3.3.0"
                                 :exclusions [org.slf4j/slf4j-log4j12]}
;; We literally need this for 1 POJO formatting object.
org.apache.hadoop/hadoop-mapreduce-client-core {:mvn/version "3.3.0"
                                                :exclusions [org.slf4j/slf4j-log4j12]}
Modifier and Type | Method and Description |
---|---|
static void | datasetSeqToParquet(java.lang.Iterable dsSeq, java.lang.String path, java.lang.Object options) |
static void | datasetToParquet(java.lang.Object ds, java.lang.String path, java.lang.Object options) |
static java.lang.Iterable | parquetMetadata(java.lang.String path) |
static java.util.Map | parquetToDataset(java.lang.String path, java.lang.Object options) |
static java.lang.Iterable | parquetToDatasetSeq(java.lang.String path, java.lang.Object options) |
public static java.lang.Iterable parquetMetadata(java.lang.String path)
public static java.util.Map parquetToDataset(java.lang.String path, java.lang.Object options)
public static java.lang.Iterable parquetToDatasetSeq(java.lang.String path, java.lang.Object options)
public static void datasetToParquet(java.lang.Object ds, java.lang.String path, java.lang.Object options)
public static void datasetSeqToParquet(java.lang.Iterable dsSeq, java.lang.String path, java.lang.Object options)