libcudf  24.02.00
cudf::io::parquet_chunked_writer Class Reference

Chunked parquet writer class to handle options and write tables in chunks.

#include <parquet.hpp>

Public Member Functions

 parquet_chunked_writer ()=default
 Default constructor. This should never be used; it exists only to satisfy Cython.
 
 parquet_chunked_writer (chunked_parquet_writer_options const &options, rmm::cuda_stream_view stream=cudf::get_default_stream())
 Constructor with chunked writer options.
 
parquet_chunked_writer & write (table_view const &table, std::vector< partition_info > const &partitions={})
 Writes table to output.
 
std::unique_ptr< std::vector< uint8_t > > close (std::vector< std::string > const &column_chunks_file_paths={})
 Finishes the chunked/streamed write process.
 

Public Attributes

std::unique_ptr< parquet::detail::writer > writer
 Unique pointer to impl writer class.
 

Detailed Description

Chunked parquet writer class to handle options and write tables in chunks.

The parquet_chunked_writer allows an arbitrarily large number of rows to be written to a single parquet file in multiple passes.

The following code snippet demonstrates how to write a single parquet file containing one logical table by writing a series of individual cudf::tables.

auto destination = cudf::io::sink_info("dataset.parquet");
auto options = cudf::io::chunked_parquet_writer_options::builder(destination).build();
cudf::io::parquet_chunked_writer writer(options);
writer.write(table0);
writer.write(table1);
writer.close();

Definition at line 1777 of file parquet.hpp.
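
The multi-pass pattern described above can be sketched as a small generic helper. This is a minimal sketch, not part of libcudf: write_all_chunks and recording_writer are hypothetical names introduced here for illustration. The helper only assumes a writer type that exposes write(chunk) and close(), which the member list on this page suggests parquet_chunked_writer satisfies when the chunk type is cudf::table_view. The stand-in writer lets the sketch compile and run without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: writes every chunk through `writer`, then calls close().
// `Writer` is any type with write(chunk) and close(); the assumption is that
// cudf::io::parquet_chunked_writer fits when `Chunk` is cudf::table_view.
template <typename Writer, typename Chunk>
std::size_t write_all_chunks(Writer& writer, std::vector<Chunk> const& chunks)
{
  std::size_t written = 0;
  for (auto const& chunk : chunks) {
    writer.write(chunk);  // one pass per chunk
    ++written;
  }
  writer.close();  // finishes the chunked write after the last chunk
  return written;
}

// Hypothetical stand-in writer, used only to exercise the helper without libcudf.
struct recording_writer {
  std::vector<int> seen;
  bool closed = false;

  recording_writer& write(int chunk)
  {
    assert(!closed);  // in this stand-in, writing after close() is a usage error
    seen.push_back(chunk);
    return *this;  // mirrors the reference return of write() documented on this page
  }

  void close() { closed = true; }
};
```

With libcudf available, the same helper would presumably be called with a parquet_chunked_writer and a std::vector of cudf::table_view; the return value of close() is ignored here, so a caller that needs the metadata blob should call close() directly instead.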

Constructor & Destructor Documentation

◆ parquet_chunked_writer()

cudf::io::parquet_chunked_writer::parquet_chunked_writer ( chunked_parquet_writer_options const &  options,
rmm::cuda_stream_view  stream = cudf::get_default_stream() 
)

Constructor with chunked writer options.

Parameters
[in]  options  Options used to write table
[in]  stream   CUDA stream used for device memory operations and kernel launches

Member Function Documentation

◆ close()

std::unique_ptr<std::vector<uint8_t> > cudf::io::parquet_chunked_writer::close ( std::vector< std::string > const &  column_chunks_file_paths = {})

Finishes the chunked/streamed write process.

Parameters
[in]  column_chunks_file_paths  Column chunks file path to be set in the raw output metadata
Returns
A parquet-compatible blob that contains the data for all row groups in the list. The blob is returned only if column_chunks_file_paths is provided; otherwise the result is null.

◆ write()

parquet_chunked_writer& cudf::io::parquet_chunked_writer::write ( table_view const &  table,
std::vector< partition_info > const &  partitions = {} 
)

Writes table to output.

Parameters
[in]  table       Table that needs to be written
[in]  partitions  Optional partitions to divide the table into. If specified, must be same size as number of sinks.
Exceptions
cudf::logic_error  If the number of partitions is not the same as number of sinks
rmm::bad_alloc     If there is insufficient space for temporary buffers
Returns
A reference to this writer object
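
The rule that partitions must match the number of sinks can be illustrated with a helper that produces exactly one row range per sink. This is a sketch under stated assumptions: partition_range is a hypothetical stand-in for cudf::io::partition_info, assumed here to carry a starting row and a row count, and make_even_partitions is not a libcudf function. Check the partition_info definition in io/types.hpp before relying on the field layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for cudf::io::partition_info (assumed: start row + row count).
struct partition_range {
  std::int32_t start_row;
  std::int32_t num_rows;
};

// Splits `total_rows` rows into `num_sinks` contiguous ranges, one per sink.
// Earlier ranges absorb the remainder, so range sizes differ by at most one row.
inline std::vector<partition_range> make_even_partitions(std::int32_t total_rows,
                                                         std::int32_t num_sinks)
{
  assert(total_rows >= 0 && num_sinks > 0);

  std::vector<partition_range> out;
  out.reserve(static_cast<std::size_t>(num_sinks));

  std::int32_t const base  = total_rows / num_sinks;
  std::int32_t const extra = total_rows % num_sinks;

  std::int32_t start = 0;
  for (std::int32_t i = 0; i < num_sinks; ++i) {
    std::int32_t const rows = base + (i < extra ? 1 : 0);
    out.push_back({start, rows});
    start += rows;
  }
  return out;  // out.size() == num_sinks, matching the documented requirement
}
```

For example, make_even_partitions(10, 3) yields the ranges (0, 4), (4, 3) and (7, 3). Because the result always has one entry per sink, a vector built this way would avoid the cudf::logic_error condition listed above, provided the ranges are converted to the real partition_info type.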

The documentation for this class was generated from the following file: