libcuspatial
24.02.00
|
This document serves as a guide for contributors to libcuspatial C++ code. Developers should also refer to these additional files for further documentation of libcuspatial best practices.
libcuspatial is a C++ library that provides GPU-accelerated data-parallel algorithms for processing geospatial and spatiotemporal data. libcuspatial provides various spatial relationship algorithms including distance computation, containment (e.g. point-in-polygon testing), bounding box computations, and spatial indexing.
libcuspatial currently has two interfaces. The first is a C++ API based on data types from libcudf, (the CUDA Dataframe library C++ API). In this document we refer to it as the "column-based API". The column-based API represents spatial data as tables of type-erased columns.
The second API is the cuSpatial header-only C++ API, which is independent of libcudf and represents data as arrays of structures (e.g. 2D points). The header-only API uses iterators for input and output, and is similar in style to the C++ Standard Template Library (STL) and Thrust.
This section defines terminology used within libcuspatial. For terms specific to libcudf, such as Column, Table, etc., see the libcudf developer guide.
TODO: add terms
External/public libcuspatial APIs are grouped based on functionality into an appropriately titled header file in cuspatial/cpp/include/cuspatial/
. For example, cuspatial/cpp/include/cuspatial/distance.hpp
contains the declarations of public API functions related to distance computations. Note the .hpp
file extension used to indicate a C++ header file that can be included from a .cpp
source file.
Header files should use the #pragma once
include guard.
The folder that contains the source files that implement an API should be named consistently with the name of the of the header for the API. For example, the implementation of the APIs found in cuspatial/cpp/include/cuspatial/trajectory.hpp
are located in cuspatial/cpp/src/trajectory
. This rule obviously does not apply to the header-only API, since the headers are the source files.
Likewise, unit tests and benchmarks reside in folders corresponding to the names of the API headers, e.g. distance.hpp tests are in cuspatial/cpp/tests/distance/
and benchmarks are in cuspatial/cpp/benchmarks/distance/
.
Internal API headers containing detail
namespace definitions that are used across translation units inside libcuspatial should be placed in include/cuspatial/detail
.
Header-only API files and column-based API headers are stored together in include/cuspatial
. The former use the .cuh
extension because they almost universally require CUDA compilation. The latter use the .hpp
extension because they can be compiled with a standard C++ compiler.
.hpp
: C++ header files.cpp
: C++ source files.cu
: CUDA C++ source files.cuh
: Headers containing CUDA device codeOnly use .cu
and .cuh
if necessary. A good indicator is the inclusion of __device__
and other symbols that are only recognized by nvcc
. Another indicator is Thrust algorithm APIs with a device execution policy (always rmm::exec_policy
in libcuspatial).
libcuspatial code uses snake_case for all names except in a few cases: template parameters, unit tests and test case names may use Pascal case, aka UpperCamelCase. We do not use Hungarian notation, except sometimes when naming device data variables and their corresponding host copies (e.g. d_data
and h_data
). Private member variables are typically prefixed with an underscore.
Examples:
C++ formatting is enforced using clang-format
. You should configure clang-format
on your machine to use the cuspatial/cpp/.clang-format
configuration file, and run clang-format
on all changed code before committing it. The easiest way to do this is to configure your editor to "format on save", or to use pre-commit
.
Aspects of code style not discussed in this document and not automatically enforceable are typically caught during code review, or not enforced.
In general, we recommend following C++ Core Guidelines. We also recommend watching Sean Parent's C++ Seasoning talk, and we try to follow his rules: "No raw loops. No raw pointers. No raw synchronization primitives." We also wherever possible add a fourth rule: "No raw kernels".
std::shared_ptr
and std::unique_ptr
) to raw pointers.Documentation is discussed in the Documentation Guide.
Prefer algorithms over raw loops wherever possible, as mentioned above. However, avoiding raw loops is not always possible. C++ range-based for loops can make raw loops much clearer, and cuSpatial uses Ranger for this purpose. Ranger provides range helpers with iterators that can be passed to range-based for loops. Of special importance is ranger::grid_stride_range()
, which can be used to iterate over a range in parallel using all threads of a CUDA grid.
When writing custom kernels, grid stride ranges help ensure kernels are adaptable to a variety of grid shapes, most notably when there are fewer total threads than there are data items. Instead of:
A grid-stride loop ensures all of data is processed even if there are fewer than n threads:
With ranger, the code is even clearer and less error prone:
The following guidelines apply to organizing #include
lines.
clang-format
will respect the groupings and sort the individual includes within a group lexicographically.<thrust/...>
, then includes from dependencies installed with cuSpatial, and then standard library headers (for example <string>
, <iostream>
).<>
instead of ""
unless the header is in the same directory as the source file.clangd
often auto-insert includes when they can, but they usually get the grouping and brackets wrong."
to include local headers from the same relative source directory. This should only occur in source files and non-public header files. Otherwise use angle brackets <>
around included header filenames...
when possible. Paths with ..
are necessary when including (internal) headers from source paths not in the same directory as the including file, because source paths are not passed with -I
.src
directories in tests or in libcuspatial public headers. If you find yourself doing this, start a discussion about moving (parts of) the included internal header to a public header.The header-only libcuspatial API is agnostic to the type of containers used by the application to hold its data, because the header-only API is based on iterators (see Iterator Requirements). The cuDF-based cuSpatial API, on the other hand, uses cuDF Columns and Tables to store and access application data.
See the libcudf Developer guide for more information on cuDF data structures, including views.
Resource ownership is an essential concept in libcudf, and therefore in the cuDF-based libcuspatial API. In short, an "owning" object owns a resource (such as device memory). It acquires that resource during construction and releases the resource in destruction (RAII). A "non-owning" object does not own resources. Any class in libcudf with the *_view
suffix is non-owning. For more detail see the libcudf++
presentation.
cuDF-based libcuspatial functions typically take views as input (column_view
or table_view
) and produce unique_ptr
s to owning objects as output. For example,
rmm::device_memory_resource
)libcuspatial allocates all device memory via RMM memory resources (MR). See the RMM documentation for details.
RMM provides a "default" memory resource for each device that can be accessed and updated via the rmm::mr::get_current_device_resource()
and rmm::mr::set_current_device_resource(...)
functions, respectively. All memory resource parameters should be defaulted to use the return value of rmm::mr::get_current_device_resource()
.
This section provides specifics about the structure and implementation of cuSpatial API functions.
libcuspatial's column-based API is designed to integrate seamlessly with other RAPIDS libraries, notably cuDF. To that end, this API uses cudf::column
and cudf::table
data structures as input and output. This enables cuSpatial to provide Python and other language APIs (e.g. Java) that integrate seamlessly with the APIs of other RAPIDS libraries like cuDF and cuML. This allows users to integrate spatial data queries and transformations into end-to-end GPU-accelerated data analytics and machine learning workflows.
The preferred style for passing input to and returning output from column-based API functions is the following:
column_view const&
table_view const&
scalar const&
const&
mutable_column_view&
mutable_table_view&
std::unique_ptr<column>
std::unique_ptr<table>
std::unique_ptr<scalar>
Here is an example column-based API function.
key points:
cudf::column_view
. This is a type-erased container so determining the type of data must be done at run time.unique_ptr<cudf::column>
.detail
version of the API that takes a stream. This follows libcudf, and may change in the future.Sometimes it is necessary for functions to have multiple outputs. There are a few ways this can be done in C++ (including creating a struct
for the output). One convenient way to do this is using std::tie
and std::pair
. Note that objects passed to std::pair
will invoke either the copy constructor or the move constructor of the object, and it may be preferable to move non-trivially copyable objects (and required for types with deleted copy constructors, like std::unique_ptr
).
Multiple column outputs that are functionally related (e.g. x- and y-coordinates), should be combined into a table
.
Note: std::tuple
could be used if not for the fact that Cython does not support std::tuple
Therefore, libcuspatial public column-based APIs must use std::pair
, and are therefore limited to return only two objects of different types. Multiple objects of the same type may be returned via a std::vector<T>
.
Alternatively, C++17 structured binding may be used to disaggregate multiple return values:
Note that the compiler might not support capturing aliases defined in a structured binding in a lambda. One may work around this by using a capture with an initializer instead:
For C++ users and developers who do not also use libcudf or other RAPIDS APIS, depending on libcudf could be a barrier to adoption of libcuspatial. libcudf is a very large library and building it takes a lot of time.
Therefore, libcuspatial provides a header-only C++ API that does not depend on libcudf. The header-only API is an iterator-based interface. This has a number of advantages.
The main disadvantages of this type of API are
The column-based C++ API is a simple layer above the header-only API. This approach protects column-based API users from the disadvantages while maintaining the advantages for users of the header-only API.
All array inputs and outputs are iterator type templates to enable generic application of the APIs. An example function is helpful.
There are a few key points to notice.
std::transform
.cuspatial::vec_2d<T>
type (include/cuspatial/vec_2d.hpp). This is enforced using a static_assert
in the function body.T
) that is by default equal to the base value_type
of the type iterated over by LonLatItA
. libcuspatial provides the iterator_vec_base_type
trait helper for this.a_lonlat_first
and a_lonlat_last
). This mirrors STL APIs.std::transform
, even though as with transform
, many uses of cuSpatial APIs will not need to use this returned iterator.rmm::device_memory_resource
to use for output memory allocation.Whenever possible in the header-only API, output data should be written to output iterators that reference data allocated by the caller of the API. In this case, multiple "return values" are simply written to multiple output iterators. Typically such APIs return an iterator one past the end of the primary output iterator (in the style of std::transform()
.
In functions where the output size is data dependent, the API may allocate the output data and return it as a rmm::device_uvector
or other data structure containing device_uvector
s.
All input and output iterators must be device-accessible with random access. They must satisfy the requirements of C++ LegacyRandomAccessIterator. Output iterators must be mutable.
CUDA streams are not yet exposed in public column-based libcuspatial APIs. header-only libcuspatial APIs that execute GPU work or allocate GPU memory should take a stream parameter.
In order to ease the transition to future use of streams in the public column-based API, all libcuspatial APIs that allocate device memory or execute GPU work (including kernels, Thrust algorithms, or anything that can take a stream) should be implemented using asynchronous APIs on the default stream (e.g., stream 0).
The recommended pattern for doing this is to make the definition of the external API invoke an internal API in the detail
namespace. The internal detail
API has the same parameters as the public API, plus a rmm::cuda_stream_view
parameter at the end with no default value. If the detail API also accepts a memory resource parameter, the stream parameter should be ideally placed just before the memory resource. The public API will call the detail API and provide rmm::cuda_stream_default
. The implementation should be wholly contained in the detail
API definition and use only asynchronous versions of CUDA APIs with the stream parameter.
In order to make the detail
API callable from other libcuspatial functions, it may be exposed in a header placed in the cuspatial/cpp/include/detail/
directory.
For example:
Note: It is important to synchronize the stream if and only if it is necessary. For example, when a non-pointer value is returned from the API that is the result of an asynchronous device-to-host copy, the stream used for the copy should be synchronized before returning. However, when a column is returned, the stream should not be synchronized because doing so will break asynchrony if and when we add an asynchronous API to libcuspatial.
Note: cudaDeviceSynchronize()
should never be used. This limits the ability to do any multi-stream/multi-threaded work with libcuspatial APIs.
In order to aid in performance optimization and debugging, all compute intensive libcuspatial functions should have a corresponding NVTX range. In libcuspatial, we have a convenience macro CUSPATIAL_FUNC_RANGE()
that automatically annotates the lifetime of the enclosing function and uses the function's name as the name of the NVTX range. For more information about NVTX, see here.
(Note: cuSpatial has not yet had the need for internal stream creation.) The following guidance is copied from libcudf's documentation. There may be times in implementing libcuspatial features where it would be advantageous to use streams internally, i.e., to accomplish overlap in implementing an algorithm. However, dynamically creating a stream can be expensive. RMM has a stream pool class to help avoid dynamic stream creation. However, this is not yet exposed in libcuspatial, so for the time being, libcuspatial features should avoid creating streams (even if it is slightly less efficient). It is a good idea to leave a // TODO:
note indicating where using a stream would be beneficial.
Device memory resources are used in libcuspatial to abstract and control how device memory is allocated.
Any libcuspatial API that allocates memory that is returned to a user must accept a pointer to a device_memory_resource
as the last parameter. Inside the API, this memory resource must be used to allocate any memory for returned objects. It should therefore be passed into functions whose outputs will be returned. Example:
Not all memory allocated within a libcuspatial API is returned to the caller. Often algorithms must allocate temporary, scratch memory for intermediate results. Always use the default resource obtained from rmm::mr::get_current_device_resource()
for temporary memory allocations. Example:
libcuspatial code eschews raw pointers and direct memory allocation. Use RMM classes built to use device_memory_resource
for device memory allocation with automated lifetime management.
Allocates a specified number of bytes of untyped, uninitialized device memory using a device_memory_resource
. If no resource is explicitly provided, uses rmm::mr::get_current_device_resource()
.
rmm::device_buffer
is movable and copyable on a stream. A copy performs a deep copy of the device_buffer
's device memory on the specified stream, whereas a move moves ownership of the device memory from one device_buffer
to another.
Allocates a single element of the specified type initialized to the specified value. Use this for scalar input/outputs into device kernels, e.g., reduction results, null count, etc. This is effectively a convenience wrapper around a rmm::device_vector<T>
of length 1.
Allocates a specified number of elements of the specified type. If no initialization value is provided, all elements are default initialized (this incurs a kernel launch).
Note: (TODO: this not true yet in libcuspatial but we should strive for it. The following is copied from libcudf's developer guide.) We have removed all usage of rmm::device_vector
and thrust::device_vector
from libcuspatial, and you should not use it in new code in libcuspatial without careful consideration. Instead, use rmm::device_uvector
along with the utility factories in device_factories.hpp
. These utilities enable creation of uvector
s from host-side vectors, or creating zero-initialized uvector
s, so that they are as convenient to use as device_vector
. Avoiding device_vector
has a number of benefits, as described in the following section on rmm::device_uvector
.
Similar to a device_vector
, allocates a contiguous set of elements in device memory but with key differences:
T
to trivially copyable types.cuda_stream_view
specifying the stream on which the operation is performed). This improves safety when using non-default streams.device_uvector.hpp
does not include any __device__
code, unlike thrust/device_vector.hpp
, which means device_uvector
s can be used in .cpp
files, rather than just in .cu
files.All public libcuspatial APIs should be placed in the cuspatial
namespace. Example:
The top-level cuspatial
namespace is sufficient for most of the public API. However, to logically group a broad set of functions, further namespaces may be used.
Many functions are not meant for public use, so place them in either the detail
or an anonymous namespace, depending on the situation.
Functions or objects that will be used across multiple translation units (i.e., source files), should be exposed in an internal header file and placed in the detail
namespace. Example:
Functions or objects that will only be used in a single translation unit should be defined in an anonymous namespace in the source file where it is used. Example:
Anonymous namespaces should never be used in a header file.
libcuspatial is constantly evolving to improve performance and better meet our users' needs. As a result, we occasionally need to break or entirely remove APIs to respond to new and improved understanding of the functionality we provide. Remaining free to do this is essential to making libcuspatial an agile library that can rapidly accommodate our users needs. As a result, we do not always provide a warning or any lead time prior to releasing breaking changes. On a best effort basis, the libcuspatial team will notify users of changes that we expect to have significant or widespread effects.
Where possible, indicate pending API removals using the deprecated attribute and document them using Doxygen's deprecated command prior to removal. When a replacement API is available for a deprecated API, mention the replacement in both the deprecation message and the deprecation documentation. Pull requests that introduce deprecations should be labeled "deprecation" to facilitate discovery and removal in the subsequent release.
Advertise breaking changes by labeling any pull request that breaks or removes an existing API with the "breaking" tag. This ensures that the "Breaking" section of the release notes includes a description of what has broken from the past release. Label pull requests that contain deprecations with the "non-breaking" tag.
libcuspatial follows conventions (and provides utilities) enforcing compile-time and run-time conditions and detecting and handling CUDA errors. Communication of errors is always via C++ exceptions.
Use the CUSPATIAL_EXPECTS
macro to enforce runtime conditions necessary for correct execution.
Example usage:
The first argument is the conditional expression expected to resolve to true
under normal conditions. If the conditional evaluates to false
, then an error has occurred and an instance of cuspatial::logic_error
is thrown. The second argument to CUSPATIAL_EXPECTS
is a short description of the error that has occurred and is used for the exception's what()
message.
There are times where a particular code path, if reached, should indicate an error no matter what. For example, often the default
case of a switch
statement represents an invalid alternative. Use the CUSPATIAL_FAIL
macro for such errors. This is effectively the same as calling CUSPATIAL_EXPECTS(false, reason)
.
Example:
Use the CUSPATIAL_CUDA_TRY
macro to check for the successful completion of CUDA runtime API functions. This macro throws a cuspatial::cuda_error
exception if the CUDA API return value is not cudaSuccess
. The thrown exception includes a description of the CUDA error code in its what()
message.
Example:
Use static_assert
to enforce compile-time conditions. For example,
Columns may contain data of a number of types. cuDF supports a variety of types that are not used in cuSpatial. cuSpatial functions mostly operate on numeric and timestamp data. For more information on libcudf data types see the libcudf developer guide.
cudf::column
stores data (for columns and scalars) "type erased" in void*
device memory. This type-erasure enables interoperability with other languages and type systems, such as Python and Java. In order to determine the type, functions must use the run-time information stored in the column type()
to reconstruct the data type T
by casting the void*
to the appropriate T*
.
This so-called type dispatch is pervasive throughout libcudf and the column-based libcuspatial API. The cudf::type_dispatcher
is a central utility that automates the process of mapping the runtime type information in data_type
to a concrete C++ type. See the libcudf developer guide for more information.