Module rustacuda::memory
Access to CUDA's memory allocation and transfer functions.
The memory module provides a safe wrapper around CUDA's memory allocation and transfer functions. This includes access to device memory, unified memory, and page-locked host memory.
Device Memory
Device memory is just what it sounds like - memory allocated on the device. Device memory
cannot be accessed from the host directly, but data can be copied to and from the device.
RustaCUDA exposes device memory through the DeviceBox and DeviceBuffer structures. Pointers
to device memory are represented by DevicePointer, while slices in device memory are
represented by DeviceSlice.
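As a rough sketch of these types in use (assuming a CUDA context has already been created, e.g. with rustacuda::quick_init(), and with error handling abbreviated):

use rustacuda::memory::*;

fn device_memory_example() -> Result<(), Box<dyn std::error::Error>> {
    // A CUDA context must be active on this thread; quick_init() creates one for the first device.
    let _ctx = rustacuda::quick_init()?;

    // Copy a single value to the device and back with DeviceBox.
    let x = DeviceBox::new(&5.0f64)?;
    let mut host_value = 0.0f64;
    x.copy_to(&mut host_value)?;
    assert_eq!(host_value, 5.0);

    // Copy a slice of values to the device and back with DeviceBuffer.
    let buffer = DeviceBuffer::from_slice(&[1u64, 2, 3, 4])?;
    let mut host_values = [0u64; 4];
    buffer.copy_to(&mut host_values)?;
    assert_eq!(host_values, [1, 2, 3, 4]);

    Ok(())
}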
Unified Memory
Unified memory is a memory allocation which can be read from and written to by both the host
and the device. When the host (or device) attempts to access a page of unified memory, it is
seamlessly transferred from host RAM to device RAM or vice versa. The programmer may also
choose to explicitly prefetch data to one side or another (though this is not currently exposed
through RustaCUDA). RustaCUDA exposes unified memory through the UnifiedBox and UnifiedBuffer
structures, and pointers to unified memory are represented by UnifiedPointer. Since unified
memory is accessible to the host, slices in unified memory are represented by normal Rust
slices.
Unified memory is generally easier to use than device memory, but there are drawbacks. It is possible to allocate more memory than is available on the card, and this can result in very slow paging behavior. Additionally, it can require careful use of prefetching to achieve optimum performance. Finally, unified memory is not supported on some older systems.
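As a brief sketch (again assuming an initialized CUDA context), unified memory behaves like ordinary host memory on the Rust side:

use rustacuda::memory::*;

fn unified_memory_example() -> Result<(), Box<dyn std::error::Error>> {
    let _ctx = rustacuda::quick_init()?;

    // A single value in unified memory, usable much like a Box on the host.
    let mut x = UnifiedBox::new(5u64)?;
    *x = 10;
    assert_eq!(*x, 10);

    // A buffer of values in unified memory; it dereferences to a normal Rust slice.
    let mut buffer = UnifiedBuffer::new(&0u64, 5)?;
    buffer[0] = 42;
    assert_eq!(buffer[0], 42);

    Ok(())
}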
Page-locked Host Memory
Page-locked memory is memory that the operating system has locked into physical RAM, and will
not page out to disk. When copying data from the process' memory space to the device, the CUDA
driver needs to first copy the data to a page-locked region of host memory, then initiate a DMA
transfer to copy the data to the device itself. Likewise, when transferring from device to host,
the driver copies the data into page-locked host memory then into the normal memory space. This
extra copy can be eliminated if the data is loaded or generated directly into page-locked
memory. RustaCUDA exposes page-locked memory through the LockedBuffer struct.
For example, if the programmer needs to read an array of bytes from disk and transfer it to the
device, it would be best to create a LockedBuffer, load the bytes directly into the
LockedBuffer, and then copy them to a DeviceBuffer. If the bytes are already in a Vec<u8>,
there would be no advantage to using a LockedBuffer.
However, since the OS cannot page out page-locked memory, excessive use can slow down the entire system (including other processes) as physical RAM is tied up. Therefore, page-locked memory should be used sparingly.
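A minimal sketch of that read-from-disk pattern, assuming an initialized context (load_bytes_from_disk is a hypothetical helper, and the buffer size is arbitrary):

use rustacuda::memory::*;

// Hypothetical helper that fills the destination slice with bytes read from disk.
fn load_bytes_from_disk(dest: &mut [u8]) {
    dest.fill(0); // placeholder: a real implementation would read file contents here
}

fn staged_upload() -> Result<(), Box<dyn std::error::Error>> {
    let _ctx = rustacuda::quick_init()?;

    // Allocate page-locked host memory and load the data directly into it.
    let mut staging = LockedBuffer::new(&0u8, 1024 * 1024)?;
    load_bytes_from_disk(&mut staging[..]);

    // Copy straight from the page-locked buffer to the device, avoiding the
    // driver's internal staging copy.
    let mut device_data: DeviceBuffer<u8> = unsafe { DeviceBuffer::uninitialized(staging.len())? };
    device_data.copy_from(&staging[..])?;

    Ok(())
}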
FFI Information
The internal representations of DevicePointer<T> and UnifiedPointer<T> are guaranteed to be
the same as *mut T and they can be safely passed through an FFI boundary to code expecting
raw pointers (though keep in mind that device-only pointers cannot be dereferenced on the CPU).
This is important when launching kernels written in C.
As with regular Rust, all other types (e.g. DeviceBuffer or UnifiedBox) are not FFI-safe.
Their internal representations are not guaranteed to be anything in particular, and are not
guaranteed to be the same in different versions of RustaCUDA. If you need to pass them through
an FFI boundary, you must convert them to FFI-safe primitives yourself. For example, with
UnifiedBuffer, use the as_unified_ptr() and len() functions to get the primitives, and
mem::forget() the buffer so that it isn't dropped. Again, as with regular Rust, the caller is
responsible for reconstructing the UnifiedBuffer using from_raw_parts() and dropping it to
ensure that the memory allocation is safely cleaned up.
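A rough sketch of that decompose-and-reconstruct pattern, using only the functions named above (the foreign call itself is elided, and an initialized context is assumed):

use std::mem;
use rustacuda::memory::*;

fn pass_buffer_over_ffi() -> Result<(), Box<dyn std::error::Error>> {
    let _ctx = rustacuda::quick_init()?;

    let mut buffer = UnifiedBuffer::new(&0u64, 1024)?;

    // Decompose the buffer into FFI-safe primitives.
    let ptr = buffer.as_unified_ptr();
    let len = buffer.len();
    mem::forget(buffer); // prevent the destructor from freeing the allocation

    // ... pass `ptr` (same representation as *mut u64) and `len` across the FFI boundary ...

    // Reconstruct the buffer afterwards so the allocation is cleaned up correctly.
    let buffer = unsafe { UnifiedBuffer::from_raw_parts(ptr, len) };
    drop(buffer);

    Ok(())
}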
Modules
array | Routines for allocating and using CUDA Array Objects. |
Structs
DeviceBox | A pointer type for heap-allocation in CUDA device memory. |
DeviceBuffer | Fixed-size device-side buffer. Provides basic access to device memory. |
DeviceChunks | An iterator over a DeviceSlice in (non-overlapping) chunks. |
DeviceChunksMut | An iterator over a DeviceSlice in (non-overlapping) mutable chunks. |
DevicePointer | A pointer to device memory. |
DeviceSlice | Fixed-size device-side slice. |
LockedBuffer | Fixed-size host-side buffer in page-locked memory. |
UnifiedBox | A pointer type for heap-allocation in CUDA unified memory. |
UnifiedBuffer | Fixed-size buffer in unified memory. |
UnifiedPointer | A pointer to unified memory. |
Traits
AsyncCopyDestination | Sealed trait implemented by types which can be the source or destination when copying data asynchronously to/from the device or from one device allocation to another. |
CopyDestination | Sealed trait implemented by types which can be the source or destination when copying data to/from the device or from one device allocation to another. |
DeviceCopy | Marker trait for types which can safely be copied to or from a CUDA device. |
Functions
cuda_free⚠ | Free memory allocated with cuda_malloc. |
cuda_free_locked⚠ | Free page-locked memory allocated with cuda_malloc_locked. |
cuda_free_unified⚠ | Free memory allocated with cuda_malloc_unified. |
cuda_malloc⚠ | Unsafe wrapper around the cuMemAlloc function. |
cuda_malloc_locked⚠ | Unsafe wrapper around the cuMemAllocHost function. |
cuda_malloc_unified⚠ | Unsafe wrapper around the cuMemAllocManaged function. |