Image & Memory
PyImageCUDA provides two image types and flexible memory management for GPU buffers.
Image Types
Image - Float32 Precision
Primary image type for all operations. Stores RGBA data in 32-bit floating point (0.0 to 1.0 range).
Uninitialized Memory
Newly created images contain uninitialized GPU memory, which may hold arbitrary leftover data. Always initialize an image before use, either with Fill.color() (or another Fill function) or by loading data into it explicitly.
from pyimagecuda import Image, Fill
img = Image(1920, 1080)
Fill.color(img, (0, 0, 0, 0))  # Clear to transparent black (or use load() / another Fill function)
When to use: All composition, effects, and color operations. This is your default choice.
ImageU8 - 8-bit Precision
Storage-optimized type for loading/saving. Stores RGBA data as unsigned 8-bit integers (0 to 255 range).
from pyimagecuda import ImageU8
# Typically used internally by load/save
u8_img = ImageU8(width=1920, height=1080)
When to use: Rarely needed directly. load() and save() handle conversions automatically.
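The two types differ only in how each channel is stored. The sketch below shows the usual mapping between the 0 to 255 and 0.0 to 1.0 ranges, assuming the conversion divides by 255 and rounds with clamping on the way back; the exact rounding PyImageCUDA's load()/save() use is not documented here, and `u8_to_f32` / `f32_to_u8` are hypothetical helper names.

```python
def u8_to_f32(value: int) -> float:
    """Map an 8-bit channel value (0..255) to the 0.0..1.0 float range (assumed convention)."""
    return value / 255.0


def f32_to_u8(value: float) -> int:
    """Map a float channel back to 0..255, clamping out-of-range values first (assumed convention)."""
    clamped = min(max(value, 0.0), 1.0)
    return int(round(clamped * 255.0))


# One RGBA pixel round-tripped through both representations
pixel_u8 = (255, 128, 0, 255)
pixel_f32 = tuple(u8_to_f32(c) for c in pixel_u8)
restored = tuple(f32_to_u8(c) for c in pixel_f32)
print(pixel_f32)
print(restored == pixel_u8)
```

Clamping matters on the way back because a float32 buffer can represent values outside 0.0 to 1.0, while an 8-bit channel cannot.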
Memory Management
PyImageCUDA offers three memory management approaches:
1. Automatic (Garbage Collection)
Simplest approach. Python frees the image, and with it the GPU memory, once the image is no longer referenced.
from pyimagecuda import Image, Fill
img = Image(1920, 1080)
Fill.color(img, (1, 0, 0, 1))
# img will be freed automatically when no longer referenced
Use when: Writing simple scripts or prototypes.
2. Explicit with Context Managers
Immediate cleanup using with statements: GPU memory is released as soon as the block exits. Example: batch processing.
from pyimagecuda import load, save, Filter
for i in range(1000):
    with load(f"input_{i}.jpg") as img:
        # gaussian_blur returns a new image when no dst_buffer is given
        with Filter.gaussian_blur(img, radius=10) as blurred:
            save(blurred, f"output_{i}.jpg")
    # Both images are freed before the next one is loaded
3. Manual Control
Explicit free() calls for maximum control.
from pyimagecuda import Image, Fill
img = Image(1920, 1080)
Fill.color(img, (1, 0, 0, 1))
img.free() # Free immediately
Use when: You need precise control over when memory is released.
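All three approaches release the same GPU allocation; they differ only in what triggers the release. The following is a CPU-only model of that lifecycle, assuming free() is safe to call more than once and that both the context manager and the finalizer delegate to it. `FakeGpuBuffer` is hypothetical: it allocates no GPU memory and only records when each release happens.

```python
class FakeGpuBuffer:
    """Hypothetical stand-in for an image that owns a GPU allocation."""

    released = []  # names of released buffers, in order (shared for the demo)

    def __init__(self, name: str):
        self.name = name
        self._alive = True  # a real implementation would hold a device pointer

    def free(self) -> None:
        # Idempotent: a later call (e.g. from __del__ after free()) does nothing
        if self._alive:
            self._alive = False
            FakeGpuBuffer.released.append(self.name)

    # Approach 2: context manager releases when the with-block exits
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.free()
        return False  # never swallow exceptions

    # Approach 1: CPython calls __del__ when the last reference disappears
    def __del__(self):
        self.free()


a = FakeGpuBuffer("auto")
del a  # last reference dropped, so CPython finalizes it right away

with FakeGpuBuffer("with") as b:
    pass  # released at the end of the block

c = FakeGpuBuffer("manual")
c.free()  # Approach 3: explicit release
c.free()  # calling it again is harmless

print(FakeGpuBuffer.released)  # ['auto', 'with', 'manual']
```

Making free() idempotent is what lets the three styles coexist: an image freed manually or by a with-block can still be garbage-collected later without a double free.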
4. Handling Out of Memory
When GPU memory is exhausted, you'll see:
This is a clear signal that your GPU has run out of VRAM. Check your memory usage and consider freeing unused buffers or reducing workload size.
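Before starting a large job, you can estimate how many full-size buffers fit in the VRAM that is currently free. The sketch below assumes 16 bytes per pixel for Image (four float32 channels) and 4 for ImageU8 (four uint8 channels). The free-memory figure has to come from a tool of your choice, for example CuPy's cupy.cuda.runtime.memGetInfo(), which returns (free, total) in bytes. `max_buffers` and `buffer_bytes` are hypothetical helpers, and the 10% headroom is an arbitrary safety margin because the CUDA context and other processes also use VRAM.

```python
BYTES_PER_PIXEL = {"Image": 4 * 4, "ImageU8": 4 * 1}  # RGBA x bytes per channel


def buffer_bytes(width: int, height: int, kind: str = "Image") -> int:
    """Size of one buffer's pixel data in bytes (allocator overhead ignored)."""
    return width * height * BYTES_PER_PIXEL[kind]


def max_buffers(free_bytes: int, width: int, height: int,
                kind: str = "Image", headroom: float = 0.1) -> int:
    """How many buffers of this size fit while keeping `headroom` of free VRAM in reserve."""
    usable = int(free_bytes * (1.0 - headroom))
    return usable // buffer_bytes(width, height, kind)


# e.g. free_bytes, _total = cupy.cuda.runtime.memGetInfo()
free_bytes = 6 * 1024**3  # assume 6 GiB currently free
print(max_buffers(free_bytes, 1920, 1080))
print(max_buffers(free_bytes, 1920, 1080, kind="ImageU8"))
```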
Buffer Reuse
All operations that create temporary buffers accept optional buffer parameters for zero-allocation workflows.
A buffer may be larger than the operation requires, but never smaller. If it is larger, the function adjusts its logical dimensions to match, at no performance cost; the capacity the buffer was allocated with stays unchanged.
Example: Gaussian Blur
from pyimagecuda import Image, Filter, load, save
# Allocate each buffer once, sized for the largest expected input
src = Image(1920, 1080)
dst = Image(1920, 1080)
temp = Image(1920, 1080)
# Process 100 images reusing the same buffers
for i in range(100):
    load(f"input_{i}.jpg", f32_buffer=src)
    Filter.gaussian_blur(src, dst_buffer=dst, temp_buffer=temp)
    save(dst, f"output_{i}.jpg")
# Clean up once
src.free()
dst.free()
temp.free()
Benefits:
- No repeated allocations
- Consistent VRAM usage
- Critical for video processing
Dynamic Buffer Sizing
Image buffers have a fixed capacity but adjustable logical dimensions.
from pyimagecuda import Image
# Create buffer with capacity for 1920×1080
img = Image(1920, 1080)
# Can logically resize within capacity
img.resize(1280, 720)
# Check capacity
max_pixels = img.get_max_capacity()
print(f"Capacity: {max_pixels:,} pixels") # 2,073,600 pixels (1920×1080)
print(f"Current: {img.width}×{img.height}") # 1280×720
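The capacity rule here, like the "larger but not smaller" rule for reusable buffers, comes down to comparing pixel counts. The model below is a CPU-only sketch, assuming capacity is measured in pixels as get_max_capacity() reports. `BufferModel` is hypothetical, and raising ValueError on an oversized request is this sketch's choice, not documented PyImageCUDA behavior.

```python
class BufferModel:
    """Hypothetical model of a fixed-capacity buffer with adjustable logical size."""

    def __init__(self, width: int, height: int):
        self.capacity = width * height  # fixed at creation, never grows
        self.width, self.height = width, height

    def fits(self, width: int, height: int) -> bool:
        return width * height <= self.capacity

    def resize(self, width: int, height: int) -> None:
        # Only the logical dimensions change; the allocation stays the same
        if not self.fits(width, height):
            raise ValueError(
                f"{width}x{height} needs {width * height:,} pixels, "
                f"capacity is {self.capacity:,}"
            )
        self.width, self.height = width, height


buf = BufferModel(1920, 1080)
buf.resize(1280, 720)  # smaller request: accepted, capacity unchanged
print(buf.width, buf.height, buf.capacity)  # 1280 720 2073600
print(buf.fits(3840, 2160))  # False: 4K needs four times the pixels
```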
CUDA Interop (Zero-Copy)
PyImageCUDA images implement the __cuda_array_interface__ v3 protocol. This allows any library in the CUDA Python ecosystem (CuPy, PyTorch, Numba, RAPIDS, JAX with the CUDA backend, etc.) to read and write the image's GPU memory without any copies.
The memory layout exposed is (height, width, 4) interleaved RGBA, row-major contiguous:
- Image → float32 (typestr <f4)
- ImageU8 → uint8 (typestr |u1)
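For reference, this is roughly the dictionary the __cuda_array_interface__ v3 protocol expects for that layout. The function below is a hypothetical sketch, not taken from the PyImageCUDA source: it builds the dictionary from a width, height, and device pointer, and the companion helper shows the byte strides implied when "strides" is None, which the specification defines as C-contiguous (row-major).

```python
def cuda_array_interface(width: int, height: int, ptr: int, u8: bool = False) -> dict:
    """Describe an interleaved RGBA buffer per __cuda_array_interface__ v3."""
    return {
        "shape": (height, width, 4),        # rows, columns, channels
        "typestr": "|u1" if u8 else "<f4",  # uint8 or little-endian float32
        "data": (ptr, False),               # (device pointer, read-only flag)
        "strides": None,                    # None means row-major contiguous
        "version": 3,
    }


def contiguous_strides(shape: tuple, itemsize: int) -> tuple:
    """Byte strides that a C-contiguous array of this shape would have."""
    strides, step = [], itemsize
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


iface = cuda_array_interface(1920, 1080, ptr=0x7F1234500000)  # made-up pointer
print(iface["shape"])                         # (1080, 1920, 4)
print(contiguous_strides(iface["shape"], 4))  # (30720, 16, 4)
```

A 30720-byte row stride is simply 1920 pixels × 4 channels × 4 bytes, which is why consumers can reconstruct the layout from shape and typestr alone.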
Memory ownership
The CUDA buffer is owned by the Image / ImageU8 instance. The image must stay alive while any external array views it, otherwise the view becomes a dangling pointer. Do not call image.free() while a CuPy / Torch tensor still references it.
Raw device pointer
Use the cuda_ptr property to retrieve the raw CUDA device pointer as a Python int. Useful for custom kernels or low-level interop.
from pyimagecuda import Image
img = Image(1920, 1080)
print(hex(img.cuda_ptr)) # e.g. 0x7f1234500000
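Given the raw pointer and the interleaved (height, width, 4) layout, the address of any pixel is plain arithmetic, which is what a custom kernel computes internally. A minimal sketch, assuming rows are packed back to back with the logical width as the row pitch (what a contiguous layout implies); `pixel_address` is a hypothetical helper and the base pointer is a made-up value.

```python
def pixel_address(base_ptr: int, width: int, x: int, y: int, itemsize: int = 4) -> int:
    """Byte address of pixel (x, y) in an interleaved RGBA buffer.

    itemsize is 4 for Image (float32 channels) and 1 for ImageU8 (uint8 channels).
    """
    channels = 4
    return base_ptr + (y * width + x) * channels * itemsize


base = 0x7F1234500000  # made-up value standing in for img.cuda_ptr
print(hex(pixel_address(base, 1920, x=0, y=1)))  # 0x7f1234507800, start of the second row
```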
CuPy
The to_cupy() helper wraps a pyimagecuda image as a zero-copy cupy.ndarray. CuPy is not a dependency of pyimagecuda; install it separately, choosing the CuPy package that matches your CUDA version.
from pyimagecuda import Image, Fill, to_cupy, download
import cupy as cp
img = Image(512, 512)
Fill.color(img, (1, 0, 0, 1))
# Zero-copy view — shares the same GPU memory
arr = to_cupy(img)
assert arr.data.ptr == img.cuda_ptr
assert arr.shape == (512, 512, 4)
assert arr.dtype == cp.float32
# Modify via CuPy → changes are visible in the pyimagecuda image
arr[:, :, 1] = 1.0  # set green to 1.0 (red + green = yellow)
# Download to verify
pixels = download(img)
Because __cuda_array_interface__ is honored, cp.asarray(img) also works and returns the same zero-copy view.
Custom CUDA kernels (CuPy RawKernel)
You can run your own CUDA kernels directly on pyimagecuda buffers:
from pyimagecuda import Image, Fill
import cupy as cp
img = Image(512, 512)
Fill.color(img, (0.2, 0.4, 0.8, 1.0))
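invert_rgb = cp.RawKernel(r'''
extern "C" __global__
void invert_rgb(float4* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float4 p = data[i];  // one RGBA pixel
    data[i] = make_float4(1.0f - p.x, 1.0f - p.y, 1.0f - p.z, p.w);  // keep alpha
}
''', 'invert_rgb')
arr = cp.asarray(img)  # zero-copy view of the image buffer
n = img.width * img.height  # one thread per pixel
threads = 256
blocks = (n + threads - 1) // threads
# Pass n as int32: CuPy maps Python ints to long long, which does not match `int n`
invert_rgb((blocks,), (threads,), (arr, cp.int32(n)))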
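The launch above only needs enough 256-thread blocks to cover all n pixels; the `if (i >= n) return;` guard makes any surplus threads in the last block harmless. The minimal block count is a ceiling division, sketched here in plain Python; `launch_config` is a hypothetical helper name.

```python
def launch_config(n: int, threads_per_block: int = 256) -> tuple:
    """Return ((blocks,), (threads,)) covering n elements with 1-D blocks."""
    blocks = (n + threads_per_block - 1) // threads_per_block  # ceil(n / threads)
    return (blocks,), (threads_per_block,)


grid, block = launch_config(512 * 512)
print(grid, block)                 # (1024,) (256,)
print(launch_config(1920 * 1080))  # ((8100,), (256,))
```

The returned tuples have the shape CuPy's RawKernel call expects for its grid and block arguments.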
PyTorch
PyTorch consumes __cuda_array_interface__ via torch.as_tensor:
import torch
from pyimagecuda import Image, Fill
img = Image(512, 512)
Fill.color(img, (1, 0, 0, 1))
tensor = torch.as_tensor(img, device='cuda') # zero-copy
# tensor.shape == torch.Size([512, 512, 4])
# tensor.dtype == torch.float32
Numba CUDA
from numba import cuda
from pyimagecuda import Image
img = Image(512, 512)
device_array = cuda.as_cuda_array(img) # zero-copy
Best Practices
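For Simple Scripts
Let garbage collection free images automatically (approach 1).
For Batch Processing
Use context managers so each image is freed before the next one is loaded (approach 2).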
For Video/Real-time
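from pyimagecuda import Image
# Reuse buffers explicitly: allocate once, use for every frame.
# `video` and `process` are placeholders for your own frame source/sink and pipeline.
frame = Image(1920, 1080)
temp = Image(1920, 1080)
try:
    while video.has_frames():
        video.read_into(frame)
        process(frame, temp_buffer=temp)
        video.write(frame)
finally:
    # Release VRAM even if processing raises
    frame.free()
    temp.free()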
Memory Considerations
VRAM vs RAM:
- Image(1920, 1080) uses ~32 MB of VRAM (1920 × 1080 pixels × 4 channels × 4 bytes)
- The Python object itself uses <100 bytes of RAM
- Python's garbage collector is driven by Python-side allocations, not by VRAM pressure, so use explicit management for large workloads
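The ~32 MB figure follows directly from the pixel format: four channels per pixel, at 4 bytes each for Image and 1 byte each for ImageU8. A small sketch to reproduce the numbers for other resolutions; `footprint_mib` is a hypothetical helper that counts pixel data only, not any allocator overhead.

```python
def footprint_mib(width: int, height: int, bytes_per_channel: int = 4) -> float:
    """Pixel-data size of an RGBA buffer in MiB (Image: 4 bytes/channel, ImageU8: 1)."""
    return width * height * 4 * bytes_per_channel / 1024**2


for w, h in [(1920, 1080), (3840, 2160)]:
    print(f"{w}x{h}: Image {footprint_mib(w, h):.1f} MiB, "
          f"ImageU8 {footprint_mib(w, h, 1):.1f} MiB")
```

A single 4K float32 image already takes over 120 MiB, so a handful of forgotten intermediates can exhaust VRAM long before Python notices any memory pressure.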