OpenGL Integration
PyImageCUDA provides GPU-to-GPU transfer with OpenGL through CUDA-OpenGL interop, enabling efficient real-time preview pipelines.
Overview
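Traditional approach: GPU → CPU (download) → GPU (upload to an OpenGL texture).
With OpenGL interop: CUDA buffer → PBO → OpenGL texture, entirely on the GPU.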
Performance:
- Zero CPU involvement (all transfers GPU-side)
- ~10x faster than CPU download/upload
- Updates of about 1 ms for 2K images (measured)
- Single GPU-GPU DMA copy (PBO→Texture, asynchronous)
How It Works
The pipeline has 5 stages, all happening on the GPU:
- Generate frame in a PyImageCUDA Image (float32 RGBA).
- Convert to uint8 in an ImageU8 (the format OpenGL expects).
- Copy to PBO via GLResource.copy_from() (GPU→GPU, no CPU involved).
- Upload PBO to texture with glTexSubImage2D (asynchronous on the GPU).
- Draw the texture on a fullscreen quad.
The PBO (Pixel Buffer Object) is the bridge: it's an OpenGL buffer that CUDA can write to directly. PyImageCUDA registers it once via GLResource(pbo_id) and then writes to it every frame with copy_from().
Minimal Example
A complete, runnable script showing the full GPU→display pipeline. This is all you need to display PyImageCUDA output in a real-time window:
import glfw
from OpenGL.GL import *
from pyimagecuda import Image, ImageU8, Fill, convert_float_to_u8, GLResource
W, H = 1280, 720
# 1. Create OpenGL window
glfw.init()
window = glfw.create_window(W, H, "PyImageCUDA Preview", None, None)
glfw.make_context_current(window)
glfw.swap_interval(1) # vsync
# 2. Create PBO (Pixel Buffer Object). Size = W * H * 4 (RGBA uint8)
pbo = int(glGenBuffers(1))
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo)
glBufferData(GL_PIXEL_UNPACK_BUFFER, W * H * 4, None, GL_STREAM_DRAW)
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0)
# 3. Create OpenGL texture for display
tex = int(glGenTextures(1))
glBindTexture(GL_TEXTURE_2D, tex)
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, W, H, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, None)
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR)
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR)
# 4. Register PBO with CUDA (one-time setup)
gl_res = GLResource(pbo)
# 5. PyImageCUDA buffers (reused every frame)
canvas = Image(W, H)
canvas_u8 = ImageU8(W, H)
# Render loop
while not glfw.window_should_close(window):
    glfw.poll_events()
    # --- Generate the frame on GPU ---
    Fill.gradient(canvas, (1, 0, 0, 1), (0, 0, 1, 1), 'radial')
    # --- Convert float32 → uint8 (still on GPU) ---
    convert_float_to_u8(canvas_u8, canvas)
    # --- Copy GPU buffer to PBO (GPU→GPU, no CPU round-trip) ---
    gl_res.copy_from(canvas_u8)
    # --- Upload PBO to texture (None = offset 0 into the bound PBO) ---
    glBindTexture(GL_TEXTURE_2D, tex)
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo)
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, W, H,
                    GL_RGBA, GL_UNSIGNED_BYTE, None)
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0)
    # --- Draw the texture as a fullscreen quad ---
    fb_w, fb_h = glfw.get_framebuffer_size(window)
    glViewport(0, 0, fb_w, fb_h)
    glClear(GL_COLOR_BUFFER_BIT)
    glEnable(GL_TEXTURE_2D)
    glBindTexture(GL_TEXTURE_2D, tex)
    glBegin(GL_TRIANGLE_STRIP)
    # Y inverted to match image orientation
    glTexCoord2f(0, 1); glVertex2f(-1, -1)
    glTexCoord2f(1, 1); glVertex2f( 1, -1)
    glTexCoord2f(0, 0); glVertex2f(-1,  1)
    glTexCoord2f(1, 0); glVertex2f( 1,  1)
    glEnd()
    glDisable(GL_TEXTURE_2D)
    glfw.swap_buffers(window)
# Cleanup
gl_res.free()
canvas.free()
canvas_u8.free()
glDeleteTextures([tex])
glDeleteBuffers(1, [pbo])
glfw.terminate()
Dependencies: pyimagecuda, glfw, and PyOpenGL (the three packages imported by the script).
Note: glGenBuffers(1) and glGenTextures(1) return numpy types from PyOpenGL. Cast them with int() before passing to GLResource, otherwise you'll get a ValueError.
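The cast can be wrapped in a small validation helper. This is only a sketch: as_gl_id is a hypothetical name, not part of PyImageCUDA or PyOpenGL, and the error messages are illustrative. It converts with int() as the note recommends, then rejects values that lose information in the conversion (such as 2.5) and the reserved ID 0.

```python
def as_gl_id(value):
    """Convert an OpenGL object ID to a plain Python int.

    Hypothetical helper (not part of PyImageCUDA or PyOpenGL).
    """
    gl_id = int(value)  # same cast the note above recommends
    if gl_id != value:
        # e.g. 2.5 would silently become 2; refuse instead
        raise TypeError(f"OpenGL ID must be a whole number, got {value!r}")
    if gl_id <= 0:
        # 0 means "no buffer bound" in OpenGL, so it never names a real PBO
        raise ValueError(f"invalid OpenGL ID: {gl_id}")
    return gl_id
```

In the minimal example this would replace the bare cast, for instance pbo = as_gl_id(glGenBuffers(1)).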
GLResource API
Constructor
Registers an OpenGL PBO with CUDA for interop.
Parameters:
- pbo_id: OpenGL buffer ID from glGenBuffers(). Must be a Python int (cast with int() if it comes from PyOpenGL).
Raises:
- ValueError: If pbo_id is invalid
- RuntimeError: If CUDA registration fails
copy_from()
Copies ImageU8 data directly to the PBO (GPU→GPU, zero CPU overhead).
Parameters:
- image: Source ImageU8 buffer. Must match PBO size.
- sync: If True (default), blocks until copy completes. If False, returns immediately (advanced usage).
Raises:
- TypeError: If image is not ImageU8
- RuntimeError: If resource has been freed
free()
Unregisters the resource. Must be called before deleting the PBO.
Context Manager: GLResource can also be used in a with block, which releases the resource on exit.
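A minimal sketch of the unregister-before-delete ordering. The real GLResource needs a live OpenGL context, so this uses a hypothetical stand-in class (FakeGLResource) that only records calls in a list. It assumes that leaving the with block calls free(); run_preview and the log strings are illustrative, not library API.

```python
class FakeGLResource:
    """Hypothetical stand-in for GLResource; records calls instead of using CUDA."""

    def __init__(self, pbo_id, log):
        self.pbo_id = pbo_id
        self.log = log
        self.freed = False
        log.append(f"register {pbo_id}")

    def free(self):
        if not self.freed:  # make a second free() call harmless
            self.freed = True
            self.log.append(f"unregister {self.pbo_id}")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.free()
        return False  # let exceptions from the render loop propagate


def run_preview(pbo_id, frames, log):
    """Register, copy `frames` frames, unregister, then delete the PBO."""
    try:
        with FakeGLResource(pbo_id, log) as res:
            for i in range(frames):
                log.append(f"copy frame {i} -> pbo {res.pbo_id}")
    finally:
        # Stands in for glDeleteBuffers; runs after the with block has freed res
        log.append(f"delete pbo {pbo_id}")
    return log
```

Putting the deletion in the outer finally means it always runs after the resource has been unregistered, even if a frame raises.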
Buffer Sizing
When creating the PBO, its size must match what you'll write to it:
- For ImageU8 (RGBA uint8): width * height * 4 bytes
- The OpenGL texture must use GL_RGBA8 internal format and GL_RGBA, GL_UNSIGNED_BYTE for glTexImage2D/glTexSubImage2D.
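The sizing rule is simple arithmetic, so a mismatch can be caught before it shows up as a garbled texture. The helpers below are a sketch with hypothetical names (pbo_size_bytes, check_pbo_size); they are not part of PyImageCUDA.

```python
def pbo_size_bytes(width, height, channels=4, bytes_per_channel=1):
    """Bytes needed for a tightly packed image (default: RGBA uint8)."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return width * height * channels * bytes_per_channel


def check_pbo_size(pbo_bytes, width, height):
    """Raise ValueError if a PBO of pbo_bytes cannot hold a width x height ImageU8."""
    expected = pbo_size_bytes(width, height)
    if pbo_bytes != expected:
        raise ValueError(
            f"PBO is {pbo_bytes} bytes, but {width}x{height} RGBA uint8 "
            f"needs {expected} bytes"
        )
    return expected
```

For the 1280x720 window in the minimal example, pbo_size_bytes(1280, 720) gives 3686400 bytes, which is the W * H * 4 value passed to glBufferData.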
GLResource only works with ImageU8, not Image (float32). If your pipeline runs in float32, use convert_float_to_u8() before calling copy_from().
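For intuition, here is a CPU-side reference of what a float32 to uint8 conversion typically computes per channel. This is an assumption for illustration: it clamps to [0, 1], scales by 255, and rounds to nearest. The exact rounding used by convert_float_to_u8 is not stated here, and the function names below are hypothetical.

```python
def float_to_u8(value):
    """Map one float channel in [0, 1] to an integer in [0, 255].

    Assumed behavior: clamp out-of-range values, then round to nearest.
    """
    clamped = min(max(value, 0.0), 1.0)
    return int(clamped * 255.0 + 0.5)  # +0.5 then truncate = round half up


def pixel_to_u8(rgba):
    """Convert one (r, g, b, a) float pixel to a tuple of uint8 values."""
    return tuple(float_to_u8(c) for c in rgba)
```

With these assumptions, the opaque red used in the minimal example, (1, 0, 0, 1), becomes (255, 0, 0, 255), and values outside [0, 1] saturate rather than wrap.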
Full Application Example
For a complete production-quality implementation with PySide6 (Qt widget integration, mouse/keyboard handling, multiple panels):
https://github.com/offerrall/pyimagecuda-studio/tree/main/pyimagecuda_studio/gui/preview
Best Practices
Display Integration
- Invert texture Y-coordinates if the image appears upside-down (glTexCoord2f(0, 1) at bottom-left, not (0, 0)).
- Enable GL_BLEND for alpha channel support if compositing transparent layers.
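The Y inversion can be generated rather than hand-written. The sketch below builds the four (u, v, x, y) vertices of a GL_TRIANGLE_STRIP fullscreen quad. It assumes the image buffer stores its top row first, while OpenGL places texture coordinate v = 0 at the first uploaded row and draws it at the bottom. fullscreen_quad is a hypothetical helper name.

```python
def fullscreen_quad(flip_y=True):
    """Return (u, v, x, y) tuples for a GL_TRIANGLE_STRIP fullscreen quad.

    With flip_y=True the bottom of the screen samples v = 1, so an image
    stored top-row-first appears upright.
    """
    # Strip order: bottom-left, bottom-right, top-left, top-right
    positions = [(-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0), (1.0, 1.0)]
    quad = []
    for x, y in positions:
        u = (x + 1.0) / 2.0  # map clip-space [-1, 1] to texture [0, 1]
        v = (y + 1.0) / 2.0
        if flip_y:
            v = 1.0 - v
        quad.append((u, v, x, y))
    return quad
```

With flip_y=True the result matches the glTexCoord2f/glVertex2f pairs in the minimal example; with flip_y=False it yields the unflipped mapping that shows the image upside-down.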
Resource Management
- Create PBO with GL_STREAM_DRAW for best performance (signals frequent updates).
- Pre-allocate buffers (Image, ImageU8) outside the render loop and reuse every frame.
- Call gl_res.free() before glDeleteBuffers(1, [pbo]).
Performance
- Reuse Image and ImageU8 buffers across frames; never allocate inside the loop.
- Match texture size to PBO size exactly to avoid resampling.
- glTexSubImage2D with PBO is asynchronous: the GPU schedules the upload without stalling the CPU.
- Use glfw.swap_interval(0) to disable vsync and measure raw pipeline throughput.
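To put numbers on throughput once vsync is off, a small frame timer is enough. The class below is a sketch using only the standard library; FrameTimer is a hypothetical name, and the injectable clock parameter exists only so the timer can be tested without waiting.

```python
import time


class FrameTimer:
    """Accumulates the time between successive tick() calls."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._last = None
        self._total = 0.0
        self._frames = 0

    def tick(self):
        """Call once per render-loop iteration."""
        now = self._clock()
        if self._last is not None:  # the first tick only sets the baseline
            self._total += now - self._last
            self._frames += 1
        self._last = now

    @property
    def avg_ms(self):
        """Average frame time in milliseconds (0.0 before two ticks)."""
        return 1000.0 * self._total / self._frames if self._frames else 0.0

    @property
    def fps(self):
        """Average frames per second (0.0 before two ticks)."""
        return self._frames / self._total if self._total > 0 else 0.0
```

Calling timer.tick() right after glfw.swap_buffers(window) measures the CPU-side loop rate. Because OpenGL commands are queued asynchronously, this reflects overall loop throughput rather than the GPU cost of any single call.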
Common Pitfalls
- ValueError: Invalid PBO ID: PyOpenGL returns numpy types. Cast with int(glGenBuffers(1)).
- Black or garbled texture: PBO size doesn't match width * height * 4, or you're using Image (float32) instead of ImageU8.
- Image upside-down: invert Y in texture coordinates as shown in the minimal example.
- copy_from fails after a while: you freed the GLResource or the PBO before the last frame. Free in the right order: gl_res.free() first, then OpenGL objects.