
PhotoFF Performance Benchmarks

Overview

PhotoFF is built to fully exploit the parallelism of modern NVIDIA GPUs through CUDA. This page presents an apples‑to‑apples comparison between PhotoFF and the ubiquitous Pillow CPU implementation for common image‑processing tasks. Each benchmark shows the number of frames processed per second (FPS) and the resulting speed‑up factor obtained with PhotoFF.

Test Environment

Component | Specification
--------- | --------------------------------
CPU       | AMD Ryzen™ 7 3700X
GPU       | NVIDIA® GeForce RTX™ 3070 (8 GB)
OS        | Windows 10 Pro 22H2
Python    | 3.13
CUDA      | 12.6
Driver    | 560.94

Methodology

Each test runs the corresponding script inside `` and executes 100 consecutive iterations of the target operation. For PhotoFF we measure:

  • GPU (no cache) – the naive call that internally allocates a new destination buffer every iteration.
  • GPU (cache) – the same call but re‑using a pre‑allocated destination buffer to avoid costly cudaMalloc operations (when the API supports it).

Pillow is executed on the host CPU using the nearest equivalent function.
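The two GPU modes differ only in who owns the destination buffer. As a rough analogy that does not call PhotoFF, the plain-Python sketch below uses a `bytearray` in place of a device buffer. The names `copy_no_cache` and `copy_cached` are hypothetical, and the 1920 × 1080 RGBA frame size is an assumption.

```python
# Plain-Python analogy of the "no cache" vs "cache" patterns.
# bytearray stands in for a GPU buffer; this is NOT the PhotoFF API.

WIDTH, HEIGHT, CHANNELS = 1920, 1080, 4  # assumed 8-bit RGBA frame
FRAME_BYTES = WIDTH * HEIGHT * CHANNELS


def copy_no_cache(src: bytes) -> bytearray:
    """Allocate a new destination on every call (like 'GPU (no cache)')."""
    dst = bytearray(len(src))  # fresh allocation each iteration
    dst[:] = src
    return dst


def copy_cached(src: bytes, dst: bytearray) -> bytearray:
    """Write into a caller-owned, pre-allocated buffer (like 'GPU (cache)')."""
    if len(dst) != len(src):
        raise ValueError("destination buffer has the wrong size")
    dst[:] = src  # same length, so the existing buffer is updated in place
    return dst


if __name__ == "__main__":
    src = bytes(FRAME_BYTES)
    cache = bytearray(FRAME_BYTES)  # allocated once, outside the loop
    fresh = [copy_no_cache(src) for _ in range(3)]
    reused = [copy_cached(src, cache) for _ in range(3)]
    print(len({id(b) for b in fresh}))   # 3: a new buffer per call
    print(len({id(b) for b in reused}))  # 1: the same buffer every time
```

On the GPU, the saving comes from skipping a `cudaMalloc` (and the matching free) on every iteration.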

The final metric is calculated as:

FPS = 100 iterations / total time (seconds)

where the total time includes all Python overhead, memory transfers and device synchronisations.
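A minimal timing harness that follows the FPS formula above might look like the sketch below. It is an illustration, not the code in the benchmark scripts: `measure_fps` and its `sync` hook are hypothetical names. The hook matters because CUDA kernel launches return before the GPU has finished, so the clock should stop only after a device synchronisation.

```python
import time
from typing import Callable, Optional


def measure_fps(
    op: Callable[[], object],
    iterations: int = 100,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """FPS = iterations / total wall-clock time in seconds.

    `op` runs one benchmark iteration. `sync` is an optional, hypothetical
    hook for a device-synchronisation call made before the clock stops.
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    start = time.perf_counter()
    for _ in range(iterations):
        op()
    if sync is not None:
        sync()  # block until all queued device work has completed
    total = time.perf_counter() - start  # includes Python overhead
    return iterations / total
```

For a CPU-only operation, `sync` can be left as `None`, for example `measure_fps(lambda: sorted(range(10_000)))`.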

Results

Blending (blend)

Operation | PhotoFF GPU FPS | Pillow CPU FPS | Speed‑up
--------- | --------------- | -------------- | --------
blend     | 39,144.23       | 464.04         | 84.36×
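Within rounding, the speed‑up factors in these tables match the PhotoFF FPS divided by the Pillow FPS. Here is a minimal sketch of that calculation using the blend row; `speed_up` is a hypothetical helper. Recomputing from the rounded FPS values shown can shift the last decimal place slightly.

```python
def speed_up(photoff_fps: float, pillow_fps: float) -> float:
    """Speed-up factor = PhotoFF FPS / Pillow FPS."""
    if pillow_fps <= 0:
        raise ValueError("Pillow FPS must be positive")
    return photoff_fps / pillow_fps


# Blend row: 39,144.23 FPS on the GPU vs 464.04 FPS on the CPU.
print(f"blend: {speed_up(39_144.23, 464.04):.2f}x")  # blend: 84.36x
```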

Cropping (crop_margins)

Resolution: 1920 × 1080 → 1720 × 980

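Method       | GPU (no cache) FPS | GPU (cache) FPS | Pillow FPS | Speed‑up (no cache) | Speed‑up (cache)
------------ | ------------------ | --------------- | ---------- | ------------------- | ----------------
crop_margins | 1,690.29           | 12,359.09       | 626.57     | 2.70×               | 19.73×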
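crop_margins removes a border from each edge of the image. The table gives only the input and output sizes, so how the 200 horizontal and 100 vertical pixels are split is not stated. The sketch below assumes symmetric margins of 100 px left/right and 50 px top/bottom, and it uses hypothetical names (`cropped_size`, `crop_margins_sketch`) rather than PhotoFF's API.

```python
from typing import List, Sequence, Tuple


def cropped_size(
    width: int, height: int, left: int, top: int, right: int, bottom: int
) -> Tuple[int, int]:
    """Output size after removing a margin from each edge."""
    new_w = width - left - right
    new_h = height - top - bottom
    if min(left, top, right, bottom) < 0 or new_w <= 0 or new_h <= 0:
        raise ValueError("margins must be non-negative and smaller than the image")
    return new_w, new_h


def crop_margins_sketch(
    pixels: Sequence[Sequence[int]], left: int, top: int, right: int, bottom: int
) -> List[List[int]]:
    """Crop a row-major image (a list of rows) by the given margins."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    new_w, new_h = cropped_size(width, height, left, top, right, bottom)
    return [list(row[left:left + new_w]) for row in pixels[top:top + new_h]]


# Assumed symmetric margins that reproduce the benchmark geometry.
print(cropped_size(1920, 1080, left=100, top=50, right=100, bottom=50))  # (1720, 980)
```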

Filling

Operation     | GPU FPS   | Pillow FPS | Speed‑up
------------- | --------- | ---------- | --------
fill_color    | 11,960.15 | 6,477.89   | 1.85×
fill_gradient | 13,553.62 | 79.47      | 170.54×
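fill_color writes one constant value to every pixel, while a gradient has to compute a value that depends on position. The minimal sketch below contrasts the two. It assumes a horizontal two‑colour linear gradient on RGB tuples, because the gradient parameters PhotoFF accepts are not described here. Both function names are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def fill_color_sketch(width: int, height: int, color: RGB) -> List[List[RGB]]:
    """Every pixel receives the same constant value."""
    if width < 1 or height < 1:
        raise ValueError("width and height must be positive")
    return [[color] * width for _ in range(height)]


def fill_gradient_sketch(
    width: int, height: int, start: RGB, end: RGB
) -> List[List[RGB]]:
    """Horizontal linear gradient from `start` (left) to `end` (right).

    Column x blends the two colours with t = x / (width - 1), so each
    position needs its own interpolation instead of a single constant.
    """
    if width < 1 or height < 1:
        raise ValueError("width and height must be positive")
    row: List[RGB] = []
    for x in range(width):
        t = x / (width - 1) if width > 1 else 0.0
        r, g, b = (round(s + (e - s) * t) for s, e in zip(start, end))
        row.append((r, g, b))
    # A horizontal gradient only varies along x, so the row is repeated.
    return [list(row) for _ in range(height)]
```

For example, `fill_gradient_sketch(3, 1, (0, 0, 0), (255, 255, 255))` returns `[[(0, 0, 0), (128, 128, 128), (255, 255, 255)]]`.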

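Resizing (resize)

Resolution: 1920 × 1080 → 1280 × 720

Method   | GPU (no cache) FPS | GPU (cache) FPS | Pillow FPS | Speed‑up (no cache) | Speed‑up (cache)
-------- | ------------------ | --------------- | ---------- | ------------------- | ----------------
NEAREST  | 2,456.03           | 25,562.55       | 830.38     | 2.96×               | 30.78×
BILINEAR | 2,327.16           | 21,242.36       | 42.45      | 54.82×              | 500.36×
BICUBIC  | 2,162.25           | 8,842.59        | 30.55      | 70.77×              | 289.42×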
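The three methods differ in how many source pixels contribute to each output pixel. NEAREST copies one, BILINEAR blends a 2 × 2 neighbourhood, and BICUBIC weights a 4 × 4 neighbourhood. The minimal pure‑Python sketch below covers the first two on a single‑channel image. It assumes a half‑pixel‑centre sampling convention, is not PhotoFF's CUDA kernel, and may not reproduce Pillow's output exactly.

```python
from typing import List, Sequence

Image = List[List[float]]


def _source_coord(dst: int, scale: float, size: int) -> float:
    """Map a destination pixel centre to a clamped source coordinate.

    Uses the half-pixel-centre convention (an assumption; libraries differ).
    """
    s = (dst + 0.5) * scale - 0.5
    return min(max(s, 0.0), size - 1.0)


def resize_nearest(img: Sequence[Sequence[float]], new_w: int, new_h: int) -> Image:
    """Each output pixel copies the single nearest source pixel."""
    h, w = len(img), len(img[0])
    sx, sy = w / new_w, h / new_h
    out: Image = []
    for y in range(new_h):
        iy = min(int((y + 0.5) * sy), h - 1)
        out.append([img[iy][min(int((x + 0.5) * sx), w - 1)] for x in range(new_w)])
    return out


def resize_bilinear(img: Sequence[Sequence[float]], new_w: int, new_h: int) -> Image:
    """Each output pixel blends the 2 x 2 source neighbourhood around it."""
    h, w = len(img), len(img[0])
    sx, sy = w / new_w, h / new_h
    out: Image = []
    for y in range(new_h):
        fy = _source_coord(y, sy, h)
        y0 = int(fy)
        y1 = min(y0 + 1, h - 1)
        ty = fy - y0  # vertical interpolation weight
        row = []
        for x in range(new_w):
            fx = _source_coord(x, sx, w)
            x0 = int(fx)
            x1 = min(x0 + 1, w - 1)
            tx = fx - x0  # horizontal interpolation weight
            top = img[y0][x0] * (1 - tx) + img[y0][x1] * tx
            bottom = img[y1][x0] * (1 - tx) + img[y1][x1] * tx
            row.append(top * (1 - ty) + bottom * ty)
        out.append(row)
    return out
```

Bicubic follows the same pattern with a wider 4 × 4 window and cubic weights, which means more arithmetic per output pixel.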

Reproducing the Benchmarks

cd photoff/tests
python blend_speed.py
python crop_speed.py
python fill_speed.py
python resize_speed.py

All scripts print a detailed comparison table to stdout. For consistent results, close other GPU‑intensive applications and make sure the GPU is running at its maximum performance profile.
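To capture all four comparison tables in one go, for example to archive them next to the test‑environment details, a small hypothetical helper can run the scripts sequentially from Python. It assumes the scripts take no command‑line arguments and uses only the standard library.

```python
import subprocess
import sys
from pathlib import Path
from typing import Dict, Iterable, Union

# Script names taken from the commands above.
BENCHMARK_SCRIPTS = (
    "blend_speed.py",
    "crop_speed.py",
    "fill_speed.py",
    "resize_speed.py",
)


def run_benchmarks(
    test_dir: Union[str, Path], scripts: Iterable[str] = BENCHMARK_SCRIPTS
) -> Dict[str, str]:
    """Run each script with the current interpreter and collect its stdout.

    Scripts run one after another (never in parallel) so they do not compete
    for the GPU; a non-zero exit status raises CalledProcessError.
    """
    outputs: Dict[str, str] = {}
    for name in scripts:
        result = subprocess.run(
            [sys.executable, name],
            cwd=Path(test_dir),
            capture_output=True,
            text=True,
            check=True,
        )
        outputs[name] = result.stdout
    return outputs
```

Usage: `for name, table in run_benchmarks("photoff/tests").items(): print(name, table, sep="\n")`.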

Conclusion

PhotoFF’s CUDA backend delivers speed‑ups over Pillow that range from 1.85× (fill_color) to more than 500× (BILINEAR resizing with a cached buffer). This makes real‑time or batch‑processing workloads practical even at high resolutions. Further gains are possible by batching multiple operations together and minimising host‑device transfers, as explained in the Advanced Topics guide.