GPU-based Image Processing and Computer Vision
GIPSA GPU Summer School
June 29 – July 2, 2010, Grenoble, France
James Fung
NVIDIA
Talk Outline
History/Overview of GPU hardware
3 CUDA generations: G80, Tesla, and the newly introduced Fermi
Fully Resident GPU Pipelines: Panorama Stitching
What’s new at NVIDIA in Computing
Fermi Architecture
CUDA Features/PTX
Shader Based Vision (~2004)
Graphics API/Shaders
Inefficiencies
CPU interaction required
Tethered
Today: Fully GPU resident
Evolution of Computing (Vision)
GPU technology timeline: Fixed Function Pipeline → Texture Combiners → Programmable Shaders → Render to Texture / Framebuffer Objects → CUDA Architecture (C for CUDA; single precision floating point, then double precision)
Vision mappings enabled along that timeline:
Image derivatives, simple edge detection
On-GPU math operations
Image filtering, Bayer demosaicing
Image projections
Imaging pipelines
Fast “gather” and (global) reduction
Feature detection
General numerical computing
Full “on-GPU” algorithms
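As a concrete illustration of the earliest stage above, a per-pixel image-derivative kernel in C for CUDA can be sketched as follows (a minimal sketch, not code from the talk; the kernel name and the single-channel row-major image layout are assumptions):

```cuda
// Sketch (illustrative, not from the slides): central-difference
// image gradient, one thread per pixel. Assumes a single-channel
// float image of size w x h, stored row-major.
__global__ void gradientKernel(const float* img, float* gx, float* gy,
                               int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x <= 0 || y <= 0 || x >= w - 1 || y >= h - 1) return;

    int i = y * w + x;
    gx[i] = 0.5f * (img[i + 1] - img[i - 1]);   // d/dx
    gy[i] = 0.5f * (img[i + w] - img[i - w]);   // d/dy
}
```

A typical launch would use a 2D grid covering the image, e.g. `dim3 block(16, 16); dim3 grid((w + 15) / 16, (h + 15) / 16); gradientKernel<<<grid, block>>>(img, gx, gy, w, h);`.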
G80 Device
Processors execute computing threads
Thread Execution Manager issues threads
128 Thread Processors grouped into 16 multiprocessors (SMs)
Parallel Data Cache enables thread cooperation
[Block diagram: Host → Input Assembler → Thread Execution Manager → 8 TPCs, each with Parallel Data Caches → Load/Store → Global Memory]
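The Parallel Data Cache is exposed in CUDA as per-block shared memory. A standard shared-memory sum reduction (sketched below under the assumption of a power-of-two block size; the names are illustrative, not from the talk) shows the kind of thread cooperation it enables:

```cuda
// Sketch: block-level sum reduction using shared memory
// (the "Parallel Data Cache"). Each block reduces blockDim.x
// input elements into one partial sum. Assumes blockDim.x is
// a power of two.
__global__ void blockSum(const float* in, float* partial, int n)
{
    extern __shared__ float s[];          // sized at launch time
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;
    s[tid]  = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction within the block
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = s[0];
}
```

Launched as `blockSum<<<blocks, threads, threads * sizeof(float)>>>(in, partial, n);`, with a second pass (or host-side loop) summing the per-block partials.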
10-Series Architecture
240 thread processors grouped into 30 streaming multiprocessors (SMs), with GB of RAM
1 TFLOPS single precision (theoretical)
87 GFLOPS double precision (IEEE 754 floating point)
Each SM:
8 Thread Processors
1 Double Precision Unit
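With one double-precision unit per SM against eight single-precision thread processors, double-precision kernels run at a fraction of single-precision peak on this generation. A minimal sketch (the kernel name is illustrative; double support requires building for compute capability 1.3, i.e. `-arch=sm_13`):

```cuda
// Sketch: double-precision AXPY (y = a*x + y). On 10-series
// hardware each SM has a single double-precision unit, so this
// runs well below the single-precision peak. Compile with
// -arch=sm_13 to enable double precision.
__global__ void daxpy(int n, double a, const double* x, double* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}
```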