High Performance Computing
on GPUs
using NVIDIA CUDA
Slides include some material from the GPGPU tutorial at SIGGRAPH 2007.
Mark Silberstein, Technion 1
Outline
● Motivation
● Stream programming
– Simplified HW and SW model
– Simple GPU programming example
● Increasing stream granularity
– Using shared memory
– Matrix multiplication
● Improving performance
● Some real-life examples
Disclaimer
This lecture will discuss GPUs from the
computing perspective,
since I am NOT an expert in graphics hardware
Why GPUs-II
Is it a miracle? NO!
● Architectural solution prefers parallelism over
single thread performance!
● Example problem – I have 100 apples to eat
1) "high performance computing" objective: optimize
the time of eating one apple
2) "high throughput computing" objective: optimize
the time of eating all apples
● The 1st option has been exhausted!!!
● Performance = parallel hardware + scalable
parallel program!
Why not in CPUs?
● Not applicable to general-purpose computing
● Complex programming model
● Still immature
– Platform is a moving target
● Vendor-dependent architectures
● Incompatible architectural changes from generation to
generation
– Programming model is vendor dependent
● NVIDIA – CUDA
● AMD(ATI) – Close To Metal (CTM)
● INTEL (LARRABEE) – nobody knows
Simple stream programming model
Generic GPU
hardware/software model
● Massively parallel processor: many concurrently running
threads (thousands)
● Threads access global GPU memory
● Each thread has limited number of private registers
● Caching: two options
– Not cached (latency hidden through time-slicing)
– Cached with unknown organization, but optimized
for 2D spatial locality
● Single Program Multiple Data (SPMD) model
– The same program, called a kernel, is
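
The SPMD model described above can be illustrated with a minimal CUDA sketch (not from the slides; the kernel and variable names are illustrative): every thread runs the same kernel and uses its thread and block indices to pick the element of global memory it works on.

```cuda
#include <cuda_runtime.h>

// SPMD: one kernel, launched for thousands of threads at once.
__global__ void scale(float *data, float factor, int n) {
    // Each thread computes its own global index from its block
    // and thread coordinates – this is the only thing that
    // distinguishes one thread from another.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)              // guard: thread count may exceed n
        data[i] *= factor;  // each thread touches one element
}

int main() {
    const int n = 1 << 20;
    float *d;                              // global GPU memory
    cudaMalloc(&d, n * sizeof(float));
    // Launch roughly n threads (256 per block): the
    // "massively parallel" part of the model.
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
```

Note that the kernel contains no loop over elements: parallelism comes from launching many threads, not from iterating within one.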