Particle-Based Simulations on Multi-GPU Systems (EnSight 2010, Metariver)

2013-11-20


Page Count: 36

Particle-Based Analysis Case Studies
Using Multi-GPU Systems
Particle-based Simulations
on multi-GPU Systems
2010-10
coahn@metariver.kr
www.metariver.kr
GPU & CUDA Technology
What is GPU?
[Figure: GPU application areas]
Entertainment: Games, Movie, CG, VR, ...
High Performance Computing: Engineering, Medicine, Science, Finance, Biology, ...
GPU & CUDA Technology
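• A GPU (Graphics Processing Unit) is the dedicated processing unit of a graphics card, used to render complex 3D images to the screen very quickly.
• CUDA (Compute Unified Device Architecture) is a technology that enables high-speed computation on the GPU. It takes the fast shader languages written for graphics workloads and makes them usable for scientific and engineering computation.
• CUDA uses the hundreds of cores in a GPU simultaneously, so a very large number of threads can be processed at high speed; these cores can all access a common GPU memory.
• A single compute-dedicated GPU delivers far faster computation at lower cost than conventional supercomputing (parallel processing), in which many CPUs are connected by a high-speed network to share a large-scale computation.
• CUDA is NVIDIA's further extension of GPGPU (General-Purpose GPU) technology, which opened shader languages to general-purpose use. Since 2008, researchers in the United States, Europe, and other leading countries have continued to report successful applications.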
Parallel Computing (HPC)
Multiple Instruction Multiple Data
(conventional parallel processing)
[Figure: Domain decomposition splits the data (problem) across eight nodes, node0/proc.0 through node7/proc.7, which together produce the solution.]
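As a rough illustration of domain decomposition across eight processes, the following minimal sketch splits a range of work items into contiguous, near-equal blocks, one per process. The function name `decompose` and the item count of 1003 are hypothetical and chosen only for the example; the slides do not say how the problem is actually partitioned.

```python
def decompose(n_items, n_procs):
    """Split range(n_items) into n_procs contiguous, near-equal blocks.

    Returns a list of (start, stop) index pairs, one per process rank.
    Hypothetical helper for illustration only.
    """
    base, extra = divmod(n_items, n_procs)
    blocks = []
    start = 0
    for rank in range(n_procs):
        # The first `extra` ranks take one additional item each.
        size = base + (1 if rank < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


# Eight processes (proc.0 .. proc.7), as in the figure; 1003 items is arbitrary.
blocks = decompose(1003, 8)
for rank, (lo, hi) in enumerate(blocks):
    print(f"proc.{rank}: items {lo}..{hi - 1}")
```

Each process then works only on its own block, and the partial results are combined into the solution.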
GPU Computing (CUDA)
Single Instruction Multiple Data
(CUDA technology)
• GPU is a data-parallel processor
• Thousands of parallel threads / thousands of data elements to process
• GeForce 8800 has 128 streaming processor cores and 512 MB RAM
• Tesla C1060 has 240 streaming processor cores and 4 GB RAM
[Figure: the data (problem) is mapped onto threads Thread(0,0), Thread(1,0), ..., Thread(5,5), which run on the cores of the GPU.]
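The single-instruction, multiple-data idea can be sketched in plain Python: every thread runs the same function body, and its 2D index selects which data element it handles. This is not CUDA code. The names `kernel` and `launch`, the 6x6 grid, and the doubling operation are assumptions made for illustration, and the nested loop merely stands in for threads that a GPU would run concurrently.

```python
def kernel(thread_x, thread_y, width, src, dst):
    """Body executed by every thread: same instruction, different data element."""
    i = thread_y * width + thread_x  # flatten the 2D thread index
    dst[i] = 2.0 * src[i]


def launch(width, height, src):
    """Run `kernel` once per (x, y) thread index and return the output array."""
    dst = [0.0] * (width * height)
    # On a GPU these iterations would execute in parallel across the cores;
    # a sequential loop is used here only to show the index mapping.
    for ty in range(height):
        for tx in range(width):
            kernel(tx, ty, width, src, dst)
    return dst


# A 6x6 grid of threads, Thread(0,0) .. Thread(5,5), one per data element.
src = [float(i) for i in range(36)]
out = launch(6, 6, src)
```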
GPU Architecture
[Figure © NVIDIA Corporation]
Inside CUDA kernel
[Figure © NVIDIA Corporation]
GPU vs. CPU
Wetting Simulation using SPH (Smoothed Particle Hydrodynamics)
- W, D : 50 mm
- H : 100 mm
- Dpore : 1.5 mm
- N : 83,300 EA
[Chart: Performance [kEA/s]; bars labeled (x0.6), (x1.0), (x2.1), (x2.5), (x11.8), (x33.6), (x58.7)]
GPU vs. HPC
Lid-driven cavity flow, Lattice Boltzmann Method (CFD)
GPU (C870) 1EA vs. HPC (Opteron252)
[Chart: Speed-up s(n) against mesh size]
Mesh size : 128x128  256x256  512x512  1024x1024
            16       16       32       128
Speed-up : s(n) = Ts / Tp
Ts : wall-clock time (WCT) of serial code [sec.]
Tp : wall-clock time (WCT) of parallel code [sec.]
n : the number of processors
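The speed-up definition s(n) = Ts / Tp can be written out as a small function. The timings below are hypothetical values chosen for the example, not measurements from the slides.

```python
def speedup(t_serial, t_parallel):
    """Speed-up s(n) = Ts / Tp, with both wall-clock times in seconds."""
    if t_parallel <= 0:
        raise ValueError("parallel wall-clock time must be positive")
    return t_serial / t_parallel


# Hypothetical timings: a 120 s serial run and a 7.5 s parallel run.
s = speedup(120.0, 7.5)
print(f"s(n) = {s:.1f}")
```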
HPC vs. multi-GPU
[Figure labels: HPC (Parallel Processing); multi-GPU; MPI (Message Passing Interface); Fast Algorithm + CUDA (shown four times)]
Inside multi-GPU system
[Figure: Domain Decomposition assigns each sub domain to a GPU; labeled steps: Exchange 1, Exchange 2, Fast Algorithm, Interaction.]
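The slides do not detail the two exchange steps. As one possible reading, the sketch below assumes the domain is cut into equal slabs along x, one per GPU, and that an exchange shares boundary ("ghost") particles lying within an interaction cutoff of a neighbouring slab. The function names, the 1D slab layout, and the cutoff value are all assumptions made for illustration.

```python
def slab_index(x, x_min, width, n_gpus):
    """Index of the slab (sub domain) that contains coordinate x."""
    return min(int((x - x_min) / width), n_gpus - 1)


def assign_subdomains(xs, x_min, x_max, n_gpus):
    """Per GPU, the indices of the particles it owns."""
    width = (x_max - x_min) / n_gpus
    owned = [[] for _ in range(n_gpus)]
    for i, x in enumerate(xs):
        owned[slab_index(x, x_min, width, n_gpus)].append(i)
    return owned


def ghost_particles(xs, x_min, x_max, n_gpus, cutoff):
    """Per GPU, indices of neighbour-owned particles within `cutoff` of its slab."""
    width = (x_max - x_min) / n_gpus
    ghosts = [[] for _ in range(n_gpus)]
    for i, x in enumerate(xs):
        g = slab_index(x, x_min, width, n_gpus)
        lo = x_min + g * width
        hi = lo + width
        if g > 0 and x - lo < cutoff:
            ghosts[g - 1].append(i)  # close to the left boundary
        if g < n_gpus - 1 and hi - x < cutoff:
            ghosts[g + 1].append(i)  # close to the right boundary
    return ghosts


# Five particles on the interval [0, 3], three GPUs, cutoff 0.1 (all hypothetical).
xs = [0.1, 0.95, 1.05, 1.5, 2.9]
owned = assign_subdomains(xs, 0.0, 3.0, 3)
ghosts = ghost_particles(xs, 0.0, 3.0, 3, 0.1)
```

With ghost copies in place, each GPU can evaluate interactions for its own particles without reading another GPU's memory during the interaction step.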
SAMADII: Particle-based multi-physics solver
SAMADII
Particle-based multi-physics solver
H/W acceleration using multi-GPU (GPU cluster)
S/W acceleration (Fast algorithm)
Discrete Element Method (DEM)
Magnetic particle, charged particle simulation
Wetting simulation
Smoothed-Particle Hydrodynamics (SPH) *
Fluid-Solid Interaction (FSI) *
Deformable body simulation *
* under development
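The slides do not say which fast algorithm SAMADII uses. A common software acceleration in particle methods such as DEM is a uniform-grid cell list, which restricts the pair search to particles in adjacent cells instead of testing every pair. The 2D sketch below is offered under that assumption only; the function names and sample positions are hypothetical.

```python
from collections import defaultdict
from itertools import product


def build_cells(positions, cell_size):
    """Bin particle indices into square cells of side `cell_size`."""
    cells = defaultdict(list)
    for i, (x, y) in enumerate(positions):
        cells[(int(x // cell_size), int(y // cell_size))].append(i)
    return cells


def neighbor_pairs(positions, cutoff):
    """Return all index pairs (i, j), i < j, closer than or equal to `cutoff`."""
    cells = build_cells(positions, cutoff)
    pairs = set()
    for (cx, cy), members in list(cells.items()):
        # Only the cell itself and its 8 neighbours can hold particles in range.
        for dx, dy in product((-1, 0, 1), repeat=2):
            for j in cells.get((cx + dx, cy + dy), ()):
                for i in members:
                    if i < j:
                        xi, yi = positions[i]
                        xj, yj = positions[j]
                        if (xi - xj) ** 2 + (yi - yj) ** 2 <= cutoff ** 2:
                            pairs.add((i, j))
    return pairs


# Four hypothetical particles forming two close pairs far apart from each other.
positions = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0), (3.2, 3.1)]
pairs = neighbor_pairs(positions, 1.0)
```

For roughly uniform particle densities this keeps the search cost close to linear in the number of particles, rather than quadratic.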
[Figure: SAMADII workflow, CAD DATA -> MESH -> PRE -> SOLVER. A 3D CAD assembly (Body1, Body2) is converted to a surface mesh. In pre-processing, surfaces are selected (e.g. Inlet, Outlet) for motion definition (Rotation) and boundary conditions, and particles are created and filtered. The particle data and surface mesh are passed to SAMADII.]
SAMADII - GUI
Mesh Converter
GPU Configuration
Material Property Setup
Particle Creator
Particle Filter
Mesh Assembler
- Graphics Engine : HyperCube4 (based on OpenGL & .NET, self-developed)
- Specially designed for high-speed rendering
Example-1 : Excavator
Np: 80,000 EA
Ne: 5,653 EA
500,000 steps
Multi-CPU (HPC 16 core)
2010-02
Example-2 : Agitator
Np: 30,000 EA
Ne: 48,602 EA
120,000 steps
Multi-GPU (Tesla C1060 x2)
2010-07
Example-3 : Blast Furnace
Np: 640,000 EA
Ne: 64,314 EA
250,000 steps
Multi-GPU (Tesla C1060 x2)
2010-07
Visualization: HyperCube4 & EnSight
Sloshing
Np: 248,845 EA
Ne: 12,580 EA
1,000,000 steps
Tesla C2050
< Only Vector >
< Vector & Transparent Particle>
2-R Auger
Np: 219,391 EA
Ne: 45,282 EA
700,000 steps
Multi-GPU (Tesla C1060 x2)
D=0.001
Rotating Drum
Np: 124,266 EA
Ne: 6,048 EA
700,000 steps
Tesla C2050
D=0.01
< Only Vector >
< Vector & Transparent Particle>
