Table of Contents
High Performance Computing and Best Practices
Background
HPC Vs. Single Processors
Serial Computing - Instructions run one at a time, in order, on a single processor
Parallel Computing - The problem is split into smaller sub-problems that can run concurrently (at the same time) on multiple processors
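The difference between the two can be sketched with a small sum-of-squares problem. This is only an illustration: the function names are made up for this example, and worker threads stand in for processors. For CPU-bound Python code, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface and uses separate processes instead.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    """The work done on one sub-problem."""
    return sum(x * x for x in chunk)


def serial_sum_of_squares(numbers):
    """Serial version: a single worker handles every element in order."""
    return sum_of_squares(numbers)


def split(numbers, n_chunks):
    """Split the input into n_chunks contiguous, nearly equal pieces."""
    size, extra = divmod(len(numbers), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(numbers[start:end])
        start = end
    return chunks


def parallel_sum_of_squares(numbers, n_workers=4):
    """Parallel version: each chunk is an independent sub-problem."""
    chunks = split(numbers, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    # Combining the partial results is the part that stays serial.
    return sum(partial_sums)


if __name__ == "__main__":
    data = list(range(1000))
    print(serial_sum_of_squares(data))
    print(parallel_sum_of_squares(data, n_workers=4))
```

Both versions return the same answer; the parallel one only pays off when each chunk carries enough work to outweigh the cost of splitting and combining.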
Need for HPC
Time - Parallel computing can solve problems faster
Cost - Parallel systems can be built from inexpensive commodity hardware, and the time saved can translate into cost savings
Larger Problems - Complex problems such as climate modeling, traffic simulation, high-volume website transactions, and plasma physics can be infeasible with serial computing and are better suited to parallel computing
Concepts and Terminology
HPC - High Performance Computing, solving big problems with big computing power
Cluster - A group of computers working together
Node - A single computer; many nodes join together to form a cluster
Core/CPU - A processing unit
Job - A problem or program to be run
Parallel Overhead - Extra time needed to set up and coordinate a parallel job: synchronization, data exchange, start-up/termination, etc.
Scalability - Ability of a system to handle more work or its ability to be enlarged to accommodate more work
Limits and Cost
$\text{Speedup} = \frac{1}{(P/N)+S}$
Amdahl's law: P = parallel fraction of the program, N = number of processors, S = serial fraction (S = 1 - P)
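To get a feel for the limit this formula imposes, here is a minimal sketch that evaluates it for a few processor counts. The function name and the 90% parallel fraction are illustrative assumptions, and S is taken to be 1 - P.

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Speedup = 1 / (P/N + S), with S = 1 - P (assumed)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (parallel_fraction / n_processors + serial_fraction)


if __name__ == "__main__":
    # Hypothetical program that is 90% parallelizable.
    for n in (1, 2, 4, 16, 1024):
        print(f"N = {n:5d}  speedup = {amdahl_speedup(0.9, n):.2f}")
```

With P = 0.9, 16 processors give a speedup of 6.4, and no number of processors can exceed 1/S = 10. The serial fraction sets the ceiling.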
Development cost of parallel programming is higher.
Memory Architectures
Batch Processing
Batch Processing and Shared Resources
The need for shared resources
Our Cluster
- 16 cores on two nodes
NERSC
- National Energy Research Scientific Computing Center
- Primary scientific computing facility for the Office of Science in the U.S. Department of Energy
- Multiple machines with over 100,000 cores
Sunway TaihuLight
- in China, ranked first on the TOP500 list when it debuted in June 2016, with 10,649,600 cores
Check www.top500.org for the current TOP500 list of supercomputers
Batch Processing Management
With multiple users wishing to access computing resources, we need a way to manage who gets to use what resources and when they can do so.
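As a rough illustration of what such a manager does, the sketch below simulates a strict first-come, first-served queue on a fixed pool of cores. Everything here is hypothetical (the function name, the job tuples, the 16-core pool); real resource managers typically add priorities, fair-share policies, and backfilling on top of this basic idea.

```python
import heapq
from collections import deque


def fifo_schedule(jobs, total_cores):
    """Simulate first-come, first-served scheduling.

    jobs: list of (name, cores_needed, runtime) in submission order.
    Returns a dict mapping each job name to its start time.
    """
    queue = deque(jobs)
    running = []  # min-heap of (end_time, cores) for jobs holding cores
    free_cores = total_cores
    clock = 0
    start_times = {}
    while queue:
        name, cores, runtime = queue[0]
        if cores > total_cores:
            raise ValueError(f"{name} requests more cores than exist")
        if cores <= free_cores:
            # Enough cores are free: start the job at the head of the queue.
            queue.popleft()
            start_times[name] = clock
            heapq.heappush(running, (clock + runtime, cores))
            free_cores -= cores
        else:
            # Otherwise wait until the next running job finishes.
            end_time, released = heapq.heappop(running)
            clock = end_time
            free_cores += released
    return start_times


if __name__ == "__main__":
    submitted = [("alice", 8, 10), ("bob", 8, 5), ("carol", 16, 3), ("dave", 4, 2)]
    print(fifo_schedule(submitted, total_cores=16))
```

In this run the small `dave` job waits behind `carol` even though cores sit idle from time 5 onward, which is the kind of waste that smarter scheduling policies try to avoid.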
Some Resource Managers:
Using HPC Resources at WSI
Accessing Resources via SSH
In the office:
$ ssh user@control
Launching and Managing Your Jobs
Version Control
Multiple programmers often work on a codebase at the same time. If Alice and Bob both change the same file without telling each other, one person's changes might get lost. A version control system lets Alice and Bob work on the code together without undoing each other's work or breaking the code.
Commonly used Version Control Systems
Git (often hosted on GitHub), SVN
HPC Best Practices
- Redundant Data
- Separate Storage from Compute Nodes
- Central Source for User Documentation