High Performance Computing and Best Practices

Background

HPC Vs. Single Processors

Serial Computing - Instructions run one at a time, in sequence, on a single processor

Parallel Computing - A problem is split into smaller sub-problems that can run concurrently (at the same time) on multiple processors
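The difference can be sketched in Python. This is a minimal illustration, not a benchmark. It assumes a hypothetical `work` function that stands in for an expensive, independent computation. The serial version handles every item in turn. The parallel version splits the range into chunks, runs each chunk in a separate process using the standard library's `concurrent.futures.ProcessPoolExecutor`, and then adds up the partial results.

```python
from concurrent.futures import ProcessPoolExecutor


def work(x: int) -> int:
    """Hypothetical stand-in for an expensive, independent unit of work."""
    return x * x


def serial_total(n: int) -> int:
    # Serial computing: a single instruction stream processes every item in turn.
    return sum(work(x) for x in range(n))


def split(n: int, parts: int) -> list[range]:
    # Decompose [0, n) into `parts` contiguous, non-overlapping chunks.
    size, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        stop = start + size + (1 if i < extra else 0)
        chunks.append(range(start, stop))
        start = stop
    return chunks


def chunk_total(chunk: range) -> int:
    # Each chunk is an independent sub-problem that needs no data from the others.
    return sum(work(x) for x in chunk)


def parallel_total(n: int, workers: int = 4) -> int:
    # Parallel computing: sub-problems run concurrently in separate processes,
    # then the partial results are combined into one answer.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(chunk_total, split(n, workers)))


if __name__ == "__main__":
    n = 200_000
    assert serial_total(n) == parallel_total(n)
    print("serial and parallel results match")
```

The `if __name__ == "__main__":` guard is needed because, on some platforms, worker processes re-import the main module. For work this small, the cost of starting processes can outweigh the gain, which is the parallel overhead described below.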

Need for HPC

Time - Parallel computing can solve problems faster

Cost - Parallel computing can be done with inexpensive commodity hardware, and the time it saves can lead to cost savings

Larger Problems - Complex problems, such as climate modeling, traffic simulation, processing large volumes of website transactions, and plasma physics, can be infeasible with serial computing and are better suited to parallel computing

Concepts and Terminology

HPC - High Performance Computing, solving big problems with big computing power

Cluster - A group of computers working together

Node - A single computer; many nodes joined together form a cluster

Core/CPU - A processing unit

Job - A problem or program to be run

Parallel Overhead - Extra time needed to set up and coordinate a parallel job: synchronization, data exchange, start-up/termination, etc.

Scalability - Ability of a system to handle more work, or to be enlarged to accommodate more work

Limits and Cost

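$\text{Speedup} = \frac{1}{\frac{P}{N} + S}$

This is Amdahl's law, where P = parallel fraction, N = number of processors, and S = serial fraction (S = 1 - P). As N grows, P/N shrinks toward zero and the speedup approaches 1/S, so the serial part of a program limits how much adding processors can help.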
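As a worked example, the formula can be evaluated directly. This is a small sketch; the parallel fraction of 0.95 is an assumed value for illustration. With P = 0.95 on 16 processors, the speedup is 1 / (0.95/16 + 0.05) ≈ 9.1 rather than 16. No number of processors can push it past 1/S = 20. The sketch also reports parallel efficiency (speedup divided by N), which is one common way to measure how well a job scales.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's law: speedup on n processors when a fraction p of the work is parallel."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    s = 1.0 - p  # serial fraction
    return 1.0 / (p / n + s)


def efficiency(p: float, n: int) -> float:
    # Fraction of the ideal n-fold speedup that is actually achieved.
    return amdahl_speedup(p, n) / n


if __name__ == "__main__":
    for n in (1, 2, 8, 16, 1024):
        print(f"N={n:5d}  speedup={amdahl_speedup(0.95, n):6.2f}  "
              f"efficiency={efficiency(0.95, n):.2f}")
```

Even with 1024 processors, the 5% serial fraction holds the speedup just under 20. At that point, efficiency falls to about 2%.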

Parallel programs also cost more to develop, because they are harder to design, write, and debug than serial programs.

Memory Architectures

Batch Processing

Batch Processing and Shared Resources

The need for shared resources

Our Cluster

  • 16 cores on two nodes

NERSC

  • National Energy Research Scientific Computing Center
  • Primary scientific computing facility for the Office of Science in the U.S. Department of Energy
  • Multiple machines with over 100,000 cores

Sunway TaihuLight

  • Located in China; the top-ranked supercomputer on the TOP500 list at the time of writing, with 10,649,600 cores

Check www.top500.org for the current list of the 500 most powerful supercomputers (the TOP500)

Batch Processing Management

With multiple users wishing to access computing resources, we need a way to manage who gets to use what resources and when they can do so.
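To make the idea concrete, here is a toy sketch of how a resource manager might decide when each queued job starts. It is deliberately simplified and makes several assumptions. Each job is described only by a core count and a runtime, the `Job` type and `fifo_schedule` function are hypothetical, and jobs start in strict first-come, first-served order. Production resource managers also weigh priorities, fair-share policies, and backfilling.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int    # number of cores the job requests
    runtime: int  # time units the job runs once started


def fifo_schedule(jobs: list[Job], total_cores: int) -> dict[str, int]:
    """Return each job's start time under strict first-come, first-served ordering."""
    queue = deque(jobs)
    running: list[tuple[int, int]] = []  # (finish_time, cores) for each running job
    free, now = total_cores, 0
    starts: dict[str, int] = {}
    while queue:
        job = queue[0]
        if job.cores > total_cores:
            raise ValueError(f"{job.name} requests more cores than the cluster has")
        if job.cores <= free:
            # The job at the head of the queue fits: start it and reserve its cores.
            queue.popleft()
            starts[job.name] = now
            running.append((now + job.runtime, job.cores))
            free -= job.cores
        else:
            # Otherwise advance time to the earliest finishing job and release its cores.
            running.sort()
            finish, cores = running.pop(0)
            now, free = finish, free + cores
    return starts


if __name__ == "__main__":
    # Hypothetical jobs on a 16-core cluster.
    jobs = [Job("A", 8, 10), Job("B", 12, 5), Job("C", 4, 2)]
    print(fifo_schedule(jobs, total_cores=16))  # {'A': 0, 'B': 10, 'C': 10}
```

Job C needs only 4 cores and could have started at time 0 alongside A. Under strict FIFO it waits behind B, leaving 8 cores idle. Avoiding idle cores like this is the purpose of backfilling in real schedulers.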

Some Resource Managers:

Using HPC Resources at WSI

Accessing Resources via SSH

In the office:

$ ssh user@control

Launching and Managing Your Jobs

Version Control

Often multiple programmers work on a codebase simultaneously. Alice may want to work on the code while Bob does too. If they both change a file without telling each other, one person's edits can overwrite the other's and be lost. A version control system lets Alice and Bob work on the code together without undoing each other's work, and it records every change so that a change that breaks the code can be found and reverted.
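The core safeguard can be sketched as a toy repository that refuses a commit based on an out-of-date version, rather than letting it silently overwrite someone else's work. Real systems behave in a broadly similar way: SVN refuses to commit a file that is out of date, and Git rejects a non-fast-forward push by default. The `ToyRepo` class and `StaleCommitError` below are hypothetical and leave out merging entirely.

```python
class StaleCommitError(Exception):
    """Raised when a commit is based on an out-of-date version."""


class ToyRepo:
    """Hypothetical single-file repository that rejects commits made against a stale version."""

    def __init__(self, content: str) -> None:
        self.history = [content]  # every committed version, oldest first

    @property
    def head(self) -> int:
        return len(self.history) - 1

    def checkout(self) -> tuple[int, str]:
        # Hand out the current version number together with the content.
        return self.head, self.history[-1]

    def commit(self, base: int, content: str) -> int:
        # If someone committed after `base` was checked out, refuse instead of overwriting.
        if base != self.head:
            raise StaleCommitError(
                f"based on v{base}, but head is v{self.head}; update and merge first")
        self.history.append(content)
        return self.head


if __name__ == "__main__":
    repo = ToyRepo("x = 1\n")
    alice_base, alice_text = repo.checkout()
    bob_base, bob_text = repo.checkout()
    repo.commit(alice_base, alice_text + "y = 2\n")  # Alice commits first
    try:
        repo.commit(bob_base, bob_text + "z = 3\n")  # Bob's stale commit is refused
    except StaleCommitError as err:
        print("rejected:", err)
        bob_base, bob_text = repo.checkout()  # Bob updates, then reapplies his change
        repo.commit(bob_base, bob_text + "z = 3\n")
    print(repo.history[-1])
```

Without the version check, Bob's commit would replace the file with his copy, and Alice's `y = 2` line would be lost.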

Commonly used Version Control Systems

  • Git (often hosted on services such as GitHub)
  • SVN (Apache Subversion)

HPC Best Practices

  • Redundant Data
  • Separate Storage from Compute Nodes
  • Central Source for User Documentation
Last modified: 2022/07/21 06:59