====== High Performance Computing and Best Practices ======
===== Background =====
==== HPC Vs. Single Processors ====
Serial Computing - Instructions run one at a time:
{{:images:serialproblem.gif?400|}}

Parallel Computing - The problem is split into smaller sub-problems, each of which can run concurrently (at the same time) on a separate processor:
{{:images:parallelproblem.gif?500|}}
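
As a rough illustration of the split described above, the sketch below computes a sum of squares serially and then again by dividing the data into chunks that are handed to a pool of workers. The function names (''chunk'', ''serial_sum_of_squares'', ''parallel_sum_of_squares'') are made up for this example. A thread pool keeps the sketch short; in CPython, CPU-bound work only runs on multiple cores at once with a process pool or a similar mechanism, so treat this as a picture of the decomposition rather than a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(data, parts):
    """Split data into `parts` contiguous pieces of near-equal size."""
    size, extra = divmod(len(data), parts)
    pieces, start = [], 0
    for i in range(parts):
        # The first `extra` pieces get one additional element.
        end = start + size + (1 if i < extra else 0)
        pieces.append(data[start:end])
        start = end
    return pieces


def serial_sum_of_squares(data):
    """Serial version: one instruction stream walks the whole list."""
    total = 0
    for x in data:
        total += x * x
    return total


def parallel_sum_of_squares(data, workers=4):
    """Parallel version: each worker sums one chunk, then results are combined."""
    pieces = chunk(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(serial_sum_of_squares, pieces))
    # Combining the partial results is the (small) serial part of the job.
    return sum(partial_sums)


data = list(range(1000))
serial_result = serial_sum_of_squares(data)
parallel_result = parallel_sum_of_squares(data, workers=4)
print(serial_result == parallel_result)
```

Both versions give the same answer; the parallel one only pays off when each chunk carries enough work to outweigh the cost of starting and coordinating the workers.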


==== Need for HPC ====
//Time// - Parallel computing can solve problems faster

//Cost// - Parallel computing can be accomplished with cheap hardware and the time savings can lead to cost savings

//Larger Problems// - Complex problems such as climate modeling, traffic simulation, high-volume website transactions, and plasma physics can be infeasible with serial computing and are better suited to parallel computing

==== Concepts and Terminology ====

//HPC// - High Performance Computing, solving big problems with big computing power

//Cluster// - A group of computers working together

//Node// - A single computer; many nodes join together to form a cluster

//Core/CPU// - A processing unit

//Job// - A problem or program to be run

//Parallel Overhead// - Extra time needed to set up and coordinate a parallel job: synchronization, data exchange, start-up/termination, etc.

//Scalability// - Ability of a system to handle a growing amount of work, or to be enlarged to accommodate that growth


==== Limits and Cost ====

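Amdahl's law gives the maximum speedup a program can achieve on N processors:

$Speedup = \frac{1}{(P/N)+S}$

P = parallel fraction of the program, N = number of processors, S = serial fraction (S = 1 - P)

However many processors are added, the speedup can never exceed 1/S.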

{{:images:amdahl2.gif|}}
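
The speedup formula above can be evaluated directly. The following is a minimal sketch; the function name ''amdahl_speedup'' and the 90% parallel fraction are assumptions chosen only for illustration.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup = 1 / (P/N + S), with S = 1 - P."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (parallel_fraction / processors + serial_fraction)


# Assumed example: a program that is 90% parallel (P = 0.9, S = 0.1).
for n in (1, 2, 16, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

With a serial fraction of 0.1, the printed values approach but never reach 10, which is the 1/S ceiling: past a certain point, extra processors add cost without adding much speed.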


Parallel programs also cost more to develop than serial ones, because the code must handle synchronization and data exchange between processors.


==== Memory Architectures ====


===== Batch Processing =====
==== Batch Processing and Shared Resources ====
The need for shared resources

//Our Cluster//
  * 16 cores on two nodes

//NERSC//
  * National Energy Research Scientific Computing Center
  * Primary scientific computing facility for the Office of Science in the U.S. Department of Energy
  * Multiple machines with over 100,000 cores
{{:images:nersc.jpg?300|}}

//Sunway TaihuLight//
  * Located in China; ranked first on the TOP500 list at the time of writing, with 10,649,600 cores
{{::sunway-supercomputer-6.jpg?200|}}

Check [[https://www.top500.org/|www.top500.org]] for the current TOP500 list of supercomputers.
==== Batch Processing Management ====
With multiple users wishing to access computing resources, we need a way to manage who gets to use what resources and when they can do so.

Some Resource Managers:
  * [[http://www.adaptivecomputing.com/products/open-source/torque/|Torque]]
  * [[https://computing.llnl.gov/linux/slurm/slurm.html|Slurm]]
  * [[http://www.adaptivecomputing.com/products/hpc-products/moab-hpc-suite-enterprise-edition/|Moab]]
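
To give a feel for how a user asks a resource manager for resources, the sketch below builds the text of a minimal Slurm batch script. The ''#SBATCH'' options shown (''--job-name'', ''--nodes'', ''--ntasks-per-node'', ''--time'', ''--output'') and ''srun'' are standard Slurm; the helper ''make_slurm_script'', the program ''./my_program'', and the 2 x 8 layout are assumptions for illustration, not settings taken from any particular cluster.

```python
def make_slurm_script(job_name, nodes, tasks_per_node, walltime, command):
    """Return the text of a minimal Slurm batch script (illustrative only)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={walltime}",
        # %x expands to the job name and %j to the job ID.
        "#SBATCH --output=%x-%j.out",
        "",
        f"srun {command}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical request: 2 nodes with 8 tasks each, for at most 10 minutes.
script = make_slurm_script("demo", 2, 8, "00:10:00", "./my_program")
print(script)
```

The resulting text would typically be saved to a file and submitted with ''sbatch''; the resource manager then holds the job in a queue until the requested nodes are free.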
===== Using HPC Resources at WSI =====
==== Accessing Resources via SSH ====

In the office:
  $ ssh user@control
  
==== Launching and Managing Your Jobs ====
[[torque_tutorial|Using Torque (NERSC)]]

[[slurm_tutorial|Using Slurm]]
==== Version Control ====
Often multiple programmers work on a codebase simultaneously. Alice may want to work on the code at the same time as Bob. If they both change the same file without telling each other, one person's changes might get lost. A version control system lets Alice and Bob work on the code together without undoing each other's work or breaking the code.

=== Commonly used Version Control Systems ===
  * Git (often hosted on GitHub)
  * SVN (Subversion)
===== HPC Best Practices =====

  * Redundant Data
  * Separate Storage from Compute Nodes
  * Central Source for User Documentation