
Commit 9208719

Author: Trevor Keller
Merge branch 'snippet-library' into nersc
Merge branch 'snippet-library' into nersc
2 parents ac9b0bc + 34b46ad commit 9208719

File tree

7 files changed: +122 −60 lines


config.yaml

Lines changed: 4 additions & 1 deletion

@@ -65,7 +65,7 @@ contact: '[email protected]'
 
 # Order of episodes in your lesson
 episodes:
-- 10-hpc-intro.md
+- 10-hpc-intro.Rmd
 - 11-connecting.Rmd
 - 12-cluster.Rmd
 - 13-scheduler.Rmd
@@ -78,12 +78,15 @@ episodes:
 
 # Information for Learners
 learners:
+- setup.md
 
 # Information for Instructors
 instructors:
+- instructor-notes.Rmd
 
 # Learner Profiles
 profiles:
+- learner-profiles.md
 
 # Customisation ---------------------------------------------
 #
Lines changed: 25 additions & 29 deletions

@@ -22,15 +22,15 @@ Frequently, research problems that use computing can outgrow the capabilities
 of the desktop or laptop computer where they started:
 
 - A statistics student wants to cross-validate a model. This involves running
-  the model 1000 times -- but each run takes an hour. Running the model on
+  the model 1000 times but each run takes an hour. Running the model on
   a laptop will take over a month! In this research problem, final results are
   calculated after all 1000 models have run, but typically only one model is
   run at a time (in **serial**) on the laptop. Since each of the 1000 runs is
   independent of all others, and given enough computers, it's theoretically
   possible to run them all at once (in **parallel**).
 - A genomics researcher has been using small datasets of sequence data, but
   soon will be receiving a new type of sequencing data that is 10 times as
-  large. It's already challenging to open the datasets on a computer --
+  large. It's already challenging to open the datasets on a computer
   analyzing these larger datasets will probably crash it. In this research
   problem, the calculations required might be impossible to parallelize, but a
   computer with **more memory** would be required to analyze the much larger
@@ -54,7 +54,7 @@ problems in parallel**.
 
 ## Jargon Busting Presentation
 
-Open the [HPC Jargon Buster](../files/jargon#p1)
+Open the [HPC Jargon Buster](files/jargon.html#p1)
 in a new tab. To present the content, press `C` to open a **c**lone in a
 separate window, then press `P` to toggle **p**resentation mode.
 
@@ -71,48 +71,44 @@ results.
 ## Some Ideas
 
 - Checking email: your computer (possibly in your pocket) contacts a remote
-  machine, authenticates, and downloads a list of new messages; it also
-  uploads changes to message status, such as whether you read, marked as
-  junk, or deleted the message. Since yours is not the only account, the
-  mail server is probably one of many in a data center.
-- Searching for a phrase online involves comparing your search term against
-  a massive database of all known sites, looking for matches. This "query"
+  machine, authenticates, and downloads a list of new messages; it also uploads
+  changes to message status, such as whether you read, marked as junk, or
+  deleted the message. Since yours is not the only account, the mail server is
+  probably one of many in a data center.
+- Searching for a phrase online involves comparing your search term against a
+  massive database of all known sites, looking for matches. This "query"
   operation can be straightforward, but building that database is a
   [monumental task][mapreduce]! Servers are involved at every step.
-- Searching for directions on a mapping website involves connecting your
-  (A) starting and (B) end points by [traversing a graph][dijkstra] in
-  search of the "shortest" path by distance, time, expense, or another
-  metric. Converting a map into the right form is relatively simple, but
-  calculating all the possible routes between A and B is expensive.
+- Searching for directions on a mapping website involves connecting your (A)
+  starting and (B) end points by [traversing a graph][dijkstra] in search of
+  the "shortest" path by distance, time, expense, or another metric. Converting
+  a map into the right form is relatively simple, but calculating all the
+  possible routes between A and B is expensive.
 
 Checking email could be serial: your machine connects to one server and
 exchanges data. Searching by querying the database for your search term (or
-endpoints) could also be serial, in that one machine receives your query
-and returns the result. However, assembling and storing the full database
-is far beyond the capability of any one machine. Therefore, these functions
-are served in parallel by a large, ["hyperscale"][hyperscale] collection of
-servers working together.
-
-
+endpoints) could also be serial, in that one machine receives your query and
+returns the result. However, assembling and storing the full database is far
+beyond the capability of any one machine. Therefore, these functions are served
+in parallel by a large, ["hyperscale"][hyperscale] collection of servers
+working together.
 
 :::::::::::::::::::::::::
 
 ::::::::::::::::::::::::::::::::::::::::::::::::::
 
-
-
 [mapreduce]: https://en.wikipedia.org/wiki/MapReduce
 [dijkstra]: https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm
 [hyperscale]: https://en.wikipedia.org/wiki/Hyperscale_computing
 
-
 :::::::::::::::::::::::::::::::::::::::: keypoints
 
-- High Performance Computing (HPC) typically involves connecting to very large computing systems elsewhere in the world.
-- These other systems can be used to do work that would either be impossible or much slower on smaller systems.
+- High Performance Computing (HPC) typically involves connecting to very large
+  computing systems elsewhere in the world.
+- These other systems can be used to do work that would either be impossible or
+  much slower on smaller systems.
 - HPC resources are shared by multiple users.
-- The standard method of interacting with such systems is via a command line interface.
+- The standard method of interacting with such systems is via a command line
+  interface.
 
 ::::::::::::::::::::::::::::::::::::::::::::::::::
-
-
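The statistics example in this episode (1000 independent runs, done one at a time in serial or all at once in parallel) can be sketched in a few lines of Python. This is an illustrative sketch only; `run_model`, the task count, and the thread-pool workers are hypothetical stand-ins, not part of the lesson:

```python
import concurrent.futures
import time

def run_model(seed):
    # Hypothetical stand-in for one independent model run
    # (a real cross-validation run might take an hour).
    time.sleep(0.01)
    return seed * 2

# Serial: one run at a time; total time grows as N * t_run.
serial_results = [run_model(s) for s in range(10)]

# Parallel: the runs are independent, so they can be dispatched to a
# pool of workers. On a cluster each run could be a separate job on its
# own node; threads merely stand in for those workers here.
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
    parallel_results = list(pool.map(run_model, range(10)))

assert parallel_results == serial_results
```

Because no run depends on another's output, the parallel version needs no coordination beyond collecting results, which is exactly what makes this class of problem a good fit for a cluster.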

episodes/11-connecting.Rmd

Lines changed: 11 additions & 9 deletions

@@ -38,15 +38,17 @@ results.
 If you have ever opened the Windows Command Prompt or macOS Terminal, you have
 seen a CLI. If you have already taken The Carpentries' courses on the UNIX
 Shell or Version Control, you have used the CLI on your *local machine*
-extensively. The only leap to be made here is to open a CLI on a *remote machine*,
-while taking some precautions so that other folks on the network can't see (or
-change) the commands you're running or the results the remote machine sends
-back. We will use the Secure SHell protocol (or SSH) to open an encrypted
-network connection between two machines, allowing you to send \& receive text
-and data without having to worry about prying eyes.
-
-![](/fig/connect-to-remote.svg){max-width="50%" alt="Connect to cluster"}
-
+extensively. The only leap to be made here is to open a CLI on a *remote
+machine*, while taking some precautions so that other folks on the network
+can't see (or change) the commands you're running or the results the remote
+machine sends back. We will use the Secure SHell protocol (or SSH) to open an
+encrypted network connection between two machines, allowing you to send \&
+receive text and data without having to worry about prying eyes.
+
+![connect-to-remote.svg](/fig/connect-to-remote.svg){
+max-width="50%"
+alt="Connect to cluster. "
+}
 
 SSH clients are usually command-line tools, where you provide the remote
 machine address as the only required argument. If your username on the remote

episodes/13-hpcc-scheduler/hpcc/section2.rmd

Lines changed: 0 additions & 8 deletions
This file was deleted.

episodes/load_config.R

Lines changed: 12 additions & 13 deletions

@@ -1,37 +1,36 @@
+## R script to chain-load lesson configuration YAML files.
+## Top-level configuration is `/episodes/lesson_config.yml`
 
-# Function to merge two lists (with overrides)
-merge_lists <- function(base, override) {
-  modifyList(base, override)
-}
-
-# Load required package
 library(yaml)
-# Load primary configuration
+
+## Load primary configuration
 config <- yaml.load_file("lesson_config.yaml")
-# If 'config' key exists, load the second configuration and merge
+
+## If "config" key exists, load the second configuration and merge
 if (!is.null(config$main_config) && file.exists(config$main_config)) {
   override_config <- yaml.load_file(config$main_config)
-  config <- merge_lists(config, override_config)
+  config <- modifyList(config, override_config)
 }
 
-snippets <- paste('files/snippets/', config$snippets, sep='')
+snippets <- paste("files/snippets/", config$snippets, sep="")
 
 # Extract main and fallback paths from config
-main_snippets <- config$main_snippets
+main_snippets <- config$main_snippets
 fallback_snippets <- config$fallback_snippets
 
 # Function to choose the correct document path (or return NULL if neither exists)
 choose_doc <- function(child_file) {
   # Get the current document name (without extension)
   current_doc <- tools::file_path_sans_ext(knitr::current_input(dir = TRUE))
-
+
   # Build paths for the child document inside subdirectories
   doc_paths <- list(
     main = file.path(current_doc, main_snippets, child_file),
     fallback = file.path(current_doc, fallback_snippets, child_file)
   )
   print(doc_paths)
   print(getwd())
+
   # Return the valid path, or NULL if neither exists
   if (file.exists(doc_paths$main)) {
     print("Returning")
@@ -45,4 +44,4 @@ choose_doc <- function(child_file) {
     print("Returning NULL")
     return(NULL) # Return NULL if neither path exists
   }
-}
+}
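The merge step in load_config.R relies on R's `modifyList()`, which replaces entries of the base list with same-named entries from the override list, recursing into nested lists. A rough Python equivalent, for illustration only (the `merge_config` helper and the sample dictionaries are hypothetical, though the key names follow the lesson's YAML schema):

```python
def merge_config(base, override):
    """Recursively override keys in `base`, similar to R's modifyList()."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them recursively.
            merged[key] = merge_config(merged[key], value)
        else:
            # Otherwise the override value wins outright.
            merged[key] = value
    return merged

defaults = {"snippets": "slurm", "remote": {"host": "head", "user": "userid"}}
site = {"remote": {"host": "login01"}}
config = merge_config(defaults, site)
# config == {"snippets": "slurm", "remote": {"host": "login01", "user": "userid"}}
```

The recursive merge is what lets a small site-specific file override only `remote.host` while inheriting every other default.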

episodes/slurm_defaults.yaml

Lines changed: 70 additions & 0 deletions (new file)

@@ -0,0 +1,70 @@
+# Fail-safe defaults and implicit schema for lesson configuration files
+---
+snippets: slurm
+baseurl: "https://ocaisa.github.io/probable-pancake/"
+# main_config: "lesson_config.yaml"
+
+# about the Learner's laptop
+local:
+  prompt: "[you@laptop:~]$"                    # command-line prompt
+  shebang: "#!/bin/bash"                       # first line of every shell script
+
+# about the remote/cluster environment
+remote:
+  name: "Example Cluster"                      # name of the cluster (proper noun)
+  login: "cluster.example.com"                 # domain name of the login node
+  host: "head"                                 # hostname of the login node
+  node: "node"                                 # hostname of a compute node
+  location: "SchedMD"                          # institutional host of the cluster
+  homedir: "/home"                             # parent of home directories
+  user: "userid"                               # stand-in for the username
+  prompt: "[userid@head:~]"                    # command-line prompt
+  prompt_work: "[userid@head:/work/userid]"    # prompt under /work
+  module_python3: "Python"                     # name of the module providing py3
+  shebang: "#!/bin/bash"                       # first line of every shell script
+
+# Commands & flags for the scheduler environment
+sched:
+  name: "Slurm"                                # proper name of the scheduler
+  command:
+    batch: "sbatch"                            # run later
+    interactive: "srun"                        # run now
+    cancel: "scancel"                          # don't run
+  queue:
+    test: "debug"
+    prod: "batch"
+  status: "squeue"
+  flag:
+    user: "-u userid"
+    interactive: "--pty bash"
+    histdetail: "-l -j"
+    name: "-J"
+    time: "-t"
+    queue: "-p"
+    nodes: "-N"
+    tasks: "-n"
+  del: "scancel"
+  interactive:
+    command: "srun"
+  info:
+    command: "sinfo"
+  comment: "#SBATCH"
+  hist: "sacct -u userid"
+  hist_filter: ""
+  reservation: ""
+  qos: ""
+  budget: ""
+  project: ""
+
+# submit:
+#   salloc: obtain a job allocation
+#   sbatch: submit a batch script for later execution
+#   srun: obtain an allocation and execute an application
+# account:
+#   sacct: display accounting data
+# manage:
+#   sbcast: transfer a file to a job's compute nodes
+#   scancel: signal jobs/steps
+#   squeue: view information about jobs
+#   sinfo: view information about nodes & partitions
+#   scontrol: view & modify state
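A configuration like the `sched` block above lets lesson snippets compose scheduler command lines without hard-coding Slurm syntax. A hedged Python sketch of that expansion (the `submit_command` helper and its arguments are hypothetical; the keys and flag strings follow the YAML above):

```python
# Subset of the sched block from slurm_defaults.yaml, as a Python dict.
sched = {
    "command": {"batch": "sbatch"},
    "flag": {"name": "-J", "time": "-t", "queue": "-p"},
    "queue": {"test": "debug"},
}

def submit_command(sched, job_name, walltime, script):
    # Compose a submit line from the configured command, flags, and queue,
    # e.g. "sbatch -J myjob -t 00:05:00 -p debug job.sh".
    return " ".join([
        sched["command"]["batch"],
        sched["flag"]["name"], job_name,
        sched["flag"]["time"], walltime,
        sched["flag"]["queue"], sched["queue"]["test"],
        script,
    ])

print(submit_command(sched, "myjob", "00:05:00", "job.sh"))
# prints: sbatch -J myjob -t 00:05:00 -p debug job.sh
```

Swapping the YAML for a PBS- or LSF-flavoured file would change only the configured strings, not the snippets that consume them, which appears to be the point of keeping scheduler details in data rather than in the lesson text.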
