
Commit 9208719

Author: Trevor Keller
Merge branch 'snippet-library' into nersc
Merge branch 'snippet-library' into nersc
2 parents ac9b0bc + 34b46ad commit 9208719

File tree

7 files changed: +122 −60 lines


config.yaml

Lines changed: 4 additions & 1 deletion

@@ -65,7 +65,7 @@ contact: '[email protected]'
 
 # Order of episodes in your lesson
 episodes:
-- 10-hpc-intro.md
+- 10-hpc-intro.Rmd
 - 11-connecting.Rmd
 - 12-cluster.Rmd
 - 13-scheduler.Rmd
@@ -78,12 +78,15 @@ episodes:
 
 # Information for Learners
 learners:
+- setup.md
 
 # Information for Instructors
 instructors:
+- instructor-notes.Rmd
 
 # Learner Profiles
 profiles:
+- learner-profiles.md
 
 # Customisation ---------------------------------------------
 #
Lines changed: 25 additions & 29 deletions

@@ -22,15 +22,15 @@ Frequently, research problems that use computing can outgrow the capabilities
 of the desktop or laptop computer where they started:
 
 - A statistics student wants to cross-validate a model. This involves running
-  the model 1000 times -- but each run takes an hour. Running the model on
+  the model 1000 times but each run takes an hour. Running the model on
   a laptop will take over a month! In this research problem, final results are
   calculated after all 1000 models have run, but typically only one model is
   run at a time (in **serial**) on the laptop. Since each of the 1000 runs is
   independent of all others, and given enough computers, it's theoretically
   possible to run them all at once (in **parallel**).
 - A genomics researcher has been using small datasets of sequence data, but
   soon will be receiving a new type of sequencing data that is 10 times as
-  large. It's already challenging to open the datasets on a computer --
+  large. It's already challenging to open the datasets on a computer
   analyzing these larger datasets will probably crash it. In this research
   problem, the calculations required might be impossible to parallelize, but a
   computer with **more memory** would be required to analyze the much larger
@@ -54,7 +54,7 @@ problems in parallel**.
 
 ## Jargon Busting Presentation
 
-Open the [HPC Jargon Buster](../files/jargon#p1)
+Open the [HPC Jargon Buster](files/jargon.html#p1)
 in a new tab. To present the content, press `C` to open a **c**lone in a
 separate window, then press `P` to toggle **p**resentation mode.
 
@@ -71,48 +71,44 @@ results.
 ## Some Ideas
 
 - Checking email: your computer (possibly in your pocket) contacts a remote
-  machine, authenticates, and downloads a list of new messages; it also
-  uploads changes to message status, such as whether you read, marked as
-  junk, or deleted the message. Since yours is not the only account, the
-  mail server is probably one of many in a data center.
-- Searching for a phrase online involves comparing your search term against
-  a massive database of all known sites, looking for matches. This "query"
+  machine, authenticates, and downloads a list of new messages; it also uploads
+  changes to message status, such as whether you read, marked as junk, or
+  deleted the message. Since yours is not the only account, the mail server is
+  probably one of many in a data center.
+- Searching for a phrase online involves comparing your search term against a
+  massive database of all known sites, looking for matches. This "query"
   operation can be straightforward, but building that database is a
   [monumental task][mapreduce]! Servers are involved at every step.
-- Searching for directions on a mapping website involves connecting your
-  (A) starting and (B) end points by [traversing a graph][dijkstra] in
-  search of the "shortest" path by distance, time, expense, or another
-  metric. Converting a map into the right form is relatively simple, but
-  calculating all the possible routes between A and B is expensive.
+- Searching for directions on a mapping website involves connecting your (A)
+  starting and (B) end points by [traversing a graph][dijkstra] in search of
+  the "shortest" path by distance, time, expense, or another metric. Converting
+  a map into the right form is relatively simple, but calculating all the
+  possible routes between A and B is expensive.
 
 Checking email could be serial: your machine connects to one server and
 exchanges data. Searching by querying the database for your search term (or
-endpoints) could also be serial, in that one machine receives your query
-and returns the result. However, assembling and storing the full database
-is far beyond the capability of any one machine. Therefore, these functions
-are served in parallel by a large, ["hyperscale"][hyperscale] collection of
-servers working together.
-
-
+endpoints) could also be serial, in that one machine receives your query and
+returns the result. However, assembling and storing the full database is far
+beyond the capability of any one machine. Therefore, these functions are served
+in parallel by a large, ["hyperscale"][hyperscale] collection of servers
+working together.
 
 :::::::::::::::::::::::::
 
 ::::::::::::::::::::::::::::::::::::::::::::::::::
 
-
-
 [mapreduce]: https://en.wikipedia.org/wiki/MapReduce
 [dijkstra]: https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm
 [hyperscale]: https://en.wikipedia.org/wiki/Hyperscale_computing
 
-
 :::::::::::::::::::::::::::::::::::::::: keypoints
 
-- High Performance Computing (HPC) typically involves connecting to very large computing systems elsewhere in the world.
-- These other systems can be used to do work that would either be impossible or much slower on smaller systems.
+- High Performance Computing (HPC) typically involves connecting to very large
+  computing systems elsewhere in the world.
+- These other systems can be used to do work that would either be impossible or
+  much slower on smaller systems.
 - HPC resources are shared by multiple users.
-- The standard method of interacting with such systems is via a command line interface.
+- The standard method of interacting with such systems is via a command line
+  interface.
 
 ::::::::::::::::::::::::::::::::::::::::::::::::::
-
-
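The statistics example in this episode (1000 independent runs, done one at a time in serial or all at once in parallel) can be sketched in a few lines of Python. This is an illustrative sketch only; `run_model`, the task count, and the thread-pool workers are hypothetical stand-ins, not part of the lesson:

```python
import concurrent.futures
import time

def run_model(seed):
    # Hypothetical stand-in for one independent model run
    # (a real cross-validation run might take an hour).
    time.sleep(0.01)
    return seed * 2

# Serial: one run at a time; total time grows as N * t_run.
serial_results = [run_model(s) for s in range(10)]

# Parallel: the runs are independent, so they can be dispatched to a
# pool of workers. On a cluster each run could be a separate job on its
# own node; threads merely stand in for those workers here.
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
    parallel_results = list(pool.map(run_model, range(10)))

assert parallel_results == serial_results
```

Because no run depends on another's output, the parallel version needs no coordination beyond collecting results, which is exactly what makes this class of problem a good fit for a cluster.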

episodes/11-connecting.Rmd

Lines changed: 11 additions & 9 deletions

@@ -38,15 +38,17 @@ results.
 If you have ever opened the Windows Command Prompt or macOS Terminal, you have
 seen a CLI. If you have already taken The Carpentries' courses on the UNIX
 Shell or Version Control, you have used the CLI on your *local machine*
-extensively. The only leap to be made here is to open a CLI on a *remote machine*,
-while taking some precautions so that other folks on the network can't see (or
-change) the commands you're running or the results the remote machine sends
-back. We will use the Secure SHell protocol (or SSH) to open an encrypted
-network connection between two machines, allowing you to send \& receive text
-and data without having to worry about prying eyes.
-
-![](/fig/connect-to-remote.svg){max-width="50%" alt="Connect to cluster"}
-
+extensively. The only leap to be made here is to open a CLI on a *remote
+machine*, while taking some precautions so that other folks on the network
+can't see (or change) the commands you're running or the results the remote
+machine sends back. We will use the Secure SHell protocol (or SSH) to open an
+encrypted network connection between two machines, allowing you to send \&
+receive text and data without having to worry about prying eyes.
+
+![connect-to-remote.svg](/fig/connect-to-remote.svg){
+max-width="50%"
+alt="Connect to cluster. "
+}
 
 SSH clients are usually command-line tools, where you provide the remote
 machine address as the only required argument. If your username on the remote

episodes/13-hpcc-scheduler/hpcc/section2.rmd

Lines changed: 0 additions & 8 deletions
This file was deleted.

episodes/load_config.R

Lines changed: 12 additions & 13 deletions

@@ -1,37 +1,36 @@
+## R script to chain-load lesson configuration YAML files.
+## Top-level configuration is `/episodes/lesson_config.yml`
 
-# Function to merge two lists (with overrides)
-merge_lists <- function(base, override) {
-  modifyList(base, override)
-}
-
-# Load required package
 library(yaml)
-# Load primary configuration
+
+## Load primary configuration
 config <- yaml.load_file("lesson_config.yaml")
-# If 'config' key exists, load the second configuration and merge
+
+## If "config" key exists, load the second configuration and merge
 if (!is.null(config$main_config) && file.exists(config$main_config)) {
   override_config <- yaml.load_file(config$main_config)
-  config <- merge_lists(config, override_config)
+  config <- modifyList(config, override_config)
 }
 
-snippets <- paste('files/snippets/', config$snippets, sep='')
+snippets <- paste("files/snippets/", config$snippets, sep="")
 
 # Extract main and fallback paths from config
-main_snippets <- config$main_snippets
+main_snippets <- config$main_snippets
 fallback_snippets <- config$fallback_snippets
 
 # Function to choose the correct document path (or return NULL if neither exists)
 choose_doc <- function(child_file) {
   # Get the current document name (without extension)
   current_doc <- tools::file_path_sans_ext(knitr::current_input(dir = TRUE))
-
+
   # Build paths for the child document inside subdirectories
   doc_paths <- list(
     main = file.path(current_doc, main_snippets, child_file),
     fallback = file.path(current_doc, fallback_snippets, child_file)
   )
   print(doc_paths)
   print(getwd())
+
   # Return the valid path, or NULL if neither exists
   if (file.exists(doc_paths$main)) {
     print("Returning")
@@ -45,4 +44,4 @@ choose_doc <- function(child_file) {
     print("Returning NULL")
     return(NULL) # Return NULL if neither path exists
   }
-}
+}
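The merge step in load_config.R relies on R's `modifyList()`, which replaces entries of the base list with same-named entries from the override list, recursing into nested lists. A rough Python equivalent, for illustration only (the `merge_config` helper and the sample dictionaries are hypothetical, though the key names follow the lesson's YAML schema):

```python
def merge_config(base, override):
    """Recursively override keys in `base`, similar to R's modifyList()."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them recursively.
            merged[key] = merge_config(merged[key], value)
        else:
            # Otherwise the override value wins outright.
            merged[key] = value
    return merged

defaults = {"snippets": "slurm", "remote": {"host": "head", "user": "userid"}}
site = {"remote": {"host": "login01"}}
config = merge_config(defaults, site)
# config == {"snippets": "slurm", "remote": {"host": "login01", "user": "userid"}}
```

The recursive merge is what lets a small site-specific file override only `remote.host` while inheriting every other default.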

episodes/slurm_defaults.yaml

Lines changed: 70 additions & 0 deletions (new file)

@@ -0,0 +1,70 @@
+# Fail-safe defaults and implicit schema for lesson configuration files
+---
+snippets: slurm
+baseurl: "https://ocaisa.github.io/probable-pancake/"
+# main_config: "lesson_config.yaml"
+
+# about the Learner's laptop
+local:
+  prompt: "[you@laptop:~]$"                    # command-line prompt
+  shebang: "#!/bin/bash"                       # first line of every shell script
+
+# about the remote/cluster environment
+remote:
+  name: "Example Cluster"                      # name of the cluster (proper noun)
+  login: "cluster.example.com"                 # domain name of the login node
+  host: "head"                                 # hostname of the login node
+  node: "node"                                 # hostname of a compute node
+  location: "SchedMD"                          # institutional host of the cluster
+  homedir: "/home"                             # parent of home directories
+  user: "userid"                               # stand-in for the username
+  prompt: "[userid@head:~]"                    # command-line prompt
+  prompt_work: "[userid@head:/work/userid]"    # prompt under /work
+  module_python3: "Python"                     # name of the module providing py3
+  shebang: "#!/bin/bash"                       # first line of every shell script
+
+# Commands & flags for the scheduler environment
+sched:
+  name: "Slurm"                                # proper name of the scheduler
+  command:
+    batch: "sbatch"                            # run later
+    interactive: "srun"                        # run now
+    cancel: "scancel"                          # don't run
+  queue:
+    test: "debug"
+    prod: "batch"
+  status: "squeue"
+  flag:
+    user: "-u userid"
+    interactive: "--pty bash"
+    histdetail: "-l -j"
+    name: "-J"
+    time: "-t"
+    queue: "-p"
+    nodes: "-N"
+    tasks: "-n"
+  del: "scancel"
+  interactive:
+    command: "srun"
+  info:
+    command: "sinfo"
+  comment: "#SBATCH"
+  hist: "sacct -u userid"
+  hist_filter: ""
+  reservation: ""
+  qos: ""
+  budget: ""
+  project: ""
+
+# submit:
+#   salloc: obtain a job allocation
+#   sbatch: submit a batch script for later execution
+#   srun: obtain an allocation and execute an application
+# account:
+#   sacct: display accounting data
+# manage:
+#   sbcast: transfer a file to a job's compute nodes
+#   scancel: signal jobs/steps
+#   squeue: view information about jobs
+#   sinfo: view information about nodes & partitions
+#   scontrol: view & modify state
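A configuration like the `sched` block above lets lesson snippets compose scheduler command lines without hard-coding Slurm syntax. A hedged Python sketch of that expansion (the `submit_command` helper and its arguments are hypothetical; the keys and flag strings follow the YAML above):

```python
# Subset of the sched block from slurm_defaults.yaml, as a Python dict.
sched = {
    "command": {"batch": "sbatch"},
    "flag": {"name": "-J", "time": "-t", "queue": "-p"},
    "queue": {"test": "debug"},
}

def submit_command(sched, job_name, walltime, script):
    # Compose a submit line from the configured command, flags, and queue,
    # e.g. "sbatch -J myjob -t 00:05:00 -p debug job.sh".
    return " ".join([
        sched["command"]["batch"],
        sched["flag"]["name"], job_name,
        sched["flag"]["time"], walltime,
        sched["flag"]["queue"], sched["queue"]["test"],
        script,
    ])

print(submit_command(sched, "myjob", "00:05:00", "job.sh"))
# prints: sbatch -J myjob -t 00:05:00 -p debug job.sh
```

Swapping the YAML for a PBS- or LSF-flavoured file would change only the configured strings, not the snippets that consume them, which appears to be the point of keeping scheduler details in data rather than in the lesson text.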
