SCR
v3.1.0
  • Quick Start
    • Building SCR
      • CMake
      • Spack
    • Building the SCR test_api Example
    • Running the SCR test_api Example
    • Adding SCR to Your Application
    • Final Thoughts
  • Assumptions
  • Concepts
    • Jobs, allocations, and runs
    • Group, store, and redundancy descriptors
    • Control, cache, and prefix directories
    • Example of SCR files and directories
    • Redundancy schemes
    • Scalable restart
    • Catastrophic failures
    • Fetch, flush, and scavenge
  • Build SCR
    • Dependencies
    • CMake
    • Spack
  • SCR API
    • Startup and Shutdown API
      • SCR_Init
      • SCR_Finalize
      • SCR_Config
    • File Routing API
      • SCR_Route_file
    • Checkpoint/Output API
      • SCR_Need_checkpoint
      • SCR_Start_output
      • SCR_Route_file
      • SCR_Complete_output
    • Restart API
      • SCR_Have_restart
      • SCR_Start_restart
      • SCR_Route_file
      • SCR_Complete_restart
    • General API
      • SCR_Get_version
      • SCR_Should_exit
    • Dataset Management API
      • SCR_Current
      • SCR_Delete
      • SCR_Drop
    • Space/time semantics
    • SCR API state transitions
  • SCR Python API
    • SCR Python module
      • Implementation of the SCR Python module
      • Installing the SCR Python module
      • Using the SCR Python module
  • Integrate SCR
    • Using the SCR API
      • Init/Finalize
      • Checkpoint
      • Restart with SCR
      • Restart without SCR
      • Configure SCR for application settings
    • Building with the SCR library
  • Configure a job
    • Setting parameters
    • Common configurations
      • Enable debug messages
      • Specify the job output directory
      • Specify which checkpoint to load
      • File-per-process vs shared access
      • Write file-per-process, read file-per-process
      • Write file-per-process, read with shared access
      • Write with shared access
      • Change checkpoint flush frequency
      • Change cache location
      • Change control and cache location
      • Increase cache size
      • Change redundancy schemes
      • Enable asynchronous flush
      • Restart with a different number of processes
    • Group, store, and checkpoint descriptors
      • Example using SINGLE and XOR
    • SCR parameters
  • Run a job
    • Supported platforms
    • Jobs and job steps
    • Tolerating node failures
    • The SCR wrapper scripts
    • Using the SCR wrapper scripts
    • Example batch script for using scavenge, but no restart
    • Example batch script for using scavenge and restart
    • Example SLURM batch script with scr_run using scavenge and restart
    • Managing SCR jobs with Python scripts
      • ClusterShell (optional)
      • SCR User Commands
  • Halt a job
    • scr_halt and the halt file
    • Halt after next checkpoint
    • Halt before or after a specified time
    • Halt immediately
    • Catch a hanging job
    • Combine, list, change, and unset halt conditions
    • Remove the halt file
  • Manage datasets

© Copyright 2017, SCR. Revision c96e29ab.