bash - How to handle SLURM job dependencies when intermediate jobs are dynamically generated? - Stack Overflow


I'm working with SLURM and need to create and submit a pipeline where jobs run sequentially: job_1 → job_2 → job_3. However, job_2.sh is unique because it's generated dynamically by job_1.sh using a Python script.

Since job_2.sh doesn't exist when the pipeline starts, I can't create the dependencies between the jobs directly. To work around this, I created an intermediate job, job_1_5.sh, that submits job_2.sh after job_1.sh has generated it. However, when I go to submit job_3, I only have the job id for job_1_5, not for job_2.

Here's my current pipeline structure:

  1. Submit job_1.sh which generates job_2.sh via Python
  2. Submit job_1_5.sh (depends on job_1.sh) which submits job_2.sh
  3. Submit job_3.sh (depends on job_1_5.sh and job_2.sh)

Here's my implementation:

#pipeline.sh

#!/bin/bash

# Submit job_1 which will generate and submit job_2
job1_id=$(sbatch --parsable job_1.sh)

job1_5_id=$(sbatch --parsable --dependency=afterok:$job1_id job_1_5.sh)

job3_id=$(sbatch --parsable --dependency=afterok:$job1_5_id job_3.sh) #I also want a dependency on `job_2.sh`

#job_1.sh

#!/bin/bash
#SBATCH --job-name=job1
#SBATCH --output=job1_%j.out
#SBATCH --time=00:05:00

echo "This is job 1"
python generate_job2.py

#generate_job2.py

#!/usr/bin/env python3

def generate_job2():
    with open('job_2.sh', 'w') as f:
        f.write('''#!/bin/bash
#SBATCH --job-name=job2
#SBATCH --output=job2_%j.out
#SBATCH --time=00:05:00

echo "This is job 2"
sleep 2
''')

if __name__ == "__main__":
    generate_job2()

#job_1_5.sh

#!/bin/bash
#SBATCH --job-name=job1_5
#SBATCH --output=job1_5_%j.out
#SBATCH --time=00:05:00
# Note: the dependency on job_1 is set by pipeline.sh at submission time.
# An "#SBATCH --dependency=afterok:$1" directive would not work here,
# because #SBATCH lines are parsed by sbatch and never shell-expanded.


# Submit job_2
job_2_id=$(sbatch --parsable job_2.sh)
echo $job_2_id > job2_id.txt

#job_2.sh

#!/bin/bash
#SBATCH --job-name=job2
#SBATCH --output=job2_%j.out
#SBATCH --time=00:05:00

echo "This is job 2"
sleep 2

#job_3.sh

#!/bin/bash
#SBATCH --job-name=job3
#SBATCH --output=job3_%j.out
#SBATCH --time=00:05:00

echo "This is job 3"

How can I properly ensure that job_3.sh only runs after the dynamically generated job_2.sh has completed?

My attempts at fixing it:

I tried writing the job id for job_2 into a file (job2_id.txt above). However, that doesn't let me use the id as a dependency: the id doesn't exist until job_1_5 actually submits job_2, which happens well after pipeline.sh has already submitted job_3.

Another potential solution would be to assign job_2 an id myself, rather than expecting Slurm to generate one. That way, I could easily place a constraint on job_3 to ensure job_2 runs first.

asked Feb 3 at 15:51 by desert_ranger (edited Feb 3 at 16:05)
  • so wait for job2? Or write your own pipeline. There's also sbatch --wait – KamilCuk Commented Feb 3 at 15:53
  • Ah, the problem with sbatch --wait is that the job submission script will have to wait until the current job executes. My objective was to submit all jobs in one go. – desert_ranger Commented Feb 3 at 15:58
  • @KamilCuk I don't understand this phrase - write your own pipeline. – desert_ranger Commented Feb 3 at 16:06
  • Hi, yes, "write your own pipeline" means using sbatch --wait. To "submit all jobs in one go", you can submit a script that itself uses sbatch --wait. – KamilCuk Commented Feb 3 at 16:09
  • Do your jobs have different resource allocations? There's not much point in submitting 4 different jobs that run sequentially if they don't. – Fravadona Commented Feb 3 at 16:58
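
For reference, a minimal sketch of the sbatch --wait approach discussed in the comments; each call blocks until the submitted job finishes (and propagates its exit code), so running this one script on a login node launches the whole pipeline in one go:

#!/bin/bash
# driver.sh -- run from a login node; sbatch --wait returns when the
# submitted job terminates, with the job's exit code

sbatch --wait job_1.sh || exit 1   # job_2.sh exists once this returns
sbatch --wait job_2.sh || exit 1
sbatch --wait job_3.sh

The trade-off, as the asker notes, is that the driver shell must stay alive for the pipeline's entire duration.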

1 Answer


The easiest way to achieve that is simply to submit the (N+1)th job from the Nth job:

job_1.sh:

#!/bin/bash
#SBATCH --job-name=job1
#SBATCH --output=job1_%j.out
#SBATCH --time=00:05:00

echo "This is job 1"
python generate_job2.py
sbatch job_2.sh  # submit the script that generate_job2.py just wrote

and so on, adapting generate_job2.py so that it also submits job_3.
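
A minimal sketch of the result: the job_2.sh written by generate_job2.py would simply end by submitting the next stage (this assumes job_3.sh already exists in the submission directory):

#!/bin/bash
#SBATCH --job-name=job2
#SBATCH --output=job2_%j.out
#SBATCH --time=00:05:00

echo "This is job 2"
sleep 2

# Submit the next stage once job 2's own work is done
sbatch job_3.sh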

Two caveats:

  • some clusters prevent users from submitting new jobs from the compute nodes (the default Slurm configuration does allow it)
  • if job priority increases with time spent waiting in the queue even while a job is not yet eligible to run (which is not the default), this strategy forfeits that priority accrual compared with submitting all jobs at the start.

Another strategy, provided that generate_job2.py does not influence the resource request and only generates the commands to run, is to prepare for each dependent job a submission script that merely runs an external Bash script yet to be written by the previous job.

Something like

#!/bin/bash
#SBATCH --job-name=job2
#SBATCH --output=job2_%j.out
#SBATCH --time=00:05:00

./script_to_be_written_by_job1

Slurm does not check whether script_to_be_written_by_job1 exists at submission time, so if you set up the dependencies properly, the script will exist by the time Slurm tries to run it.
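
With that layout every submission script exists up front, so pipeline.sh can chain ordinary dependencies; a minimal sketch (it assumes job_1.sh writes an executable script_to_be_written_by_job1 into the submission directory):

#!/bin/bash
# pipeline.sh -- submit all stages at once; job_2.sh is only a thin
# wrapper, so it can be submitted before job_1 has written its payload

job1_id=$(sbatch --parsable job_1.sh)

# job_2.sh already exists (it just runs ./script_to_be_written_by_job1),
# so a normal dependency works
job2_id=$(sbatch --parsable --dependency=afterok:$job1_id job_2.sh)

# job_3 now depends directly on the dynamically filled-in job_2
job3_id=$(sbatch --parsable --dependency=afterok:$job2_id job_3.sh)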
