Submit batch jobs with kbatch
Introductionโ
The ability to run a notebook or a script from within the Nebari terminal is now possible with the addition of kbatch
. kbatch
is a small project that enables the user submit jobs or cronjobs to the Kubernetes API. Or in other words, this CLI tool allows a user to submit their notebook or script to run in a "headless" manner.
The idea of batch jobs is useful in situations where you need no human interaction, besides submitting it as a job, and the results can be efficiently saved to the cloud or other similar storage locations. Batch jobs can also be submitted to run on a schedule, these are known as cronjobs, more on kbatch cronjob
below.
There are a few known limitations at the moment, these include:
- No integration with the local Nebari file system, besides the notebook or script itself
- The need to specify an image which contains all your required packages and libraries
conda-store
built images are perfectly suited to solve this issue
- No artifact management
- If you need to save the output, make sure to save it to cloud storage, a hosted git repos, etc.
Initial configurationโ
Your Nebari platform comes with kbatch
, and all the necessary back-end components, pre-enabled. Consult your platform administrator or your nebari-config.yaml
if you are unsure. Or you can create another conda
env using conda-store
and add kbatch
to it.
note
kbatch
is available on Nebari version0.4.3
and greaterkbatch
is currently only available onpip
, notconda
kbatch configure
โ
From the terminal, activate the conda
environment which has kbatch
installed. By default, kbatch
is installed in the dask
conda
environment for those looking to get started quickly.
conda activate <conda-env>
note
kbatch
requires Python 3.9 or greater to work.
For JupyterHub to authenticate your future kbatch
requests, you will need to perform a one time configuration setup. For more details, visit the kbatch
documentation.
This one-time setup command is listed below, and requires two arguments:
--token
: generate aJUPYTERHUB_API_TOKEN
from yourhttps://<nebari_domain>/hub/token
--kbatch-url
: copy this exact URL (specific to Nebari deployments)http://kbatch-kbatch-proxy.dev.svc.cluster.local
Once completed, you should see a confirmation message that shows where this config file was saved:
$ kbatch configure --token <JUPYTERHUB_API_TOKEN> \
--kbatch-url http://kbatch-kbatch-proxy.dev.svc.cluster.local
Wrote config to /home/<username>/.config/kbatch/config.json
Submit a jobโ
When submitting a job to kbatch
, there are a handful of required arguments you will need to provide.
These arguments can be passed directly from the command-line, or you can create a configuration file and pass that to kbatch
.
For more details on how to use kbatch
please refer to the kbatch
documentation page.
To submit a job, you will need to specify:
name
- give your job a name.image
- specify the container image used to run your job.- this should include all the packages and libraries needed to run your job.
command
- the command that will start the job.code
- (optional) the path to the file or notebook used by the job.
note
If trying to submit a notebook as a job, the command
would be something like --command="['papermill', 'my-nb.ipynb']"
, where papermill
is a tool for parameterizing and executing Jupyter notebooks and should be included in your image
.
kbatch job submit \
--name="my-job" \
--image="ghcr.io/username/my-image:latest" \
--command="['papermill', 'my-nb.ipynb']" \
--code="./my-job.ipynb"
This should output the full Job specification that was submitted to Kubernetes.
Alternatively, all of these command line arguments can be consolidated into one configuration file.
name:" my-job"
image: "ghcr.io/username/my-image:latest"
command:
- "papermill"
- "my-nb.ipynb"
code: "my-nb.ipynb"
If the above YAML configuration file is saved as my-job.yaml
, then it can be submitted with the following command to produce the same effect as above.
kbatch job submit -f my-job.yaml
This job will now run without any feedback to the user. However, if you're interested in checking the status of your job, you can list the recently submitted jobs as follows.
$ kbatch job list --output table
Jobs
โโโโโโโโโโโโโโโโโโโโณโโโโโโโโโโโโโโโโโโโโโโโโโโโโณโโโโโโโโโ
โ name โ submitted โ status โ
โกโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฉ
โ my-job-jfprp โ 2022-07-01T15:07:49+00:00 โ done โ
โโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโ
Submitting a cronjobโ
What if you wanted to submit a job to run a schedule, for example, to run once a week? This is where kbatch cronjob
comes in handy and luckily the interface is almost exactly the same. The only difference between job
and cronjob
is that the latter requires you to specify a schedule
.
name: "my-cronjob"
image: "ghcr.io/username/my-image:latest"
command:
- "papermill"
- "my-nb.ipynb"
code: "my-nb.ipynb"
schedule: "0 2 * * 7*"
The same job that you submitted above now can be submitted to run on a schedule. A cron schedule of 0 2 * * 7
means the job will run once every Sunday at 2:00AM
.
tip
You can check crontab.guru which is a nifty tool that tries to translate the schedule syntax into plain English.
kbatch cronjob submit -f my-cronjob.yaml
As with jobs, cronjobs can be listed.
$ kbatch cronjob list --output table
CronJobs
โโโโโโโโโโโโโโโโโโโโโโโโโณโโโโโโโโโโโโโโโโโโโโโโโโโโโโณโโโโโโโโโโโโโโโ
โ cronjob name โ started โ schedule โ
โกโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฉ
โ my-cronjob-cron-kb7d6 โ 2022-05-31T05:27:25+00:00 โ 0 2 * * 7 โ
โโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโ
This job will now run on the schedule specified and will continue to run indefinitely. The only way to stop this cronjob is to delete it.
kbatch cronjob delete my-cronjob-cron-kb7d6
note
It's important to remember that you are responsible for deleting cronjobs. If left unchecked, they will continue to run indefinitely.