Submit batch jobs with kbatch
Introduction
With the addition of kbatch, you can now run a notebook or a script from within the Nebari terminal. kbatch is a small project that enables users to submit jobs or cronjobs to the Kubernetes API. In other words, this CLI tool allows a user to submit their notebook or script to run in a "headless" manner.
Batch jobs are useful when a task needs no human interaction beyond submitting it, and when the results can be saved to cloud storage or a similar location. Batch jobs can also be submitted to run on a schedule; these are known as cronjobs (more on kbatch cronjob below).
There are a few known limitations at the moment:
- No integration with the local Nebari file system, besides the notebook or script itself
- The need to specify an image which contains all your required packages and libraries
  - conda-store built images are perfectly suited to solve this issue
- No artifact management
  - If you need to save the output, make sure to save it to cloud storage, a hosted git repository, etc.
Initial configuration
Your Nebari platform comes with kbatch, and all the necessary back-end components, pre-enabled. Consult your platform administrator or your nebari-config.yaml if you are unsure. Alternatively, you can create another conda environment using conda-store and add kbatch to it.
note
- kbatch is available on Nebari version 0.4.3 and greater
- kbatch is currently only available on pip, not conda
kbatch configure
From the terminal, activate the conda environment which has kbatch installed. By default, kbatch is installed in the dask conda environment for those looking to get started quickly.
conda activate <conda-env>
note
kbatch requires Python 3.9 or greater to work.
For JupyterHub to authenticate your future kbatch requests, you will need to perform a one-time configuration setup. For more details, visit the kbatch documentation.
This one-time setup command is listed below, and requires two arguments:
- --token: generate a JUPYTERHUB_API_TOKEN from your https://<nebari_domain>/hub/token page
- --kbatch-url: copy this exact URL (specific to Nebari deployments): http://kbatch-kbatch-proxy.dev.svc.cluster.local
Once completed, you should see a confirmation message that shows where this config file was saved:
$ kbatch configure --token <JUPYTERHUB_API_TOKEN> \
--kbatch-url http://kbatch-kbatch-proxy.dev.svc.cluster.local
Wrote config to /home/<username>/.config/kbatch/config.json
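To confirm the setup without printing the token, a short script can read the file back. The following is a minimal sketch: it assumes the config is a flat JSON object at the path reported above, and it makes no assumption about the exact key names, masking any value whose key contains "token". The helper name summarize_config is hypothetical and is not part of kbatch.

```python
import json
from pathlib import Path

# Path reported by `kbatch configure`; adjust if your output differs.
DEFAULT_CONFIG = Path.home() / ".config" / "kbatch" / "config.json"


def summarize_config(path=DEFAULT_CONFIG):
    """Return the top-level config entries with token-like values masked."""
    data = json.loads(Path(path).read_text())
    return {
        key: "***" if "token" in key.lower() else value
        for key, value in data.items()
    }


if DEFAULT_CONFIG.exists():
    print(summarize_config())
else:
    print(f"No config found at {DEFAULT_CONFIG}; run `kbatch configure` first.")
```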
Submit a job
When submitting a job to kbatch, there are a handful of required arguments you will need to provide.
These arguments can be passed directly from the command-line, or you can create a configuration file and pass that to kbatch.
For more details on how to use kbatch please refer to the kbatch documentation page.
To submit a job, you will need to specify:
- name - give your job a name.
- image - specify the container image used to run your job.
  - this should include all the packages and libraries needed to run your job.
- command - the command that will start the job.
- code - (optional) the path to the file or notebook used by the job.
note
To submit a notebook as a job, the command would be something like --command="['papermill', 'my-nb.ipynb']", where papermill is a tool for parameterizing and executing Jupyter notebooks and should be included in your image.
kbatch job submit \
--name="my-job" \
--image="ghcr.io/username/my-image:latest" \
--command="['papermill', 'my-nb.ipynb']" \
--code="my-nb.ipynb"
This should output the full Job specification that was submitted to Kubernetes.
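If you submit jobs from a script rather than by hand, the same command line can be assembled programmatically. The sketch below is only an illustration: build_submit_args is a hypothetical helper, not part of kbatch, and it uses only the four options described above. It builds the argument list and prints it without running anything.

```python
import shlex


def build_submit_args(name, image, command, code=None):
    """Build the argv list for `kbatch job submit` from the documented options."""
    args = [
        "kbatch", "job", "submit",
        f"--name={name}",
        f"--image={image}",
        # The command is passed as a list literal, as in the example above.
        f"--command={command!r}",
    ]
    if code is not None:  # --code is optional
        args.append(f"--code={code}")
    return args


args = build_submit_args(
    name="my-job",
    image="ghcr.io/username/my-image:latest",
    command=["papermill", "my-nb.ipynb"],
    code="my-nb.ipynb",
)
# Print a shell-quoted version of the command for inspection.
print(shlex.join(args))
```

Passing the resulting list to subprocess.run(args, check=True) from an environment where kbatch is installed and configured would then submit the job.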
Alternatively, all of these command line arguments can be consolidated into one configuration file.
name: "my-job"
image: "ghcr.io/username/my-image:latest"
command:
- "papermill"
- "my-nb.ipynb"
code: "my-nb.ipynb"
If the above YAML configuration file is saved as my-job.yaml, then it can be submitted with the following command to produce the same effect as above.
kbatch job submit -f my-job.yaml
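When many similar jobs are needed, the configuration file can be generated instead of written by hand. The following is a minimal sketch using only the Python standard library: render_job_config is a hypothetical helper, it handles only the flat keys shown above, and it relies on JSON string quoting, which YAML accepts as double-quoted scalars.

```python
import json
from pathlib import Path


def render_job_config(name, image, command, code=None):
    """Render the flat job configuration shown above as YAML text."""
    lines = [
        f"name: {json.dumps(name)}",
        f"image: {json.dumps(image)}",
        "command:",
    ]
    # One list item per element of the command.
    lines += [f"  - {json.dumps(part)}" for part in command]
    if code is not None:  # code is optional
        lines.append(f"code: {json.dumps(code)}")
    return "\n".join(lines) + "\n"


text = render_job_config(
    name="my-job",
    image="ghcr.io/username/my-image:latest",
    command=["papermill", "my-nb.ipynb"],
    code="my-nb.ipynb",
)
Path("my-job.yaml").write_text(text)
print(text, end="")
```

The file written here can then be submitted with the same kbatch job submit -f my-job.yaml command.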
This job will now run without any feedback to the user. However, if you're interested in checking the status of your job, you can list the recently submitted jobs as follows.
$ kbatch job list --output table
Jobs
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ name         ┃ submitted                 ┃ status ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ my-job-jfprp │ 2022-07-01T15:07:49+00:00 │ done   │
└──────────────┴───────────────────────────┴────────┘
Submitting a cronjob
What if you wanted to submit a job to run on a schedule, for example, once a week? This is where kbatch cronjob comes in handy, and luckily the interface is almost exactly the same. The only difference between job and cronjob is that the latter requires you to specify a schedule.
name: "my-cronjob"
image: "ghcr.io/username/my-image:latest"
command:
- "papermill"
- "my-nb.ipynb"
code: "my-nb.ipynb"
schedule: "0 2 * * 7"
The same job that you submitted above can now be submitted to run on a schedule. A cron schedule of 0 2 * * 7 means the job will run once every Sunday at 2:00 AM.
tip
You can check crontab.guru, a nifty tool that tries to translate the schedule syntax into plain English.
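To make the five positions of the schedule explicit, the sketch below labels each field of a standard five-field cron expression. The helper name split_schedule is hypothetical; it only checks the number of fields and does not validate the values inside them.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_schedule(schedule):
    """Map each of the five cron fields to its name."""
    parts = schedule.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {schedule!r}")
    return dict(zip(CRON_FIELDS, parts))


print(split_schedule("0 2 * * 7"))
# {'minute': '0', 'hour': '2', 'day_of_month': '*', 'month': '*', 'day_of_week': '7'}
```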
kbatch cronjob submit -f my-cronjob.yaml
As with jobs, cronjobs can be listed.
$ kbatch cronjob list --output table
CronJobs
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ cronjob name          ┃ started                   ┃ schedule  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ my-cronjob-cron-kb7d6 │ 2022-05-31T05:27:25+00:00 │ 0 2 * * 7 │
└───────────────────────┴───────────────────────────┴───────────┘
This job will now run on the schedule specified and will continue to run indefinitely. The only way to stop this cronjob is to delete it.
kbatch cronjob delete my-cronjob-cron-kb7d6
note
It's important to remember that you are responsible for deleting cronjobs. If left unchecked, they will continue to run indefinitely.