GitHub Actions as a time-sharing supercomputer

2024-01-10 04:11:01

The time-sharing computer systems of the 1970s meant operators could submit a job and get the results back at some point in the future. Under the guise of "serverless", everything old is new again.

AWS Lambda reinvented the idea of submitting work to a supercomputer only to receive the results later on, asynchronously. I liked that approach so much that in 2016 I wrote a prototype to unlock the idea of functions, but on your own infrastructure. It is now known as OpenFaaS, and has over 30k GitHub stars, over 380 contributors, and its community has given hundreds of blog posts and conference talks.

There's something compelling about running jobs, and I don't think it's because developers "don't want to maintain infrastructure".

I know this, it's a UNIX system


See my Twitter thread as I built the actions-batch tool.

Prior work

I mentioned OpenFaaS, and to some extent it does for Kubernetes what time-sharing did for mainframes in the early 60s and 70s.

You can write functions in application code or bash and wrap them in containers, then have them autoscale, scale to zero, with built-in monitoring and a REST API for automation.

For a few examples of bash, see my openfaas-streaming-templates or the samples written by a Netflix engineer for image and video manipulation.

With OpenFaaS you write code once and then that acts as a blueprint: it can be scaled, triggered by cron, Kafka and databases, and run synchronously or asynchronously, with retries and callbacks built in to receive the results.

But sometimes all you want is a one-shot job.

In the Kubernetes APIs, we have a "Job" that can be scheduled. So my initial experiments involved writing a wrapper for that, which we use for customer support at OpenFaaS.

Fixing the UX for one-time tasks on Kubernetes

I'd also had a go at something similar for Docker Swarm, which companies were using for cleaning up database indexes and running nightly cron jobs.

actions-batch

actions-batch is an open-source CLI available on GitHub

asciicast

An ASCII cast of building a Linux Kernel, and having the binary brought back to your own computer to use.

So with the comparison to OpenFaaS out of the way, and some prior work covered, let's look at how actions-batch works.

  1. A new GitHub repository is created
  2. A workflow is written which runs "job.sh" upon commits
  3. When a local bash file is written to the repo as "job.sh", the job triggers

That is the magic of it. We have created an "unofficial" API which turns GitHub Actions into a time-sharing supercomputer.
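The workflow in step 2 is generated for you by the tool, but to make the mechanism concrete, here is a minimal sketch of what such a workflow could look like. The job name, action versions, and the uploads/ artifact path are illustrative assumptions, not the tool's actual output:

```yaml
name: batch-job

on:
  push:
    branches: [ master ]

permissions:
  contents: read
  id-token: write # only needed if the job uses OIDC tokens

jobs:
  job:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run the batch job
        run: |
          chmod +x ./job.sh
          ./job.sh
      - name: Collect anything copied into ./uploads
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: results
          path: uploads/
```

Every push of a new job.sh triggers a fresh run, and whatever the script leaves in its output folder is collected as a build artifact for the CLI to download.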

The good bits:

  • You can include secrets
  • You can fetch the outputs of the builds
  • You can use self-hosted runners or hosted runners
  • Private and public repos are supported

Build a Linux Kernel and bring it back to your machine

Say you're running an Apple MacBook, and need to build a Linux Kernel? You may not have Docker installed, or want to fiddle with all that complexity.

mkdir kernels
actions-batch \
    --owner alexellis \
    --org=false \
    --token-file ~/batch \
    --file ./examples/linux-kernel.sh \
    --out ./kernels

Then:

┏━┓┏━╸╺┳╸╻┏━┓┏┓╻┏━┓   ┏┓ ┏━┓╺┳╸┏━╸╻ ╻
┣━┫┃   ┃ ┃┃ ┃┃┗┫┗━┓╺━╸┣┻┓┣━┫ ┃ ┃  ┣━┫
╹ ╹┗━╸ ╹ ╹┗━┛╹ ╹┗━┛   ┗━┛╹ ╹ ╹ ┗━╸╹ ╹
By Alex Ellis 2023 -  (232d61a253f0805b85d60fecf87f5badbb53047b)

Job file: linux-kernel.sh
Repo: https://github.com/alexellis/hopeful_goldwasser3
----------------------------------------
View job at: 
https://github.com/alexellis/hopeful_goldwasser3/actions
----------------------------------------
Listing workflow runs for: alexellis/hopeful_goldwasser3 max attempts: 360 (interval: 1s)

Without installing anything on your computer, in a minute or two, you'll get a vmlinux that's ready to use.

Contents of: ./kernels

FILE    SIZE
vmlinux 22.71MB

QUEUED DURATION TOTAL
3s     2m51s    2m57s

Of course, hosted runners are known for being great value, but notably slow. So we can run the same thing on our own, more powerful infrastructure:

./bin/actions-batch \
  --owner actuated-samples \
  --token-file ~/batch \
  --file ./examples/linux-kernel.sh \
  --out ./kernels \
  --runs-on actuated-24cpu-96gb

In this example, a 24 vCPU microVM was used with 96GB of RAM allocated. Of course, you never need this much RAM to build a Kernel, but it shows what's possible.

If you want to know how much disk, RAM or vCPU you need for a GitHub Action, you can use the actuated telemetry action.

Once complete, the repository is deleted for you.

temporary-repo

The repository is part of the "batch job" specification

Run some ML/AI using Llama

You can run inference using a machine learning model from Hugging Face.

Here's how you can get a Llama2 model to answer a bunch of questions that you provide, with 150 tokens being used.

examples/llama.sh

Example of running inference against a pre-trained model


Download a video from YouTube

./actions-batch \
  --owner alexellis \
  --org=false \
  --token-file ~/batch \
  --file ./examples/youtubedl.sh \
  --out ~/movies/

This will create a file named ~/movies/video.mp4 with the UNIX documentary by Bell Labs.

See a screenshot of the results

OIDC tokens

You can use GitHub's built-in OIDC tokens if you need them to federate to AWS or another system.

#!/bin/bash

# Warning: it is suggested to only run this with the --private (repo) flag

env

OIDC_TOKEN=$(curl -sLS "${ACTIONS_ID_TOKEN_REQUEST_URL}&audience=https://fed-gw.exit.o6s.io" -H "User-Agent: actions/oidc-client" -H "Authorization: Bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN")
JWT=$(echo $OIDC_TOKEN | jq -j '.value')

jq -R 'split(".") | .[1] | @base64d | fromjson' <<< "$JWT"

# Post the JWT to the printer function to visualise it in the logs
# curl -sLSi ${OPENFAAS_URL}/function/printer -H "Authorization: Bearer $JWT"

Deploy a function to OpenFaaS using secrets

We've seen how you can download artifacts from a build, but what if our job needs a secret?

First, create a folder called .secrets.

Then add a file called .secrets/openfaas-gateway-password with your admin user's password, and then create another file called .secrets/openfaas-url with the URL of your OpenFaaS gateway.
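For example (a minimal sketch; the values below are placeholders, not real credentials or a real gateway):

```shell
# Create the local .secrets folder that the tool reads before submitting the job
mkdir -p .secrets

# Placeholder values - substitute your real gateway password and URL
printf 'example-admin-password' > .secrets/openfaas-gateway-password
printf 'https://gw.example.com' > .secrets/openfaas-url
```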

Two repo-level secrets will be created, named OPENFAAS_GATEWAY_PASSWORD and OPENFAAS_URL. They can then be consumed as follows:

curl -sLS https://get.arkade.dev | sudo sh

arkade get faas-cli --quiet
sudo mv $HOME/.arkade/bin/faas-cli /usr/local/bin/
sudo chmod +x /usr/local/bin/faas-cli

echo "${OPENFAAS_GATEWAY_PASSWORD}" | faas-cli login -g "${OPENFAAS_URL}" -u admin --password-stdin

# List some functions
faas-cli list

# Deploy a function to show this worked and update the "com.github.sha" annotation
faas-cli store deploy env --name env-actions-batch --annotation com.github.sha=${GITHUB_SHA}

sleep 2

# Invoke the function
faas-cli invoke env-actions-batch <<< ""

Run curl remotely, if you want to check whether it's your network

Sometimes, you wonder if it's your network that's the issue. So you DM someone on Slack: "Can you access XYZ?"

Let the supercomputer do it instead:

#!/bin/bash

set -e -x -o pipefail

# Example by Alex Ellis

curl -s https://checkip.amazonaws.com > ip.txt

mkdir -p uploads
cp ip.txt ./uploads/

Results:

Found file: 6_Complete job.txt
---------------------------------
2023-12-22T11:59:23.6683796Z Cleaning up orphan processes

Contents of: /tmp/artifacts-2603933045

FILE   SIZE
ip.txt 15B

QUEUED DURATION TOTAL
3s     13s      19s

Deleting repo: actuated-samples/vigorous_ishizaka8

cat /tmp/artifacts-2603933045/ip.txt 
172.183.51.127

Well, 172.183.51.127 is definitely not my IP. It worked.

Build a container image remotely, then import it

Sometimes I build ML and AI containers on Equinix Metal because they have a 10Gbps pipe, and I may be on holiday or in a cafe with 1Mbps available.

Let's submit that batch job!

#!/bin/bash

set -e -x -o pipefail

# Example by Alex Ellis

# Build and then export a Docker image to a tar file
# The exported file can then be imported into your local library via:

# docker load -i curl.tar

mkdir -p uploads

cat > Dockerfile <<EOF
FROM alpine:latest

RUN apk --no-cache add curl

ENTRYPOINT ["curl"]
EOF

docker build -t curl:latest .

# Export the built image into the uploads folder so it is returned as an artifact
docker save curl:latest -o ./uploads/curl.tar

Finally:

./actions-batch \
  --org=false \
  --owner alexellis \
  --token-file ~/batch \
  --file ./examples/export-docker-image.sh \
  --out ./images/

....
Contents of: ./images/

FILE     SIZE
curl.tar 12.37MB

QUEUED DURATION TOTAL
5s     22s      29s

Then let's import that curl image:

docker rmi -f curl
docker images | grep curl

docker load -i ./images/curl.tar
38d2771a5c36: Loading layer [==================================================>]  4.687MB/4.687MB
Loaded image: curl:latest

docker run -ti curl:latest
curl: try 'curl --help' or 'curl --manual' for more information

It worked just as expected.

Let’s have a race?

Here, I've submitted the same job to both an x86_64 server and an arm64 server, both on my own infrastructure. They'll build a Linux Kernel using the v6.0 branch.

Off to the binary races – what's quicker? vmlinux or Image?

This is also a useful way of comparing GitHub's hosted runners with your own self-hosted infrastructure – just change the "--runs-on" flag.

The youtubedl.sh example is multi-arch aware, and uses a bash if statement to download the correct version of youtubedl for the system. Same thing with the Linux Kernel example you'll find in the repo.
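The repo has the real script; a minimal sketch of that kind of architecture check looks like the following (the download suffixes are illustrative, not the script's actual asset names):

```shell
#!/bin/bash
# Pick a download suffix based on the machine architecture
ARCH="$(uname -m)"
if [ "$ARCH" = "aarch64" ]; then
  SUFFIX="linux_aarch64"
else
  SUFFIX="linux"
fi
echo "Would fetch yt-dlp_${SUFFIX} for ${ARCH}"
```

The same job file then runs unchanged whether it lands on an x86_64 or an arm64 runner.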

Wrapping up

I hope this idea captures the imagination in some way. Feel free to try out the examples and let me know how it could be improved, and whether this is something you could use.

Q&A:

Where are the examples?

I've added a baker's dozen of examples, but would welcome many more. Just send a PR showing how you've run the tool and what output it created.

https://github.com/alexellis/actions-batch/tree/master/examples

Will GitHub be "angry"?

We often talk about brands and companies as if they were a single person or mind. GitHub is not one person, but the GitHub team tend to love and encourage innovation, and have built APIs so that you are able to make use of GitHub Actions in this kind of way.

The most relevant clauses are: C. Acceptable Use and H. API Terms.

Exercise common sense.

Should I feel bad about using free runners for batch jobs?

Use your own discretion here. If you think what you're doing doesn't align with the terms of service, use a private repo, and pay for the minutes.

Or use your own self-hosted runners with a solution like actuated.

Could I run this in production?

The question really should be: is GitHub Actions production-ready? The answer is yes, so by proxy, you could run this tool in production.

What's the longest job I can run?

The limit for hosted and self-hosted runners is 6 hours. If that's not enough, think about how you could split the job into smaller pieces, or perhaps take a look at run-job or OpenFaaS.

Why not use Kubernetes Jobs instead?

Funny you asked. In the introduction I mentioned my tool alexellis/run-job which does exactly that.

How is this different from OpenFaaS?

Workloads for OpenFaaS need to be built into a container image and are run in a heavily restricted environment. Functions are ideal for many calls, with different inputs.

actions-batch only accepts a bash script, and is designed to run in a full VM, running administrative tasks and tools like Docker. It's designed to only run periodic, one-shot jobs or tasks.

Shouldn't you be doing some real work?

Many of the things I've started as experiments or prototypes have given me useful feedback. OpenFaaS was never meant to be a thing, neither was inlets or actuated, and people told me not to build any of them.
