Rethinking Serverless with FLAME · The Fly Blog

2023-12-06 06:03:39

Imagine if you could auto scale simply by wrapping any existing app code in a function and have that block of code run in a temporary copy of your app.

The pursuit of elastic, auto-scaling applications has taken us to some crazy places.

Serverless/FaaS had a couple of things going for it. Elastic Scale™ is hard. It’s even harder when you have to manage those pesky servers. It also promised pay-what-you-use pricing to avoid paying for idle time. Great stuff, right?

Well, the charade is over. You offload scaling concerns and the complexities of scaling, only to end up needing more complexity. More queues, storage, and glue code to talk back to our app is just the starting point. Dev, test, and CI complexity balloons as fast as your costs. Oh, and you often have to rewrite your app in proprietary JavaScript – even if it’s already written in JavaScript!

At the same time, the rest of us have elastically scaled by starting more webservers. Or we’ve piled on complexity with microservices. This doesn’t make sense. Spinning up more webservers to transcode more videos or serve more ML tasks isn’t what we want. And granular scale shouldn’t require slicing our apps into bespoke operational units with their own APIs and deployments to manage.

Enough is enough. There’s a better way to elastically scale applications.

The FLAME pattern

Here’s what we really want:

  • We don’t want to manage those pesky servers. We already have this for our app deployments via fly deploy, git push heroku, kubectl, etc
  • We want on-demand, granular elastic scale of specific parts of our app code
  • We don’t want to rewrite our application or write parts of it in proprietary runtimes

Imagine if we could auto scale simply by wrapping any existing app code in a function and have that block of code run in a temporary copy of the app.

Enter the FLAME pattern.

FLAME – Fleeting Lambda Application for Modular Execution

With FLAME, you treat your entire application as a lambda, where modular parts can be executed on short-lived infrastructure.

No rewrites. No bespoke runtimes. No outrageous layers of complexity. Need to insert the results of an expensive operation to the database? PubSub broadcast the result of some expensive work? No problem! It’s your whole app so of course you can do it.

The Elixir flame library implements the FLAME pattern. It has a backend adapter for Fly.io, but you can use it on any cloud that gives you an API to spin up an instance with your app code running on it. We’ll talk more about backends in a bit, as well as implementing FLAME in other languages.

First, let’s watch a realtime thumbnail generation example to see FLAME + Elixir in action:

Now let’s walk through something a bit more basic. Imagine we have a function to transcode video into thumbnails in our Elixir application when videos are uploaded:

def generate_thumbnails(%Video{} = vid, interval) do
  tmp = Path.join(System.tmp_dir!(), Ecto.UUID.generate())
  File.mkdir!(tmp)
  args = ~w(-i #{vid.url} -vf fps=1/#{interval} #{tmp}/%02d.png)
  System.cmd("ffmpeg", args)
  urls = VidStore.put_thumbnails(vid, Path.wildcard(tmp <> "/*.png"))
  Repo.insert_all(Thumb, Enum.map(urls, &%{vid_id: vid.id, url: &1}))
end

Our generate_thumbnails function accepts a video struct. We shell out to ffmpeg to take the video URL and generate thumbnails at a given interval. We then write the temporary thumbnail paths to durable storage. Finally, we insert the generated thumbnail URLs into the database.

This works great locally, but CPU-bound work like video transcoding can quickly bring our entire service to a halt in production. Instead of rewriting large swaths of our app to move this into microservices or some FaaS, we can simply wrap it in a FLAME call:

def generate_thumbnails(%Video{} = vid, interval) do
  FLAME.call(MyApp.FFMpegRunner, fn ->
    tmp = Path.join(System.tmp_dir!(), Ecto.UUID.generate())
    File.mkdir!(tmp)
    args = ~w(-i #{vid.url} -vf fps=1/#{interval} #{tmp}/%02d.png)
    System.cmd("ffmpeg", args)
    urls = VidStore.put_thumbnails(vid, Path.wildcard(tmp <> "/*.png"))
    Repo.insert_all(Thumb, Enum.map(urls, &%{vid_id: vid.id, url: &1}))
  end)
end

That’s it! FLAME.call accepts the name of a runner pool and a function. It then finds or boots a new copy of our entire application and runs the function there. Any variables the function closes over (like our %Video{} struct and interval) are passed along automatically.

When the FLAME runner boots up, it connects back to the parent node, receives the function to run, executes it, and returns the result to the caller. Based on configuration, the booted runner either waits happily for more work before idling down, or extinguishes itself immediately.
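
To make the round trip concrete, here’s a trivial sketch (using the pool name from the example above): the function runs on the fleeting copy of the app, and its return value lands back in the caller as if it had run locally.

# Runs on a temporary copy of the app; the result is returned to the caller.
result = FLAME.call(MyApp.FFMpegRunner, fn -> 21 * 2 end)
# result == 42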

Let’s visualize the flow:

visualizing the flow

We changed no other code and issued our DB write with Repo.insert_all just like before, because we’re running our entire application. Database connection(s) and all. Except this fleeting application only runs that little function after startup and nothing else.
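
Any other app service is available the same way. As a hedged sketch (the PubSub server, topic, and message shape here are assumptions, not from the post), we could broadcast results to subscribers from inside the same kind of FLAME call:

def persist_and_broadcast(%Video{} = vid, urls) do
  FLAME.call(MyApp.FFMpegRunner, fn ->
    Repo.insert_all(Thumb, Enum.map(urls, &%{vid_id: vid.id, url: &1}))
    # Subscribers on the parent node (e.g. LiveViews) receive this like any other broadcast.
    Phoenix.PubSub.broadcast(MyApp.PubSub, "video:#{vid.id}", {:thumbnails_ready, urls})
  end)
end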

In practice, a FLAME implementation will support a pool of runners for hot startup, scale-to-zero, and elastic growth. More on that later.

Solving a problem vs removing the problem

FaaS solutions help you solve a problem. FLAME removes the problem.

The FaaS labyrinth of complexity defies reason. And it’s unavoidable. Let’s walk through the thumbnail use-case to see how.

We try to start with the simplest building block, like request/response AWS Lambda Function URLs.

The complexity hits immediately.

We start writing custom encoders/decoders on both sides to handle streaming the thumbnails back to the app over HTTP. Phew that’s done. Wait, is our video transcoding or user uploads going to take longer than 15 minutes? Sorry, hard timeout limit – time to split our videos into chunks to stay within the timeout, which means more lambdas to do that. Now we’re orchestrating lambda workflows and relying on additional services, such as SQS and S3, to enable this.

All the FaaS is doing is adding layers of communication between your code and the parts you want to run elastically. Each layer has its own glue integration price to pay.

Ultimately handling this kind of use-case looks something like this:

  • Trigger the lambda via HTTP endpoint, S3, or API gateway ($)
  • Write the bespoke lambda to transcode the video ($)
  • Place the thumbnail results into SQS ($)
  • Write the SQS consumer in our app (dev $)
  • Persist to DB and figure out how to get events back to active subscribers that may well be connected to other instances than the SQS consumer (dev $)

This is nuts. We pay the FaaS toll at every step. We shouldn’t have to do any of this!

FaaS provides a bunch of offerings to build a solution on top of. FLAME removes the problem entirely.

FLAME Backends

On Fly.io infrastructure, the FLAME.FlyBackend can boot a copy of your application on a new Machine and have it connect back to the parent for work within ~3s.

By default, FLAME ships with a LocalBackend and FlyBackend, but any host that gives you an API to provision a server and run your app code can work as a FLAME backend. Erlang and Elixir primitives do all of the heavy lifting here. The entire FLAME.FlyBackend is < 200 LOC with docs. The library has a single dependency, req, which is an HTTP client.

Because Fly.io runs our applications as a packaged-up Docker image, we simply ask the Fly API to boot a new Machine for us with the same image that our app is currently running. Also thanks to Fly infrastructure, we can guarantee the FLAME runners are started in the same region as the parent. This optimizes latency and lets you send whatever data back and forth between parent and runner without having to think about it.
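
For reference, wiring up the Fly backend looks roughly like this in config (a sketch based on the library’s README; exact keys may differ by version):

# config/runtime.exs (sketch)
import Config

if config_env() == :prod do
  config :flame, :backend, FLAME.FlyBackend
  config :flame, FLAME.FlyBackend, token: System.fetch_env!("FLY_API_TOKEN")
end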

Look at everything we’re not doing

With FaaS, just imagine how quickly the dev and testing story becomes a fate worse than death.

To run the app locally, we either need to add some huge dev dependencies to simulate the entire FaaS pipeline, or worse, connect up our dev and test environments directly to the FaaS provider.

With FLAME, your dev and test runners simply run on the local backend.

Remember, this is your app. FLAME just controls where modular parts of it run. In dev or test, those parts simply run on the existing runtime on your laptop or CI server.
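
In config terms that might look like the following sketch (the local backend is the default, so this is often implicit):

# config/dev.exs and config/test.exs (sketch)
import Config

config :flame, :backend, FLAME.LocalBackend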

Using Elixir, we can even send a file across to the remote FLAME application thanks to the distributed features of the Erlang VM:

def generate_thumbnails(%Video{} = vid, interval) do
  parent_stream = File.stream!(vid.filepath, [], 2048)
  FLAME.call(MyApp.FFMpegRunner, fn ->
    tmp_file = Path.join(System.tmp_dir!(), Ecto.UUID.generate())
    flame_stream = File.stream!(tmp_file)
    Enum.into(parent_stream, flame_stream)

    tmp = Path.join(System.tmp_dir!(), Ecto.UUID.generate())
    File.mkdir!(tmp)
    args = ~w(-i #{tmp_file} -vf fps=1/#{interval} #{tmp}/%02d.png)
    System.cmd("ffmpeg", args)
    urls = VidStore.put_thumbnails(vid, Path.wildcard(tmp <> "/*.png"))
    Repo.insert_all(Thumb, Enum.map(urls, &%{vid_id: vid.id, url: &1}))
  end)
end

On line 2 we open a file on the parent node to the video path. Then in the FLAME child, we stream the file from the parent node to the FLAME server in only a couple lines of code. That’s it! No setup of S3 or HTTP interfaces required.

With FLAME it’s easy to miss everything we’re not doing:

  • We don’t need to write code outside of our application. We can reuse business logic, database setup, PubSub, and all the features of our respective platforms
  • We don’t need to manage deploys of separate services or endpoints
  • We don’t need to write results to S3 or SQS just to pick up values back in our app
  • We skip the dev, test, and CI dependency dance

FLAME outside Elixir

Elixir is fantastically well suited for the FLAME model because we get so much for free, like process supervision and distributed messaging. That said, any language with reasonable concurrency primitives can take advantage of this pattern. For example, my teammate, Lubien, created a proof of concept example for breaking out functions in your JavaScript application and running them inside a new Fly Machine: https://github.com/lubien/fly-run-this-function-on-another-machine

So the general flow for a JavaScript-based FLAME call would be to move the modular executions to a new file, which is executed on a runner pool. Provided the arguments are JSON serializable, the general FLAME flow is similar to what we’ve outlined here. Your application, your code, running on fleeting instances.

A complete FLAME library will need to handle the following concerns:

  • Elastic pool scale-up and scale-down logic
  • Hot vs cold startup with pools
  • Remote runner monitoring to avoid orphaned resources
  • How to monitor and keep deployments fresh

For the rest of this post we’ll see how the Elixir FLAME library handles these concerns, as well as features uniquely suited to Elixir applications. But first, you might be wondering about your background job queues.

What about my background job processor?

FLAME works great inside your background job processor, but you may have noticed some overlap. If your job library handles scaling the worker pool, what is FLAME doing for you? There are a couple of important distinctions here.

First, we reach for these queues when we need durability guarantees. We often can turn knobs to have the queues scale to handle more jobs as load changes. But durable operations are separate from elastic execution. Conflating these concerns can send you down a similar path to lambda complexity. Leaning on your worker queue purely for offloaded execution means writing all the glue code to get the data into and out of the job, and back to the caller or end-user’s device somehow.

For example, if we want to guarantee we successfully generated thumbnails for a video after the user upload, then a job queue makes sense as the dispatch, commit, and retry mechanism for this operation. The actual transcoding could be a FLAME call inside the job itself, so we decouple the ideas of durability and scaled execution.
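
As a sketch of that split, with Oban as the job library (the worker module, queue name, and MyApp.Media module are assumptions): the job handles dispatch, commit, and retries, while the transcoding itself runs on a FLAME runner because generate_thumbnails/2 from earlier already wraps the work in FLAME.call.

defmodule MyApp.Workers.GenerateThumbnails do
  use Oban.Worker, queue: :media, max_attempts: 3

  alias MyApp.{Repo, Video}

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"video_id" => id, "interval" => interval}}) do
    vid = Repo.get!(Video, id)
    # Durability and retries live here; elastic execution lives in the
    # FLAME.call inside generate_thumbnails/2.
    MyApp.Media.generate_thumbnails(vid, interval)
    :ok
  end
end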

On the other side, we have operations we don’t need durability for. Take the screencast above where the user hasn’t yet saved their video. Or an ML model execution where there’s no need to waste resources churning a prompt if the user has already left the app. In those cases, it doesn’t make sense to write to a durable store to pick up a job for work that will go right into the ether.
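
In those cases the caller can simply await the result directly, with no durable store in the middle. A minimal sketch (the pool name and MyApp.ML.predict/1 are hypothetical):

# No queued job is left behind to churn for nothing if the caller goes away.
def run_prompt(prompt) do
  FLAME.call(MyApp.MLRunner, fn -> MyApp.ML.predict(prompt) end)
end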

Pooling for Elastic Scale

With the Elixir implementation of FLAME, you define elastic pools of runners. This allows scale-to-zero behavior while also elastically scaling up FLAME servers with max concurrency limits.

For example, let’s take a look at the start/2 callback, which is the entry point of all Elixir applications. We can drop in a FLAME.Pool for video transcriptions and say we want it to scale to zero, boot a max of 10, and support 5 concurrent ffmpeg operations per runner:

def start(_type, _args) do
  flame_parent = FLAME.Parent.get()

  children = [
    ...,
    MyApp.Repo,
    {FLAME.Pool,
      name: Thumbs.FFMpegRunner,
      min: 0,
      max: 10,
      max_concurrency: 5,
      idle_shutdown_after: 30_000},
    !flame_parent && MyAppWeb.Endpoint
  ]
  |> Enum.filter(& &1)

  opts = [strategy: :one_for_one, name: MyApp.Supervisor]
  Supervisor.start_link(children, opts)
end

We use the presence of a FLAME parent to conditionally start our Phoenix webserver when booting the app. There’s no reason to start a webserver if we aren’t serving web traffic. Note we leave other services like the database MyApp.Repo alone because we want to make use of those services inside FLAME runners.

Elixir’s supervised process approach to applications is uniquely great for turning these kinds of knobs.

We also set our pool to idle down after 30 seconds of no caller operations. This keeps our runners hot for a short while before discarding them. We could also pass a min: 1 to always ensure at least one ffmpeg runner is hot and ready for work by the time our application is started.

Process Placement

In Elixir, stateful bits of our applications are built around the process primitive – lightweight green threads with message mailboxes. Wrapping our otherwise stateless app code in synchronous FLAME.call‘s or async FLAME.cast’s works great, but what about the stateful parts of our app?

FLAME.place_child exists to take an existing process specification in your Elixir app and start it on a FLAME runner instead of locally. You can use it anywhere you’d use Task.Supervisor.start_child, DynamicSupervisor.start_child, or similar interfaces. Just like FLAME.call, the process is run on an elastic pool and runners handle idle down when the process completes its work.

And like FLAME.call, it lets us take existing app code, change a single LOC, and continue shipping features.

Back to our example

Let’s walk through the example from the screencast above. Imagine we want to generate video thumbnails for a video as it is being uploaded. Elixir and LiveView make this easy. We won’t cover all the code here, but you can view the full app implementation.

Our first pass would be to write a LiveView upload writer that calls into a ThumbnailGenerator:

defmodule ThumbsWeb.ThumbnailUploadWriter do
  @behaviour Phoenix.LiveView.UploadWriter

  alias Thumbs.ThumbnailGenerator

  def init(opts) do
    generator = ThumbnailGenerator.open(opts)
    {:ok, %{gen: generator}}
  end

  def write_chunk(data, state) do
    ThumbnailGenerator.stream_chunk!(state.gen, data)
    {:ok, state}
  end

  def meta(state), do: %{gen: state.gen}

  def close(state, _reason) do
    ThumbnailGenerator.close(state.gen)
    {:ok, state}
  end
end

An upload writer is a behaviour that simply ferries the uploaded chunks from the client into whatever we’d like to do with them. Here we have a ThumbnailGenerator.open/1 which starts a process that communicates with an ffmpeg shell. Inside ThumbnailGenerator.open/1, we use regular Elixir process primitives:

  # thumbnail_generator.ex
  def open(opts \\ []) do
    Keyword.validate!(opts, [:timeout, :caller, :fps])
    timeout = Keyword.get(opts, :timeout, 5_000)
    caller = Keyword.get(opts, :caller, self())
    ref = make_ref()
    parent = self()

    spec = {__MODULE__, {caller, ref, parent, opts}}
    {:ok, pid} = DynamicSupervisor.start_child(@sup, spec)

    receive do
      {^ref, %ThumbnailGenerator{} = gen} ->
        %ThumbnailGenerator{gen | pid: pid}
    after
      timeout -> exit(:timeout)
    end
  end

The details aren’t super important here, except line 10 where we call {:ok, pid} = DynamicSupervisor.start_child(@sup, spec), which starts a supervised ThumbnailGenerator process. The rest of the implementation simply ferries chunks as stdin into ffmpeg and parses PNGs from stdout. Once a PNG delimiter is found in stdout, we send the caller process (our LiveView process) a message saying “hey, here’s an image”:

# thumbnail_generator.ex
@png_begin <<137, 80, 78, 71, 13, 10, 26, 10>>
defp handle_stdout(state, ref, bin) do
  %ThumbnailGenerator{ref: ^ref, caller: caller} = state.gen

  case bin do
    <<@png_begin, _rest::binary>> ->
      if state.current do
        send(caller, {ref, :image, state.count, encode(state)})
      end

      %{state | count: state.count + 1, current: [bin]}

    _ ->
      %{state | current: [bin | state.current]}
  end
end

The caller LiveView process then picks up the message in a handle_info callback and updates the UI:

# thumb_live.ex
def handle_info({_ref, :image, _count, encoded}, socket) do
  %{count: count} = socket.assigns

  {:noreply,
   socket
   |> assign(count: count + 1, message: "Generating (#{count + 1})")
   |> stream_insert(:thumbs, %{id: count, encoded: encoded})}
end

The send(caller, {ref, :image, state.count, encode(state)}) is one magic part about Elixir. Everything is a process, and we can message those processes, regardless of their location in the cluster.

It’s as if every instantiation of an object in your favorite OO language included a cluster-global unique identifier to work with methods on that object. The LiveView (a process) simply receives the image message and updates the UI with new images.

Now let’s head back over to our ThumbnailGenerator.open/1 function and make it elastically scalable.

-    {:ok, pid} = DynamicSupervisor.start_child(@sup, spec)
+    {:ok, pid} = FLAME.place_child(Thumbs.FFMpegRunner, spec)

That’s it! Because everything is a process and processes can live anywhere, it doesn’t matter what server our ThumbnailGenerator process lives on. It simply messages the caller with send(caller, …) and the messages are sent across the cluster if needed.

Once the process exits, either from an explicit close after the upload is done, or from the end-user closing their browser tab, the FLAME server will note the exit and idle down if no other work is being performed.

Check out the full implementation if you’re curious.

Remote Monitoring

All this transient infrastructure needs failsafe mechanisms to avoid orphaning resources. If a parent spins up a runner, that runner must take care of idling itself down when no work is present and handle failsafe shutdowns if it can no longer contact the parent node.

Likewise, we need to shutdown runners when parents are rolled for new deploys as we must guarantee we’re running the same code across the cluster.

We also have active callers in many cases that are awaiting the result of work on runners that could go down for any reason.

There’s a lot to monitor here.

There’s also a number of failure modes that make this sound like a harrowing experience to implement. Fortunately Elixir has all the primitives to make this an easy task thanks to the Erlang VM. Namely, we get the following for free:

  • Process monitoring and supervision – we know when things go bad. Whether on a node-local process, or one across the cluster
  • Node monitoring – we know when nodes come up, and when nodes go away
  • Declarative and controlled app startup and shutdown – we carefully control the startup and shutdown sequence of applications as a matter of course. This allows us to gracefully shutdown active runners when a fresh deploy is triggered, while giving them time to finish their work
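
As a minimal sketch of the first two primitives above (generic names, not the flame library’s internals), a runner could fail safe roughly like this:

# Watch the parent process and its node; shut the runner's VM down if either
# goes away so we never orphan a machine.
def watch_parent(parent_pid, parent_node) do
  ref = Process.monitor(parent_pid)
  :net_kernel.monitor_nodes(true)

  receive do
    {:DOWN, ^ref, :process, _pid, _reason} -> System.stop()
    {:nodedown, ^parent_node} -> System.stop()
  end
end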

We’ll cover the internal implementation details in a future deep-dive post. For now, feel free to poke around the flame source.

What’s Next

We’re just getting started with the Elixir FLAME library, but it’s ready to try out now. In the future, look for more advanced pool growth techniques, and deep dives into how the Elixir implementation works. You can also find me @chris_mccord to talk about implementing the FLAME pattern in your language of choice.

Happy coding!

–Chris


