Keras: Deep Learning for humans
After five months of extensive public beta testing,
we're excited to announce the official launch of Keras 3.0.
Keras 3 is a full rewrite of Keras that lets you
run your Keras workflows on top of either JAX, TensorFlow, or PyTorch, and that
unlocks brand new large-scale model training and deployment capabilities.
You can pick the framework that suits you best,
and switch from one to another based on your current goals.
You can also use Keras as a low-level cross-framework language
to develop custom components such as layers, models, or metrics
that can be used in native workflows in JAX, TensorFlow, or PyTorch — with one codebase.
Welcome to multi-framework machine learning.
You're already familiar with the benefits of using Keras — it enables
high-velocity development via an obsessive focus on great UX, API design,
and debuggability. It's also a battle-tested framework that has been chosen
by over 2.5M developers and that powers some of the most sophisticated,
largest-scale ML systems in the world,
such as the Waymo self-driving fleet and the YouTube recommendation engine.
But what are the additional benefits of using the new multi-backend Keras 3?
- Always get the best performance for your models. In our benchmarks,
we found that JAX typically delivers the best training and inference performance
on GPU, TPU, and CPU — but results vary from model to model, as non-XLA
TensorFlow is occasionally faster on GPU. The ability to dynamically select
the backend that will deliver the best performance for your model
without having to change anything in your code means you're guaranteed
to train and serve with the highest achievable efficiency.
- Unlock ecosystem optionality for your models. Any Keras 3
model can be instantiated as a PyTorch Module, can be exported as a TensorFlow
SavedModel, or can be instantiated as a stateless JAX function. That means
that you can use your Keras 3 models with PyTorch ecosystem packages,
with the full range of TensorFlow deployment & production tools
(like TF-Serving, TF.js and TFLite), and with JAX large-scale
TPU training infrastructure. Write one model.py using
Keras 3 APIs, and get access to everything the ML world has to offer.
- Leverage large-scale model parallelism & data parallelism with JAX. Keras 3 includes
a brand new distribution API, the keras.distribution namespace,
currently implemented for the JAX backend (coming soon to the TensorFlow and PyTorch backends).
It makes it easy to do model parallelism, data parallelism, and combinations of both —
at arbitrary model scales and cluster scales.
Because it keeps the model definition, training logic,
and sharding configuration all separate from each other,
it makes your distribution workflow easy to develop and easy to maintain.
See our starter guide.
- Maximize reach for your open-source model releases. Want to
release a pretrained model? Want as many people as possible
to be able to use it? If you implement it in pure TensorFlow or PyTorch,
it will be usable by roughly half of the community.
If you implement it in Keras 3, it is instantly usable by anyone regardless
of their framework of choice (even if they're not Keras users themselves).
Twice the impact at no added development cost.
- Use data pipelines from any source. The Keras 3
fit()/evaluate()/predict() routines are compatible with tf.data.Dataset
objects, with PyTorch DataLoader objects, with NumPy arrays, and with Pandas dataframes —
regardless of the backend you're using. You can train a Keras 3 + TensorFlow
model on a PyTorch DataLoader or train a Keras 3 + PyTorch model on a
tf.data.Dataset.
The full Keras API, available for JAX, TensorFlow, and PyTorch.
Keras 3 implements the full Keras API and makes it available
with TensorFlow, JAX, and PyTorch — over a hundred layers, dozens of metrics,
loss functions, optimizers, and callbacks, the Keras training and evaluation
loops, and the Keras saving & serialization infrastructure. All the APIs you
know and love are here.
Any Keras model that only uses built-in layers will immediately work with
all supported backends. In fact, your existing tf.keras models
that only use built-in layers can start running in JAX and PyTorch immediately!
That's right, your codebase just gained a whole new set of capabilities.
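For instance, a minimal sketch of switching backends: set the KERAS_BACKEND environment variable before importing Keras (the toy model below is just a placeholder).

import os
os.environ["KERAS_BACKEND"] = "jax"  # or "tensorflow" or "torch"; must be set before importing keras

import keras

model = keras.Sequential([keras.layers.Dense(10, activation="softmax")])
print(keras.backend.backend())  # -> "jax"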
Author multi-framework layers, models, metrics…
Keras 3 enables you to create components
(like arbitrary custom layers or pretrained models) that will work the same
in any framework. In particular, Keras 3 gives you access
to the keras.ops namespace that works across all backends. It contains:
- A full implementation of the NumPy API.
Not something "NumPy-like" — literally the
NumPy API, with the same functions and the same arguments.
You get ops.matmul, ops.sum, ops.stack, ops.einsum, etc.
- A set of neural network-specific functions that are absent from NumPy,
such as ops.softmax, ops.binary_crossentropy, ops.conv, etc.
As long as you only use ops from keras.ops, your custom layers,
custom losses, custom metrics, and custom optimizers
will work with JAX, PyTorch, and TensorFlow — with the same code.
That means that you can maintain only one
component implementation (e.g. a single model.py
together with a single checkpoint file), and you can use it in all frameworks,
with the exact same numerics.
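As an illustration, here is a minimal sketch of a custom layer written purely with keras.ops (the layer itself is a made-up example, not an official Keras layer); it runs unchanged on any backend.

import keras
from keras import ops

class RMSNorm(keras.layers.Layer):
    # Hypothetical example layer: only keras.ops is used, so it is backend-agnostic.
    def __init__(self, epsilon=1e-6, **kwargs):
        super().__init__(**kwargs)
        self.epsilon = epsilon

    def build(self, input_shape):
        self.scale = self.add_weight(
            shape=(input_shape[-1],), initializer="ones", name="scale"
        )

    def call(self, x):
        variance = ops.mean(ops.square(x), axis=-1, keepdims=True)
        return x * ops.rsqrt(variance + self.epsilon) * self.scale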
…that works seamlessly with any JAX, TensorFlow, and PyTorch workflow.
Keras 3 is not just intended for Keras-centric workflows
where you define a Keras model, a Keras optimizer, a Keras loss and metrics,
and you call fit(), evaluate(), and predict().
It's also meant to work seamlessly with low-level backend-native workflows:
you can take a Keras model (or any other component, such as a loss or metric)
and start using it in a JAX training loop, a TensorFlow training loop,
or a PyTorch training loop, or as part of a JAX or PyTorch model,
with zero friction. Keras 3 provides exactly
the same degree of low-level implementation flexibility in JAX and PyTorch
as tf.keras previously did in TensorFlow.
You can:
- Write a low-level JAX training loop to train a Keras model
using an optax optimizer, jax.grad, jax.jit, jax.pmap.
- Write a low-level TensorFlow training loop to train a Keras model
using tf.GradientTape and tf.distribute.
- Write a low-level PyTorch training loop to train a Keras model
using a torch.optim optimizer, a torch loss function,
and the torch.nn.parallel.DistributedDataParallel wrapper
(a minimal sketch follows this list).
- Use Keras layers in a PyTorch Module (because they are Module instances too!).
- Use any PyTorch Module in a Keras model as if it were a Keras layer.
- etc.
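For example, here is a minimal single-device sketch of the PyTorch case (without the DistributedDataParallel wrapper), assuming the torch backend is active; the model, data, and sizes are placeholders.

import torch
import keras  # assumes KERAS_BACKEND="torch"

model = keras.Sequential([
    keras.layers.Input(shape=(8,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),
])
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

x = torch.randn(64, 8)
y = torch.randn(64, 1)

for step in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)  # the Keras model behaves like a torch.nn.Module
    loss.backward()
    optimizer.step()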
A new distribution API for large-scale data parallelism and model parallelism.
The models we've been working with have been getting larger and larger, so we wanted
to provide a Kerasic solution to the multi-device model sharding problem. The API we designed
keeps the model definition, the training logic, and the sharding configuration entirely separate from each
other, meaning that your models can be written as if they were going to run on a single device. You
can then add arbitrary sharding configurations to arbitrary models when it's time to train them.
Data parallelism (replicating a small model identically on multiple devices) can be handled in just two lines:
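For example (this is the same snippet that appears in the FAQ at the end of this post):

distribution = keras.distribution.DataParallel(devices=keras.distribution.list_devices())
keras.distribution.set_distribution(distribution)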
Model parallelism lets you specify sharding layouts for model variables and intermediate output tensors,
along multiple named dimensions. In the typical case, you would organize the available devices as a 2D grid
(called a device mesh), where the first dimension is used for data parallelism and the second dimension
is used for model parallelism. You would then configure your model to be sharded along the model dimension
and replicated along the data dimension.
The API lets you configure the layout of every variable and every output tensor via regular expressions.
This makes it easy to quickly specify the same layout for entire categories of variables.
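A rough sketch of what this looks like: the mesh shape, axis names, and variable-path regular expression below are placeholders, and the exact constructor arguments may differ slightly depending on your Keras version.

devices = keras.distribution.list_devices()
# 2D device mesh: first axis for data parallelism, second axis for model parallelism.
device_mesh = keras.distribution.DeviceMesh(
    shape=(2, 4), axis_names=["data", "model"], devices=devices
)

# Shard every variable whose path matches the regex along the "model" axis,
# and replicate it along the "data" axis.
layout_map = keras.distribution.LayoutMap(device_mesh)
layout_map["dense.*kernel"] = (None, "model")

distribution = keras.distribution.ModelParallel(
    device_mesh=device_mesh, layout_map=layout_map, batch_dim_name="data"
)
keras.distribution.set_distribution(distribution)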
The new distribution API is intended to be multi-backend, but is only available for the JAX backend for the time
being. TensorFlow and PyTorch support is coming soon. Get started with this guide!
Pretrained models.
There's a wide range of pretrained models that
you can start using today with Keras 3.
All 40 Keras Applications models (the keras.applications namespace)
are available in all backends.
The vast array of pretrained models in KerasCV
and KerasNLP also work with all backends. This includes:
- BERT
- OPT
- Whisper
- T5
- StableDiffusion
- YOLOv8
- SegmentAnything
- etc.
Support for cross-framework data pipelines with all backends.
Multi-framework ML also means multi-framework data loading and preprocessing.
Keras 3 models can be trained using a wide range of
data pipelines — regardless of whether you're using the JAX, PyTorch, or
TensorFlow backend. It just works (see the sketch after the list below).
- tf.data.Dataset pipelines: the reference for scalable production ML.
- torch.utils.data.DataLoader objects.
- NumPy arrays and Pandas dataframes.
- Keras's own keras.utils.PyDataset objects.
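For instance, here is a minimal sketch of fitting a model directly on a PyTorch DataLoader, regardless of which backend is active; the data and model sizes are placeholders.

import numpy as np
import torch
import keras

x = np.random.rand(256, 8).astype("float32")
y = np.random.rand(256, 1).astype("float32")
dataset = torch.utils.data.TensorDataset(torch.from_numpy(x), torch.from_numpy(y))
loader = torch.utils.data.DataLoader(dataset, batch_size=32)

model = keras.Sequential([
    keras.layers.Input(shape=(8,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(loader, epochs=2)  # works with the JAX, TensorFlow, or PyTorch backend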
Progressive disclosure of complexity.
Progressive disclosure of complexity is the design principle at the heart
of the Keras API. Keras doesn't force you to follow
a single "true" way of building and training models. Instead, it enables
a wide range of different workflows, from the very high-level to the very
low-level, corresponding to different user profiles.
That means that you can start out with simple workflows — such as using
Sequential and Functional models and training them with fit() — and when
you need more flexibility, you can easily customize different components while
reusing most of your prior code. As your needs become more specific,
you don't suddenly fall off a complexity cliff and you don't need to switch
to a different set of tools.
We've brought this principle to all of our backends. For instance,
you can customize what happens in your training loop while still
leveraging the power of fit(), without having to write your own training loop
from scratch — just by overriding the train_step method.
Here's how it works in PyTorch and TensorFlow:
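Below is a minimal sketch of the TensorFlow-backend version, with the model, loss handling, and metric bookkeeping kept deliberately simple.

import tensorflow as tf
import keras

class CustomModel(keras.Model):
    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compute_loss(y=y, y_pred=y_pred)
        # Compute and apply gradients with the Keras optimizer.
        gradients = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply(gradients, self.trainable_variables)
        # Update the metrics tracked by fit().
        for metric in self.metrics:
            if metric.name == "loss":
                metric.update_state(loss)
            else:
                metric.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}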
And here's the link to the JAX version.
A new stateless API for layers, models, metrics, and optimizers.
Do you enjoy functional programming?
You're in for a treat.
All stateful objects in Keras (i.e. objects that own numerical variables that
get updated during training or evaluation) now have a stateless API, making it
possible to use them in JAX functions (which are required to be fully stateless):
- All layers and models have a stateless_call() method which mirrors __call__().
- All optimizers have a stateless_apply() method which mirrors apply().
- All metrics have a stateless_update_state() method which mirrors update_state()
and a stateless_result() method which mirrors result().
These methods have no side effects whatsoever: they take as input the current value
of the state variables of the target object, and return the updated values as part
of their outputs, e.g.:
outputs, updated_non_trainable_variables = layer.stateless_call(
trainable_variables,
non_trainable_variables,
inputs,
)
You never have to implement these methods yourself — they're automatically available
as long as you've implemented the stateful version (e.g. call() or update_state()).
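For instance, here is a minimal sketch of calling a layer statelessly from inside a jit-compiled JAX function, assuming the JAX backend; the layer and shapes are placeholders.

import jax
import numpy as np
import keras  # assumes KERAS_BACKEND="jax"

layer = keras.layers.Dense(4)
layer.build((None, 3))

@jax.jit
def forward(trainable_variables, non_trainable_variables, x):
    # Purely functional: current state goes in, outputs and updated state come out.
    outputs, non_trainable_variables = layer.stateless_call(
        trainable_variables, non_trainable_variables, x
    )
    return outputs, non_trainable_variables

x = np.ones((2, 3), dtype="float32")
outputs, _ = forward(layer.trainable_variables, layer.non_trainable_variables, x)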
Moving from Keras 2 to Keras 3
Keras 3 is highly backwards compatible with Keras 2:
it implements the full public API surface of Keras 2,
with a limited number of exceptions, listed here.
Most users will not have to make any code change
to start running their Keras scripts on Keras 3.
Larger codebases are likely to require some code changes,
since they are more likely to run into one of the exceptions listed above,
and are more likely to have been using private APIs or deprecated APIs
(the tf.compat.v1.keras namespace, the experimental namespace,
the keras.src private namespace).
To help you move to Keras 3, we are releasing a complete migration guide
with quick fixes for all issues you might encounter.
You also have the option to ignore the changes in Keras 3 and just keep using Keras 2 with TensorFlow —
this can be a good option for projects that are not actively developed
but need to keep running with updated dependencies.
You have two possibilities:
- If you were accessing keras as a standalone package,
just switch to using the Python package tf_keras instead,
which you can install via pip install tf_keras.
The code and API are wholly unchanged — it's Keras 2.15 with a different package name.
We will keep fixing bugs in tf_keras and we will keep regularly releasing new versions.
However, no new features or performance improvements will be added,
since the package is now in maintenance mode.
- If you were accessing keras via tf.keras,
there are no immediate changes until TensorFlow 2.16.
TensorFlow 2.16+ will use Keras 3 by default.
In TensorFlow 2.16+, to keep using Keras 2, you can first install tf_keras,
and then export the environment variable TF_USE_LEGACY_KERAS=1
(a sketch of doing this from Python follows this list).
This will direct TensorFlow 2.16+ to resolve tf.keras to the locally-installed tf_keras package.
Note that this may affect more than your own code, however:
it will affect any package importing tf.keras in your Python process.
To make sure your changes only affect your own code, you should use the tf_keras package.
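For reference, a minimal sketch of setting that variable from Python rather than from the shell (assuming tf_keras is already installed, and that the variable is set before TensorFlow is imported anywhere in the process):

import os
os.environ["TF_USE_LEGACY_KERAS"] = "1"

import tensorflow as tf  # tf.keras now resolves to the installed tf_keras package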
Enjoy the library!
We're excited for you to try out the new Keras and improve your workflows by leveraging multi-framework ML.
Let us know how it goes: issues, points of friction, feature requests, or success stories —
we're eager to hear from you!
FAQ
Q: Is Keras 3 compatible with legacy Keras 2?
Code developed with tf.keras can generally be run as-is with Keras 3
(with the TensorFlow backend). There's a limited number of incompatibilities you should be aware
of, all addressed in this migration guide.
When it comes to using APIs from tf.keras and Keras 3 side by side,
that's not possible — they're different packages, running on entirely separate engines.
Q: Do pretrained models developed in legacy Keras 2 work with Keras 3?
Generally, yes. Any tf.keras model should work out of the box with Keras 3
with the TensorFlow backend (make sure to save it in the .keras v3 format).
In addition, if the model only
uses built-in Keras layers, then it will also work out of the box
with Keras 3 with the JAX and PyTorch backends.
If the model contains custom layers written using TensorFlow APIs,
it is usually easy to convert the code to be backend-agnostic.
For instance, it only took us a few hours to convert all 40
legacy tf.keras models from Keras Applications to be backend-agnostic.
Q: Can I save a Keras 3 model in one backend and reload it in another backend?
Yes, you can. There is no backend specialization in saved .keras files whatsoever.
Your saved Keras models are framework-agnostic and can be reloaded with any backend.
However, note that reloading a model that contains custom components
with a different backend requires your custom components to be implemented
using backend-agnostic APIs, e.g. keras.ops.
Q: Can I use Keras 3 components inside tf.data pipelines?
With the TensorFlow backend, Keras 3 is fully compatible with tf.data
(e.g. you can .map() a Sequential model into a tf.data pipeline).
With a different backend, Keras 3 has limited support for tf.data.
You won't be able to .map() arbitrary layers or models into a tf.data
pipeline. However, you will be able to use specific Keras 3
preprocessing layers with tf.data, such as IntegerLookup or CategoryEncoding.
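A minimal sketch of that preprocessing-layer case (the vocabulary and input values below are made up):

import tensorflow as tf
import keras

# Map a Keras preprocessing layer over a tf.data pipeline.
lookup = keras.layers.IntegerLookup(vocabulary=[12, 36, 1138, 42])
ds = tf.data.Dataset.from_tensor_slices([[12], [1138], [42], [7]])
ds = ds.map(lookup)  # runs inside the tf.data pipeline, whatever the Keras backend

for batch in ds:
    print(batch.numpy())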
When it comes to using a tf.data pipeline (that does not use Keras)
to feed your call to .fit(), .evaluate() or .predict() —
that works out of the box with all backends.
Q: Do Keras 3 models behave the same when run with different backends?
Yes, numerics are identical across backends.
However, keep in mind the following caveats:
- RNG behavior differs across backends
(even after seeding: your results will be deterministic in each backend
but will differ across backends). So random weight initialization
values and dropout values will differ across backends.
- Due to the nature of floating-point implementations,
results are only identical up to 1e-7 precision in float32,
per function execution. So when training a model for a long time,
small numerical differences will accumulate and may end up resulting
in noticeable numerical differences.
- Due to the lack of support for average pooling with asymmetric padding
in PyTorch, average pooling layers with padding="same"
may result in different numerics on border rows/columns.
This doesn't happen very often in practice:
out of the 40 Keras Applications vision models, only one was affected.
Q: Does Keras 3 support distributed training?
Data-parallel distribution is supported out of the box in JAX, TensorFlow,
and PyTorch. Model-parallel distribution is supported out of the box for JAX
with the keras.distribution API.
With TensorFlow:
Keras 3 is compatible with tf.distribute —
just open a Distribution Strategy scope and create / train your model within it.
Here's an example.
With PyTorch:
Keras 3 is compatible with PyTorch's DistributedDataParallel utility.
Here's an example.
With JAX:
You can do both data-parallel and model-parallel distribution in JAX using the keras.distribution API.
For instance, to do data-parallel distribution, you only need the following code snippet:
distribution = keras.distribution.DataParallel(devices=keras.distribution.list_devices())
keras.distribution.set_distribution(distribution)
For model-parallel distribution, see the following guide.
You can also distribute training yourself via JAX APIs such as
jax.sharding. Here's an example.
Q: Can my custom Keras layers be used in native PyTorch Modules or with Flax Modules?
If they are only written using Keras APIs (e.g. the keras.ops namespace), then yes, your
Keras layers will work out of the box with native PyTorch and JAX code.
In PyTorch, just use your Keras layer like any other PyTorch Module.
In JAX, make sure to use the stateless layer API, i.e. layer.stateless_call().
Q: Will you add more backends in the future? What about framework XYZ?
We're open to adding new backends as long as the target framework has a large user base
or otherwise has some unique technical benefits to bring to the table.
However, adding and maintaining a new backend is a large burden,
so we're going to carefully consider each new backend candidate on a case-by-case basis,
and we're unlikely to add many new backends. We will not add any new frameworks
that aren't yet well-established.
We are now potentially considering adding a backend written in Mojo.
If that's something you might find useful, please let the Mojo team know.