google/switch-c-2048 · Hugging Face
- TL;DR
- Model Details
- Usage
- Uses
- Bias, Risks, and Limitations
- Training Details
- Evaluation
- Environmental Impact
- Citation
- Model Card Authors
Switch Transformers is a Mixture of Experts (MoE) model trained on a Masked Language Modeling (MLM) task. The model architecture is similar to the classic T5, but with the Feed Forward layers replaced by Sparse MLP layers containing "experts" MLPs. According to the original paper, the model enables faster training (scaling properties) while being better than T5 on fine-tuned tasks.
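For intuition, here is a minimal, illustrative sketch of the top-1 ("switch") routing idea described above. It is not the actual implementation, and the names (`SwitchFFN`, `d_model`, `num_experts`) are made up for the example:

```python
import torch
import torch.nn as nn

class SwitchFFN(nn.Module):
    """Toy Switch-style feed-forward layer: each token is routed to a single expert MLP."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # one routing logit per expert
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        probs = torch.softmax(self.router(x), dim=-1)   # routing probabilities
        gate, expert_idx = probs.max(dim=-1)            # top-1 ("switch") routing
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # scale the expert output by its gate value so the router receives gradients
                out[mask] = gate[mask].unsqueeze(-1) * expert(x[mask])
        return out

# 10 tokens, hidden size 512, 8 experts: only one expert's weights are used per token
layer = SwitchFFN(d_model=512, d_ff=2048, num_experts=8)
print(layer(torch.randn(10, 512)).shape)  # torch.Size([10, 512])
```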
As mentioned in the first few lines of the abstract:
we advance the current scale of language models by pre-training up to trillion parameter models on the “Colossal Clean Crawled Corpus”, and achieve a 4x speedup over the T5-XXL model.
Disclaimer: Content from this model card has been written by the Hugging Face team, and parts of it were copy-pasted from the original paper.
Model Description
Note that these checkpoints have been trained on a Masked Language Modeling (MLM) task. Therefore the checkpoints are not “ready-to-use” for downstream tasks. You may want to check FLAN-T5
for running fine-tuned weights, or fine-tune your own MoE following this notebook.
Find below some example scripts on how to use the model in transformers
– bear in mind that the model is extremely large, so you may consider using disk offload from accelerate:
Using the PyTorch model
Running the model on a CPU
Click to expand
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration
tokenizer = AutoTokenizer.from_pretrained("google/switch-c-2048")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-c-2048", device_map="auto", offload_folder=<OFFLOAD_FOLDER>)
input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
>>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
Running the model on a GPU
Click to expand
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration
tokenizer = AutoTokenizer.from_pretrained("google/switch-c-2048")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-c-2048", device_map="auto", offload_folder=<OFFLOAD_FOLDER>)
input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
>>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
Running the model on a GPU using different precisions
BF16
Click to expand
import torch  # needed for torch.bfloat16
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration
tokenizer = AutoTokenizer.from_pretrained("google/switch-c-2048")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-c-2048", device_map="auto", torch_dtype=torch.bfloat16, offload_folder=<OFFLOAD_FOLDER>)
input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
>>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
INT8
Click to expand
# requires the bitsandbytes library: pip install bitsandbytes accelerate
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration
tokenizer = AutoTokenizer.from_pretrained("google/switch-c-2048")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-c-2048", device_map="auto", load_in_8bit=True, offload_folder=<OFFLOAD_FOLDER>)
input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
>>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
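Note that in recent transformers releases, 8-bit loading is typically configured with a quantization config object rather than the `load_in_8bit` argument. A minimal equivalent sketch, assuming bitsandbytes is installed and a GPU is available:

```python
from transformers import AutoTokenizer, BitsAndBytesConfig, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-c-2048")
model = SwitchTransformersForConditionalGeneration.from_pretrained(
    "google/switch-c-2048",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit weights via bitsandbytes
)
```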
Direct Use and Downstream Use
See the research paper for further details.
Out-of-Scope Use
More information needed.
Bias, Risks, and Limitations
More information needed.
Ethical considerations and risks
More information needed.
Known Limitations
More information needed.
Sensitive Use:
More information needed.
Training Data
The model was trained on a Masked Language Modeling task, on the Colossal Clean Crawled Corpus (C4) dataset, following the same procedure as T5.
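For illustration, the MLM objective is T5-style span corruption: contiguous spans are replaced by sentinel tokens (`<extra_id_0>`, `<extra_id_1>`, …) in the input, and the targets contain the dropped spans. A minimal sketch of a training step under that objective; the sentences and the use of the much smaller `google/switch-base-8` checkpoint are illustrative choices, not the original training setup:

```python
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

# small checkpoint for illustration; switch-c-2048 would need offloading
tokenizer = AutoTokenizer.from_pretrained("google/switch-base-8")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-8")

# span corruption: masked spans in the input, the original spans in the labels
input_ids = tokenizer("The <extra_id_0> walks in <extra_id_1> park", return_tensors="pt").input_ids
labels = tokenizer("<extra_id_0> cute dog <extra_id_1> the <extra_id_2>", return_tensors="pt").input_ids

loss = model(input_ids=input_ids, labels=labels).loss  # MLM (span-corruption) loss
```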
Training Procedure
According to the model card from the original paper, the model was trained on TPU v3 or TPU v4 pods, using the t5x
codebase together with jax.
Testing Data, Factors & Metrics
The authors evaluated the model on various tasks and compared the results against T5.
For full details, please check the research paper.
Results
For full results for Switch Transformers, see the research paper, Table 5.
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: Google Cloud TPU Pods – TPU v3 or TPU v4 | Number of chips ≥ 4.
- Hours used: More information needed
- Cloud Provider: GCP
- Compute Region: More information needed
- Carbon Emitted: More information needed
BibTeX:
@misc{https://doi.org/10.48550/arxiv.2101.03961,
doi = {10.48550/ARXIV.2101.03961},
url = {https://arxiv.org/abs/2101.03961},
author = {Fedus, William and Zoph, Barret and Shazeer, Noam},
keywords = {Machine Learning (cs.LG), Artificial Intelligence (cs.AI), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity},
publisher = {arXiv},
year = {2021},
copyright = {arXiv.org perpetual, non-exclusive license}
}