google/switch-c-2048 · Hugging Face

  1. TL;DR
  2. Model Details
  3. Usage
  4. Uses
  5. Bias, Risks, and Limitations
  6. Training Details
  7. Evaluation
  8. Environmental Impact
  9. Citation
  10. Model Card Authors

Switch Transformers is a Mixture of Experts (MoE) model trained on a Masked Language Modeling (MLM) task. The model architecture is similar to the classic T5, but with the Feed Forward layers replaced by Sparse MLP layers containing "expert" MLPs. According to the original paper the model enables faster training (scaling properties) while being better than T5 on fine-tuned tasks.
As mentioned in the first few lines of the abstract:

we advance the current scale of language models by pre-training up to trillion parameter models on the "Colossal Clean Crawled Corpus", and achieve a 4x speedup over the T5-XXL model.

Disclaimer: Content from this model card has been written by the Hugging Face team, and parts of it have been copy-pasted from the original paper.
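To make the "expert" layers described above concrete, here is a minimal, self-contained sketch of top-1 ("switch") routing, where each token is dispatched to a single expert feed-forward network chosen by a learned router. All names (SwitchFFN, etc.) are illustrative; real Switch layers also add an expert capacity limit and a load-balancing loss, so this is only a sketch of the mechanism, not the model's actual implementation.

import torch
import torch.nn as nn

class SwitchFFN(nn.Module):
    """Toy sketch of a Switch Transformers sparse MLP layer: a router
    sends each token to exactly one expert FFN (top-1 routing)."""

    def __init__(self, d_model, d_ff, num_experts):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # token -> expert logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, d_model)
        probs = torch.softmax(self.router(x), dim=-1)  # routing probabilities
        gate, expert_idx = probs.max(dim=-1)           # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # scale each expert's output by its router probability
                out[mask] = gate[mask].unsqueeze(-1) * expert(x[mask])
        return out

layer = SwitchFFN(d_model=16, d_ff=64, num_experts=4)
print(layer(torch.randn(10, 16)).shape)  # torch.Size([10, 16])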



Model Description

Note that these checkpoints have been trained on a Masked Language Modeling (MLM) task. Therefore the checkpoints are not "ready-to-use" for downstream tasks. You may want to check FLAN-T5 for running fine-tuned weights, or fine-tune your own MoE following this notebook.

Find below some example scripts on how to use the model in transformers – keep in mind that the model is extremely large, so you may consider using disk offload from accelerate:



Using the PyTorch model



Running the model on a CPU


from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-c-2048")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-c-2048", device_map="auto", offload_folder=<OFFLOAD_FOLDER>)

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
>>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>



Running the model on a GPU


from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-c-2048")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-c-2048", device_map="auto", offload_folder=<OFFLOAD_FOLDER>)

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
>>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>



Running the model on a GPU using different precisions



BF16


import torch
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-c-2048")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-c-2048", device_map="auto", torch_dtype=torch.bfloat16, offload_folder=<OFFLOAD_FOLDER>)

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
>>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>



INT8


# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-c-2048")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-c-2048", device_map="auto", load_in_8bit=True, offload_folder=<OFFLOAD_FOLDER>)

input_text = "A <extra_id_0> walks into a bar a orders a <extra_id_1> with <extra_id_2> pinch of <extra_id_3>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
>>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
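Note that on more recent versions of transformers, 8-bit loading is configured through a BitsAndBytesConfig object passed as quantization_config rather than the load_in_8bit argument. A minimal sketch, assuming bitsandbytes and accelerate are installed:

from transformers import AutoTokenizer, BitsAndBytesConfig, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-c-2048")
model = SwitchTransformersForConditionalGeneration.from_pretrained(
    "google/switch-c-2048",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # replaces load_in_8bit=True
)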



Direct Use and Downstream Use

See the research paper for further details.



Out-of-Scope Use

More information needed.

Bias, Risks, and Limitations

More information needed.



Ethical considerations and risks

More information needed.



Known Limitations

More information needed.




Sensitive Use:

More information needed.



Training Data

The model was trained on a Masked Language Modeling task, on the Colossal Clean Crawled Corpus (C4) dataset, following the same procedure as T5.
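As a rough illustration of what this T5-style objective looks like, the hypothetical helper below replaces word spans with <extra_id_n> sentinel tokens and builds the matching target string; it is only a sketch of the data format, not the actual C4 preprocessing pipeline.

def span_corrupt(words, spans):
    """Toy T5/Switch-style span corruption: drop the given word spans from the
    input, mark them with <extra_id_n> sentinels, and collect them as the target.

    words: list of tokens; spans: sorted, non-overlapping (start, end) index pairs.
    """
    inputs, targets, prev_end = [], [], 0
    for n, (start, end) in enumerate(spans):
        inputs += words[prev_end:start] + [f"<extra_id_{n}>"]
        targets += [f"<extra_id_{n}>"] + words[start:end]
        prev_end = end
    inputs += words[prev_end:]
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel
    return " ".join(inputs), " ".join(targets)

print(span_corrupt("A man walks into a bar and orders a beer".split(), [(1, 2), (9, 10)]))
# ('A <extra_id_0> walks into a bar and orders a <extra_id_1>',
#  '<extra_id_0> man <extra_id_1> beer <extra_id_2>')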



Training Procedure

According to the model card from the original paper, the model has been trained on TPU v3 or TPU v4 pods, using the t5x codebase together with jax.



Testing Data, Factors & Metrics

The authors evaluated the model on various tasks and compared the results against T5. See the table below for some quantitative evaluation:
[Table: quantitative evaluation of Switch Transformers against T5; see the research paper for the full table.]
For full details, please check the research paper.



Results

For full results for Switch Transformers, see the research paper, Table 5.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: Google Cloud TPU Pods – TPU v3 or TPU v4 | Number of chips ≥ 4.
  • Hours used: More information needed
  • Cloud Provider: GCP
  • Compute Region: More information needed
  • Carbon Emitted: More information needed

Citation

BibTeX:

@misc{https://doi.org/10.48550/arxiv.2101.03961,
  doi = {10.48550/ARXIV.2101.03961},
  url = {https://arxiv.org/abs/2101.03961},
  author = {Fedus, William and Zoph, Barret and Shazeer, Noam},
  keywords = {Machine Learning (cs.LG), Artificial Intelligence (cs.AI), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity},
  publisher = {arXiv},
  year = {2021},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
