
Mass-Editing Memory in a Transformer

2023-04-21 08:56:56

(a) Language models can be viewed as knowledge bases containing memorized
tuples (s, r, o), each connecting some subject s to an object
o via a relation r, e.g., (s = Michael Jordan,
r = plays sport, o = basketball). (b) MEMIT modifies transformer weights to edit
memories, e.g., “Michael Jordan now plays the sport baseball,” while (c) maintaining
generalization, specificity, and fluency at scales beyond other methods.
As detailed in Section 5.2.2 of the paper, editing score is the harmonic mean of
efficacy, generalization, and specificity metrics.

Why edit knowledge in a model?

Large language models such as the GPT models contain some amount of
world knowledge, since they can recall facts about real people, places,
and things. For example, if you ask GPT-3 to complete the sentence

Michael Jordan plays the sport of…

the model will predict basketball, a word that
is not only grammatically correct, but is also consistent with
a true fact in the real world.

However, the knowledge contained in a large language model is not perfect:
even the largest models will be missing specialized knowledge, and a model
can even contain obsolete knowledge that it learned from old text.

GPT-3 predicts: Polaris is in the constellation Ursa Minor

GPT-3 predicts: Arneb is in the constellation of Aquila
(incorrect – should be Lepus)

GPT-3 predicts: The current Vice President of the United States is named Mike Pence
(obsolete)

To fix such problems, several knowledge-editing methods
have been proposed to insert new memories directly into model parameters.
Yet most of these methods are focused on updating a single
memory in the model, and it has been a challenge to use these methods
to update more than a handful of facts. In practice we may want to insert
hundreds or thousands of new memories in order to update or improve
a large model.

In this work, we propose MEMIT, a direct model-editing method that
is capable of updating thousands of memories at once.

How does it work?

MEMIT is a successor to our prior work ROME, which performs a rank-one modification of the
MLP weights of a single layer to directly write a memory into the model. MEMIT builds upon ROME to insert many memories by modifying the MLP weights
of a range of critical layers. We perform causal tracing to find a set of mediating MLP layers that recall memories about a given subject.
For GPT-J these layers are {3, 4, 5, 6, 7, 8}.
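To build intuition for the rank-one idea, here is a toy sketch (not the actual ROME derivation, which also uses covariance statistics over many keys): the minimum-Frobenius-norm rank-one correction that makes a weight matrix W map a chosen key vector k exactly to a desired value vector v.

```python
import numpy as np

def rank_one_edit(W, k, v):
    """Return W plus a rank-one update so that (W + delta) @ k == v.

    For this single key/value constraint, the minimum-Frobenius-norm
    correction is delta = outer(v - W @ k, k) / (k . k).
    """
    residual = v - W @ k
    delta = np.outer(residual, k) / np.dot(k, k)
    return W + delta

# Toy usage: force a random "MLP weight" to map key k to value v.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
k = rng.standard_normal(3)
v = rng.standard_normal(4)
W_edited = rank_one_edit(W, k, v)
assert np.allclose(W_edited @ k, v)
```

Because the update is rank one, it changes the matrix's behavior only along the direction of k, leaving inputs orthogonal to k untouched.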


Then, for a set of new memories, we calculate the update Δ and spread this Δ across all of the mediating MLP layers such that
the output of the final mediating layer captures all the new memories.
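The spreading idea can be sketched as follows. This is a deliberately simplified toy (hypothetical function and variable names; the real method solves a least-squares objective over covariance statistics of many keys and accounts for per-layer hidden states): each mediating layer absorbs an equal share of the residual that remains to be written.

```python
import numpy as np

def spread_update(layer_weights, k, target_residual, layers):
    """Toy sketch: distribute a desired output change across several
    layers' weights, each absorbing an equal share of what remains.

    layer_weights: dict mapping layer index -> weight matrix (d x d)
    k: key vector (d,), assumed (unrealistically) identical at every layer
    target_residual: desired total change (d,) at the final mediating layer
    layers: ordered list of mediating layer indices
    """
    remaining = target_residual.astype(float).copy()
    for i, layer in enumerate(layers):
        share = remaining / (len(layers) - i)  # equal split of what's left
        W = layer_weights[layer]
        # Rank-one edit so this layer's output shifts by `share` on key k.
        layer_weights[layer] = W + np.outer(share, k) / np.dot(k, k)
        remaining -= share
    return layer_weights

# Toy usage: the per-layer output changes sum to the full target residual.
rng = np.random.default_rng(1)
d, layers = 4, [3, 4, 5]
orig = {l: rng.standard_normal((d, d)) for l in layers}
weights = {l: orig[l].copy() for l in layers}
k = rng.standard_normal(d)
delta = rng.standard_normal(d)
spread_update(weights, k, delta, layers)
total = sum((weights[l] - orig[l]) @ k for l in layers)
assert np.allclose(total, delta)
```

Spreading the edit over several layers, rather than concentrating it in one, is what lets MEMIT absorb thousands of memories without any single weight matrix being perturbed too heavily.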

In our paper, we derive and explain the method in detail. We conduct benchmarks testing the ability of MEMIT to scale on a variety of batch knowledge-editing tasks, and we compare our method to other approaches. Our code is open-source and available on GitHub.

How to cite

This work is not yet peer-reviewed. The preprint can be cited as follows.

Kevin Meng, Arnab Sen Sharma, Alex Andonian, Yonatan Belinkov, and David Bau. “Mass-Editing Memory in a Transformer.” arXiv preprint arXiv:2210.07229 (2022).

@article{meng2022mass,
  title={Mass-Editing Memory in a Transformer},
  author={Meng, Kevin and Sen Sharma, Arnab and Andonian, Alex and Belinkov, Yonatan and Bau, David},
  journal={arXiv preprint arXiv:2210.07229},
  year={2022}
}
