[Paper Summary] Memory Networks

2020. 7. 12. 10:49 | Machine Learning/NLP-UGRP

2014

https://arxiv.org/abs/1410.3916

 


Authors: Jason Weston, Sumit Chopra, Antoine Bordes

 


[1. Introduction]

An RNN's memory (encoded by hidden states and weights) is typically too small, and is not compartmentalized enough to accurately remember facts from the past (knowledge is compressed into dense vectors).

For example, RNNs are known to have difficulty with memorization, such as the simple copying task of outputting the same input sequence they have just read.

 

In this work, we introduce a class of models called memory networks that attempt to rectify this problem.

The model is then trained to learn how to operate effectively with the memory component.

 

[2. Memory Networks]

m: an array of objects (an array of vectors or an array of strings) indexed by m_i

Four (potentially learned) components: I, G, O, and R

I: (input feature map) - converts the incoming input to the internal feature representation.

G: (generalization) - updates old memories given the new input. We call this generalization because there is an opportunity for the network to compress and generalize its memories at this stage for some intended future use.

O: (output feature map) - produces a new output (in the feature representation space), given the new input and the current memory state.

R: (response) - converts the output into the desired response format, e.g., a textual response or an action.
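
To make this division of labor concrete, the sketch below writes the four components as Python callables over a list-based memory of feature vectors. The concrete types (text in, NumPy vectors as the internal representation, a plain list for m) are assumptions made for illustration; the paper leaves each component free to be any learned model.

```python
# Illustrative signatures only: the paper does not fix these types, and it
# writes G per slot as G(m_i, I(x), m); here G returns the whole updated
# memory for simplicity.
from typing import Callable, List
import numpy as np

Memory = List[np.ndarray]  # m: an array of objects m_i (here: feature vectors)

IComponent = Callable[[str], np.ndarray]                 # raw input x -> I(x)
GComponent = Callable[[Memory, np.ndarray], Memory]      # (m, I(x))   -> updated m
OComponent = Callable[[np.ndarray, Memory], np.ndarray]  # (I(x), m)   -> output features o
RComponent = Callable[[np.ndarray], str]                 # o           -> response r (e.g., text)
```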

 

Input x: a word, sentence, image, or audio signal.

 

1. Convert x to an internal feature representation I(x).

2. Update memories m_i given the new input: m_i = G(m_i, I(x), m), ∀i.

3. Compute output features o given the new input and the memory: o = O(I(x), m).

4. Finally, decode output features o to give the final response: r = R(o).
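
Below is a minimal end-to-end sketch of this loop, with toy stand-in components: a bag-of-words I, an append-only G, a dot-product O that retrieves the best-matching earlier memory, and an R that echoes it back as words. The tiny vocabulary and example sentences are invented for illustration; none of these stand-ins are the paper's actual models.

```python
import numpy as np

class MemoryNetwork:
    """Sketch of the I -> G -> O -> R loop; components are pluggable callables."""

    def __init__(self, I, G, O, R):
        self.I, self.G, self.O, self.R = I, G, O, R
        self.m = []                            # long-term memory, persists across inputs

    def forward(self, x):
        features = self.I(x)                   # 1. internal feature representation I(x)
        self.m = self.G(self.m, features)      # 2. update memories given the new input
        o = self.O(features, self.m)           # 3. output features from I(x) and memory
        return self.R(o)                       # 4. decode o into the final response r

# Toy stand-in components (invented for illustration):
vocab = {"john": 0, "is": 1, "in": 2, "the": 3, "kitchen": 4, "where": 5}

def I(x):                                      # bag-of-words featurizer
    v = np.zeros(len(vocab))
    for w in x.lower().split():
        if w in vocab:
            v[vocab[w]] += 1
    return v

def G(m, features):                            # append-only memory update
    return m + [features]

def O(features, m):                            # best-matching earlier memory (dot product);
    candidates = m[:-1]                        # the last slot is the current input itself
    if not candidates:
        return np.zeros_like(features)
    scores = [features @ mi for mi in candidates]
    return candidates[int(np.argmax(scores))]

def R(o):                                      # echo the supporting memory as words
    return " ".join(w for w, i in vocab.items() if o[i] > 0)

net = MemoryNetwork(I, G, O, R)
net.forward("John is in the kitchen")          # stores the fact
print(net.forward("Where is John"))            # -> "john is in the kitchen"
```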

 

Memories are also stored at test time, but the model parameters of I, G, O, and R are not updated.

I, G, O, and R can use any existing ideas from the machine learning literature, e.g., your favorite models (SVMs, decision trees, etc.).

 

I component: standard pre-processing.

G component: The simplest form of G is to store I(x) in a “slot” in the memory: m_{H(x)} = I(x),

where H(·) is a function that selects the slot.

That is, G updates only the entry at index H(x) of m; all other parts of the memory remain untouched.
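
A minimal sketch of this simplest G, assuming a fixed array of slots and an H(·) that just picks the next free slot (wrapping around when full); the slot-selection rule and the toy vectors are invented stand-ins, not what the model actually learns.

```python
import numpy as np

class SlotMemory:
    """Sketch of the simplest G: m_{H(x)} = I(x), all other slots untouched."""

    def __init__(self, num_slots, dim):
        self.slots = np.zeros((num_slots, dim))  # m: fixed array of memory slots
        self.next_free = 0

    def H(self, features):
        """Choose the slot to write to (here: the next unused one, wrapping when full)."""
        slot = self.next_free
        self.next_free = (self.next_free + 1) % len(self.slots)
        return slot

    def G(self, features):
        """Write I(x) into slot H(x); every other slot remains untouched."""
        self.slots[self.H(features)] = features

# Usage: store two "facts" of dimension 4 (numbers invented for illustration).
mem = SlotMemory(num_slots=8, dim=4)
mem.G(np.array([1.0, 0.0, 0.0, 1.0]))
mem.G(np.array([0.0, 1.0, 1.0, 0.0]))
print(mem.slots[:2])
```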