2020. 7. 12. 10:49 · Machine Learning/NLP-UGRP
2014
https://arxiv.org/abs/1410.3916
Authors: Jason Weston, Sumit Chopra, Antoine Bordes
[1. Introduction]
An RNN's memory (encoded in its hidden states and weights) is typically too small, and is not compartmentalized enough to accurately remember facts from the past (knowledge is compressed into dense vectors).
For example, RNNs are known to have difficulty with memorization, such as the simple copying task of outputting the same input sequence they have just read.
In this work, we introduce a class of models called memory networks that attempt to rectify this problem.
The model is then trained to learn how to operate effectively with the memory component.
[2. Memory Networks]
m: an array of objects (an array of vectors or an array of strings) indexed by m_i
four (potentially learned) components: I, G, O, R
I: (input feature map) - converts the incoming input to the internal feature representation.
G: (generalization) - updates old memories given the new input. We call this generalization because there is an opportunity for the network to compress and generalize its memories at this stage for some intended future use.
O: (output feature map) - produces a new output (in the feature representation space), given the new input and the current memory state.
R: (response) - converts the output into the desired response format, e.g., a textual response or an action.
input x: a word, sentence, image, or audio signal
1. Convert x to an internal feature representation I(x)
2. Update memories m_i given the new input: m_i = G(m_i, I(x), m), ∀i.
3. Compute output features o given the new input and the memory: o = O(I(x),m).
4. Finally, decode output features o to give the final response: r = R(o).
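The four steps above can be sketched as a toy Python program. This is a minimal illustration only, assuming a bag-of-words feature map, an append-only G, and dot-product matching in O; the paper leaves each component's concrete implementation open.

```python
import numpy as np

# Toy vocabulary for the bag-of-words feature map (illustrative assumption).
VOCAB = ["hello", "world", "memory", "network"]

def I(x):
    """Input feature map: convert a text input to an internal vector."""
    v = np.zeros(len(VOCAB))
    for w in x.split():
        if w in VOCAB:
            v[VOCAB.index(w)] += 1.0
    return v

def G(memory, feat):
    """Generalization: here, simply append the new features to memory."""
    memory.append(feat)
    return memory

def O(feat, memory):
    """Output feature map: return the stored memory that best matches
    the input features, scored by dot product."""
    scores = [float(feat @ m) for m in memory]
    return memory[int(np.argmax(scores))]

def R(o):
    """Response: decode output features back into words."""
    return " ".join(w for w, c in zip(VOCAB, o) if c > 0)

memory = []
memory = G(memory, I("memory network"))
memory = G(memory, I("hello world"))
print(R(O(I("hello"), memory)))  # the "hello world" memory matches best
```

In a real memory network these components are learned jointly; here they are fixed functions just to make the I → G → O → R data flow concrete.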
Memories are also stored at test time, but the model parameters of I, G, O, and R are not updated.
I, G, O, R can use any existing ideas from the machine learning literature, e.g., make use of your favorite models (SVMs, decision trees, etc.).
I component: standard pre-processing.
G component: The simplest form of G is to store I(x) in a "slot" in the memory: m_H(x) = I(x),
where H(.) is a function that selects the slot.
That is, G updates only the memory at index H(x),
and all other parts of the memory remain untouched.