当前位置：网站首页>64 attention mechanism 10 chapters

64 attention mechanism 10 chapters

2022-07-24 04:18:00 【I want to send SCI】

Autonomy tips ： I think ...、 See what you want to see

Involuntary prompt ： Take a look at it 、 See what appears in the environment

The things of the environment are keys and values I want to inquire

#@save
def show_heatmaps(matrices, xlabel, ylabel, titles=None, figsize=(2.5, 2.5),
                  cmap='Reds'):
    """ Display matrix heat map """
    d2l.use_svg_display()
    num_rows, num_cols = matrices.shape[0], matrices.shape[1]
    fig, axes = d2l.plt.subplots(num_rows, num_cols, figsize=figsize,
                                 sharex=True, sharey=True, squeeze=False)
    for i, (row_axes, row_matrices) in enumerate(zip(axes, matrices)):
        for j, (ax, matrix) in enumerate(zip(row_axes, row_matrices)):
            pcm = ax.imshow(matrix.detach().numpy(), cmap=cmap)
            if i == num_rows - 1:
                ax.set_xlabel(xlabel)
            if j == 0:
                ax.set_ylabel(ylabel)
            if titles:
                ax.set_title(titles[j])
    fig.colorbar(pcm, ax=axes, shrink=0.6);

attention_weights = torch.eye(10).reshape((1, 1, 10, 10))
show_heatmaps(attention_weights, xlabel='Keys', ylabel='Queries')

It doesn't say wow ....

n_train = 50  #  The number of training samples 
x_train, _ = torch.sort(torch.rand(n_train) * 5)   #  Sorted training samples 
0-5 Randomly generated between 50 A digital   Then sort 

def f(x):
    return 2 * torch.sin(x) + x**0.8

y_train = f(x_train) + torch.normal(0.0, 0.5, (n_train,))  #  Output of training samples 
 Real function plus noise   The noise is 0 mean value  0.5 variance   gaussian 
x_test = torch.arange(0, 5, 0.1)  #  Test samples  
0-5 Press... Between 0.1 Even things 
y_truth = f(x_test)  #  The real output of the test sample   
n_test = len(x_test)  #  Number of test samples 
n_test

50、

Drawing function

def plot_kernel_reg(y_hat):
    d2l.plot(x_test, [y_truth, y_hat], 'x', 'y', legend=['Truth', 'Pred'],
             xlim=[0, 5], ylim=[-1, 5])
    d2l.plt.plot(x_train, y_train, 'o', alpha=0.5);

# X_repeat The shape of the :(n_test,n_train),
#  Each line contains the same test input （ for example ： Same query ）
X_repeat = x_test.repeat_interleave(n_train).reshape((-1, n_train))
repeat  Well 1234 become 1111 222 333 444  n_train Control the number of repetitions 
# x_train Contains keys .attention_weights The shape of the ：(n_test,n_train),
#  Each row contains the value of each query to be given （y_train） Between the distribution of attention  The weight 
attention_weights = nn.functional.softmax(-(X_repeat - x_train)**2 / 2, dim=1)
# y_hat Each element of is a weighted average of values , The weight is attention weight 
y_hat = torch.matmul(attention_weights, y_train)
 Predict by weight 
plot_kernel_reg(y_hat)

Top corner

The deepest

It should be the nearest So the weight is the largest Then the bottom right corner is the same

weights = torch.ones((2, 10)) * 0.1
values = torch.arange(20.0).reshape((2, 10))
torch.bmm(weights.unsqueeze(1), values.unsqueeze(-1))

tensor([[[ 4.5000]],

        [[14.5000]]])

class NWKernelRegression(nn.Module):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.w = nn.Parameter(torch.rand((1,), requires_grad=True))

    def forward(self, queries, keys, values):
        # queries and attention_weights The shape of is ( Number of queries ,“ key － value ” Number of pairs )
        queries = queries.repeat_interleave(keys.shape[1]).reshape((-1, keys.shape[1]))
        self.attention_weights = nn.functional.softmax(
            -((queries - keys) * self.w)**2 / 2, dim=1)
        # values The shape of is ( Number of queries ,“ key － value ” Number of pairs )
        return torch.bmm(self.attention_weights.unsqueeze(1),
                         values.unsqueeze(-1)).reshape(-1)

# X_tile The shape of the :(n_train,n_train), Each line contains the same training input 
X_tile = x_train.repeat((n_train, 1))
# Y_tile The shape of the :(n_train,n_train), Each line contains the same training output 
Y_tile = y_train.repeat((n_train, 1))
# keys The shape of the :('n_train','n_train'-1)
keys = X_tile[(1 - torch.eye(n_train)).type(torch.bool)].reshape((n_train, -1))
# values The shape of the :('n_train','n_train'-1)
values = Y_tile[(1 - torch.eye(n_train)).type(torch.bool)].reshape((n_train, -1))

net = NWKernelRegression()
loss = nn.MSELoss(reduction='none')
trainer = torch.optim.SGD(net.parameters(), lr=0.5)
animator = d2l.Animator(xlabel='epoch', ylabel='loss', xlim=[1, 5])

for epoch in range(5):
    trainer.zero_grad()
    l = loss(net(x_train, keys, values), y_train)
    l.sum().backward()
    trainer.step()
    print(f'epoch {epoch + 1}, loss {float(l.sum()):.6f}')
    animator.add(epoch + 1, float(l.sum()))

# keys The shape of the :(n_test,n_train), Each line contains the same training input （ for example , The same key ）
keys = x_train.repeat((n_test, 1))
# value The shape of the :(n_test,n_train)
values = y_train.repeat((n_test, 1))
y_hat = net(x_test, keys, values).unsqueeze(1).detach()
plot_kernel_reg(y_hat)

d2l.show_heatmaps(net.attention_weights.unsqueeze(0).unsqueeze(0),
                  xlabel='Sorted training inputs',
                  ylabel='Sorted testing inputs')

Didn't say much ....

Pytorch Attention mechanism _ Wow, Kaka, a negative blog -CSDN Blog Environment use Kaggle Built for free in Notebook The tutorial uses Mr. Li Mu's Hands-on deep learning Website and Video Explanation tips ： When you don't understand the function, you can press View function details . No random clues （ Unconscious attention ）： Want to read ： Random clues （ Pay conscious attention to ） Nonparametric is not learning parameters .1exp(−2u2) that ：f(x)=∑i=1nexp⁡(−12(x−xi)2)∑j=1nexp⁡(−12(x−xj)2)yi=∑i=1nsoftmax⁡(−12(x−xi)2)yi\begin{aligned}f(x) &=\sum_{i=1}^{n} \https://blog.csdn.net/qq_39906884/article/details/125245275?spm=1001.2014.3001.5502

query and keys Do the attention score function Then throw it to softmax Become attention weight For each map Do weighting and output

65 Attention score 【 Hands-on deep learning v2】_ Bili, Bili _bilibili Hands-on deep learning v2 - Introduce the home page of deep learning algorithm and code implementation course from scratch ：https://courses.d2l.ai/zh-v2/ The teaching material ：https://zh-v2.d2l.ai/, Video playback volume 54299、 The amount of bullet screen 410、 Number of likes 920、 The number of coins dropped 771、 Number of collectors 312、 Forwarding number 45, Video author Learn from Li Mu AI, Author's brief introduction , Related video ：66 Using the attention mechanism seq2seq【 Hands-on deep learning v2】, Current situation of AI graduate students , Online passionate explanation transformer&Attention Attention mechanism （ On ）,【Transformer Model 】 Graceful animation is easy to learn , Image metaphor thief easy to remember , Hua Zhibing's real body and AI Sync , Train yourself AI Play king glory ,GAN Intensive reading of the thesis paragraph by paragraph 【 Intensive reading 】,【 So that's it 】 Attention mechanism in deep learning (attention) The true origin of ,PyTorch Deep learning quick start tutorial （ Absolutely easy to understand ！）【 Small mound 】, Hand push transformerhttps://www.bilibili.com/video/BV1Tb4y167rb?p=3&spm_id_from=pageDriver&vd_source=eba877d881f216d635d2dfec9dc10379

10.3. Attention scoring function — Hands-on deep learning 2.0.0-beta0 documentationhttps://zh.d2l.ai/chapter_attention-mechanisms/attention-scoring-functions.html Pytorch Attention score _ Wow, Kaka, a negative blog -CSDN Blog Environment use Kaggle Built for free in Notebook The tutorial uses Mr. Li Mu's Hands-on deep learning Website and Video Explanation tips ： When you don't understand the function, you can press View function details . Focus （Pooling） expression ：f(x)=∑iα(x,xi)yi=∑i=1nsoftmax(−12(x−xi)2)yif(x) = \sum_i{\alpha(x, x_i)y_i} = \sum_{i=1}^{n}softmax(-\frac{1}{2}(x-x_i)^2)y_if(x)=i∑α(x,xi)yi=i=1∑nsoftmahttps://blog.csdn.net/qq_39906884/article/details/125248680?spm=1001.2014.3001.5502

3. Q&A

Q：mask_softmax What do you mean ？
A： Sometimes a sentence is not long enough , Say a sentence 4 Word , Input format requirements 10 Word , So you need to fill in 6 A meaningless word , And then use mask_softmax tell Query There is no need to consider the post 6 Word .

There is no learning

原网站

版权声明
本文为[I want to send SCI]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/204/202207222005590665.html

当前位置：网站首页>64 attention mechanism 10 chapters

64 attention mechanism 10 chapters

3. Q&A

边栏推荐

猜你喜欢

随机推荐