64 Attention Mechanisms (Chapter 10)
2022-07-24 04:18:00 【I want to send SCI】




Volitional cue: "I want to ..."; you look at what you intend to look at.
Nonvolitional cue: you just glance around and notice whatever stands out in the environment.
The things in the environment are the keys and values; what I want to look up is the query.

import torch
from torch import nn
from d2l import torch as d2l

#@save
def show_heatmaps(matrices, xlabel, ylabel, titles=None, figsize=(2.5, 2.5),
                  cmap='Reds'):
    """Display matrix heat maps."""
    d2l.use_svg_display()
    # Shape of matrices: (rows to display, columns to display, number of queries, number of keys)
    num_rows, num_cols = matrices.shape[0], matrices.shape[1]
    fig, axes = d2l.plt.subplots(num_rows, num_cols, figsize=figsize,
                                 sharex=True, sharey=True, squeeze=False)
    for i, (row_axes, row_matrices) in enumerate(zip(axes, matrices)):
        for j, (ax, matrix) in enumerate(zip(row_axes, row_matrices)):
            pcm = ax.imshow(matrix.detach().numpy(), cmap=cmap)
            if i == num_rows - 1:
                ax.set_xlabel(xlabel)
            if j == 0:
                ax.set_ylabel(ylabel)
            if titles:
                ax.set_title(titles[j])
    fig.colorbar(pcm, ax=axes, shrink=0.6);
attention_weights = torch.eye(10).reshape((1, 1, 10, 10))
show_heatmaps(attention_weights, xlabel='Keys', ylabel='Queries')
Not much was said about this part.

n_train = 50  # Number of training samples
x_train, _ = torch.sort(torch.rand(n_train) * 5)  # Sorted training inputs: 50 random numbers in [0, 5), then sorted

def f(x):
    return 2 * torch.sin(x) + x**0.8

# Training outputs: the true function plus Gaussian noise (mean 0, standard deviation 0.5)
y_train = f(x_train) + torch.normal(0.0, 0.5, (n_train,))
x_test = torch.arange(0, 5, 0.1)  # Test inputs: evenly spaced in [0, 5) with step 0.1
y_truth = f(x_test)  # True outputs for the test inputs
n_test = len(x_test)  # Number of test samples
n_test
50
Plotting function:
def plot_kernel_reg(y_hat):
    d2l.plot(x_test, [y_truth, y_hat], 'x', 'y', legend=['Truth', 'Pred'],
             xlim=[0, 5], ylim=[-1, 5])
    d2l.plt.plot(x_train, y_train, 'o', alpha=0.5);

# Shape of X_repeat: (n_test, n_train);
# each row contains the same test input (i.e., the same query)
X_repeat = x_test.repeat_interleave(n_train).reshape((-1, n_train))
# repeat_interleave repeats each element in place (1 2 3 4 becomes 1 1 ... 2 2 ... 3 3 ... 4 4 ...);
# n_train controls the number of repetitions
# x_train contains the keys. Shape of attention_weights: (n_test, n_train);
# each row contains the attention weights that one query assigns to the values (y_train)
attention_weights = nn.functional.softmax(-(X_repeat - x_train)**2 / 2, dim=1)
# Each element of y_hat is a weighted average of the values, weighted by the attention weights
y_hat = torch.matmul(attention_weights, y_train)  # Predict using the weights
plot_kernel_reg(y_hat)
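The nonparametric prediction above can also be written with broadcasting instead of `repeat_interleave`. The following is a minimal sketch under stated assumptions: the helper name `nw_predict` and the noise-free synthetic data are illustrative and not part of the original notes.

```python
import torch

def nw_predict(queries, keys, values):
    """Gaussian-kernel attention pooling (hypothetical helper for illustration)."""
    # (n_queries, 1) - (n_keys,) broadcasts to (n_queries, n_keys)
    scores = -(queries.unsqueeze(1) - keys) ** 2 / 2
    # Each row is a probability distribution over the keys
    weights = torch.softmax(scores, dim=1)
    # Weighted average of the values, one prediction per query
    return weights @ values, weights

torch.manual_seed(0)
keys, _ = torch.sort(torch.rand(50) * 5)        # sorted training inputs
values = 2 * torch.sin(keys) + keys ** 0.8      # noise-free targets, for simplicity
queries = torch.arange(0, 5, 0.1)               # test inputs
y_hat, weights = nw_predict(queries, keys, values)
print(y_hat.shape, weights.shape)
```

Broadcasting avoids materializing the repeated query matrix by hand, but the computed weights are the same as in the `repeat_interleave` version.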

In the heat map, the top-left corner is the darkest: that query should be the one nearest to that key, so its weight is the largest. The same holds for the bottom-right corner.
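The observation that the nearest key receives the largest weight can be checked numerically. This is a small sketch with hand-picked keys and queries (assumed values, chosen so that query i is closest to key i), not the data used in the notes.

```python
import torch

keys = torch.arange(0.0, 5.0, 0.5)    # 10 sorted keys: 0.0, 0.5, ..., 4.5
queries = keys + 0.1                  # query i lies closest to key i
# Gaussian-kernel attention weights, shape (n_queries, n_keys)
weights = torch.softmax(-(queries.unsqueeze(1) - keys) ** 2 / 2, dim=1)

nearest = (queries.unsqueeze(1) - keys).abs().argmin(dim=1)  # index of the closest key
largest = weights.argmax(dim=1)                              # index of the largest weight
print(torch.equal(nearest, largest))
```

Because both inputs are sorted, the largest weights fall along the diagonal, which is why the corners of the heat map are dark.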
weights = torch.ones((2, 10)) * 0.1
values = torch.arange(20.0).reshape((2, 10))
torch.bmm(weights.unsqueeze(1), values.unsqueeze(-1))
tensor([[[ 4.5000]],
        [[14.5000]]])
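To see what the batch matrix multiplication computes here, the sketch below compares it with an explicit row-wise dot product. The variable names `batched` and `rowwise` are introduced only for this illustration.

```python
import torch

weights = torch.ones((2, 10)) * 0.1
values = torch.arange(20.0).reshape((2, 10))

# (2, 1, 10) @ (2, 10, 1) -> (2, 1, 1): one dot product per batch element
batched = torch.bmm(weights.unsqueeze(1), values.unsqueeze(-1))
# The same dot products written as an elementwise product followed by a sum
rowwise = (weights * values).sum(dim=1)

print(batched.shape, rowwise)
```

Each result is the weighted average of one row of `values`: 0.1 times the sum of 0 to 9 is 4.5, and 0.1 times the sum of 10 to 19 is 14.5.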

class NWKernelRegression(nn.Module):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.w = nn.Parameter(torch.rand((1,), requires_grad=True))

    def forward(self, queries, keys, values):
        # Shape of queries and attention_weights: (number of queries, number of key-value pairs)
        queries = queries.repeat_interleave(keys.shape[1]).reshape((-1, keys.shape[1]))
        self.attention_weights = nn.functional.softmax(
            -((queries - keys) * self.w)**2 / 2, dim=1)
        # Shape of values: (number of queries, number of key-value pairs)
        return torch.bmm(self.attention_weights.unsqueeze(1),
                         values.unsqueeze(-1)).reshape(-1)
# Shape of X_tile: (n_train, n_train); each row contains the same training inputs
X_tile = x_train.repeat((n_train, 1))
# Shape of Y_tile: (n_train, n_train); each row contains the same training outputs
Y_tile = y_train.repeat((n_train, 1))
# Shape of keys: (n_train, n_train - 1)
keys = X_tile[(1 - torch.eye(n_train)).type(torch.bool)].reshape((n_train, -1))
# Shape of values: (n_train, n_train - 1)
values = Y_tile[(1 - torch.eye(n_train)).type(torch.bool)].reshape((n_train, -1))

net = NWKernelRegression()
loss = nn.MSELoss(reduction='none')
trainer = torch.optim.SGD(net.parameters(), lr=0.5)
animator = d2l.Animator(xlabel='epoch', ylabel='loss', xlim=[1, 5])

for epoch in range(5):
    trainer.zero_grad()
    l = loss(net(x_train, keys, values), y_train)
    l.sum().backward()
    trainer.step()
    print(f'epoch {epoch + 1}, loss {float(l.sum()):.6f}')
    animator.add(epoch + 1, float(l.sum()))
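The learnable scalar `w` in `NWKernelRegression` rescales the distance between query and key before the softmax. As a rough illustration (the helper `gaussian_attention` and the chosen numbers are assumptions, not taken from the notes), a larger `w` concentrates the weight on the nearest key:

```python
import torch

def gaussian_attention(query, keys, w):
    """Attention weights of one scalar query over 1-D keys (hypothetical helper)."""
    return torch.softmax(-((query - keys) * w) ** 2 / 2, dim=0)

keys = torch.arange(0.0, 5.0, 0.5)   # 0.0, 0.5, ..., 4.5
query = torch.tensor(2.1)            # nearest key is 2.0 (index 4)

broad = gaussian_attention(query, keys, w=1.0)   # wide kernel, weight is spread out
sharp = gaussian_attention(query, keys, w=5.0)   # narrow kernel, weight is concentrated
print(broad.max().item(), sharp.max().item())
```

Both weight vectors still sum to one; only how the mass is distributed across the keys changes.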

# Shape of keys: (n_test, n_train); each row contains the same training inputs (i.e., the same keys)
keys = x_train.repeat((n_test, 1))
# Shape of values: (n_test, n_train)
values = y_train.repeat((n_test, 1))
y_hat = net(x_test, keys, values).unsqueeze(1).detach()
plot_kernel_reg(y_hat)

d2l.show_heatmaps(net.attention_weights.unsqueeze(0).unsqueeze(0),
                  xlabel='Sorted training inputs',
                  ylabel='Sorted testing inputs')
Not much was said about this part.

The query and the keys go through an attention scoring function; the scores are passed to softmax to become attention weights, and the output is the weighted sum of the values under those weights.
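The pipeline of scoring, softmax, and weighted sum can be sketched as follows. The function names `attention_pool` and `scaled_dot_score` are hypothetical, and scaled dot product is used only as one possible scoring function; the notes do not fix a particular choice here.

```python
import math
import torch

def scaled_dot_score(queries, keys):
    """One possible scoring function: (n_q, d) and (n_k, d) -> (n_q, n_k)."""
    return queries @ keys.T / math.sqrt(queries.shape[-1])

def attention_pool(queries, keys, values, score_fn):
    scores = score_fn(queries, keys)           # step 1: attention scores
    weights = torch.softmax(scores, dim=-1)    # step 2: attention weights, rows sum to 1
    return weights @ values, weights           # step 3: weighted sum of the values

torch.manual_seed(0)
queries = torch.randn(3, 4)   # 3 queries of dimension 4
keys = torch.randn(5, 4)      # 5 keys of dimension 4
values = torch.randn(5, 2)    # 5 values of dimension 2
output, weights = attention_pool(queries, keys, values, scaled_dot_score)
print(output.shape, weights.shape)
```

Swapping `score_fn` changes how queries and keys are compared without touching the softmax or the weighted sum.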

10.3. Attention scoring function — Hands-on deep learning 2.0.0-beta0 documentation
https://zh.d2l.ai/chapter_attention-mechanisms/attention-scoring-functions.html
Pytorch Attention score _ Wow, Kaka, a negative blog -CSDN Blog
Environment: a Notebook built for free on Kaggle. The tutorial follows Mr. Li Mu's Hands-on deep learning website and video explanations. Tip: when you don't understand a function, you can view the function details. Attention pooling expression: $f(x) = \sum_i \alpha(x, x_i) y_i = \sum_{i=1}^{n} \mathrm{softmax}\left(-\frac{1}{2}(x - x_i)^2\right) y_i$
https://blog.csdn.net/qq_39906884/article/details/125248680?spm=1001.2014.3001.5502
3. Q&A
Q: What does masked_softmax mean?
A: Sometimes a sentence is not long enough. Say a sentence has 4 words but the input format requires 10: you pad it with 6 meaningless tokens, and then use masked_softmax to tell the query that it does not need to consider the last 6 tokens.
There is no learning
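The masking idea from the Q&A answer can be sketched in a few lines. This is a simplified illustration under stated assumptions: the function name `masked_softmax_sketch` is hypothetical, it handles only 2-D scores with one valid length per row, and it is not the d2l implementation.

```python
import torch

def masked_softmax_sketch(scores, valid_lens):
    """scores: (batch, num_keys); valid_lens: (batch,) number of real tokens per row."""
    positions = torch.arange(scores.shape[1])
    # True where the key position holds a real token, False where it is padding
    mask = positions.unsqueeze(0) < valid_lens.unsqueeze(1)
    # A very negative score makes the softmax weight of padded positions vanish
    masked = scores.masked_fill(~mask, -1e6)
    return torch.softmax(masked, dim=1)

# A 4-word sentence padded to length 10, with equal scores everywhere
scores = torch.zeros(1, 10)
weights = masked_softmax_sketch(scores, torch.tensor([4]))
print(weights)
```

The first four positions share the attention weight equally, while the six padded positions receive essentially none.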
https://www.bilibili.com/video/BV1Tb4y167rb?p=3&spm_id_from=pageDriver&vd_source=eba877d881f216d635d2dfec9dc10379
https://zh.d2l.ai/chapter_attention-mechanisms/attention-scoring-functions.html