Bert-whitening: vector dimensionality reduction and usage
2022-06-24 13:04:00 【loong_XL】
References: https://kexue.fm/archives/8069
https://kexue.fm/archives/9079
https://zhuanlan.zhihu.com/p/531476789
Input: vv, a 3-D array made up of multiple vectors
Result: v_data1, with 256 dimensions
import numpy as np

def compute_kernel_bias(vecs, n_components=256):
    """Compute the kernel and bias.
    vecs.shape = [num_samples, embedding_size];
    final transform: y = (x + bias).dot(kernel)
    """
    mu = vecs.mean(axis=0, keepdims=True)
    cov = np.cov(vecs.T)
    # print(cov)
    u, s, vh = np.linalg.svd(cov)
    print(np.diag(1 / np.sqrt(s)))  # debug: inspect the scaling matrix
    W = np.dot(u, np.diag(1 / np.sqrt(s)))
    return W[:, :n_components], -mu

def transform_and_normalize(vecs, kernel=None, bias=None):
    """Apply the transform (if given), then L2-normalize each vector."""
    if not (kernel is None or bias is None):
        vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs**2).sum(axis=1, keepdims=True)**0.5

# vv[0] is a 2-D matrix made up of multiple vectors;
# passing a 2-D matrix that holds only one vector raises an error here.
v_data = np.array(vv[0])
kernel, bias = compute_kernel_bias(v_data)
# print(kernel, bias)
v_data1 = transform_and_normalize(v_data, kernel=kernel, bias=bias)
*** For a single vector at serving time, reuse the kernel and bias computed above over the whole set and call transform_and_normalize(v_data, kernel=kernel, bias=bias) directly.
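A minimal sketch of that offline/online split, assuming the parameters are persisted with np.savez between the two stages. The file name whitening_params.npz, the synthetic corpus, and the 64 -> 16 dimensions are illustrative assumptions and do not come from the original post.

```python
import os
import tempfile

import numpy as np


def compute_kernel_bias(vecs, n_components=256):
    """Whitening kernel and bias, computed as in the post."""
    mu = vecs.mean(axis=0, keepdims=True)
    cov = np.cov(vecs.T)
    u, s, vh = np.linalg.svd(cov)
    W = np.dot(u, np.diag(1 / np.sqrt(s)))
    return W[:, :n_components], -mu


def transform_and_normalize(vecs, kernel=None, bias=None):
    """Apply the transform (if given), then L2-normalize each row."""
    if not (kernel is None or bias is None):
        vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs**2).sum(axis=1, keepdims=True)**0.5


# Offline: fit on a corpus (assumption: 2000 synthetic 64-d vectors).
rng = np.random.default_rng(0)
corpus = rng.normal(size=(2000, 64))
kernel, bias = compute_kernel_bias(corpus, n_components=16)

# Persist the fitted parameters (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "whitening_params.npz")
np.savez(path, kernel=kernel, bias=bias)

# Online: load once, then transform each incoming vector of shape (1, dim).
params = np.load(path)
query = rng.normal(size=(1, 64))
reduced = transform_and_normalize(query, params["kernel"], params["bias"])
print(reduced.shape)  # (1, 16)
```

The single vector is kept as a 2-D array of shape (1, dim) so that the same function handles both the batch and the single-vector case.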
import numpy as np

# Random demo data: 5 vectors of dimension 768.
# Note: with only 5 samples the covariance has rank <= 4, so this run only
# demonstrates shapes; in practice fit on far more samples than output dimensions.
data = np.random.rand(5, 768)
print('data.shape = ')
print(data.shape, data)

def compute_kernel_bias(vecs):
    """Compute the kernel and bias.
    vecs.shape = [num_samples, embedding_size];
    final transform: y = (x + bias).dot(kernel)
    """
    mu = vecs.mean(axis=0, keepdims=True)
    cov = np.cov(vecs.T)
    u, s, vh = np.linalg.svd(cov)
    W = np.dot(u, np.diag(1 / np.sqrt(s)))
    return W, -mu

def transform_and_normalize(vecs, kernel=None, bias=None):
    """Apply the transform, then normalize."""
    if not (kernel is None or bias is None):
        vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs**2).sum(axis=1, keepdims=True)**0.5

kernel, bias = compute_kernel_bias(data)
kernel = kernel[:, :64]  # keep the first 64 components
print('kernel.shape = ')
print(kernel.shape)
print('bias.shape = ')
print(bias.shape)
data = transform_and_normalize(data, kernel, bias)
print('data.shape = ')
print(data.shape, data)
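As a sanity check on what the kernel does: writing the covariance as U diag(s) U^T, the kernel U diag(1/sqrt(s)) should map centered data to identity covariance, and this still holds for the first k columns. The sketch below checks that numerically before the normalization step. The synthetic data (2000 correlated 32-d samples) and k = 8 are assumptions chosen so that the covariance is full rank; they are not from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, correlated, non-centered data (assumption for illustration).
scales = np.linspace(0.5, 5.0, 32)
mixing = np.eye(32) + 0.05 * rng.normal(size=(32, 32))
data = (rng.normal(size=(2000, 32)) * scales) @ mixing + 3.0

# Same steps as compute_kernel_bias, then keep the first 8 components.
mu = data.mean(axis=0, keepdims=True)
cov = np.cov(data.T)
u, s, vh = np.linalg.svd(cov)
kernel = np.dot(u, np.diag(1 / np.sqrt(s)))[:, :8]
bias = -mu

# Whitened data before L2 normalization.
whitened = (data + bias).dot(kernel)
cov_whitened = np.cov(whitened.T)

# Largest deviation of the whitened covariance from the identity matrix.
max_dev = np.abs(cov_whitened - np.eye(8)).max()
print(max_dev)
```

The printed deviation should be at floating-point noise level when there are many more samples than dimensions. With fewer samples than dimensions, some singular values are close to zero and 1 / np.sqrt(s) becomes numerically unstable.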

Reducing a single vector at serving time
data1 = np.random.rand(1, 768)
data1_1 = transform_and_normalize(data1, kernel, bias)
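Because transform_and_normalize returns unit-length rows, the dot product of two reduced vectors is their cosine similarity. A small sketch of that usage follows; the synthetic corpus, the 32 -> 8 reduction, and the three random query vectors are illustrative assumptions.

```python
import numpy as np


def transform_and_normalize(vecs, kernel=None, bias=None):
    """Apply the transform (if given), then L2-normalize each row."""
    if not (kernel is None or bias is None):
        vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs**2).sum(axis=1, keepdims=True)**0.5


rng = np.random.default_rng(0)

# Fit whitening parameters on a synthetic corpus (assumption).
corpus = rng.normal(size=(1000, 32))
mu = corpus.mean(axis=0, keepdims=True)
u, s, vh = np.linalg.svd(np.cov(corpus.T))
kernel = np.dot(u, np.diag(1 / np.sqrt(s)))[:, :8]
bias = -mu

# Reduce three query vectors and compare them pairwise.
queries = transform_and_normalize(rng.normal(size=(3, 32)), kernel, bias)
sims = queries @ queries.T  # cosine similarities, since rows have unit norm
print(sims.shape)  # (3, 3)
```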
