Bert-whitening: vector dimensionality reduction and usage
2022-06-24 13:04:00 【loong_XL】
References: https://kexue.fm/archives/8069
https://kexue.fm/archives/9079
https://zhuanlan.zhihu.com/p/531476789
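Input: vv is a 3-D array made up of multiple vectors, so vv[0] is a 2-D matrix of shape [num_samples, embedding_size].
Result: v_data1, reduced to 256 dimensions.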
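For reference, the transform that the code below implements can be written out as follows. This is a sketch of the standard whitening derivation; the symbols $N$, $\mu$, $\Sigma$, $U$, $\Lambda$ and $k$ are labels introduced here for convenience, with $x_i$ a row vector and $k$ playing the role of n_components.

$$
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\Sigma = \frac{1}{N-1}\sum_{i=1}^{N} (x_i-\mu)^\top (x_i-\mu) = U \Lambda U^\top
$$

$$
W = U \Lambda^{-1/2}, \qquad y_i = (x_i - \mu)\, W_{[:,\,1:k]}
$$

$$
W^\top \Sigma W = \Lambda^{-1/2} U^\top U \Lambda U^\top U \Lambda^{-1/2} = I
$$

In the code, bias is $-\mu$ and kernel is the first $k$ columns of $W$, so y = (x + bias).dot(kernel) is the second line above, and the third line is why the transformed vectors have identity covariance before the final normalization.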
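import numpy as np

def compute_kernel_bias(vecs, n_components=256):
    """Compute the whitening kernel and bias.
    vecs.shape = [num_samples, embedding_size]
    Final transform: y = (x + bias).dot(kernel)
    """
    mu = vecs.mean(axis=0, keepdims=True)  # per-dimension mean, shape [1, embedding_size]
    cov = np.cov(vecs.T)  # covariance, shape [embedding_size, embedding_size]
    u, s, vh = np.linalg.svd(cov)
    W = np.dot(u, np.diag(1 / np.sqrt(s)))  # scale each column of u by 1 / sqrt(singular value)
    return W[:, :n_components], -mu  # keep only the first n_components columns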
def transform_and_normalize(vecs, kernel=None, bias=None):
    """Apply the transform (when kernel and bias are given), then L2-normalize each row.
    """
    if not (kernel is None or bias is None):
        vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs**2).sum(axis=1, keepdims=True)**0.5
# vv[0] is a 2-D matrix of many vectors; a 2-D matrix holding only one vector makes this computation fail
v_data = np.array(vv[0])
kernel, bias = compute_kernel_bias(v_data)
# print(kernel, bias)
v_data1 = transform_and_normalize(v_data, kernel=kernel, bias=bias)
Note: for a single vector at serving time, reuse the kernel and bias computed above over the whole set and call transform_and_normalize(v_data, kernel=kernel, bias=bias) directly.
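One way to carry the fitted kernel and bias over to an online service is to store them on disk and load them there. The following is a minimal sketch under assumptions that are not in the original post: the data is synthetic, the sizes (500 x 32, reduced to 8) are illustrative, and the file name whitening_params.npz is hypothetical.

```python
import os
import tempfile

import numpy as np


def compute_kernel_bias(vecs, n_components):
    """Fit the whitening parameters on a [num_samples, embedding_size] matrix."""
    mu = vecs.mean(axis=0, keepdims=True)
    cov = np.cov(vecs.T)
    u, s, _ = np.linalg.svd(cov)
    W = u.dot(np.diag(1 / np.sqrt(s)))
    return W[:, :n_components], -mu


def transform_and_normalize(vecs, kernel, bias):
    """Apply y = (x + bias).dot(kernel), then scale each row to unit length."""
    vecs = (vecs + bias).dot(kernel)
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)


rng = np.random.default_rng(0)
corpus = rng.normal(size=(500, 32))  # synthetic stand-in for sentence embeddings

# Offline: fit once on the whole corpus and store the parameters.
kernel, bias = compute_kernel_bias(corpus, n_components=8)
params_path = os.path.join(tempfile.mkdtemp(), "whitening_params.npz")  # hypothetical name
np.savez(params_path, kernel=kernel, bias=bias)

# Online: load the stored parameters and transform one incoming vector.
params = np.load(params_path)
single = corpus[0]  # shape (32,), so reshape to (1, 32) before transforming
single_reduced = transform_and_normalize(
    single.reshape(1, -1), params["kernel"], params["bias"]
)

# The single-vector result should agree with the same row of the batch result.
batch_reduced = transform_and_normalize(corpus, kernel, bias)
print(single_reduced.shape, np.allclose(single_reduced, batch_reduced[:1]))
```

The point of the sketch is that nothing is refitted online: the single vector only goes through the stored affine transform and the normalization.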
import numpy as np
data = np.random.rand(5,768)
print('data.shape = ')
print(data.shape,data)
def compute_kernel_bias(vecs):
    """Compute the whitening kernel and bias.
    vecs.shape = [num_samples, embedding_size]
    Final transform: y = (x + bias).dot(kernel)
    """
    mu = vecs.mean(axis=0, keepdims=True)
    cov = np.cov(vecs.T)
    u, s, vh = np.linalg.svd(cov)
    W = np.dot(u, np.diag(1 / np.sqrt(s)))
    return W, -mu
def transform_and_normalize(vecs, kernel=None, bias=None):
    """Apply the transform, then L2-normalize each row.
    """
    if not (kernel is None or bias is None):
        vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs**2).sum(axis=1, keepdims=True)**0.5
kernel,bias = compute_kernel_bias(data)
kernel = kernel[:,:64]
print('kernel.shape = ')
print(kernel.shape)
print('bias.shape = ')
print(bias.shape)
data = transform_and_normalize(data, kernel, bias)
print('data.shape = ')
print(data.shape,data)
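One thing worth checking in the example above: with only 5 samples, the 768 x 768 covariance matrix has rank at most 4, so most singular values in s are numerically zero and 1 / np.sqrt(s) becomes extremely large for those columns. The sketch below is an addition rather than part of the original post; it counts the components that are still usable, and the 1e-10 relative threshold is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((5, 768))  # same shape as the random example above

cov = np.cov(data.T)
s = np.linalg.svd(cov, compute_uv=False)  # singular values in descending order

# Count the singular values that are not numerically zero (threshold is an assumption).
usable = int((s > s[0] * 1e-10).sum())
print('usable components =', usable, 'out of', s.size)

# Scale factors applied to the columns of u; those past `usable` blow up.
with np.errstate(divide='ignore'):
    scale = 1 / np.sqrt(s)
print('largest scale among usable columns =', scale[:usable].max())
print('scale at column 63 =', scale[63])
```

This suggests fitting the kernel on many more samples than the target dimension, or keeping the number of retained columns at or below the usable count, before relying on the reduced vectors.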

Dimensionality reduction for a single vector online:
data1 = np.random.rand(1,768)
data1_1 = transform_and_normalize(data1, kernel, bias)
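Because transform_and_normalize returns unit-length rows, the cosine similarity between two reduced vectors is simply their dot product. Here is a minimal sketch of that use, assuming synthetic data and illustrative sizes (1000 x 64, reduced to 16); the query construction and variable names are made up for the example.

```python
import numpy as np


def compute_kernel_bias(vecs, n_components):
    """Fit the whitening parameters on a [num_samples, embedding_size] matrix."""
    mu = vecs.mean(axis=0, keepdims=True)
    cov = np.cov(vecs.T)
    u, s, _ = np.linalg.svd(cov)
    W = u.dot(np.diag(1 / np.sqrt(s)))
    return W[:, :n_components], -mu


def transform_and_normalize(vecs, kernel, bias):
    """Apply y = (x + bias).dot(kernel), then scale each row to unit length."""
    vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs**2).sum(axis=1, keepdims=True)**0.5


rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 64))  # synthetic stand-in for sentence embeddings

kernel, bias = compute_kernel_bias(corpus, n_components=16)
corpus_reduced = transform_and_normalize(corpus, kernel, bias)  # shape (1000, 16)

# A query built as a slightly perturbed copy of corpus row 42.
query = corpus[42:43] + 0.01 * rng.normal(size=(1, 64))
query_reduced = transform_and_normalize(query, kernel, bias)  # shape (1, 16)

# Rows are unit length, so the dot product is the cosine similarity.
scores = corpus_reduced.dot(query_reduced.T).ravel()
best = int(np.argmax(scores))
print('best match index =', best, 'cosine =', scores[best])
```

The query reuses the kernel and bias fitted on the corpus, which keeps the query and the stored vectors in the same reduced space.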
