Bert-whitening: vector dimensionality reduction and usage
2022-06-24 13:04:00 [loong_XL]
References:
https://kexue.fm/archives/8069
https://kexue.fm/archives/9079
https://zhuanlan.zhihu.com/p/531476789
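Input: vv is a 3-D array built from multiple vectors (vv[0] is a 2-D matrix of shape [num_samples, embedding_size]).
Output: v_data1, reduced to 256 dimensions.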
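A short derivation of why `y = (x + bias).dot(kernel)` whitens the vectors, following the approach in the referenced articles. This is a sketch that assumes the covariance matrix is full rank and treats each sample x_i as a row vector, matching `vecs.shape = [num_samples, embedding_size]`:

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \Sigma = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \mu)^\top (x_i - \mu)$$

$$\Sigma = U \Lambda U^\top \quad (\text{for a symmetric positive semi-definite } \Sigma \text{, the SVD coincides with the eigendecomposition})$$

$$W = U \Lambda^{-1/2}, \qquad y = (x - \mu) W = (x + \text{bias})\,\text{kernel}, \quad \text{bias} = -\mu,\ \text{kernel} = W$$

$$\operatorname{Cov}(y) = W^\top \Sigma W = \Lambda^{-1/2} U^\top U \Lambda U^\top U \Lambda^{-1/2} = I$$

Keeping only the first k columns, W_k = U_{:,1:k} \Lambda_k^{-1/2}, retains the k directions with the largest variance and gives \operatorname{Cov}(y) = I_k; this truncation is the dimensionality-reduction step (`W[:, :n_components]` in the code).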
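```python
import numpy as np

def compute_kernel_bias(vecs, n_components=256):
    """Compute the whitening kernel and bias.
    vecs.shape = [num_samples, embedding_size]
    Final transform: y = (x + bias).dot(kernel)
    """
    mu = vecs.mean(axis=0, keepdims=True)   # per-dimension mean, shape (1, embedding_size)
    cov = np.cov(vecs.T)                    # covariance, shape (embedding_size, embedding_size)
    u, s, vh = np.linalg.svd(cov)           # cov = u @ diag(s) @ vh, s sorted in descending order
    W = np.dot(u, np.diag(1 / np.sqrt(s)))  # whitening matrix U * S^(-1/2)
    # Keep the first n_components columns (largest variances) to reduce the dimension.
    # num_samples should exceed n_components; otherwise some kept s are ~0 and 1/sqrt(s) blows up.
    return W[:, :n_components], -mu

def transform_and_normalize(vecs, kernel=None, bias=None):
    """Apply the transform (if given), then L2-normalize each vector."""
    if not (kernel is None or bias is None):
        vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs**2).sum(axis=1, keepdims=True)**0.5

# vv[0] is a 2-D matrix made of many vectors; passing a 2-D matrix that
# contains only a single vector makes the computation fail
v_data = np.array(vv[0])
kernel, bias = compute_kernel_bias(v_data)
v_data1 = transform_and_normalize(v_data, kernel=kernel, bias=bias)
```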
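A quick numerical sanity check, as a sketch: before the L2 normalization step, the transformed vectors should have zero mean and an identity covariance. The synthetic data (2000 correlated Gaussian samples of dimension 32, reduced to `n_components=8`) is made up purely for illustration.

```python
import numpy as np

def compute_kernel_bias(vecs, n_components=256):
    """Same whitening as above: y = (x + bias).dot(kernel)."""
    mu = vecs.mean(axis=0, keepdims=True)
    cov = np.cov(vecs.T)
    u, s, vh = np.linalg.svd(cov)
    W = np.dot(u, np.diag(1 / np.sqrt(s)))
    return W[:, :n_components], -mu

rng = np.random.default_rng(0)
# Hypothetical correlated data with a non-zero offset; num_samples (2000) is much
# larger than the dimension (32), so the covariance matrix is full rank
vecs = rng.normal(size=(2000, 32)) @ rng.normal(size=(32, 32)) + 3.0

kernel, bias = compute_kernel_bias(vecs, n_components=8)
y = (vecs + bias).dot(kernel)  # whitened but not yet L2-normalized

print(y.shape)  # (2000, 8)
print("zero mean:", np.allclose(y.mean(axis=0), 0))
print("identity covariance:", np.allclose(np.cov(y.T), np.eye(8)))
```

After `transform_and_normalize` the rows are additionally scaled to unit length, so the identity-covariance property only holds for the intermediate `y`.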
For a single vector in online serving, reuse the kernel and bias computed above on the full dataset and simply call transform_and_normalize(v_data, kernel=kernel, bias=bias).
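One way to reuse the offline kernel and bias in an online service is to save them with NumPy's `np.savez` and load them with `np.load`. This is an assumption about deployment, not something the original post prescribes; the file name `whitening_params.npz` and the random stand-in parameters are hypothetical.

```python
import os
import tempfile
import numpy as np

def transform_and_normalize(vecs, kernel=None, bias=None):
    """Apply the whitening transform, then L2-normalize each row."""
    if not (kernel is None or bias is None):
        vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs**2).sum(axis=1, keepdims=True)**0.5

# Offline: kernel and bias would come from compute_kernel_bias on the whole corpus.
# Random stand-ins with the right shapes are used here (hypothetical values).
rng = np.random.default_rng(0)
kernel = rng.normal(size=(768, 256))
bias = rng.normal(size=(1, 768))

# Save once offline (hypothetical file name, written to a temporary directory)
path = os.path.join(tempfile.mkdtemp(), "whitening_params.npz")
np.savez(path, kernel=kernel, bias=bias)

# Online: load the saved parameters and transform one incoming vector
params = np.load(path)
query = rng.normal(size=(1, 768))  # keep the single vector 2-D: shape (1, 768)
out = transform_and_normalize(query, kernel=params["kernel"], bias=params["bias"])
print(out.shape)  # (1, 256)
```

The normalization sums over `axis=1`, so keeping the incoming vector 2-D avoids shape surprises.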
```python
import numpy as np

# Random stand-in for sentence vectors. With only 5 samples (np.random.rand(5, 768))
# the covariance has rank <= 4, so keeping 64 components would divide by near-zero
# singular values; use more samples than kept dimensions.
data = np.random.rand(1000, 768)
print('data.shape =', data.shape)

def compute_kernel_bias(vecs):
    """Compute kernel and bias.
    vecs.shape = [num_samples, embedding_size]
    Final transform: y = (x + bias).dot(kernel)
    """
    mu = vecs.mean(axis=0, keepdims=True)
    cov = np.cov(vecs.T)
    u, s, vh = np.linalg.svd(cov)
    W = np.dot(u, np.diag(1 / np.sqrt(s)))
    return W, -mu

def transform_and_normalize(vecs, kernel=None, bias=None):
    """Apply the transform, then normalize."""
    if not (kernel is None or bias is None):
        vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs**2).sum(axis=1, keepdims=True)**0.5

kernel, bias = compute_kernel_bias(data)
kernel = kernel[:, :64]  # keep the first 64 columns: 768 -> 64 dimensions
print('kernel.shape =', kernel.shape)  # (768, 64)
print('bias.shape =', bias.shape)      # (1, 768)
data = transform_and_normalize(data, kernel, bias)
print('data.shape =', data.shape)      # (1000, 64)
```
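Because `transform_and_normalize` L2-normalizes every row, cosine similarity between whitened vectors reduces to a plain dot product, which is how such vectors are typically used for semantic search. A minimal sketch on random stand-in data (500 vectors of dimension 64 reduced to 16; the sizes are arbitrary):

```python
import numpy as np

def compute_kernel_bias(vecs):
    """Whitening kernel and bias: y = (x + bias).dot(kernel)."""
    mu = vecs.mean(axis=0, keepdims=True)
    cov = np.cov(vecs.T)
    u, s, vh = np.linalg.svd(cov)
    W = np.dot(u, np.diag(1 / np.sqrt(s)))
    return W, -mu

def transform_and_normalize(vecs, kernel=None, bias=None):
    """Apply the transform, then L2-normalize each row."""
    if not (kernel is None or bias is None):
        vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs**2).sum(axis=1, keepdims=True)**0.5

rng = np.random.default_rng(1)
data = rng.random((500, 64))  # hypothetical sentence vectors
kernel, bias = compute_kernel_bias(data)
kernel = kernel[:, :16]
emb = transform_and_normalize(data, kernel, bias)

# Rows are unit vectors, so the dot product equals the cosine similarity
sim = emb @ emb.T
top3 = np.argsort(-sim[0])[1:4]  # skip index 0, the query itself
print("most similar to row 0:", top3)

# Cross-check one entry against the explicit cosine formula
a, b = (data[[0, 1]] + bias).dot(kernel)
cos_ab = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(np.isclose(sim[0, 1], cos_ab))
```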

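Reducing a single vector online:
```python
# Reuse the kernel and bias fitted above; keep the input 2-D: shape (1, 768)
data1 = np.random.rand(1, 768)
data1_1 = transform_and_normalize(data1, kernel, bias)  # shape (1, 64)
```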
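The transform acts on each row independently, so reducing a single vector online gives the same result as reducing it inside a batch, as long as the same kernel and bias are used. A small sketch that checks this with hypothetical random parameters of the right shapes:

```python
import numpy as np

def transform_and_normalize(vecs, kernel=None, bias=None):
    """Apply the whitening transform, then L2-normalize each row."""
    if not (kernel is None or bias is None):
        vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs**2).sum(axis=1, keepdims=True)**0.5

rng = np.random.default_rng(2)
# Stand-ins for a fitted kernel and bias (hypothetical values, correct shapes)
kernel = rng.normal(size=(768, 64))
bias = rng.normal(size=(1, 768))

batch = rng.random((10, 768))
batch_out = transform_and_normalize(batch, kernel, bias)

# Transform row 3 on its own; slice with 3:4 to keep it 2-D instead of batch[3]
single_out = transform_and_normalize(batch[3:4], kernel, bias)
print(single_out.shape)  # (1, 64)
print(np.allclose(single_out[0], batch_out[3]))
```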
