Bert-whitening: Vector Dimensionality Reduction and Usage
2022-06-24 13:04:00 【loong_XL】
References: https://kexue.fm/archives/8069
https://kexue.fm/archives/9079
https://zhuanlan.zhihu.com/p/531476789
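Input: vv is a 3-D array of vectors; vv[0] is the 2-D matrix [num_samples, embedding_size] used for fitting.
Output: v_data1, the whitened vectors reduced to 256 dimensions.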
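For reference, the functions below compute the whitening transform derived in the first linked article. The following is a condensed restatement (not the article's exact notation), written with row vectors $x_i$ as in the code and $N$ samples. Note that np.cov divides by $N-1$; that only rescales $\Lambda$ and therefore all outputs uniformly, so it has no effect after the final L2 normalization.

$$
\begin{aligned}
\mu &= \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\Sigma = \frac{1}{N-1}\sum_{i=1}^{N} (x_i-\mu)^\top (x_i-\mu) = U \Lambda U^\top \\
W &= U \Lambda^{-1/2}, \qquad \tilde{x} = (x-\mu)\,W = (x+\text{bias})\,\text{kernel}, \quad \text{bias} = -\mu,\ \text{kernel} = W \\
\operatorname{Cov}(\tilde{x}) &= W^\top \Sigma W = \Lambda^{-1/2} U^\top U \Lambda U^\top U \Lambda^{-1/2} = I
\end{aligned}
$$

Because np.linalg.svd returns the singular values in descending order, keeping only the first k columns of $W$ (W[:, :n_components] in the code) keeps the k directions with the largest variance; this is the dimensionality-reduction step.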
```python
import numpy as np

def compute_kernel_bias(vecs, n_components=256):
    """Compute the kernel and bias.
    vecs.shape = [num_samples, embedding_size]
    Final transform: y = (x + bias).dot(kernel)
    """
    mu = vecs.mean(axis=0, keepdims=True)
    cov = np.cov(vecs.T)
    # print(cov)
    u, s, vh = np.linalg.svd(cov)  # s is sorted in descending order
    # print(np.diag(1 / np.sqrt(s)))
    W = np.dot(u, np.diag(1 / np.sqrt(s)))
    # Keep only the first n_components columns -> n_components-dim output
    return W[:, :n_components], -mu

def transform_and_normalize(vecs, kernel=None, bias=None):
    """Apply the transform, then L2-normalize each vector to unit length."""
    if not (kernel is None or bias is None):
        vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs**2).sum(axis=1, keepdims=True)**0.5

# vv[0] is a 2-D matrix of many vectors, shape [num_samples, embedding_size].
# Fitting on a 2-D matrix that contains only one vector fails.
v_data = np.array(vv[0])
kernel, bias = compute_kernel_bias(v_data)
# print(kernel, bias)
v_data1 = transform_and_normalize(v_data, kernel=kernel, bias=bias)
```
For a single vector at serving time, do not refit: reuse the kernel and bias computed above on the whole batch and just call transform_and_normalize(v_data, kernel=kernel, bias=bias).
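A minimal sketch of that offline/online split, assuming the parameters are persisted with np.savez between the two steps. The file name whitening_params.npz, the sizes (1000 vectors of 128 dimensions, 32 output dimensions) and the random stand-in data are all hypothetical; the two functions are the same as in the post.

```python
import os
import tempfile

import numpy as np


def compute_kernel_bias(vecs, n_components=256):
    """Same whitening fit as in the post: y = (x + bias).dot(kernel)."""
    mu = vecs.mean(axis=0, keepdims=True)
    cov = np.cov(vecs.T)
    u, s, _ = np.linalg.svd(cov)
    W = np.dot(u, np.diag(1 / np.sqrt(s)))
    return W[:, :n_components], -mu


def transform_and_normalize(vecs, kernel=None, bias=None):
    """Apply the whitening transform, then L2-normalize each row."""
    if not (kernel is None or bias is None):
        vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs**2).sum(axis=1, keepdims=True)**0.5


# Offline: fit once on a large batch of embeddings (random stand-in data here).
rng = np.random.default_rng(0)
corpus = rng.random((1000, 128))  # hypothetical corpus: 1000 vectors, 128 dims
kernel, bias = compute_kernel_bias(corpus, n_components=32)

# Persist the parameters; the file name is an arbitrary example.
path = os.path.join(tempfile.mkdtemp(), "whitening_params.npz")
np.savez(path, kernel=kernel, bias=bias)

# Online: load the saved kernel/bias and transform a single vector.
params = np.load(path)
query = rng.random((1, 128))  # one vector, kept 2-D as shape (1, dim)
query_vec = transform_and_normalize(query, kernel=params["kernel"], bias=params["bias"])
print(query_vec.shape)  # (1, 32)
```

Keeping the query 2-D matters because transform_and_normalize sums over axis=1.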
```python
import numpy as np

# Demo data: 5 random 768-dim vectors. Only the shapes matter here;
# 5 samples are far too few for a full-rank 768x768 covariance.
data = np.random.rand(5,768)
print('data.shape = ')
print(data.shape,data)

def compute_kernel_bias(vecs):
    """Compute the kernel and bias.
    vecs.shape = [num_samples, embedding_size]
    Final transform: y = (x + bias).dot(kernel)
    """
    mu = vecs.mean(axis=0, keepdims=True)
    cov = np.cov(vecs.T)
    u, s, vh = np.linalg.svd(cov)
    W = np.dot(u, np.diag(1 / np.sqrt(s)))
    return W, -mu

def transform_and_normalize(vecs, kernel=None, bias=None):
    """Apply the transform, then L2-normalize each vector."""
    if not (kernel is None or bias is None):
        vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs**2).sum(axis=1, keepdims=True)**0.5

kernel,bias = compute_kernel_bias(data)
kernel = kernel[:,:64]  # keep the first 64 columns -> 64-dim output
print('kernel.shape = ')
print(kernel.shape)
print('bias.shape = ')
print(bias.shape)
data = transform_and_normalize(data, kernel, bias)
print('data.shape = ')
print(data.shape,data)
```
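The printed shapes confirm the dimensions but not that the transform actually whitens. A quick sanity check, which is an addition of mine and not part of the original post: with more samples than dimensions and no truncation, the covariance of the transformed data should be (numerically) the identity. The sizes and the random mixing matrix are hypothetical, chosen so that the covariance is full rank and the features are correlated.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, dim = 2000, 64  # hypothetical sizes: many more samples than dims
mixing = rng.normal(size=(dim, dim))
x = rng.normal(size=(n_samples, dim)) @ mixing  # correlated features

mu = x.mean(axis=0, keepdims=True)
cov = np.cov(x.T)
u, s, _ = np.linalg.svd(cov)
kernel = u @ np.diag(1 / np.sqrt(s))  # W = U Lambda^(-1/2), no truncation
y = (x - mu) @ kernel  # same as (x + bias).dot(kernel) with bias = -mu

# Checks whether the covariance of the whitened vectors is close to the identity.
print(np.allclose(np.cov(y.T), np.eye(dim), atol=1e-6))
```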

Reducing a single vector online:
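```python
# Reuse the kernel and bias fitted above; keep the single vector 2-D, shape (1, 768)
data1 = np.random.rand(1,768)
data1_1 = transform_and_normalize(data1, kernel, bias)  # shape (1, 64)
```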
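The demo fit above uses only 5 random vectors. Five centered samples span at most 4 dimensions, so np.cov returns a matrix of rank at most 4 and all but the first few values in s are numerically zero; 1 / np.sqrt(s) then explodes (or divides by zero), and columns 5 to 64 of the truncated kernel are dominated by numerical noise. In practice the kernel should be fitted on many more vectors than embedding_size. Where that is not possible, one option is to cap the number of components at the numerical rank. The sketch below is a hypothetical guard of my own (the function name compute_kernel_bias_safe and the tol threshold are assumptions), not something from the original post or the linked articles.

```python
import numpy as np


def compute_kernel_bias_safe(vecs, n_components=256, tol=1e-10):
    """Hypothetical variant of compute_kernel_bias (not from the original post):
    keep only components whose singular value is clearly non-zero, so that
    1 / sqrt(s) never divides by (almost) zero."""
    mu = vecs.mean(axis=0, keepdims=True)
    cov = np.cov(vecs.T)
    u, s, _ = np.linalg.svd(cov)
    # Singular values are sorted in descending order; count the usable ones.
    rank = int((s > tol * s[0]).sum())
    k = min(n_components, rank)
    W = u[:, :k] / np.sqrt(s[:k])  # same as u[:, :k] @ np.diag(1 / np.sqrt(s[:k]))
    return W, -mu


data = np.random.default_rng(0).random((5, 768))  # 5 samples, as in the demo above
kernel, bias = compute_kernel_bias_safe(data, n_components=64)
print(kernel.shape)  # at most 4 columns: 5 centered samples span at most 4 dims
```

The trade-off is that the output dimension can end up smaller than requested, which is usually preferable to silently returning noise-dominated dimensions.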
