Calculate Euclidean distance and cosine similarity

2022-06-23 05:18:00 Dream painter

This article shows how to calculate Euclidean distance and cosine similarity in Python. Cosine similarity relies on the Euclidean norm, the same quantity that underlies Euclidean distance, so we introduce Euclidean distance first.

Euclidean distance

Euclidean distance measures the straight-line distance between two vectors. It is calculated as follows:

Euclidean distance = $\sqrt{\sum_i (A_i - B_i)^2}$

To calculate the Euclidean distance in Python, you can use the numpy.linalg.norm function:

#  Import package 
import numpy as np
from numpy.linalg import norm

#  Define vector 
a = np.array([2, 6, 7, 7, 5, 13, 14, 17, 11, 8])
b = np.array([3, 5, 5, 3, 7, 12, 13, 19, 22, 7])

#  Calculate the Euclidean distance between two vectors 
norm(a-b)

# 12.409673645990857

The output shows that the Euclidean distance between the two vectors is 12.409673645990857.
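As a cross-check, the formula can also be evaluated term by term and compared with the result of norm. This is only a minimal sketch; the names squared_diffs and manual are introduced here for illustration and are not part of the original example.

```python
import numpy as np
from numpy.linalg import norm

a = np.array([2, 6, 7, 7, 5, 13, 14, 17, 11, 8])
b = np.array([3, 5, 5, 3, 7, 12, 13, 19, 22, 7])

# Square the element-wise differences, sum them, then take the square root
squared_diffs = (a - b) ** 2
manual = np.sqrt(squared_diffs.sum())

print(squared_diffs.sum())               # 154
print(np.isclose(manual, norm(a - b)))   # True
```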

If the two vectors differ in length, the subtraction a-b raises an error:


import numpy as np
from numpy.linalg import norm


a = np.array([2, 6, 7, 7, 5, 13, 14])
b = np.array([3, 5, 5, 3, 7, 12, 13, 19, 22, 7])


norm(a-b)

# Raises an error because the two arrays cannot be broadcast together
# ValueError: operands could not be broadcast together with shapes (7,) (10,) 
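Note that NumPy only raises this error when the shapes are incompatible for broadcasting. An array of length 1 would be broadcast silently against a longer one and produce a number without any warning. The following sketch, built around a hypothetical helper named euclidean_distance (not a NumPy function), checks the shapes explicitly before computing the distance.

```python
import numpy as np
from numpy.linalg import norm

def euclidean_distance(a, b):
    """Euclidean distance between two arrays of identical shape (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Reject any shape difference, including ones broadcasting would accept
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    return norm(a - b)

try:
    euclidean_distance([2, 6, 7], [3, 5, 5, 3])
except ValueError as err:
    message = str(err)
    print(message)  # shape mismatch: (3,) vs (4,)
```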

You can also calculate the Euclidean distance between two columns of a pandas data frame:


import pandas as pd 
import numpy as np
from numpy.linalg import norm

#  Define the data frame 
df = pd.DataFrame({'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

# Calculate the Euclidean distance between the 'points' and 'assists' columns
norm(df['points'] - df['assists'])

# 40.496913462633174
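The same idea extends to every pair of columns. The sketch below is one possible way to do this, using itertools.combinations to enumerate the column pairs; the dictionary name distances is an assumption made for illustration.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from numpy.linalg import norm

df = pd.DataFrame({'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

# Euclidean distance for each unordered pair of columns
distances = {
    (c1, c2): norm(df[c1].to_numpy() - df[c2].to_numpy())
    for c1, c2 in combinations(df.columns, 2)
}

for pair, dist in distances.items():
    print(pair, dist)
```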

Cosine similarity

Cosine similarity measures how similar two vectors are by the cosine of the angle between them in vector space. The closer the cosine is to 1, the closer the angle between the two vectors is to 0 degrees, and the more similar their directions are.

The calculation formula is as follows :

Cosine similarity = $\dfrac{\sum_i A_i B_i}{\sqrt{\sum_i A_i^2}\,\sqrt{\sum_i B_i^2}}$
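A few boundary cases help make the formula concrete. The sketch below uses a small hypothetical helper named cosine (introduced here only for illustration) to show that vectors pointing the same way score about 1 regardless of their lengths, opposite vectors score about -1, and perpendicular vectors score 0.

```python
import numpy as np
from numpy import dot
from numpy.linalg import norm

def cosine(a, b):
    # Dot product divided by the product of the two Euclidean norms
    return dot(a, b) / (norm(a) * norm(b))

x = np.array([1.0, 2.0, 3.0])

same_direction = cosine(x, 2 * x)   # scaling does not change the angle
opposite = cosine(x, -x)            # angle of 180 degrees
orthogonal = cosine(np.array([1.0, 0.0]), np.array([0.0, 1.0]))  # 90 degrees

print(np.isclose(same_direction, 1.0))  # True
print(np.isclose(opposite, -1.0))       # True
print(orthogonal)                       # 0.0
```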

The following example uses the NumPy library to calculate the cosine similarity of two vectors:

from numpy import dot
from numpy.linalg import norm

#  Define an array 
a = [23, 34, 44, 45, 42, 27, 33, 34]
b = [17, 18, 22, 26, 26, 29, 31, 30]

#  Calculate cosine similarity 
cos_sim = dot(a, b)/(norm(a)*norm(b))

cos_sim

# 0.965195008357566

Here norm computes the Euclidean norm (length) of each vector, that is, its Euclidean distance from the origin, and dot computes the dot product of the two vectors.
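The link between the norm, the Euclidean distance, and the dot product can be checked directly. This is a small sketch under the assumption that the vector is a plain one-dimensional NumPy array; the names origin and length are illustrative.

```python
import numpy as np
from numpy import dot
from numpy.linalg import norm

a = np.array([23, 34, 44, 45, 42, 27, 33, 34])
origin = np.zeros_like(a)

length = norm(a)

# The norm of a equals its Euclidean distance from the origin ...
print(np.isclose(length, norm(a - origin)))     # True
# ... and also the square root of the dot product of a with itself
print(np.isclose(length, np.sqrt(dot(a, a))))   # True
```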

The same method works for longer arrays:

import numpy as np
from numpy import dot
from numpy.linalg import norm

#  Define an array 
a = np.random.randint(10, size=100)
b = np.random.randint(10, size=100)

#  Calculate cosine similarity 
cos_sim = dot(a, b)/(norm(a)*norm(b))

cos_sim

# 0.7340201613960431 (the exact value varies from run to run because the arrays are random)

Finally, note that if the two vectors have different lengths, an error is raised here as well.
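One way to make these failure cases explicit is to wrap the calculation in a function that validates its inputs first. The sketch below is an assumption-laden illustration: cosine_similarity is a hypothetical helper, not a NumPy function, and it additionally rejects zero vectors, for which the formula would divide by zero.

```python
import numpy as np
from numpy import dot
from numpy.linalg import norm

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length 1-D vectors (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError(
            f"expected two 1-D vectors of equal length, got shapes {a.shape} and {b.shape}"
        )
    denom = norm(a) * norm(b)
    # A zero vector has no direction, so the angle is undefined
    if denom == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / denom

result = cosine_similarity([23, 34, 44, 45, 42, 27, 33, 34],
                           [17, 18, 22, 26, 26, 29, 31, 30])
print(result)
```

Raising ValueError in both cases keeps the behavior consistent with the error NumPy itself raises for mismatched lengths.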

Copyright notice
This article was written by [Dream painter]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/174/202206230225337124.html