
Systematic analysis of social networks using Networkx: Facebook network analysis case

2022-06-27 01:13:00 Zhiyuan community

Introduction: this issue shares NetworkX's official social network analysis case study, Facebook network analysis [1], to further deepen understanding of the basics of complex networks.
This notebook contains a social network analysis carried out mainly with NetworkX. Specifically, it inspects ten Facebook circles (friends lists) to extract various kinds of valuable information. The dataset can be found on the Stanford University website [2]. In addition, as is well known, the Facebook network is undirected and unweighted, because a pair of users can be friends only once. From a graph-analysis point of view, in this dataset:
*  Each node represents an anonymous Facebook user belonging to one of the ten friends lists.
*  Each edge corresponds to a friendship between two Facebook users in the network. In other words, two users are connected in this particular network only if they are friends on Facebook.
The sample code from the official Jupyter notebook:
First, import the necessary libraries:
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
from random import randint
Load the edge data from the data folder. Each line is one edge, with a start_node and an end_node column:
facebook = pd.read_csv(
    "data/facebook_combined.txt.gz",
    compression="gzip",
    sep=" ",
    names=["start_node", "end_node"],
)

 

Create the network:
G = nx.from_pandas_edgelist(facebook, "start_node", "end_node")
Visualize the network: since we have no real sense of the structure of the data yet, let's start by viewing the network with random_layout, one of the fastest layout functions:
fig, ax = plt.subplots(figsize=(15, 9))
ax.axis("off")
plot_options = {"node_size": 10, "with_labels": False, "width": 0.15}
nx.draw_networkx(G, pos=nx.random_layout(G), ax=ax, **plot_options)
The resulting figure is not very useful. This kind of visualization is sometimes informally called a "hairball", because the overlapping edges become an entangled mess.
Clearly, if we want to make sense of the data, we need to impose more structure on the node positions. For this we can use the spring_layout function, the default layout function of the networkx drawing module. The advantage of spring_layout is that it takes the nodes and edges into account when computing node positions; the disadvantage is that the process is far more computationally expensive, and it becomes very slow for graphs with hundreds of nodes and thousands of edges.
Because our dataset has more than 80k edges, we limit the number of iterations used in spring_layout to reduce the computation time. We also save the computed layout so it can be reused in later visualizations.
pos = nx.spring_layout(G, iterations=15, seed=1721)
fig, ax = plt.subplots(figsize=(15, 9))
ax.axis("off")
nx.draw_networkx(G, pos=pos, ax=ax, **plot_options)
*  Get basic topological properties of the network:
# Number of nodes
G.number_of_nodes()
# 4039

# Number of edges
G.number_of_edges()
# 88234

# Average degree
np.mean([d for _, d in G.degree()])
# 43.69101262688784

shortest_path_lengths = dict(nx.all_pairs_shortest_path_length(G))
# Length of the shortest path between nodes 0 and 42
shortest_path_lengths[0][42]
# 1

diameter = max(nx.eccentricity(G, sp=shortest_path_lengths).values())
diameter
# 8

# Compute the average shortest path length for each node
average_path_lengths = [
    np.mean(list(spl.values())) for spl in shortest_path_lengths.values()
]
# The average over all nodes
np.mean(average_path_lengths)
# 3.691592636562027
The last result above is the average shortest path length over all nodes: to get from one node to any other, about 3.7 edges are traversed on average.
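As a sanity check (not part of the official notebook), the same kind of averaging can be verified on a tiny path graph against `nx.average_shortest_path_length`; note that this variant excludes each node's zero-length path to itself:

```python
import networkx as nx
import numpy as np

# Toy path graph 0-1-2-3 to verify the averaging logic
P = nx.path_graph(4)
spl = dict(nx.all_pairs_shortest_path_length(P))

# Mean over all ordered pairs, excluding each node's zero distance to itself
manual = np.mean(
    [d for lengths in spl.values() for d in lengths.values() if d > 0]
)
reference = nx.average_shortest_path_length(P)
```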
The metrics above capture useful information about the network, but a summary statistic such as the mean describes only one aspect of the distribution. Using the precomputed dict-of-dicts, we can build a visualization of the full distribution of shortest path lengths:
# We know the maximum shortest path length (the diameter), so create an array
# to store values from 0 up to (and including) diameter
path_lengths = np.zeros(diameter + 1, dtype=int)

# Extract the frequency of shortest path lengths between two nodes
for pls in shortest_path_lengths.values():
    pl, cnts = np.unique(list(pls.values()), return_counts=True)
    path_lengths[pl] += cnts

# Express frequency distribution as a percentage (ignoring path lengths of 0)
freq_percent = 100 * path_lengths[1:] / path_lengths[1:].sum()

# Plot the frequency distribution (ignoring path lengths of 0) as a percentage
fig, ax = plt.subplots(figsize=(15, 8))
ax.bar(np.arange(1, diameter + 1), height=freq_percent)
ax.set_title(
    "Distribution of shortest path length in G", fontdict={"size": 35}, loc="center"
)
ax.set_xlabel("Shortest Path Length", fontdict={"size": 22})
ax.set_ylabel("Frequency (%)", fontdict={"size": 22})
Most shortest paths are between 2 and 5 edges long. Moreover, a shortest path of length 8 (the diameter) between a pair of nodes is extremely unlikely, occurring with a frequency below 0.1%.
Next compute the density of the graph; clearly this is a very sparse graph. Also count the connected components: as expected, the network consists of one giant component:
nx.density(G)
# 0.010819963503439287
nx.number_connected_components(G)
# 1
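As a quick check (not in the notebook), the density value can be reproduced by hand from the node and edge counts reported above: for an undirected graph, density is 2m / (n(n − 1)):

```python
# Undirected density: fraction of all possible edges that are present
n, m = 4039, 88234  # node and edge counts of the Facebook graph
density = 2 * m / (n * (n - 1))
```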
Next, examine the centrality indices of the Facebook network:
*  Degree centrality: degree centrality assigns an importance score based simply on the number of links each node has. In this analysis, the higher a node's degree centrality, the more edges are connected to it, and therefore the more neighbors (Facebook friends) it has. In fact, the degree centrality of a node is the fraction of nodes it is connected to; in other words, it is the proportion of the network that are friends of that particular user.
First, find the nodes with the highest degree centrality. The 8 nodes with the highest degree centrality, together with their values, are:
degree_centrality = nx.centrality.degree_centrality(G)  # save results in a variable to use again
(sorted(degree_centrality.items(), key=lambda item: item[1], reverse=True))[:8]
# [(107, 0.258791480931154), (1684, 0.1961367013372957), (1912, 0.18697374938088163),
#  (3437, 0.13546310054482416), (0, 0.08593363051015354), (2543, 0.07280832095096582),
#  (2347, 0.07206537890044576), (1888, 0.0629024269440317)]
We can also look at the number of neighbors of these highest-centrality nodes:
(sorted(G.degree, key=lambda item: item[1], reverse=True))[:8]
# [(107, 1045), (1684, 792), (1912, 755), (3437, 547),
#  (0, 347), (2543, 294), (2347, 291), (1888, 254)]
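The two rankings line up because degree centrality is just a node's degree divided by n − 1, the number of other nodes. A quick check (not part of the notebook) on a toy star graph, and on node 107's numbers above:

```python
import networkx as nx

# Toy star graph: hub node 0 connected to 5 leaves, so n = 6
S = nx.star_graph(5)
dc = nx.degree_centrality(S)
hub_centrality = dc[0]  # degree 5 / (6 - 1) = 1.0

# Node 107 above: 1045 friends out of 4038 other users
fb_hub = 1045 / 4038
```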
Plot the distribution of degree centrality:
plt.figure(figsize=(15, 8))
plt.hist(degree_centrality.values(), bins=25)
plt.xticks(ticks=[0, 0.025, 0.05, 0.1, 0.15, 0.2])  # set the x axis ticks
plt.title("Degree Centrality Histogram ", fontdict={"size": 35}, loc="center")
plt.xlabel("Degree Centrality", fontdict={"size": 20})
plt.ylabel("Counts", fontdict={"size": 20})
Now visualize the users with the highest degree centrality by drawing each node with a size proportional to it:
node_size = [
    v * 1000 for v in degree_centrality.values()
]  # set up node sizes for a nice graph representation
plt.figure(figsize=(15, 8))
nx.draw_networkx(G, pos=pos, node_size=node_size, with_labels=False, width=0.15)
plt.axis("off")
*  Betweenness centrality:
betweenness_centrality = nx.centrality.betweenness_centrality(
    G
)  # save results in a variable to use again
(sorted(betweenness_centrality.items(), key=lambda item: item[1], reverse=True))[:8]
# [(107, 0.4805180785560152), (1684, 0.3377974497301992), (3437, 0.23611535735892905),
#  (1912, 0.2292953395868782), (1085, 0.14901509211665306), (0, 0.14630592147442917),
#  (698, 0.11533045020560802), (567, 0.09631033121856215)]

plt.figure(figsize=(15, 8))
plt.hist(betweenness_centrality.values(), bins=100)
plt.xticks(ticks=[0, 0.02, 0.1, 0.2, 0.3, 0.4, 0.5])  # set the x axis ticks
plt.title("Betweenness Centrality Histogram ", fontdict={"size": 35}, loc="center")
plt.xlabel("Betweenness Centrality", fontdict={"size": 20})
plt.ylabel("Counts", fontdict={"size": 20})
Visualize the network with node sizes based on betweenness centrality:
node_size = [
    v * 1200 for v in betweenness_centrality.values()
]  # set up node sizes for a nice graph representation
plt.figure(figsize=(15, 8))
nx.draw_networkx(G, pos=pos, node_size=node_size, with_labels=False, width=0.15)
plt.axis("off")
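Betweenness centrality scores a node by how often it lies on shortest paths between other node pairs. A minimal illustration (not from the notebook) on a 3-node path, where the middle node sits on the only shortest path between the endpoints:

```python
import networkx as nx

P = nx.path_graph(3)  # 0 - 1 - 2
bc = nx.betweenness_centrality(P)  # normalized by default
middle, endpoint = bc[1], bc[0]
# the middle node intercepts every shortest path between the others;
# the endpoints intercept none
```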
*  Closeness centrality:
closeness_centrality = nx.centrality.closeness_centrality(
    G
)  # save results in a variable to use again
(sorted(closeness_centrality.items(), key=lambda item: item[1], reverse=True))[:8]
# [(107, 0.45969945355191255), (58, 0.3974018305284913), (428, 0.3948371956585509),
#  (563, 0.3939127889961955), (1684, 0.39360561458231796), (171, 0.37049270575282134),
#  (348, 0.36991572004397216), (483, 0.3698479575013739)]

# In addition, the average distance from a specific node v to any other node
# can easily be computed as the reciprocal of its closeness:
1 / closeness_centrality[107]
# 2.1753343239227343
plt.figure(figsize=(15, 8))
plt.hist(closeness_centrality.values(), bins=60)
plt.title("Closeness Centrality Histogram ", fontdict={"size": 35}, loc="center")
plt.xlabel("Closeness Centrality", fontdict={"size": 20})
plt.ylabel("Counts", fontdict={"size": 20})
node_size = [
    v * 50 for v in closeness_centrality.values()
]  # set up node sizes for a nice graph representation
plt.figure(figsize=(15, 8))
nx.draw_networkx(G, pos=pos, node_size=node_size, with_labels=False, width=0.15)
plt.axis("off")
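The reciprocal relationship between closeness centrality and average distance used above can be verified on a 3-node path (a toy check, not from the notebook):

```python
import networkx as nx

P = nx.path_graph(3)  # 0 - 1 - 2
cc = nx.closeness_centrality(P)
center_avg_dist = 1 / cc[1]    # both neighbors at distance 1
endpoint_avg_dist = 1 / cc[0]  # distances 1 and 2, so average 1.5
```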

Eigenvector centrality can be handled in the same way, yielding the analogous ranking, histogram, and network plots.
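As a self-contained sketch of that eigenvector-centrality analysis (a toy star graph stands in for the Facebook graph G so the snippet runs on its own):

```python
import networkx as nx

# Stand-in graph; in the notebook this would be the Facebook graph G
G = nx.star_graph(5)
eigenvector_centrality = nx.centrality.eigenvector_centrality(G)
top = sorted(
    eigenvector_centrality.items(), key=lambda item: item[1], reverse=True
)[:8]
hub = top[0][0]  # the best-connected node dominates the ranking
```

The same histogram and node-size plotting code used for the other centralities applies unchanged to this dictionary.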

 
*  Clustering coefficient:
# Average clustering coefficient
nx.average_clustering(G)
# 0.6055467186200876

plt.figure(figsize=(15, 8))
plt.hist(nx.clustering(G).values(), bins=50)
plt.title("Clustering Coefficient Histogram ", fontdict={"size": 35}, loc="center")
plt.xlabel("Clustering Coefficient", fontdict={"size": 20})
plt.ylabel("Counts", fontdict={"size": 20})
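The clustering coefficient of a node is the fraction of pairs of its neighbors that are themselves connected; the two extremes can be checked on toy graphs (not from the notebook):

```python
import networkx as nx

T = nx.complete_graph(3)  # triangle: every pair of neighbors is connected
tri_cc = nx.clustering(T)[0]

P = nx.path_graph(3)      # node 1's two neighbors are not connected
path_cc = nx.clustering(P)[1]

avg_cc = nx.average_clustering(T)
```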
*  Bridges:
nx.has_bridges(G)
# True

# Number of bridges
bridges = list(nx.bridges(G))
len(bridges)
# 75
# The plot below also needs the local bridges, whose computation was
# missing from this excerpt; compute them as in the official notebook:
local_bridges = list(nx.local_bridges(G, with_span=False))

plt.figure(figsize=(15, 8))
nx.draw_networkx(G, pos=pos, node_size=10, with_labels=False, width=0.15)
nx.draw_networkx_edges(
    G, pos, edgelist=local_bridges, width=0.5, edge_color="lawngreen"
)  # green color for local bridges
nx.draw_networkx_edges(
    G, pos, edgelist=bridges, width=0.5, edge_color="r"
)  # red color for bridges
plt.axis("off")
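A bridge is an edge whose removal disconnects the graph, while a local bridge is an edge whose endpoints share no common neighbor. Both are easy to see on a barbell graph (a toy check, not from the notebook):

```python
import networkx as nx

# Two K5 cliques joined by the single edge (4, 5)
B = nx.barbell_graph(5, 0)
has = nx.has_bridges(B)
bridges = set(map(frozenset, nx.bridges(B)))
local_bridges = set(map(frozenset, nx.local_bridges(B, with_span=False)))
# only the connecting edge qualifies: clique edges always share neighbors
```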
*  Degree assortativity:
nx.degree_assortativity_coefficient(G)
# 0.06357722918564943
nx.degree_pearson_correlation_coefficient(G)
# 0.06357722918564918
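The small positive coefficient above indicates weak assortative mixing: users with many friends tend, slightly, to be friends with other well-connected users. For contrast, a star graph is perfectly disassortative (toy check, not from the notebook):

```python
import networkx as nx

S = nx.star_graph(5)  # the hub links only to degree-1 leaves
r = nx.degree_assortativity_coefficient(S)
# every edge joins a high-degree node to a low-degree one
```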
*  Network communities: a community is a set of nodes whose members are connected to one another by many more edges than they are to nodes outside the group. The network will be examined with two different community-detection algorithms.
First, a semi-synchronous label propagation method is used to detect the communities:

This function determines the number of communities to detect by itself. We then iterate over the communities and build a list of colors so that nodes belonging to the same community share the same color. The number of communities is also printed:
colors = ["" for x in range(G.number_of_nodes())]  # initialize colors list
counter = 0
for com in nx.community.label_propagation_communities(G):
    color = "#%06X" % randint(0, 0xFFFFFF)  # creates random RGB color
    counter += 1
    for node in list(com):
        # fill colors list with this community's color for its nodes
        colors[node] = color

counter
# 44

plt.figure(figsize=(15, 9))
plt.axis("off")
nx.draw_networkx(
    G, pos=pos, node_size=10, with_labels=False, width=0.15, node_color=colors
)

Second, the asynchronous fluid communities algorithm is used:
colors = ["" for x in range(G.number_of_nodes())]
for com in nx.community.asyn_fluidc(G, 8, seed=0):
    color = "#%06X" % randint(0, 0xFFFFFF)  # creates random RGB color
    for node in list(com):
        colors[node] = color
plt.figure(figsize=(15, 9))
plt.axis("off")
nx.draw_networkx(
    G, pos=pos, node_size=10, with_labels=False, width=0.15, node_color=colors
)
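One way to compare the partitions produced by the two algorithms (not done in the original notebook) is to score each with modularity; a self-contained sketch using the karate club graph as a small stand-in for the Facebook network:

```python
import networkx as nx

K = nx.karate_club_graph()  # small stand-in graph
lp_communities = list(nx.community.label_propagation_communities(K))
fluid_communities = list(nx.community.asyn_fluidc(K, 2, seed=0))

# Modularity lies in [-1/2, 1]; higher means denser intra-community edges
q_lp = nx.community.modularity(K, lp_communities)
q_fluid = nx.community.modularity(K, fluid_communities)
```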

 

[1] https://networkx.org/nx-guides/content/exploratory_notebooks/facebook_notebook.html#id2
[2] http://snap.stanford.edu/data/ego-Facebook.html

Copyright notice: original article by the Zhiyuan community; please include the original link when reposting: https://yzsam.com/2022/178/202206270032476314.html