当前位置:网站首页>Scanpy (VII) spatial data analysis based on scanorama integrated scrna seq
Scanpy (VII) spatial data analysis based on scanorama integrated scrna seq
2022-06-25 08:19:00 【tzc_ fly】
Catalog
This article describes how to use multiple Visium Data sets , And how to use scRNA-seq Data sets for integration . We will use Scanorama To integrate . First we need to install Scanorama:
pip install scanorama
Load related frames :
import scanpy as sc
import anndata as an
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import scanorama
sc.logging.print_versions()
sc.set_figure_params(facecolor="white", figsize=(8, 8))
sc.settings.verbosity = 3
""" anndata 0.8.0 scanpy 1.9.1 scanorama 1.7.2 """
Suggest reading first Paper reading notes - utilize Scanorama Efficient integration of heterogeneous single cell transcriptome
Reading the data
We will use two mouse brains Visium Spatial transcriptome dataset , The data set can be obtained from 10x genomics website obtain .
function datasets.visium_sge()
from 10x genomics Download the dataset and return adata object ( contain counts,images and spatial coordinates), We use pp.calculate_qc_metrics
Calculate quality control indicators , And visualize it .
adata_spatial_anterior = sc.datasets.visium_sge(
sample_id="V1_Mouse_Brain_Sagittal_Anterior"
) # Data sets 1
adata_spatial_posterior = sc.datasets.visium_sge(
sample_id="V1_Mouse_Brain_Sagittal_Posterior"
) # Data sets 2
adata_spatial_anterior
""" AnnData object with n_obs × n_vars = 2695 × 32285 obs: 'in_tissue', 'array_row', 'array_col' var: 'gene_ids', 'feature_types', 'genome' uns: 'spatial' obsm: 'spatial' """
adata_spatial_posterior
""" AnnData object with n_obs × n_vars = 3355 × 32285 obs: 'in_tissue', 'array_row', 'array_col' var: 'gene_ids', 'feature_types', 'genome' uns: 'spatial' obsm: 'spatial' """
adata_spatial_anterior.var_names_make_unique()
adata_spatial_posterior.var_names_make_unique()
sc.pp.calculate_qc_metrics(adata_spatial_anterior, inplace=True)
sc.pp.calculate_qc_metrics(adata_spatial_posterior, inplace=True)
for name, adata in [
("anterior", adata_spatial_anterior),
("posterior", adata_spatial_posterior),
]:
fig, axs = plt.subplots(1, 4, figsize=(12, 3))
fig.suptitle(f"Covariates for filtering: {
name}")
sns.distplot(adata.obs["total_counts"], kde=False, ax=axs[0])
sns.distplot(
adata.obs["total_counts"][adata.obs["total_counts"] < 20000],
kde=False,
bins=40,
ax=axs[1],
)
sns.distplot(adata.obs["n_genes_by_counts"], kde=False, bins=60, ax=axs[2])
sns.distplot(
adata.obs["n_genes_by_counts"][adata.obs["n_genes_by_counts"] < 4000],
kde=False,
bins=60,
ax=axs[3],
)
sc.datasets.visium_sge
The download is already filtered visium Data sets , By looking at QC indicators , We can observe that the sample contains almost no blank points .
We went further to visium Of counts Data standardization ( Use the built-in normalize_total
function ), Then detect hypervariable genes .
for adata in [
adata_spatial_anterior,
adata_spatial_posterior,
]:
sc.pp.normalize_total(adata, inplace=True)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, flavor="seurat", n_top_genes=2000, inplace=True)
""" normalizing counts per cell finished (0:00:00) If you pass `n_top_genes`, all cutoffs are ignored. extracting highly variable genes finished (0:00:00) --> added 'highly_variable', boolean vector (adata.var) 'means', float vector (adata.var) 'dispersions', float vector (adata.var) 'dispersions_norm', float vector (adata.var) normalizing counts per cell finished (0:00:00) If you pass `n_top_genes`, all cutoffs are ignored. extracting highly variable genes finished (0:00:00) --> added 'highly_variable', boolean vector (adata.var) 'means', float vector (adata.var) 'dispersions', float vector (adata.var) 'dispersions_norm', float vector (adata.var) """
View two datasets :
adata_spatial_anterior
""" AnnData object with n_obs × n_vars = 2695 × 32285 obs: 'in_tissue', 'array_row', 'array_col', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes' var: 'gene_ids', 'feature_types', 'genome', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm' uns: 'spatial', 'log1p', 'hvg' obsm: 'spatial' """
adata_spatial_posterior
""" AnnData object with n_obs × n_vars = 3355 × 32285 obs: 'in_tissue', 'array_row', 'array_col', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes' var: 'gene_ids', 'feature_types', 'genome', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm' uns: 'spatial', 'log1p', 'hvg' obsm: 'spatial' """
Data integration
Now? , We are ready to integrate these two datasets . As mentioned earlier , We will use Scanorama To achieve this .Scanorama The return contains two adata A list of ( Every adata Including the integrated embedding). About data consolidation , We can also use BBKNN or Ingest.
adatas = [adata_spatial_anterior, adata_spatial_posterior]
adatas_cor = scanorama.correct_scanpy(adatas, return_dimred=True)
adatas_cor
""" [AnnData object with n_obs × n_vars = 2695 × 32285 obs: 'in_tissue', 'array_row', 'array_col', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes' var: 'gene_ids', 'feature_types', 'genome', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm' uns: 'spatial', 'log1p', 'hvg' obsm: 'spatial', 'X_scanorama', AnnData object with n_obs × n_vars = 3355 × 32285 obs: 'in_tissue', 'array_row', 'array_col', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes' var: 'gene_ids', 'feature_types', 'genome', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm' uns: 'spatial', 'log1p', 'hvg' obsm: 'spatial', 'X_scanorama'] """
len(adatas_cor) # 2
adatas_cor[0].obsm['X_scanorama'].shape # (2695, 100)
then , We spliced the integrated embedding, The consolidated data set is adata_spatial
,embedding Save as adata_spatial.obsm['scanorama_embedding']
. Besides , We will calculate UMAP Visualize results and qualitatively evaluate data integration tasks .
Please note that , We use uns_merge="unique"
Policy connects two data sets , This is in order to connect anndata Reserved in object visium Two in the dataset images.
adata_spatial = sc.concat(
adatas_cor,
label="library_id",
uns_merge="unique",
keys=[
k
for d in [
adatas_cor[0].uns["spatial"],
adatas_cor[1].uns["spatial"],
]
for k, v in d.items()
],
index_unique="-",
)
sc.pp.neighbors(adata_spatial, use_rep="X_scanorama")
sc.tl.umap(adata_spatial)
sc.tl.leiden(adata_spatial, key_added="clusters")
""" computing neighbors finished: added to `.uns['neighbors']` `.obsp['distances']`, distances for each pair of neighbors `.obsp['connectivities']`, weighted adjacency matrix (0:00:06) computing UMAP finished: added 'X_umap', UMAP coordinates (adata.obsm) (0:00:09) running Leiden clustering finished: found 22 clusters and added 'clusters', the cluster labels (adata.obs, categorical) (0:00:00) """
adata_spatial
""" AnnData object with n_obs × n_vars = 6050 × 32285 obs: 'in_tissue', 'array_row', 'array_col', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'library_id', 'clusters' uns: 'spatial', 'log1p', 'hvg', 'neighbors', 'umap', 'leiden' obsm: 'spatial', 'X_scanorama', 'X_umap' obsp: 'distances', 'connectivities' """
sc.pl.umap(
adata_spatial, color=["clusters", "library_id"], palette=sc.pl.palettes.default_20
)
We can also visualize clustering results in spatial coordinates . So , We first need to save the cluster colors in the dictionary . then , We can visualize in the views of the two data sets , And display the results side by side .
clusters_colors = dict(
zip([str(i) for i in range(18)], adata_spatial.uns["clusters_colors"])
)
fig, axs = plt.subplots(1, 2, figsize=(15, 10))
for i, library in enumerate(
["V1_Mouse_Brain_Sagittal_Anterior", "V1_Mouse_Brain_Sagittal_Posterior"]
):
ad = adata_spatial[adata_spatial.obs.library_id == library, :].copy()
sc.pl.spatial(
ad,
img_key="hires",
library_id=library,
color="clusters",
size=1.5,
palette=[
v
for k, v in clusters_colors.items()
if k in ad.obs.clusters.unique().tolist()
],
legend_loc=None,
show=False,
ax=axs[i],
)
plt.tight_layout()
From these clusters , We can clearly see the layering in both organizations . This proof , Data set consolidation really works right .
边栏推荐
- Matlab code format one click beautification artifact
- Logu P2486 [sdoi2011] coloring (tree chain + segment tree + merging of intervals on the tree)
- LeetCode_ Hash table_ Medium_ 454. adding four numbers II
- Set the textalign property of the label control in C to control the method of text centering
- C disk drives, folders and file operations
- Luogu p5994 [pa2014]kuglarz (XOR thinking +mst)
- Find out the possible memory leaks caused by the handler and the solutions
- [supplementary question] 2021 Niuke summer multi school training camp 4-N
- Electronics: Lesson 012 - Experiment 11: light and sound
- C # set up FTP server and realize file uploading and downloading
猜你喜欢
Neural network and deep learning-3-simple example of machine learning pytorch
Electronics: Lesson 010 - Experiment 8: relay oscillator
2022年毕业生求职找工作青睐哪个行业?
The era of enterprise full cloud -- the future of cloud database
DNS protocol and its complete DNS query process
Electronics: Lesson 013 - Experiment 14: Wearable pulsed luminaries
Biweekly investment and financial report: capital ambush Web3 infrastructure
[thesis study] vqmivc
c#磁盘驱动器及文件夹还有文件类的操作
电子学:第008课——实验 6:非常简单的开关
随机推荐
FFT [template]
Basic record of getting started with PHP
使用apt-get命令如何安装软件?
Self made ramp, but it really smells good
Matlab代码格式一键美化神器
Electronics: Lesson 010 - Experiment 8: relay oscillator
Static web server
双周投融报:资本埋伏Web3基础设施
物联网毕设(智能灌溉系统 -- Android端)
共话云原生数据库的未来
Luogu p1073 [noip2009 improvement group] optimal trade (layered diagram + shortest path)
4个不可不知的采用“安全左移”的理由
打新债的安全性 有风险吗
六月集训(第25天) —— 树状数组
TCP stuff
Luogu p2048 [noi2010] super Piano (rmq+ priority queue)
Daily question brushing record (III)
Rqt command
自制坡道,可是真的很香
Nodehandle common member functions