Seurat Basic Analysis Workflow

Objective

Preprocess quality-controlled scRNA-seq data for downstream analyses such as comparative analysis.

Overview of Seurat Basic Analysis Workflow

  • Normalize sequencing depth differences among cells
  • Reduce dimensionality of high-dimensional data
  • Integrate datasets measured under different conditions
  • Classify similar cell populations

Steps in the Seurat Basic Analysis Workflow

  1. Data normalization
  2. Identification of highly variable genes (HVGs)
  3. Data scaling
  4. Principal Component Analysis (PCA)
  5. Data integration
  6. k-nearest neighbor (kNN) graph construction
  7. Clustering
  8. Uniform Manifold Approximation and Projection (UMAP)

Load Required Libraries

library(Seurat)
library(ggplot2)

Load Data

Load the Seurat object created in the 02_quality_control step.

merged4norm_seurat_obj <- readRDS("./output/02_quality_control/obj/merged4norm_seurat_obj.RDS")

Seurat Basic Analysis Workflow

Seurat V5 allows simultaneous processing of multiple datasets stored in layer of a single Seurat object.

1. Data Normalization

Since total read counts vary across cells, normalization is required to enable comparison. The default method scales the total read count per cell to 10,000, followed by log transformation.

norm_seurat_obj <- NormalizeData(merged4norm_seurat_obj)

2. Identification of Highly Variable Genes (HVGs)

Highly variable genes are crucial for defining cellular characteristics, while genes with low variability (e.g., housekeeping genes) are expressed similarly across all cells. By default, 2,000 genes are selected.

norm_seurat_obj <- FindVariableFeatures(norm_seurat_obj)

Visualizing the distribution of HVGs:

VariableFeaturePlot(norm_seurat_obj)
## Warning: Transformation introduced infinite values in continuous x-axis
## Warning: Removed 2035 rows containing missing values (`geom_point()`).

3. Data Scaling

Preparation for PCA. Scaling enables gene-to-gene comparison. By default, scaling is performed only on HVGs to reduce computational load. Each gene is transformed to have a mean of 0 and a standard deviation of 1.

norm_seurat_obj <- ScaleData(norm_seurat_obj)

4. Principal Component Analysis (PCA)

Dimensionality reduction is performed to convert high-dimensional scRNA-seq data into a lower-dimensional representation while preserving its essential features.

norm_seurat_obj <- RunPCA(norm_seurat_obj)

5. Data Integration

Integration of datasets across experimental batches, donors, or conditions is a crucial step in scRNA-seq workflows. Seurat V5 introduces the IntegrateLayers function, which supports multiple integration methods.

Canonical Correlation Analysis (CCA) Integration
integ_norm_seurat_obj <- IntegrateLayers(
  object = norm_seurat_obj,
  method = CCAIntegration,
  orig.reduction = "pca",
  new.reduction = "integrated.cca",
  verbose = FALSE
)

6. k-Nearest Neighbor (kNN) Graph Construction

Preparation for clustering. The reduction argument specifies the dimensionality reduction result to be used.

integ_norm_seurat_obj <- FindNeighbors(integ_norm_seurat_obj, 
                                       reduction = "integrated.cca",
                                       dims = 1:30)

7. Clustering

Cells are classified based on their gene expression profiles. The resolution parameter controls cluster granularity: higher values lead to more clusters, lower values lead to fewer clusters.

integ_norm_seurat_obj <- FindClusters(integ_norm_seurat_obj, cluster.name = "cca_clusters")
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 13207
## Number of edges: 536785
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9061
## Number of communities: 22
## Elapsed time: 2 seconds

8. UMAP

A non-linear dimensionality reduction method that visualizes data structure and patterns in two or three dimensions.

integ_norm_seurat_obj <- RunUMAP(integ_norm_seurat_obj, 
                                 reduction = "integrated.cca",
                                 dims = 1:30,
                                 reduction.name = "umap.cca")
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session

Visualizing UMAP results:

DimPlot(
  integ_norm_seurat_obj,
  reduction = "umap.cca",
  group.by = c("condition", "cca_clusters")
)

Merging Layers

integ_norm_seurat_obj <- JoinLayers(integ_norm_seurat_obj)

Save Processed Seurat Object

saveRDS(integ_norm_seurat_obj, "./output/03_data_preprocessing_and_integration/obj/integ_norm_seurat_obj.RDS")