【scRNA-seq】 How to Start scRNA-seq Analysis Using Seurat (Part 2)【Seurat】

【scRNA-seq】 How to Start scRNA-seq Analysis Using Seurat (Part 2)【Seurat】

This article is the second part of our hands-on tutorial with Seurat. If you haven’t worked through the first part, please start there.

In this session, we’ll engage in a hands-on experience with Seurat, an R package for scRNA-seq, and set the stage for initiating scRNA-seq. Since this post is based on Seurat’s hands-on guide, you’ll be able to grasp an overview of its usage. I encourage you to dive in!

Tested Environment

macOS Monterey (12.4), Quad-Core Intel Core i7, Memory 32GB

We’ll proceed based on the assumption that you’ve executed the code introduced in the first half. If you haven’t done so, attempting the latter half of the hands-on may result in errors, so please be cautious.

install.packages("dplyr")
install.packages("Seurat")
install.packages("patchwork")

library(dplyr)
library(Seurat)
library(patchwork)

pbmc.data <- Read10X(data.dir = "./filtered_gene_bc_matrices/hg19/")
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)
pbmc

pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")

VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)

plot1 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "percent.mt")
plot2 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
plot1 + plot2

pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)

Before Executing PCA

Before employing dimensionality reduction techniques like PCA, we’ll start with a preprocessing step using linear transformation. Please execute the following command.

all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes)

The command to run PCA on the scaled data is as follows.

pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))

If you obtain output similar to the following for the various components, it was successful.

PC_ 1 
Positive:  
CST3, TYROBP, LST1, AIF1, FTL, FTH1, LYZ, FCN1, S100A9, TYMP 	   
FCER1G, CFD, LGALS1, S100A8, CTSS, LGALS2, SERPINA1, IFITM3, SPI1, CFP 	   
PSAP, IFI30, SAT1, COTL1, S100A11, NPC2, GRN, LGALS3, GSTP1, PYCARD 
Negative:  
MALAT1, LTB, IL32, IL7R, CD2, B2M, ACAP1, CD27, STK17A, CTSW 	   
CD247, GIMAP5, AQP3, CCL5, SELL, TRAF3IP3, GZMA, MAL, CST7, ITM2A 	   
MYC, GIMAP7, HOPX, BEX2, LDLRAP1, GZMK, ETS1, ZAP70, TNFAIP8, RIC3 
PC_ 2 
Positive:  
CD79A, MS4A1, TCL1A, HLA-DQA1, HLA-DQB1, HLA-DRA, LINC00926, CD79B, HLA-DRB1, CD74 	   
HLA-DMA, HLA-DPB1, HLA-DQA2, CD37, HLA-DRB5, HLA-DMB, HLA-DPA1, FCRLA, HVCN1, LTB 	   
BLNK, P2RX5, IGLL5, IRF8, SWAP70, ARHGAP24, FCGR2B, SMIM14, PPP1R14A, C16orf74 
Negative:  
NKG7, PRF1, CST7, GZMB, GZMA, FGFBP2, CTSW, GNLY, B2M, SPON2 	   
CCL4, GZMH, FCGR3A, CCL5, CD247, XCL2, CLIC3, AKR1C3, SRGN, HOPX 	   
TTC38, APMAP, CTSC, S100A4, IGFBP7, ANXA1, ID2, IL32, XCL1, RHOC 
PC_ 3 
Positive:  
(Output continues below...)

The results might appear a bit cluttered, so let’s narrow them down to five with the next command.

print(pbmc[["pca"]], dims = 1:5, nfeatures = 5)

If you achieve an output like the one below, you’ve been successful. Ensure that PC_1 through PC_5 are displayed.

> print(pbmc[["pca"]], dims = 1:5, nfeatures = 5)
PC_ 1
Positive: FTL, FTH1, COTL1, CST3, OAZ1
Negative: MALAT1, IL32, LTB, CCL5, CTSW
PC_ 2
Positive: FTL, TYROBP, S100A8, S100A9, FCN1
Negative: ACTG1, STMN1, TUBA1B, TYMS, ZWINT
PC_ 3
Positive: CD74, HLA-DRA, HLA-DPB1, HLA-DQB1, HLA-DQA1
Negative: PPBP, GNG11, SPARC, PF4, AP001189.4
PC_ 4
Positive: CD74, HLA-DQB1, HLA-DQA1, HLA-DRA, HLA-DQA2
Negative: NKG7, GZMA, GNLY, PRF1, FGFBP2
PC_ 5
Positive: ZWINT, KIAA0101, RRM2, HMGB2, AQP3
Negative: NKG7, CD74, HLA-DQA1, GNLY, SPON2

With this, the preparations for visualizing PCA are complete. Next, we’ll delve into the methods for visualizing the components.

Visualizing PCA

First, let’s determine which genes contribute to each component. Execute the following command.

VizDimLoadings(pbmc, dims = 1:2, reduction = "pca")

You can see contributions starting from the genes with the highest contribution rates.

For a typical scatter plot using PC_1 and PC_2, you’ll employ the DimPlot.

DimPlot(pbmc, reduction = "pca")

The output will be as follows.

With the DimHeatmap function, you can inspect the main components in a heatmap style. The dims argument allows you to control the principal component you want to display. The following is a code to display PC1.

DimHeatmap(pbmc, dims = 1, cells = 500, balanced = TRUE)

Determining the ‘Dimensionality’ of the Dataset

Using JackStrawPlot, you can calculate and visualize the p-values for each principal component.

*Note: Executing this code might take some time.

pbmc <- JackStraw(pbmc, num.replicate = 100)
pbmc <- ScoreJackStraw(pbmc, dims = 1:20)

JackStrawPlot(pbmc, dims = 1:15)

For ranking the contribution of each component, ElbowPlot is available. It’s handy for visualizing how much each component contributes.

ElbowPlot(pbmc)

From the output, you can observe an ‘elbow’ around PC9-10. This indicates that the majority of the true signal is explained by the first 10 PCs.

Cell Clustering (UMAP/tSNE)

We’re now on to cell clustering. Many might associate scRNA-seq analysis with nonlinear dimensionality reduction using UMAP/tSNE. Let’s execute the following command to proceed with UMAP.

# Cell Clustering
pbmc <- FindNeighbors(pbmc, dims = 1:10)
pbmc <- FindClusters(pbmc, resolution = 0.5)

# Executing UMAP
pbmc <- RunUMAP(pbmc, dims = 1:10)

The following is a detailed discussion of each function. It might come off as complex, so feel free to skip over it if you wish. A basic understanding that the module is being optimized for clustering is sufficient.

FindNeighbors Function

This function constructs a KNN graph based on the Euclidean distance in PCA space. It refines the edge weights between any two cells based on shared overlap in their local neighborhood, which is measured using the Jaccard similarity. It takes the previously defined dimensions of the dataset as input (the first 10 PCs).

FindClusters Function

For cell clustering, we then apply modularity optimization techniques like the Louvain algorithm (by default) or SLM. This function iteratively groups cells with the aim of optimizing the standard modularity function. It includes a resolution parameter to set the “granularity” of downstream clustering – a larger value will result in more clusters.

The results can be visualized using the DimPlot.

DimPlot(pbmc, reduction = "umap")

And with that, you’ve successfully conducted a UMAP analysis!

If you’ve reached this far, you’ve essentially grasped the basics of scRNA-seq analysis using Seurat. The hands-on guide introduces further analysis techniques, so if you’re interested, I’d highly recommend giving it a shot.

If you encounter errors even after executing the commands as instructed…

Should you face errors despite following the command instructions, it’s likely that the input value to pbmc is incorrect. Do not panic if errors arise; simply start over with quality checking, normalization, and re-execute the commands in order. By inputting in the correct sequence, you should proceed without errors.

In Conclusion

How did you find the tutorial? Hopefully, you’ve found that using Seurat for scRNA-seq analysis is simpler than anticipated. Next, I’m planning to pen an article on how to actually find and work with scRNA-seq data. Stay tuned!