Step 3. Quality control

Note

Consideration for the analysis: Steps 3b and 3c can be done either in the interface or with the command-line code in “merging.batch.harmony.R” in the project directory.
Prerequisite: Complete Step 1 (formatting) so that the two files (matrix.csv and meta.data.csv) are in the 1_SeuratQC folder.

3a. Quality control of a Seurat object

This section describes the Seurat QC process for human samples with 10x Genomics.

Upload both the meta-data.csv file and the matrix file (.csv.gz, .csv or .h5).
Add the file name, which will be identical to the previous file name.
Select the data origin, as it changes the cut-off settings.
- 10x Genomics: cells with >2500 features (transcripts per cell) are deemed to be doublets.
- BD Rhapsody Immune panel: cells with >150 features are deemed to be doublets.
Check that the file was uploaded.
Look at the first violin plot in the “before” tab to check the cut-offs for MtDNA and rRNA.
- MtDNA: the standard cut-off is less than 20%.
- rRNA: the standard cut-off is greater than 5%. If no rRNA genes are identified, set the value to 0 or the program will crash because no cells remain.
Once the settings have been configured, hit update. This will add the File name.
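The effect of these cut-offs can be previewed on a small table before committing to them. The sketch below is illustrative only: the column names follow common Seurat conventions, `filter_cells()` is a hypothetical helper rather than a STEGO.R function, and inclusive comparisons are assumed because the text does not say whether the limits are strict.

```r
# Toy per-cell QC table; column names are assumptions, values are made up.
qc <- data.frame(
  cell         = paste0("cell_", 1:6),
  nFeature_RNA = c(150, 800, 2400, 7000, 1200, 900),
  percent.mt   = c(5, 8, 12, 4, 35, 10),
  percent.rb   = c(20, 15, 9, 30, 12, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep cells inside the feature window, at or below the
# mitochondrial ceiling and at or above the ribosomal floor.
filter_cells <- function(qc, features.min = 200, features.max = 6000,
                         percent.mt = 20, percent.rb = 5) {
  keep <- qc$nFeature_RNA >= features.min &
    qc$nFeature_RNA <= features.max &
    qc$percent.mt <= percent.mt &
    qc$percent.rb >= percent.rb
  qc[keep, , drop = FALSE]
}

kept <- filter_cells(qc)
print(kept$cell)

# If no ribosomal genes are detected, every cell has percent.rb == 0,
# so the floor has to be lowered to 0 for any cell to survive.
qc_no_rb <- transform(qc, percent.rb = 0)
print(nrow(filter_cells(qc_no_rb)))
print(nrow(filter_cells(qc_no_rb, percent.rb = 0)))
```

The last two calls show why the ribosomal value needs to be 0 when no rRNA genes are present: with the default floor no cell passes the filter.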

The user can then inspect the top variable features, elbow plot and heatmaps. The latter two plots help to set the number of dimensions to use for the UMAP dimensional reduction (default: 15). The resolutions are used for the unsupervised clustering. Note: we recommend using scGate for the annotation rather than the unsupervised clusters, as unsupervised clustering is not suitable for identifying subtle differences in T cell sub-populations.
In the side panel, hit the “Run clustering” button to run the dimensionality reduction.
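The information in an elbow plot can also be read numerically. The following is a minimal base R sketch on simulated data, not what the app computes internally; the 90% rule used to pick `n_dims` is an arbitrary assumption for illustration.

```r
set.seed(1)

# Simulated matrix: 200 "cells" x 30 "features" driven by three latent axes.
n_cells <- 200
latent  <- matrix(rnorm(n_cells * 3, sd = 5), ncol = 3)
loading <- matrix(rnorm(3 * 30), nrow = 3)
expr    <- latent %*% loading + matrix(rnorm(n_cells * 30), ncol = 30)

pca     <- prcomp(expr, center = TRUE, scale. = FALSE)
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)  # % of variance per component
cum_var <- cumsum(pct_var)

# Assumed rule of thumb: smallest number of components reaching 90% of variance.
n_dims <- which(cum_var >= 90)[1]

print(round(head(pct_var, 5), 1))
print(n_dims)
```

On real data the curve usually flattens more gradually, so the plots remain the better guide when moving away from the default of 15.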

In “Add meta data”, first check that the file has uploaded correctly.

Your file has now completed the Seurat QC and has had the TCR sequences added to the meta-data. Click “Download Seurat” and save the object to the “2_SCobj” folder.

Command-line equivalent

Once the user has identified the parameters for filtering, we recommend using the command-line function that automates this process.

require(STEGO.R)

###### Step 3a. Automated file filtering ------
# default parameters: folder = "1_SeuratQC", dataset_type = "10x", species = "hs", features.min = 200, features.max = 6000, percent.mt = 20, percent.rb = 5, dimension_sc = 15, resolution_sc = 1, limit_to_TCR_GEx = F, save_plots = T

# run the defaults for 10x, based on Seurat
automated_sc_filtering(dataset_type = "10x")

# BD Rhapsody (immune panel) -----
automated_sc_filtering(dataset_type = "BD_rap", features.min = 45, features.max = 160, percent.mt = 0, percent.rb = 0)

3b. Merging multiple Seurat objects and batch correction.

This section explains how to merge multiple .rds objects and correct for batch effects.

Merging

Upload all the .rds files from the “2_SCobj” folder.

Download the merged.rds object.

Note: For repeated file merging and annotating, the program window may disappear, so we recommend merging and annotating using the command-line functions.

require(STEGO.R)
library(Seurat)  # provides JoinLayers()

###### merging Seurat objects ------
# Check that you are in the correct working directory with your RDS files
merging_multi_SeuratRDS(set_directory = "2_scObj/", merge_RDS = FALSE, pattern_RDS = ".rds$")

# once that is checked, switch merge_RDS to TRUE
sc_merge <- merging_multi_SeuratRDS(set_directory = "2_scObj/", merge_RDS = TRUE, pattern_RDS = ".rds$")

# join the separate layers created by Seurat V5
sc <- JoinLayers(sc_merge, assay = "RNA")

# save the merged, layer-joined object - it no longer stores the scaled data and PCA because of the merging process
saveRDS(sc, "2_scObj/sc_merge.rds")
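The value given to `pattern_RDS` looks like a regular expression. Assuming it is used the way base R's `list.files()` uses `pattern` (an assumption, since the function's internals are not shown here), the sketch below previews which files a pattern would pick up. The folder and file names are made up.

```r
# Scratch folder with a mix of files (all names are hypothetical).
dir_tmp <- file.path(tempdir(), "2_scObj_demo")
dir.create(dir_tmp, showWarnings = FALSE)
invisible(file.create(file.path(
  dir_tmp, c("sampleA.rds", "sampleB.rds", "notes.txt", "old_rds")
)))

# ".rds$": the unescaped dot matches any character, so "old_rds" also matches.
loose <- list.files(dir_tmp, pattern = ".rds$")

# "\\.rds$": only names that end in a literal ".rds".
strict <- list.files(dir_tmp, pattern = "\\.rds$")

print(loose)
print(strict)
```

Running the dry run with `merge_RDS = FALSE` first serves the same purpose: it shows what will be merged before any time is spent on the merge itself.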

Batch correction

Wait for the files to finish uploading before proceeding.
Update the Project name, e.g., BreastCancer.
Hit the buttons in each tab in the following order:
- Run VariableFeatures
- Run Scale
- Run PCA
- Run Harmony (the Harmony R package is used for the batch correction)
- Run Dimensional Reduction
Once completed, click “Download merged Seurat”.

Note

* Visit the Harmony publication for more detail.

Note: The number of files and the total number of cells affect the speed of merging.

If you have more than a few files to merge and annotate, we recommend using the following code:

require(STEGO.R)

sc <- readRDS("2_scObj/sc_merge.rds")

## perform the harmony batch correction ------
sc <- harmony_batch_correction_1_variableFeatures(file = sc)
sc <- harmony_batch_correction_2_Scaling(file = sc, Seruat_version = "V5")
sc <- harmony_batch_correction_3_PC(file = sc)
sc <- harmony_batch_correction_4_Harmony(file = sc)
saveRDS(sc,"2_scObj/sc_harmony.rds")

3c. Annotating the Seurat object

This section describes how to annotate the files using the “default” and/or “custom” modules with scGate. If you have large files, we recommend using scripts rather than the interface.

Upload the merged file to commence the annotation process.

This is what the window will print once the annotation is completed.

The annotations are added to the end of the meta data table in the Seurat object.

require(STEGO.R)

#### annotating Seurat object -----
sc <- readRDS("2_scObj/sc_harmony.rds")

sc <- scGate_annotating(
  file = sc,
  TcellFunction = TRUE,
  generic = TRUE,
  immune_checkpoint = TRUE,
  senescence = TRUE,
  cycling = TRUE,
  Th1_cytokines = TRUE,
  TCRseq = TRUE,
  threshold = 0.25, # change to 0.55 if you use the focused BD Rhapsody immune panel
  reductionType = "harmony",
  chunk_size = 50000
)

sc@meta.data$Cell_Index_old <- sc@meta.data$Cell_Index
sc@meta.data$Cell_Index <- rownames(sc@meta.data)

saveRDS(sc,"3_analysis/sc_anno.rds")
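After annotation it is worth checking that the cell index is unique and that the new labels look sensible. The sketch below uses a toy data frame as a stand-in for `sc@meta.data`; apart from `Cell_Index`, every column name and label is hypothetical.

```r
# Toy stand-in for sc@meta.data (row names play the role of cell barcodes).
meta <- data.frame(
  Cell_Index = c("1", "2", "1", "2"),   # repeats across samples after merging
  orig.ident = c("S1", "S1", "S2", "S2"),
  demo_Tcell = c("CD4_Naive", "CD8ab_Eff", "CD4_Treg", "CD4_Naive"),
  row.names  = c("S1_AAAC", "S1_AAAG", "S2_AAAC", "S2_AAAT"),
  stringsAsFactors = FALSE
)

# Same two steps as in the script: keep the old index, then reuse the row
# names, which are unique by construction.
meta$Cell_Index_old <- meta$Cell_Index
meta$Cell_Index     <- rownames(meta)

print(anyDuplicated(meta$Cell_Index_old) > 0)  # old index had repeats
print(anyDuplicated(meta$Cell_Index) > 0)      # new index does not

# Cells per sample and label.
counts <- table(meta$orig.ident, meta$demo_Tcell)
print(counts)
```

The same `table()` call works on any annotation column of the real object once its name is known, for example from `colnames(sc@meta.data)`.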

Default annotation models with scGate

The scGate threshold was set to 0.2 for full capture and 0.55 for the immune panel (<400 transcripts).

Human
- TcellFunction (see Annotation table)
- immune checkpoint
  
  Based on four markers: PDCD1 (PD-1), TIGIT, (LAG-3) and H
- Senescence
  
  Activated (B3GAT1+CD28+)
  
  activated-early/senescent (B3GAT1-CD28-)
  
  non-activated/early-activated (B3GAT1-CD28+)
  
  Terminally differentiated (B3GAT1+CD28-)
- Cycling (e.g., cell division)
  
  MKI67+TOP2A+
  
  This is the same as cell division
- Th1_cytokines
  
  combination of IFNG and TNF
- TCRseq (meta.data TCR-seq)
  
  MAIT cells (TRAV1-2 TRAJ33/12/20)
  
  iNKT cells ()
  
  possible CD1b/c restricted
  
  ab T cells (excludes the above categories)
  
  gd T cells
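
The TCR-seq based labels are rule-based. The sketch below illustrates only the MAIT rule given above (TRAV1-2 paired with TRAJ33, TRAJ12 or TRAJ20) plus the ab/gd split. The column names are hypothetical, and the iNKT and CD1b/c rules are left out because they are not specified here.

```r
# Toy TCR table; column names (v_gene_A, j_gene_A, chain) are assumptions.
tcr <- data.frame(
  v_gene_A = c("TRAV1-2", "TRAV1-2", "TRAV12-1", NA),
  j_gene_A = c("TRAJ33",  "TRAJ9",   "TRAJ20",   NA),
  chain    = c("ab", "ab", "ab", "gd"),
  stringsAsFactors = FALSE
)

mait_j <- c("TRAJ33", "TRAJ12", "TRAJ20")

# %in% treats NA as "no match", so cells without an alpha chain are not
# flagged as MAIT.
is_mait <- tcr$v_gene_A %in% "TRAV1-2" & tcr$j_gene_A %in% mait_j

tcr$TCR_class <- ifelse(
  tcr$chain == "gd", "gd T cells",
  ifelse(is_mait, "MAIT", "ab T cells")
)

print(tcr$TCR_class)
```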
Mouse

Note: Under development.

Functional T cell annotations

Annotation Table

Cell type	Sub-classification	Transcriptional markers
CD4	CD4	CD8A-, CD8B-, CD4;CD40LG, FOXP3-, RORC-, CCR4-, IL21-, CCR10-, IRF4-, CXCR3-, Naive, KLRB1-, FTH1-
	CD161	CD8A-, CD8B-, CD4;CD40LG, FOXP3-, RORC-, CCR4-, IL21-, CCR10-, IRF4-, CXCR3-, Naive, KLRB1, IL32-
	CD161.IL32	CD8A-, CD8B-, CD4;CD40LG, FOXP3-, RORC-, CCR4-, IL21-, CCR10-, IRF4-, CXCR3-, Naive, KLRB1, IL32
	FTH1	CD8A-, CD8B-, CD4;CD40LG, FOXP3-, RORC-, CCR4-, IL21-, CCR10-, IRF4-, CXCR3-, Naive, KLRB1-, FTH1
	Naive	CD8A-, CD8B-, CD4;CD40LG, FOXP3-, RORC-, CCR4-, IL21-, CCR10-, IRF4-, CXCR3-, Naive
	Tfh	CD8A-, CD8B-, CD4;CD40LG, FOXP3-, RORC-, CCR4-, IL21
	Th1	CD8A-, CD8B-, CD4;CD40LG, FOXP3-, RORC-, CCR4-, IL21-, CCR10-, IRF4-, CXCR3
	Th17	CD8A-, CD8B-, CD4;CD40LG, FOXP3-, RORC
	Th2	CD8A-, CD8B-, CD4;CD40LG, FOXP3-, RORC-, CCR4
	Th22	CD8A-, CD8B-, CD4;CD40LG, FOXP3-, RORC-, CCR4-, IL21-, CCR10
	Th9	CD8A-, CD8B-, CD4;CD40LG, FOXP3-, RORC-, CCR4-, IL21-, CCR10-, IRF4
	Treg	CD8A-, CD8B-, CD4;CD40LG, FOXP3
CD8a	CD8aa	CD8A, CD8B-, Cyto-, FOXP3-, KIR2DL1-, Naive-, FTH1-
	Eff	CD8A, CD8B-, Cyto, KIR2DL1-
	Eff.KIR	CD8A, CD8B-, Cyto, KIR2DL1
	FTH1	CD8A, CD8B-, Cyto-, FOXP3-, KIR2DL1-, Naive-, FTH1
	KIR	CD8A, CD8B-, Cyto-, FOXP3-, KIR2DL1
	Naive	CD8A, CD8B-, Cyto-, FOXP3-, KIR2DL1-, Naive
	Treg	CD8A, CD8B-, Cyto-, FOXP3
CD8ab	CD8ab	CD8A, CD8B, Cyto-, FOXP3-, KIR2DL1-, RORC-, CCR4-, CCR10-, IRF4-, CXCR3-, Naive-, KLRK1-
	Eff	CD8A, CD8B, Cyto, RORC-, CCR4-, CCR10-, IRF4-, CXCR3-
	Eff.Tc1	CD8A, CD8B, Cyto, RORC-, CCR4-, CCR10-, IRF4-, CXCR3
	Eff.Tc2	CD8A, CD8B, Cyto, RORC-, CCR4
	Eff.Tc22	CD8A, CD8B, Cyto, RORC-, CCR4-, CCR10
	Eff.Tc9	CD8A, CD8B, Cyto, RORC-, CCR4-, CCR10-, IRF4
	Eff.Tc17	CD8A, CD8B, Cyto, RORC
	KIR	CD8A, CD8B, Cyto-, FOXP3-, KIR2DL1
	Naive	CD8A, CD8B, Cyto-, FOXP3-, KIR2DL1-, RORC-, CCR4-, CCR10-, IRF4-, CXCR3-, Naive
	NKG2D	CD8A, CD8B, Cyto-, FOXP3-, KIR2DL1-, RORC-, CCR4-, CCR10-, IRF4-, CXCR3-, Naive-, KLRK1
	Tc1	CD8A, CD8B, Cyto-, FOXP3-, KIR2DL1-, RORC-, CCR4-, CCR10-, IRF4-, CXCR3
	Tc17	CD8A, CD8B, Cyto-, FOXP3-, KIR2DL1-, RORC
	Tc2	CD8A, CD8B, Cyto-, FOXP3-, KIR2DL1-, RORC-, CCR4
	Tc22	CD8A, CD8B, Cyto-, FOXP3-, KIR2DL1-, RORC-, CCR4-, CCR10
	Tc9	CD8A, CD8B, Cyto-, FOXP3-, KIR2DL1-, RORC-, CCR4-, CCR10-, IRF4
	Treg	CD8A, CD8B, Cyto-, FOXP3
CD8b	CD8b	CD8A-, CD8B
DN	DN	CD8A-, CD8B-, CD4-, Cyto-, FOXP3-, RORC-, CCR4-, IL21-, CCR10-, IRF4-, CXCR3-, Naive-, FTH1-
	Eff	CD8A-, CD8B-, CD4-, Cyto
	FTH1	CD8A-, CD8B-, CD4-, Cyto-, FOXP3-, RORC-, CCR4-, IL21-, CCR10-, IRF4-, CXCR3-, Naive-, FTH1
	Naive	CD8A-, CD8B-, CD4-, Cyto-, FOXP3-, RORC-, CCR4-, IL21-, CCR10-, IRF4-, CXCR3-, Naive
	Tfh	CD8A-, CD8B-, CD4-, Cyto-, FOXP3-, RORC-, CCR4-, IL21
	Th1	CD8A-, CD8B-, CD4-, Cyto-, FOXP3-, RORC-, CCR4-, IL21-, CCR10-, IRF4-, CXCR3
	Th17	CD8A-, CD8B-, CD4-, Cyto-, FOXP3-, RORC
	Th2	CD8A-, CD8B-, CD4-, Cyto-, FOXP3-, RORC-, CCR4
	Th22	CD8A-, CD8B-, CD4-, Cyto-, FOXP3-, RORC-, CCR4-, IL21-, CCR10
	Th9	CD8A-, CD8B-, CD4-, Cyto-, FOXP3-, RORC-, CCR4-, IL21-, CCR10-, IRF4
	Treg	CD8A-, CD8B-, CD4-, Cyto-, FOXP3

Annotation Details

Cyto = GZMA, GZMB, PRF1, GNLY, NKG7; Naive = CCR7, SELL; DN = double negative; Th = T helper; Tfh = follicular T helper; Tc = cytotoxic T cell; Treg = regulatory T cell

Simple T cell annotations

Simple Annotation Table

Cell type	Sub-classification	Panel
CD4	Treg	CD8A-, CD8B-, FOXP3, RORC-, CCR6-, CXCR6-
	Treg.IL17	CD8A-, CD8B-, FOXP3, RORC, CCR6, CXCR6
	Tcm-Tn	CD8A-, CD8B-, FOXP3-, RORC-, CCR6-, CXCR6-, GZMB-, IFNG-, GNLY-, NKG7-, CXCL13-, IL21-, IL7R, TCF7, CCR7
	Tfh	CD8A-, CD8B-, FOXP3-, RORC-, CCR6-, CXCR6-, GZMB-, IFNG-, GNLY-, NKG7-, CXCL13, IL21
	Th1	CD8A-, CD8B-, FOXP3-, RORC-, CCR6-, CXCR6-, GZMB, IFNG, GNLY, NKG7
	Th17	CD8A-, CD8B-, FOXP3-, RORC, CCR6, CXCR6
	other	CD8A-, CD8B-, CD4, CD40LG, FOXP3-, RORC-, CCR6-, CXCR6-, GZMB-, IFNG-, GNLY-, NKG7-, CXCL13-, IL21-, IL7R-, TCF7-, CCR7-
CD8	Tcm-Tem	CD8A, CD8B, CD4-, CD40LG-, ITGAE-, GZMK
	Tcm-Tn	CD8A, CD8B, CD4-, CD40LG-, ITGAE-, GZMK-, GZMB-, PRF1-, GNLY-, SELL
	Teff	CD8A, CD8B, CD4-, CD40LG-, ITGAE-, GZMK-, GZMB, PRF1, GNLY
	Teff.TS	CD8A, CD8B, CD4-, CD40LG-, ITGAE, ENTPD1
	Trm.Teff	CD8A, CD8B, CD4-, CD40LG-, ITGAE, ENTPD1-, CD160-
	IEL	CD8A, CD8B, CD4-, CD40LG-, ITGAE, ENTPD1-, CD160
	other	CD8A, CD8B, CD4-, CD40LG-, ITGAE-, GZMK-, GZMB-, PRF1-, GNLY-, SELL-
DN	DN	CD8A-, CD8B-, CD4-, CD40LG-, FOXP3-, RORC-, CCR6-, CXCR6-, GZMB-, IFNG-, GNLY-, NKG7-, CXCL13-, IL21-, IL7R-, TCF7-, CCR7-
DP	DP	CD8A, CD8B, CD4, CD40LG

Cyto = GZMA, GZMB, PRF1, GNLY, NKG7; Naive = CCR7, SELL; DN = double negative; Th = T helper; Tfh = follicular T helper; Tc = cytotoxic T cell; Treg = regulatory T cell

Creating custom annotation strategies

The user can add additional annotations beyond the default.

There are nine custom databases that a user can amend. If needed, these can be added to an already annotated file.

Design your custom marker set

Check that the cells express the markers under the “marker check” tab.

To run, click the ‘View Feature plot’

Design the panel to add into the nine custom databases in custom_db folder

Do not rename the master file. However, the file will need to be altered to set the markers. The user will need to define the name as well as the signature (gene list). If multiple genes are used, separate them with ; as per the scGate documentation.

The user can alter the part of the file name in front of “_scGate_Model.tsv”; the suffix itself must stay unchanged, as it is used to find each type of annotation.
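
A model file can also be drafted in R rather than edited by hand. The sketch below assumes the usual scGate model layout with the columns `levels`, `use_as`, `name` and `signature`; the prefix `myPanel` and the marker choice are hypothetical, and the file is written to a temporary folder rather than to custom_db.

```r
# Two-level model: level 1 asks for CD4 or CD40LG, level 2 excludes FOXP3.
model <- data.frame(
  levels    = c("level1", "level2"),
  use_as    = c("positive", "negative"),
  name      = c("CD4T", "Treg"),
  signature = c("CD4;CD40LG", "FOXP3"),
  stringsAsFactors = FALSE
)

# Only the "_scGate_Model.tsv" suffix comes from the text; the prefix is made up.
out_file <- file.path(tempdir(), "myPanel_scGate_Model.tsv")
write.table(model, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to confirm that the layout survives the round trip.
check <- read.delim(out_file, stringsAsFactors = FALSE)
print(check)
```

Comparing the result with one of the shipped files in custom_db before using it is advisable, since the exact columns expected by the app are not listed here.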

To use scGate effectively, note that the genes within a set are read as OR (any one of the markers needs to be present), while the levels equate to AND, i.e., all levels need to be met. Each level then needs to be defined as positive or negative under the use_as column.
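
The OR/AND logic can be illustrated with a deliberately simplified sketch. It works on TRUE/FALSE detection calls, whereas scGate scores signatures against a threshold, so this shows the logic only; `passes_model()` is a hypothetical helper, not an scGate function.

```r
# Toy detection matrix (cells x genes); TRUE means the gene was detected.
detected <- matrix(
  c(TRUE,  FALSE, FALSE,   # cell1: CD4 only
    FALSE, TRUE,  TRUE,    # cell2: CD40LG and FOXP3
    FALSE, FALSE, FALSE),  # cell3: nothing detected
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("cell", 1:3), c("CD4", "CD40LG", "FOXP3"))
)

model <- data.frame(
  levels    = c("level1", "level2"),
  use_as    = c("positive", "negative"),
  signature = c("CD4;CD40LG", "FOXP3"),
  stringsAsFactors = FALSE
)

passes_model <- function(detected, model) {
  pass <- rep(TRUE, nrow(detected))
  for (i in seq_len(nrow(model))) {
    genes   <- strsplit(model$signature[i], ";", fixed = TRUE)[[1]]
    any_hit <- rowSums(detected[, genes, drop = FALSE]) > 0  # OR within a set
    ok      <- if (model$use_as[i] == "positive") any_hit else !any_hit
    pass    <- pass & ok                                      # AND across levels
  }
  setNames(pass, rownames(detected))
}

print(passes_model(detected, model))
```

Here cell1 passes because it has one of the two level 1 markers and lacks FOXP3, cell2 fails at the negative level, and cell3 fails at level 1.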

Use a fresh run of the app so that the new database is read in; otherwise the default will be used.

Run just the custom annotation needed.

Check that the signal overlaps with the gene expression, as per the feature plot. Once this is what you expect, download the object.

3d. Removing samples (optional)

This section allows the user to remove problematic cells based on any of the information in the meta data. This could be used to remove problematic samples post QC, for example because of poor TCR-seq coverage. You could also limit the analysis to cells with both the GEx and TCR-seq present.
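
The same kind of filtering can be sketched on a meta-data table in base R. All column names below are hypothetical stand-ins for whatever the meta data actually contains.

```r
# Toy meta data; row names play the role of cell barcodes.
meta <- data.frame(
  Sample_Name = c("S1", "S1", "S2", "S2", "S3"),
  has_GEx     = c(TRUE, TRUE, TRUE, FALSE, TRUE),
  cdr3_demo   = c("CASSL", NA, "CASSP", "CASRG", NA),  # made-up TCR column
  row.names   = paste0("cell", 1:5),
  stringsAsFactors = FALSE
)

# Keep cells that have both gene expression and a TCR sequence.
paired <- meta[meta$has_GEx & !is.na(meta$cdr3_demo), , drop = FALSE]

# Drop a whole sample judged problematic after QC.
no_s2 <- meta[meta$Sample_Name != "S2", , drop = FALSE]

print(rownames(paired))
print(rownames(no_s2))
```

With a Seurat object, the retained row names could then be passed on, for instance via `subset(sc, cells = rownames(paired))`.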

Additionally, this step can be used to extract certain samples. For instance, we extracted the

Under development: Downsampling. This section will limit the data to an even number of cells per annotation model and/or dataset origin, so that it can be added as background data for smaller datasets to help with the FindMarker statistic.
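
Until that feature is available, the idea of even downsampling can be sketched in base R. This is not the package's implementation; `downsample_even()` is a hypothetical helper and the labels are made up.

```r
set.seed(42)

meta <- data.frame(
  cell  = paste0("cell", 1:12),
  label = rep(c("CD4", "CD8", "DN"), times = c(6, 4, 2)),
  stringsAsFactors = FALSE
)

# Sample at most n_per_group cells from each group; smaller groups are kept whole.
downsample_even <- function(cells, groups, n_per_group) {
  picked <- lapply(split(cells, groups), function(x) {
    x[sample.int(length(x), min(length(x), n_per_group))]
  })
  unlist(picked, use.names = FALSE)
}

keep   <- downsample_even(meta$cell, meta$label, n_per_group = 3)
counts <- table(meta$label[meta$cell %in% keep])
print(counts)
```

Indexing with `sample.int()` avoids the pitfall of `sample(x, ...)` treating a single numeric value as a range.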

3e. Re-formatting meta-data

In instances where the user has already done the QC, they will need to reformat their meta-data to make it compatible with STEGO.R formatting.

The currently available formatting is for scRepertoire output that has:

TCR1 (alpha v_gene j_gene) and TCR1_cdr3
TCR2 (beta v_gene j_gene) and TCR2_cdr3
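
As an illustration of what such reformatting involves, the sketch below splits a combined gene string into separate V and J columns. It assumes the gene calls in `TCR1` are dot-separated, which should be checked against the scRepertoire version in use. The output column names and the CDR3 sequences are made up, and `nth_field()` is a hypothetical helper.

```r
# Toy rows in an scRepertoire-like layout (format is an assumption).
contigs <- data.frame(
  TCR1      = c("TRAV1-2.TRAJ33.TRAC", "TRAV12-1.TRAJ20.TRAC"),
  TCR1_cdr3 = c("CAVRDSNYQLIW", "CVVNMDSSYKLIF"),
  stringsAsFactors = FALSE
)

# Take the n-th dot-separated field of each string.
nth_field <- function(x, n) {
  vapply(strsplit(x, ".", fixed = TRUE), function(p) p[n], character(1))
}

contigs$TCR1_v_gene <- nth_field(contigs$TCR1, 1)
contigs$TCR1_j_gene <- nth_field(contigs$TCR1, 2)

print(contigs[, c("TCR1_v_gene", "TCR1_j_gene", "TCR1_cdr3")])
```

The same helper applies to `TCR2`, keeping in mind that beta chains may carry an additional D gene field between V and J.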