Step 1. Formatting

Note

Consideration for the analysis.

The pipeline was built for scRNA-seq with scTCR-seq datasets.

QC documentation is being actively updated.

Prerequisite

Install STEGO. Copy all your raw files to the 0_Raw_files folder within your project.

Running STEGO.R

Now the the R environment is set up and the STEGO.R and its dependencies are installed, we can now run the application.

Run the following lines in R. A window will opened that runs the STEGO.R shiny R application.

require(STEGO.R)

runSTEGO()

You are now ready to process your scRNA-seq with scTCR-seq data!

Manual processing

Formatting the 10x Genomics, BD Rhapsody or Array to create the necessary documents for Seurat QC process, ClusTCR2, TCRex and TCR_Explore.

While this process was initially developed for TCRs, we have also included the capacity for pairing the BCR sequences.

Table 1. File to upload in the respective section.

Source

Inputs

10x Genomics

  • Features.csv.gz: Gene ID (SYMBOL)

  • Barcode.csv.gz: Cell ID (AGGATTT_1)

  • Matrix.mtx.gz: Gene expression.

  • Contig file (AIRR) format: contains the V(D)J sequences

BD Rhapsody

  • Features.csv.gz: Gene ID (SYMBOL)

  • Barcode.csv.gz: Cell ID (numeric)

  • Matrix.mtx.gz: expression captured

  • Sample_Tag.csv: multiplexing annotations

  • Contig file (AIRR) format: contains the V(D)J sequences

10x Genomics

There are several file formats that are compatible with the STEGO.R QC process. These includes the standard cell ranger output of barcode, features, matrix files as well as the separate contig file.

Under development: multi-TCR (requires unfiltered contig file)

Alternative text

BD Rhapsody

There are two format outputs of BD Rhapsody aligment of either a cellxgene (.csv.gz) or the barcode, features and matrix file. Unlike 10x Genomics, there is also an extra file called “Sample Tags”, which contains the multi-plex ID names. The “Sample Tags” file is required. However, if this file is not created (one sample in the experiment), it can be created under the “Create Sample Tags file”.

Additionally there are several formats for the TCR contig file: paired dominant, AIRR dominant (unpaired), AIRR unfilted. For the latter two file formates we include to filter to keep only the ‘paired’ and if BCR was present as well.

Alternative text

Array

Note

The **TCR_Explore** tool is designed for the interrogation of the TCR repertoire independent of gene expression data. For more information, visit TCR_Explore.

File outputs and storage for Step 1.

Upload the documents to the required sections depending on the technology and files available. Repeat for each of the samples within your project.

  1. Upload the files according to Table 1.

  2. Check that the files have uploaded in the “Uploaded data” tab.

  3. Add File Name, this will be added to the “orig.ident” and “Sample_Name”
    • (10x Genomics and Array, as this is added from the “Sample Tags” in BD Rhapsody) column and used through out the process. This name needs to be unqiue to the file.

    • If, at a latter point it needs to be updated, this can be done with the “Updated_label.csv”, located in 3_Analysis folder

Download to each of the 1_ folders e.g.,

  1. Download the TCRex output (containing functional Beta chains) to “1_TCRex” folder

  2. Download both the “meta-data” and “Matrix” in the SeuratQC into the “1_SeuratQC” folder

  3. Two files need to be downloaded per sample under the “ClusTCR” to the “1_ClusTCR” folder. They will have the prefix of AG_ and BD_ (Version 1.5)

  4. Download the TCR_Explore file “1_TCR_Explore” folder

or

  1. Download all files in order of TCRex, Seurat (matrix and meta.data), ClusTCR2 and TCR_Explore to the download folder.

  2. Move all files to the respective folders in the Project directory according 1_

Automated Step 1.

Currently available for 10x genomics

Within the raw data folder should be subfolders for each unique sample in the dataset. These subfolders should contain at least the barcodes, features, matrix and filtered_contig files.

  • 0_rawfiles
    • treatment1_SampleID1
      • barcode

      • features

      • Matrix

      • TCR files (needs to have the word contig to be found)

    • Sample1_treatment2

    • Sample2_treatment1

Alternative text

Use the preprocessing.R found in the R folder of the project_director

Alternative text