
# Allelic variation in class I HLA determines CD8+ T cell repertoire shape and cross-reactive memory responses to SARS-CoV-2

## INTRODUCTION

Elicitation of a robust and durable neutralizing antibody response following immunization of large sections of the population with approved SARS-CoV-2 vaccines is limiting viral transmission and decreasing mortality, providing hope that the global threat from the COVID-19 pandemic is diminishing. However, the appearance of new viral variants warrants continued vigilance. A more complete understanding of the underlying cellular mechanisms that regulate host immunity and contribute to long-term protection is required. Infection with SARS-CoV-2 leads to an upper respiratory tract infection, which can be benign or even asymptomatic. If not controlled by the immune response, it can evolve into a lethal pneumonia with immunopathology due to excessive amplification of the innate inflammatory response, complicated by several extra-respiratory manifestations (1). While humoral responses play an important role in immunological control of infection, the generation of effective cellular immunity and expansion of cytotoxic CD8+ memory T cells is also required to eliminate virally infected cells as shown from the earlier SARS-CoV-1 epidemic, even in the absence of seroconversion (2-7).
Several recent studies have focused on the discovery of relevant SARS-CoV-2 epitopes in both CD4+ and CD8+ T cell responses, leveraging in silico predictions, stimulation/expansion with peptide pools (8-18), tetramer binding (19, 20), and analysis of presentation in vitro (21). Collectively, these studies identified a number of immunodominant epitopes derived from across the viral proteome including structural and non-structural proteins in canonical (8-20) and non-canonical open reading frames (21). Interestingly, some of these specificities were also detected in uninfected individuals, suggesting potential cross-reactivity from endemic human coronaviruses (HCoV) to which the population is routinely exposed (22), though a direct connection to pre-existing memory cells has not been established.
The breadth and nature of the cellular immune response to SARS-CoV-2 infection is driven by diversity in both T cell receptor (TCR) repertoire and human leukocyte antigen (HLA) genetics. Mammalian cells express up to six different HLA class I alleles that shape antigen presentation in disease, and allelic diversity has been associated with both disease susceptibility and outcome of viral infections (23, 24). There are divergent reports regarding HLA polymorphism and COVID-19 incidence and severity, although the major genome-wide association studies clearly show no dominant effect of the locus (25-29). Together with genetic influences on HLA-associated antigen presentation, the clonal selection of TCRs that compose an individual’s repertoire contributes to the nature and dynamics of the antiviral response, including cellular cytotoxicity and memory formation. Interestingly, despite a potential TCR diversity of $10^{15}$ (30), several studies have described “public” T cell responses in COVID-19, where complementarity-determining region (CDR) sequences are conserved within and across individuals (18, 31). The extent to which TCR diversity, especially in the context of epitope specificity restricted to HLA, contributes to response is not well understood.

Here, we leverage an assay technology to elucidate, at single-cell resolution, the connection between T cell specificity, HLA variation, conserved features of paired α/β TCR repertoires, and cellular phenotype observed in CD8+ T cell responses to SARS-CoV-2 infection. We profiled 96,909,416 CD8+ T cells ex vivo across 78 samples from acute, convalescent, or unexposed individuals, and identified T cell specificity to 648 epitopes presented by four HLA alleles across the SARS-CoV-2 proteome, few of which are implicated by the current variants of concern. Estimated frequencies of epitope-specific CD8+ T cells observed in convalescent patients had a mean value of 0.01% and maximum around 1% of the total CD8+ T cell population. We observed that TCR repertoires were surprisingly public in nature, though we found a high degree of pre-existing immunity associated with a clonally diverse response to HLA-B*07:02, which can efficiently present homologous epitopes from SARS-CoV-2 and HCoVs. Transcriptomic analysis and functional validation confirmed a central memory phenotype and TCR cross-reactivity in unexposed individuals with HLA-B*07:02. Our data suggest an association between HLA genotype and the CD8+ T cell response to SARS-CoV-2, which may have important implications for understanding herd immunity and elements of vaccine design that are likely to confer long-term immunity to protect against SARS-CoV-2 variants and related viral pathogens.

## DISCUSSION

Here we presented a unified description of the CD8+ T cell response to SARS-CoV-2, highlighting the importance of HLA genetics, TCR repertoire diversity, and epitope-specific navigation through a complex transcriptomic phenotype at various stages of disease. In building a comprehensive map of immunodominant, HLA-restricted epitopes broadly derived from proteins across the entire SARS-CoV-2 proteome, we highlight how only some HLA haplotypes are associated with the existence of a pre-existing CD8+ T cell memory pool in unexposed individuals. We further show how HLA variation plays an important role in shaping the diversity of CD8+ T cell repertoires upon exposure to SARS-CoV-2, and that cellular phenotype and commitment to memory can be associated with epitope-specificity in the context of both SARS-CoV-2 and latent EBV infections.

The presence of SARS-CoV-2 reactive CD8+ T cells has been linked to milder disease (5, 11, 12), although the precise link between cellular immunity and host protection remains to be further understood (7, 40, 41). We found that individuals carrying HLA-B*07 show a CD8+ T cell response that is dominated by pre-existing memory pools reactive to multiple SARS-CoV-2 epitopes, especially SPR-B07, which is likely induced by previous exposures to benign HCoVs. In contrast, the immunodominant responses in A*02 individuals (e.g., to YLQPRTFLL in A*02 (YLQ-A02, Spike) and LLY-A02) appear to be driven largely by the expansion of antigen-inexperienced SARS-CoV-2-specific T cells. It is interesting to note that CD8+ T cell cross-reactivity may be less widespread in unexposed individuals than CD4+ T cell cross-reactivity, for which ~50% of unexposed individuals exhibited CD4+ T cell memory (16). Our data provide a basis for this limited representation of the CD8+ T cell repertoire in that only a subpopulation of individuals carrying a specific HLA allele would have these cross-reactive memory CD8+ T cells. The extent to which pre-existing memory specific to SPR-B07 contributes to protection would need to be explored with longitudinal studies spanning SARS-CoV-2 exposure.
The interplay between HLA-restricted epitope presentation and available TCR repertoire shapes the cellular response to SARS-CoV-2. Only a limited number of studies suggest an influence of HLA genotype on COVID-19 severity (28, 42-44). Large-scale studies evaluating T cell responses across a comprehensive HLA coverage per patient may help identify or deconvolute relationships between HLA genotype, like B*07 in this study, and protection against severe disease, ideally uncovering mechanism. Here, we observed an interesting connection between TCR repertoire diversity and HLA restriction. Responses seen in A*02, A*24, and A*01 were more often associated with “public” CDR3 motifs and consistent V gene segment usage in the α- and/or β-chains. In contrast, the dominant immune response in B*07 leveraged a significantly more diverse TCR repertoire. Several contributors to public TCR responses have been proposed, focusing on the physicochemical features of HLA-restricted peptides (e.g., “featureless” peptide-HLAs may drive a public response) and convergent recombination of TCR sequences (45). The method described in this work provides an ideal system to address this question. Perhaps counterintuitively, our results show that in the case of COVID-19, the largest pool of potentially protective, pre-existing cellular immunity is derived from one of the least public epitope-specific repertoires, possibly reflecting the influence of repeated acute infections with HCoVs throughout the life of the individuals.
Beyond the comprehensive deciphering of TCR specificity reported here, we also provided a detailed picture of the complex and dynamic transcriptional landscape of the CD8+ T cell response to SARS-CoV-2. Importantly, we were able to demonstrate that the pre-existing SPR-B07 reactivity, observed in ~80% of unexposed subjects with HLA-B*07, was predominantly associated with a central memory-like transcriptional profile (88% of SPR-B07-reactive T cells), confirming that it originates from prior exposures. In convalescent patients, we observed a much broader distribution of SPR-B07-reactive T cells spanning every functional state at proportions ranging from 5-29% (Data file S7). This is consistent with late contraction/early memory formation described for SARS-CoV-2 in a recent study (12), where cells spanned naïve, central memory, various classifications of effector memory, and terminally differentiated effector memory expressing RA (TEMRA). There was no evidence for a particularly frequent “exhausted” state among SARS-CoV-2-specific CD8+ T cells, as suggested elsewhere (46, 47) (acknowledging that the phenotypic state is a proxy for true reactivity testing, and that blood T cells may not fully reflect what happens in the lung). We also did not find evidence of “antigenic sin” resulting from HCoV pre-exposure (48) that would stifle an effective response to SARS-CoV-2 in previously unexposed B*07 individuals. Whether HLA haplotype plays a role in the durability of the CD8+ T cell responses, especially to SARS-CoV-2 vaccines, may have an impact on long-term protection across different ethnic groups and geographic regions.
Another interesting observation from this work, as noted by others (49), is that even at the height of infection or shortly after viral clearance, the cumulative anti-SARS-CoV-2 CD8+ T cell response barely reached the frequency of anti-influenza memory responses and was well below the frequencies that could be achieved by CMV-specific cells in the same individuals (Fig. S6). This was notably evident in the acutely infected individuals, at a time when the contribution of cytotoxic CD8+ T cells would have been most important. We acknowledge the caveat that peripheral frequencies were measured, and some degree of sequestration in viral target tissues, such as the lung, is likely to occur in acute patients. Yet, the response seems much more muted than the robust response observed in some other viral infections (50). This meager outcome was seen both for the cross-reactive “secondary responses” by memory T cells pre-primed by endemic HCoVs, as well as for the primary responses of truly SARS-CoV-2 species-specific CD8+ T cells amplified de novo. This suggests that the paucity likely does not result from a blocking of primary activation, but from a dampening of all specific CD8+ T cells. Consistent with this notion, the detection of influenza/EBV/CMV reactive cells was also lower in acute COVID-19 patients, compared to SARS-CoV-2 “naïve” individuals. It has been proposed that the lethal cytokine storm in severe COVID-19 stems from innate immune functions overcompensating for adaptive immune system failures (2).
Given the widespread lymphopenia observed in acute COVID-19, we considered the possibility of latent virus reactivation with the loss of protective CMV- and EBV-specific T memory pools. While we have no direct evidence of impact on disease outcome, we do observe a significant alteration of cell state within these subsets. While CMV-reactive cells remained within, though somewhat shuffled, the same effector/memory transcriptional phenotypes between unexposed and COVID-19 cohorts (including chronic stimulation, cytotoxic terminal effector, and terminal effector memory), we observed a striking shift of EBV-specific cells from chronic stimulation and central memory into the “CD127+ memory” state in COVID-19-exposed individuals. These cells expressed moderate to high levels of many naïve (IL7R, SELL, CCR7), memory (GZMK), and effector-associated genes (NKG7, CST7, GZMA), along with markers of activation/exhaustion (TIGIT, LAG3), making them particularly interesting and difficult to ascribe to conventional phenotype labels. Recently, two transcriptionally distinct stem-like CD8+ T cell memory states were described, one of which was functionally committed to a dysfunctional lineage (38). As these cell states were differentiated by many of the same markers observed in our “CD127+ memory” compartment, it would be interesting to determine to what extent these “CD127+ memory” cells, dominated by EBV-reactive pools, experience similar fates of dysfunction. We speculate that this phenotype may be a consequence of the particular inflammatory milieu of COVID-19 patients.

There are several limitations to this study. While we have investigated the CD8+ T cell response to epitopes predicted to be presented with high affinity in four common HLA alleles, the selection of HLAs and epitopes was not exhaustive. We assessed predominantly 9-mer epitopes from canonical open reading frames of a single SARS-CoV-2 variant. Subsequent studies may include a more comprehensive set of epitopes, broader coverage of HLAs, the exploration of non-canonical open reading frames, and inclusion of several SARS-CoV-2 variants. Another limitation of this study is the small sample size for specific HLA alleles and limited cell recovery for samples from acute patients. Our findings on response prevalence, public features of T cell repertoires, and T cell phenotype could be further substantiated or broadened with deeper sampling across genetically diverse populations and larger cell inputs. Related to this, there are limitations in the interpretation of response frequencies calculated in this work, especially in the cases of low cell input. The frequencies calculated are intended to provide a qualitative assessment of T cell response, allowing for comparisons across subjects, HLAs, and epitopes.

In conclusion, we leveraged a powerful single-cell technology to better elucidate the roles of HLA variation, TCR diversity, and cellular phenotypes in establishing pre-existing immunity to SARS-CoV-2. We observed the presence of a diverse and immuno-dominant nucleocapsid epitope-specific memory pool in subjects with HLA-B*07 but saw little evidence of similar reactivity in individuals with other HLA alleles. Outside of the HLA-B*07, the epitope-specific TCR repertoires observed were largely public in nature. We measured a diverse landscape of T cell phenotypes associated with SARS-CoV-2 infection, and also observed an influence on T cell repertoires reactive to persistent and latent infections with other viruses. Overall, this work provides a framework for the unified characterization of the cellular response to novel viral infections. The ability to understand the basis of cellular immunity to SARS-CoV-2 and other pathogens will provide insight for the continued assessment of immune surveillance, health security, and long-term protection from future respiratory pathogens.

## MATERIALS AND METHODS

Study design. The aim of this study was to identify features of CD8+ T cell responses to SARS-CoV-2 associated with disease state and HLA genetics, including immunodominant T cell epitopes, evidence of immune recall, and shared TCR sequence motifs. We used libraries of peptide-HLA tetramers with epitopes derived from across the SARS-CoV-2 proteome presented in four HLAs with high prevalence in North America. Samples from acute and convalescent patients, with HLAs matching the tetramer libraries, were acquired as they became available and screened in several batches alongside samples from unexposed subjects. A total of 27 acute, 28 convalescent, and 23 unexposed subjects were screened providing HLA-matched analysis for 43 A*02:01, 18 A*24:02, 17 B*07:02, and 9 A*01:01 samples.

Antigen library design. Antigenic peptide libraries were designed by scoring all possible 9-mer peptides derived from the entire SARS-CoV-2 proteome (NC_045512.2) using netMHC-4.0 (32) in the HLA-A*02:01, HLA-A*01:01, HLA-A*24:02 or HLA-B*07:02 alleles. SARS-CoV-1 peptides that had evidence of T cell positive assays, obtained from the Immune Epitope Database (www.iedb.org; (51)), and that were highly homologous to their SARS-CoV-2 counterparts (within a Hamming distance of 2) were converted to 9-mers. Additionally, SARS-CoV-2 peptides predicted to raise immunogenic responses by others were also included (52, 53). Finally, libraries included a set of well-defined viral epitopes from Cytomegalovirus, Epstein-Barr virus, and Influenza viruses (CEF peptide pool) that elicit T cell responses in the population at large. Antigenic peptides with a predicted binding affinity of 500 nM or lower (that is, equal or stronger binding) were then selected for inclusion (Data file S8).
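
The selection logic described above can be illustrated with a minimal sketch. It assumes the binding predictions have already been produced by an external tool and are available as a peptide-to-affinity mapping; the function names, the toy protein fragment, the variant peptide, and the affinity values are all hypothetical and are not taken from the study.

```python
def kmers(sequence, k=9):
    """Yield every k-mer (default 9-mer) of a protein sequence, in order."""
    for start in range(len(sequence) - k + 1):
        yield sequence[start:start + k]


def hamming(a, b):
    """Number of mismatched positions between two equal-length peptides."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length peptides")
    return sum(x != y for x, y in zip(a, b))


def select_binders(predicted_nm, cutoff_nm=500.0):
    """Keep peptides whose predicted affinity (nM) is at or below the cutoff."""
    return {pep: nm for pep, nm in predicted_nm.items() if nm <= cutoff_nm}


# Hypothetical 19-residue fragment used only to show the enumeration.
toy_protein = "MSDNGPQNQRNAPRITFGG"
toy_peptides = list(kmers(toy_protein))

# YLQPRTFLL appears in the text; the second peptide is an invented variant.
toy_distance = hamming("YLQPRTFLL", "YLQPRTFLA")

# Invented affinities standing in for predictor output.
toy_predictions = {"YLQPRTFLL": 12.0, "AAAAAAAAA": 25000.0}
toy_selected = select_binders(toy_predictions)
```

A homologous peptide from another virus would be retained under the homology rule when `hamming(...) <= 2`, and a candidate would enter the library when it survives `select_binders`.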
Production of tetramer library pools. HLA-A*01:01, -A*02:01, -A*24:02 and HLA-B*07:02 extracellular domains were expressed in E. coli and refolded along with beta-2-microglobulin and ultraviolet (UV)-labile place-holder peptides STAPGJLEY, KILGFVFJV, VYGJVRACL and AARGJTLAM, respectively (54). A C-terminal sortase recognition sequence on the HLA was modified by sortase transpeptidation (55, 56) with a synthetic alkynylated linker peptide, featuring an N-terminal triglycine connected to propargylglycine via a PEG linker (Genscript, Piscataway, NJ). The modified HLA monomer was then purified by size exclusion chromatography (SEC). Full-length streptavidin with an N-terminal Flag tag and a C-terminal sortase recognition sequence and 6xHisTag was prepared by expression and purification from E. coli using immobilized metal affinity chromatography and SEC. Streptavidin was modified by sortase transpeptidation with a synthetic azidylated linker peptide, featuring an N-terminal triglycine connected to picolyl azide via a PEG linker (Click Chemistry Tools, Scottsdale, AZ). HLA tetramers were produced by mixing alkynylated HLA monomers and azidylated streptavidin in 0.5 mM copper sulfate, 2.5 mM BTTAA (2-(4-((Bis((1-(tert-butyl)-1H-1,2,3-triazol-4-yl)methyl)amino)methyl)-1H-1,2,3-triazol-1-yl)acetic acid) and 5 mM ascorbic acid for up to 4 hours on ice, followed by purification of highly multimeric fractions by SEC. Individual peptide exchange reactions containing 500 nM HLA tetramer and 60 µM peptide were exposed to long-wave UV (366 nm) at a distance of 2-5 cm for 30 min at 4°C, followed by 30 min incubation at 30°C. A biotinylated oligonucleotide barcode (Integrated DNA Technologies) was added to each individual reaction followed by 30 min incubation at 4°C. Individual tetramer reactions were then pooled and concentrated using 30 kDa molecular weight cut-off centrifugal filter units (Amicon). Tetramer production was quality controlled using SEC (Fig. S1a), sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) (Fig. S1b), and UV-mediated peptide exchange by assessing binding to peptide-expanded cell lines (Fig. S2).

Patient Samples. Peripheral blood mononuclear cells (PBMCs) from COVID-19 positive donors or unexposed donors were obtained from Precision 4 Medicine (USA), the Massachusetts Consortium on Pathogen Readiness (MassCPR, Boston, USA), or CTL (USA), all under appropriate informed consent. Patients were defined as COVID-19 positive based on positive SARS-CoV-2 real-time reverse transcriptase–polymerase-chain-reaction (RT-PCR) using nasopharyngeal swabs. Patient samples were characterized as “acute” if collected while the patient was hospitalized and as “convalescent” if collected after recovery or when presenting mild disease. Samples from unexposed subjects were collected prior to December 2019. A summary of patient samples used in this study is presented in Data file S2.

Cell Staining. PBMCs were thawed, and CD8+ T cells were enriched by magnetic-activated cell sorting (MACS) using a CD8+ T Cell Isolation Kit (Miltenyi) following the manufacturer’s protocol. The CD8+ T cells were then stained with tetramer libraries (Data file S8), matched to subject HLAs (Data file S2) and at 1 nM final concentration for each member, in the presence of 2 mg/mL salmon sperm DNA in PBS with 0.5% BSA solution for 20 min. Cells were then labeled with anti-TCR antibody-derived tag (ADT, clone IP26, BioLegend, CA, USA) for 15 min followed by washing. Tetramer bound cells were then labeled with phycoerythrin (PE) conjugated anti-DKDDDDK-Flag antibody (BioLegend, CA, USA) followed by dead cell discrimination using 7-amino-actinomycin D (7-AAD). The live, tetramer positive cells were sorted (Fig. S3) using a Sony MA900 Sorter (Sony). When necessary, sorting gates were set liberally to enable sufficient cell recovery for single-cell sequencing.

Sample multiplexing. To ensure sufficient cell loading and subsequent cDNA production in single-cell sequencing, we used sample multiplexing for several experiments. When applied, samples were independently stained with tetramer libraries, labeled using custom anti-TCR ADTs with unique 15 base pair DNA barcodes (clone IP26, BioLegend, CA, USA), and sorted. ADT-labeled, sorted samples were combined prior to encapsulation and single-cell sequencing. In several cases, an expanded T cell line (Cellero Anti-MART-1, MA, USA) was labeled with a BV785 anti-CD8 antibody (BioLegend, CA, USA), stained using a tetramer for ELAGIGILTV in A*02:01, and subsequently mixed and co-sorted alongside samples interrogated for this study. This provided confirmation of tetramer staining, guidance for gating, and verification of the multiplexing strategy (Fig. S3). The anti-MART-1 T cells (TCR sequences provided in Data file S9) were excluded from any subsequent analyses.

Single-cell Sequencing. Tetramer positive cells were counted by Nexcelom Cellometer (Lawrence, MA, USA) using AOPI stain following manufacturer’s recommended conditions. When possible, 15,000 cells were targeted for encapsulation. Single-cell encapsulations were generated utilizing 5′ v1 Gem beads from 10x Genomics (Pleasanton, CA, USA) on a 10x Chromium controller, and downstream TCR, gene expression, and surface marker libraries were made following manufacturer recommended conditions. All libraries were quantified on a BioRad CFX 384 (Hercules, CA, USA) using Kapa Biosystems (Wilmington, MA, USA) library quantification kits and pooled at an equimolar ratio. TCR, gene expression, surface marker, and tetramer libraries were sequenced on Illumina (San Diego, CA, USA) NextSeq550 instruments. Sequencing data were processed using the Cell Ranger Software Suite (Version 3). Samples were demultiplexed and unique molecular identifier (UMI) counts were quantified for TCRs, tetramers, and gene expression.

Single-cell Transcriptomic Analysis. Hydrogel-based RNA-seq data were analyzed using the Cell Ranger package from 10X Genomics (v3.1.0) with the GRCh38 human expression reference (v3.0.0). Except where noted, Scanpy (v1.6.0 (57)) was used to perform the subsequent single cell analyses. Any exogenous control cells identified by TCR clonotype were removed before further gene expression processing. Hydrogels that contained UMIs for fewer than 300 genes were excluded. Genes that were detected in fewer than 3 cells were also excluded from further analysis. Several additional quality control thresholds were also enforced. To remove data generated from cells likely to be damaged, an upper threshold was set for percent UMIs arising from mitochondrial genes (13%). To exclude data likely arising from multiple cells captured in a single drop, upper thresholds were set for total UMI counts based on individual distributions from each encapsulation (from 1500 to 3000 UMIs). A lower threshold of 10% was set for UMIs arising from ribosomal protein genes. Finally, an upper threshold of 5% of UMIs was set for the MALAT1 gene. Any hydrogel outside of any of the thresholds was omitted from further analysis. A total of 15,683 hydrogels were carried forward. Gene expression data were normalized to counts per 10,000 UMIs per cell (CP10K) followed by log1p transformation: ln(CP10K + 1).
Highly variable genes were identified (1,567) and scaled to have a mean of zero and unit variance. They were then provided to scanorama (v1.7, (58)) to perform batch integration and dimension reduction. The data were used to generate the nearest neighbor graph which was in turn used to generate a UMAP representation that was used for Leiden clustering. The hydrogel data (not scaled to mean zero, unit variance, and before extraction of highly variable genes) were labeled with cluster membership and provided to SingleR (v1.4.0, (59)) using the following references from Celldex (v1.0.0, (59)): Monaco Immune Data, Database Immune Cell Expression Data, and Blueprint Encode Data. SingleR was used to annotate the clusters with their best-fit match from the cell types in the references. Clusters that yielded cell types other than types of the T cell lineage were removed from consideration and the process was repeated starting from the batch integration step. The best-fit annotations from SingleR after the second round of clustering and annotation were assigned as putative labels for each Leiden cluster. Further clustering of transcriptomic data was performed across the genes shown in Fig. 5 using KMeans in sklearn (v0.24) with n_clusters set to 8. As the method has a preference to assign like-sized clusters, further consolidation of two central memory clusters was performed.
In order to provide corroboration for the SingleR best-fit annotations and further evidence as to the phenotype of the clusters, gene panels representing functional categories (Naïve, Effector, Memory, Exhaustion, Proliferation) were used to score each hydrogel’s expression profile using scanpy’s “score_genes” function (57), which compares the mean expression values of the target gene set against a larger set of randomly chosen genes that represent background expression levels. The gene panels for each class were: Naïve – TCF7, LEF1, CCR7; Effector – GZMB, PRF1, GNLY; Memory – AQP3, CD69, GZMK; Exhaustion – PDCD1, TIGIT, LAG3; Proliferation – MKI67, TYMS. The gene expression matrix for all hydrogels was first imputed using the MAGIC algorithm (v2.0.4, (60)). These functional scores were the only data generated from imputed expression values.
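
The per-hydrogel thresholds and the normalization above can be sketched in plain Python. This is an illustration under stated assumptions rather than the pipeline used in the study: the dictionary field names are hypothetical, whether each threshold is inclusive is an assumption, and the total-UMI ceiling is passed in because it was chosen per encapsulation (1500 to 3000 UMIs).

```python
import math


def passes_qc(cell, max_total_umis):
    """Apply the quality-control thresholds from the text to one hydrogel.

    `cell` is a dict with hypothetical keys: n_genes, total_umis,
    mito_umis, ribo_umis, malat1_umis.
    """
    total = cell["total_umis"]
    if total == 0:
        return False
    return (
        cell["n_genes"] >= 300                    # at least 300 genes detected
        and total <= max_total_umis               # likely multiplets removed
        and cell["mito_umis"] / total <= 0.13     # mitochondrial ceiling, 13%
        and cell["ribo_umis"] / total >= 0.10     # ribosomal protein floor, 10%
        and cell["malat1_umis"] / total <= 0.05   # MALAT1 ceiling, 5%
    )


def log_cp10k(counts):
    """Normalize raw UMI counts to CP10K, then apply ln(CP10K + 1)."""
    total = sum(counts.values())
    return {gene: math.log1p(c * 1e4 / total) for gene, c in counts.items()}


# Invented example hydrogels.
good_cell = {"n_genes": 800, "total_umis": 2000, "mito_umis": 100,
             "ribo_umis": 400, "malat1_umis": 40}
damaged_cell = dict(good_cell, mito_umis=400)  # 20% mitochondrial UMIs

normalized = log_cp10k({"GENE_A": 1, "GENE_B": 9999})
```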
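
The idea behind this kind of gene-set score can be shown with a deliberately simplified sketch: the mean expression of the panel minus the mean expression of a randomly drawn reference set. Unlike scanpy's function, this version does not match reference genes to the panel by expression bin, so it illustrates the principle only. The function name, the `n_reference` and `seed` parameters, and the toy expression profile are hypothetical; the panels are copied from the text.

```python
import random
from statistics import mean

PANELS = {
    "Naive": ["TCF7", "LEF1", "CCR7"],
    "Effector": ["GZMB", "PRF1", "GNLY"],
    "Memory": ["AQP3", "CD69", "GZMK"],
    "Exhaustion": ["PDCD1", "TIGIT", "LAG3"],
    "Proliferation": ["MKI67", "TYMS"],
}


def score_gene_set(expression, gene_set, n_reference=50, seed=0):
    """Mean expression of `gene_set` minus the mean of random background genes.

    `expression` maps gene name to a normalized expression value for one cell.
    """
    present = [g for g in gene_set if g in expression]
    pool = sorted(g for g in expression if g not in gene_set)
    if not present or not pool:
        raise ValueError("need at least one panel gene and one background gene")
    rng = random.Random(seed)
    reference = rng.sample(pool, min(n_reference, len(pool)))
    return (mean(expression[g] for g in present)
            - mean(expression[g] for g in reference))


# Invented profile: flat background of 1.0, effector panel raised to 3.0.
toy_cell = {f"BG{i}": 1.0 for i in range(200)}
toy_cell.update({g: 3.0 for g in PANELS["Effector"]})
toy_cell.update({g: 1.0 for g in PANELS["Naive"]})

effector_score = score_gene_set(toy_cell, PANELS["Effector"])
naive_score = score_gene_set(toy_cell, PANELS["Naive"])
```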

Scoring peptide-HLA-TCR interactions. Tetramer data analysis was performed using built-in methods of pandas (v1.2.5) and numpy (v1.20.3) in Python (v3.7.3). For each single-cell encapsulation, tetramer UMI counts (columns) were matrixed by cell (rows) and log-transformed. Duplicates of this matrix were independently Z-score transformed by row or column, and subsequently median-centered by the opposite axis (column or row), respectively (Fig. S7). For each peptide-HLA-cell interaction, this provided two scores, inter-tetramer and inter-cell, which were used to calculate a classifier for unique CDR3 α/β clonotypes across cells [score symbols and classifier equation not available in the source]. Classifier thresholds for positive interactions were set at 40, 36, 50, and 65 for A*02:01, B*07:02, A*24:02, and A*01:01, respectively.
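
The two transformed matrices described in this section can be sketched with the standard library alone. Several choices here are assumptions: `log1p` is used so that zero counts stay defined, the population standard deviation is used for the Z-score, and constant rows or columns are mapped to zero. The classifier that combines the two scores is not reproduced because its equation is not available in the source. The count matrix is invented.

```python
import math
from statistics import mean, median, pstdev


def zscore(values):
    """Z-score a list of numbers; a constant list maps to all zeros."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def zscore_rows(matrix):
    return [zscore(row) for row in matrix]


def center_columns(matrix):
    """Subtract each column's median from that column."""
    cols = transpose(matrix)
    meds = [median(col) for col in cols]
    return transpose([[v - m for v in col] for col, m in zip(cols, meds)])


def tetramer_scores(umi_counts):
    """Return (row-Z then column-centered, column-Z then row-centered).

    Rows are cells and columns are tetramers, as in the text.
    """
    logged = [[math.log1p(c) for c in row] for row in umi_counts]
    by_row = center_columns(zscore_rows(logged))
    # Work on the transpose to Z-score columns and center rows, then flip back.
    by_col = transpose(center_columns(zscore_rows(transpose(logged))))
    return by_row, by_col


# Invented counts: 3 cells (rows) by 3 tetramers (columns).
toy_counts = [[50, 1, 0], [2, 1, 1], [0, 3, 40]]
by_row, by_col = tetramer_scores(toy_counts)
```

Reading the row-wise Z-score as the inter-tetramer comparison and the column-wise Z-score as the inter-cell comparison is an interpretation of the text, not something the source states explicitly.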

Frequency Calculation. The frequency of reactive T cells in parent CD8+ T cell populations was estimated using a calculation of compounded frequency by taking the product of the fraction of reactive cells in the sorted population and the fraction of cells sorted (Fig. S8). When sample multiplexing was applied, care was taken to include only de-multiplexed cells from the corresponding sample to determine reactive cell fraction.
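
As a worked illustration of the compounded frequency, the sketch below multiplies the two fractions named above. The function and argument names are hypothetical and the counts are invented; they are chosen only so that the result lands at 0.01% of parent CD8+ T cells.

```python
def compounded_frequency(n_reactive, n_analyzed, n_sorted, n_parent):
    """Estimate the frequency of reactive cells in the parent CD8+ population.

    n_reactive / n_analyzed : fraction of reactive cells in the sorted population
    n_sorted / n_parent     : fraction of parent cells that fell in the sort gate
    """
    if n_analyzed <= 0 or n_parent <= 0:
        raise ValueError("denominators must be positive")
    fraction_reactive = n_reactive / n_analyzed
    fraction_sorted = n_sorted / n_parent
    return fraction_reactive * fraction_sorted


# Invented counts: 50 reactive cells among 1,000 analyzed sorted cells,
# with 2,000 cells sorted out of 1,000,000 parent CD8+ T cells.
toy_frequency = compounded_frequency(50, 1_000, 2_000, 1_000_000)
toy_percent = toy_frequency * 100
```

With sample multiplexing, `n_reactive` and `n_analyzed` would both be restricted to cells de-multiplexed to the sample in question.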

TCR Network Analysis. TCR motif analysis was performed using scirpy (v0.6.1) with receptor_arms = “any,” metric = “alignment,” and default cutoff of 10. Once clusters were identified, sequence alignment was performed using the pairwise2 module in Biopython (v1.78) and visualized using logomaker (v0.8).
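
The grouping step can be illustrated without the libraries named above. The sketch below swaps in plain edit distance for scirpy's alignment-based metric and links sequences by single-linkage (connected components), so its cutoff is not comparable to the cutoff of 10 quoted in the text. All CDR3 sequences are invented, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def cluster_cdr3(sequences, cutoff=1):
    """Group sequences into connected components linked at distance <= cutoff."""
    parent = list(range(len(sequences)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(sequences)):
        for j in range(i + 1, len(sequences)):
            if edit_distance(sequences[i], sequences[j]) <= cutoff:
                parent[find(i)] = find(j)

    clusters = {}
    for i, seq in enumerate(sequences):
        clusters.setdefault(find(i), []).append(seq)
    return sorted(clusters.values(), key=len, reverse=True)


# Invented CDR3 beta sequences.
toy_cdr3 = ["CASSLGQAYEQYF", "CASSLGQGYEQYF", "CASSPDRGNTEAFF", "CASSLGQAYEQFF"]
toy_clusters = cluster_cdr3(toy_cdr3, cutoff=1)
```

Members of each resulting cluster are what would then be aligned and drawn as a sequence logo.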

Recombinant TCR validation. Recombinant TCRs identified from patient samples were ordered from TWIST Biosciences in the pLVX-EF1a lentiviral backbone (Takara) as a bicistronic TCRb-T2A-TCRa vector. Viral supernatants from transfected HEK 293T cells were collected 48 and 72 hours after transfection and added to the parental TCRab−/− Jurkat J76 cell line (34) expressing CD8 and a nuclear factor of activated T cells (NFAT)-green fluorescent protein (GFP) reporter, referred to as J76-CD8-NFAT-GFP. Recombinant TCR surface expression was confirmed through flow cytometry by staining transduced J76-CD8-NFAT-GFP cells with anti-CD3-PE (Clone UCHT1) and anti-TCRab-allophycocyanin (APC) antibodies (Clone IP26).

To assess functional activity of recombinant TCRs, J76-CD8-NFAT-GFP cells expressing recombinant TCRs were incubated at a 1:1 ratio with the HLA-A*02:01+ and HLA-B*07:02+ HCC 1428 BL (ATCC CRL-2327) lymphoblastic cell line, with a final concentration of 0.5% dimethylsulfoxide (DMSO, vehicle) or 50 µM of cognate peptide (New England Peptide, >95% pure). Cell mixtures were incubated in the Sartorius IncuCyte at 37°C, 5% CO2 overnight and analyzed for NFAT-GFP expression measured as total integrated intensity (GCU x mm²/image) at 12 hours after assay setup. At 16 hours, cells were removed from the IncuCyte and subsequently washed and blocked with staining buffer (BD 554656), stained with anti-CD3-PE-Cy7 (Clone UCHT1) and anti-CD69-APC (Clone FN50) antibodies, and analyzed using the Intellicyt iQue Screener Plus and FlowJo v10. CD69 activity was measured as percent positive of CD3+ cells.

## Acknowledgments

Funding: The MGH/MassCPR COVID biorepository was supported by a gift from Ms. Enid Schwartz, by the Mark and Lisa Schwartz Foundation, the Massachusetts Consortium for Pathogen Readiness and the Ragon Institute of MGH, MIT and Harvard.

Author Contributions: Experimental design, J.M.F., D.L-E, A.D., G.L., V.R, J.L., C.D., V.R., A.M., M.N., K.S., T.H., A.C., C.B., D.C.P. Reagents and samples, D.L-E, C.T., J.L., C.D., A.H., V.R., Y.W., M.W., M.D., B.R., M.N., M.M., E.G., J.N., MGH COVID-19 Collection and Processing Team, T.M., P.B., W.G., J.S. Analysis, J.M.F, D.L-E., A.D., G.L., J.L, C.D.,K.H. A.S., G.R., M.N., M.S., K.S., T.H., A.W.G., A.K.S., A.C., C.B., D.C.P. Writing, J.M.F., A.D., A.K.S., A.C., C.B., D.C.P.

Competing Interests: J.M.F., D.L-E., A.D., G.L., C.T., J.L., C.D., A.H., V.R., Y.W., M.W., M.D., K.H., A.S., B.R., M.N., G.R., M.M., E.G., J.N., A.M., M.N., T.M., P.B., W.G., J.S., M.S., K.S., T.H., A.C. and D.C.P. are employees and/or stockholders of Repertoire Immune Medicines. A.W.G. reports compensation for SAB membership from Pandion Therapeutics and ArsenalBio. C.B. reports compensation for consulting for Repertoire Immune Medicines. A.K.S. reports compensation for consulting and/or SAB membership from Merck, Honeycomb Biotechnologies, Cellarity, Repertoire Immune Medicines, Hovione, Third Rock Ventures, Ochre Bio, Relation Therapeutics, FL82, and Dahlia Biosciences unrelated to this work. A.K.S.’s involvement in this work is through his relationship with Repertoire Immune Medicines. J.M.F., D.L-E., V.R., and D.C.P. are inventors on an unpublished patent application owned by Repertoire Immune Medicines that covers aspects of the technology described in this work.

Data and materials availability: All data used to make the conclusions of this work are available in the manuscript or supplementary materials. The raw sequencing data have been deposited in NCBI’s Gene Expression Omnibus and are accessible through GEO Series accession number GSE188429. Tetramer reagents are available under a material transfer agreement from Repertoire Immune Medicines.

This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/. This license does not apply to figures/photos/artwork or other content included in the article that is credited to a third party; obtain authorization from the rights holder before using this material.