Transcriptome-wide off-target RNA editing induced by CRISPR-guided DNA base editors

Abstract

CRISPR–Cas base-editor technology enables targeted nucleotide alterations, and is being increasingly used for research and potential therapeutic applications1,2. The most widely used cytosine base editors (CBEs) induce deamination of DNA cytosines using the rat APOBEC1 enzyme, which is targeted by a linked Cas protein–guide RNA complex3,4. Previous studies of the specificity of CBEs have identified off-target DNA edits in mammalian cells5,6. Here we show that a CBE with rat APOBEC1 can cause extensive transcriptome-wide deamination of RNA cytosines in human cells, inducing tens of thousands of C-to-U edits with frequencies ranging from 0.07% to 100% in 38–58% of expressed genes. CBE-induced RNA edits occur in both protein-coding and non-protein-coding sequences and generate missense, nonsense, splice site, and 5′ and 3′ untranslated region mutations. We engineered two CBE variants bearing mutations in rat APOBEC1 that substantially decreased the number of RNA edits (by more than 390-fold and more than 3,800-fold) in human cells. These variants also showed more precise on-target DNA editing than the wild-type CBE and, for most guide RNAs tested, no substantial reduction in editing efficiency. Finally, we show that an adenine base editor7 can also induce transcriptome-wide RNA edits. These results have implications for the use of base editors in both research and clinical settings, illustrate the feasibility of engineering improved variants with reduced RNA editing activities, and suggest the need to more fully define and characterize the RNA off-target effects of deaminase enzymes in base editor platforms.

Main

APOBEC1—which is present in the widely used BE33 and BE44 CBEs—is well known as a DNA cytosine deaminase8,9, but the earliest studies of this enzyme initially characterized its RNA cytosine deaminase activity10,11 (Fig. 1a). Subsequent work showed that endogenous expression or overexpression of APOBEC1 can lead to modification of cytosines in dozens of transcripts other than APOB and in multiple cell types12,13,14,15. However, although CBEs containing rat APOBEC1 (rAPOBEC1) have been used to edit DNA sequences in a variety of organisms and cell types1, published reports have not—to our knowledge—focused on whether these editors might also cause C-to-U changes in RNA (Fig. 1a).

Fig. 1: Transcriptome-wide off-target C-to-U RNA editing induced by BE3 in HepG2 cells.

a, Schematic of known rAPOBEC1 enzymatic activities (left) and known and unknown activities of a CBE containing rAPOBEC1 (right). Orange, rAPOBEC1; blue, nCas9; violet lines, gRNA; green circle, UGI (uracil DNA glycosylase inhibitor); yellow halos, cytosine deamination. ss, single-stranded; ds, double-stranded. b, Heat map showing on-target DNA editing efficiencies of BE3 and nCas9–UGI–NLS (control) within the editing window of RNF2 site 1 (bases numbered with 1 as the most PAM-distal). In this and all other main figures, C12 in the spacer is not shown because of its relatively low editing efficiency, but comprehensive quantification of the editing efficiencies of all spacer cytosines is shown in Extended Data Fig. 1a. Rep., experimental replicate. c, Jitter plots showing efficiencies of C-to-U edits (y-axis) identified from RNA-seq experiments with BE3 expression or a GFP negative control (see Methods). n, total number of modified cytosines identified. d, Manhattan plot of modified cytosines across the transcriptome for replicate 1 from c. n, total number of modified cytosines. e, Percentages of expressed genes in each RNA-seq replicate with at least one edited cytosine. Numbers of expressed genes are shown within bars. f, Sequence logos from edited cytosines identified in each RNA-seq replicate. RNA-seq data were generated using cDNA, so every T should be considered a U in RNA. g, Bottom left, scatter plot correlating RNA editing rates of 54,818 cytosines edited by BE3 with DNA editing rates as determined by WES (n = 3 biologically independent samples, pooled data). Top and right, histograms depict fractions of edited cytosines on RNA (top x-axis) or DNA (right y-axis).

To test whether CBEs might also deaminate RNA cytosines, we investigated the activity of BE3 in human liver-derived HepG2 cells. We co-transfected HepG2 cells with plasmids that encode BE3 or a negative control nickase Cas9 (nCas9)–UGI–NLS fusion (that is, BE3 without rAPOBEC1) and a guide RNA (gRNA) targeted to a human RNF2 gene site (see Methods). Because HepG2 cells are not efficiently transfected with larger plasmids (data not shown), we assessed genomic DNA and total RNA following fluorescence-activated cell sorting for the highest 5% of GFP signal (BE3 and nCas9–UGI–NLS are encoded on our plasmids as co-translational P2A fusions to eGFP; see Methods). Quadruplicate experiments confirmed efficient on-target DNA editing at the RNF2 site with BE3 expression (mean frequencies of 41% and 50% at positions C3 and C6, respectively; Fig. 1b, Extended Data Fig. 1a, Supplementary Table 1). To assess RNA editing, we used targeted RNA amplicon sequencing (see Methods) to examine cytosines in the human APOB transcript (at position 6666 and other positions shown to be deaminated by APOBEC114,15,16,17). The results showed that BE3 edited many of these RNA cytosines, with the most efficient editing observed at C6666 (Extended Data Fig. 1b, Supplementary Table 1). Targeted DNA amplicon sequencing of the genomic APOB locus confirmed that C-to-U RNA alterations were not due to DNA edits (Extended Data Fig. 1b, Supplementary Table 1).

We assessed transcriptome-wide RNA editing with BE3 in the same transfected HepG2 cells using RNA sequencing (RNA-seq; about 70–100 million reads per library) of total RNA. Using Genome Analysis Toolkit (GATK) Best Practices for variant calling and further downstream filtering, we identified RNA base positions that were altered in cells that expressed BE3 but not in control cells that expressed nCas9–UGI–NLS (see Methods). This unbiased analysis showed that the vast majority (99.986% to 99.995%) of alterations were C-to-U changes (Extended Data Table 1), with tens of thousands of such edits observed in all four replicates and very few in negative control cells expressing only GFP (Fig. 1c, Extended Data Fig. 1c, Supplementary Table 2). The C-to-U alterations we identified were induced with frequencies ranging from 0.07% to 81.48% (mean of 16.42% with 95% CI of 16.40–16.45%; Fig. 1c, Extended Data Fig. 1c, Supplementary Table 2) and were distributed throughout the transcriptome (Fig. 1d, Extended Data Fig. 1d, Supplementary Table 2). Notably, 43–52% of the genes detected in these RNA-seq experiments contained at least one C-to-U edit (Fig. 1e). We found alterations in coding sequence (with a mean of 19.1% of all C edits creating missense or nonsense mutations) and non-coding sequence (with a substantial percentage in 3′ untranslated regions (UTRs) but also some in splice sites and 5′ UTRs; Extended Data Fig. 1e, Supplementary Table 3). Thirty-six per cent of edited C positions were found in three or four of the replicates, and these bases generally showed a higher range of editing frequencies than those found in only one or two replicates (Extended Data Fig. 1f), which suggests editing of particular cytosines with BE3. Consistent with this hypothesis, edited RNA cytosines were found preferentially within the consensus motif ACW (W denotes A or U) in all four replicates (Fig. 1f), matching a sequence previously identified for wild-type APOBEC18,13. 
Using whole-exome sequencing (WES), which captures both exons and UTRs, we were able to sequence with 100× coverage (in pooled triplicates) 49% of cytosines identified as edited on RNA; 98.48% of these cytosines showed no evidence of DNA editing (Fig. 1g, Supplementary Table 4), which confirms that the edits found in the RNA-seq experiments were not caused by editing of the corresponding DNA sequences.
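The comparison described above (positions altered with BE3 but not with the nCas9–UGI–NLS control, summarized by substitution type and editing frequency) can be sketched as follows. This is a minimal illustration and not the analysis code used in the study: the `Variant` record, the `summarize_edits` helper and the normal-approximation confidence interval are assumptions made for the example, and the read-depth and quality filters referred to in the Methods are omitted.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, stdev


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one called SNV (transcript-strand bases, cDNA alphabet)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    freq: float  # editing frequency in percent


def summarize_edits(treated, control):
    """Keep sites seen with the editor but not the control, then summarize C-to-U edits."""
    control_sites = {(v.chrom, v.pos) for v in control}
    specific = [v for v in treated if (v.chrom, v.pos) not in control_sites]
    # C-to-U on RNA reads out as C-to-T on cDNA.
    c_to_u = [v for v in specific if v.ref == "C" and v.alt == "T"]
    freqs = [v.freq for v in c_to_u]
    summary = {
        "n_specific": len(specific),
        "n_c_to_u": len(c_to_u),
        "pct_c_to_u": 100.0 * len(c_to_u) / len(specific) if specific else 0.0,
        "mean_freq": mean(freqs) if freqs else None,
        "ci95": None,
    }
    if len(freqs) >= 2:
        # Assumed normal approximation: mean +/- 1.96 * standard error.
        half_width = 1.96 * stdev(freqs) / sqrt(len(freqs))
        summary["ci95"] = (summary["mean_freq"] - half_width,
                           summary["mean_freq"] + half_width)
    return summary
```

With tens of thousands of edited sites per replicate, the standard error becomes very small, which is consistent with the narrow intervals reported above.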

To test whether transcriptome-wide RNA editing could also occur in a human cell line from a tissue source other than the liver, we examined BE3 with two gRNAs (targeted to sites in the human RNF2 and EMX1 genes) in human embryonic kidney (HEK293T) cells. As expected, we found efficient on-target DNA editing by BE3 with RNF2 and EMX1 gRNAs (each performed in triplicate; Extended Data Fig. 2a, b, Supplementary Table 5). RNA-seq again revealed tens of thousands of C-to-U edits induced with BE3 and each gRNA in all replicates, with editing frequencies at these Cs ranging from 0.07% to 66.7% (mean of 14.22% with 95% CI of 14.20–14.24%; Extended Data Fig. 2c, d, Extended Data Table 1, Supplementary Tables 6, 7). Edits were distributed across the transcriptome in both coding and non-coding regions (Extended Data Figs. 2e, 3a, Supplementary Tables 6–9) with 38–52% and 47–51% of expressed genes having at least one C-to-U edit for the RNF2 and EMX1 gRNAs, respectively (Extended Data Fig. 3b). A substantial percentage of edited cytosines was found in two or three of the replicates for each gRNA (31% and 34% for the RNF2 and EMX1 gRNAs, respectively; Extended Data Fig. 3c). RNA edits again occurred within the consensus motif ACW (Extended Data Fig. 3d) and 38% of cytosines were edited by both the RNF2 and EMX1 gRNAs (Extended Data Fig. 3e).
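The ACW preference can be checked by tabulating the bases immediately flanking each edited cytosine. The sketch below assumes that a three-base, transcript-strand context centred on the edited C has already been extracted for every site; `flank_counts` and `fraction_acw` are hypothetical helpers written for illustration and are not the sequence-logo software used for the figures.

```python
from collections import Counter


def _normalize(context):
    """Upper-case a 3-base context and write U as T, as in cDNA-derived data."""
    ctx = context.upper().replace("U", "T")
    if len(ctx) != 3 or ctx[1] != "C":
        raise ValueError(f"expected a 3-base context centred on C, got {context!r}")
    return ctx


def flank_counts(contexts):
    """Count the bases at positions -1 and +1 relative to the edited cytosine."""
    upstream, downstream = Counter(), Counter()
    for ctx in map(_normalize, contexts):
        upstream[ctx[0]] += 1
        downstream[ctx[2]] += 1
    return upstream, downstream


def fraction_acw(contexts):
    """Fraction of edited cytosines that sit in an ACW context (W = A or U)."""
    normalized = [_normalize(c) for c in contexts]
    hits = sum(1 for c in normalized if c[0] == "A" and c[2] in "AT")
    return hits / len(normalized)
```

A fraction well above the background frequency of ACW trinucleotides among all expressed cytosines would indicate the kind of enrichment shown in the sequence logos.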

To examine the dose-dependence of BE3-mediated RNA editing, we transfected HEK293T cells and sorted for cells with the highest 5% of GFP signal. For these experiments, we assessed BE3 with three gRNAs: RNF2, EMX1 and a third targeted to a site that does not occur in the human genome (non-targeted gRNA). The efficiency of on-target DNA editing by BE3 with the RNF2 and EMX1 gRNAs (Extended Data Fig. 4a, b, Supplementary Table 10) was higher than in the HEK293T experiments described above (Extended Data Fig. 2a, b). In addition, compared with the earlier experiments in which all GFP-positive cells were sorted, we observed more C-to-U edits (means of 149,973, 124,428 and 145,028 with the RNF2, EMX1 and non-targeted gRNAs, respectively; Extended Data Fig. 4c, Extended Data Table 1, Supplementary Tables 11–13) with higher mean frequencies throughout the transcriptome (26%, 27% and 25%, respectively; Extended Data Fig. 4d, e, Supplementary Tables 11–13) and with a greater percentage and higher absolute number of edits occurring in coding sequences (Extended Data Fig. 5a, Supplementary Tables 14–16). A higher percentage of expressed genes had at least one C-to-U edit (means of 58%, 51% and 58% for the RNF2, EMX1 and non-targeted gRNAs, respectively; Extended Data Fig. 5b). As before, edits occurred within the consensus motif ACW in all replicates (Extended Data Fig. 5c). Forty-two per cent of cytosines were edited with all three gRNAs (including the non-targeted gRNA; Extended Data Fig. 5d) and replicates performed with the same gRNA did not seem to share more off-target edits than those performed with different gRNAs (Extended Data Fig. 5e), again suggesting that RNA edits induced with BE3 are gRNA-independent. Using WES, we sequenced 60% of the cytosines that were edited in RNA at 100× coverage (pooled data from triplicates, from the experiments with the RNF2 gRNA) and confirmed that 98.52% of these cytosines showed no DNA editing (Extended Data Fig. 5f, Supplementary Table 17).

To engineer selective curbing of unwanted RNA editing (SECURE) variants that would show reduced RNA editing but retain efficient on-target DNA base editing, we screened 16 BE3 editors with various APOBEC1 mutations that have previously been reported to reduce RNA C-to-U editing18,19,20,21,22. We identified two variants (BE3-R33A and BE3-R33A/K34A) that had on-target DNA editing efficiencies comparable to that of wild-type BE3 (data not shown) but that also showed substantially reduced RNA editing even when highly expressed in HEK293T cells (Extended Data Fig. 6a, Extended Data Table 1, Supplementary Table 18). To characterize RNA editing by these two variants more rigorously, we performed RNA-seq experiments using the RNF2 gRNA in transfected HEK293T cells sorted for high expression of wild-type BE3, BE3-R33A, BE3-R33A/K34A or a catalytically impaired BE3-E63Q mutant19. For these studies, we used high-expression conditions (top 5% sorting) to enable the most sensitive detection of any residual RNA editing by these variants. We observed marked reductions in the number of transcriptome-wide C-to-U edits, with BE3-R33A expression inducing only hundreds, and BE3-R33A/K34A expression inducing 26 or fewer, such edits (Fig. 2a, b, Extended Data Table 1, Supplementary Tables 19–22). The number of edits observed with BE3-R33A/K34A was similar to the baseline number seen with the catalytically impaired BE3-E63Q mutant (Fig. 2a). The on-target DNA editing efficiency of the variants was comparable to that of wild-type BE3 with the RNF2 gRNA in HEK293T cells (Extended Data Fig. 6b, c, Supplementary Table 23).

Fig. 2: SECURE BE3 variants with substantially reduced RNA editing activities but comparable and more-precise DNA editing in HEK293T cells.

a, Jitter plots from RNA-seq experiments in HEK293T cells showing RNA cytosines modified by expression of wild-type (WT) BE3, BE3-R33A, BE3-R33A/K34A or BE3-E63Q. n, total number of modified cytosines identified. b, Manhattan plots showing the distribution of modified cytosines induced with BE3-R33A and BE3-R33A/K34A from replicate 2 in a, overlaid on modified cytosines induced with wild-type BE3 (the wild-type BE3 data are the same in the top and bottom plots). n, total number of modified cytosines identified. c, Heat maps of on-target DNA base editing efficiencies of nCas9–UGI–NLS (control), wild-type BE3, BE3-R33A and BE3-R33A/K34A in HEK293T cells with 12 different gRNAs (cells transfected and collected without sorting). Bases shown are within the editing window of the on-target site (numbered with 1 as the most PAM-distal position).

More-extensive characterization of BE3-R33A and BE3-R33A/K34A with 12 gRNAs designed for various human genes in HEK293T cells revealed that these variants generally edited on-target sites with efficiencies at least comparable to those of wild-type BE3, but with higher precision (Fig. 2c, Extended Data Fig. 7a, Supplementary Table 24). These experiments were performed without sorting for GFP expression so that DNA editing activities were assessed without the benefit of higher BE3 variant expression used in the RNA-seq studies described above. Comparable, or sometimes higher, efficiencies of base editing were observed at 10 of the 12 sites with BE3-R33A and at 8 of the 12 sites with BE3-R33A/K34A. The BE3-R33A variant showed a narrowed editing window, with maximum editing at cytosines in spacer positions 5–7 (weaker on C4 and C8), whereas the BE3-R33A/K34A variant showed an even-more restricted editing window (maximum editing on C5 and C6 with weaker editing on C7). Also, our data suggest a relatively stringent 5′T requirement for the BE3-R33A/K34A variant (Fig. 2c, Extended Data Fig. 7a, Supplementary Table 24). Testing of BE3-R33A and BE3-R33A/K34A with the RNF2 gRNA in HepG2 cells also showed a marked reduction in RNA edits throughout the transcriptome (Extended Data Fig. 7b, c, Extended Data Table 1, Supplementary Tables 25–27) but on-target DNA editing rates similar to those of wild-type BE3 with both variants (Extended Data Fig. 7d, e, Supplementary Table 28). The altered precision of the two SECURE variants is summarized in Extended Data Fig. 7f.

Given the widespread induction of RNA edits by CBEs, we investigated whether the more-recently described adenine base editors (ABEs) could also induce RNA edits. ABEs induce targeted A-to-I DNA alterations and consist of nCas9 fused to a linked heterodimer of Escherichia coli TadA adenosine deaminases (one wild-type and one evolved to deaminate A-to-I in DNA7). Wild-type E. coli TadA normally deaminates adenine 34 (A34)23,24 in E. coli transfer (t)RNAArg2, but the TadA variant present in ABEs was not specifically evolved for loss of RNA editing activity7. We co-transfected HEK293T cells in triplicate with plasmids encoding ABEmax (GenScript codon-optimized ABE7.10 with bipartite nuclear localization signals (NLSs) at the N and C termini25) or a negative control (NLS–nCas9–NLS; that is, ABEmax lacking TadA domains) (each fused to P2A–eGFP–NLS) and the HEK site 2 gRNA (see Methods and Supplementary Table 34). In cells sorted for the top 5% of GFP expression, we observed efficient on-target DNA adenine editing at HEK site 2 (mean frequencies of 87% at A5 and 24% at A7; Fig. 3a, Extended Data Fig. 8a, Supplementary Table 29). RNA-seq showed that tens of thousands of RNA base positions were altered in cells expressing ABEmax compared to matched negative control cells expressing NLS–nCas9–NLS, with nearly all (99.76–99.83%) being A-to-G edits on cDNA that was reverse transcribed from RNA (which we presume result from A-to-I alterations on RNA; Extended Data Table 1, Supplementary Table 30). The frequencies of the adenine edits that we identified with ABEmax ranged from 0.1% to 100% (mean of 22.7% with 95% CI of 22.6–22.8%; Fig. 3b, Extended Data Fig. 8b, Supplementary Table 30) and these edits were distributed throughout the transcriptome (Fig. 3c, Extended Data Fig. 8c, Supplementary Table 30). The A-to-I edits we identified were found in coding and non-coding sequences (Extended Data Fig. 8d, Supplementary Table 31). 
Among genes with detectable RNA transcripts, 51–59% had at least one adenine edit (Extended Data Fig. 8e). Forty-three per cent of edited adenine positions were found in two or three replicates, and these bases showed higher mean editing frequencies than those found in only one replicate (Extended Data Fig. 8f). In addition, edited adenines lay preferentially within a consensus UA motif (Fig. 3d) that matches the tRNA substrate of wild-type E. coli TadA. Using WES, we were able to sequence at 100× coverage (pooled data from triplicates) 88% of adenines that were edited on RNA and found that 95.39% of these were not edited on DNA (Extended Data Fig. 8g, Supplementary Table 32).

Fig. 3: ABEmax induces transcriptome-wide off-target A-to-I RNA editing in HEK293T cells.

a, Heat map of on-target DNA base editing efficiencies of ABEmax and NLS–nCas9–NLS (control) in HEK293T cells with the HEK site 2 gRNA. Bases shown are within the editing window of the on-target site (numbered with 1 as the most PAM-distal position). b, Jitter plots derived from RNA-seq experiments showing RNA adenines modified by ABEmax expression with the HEK site 2 gRNA or a GFP control (see Methods). n, total number of modified adenines identified. c, Manhattan plot showing the distribution of modified adenines across the transcriptome for replicate 3 from b. n, total number of modified adenines identified. d, Sequence logos derived from edited adenines in each RNA-seq replicate. Analysis done using RNA-seq data generated from cDNA; every T depicted should be considered a U in RNA.

The observation that both CBEs and ABEs induce extensive RNA edits has important implications for the application of these technologies in both research and clinical settings. The confounding effects of unwanted RNA editing will need to be accounted for in research studies, especially if stable base-editor expression (even in the absence of a gRNA) is used. For therapeutic applications in humans, the duration and level of base-editor expression should be minimized. Our data suggest that safety assessments for human therapeutic applications may need to include an analysis of the potential functional consequences of transcriptome-wide RNA edits. The short timeframe of our transient transfection experiments did not permit us to assess the longer-term functional consequences of widespread RNA editing, but our initial in silico and experimental analyses suggest that some edits may have phenotypic effects on cells (Supplementary Discussion, Supplementary Methods, Extended Data Fig. 9a).

Our SECURE APOBEC1-based CBE variants provide an important proof-of-principle that unwanted RNA editing can be preferentially reduced. No structural information is currently available for rAPOBEC1, but a predicted model that we generated suggests that the amino acid positions that were mutated in our SECURE variants do not lie directly adjacent to the deaminase catalytic residues in 3D space (Extended Data Fig. 9b). The higher precision of on-target DNA editing observed with our SECURE variants reduces the targeting range, but it is likely that this limitation can be overcome by using engineered Cas9s with altered protospacer adjacent motif (PAM) recognition specificities. In addition, we expressed the SECURE BE3 variants from plasmids and it will therefore be important in future experiments to assess their activities when delivered as RNA or ribonucleoprotein complexes to other cell types, such as primary cells. Another important question to address is whether SECURE variants might also be engineered for ABEs. In summary, our results show that base editor off-target effects can be more multi-dimensional than those generated by gene-editing nucleases, and illustrate how such effects can be defined and minimized for research and therapeutic applications.

Methods

Molecular cloning

Expression plasmids were constructed using isothermal assembly (or Gibson Assembly, NEB), cloning PCR-amplified DNA sequences with matching overlaps into a CAG expression vector for BE3 constructs (AgeI–NotI–EcoRV digest of SQT817, Addgene 53373)26 or a CMV expression vector (AgeI–NotI digest of pCMV-BE1, Addgene 73019) for ABEmax-derived constructs. PCR was conducted using Phusion High-Fidelity DNA Polymerase (NEB). Templates for BE3 cloning PCRs were pCMV-BE3 (Addgene 73021) and pCMV-BE3-P2A-eGFP (BPK4335). pCMV-ABEmax-P2A-eGFP-NLS (Addgene 112101) was the only ABE plasmid used as a template. Cas9 gRNAs were cloned into the pUC19-based entry vector BPK1520 (Addgene 65777, BsmBI digest) under the control of a U6 promoter. Plasmids for transfection were prepared with QIAGEN Plasmid Maxi and Plus Maxi kits (Qiagen). A list of all cloned CBE and ABE constructs and controls with nucleotide and amino acid sequences can be found in Supplementary Table 33. Guide RNA oligonucleotides used in this study are listed in Supplementary Table 34.

Human cell culture

HEK293T cells (ATCC CRL-3216) were cultured and passaged in Dulbecco’s modified Eagle’s medium (DMEM, Gibco) supplemented with 10% (v/v) fetal bovine serum (FBS, Gibco) and 1% (v/v) penicillin–streptomycin (Gibco). Cells were passaged at ~80% confluency every 2–3 days to maintain an actively growing population and avoid anoxic conditions. HepG2 cells (ATCC HB-8065) were cultured and passaged in Eagle’s minimum essential medium (EMEM, ATCC) supplemented with 10% (v/v) FBS and 0.5% (v/v) penicillin–streptomycin. Cells were passaged at ~80% confluency every 3–4 days. Both cell lines were used for experiments until passage 20 for HEK293T and passage 12 for HepG2, and both cell lines were maintained at 37 °C with 5% CO2. Cells were authenticated via STR profiling by the supplier (ATCC). Supernatants of cell cultures were analysed every two weeks using MycoAlert PLUS (Lonza) and cells continuously tested negative.

Cell transfections

HEK293T (6–7 × 10^6 cells) or HepG2 (15 × 10^6 cells) cells were seeded into 150-mm TC-treated culture dishes (Corning) 20–24 h before transfection to yield ~60–80% confluency on the day of transfection. Cells were then transfected with 37.5 µg base editor or negative control (nCas9(D10A)-UGI-NLS(SV40) or bpNLS-32AA linker-nCas9(D10A)-bpNLS) plasmid fused to P2A-eGFP, 12.5 µg gRNA expression plasmid, and 150 µl TransIT-293 (for HEK293T, Mirus) or transfeX (for HepG2, ATCC) according to the manufacturer’s protocols. To ensure maximal correlation of negative controls to base-editor overexpression, for every CBE experiment, cells were transfected and sorted with nCas9-UGI-NLS-P2A-eGFP (BE3 without rAPOBEC1 and XTEN linker as negative control) in parallel. For ABE experiments, cells were transfected and sorted in parallel with bpNLS-32AA linker-nCas9-bpNLS-P2A-eGFP-NLS (ABEmax without TadA-dimer; GenScript codon-optimized as previously described25). The GFP controls (Figs. 1c, 3b; encoding P2A–eGFP; plasmid-size-adjusted transfection dose of 22 µg) were transfected without a matching nCas9-UGI-NLS-P2A-eGFP control. Each CBE and ABE replicate was processed in parallel with a respective nCas9 control experiment for direct comparison during downstream analysis. Only for the experiments shown in the SECURE BE3 variant screen (Extended Data Fig. 6a), cells were transfected on three consecutive days (three conditions per day). For experiments shown in Fig. 2a and Extended Data Fig. 7b, SECURE CBE variants were transfected on the same day with matching nCas9-UGI-NLS-P2A-eGFP and BE3-P2A-eGFP controls. Before sorting, cells were incubated for 36–40 h post-transfection. 
This length of time was chosen because preliminary experiments in which we transiently transfected plasmids encoding rAPOBEC1 into HepG2 cells showed the highest level of RNA editing at the APOB C6666 nucleotide at 24–48 h with progressively decreasing levels of editing at the 72 and 96 h time points (data not shown). For experiments to validate the DNA on-target activity of SECURE variants, 1.5 × 10^4 HEK293T cells were seeded into 96-well flat-bottom cell-culture plates (Corning) and transfected 24 h after seeding with 220 ng DNA (165 ng base editor or negative control plasmid and 55 ng gRNA expression plasmid) and 0.66 µl TransIT-293. Cells were incubated for 72 h post-transfection before genomic DNA (gDNA) was collected.
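Both transfection formats described above use the same proportions: a 3:1 mass ratio of base-editor (or control) plasmid to gRNA plasmid, and 3 µl of transfection reagent per µg of total DNA (37.5 µg + 12.5 µg with 150 µl in 150-mm dishes; 165 ng + 55 ng with 0.66 µl in 96-well plates). The sketch below reproduces that arithmetic. The helper name `transfection_mix` is hypothetical, and applying these proportions to other formats is an assumption rather than a validated protocol; the GFP-only control dose of 22 µg was adjusted for plasmid size and does not follow this rule.

```python
def transfection_mix(total_dna_ug, editor_fraction=0.75, reagent_ul_per_ug=3.0):
    """Split a total plasmid mass 3:1 (editor:gRNA) and scale the reagent volume.

    Defaults are derived from the quantities stated in the text; the function
    itself is an illustrative helper, not part of the published protocol.
    """
    if total_dna_ug <= 0:
        raise ValueError("total DNA mass must be positive")
    editor_ug = total_dna_ug * editor_fraction
    return {
        "editor_ug": editor_ug,
        "grna_ug": total_dna_ug - editor_ug,
        "reagent_ul": total_dna_ug * reagent_ul_per_ug,
    }


# 150-mm dish: 50 µg total DNA; 96-well plate: 0.22 µg (220 ng) total DNA.
dish = transfection_mix(50.0)
well = transfection_mix(0.22)
```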

Fluorescence-activated cell sorting

HEK293T cells were washed with phosphate-buffered saline (PBS, Corning) and HepG2 cells with 0.25% Trypsin-EDTA solution (ATCC) 36–40 h after transfection. Trypsin-EDTA (0.05%; Gibco) was added to detach both cell types. Cells were prepared for sorting by diluting with PBS supplemented with 10% (v/v) FBS and filtering through 35-µm cell strainer caps (Corning). Flow cytometry was carried out on a FACSAria II (BD Biosciences) using FACSDiva version 6.1.3 (BD Biosciences). Cells were gated on their population via forward/side scatter after doublet exclusion (Supplementary Note). Cells treated with base editor were flow-sorted for all GFP-positive cells and/or the top 5% of gated cells (% parent) with the highest GFP (FITC) signal into pre-chilled FBS. Cells treated with nCas9-UGI-NLS-P2A-eGFP (BE3 control) or bpNLS-32AA linker-nCas9-bpNLS-P2A-eGFP-NLS (ABEmax control, abbreviated as NLS-nCas9-NLS) were sorted for all GFP-positive cells and/or the 5% of cells with a mean fluorescence intensity (MFI or geometric mean in FACSDiva software) matching the MFI of the top 5% GFP signal in BE3- or ABEmax-transfected cells that were assayed on the same day. GFP controls (Figs. 1c, 3b; P2A-eGFP) were MFI-matched to the top 5% GFP signal of BE3-P2A-eGFP-expressing cells from the same day. The negative control-transfected cells were MFI-matched because the negative control plasmids are smaller than the BE3 and ABEmax plasmids, yielding higher transfection efficiency and overall higher GFP or FITC signal. nCas9 controls and base-editor experiments were sorted on the same day, except for the SECURE BE3 variant screen (Extended Data Fig. 6a), for which cells were sorted for top 5% of GFP signal (percentage of total) and samples were sorted on three consecutive days (three conditions per day, in the order shown in the figure). For each experiment, at least 5–8 × 10^5 cells were sorted for gDNA and RNA extraction.
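The idea behind MFI matching can be illustrated with a small calculation: take the geometric mean of the brightest 5% of base-editor cells as the target, then choose the 5% slice of control cells whose geometric mean lies closest to it. In the study the gates were set in FACSDiva; the functions below (`geometric_mean`, `top_fraction`, `mfi_matched_window`) are hypothetical and show only the logic, assuming strictly positive per-cell fluorescence values.

```python
from math import exp, log


def geometric_mean(values):
    """Geometric mean of strictly positive fluorescence intensities."""
    return exp(sum(log(v) for v in values) / len(values))


def top_fraction(values, fraction=0.05):
    """Return the brightest `fraction` of cells (at least one cell)."""
    ordered = sorted(values)
    k = max(1, round(len(ordered) * fraction))
    return ordered[-k:]


def mfi_matched_window(control, target_mfi, fraction=0.05):
    """Pick the contiguous intensity-ranked slice of control cells (same
    fraction of the population) whose geometric mean is closest to target_mfi."""
    ordered = sorted(control)
    k = max(1, round(len(ordered) * fraction))
    best_start = min(
        range(len(ordered) - k + 1),
        key=lambda i: abs(geometric_mean(ordered[i:i + k]) - target_mfi),
    )
    return ordered[best_start:best_start + k]
```

Because the smaller control plasmids give a brighter population overall, the matched slice generally sits below the top of the control distribution rather than at its upper edge.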

RNA and DNA extraction and reverse transcription

After sorting (~40–44 h post-transfection), cells were split into subsets for gDNA (usually at least 1–3 × 10^5 cells) or RNA (usually 3–6 × 10^5 cells) extraction and centrifuged at 175g for 8 min. For DNA extraction, cell pellets were lysed with 175 µl freshly prepared DNA lysis buffer (100 mM Tris-HCl pH 8.0, 200 mM NaCl, 5 mM EDTA, 0.05% SDS, adapted from a published method27), supplemented with 5 µl 1 M DTT (Sigma) and 20 μl proteinase K (NEB; 200 µl total volume of lysis buffer mix per condition). After 12–24 h of lysis at 55 °C and 500 r.p.m., gDNA was extracted using 0.7–2× paramagnetic beads that were prepared in a similar way to that previously described (GE Healthcare Sera-Mag SpeedBeads from Fisher Scientific, washed in 0.1× TE and suspended in 20% PEG-8000 (w/v), 1.5 M NaCl, 10 mM Tris-HCl pH 8, 1 mM EDTA pH 8, and 0.05% Tween20)28. The lysate–bead mélange was mixed rigorously, incubated for 5 min, separated on a magnetic plate and washed 3 times with 70% EtOH (washing was performed while the plate was off the magnet). After drying for 5 min, the DNA was eluted in 30–100 µl elution buffer. For RNA extraction, cell pellets were resuspended in 350 µl RNA lysis buffer LBP (Macherey-Nagel) and either processed subsequently or stored at –80 °C. RNA was extracted using the NucleoSpin RNA Plus kit (Macherey-Nagel) following the manufacturer’s instructions. For HEK293T DNA on-target experiments without sorting (96-well format), 50 µl freshly prepared DNA lysis buffer mix (including DTT and proteinase K, as described above) was added directly into each well after washing with 100 μl PBS. Reverse transcription was performed using the High Capacity RNA-to-cDNA kit (Thermo Fisher) following the manufacturer’s instructions.

Next-generation sequencing of DNA and RNA amplicons

Next-generation sequencing of gDNA or cDNA was performed as previously described3,7. Genomic or transcriptomic sites of interest were amplified by PCR using gene-specific primers flanking the target sequence and containing appropriate Illumina forward and reverse adaptor sequences (PCR1; all primers and next-generation sequencing amplicons for all genomic sites are listed in Supplementary Table 34). Specifically, for each 50-µl PCR reaction, 5–20 ng extracted genomic DNA or 2 µl 1:10 diluted cDNA, 2.5 µl of each 10 µM forward and reverse primer, 5 µl of 2 mM dNTP, 10 µl 5× Phusion HF buffer, and 0.5 µl Phusion high-fidelity DNA polymerase (NEB) were added. PCR1 reactions were carried out as follows: 98 °C for 2 min, then 30 cycles of (98 °C for 10 s, appropriate annealing temperature for desired primer pairs for 12–15 s, 72 °C for 12–15 s), and a final 72 °C extension for 10 min. PCR products were verified by running on a high-resolution or fast-analysis QIAxcel automated electrophoresis device (Qiagen) and cleaned with paramagnetic beads (0.6–0.7× beads-to-sample ratio). In a secondary ‘barcoding’ PCR (PCR2), the amplicons were indexed with primer pairs containing unique Illumina barcodes (analogous to TruSeq CD indexes, formerly known as TruSeq HT). Specifically, for each 50 µl barcoding PCR reaction, 50–200 ng DNA input from the purified PCR product (PCR1), 2.5 µl of 10 µM forward and reverse barcoding primers, 5 µl of 2 mM dNTP, 10 µl 5× Phusion HF buffer, and 0.5 µl Phusion high-fidelity DNA polymerase were added. PCR2 reactions were carried out as follows: 98 °C for 2 min, then 5–10 cycles of (98 °C for 10 s, 65 °C for 30 s, 72 °C for 30 s), and a final 72 °C extension for 10 min. PCR products were verified on a QIAxcel capillary electrophoresis machine (Qiagen) and cleaned with paramagnetic beads (0.6–0.7× beads-to-sample ratio), eluting the final product in 30 µl of 1× TE buffer. 
DNA concentration was quantified with the QuantiFluor dsDNA system (Promega) and Synergy HT microplate reader (BioTek) at 485/528 nm. Libraries were pooled and pools quantified with qPCR using the NEBNext Library Quant Kit for Illumina (NEB). Amplicon libraries were sequenced paired-end (PE) 2 × 150 on the Illumina MiSeq machine using 300-cycle MiSeq Reagent Kit v2 or Micro Kit v2 (Illumina) according to the manufacturer’s protocols. After demultiplexing, FASTQs were downloaded from BaseSpace (Illumina) and analysed using a batch version of the software CRISPResso 2 (release 20180918). See ‘On-target DNA amplicon sequencing analysis’ for further details.
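The PCR1 reaction composition given above can be turned into a small master-mix calculation. At a 50-µl reaction volume the stated inputs correspond to 0.5 µM of each primer, 0.2 mM dNTP and 1× Phusion HF buffer. The sketch below assumes that the remaining volume is made up with water, which the text does not state explicitly, and uses a hypothetical helper, `pcr1_recipe`; the template volume is left as a parameter because it depends on the DNA concentration (2 µl is the stated volume for 1:10 diluted cDNA).

```python
# Fixed per-reaction volumes (µl) for PCR1, as listed in the text.
PCR1_FIXED_UL = {
    "forward_primer_10uM": 2.5,
    "reverse_primer_10uM": 2.5,
    "dNTP_2mM": 5.0,
    "phusion_hf_buffer_5x": 10.0,
    "phusion_polymerase": 0.5,
}


def pcr1_recipe(template_ul, n_reactions=1, total_ul=50.0):
    """Return component volumes (µl) for n_reactions PCR1 reactions.

    Water is computed as the remainder to total_ul (an assumption of this sketch).
    """
    water_ul = total_ul - sum(PCR1_FIXED_UL.values()) - template_ul
    if water_ul < 0:
        raise ValueError("template volume exceeds the available reaction volume")
    per_reaction = dict(PCR1_FIXED_UL, template=template_ul, water=water_ul)
    return {name: volume * n_reactions for name, volume in per_reaction.items()}


single = pcr1_recipe(template_ul=2.0)                 # one cDNA reaction
batch = pcr1_recipe(template_ul=2.0, n_reactions=8)   # eight identical reactions
```

In practice a master mix is usually prepared with some excess to allow for pipetting loss; that margin is not included here.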

RNA-seq experiments

RNA library preparation was performed using the TruSeq Stranded Total RNA Library Prep Gold kit (Illumina) with an initial input of ~500 ng extracted RNA per sample, using SuperScript III (Invitrogen) for first-strand synthesis. Depletion of ribosomal RNA (rRNA) was confirmed after the initial rRNA removal step by fluorometric quantification using the Qubit RNA HS Assay kit (Invitrogen). IDT for Illumina TruSeq RNA UD Indexes (96 indexes) were used to barcode each library with unique dual indexes to mitigate index hopping. RNA-seq libraries were examined on a high-resolution QIAxcel (Qiagen) and pooled on the basis of qPCR quantification with the KAPA Library Quantification Kit Illumina (KAPA Biosystems) or the NEBNext Library Quant Kit for Illumina (NEB). RNA-seq libraries were sequenced on an Illumina HiSeq 2500 machine in High Output mode, PE 2 × 76, or on an Illumina NextSeq 500 (PE 2 × 150), using a 500/550 Mid Output cartridge (Extended Data Fig. 6a; performed at MGH Molecular Profiling Laboratory). HiSeq runs (all remaining RNA-seq data) were performed by the Broad Institute of Harvard and MIT.

RNA sequence variant calling and quality control

Illumina paired-end FASTQ sequencing reads were processed according to GATK Best Practices for RNA-seq variant calling29,30. In brief, reads were aligned to the human hg38 reference genome using STAR version 2.6.0c31, RNA base-editing variants were called using HaplotypeCaller (GATK version 3.8), and empirical editing efficiencies were established on PCR-deduplicated (Picard version 2.7.1; http://broadinstitute.github.io/picard/) aligned reads. Known variants in dbSNP version 138 were used during base quality recalibration. From all called variants, downstream analyses focused solely on single-nucleotide variants (SNVs) on canonical (1–22, X, Y and M) chromosomes. To quantify the per-base nucleotide abundances per variant, we ran bam-readcount version 0.8.0 (https://github.com/genome/bam-readcount) on the ‘analysis-ready’ BAM file from the final output of the GATK pipeline. Furthermore, we screened for low-quality libraries or contamination by assessing: (1) possible gDNA contamination; (2) abundance of rRNA; and (3) contamination of mycoplasma in the cell line data. For (1), we assessed rates of possible gDNA contamination based on the ratio of reads mapping to the annotated transcriptome (hg38 GTF file) compared to all mapped genomic regions. Next, for (2), the abundance of rRNA was estimated by overlapping regions of rRNA from the UCSC hg38 annotation as a ratio of all reads remaining from the GATK pipeline. Finally, for (3), potential mycoplasma contamination was assessed by mapping reads with bowtie2 version 2.3.132 to four mycoplasma genomes obtained from NCBI that were previously reported to be common contaminants in cell lines33: Mycoplasma hominis ATCC 23114 (NC_013511.1), Mycoplasma hyorhinis MCLD (NC_017519.1), Mycoplasma fermentans M64 (NC_014921.1) and Acholeplasma laidlawii PG-8A (NC_010163.1).

RNA sequence variant filtering

Variant loci in base-editor overexpression experiments were filtered to exclude sites without high-confidence reference genotype calls in the control experiment. The read coverage for a given SNV in the control experiment was required to exceed the 90th percentile of the read coverage across all SNVs in the corresponding overexpression experiment. Additionally, these loci were required to have a consensus of at least 99% of reads containing the reference allele in the control experiment. RNA edits in GFP compared to nCas9 controls were filtered to include only loci with 10 or more reads and with more than 0% of reads containing the alternate allele. Base edits labelled as C-to-U comprise C-to-U edits called on the positive strand as well as G-to-A edits sourced from the negative strand. Base edits labelled as A-to-I comprise A-to-I edits called on the positive strand as well as T-to-C edits sourced from the negative strand. Edits considered for Venn diagrams were further filtered to include only those with read depths of more than 100. Results obtained with our pipeline may underestimate the actual number of RNA edits occurring in cells because of the high stringency of our variant calling pipeline and potential under-representation of intronic and intergenic RNA in our experiments.
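The control-based filter and the strand-aware labelling described above can be illustrated with a minimal Python sketch. This is not the pipeline used in this work. The `Site` record, the function names and the nearest-rank percentile definition are assumptions introduced only for illustration, as is the representation of C-to-U and A-to-I edits as C>T and A>G calls in cDNA-derived reads.

```python
from dataclasses import dataclass

@dataclass
class Site:
    ref: str                # reference base on the positive strand
    alt: str                # alternate base on the positive strand
    treated_depth: int      # read depth in the overexpression experiment
    control_depth: int      # read depth in the matched control
    control_ref_reads: int  # control reads carrying the reference allele

def nearest_rank_percentile(values, pct):
    # Nearest-rank percentile (an assumed definition; the text does not specify one).
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # integer ceiling of pct * n / 100
    return ordered[rank - 1]

def edit_label(ref, alt):
    # C>T (or G>A from the negative strand) is labelled C-to-U;
    # A>G (or T>C from the negative strand) is labelled A-to-I.
    if (ref, alt) in {("C", "T"), ("G", "A")}:
        return "C-to-U"
    if (ref, alt) in {("A", "G"), ("T", "C")}:
        return "A-to-I"
    return None

def passes_control_filter(site, depth_cutoff):
    # Control coverage must exceed the cutoff, and at least 99% of
    # control reads must carry the reference allele.
    if site.control_depth <= depth_cutoff:
        return False
    return site.control_ref_reads / site.control_depth >= 0.99

def filter_sites(sites):
    # Cutoff: 90th percentile of coverage across all SNVs in the treated sample.
    cutoff = nearest_rank_percentile([s.treated_depth for s in sites], 90)
    kept = []
    for s in sites:
        label = edit_label(s.ref, s.alt)
        if label is not None and passes_control_filter(s, cutoff):
            kept.append((s, label))
    return kept
```

In this sketch a site is retained only if it passes both control criteria and carries one of the four substitutions that map to a C-to-U or A-to-I label; the additional depth thresholds mentioned in the text (10 or more reads, or more than 100 for Venn diagrams) would be applied as further simple comparisons.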

RNA sequence variant effect prediction

The effect of identified variants was determined using the Variant Effect Predictor (VEP) version 92.5 tool from Ensembl34 with default parameters and option ‘--pick’ to filter for one consequence per variant (http://useast.ensembl.org/info/docs/tools/vep/index.html). VEP was run using the GRCh38.p12 reference human genome, Polyphen version 2.2.2, Sift version 5.2.2, COSMIC version 83, 1000genomes version phase3, ESP version V2-SSA137, gnomAD version 170228, GENCODE version 28, genebuild version 2014-07, HGMD-PUBLIC version 20174, regbuild version 16, ClinVar version 201802, and dbSNP version 150. The intergenic category in barplot figures also includes upstream and downstream gene variants.

Quantification of gene expression

Gene expression was inferred from STAR ‘--quantMode GeneCounts’ quantifications using UCSC annotations and is reported in transcripts per million (TPM). We defined expressed genes as those with 10 TPM or more.
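As an illustration of the TPM scale and the expression threshold, the following Python sketch converts per-gene read counts to TPM and selects genes with 10 TPM or more. It uses the standard TPM definition. The function names and the use of a single annotated length per gene are assumptions, because the text does not state how gene lengths were derived.

```python
def counts_to_tpm(counts, lengths_bp):
    # counts: gene -> read count; lengths_bp: gene -> length in base pairs (assumed input).
    # Reads per kilobase for each gene.
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    # Scale so that the values sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}

def expressed_genes(tpm, threshold=10.0):
    # Genes at or above the threshold (10 TPM in the text).
    return sorted(gene for gene, value in tpm.items() if value >= threshold)
```

Because TPM values sum to one million within a sample, the 10 TPM threshold is relative to the total length-normalized signal of that library rather than to raw read depth.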

On-target DNA amplicon sequencing analysis

Analysis of on-target amplicon sequencing was performed with CRISPResso2 version 20180918 in batch mode35 (http://crispresso2.pinellolab.org/), with options ‘-p 10 --base_editor_output’. The main figures display percentage of C-to-T or A-to-G edits, zoomed in to the regions of interest, with other potentially occurring editing events not displayed. The grey background represents editing frequencies <2%. Raw data are provided in the Supplementary Tables.

Generation of sequence motifs

Sequence motifs were generated with WebLogo version 2.836. To generate the extended 100-bp sequence logos (Extended Data Fig. 9c–f), we used WebLogo version 3.6.036.

WES

Exome sequence enrichment was performed using Agilent SureSelect according to the manufacturer’s protocol (Agilent Technologies). Libraries were prepared using the SureSelect QXT transposase-based method, followed by enrichment with biotinylated RNA oligomers that were contained within the SureSelect v5+UTR capture pool. WES libraries were sequenced on an Illumina NovaSeq S1 flow cell. All library preparations and sequencing runs were performed by the Clinical Genomics Center of the Oklahoma Medical Research Foundation.

WES analysis

Each exome library was processed using GATK Best Practices29,30, including paired-end alignment, PCR duplicate removal, indel realignment and base quality recalibration. Per-base, per-nucleotide quantifications for each library were inferred using bam-readcount. A set of RNA edits per experiment was determined by using the union of high-quality edits from the three biological replicate libraries for each condition. Pooled RNA editing and DNA editing rates were determined per single nucleotide by taking the ratio of the total edited alleles over the total alleles at a given position. For scatter plots, the background rates of C-to-T or A-to-G alterations in the control sample were subtracted from those in the base-editor-treated sample to compute the DNA editing rate attributable to the base editor; in these same scatter plots, note that we only call RNA edits in base-editor-treated samples that do not appear in their corresponding control samples (nCas9–UGI–NLS for CBE or NLS–nCas9–NLS for ABE) as processed by our filtering pipeline (see ‘RNA sequence variant filtering’) and thus background rates of RNA editing are already accounted for in the depiction of these data.
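The pooled rate and the background subtraction described above amount to simple arithmetic, sketched below in Python. The function names and the input layout (one pair of edited and total allele counts per replicate at a given position) are assumptions for illustration; whether negative differences were clipped is not stated in the text, so the sketch leaves them unclipped.

```python
def pooled_rate(replicates):
    # replicates: iterable of (edited_alleles, total_alleles) at one position.
    # Pooled rate = total edited alleles / total alleles across replicates.
    replicates = list(replicates)
    edited = sum(e for e, _ in replicates)
    total = sum(t for _, t in replicates)
    return edited / total if total else 0.0

def background_subtracted_rate(treated, control):
    # DNA editing rate attributable to the base editor:
    # pooled treated rate minus pooled control (background) rate.
    return pooled_rate(treated) - pooled_rate(control)
```

For example, treated replicates with (5, 100), (10, 200) and (15, 200) edited and total alleles give a pooled rate of 30/500 = 0.06, which differs from the unweighted mean of the three per-replicate rates because replicates with deeper coverage contribute more alleles to the pool.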

Data reporting

Sample sizes were not predetermined with statistical methods. Investigators were not blinded to experimental conditions or outcome assessments.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability

Plasmids encoding the most relevant constructs shown in this work, including both SECURE BE3 variants, have been deposited to Addgene (http://www.addgene.org/browse/article/28197497/; Addgene IDs 123611–123616).

All RNA-seq data used in this study have been deposited in the Gene Expression Omnibus (GEO) repository (National Center for Biotechnology Information). The files are accessible through the GEO Series accession number GSE121668. All WES and targeted amplicon sequencing data have been deposited at the SRA repository under bioproject number PRJNA497753. All other relevant data are available from the corresponding author on request.

Code availability

The authors will make all previously unreported custom computer code used in this work available upon reasonable request.

References

1. Rees, H. A. & Liu, D. R. Base editing: precision chemistry on the genome and transcriptome of living cells. Nat. Rev. Genet. 19, 770–788 (2018).
2. Seo, H. & Kim, J. S. Towards therapeutic base editing. Nat. Med. 24, 1493–1495 (2018).
3. Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424 (2016).
4. Komor, A. C. et al. Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A base editors with higher efficiency and product purity. Sci. Adv. 3, eaao4774 (2017).
5. Kim, D. et al. Genome-wide target specificities of CRISPR RNA-guided programmable deaminases. Nat. Biotechnol. 35, 475–480 (2017).
6. Zuo, E. et al. Cytosine base editor generates substantial off-target single-nucleotide variants in mouse embryos. Science 364, 289–292 (2019).
7. Gaudelli, N. M. et al. Programmable base editing of A·T to G·C in genomic DNA without DNA cleavage. Nature 551, 464–471 (2017).
8. Salter, J. D., Bennett, R. P. & Smith, H. C. The APOBEC protein family: united by structure, divergent in function. Trends Biochem. Sci. 41, 578–594 (2016).
9. Harris, R. S., Petersen-Mahrt, S. K. & Neuberger, M. S. RNA editing enzyme APOBEC1 and some of its homologs can act as DNA mutators. Mol. Cell 10, 1247–1253 (2002).
10. Lau, P. P., Chen, S. H., Wang, J. C. & Chan, L. A 40 kilodalton rat liver nuclear protein binds specifically to apolipoprotein B mRNA around the RNA editing site. Nucleic Acids Res. 18, 5817–5821 (1990).
11. Boström, K. et al. Apolipoprotein B mRNA editing. Direct determination of the edited base and occurrence in non-apolipoprotein B-producing cell lines. J. Biol. Chem. 265, 22446–22452 (1990).
12. Skuse, G. R., Cappione, A. J., Sowden, M., Metheny, L. J. & Smith, H. C. The neurofibromatosis type I messenger RNA undergoes base-modification RNA editing. Nucleic Acids Res. 24, 478–485 (1996).
13. Rosenberg, B. R., Hamilton, C. E., Mwangi, M. M., Dewell, S. & Papavasiliou, F. N. Transcriptome-wide sequencing reveals numerous APOBEC1 mRNA-editing targets in transcript 3′ UTRs. Nat. Struct. Mol. Biol. 18, 230–236 (2011).
14. Sowden, M., Hamm, J. K. & Smith, H. C. Overexpression of APOBEC-1 results in mooring sequence-dependent promiscuous RNA editing. J. Biol. Chem. 271, 3011–3017 (1996).
15. Yamanaka, S., Poksay, K. S., Driscoll, D. M. & Innerarity, T. L. Hyperediting of multiple cytidines of apolipoprotein B mRNA by APOBEC-1 requires auxiliary protein(s) but not a mooring sequence motif. J. Biol. Chem. 271, 11506–11510 (1996).
16. Powell, L. M. et al. A novel form of tissue-specific RNA processing produces apolipoprotein-B48 in intestine. Cell 50, 831–840 (1987).
17. Chen, S. H. et al. Apolipoprotein B-48 is the product of a messenger RNA with an organ-specific in-frame stop codon. Science 238, 363–366 (1987).
18. Yamanaka, S., Poksay, K. S., Balestra, M. E., Zeng, G. Q. & Innerarity, T. L. Cloning and mutagenesis of the rabbit ApoB mRNA editing protein. A zinc motif is essential for catalytic activity, and noncatalytic auxiliary factor(s) of the editing complex are widely distributed. J. Biol. Chem. 269, 21725–21734 (1994).
19. Navaratnam, N. et al. Evolutionary origins of apoB mRNA editing: catalysis by a cytidine deaminase that has acquired a novel RNA-binding motif at its active site. Cell 81, 187–195 (1995).
20. Teng, B. B. et al. Mutational analysis of apolipoprotein B mRNA editing enzyme (APOBEC1). Structure–function relationships of RNA editing and dimerization. J. Lipid Res. 40, 623–635 (1999).
21. Chen, Z. et al. Hypermutation induced by APOBEC-1 overexpression can be eliminated. RNA 16, 1040–1052 (2010).
22. MacGinnitie, A. J., Anant, S. & Davidson, N. O. Mutagenesis of apobec-1, the catalytic subunit of the mammalian apolipoprotein B mRNA editing enzyme, reveals distinct domains that mediate cytosine nucleoside deaminase, RNA binding, and RNA editing activity. J. Biol. Chem. 270, 14768–14775 (1995).
23. Wolf, J., Gerber, A. P. & Keller, W. tadA, an essential tRNA-specific adenosine deaminase from Escherichia coli. EMBO J. 21, 3841–3851 (2002).
24. Kim, J. et al. Structural and kinetic characterization of Escherichia coli TadA, the wobble-specific tRNA deaminase. Biochemistry 45, 6407–6416 (2006).
25. Koblan, L. W. et al. Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction. Nat. Biotechnol. 36, 843–846 (2018).
26. Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).
27. Laird, P. W. et al. Simplified mammalian DNA isolation procedure. Nucleic Acids Res. 19, 4293 (1991).
28. Rohland, N. & Reich, D. Cost-effective, high-throughput DNA sequencing libraries for multiplexed target capture. Genome Res. 22, 939–946 (2012).
29. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
30. DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
31. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
32. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
33. Olarerin-George, A. O. & Hogenesch, J. B. Assessing the prevalence of mycoplasma contamination in cell culture via a survey of NCBI’s RNA-seq archive. Nucleic Acids Res. 43, 2535–2542 (2015).
34. McLaren, W. et al. The Ensembl variant effect predictor. Genome Biol. 17, 122 (2016).
35. Clement, K. et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat. Biotechnol. 37, 224–226 (2019).
36. Crooks, G. E., Hon, G., Chandonia, J. M. & Brenner, S. E. WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190 (2004).
37. Kelley, L. A., Mezulis, S., Yates, C. M., Wass, M. N. & Sternberg, M. J. The Phyre2 web portal for protein modeling, prediction and analysis. Nat. Protocols 10, 845–858 (2015).

Acknowledgements

J.K.J., J.G. and R.Z. are supported by the Defense Advanced Research Projects Agency (HR0011-17-2-0042). Support was also provided by the National Institutes of Health (RM1 HG009490 to J.K.J. and J.G. and R35 GM118158 to J.K.J. and M.J.A.). J.K.J. is additionally supported by the Desmond and Ann Heathwood MGH Research Scholar Award. We thank M. M. Kaminski, B. P. Kleinstiver and K. Petri for discussions; V. Pattanayak for input on the manuscript; Y. E. Tak, G. Boulay, M. K. Clement, A. A. Sousa, R. T. Walton, M. L. Bobbin, M. V. Maus and A. Schmidts for technical advice; and P. K. Cabeceiras and O. R. Cervantes for technical assistance. J.K.J. dedicates this paper to the memory of C. J. Park.

Author information

Contributions

J.G. and R.Z. performed all wet laboratory experiments together. S.P.G., S.I., C.A.L. and M.J.A. performed all bioinformatic and computational analysis of data. J.G. and J.K.J. conceived and designed the study. J.G., M.J.A. and J.K.J. organized and supervised the work. J.G. and J.K.J. wrote the initial draft of the manuscript and all authors contributed to the writing of the final manuscript.

Corresponding author

Correspondence to J. Keith Joung.

Ethics declarations

Competing interests

J.K.J. has financial interests in Beam Therapeutics, Editas Medicine, Endcadia, Pairwise Plants, Poseida Therapeutics and Transposagen Biopharmaceuticals. These interests were reviewed and are managed by Massachusetts General Hospital and Partners HealthCare in accordance with their conflict of interest policies. J.K.J. and M.J.A. also hold equity in Excelsior Genomics. J.K.J. is a member of the Board of Directors of the American Society of Gene and Cell Therapy. J.G., R.Z. and J.K.J. are co-inventors on patent applications that have been filed by Partners Healthcare/Massachusetts General Hospital on engineered base editor architectures that reduce RNA editing activities.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Additional data and analysis for transcriptome-wide off-target C-to-U RNA editing induced with BE3 in HepG2 cells.

a, Dot plot of RNF2 on-target DNA editing data shown in Fig. 1b, depicting editing frequencies for all cytosines across the spacer sequence. b, Heat maps showing RNA and DNA editing efficiencies with BE3 and control on cytosines in human APOB. Numbering indicates nucleotide positions in the APOB transcript; asterisks identify those previously shown to be modified by APOBEC1. c, Histograms showing numbers of RNA-edited cytosines identified (y-axis) with RNA C-to-U editing frequencies (x-axis) for the four replicates shown in Fig. 1c. Dashed red line, median; solid red line, mean. d, Manhattan plots of data for replicates 2, 3, and 4 from Fig. 1c showing the distribution of modified cytosines identified across the transcriptome. n, total number of modified cytosines identified. e, Percentages of different predicted effects and locations of edited cytosines identified in each RNA-seq replicate. f, Jitter plots of cytosines modified by BE3 expression with the RNF2 gRNA categorized by their presence in 4, 3, 2 or 1 of the replicate RNA-seq experiments performed in HepG2 cells (n = 4 biologically independent samples, as in Fig. 1c). Box spans the interquartile range (IQR) (first to third quartiles); horizontal line shows median (second quartile); whiskers extend to ± 1.5 × IQR. n, total number of modified cytosines present in each category. The percentage of all modified cytosines in each category is also shown.

Extended Data Fig. 2 BE3 expression with two different gRNAs induces transcriptome-wide off-target RNA editing in HEK293T cells.

a, Heat maps of on-target DNA base editing efficiencies of BE3 and nCas9–UGI–NLS (control) in HEK293T cells (all GFP sorting) determined in triplicate with RNF2 or EMX1 gRNA. Bases shown are within the editing window of the on-target spacer sequence (numbering is at the bottom with 1 being the most PAM-distal spacer position). b, Dot plots of RNF2 and EMX1 on-target DNA editing data shown in a, depicting editing frequencies for all cytosines across the spacer sequence. c, Jitter plots derived from RNA-seq experiments showing RNA cytosines modified by BE3 expression with RNF2 or EMX1 gRNA. n, total number of modified cytosines identified in each replicate. d, Histograms showing numbers of RNA-edited cytosines identified (y-axis) with RNA C-to-U editing frequencies (x-axis) for the experiments shown in c. Dashed red line, median; solid red line, mean. e, Manhattan plots of data shown in c depicting the distribution of modified cytosines across the transcriptome. n, total number of modified cytosines identified.

Extended Data Fig. 3 Additional analysis of data showing transcriptome-wide off-target RNA editing in HEK293T cells with BE3 and two different gRNAs.

a, Percentages of different predicted effects and locations of edited cytosines in each RNA-seq replicate from Extended Data Fig. 2c. b, Percentages (x-axis) and numbers (shown inside bars) of expressed genes in each RNA-seq replicate from data shown in Extended Data Fig. 2c that show at least one edited cytosine. c, Jitter plots of cytosines modified by BE3 expression with RNF2 or EMX1 gRNA categorized by their presence in 3, 2 or 1 of the replicate RNA-seq experiments performed in HEK293T cells (n = 3 biologically independent samples, as in Extended Data Fig. 2c). Box, whiskers and n are as defined in Extended Data Fig. 1f. The percentage of all modified cytosines identified in each category is also shown. d, Sequence logos derived from edited cytosines identified in each RNA-seq replicate. Analysis done using RNA-seq data generated from cDNA; every T depicted should be considered a U in RNA. e, Venn diagram showing numbers of cytosines edited with the RNF2 and EMX1 gRNAs. For each gRNA, the number of cytosines represents the union of those identified in the three replicates.

Extended Data Fig. 4 Increased BE3 expression induces higher numbers and frequencies of transcriptome-wide RNA cytosine edits in HEK293T cells.

a, Heat maps of on-target DNA base editing efficiencies of BE3 and nCas9–UGI–NLS (control) in HEK293T cells (top 5% GFP sorting) determined in duplicate with RNF2 or EMX1 gRNA. Bases shown are within the editing window of the on-target spacer sequence (numbering is at the bottom with 1 being the most PAM-distal spacer position). b, Dot plots of data shown in a, depicting editing frequencies for all cytosines across the spacer sequence. c, Jitter plots derived from duplicate RNA-seq experiments showing RNA cytosines modified by BE3 expression with RNF2, EMX1 or non-targeted (NT) gRNA. n, total number of modified cytosines identified in each replicate. d, Histograms showing numbers of RNA edited cytosines identified (y-axis) with RNA C-to-U editing frequencies (x-axis) for the experiments shown in c. Dashed red line, median; solid red line, mean. e, Manhattan plots of data for both replicates of RNF2, EMX1, and non-targeted gRNAs from c showing the distribution of modified cytosines across the transcriptome. n,  total number of modified cytosines identified.

Extended Data Fig. 5 Additional data and analysis showing that increased BE3 expression induces higher numbers and frequencies of transcriptome-wide RNA cytosine edits in HEK293T cells.

a, Percentages of different predicted effects and locations of edited cytosines identified in each RNA-seq replicate from Extended Data Fig. 4c. b, Percentages (x-axis) and numbers (shown inside bars) of expressed genes in each RNA-seq replicate that have at least one edited cytosine. c, Sequence logos derived from edited cytosines identified in each RNA-seq duplicate experiment from Extended Data Fig. 4c for the RNF2, EMX1 and non-targeted gRNAs. Analysis done using RNA-seq data generated from cDNA; every T depicted should be considered a U in RNA. d, Venn diagram showing numbers of edited cytosines identified with the RNF2, EMX1 and non-targeted gRNAs. For each gRNA, the circle encompasses the union of cytosines identified in the two replicates (data derived from the experiments shown in Extended Data Fig. 4c). e, Venn diagrams showing all possible pairwise comparisons of edited cytosines identified in duplicate experiments performed with the RNF2, EMX1 and non-targeted gRNAs (data derived from the experiments shown in Extended Data Fig. 4c). f, Scatter plot correlating RNA editing frequencies (x-axis) of 154,264 cytosines previously shown to be edited by RNA-seq with DNA editing frequencies (y-axis) determined by WES performed with DNA derived from the same experiments (n = 3 biologically independent samples, pooled data). Superimposed histograms (top and right) depict the percentages of cytosines that show various editing rates on RNA (upper x-axis) or DNA (right y-axis).

Extended Data Fig. 6 Additional data showing that SECURE BE3 variants induce substantially reduced numbers of RNA edits but possess comparable and more-precise DNA editing activities in HEK293T cells.

a, Initial screen of transcriptome-wide RNA editing activities of six BE3 variants containing various rAPOBEC1 mutations and expressed at high levels in HEK293T cells (sorting cells with top 5% of GFP signal). Jitter plots of single replicate RNA-seq experiments showing RNA cytosines modified by expression of wild-type BE3, BE3-E63Q (rAPOBEC1 catalytic site mutant), BE3-P29F, BE3-P29T, BE3-L182A, BE3-R33A, BE3-K34A and BE3-R33A/K34A variants. n, total number of modified cytosines identified in each sample. b, Heat map of on-target DNA base-editing efficiencies of nCas9–UGI–NLS (control), wild-type BE3, BE3-R33A and BE3-R33A/K34A in HEK293T cells with the RNF2 gRNA (cells from experiment shown in Fig. 2a). Bases within the editing window of the on-target spacer sequence are numbered as previously described. Note the inclusion of C12, which is inefficiently edited by wild-type BE3 in these samples but not edited by the SECURE BE3 variants, even with higher expression. c, Dot plot for HEK293T on-target data displayed in b, expanded to include all cytosines across the spacer sequence.

Extended Data Fig. 7 Additional data and analysis of the on-target DNA and off-target RNA activities of BE3 and SECURE BE3 variants.

a, Dot plots illustrating on-target DNA editing efficiencies of nCas9–UGI–NLS (control), wild-type BE3, BE3-R33A and BE3-R33A/K34A in HEK293T cells on 12 genomic sites. These are the same data as shown in Fig. 2c, expanded to include all cytosines across the spacer sequence. b, Jitter plots from RNA-seq experiments in HepG2 cells showing RNA cytosines modified by wild-type BE3, BE3-R33A and BE3-R33A/K34A. Data for wild-type BE3 are from the experiments presented in Fig. 1c (replicates 2–4). n, total number of modified cytosines identified. c, Manhattan plots of data showing the distribution of modified cytosines induced with BE3-R33A or BE3-R33A/K34A expression for replicate 3 from b overlaid on modified cytosines induced with wild-type BE3 expression (the wild-type BE3 data are the same in the top and bottom plots). n, total number of modified cytosines identified. d, Heat map of on-target DNA base editing efficiencies of nCas9–UGI–NLS (control), wild-type BE3, BE3-R33A and BE3-R33A/K34A in HepG2 cells with the RNF2 gRNA (cells from same experiment as shown in b). Replicates 1, 2 and 3 for wild-type BE3 and nCas9–UGI–NLS show the same data presented as replicates 2, 3 and 4 for wild-type BE3 and nCas9–UGI–NLS in Fig. 1b. Bases within the editing window of the on-target spacer sequence are numbered as previously described. Note again the inclusion of position C12. e, Dot plot for HepG2 on-target data shown in d, expanded to include all cytosines across the spacer sequence. f, Schematic of the editing windows (coloured boxes) for wild-type BE3, BE3-R33A and BE3-R33A/K34A based on experimental data from Fig. 2c and Extended Data Fig. 7a. Darker-coloured and more-translucent boxes indicate positions generally showing higher and lower C-to-T editing efficiencies, respectively. Increased stringency for a 5′T with BE3-R33A/K34A is also indicated. The PAM (NGG) and the nicking site in the DNA backbone are highlighted. 
Drawings are adapted with permission from table 1 of ref. 1.

Extended Data Fig. 8 Additional data and analysis for transcriptome-wide off-target A-to-I RNA editing induced by ABEmax expression in HEK293T cells.

a, Dot plot of HEK site 2 on-target DNA editing data shown in Fig. 3a, depicting editing frequencies for all adenines across the spacer sequence. b, Histograms showing numbers of RNA-edited adenines identified (y-axis) with RNA A-to-I editing frequencies (x-axis) for three replicates shown in Fig. 3b. Dashed red line, median; solid red line, mean. c, Manhattan plots of data for replicates 1 and 2 from Fig. 3b showing the distribution of modified adenines identified across the transcriptome. n, total number of modified adenines identified. d, Percentages of different predicted effects and locations of edited adenines in each RNA-seq replicate shown in Fig. 3b. e, Percentages (x-axis) and numbers (inside bars) of expressed genes in each RNA-seq replicate that show at least one edited adenine. f, Jitter plots of adenines modified by ABEmax expression with the HEK site 2 gRNA categorized by their presence in 3, 2 or 1 of the replicate RNA-seq experiments shown in Fig. 3b (n = 3 biologically independent samples). Box and whiskers are as defined in Extended Data Fig. 1f. n, total number of modified adenines present in each category. The percentage of all modified adenines found in each category is also shown. g, Scatter plot correlating RNA editing frequencies (x-axis) of 52,462 adenines previously shown to be RNA edited with DNA editing frequencies (y-axis) determined by WES (n = 3 biologically independent samples, pooled data). Superimposed histograms (top and right) depict the percentages of edited adenines on RNA (upper x-axis) or DNA (right y-axis).

Extended Data Fig. 9 Effects of BE3 and SECURE BE3 variants on cell viability, structural model of rAPOBEC1 and extended sequence logos of off-target RNA edited sites.

a, Cell viability assay comparing HEK293T cells transfected with plasmid expressing nCas9–UGI–NLS, wild-type (WT) BE3, BE3-R33A, BE3-R33A/K34A or BE3-E63Q (n = 3 biologically independent samples per condition). Each dot represents one biological replicate (and is the mean of three technical replicates). All data points were normalized to the mean luminescence of the nCas9–UGI–NLS controls (set to 100%, grey dotted line) that were performed for each biological replicate experiment. The assay was performed on days 1, 2, 3 and 4 after plating (following sorting for all GFP-positive cells). Data shown as mean ± s.e.m. RLU, relative light unit; n.s., not significantly decreased compared to matched nCas9–UGI–NLS control; *P P Supplementary Methods. b, Structural model of rAPOBEC1 with locations of catalytic residues and the R33 and K34 positions that were altered in SECURE variants. A predicted rAPOBEC1 structure is shown that was generated with Protein Homology/analogY Recognition Engine v 2.0 (Phyre2)37 and visualized in PyMOL (v 1.8.2.1). The R33 and K34 residues mutated in the SECURE variants are shown in orange and blue, respectively. Catalytic site residues (H61, E63, C93 and C96) have previously been described19 and are shown in green. c–f, Extended sequence logos for BE3- and ABEmax-induced RNA editing sites. Sequence logos derived with the nucleotides 100 base pairs upstream and downstream of the motifs edited in RNA by BE3 (ACW) or ABEmax (UA) are shown. Logos were derived from data for BE3 expression in HepG2 cells (c; Fig. 1c), BE3 expression in HEK293T cells (d; all GFP-sorted cells; Extended Data Fig. 2c), higher BE3 expression in HEK293T cells (e; top-5% GFP-sorted cells; Extended Data Fig. 4c), and ABEmax expression in HEK293T experiments (f; top 5% GFP-sorted cells; Fig. 3b). Analysis was done using RNA-seq data generated from cDNA; every T depicted should be considered a U in RNA.

Extended Data Table 1 Summary of RNA edits observed for all RNA-seq experiments

Supplementary information

Supplementary information

This file contains the Supplementary Methods, Supplementary Discussion, Supplementary References and a Supplementary Note which includes FACS raw data and gating examples for different experimental conditions.

Reporting Summary

Supplementary Tables

This file contains Supplementary Tables 1-36 and a Supplementary Table Guide.

About this article

Cite this article

Grünewald, J., Zhou, R., Garcia, S.P. et al. Transcriptome-wide off-target RNA editing induced by CRISPR-guided DNA base editors. Nature 569, 433–437 (2019). https://doi.org/10.1038/s41586-019-1161-z
