Highly specific Cas9 nucleases derived from SpCas9 are valuable tools for genome editing, but their wider application is hampered by a lack of knowledge of the rules governing guide RNA (gRNA) activity. Here, we perform a genome-scale screen to measure gRNA activity for two highly specific SpCas9 variants (eSpCas9(1.1) and SpCas9-HF1) and wild-type SpCas9 (WT-SpCas9) in human cells, and obtain indel rates for over 50,000 gRNAs for each nuclease, covering ~20,000 genes. We evaluate the contribution of 1,031 features to gRNA activity and develop models for activity prediction. Our data reveal that a recurrent neural network (RNN) combined with important biological features outperforms the other models tested for activity prediction. We further demonstrate that our model outperforms other popular gRNA design tools. Finally, we develop an online design tool, DeepHF, for the three Cas9 nucleases. The database, as well as the design tool, is freely accessible via a web server, http://www.DeepHF.com/.
Broader application of highly specific Cas9 nucleases has been hampered by a lack of knowledge about gRNA design. Our study fills this gap by generating a database of over 50,000 gRNAs covering ~20,000 human genes for eSpCas9(1.1) and SpCas9-HF1. Users can pick efficient gRNAs from the database for gene knockout. In addition, we have shown that the Tree SHAP algorithm is a powerful tool for evaluating feature importance. Based on this large data set and the important features identified, we optimized seven models for gRNA activity prediction. Importantly, we have demonstrated that RNN + biofeature is, to the best of our knowledge, a state-of-the-art model for activity prediction for the three Cas9 nucleases. These findings will facilitate the development of optimal computational models for gRNA design for other Cas9 nucleases. Finally, we developed an online tool for gRNA design for WT-SpCas9, eSpCas9(1.1), and SpCas9-HF1. Taken together, our study will facilitate the application of highly specific Cas9 nucleases for genome editing.
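To make the idea of pairing a sequence model with hand-crafted biological features concrete, the following is a minimal sketch of how the two kinds of input might be prepared for a single gRNA. It is an illustration under stated assumptions, not the DeepHF implementation: the function names are hypothetical, one-hot encoding is assumed as the sequence representation, and GC content stands in as a single example feature. This section does not specify the actual encoding, feature set, or network architecture.

```python
from typing import List, Tuple

# Assumed base order for the indicator vectors; any fixed order would do.
NUCLEOTIDES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Encode each base as a 4-element indicator vector (A, C, G, T)."""
    seq = seq.upper()
    return [[1 if base == n else 0 for n in NUCLEOTIDES] for base in seq]


def gc_content(seq: str) -> float:
    """Fraction of G and C bases; used here only as an example feature."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def build_inputs(seq: str) -> Tuple[List[List[int]], List[float]]:
    """Return (sequence input, feature vector) for one gRNA.

    In a combined model, the first element would be fed to the recurrent
    layers and the second would be concatenated with their output before
    the final prediction layers (hypothetical wiring).
    """
    return one_hot(seq), [gc_content(seq)]


# Arbitrary 20-nt example sequence, not taken from the screen.
sequence_input, features = build_inputs("ACGTACGTACGTACGTACGT")
```

Keeping the two inputs separate lets the sequence model learn position-dependent patterns while the feature vector carries whole-sequence properties that are known in advance.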