✂️ CRISPR sgRNA Library Design for Oligo Pool Synthesis
Complete workflow for designing genome-wide (70K-200K guides) or targeted (500-10K guides) sgRNA libraries for CRISPR knockout, activation (CRISPRa), or interference (CRISPRi) screens. Includes coverage calculation, sequence QC, scaffold assembly, and synthesis specifications for Twist Bioscience, CustomArray, and Agilent oligo pools. (Updated Q4 2025)
What You'll Learn
- ✓ Calculate required sgRNA count for genome-wide or targeted screens
- ✓ Validate sgRNA sequences for GC content, secondary structures, and quality issues
- ✓ Assess library complexity and representation before synthesis
- ✓ Prepare final oligo pool for array synthesis with proper formatting
Prerequisites
Required Knowledge:
- Basic CRISPR/Cas9 principles and experimental design
- sgRNA design software (e.g., Benchling, GPP Web Portal, CRISPRscan)
- Target gene list or genomic regions for screening
Required Files:
- sgRNA sequences (20-23 bp, typically 20 bp for SpCas9)
- Target gene IDs and genomic coordinates
- Scaffold sequence (constant region, e.g., tracrRNA)
Tools Needed:
- sgRNA design tool (not provided by OligoPool.com - use Benchling/GPP/etc.)
- OligoPool.com tools: Coverage Calculator, GC Analyzer, Batch QC
- Spreadsheet software (Excel/Google Sheets) for library compilation
CRISPR Library Types & Cas9 Systems
Cas9 System Specifications
| System | PAM | Guide Length | Application |
|---|---|---|---|
| SpCas9 | 5'-NGG-3' | 20 bp (17-24 bp) | Standard knockout, most common |
| SaCas9 | 5'-NNGRRT-3' | 21 bp | AAV delivery (smaller size) |
| Cas12a (Cpf1) | 5'-TTTV-3' | 23-25 bp | AT-rich targets, multiplexing |
| dCas9-VP64 | 5'-NGG-3' | 20 bp | CRISPRa (activation) libraries |
* Use GC Content Analyzer to verify guide compatibility with chosen Cas9 system
🧬 Genome-Wide Knockout
Target all protein-coding genes (18,000-20,000 genes). Phenotype discovery.
- Guides/gene: 4-6 (knockout), 10 (CRISPRa/i)
- Total sgRNAs: 72,000-200,000
- Controls: 1,000 non-targeting + essential genes
- Example: Human GeCKO v2 (123K guides)
🎯 Targeted/Pathway Libraries
Gene families, pathways, or drug targets (100-1,000 genes). Focused validation.
- Guides/gene: 5-10 (higher redundancy)
- Total sgRNAs: 500-10,000
- Use case: Kinome, GPCRs, epigenetics
- Advantage: Deep coverage, affordable
📍 Tiling/Regulatory
Dense sgRNA coverage of non-coding regions. Enhancer/promoter mapping.
- Density: 1 guide every 5-10 bp
- Size: Depends on locus (1-100 kb typical)
- Use case: CRISPRi of enhancers/promoters
- Tool: Use Coverage Calculator for tiling design
Design Workflow
Calculate Library Coverage
Determine how many sgRNAs you need using the Coverage Calculator.
📋 Instructions:
- Go to Coverage Calculator
- Select "CRISPR Library" mode
- Enter number of target genes (e.g., 18,000 for genome-wide human)
- Set guides per gene (typically 4-6 for knockout, 3-10 for activation)
- Include non-targeting controls (recommend 500-1,000 guides)
- Review total oligo count and estimated cost
Typical Library Sizes:
| Library Type | Genes | sgRNA/Gene | Total sgRNAs |
|---|---|---|---|
| Human Genome-Wide | ~19,000 | 4-6 | 76,000-114,000 |
| Mouse Genome-Wide | ~22,000 | 4-6 | 88,000-132,000 |
| Targeted (Kinases) | ~500 | 10 | 5,000 |
| CRISPRa/i | ~18,000 | 10 | 180,000 |
* Add 1,000 non-targeting controls to any library
✅ Design Guidelines:
- Redundancy: 4-6 sgRNAs/gene for robust phenotypes
- Controls: Include non-targeting and essential gene controls
- Power: Ensure sufficient coverage for statistical analysis (1000x representation)
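The arithmetic behind the library-size table can be sketched in a few lines of Python. This is a minimal illustration, not the Coverage Calculator's actual implementation; the function name `library_size` and the default of 1,000 non-targeting controls are assumptions taken from the figures in this guide.

```python
def library_size(n_genes: int, guides_per_gene: int, n_controls: int = 1000) -> int:
    """Total oligos = targeting guides + non-targeting controls.

    The 1,000-control default mirrors the recommendation in this guide;
    it is an assumption, not a fixed rule.
    """
    if min(n_genes, guides_per_gene, n_controls) < 0:
        raise ValueError("all inputs must be non-negative")
    return n_genes * guides_per_gene + n_controls


# Human genome-wide knockout at the low and high end of the 4-6 guides/gene range
low = library_size(19_000, 4)    # 76,000 targeting + 1,000 controls
high = library_size(19_000, 6)   # 114,000 targeting + 1,000 controls

# Targeted kinase library, controls counted separately
kinome = library_size(500, 10, n_controls=0)
```

Passing `n_controls=0` reproduces the targeting-only totals in the table; keeping the default gives the number of oligos you would actually order.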
Design sgRNA Sequences
Use validated design algorithms to generate guides with high predicted activity and minimal off-targets.
⚠️ Design Software (External):
OligoPool.com provides QC tools only. Use these for sgRNA design:
- Algorithm: Doench 2016 (Rule Set 2)
- Score range: 0-100
- Free, web-based
- Broad Institute, validated libraries
- Human/mouse genome-wide
- Includes Azimuth scores
- ML-based, latest algorithm
- Multi-species support
- Off-target scoring
- Moreno-Mateos 2015 algorithm
- Command-line, batch processing
- Good for zebrafish/mouse
sgRNA Design Criteria (Quantitative):
📈 Activity Score Interpretation:
Include metadata in FASTA headers for traceability. Filter by Doench >0.5 and MIT >50 before QC.
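A minimal sketch of the score filter and metadata header, assuming you have exported per-guide scores from your design tool. The `Guide` record, its field names, and the `doench=`/`mit=` header tags are hypothetical. The on-target score is assumed to be on a 0-1 scale here, so rescale first if your tool reports 0-100.

```python
from typing import NamedTuple


class Guide(NamedTuple):
    guide_id: str
    gene: str
    spacer: str
    doench: float  # on-target score, ASSUMED to be on a 0-1 scale
    mit: float     # MIT specificity score, ASSUMED to be on a 0-100 scale


def passes_design_filter(g: Guide, min_doench: float = 0.5, min_mit: float = 50.0) -> bool:
    """Keep guides with Doench > 0.5 and MIT > 50 (thresholds from this guide)."""
    return g.doench > min_doench and g.mit > min_mit


def fasta_header(g: Guide) -> str:
    """Build a FASTA header that carries the scores along for traceability."""
    return f">{g.guide_id}|{g.gene}|doench={g.doench:.2f}|mit={g.mit:.0f}"


# Illustrative records only; the IDs, spacers and scores are made up
guides = [
    Guide("GENE1_g1", "GENE1", "ACGTACGTACGTACGTACGT", 0.62, 71),
    Guide("GENE1_g2", "GENE1", "TTGACCATGCAGGTCAAGCT", 0.41, 88),
]
kept = [g for g in guides if passes_design_filter(g)]
```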
QC sgRNA Sequences
Screen all sgRNA sequences for quality issues before adding scaffolds.
A. GC Content Analysis
- Navigate to GC Content Analyzer (batch mode)
- Upload FASTA file with all sgRNA spacer sequences (20 bp only)
- Flag sequences: <25% GC (poor activity) or >75% GC (synthesis issues)
- Export flagged list, design replacements for critical targets
Impact on Efficiency:
GC 40-60%: optimal activity. GC <25%: significantly reduced cutting efficiency (30-50% in published studies). GC >75%: increased synthesis failure risk (15-25% reported by array vendors). See GC Content Tutorial.
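If you want to pre-screen spacers locally before uploading, the GC flagging rule above can be sketched as follows. This is an illustration of the <25% / >75% thresholds only, not the GC Content Analyzer's implementation; the function names and flag labels are assumptions.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_flag(seq: str, low: float = 25.0, high: float = 75.0) -> str:
    """Return 'low_gc', 'high_gc' or 'ok' using the flagging thresholds in this guide."""
    gc = gc_percent(seq)
    if gc < low:
        return "low_gc"
    if gc > high:
        return "high_gc"
    return "ok"


# Made-up spacers for illustration
flags = {s: gc_flag(s) for s in (
    "ATATATATATATATATATAT",  # 0% GC
    "ACGTACGTACGTACGTACGT",  # 50% GC
    "GCGCGCGCGCGCGCGCGCGC",  # 100% GC
)}
```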
B. Batch Sequence QC
- Go to Batch Sequence QC
- Upload FASTA with all sgRNA spacers
- Flag homopolymers: GGGG (high dropout risk), TTTT (U6 termination), AAAA (synthesis errors)
- Check for palindromes and tandem repeats (synthesis artifacts)
Rejection Criteria:
- TTTT or longer: Mandatory removal (pol-III termination)
- GGGG runs: High library dropout risk (≥40-50% observed in NGS QC)
- Low complexity (<1.5 Shannon entropy): Replace if critical gene
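The homopolymer, complexity, and palindrome checks can be approximated locally with the sketch below. It is not the Batch Sequence QC tool's code. Two assumptions are made: Shannon entropy is computed in bits over single-base frequencies (maximum 2.0 for DNA), and a palindrome means a spacer equal to its own reverse complement.

```python
import math
import re
from collections import Counter

# Runs of four or more identical bases, per the rejection criteria in this guide
HOMOPOLYMER_RULES = {
    "poly_T": re.compile(r"T{4,}"),  # pol-III termination
    "poly_G": re.compile(r"G{4,}"),  # dropout risk
    "poly_A": re.compile(r"A{4,}"),  # synthesis errors
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def shannon_entropy_bits(seq: str) -> float:
    """Entropy of single-base frequencies, in bits (ASSUMED definition)."""
    counts = Counter(seq.upper())
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def is_palindrome(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    s = seq.upper()
    return s == s.translate(_COMPLEMENT)[::-1]


def qc_flags(seq: str, min_entropy: float = 1.5) -> list:
    """Return the list of QC flags raised by one spacer."""
    s = seq.upper()
    flags = [name for name, pattern in HOMOPOLYMER_RULES.items() if pattern.search(s)]
    if shannon_entropy_bits(s) < min_entropy:
        flags.append("low_complexity")
    if is_palindrome(s):
        flags.append("palindrome")
    return flags
```

An empty list from `qc_flags` means the spacer passed all three checks; tandem repeats are not covered by this sketch.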
C. Secondary Structure Check
- Use Secondary Structure Predictor for flagged sequences
- Upload sgRNA spacer + scaffold (full 96 bp construct)
- Flag stable hairpins (ΔG < -3 kcal/mol) in the spacer region
- Prioritize replacement if the hairpin blocks the PAM-proximal region (positions 1-12, counting from the PAM)
Hairpin Impact:
Stable hairpins (ΔG < -3 kcal/mol) in the seed region (bp 13-20, counting from the 5' end of the spacer): significant activity reduction (40-60% reported). See Secondary Structure Tutorial.
🚫 Sequences to Remove:
- Poly-T (TTTT or longer) → causes transcription termination
- Extreme GC (<20% or >80%)
- Perfect palindromes → may form hairpins
- Sequences matching your cloning vector
💡 Replacement Strategy:
For each removed guide, design an alternative targeting nearby region (±50 bp). Maintain the same number of guides per gene for balanced representation.
Add Scaffold & Cloning Adapters
Assemble full-length oligos with system-specific scaffolds and vector-compatible adapters.
Oligo Structure by Cas9 System:
SpCas9 (lentiCRISPR v2 / pLKO):
SaCas9 (pX601):
CRISPRa (SAM MS2 system):
📋 Oligo Assembly in Excel/Python:
- Create spreadsheet with columns: Gene, sgRNA_ID, spacer_sequence
- Add columns for: 5'-adapter, scaffold, 3'-adapter (constant for all)
- Concatenate: =CONCATENATE(adapter_5, spacer, scaffold, adapter_3)
- Add barcode column (optional, for deconvolution)
- Export as CSV or FASTA for synthesis order
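For libraries too large to handle comfortably in a spreadsheet, the same concatenation can be sketched in Python. The adapter and scaffold strings below are placeholders, not real vector sequences; substitute the constants for your own backbone. The 230 bp default reflects the array length limit mentioned in this guide.

```python
def assemble_oligo(spacer: str, adapter_5: str, scaffold: str, adapter_3: str,
                   max_len: int = 230) -> str:
    """Concatenate 5' adapter + spacer + scaffold + 3' adapter (sense strand)."""
    spacer = spacer.upper()
    bad = set(spacer) - set("ACGT")
    if bad:
        raise ValueError(f"non-ACGT characters in spacer: {sorted(bad)}")
    oligo = adapter_5 + spacer + scaffold + adapter_3
    if len(oligo) > max_len:
        raise ValueError(f"oligo is {len(oligo)} bp, above the {max_len} bp limit")
    return oligo


# PLACEHOLDER constants for illustration only; these are NOT real adapters or scaffolds
ADAPTER_5 = "ACACACACACACACACACAC"
SCAFFOLD = "GTGTGTGTGTGTGTGTGTGTGTGTGTGTGT"
ADAPTER_3 = "CTCTCTCTCTCTCTCTCTCT"

rows = [("GENE1", "GENE1_g1", "acgtacgtacgtacgtacgt")]
oligos = {
    guide_id: assemble_oligo(spacer, ADAPTER_5, SCAFFOLD, ADAPTER_3)
    for _gene, guide_id, spacer in rows
}
```

Validating the spacer alphabet and total length at assembly time catches spreadsheet copy errors before they reach the synthesis order.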
⚠️ Important Considerations:
- Strand: Order sense strand (same as U6 transcription)
- Format: Check vendor requirements (single vs. double-stranded)
- Length limit: Most arrays support up to 230 bp (you'll be ~100-110 bp)
- Barcoding: Add unique barcode if multiplexing libraries
Final QC & Synthesis Order
Perform final quality checks on assembled oligos before placing synthesis order.
Final Checklist:
✅ Synthesis Vendor Options:
- Scale: 1K-300K oligos
- Length: 60-300 bp
- Cost: $0.04-0.08/oligo (scale-dependent)
- NGS QC: Included (500×)
- Lead time: 2-3 weeks
- Scale: Up to 1M features
- Length: 60-200 bp
- Cost: $0.03-0.06/oligo (100K+ pools)
- QC: Optional NGS add-on
- Lead time: 3-4 weeks
- Scale: 12K-2M oligos
- Length: 40-350 bp
- Cost: $0.08-0.12/oligo (mid-scale)
- High uniformity (<3-fold CV)
- Lead time: 2-3 weeks
* Pricing as of Q4 2025, varies by library size and specifications. Compare vendors: Oligo Pool Synthesis Vendors
🧬 NGS QC Metrics (Post-Synthesis):
📝 File Format for Synthesis Order:
Option 1: FASTA Format (Recommended)
Header format: >unique_ID|gene_name|guide_number|metadata (optional)
Option 2: CSV/TSV Format
Required columns: oligo_id (unique), sequence (full-length oligo). Optional: gene, metadata.
- Twist: FASTA or CSV, max 300 bp, no special characters in IDs
- Agilent: Tab-delimited text file, oligo name + sequence columns
- CustomArray: Excel template provided, FASTA also accepted
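Both file layouts can be produced from the same records, as in this minimal sketch. The dictionary keys and the ID pattern (letters, digits, underscore) are assumptions chosen to match the column names and the "no special characters in IDs" note above; check your vendor's current template before ordering.

```python
import csv
import io
import re

ID_PATTERN = re.compile(r"^[A-Za-z0-9_]+$")  # ASSUMED safe character set for IDs


def validate_ids(records):
    """Raise if any oligo_id is duplicated or contains special characters."""
    seen = set()
    for r in records:
        oid = r["oligo_id"]
        if not ID_PATTERN.match(oid):
            raise ValueError(f"special characters in ID: {oid!r}")
        if oid in seen:
            raise ValueError(f"duplicate ID: {oid!r}")
        seen.add(oid)


def to_fasta(records) -> str:
    """FASTA text with headers of the form >unique_ID|gene_name|guide_number."""
    lines = []
    for r in records:
        lines.append(f">{r['oligo_id']}|{r['gene']}|{r['guide_number']}")
        lines.append(r["sequence"])
    return "\n".join(lines) + "\n"


def to_csv(records) -> str:
    """CSV text with the required oligo_id and sequence columns plus gene."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["oligo_id", "sequence", "gene"],
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Made-up record for illustration
records = [{"oligo_id": "lib1_0001", "gene": "GENE1", "guide_number": 1, "sequence": "ACGT"}]
validate_ids(records)
```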
Post-Synthesis Steps (Brief):
- Resuspend oligo pool in TE buffer
- PCR amplify with minimal cycles (6-8 cycles)
- Clone into lentiviral vector (Gibson assembly or restriction cloning)
- Transform, grow overnight, maxi-prep
- NGS verification of library representation
- Package lentivirus and transduce target cells
Detailed cloning protocols are beyond the scope of this calculator guide.
Post-Screen Data Analysis
After completing your CRISPR screen, use statistical algorithms to identify significant hits from NGS count data.
📊 Analysis Tools
- MAGeCK: Most popular, RRA algorithm, gene-level FDR
- BAGEL: Bayesian approach, best for essentiality screens
- drugZ: Optimized for drug resistance/synthetic lethality
- RIGER: Early method, rank-based, still widely used
✅ Hit Calling Thresholds
- FDR: <0.05 (stringent), <0.25 (permissive for validation)
- Log2 Fold-Change: |Δ| >1.5 (depletion/enrichment screens)
- Guide concordance: ≥3/6 guides significant per gene
- Effect size: Consider biological relevance, not just p-value
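As a rough illustration, the thresholds above can be combined into a single per-gene rule. This sketch is not how MAGeCK or the other tools call hits. It assumes you already have a gene-level FDR from one of them plus per-guide log2 fold-changes, and it treats a guide as "significant" when its fold-change passes the cutoff, which simplifies the concordance criterion.

```python
def call_hit(fdr: float, guide_lfcs, fdr_cutoff: float = 0.05,
             lfc_cutoff: float = 1.5, min_concordant: int = 3) -> bool:
    """Gene is a hit if FDR passes and enough guides agree in one direction.

    guide_lfcs: log2 fold-changes of all guides targeting the gene.
    Concordance is approximated by counting guides beyond +/- lfc_cutoff
    (an ASSUMPTION; per-guide statistics would be stricter).
    """
    depleted = sum(1 for x in guide_lfcs if x < -lfc_cutoff)
    enriched = sum(1 for x in guide_lfcs if x > lfc_cutoff)
    return fdr < fdr_cutoff and max(depleted, enriched) >= min_concordant


# Made-up values for illustration
strong = call_hit(0.01, [-2.0, -1.8, -1.6, -0.2, 0.1, -0.5])  # 3 of 6 guides depleted
mixed = call_hit(0.01, [-2.0, 2.0, -1.6, 0.0, 0.0, 0.0])      # guides disagree in direction
```

Requiring agreement in a single direction guards against genes where individual guides pass the cutoff for unrelated reasons, such as off-target effects.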
Statistical Considerations:
Best Practices
Design Redundancy
Always include 4-6 guides per gene. Statistical power for hit calling requires multiple independent guides showing consistent phenotypes.
Include Controls
1,000 non-targeting guides (negative control) + essential gene guides (positive control, e.g., POLR2A, RPA3) for quality assessment.
Plan Sequencing Depth
Aim for 500-1000x coverage per sgRNA in NGS QC. For 100K library, that's 50-100 million reads. Budget for deep sequencing.
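The read budget is a simple product of library size and per-guide coverage. A minimal sketch, with a hypothetical function name:

```python
def reads_required(n_guides: int, coverage: int) -> int:
    """Reads needed so that each guide is sequenced `coverage` times on average."""
    if n_guides < 0 or coverage < 0:
        raise ValueError("inputs must be non-negative")
    return n_guides * coverage


# 100K-guide library at the 500x and 1000x targets from this guide
low_depth = reads_required(100_000, 500)     # 50 million reads
high_depth = reads_required(100_000, 1000)   # 100 million reads
```

This is an average: guides that are under-represented in the pool will receive fewer reads than the target, which is one reason to budget toward the upper end.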
Validate Library Quality
ALWAYS perform NGS on final plasmid library before virus production. Check for representation, dropouts, and skew. Aim for <10% dropout.
Troubleshooting Library Issues
Problem: High dropout rate (>20% of designed guides)
Diagnose by dropout pattern:
- 20-40% dropout, random: Synthesis failure. Check for poly-T (TTTT+), extreme GC (<25%, >75%), or secondary structures. Redesign flagged sequences.
- 20-30% dropout, sequence-specific: PCR amplification bias. Reduce cycles from 12→6-8, use KAPA HiFi or Q5 polymerase.
- >40% dropout: Transformation bottleneck. Need ≥10× library complexity as CFUs (e.g., 100K library = 1M colonies minimum). Pool 5-10 transformations.
- Dropout enriched in high-GC guides: PCR bias during amplification. Lengthen the extension step (1 min/kb).
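To place your library in one of the categories above, you first need the dropout fraction from the NGS count table. The sketch below computes it, together with a 90th/10th percentile ratio as a skew estimate and the 10× colony target. The percentile ratio is an assumed metric that may differ from the one your vendor reports, and all function names are hypothetical.

```python
import statistics


def dropout_fraction(counts) -> float:
    """Fraction of designed guides with zero reads."""
    if not counts:
        raise ValueError("counts is empty")
    return sum(1 for c in counts if c == 0) / len(counts)


def skew_ratio(counts, low_pct: int = 10, high_pct: int = 90) -> float:
    """Ratio of the 90th to the 10th percentile read count (ASSUMED skew metric)."""
    cuts = statistics.quantiles(counts, n=100, method="inclusive")
    low, high = cuts[low_pct - 1], cuts[high_pct - 1]
    return float("inf") if low == 0 else high / low


def min_colonies(n_guides: int, fold: int = 10) -> int:
    """Colonies needed to cover the library `fold` times (10x per this guide)."""
    return n_guides * fold


# Made-up counts for illustration: one count per designed guide, zeros included
counts = [0, 0, 5, 10]
dropout = dropout_fraction(counts)
```

Include guides with zero reads in `counts`; dropping them before the calculation hides exactly the dropouts you are trying to measure.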
Problem: Skewed representation (CV >5-fold)
Optimize PCR amplification:
- Reduce cycles: Start with 6 cycles, increase only if yield <1 µg. Each extra cycle progressively increases skew (typically ~10-15% additional bias observed).
- Template amount: Use 10-100 ng input DNA. Too low (<1 ng) = stochastic sampling. Too high (>500 ng) = incomplete denaturation.
- Polymerase choice: KAPA HiFi (lowest bias), Q5 (good), Phusion (moderate bias). Avoid Taq (extreme GC bias).
- Alternative: Emulsion PCR or linear amplification (IVT) for highly uniform libraries.
Problem: Low cloning efficiency (<10⁵ CFU/µg)
Systematic optimization:
- Vector prep: High-quality maxi-prep, A260/280 = 1.8-1.9. Digest overnight, gel-purify, dephosphorylate (CIP or rSAP).
- Insert:vector ratio: Start with a 5:1 molar excess of insert over vector; increase to 10:1 or 20:1 if efficiency is low.
- Competent cells: Use ≥10⁹ CFU/µg cells (Endura, Stbl3, or ElectroMAX). Test efficiency with control plasmid before library.
- Transformation scale: For 100K library, perform 10-20 transformations of 10 µL cells each. Pool after recovery.
- Recovery: 1 hour at 37°C in SOC medium before plating. Do NOT exceed 1.5 hours (faster-growing clones can divide during recovery and skew representation).
Problem: Sequences present in input pool but lost after cloning
Likely toxic or unstable in E. coli:
- Cryptic promoters: Some sgRNA sequences may form bacterial promoters. Use recombination-deficient strains (Stbl3).
- Palindromes: Inverted repeats can trigger recombination. These should have been flagged in Step 3 QC.
- Solution: Clone in low-copy plasmid (pSC101 origin) or switch to yeast-based library construction (more stable but complex).
Workflow Summary
Coverage Calculation
sgRNA Design
Sequence QC
Add Scaffolds
Order & Validate
⚡ Quick Reference: CRISPR Library Design Parameters
For experienced users — critical thresholds at a glance.
| Parameter | Optimal Range | Reject If | Tool |
|---|---|---|---|
| sgRNA Length | 20 bp (SpCas9) | <17 bp or >24 bp | Design software |
| GC Content | 40-60% | <25% or >75% | GC Analyzer |
| Activity Score | Doench >0.5, Azimuth >50 | Doench <0.3 | Benchling, GPP |
| Off-Target CFD | MIT score >50 | CFD aggregate >0.20 | CRISPick |
| Poly-T Runs | 0 (absent) | ≥TTTT (U6 termination) | Batch QC |
| Hairpin ΔG | >-2 kcal/mol | <-3 kcal/mol in seed | Structure Predictor |
| Guides per Gene | 4-6 (knockout), 10 (CRISPRa) | <3 (low power) | Coverage Calculator |
| Non-Targeting Controls | 500-1,000 guides | <100 | Manual design |
| Oligo Length | 100-110 bp (total) | >230 bp (array limit) | Spreadsheet |
| NGS Coverage (QC) | 500-1000× per guide | <100× (insufficient) | Vendor QC report |
| Library Dropout | <10% | >20% (redesign needed) | NGS analysis |
| Representation Skew | <3-fold CV | >5-fold (PCR bias) | NGS analysis |