✂️ CRISPR sgRNA Library Design for Oligo Pool Synthesis

Advanced · ⏱ 60-120 minutes · SpCas9/SaCas9/Cas12a

Complete workflow for designing genome-wide (70K-200K guides) or targeted (500-10K guides) sgRNA libraries for CRISPR knockout, activation (CRISPRa), or interference (CRISPRi) screens. Includes coverage calculation, sequence QC, scaffold assembly, and synthesis specifications for Twist Bioscience, CustomArray, and Agilent oligo pools. (Updated Q4 2025)

Library Scales: 500 - 200,000 sgRNAs
Oligo Length: 60-110 bp (typical)
Synthesis Cost: $0.03-0.12/oligo (2025)
QC Coverage: 500-1000× per guide

What You'll Learn

  • Calculate required sgRNA count for genome-wide or targeted screens
  • Validate sgRNA sequences for GC content, secondary structures, and quality issues
  • Assess library complexity and representation before synthesis
  • Prepare final oligo pool for array synthesis with proper formatting

Prerequisites

Required Knowledge:

  • Basic CRISPR/Cas9 principles and experimental design
  • sgRNA design software (e.g., Benchling, GPP Web Portal, CRISPRscan)
  • Target gene list or genomic regions for screening
2025 Note: This guide focuses on established SpCas9/dCas9 systems (knockout, CRISPRa/i). For emerging approaches like base editing (CBE/ABE) or prime editing (PE) libraries, similar QC principles apply but with modified design criteria. These advanced systems are covered in separate tutorials.

Required Files:

  • sgRNA sequences (20-23 bp, typically 20 bp for SpCas9)
  • Target gene IDs and genomic coordinates
  • Scaffold sequence (constant region, e.g., tracrRNA)

Tools Needed:

  • sgRNA design tool (not provided by OligoPool.com - use Benchling/GPP/etc.)
  • OligoPool.com tools: Coverage Calculator, GC Analyzer, Batch QC
  • Spreadsheet software (Excel/Google Sheets) for library compilation

CRISPR Library Types & Cas9 Systems

Cas9 System Specifications

System | PAM | Guide Length | Application
SpCas9 | 5'-NGG-3' | 20 bp (17-24 bp) | Standard knockout, most common
SaCas9 | 5'-NNGRRT-3' | 21 bp | AAV delivery (smaller size)
Cas12a (Cpf1) | 5'-TTTV-3' | 23-25 bp | AT-rich targets, multiplexing
dCas9-VP64 | 5'-NGG-3' | 20 bp | CRISPRa (activation) libraries

* Use GC Content Analyzer to verify guide compatibility with chosen Cas9 system

🧬 Genome-Wide Knockout

Target all protein-coding genes (18,000-20,000 genes). Phenotype discovery.

  • Guides/gene: 4-6 (knockout), 10 (CRISPRa/i)
  • Total sgRNAs: 72,000-200,000
  • Controls: 1,000 non-targeting + essential genes
  • Example: Human GeCKO v2 (123K guides)

🎯 Targeted/Pathway Libraries

Gene families, pathways, or drug targets (100-1,000 genes). Focused validation.

  • Guides/gene: 5-10 (higher redundancy)
  • Total sgRNAs: 500-10,000
  • Use case: Kinome, GPCRs, epigenetics
  • Advantage: Deep coverage, affordable

📍 Tiling/Regulatory

Dense sgRNA coverage of non-coding regions. Enhancer/promoter mapping.

  • Density: 1 guide every 5-10 bp
  • Size: Depends on locus (1-100 kb typical)
  • Use case: CRISPRi of enhancers/promoters
  • Tool: Use Coverage Calculator for tiling design
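
How densely a locus can be tiled depends on where PAMs occur. The sketch below enumerates candidate SpCas9 spacers (20 nt followed by NGG) on both strands of a sequence. It is a minimal illustration, not part of any OligoPool.com tool: the function names are hypothetical, positions are 0-based on the forward strand, and no activity or off-target scoring is attempted.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str, spacer_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (start, strand, spacer) for every spacer adjacent to an NGG PAM."""
    seq = seq.upper()
    sites = []
    for i in range(len(seq) - spacer_len - 2):
        # Forward strand: spacer starts at i, PAM (NGG) sits immediately 3' of it.
        if seq[i + spacer_len + 1 : i + spacer_len + 3] == "GG":
            sites.append((i, "+", seq[i : i + spacer_len]))
        # Reverse strand: CCN on the forward strand marks a PAM; the spacer is
        # the reverse complement of the spacer_len bases that follow it.
        if seq[i : i + 2] == "CC":
            target = seq[i + 3 : i + 3 + spacer_len]
            sites.append((i + 3, "-", reverse_complement(target)))
    return sites
```

Dividing the locus length by `len(find_spcas9_sites(locus))` gives the achievable average spacing, which can then be compared with the 5-10 bp target above.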

Design Workflow

Step 1: Calculate Library Coverage

Determine how many sgRNAs you need using the Coverage Calculator.

📋 Instructions:

  1. Go to Coverage Calculator
  2. Select "CRISPR Library" mode
  3. Enter number of target genes (e.g., 18,000 for genome-wide human)
  4. Set guides per gene (typically 4-6 for knockout, 3-10 for activation)
  5. Include non-targeting controls (recommend 500-1,000 guides)
  6. Review total oligo count and estimated cost

Typical Library Sizes:

Library Type | Genes | sgRNA/Gene | Total sgRNAs
Human Genome-Wide | ~19,000 | 4-6 | 76,000-114,000
Mouse Genome-Wide | ~22,000 | 4-6 | 88,000-132,000
Targeted (Kinases) | ~500 | 10 | 5,000
CRISPRa/i | ~18,000 | 10 | 180,000

* Add 1,000 non-targeting controls to any library
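
The arithmetic behind the table is genes × guides per gene, plus controls. A minimal sketch follows; the function names are illustrative and the $0.05/oligo price is only a placeholder within the range quoted in this guide, not a vendor quote.

```python
def library_size(n_genes: int, guides_per_gene: int, n_controls: int = 1000) -> int:
    """Total oligos = targeting guides + non-targeting controls."""
    return n_genes * guides_per_gene + n_controls


def synthesis_cost(n_oligos: int, price_per_oligo: float) -> float:
    """Rough cost estimate; price_per_oligo is a placeholder assumption."""
    return n_oligos * price_per_oligo


if __name__ == "__main__":
    examples = [("Human genome-wide", 19_000, 4),
                ("Targeted (kinases)", 500, 10),
                ("CRISPRa/i", 18_000, 10)]
    for name, genes, guides in examples:
        total = library_size(genes, guides)
        cost = synthesis_cost(total, 0.05)
        print(f"{name}: {total:,} oligos, about ${cost:,.0f} at $0.05/oligo")
```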

✅ Design Guidelines:

  • Redundancy: 4-6 sgRNAs/gene for robust phenotypes
  • Controls: Include non-targeting and essential gene controls
  • Power: Ensure sufficient coverage for statistical analysis (1000x representation)

Step 2: Design sgRNA Sequences

Use validated design algorithms to generate guides with high predicted activity and minimal off-targets.

⚠️ Design Software (External):

OligoPool.com provides QC tools only. Use these for sgRNA design:

Benchling CRISPR
  • Algorithm: Doench 2016 (Rule Set 2)
  • Score range: 0-100
  • Free, web-based
GPP sgRNA Designer
  • Broad Institute, validated libraries
  • Human/mouse genome-wide
  • Includes Azimuth scores
CRISPick (Broad)
  • ML-based, latest algorithm
  • Multi-species support
  • Off-target scoring
CRISPRscan
  • Moreno-Mateos 2015 algorithm
  • Command-line, batch processing
  • Good for zebrafish/mouse

sgRNA Design Criteria (Quantitative):

Length: 20 bp (SpCas9), 21 bp (SaCas9), 23-25 bp (Cas12a)
GC Content: 40-60% optimal, reject <25% or >75%
PAM Location: NGG at 3' end (not included in oligo synthesis)
Target Position: 5-50% of CDS length for knockout (early exons preferred)
Activity Score: >0.5 (Doench), >50 (Azimuth) - keep top 6 guides/gene
Off-Target CFD: >0.20 aggregate = high risk. MIT score >50 for top off-target
Off-Target Sites: 0 perfect matches, <3 sites with 1-2 mismatches

📈 Activity Score Interpretation:

Doench 2016 (Rule Set 2): Native score 0-1 (some tools, such as Benchling, report it rescaled to 0-100). Median ~0.45. Use >0.5 for libraries.
Azimuth: Score 0-100. Mean ~50. Use >50-60 for high-confidence hits.
CRISPRscan: Score 0-100. Validated in vivo. Use >50.
For genome-wide screens, prioritize top 4-6 guides per gene by activity score. For targeted libraries, you can afford 8-10 guides including lower-scoring alternatives.
Example Design Output (FASTA with metadata):
>BRCA1_g1|chr17:43044295|+|Doench:0.78|MIT:85|off-targets:0
GGCTATCCTCTCAGAGTGAC
>BRCA1_g2|chr17:43045802|-|Doench:0.82|MIT:92|off-targets:1
CACCAAAGTGTAGGCTCAGG
>TP53_g1|chr17:7674230|+|Doench:0.85|MIT:88|off-targets:0
CGTGCAAGTCACAGACTTGG

Include metadata in FASTA headers for traceability. Filter by Doench >0.5 and MIT >50 before QC.
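
The filter described above can be sketched as follows, assuming headers follow exactly the pipe-delimited layout of the example (ID, locus, strand, then `key:value` fields). The function names are illustrative; design tools export in varying formats, so adapt the parsing to your own files.

```python
from typing import Dict, Iterator, Tuple


def parse_guides(fasta_text: str) -> Iterator[Tuple[Dict[str, str], str]]:
    """Yield (metadata, spacer) pairs from single-line FASTA records."""
    header = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
        elif header is not None:
            fields = header.split("|")
            meta = {"id": fields[0], "locus": fields[1], "strand": fields[2]}
            # Remaining fields look like "Doench:0.78" or "MIT:85".
            for field in fields[3:]:
                key, _, value = field.partition(":")
                meta[key] = value
            yield meta, line.upper()
            header = None


def passes_filters(meta: Dict[str, str], min_doench: float = 0.5,
                   min_mit: float = 50.0) -> bool:
    """Keep guides with Doench > min_doench and MIT > min_mit."""
    return float(meta["Doench"]) > min_doench and float(meta["MIT"]) > min_mit
```

Usage: `kept = [(m, s) for m, s in parse_guides(text) if passes_filters(m)]`.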

Step 3: QC sgRNA Sequences

Screen all sgRNA sequences for quality issues before adding scaffolds.

A. GC Content Analysis

  1. Navigate to GC Content Analyzer (batch mode)
  2. Upload FASTA file with all sgRNA spacer sequences (20 bp only)
  3. Flag sequences: <25% GC (poor activity) or >75% GC (synthesis issues)
  4. Export flagged list, design replacements for critical targets

Impact on Efficiency:

GC 40-60%: optimal activity. GC <25%: significantly reduced cutting efficiency (30-50% in published studies). GC >75%: increased synthesis failure risk (15-25% reported by array vendors). See GC Content Tutorial.
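
For a quick local check before uploading, the GC thresholds above can be applied in a few lines. This is a sketch using this guide's cutoffs (25%, 40%, 60%, 75%); the function and label names are illustrative and are not the GC Content Analyzer's output format.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_flag(seq: str, low: float = 0.25, high: float = 0.75,
            opt_low: float = 0.40, opt_high: float = 0.60) -> str:
    """Classify a spacer by GC content using the thresholds in this guide."""
    gc = gc_fraction(seq)
    if gc < low:
        return "reject_low_gc"
    if gc > high:
        return "reject_high_gc"
    if opt_low <= gc <= opt_high:
        return "optimal"
    return "acceptable"
```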

B. Batch Sequence QC

  1. Go to Batch Sequence QC
  2. Upload FASTA with all sgRNA spacers
  3. Flag homopolymers: GGGG (high dropout risk), TTTT (U6 termination), AAAA (synthesis errors)
  4. Check for palindromes and tandem repeats (synthesis artifacts)

Rejection Criteria:

  • TTTT or longer: Mandatory removal (pol-III termination)
  • GGGG runs: High library dropout risk (≥40-50% observed in NGS QC)
  • Low complexity (<1.5 Shannon entropy): Replace if critical gene
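
The rejection criteria can be approximated locally as shown below. This sketch assumes entropy is measured in bits over single-base frequencies (maximum 2.0 for DNA), which may differ from how the Batch QC tool computes complexity; the issue labels are illustrative.

```python
import math
import re
from collections import Counter
from typing import List

# Runs of four or more A, G, or T, as flagged in the steps above.
_HOMOPOLYMER = re.compile(r"A{4,}|G{4,}|T{4,}")


def homopolymer_runs(seq: str) -> List[str]:
    """Return every A/G/T run of length >= 4."""
    return _HOMOPOLYMER.findall(seq.upper())


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the base composition."""
    counts = Counter(seq.upper())
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def qc_spacer(seq: str, min_entropy: float = 1.5) -> List[str]:
    """List QC issues for one spacer; an empty list means it passed."""
    issues = []
    for run in homopolymer_runs(seq):
        if run[0] == "T":
            issues.append("polyT_termination")
        elif run[0] == "G":
            issues.append("polyG_dropout_risk")
        else:
            issues.append("polyA_run")
    if shannon_entropy(seq) < min_entropy:
        issues.append("low_complexity")
    return issues
```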

C. Secondary Structure Check

  1. Use Secondary Structure Predictor for flagged sequences
  2. Upload sgRNA spacer + scaffold (full 96 bp construct)
  3. Check ΔG < -3 kcal/mol for stable hairpins in spacer region
  4. Prioritize replacement if the hairpin blocks the PAM-proximal seed region (positions 1-12 counting from the PAM, i.e., the 3' end of the spacer)

Hairpin Impact:

Stable hairpins (ΔG < -3 kcal/mol) in seed region (bp 13-20): significant activity reduction (40-60% reported). See Secondary Structure Tutorial.

🚫 Sequences to Remove:

  • Poly-T (TTTT or longer) → causes transcription termination
  • Extreme GC (<20% or >80%)
  • Perfect palindromes → may form hairpins
  • Sequences matching your cloning vector
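
Two of these checks are simple string operations. The sketch below treats a "perfect palindrome" as a sequence equal to its own reverse complement and a "vector match" as an exact occurrence of the spacer on either strand of the vector; both definitions are assumptions made for illustration, and `vector_seq` stands for your own plasmid sequence.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_perfect_palindrome(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


def matches_vector(spacer: str, vector_seq: str) -> bool:
    """True if the spacer occurs exactly on either strand of the vector."""
    spacer, vector_seq = spacer.upper(), vector_seq.upper()
    return spacer in vector_seq or reverse_complement(spacer) in vector_seq
```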

💡 Replacement Strategy:

For each removed guide, design an alternative targeting nearby region (±50 bp). Maintain the same number of guides per gene for balanced representation.

Step 4: Add Scaffold & Cloning Adapters

Assemble full-length oligos with system-specific scaffolds and vector-compatible adapters.

Oligo Structure by Cas9 System:

SpCas9 (lentiCRISPR v2 / pLKO):

CACCG[spacer-20bp]GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTT
Total: 5 + 20 + 76 + 4 = 105 bp

SaCas9 (pX601):

CACCG[spacer-21bp]GTTTAAGAGCTATGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC
Total: 5 + 21 + 86 = 112 bp

CRISPRa (SAM MS2 system):

CACCG[spacer-20bp]GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC
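Total as shown: 5 + 20 + 76 = 101 bp. The sequence above is the standard scaffold and contains no MS2 aptamer; the MS2-modified SAM scaffold is longer (the MS2 aptamer adds about 41 bp), so copy the exact scaffold from your plasmid map.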
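Note: Scaffold sequences are system-specific. Verify each one against your plasmid map before synthesis; in particular, the SaCas9 sequence listed above matches the optimized SpCas9 scaffold (A-U flip plus extended hairpin) rather than the native SaCas9 scaffold in pX601. The 5' adapter (CACCG) creates a BsmBI-compatible overhang for Golden Gate cloning.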

📋 Oligo Assembly in Excel/Python:

  1. Create spreadsheet with columns: Gene, sgRNA_ID, spacer_sequence
  2. Add columns for: 5'-adapter, scaffold, 3'-adapter (constant for all)
  3. Concatenate: =CONCATENATE(adapter_5, spacer, scaffold, adapter_3)
  4. Add barcode column (optional, for deconvolution)
  5. Export as CSV or FASTA for synthesis order
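
The same concatenation in Python, using the SpCas9 layout from this section (CACCG + 20 bp spacer + 76 bp scaffold + TTTT). This is a sketch: the constant and function names are illustrative, and the scaffold should be replaced with the one from your own plasmid map.

```python
ADAPTER_5 = "CACCG"
SCAFFOLD = ("GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTT"
            "GAAAAAGTGGCACCGAGTCGGTGC")
ADAPTER_3 = "TTTT"


def assemble_oligo(spacer: str, adapter_5: str = ADAPTER_5,
                   scaffold: str = SCAFFOLD, adapter_3: str = ADAPTER_3,
                   spacer_len: int = 20) -> str:
    """Concatenate 5' adapter, spacer, scaffold, and 3' adapter."""
    spacer = spacer.upper()
    if len(spacer) != spacer_len:
        raise ValueError(f"expected {spacer_len} bp spacer, got {len(spacer)}")
    if set(spacer) - set("ACGT"):
        raise ValueError("spacer contains non-ACGT characters")
    return adapter_5 + spacer + scaffold + adapter_3


if __name__ == "__main__":
    oligo = assemble_oligo("GGCTATCCTCTCAGAGTGAC")
    print(len(oligo), oligo)
```

Validating length and alphabet at assembly time catches spreadsheet artifacts (stray spaces, truncated cells) before they reach the synthesis order.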

⚠️ Important Considerations:

  • Strand: Order sense strand (same as U6 transcription)
  • Format: Check vendor requirements (single vs. double-stranded)
  • Length limit: Most arrays support up to 230 bp (you'll be ~100-110 bp)
  • Barcoding: Add unique barcode if multiplexing libraries

Step 5: Final QC & Synthesis Order

Perform final quality checks on assembled oligos before placing synthesis order.

Final Checklist:

  • All oligos same length (or within 5 bp if variable)
  • No duplicate sequences (check with Excel or command-line tools)
  • Scaffold sequence is correct for your Cas9 vector
  • 5' and 3' adapters match your cloning strategy
  • Non-targeting controls included (500-1,000 guides)
  • Total library size within budget and experimental capacity
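
The mechanical items on this checklist (length spread, duplicates, adapters) can be scripted. A minimal sketch, assuming oligos are held in a dict of ID to sequence; the report keys are illustrative.

```python
from collections import Counter
from typing import Dict, List


def final_checks(oligos: Dict[str, str], adapter_5: str, adapter_3: str,
                 max_length_spread: int = 5) -> Dict[str, List[str]]:
    """Return problem lists per check; all lists empty means the pool passed."""
    seqs = {oid: s.upper() for oid, s in oligos.items()}
    lengths = [len(s) for s in seqs.values()]
    report: Dict[str, List[str]] = {
        "length_spread": [], "duplicates": [], "bad_adapters": []}
    # Check 1: all oligos within max_length_spread bp of each other.
    if lengths and max(lengths) - min(lengths) > max_length_spread:
        report["length_spread"].append(f"{min(lengths)}-{max(lengths)} bp")
    # Check 2: no sequence appears under more than one ID.
    counts = Counter(seqs.values())
    report["duplicates"] = sorted(
        oid for oid, s in seqs.items() if counts[s] > 1)
    # Check 3: every oligo carries the expected 5' and 3' ends.
    report["bad_adapters"] = sorted(
        oid for oid, s in seqs.items()
        if not (s.startswith(adapter_5) and s.endswith(adapter_3)))
    return report
```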

✅ Synthesis Vendor Options:

Twist Bioscience
  • Scale: 1K-300K oligos
  • Length: 60-300 bp
  • Cost: $0.04-0.08/oligo (scale-dependent)
  • NGS QC: Included (500×)
  • Lead time: 2-3 weeks
Agilent SurePrint
  • Scale: Up to 1M features
  • Length: 60-200 bp
  • Cost: $0.03-0.06/oligo (100K+ pools)
  • QC: Optional NGS add-on
  • Lead time: 3-4 weeks
CustomArray
  • Scale: 12K-2M oligos
  • Length: 40-350 bp
  • Cost: $0.08-0.12/oligo (mid-scale)
  • High uniformity (<3-fold CV)
  • Lead time: 2-3 weeks

* Pricing as of Q4 2025, varies by library size and specifications. Compare vendors: Oligo Pool Synthesis Vendors

🧬 NGS QC Metrics (Post-Synthesis):

Read Depth: 500-1000× per sgRNA (50M-100M reads for 100K library)
Representation: ≥90% of designed sgRNAs detected (≥50 reads)
Dropout Rate: <10% (missing guides with <10 reads)
Skew: <3-fold ratio between the 90th and 10th percentile read counts
Uniformity: Gini coefficient <0.25 (ideal <0.15)
Poor metrics (>20% dropout, >5-fold skew): Redesign failed sequences, re-synthesize. See QC Tutorial.
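
These metrics can be computed from a list of per-guide read counts. The sketch below uses the thresholds stated above (≥50 reads for detection, <10 reads for dropout), a nearest-rank percentile, and the standard sorted-rank formula for the Gini coefficient; vendor QC reports may define each metric slightly differently.

```python
import math
from typing import Dict, List, Sequence


def _percentile(sorted_counts: Sequence[float], q: int) -> float:
    """Nearest-rank percentile (q is an integer 1-100) of pre-sorted data."""
    rank = (q * len(sorted_counts) + 99) // 100  # integer ceiling
    return sorted_counts[max(rank, 1) - 1]


def gini(counts: Sequence[float]) -> float:
    """Gini coefficient: 0 = perfectly uniform, near 1 = highly skewed."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def qc_metrics(counts: Sequence[float], detect_min: int = 50,
               dropout_max: int = 10) -> Dict[str, float]:
    """Representation, dropout, 90/10 skew, and Gini for one library."""
    if not counts:
        raise ValueError("no read counts supplied")
    xs: List[float] = sorted(counts)
    n = len(xs)
    p10, p90 = _percentile(xs, 10), _percentile(xs, 90)
    return {
        "representation": sum(c >= detect_min for c in xs) / n,
        "dropout": sum(c < dropout_max for c in xs) / n,
        "skew_90_10": p90 / p10 if p10 > 0 else math.inf,
        "gini": gini(xs),
    }
```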

📝 File Format for Synthesis Order:

Option 1: FASTA Format (Recommended)

>sgRNA_001|BRCA1|guide1|Doench:0.78
CACCGGGCTATCCTCTCAGAGTGACGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTT
>sgRNA_002|BRCA1|guide2|Doench:0.82
CACCGCACCAAAGTGTAGGCTCAGGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTT

Header format: >unique_ID|gene_name|guide_number|metadata (optional)

Option 2: CSV/TSV Format

oligo_id,gene,guide_number,sequence,length
sgRNA_001,BRCA1,1,CACCGGGCTATCCTCTCAGAGTGAC...,105
sgRNA_002,BRCA1,2,CACCGCACCAAAGTGTAGGCTCAGG...,105

Required columns: oligo_id (unique), sequence (full-length oligo). Optional: gene, metadata.
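
Both formats can be generated from the same records. A minimal sketch using only the Python standard library, assuming each record is a tuple of (oligo_id, gene, guide_number, sequence); the function names are illustrative, and vendor templates may require different column names.

```python
import csv
import io
from typing import List, Tuple

Record = Tuple[str, str, int, str]  # (oligo_id, gene, guide_number, sequence)


def to_fasta(records: List[Record]) -> str:
    """Render records as FASTA with >id|gene|guideN headers."""
    lines = []
    for oligo_id, gene, guide_number, seq in records:
        lines.append(f">{oligo_id}|{gene}|guide{guide_number}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


def to_csv(records: List[Record]) -> str:
    """Render records as CSV with the columns shown above."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["oligo_id", "gene", "guide_number", "sequence", "length"])
    for oligo_id, gene, guide_number, seq in records:
        writer.writerow([oligo_id, gene, guide_number, seq, len(seq)])
    return buf.getvalue()
```

Computing the `length` column from the sequence itself, rather than typing it, keeps the two in sync if the scaffold changes.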

Vendor-specific requirements:
  • Twist: FASTA or CSV, max 300 bp, no special characters in IDs
  • Agilent: Tab-delimited text file, oligo name + sequence columns
  • CustomArray: Excel template provided, FASTA also accepted

Post-Synthesis Steps (Brief):

  1. Resuspend oligo pool in TE buffer
  2. PCR amplify with minimal cycles (6-8 cycles)
  3. Clone into lentiviral vector (Gibson assembly or restriction cloning)
  4. Transform, grow overnight, maxi-prep
  5. NGS verification of library representation
  6. Package lentivirus and transduce target cells

Detailed cloning protocols are beyond the scope of this calculator guide.

Post-Screen Data Analysis

After completing your CRISPR screen, use statistical algorithms to identify significant hits from NGS count data.

📊 Analysis Tools

  • MAGeCK: Most popular, RRA algorithm, gene-level FDR
  • BAGEL: Bayesian approach, best for essentiality screens
  • drugZ: Optimized for drug resistance/synthetic lethality
  • RIGER: Early method, rank-based, still widely used

✅ Hit Calling Thresholds

  • FDR: <0.05 (stringent), <0.25 (permissive for validation)
  • Log2 Fold-Change: |log2FC| >1.5 (depletion/enrichment screens)
  • Guide concordance: ≥3/6 guides significant per gene
  • Effect size: Consider biological relevance, not just p-value

Statistical Considerations:

Read count normalization: Use median/total-count normalization or DESeq2 size factors. Avoid TMM if high dropout.
Replicate requirements: ≥2 biological replicates for MAGeCK. 3-4 replicates for robust hits.
Control guides: Use non-targeting guide distribution to calibrate FDR. Should show log2FC ≈ 0.
Essential gene benchmarking: Check recall of known essential genes (e.g., RPL/RPS genes) — should be ≥80% at FDR <0.05.
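
To make the normalization and control-guide check concrete, here is a simplified sketch using total-count normalization and a pseudocount of 1. It is for illustration only: dedicated tools such as MAGeCK implement their own normalization and statistics, and the function names here are hypothetical.

```python
import math
import statistics
from typing import Dict, Iterable


def normalize_total(counts: Dict[str, float],
                    scale: float = 1_000_000) -> Dict[str, float]:
    """Total-count normalization: rescale so counts sum to `scale`."""
    total = sum(counts.values())
    return {guide: c * scale / total for guide, c in counts.items()}


def log2_fold_change(before: Dict[str, float], after: Dict[str, float],
                     pseudocount: float = 1.0) -> Dict[str, float]:
    """Per-guide log2(after / before), with a pseudocount to avoid log(0)."""
    return {g: math.log2((after[g] + pseudocount) / (before[g] + pseudocount))
            for g in before}


def control_median_lfc(lfc: Dict[str, float],
                       control_ids: Iterable[str]) -> float:
    """Median log2FC of non-targeting controls; expected to be near 0."""
    return statistics.median(lfc[g] for g in control_ids)
```

A control median far from zero suggests a normalization problem or a bottleneck during the screen, and is worth resolving before calling hits.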

Best Practices

🎯 Design Redundancy

Always include 4-6 guides per gene. Statistical power for hit calling requires multiple independent guides showing consistent phenotypes.

🧪 Include Controls

1,000 non-targeting guides (negative control) + essential gene guides (positive control, e.g., POLR2A, RPA3) for quality assessment.

📊 Plan Sequencing Depth

Aim for 500-1000x coverage per sgRNA in NGS QC. For 100K library, that's 50-100 million reads. Budget for deep sequencing.

🔬 Validate Library Quality

ALWAYS perform NGS on final plasmid library before virus production. Check for representation, dropouts, and skew. Aim for <10% dropout.

Troubleshooting Library Issues

Problem: High dropout rate (>20% of designed guides)

Diagnose by dropout pattern:

  • 20-40% dropout, random: Synthesis failure. Check for poly-T (TTTT+), extreme GC (<25%, >75%), or secondary structures. Redesign flagged sequences.
  • 20-30% dropout, sequence-specific: PCR amplification bias. Reduce cycles from 12→6-8, use KAPA HiFi or Q5 polymerase.
  • >40% dropout: Transformation bottleneck. Need ≥10× library complexity as CFUs (e.g., 100K library = 1M colonies minimum). Pool 5-10 transformations.
  • Dropout enriched in high-GC guides: PCR bias during amplification. Switch to long-extension polymerase (1 min/kb).
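
The colony requirement in the transformation bullet is simple arithmetic. In this sketch the 10× fold coverage comes from the text above, while `cfu_per_transformation` is a value you would measure from your own control transformation, not a figure from this guide.

```python
import math


def required_colonies(library_size: int, fold_coverage: int = 10) -> int:
    """Minimum colonies = library complexity x fold coverage."""
    return library_size * fold_coverage


def transformations_needed(colonies: int, cfu_per_transformation: int) -> int:
    """Number of transformations to pool, rounded up."""
    return math.ceil(colonies / cfu_per_transformation)


if __name__ == "__main__":
    need = required_colonies(100_000)  # 100K library at 10x
    # 200,000 CFU per transformation is a hypothetical measured yield.
    print(need, transformations_needed(need, 200_000))
```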

Problem: Skewed representation (CV >5-fold)

Optimize PCR amplification:

  • Reduce cycles: Start with 6 cycles, increase only if yield <1 µg. Each extra cycle progressively increases skew (typically ~10-15% additional bias observed).
  • Template amount: Use 10-100 ng input DNA. Too low (<1 ng) = stochastic sampling. Too high (>500 ng) = incomplete denaturation.
  • Polymerase choice: KAPA HiFi (lowest bias), Q5 (good), Phusion (moderate bias). Avoid Taq (extreme GC bias).
  • Alternative: Emulsion PCR or linear amplification (IVT) for highly uniform libraries.

Problem: Low cloning efficiency (<10^5 CFU/µg)

Systematic optimization:

  • Vector prep: High-quality maxi-prep, A260/280 = 1.8-1.9. Digest overnight, gel-purify, dephosphorylate (CIP or rSAP).
  • Vector:insert ratio: Start at a 1:5 molar ratio (insert in excess), optimize to 1:10 or 1:20 if efficiency is low.
  • Competent cells: Use ≥10^9 CFU/µg cells (Endura, Stbl3, or ElectroMAX). Test efficiency with control plasmid before library.
  • Transformation scale: For 100K library, perform 10-20 transformations of 10 µL cells each. Pool after recovery.
  • Recovery: 1 hour at 37°C in SOC medium before plating. Do NOT exceed 1.5 hours (some guides may replicate faster).

Problem: Sequences present in input pool but lost after cloning

Likely toxic or unstable in E. coli:

  • Cryptic promoters: Some sgRNA sequences may form bacterial promoters. Use recombination-deficient strains (Stbl3).
  • Palindromes: Inverted repeats can trigger recombination. These should have been flagged in Step 3 QC.
  • Solution: Clone in low-copy plasmid (pSC101 origin) or switch to yeast-based library construction (more stable but complex).

Workflow Summary

1️⃣ Coverage Calculation
2️⃣ sgRNA Design
3️⃣ Sequence QC
4️⃣ Add Scaffolds
5️⃣ Order & Validate

⚡ Quick Reference: CRISPR Library Design Parameters

For experienced users — critical thresholds at a glance.

Parameter | Optimal Range | Reject If | Tool
sgRNA Length | 20 bp (SpCas9) | <17 bp or >24 bp | Design software
GC Content | 40-60% | <25% or >75% | GC Analyzer
Activity Score | Doench >0.5, Azimuth >50 | Doench <0.3 | Benchling, GPP
Off-Target CFD | MIT score >50 | CFD aggregate >0.20 | CRISPick
Poly-T Runs | 0 (absent) | ≥TTTT (U6 termination) | Batch QC
Hairpin ΔG | >-2 kcal/mol | <-3 kcal/mol in seed | Structure Predictor
Guides per Gene | 4-6 (knockout), 10 (CRISPRa) | <3 (low power) | Coverage Calculator
Non-Targeting Controls | 500-1,000 guides | <100 | Manual design
Oligo Length | 100-110 bp (total) | >230 bp (array limit) | Spreadsheet
NGS Coverage (QC) | 500-1000× per guide | <100× (insufficient) | Vendor QC report
Library Dropout | <10% | >20% (redesign needed) | NGS analysis
Representation Skew | <3-fold CV | >5-fold (PCR bias) | NGS analysis