How to Perform Next-Generation Sequencing

Next-Generation Sequencing (NGS) has revolutionized genetic research by enabling rapid, high-throughput sequencing of DNA and RNA. This technology has become an indispensable tool in fields ranging from genomics and transcriptomics to clinical diagnostics and personalized medicine. Performing NGS involves a series of carefully orchestrated steps, from sample preparation to data analysis. This article provides a comprehensive guide on how to perform next-generation sequencing effectively.

Understanding Next-Generation Sequencing

Before diving into the practical steps, it is important to understand what NGS entails. Unlike traditional Sanger sequencing, which sequences one DNA fragment at a time, NGS allows millions of fragments to be sequenced simultaneously. This massively parallel approach generates vast amounts of data in a relatively short time and at a lower cost per base.

NGS platforms commonly include Illumina, Ion Torrent, PacBio, and Oxford Nanopore Technologies. Each platform has its unique chemistry and workflow but shares common core principles: library preparation, clonal amplification, sequencing, and data analysis.

Step 1: Experimental Design and Sample Preparation

Define Your Research Question

The first step is to clearly define the purpose of your sequencing project. Are you performing whole-genome sequencing (WGS), whole-exome sequencing (WES), targeted gene panel sequencing, or RNA sequencing (RNA-seq)? The type of study will influence your choice of platform, depth of coverage, and library preparation method.

Sample Collection and Quality Control

Collect high-quality biological samples relevant to your study (blood, tissue biopsies, cultured cells). DNA or RNA integrity is critical; degraded samples will result in poor sequencing results.

Use spectrophotometry (e.g., NanoDrop) and fluorometry (e.g., Qubit) to assess nucleic acid concentration and purity. Agarose gel electrophoresis or Bioanalyzer systems help evaluate nucleic acid integrity.

Nucleic Acid Extraction

Extract DNA or RNA using protocols optimized for your sample type. Commercial kits offer streamlined workflows that minimize contamination and maximize yield. For RNA samples, include DNase treatment to remove residual DNA.

Step 2: Library Preparation

Library preparation converts extracted nucleic acids into a format compatible with the sequencing platform.

Fragmentation

Genomic DNA or cDNA must be fragmented into smaller pieces suitable for sequencing — usually between 200-600 base pairs for short-read platforms like Illumina.

Fragmentation can be done mechanically (sonication or nebulization) or enzymatically using transposases or restriction enzymes.

End Repair and A-Tailing

Fragmented DNA ends are enzymatically repaired to create blunt ends with 5’ phosphate groups. Next, an ‘A’ nucleotide is added to the 3’ ends (A-tailing) to facilitate adapter ligation.

Adapter Ligation

Adapters are short, double-stranded oligonucleotides that contain platform-specific sequences required for clonal amplification and sequencing priming sites.

Ligate adapters to both ends of each DNA fragment. These adapters often include barcodes (indexes) if you plan to multiplex multiple samples in the same run.

Size Selection

Size selection enriches for library fragments within a desired size range to optimize cluster generation and sequencing efficiency. This can be performed using gel electrophoresis or magnetic bead-based methods such as SPRI beads.

Library Amplification and Quantification

Perform PCR amplification using primers complementary to adapter sequences to increase the amount of library material.

Quantify the final library concentration using qPCR or fluorometric assays. Assess library quality via Bioanalyzer or TapeStation systems to verify fragment size distribution.

Step 3: Clonal Amplification

Once the library is prepared, it must be clonally amplified to generate sufficient signal during sequencing. Different platforms use distinct methods:

Illumina: Bridge amplification on a flow cell surface creates clusters of identical DNA molecules.
Ion Torrent: Emulsion PCR amplifies library fragments onto beads.
PacBio/Oxford Nanopore: Single-molecule real-time sequencing eliminates the need for amplification but requires higher input amounts.

Ensure optimal loading concentrations based on your platform’s guidelines to avoid over- or under-clustering which affects data quality.

Step 4: Sequencing Run

Load the amplified library onto the sequencer according to manufacturer instructions. Initiate the run using appropriate software settings specifying read length, number of reads, paired-end or single-end runs, etc.

Modern sequencers automate base calling in real-time but monitoring run metrics is crucial. Parameters like cluster density, phasing/pre-phasing rates (Illumina), and quality scores provide insight into run performance.

Typical NGS runs can last from several hours (targeted panels) up to days (whole genomes).

Step 5: Data Analysis Pipeline

Raw data generated by sequencers initially come as image files or electrical signals depending on platform chemistry. These are converted into sequence reads through base calling algorithms resulting in FASTQ files containing sequence information and quality scores.

Quality Control of Raw Reads

Use tools like FastQC or MultiQC to assess read quality metrics including per-base sequence quality, GC content, adapter contamination, duplications, etc.

Trim adapters and low-quality bases using software such as Trimmomatic or Cutadapt.

Alignment to Reference Genome

Map cleaned reads against a reference genome with aligners like BWA, Bowtie2 (short reads), or Minimap2 (long reads). Proper alignment is critical for downstream variant detection or expression quantification.

Post-Alignment Processing

Mark duplicates using Picard tools. Perform local realignment around indels and base quality score recalibration if necessary (GATK pipeline).

Variant Calling / Quantification

Identify genetic variants (SNPs, indels) with variant callers such as GATK HaplotypeCaller or FreeBayes.

For RNA-seq studies, quantify gene expression levels with tools like featureCounts or Salmon followed by differential expression analysis using DESeq2 or edgeR.

Annotation and Interpretation

Annotate variants with databases like dbSNP, ClinVar for clinical relevance using tools such as ANNOVAR or SnpEff.

Integrate omics data with biological pathways for comprehensive interpretation utilizing software like Ingenuity Pathway Analysis or Cytoscape.

Best Practices and Tips for Successful NGS

Sample Quality: Always start with high-quality samples; garbage in leads to garbage out.
Avoid Contamination: Use clean reagents and workspaces dedicated for pre- and post-PCR steps.
Library QC: Routinely verify library size distribution and concentration.
Multiplexing: Use unique indexes/barcodes to avoid sample misidentification.
Run Controls: Include positive controls and replicates when possible.
Data Backup: Store raw data securely; reanalysis may be needed as methods evolve.
Stay Updated: Follow advances in sequencing chemistries and bioinformatics tools continuously improving results accuracy.

Conclusion

Next-generation sequencing is a powerful methodology that requires careful planning from sample collection through data analysis. By following structured protocols tailored to your research objectives and platform specifications, you can generate high-quality genomic data enabling novel biological insights. While NGS technologies continue evolving rapidly towards longer reads, higher accuracy, and lower costs; mastering these foundational steps remains essential for successful implementation in any laboratory setting.