

10.1 Copy Number Analysis Overview

Copy Number Variation

A normal autosomal base pair has two copies, one on each of the two homologous chromosomes. (In males, a base pair on the X chromosome will normally have only one copy, except in the pseudoautosomal regions.) Even if the two copies carry different alleles, they are still considered to be two copies.

However, under certain circumstances, especially in certain diseases, a base pair, or even an entire chromosome, may be present in more than two copies, present in only one copy, or deleted entirely. The number of copies of a base pair is termed its “copy number”, and variation in copy number is termed “copy number variation” (CNV).

For both microarray and array CGH (aCGH) scans, the more copies of a base pair are present, the higher the total intensity will be, regardless of which alleles are present and even if the base pair is a polymorphism. Considerable processing is typically needed to transform the intensity data into a quantile-normalized log base-2 (log2) ratio of the observed intensities to those of a reference population. When an observation's intensity equals the reference population median for a given base pair, the log2 ratio is zero. Amplifications relative to the reference produce log2 ratios significantly greater than zero, and deletions produce log2 ratios significantly less than zero.
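To make the log2 ratio concrete, here is a minimal sketch, assuming intensities are already summarized to one value per marker per sample. It is not how SVS or CNAM processes data internally; the function names `quantile_normalize`, `log2_ratios`, and `expected_log2_ratio` are hypothetical. It quantile-normalizes a set of samples, computes each marker's log2 ratio against the reference-population median, and shows the idealized ratios for copy numbers 0 through 4 against a diploid reference (one copy gives log2(1/2) = -1, three copies give log2(3/2) ≈ 0.585).

```python
import math
import statistics


def quantile_normalize(samples):
    """Force every sample to share the same intensity distribution.

    samples: list of samples, each a list of raw intensities over the same markers.
    Ties are broken by marker order (a simplification).
    """
    n_markers = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Mean intensity at each rank across samples defines the common target distribution.
    rank_means = [statistics.fmean(s[r] for s in sorted_samples) for r in range(n_markers)]
    normalized = []
    for s in samples:
        # Give each marker the target value for the rank it holds in its own sample.
        order = sorted(range(n_markers), key=lambda i: s[i])
        out = [0.0] * n_markers
        for rank, marker in enumerate(order):
            out[marker] = rank_means[rank]
        normalized.append(out)
    return normalized


def log2_ratios(sample, reference_samples):
    """log2 of each marker's intensity over the reference-population median at that marker."""
    ratios = []
    for m, value in enumerate(sample):
        ref_median = statistics.median(ref[m] for ref in reference_samples)
        ratios.append(math.log2(value / ref_median))
    return ratios


def expected_log2_ratio(copies, reference_copies=2):
    """Idealized log2 ratio for a given copy number against a diploid reference."""
    if copies == 0:
        return float("-inf")  # complete deletion: no signal in theory
    return math.log2(copies / reference_copies)


if __name__ == "__main__":
    for c in range(5):
        print(c, round(expected_log2_ratio(c), 3))
```

Measured ratios are usually smaller in magnitude than these idealized values because of background signal and noise, which is one reason segmentation and statistical testing are needed rather than simple thresholding.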

Copy Number Analysis Module (CNAM)

CNAM supports reading microarray and aCGH log2 ratio data from the Affymetrix, Agilent, Illumina, and NimbleGen platforms, as well as processing Affymetrix CEL files to generate log2 ratios, with the objective of determining where CNVs occur. Log2 ratio data from other platforms can also be analyzed by CNAM after first converting it to the Affymetrix CNT text file format (see Affymetrix CNT File Format). Association analysis can then be performed either directly on the log2 ratios or on covariates derived from them over the CNV regions found.
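CNAM's own optimal segmenting algorithm is not described in this overview. To illustrate the general idea of locating CNV boundaries in a run of log2 ratios ordered along a chromosome, the sketch below uses a different, generic technique: penalized least-squares segmentation solved exactly by dynamic programming. The functions `segment` and `segment_means` and the `penalty` parameter are hypothetical; a larger penalty yields fewer, longer segments.

```python
def segment(ratios, penalty):
    """Penalized least-squares segmentation by dynamic programming (O(n^2)).

    Each segment costs the sum of squared deviations from its own mean, plus `penalty`.
    Returns half-open (start, end) index ranges that cover `ratios` in order.
    """
    n = len(ratios)
    # Prefix sums let any segment's squared-error cost be evaluated in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(ratios):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(i, j):
        """Sum of squared deviations from the mean for ratios[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # best[j] is the minimal total cost of segmenting ratios[:j];
    # prev[j] is where the last segment of that optimal solution starts.
    best = [0.0] + [float("inf")] * n
    prev = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + cost(i, j) + penalty
            if c < best[j]:
                best[j] = c
                prev[j] = i

    # Walk back through the stored breakpoints to recover the segments.
    segments = []
    j = n
    while j > 0:
        i = prev[j]
        segments.append((i, j))
        j = i
    return segments[::-1]


def segment_means(ratios, segments):
    """Mean log2 ratio within each segment."""
    return [sum(ratios[i:j]) / (j - i) for i, j in segments]
```

For example, `segment([0.0] * 5 + [-1.0] * 5 + [0.0] * 5, penalty=1.0)` splits the input into three segments, the middle one having a mean log2 ratio of -1, which corresponds to an idealized single-copy deletion.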

An abbreviated workflow for CNV association analysis is as follows:

  1. Import and/or prepare log2 ratio data.
  2. Perform quality assurance on log2 ratios.
  3. Execute CNAM optimal segmenting on the “cleaned” log2 ratios.
  4. Create a new covariate spreadsheet containing, for each sample, the average log2 ratio over each CNV region found.
  5. Import a spreadsheet with phenotypic data such as case-control status.
  6. Join the log2 ratio covariate spreadsheet with the spreadsheet containing phenotypic data.
  7. Perform association analysis between a phenotype and the covariates in the joined spreadsheet.
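Steps 4, 6, and 7 of the workflow above can be sketched outside SVS as follows. This is an illustration only, not the SVS spreadsheet interface: the dictionaries stand in for spreadsheets keyed by sample ID, the names `region_covariates`, `join`, and `permutation_test` are hypothetical, and the permutation test on the case-control difference in means is a swapped-in example of an association test rather than the test SVS performs.

```python
import random
import statistics


def region_covariates(sample_ratios, regions):
    """Step 4: average log2 ratio for each sample over each CNV region.

    sample_ratios: {sample_id: [log2 ratio per marker, in genomic order]}
    regions: list of half-open (start, end) marker index ranges.
    """
    return {
        sid: [statistics.fmean(ratios[i:j]) for i, j in regions]
        for sid, ratios in sample_ratios.items()
    }


def join(covariates, phenotypes):
    """Step 6: inner join on sample ID; samples missing from either side are dropped."""
    return {sid: (covariates[sid], phenotypes[sid]) for sid in covariates if sid in phenotypes}


def permutation_test(values, is_case, n_perm=10_000, seed=0):
    """Step 7: two-sided permutation p-value for the case-control difference in means.

    Both groups must be non-empty; shuffling labels preserves the group sizes.
    """
    def mean_diff(labels):
        cases = [v for v, c in zip(values, labels) if c]
        controls = [v for v, c in zip(values, labels) if not c]
        return statistics.fmean(cases) - statistics.fmean(controls)

    observed = abs(mean_diff(is_case))
    rng = random.Random(seed)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if abs(mean_diff(labels)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

In use, the covariate for one region is pulled from each joined record, for example `values = [cov[0] for cov, _ in joined.values()]` and `labels = [case for _, case in joined.values()]`, and passed to `permutation_test`. With many CNV regions, the resulting p-values would also need correction for multiple testing.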