bioinformatics

 

Will Terzaghi's Homepage




 
 

Week 12 lecture

Microarrays

Technique for studying multiple genes simultaneously.

Uses include

  • Studying expression of many genes in the same tissue at the same time
  • Comparing gene expression between tissues
  • Comparing gene expression between treatments
  • Identifying and studying genetic polymorphisms
  • Comparing responses of different organisms

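Basic approach: a "reverse Northern" (the unlabeled gene sequences are fixed to the solid support, and the labeled RNA is applied in solution)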

for an excellent animation see http://www.bio.davidson.edu/courses/genomics/chip/chip.html

  • Attach clones of the genes you wish to study to a solid support (usually a glass slide)
    • each clone is placed at a precise position within a grid
    • clones can be anything from short oligonucleotides to full length cDNAs
    • each of these clones is called a probe
      • (note that there is confusion in the literature on this point, since in techniques such as Southern blots we call the labeled nucleic acid the probe)
  • Extract RNA from the organisms you are studying (or DNA if you are studying DNA polymorphisms) then "label" it by covalently attaching fluorescent dyes to it.
    • This is usually done by making cDNA copies of the mRNA with reverse transcriptase and either adding fluorescent dyes to the primers or using deoxyribonucleotide triphosphates that have fluorescent dyes attached to them.
    • Alternatively, RNA can be labeled directly (this is how we do it in my lab)
    • The labeled nucleic acid is called the "target."
  • Usually two different tissues or treatments are compared
    • RNA from one is labeled with a dye that fluoresces green
    • RNA from the other is labeled with a dye that fluoresces red
    • To control for differences in labeling between the two dyes, we usually prepare both a green-labeled and a red-labeled version of each sample
  • Hybridize with labeled targets:
    • place a solution containing equal amounts of the labeled RNAs (or cDNA) on the array
    • incubate at a suitable temperature long enough for each target to find and anneal to a cloned gene attached to the grid.
    • wash off targets which did not stick (or stuck to the wrong gene)
  • Measure amount of target annealed to each probe
    • perform two sequential scans
      • first scan the array by irradiating with a wavelength that excites the green dye and take a digital picture of the array
      • then scan the array by irradiating with a wavelength that excites the red dye and take a digital picture of the array
    • measure the intensity of each spot = amount of target that annealed to the gene at that position
    • superimpose the red and green pictures to look for differences in gene expression between the two samples
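The superimposition step can be sketched in Python. This is a minimal illustration, assuming each spot already has a background-corrected intensity from the green scan and from the red scan. The two-fold cutoff for calling a spot red or green is an assumed threshold; spots between the cutoffs show up yellow (red plus green) in the overlay.

```python
import math


def classify_spot(green: float, red: float, fold_cutoff: float = 2.0) -> str:
    """Color of one spot in the superimposed red/green image.

    Assumes both intensities are background-corrected and positive.
    'red'    -> red-labeled sample at least fold_cutoff times brighter
    'green'  -> green-labeled sample at least fold_cutoff times brighter
    'yellow' -> similar signal in both channels
    """
    if green <= 0 or red <= 0:
        raise ValueError("intensities must be positive")
    log_ratio = math.log2(red / green)
    threshold = math.log2(fold_cutoff)
    if log_ratio >= threshold:
        return "red"
    if log_ratio <= -threshold:
        return "green"
    return "yellow"


def overlay(green_scan, red_scan, fold_cutoff=2.0):
    """Superimpose two scans given as equal-sized grids (lists of rows)."""
    return [
        [classify_spot(g, r, fold_cutoff) for g, r in zip(g_row, r_row)]
        for g_row, r_row in zip(green_scan, red_scan)
    ]


green = [[1000.0, 250.0], [400.0, 800.0]]
red = [[1100.0, 1000.0], [100.0, 900.0]]
print(overlay(green, red))  # [['yellow', 'red'], ['green', 'yellow']]
```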

  • Result is an enormous number of spots of varying colors and intensities
    • computers are needed to make sense of it
  • A small part of a human DNA chip that Jennifer Lewis prepared in my lab last year:

[Figure: green scan]

[Figure: red scan]

[Figure: superimposed image]

Two basic approaches

1) Oligonucleotide chips: short single-stranded DNA fragments placed at designated locations

Affymetrix arrays are made in situ by photolithography

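  • Use light to activate (deprotect) the 5’ OH groups at selected sites: masks choose which sites are exposed
  • Add the chosen base, whose own 5’ OH is protected by a photoreactive group, so that it couples only at the activated sites
  • Repeat with a new mask until all the desired oligonucleotides have been made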
  • Computer controls the whole process
  • Can make arrays smaller than a dime with 65536 separate oligonucleotides
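The mask-by-mask logic can be simulated in a few lines. This is a toy model, not Affymetrix's actual process: it assumes the synthesizer cycles through A, C, G, T in a fixed order and that each mask exposes exactly the sites whose next required base is the one being added. Under that assumption every cycle extends each unfinished oligonucleotide by at least one base, so oligonucleotides of length L need at most 4L masks.

```python
def photolithographic_synthesis(targets):
    """Toy simulation of masked, light-directed oligonucleotide synthesis.

    targets: one DNA string per array site, written in the order the bases
    are added. Returns (grown sequences, masks), where each mask is a
    (base, set of exposed site indices) pair.
    """
    grown = [""] * len(targets)
    masks = []
    while any(len(g) < len(t) for g, t in zip(grown, targets)):
        for base in "ACGT":
            # Light deprotects only the sites that need this base next.
            exposed = {
                i for i, (g, t) in enumerate(zip(grown, targets))
                if len(g) < len(t) and t[len(g)] == base
            }
            if not exposed:
                continue  # no mask needed for this base in this cycle
            for i in exposed:
                grown[i] += base  # base couples only where deprotected
            masks.append((base, exposed))
    return grown, masks


sites = ["ACGT", "AAGG", "TTCA"]
grown, masks = photolithographic_synthesis(sites)
print(grown == sites, len(masks))
```

Skipping bases that no site needs is what keeps the mask count below the 4L worst case.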

Lee Hood's group makes the oligonucleotides first, then deposits them on the glass slide with a high precision ink-jet printer.

Advantages of oligonucleotide chips

  • Large numbers of defined sequences made to order
    • can make diagnostic chips that can detect every different allele of a particular gene
      • can detect genetic disorders and tell which version a patient has
  • Low annealing & wash temperatures: easier manipulations
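To illustrate how a diagnostic chip can cover every allele at a site, the sketch below builds one probe for each possible base at a variable position and reports which probes a sample matches. It is hypothetical: exact string matching stands in for a strong hybridization signal, the sample is written on the same strand as the probes, and the probe length (11 bases) and sequences are made up.

```python
def allele_probes(reference: str, position: int, flank: int = 5) -> dict:
    """One probe per base (A, C, G, T) at `position` of `reference`.

    Each probe carries `flank` bases on either side of the variable
    position, so probes are 2 * flank + 1 bases long.
    """
    if position - flank < 0 or position + flank >= len(reference):
        raise ValueError("not enough flanking sequence")
    left = reference[position - flank:position]
    right = reference[position + 1:position + flank + 1]
    return {base: left + base + right for base in "ACGT"}


def genotype(sample: str, probes: dict) -> list:
    """Bases whose probe occurs exactly in the sample (sorted).

    A heterozygote would match two probes.
    """
    return sorted(base for base, probe in probes.items() if probe in sample)


reference = "GATTACAGGCTTAC"              # made-up sequence; G at index 7
probes = allele_probes(reference, 7)
mutant = reference[:7] + "A" + reference[8:]
print(genotype(reference, probes), genotype(mutant, probes))
```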

Disadvantages

  • Which sequences to make?
  • With sequences that short, one must be careful that all of them have the same melting temperature
    • conditions can't be as stringent (=selective) as with longer sequences
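One common rule of thumb for short oligonucleotides is the Wallace rule: Tm is roughly 2 °C for each A or T plus 4 °C for each G or C. The sketch below uses it to flag candidate probes whose estimated Tm falls outside a chosen window. The rule ignores salt concentration and neighboring-base effects, and the target Tm, tolerance, and probe sequences are illustrative assumptions.

```python
def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C): 2 * (A + T) + 4 * (G + C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence must contain only A, C, G, T")
    return 2 * at + 4 * gc


def flag_outliers(probes, target_tm: float, tolerance: float = 2.0):
    """Probes whose estimated Tm differs from target_tm by more than tolerance."""
    return [p for p in probes if abs(wallace_tm(p) - target_tm) > tolerance]


probes = ["ATGCATGCAT", "GCGCGCGCGC", "ATATATATAT", "AGCTAGCTAG"]
for p in probes:
    print(p, wallace_tm(p))
print(flag_outliers(probes, target_tm=30))
```

Probes of equal length can still differ by 20 °C here, which is why GC content, not just length, has to be matched.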

2) DNA libraries (microarrays)

  • Usually spot cDNA libraries (= collections of recombinant clones prepared by extracting mRNA from a tissue, making a cDNA copy of each mRNA using reverse transcriptase, then cloning each of these cDNA copies into a separate vector)
  • Advantages
    • Don't need to have sequence of each gene
      • can prepare arrays by constructing cDNA libraries from the selected organism
    • Can perform hybridizations and washes at higher stringency
  • Disadvantages:
    • Must prepare and maintain libraries
    • Must extract and quantify DNA from each clone
    • Must deposit equal amounts of DNA from each clone at precise location on a slide
    • hard to distinguish members of multi-gene families
  • Spotting techniques
    • Ink-jets: technique used by Incyte
    • Mechanical spotting: most common approach, developed in Pat Brown's lab at Stanford

Microarrays generate an enormous amount of data very quickly!

Need computers to make sense of it!

First question: are the data any good?

  • This is not a trivial matter, because errors can creep in at many places!
  • Sources of error:

1. Treatments

  • Experimental conditions
  • Tissue preparation

2. Targets
  • RNA isolation
    • use identical amounts of tissue
    • identical extraction methods
    • minimum number of steps
    • normalize [RNA]
  • labeling
    • measure incorporation of label and normalize samples to same concentration
    • add same amount of label to each hybridization
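Normalizing [RNA] is simple arithmetic once absorbance has been measured. A minimal sketch, assuming spectrophotometer readings at 260 and 280 nm and the standard conversion that an A260 of 1.0 (1 cm path) corresponds to about 40 µg/mL of RNA. The 1.8 purity cutoff, the 20 µg load, the dilution, and the readings are illustrative.

```python
RNA_UG_PER_ML_PER_A260 = 40.0  # ~40 ug/mL RNA gives A260 = 1.0 (1 cm path)


def rna_concentration(a260: float, dilution: float = 1.0) -> float:
    """RNA concentration (ug/mL) of the undiluted sample."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution


def purity_ok(a260: float, a280: float, min_ratio: float = 1.8) -> bool:
    """Crude contamination check: pure RNA gives A260/A280 of about 2.0."""
    return a260 / a280 >= min_ratio


def volume_for_mass(conc_ug_per_ml: float, mass_ug: float) -> float:
    """Volume (uL) of sample that contains mass_ug of RNA."""
    return mass_ug / conc_ug_per_ml * 1000.0


# Made-up readings for two samples, each measured at a 1:50 dilution.
samples = {"control": (0.25, 0.125), "treated": (0.20, 0.125)}
for name, (a260, a280) in samples.items():
    conc = rna_concentration(a260, dilution=50)
    print(f"{name}: {conc:.0f} ug/mL, purity ok: {purity_ok(a260, a280)}, "
          f"{volume_for_mass(conc, 20.0):.1f} uL holds 20 ug")
```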

3. Arrays

  • Quality of probes
    • DNA must be of same quality
    • DNA must be of same concentration so add same amount to each spot
  • Uniformity of spotting
    • must deposit same volume in each spot
    • spots must be same shape!
  • arrays must all be treated and stored in the same way

4. Hybridization and washing

  • ensuring hybridization goes to completion: time needed varies according to transcript abundance
  • ensuring adequate washing (especially when dealing with multigene families: making sure that targets that bound to a close relative instead of the correct probe have been washed off)

5. Data acquisition

  • Image acquisition
    • usually by CCD camera
      • just like film, their response is linear only within a certain range: they are saturated by high signals and miss rare (weak) ones
      • creates a TIFF file that is then analyzed by other software

  • Spot and background detection
    • software aligns spots when superimposing the images from the red and green scans
      • creates grid used to quantitate signal from each target

      • software must recognize & integrate irregular spots & subtract background
      • must pick areas outside spots for background
        • We used Scanalyze (a program written by Mike Eisen) to identify & superimpose spots; many others (e.g. GenePix, Spotfinder) are available
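The spot-and-background step can be sketched as follows. This is a simplified stand-in, not the algorithm of Scanalyze or any other package: it assumes a circular spot with known center and radius, takes the median of a ring of pixels outside the spot as local background, and flags spots containing pixels at 65535, which assumes the scan is a 16-bit TIFF (so 65535 is the saturation value). The radii and image are illustrative.

```python
from statistics import mean, median

SATURATED = 65535  # assumed maximum pixel value of a 16-bit TIFF


def quantify_spot(image, cx, cy, spot_r, bg_inner, bg_outer):
    """Background-subtracted mean intensity of one circular spot.

    image: 2-D list of pixel intensities (rows of columns).
    Pixels within spot_r of (cx, cy) form the spot; pixels at a distance
    between bg_inner and bg_outer form the local background ring.
    Returns (signal, is_saturated); negative signals are clipped to 0.
    """
    spot, ring = [], []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            if d2 <= spot_r ** 2:
                spot.append(value)
            elif bg_inner ** 2 <= d2 <= bg_outer ** 2:
                ring.append(value)
    if not spot or not ring:
        raise ValueError("spot or background region is empty")
    background = median(ring)  # median resists stray bright pixels in the ring
    signal = mean(spot) - background
    return max(signal, 0.0), any(v >= SATURATED for v in spot)


image = [[100] * 7 for _ in range(7)]          # flat background of 100
for x, y in [(3, 3), (2, 3), (4, 3), (3, 2), (3, 4)]:
    image[y][x] = 1100                           # a small spot of 1100
print(quantify_spot(image, cx=3, cy=3, spot_r=1, bg_inner=2, bg_outer=3))
```

Taking background from a ring around each spot, rather than one value for the whole slide, follows the point that background varies across a microarray.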

Since there are so many potential sources of error, developing objective measures for determining the reliability of microarray data is a major research priority!

So far, there are no absolute measures, but there are some general rules

  • check the quality of your RNA
    • contaminants (A260/A280)
    • no degradation (look for crisp bands of ribosomal RNA on denaturing gels)
  • use lots of internal controls
    • perform reciprocal hybridizations
      • green control targets vs red treatment targets on one chip
      • red control targets vs green treatment targets on another
    • use duplicate or triplicate grids
      • average the signals from the two or three replicate spots
    • spike the arrays with positive controls
      • probes that should give strong signals because their targets are highly abundant
    • spike the arrays with negative controls
      • probes that should not hybridize to any of the targets (e.g. chlorophyll synthesis enzymes in humans)
  • Be very careful with the data acquisition and analysis
    • checking that the camera settings are appropriate
    • checking that background settings are appropriate for each spot
      • background varies in different parts of a microarray
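Combining a reciprocal (dye-swap) pair with replicate spots is mostly bookkeeping on log ratios. A minimal sketch, assuming each chip reports background-corrected red and green intensities for each replicate spot of a gene. On the swapped chip the treatment is green, so its log ratio is negated before averaging. Averaging in log space is one reasonable choice, not the only one, and the intensities are made up.

```python
import math
from statistics import mean


def log_ratio(red: float, green: float) -> float:
    """log2(red / green) for one spot (intensities assumed background-corrected)."""
    return math.log2(red / green)


def dye_swap_estimate(forward_spots, reverse_spots):
    """Average log2(treatment / control) over a reciprocal pair of chips.

    forward_spots: (red, green) per replicate spot; treatment labeled red.
    reverse_spots: (red, green) per replicate spot; treatment labeled green.
    """
    forward = [log_ratio(r, g) for r, g in forward_spots]
    # On the reciprocal chip the control is red, so flip the sign.
    reverse = [-log_ratio(r, g) for r, g in reverse_spots]
    return mean(forward + reverse)


# One gene, spotted in duplicate on each chip (made-up intensities).
chip1 = [(800.0, 200.0), (900.0, 220.0)]  # green control vs red treatment
chip2 = [(210.0, 760.0), (190.0, 820.0)]  # red control vs green treatment
fold = 2 ** dye_swap_estimate(chip1, chip2)
print(f"estimated treatment/control fold change: {fold:.2f}")
```

If one dye is systematically brighter by a constant factor, that bias adds to the forward log ratios and subtracts from the reverse ones, so it cancels in the average.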

developing software that can detect (and correct) sources of error in microarray experiments is a major research objective!

Normalizing signals from microarrays

raw signals from each spot are not directly comparable

  • Programs must first subtract the background signal due to non-specific hybridization from each spot
    • Subtracting negative controls from gene signals
      • negative controls are genes not present in the target population, e.g. plant genes when the targets are human mRNA
  • Programs must also adjust signal to reflect amount of signal expected from positive controls (e.g. to avoid saturating the camera)
    • constitutive genes, e.g. ubiquitin or actin
    • spike samples with control from other species
      • place probe spot on grid
      • add mRNA to RNA sample before labeling (control for losses during subsequent processing)
  • Programs report normalized data: ratio of each gene to its control level
  • Normalized ratios are usually expressed as logs (usually log2), so that increases and decreases are symmetric: a two-fold increase gives +1 and a two-fold decrease gives -1. Because errors are so high, signals that are not at least two-fold higher or lower in treated vs. control samples are usually not trusted.
  • A log ratio of 0 indicates a gene whose expression is the same in both treatments.
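The normalization steps above can be combined into one small function. A minimal sketch, assuming raw red and green intensities per gene, negative-control spots (such as plant genes on a human array) whose mean signal is taken as non-specific background in each channel, and positive-control genes assumed to be expressed equally in both samples, whose median ratio rescales every other ratio. The gene names and numbers are hypothetical.

```python
import math
from statistics import mean, median


def normalize(raw, negative_controls, positive_controls):
    """Background-corrected, control-scaled log2(red / green) per gene.

    raw: dict gene -> (red, green) raw spot intensities.
    negative_controls: spots whose targets should be absent; their mean
        signal in each channel is treated as non-specific background.
    positive_controls: genes assumed to be expressed equally in both
        samples; their median ratio is used to rescale all ratios.
    """
    bg_red = mean(raw[g][0] for g in negative_controls)
    bg_green = mean(raw[g][1] for g in negative_controls)

    def corrected_ratio(gene):
        red, green = raw[gene]
        # Floor at 1.0 so spots at or below background do not give 0 or negatives.
        return max(red - bg_red, 1.0) / max(green - bg_green, 1.0)

    scale = median(corrected_ratio(g) for g in positive_controls)
    return {gene: math.log2(corrected_ratio(gene) / scale)
            for gene in raw if gene not in negative_controls}


raw = {  # made-up intensities: red channel reads about 2x brighter overall
    "plant_gene_1": (110, 55), "plant_gene_2": (90, 45),
    "actin": (2100, 1050), "ubiquitin": (1700, 850),
    "gene_x": (8100, 1050), "gene_y": (500, 850),
}
print(normalize(raw, ["plant_gene_1", "plant_gene_2"], ["actin", "ubiquitin"]))
# {'actin': 0.0, 'ubiquitin': 0.0, 'gene_x': 2.0, 'gene_y': -2.0}
```

Here gene_x comes out four-fold up (+2) and gene_y four-fold down (-2), even though the raw red/green ratio of gene_x is about 7.7 because of the brighter red channel.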

Identifying patterns in microarray data

Basic problem: identifying patterns in expression of n genes from k samples

n points distributed over k dimensions

Four approaches are (currently) used:

  1. Clustering analysis
  2. Self-organizing maps
  3. Principal components analysis
  4. Neural Networks

These techniques are derived from those used for “machine vision,” voice recognition, or bandwidth compression

  • All are ways to cluster points in multidimensional space

Clustering analysis

Cluster analysis is basically the construction of phylogenetic trees, so it isn’t designed to reflect the multiple ways in which genes may have similar expression patterns

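  • do pairwise comparisons of expression
  • the variation in gene G over N conditions is given by

$$\Phi_G = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(G_i - G_{\mathrm{offset}}\right)^2}$$

where G_i is the log-transformed expression ratio of G in condition i and G_offset is a reference level

Compare two genes X and Y using this quantity to generate a correlation coefficient of expression:

$$S(X,Y) = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{X_i - X_{\mathrm{offset}}}{\Phi_X}\right)\left(\frac{Y_i - Y_{\mathrm{offset}}}{\Phi_Y}\right)$$

When each offset is set to the mean of that gene's values, Φ_G is its standard deviation and S(X,Y) is exactly the Pearson correlation coefficient.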
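The similarity score used by Eisen et al. (1998) can be computed in a few lines. In this sketch, which assumes each gene is a list of log-transformed ratios over the same N conditions, `phi` is the root-mean-square deviation of a gene's values from its offset, and `similarity` averages the products of the two genes' offset-centered, `phi`-scaled values. Offsets of 0 correspond to a ratio of 1; setting each offset to the gene's mean gives the Pearson correlation coefficient. `average_linkage` is a plain agglomerative clustering that uses 1 - S as a distance, included to show how the tree is built. It is an illustration, not Eisen's own software, and far too slow for thousands of genes.

```python
import math


def phi(g, offset=0.0):
    """Root-mean-square deviation of the values in g from offset."""
    return math.sqrt(sum((x - offset) ** 2 for x in g) / len(g))


def similarity(x, y, x_offset=0.0, y_offset=0.0):
    """S(X, Y): average product of the offset-centered, phi-scaled values."""
    px, py = phi(x, x_offset), phi(y, y_offset)
    if px == 0 or py == 0:
        raise ValueError("a gene that never departs from its offset has no pattern")
    total = sum(((xi - x_offset) / px) * ((yi - y_offset) / py)
                for xi, yi in zip(x, y))
    return total / len(x)


def average_linkage(genes):
    """Build a tree by repeatedly joining the two closest clusters.

    genes: dict name -> list of log ratios. Cluster distance is the average
    of 1 - S over all cross-cluster gene pairs (offsets = 0).
    Returns the tree as nested tuples of gene names.
    """
    members = {name: [name] for name in genes}
    trees = {name: name for name in genes}

    def distance(a, b):
        pairs = [(m, n) for m in members[a] for n in members[b]]
        return sum(1 - similarity(genes[m], genes[n]) for m, n in pairs) / len(pairs)

    while len(members) > 1:
        labels = sorted(members)
        a, b = min(((p, q) for i, p in enumerate(labels) for q in labels[i + 1:]),
                   key=lambda pair: distance(*pair))
        label = f"({a},{b})"
        members[label] = members.pop(a) + members.pop(b)
        trees[label] = (trees.pop(a), trees.pop(b))
    return next(iter(trees.values()))


genes = {  # made-up log ratios over four conditions
    "A": [1.0, 2.0, 3.0, 2.0],
    "B": [1.1, 2.1, 2.9, 1.8],
    "C": [-1.0, -2.0, -3.0, -2.0],
    "D": [-0.9, -2.2, -2.8, -2.1],
}
print(round(similarity(genes["A"], genes["B"]), 3))
print(average_linkage(genes))
```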

Eisen et al. examined 8600 human genes in cells grown in the presence or absence of serum (Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95(25): 14863-14868).

  • Genes whose expression changed by a factor of 3.0 or more in at least 2 timepoints were subjected to cluster analysis. In the resulting display:
    • Green: strong down-regulation at a given timepoint
    • Red: strong up-regulation at a given timepoint
    • Black: little or no difference between serum-treated and serum-starved cells
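The gene-selection rule described for the serum experiment translates directly into a filter. A minimal sketch, assuming each gene is represented by its expression ratios (treated/reference, all positive) at each timepoint, and reading "changed by a factor of 3.0" as a ratio of at least 3 or at most 1/3. The profiles are made up.

```python
import math


def passes_filter(ratios, fold=3.0, min_timepoints=2):
    """True if the ratio changes by >= fold (up or down) at >= min_timepoints."""
    cutoff = math.log2(fold)
    changed = sum(1 for r in ratios if abs(math.log2(r)) >= cutoff)
    return changed >= min_timepoints


profiles = {
    "gene1": [1.0, 3.5, 4.0, 1.2],    # up 3-fold or more at two timepoints
    "gene2": [1.0, 0.25, 1.1, 0.3],   # down 3-fold or more at two timepoints
    "gene3": [1.0, 3.2, 1.5, 1.1],    # only one timepoint passes
}
selected = [name for name, r in profiles.items() if passes_filter(r)]
print(selected)  # ['gene1', 'gene2']
```

Working with absolute log ratios treats a three-fold decrease exactly like a three-fold increase.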

Wen et al. studied 112 genes at various stages in spinal cord development in rats and identified 5 patterns

Self-organizing maps

  • find a set of X,Y points that most closely approximates the mean of each group of points.
  • SOMs begin by arbitrarily creating a set of nodes (N) with randomly assigned values.
  • For each iteration a data point P is chosen, and the position of each node is changed to move it closer to P.
    • a scoring function decides whether the new position is better or worse than the previous one
  • each node will "come to rest" in the vicinity of the set of data to which it is closest.
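A minimal sketch of the iterative procedure, written as a one-dimensional map in Python. It uses the standard Kohonen update rather than a scoring function: the node nearest to the chosen point P and its neighbors along the line of nodes are moved a fraction of the way toward P, and both the step size and the neighborhood shrink as training proceeds. The node count, iteration count, both schedules, and the data are illustrative assumptions.

```python
import math
import random


def train_som(data, n_nodes=3, iterations=2000, seed=0):
    """Train a one-dimensional self-organizing map (Kohonen-style).

    data: list of equal-length tuples, e.g. expression profiles over k samples.
    Returns the final node positions as lists of k coordinates.
    """
    rng = random.Random(seed)
    k = len(data[0])
    lo = [min(p[d] for p in data) for d in range(k)]
    hi = [max(p[d] for p in data) for d in range(k)]
    # Nodes start at random positions inside the data's bounding box.
    nodes = [[rng.uniform(lo[d], hi[d]) for d in range(k)] for _ in range(n_nodes)]

    for t in range(iterations):
        frac = t / iterations
        rate = 0.5 * (1 - frac) + 0.01                # step size shrinks over time
        width = max(n_nodes / 2 * (1 - frac), 0.3)    # neighborhood shrinks too
        p = rng.choice(data)                          # pick a data point P
        best = min(range(n_nodes), key=lambda i: math.dist(nodes[i], p))
        for i, node in enumerate(nodes):
            # Nodes close to the best match (along the line of nodes) move most.
            h = math.exp(-((i - best) ** 2) / (2 * width ** 2))
            for d in range(k):
                node[d] += rate * h * (p[d] - node[d])
    return nodes


# Made-up data: three groups of 2-D points.
gen = random.Random(1)
data = [(gen.gauss(cx, 0.5), gen.gauss(cy, 0.5))
        for cx, cy in [(0, 0), (5, 5), (10, 0)] for _ in range(30)]
for node in train_som(data):
    print([round(v, 1) for v in node])
```

Changing `seed` changes the random starting nodes; with less well-separated data the final map can differ between runs, which illustrates why results depend on the initial array.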

  • Self-organizing maps attempt to impose structure on the data, but allow you to test various models quickly
  • results depend upon the initial array

links and information about many analysis programs are posted at http://genome-www5.stanford.edu/MicroArray/SMD/restech.html

Mike Eisen wrote many of the programs that are commonly used and has posted them at
http://rana.lbl.gov/EisenSoftware.htm

National Human Genome Research Institute (NHGRI) provides a good overview of microarray image analysis techniques.
It also provides some nice images to play with at http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/images.html

Array viewer allows you to identify specific dots on the array and download the sequence!

many datasets and tools for studying microarrays online are posted at http://info.med.yale.edu/wmkeck/dna_arrays.htm

TIGR provides many tools for microarray analysis at http://pga.tigr.org/AnalysisTools.shtml

Affymetrix provides data mining software for working with microarrays
http://www.affymetrix.com/

Silicon Genetics, Scanalytics and Axon provide commercial packages for detecting patterns

Servers for online analysis are also available at
http://ep.ebi.ac.uk/EP/
http://bioinfo.cnio.es/dnarray/analysis/




Last update: Tuesday, October 14, 2003 at 6:40:10 PM.