Week 12 lecture
Microarrays
Technique for studying multiple genes simultaneously.
Uses include
Basic approach = reverse Northern
For an excellent animation, see http://www.bio.davidson.edu/courses/genomics/chip/chip.html

- Attach clones of the genes you wish to study to a solid support
(usually a glass slide)
- each clone is placed at a precise position
within a grid
- clones can be anything from short oligonucleotides
to full length cDNAs
- each of these clones is called a probe
- (note that there is confusion in the
literature on this point, since in techniques such as Southern blots
we call the labeled nucleic acid the
probe)
- Extract RNA from the organisms you are studying
(or DNA if you are studying DNA polymorphisms) then "label" it by
covalently attaching fluorescent dyes to it.
- This is usually done by making cDNA copies
of the mRNA with reverse transcriptase and either adding fluorescent dyes
to the primers or using deoxyribonucleotide triphosphates that have fluorescent
dyes attached to them.
- Alternatively, RNA can be labeled directly
(this is how we do it in my
lab)
- The labeled nucleic acid is called the "target."
- Usually two different tissues or treatments
are compared
- RNA from one is labeled with a dye that
fluoresces green
- RNA from the other is labeled with a
dye that fluoresces red
- To control for differences in labeling
we usually prepare a green and red version of both samples
- Hybridize with labeled targets:
- place a solution containing equal amounts
of the labeled RNAs (or cDNA) on the array
- incubate at a suitable temperature long
enough for each target to find and anneal to a cloned gene attached to
the grid.
- wash off targets which did not stick (or
stuck to the wrong gene)
- Measure amount of target annealed to each probe
- perform two sequential scans
- first scan the array by irradiating
with a wavelength that excites the green dye and take a digital picture
of the array
- then scan the array by irradiating
with a wavelength that excites the red dye and take a digital picture
of the array
- measure the intensity of each spot = amount
of target that annealed to the gene at that position
- superimpose the red and green pictures
to look for differences in gene expression between the two samples

- Result is an enormous number of spots
of varying colors and intensities
- computers are needed to make sense
of it
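Superimposing the two scans amounts to giving every spot a colour whose red and green channels come from the two pictures. The following is only a minimal sketch of that idea; the function name `overlay`, the `max_signal` parameter and the intensities are made up for illustration, not taken from any scanner software.

```python
def overlay(green, red, max_signal):
    """Return one (R, G, B) pseudo-colour per spot from paired scan intensities."""
    if max_signal <= 0:
        raise ValueError("max_signal must be positive")

    def scale(value):
        # Clip to the displayable range, mimicking a saturated detector.
        return max(0, min(255, int(255 * value / max_signal)))

    return [(scale(r), scale(g), 0) for g, r in zip(green, red)]


# Hypothetical intensities for three spots (arbitrary units).
green_scan = [1000, 0, 400]
red_scan = [0, 1000, 400]
colours = overlay(green_scan, red_scan, max_signal=1000)
# Spot 1 comes out pure green, spot 2 pure red, and spot 3 (equal signal
# in both scans) a shade of yellow.
```

Spots with equal red and green signal appear yellow, which is why yellow spots on a superimposed image are read as "no difference between the samples."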
- This is a small part of a human DNA chip that Jennifer Lewis
prepared in my lab last year
[Figure: the green scan]

[Figure: the red scan]

[Figure: the superimposed image]

Two basic approaches
1) Oligonucleotide chips: short single-stranded DNA fragments deposited
at designated locations

Affymetrix arrays are made in situ by photolithography
- Use light to deprotect the 5’ OH at selected positions: masks
determine which positions are illuminated
- Add the chosen base, whose own OH is protected by a photoreactive
group
- Repeat with a new mask until all the desired oligonucleotides have been made
- Computer controls the whole process
- Can make arrays smaller than a dime with
65536 separate oligonucleotides
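The mask-and-couple cycle can be mimicked with a toy simulation. This is a sketch under simplifying assumptions, not Affymetrix's actual scheduling: bases are simply offered in the fixed order A, C, G, T, each sequence is written in order of addition (not necessarily 5' to 3'), and the name `synthesize` is invented here.

```python
BASES = "ACGT"


def synthesize(targets):
    """Grow every oligo in parallel, one masked coupling step at a time.

    Returns the finished oligos and the list of (base, mask) steps used,
    where mask[i] is True if position i was illuminated in that step.
    """
    for target in targets:
        if any(base not in BASES for base in target):
            raise ValueError("targets may only contain A, C, G and T")

    grown = [""] * len(targets)
    masks = []
    cycle = 0
    while grown != list(targets):
        base = BASES[cycle % len(BASES)]
        # Illuminate only the positions whose next required base is `base`.
        mask = [len(g) < len(t) and t[len(g)] == base
                for g, t in zip(grown, targets)]
        if any(mask):
            masks.append((base, mask))
            grown = [g + base if lit else g for g, lit in zip(grown, mask)]
        cycle += 1
    return grown, masks


oligos, steps = synthesize(["ACG", "CAT", "GG"])
```

Note that the number of coupling steps depends on the length of the oligonucleotides rather than on how many positions are on the array, which is what makes the approach scale to very dense chips.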


Lee Hood's group makes the oligonucleotides first, then deposits
them on the glass slide with a high precision ink-jet printer.
Advantages of oligonucleotide chips
- Large numbers of defined sequences made to
order
- can make diagnostic chips that can detect
every different allele of a particular gene
- can detect genetic disorders and tell
which version a patient has
- Low annealing & wash temperatures: easier manipulations
Disadvantages
- Which sequences to make?
- With sequences that short, one must
ensure that all of them have the same melting temperature
- conditions can't be as stringent (= selective)
as with longer sequences
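To see why matching melting temperatures matters, a rough estimate can be made with the Wallace rule (Tm = 2 °C per A or T plus 4 °C per G or C), which is a rule of thumb for short oligonucleotides only. The sketch below is illustrative; the probe sequences and the function names are hypothetical.

```python
def wallace_tm(oligo):
    """Approximate melting temperature (deg C) by the Wallace rule."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence may only contain A, C, G and T")
    return 2 * at + 4 * gc


def tm_spread(oligos):
    """Difference between the highest and lowest estimated Tm in a probe set."""
    tms = [wallace_tm(o) for o in oligos]
    return max(tms) - min(tms)


# Three hypothetical 20-mers with very different base composition.
probes = [
    "ATATATATATATATATATAT",  # all A/T
    "GCGCGCGCGCGCGCGCGCGC",  # all G/C
    "ATGCATGCATGCATGCATGC",  # half and half
]
spread = tm_spread(probes)
```

By this estimate the three probes differ by tens of degrees, so no single hybridization temperature would suit all of them; a real design would use a more accurate model, but the conclusion is the same.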
2) DNA libraries (microarrays)
- Usually spot cDNA libraries (= collections
of recombinant clones prepared by extracting mRNA from a tissue, making a
cDNA copy of each mRNA using reverse transcriptase, then cloning each of these
cDNA copies into a separate vector)
- Advantages
- Don't need to have sequence of each gene
- can prepare arrays by constructing
cDNA libraries from the selected organism
- Can perform hybridizations and washes at higher stringency
- Disadvantages:
- Must prepare and maintain libraries
- Must extract and quantify DNA from each
clone
- Must deposit equal amounts of DNA from
each clone at precise location on a slide
- hard to distinguish members of multi-gene
families
- Spotting techniques
- Ink-jets: technique used by Incyte
- Mechanical spotting: most common approach,
developed in Pat Brown's lab at Stanford



Microarrays generate an enormous amount of data
very quickly!
Need computers to make sense of it!
First question: are the data any good?
- This is not a trivial matter, because errors
can creep in at many places!
- Sources of error
1. Treatments
- Experimental conditions
- Tissue preparation
2. Targets
- RNA isolation
- use identical amounts of tissue
- identical extraction methods
- minimum number of steps
- normalize [RNA]
- labeling
- measure incorporation of label and
normalize samples to same concentration
- add same amount of label to each hybridization
3. Arrays
- Quality of probes
- DNA must be of same quality
- DNA must be of same concentration so add same amount to each
spot
- Uniformity of spotting
- must deposit same volume in each spot
- spots must be same shape!
- arrays must all be treated and stored in the same way
4. Hybridization and washing
- ensuring hybridization goes to completion:
time needed varies according to transcript abundance
- ensuring adequate washing (especially when
dealing with multigene families: must ensure that targets that bound to a
close relative instead of the correct probe have been washed off)

5. Data acquisition
- Image acquisition
- usually by CCD camera
- just like film, the camera's response is
linear within a certain range, but it is saturated by strong signals
and it misses weak ones
- creates a TIFF file that is then
analyzed by other software

- Spot and background detection
- software aligns spots when superimposing
the images from the red and green scans
- creates grid used to quantitate
signal from each target

- software must recognize & integrate
irregular spots & subtract background
- must pick areas outside spots for
background
- We used Scanalyze (a program written
by Mike Eisen) to identify & superimpose spots; many others
(e.g. Genepix, Spotfinder) are available

Since there are so many potential sources of
error, developing objective measures for determining the reliability of microarray
data is a major research priority!
So far, there are no absolute measures, but
there are some general rules
- check the quality of your RNA
- contaminants (A260/A280)
- no degradation (look for crisp bands
of ribosomal RNA on denaturing gels)
- use lots of internal controls
- perform reciprocal hybridizations
- green control targets vs red treatment
targets on one chip
- red control targets vs green treatment
targets on another
- use duplicate or triplicate grids
- look at averages between two or three
spots
- spike the arrays with positive controls
- probes for transcripts that should be highly abundant
- spike the arrays with negative controls
- probes that should not hybridize to
any of the targets (e.g. chlorophyll synthesis enzymes in humans)
- Be very careful with the data acquisition
and analysis
- checking that the camera settings are
appropriate
- checking that background settings are
appropriate for each spot
- background varies in different parts
of a microarray
Developing software that can detect (and correct) sources of error
in microarray experiments is a major research objective!
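One reason reciprocal hybridizations help is that a dye-specific bias enters the two chips in opposite directions, so averaging the two log ratios cancels it. The sketch below illustrates this with invented numbers; the function name and the assumption of a purely multiplicative dye bias are mine.

```python
import math


def dye_swap_log_ratio(chip1, chip2):
    """Average log2(treatment/control) over a reciprocal pair of chips.

    chip1 = (green, red) with green = control and red = treatment.
    chip2 = (green, red) with green = treatment and red = control.
    """
    green1, red1 = chip1
    green2, red2 = chip2
    if min(green1, red1, green2, red2) <= 0:
        raise ValueError("signals must be positive")
    forward = math.log2(red1 / green1)   # treatment / control on chip 1
    reverse = math.log2(green2 / red2)   # treatment / control on chip 2
    return (forward + reverse) / 2


# Hypothetical gene: truly two-fold induced, but the red dye reads 1.5x brighter.
chip1 = (100, 200 * 1.5)   # green = control, red = treatment
chip2 = (200, 100 * 1.5)   # green = treatment, red = control
corrected = dye_swap_log_ratio(chip1, chip2)
```

Either chip alone would give the wrong answer (a three-fold change on one, 1.33-fold on the other); the average recovers a log2 ratio of 1, i.e. two-fold.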
Normalizing signals from microarrays
Raw signals from each spot are not directly comparable
- Programs must first subtract the background
signal due to non-specific hybridization from each spot
- Subtracting negative controls from gene
signals
- negative controls are genes not present
in the target population, e.g.
plant genes in human mRNA
- Programs must also adjust signal to reflect
amount of signal expected from positive controls (e.g. to avoid saturating
the camera)
- constitutive genes, e.g. ubiquitin or actin
- spike samples with a control from another
species
- place probe spot on grid
- add mRNA to RNA sample before labeling (controls for losses
during subsequent processing)
- Programs report normalized data: ratio of each gene to its control
level
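- Normalized ratios are usually expressed as
logs, so that increases and decreases of the same size are symmetric about zero.
Errors are high enough that signals are generally not trusted unless they are at
least two-fold higher or lower in treated vs. control samples (a log2 ratio of
at least +1 or at most -1)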
- A log ratio of 0 indicates a gene whose expression
is the same in both treatments.
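Putting these steps together, a bare-bones normalization might look like the sketch below. It assumes one particular scheme (subtract each spot's background, then shift all log2 ratios so that the constitutive control genes average to zero); real packages differ, and the gene names and numbers are invented.

```python
import math


def normalized_log_ratios(spots, control_genes):
    """Return {gene: normalized log2(red/green)}.

    spots maps gene -> (red, red_background, green, green_background).
    Spots whose signal does not exceed background in both channels are dropped.
    """
    net = {}
    for gene, (red, red_bg, green, green_bg) in spots.items():
        r, g = red - red_bg, green - green_bg
        if r > 0 and g > 0:
            net[gene] = (r, g)

    # Constitutive controls should not change, so their mean log ratio
    # estimates the overall red/green imbalance between the two samples.
    control_logs = [math.log2(net[c][0] / net[c][1])
                    for c in control_genes if c in net]
    if not control_logs:
        raise ValueError("no usable control spots")
    offset = sum(control_logs) / len(control_logs)

    return {gene: math.log2(r / g) - offset for gene, (r, g) in net.items()}


# Invented raw values: (red, red background, green, green background).
spots = {
    "actin": (2100, 100, 1100, 100),   # constitutive control
    "geneA": (8100, 100, 1100, 100),
    "geneB": (600, 100, 1100, 100),
    "blank": (90, 100, 105, 100),      # at or below background in red
}
ratios = normalized_log_ratios(spots, control_genes=["actin"])
```

Here the control has a log ratio of 0 after normalization, geneA comes out at +2 (four-fold up) and geneB at -2 (four-fold down), while the blank spot is discarded rather than producing a meaningless ratio.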
Identifying patterns in microarray data
Basic problem: identifying patterns in expression of
n genes from k samples
n points distributed over k dimensions
Four approaches are (currently) used:
- Clustering analysis
- Self-organizing maps
- Principal components analysis
- Neural Networks
These techniques are derived from methods used for “machine vision,”
voice recognition or bandwidth compression
- All are ways to cluster points in multidimensional
space
Clustering analysis
Cluster analysis is basically the same as constructing
phylogenetic trees
- such trees aren’t designed to reflect the multiple ways genes may have similar
expression patterns
- do pairwise comparisons of expression
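- variation in gene G over N conditions is given
by the following equation (from Eisen et al., 1998; G_i is the log-transformed
signal for gene G in condition i, and G_offset is a reference value such as the
mean of G or 0)

$$\Phi_G = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( G_i - G_{offset} \right)^2}$$

Compare two genes X and Y using this equation to generate a correlation coefficient
of expression

$$S(X,Y) = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{X_i - X_{offset}}{\Phi_X} \right) \left( \frac{Y_i - Y_{offset}}{\Phi_Y} \right)$$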
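As a worked illustration, the similarity score of Eisen et al. (1998) can be computed as sketched here: each gene's values are expressed relative to an offset, divided by that gene's variation about the offset, and the products are averaged over the N conditions. When the offset is the gene's mean this is the Pearson correlation coefficient. The function names and test profiles are my own.

```python
import math


def phi(values, offset):
    """Variation of one gene about its offset over all conditions."""
    return math.sqrt(sum((v - offset) ** 2 for v in values) / len(values))


def similarity(x, y, x_offset=None, y_offset=None):
    """Similarity of two expression profiles (lists of log ratios).

    With the default offsets (each gene's mean) this equals the Pearson
    correlation; an offset of 0 treats "no change" as the reference state.
    """
    if len(x) != len(y) or not x:
        raise ValueError("profiles must be non-empty and of equal length")
    if x_offset is None:
        x_offset = sum(x) / len(x)
    if y_offset is None:
        y_offset = sum(y) / len(y)
    phi_x, phi_y = phi(x, x_offset), phi(y, y_offset)
    if phi_x == 0 or phi_y == 0:
        raise ValueError("a profile with no variation cannot be compared")
    total = sum(((a - x_offset) / phi_x) * ((b - y_offset) / phi_y)
                for a, b in zip(x, y))
    return total / len(x)


same_shape = similarity([1, 2, 3, 4], [2, 4, 6, 8])   # same pattern, different scale
opposite = similarity([1, 2, 3, 4], [4, 3, 2, 1])     # mirror-image pattern
```

Because each profile is divided by its own variation, two genes that rise and fall together score close to +1 even if one changes much more strongly than the other.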

Eisen et al. examined 8600 human genes in cells grown in the presence
or absence of serum. (Michael B. Eisen, Paul T. Spellman, Patrick O. Brown,
and David Botstein (1998) Cluster analysis and display of genome-wide expression
patterns. Proc. Natl. Acad. Sci. USA Vol. 95, Issue 25, 14863-14868.)
- Genes whose expression changed by a factor
of 3.0 or more in at least 2 timepoints were subjected to cluster analysis.
Green - strong down-regulation at a given timepoint
Red - strong up-regulation at a given timepoint
Black - little or no difference between serum-treated and serum-starved cells.
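The tree-building step can be sketched as repeated pairwise joining: find the two most similar profiles, join them into a node, give the node the weighted average profile, and repeat until one tree remains. The code below is a simplified illustration in that spirit, not the published program; it uses plain Pearson correlation, and the four toy profiles are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # fails on a perfectly flat profile


def cluster(profiles):
    """Join the most similar pair repeatedly; return the tree as nested tuples."""
    # Each node is (tree, averaged profile, number of genes it contains).
    nodes = [(name, list(values), 1) for name, values in profiles.items()]
    while len(nodes) > 1:
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                score = pearson(nodes[i][1], nodes[j][1])
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        (tree_i, prof_i, w_i), (tree_j, prof_j, w_j) = nodes[i], nodes[j]
        merged = [(a * w_i + b * w_j) / (w_i + w_j)
                  for a, b in zip(prof_i, prof_j)]
        rest = [node for k, node in enumerate(nodes) if k not in (i, j)]
        nodes = rest + [((tree_i, tree_j), merged, w_i + w_j)]
    return nodes[0][0]


# Invented log-ratio profiles over four timepoints.
profiles = {
    "up1": [0, 1, 2, 3],
    "up2": [0, 1.1, 2.1, 2.9],
    "down1": [3, 2, 1, 0],
    "down2": [2.9, 2.2, 1, 0.1],
}
tree = cluster(profiles)
```

In this toy case the two induced genes are joined first, then the two repressed genes, and the two pairs are joined last, which is the kind of grouping the red and green blocks in a clustered display represent.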

Wen et al. studied 112 genes at various stages in spinal cord development
in rats and identified 5 patterns

Self-organizing maps
- find sets of X,Y points that most closely approximate
the mean for each group
of points.
- SOMs begin by arbitrarily creating a set of
nodes (N) with randomly assigned values.
- For each iteration a datapoint P is chosen,
and the position of each node is changed to move it closer to P.
- a scoring function decides whether the
new position is better or worse than the previous one
- each node will "come to rest" in the vicinity of the set
of data to which it is closest.
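A very small self-organizing map can be sketched as below. Note the assumption: instead of a scoring function, this uses the standard Kohonen update, in which the node closest to P moves most and its neighbours on the node grid move less, with both the step size and the neighbourhood shrinking over time. The parameter values, names and data are invented for illustration.

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def train_som(data, n_nodes, iterations=2000, seed=0):
    """Train a one-dimensional chain of nodes on the data points."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Nodes start at arbitrary (random) positions.
    nodes = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for t in range(iterations):
        remaining = 1 - t / iterations
        rate = 0.5 * remaining                          # step size shrinks
        radius = max(0.1, (n_nodes / 2) * remaining)    # neighbourhood shrinks
        p = rng.choice(data)
        winner = min(range(n_nodes), key=lambda k: dist2(nodes[k], p))
        for k, node in enumerate(nodes):
            # Nodes far from the winner along the chain are moved less.
            influence = math.exp(-((k - winner) ** 2) / (2 * radius ** 2))
            for d in range(dim):
                node[d] += rate * influence * (p[d] - node[d])
    return nodes


# Two invented groups of X,Y points.
group_a = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2)]
group_b = [(10.0, 10.0), (10.2, 10.0), (10.0, 10.2), (10.2, 10.2)]
nodes = train_som(group_a + group_b, n_nodes=2)
```

With two nodes and two well-separated groups, each node should come to rest near the centre of one group; with real expression data the number of nodes has to be chosen in advance, which is one sense in which the method imposes structure on the data.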


- Self-organizing maps attempt to impose structure
on the data, but allow you to test various models quickly
- results depend upon the initial array
Links and information about many analysis programs are posted at http://genome-www5.stanford.edu/MicroArray/SMD/restech.html
Mike Eisen wrote many of the programs that are
commonly used and has posted them at
http://rana.lbl.gov/EisenSoftware.htm
National Human Genome Research Institute (NHGRI)
provides a good overview of microarray image analysis techniques
It also provides some nice images to play with at http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/images.html
Array viewer allows you to identify specific
dots on the array and download the sequence!
Many datasets and tools for studying microarrays online are posted at
http://info.med.yale.edu/wmkeck/dna_arrays.htm
TIGR provides many tools for microarray analysis
at http://pga.tigr.org/AnalysisTools.shtml
Affymetrix provides data mining software for
working with microarrays
http://www.affymetrix.com/
Silicon Genetics, Scanalytics and Axon provide
commercial packages for detecting patterns
Servers for online analysis are also available at
http://ep.ebi.ac.uk/EP/
http://bioinfo.cnio.es/dnarray/analysis/
