Use case: Systems biology of Influenza

This use case reproduces results from a publication: Systems biology of vaccination for seasonal influenza in humans, Nakaya et al. (2011).
The paper uses a systems biology approach to study the immune response to influenza vaccination over three seasons.
Two cohorts were vaccinated with TIV (2007 and 2008) and one with LAIV (2008).


Without ImmuneSpaceR

Fetching data from GEO

The data was made available on GEO after publication. The SuperSeries is referenced in the paper.

library(GEOquery)
# Download the series matrix files for the SuperSeries
gse <- getGEO("GSE29619")

The returned object is a list of 3 ExpressionSets, one per SubSeries/cohort.

gse
## $`GSE29619-GPL13158_series_matrix.txt.gz`
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 54715 features, 163 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: GSM733843 GSM733844 ... GSM734021 (163 total)
##   varLabels: title geo_accession ... data_row_count (43 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: 1007_PM_s_at 1053_PM_at ... AFFX-TrpnX-M_at (54715
##     total)
##   fvarLabels: ID GB_ACC ... Gene Ontology Molecular Function (16
##     total)
##   fvarMetadata: Column Description labelDescription
## experimentData: use 'experimentData(object)'
## Annotation: GPL13158 
## 
## $`GSE29619-GPL3921_series_matrix.txt.gz`
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 22277 features, 84 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: GSM734022 GSM734023 ... GSM734105 (84 total)
##   varLabels: title geo_accession ... data_row_count (38 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: 1007_s_at 1053_at ... AFFX-TrpnX-M_at (22277
##     total)
##   fvarLabels: ID GB_ACC ... Gene Ontology Molecular Function (16
##     total)
##   fvarMetadata: Column Description labelDescription
## experimentData: use 'experimentData(object)'
## Annotation: GPL3921 
## 
## $`GSE29619-GPL570_series_matrix.txt.gz`
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 54675 features, 27 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: GSM733816 GSM733817 ... GSM733842 (27 total)
##   varLabels: title geo_accession ... data_row_count (43 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: 1007_s_at 1053_at ... AFFX-TrpnX-M_at (54675
##     total)
##   fvarLabels: ID GB_ACC ... Gene Ontology Molecular Function (16
##     total)
##   fvarMetadata: Column Description labelDescription
## experimentData: use 'experimentData(object)'
## Annotation: GPL570

The element names only carry the platform IDs, so the seasons cannot be identified without looking inside the objects.

names(gse)
## [1] "GSE29619-GPL13158_series_matrix.txt.gz"
## [2] "GSE29619-GPL3921_series_matrix.txt.gz" 
## [3] "GSE29619-GPL570_series_matrix.txt.gz"
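
One way to recover the cohort labels is to read them from the sample titles in the phenoData. The sketch below is a minimal illustration in base R, assuming every title starts with "<year> <vaccine>" as in the LAIV titles shown in this document. The `pheno` list is a toy stand-in for the real metadata, and `cohort_labels` is a hypothetical helper, not part of GEOquery or Biobase.

```r
# Toy stand-in for the sample metadata of one series matrix.
# Only the "title" column is mimicked; the values follow the pattern
# seen in the GEO titles and are for illustration only.
pheno <- list(
  "GSE29619-GPL13158_series_matrix.txt.gz" = data.frame(
    title = c("2008 LAIV subject ID 1 at D0 post-vaccination",
              "2008 LAIV subject ID 1 at D3 post-vaccination",
              "2008 LAIV subject ID 1 at D7 post-vaccination"),
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper: keep the leading "<year> <vaccine>" part of each
# title and return the distinct labels.
cohort_labels <- function(titles) {
  unique(sub("^([0-9]{4}) ([^ ]+) .*$", "\\1 \\2", titles))
}

cohorts <- lapply(pheno, function(df) cohort_labels(df$title))
print(cohorts)
```

With the real download, the same helper could be applied to each ExpressionSet, for example `lapply(gse, function(es) cohort_labels(as.character(pData(es)$title)))`, provided the titles of the other cohorts follow the same pattern.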

phenoData

Let’s look at the metadata available on GEO.

library(Biobase)
es_LAIV <- gse[[1]]
head(pData(es_LAIV), 3)
##                                                   title geo_accession
## GSM733843 2008 LAIV subject ID 1 at D0 post-vaccination     GSM733843
## GSM733844 2008 LAIV subject ID 1 at D3 post-vaccination     GSM733844
## GSM733845 2008 LAIV subject ID 1 at D7 post-vaccination     GSM733845
##                          status submission_date last_update_date type
## GSM733843 Public on Jul 10 2011     May 29 2011      Jul 10 2011  RNA
## GSM733844 Public on Jul 10 2011     May 29 2011      Jul 10 2011  RNA
## GSM733845 Public on Jul 10 2011     May 29 2011      Jul 10 2011  RNA
##           channel_count
## GSM733843             1
## GSM733844             1
## GSM733845             1
##                                                                                                                                              source_name_ch1
## GSM733843 Peripheral blood mononuclear cells of subjects vaccinated with Influenza vaccine at day 0 (before vaccination), and days 3 and 7 post-vaccination.
## GSM733844 Peripheral blood mononuclear cells of subjects vaccinated with Influenza vaccine at day 0 (before vaccination), and days 3 and 7 post-vaccination.
## GSM733845 Peripheral blood mononuclear cells of subjects vaccinated with Influenza vaccine at day 0 (before vaccination), and days 3 and 7 post-vaccination.
##           organism_ch1 characteristics_ch1 characteristics_ch1.1
## GSM733843 Homo sapiens     cell type: PBMC         subject id: 1
## GSM733844 Homo sapiens     cell type: PBMC         subject id: 1
## GSM733845 Homo sapiens     cell type: PBMC         subject id: 1
##           characteristics_ch1.2               characteristics_ch1.3
## GSM733843        time point: D0 vaccine: LAIV (FluMist,  MedImmune)
## GSM733844        time point: D3 vaccine: LAIV (FluMist,  MedImmune)
## GSM733845        time point: D7 vaccine: LAIV (FluMist,  MedImmune)
##                                          characteristics_ch1.4
## GSM733843 hai titer (day 0) - a/south dakota/6/2007 (h1n1): 40
## GSM733844 hai titer (day 0) - a/south dakota/6/2007 (h1n1): 40
## GSM733845 hai titer (day 0) - a/south dakota/6/2007 (h1n1): 40
##                                           characteristics_ch1.5
## GSM733843 hai titer (day 28) - a/south dakota/6/2007 (h1n1): 40
## GSM733844 hai titer (day 28) - a/south dakota/6/2007 (h1n1): 40
## GSM733845 hai titer (day 28) - a/south dakota/6/2007 (h1n1): 40
##                                                    characteristics_ch1.6
## GSM733843 hai titer (day 0) - a/uruguay/716/2007  nymc x-175c (h3n2): 40
## GSM733844 hai titer (day 0) - a/uruguay/716/2007  nymc x-175c (h3n2): 40
## GSM733845 hai titer (day 0) - a/uruguay/716/2007  nymc x-175c (h3n2): 40
##                                                     characteristics_ch1.7
## GSM733843 hai titer (day 28) - a/uruguay/716/2007  nymc x-175c (h3n2): 40
## GSM733844 hai titer (day 28) - a/uruguay/716/2007  nymc x-175c (h3n2): 40
## GSM733845 hai titer (day 28) - a/uruguay/716/2007  nymc x-175c (h3n2): 40
##                              characteristics_ch1.8
## GSM733843 hai titer (day 0) - b/florida 4/2006: 20
## GSM733844 hai titer (day 0) - b/florida 4/2006: 20
## GSM733845 hai titer (day 0) - b/florida 4/2006: 20
##                               characteristics_ch1.9 molecule_ch1
## GSM733843 hai titer (day 28) - b/florida 4/2006: 40    total RNA
## GSM733844 hai titer (day 28) - b/florida 4/2006: 40    total RNA
## GSM733845 hai titer (day 28) - b/florida 4/2006: 40    total RNA
##                                                                                                                                                                                                                                                                                               extract_protocol_ch1
## GSM733843 Following PMBC isolation from CPT, 2 x 10^6 cells were lysed in 1 ml of TRIzol and stored at -80C (Cat# 15596-026; Invitrogen Life Technologies). After all time points were collected for a subject, the samples were thawed, and the RNA isolation proceeded according to the manufacturer’s protocol.
## GSM733844 Following PMBC isolation from CPT, 2 x 10^6 cells were lysed in 1 ml of TRIzol and stored at -80C (Cat# 15596-026; Invitrogen Life Technologies). After all time points were collected for a subject, the samples were thawed, and the RNA isolation proceeded according to the manufacturer’s protocol.
## GSM733845 Following PMBC isolation from CPT, 2 x 10^6 cells were lysed in 1 ml of TRIzol and stored at -80C (Cat# 15596-026; Invitrogen Life Technologies). After all time points were collected for a subject, the samples were thawed, and the RNA isolation proceeded according to the manufacturer’s protocol.
##           label_ch1
## GSM733843    biotin
## GSM733844    biotin
## GSM733845    biotin
##                                                                                                                                                                                                                                                                                                                                                                          label_protocol_ch1
## GSM733843 Total RNA sample quality was evaluated by spectrophotometer to determine quantity, protein contamination and organic solvent contamination, and an Agilent 2100 Bioanalyzer was used to check for RNA degradation. Two-round in vitro transcription amplification and labeling was performed starting with 50 ng intact, total RNA per sample, following the Affymetrix protocol.
## GSM733844 Total RNA sample quality was evaluated by spectrophotometer to determine quantity, protein contamination and organic solvent contamination, and an Agilent 2100 Bioanalyzer was used to check for RNA degradation. Two-round in vitro transcription amplification and labeling was performed starting with 50 ng intact, total RNA per sample, following the Affymetrix protocol.
## GSM733845 Total RNA sample quality was evaluated by spectrophotometer to determine quantity, protein contamination and organic solvent contamination, and an Agilent 2100 Bioanalyzer was used to check for RNA degradation. Two-round in vitro transcription amplification and labeling was performed starting with 50 ng intact, total RNA per sample, following the Affymetrix protocol.
##           taxid_ch1
## GSM733843      9606
## GSM733844      9606
## GSM733845      9606
##                                                                                                                                                                                                                                                                             hyb_protocol
## GSM733843 Hybridization was performed on Human U133 Plus 2.0 Arrays (using GeneTitan platform, Affymetrix, or individual cartridges) for 16 h at 45 oC, and 60 r.p.m. in a Hybridization Oven 640 (Affymetrix), slides were washed and stained with a Fluidics Station 450 (Affymetrix).
## GSM733844 Hybridization was performed on Human U133 Plus 2.0 Arrays (using GeneTitan platform, Affymetrix, or individual cartridges) for 16 h at 45 oC, and 60 r.p.m. in a Hybridization Oven 640 (Affymetrix), slides were washed and stained with a Fluidics Station 450 (Affymetrix).
## GSM733845 Hybridization was performed on Human U133 Plus 2.0 Arrays (using GeneTitan platform, Affymetrix, or individual cartridges) for 16 h at 45 oC, and 60 r.p.m. in a Hybridization Oven 640 (Affymetrix), slides were washed and stained with a Fluidics Station 450 (Affymetrix).
##                                                                                                                                                                    scan_protocol
## GSM733843 Scanning was performed on a 7th generation GeneChip Scanner 3000, and the Affymetrix GCOS software was used to perform image analysis and generate raw intensity data.
## GSM733844 Scanning was performed on a 7th generation GeneChip Scanner 3000, and the Affymetrix GCOS software was used to perform image analysis and generate raw intensity data.
## GSM733845 Scanning was performed on a 7th generation GeneChip Scanner 3000, and the Affymetrix GCOS software was used to perform image analysis and generate raw intensity data.
##              description
## GSM733843 2008-LAIV-1-D0
## GSM733844 2008-LAIV-1-D3
## GSM733845 2008-LAIV-1-D7
##                                                                                            data_processing
## GSM733843 RMA normalization was performed using Expression Console software (Affymetrix Inc, version 1.1).
## GSM733844 RMA normalization was performed using Expression Console software (Affymetrix Inc, version 1.1).
## GSM733845 RMA normalization was performed using Expression Console software (Affymetrix Inc, version 1.1).
##           platform_id    contact_name     contact_email  contact_phone
## GSM733843    GPL13158 Helder,I,Nakaya hnakaya@emory.edu 1-404-712-2594
## GSM733844    GPL13158 Helder,I,Nakaya hnakaya@emory.edu 1-404-712-2594
## GSM733845    GPL13158 Helder,I,Nakaya hnakaya@emory.edu 1-404-712-2594
##           contact_laboratory   contact_department contact_institute
## GSM733843     Bali Pulendran Emory Vaccine Center  Emory University
## GSM733844     Bali Pulendran Emory Vaccine Center  Emory University
## GSM733845     Bali Pulendran Emory Vaccine Center  Emory University
##                        contact_address contact_city contact_state
## GSM733843 954 Gatewood Road, room 2040      Atlanta            GA
## GSM733844 954 Gatewood Road, room 2040      Atlanta            GA
## GSM733845 954 Gatewood Road, room 2040      Atlanta            GA
##           contact_zip/postal_code contact_country
## GSM733843                   30329             USA
## GSM733844                   30329             USA
## GSM733845                   30329             USA
##                                                                                           supplementary_file
## GSM733843 ftp://ftp.ncbi.nlm.nih.gov/pub/geo/DATA/supplementary/samples/GSM733nnn/GSM733843/GSM733843.CEL.gz
## GSM733844 ftp://ftp.ncbi.nlm.nih.gov/pub/geo/DATA/supplementary/samples/GSM733nnn/GSM733844/GSM733844.CEL.gz
## GSM733845 ftp://ftp.ncbi.nlm.nih.gov/pub/geo/DATA/supplementary/samples/GSM733nnn/GSM733845/GSM733845.CEL.gz
##                                                                                         supplementary_file.1
## GSM733843 ftp://ftp.ncbi.nlm.nih.gov/pub/geo/DATA/supplementary/samples/GSM733nnn/GSM733843/GSM733843.chp.gz
## GSM733844 ftp://ftp.ncbi.nlm.nih.gov/pub/geo/DATA/supplementary/samples/GSM733nnn/GSM733844/GSM733844.chp.gz
## GSM733845 ftp://ftp.ncbi.nlm.nih.gov/pub/geo/DATA/supplementary/samples/GSM733nnn/GSM733845/GSM733845.chp.gz
##           data_row_count
## GSM733843          54715
## GSM733844          54715
## GSM733845          54715

Issues

  • The metadata is poorly formatted
  • Subject ID and visit are stored in the same column
  • HAI results are spread across “characteristics_ch1” columns
  • No demographics (age, gender, race)
  • No mapping to additional datasets (ELISA, ELISPOT, Flow, GE, etc…)
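
To give a sense of the cleaning involved, the sketch below parses the title and HAI columns into tidy tables using base R only. It is an illustration under assumptions: `pd` is a toy data frame that copies three rows and two HAI columns from the output above, the regular expressions assume every sample follows exactly the text pattern shown there, and `parse_hai` is a hypothetical helper.

```r
# Toy copy of a few phenoData columns, taken from the output shown above.
pd <- data.frame(
  title = c("2008 LAIV subject ID 1 at D0 post-vaccination",
            "2008 LAIV subject ID 1 at D3 post-vaccination",
            "2008 LAIV subject ID 1 at D7 post-vaccination"),
  characteristics_ch1.4 = rep("hai titer (day 0) - a/south dakota/6/2007 (h1n1): 40", 3),
  characteristics_ch1.5 = rep("hai titer (day 28) - a/south dakota/6/2007 (h1n1): 40", 3),
  stringsAsFactors = FALSE
)

# Split the title into year, vaccine, subject ID and visit.
title_pattern <- "^([0-9]{4}) ([^ ]+) subject ID ([0-9]+) at (D[0-9]+) .*$"
samples <- data.frame(
  year    = as.integer(sub(title_pattern, "\\1", pd$title)),
  vaccine = sub(title_pattern, "\\2", pd$title),
  subject = sub(title_pattern, "\\3", pd$title),
  visit   = sub(title_pattern, "\\4", pd$title),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split "hai titer (day N) - <strain>: <value>".
parse_hai <- function(x) {
  hai_pattern <- "^hai titer \\(day ([0-9]+)\\) - (.*): ([0-9]+)$"
  data.frame(
    day    = as.integer(sub(hai_pattern, "\\1", x)),
    strain = sub(hai_pattern, "\\2", x),
    titer  = as.numeric(sub(hai_pattern, "\\3", x)),
    stringsAsFactors = FALSE
  )
}

# Find the columns that hold HAI results by their content, not their name.
is_hai <- vapply(pd, function(col) all(grepl("^hai titer", col)), logical(1))

# Stack the HAI columns into one long table with one row per
# subject, day and strain.
hai <- do.call(rbind, lapply(names(pd)[is_hai], function(col) {
  data.frame(subject = samples$subject, parse_hai(pd[[col]]),
             stringsAsFactors = FALSE)
}))
hai <- unique(hai)
rownames(hai) <- NULL

print(samples)
print(hai)
```

The HAI columns are detected by content because the numeric suffix of each “characteristics_ch1” column is not guaranteed to mean the same thing in every series.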

Combining data

Without any additional information, the various seasons cannot be combined.

combine(gse[[1]], gse[[2]])
## Error in combine(gse[[1]], gse[[2]]): objects have different annotations: GPL13158, GPL3921

The platforms are different across seasons.

Summary

Even something as simple as a differential gene expression analysis requires several steps:

  • Parse the phenoData of each ExpressionSet for visit and patient ID
  • Use annotate to map probes to genes
  • Combine into one expression matrix
  • Run DGEA (limma)
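
The probe-to-gene and combining steps can be sketched on toy data as follows. This is an illustration in base R, not the paper's method: all matrices, sample names and probe-to-gene maps are made up, `collapse_to_genes` is a hypothetical helper, and averaging the probes of a gene is only one of several possible summarization choices. Platform or batch effects are not addressed here.

```r
# Toy expression matrices (probes x samples) for two platforms.
expr_a <- matrix(c(5, 7,
                   7, 9,
                   2, 3),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("pA1", "pA2", "pA3"), c("S1", "S2")))
expr_b <- matrix(c(6, 8,
                   1, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("pB1", "pB2"), c("S3", "S4")))

# Made-up probe-to-gene maps (named character vectors).
genes_a <- c(pA1 = "GENE1", pA2 = "GENE1", pA3 = "GENE2")
genes_b <- c(pB1 = "GENE1", pB2 = "GENE3")

# Hypothetical helper: average all probes that map to the same gene.
collapse_to_genes <- function(expr, probe2gene) {
  genes <- probe2gene[rownames(expr)]
  keep <- !is.na(genes)                      # drop unannotated probes
  sums <- rowsum(expr[keep, , drop = FALSE], group = genes[keep])
  counts <- as.vector(table(genes[keep])[rownames(sums)])
  sums / counts                              # per-gene mean
}

gene_a <- collapse_to_genes(expr_a, genes_a)
gene_b <- collapse_to_genes(expr_b, genes_b)

# Keep only the genes measured on both platforms, then bind the samples.
shared <- intersect(rownames(gene_a), rownames(gene_b))
combined <- cbind(gene_a[shared, , drop = FALSE],
                  gene_b[shared, , drop = FALSE])
print(combined)
```

Working at the gene level sidesteps the annotation mismatch between platforms, at the cost of losing every gene that is not measured on all of them. The combined matrix, together with the parsed visit and subject information, is what would then be passed to limma.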

Combining this information with another dataset would require additional data cleaning (assuming the IDs are consistent across experiments).


With ImmuneSpaceR

This study was funded by HIPC. As a result, all of its data has been curated and standardized by ImmPort and is now publicly available on ImmuneSpace. We will use ImmuneSpaceR to retrieve the gene-expression data and associated metadata.

Connect to ImmuneSpace

The study SDY269 holds the data for the two 2008 cohorts, LAIV and TIV.