Macrophage Resource - Scripts, Algorithms, Software

Here we present the scripts, algorithms, and software that we used to analyze this large dataset on the activation of human macrophages.

    Extending the current model of macrophage polarization (Figure 1)

    Primary data handling - Illumina Genome Studio

    • For primary data handling of the Illumina microarrays, we used Illumina Genome Studio software. The Genome Studio Gene Expression (GX) Module collects the signal image files generated by the Illumina HiScanSQ system. Average signal intensity values and detection p-values of the probesets were computed for each array sample and exported as a Partek Project file.
    • Raw expression data were quantile normalized in Partek Genomics Suite (PGS).
    • Batch effects between separate array experiments were removed from the normalized data.
    • The background signal (6.747) was calculated with an R script. Genes were kept for further analysis only if their mean expression values were higher than background in at least one condition across the 299 macrophage transcriptomes.
    • Multiple-probe filtering: for each gene, only the probe with the highest mean expression was included. 9498 probesets representing the most informative genes remained.
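
    The two filtering steps above can be sketched as follows. This is a minimal illustration in plain Python, not the R script or PGS workflow that was actually used; the function names, the dictionary layout, and the toy probe and gene identifiers are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

BACKGROUND = 6.747  # background value reported in the text


def filter_present(expr, conditions, background=BACKGROUND):
    """Keep probes whose mean expression exceeds background in >= 1 condition.

    expr: {probe_id: [value per sample]} (hypothetical layout)
    conditions: condition label per sample, in the same order as the values
    """
    kept = {}
    for probe, values in expr.items():
        by_condition = defaultdict(list)
        for label, value in zip(conditions, values):
            by_condition[label].append(value)
        if any(mean(v) > background for v in by_condition.values()):
            kept[probe] = values
    return kept


def best_probe_per_gene(expr, probe_to_gene):
    """For each gene, keep only the probe with the highest mean expression."""
    best = {}
    for probe, values in expr.items():
        gene = probe_to_gene[probe]
        if gene not in best or mean(values) > mean(expr[best[gene]]):
            best[gene] = probe
    return {probe: expr[probe] for probe in best.values()}


# Toy data: identifiers and values are invented for illustration.
expr = {
    "p1": [7.5, 7.7, 6.0, 6.2],  # above background in condition A only
    "p2": [6.1, 6.3, 6.0, 6.2],  # never above background
    "p3": [8.0, 8.2, 8.1, 8.3],  # second probe of GENE1, higher mean
}
conditions = ["A", "A", "B", "B"]
probe_to_gene = {"p1": "GENE1", "p2": "GENE2", "p3": "GENE1"}

present = filter_present(expr, conditions)
final = best_probe_per_gene(present, probe_to_gene)
print(sorted(present), sorted(final))
```

    In the toy data, p2 is removed by the background filter and p1 is then dropped in favor of p3, the probe with the higher mean for the same gene.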

    Secondary data analysis

    After preprocessing of the raw data, the normalized expression data were used for advanced structural analysis by applying a variety of algorithms in PGS and other tools such as BioLayout Express 3D.

    • Unfiltered expression transcriptomic data were used for the following.
    • Co-regulation network analysis among 297 macrophage transcriptomes covering 29 stimulation conditions using BioLayout. A Pearson correlation threshold of 0.95 was set for network visualization.
    • Self-organizing-map clustering on transcriptomic level for samples assembled in Figure 1E.
      • Compute the mean expression values of each stimulation condition.
      • In PGS, SOM clustering was performed on a 10 × 10 grid with 20,000 iterations.
    • Identification of 10 clusters in 29 conditions in PGS.
      • Calculate the mean expression of all 29 conditions using only samples generated at the end of activation time points (72 hrs or 24 hrs).
      • The Pearson correlation matrix was calculated for each condition-condition pair.
      • Hierarchical clustering of Pearson correlation matrix in both columns and rows.
    • Using the 3D coordinates of the individual macrophage samples determined by BioLayout, mean vectors were calculated for the clusters and plotted in a 3D graph with the coordinates of the baseline macrophages (Mb) as the origin. This was done with the Matlab script vectarrow.m.
    • 161 macrophage samples generated at the end of activation time points (72 hrs or 24 hrs) with 9498 present genes were used.
    • SOM clustering on 29 conditions, using the relative expression values of each gene in one particular condition in comparison to the other stimulations.
    • For each condition, the one gene with the highest relative expression was chosen, and the absolute mean expression values of these genes were visualized in scatter plots with standard deviations as error bars.
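
    The condition-level correlation step described above (mean expression per condition, then a Pearson correlation for each condition-condition pair) can be sketched in plain Python. This is an illustrative sketch only; the analysis itself was run in PGS, and the function names, data layout, and toy values below are assumptions. The hierarchical clustering of the resulting matrix is not shown.

```python
from math import sqrt
from statistics import mean


def condition_means(samples):
    """samples: {condition: [expression vector per sample]} -> mean vector per condition."""
    return {
        cond: [mean(column) for column in zip(*vectors)]
        for cond, vectors in samples.items()
    }


def pearson(x, y):
    """Pearson correlation of two equally long vectors (assumes non-zero variance)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(profiles):
    """Return condition names and the full condition-by-condition Pearson matrix."""
    names = sorted(profiles)
    matrix = [[pearson(profiles[a], profiles[b]) for b in names] for a in names]
    return names, matrix


# Toy data: three conditions, four genes; values are invented for illustration.
samples = {
    "Mb": [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]],
    "IFNg": [[2.0, 4.0, 6.0, 8.0]],
    "IL4": [[4.0, 3.0, 2.0, 1.0]],
}
names, matrix = correlation_matrix(condition_means(samples))
for name, row in zip(names, matrix):
    print(name, [round(r, 3) for r in row])
```

    The matrix is symmetric with ones on the diagonal, so the same matrix can be clustered along both rows and columns.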

    Network analysis defines stimulus-associated programs of macrophage activation (Figure 3)

    Condition-specific transcriptional programs were identified using the R package WGCNA (Weighted Gene Coexpression Network Analysis), which defines gene modules based on Pearson correlation.

    • 161 macrophage samples generated at the end of activation time points (72 hrs or 24 hrs) with 9498 present genes were used.
    • In the WGCNA R package, the standard parameters were changed to a power of 6 and a minModuleSize of 30, resulting in 49 modules.
      • For each module the eigengene corresponding to the first principal component of a given module was calculated. The network for each module of interest was generated using the “1-TOMsimilarityFromExpr” function of the WGCNA R package.
    • The network was subsequently imported into Cytoscape for GO-enrichment analysis and visualization using BiNGO, Enrichment Map, and Word Clouding.
    • The 3 most positively correlated modules specific for each condition (IFNγ, IL4 and TPP) were used to visualize a transcription factor correlation network for each condition.
      • Genomatix was used to assess whether transcription factors were part of the modules of interest.
      • The transcription factor correlation network was calculated in BioLayout using the samples from all time points stimulated by the corresponding condition. Networks were then visualized in Cytoscape to show correlation levels between TFs. The corresponding settings for the three stimuli were:

        a) IFNγ (11 TFs in modules): correlation threshold 0.56. 8 TFs are visualized in the network.

        b) IL4 (18 TFs in modules): correlation threshold 0.23. 17 TFs are visualized in the network.

        c) TPP (39 TFs in modules): correlation threshold 0.60. 21 TFs are visualized in the network.
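
    To illustrate what the power of 6 and the "1 - TOM similarity" expression refer to, the following plain-Python sketch computes a soft-thresholded adjacency and the topological overlap dissimilarity for an unsigned network, following the commonly cited formulation. It is an assumption-laden sketch for orientation, not the WGCNA implementation, which offers further options (signed networks, alternative TOM types) that are not covered here.

```python
def soft_threshold_adjacency(cor, power=6):
    """Unsigned adjacency a_ij = |cor_ij| ** power, with a zero diagonal."""
    n = len(cor)
    return [
        [0.0 if i == j else abs(cor[i][j]) ** power for j in range(n)]
        for i in range(n)
    ]


def tom_dissimilarity(adj):
    """Return 1 - TOM, where
    TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)
    and k_i is the connectivity (row sum) of gene i.
    """
    n = len(adj)
    k = [sum(row) for row in adj]
    dissimilarity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            dissimilarity[i][j] = 1.0 - tom
    return dissimilarity


# Toy correlation matrix for three genes (invented values):
# genes 0 and 1 are perfectly correlated, gene 2 is unrelated to both.
cor = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
adjacency = soft_threshold_adjacency(cor, power=6)
dissimilarity = tom_dissimilarity(adjacency)
print(dissimilarity)
```

    Raising correlations to the sixth power suppresses weak correlations strongly (for example, 0.5 becomes about 0.016), which is why only tightly co-expressed genes end up close to each other in the dissimilarity matrix.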

    Distinct phenotype of macrophages activated by TNF, PGE2, and TLR2 ligand (Figure 4)

    Condition-specific transcriptional programs were identified using the R package WGCNA (Weighted Gene Coexpression Network Analysis), which defines gene modules based on Pearson correlation.

    • 161 macrophage samples generated at the end of activation time points (72 hrs or 24 hrs) with 9498 present genes were used.
    • In the WGCNA R package, the standard parameters were changed to a power of 6 and a minModuleSize of 30, resulting in 49 modules.
      • For each module the eigengene corresponding to the first principal component of a given module was calculated. The network for each module of interest was generated using the “1-TOMsimilarityFromExpr” function of the WGCNA R package.
    • The network was subsequently imported into Cytoscape for GO-enrichment analysis and visualization using BiNGO, Enrichment Map, and Word Clouding.
    • The 3 most positively correlated modules specific for each condition (IFNγ, IL4 and TPP) were used to visualize a transcription factor correlation network for each condition.
      • Genomatix was used to assess whether transcription factors were part of the modules of interest.
      • The transcription factor correlation network was calculated in BioLayout using the samples from all time points stimulated by the corresponding condition. Networks were then visualized in Cytoscape to show correlation levels between TFs. The corresponding settings for the three stimuli were:

        a) IFNγ (11 TFs in modules): correlation threshold 0.56. 8 TFs are visualized in the network.

        b) IL4 (18 TFs in modules): correlation threshold 0.23. 17 TFs are visualized in the network.

        c) TPP (39 TFs in modules): correlation threshold 0.60. 21 TFs are visualized in the network.
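
    One plausible reading of why fewer TFs are visualized than are contained in the modules is that a TF without any correlation at or above the threshold has no edge and therefore does not appear in the network. The sketch below illustrates that reading in plain Python. It is an assumption, not a description of the BioLayout or Cytoscape internals; whether the threshold was applied to raw or absolute correlations is not stated, and the TF names and values are placeholders.

```python
def threshold_network(cor, names, threshold):
    """Keep TF pairs with correlation >= threshold; return edges and connected nodes."""
    n = len(names)
    edges = [
        (names[i], names[j], cor[i][j])
        for i in range(n)
        for j in range(i + 1, n)
        if cor[i][j] >= threshold
    ]
    nodes = sorted({tf for a, b, _ in edges for tf in (a, b)})
    return edges, nodes


# Placeholder TF names and correlation values, invented for illustration.
names = ["TF_A", "TF_B", "TF_C"]
cor = [
    [1.0, 0.7, 0.2],
    [0.7, 1.0, 0.1],
    [0.2, 0.1, 1.0],
]
edges, nodes = threshold_network(cor, names, threshold=0.56)
print(edges)  # only the TF_A / TF_B pair passes the threshold
print(nodes)  # TF_C has no edge and is not shown
```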

    Macrophage activation model can be used to predict macrophage programs in vivo (Figure 5)

    • We downloaded two human alveolar macrophage gene expression datasets from GEO: GSE13896 and GSE2125. The two datasets, which come from the same platform, were merged in PGS, and the batch effect between the experiments was removed.
    • Differentially expressed (DE) genes were identified by applying an ANOVA model with a fold change cutoff of 2.5 and an FDR-adjusted p-value cutoff of 0.05. The DE genes from 80 arrays were used to build a sample co-regulation network with a correlation threshold of 0.80.
    • The 49 defined WGCNA gene modules were used as stimulus-specific gene sets for each in vitro condition, and gene set enrichment analysis (GSEA) was performed with 1000 permutations.
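
    The DE gene selection above combines a fold change cutoff with an FDR-adjusted p-value cutoff. The sketch below shows one common way to do this in plain Python, assuming the Benjamini-Hochberg step-up procedure for the FDR adjustment and fold changes given on a log2 scale. The text does not specify which FDR method was used in PGS, so both points are assumptions, as are the function names and toy values.

```python
from math import log2


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def select_de_genes(genes, log2_fold_changes, pvalues, fc_cutoff=2.5, fdr_cutoff=0.05):
    """Keep genes with |fold change| >= fc_cutoff and adjusted p-value <= fdr_cutoff."""
    adjusted = benjamini_hochberg(pvalues)
    min_log2_fc = log2(fc_cutoff)  # a 2.5-fold change is about 1.32 on the log2 scale
    return [
        gene
        for gene, lfc, q in zip(genes, log2_fold_changes, adjusted)
        if abs(lfc) >= min_log2_fc and q <= fdr_cutoff
    ]


# Toy values, invented for illustration.
genes = ["g1", "g2", "g3", "g4"]
log2_fold_changes = [2.0, 2.0, 0.5, 3.0]
pvalues = [0.01, 0.04, 0.03, 0.5]

print(benjamini_hochberg(pvalues))
print(select_de_genes(genes, log2_fold_changes, pvalues))
```

    In the toy data only g1 passes both filters: g2 has a sufficient fold change but an adjusted p-value above 0.05, g3 fails the fold change cutoff, and g4 is not significant.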

    Common denominators of macrophage activation (Figure 6)

    Reverse network engineering

    • 299 macrophage samples with 9498 present genes were used for network calculation.
    • Common transcriptional regulatory networks were inferred with the reverse-engineering algorithms ARACNe and TINGe. The ARACNe algorithm is integrated into geWorkbench, an integrative GUI tool for microarray analysis. Three parameter settings were used:

    o    (i) Bonferroni corrected p-value 10e-7, DPI 0.1 using ARACNe algorithm

    o    (ii) uncorrected p-value 10e-7, DPI 0.1 using ARACNe algorithm

    o    (iii) uncorrected p-value 10e-7, DPI 0.1 using TINGe algorithm

    • Subsequent network analysis, such as network visualization and statistics, was performed in Cytoscape. The networks were visualized in a force-directed layout, and the mean expression levels of the 10 defined clusters were visualized with the MultiColoredNodes plug-in.
    • The inferred networks from ARACNe (Network ii) and TINGe (Network iii) were compared topologically by the degree of connectivity of each gene using a degree-degree plot, where the degrees of ARACNe network genes were in x-axis and corresponding degrees in TINGe network in y-axis.
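
    The degree-degree comparison described above can be sketched as follows: count the number of distinct neighbors of each gene in each network, then pair the two counts per gene to obtain the points of the plot. This is a plain-Python illustration under the assumption that both networks are available as undirected edge lists; the gene labels and edges are invented, and plotting is omitted.

```python
from collections import Counter


def degrees(edges):
    """Degree of connectivity per gene for an undirected edge list.

    Duplicate edges (in either orientation) and self-loops are ignored.
    """
    unique = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for edge in unique:
        for gene in edge:
            degree[gene] += 1
    return degree


def degree_degree_points(edges_x, edges_y):
    """Return (gene, degree in network x, degree in network y) for every gene."""
    dx, dy = degrees(edges_x), degrees(edges_y)
    genes = sorted(set(dx) | set(dy))
    return [(gene, dx[gene], dy[gene]) for gene in genes]


# Toy edge lists standing in for the ARACNe (x-axis) and TINGe (y-axis) networks.
aracne_edges = [("A", "B"), ("A", "C"), ("B", "A")]
tinge_edges = [("A", "B"), ("C", "D")]

for gene, x, y in degree_degree_points(aracne_edges, tinge_edges):
    print(gene, x, y)
```

    Genes that lie near the diagonal of such a plot have a similar connectivity in both networks, while genes far from it are hubs in only one of the two inferred networks.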

    Gene prioritization by ToppGene and Endeavour

    • The top 10% highly connected hub genes with a degree of connectivity higher than 30 (the test gene set) were prioritized by association with macrophage lineage and activation information, using the transcription factors PU.1 and RUNX1 as the training gene set.
    • For both tools, the default parameter settings were used.
    • The results of the two approaches were subsequently combined by the Borda ranking method.
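
    A Borda count combines several rankings by awarding each gene points according to its position in every ranking and sorting by the total. The sketch below is one simple variant in plain Python, assuming both tools rank exactly the same genes and breaking ties alphabetically; the exact variant used for the analysis is not stated, and the gene labels are placeholders.

```python
def borda_combine(*rankings):
    """Combine rankings (best first) of the same genes by Borda count."""
    candidates = set(rankings[0])
    for ranking in rankings:
        if set(ranking) != candidates or len(ranking) != len(candidates):
            raise ValueError("all rankings must contain the same genes exactly once")
    n = len(candidates)
    scores = {gene: 0 for gene in candidates}
    for ranking in rankings:
        for position, gene in enumerate(ranking):
            scores[gene] += n - position  # top rank earns n points, last rank 1
    # Highest total first; ties are broken alphabetically (an arbitrary choice).
    return sorted(candidates, key=lambda gene: (-scores[gene], gene))


# Placeholder rankings standing in for the ToppGene and Endeavour outputs.
toppgene_ranking = ["G1", "G2", "G3"]
endeavour_ranking = ["G2", "G3", "G1"]

combined = borda_combine(toppgene_ranking, endeavour_ranking)
print(combined)
```

    Here G2 collects 5 points, G1 collects 4, and G3 collects 3, so a gene ranked consistently high by both tools ends up ahead of a gene ranked first by only one of them.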

     

     

    Refinement of core genes of murine tissue macrophages using human macrophage activation signatures (Figure 7)

     

    • Collection of 44 and 43 genes being previously classified as macrophage (Gautier et al., 2012) and dendritic cell (Miller et al., 2012) core genes, respectively
    • Preparation of human dataset:
      • Selection of 161 macrophage (29 conditions), 33 DC and 7 monocyte human samples
      • Data set was log2-transformed, quantile normalized and batch-corrected in PGS
      • Non-present transcripts (expression level is lower than background value of 6.75 on log2 scale in all three cell types) were filtered out
      • For each gene the transcript with the highest mean expression across all samples was kept as a representative
    • Calculation of three different fold changes:
      • all macrophages against all DCs (overall fold change)
      • each single macrophage condition against all DCs
      • each single macrophage condition against mature DCs
    • Translation of murine gene symbols into human gene symbols:
      • Existing human orthologs were extracted from BioMart
      • For genes with no ortholog listed in BioMart, BioGPS was searched for a possible ortholog
    • Visualization of translated core genes:
      • Expression values were standardized to a mean of zero and a standard deviation of one and scaled to a maximum expression of 2 and a minimum expression of -2 (on log2 scale)
      • Genes were sorted according to the overall fold change
      • Visualization as heatmap by using Mayday
      • Genes missing from the murine core signatures were either not interrogated on the Illumina array, not present relative to the background value or did not have a distinct human orthologue
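
    The fold change and heatmap scaling steps above can be sketched in plain Python. The sketch assumes that expression values are on a log2 scale (so a fold change is a difference of group means), that "scaled to a maximum of 2 and a minimum of -2" means clipping the standardized values at these limits, and that the sample standard deviation is used. These are assumptions for illustration, not a description of the Mayday settings, and the gene labels and values are invented.

```python
from statistics import mean, stdev


def log2_fold_change(group_a, group_b):
    """Fold change on the log2 scale: difference of the group means."""
    return mean(group_a) - mean(group_b)


def standardize_and_clip(values, limit=2.0):
    """Z-score one gene across samples, then clip to [-limit, limit]."""
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [max(-limit, min(limit, (v - mu) / sd)) for v in values]


# Toy log2 expression values: macrophage samples followed by DC samples.
macrophages = {"GENE_A": [10.0, 10.5, 9.5], "GENE_B": [7.0, 7.2, 6.8]}
dendritic_cells = {"GENE_A": [7.0, 7.5], "GENE_B": [9.0, 9.4]}

overall_fc = {
    gene: log2_fold_change(macrophages[gene], dendritic_cells[gene])
    for gene in macrophages
}
# Sort genes by the overall fold change, highest first.
ordered = sorted(overall_fc, key=overall_fc.get, reverse=True)

heatmap_rows = {
    gene: standardize_and_clip(macrophages[gene] + dendritic_cells[gene])
    for gene in ordered
}
print(ordered)
print(heatmap_rows)
```

    Clipping keeps a few extreme samples from dominating the color scale, at the cost of no longer distinguishing values beyond two standard deviations from the gene's mean.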