Motif enrichment analysis on the selected probes

This step identifies motifs that are enriched in a set of probes, using the function get.enriched.motif.

This function uses a pre-calculated dataset, Probes.motif, which was generated using HOMER with a \(p\text{-value} \le 10^{-4}\) to scan a \(\pm 250\) bp region around each probe using HOmo sapiens COmprehensive MOdel COllection http://hocomoco.autosome.ru/ v10 (Kulakovskiy et al. 2016) position weight matrices (PWMs). For each probe set tested (i.e. the list of gene-linked hypomethylated probes in a given group), a motif enrichment Odds Ratio (OR) and a 95% confidence interval were calculated using the following formulas:

\[ p = \frac{a}{a+b} \]
\[ P = \frac{c}{c+d} \]
\[ Odds\quad Ratio = \frac{\frac{p}{1-p}}{\frac{P}{1-P}} \]
\[ SD = \sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}} \]
\[ lower\quad boundary\quad of\quad 95\%\quad confidence\quad interval = \exp{(\ln{OR}-SD)} \]

where a is the number of probes within the selected probe set that contain one or more motif occurrences; b is the number of probes within the selected probe set that do not contain a motif occurrence; c and d are the corresponding counts within the entire enhancer probe set. A probe set was considered significantly enriched for a particular motif if the lower boundary of the 95% confidence interval of the Odds Ratio was greater than 1.1 (set by option lower.OR; 1.1 is the default) and the motif occurred at least 10 times (set by option min.incidence; 10 is the default) in the probe set. As described in the text, Odds Ratios were also used for ranking candidate motifs.
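The formulas above can be sketched in a few lines of R. This is an illustration only, not ELMER's internal code: the function name motif.odds.ratio, its argument names, and the counts in the usage lines are all hypothetical. The lower boundary is computed exactly as the formula above is written.

```r
# Sketch of the enrichment statistics (hypothetical helper, not part of ELMER).
# set.with / set.without : a and b, probes in the selected set with / without the motif
# bg.with  / bg.without  : c and d, the same counts in the entire enhancer probe set
motif.odds.ratio <- function(set.with, set.without, bg.with, bg.without) {
  p <- set.with / (set.with + set.without)   # motif frequency in the selected set
  P <- bg.with / (bg.with + bg.without)      # motif frequency in the background
  OR <- (p / (1 - p)) / (P / (1 - P))
  SD <- sqrt(1 / set.with + 1 / set.without + 1 / bg.with + 1 / bg.without)
  lowerOR <- exp(log(OR) - SD)               # lower boundary as written above
  list(OR = OR, lowerOR = lowerOR)
}

# Made-up counts: 20 of 100 selected probes and 1000 of 10000 background
# probes contain the motif.
res <- motif.odds.ratio(set.with = 20, set.without = 80,
                        bg.with = 1000, bg.without = 9000)
res$OR        # 2.25
res$lowerOR   # about 1.75

# Apply the default thresholds described above (lower.OR = 1.1, min.incidence = 10).
is.enriched <- res$lowerOR > 1.1 && 20 >= 10
```

With these made-up counts the motif passes both thresholds, because the lower boundary exceeds 1.1 and the motif occurs in at least 10 probes of the selected set.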

Main get.enriched.motif arguments
Argument        Description
data            A MultiAssayExperiment object from the createMAE function. If it is set and probes.motif and background.probes are missing, it is used to derive those two arguments. This argument is not required; you can set probes.motif and background.probes manually.
probes          A vector of probe names defining the set of probes for which the motif enrichment OR and confidence interval will be calculated.
lower.OR        A number specifying the smallest lower boundary of the 95% confidence interval for the Odds Ratio. Motifs whose lower boundary is higher than this number are considered significantly enriched (see the reference for details). 1.1 is default.
min.incidence   A non-negative integer specifying the minimum incidence of a motif in the given probe set. 10 is default.
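To illustrate how lower.OR and min.incidence act together as a filter, here is a minimal sketch on a made-up summary table. The data frame, its column names, and the motif labels are hypothetical and are not taken from ELMER's output.

```r
# Hypothetical per-motif summary; names and values are illustrative only.
motif.stats <- data.frame(
  motif     = c("motifA", "motifB", "motifC"),
  lowerOR   = c(1.75, 1.05, 2.40),   # lower boundary of the 95% CI of the OR
  incidence = c(20, 35, 4),          # probes in the selected set with the motif
  stringsAsFactors = FALSE
)

lower.OR <- 1.1        # default described above
min.incidence <- 10    # default described above

# A motif is kept only if it passes both thresholds.
keep <- motif.stats$lowerOR > lower.OR & motif.stats$incidence >= min.incidence
enriched <- motif.stats[keep, ]
enriched$motif
```

Here motifB is dropped because its lower boundary is below 1.1, and motifC is dropped because it occurs in fewer than 10 probes, so only motifA remains.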
# Load results from previous sections
mae <- get(load("mae.rda"))
sig.diff <- read.csv("result/getMethdiff.hypo.probes.significant.csv")
pair <- read.csv("result/getPair.hypo.pairs.significant.csv")
head(pair) # significantly hypomethylated probes with putative target genes
##        Probe          GeneID Symbol Distance Sides        Raw.p         Pe
## 1 cg20701183 ENSG00000143013   LMO4        0    R1 7.453984e-14 0.00990099
## 2 cg19403323 ENSG00000143469  SYT14    87476    R2 1.671937e-12 0.00990099
## 3 cg12213388 ENSG00000143674   MLK4   984182    L3 2.527644e-12 0.00990099
## 4 cg26607897 ENSG00000143199 ADCY10   267784    R4 4.593610e-12 0.00990099
## 5 cg10574861 ENSG00000143013   LMO4        0    R1 4.770162e-12 0.00990099
## 6 cg26607897 ENSG00000143147 GPR161   543156    R7 8.048248e-12 0.00990099
# Identify enriched motifs for the significantly hypomethylated probes that
# have putative target genes.
enriched.motif <- get.enriched.motif(data = mae,
                                     probes = pair$Probe, 
                                     dir.out = "result", 
                                     label = "hypo",
                                     min.incidence = 10,
                                     lower.OR = 1.1)
names(enriched.motif) # enriched motifs
##  [1] "ARI5B_HUMAN.H11MO.0.C" "CDX2_HUMAN.H11MO.0.A" 
##  [3] "CEBPB_HUMAN.H11MO.0.A" "CPEB1_HUMAN.H11MO.0.D"
##  [5] "E2F8_HUMAN.H11MO.0.D"  "EVX1_HUMAN.H11MO.0.D" 
##  [7] "FOSB_HUMAN.H11MO.0.A"  "FOSL1_HUMAN.H11MO.0.A"
##  [9] "FOSL2_HUMAN.H11MO.0.A" "FOS_HUMAN.H11MO.0.A"  
## [11] "FOXH1_HUMAN.H11MO.0.A" "FOXJ3_HUMAN.H11MO.1.B"
## [13] "HXA11_HUMAN.H11MO.0.D" "HXA1_HUMAN.H11MO.0.C" 
## [15] "HXA9_HUMAN.H11MO.0.B"  "HXB13_HUMAN.H11MO.0.A"
## [17] "HXC10_HUMAN.H11MO.0.D" "HXC6_HUMAN.H11MO.0.D" 
## [19] "HXD8_HUMAN.H11MO.0.D"  "IRF1_HUMAN.H11MO.0.A" 
## [21] "IRF2_HUMAN.H11MO.0.A"  "IRF3_HUMAN.H11MO.0.B" 
## [23] "IRF9_HUMAN.H11MO.0.C"  "JUNB_HUMAN.H11MO.0.A" 
## [25] "JUND_HUMAN.H11MO.0.A"  "JUN_HUMAN.H11MO.0.A"  
## [27] "MAFF_HUMAN.H11MO.0.B"  "MAFG_HUMAN.H11MO.0.A" 
## [29] "MAFK_HUMAN.H11MO.0.A"  "MNX1_HUMAN.H11MO.0.D" 
## [31] "NF2L2_HUMAN.H11MO.0.A" "NFAC1_HUMAN.H11MO.0.B"
## [33] "OZF_HUMAN.H11MO.0.C"   "PO2F1_HUMAN.H11MO.0.C"
## [35] "PO2F2_HUMAN.H11MO.0.A" "PO3F3_HUMAN.H11MO.0.D"
## [37] "PO5F1_HUMAN.H11MO.0.A" "PO5F1_HUMAN.H11MO.1.A"
## [39] "SIX1_HUMAN.H11MO.0.A"  "SIX2_HUMAN.H11MO.0.A" 
## [41] "SOX17_HUMAN.H11MO.0.C" "SOX2_HUMAN.H11MO.1.A" 
## [43] "STAT1_HUMAN.H11MO.1.A" "STAT2_HUMAN.H11MO.0.A"
## [45] "TBP_HUMAN.H11MO.0.A"   "TF7L1_HUMAN.H11MO.0.B"
## [47] "ZBED1_HUMAN.H11MO.0.D" "ZN232_HUMAN.H11MO.0.D"
## [49] "ZN282_HUMAN.H11MO.0.D" "ZN317_HUMAN.H11MO.0.C"
## [51] "ZN394_HUMAN.H11MO.0.C" "ZN713_HUMAN.H11MO.0.D"
## [53] "ZSC16_HUMAN.H11MO.0.D"
head(enriched.motif[names(enriched.motif)[1]]) # probes in the given set that contain the first motif
## $ARI5B_HUMAN.H11MO.0.C
##  [1] "cg23670519" "cg15576338" "cg17711754" "cg21402096" "cg21400442"
##  [6] "cg11718886" "cg22153407" "cg15965383" "cg09874992" "cg26607897"
## [11] "cg24873093" "cg18395632" "cg12446045" "cg25729466"
# get.enriched.motif automatically saves its output files.
# getMotif.hypo.enriched.motifs.rda contains the enriched motifs and the probes containing each motif.
# getMotif.hypo.motif.enrichment.csv contains a summary of the enriched motifs.
dir(path = "result", pattern = "getMotif") 
## [1] "getMotif.hypo.enriched.motifs.rda" 
## [2] "getMotif.hypo.motif.enrichment.csv"
# The motif enrichment figures are generated automatically.
dir(path = "result", pattern = "motif.enrichment.pdf") 
## [1] "hypo.quality.A-DS.motif.enrichment.pdf"             
## [2] "hypo.quality.A-DS_with_summary.motif.enrichment.pdf"
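The saved files can be inspected in a later session without rerunning the analysis. The sketch below makes two assumptions: the files listed above exist under result, and the .rda file holds a single list object shaped like enriched.motif (get(load()) fails if the file holds more than one object). The CSV columns are inspected with str() rather than assumed.

```r
# File names are taken from the dir() output above.
csv.file <- file.path("result", "getMotif.hypo.motif.enrichment.csv")
rda.file <- file.path("result", "getMotif.hypo.enriched.motifs.rda")

if (file.exists(csv.file)) {
  motif.summary <- read.csv(csv.file, stringsAsFactors = FALSE)
  str(motif.summary)   # look at the columns instead of assuming their names
}

if (file.exists(rda.file)) {
  # Same get(load()) idiom used for mae.rda; assumes a single saved object.
  saved.motifs <- get(load(rda.file))
  # Number of probes carrying each enriched motif, largest first.
  print(head(sort(lengths(saved.motifs), decreasing = TRUE)))
}
```

The file.exists() guards let the snippet run without error when the result directory has not been created yet.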

Bibliography

Kulakovskiy, Ivan V, Ilya E Vorontsov, Ivan S Yevshin, Anastasiia V Soboleva, Artem S Kasianov, Haitham Ashoor, Wail Ba-Alawi, et al. 2016. “HOCOMOCO: Expansion and Enhancement of the Collection of Transcription Factor Binding Sites Models.” Nucleic Acids Research 44 (D1). Oxford Univ Press: D116–D125.