Motif enrichment analysis on the selected probes

This step identifies motifs that are enriched in a set of probes, using the function get.enriched.motif.

This function uses the pre-calculated dataset Probes.motif, which was generated by using HOMER with a p-value \(\le 10^{-4}\) to scan a \(\pm 250\) bp region around each probe with position weight matrices (PWMs) from the HOmo sapiens COmprehensive MOdel COllection (HOCOMOCO, http://hocomoco.autosome.ru/) v10 (Kulakovskiy et al. 2016). For each probe set tested (i.e. the list of gene-linked hypomethylated probes in a given group), a motif enrichment Odds Ratio (OR) and a 95% confidence interval were calculated using the following formulas:

\[ p= \frac{a}{a+b} \]

\[ P= \frac{c}{c+d} \]

\[ Odds\quad Ratio = \frac{\frac{p}{1-p}}{\frac{P}{1-P}} \]

\[ SD = \sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}} \]

\[ lower\quad boundary\quad of\quad 95\%\quad confidence\quad interval = \exp{(\ln{OR}-SD)} \]

where \(a\) is the number of probes within the selected probe set that contain one or more motif occurrences; \(b\) is the number of probes within the selected probe set that do not contain a motif occurrence; \(c\) and \(d\) are the same counts within the entire enhancer probe set. A probe set was considered significantly enriched for a particular motif if the lower boundary of the 95% confidence interval of the Odds Ratio was greater than 1.1 (set by option lower.OR; 1.1 is the default) and the motif occurred at least 10 times (set by option min.incidence; 10 is the default) in the probe set. As described in the text, Odds Ratios were also used for ranking candidate motifs.
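As a worked illustration of the formulas above, the following minimal sketch computes the Odds Ratio and the lower boundary from the four counts. The function name motif_odds_ratio, the argument z and the counts are made up for this example and are not part of ELMER. With z = 1 the code follows the lower-boundary formula exactly as written in this section; a textbook 95% Wald interval would use z = 1.96 instead.

```r
# Odds ratio and lower confidence bound from the four counts a, b, c, d.
# Hypothetical helper, not an ELMER function.
# z = 1 reproduces exp(ln(OR) - SD) as written in this section;
# z = 1.96 is the usual multiplier for a 95% Wald interval.
motif_odds_ratio <- function(a, b, c, d, z = 1) {
  p  <- a / (a + b)                      # fraction of selected probes with the motif
  P  <- c / (c + d)                      # fraction of enhancer probes with the motif
  OR <- (p / (1 - p)) / (P / (1 - P))    # odds ratio
  SD <- sqrt(1 / a + 1 / b + 1 / c + 1 / d)
  list(OR = OR, SD = SD, lowerOR = exp(log(OR) - z * SD))
}

# Made-up counts: 30 of 100 selected probes and 200 of 2000 enhancer probes
# contain the motif.
res <- motif_odds_ratio(a = 30, b = 70, c = 200, d = 1800)

# Apply the default thresholds: lower.OR = 1.1 and min.incidence = 10.
enriched <- res$lowerOR > 1.1 && 30 >= 10

print(res)
print(enriched)
```

Because the lower boundary rather than the Odds Ratio itself is compared with lower.OR, a motif seen in only a handful of probes has a large SD and is less likely to pass, even when its Odds Ratio is high.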

Main get.enriched.motif arguments
Argument Description
data A MultiAssayExperiment object from the createMAE function. If it is set and probes.motif or background.probes are missing, it is used to derive those two arguments. This argument is not required; you can set probes.motif and background.probes manually.
probes A vector of probe names defining the probe set for which the motif enrichment OR and confidence interval are calculated.
lower.OR A number specifying the minimum lower boundary of the 95% confidence interval for the Odds Ratio. Motifs whose lower boundary exceeds this number are considered significantly enriched (see the reference for details). 1.1 is the default.
min.incidence A non-negative integer specifying the minimum incidence of a motif in the given probe set. 10 is the default.
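To make the roles of probes, lower.OR and min.incidence concrete, here is a small self-contained sketch that runs the same kind of test on a toy matrix. It assumes a binary probe-by-motif matrix (rows are probes, columns are motifs, 1 marks a motif occurrence); that layout, the probe and motif names, and the counts are all invented for illustration, and the code is not ELMER's implementation.

```r
# Toy binary probe-by-motif matrix: 200 background probes, 3 motifs.
# All names and values are made up for illustration.
n_probes  <- 200
probe_ids <- sprintf("probe%03d", seq_len(n_probes))
motif_mat <- matrix(0L, nrow = n_probes, ncol = 3,
                    dimnames = list(probe_ids, c("motifA", "motifB", "motifC")))
motif_mat[c(1:20, 41:60), "motifA"] <- 1L   # frequent in the selected set
motif_mat[c(1:5, 41:45),  "motifB"] <- 1L   # over-represented but rare
motif_mat[c(1:10, 41:80), "motifC"] <- 1L   # same frequency as the background

selected <- probe_ids[1:40]                 # plays the role of `probes`

# Counts per motif, following the definitions of a, b, c and d.
a    <- colSums(motif_mat[selected, , drop = FALSE])
b    <- length(selected) - a
c_bg <- colSums(motif_mat)                  # counts over the whole background
d_bg <- nrow(motif_mat) - c_bg

OR      <- (a / b) / (c_bg / d_bg)          # p/(1-p) = a/b and P/(1-P) = c/d
SD      <- sqrt(1 / a + 1 / b + 1 / c_bg + 1 / d_bg)
lowerOR <- exp(log(OR) - SD)                # lower boundary as defined in this section

summary_tab <- data.frame(motif = colnames(motif_mat), incidence = a,
                          OR = OR, lowerOR = lowerOR,
                          stringsAsFactors = FALSE)

# Keep motifs passing both thresholds, then rank them by odds ratio.
lower.OR      <- 1.1
min.incidence <- 10
keep     <- summary_tab$lowerOR > lower.OR & summary_tab$incidence >= min.incidence
enriched <- summary_tab[keep, ]
enriched <- enriched[order(enriched$OR, decreasing = TRUE), ]
print(enriched)
```

In this toy data only motifA passes: motifB clears the lower.OR threshold but occurs in just 5 selected probes, below min.incidence, and motifC has an Odds Ratio of 1.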
# Load results from previous sections
mae <- get(load("mae.rda"))
sig.diff <- read.csv("result/getMethdiff.hypo.probes.significant.csv")
pair <- read.csv("result/getPair.hypo.pairs.significant.csv")
head(pair) # significantly hypomethylated probes with putative target genes
##        Probe          GeneID Symbol Sides        Raw.p         Pe
## 1 cg00255699 ENSG00000117000    RLF    R7 2.526319e-05 0.00990099
## 2 cg00255699 ENSG00000183682  BMP8A    L9 1.595468e-06 0.00990099
## 3 cg00340127 ENSG00000091483     FH    L7 1.460821e-04 0.00990099
## 4 cg00393673 ENSG00000143157   POGK    R9 1.537896e-06 0.00990099
## 5 cg00393673 ENSG00000143179   UCK2    R6 3.212894e-11 0.00990099
## 6 cg00393673 ENSG00000143183  TMCO1    R5 5.737522e-04 0.00990099
##   distNearestTSS
## 1         319786
## 2         349939
## 3         567446
## 4        1314196
## 5         302283
## 6         243590
# Identify enriched motifs for significantly hypomethylated probes that
# have putative target genes.
enriched.motif <- get.enriched.motif(data = mae,
                                     probes = pair$Probe, 
                                     dir.out = "result", 
                                     label = "hypo",
                                     min.incidence = 10,
                                     lower.OR = 1.1)
names(enriched.motif) # enriched motifs
##  [1] "ARI5B_HUMAN.H11MO.0.C" "CDX2_HUMAN.H11MO.0.A" 
##  [3] "CEBPB_HUMAN.H11MO.0.A" "CPEB1_HUMAN.H11MO.0.D"
##  [5] "E2F8_HUMAN.H11MO.0.D"  "EVX1_HUMAN.H11MO.0.D" 
##  [7] "FOSB_HUMAN.H11MO.0.A"  "FOSL1_HUMAN.H11MO.0.A"
##  [9] "FOSL2_HUMAN.H11MO.0.A" "FOS_HUMAN.H11MO.0.A"  
## [11] "FOXH1_HUMAN.H11MO.0.A" "FOXJ3_HUMAN.H11MO.1.B"
## [13] "HXA11_HUMAN.H11MO.0.D" "HXA1_HUMAN.H11MO.0.C" 
## [15] "HXA9_HUMAN.H11MO.0.B"  "HXB13_HUMAN.H11MO.0.A"
## [17] "HXC10_HUMAN.H11MO.0.D" "HXC6_HUMAN.H11MO.0.D" 
## [19] "HXD8_HUMAN.H11MO.0.D"  "IRF1_HUMAN.H11MO.0.A" 
## [21] "IRF2_HUMAN.H11MO.0.A"  "IRF3_HUMAN.H11MO.0.B" 
## [23] "IRF9_HUMAN.H11MO.0.C"  "JUNB_HUMAN.H11MO.0.A" 
## [25] "JUND_HUMAN.H11MO.0.A"  "JUN_HUMAN.H11MO.0.A"  
## [27] "MAFF_HUMAN.H11MO.0.B"  "MAFG_HUMAN.H11MO.0.A" 
## [29] "MAFK_HUMAN.H11MO.0.A"  "MNX1_HUMAN.H11MO.0.D" 
## [31] "NF2L2_HUMAN.H11MO.0.A" "NFAC1_HUMAN.H11MO.0.B"
## [33] "OZF_HUMAN.H11MO.0.C"   "PO2F1_HUMAN.H11MO.0.C"
## [35] "PO2F2_HUMAN.H11MO.0.A" "PO3F3_HUMAN.H11MO.0.D"
## [37] "PO5F1_HUMAN.H11MO.0.A" "PO5F1_HUMAN.H11MO.1.A"
## [39] "SIX1_HUMAN.H11MO.0.A"  "SIX2_HUMAN.H11MO.0.A" 
## [41] "SOX17_HUMAN.H11MO.0.C" "SOX2_HUMAN.H11MO.1.A" 
## [43] "STAT1_HUMAN.H11MO.1.A" "STAT2_HUMAN.H11MO.0.A"
## [45] "TBP_HUMAN.H11MO.0.A"   "TF7L1_HUMAN.H11MO.0.B"
## [47] "ZBED1_HUMAN.H11MO.0.D" "ZN232_HUMAN.H11MO.0.D"
## [49] "ZN282_HUMAN.H11MO.0.D" "ZN317_HUMAN.H11MO.0.C"
## [51] "ZN394_HUMAN.H11MO.0.C" "ZN713_HUMAN.H11MO.0.D"
## [53] "ZSC16_HUMAN.H11MO.0.D"
head(enriched.motif[names(enriched.motif)[1]]) ## probes in the given set that have the first motif.
## $ARI5B_HUMAN.H11MO.0.C
##  [1] "cg23670519" "cg15576338" "cg17711754" "cg21402096" "cg21400442"
##  [6] "cg11718886" "cg22153407" "cg15965383" "cg09874992" "cg26607897"
## [11] "cg24873093" "cg18395632" "cg12446045" "cg25729466"
# get.enriched.motif automatically saves its output files.
# getMotif.hypo.enriched.motifs.rda contains the enriched motifs and the probes carrying each motif.
# getMotif.hypo.motif.enrichment.csv contains a summary of the enriched motifs.
dir(path = "result", pattern = "getMotif") 
## [1] "getMotif.hypo.enriched.motifs.rda" 
## [2] "getMotif.hypo.motif.enrichment.csv"
# The motif enrichment figure is generated automatically.
dir(path = "result", pattern = "motif.enrichment.pdf") 
## [1] "hypo.quality.A-DS.motif.enrichment.pdf"             
## [2] "hypo.quality.A-DS_with_summary.motif.enrichment.pdf"

Bibliography

Kulakovskiy, Ivan V, Ilya E Vorontsov, Ivan S Yevshin, Anastasiia V Soboleva, Artem S Kasianov, Haitham Ashoor, Wail Ba-Alawi, et al. 2016. “HOCOMOCO: Expansion and Enhancement of the Collection of Transcription Factor Binding Sites Models.” Nucleic Acids Research 44 (D1). Oxford Univ Press: D116–D125.