This step identifies motifs enriched in a set of probes and is carried out by the function `get.enriched.motif`.
This function uses the pre-calculated data object `Probes.motif`, which was generated using HOMER with a \(p\text{-value} \le 10^{-4}\) to scan a \(\pm 250\) bp region around each probe using HOmo sapiens COmprehensive MOdel COllection (HOCOMOCO, http://hocomoco.autosome.ru/) v10 (Kulakovskiy et al. 2016) position weight matrices (PWMs). For each probe set tested (i.e. the list of gene-linked hypomethylated probes in a given group), a motif enrichment Odds Ratio and a 95% confidence interval were calculated using the following formulas:

\[ p = \frac{a}{a+b} \]

\[ P = \frac{c}{c+d} \]

\[ Odds\quad Ratio = \frac{\frac{p}{1-p}}{\frac{P}{1-P}} \]

\[ SD = \sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}} \]

\[ lower\quad boundary\quad of\quad 95\%\quad confidence\quad interval = \exp{(\ln{OR}-SD)} \]
where \(a\) is the number of probes within the selected probe set that contain one or more motif occurrences; \(b\) is the number of probes within the selected probe set that do not contain a motif occurrence; \(c\) and \(d\) are the same counts within the entire enhancer probe set. A probe set was considered significantly enriched for a particular motif if the lower boundary of the 95% confidence interval of the Odds Ratio was greater than 1.1 (set by the option `lower.OR`; 1.1 is the default) and the motif occurred at least 10 times in the probe set (set by the option `min.incidence`; 10 is the default). As described in the text, Odds Ratios were also used for ranking candidate motifs.
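The formulas above can be sketched in a few lines of base R. This is only an illustration of the arithmetic as written in the text, not the code used inside `get.enriched.motif`; the function name `motif_odds_ratio` and the counts passed to it are hypothetical.

```r
# Sketch of the odds-ratio arithmetic described above (not ELMER's own code).
# a, b: probes in the selected set with / without the motif
# c, d: the same counts in the entire enhancer probe set
motif_odds_ratio <- function(a, b, c, d) {
  p <- a / (a + b)                      # motif frequency in the selected set
  P <- c / (c + d)                      # motif frequency in the background set
  or <- (p / (1 - p)) / (P / (1 - P))   # odds ratio
  sd <- sqrt(1 / a + 1 / b + 1 / c + 1 / d)
  # Lower boundary exactly as written in the formula above
  list(OR = or, SD = sd, lower = exp(log(or) - sd))
}

# Hypothetical counts, chosen only for illustration
res <- motif_odds_ratio(a = 20, b = 80, c = 100, d = 900)
res$OR           # odds ratio
res$lower > 1.1  # would this motif pass the default lower.OR threshold?
```

The lower boundary here follows the formula as stated. A conventional Wald 95% interval on the log odds ratio would multiply the standard error by 1.96, so the sketch should not be read as a verified description of the package internals.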
Argument | Description |
---|---|
data | A MultiAssayExperiment object from the createMAE function. If it is set and probes.motif or background.probes are missing, it is used to derive these two arguments. This argument is not required; you can set probes.motif and background.probes manually. |
probes | A vector of probe names defining the set of probes in which the motif enrichment OR and confidence interval will be calculated. |
lower.OR | A number specifying the smallest lower boundary of the 95% confidence interval for the Odds Ratio. Motifs whose lower boundary of the 95% confidence interval for the Odds Ratio is higher than this number are the significantly enriched motifs (for details, see the reference). 1.1 is the default. |
min.incidence | A non-negative integer specifying the minimum incidence of a motif in the given probe set. 10 is the default. |
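To make the roles of `lower.OR` and `min.incidence` concrete, the following sketch applies both thresholds to a toy summary table. The data frame and its column names (`motif`, `lowerOR`, `incidence`) are hypothetical and are not claimed to match the files written by `get.enriched.motif`.

```r
# Toy summary table; values and column names are hypothetical
summary_tbl <- data.frame(
  motif     = c("motifA", "motifB", "motifC"),
  lowerOR   = c(1.72, 1.05, 2.40),   # lower boundary of the 95% CI
  incidence = c(20, 35, 4),          # probes in the set carrying the motif
  stringsAsFactors = FALSE
)

lower.OR <- 1.1       # default described in the table above
min.incidence <- 10   # default described in the table above

# Keep motifs whose lower boundary exceeds lower.OR and that occur
# at least min.incidence times in the probe set
keep <- summary_tbl$lowerOR > lower.OR & summary_tbl$incidence >= min.incidence
enriched <- summary_tbl[keep, ]

# Rank the remaining candidates, highest lower boundary first
enriched <- enriched[order(enriched$lowerOR, decreasing = TRUE), ]
enriched$motif
```

In this toy example motifB fails the `lower.OR` threshold and motifC fails `min.incidence`, so only motifA remains.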
# Load results from previous sections
mae <- get(load("mae.rda"))
sig.diff <- read.csv("result/getMethdiff.hypo.probes.significant.csv")
pair <- read.csv("result/getPair.hypo.pairs.significant.csv")
head(pair) # significantly hypomethylated probes with putative target genes
## Probe GeneID Symbol Distance Sides Raw.p Pe
## 1 cg20701183 ENSG00000143013 LMO4 0 R1 7.453984e-14 0.00990099
## 2 cg19403323 ENSG00000143469 SYT14 87476 R2 1.671937e-12 0.00990099
## 3 cg12213388 ENSG00000143674 MLK4 984182 L3 2.527644e-12 0.00990099
## 4 cg26607897 ENSG00000143199 ADCY10 267784 R4 4.593610e-12 0.00990099
## 5 cg10574861 ENSG00000143013 LMO4 0 R1 4.770162e-12 0.00990099
## 6 cg26607897 ENSG00000143147 GPR161 543156 R7 8.048248e-12 0.00990099
# Identify enriched motif for significantly hypomethylated probes which
# have putative target genes.
enriched.motif <- get.enriched.motif(data = mae,
                                     probes = pair$Probe,
                                     dir.out = "result",
                                     label = "hypo",
                                     min.incidence = 10,
                                     lower.OR = 1.1)
names(enriched.motif) # enriched motifs
## [1] "ARI5B_HUMAN.H11MO.0.C" "CDX2_HUMAN.H11MO.0.A"
## [3] "CEBPB_HUMAN.H11MO.0.A" "CPEB1_HUMAN.H11MO.0.D"
## [5] "E2F8_HUMAN.H11MO.0.D" "EVX1_HUMAN.H11MO.0.D"
## [7] "FOSB_HUMAN.H11MO.0.A" "FOSL1_HUMAN.H11MO.0.A"
## [9] "FOSL2_HUMAN.H11MO.0.A" "FOS_HUMAN.H11MO.0.A"
## [11] "FOXH1_HUMAN.H11MO.0.A" "FOXJ3_HUMAN.H11MO.1.B"
## [13] "HXA11_HUMAN.H11MO.0.D" "HXA1_HUMAN.H11MO.0.C"
## [15] "HXA9_HUMAN.H11MO.0.B" "HXB13_HUMAN.H11MO.0.A"
## [17] "HXC10_HUMAN.H11MO.0.D" "HXC6_HUMAN.H11MO.0.D"
## [19] "HXD8_HUMAN.H11MO.0.D" "IRF1_HUMAN.H11MO.0.A"
## [21] "IRF2_HUMAN.H11MO.0.A" "IRF3_HUMAN.H11MO.0.B"
## [23] "IRF9_HUMAN.H11MO.0.C" "JUNB_HUMAN.H11MO.0.A"
## [25] "JUND_HUMAN.H11MO.0.A" "JUN_HUMAN.H11MO.0.A"
## [27] "MAFF_HUMAN.H11MO.0.B" "MAFG_HUMAN.H11MO.0.A"
## [29] "MAFK_HUMAN.H11MO.0.A" "MNX1_HUMAN.H11MO.0.D"
## [31] "NF2L2_HUMAN.H11MO.0.A" "NFAC1_HUMAN.H11MO.0.B"
## [33] "OZF_HUMAN.H11MO.0.C" "PO2F1_HUMAN.H11MO.0.C"
## [35] "PO2F2_HUMAN.H11MO.0.A" "PO3F3_HUMAN.H11MO.0.D"
## [37] "PO5F1_HUMAN.H11MO.0.A" "PO5F1_HUMAN.H11MO.1.A"
## [39] "SIX1_HUMAN.H11MO.0.A" "SIX2_HUMAN.H11MO.0.A"
## [41] "SOX17_HUMAN.H11MO.0.C" "SOX2_HUMAN.H11MO.1.A"
## [43] "STAT1_HUMAN.H11MO.1.A" "STAT2_HUMAN.H11MO.0.A"
## [45] "TBP_HUMAN.H11MO.0.A" "TF7L1_HUMAN.H11MO.0.B"
## [47] "ZBED1_HUMAN.H11MO.0.D" "ZN232_HUMAN.H11MO.0.D"
## [49] "ZN282_HUMAN.H11MO.0.D" "ZN317_HUMAN.H11MO.0.C"
## [51] "ZN394_HUMAN.H11MO.0.C" "ZN713_HUMAN.H11MO.0.D"
## [53] "ZSC16_HUMAN.H11MO.0.D"
head(enriched.motif[names(enriched.motif)[1]]) ## probes in the given set that have the first motif.
## $ARI5B_HUMAN.H11MO.0.C
## [1] "cg23670519" "cg15576338" "cg17711754" "cg21402096" "cg21400442"
## [6] "cg11718886" "cg22153407" "cg15965383" "cg09874992" "cg26607897"
## [11] "cg24873093" "cg18395632" "cg12446045" "cg25729466"
# get.enriched.motif automatically saves its output files.
# getMotif.hypo.enriched.motifs.rda contains the enriched motifs and the probes with each motif.
# getMotif.hypo.motif.enrichment.csv contains a summary of the enriched motifs.
dir(path = "result", pattern = "getMotif")
## [1] "getMotif.hypo.enriched.motifs.rda"
## [2] "getMotif.hypo.motif.enrichment.csv"
# motif enrichment figure will be automatically generated.
dir(path = "result", pattern = "motif.enrichment.pdf")
## [1] "hypo.quality.A-DS.motif.enrichment.pdf"
## [2] "hypo.quality.A-DS_with_summary.motif.enrichment.pdf"
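As shown in the output above, `enriched.motif` is a named list that maps each motif to the probes carrying it. For downstream steps it can be handy to invert this mapping. The sketch below does so in base R on a toy list; the object `toy.motif` and its motif and probe names are made up for illustration, and the same calls should apply to a real `enriched.motif` object assuming it has the structure shown.

```r
# Toy stand-in for the named list returned above: motif name -> probe IDs
toy.motif <- list(
  motifA = c("cg0001", "cg0002", "cg0003"),
  motifB = c("cg0002", "cg0004")
)

# Number of probes carrying each motif
n.probes <- lengths(toy.motif)

# Invert the mapping: probe ID -> motifs found near that probe
probe.motif <- split(rep(names(toy.motif), lengths(toy.motif)),
                     unlist(toy.motif, use.names = FALSE))

probe.motif[["cg0002"]]  # motifs associated with one probe
```

Using `split` on the flattened list keeps the example dependency-free; probes that appear under several motifs end up with one vector listing all of them.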
Kulakovskiy, Ivan V, Ilya E Vorontsov, Ivan S Yevshin, Anastasiia V Soboleva, Artem S Kasianov, Haitham Ashoor, Wail Ba-Alawi, et al. 2016. “HOCOMOCO: Expansion and Enhancement of the Collection of Transcription Factor Binding Sites Models.” Nucleic Acids Research 44 (D1). Oxford Univ Press: D116–D125.