juliaapolonio/Causeway: Output

Introduction

After the Nextflow pipeline run finishes, all results are available in an output directory alongside the pipeline root. Its name can be set with the --outdir flag; if --outdir is not set, the folder will be named null. The output directory is expected to contain six subfolders:

collected_files: This folder will contain the results of each analysis performed individually, such as the heterogeneity test, metrics (TwoSampleMR regressions), and Coloc.
coloc: This folder will contain all coloc outputs. For each GSMR-significant gene these are: a regional plot (.png) made with locuszoomr, a .txt file with H3, H4 and the causal SNP, and another .png file with coloc's default plot.
final_report: This folder will contain the outputs of all results modules. A detailed explanation of these files, as well as how to interpret the metrics, will be presented in the next section.
GCTA_GSMR: This folder will contain GSMR's outputs: a .log file with GSMR's execution log; a .err file if the GSMR task fails for that phenotype; a .gsmr file with the analysis results; and a gcta_error_genes.txt file, which lists every phenotype that failed because of an insufficient number of IVs.
pipeline_info: Nextflow-generated information about the pipeline run, such as RAM and storage consumption and execution time. See the Nextflow documentation for more on what you can do with this data.
twosamplemr: This is the folder with most of the files. In a nutshell, it holds the TwoSampleMR analysis results separated by phenotype, and the TwoSampleMR scatter plot for all metrics.
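
To confirm that a run produced the layout described above, you can check for the six subfolders. The following is a minimal sketch using only the Python standard library; the function name and the `results` path in the usage note are illustrative and not part of Causeway.

```python
from pathlib import Path

# Subfolder names exactly as listed in this document.
EXPECTED_SUBFOLDERS = [
    "collected_files",
    "coloc",
    "final_report",
    "GCTA_GSMR",
    "pipeline_info",
    "twosamplemr",
]


def missing_subfolders(outdir):
    """Return the expected subfolders that are absent from `outdir`."""
    root = Path(outdir)
    return [name for name in EXPECTED_SUBFOLDERS if not (root / name).is_dir()]
```

For example, `missing_subfolders("results")` returns an empty list when all six folders are present under a (hypothetical) `results` output directory.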

Final Report folder

This folder contains all merged result files:

Candidate gene list: a .txt file with all candidate effector phenotypes, one per line.
final results: a .csv file with the combined analysis results for each phenotype, and whether the phenotype is a candidate effector or not.
summary report: an HTML report with the analysis highlights: an interactive volcano plot with candidates highlighted; an interactive forest plot with the candidates; and a table summarizing the metrics for each candidate, with the option to show the regional plot and MR scatter plot for each phenotype individually.
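
Because the candidate gene list is plain text with one phenotype per line, it is easy to load into other tools. The sketch below assumes nothing beyond that newline-separated format; the function name and the gene names in the usage note are made up for illustration.

```python
def parse_candidate_list(text):
    """Split newline-separated text into phenotype names, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]
```

For instance, `parse_candidate_list("GENE1\nGENE2\n")` yields `["GENE1", "GENE2"]`. In practice the text would come from reading the .txt file in the final_report folder.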

A brief overview of the summary report

Important note: for the report to render properly, it must be kept together with the pipeline output folder. If you move the HTML report file, move the entire output folder with it.

How to interpret the results

Plots

Volcano plot

The x-axis represents the effect size of the exposure on the outcome, and the y-axis shows the FDR-adjusted p-value. The circle size is proportional to the number of instrumental variables (IVs) used in the GSMR analysis. The prioritized phenotypes are highlighted in black and labeled with their names.
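
To illustrate what an FDR-adjusted p-value is, here is a minimal sketch of the Benjamini-Hochberg procedure. This document does not state which adjustment method the pipeline applies, so Benjamini-Hochberg is an assumption made purely for illustration, and the function name is hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps a larger raw p-value from receiving a smaller adjusted value than a smaller one.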

Forest plot

The circle size represents the variance of the effect size, while the line length spans the 95% confidence interval for the effect size.
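
A 95% confidence interval like the one drawn in the forest plot can be reproduced from an effect size and its standard error. The sketch below assumes the usual normal approximation (estimate plus or minus about 1.96 standard errors), which this document does not state explicitly; the function name is hypothetical.

```python
Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def ci95(beta, se):
    """Return (lower, upper) bounds of a normal-approximation 95% CI."""
    half_width = Z_95 * se
    return beta - half_width, beta + half_width
```

Applied to the `gsmr_beta` and `gsmr_se` columns of the final results file, an interval that excludes zero corresponds to a nominal two-sided p-value below 0.05 under the same approximation.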

Regional plot

This plot is generated using the LocusZoom R package (locuszoomr) and shows regional plots of the eQTL (upper panel) and the outcome GWAS (lower panel) at a specific locus. The index variant in each plot is shown in purple and the colocalized variant is labeled with its rsID. The bottom track of the plot is a gene browser, with the target gene colored in red.

Scatter plot

A TwoSampleMR plot generated by Causeway showing each IV's effect on the exposure against its effect on the outcome, across the different MR regression methods. The error bars correspond to 95% confidence intervals for the effect sizes.

Data in final results file

GSMR metrics

gsmr_beta: Effect size of GSMR analysis
gsmr_se: Standard error of GSMR analysis
gsmr_pval: P-value of GSMR analysis
gsmr_nsnp: Number of IVs selected for GSMR analysis
heidi_out: P-value for HEIDI-outlier filtering analysis

TwoSampleMR heterogeneity metrics

Q_MR_Egger: Cochran's Q statistic estimate using MR Egger method
Q_Inverse_variance_weighted: Cochran's Q statistic estimate using Inverse Variance Weighted method
Q_df_MR_Egger: heterogeneity degrees of freedom using MR Egger method
Q_df_Inverse_variance_weighted: heterogeneity degrees of freedom using Inverse Variance Weighted method
Q_pval_MR_Egger: Cochran's Q statistic P-value using MR Egger method
Q_pval_Inverse_variance_weighted: Cochran's Q statistic P-value using Inverse Variance Weighted method

TwoSampleMR regression metrics

nsnp: Number of IVs selected for Two Sample MR analysis
b_MR_Egger: Effect size of MR Egger regression
b_Weighted_median: Effect size of Weighted median regression
b_Inverse_variance_weighted: Effect size of Inverse Variance Weighted regression
b_Simple_mode: Effect size of Simple mode regression
b_Weighted_mode: Effect size of Weighted mode regression
se_MR_Egger: Standard Error of MR Egger regression
se_Weighted_median: Standard Error of Weighted median regression
se_Inverse_variance_weighted: Standard Error of Inverse Variance Weighted regression
se_Simple_mode: Standard Error of Simple mode regression
se_Weighted_mode: Standard Error of Weighted mode regression
pval_MR_Egger: P-value of MR Egger regression
pval_Weighted_median: P-value of Weighted median regression
pval_Inverse_variance_weighted: P-value of Inverse Variance Weighted regression
pval_Simple_mode: P-value of Simple mode regression
pval_Weighted_mode: P-value of Weighted mode regression
adjp_MR_Egger: Adjusted P-value of MR Egger regression
adjp_Weighted_median: Adjusted P-value of Weighted median regression
adjp_Inverse_variance_weighted: Adjusted P-value of Inverse Variance Weighted regression
adjp_Simple_mode: Adjusted P-value of Simple mode regression
adjp_Weighted_mode: Adjusted P-value of Weighted mode regression

TwoSampleMR direction metrics

snp_r2.exposure: mean r-squared of exposure effect on IVs
snp_r2.outcome: mean r-squared of outcome effect on IVs
correct_causal_direction: logical value, if TRUE the correct causal direction is exposure -> outcome
steiger_pval: P-value of Steiger test

Coloc metrics

H3: H3 posterior probability of association with both traits but driven by different causal variants
H4: H4 posterior probability of association with both traits driven by a shared causal variant
causal_snp: most probable colocalized variant rsID

TwoSampleMR pleiotropy metrics

egger_intercept: intercept value of Egger regression
se: Standard error of the Egger intercept
pleiotropy_pval: P-value of the Egger intercept test

TwoSampleMR MRPRESSO metrics

mrpresso_pval: global p-value of MRPRESSO test

Other metrics

is_candidate: logical value; TRUE if the phenotype passes all filtering criteria
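
The columns above can be used to pull the candidate phenotypes out of the final results file. The following is a minimal sketch with the standard `csv` module. It assumes logical values are written as the text TRUE/FALSE. The `phenotype` column name and the demo rows are invented for illustration, since this document does not name the identifier column or the .csv file.

```python
import csv
import io

# Made-up demo data; only gsmr_beta and is_candidate are documented column names.
DEMO_CSV = (
    "phenotype,gsmr_beta,is_candidate\n"
    "GENE_A,0.12,TRUE\n"
    "GENE_B,-0.03,FALSE\n"
)


def candidate_rows(csv_file):
    """Return the rows (as dicts) whose is_candidate column reads TRUE."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if row["is_candidate"].strip().upper() == "TRUE"]


rows = candidate_rows(io.StringIO(DEMO_CSV))
print([row["phenotype"] for row in rows])
```

To use it on real output, pass an open handle to the final results .csv in the final_report folder instead of the in-memory demo.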