juliaapolonio/Causeway: Output

Introduction

After running the pipeline, all results are written to an output directory in the pipeline root, which can be set using the --outdir flag. If --outdir is not set, the folder will be named null. The output folder is expected to contain 6 folders:

  • collected_files: This folder will contain the results of each analysis performed individually, such as the heterogeneity test, metrics (TwoSampleMR regressions) and Coloc.

  • coloc: This folder will contain all coloc outputs. For each GSMR-significant gene, these are: a regional plot made with locuszoomr, a .txt file with H3, H4 and the causal SNP, and a .png file with coloc's default plot.

  • final_report: This folder will contain the outputs of all results modules. A detailed explanation of these files, as well as how to interpret the metrics, will be presented in the next section.

  • GCTA_GSMR: This folder will contain GSMR's outputs: a .log file with GSMR's execution log; a .err file if the GSMR task fails for that phenotype; a .gsmr file with the analysis results; and a gcta_error_genes.txt file, which lists every phenotype that failed because of an insufficient number of IVs.

  • pipeline_info: Nextflow-generated information about the pipeline run, such as RAM and storage consumption and execution time. You can find more about what you can do with this data in the Nextflow documentation.

  • twosamplemr: This is the folder with most of the files. In a nutshell, it contains the TwoSampleMR analysis results separated by phenotype, and the TwoSampleMR scatter plot for all metrics.
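To quickly confirm that a run produced the folders listed above, a small check like the following can help. This is a minimal sketch: the function name `missing_output_folders` is hypothetical and not part of the pipeline, and the folder names are simply copied from the list above.

```python
from pathlib import Path

# Folder names as listed above; adjust if your pipeline version differs.
EXPECTED_FOLDERS = [
    "collected_files",
    "coloc",
    "final_report",
    "GCTA_GSMR",
    "pipeline_info",
    "twosamplemr",
]


def missing_output_folders(outdir):
    """Return the expected subfolders that are absent from `outdir`."""
    root = Path(outdir)
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]
```

Calling `missing_output_folders` with the path passed to --outdir returns an empty list when all six folders are present.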

Final Report folder

This folder contains all merged result files:

  • Candidate gene list: a .txt file with all candidate effector phenotypes, one per line.

  • Final results: a .csv file with the combined analysis results for each phenotype, and whether or not the phenotype is a candidate effector.

  • Summary report: an HTML report with the analysis highlights: an interactive volcano plot with candidates highlighted; an interactive forest plot with the candidates; and a table showing a brief summary of the metrics for each candidate, with the option to show the regional plot and MR scatter plot for each phenotype individually.

A brief of the summary report

Important note: for the report to render properly, it must be kept together with the pipeline output folder. If you move the HTML report file, make sure to move the whole output folder with it.

How to interpret the results

Plots

Volcano plot

The x-axis represents the effect size of the exposure on the outcome, and the y-axis shows the FDR-adjusted p-value. The circle size is proportional to the number of instrumental variables (IVs) used in the GSMR analysis. The prioritized phenotypes are highlighted in black and labeled with their names.

Forest plot

The circle size represents the variance of the effect size, while the line length spans the 95% confidence interval for the effect size.

Regional plot

This plot is generated using the LocusZoom R package and shows regional plots of the eQTL (upper) and the outcome GWAS (lower) at a specific locus. The index variant in each plot is shown in purple, and the colocalized variant is labeled with its rsID. The bottom panel is a gene browser, with the target gene colored in red.

Scatter plot

A TwoSampleMR plot generated by Causeway, showing each IV's effect on the exposure against its effect on the outcome across different MR regression methods. The error bars correspond to 95% confidence intervals for the effect size.

Data in final results file

GSMR metrics

  • gsmr_beta: effect size of GSMR analysis

  • gsmr_se: Standard error of GSMR analysis

  • gsmr_pval: P-value of GSMR analysis

  • gsmr_nsnp: Number of IVs selected for GSMR analysis

  • heidi_out: p-value for HEIDI-outlier filtering analysis
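As an illustration of how gsmr_beta and gsmr_se relate to a 95% confidence interval such as the one drawn in the forest plot, the sketch below uses the usual normal approximation (estimate ± 1.96 × standard error). This is an assumption for illustration only: the function name is hypothetical and the pipeline may compute its intervals differently.

```python
def confidence_interval_95(beta, se):
    """Return the (lower, upper) bounds of a normal-approximation 95% CI."""
    z = 1.96  # two-sided 95% quantile of the standard normal distribution
    return (beta - z * se, beta + z * se)
```

For example, an effect size of 0.5 with a standard error of 0.1 gives an interval of roughly 0.304 to 0.696, which does not cross zero.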

TwoSampleMR heterogeneity metrics

  • Q_MR_Egger: Cochran's Q statistic estimate using MR Egger method
  • Q_Inverse_variance_weighted: Cochran's Q statistic estimate using Inverse Variance Weighted method
  • Q_df_MR_Egger: heterogeneity degrees of freedom using MR Egger method
  • Q_df_Inverse_variance_weighted: heterogeneity degrees of freedom using Inverse Variance Weighted method
  • Q_pval_MR_Egger: Cochran's Q statistic P-value using MR Egger method
  • Q_pval_Inverse_variance_weighted: Cochran's Q statistic P-value using Inverse Variance Weighted method

TwoSampleMR regression metrics

  • nsnp: Number of IVs selected for Two Sample MR analysis
  • b_MR_Egger: Effect size of MR Egger regression
  • b_Weighted_median: Effect size of Weighted median regression
  • b_Inverse_variance_weighted: Effect size of Inverse Variance Weighted regression
  • b_Simple_mode: Effect size of Simple mode regression
  • b_Weighted_mode: Effect size of Weighted mode regression
  • se_MR_Egger: Standard Error of MR Egger regression
  • se_Weighted_median: Standard Error of Weighted median regression
  • se_Inverse_variance_weighted: Standard Error of Inverse Variance Weighted regression
  • se_Simple_mode: Standard Error of Simple mode regression
  • se_Weighted_mode: Standard Error of Weighted mode regression
  • pval_MR_Egger: P-value of MR Egger regression
  • pval_Weighted_median: P-value of Weighted median regression
  • pval_Inverse_variance_weighted: P-value of Inverse Variance Weighted regression
  • pval_Simple_mode: P-value of Simple mode regression
  • pval_Weighted_mode: P-value of Weighted mode regression
  • adjp_MR_Egger: Adjusted P-value of MR Egger regression
  • adjp_Weighted_median: Adjusted P-value of Weighted median regression
  • adjp_Inverse_variance_weighted: Adjusted P-value of Inverse Variance Weighted regression
  • adjp_Simple_mode: Adjusted P-value of Simple mode regression
  • adjp_Weighted_mode: Adjusted P-value of Weighted mode regression
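This document does not state which correction produces the adjp_* columns. Assuming a Benjamini-Hochberg FDR adjustment (an assumption suggested by the FDR-adjusted p-value shown in the volcano plot, not confirmed by the source), the relationship between the pval_* and adjp_* values can be sketched as follows. The function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from the largest p-value to the smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # rank of this p-value in ascending order
        # Scale by m / rank, then enforce monotonicity and cap at 1.
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

The adjustment is applied across all phenotypes tested with the same method, so the adjusted value of one phenotype depends on how many phenotypes were analyzed in the run.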

TwoSampleMR direction metrics

  • snp_r2.exposure: mean r-squared of the IVs on the exposure
  • snp_r2.outcome: mean r-squared of the IVs on the outcome
  • correct_causal_direction: logical value, if TRUE the correct causal direction is exposure -> outcome
  • steiger_pval: P-value of Steiger test
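The direction call can be read as a comparison of the two r-squared values: the exposure -> outcome direction is supported when the IVs explain more variance in the exposure than in the outcome. The sketch below illustrates that comparison only. It assumes this is how correct_causal_direction is derived, and the function name is hypothetical.

```python
def exposure_to_outcome(snp_r2_exposure, snp_r2_outcome):
    """True when the IVs explain more variance in the exposure than in the outcome."""
    return snp_r2_exposure > snp_r2_outcome
```

The steiger_pval column then indicates whether the difference between the two r-squared values is statistically significant.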

Coloc metrics

  • H3: H3 posterior probability of association with both traits but driven by different causal variants
  • H4: H4 posterior probability of association with both traits driven by a shared causal variant
  • causal_snp: most probable colocalized variant rsID

TwoSampleMR pleiotropy metrics

  • egger_intercept: intercept value of Egger regression
  • se: Standard error of the Egger intercept
  • pleiotropy_pval: P-value of the Egger intercept test

TwoSampleMR MRPRESSO metrics

  • mrpresso_pval: global p-value of MRPRESSO test

Other metrics

  • is_candidate: logical value, TRUE if the phenotype passes all filtering criteria
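To pull the candidate phenotypes out of the final results file for downstream work, something like the following can be used. This is a minimal sketch that assumes the file is comma-separated with a header row and that logical values are written as TRUE/FALSE. The function name and the file path you pass to it are hypothetical.

```python
import csv


def candidate_rows(csv_path):
    """Return rows of the final results file whose is_candidate flag is true."""
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    # Logical values are assumed to be written as TRUE/FALSE;
    # compare case-insensitively to be safe.
    return [
        row
        for row in rows
        if row.get("is_candidate", "").strip().upper() == "TRUE"
    ]
```

Each returned row is a dictionary keyed by the column names described above, so values such as `row["gsmr_beta"]` or `row["H4"]` can be read directly (as strings, to be converted with `float` where needed).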