This section covers driver gene identification, pathway analysis, mutational signatures, and somatic interactions, so that users can learn how to make sense of the detected mutations.
Once the variants are annotated, the next step is to interpret the somatic mutation data. Musta supports this downstream analysis by incorporating the Maftools software package into its pipeline.
Starting from the MAF files generated during the Classification step or provided by the user, Musta performs a comprehensive analysis and interpretation of somatic variants. It offers a wide range of downstream analysis and visualization modules that are extensively used in cancer genomics studies, which lets researchers run the key analyses listed above from a single command.
The command structure for invoking the Interpretation module in Musta is as follows:
musta interpret [options]
Here is a breakdown of the available options for the musta interpret command:
General Options:
-h, --help: Show the help message and exit.
Required Options:
--workdir PATH (-w PATH): Specifies the destination folder for the analysis. This is where the Snakemake pipeline, logs, and analysis outputs will be located.
--samples-file PATH (-s PATH): Points to a YAML file listing the datasets you wish to analyze. This file provides information about the samples to be processed. For further details, please refer to the dedicated section.
Optional Options:
--all-variants, -a: Includes all variants in the final dataset. By default, only variants marked as PASS are included. Use this option if you want to include all variants regardless of their PASS status.
Additional Options:
--force, -f: Forces the re-creation of all output files.
--dryrun, -d: Describes the workflow but does not execute it. This is useful for previewing the analysis steps before running them.
Obtain the “test-data-somatic” Dataset: First, download the “test-data-somatic” dataset by following the instructions provided in the dedicated section of this user guide.
sample_A25:
  normal_sample_name:
    - N1
  tumor_sample_name:
    - A25
  normal_bam:
    - /path/to/test-data-somatic/data/bam/N1.chr22.bam
  tumor_bam:
    - /path/to/test-data-somatic/data/bam/A25.chr22.bam
  vcf:
    - /path/to/test-data-somatic/data/vcf/sample_A25.musta.snvs.vcf.gz
  maf:
    - /path/to/test-data-somatic/data/maf/sample_A25.annotated.vep.maf
sample_B33:
  normal_sample_name:
    - N1
  tumor_sample_name:
    - B33
  normal_bam:
    - /path/to/test-data-somatic/data/bam/N1.chr22.bam
  tumor_bam:
    - /path/to/test-data-somatic/data/bam/B33.chr22.bam
  vcf:
    - /path/to/test-data-somatic/data/vcf/sample_B33.musta.snvs.vcf.gz
  maf:
    - /path/to/test-data-somatic/data/maf/sample_B33.annotated.vep.maf
sample_C2:
  normal_sample_name:
    - N1
  tumor_sample_name:
    - C2
  normal_bam:
    - /path/to/test-data-somatic/data/bam/N1.chr22.bam
  tumor_bam:
    - /path/to/test-data-somatic/data/bam/C2.chr22.bam
  vcf:
    - /path/to/test-data-somatic/data/vcf/sample_C2.musta.snvs.vcf.gz
  maf:
    - /path/to/test-data-somatic/data/maf/sample_C2.annotated.vep.maf
Modify it according to your specific requirements.
Note: The “test-data-somatic” dataset covers only chromosome 22 and is too small for a meaningful test of this module. For this reason, the demo data also includes an additional MAF file, downloaded from The Cancer Genome Atlas (TCGA) program, containing a subset of 25 samples from the BRCA (breast cancer) cohort. To use it, the samples.yml file takes this form:
sample_BRCA:
  maf:
    - /path/to/test-data-somatic/data/maf/BRCA.subset_25_samples.maf
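A samples file like the one above can also be written from the shell. The following is only a sketch: the SAMPLES_FILE and MAF_PATH variable names are hypothetical, and the two-space indentation is an assumption about the layout Musta expects.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical locations; replace them with the real paths on your system.
SAMPLES_FILE="${SAMPLES_FILE:-./samples.yml}"
MAF_PATH="${MAF_PATH:-/path/to/test-data-somatic/data/maf/BRCA.subset_25_samples.maf}"

# Write a one-dataset samples file with the same keys shown above.
cat > "$SAMPLES_FILE" <<EOF
sample_BRCA:
  maf:
    - $MAF_PATH
EOF

# Show the result so it can be checked before running Musta.
cat "$SAMPLES_FILE"
```

Setting MAF_PATH in the environment before running the script points the file at a different MAF without editing the script.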
Run Musta Interpret: To execute the “Musta Interpret” module, use the following command with the “test-data-somatic” dataset.
musta interpret \
--workdir /path/to/workdir \
--samples-file /path/to/samples.yml
Ensure you replace the /path/to/ placeholders with the actual file paths on your system.
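Because leftover placeholder paths are an easy mistake to make, a quick check before launching can help. The sketch below is not part of Musta: missing_paths is a hypothetical helper, and it assumes that every input path appears in the samples file as a YAML list item of the form "- /absolute/path".

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print every absolute path listed in a samples file that does not exist.
# Assumption: paths are written as list items such as "- /some/file.maf".
missing_paths() {
    local samples_file="$1" path
    sed -n -e 's|[[:space:]]*$||' \
           -e 's|^[[:space:]]*-[[:space:]]*\(/.*\)$|\1|p' "$samples_file" |
    while IFS= read -r path; do
        [ -e "$path" ] || printf '%s\n' "$path"
    done
}

# Example: report unresolved paths if a samples file is present.
SAMPLES_FILE="${SAMPLES_FILE:-samples.yml}"
if [ -f "$SAMPLES_FILE" ]; then
    missing_paths "$SAMPLES_FILE"
fi
```

An empty output means every listed path exists; any printed line is a path that still needs to be corrected.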
After initiating the execution of musta interpret, you will see the Musta log unfolding on your screen:
YYYY-MM-DD hh-mm-ss|INFO |main |musta v1.0.0 - End-to-end pipeline to detect, classify and interpret mutations in cancer
YYYY-MM-DD hh-mm-ss|INFO |main |Somatic Mutations Interpretation: 1. Identification of cancer driver genes 2. Check for enrichment of known oncogenic pathways. 3. Infer tumor clonality by clustering variant allele frequencies. 4. Deconvolution of Mutational Signatures
YYYY-MM-DD hh-mm-ss|INFO |main |Reading configuration file
YYYY-MM-DD hh-mm-ss|INFO |main |Setting paths
YYYY-MM-DD hh-mm-ss|INFO |main |Deploying musta:v1.2.0.1 pipeline from https://github.com/solida-core/musta.git
YYYY-MM-DD hh-mm-ss|INFO |main |Setting Config file
YYYY-MM-DD hh-mm-ss|INFO |main |Setting Samples file
YYYY-MM-DD hh-mm-ss|INFO |main |Running
Building DAG of jobs..
After a few minutes, you will see the last lines of the log, and the execution of Musta will conclude. You can locate the resulting analysis in the <WORKDIR> folder specified for the musta interpret execution.
Let’s take a look at the <WORKDIR> folder:
.
├── benchmarks
│   ├── ...
│   └── interpretation
├── logs
│   ├── ...
│   └── interpretation
├── musta
│   ├── config
│   ├── LICENSE
│   ├── README.md
│   ├── resources
│   └── workflow
└── outputs
    ├── YYYYMMDD-hhmmss.report.html
    ├── ...
    └── interpretation
benchmarks
├── ...
└── interpretation
    ├── driver_gene_identification.txt
    ├── mutational_signature_analysis.txt
    ├── pathway_analysis.txt
    ├── tumor_heterogeneity.txt
    └── variant_visualization.txt
logs
├── ...
├── interpretation
│   ├── driver_gene_identification.log
│   ├── mutational_signature_analysis.log
│   ├── pathway_analysis.log
│   ├── tumor_heterogeneity.log
│   └── variant_visualization.log
musta: This directory houses the core of the Musta tool, including the Snakemake pipeline, which can be accessed from the official Musta GitHub repository: https://github.com/solida-core/musta.
YYYYMMDD-hhmmss.report.html: Auto-generated, self-contained, detailed HTML report that includes runtime statistics, provenance information, workflow topology, and results.
interpretation: This directory houses the results of Musta’s interpretation module, organized into plots and tables for each analysis. Specifically, you can find:
interpretation/
└── results
    ├── driver_gene_identification
    │   ├── plots
    │   └── tables
    ├── mutational_signature_analysis
    │   ├── plots
    │   └── tables
    ├── pathway_analysis
    │   ├── plots
    │   └── tables
    ├── tumor_heterogeneity
    │   ├── plots
    │   └── tables
    └── variant_visualization
        ├── plots
        └── tables
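To confirm that each analysis produced output, the plots and tables in this layout can be counted from the shell. This is a minimal sketch, not a Musta feature: summarize_results is a hypothetical helper, RESULTS_DIR is a placeholder, and the code assumes the <analysis>/plots and <analysis>/tables layout shown above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print "<analysis> <kind> <file count>" (tab-separated) for every analysis
# folder found under the given results directory.
summarize_results() {
    local results_dir="$1" analysis_dir kind count
    for analysis_dir in "$results_dir"/*/; do
        [ -d "$analysis_dir" ] || continue
        for kind in plots tables; do
            count=0
            if [ -d "$analysis_dir$kind" ]; then
                # Count files recursively, so nested folders are included.
                count="$(find "$analysis_dir$kind" -type f | wc -l | tr -d '[:space:]')"
            fi
            printf '%s\t%s\t%s\n' "$(basename "$analysis_dir")" "$kind" "$count"
        done
    done
}

# Placeholder path; adjust it to your own <WORKDIR>.
RESULTS_DIR="${RESULTS_DIR:-/path/to/workdir/outputs/interpretation/results}"
if [ -d "$RESULTS_DIR" ]; then
    summarize_results "$RESULTS_DIR"
fi
```

A count of 0 for an analysis is a hint to look at the matching file under logs/interpretation.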
Let’s explore different scenarios for running Musta with various configurations using the musta interpret command.
1. Run Musta Interpret:
By default, only variants marked as PASS are included.
musta interpret \
--workdir /path/to/workdir \
--samples-file /path/to/samples.yml
2. Include all variants in the final dataset:
Use the --all-variants option if you want to include all variants regardless of their PASS status.
musta interpret \
--workdir /path/to/workdir \
--samples-file /path/to/samples.yml \
--all-variants
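The --dryrun and --force options described earlier combine with the required options in the same way. The sketch below only assembles and prints the two command lines, and runs the preview solely when musta is installed and the samples file exists; WORKDIR and SAMPLES_FILE are placeholder variables.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder paths; replace them with the real locations on your system.
WORKDIR="${WORKDIR:-/path/to/workdir}"
SAMPLES_FILE="${SAMPLES_FILE:-/path/to/samples.yml}"

# Describe the workflow without executing it (--dryrun).
preview_cmd=(musta interpret --workdir "$WORKDIR" --samples-file "$SAMPLES_FILE" --dryrun)

# Re-create all output files, even those that already exist (--force).
force_cmd=(musta interpret --workdir "$WORKDIR" --samples-file "$SAMPLES_FILE" --force)

# Print both commands so they can be reviewed or copied.
printf '%s\n' "${preview_cmd[*]}"
printf '%s\n' "${force_cmd[*]}"

# Run the preview only when musta is available and the samples file exists.
if command -v musta >/dev/null 2>&1 && [ -f "$SAMPLES_FILE" ]; then
    "${preview_cmd[@]}"
fi
```

Keeping each command in an array preserves paths that contain spaces when the command is eventually executed.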
variant_visualization
├── plots
│   ├── large_oncoplot.png
│   ├── large_oncoplot.with_samplenames.png
│   ├── lollipop
│   │   ├── ASPM.lollipop.png
│   │   ├── CDH1.lollipop.png
│   │   ├── GATA3.lollipop.png
│   │   ├── GRM3.lollipop.png
│   │   ├── HUWE1.lollipop.png
│   │   ├── MUC16.lollipop.png
│   │   ├── PIK3CA.lollipop.png
│   │   ├── PLCE1.lollipop.png
│   │   ├── TP53.lollipop.png
│   │   └── TTN.lollipop.png
│   ├── mafbar.png
│   ├── oncoplot.png
│   ├── oncoplot.with_samplenames.png
│   ├── rainfall
│   │   ├── TCGA-3C-AAAU-01A-11D-A41F-09 .rainfall.png
│   │   ├── TCGA-3C-AALI-01A-11D-A41F-09 .rainfall.png
│   │   ├── TCGA-3C-AALJ-01A-31D-A41F-09 .rainfall.png
│   │   ├── TCGA-3C-AALK-01A-11D-A41F-09 .rainfall.png
│   │   ├── TCGA-4H-AAAK-01A-12D-A41F-09 .rainfall.png
│   │   ├── TCGA-5L-AAT0-01A-12D-A41F-09 .rainfall.png
│   │   ├── TCGA-5L-AAT1-01A-12D-A41F-09 .rainfall.png
│   │   ├── TCGA-5T-A9QA-01A-11D-A41F-09 .rainfall.png
│   │   ├── TCGA-A1-A0SD-01A-11D-A10Y-09 .rainfall.png
│   │   ├── TCGA-A1-A0SE-01A-11D-A099-09 .rainfall.png
│   │   ├── TCGA-A1-A0SF-01A-11D-A142-09 .rainfall.png
│   │   ├── TCGA-A1-A0SG-01A-11D-A142-09 .rainfall.png
│   │   ├── TCGA-A1-A0SH-01A-11D-A099-09 .rainfall.png
│   │   ├── TCGA-A1-A0SI-01A-11D-A142-09 .rainfall.png
│   │   ├── TCGA-A1-A0SJ-01A-11D-A099-09 .rainfall.png
│   │   ├── TCGA-A1-A0SK-01A-12D-A099-09 .rainfall.png
│   │   ├── TCGA-A1-A0SM-01A-11D-A099-09 .rainfall.png
│   │   ├── TCGA-A1-A0SN-01A-11D-A142-09 .rainfall.png
│   │   ├── TCGA-A1-A0SO-01A-22D-A099-09 .rainfall.png
│   │   ├── TCGA-A1-A0SP-01A-11D-A099-09 .rainfall.png
│   │   ├── TCGA-A1-A0SQ-01A-21D-A142-09 .rainfall.png
│   │   ├── TCGA-A2-A04N-01A-11D-A10Y-09 .rainfall.png
│   │   ├── TCGA-A2-A04P-01A-31D-A128-09 .rainfall.png
│   │   ├── TCGA-A2-A04Q-01A-21W-A050-09 .rainfall.png
│   │   └── TCGA-A2-A04R-01A-41D-A117-09 .rainfall.png
│   ├── summary.png
│   ├── summary.with_samplenames.png
│   ├── TGCA_compare.png
│   ├── TGCA_compare.primary_site.png
│   ├── titv.png
│   ├── titv.with_samplenames.png
│   ├── top10_VAF.png
│   └── top20_VAF.png
└── tables
    ├── gene_summary.tsv
    ├── mutations_count_matrix.tsv
    ├── overview.tsv
    ├── sample_summary.tsv
    ├── TGCA_median_mutation_burden.tsv
    ├── TGCA_mutation_burden_perSample.tsv
    ├── TGCA_pairwise_t_test.tsv
    ├── transitions_and_transversions_fraction_contribution.txt
    ├── transitions_and_transversions_fraction_counts.txt
    └── transitions_and_transversions_TiTv_fractions.txt

4 directories, 58 files
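The tables listed above are plain text files, so a first look does not require any special tooling. The sketch below assumes that each table is tab-separated with a single header row, which this guide does not state explicitly; tsv_columns and tsv_row_count are hypothetical helpers and TABLE is a placeholder path.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the column names of a tab-separated table, one per line.
tsv_columns() {
    head -n 1 "$1" | tr '\t' '\n'
}

# Print the number of data rows (every line except the header).
tsv_row_count() {
    awk 'NR > 1 { n++ } END { print n + 0 }' "$1"
}

# Placeholder path; adjust it to your own <WORKDIR>.
TABLE="${TABLE:-/path/to/workdir/outputs/interpretation/results/variant_visualization/tables/gene_summary.tsv}"
if [ -f "$TABLE" ]; then
    tsv_columns "$TABLE"
    tsv_row_count "$TABLE"
fi
```

Listing the column names first is a cheap way to learn what a table contains before loading it into R or a spreadsheet.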
driver_gene_identification
├── plots
│   ├── drug_interactions.barplot.png
│   ├── drug_interactions.piechart.png
│   ├── oncodrive.png
│   └── somatic_interactions.png
└── tables
    ├── oncodrive.tsv
    └── somatic_interactions.tsv

2 directories, 6 files
mutational_signature_analysis
├── plots
│   ├── Apobecdiff.png
│   ├── cosine_similarities_heatmap.png
│   ├── cosmic_signatures.png
│   ├── plotCophenetic.png
│   └── signature_contributions.png
└── tables
    ├── APOBEC_scores_matrix.tsv
    ├── cosime_similarities_matrix.tsv
    ├── nmf_matrix.tsv
    └── signatures_matrix.tsv

2 directories, 9 files
pathway_analysis
├── plots
│   ├── oncogenic_pathways.png
│   └── pathways
│       ├── Cell_Cycle_pathway.png
│       ├── Hippo_pathway.png
│       ├── MYC_pathway.png
│       ├── NOTCH_pathway.png
│       ├── NRF2_pathway.png
│       ├── PI3K_pathway.png
│       ├── RTK-RAS_pathway.png
│       ├── TGF-Beta_pathway.png
│       ├── TP53_pathway.png
│       └── WNT_pathway.png
└── tables
    └── oncogenic_pathways.tsv

3 directories, 12 files
tumor_heterogeneity
├── plots
│   └── clusters
│       ├── TCGA-3C-AAAU-01A-11D-A41F-09_clusters.png
│       ├── TCGA-3C-AALI-01A-11D-A41F-09_clusters.png
│       ├── TCGA-3C-AALJ-01A-31D-A41F-09_clusters.png
│       ├── TCGA-3C-AALK-01A-11D-A41F-09_clusters.png
│       ├── TCGA-4H-AAAK-01A-12D-A41F-09_clusters.png
│       ├── TCGA-5L-AAT0-01A-12D-A41F-09_clusters.png
│       ├── TCGA-5L-AAT1-01A-12D-A41F-09_clusters.png
│       ├── TCGA-5T-A9QA-01A-11D-A41F-09_clusters.png
│       ├── TCGA-A1-A0SD-01A-11D-A10Y-09_clusters.png
│       ├── TCGA-A1-A0SE-01A-11D-A099-09_clusters.png
│       ├── TCGA-A1-A0SF-01A-11D-A142-09_clusters.png
│       ├── TCGA-A1-A0SG-01A-11D-A142-09_clusters.png
│       ├── TCGA-A1-A0SH-01A-11D-A099-09_clusters.png
│       ├── TCGA-A1-A0SI-01A-11D-A142-09_clusters.png
│       ├── TCGA-A1-A0SJ-01A-11D-A099-09_clusters.png
│       ├── TCGA-A1-A0SK-01A-12D-A099-09_clusters.png
│       ├── TCGA-A1-A0SM-01A-11D-A099-09_clusters.png
│       ├── TCGA-A1-A0SN-01A-11D-A142-09_clusters.png
│       ├── TCGA-A1-A0SO-01A-22D-A099-09_clusters.png
│       ├── TCGA-A1-A0SP-01A-11D-A099-09_clusters.png
│       ├── TCGA-A1-A0SQ-01A-21D-A142-09_clusters.png
│       ├── TCGA-A2-A04N-01A-11D-A10Y-09_clusters.png
│       ├── TCGA-A2-A04P-01A-31D-A128-09_clusters.png
│       ├── TCGA-A2-A04Q-01A-21W-A050-09_clusters.png
│       └── TCGA-A2-A04R-01A-41D-A117-09_clusters.png
└── tables
    ├── successful.tsv
    ├── TCGA-3C-AAAU-01A-11D-A41F-09_clusterData.tsv
    ├── TCGA-3C-AAAU-01A-11D-A41F-09_clusterMeans.tsv
    ├── TCGA-3C-AALI-01A-11D-A41F-09_clusterData.tsv
    ├── TCGA-3C-AALI-01A-11D-A41F-09_clusterMeans.tsv
    ├── TCGA-3C-AALJ-01A-31D-A41F-09_clusterData.tsv
    ├── TCGA-3C-AALJ-01A-31D-A41F-09_clusterMeans.tsv
    ├── TCGA-3C-AALK-01A-11D-A41F-09_clusterData.tsv
    ├── TCGA-3C-AALK-01A-11D-A41F-09_clusterMeans.tsv
    ├── TCGA-4H-AAAK-01A-12D-A41F-09_clusterData.tsv
    ├── TCGA-4H-AAAK-01A-12D-A41F-09_clusterMeans.tsv
    ├── TCGA-5L-AAT0-01A-12D-A41F-09_clusterData.tsv
    ├── TCGA-5L-AAT0-01A-12D-A41F-09_clusterMeans.tsv
    ├── TCGA-5L-AAT1-01A-12D-A41F-09_clusterData.tsv
    ├── TCGA-5L-AAT1-01A-12D-A41F-09_clusterMeans.tsv
    ├── TCGA-5T-A9QA-01A-11D-A41F-09_clusterData.tsv
    ├── TCGA-5T-A9QA-01A-11D-A41F-09_clusterMeans.tsv
    ├── TCGA-A1-A0SD-01A-11D-A10Y-09_clusterData.tsv
    ├── TCGA-A1-A0SD-01A-11D-A10Y-09_clusterMeans.tsv
    ├── TCGA-A1-A0SE-01A-11D-A099-09_clusterData.tsv
    ├── TCGA-A1-A0SE-01A-11D-A099-09_clusterMeans.tsv
    ├── TCGA-A1-A0SF-01A-11D-A142-09_clusterData.tsv
    ├── TCGA-A1-A0SF-01A-11D-A142-09_clusterMeans.tsv
    ├── TCGA-A1-A0SG-01A-11D-A142-09_clusterData.tsv
    ├── TCGA-A1-A0SG-01A-11D-A142-09_clusterMeans.tsv
    ├── TCGA-A1-A0SH-01A-11D-A099-09_clusterData.tsv
    ├── TCGA-A1-A0SH-01A-11D-A099-09_clusterMeans.tsv
    ├── TCGA-A1-A0SI-01A-11D-A142-09_clusterData.tsv
    ├── TCGA-A1-A0SI-01A-11D-A142-09_clusterMeans.tsv
    ├── TCGA-A1-A0SJ-01A-11D-A099-09_clusterData.tsv
    ├── TCGA-A1-A0SJ-01A-11D-A099-09_clusterMeans.tsv
    ├── TCGA-A1-A0SK-01A-12D-A099-09_clusterData.tsv
    ├── TCGA-A1-A0SK-01A-12D-A099-09_clusterMeans.tsv
    ├── TCGA-A1-A0SM-01A-11D-A099-09_clusterData.tsv
    ├── TCGA-A1-A0SM-01A-11D-A099-09_clusterMeans.tsv
    ├── TCGA-A1-A0SN-01A-11D-A142-09_clusterData.tsv
    ├── TCGA-A1-A0SN-01A-11D-A142-09_clusterMeans.tsv
    ├── TCGA-A1-A0SO-01A-22D-A099-09_clusterData.tsv
    ├── TCGA-A1-A0SO-01A-22D-A099-09_clusterMeans.tsv
    ├── TCGA-A1-A0SP-01A-11D-A099-09_clusterData.tsv
    ├── TCGA-A1-A0SP-01A-11D-A099-09_clusterMeans.tsv
    ├── TCGA-A1-A0SQ-01A-21D-A142-09_clusterData.tsv
    ├── TCGA-A1-A0SQ-01A-21D-A142-09_clusterMeans.tsv
    ├── TCGA-A2-A04N-01A-11D-A10Y-09_clusterData.tsv
    ├── TCGA-A2-A04N-01A-11D-A10Y-09_clusterMeans.tsv
    ├── TCGA-A2-A04P-01A-31D-A128-09_clusterData.tsv
    ├── TCGA-A2-A04P-01A-31D-A128-09_clusterMeans.tsv
    ├── TCGA-A2-A04Q-01A-21W-A050-09_clusterData.tsv
    ├── TCGA-A2-A04Q-01A-21W-A050-09_clusterMeans.tsv
    ├── TCGA-A2-A04R-01A-41D-A117-09_clusterData.tsv
    └── TCGA-A2-A04R-01A-41D-A117-09_clusterMeans.tsv

3 directories, 76 files