MultiQC: summarize analysis results for multiple tools and samples in a single report

Volume 32 Issue 19 October 2016

Philip Ewels¹*, Måns Magnusson², Sverker Lundin³ and Max Käller³

¹Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Stockholm 106 91, Sweden

²Department of Molecular Medicine and Surgery, Science for Life Laboratory, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden

³Science for Life Laboratory, School of Biotechnology, Division of Gene Technology, Royal Institute of Technology, Stockholm, Sweden

*To whom correspondence should be addressed.

Associate Editor: Jonathan Wren

Bioinformatics, Volume 32, Issue 19, October 2016, Pages 3047–3048, https://doi.org/10.1093/bioinformatics/btw354

Published: 16 June 2016

Article history: received 30 April 2016; revision received 30 April 2016; accepted 29 May 2016.

Abstract

Motivation: Fast and accurate quality control is essential for studies involving next-generation sequencing data. Whilst numerous tools exist to quantify QC metrics, there is no common approach to flexibly integrate these across tools and large sample sets. Assessing analysis results across an entire project can be time consuming and error prone; batch effects and outlier samples can easily be missed in the early stages of analysis.

Results: We present MultiQC, a tool to create a single report visualising output from multiple tools across many samples, enabling global trends and biases to be quickly identified. MultiQC can plot data from many common bioinformatics tools and is built to allow easy extension and customization.

Availability and implementation: MultiQC is released under the GNU GPLv3 license and is available on GitHub, the Python Package Index and Bioconda. Documentation and example reports are available at http://multiqc.info

Contact: phil.ewels@scilifelab.se

1 Introduction

Advances in next-generation sequencing are leading to an avalanche of data. Whilst these data open doors to new analysis types and experimental designs, growing sample numbers make studies increasingly vulnerable to confounding batch effects (Leek et al., 2010; Meyer and Liu, 2014; Taub et al., 2010). Such biases are often subtle and difficult to detect, and they require careful quality control measures.

Most bioinformatics programs produce logs detailing their results. Dedicated QC tools such as FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc), Qualimap (Okonechnikov et al., 2015) and RSeQC (Wang et al., 2012) are excellent at highlighting potential problems in data. However, nearly all of these logs and reports are produced on a per-sample basis, requiring the user to find and compile QC results. This process is time consuming, repetitive and complex, making it prone to errors.

MultiQC addresses this problem by scanning given analysis directories for log files and QC reports and creating a single summary report that visualizes results across all samples. Collecting data within a single report makes it quick and easy to scan key statistics (Fig. 1). Shared plots allow accurate comparison between samples, revealing subtle differences that go unnoticed when switching between separate files. Data visualization aids batch effect detection and minimizes the risk of confounding factors affecting the results of the study. MultiQC is the first tool of its type within the field; it has the potential to greatly improve quality control and reporting for researchers involved in next-generation sequencing, removing the need for custom comparative scripts.

Fig. 1. Top of a typical MultiQC report. The general statistics table shows metrics from a number of different tools gathered for each sample. (Color version of this figure is available at Bioinformatics online.)

2 Materials and methods

2.1 Running MultiQC

MultiQC is written in Python and runs on the command line; specified directories are searched recursively for any recognized files. Submodules for each supported tool are run, querying input files using configurable search settings. If any log files are found they are parsed; otherwise the module exits silently. Once all submodules have finished, the specified template is loaded and rendered with the Jinja2 package to produce the final report. Parsed data is also saved as tab-delimited text, YAML or JSON for downstream use. Because submodules contribute to the report only if they find logs, MultiQC is run in the same way for every analysis type. At the time of writing, MultiQC supports 22 common bioinformatics tools, including aligners, processing tools and QC programs.
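
The search-then-parse flow described above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not MultiQC's own code: the module names (`toolA`, `toolB`), their filename globs, the `key<TAB>value` log format, the sample-naming rule and the `parsed_data.json` output name are all hypothetical, and the Jinja2 rendering step is omitted.

```python
import fnmatch
import json
import os
import tempfile
from pathlib import Path

# Hypothetical modules and filename globs, for illustration only.
SEARCH_PATTERNS = {
    "toolA": "*_toolA.log",
    "toolB": "*.toolB_stats.txt",
}


def find_logs(root, patterns=SEARCH_PATTERNS):
    """Recursively walk `root` and collect matching files per module."""
    found = {name: [] for name in patterns}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            for name, pattern in patterns.items():
                if fnmatch.fnmatch(filename, pattern):
                    found[name].append(os.path.join(dirpath, filename))
    return found


def parse_log(path):
    """Toy parser: read 'key<TAB>value' lines, keeping numeric values only."""
    metrics = {}
    with open(path) as fh:
        for line in fh:
            key, sep, value = line.rstrip("\n").partition("\t")
            if not sep:
                continue  # not a key/value line
            try:
                metrics[key] = float(value)
            except ValueError:
                pass  # skip non-numeric fields
    return metrics


def sample_name(path):
    """Hypothetical naming rule: text before the first '_' or '.'."""
    return Path(path).name.split("_")[0].split(".")[0]


def run_modules(root):
    """Run every module; a module that finds no files adds nothing."""
    report = {}
    for name, paths in find_logs(root).items():
        if not paths:
            continue  # no logs found: the module exits silently
        report[name] = {sample_name(p): parse_log(p) for p in paths}
    return report


def save_data(report, outdir):
    """Save parsed data as JSON for downstream use."""
    os.makedirs(outdir, exist_ok=True)
    out_path = os.path.join(outdir, "parsed_data.json")
    with open(out_path, "w") as fh:
        json.dump(report, fh, indent=2, sort_keys=True)
    return out_path


if __name__ == "__main__":
    # Build a throwaway directory tree with one toolA log and run the sketch.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "sample1"))
        with open(os.path.join(root, "sample1", "sample1_toolA.log"), "w") as fh:
            fh.write("total_reads\t1000\nstatus\tok\n")
        report = run_modules(root)
        print(report)  # {'toolA': {'sample1': {'total_reads': 1000.0}}}
        print(save_data(report, os.path.join(root, "out")))
```

Because `toolB` matches nothing in the demo, it simply does not appear in the output, which mirrors why the same command can be run for every analysis type.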

2.2 MultiQC reports

MultiQC generates a single self-contained HTML report that can be shared and opened in any modern web browser. Reports render plots using the JavaScript plotting library HighCharts (http://www.highcharts.com). Plots are resizable and interactive, some with click-and-drag zooming. Samples can be renamed, hidden and highlighted using a report toolbox. Plots can be exported in a range of publication-ready formats.
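
In the report the toolbox runs as JavaScript in the browser; the Python sketch below only mirrors the kind of logic involved, assuming plain-text find-and-replace renaming and regular-expression matching for hiding and highlighting. The function name `apply_toolbox` and the example sample names are hypothetical.

```python
import re


def apply_toolbox(samples, renames=(), hide_patterns=(), highlight_patterns=()):
    """Mimic report-toolbox actions on a list of sample names.

    renames: (find, replace) plain-text pairs, applied in order.
    hide_patterns / highlight_patterns: regular expressions (re.search).
    Returns (visible, highlighted) lists of renamed samples.
    """
    renamed = []
    for name in samples:
        for find, replace in renames:
            name = name.replace(find, replace)
        renamed.append(name)

    def matches_any(name, patterns):
        return any(re.search(p, name) for p in patterns)

    visible = [n for n in renamed if not matches_any(n, hide_patterns)]
    highlighted = [n for n in visible if matches_any(n, highlight_patterns)]
    return visible, highlighted


samples = ["P123_1001_S1", "P123_1002_S2", "ctrl_S3"]
visible, highlighted = apply_toolbox(
    samples,
    renames=[("P123_", "")],
    hide_patterns=[r"^ctrl"],
    highlight_patterns=[r"_S2$"],
)
print(visible)      # ['1001_S1', '1002_S2']
print(highlighted)  # ['1002_S2']
```

Renaming is applied before hiding and highlighting here, so patterns are written against the cleaned names; that ordering is a design choice of this sketch.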

Reports with hundreds of samples become too large for HighCharts to handle; instead, MultiQC switches to rendering plots as images at run time using the Python plotting library Matplotlib (Hunter, 2007). Images are embedded within the HTML, so the report remains a single stand-alone file whose size stays consistent. These static reports are also suitable for conversion to PDF using tools such as Pandoc (http://pandoc.org).
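
The switch between interactive and static plots, and the embedding of images in a single HTML file, can be illustrated as follows. This is a sketch under assumptions: the cut-off of 100 samples is a hypothetical value (MultiQC's real threshold is configurable and may differ between versions), and images are inlined here as base64 `data:` URIs, one common way of keeping an HTML file self-contained. A placeholder byte string stands in for real PNG data so the sketch needs only the standard library.

```python
import base64
import html

# Hypothetical cut-off; MultiQC's real threshold is configurable and may differ.
MAX_INTERACTIVE_SAMPLES = 100


def choose_renderer(n_samples, max_interactive=MAX_INTERACTIVE_SAMPLES):
    """Pick interactive JavaScript plots for small reports, static images otherwise."""
    return "interactive" if n_samples <= max_interactive else "static"


def embed_png(png_bytes, alt="plot"):
    """Inline image bytes as a base64 data URI so the HTML stays self-contained."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f'<img src="data:image/png;base64,{encoded}" alt="{html.escape(alt)}">'


print(choose_renderer(24))    # interactive
print(choose_renderer(1500))  # static
print(embed_png(b"abc"))      # <img src="data:image/png;base64,YWJj" alt="plot">
```

In a real report the bytes would come from a plotting library, for example Matplotlib's `Figure.savefig` writing PNG data into an `io.BytesIO` buffer.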

Each report contains an interactive walkthrough of its features. Tutorial videos are available at http://multiqc.info, along with documentation describing installation, usage and troubleshooting.

2.3 Extending MultiQC

MultiQC supports many common bioinformatics tools, but research groups will inevitably have their own bespoke scripts or require other customization. To accommodate this, MultiQC is designed so that custom code can easily tie into its functionality. Code hooks allow external plugins to access and modify the internal workings of the program. Python setuptools entry points allow modules, templates and plugins to be kept in a separate code base, whilst still executing as part of the main MultiQC program.
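
A plugin mechanism of this kind can be sketched with the standard library's `importlib.metadata.entry_points`, which reads the entry points that setuptools records when a package is installed. The group name `example_tool.hooks`, the event name `before_report` and the hook registry below are hypothetical illustrations, not MultiQC's actual API; the MultiQC documentation describes the entry-point groups it really reads.

```python
from collections import defaultdict
from importlib.metadata import entry_points

# Hypothetical entry-point group. A plugin package would declare it in its
# own setup.py, for example:
#   setup(
#       ...,
#       entry_points={
#           "example_tool.hooks": [
#               "before_report = my_plugin.hooks:add_metadata",
#           ],
#       },
#   )
PLUGIN_GROUP = "example_tool.hooks"

_HOOKS = defaultdict(list)


def register_hook(event, func):
    """Attach a callable to a named point in the program's execution."""
    _HOOKS[event].append(func)


def run_hooks(event, data):
    """Pass `data` through every hook registered for `event`, in order."""
    for func in _HOOKS[event]:
        data = func(data)
    return data


def discover_plugins(group=PLUGIN_GROUP):
    """Find installed entry points in `group` without importing them."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        selected = eps.select(group=group)
    else:  # Python 3.8/3.9 return a dict keyed by group name
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


def load_plugins(group=PLUGIN_GROUP):
    """Import each plugin and register it under its entry-point name."""
    for name, ep in discover_plugins(group).items():
        register_hook(name, ep.load())
```

Discovery only inspects package metadata, so plugins live in their own installable package and the main program never needs to know their import paths in advance.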

Extensive documentation makes adding to MultiQC simple; four new modules have been contributed by users to date and we are aware of at least three plugins written by different research groups. Adoption by the bioinformatics community has been rapid: MultiQC has been downloaded thousands of times within the past few months and is already integrated as standard within the popular bcbio-nextgen analysis toolkit (http://bcb.io).

3 Typical applications

3.1 Single cell data and population studies

Single cell and population studies are perhaps the perfect examples of large projects where accurate quality control of numerous datasets is critical. MultiQC is able to parse data for thousands of samples within minutes, adapting report output as required. Parsed data saved by MultiQC can be used for post-processing and dataset filtering. Reports reveal overall analysis success and make it easy to identify abnormal samples.
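
As an example of such post-processing, the sketch below reads a tab-delimited statistics table and keeps only samples that pass a threshold. It assumes a `Sample` column followed by numeric metric columns; MultiQC writes a general statistics table of this shape to its data directory (typically `multiqc_data/multiqc_general_stats.txt`), but column names vary by tool and version, so `reads_millions` and `percent_gc` are hypothetical. The table is passed as a string so the example runs without any files.

```python
import csv
import io


def _to_float(value):
    """Convert a cell to float, returning None for blanks or non-numbers."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def load_general_stats(text):
    """Parse a tab-delimited table with a 'Sample' column into nested dicts."""
    stats = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        sample = row.pop("Sample")
        metrics = {k: _to_float(v) for k, v in row.items()}
        stats[sample] = {k: v for k, v in metrics.items() if v is not None}
    return stats


def filter_samples(stats, column, min_value):
    """Return sorted names of samples whose `column` is at least `min_value`."""
    return sorted(
        sample for sample, metrics in stats.items()
        if column in metrics and metrics[column] >= min_value
    )


# Hypothetical columns; in practice the text would be read from the
# general statistics file in MultiQC's data directory.
table = (
    "Sample\treads_millions\tpercent_gc\n"
    "A\t25.0\t41\n"
    "B\t3.5\t40\n"
    "C\t\t39\n"
)
stats = load_general_stats(table)
print(filter_samples(stats, "reads_millions", 10))  # ['A']
```

Samples with a missing value for the chosen column are excluded rather than treated as zero, which keeps the filter conservative.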

3.2 Sequencing facilities

MultiQC was originally developed for use in a high-throughput sequencing facility. Reports give the overview needed to spot failing samples, and highlighting helps to identify groups of samples behaving irregularly.
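
One simple way to act on highlighted groups is to compare a per-group summary statistic. The sketch below groups samples by a regular-expression capture (for instance a lane or batch prefix) and flags groups whose median departs from the median of all groups by more than a tolerance. This median-based heuristic is a technique chosen for illustration, not a MultiQC feature; the metric name `q`, the tolerance and the sample names are hypothetical.

```python
import re
import statistics


def group_medians(stats, column, pattern):
    """Group samples by the first capture group of `pattern`; return medians."""
    groups = {}
    for sample, metrics in stats.items():
        match = re.search(pattern, sample)
        if match and column in metrics:
            groups.setdefault(match.group(1), []).append(metrics[column])
    return {group: statistics.median(values) for group, values in groups.items()}


def irregular_groups(medians, tolerance):
    """Flag groups whose median differs from the median of all groups."""
    overall = statistics.median(medians.values())
    return sorted(g for g, m in medians.items() if abs(m - overall) > tolerance)


# Hypothetical mean-quality values for samples from three lanes.
stats = {
    "lane1_s1": {"q": 36}, "lane1_s2": {"q": 35},
    "lane2_s1": {"q": 36}, "lane2_s2": {"q": 37},
    "lane3_s1": {"q": 28}, "lane3_s2": {"q": 27},
}
medians = group_medians(stats, "q", r"^(lane\d+)_")
print(medians)                       # {'lane1': 35.5, 'lane2': 36.5, 'lane3': 27.5}
print(irregular_groups(medians, 3))  # ['lane3']
```

Medians are used instead of means so that a single failing sample does not shift its whole group; `irregular_groups` assumes at least one group was found.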

Plugins allow integration with in-house systems: we have written the MultiQC_NGI plugin, which inserts metadata from our LIMS into reports and stores summary results parsed by MultiQC in our database. This functionality is enormously powerful, facilitating large-scale internal data collection that would otherwise require numerous custom scripts. Templates allow report branding, and because reports are self-contained, MultiQC is an ideal tool for creating delivery reports.
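
A plugin along these lines could be sketched as two steps: merging per-sample metadata into the parsed results, then writing a long-format summary table to a database. The sketch uses a dictionary in place of a LIMS and Python's built-in `sqlite3` in place of a production database; the function names and the `qc_summary` table layout are hypothetical and do not describe how MultiQC_NGI is implemented.

```python
import sqlite3


def add_lims_metadata(stats, lims_lookup):
    """Merge per-sample metadata into parsed results (metadata wins on clashes)."""
    return {
        sample: {**metrics, **lims_lookup.get(sample, {})}
        for sample, metrics in stats.items()
    }


def store_summary(conn, run_id, stats):
    """Write one row per (sample, metric) and return the number of rows stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS qc_summary "
        "(run_id TEXT, sample TEXT, metric TEXT, value TEXT)"
    )
    rows = [
        (run_id, sample, metric, str(value))
        for sample, metrics in stats.items()
        for metric, value in metrics.items()
    ]
    conn.executemany("INSERT INTO qc_summary VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# A dictionary stands in for the LIMS and an in-memory database for the real one.
lims = {"s1": {"project": "P001"}}
merged = add_lims_metadata(
    {"s1": {"total_reads": 1000.0}, "s2": {"total_reads": 50.0}}, lims
)
conn = sqlite3.connect(":memory:")
print(store_summary(conn, "run_01", merged))  # 3
```

Storing one row per sample and metric keeps the schema fixed even as new tools add new metrics, at the cost of keeping values as text.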

4 Conclusion

As the field of next-generation sequencing matures, there are increasing numbers of bioinformatics tools producing ever more verbose descriptions of data. Integrating these statistics across tools with large sample sets is difficult and time-consuming. MultiQC can automate the parsing of this metadata, providing powerful visualizations with a simple interface. Extension and data export allow MultiQC to function as a central collection point at the terminus of analysis pipelines. Routine use can aid quality control steps early on in data processing, reducing risk of batch effects and other downstream analysis problems.

Acknowledgements

The authors would like to thank D. Klevebring, L. Pantano, G. Carrasco and R. Andeer for contributed modules and discussion.

Funding

This work was supported by the Science for Life Laboratory and the National Genomics Infrastructure, NGI.

Conflict of Interest: none declared.

References

Hunter, J.D. (2007) Matplotlib: A 2D Graphics Environment. Comput. Sci. Eng., 9, 90–95.

Leek, J.T. et al. (2010) Tackling the widespread and critical impact of batch effects in high-throughput data. Nat. Rev. Genet., 11, 733–739.

Meyer, C.A. and Liu, X.S. (2014) Identifying and mitigating bias in next-generation sequencing methods for chromatin biology. Nat. Rev. Genet., 15, 709–721.

Okonechnikov, K. et al. (2015) Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics, 32, 292–294.

Taub, M.A. et al. (2010) Overcoming bias and systematic errors in next generation sequencing data. Genome Med., 2, 87.

Wang, L. et al. (2012) RSeQC: quality control of RNA-seq experiments. Bioinformatics, 28, 2184–2185.

© The Author 2016. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.


