Galaxy at a Glance

A brief introduction to Galaxy. What is Galaxy? Why should you use Galaxy? How do you use Galaxy?

Galaxy at a Glance

Data Intensive analysis for everyone

Web-based platform for computational biomedical research
- Developed at Penn State, Johns Hopkins and G. Washington universities with substantial outside contributions
- Open source under Academic Free License
More than 11,900 citations
More than 170 public Galaxy servers
- Many more non-public
- Both general-purpose and domain-specific

Core values

Accessibility
- Users without programming experience can easily specify parameters, run tools, workflows and parse/filter data
Reproducibility
- Galaxy captures information so that any user can repeat and understand a complete computational analysis
Transparency
- Users can share or publish their analyses (histories, workflows, visualizations)
- Pages: online Methods for your paper

Go Up

User Interface

Main Galaxy interface

Galaxy user login Galaxy user interface

The Galaxy homepage consists of four main sections (panels):

Left Activity Bar: Navigation to Tools, Workflows, Histories, etc.
Active Panel (left): Default shows expanded Tools panel
Central Viewing Panel: Main analysis workspace
Right History Panel: Current analysis files and datasets

Go Up

Activity Bar

Upload: Various ways to get data into Galaxy

Tools: Opens tool bar

Workflows: Access/create workflows with visual editor

Interactive Tools: Launch/manage interactive environments

Visualize: Create charts and visualizations

Histories: View/manage analysis histories

History Multiview: Search/copy across multiple histories

Datasets: View all datasets

Pages: Galaxy pages and dashboards

Libraries: Private/public data libraries

Notifications: Server updates and sharing alerts

Go Up

Tool interface

The tool search helps in finding a tool in a crowded toolbox

Tool interface

A tool form contains:

input datasets and parameters
help, citations, metadata
a Run Tool button to start a job,
which will add some output datasets to the history

Go Up

History

Location of all your analyses

collects all datasets produced by tools
collects all operations performed on the data

For each dataset (the heart of Galaxy’s reproducibility), the history tracks

Name, format, size, creation time, datatype-specific metadata
Tool id and version, inputs, parameters
Standard output (`stdout`) and error (`stderr`)
State: waiting; running; success; failed
Hidden, deleted, purged (== permanently deleted)

Go Up

Multiple histories

You can have as many histories as you want:

Each history should correspond to a different analysis
and should have a meaningful name

Go Up

History behavior is controlled by the History options

Most options are self explanatory

Create New history will not make your current history disappear
To see all of your histories, use the history switcher
You can Copy Datasets from one history to another

Go Up

Loading data

Importing data

Copy/paste from a file
Upload data from a local computer
Upload data from internet
Upload data from database queries
- UCSC, BioMart, ENCODE, modENCODE, Flymine etc.
Download shared data from public libraries or shared Data libraries, Histories, Workflows, Visualizations, and Pages on https://usegalaxy.org/
Upload data from FTP (>2GB)

See Tutorial

Go Up

Datatypes

When uploading, datatype can be automatically detected or assigned by user
For datasets produced by a tool, the datatype is assigned by the tool
Tools only accept input datasets with the appropriate datatypes
You can change the datatype in 2 ways:
- Edit Attributes -> Datatype (to fix a wrongly assigned datatype)
- Edit Attributes -> Convert Formats (converts the original dataset)

Go Up

Reference genomes

Genome build specifies which genome assembly a dataset is associated with
- e.g. mm10, hg19…
Genome build can be automatically detected or assigned by user
User can define their own custom genome build
New genome assembly can be added by the site Galaxy admin

Go Up

Data Libraries

Provide a way to conveniently share Galaxy datasets within a group of Galaxy users or with everybody that has access to a specific instance of Galaxy.

Can import data from filesystem without duplicating it.
Can import whole directories preserving the folder structure.
The dataset’s size does not count towards user’s quota.
- Every dataset in the library is stored only once no matter how many users are using it in their histories.
Uses roles and groups to control permissions on library/dataset level.
- Only admins can create libraries.

Go Up

Workflows

Workflow interface

Go Up

Workflows

Can be extracted from a history
- Allow to easily convert an existing history into an analysis workflow
Can be built manualy by adding and configuring tools using the workflow canvas
Can be imported using an existing shared workflow

Go Up

Why would you want to create workflows?

Re-run the same analysis on different input data sets
Change parameters before re-running a similar analysis
Make use of the workflow job scheduling
- job is submitted as soon as its inputs are ready
Create sub-workflows: a workflow inside another workflow
Share workflows for publication and with the community

Go Up

Visualization

Datatypes know what tools can be used to visualize datasets:
- Sequencing data has a button for visualizing in IGV
- Tabular data will prompt you to build charts
- Protein data can be seen in a 3D viewer
Interactive environments: Jupyter, RStudio, etc

Go Up

You can share your Galaxy items - histories, workflows, visualizations, and pages - with other people in three different ways:
- Directly using a Galaxy account’s email addresses on the same instance
- Using a web link, with anyone who knows the link
- Using a web link and publishing it to make it accessible to everyone from the Shared Data menu

Go Up

Galaxy Training for Pathogen Genomics, AMR Detection & Virus ID

Galaxy at a Glance

Galaxy at a Glance

Core values

User Interface

Main Galaxy interface

Activity Bar

Tool interface

History

Multiple histories

History options menu

Loading data

Importing data

Datatypes

Reference genomes

Data Libraries

Workflows

Workflow interface

Workflows

Why would you want to create workflows?

Visualization

Galaxy Training for Pathogen Genomics, AMR Detection & Virus ID

Galaxy at a Glance

Galaxy at a Glance

Core values

User Interface

Main Galaxy interface

Activity Bar

Tool interface

History

Multiple histories

History options menu

Loading data

Importing data

Datatypes

Reference genomes

Data Libraries

Workflows

Workflow interface

Workflows

Why would you want to create workflows?

Visualization

Data sharing