Loading Data¶
There are a few choices to make when using snoopy: which mode to use and how to get to your data. The following usage matrix summarises the choices. Following this each use-case is discussed in more detail.
I want to review...
- A single set of sequence files/candidate variants
- My BAM files are local
- Use manual mode and load files within the browser
- My BAM files are on a remote server with HTTP/S access
- Use manual mode and load files from remote HTTP
- My BAM/CRAM files are on a remote server without HTTP/S access
- Use manual mode and load files with the SSH Bridge
- Several sets of sequence files/candidate variants
- My BAM files are local
- Use batch mode and load files from the local file server
- My BAM files are on a remote server with HTTP/S access
- Use batch mode and load files from remote HTTP
- My BAM/CRAM files are on a remote server without HTTP/S access
- Use batch mode and load files with the SSH Bridge
Modes¶
There are two ways to use Snoopy: manual mode and batch mode.
Manual¶
Quickly load a few files and start QC with little preparation.
In this mode, you select one-by-one a single set of sequence files (eg a single individual or a trio) and a list of candidate variant locations. Snoopy will then load the set of sequence files and visit each of the candidate variant locations.
Batch¶
Review several sets of sequence files.
In this mode, you can review several sets of sequence files (e.g. several sets of individuals or several different trios). As loading all of this information one-by-one with in the web interface would be cumbersome, the batch mode requires a JSON file which lists all of the sequence files and the variant locations. This file contains an array of sessions, where each session consists of:
- A set of sequence files (BAM, CRAM) which you want to view together at each of the locations in
- A list of variant locations (SNPs or CNVs) which you want to review in the sequence files.
For example, a session may consist of:
- a trio of sequence files
mother.bam
,father.bam
,offspring.bam
- a set of de novo sites which need to be verified
chr5:96244805
,chr11:28483998
,chr12:106799289
, ...
Data Access¶
Local Files¶
Browser Load¶
It is possible to load local files within the browser but there are some limitations. Firstly, as Dalliance, the genome browser Snoopy wraps around, does not yet provide support for CRAM, it is only possible to upload BAM files. Secondly, as you can only load one file at a time in a web browser you are restricted to manual mode.
- Modes
- Manual
- File types
- Bam
- Command line arguments
- None
Local File Server¶
If you have several local files you would like to view with batch mode Snoopy includes a local server. You will still be limited to BAM files however.
- Modes
- Manual, Batch
- File types
- Bam
- Command line arguments
--local-server
,-l
(see Starting Snoopy for more details)
Remote Files¶
HTTP/S¶
If your files exist on a remote HTTP/S server, you will be able to access these in either manual or batch mode. You will be limited to BAM files however.
- Modes
- Manual, Batch
- File types
- Bam
- Command line arguments
- None
SSH Bridge¶
The SSH Bridge is useful if: your files exist on a remote server but cannot be accessed by HTTP/S; your files are in CRAM format. The SSH bridge works in the following way:
- Establish SSH connection to remote machine.
- Dalliance will make an HTTP/S request to Snoopy’s local server for a specific sequence file (x.bam) at a specific location (chr:start-end).
- HTTP/S request is parsed and turned into a samtools commdand: samtools view x.bam chr:start-end.
- The samtools command is sent, via SSH, to remote machine.
- The results of the command is sent back to the local machine over SSH.
- The output of the samtools command is parsed to JSON.
- JSON is served over local HTTP sever to the Dalliance browser.
This is the most flexible method, but as there quite a few steps involved in the process this is also the slowest.
- Modes
- Manual, Batch
- File types
- Bam, Cram
- Command line arguments
--ssh SSH
,-s
(see Starting Snoopy for more details)
Summary¶
Access mode | File Types | Mode |
---|---|---|
Browser Load | BAM | Manual |
Local File Server | BAM | Manual, Batch |
HTTP/S | BAM | Manual, Batch |
SSH Bridge | BAM, CRAM | Manual, Batch |
Note
When loading Local BAM files through the browser, you will also need to specify the accompanying BAI file. For the other access modes, as long as the BAI files exist in the same directory and have corresponding file names (ie x.bam ==> x.bam.bai
) you do not need to explicitly load them.
Input File Formats¶
Variant List File¶
The variant text file can list both SNPs and CNVs. SNPs can be in any of the following formats:
chr:location
chr-location
chr,location
chr location
The format for CNVs is as SNPs except the location consists of two numbers. For example, a CNV location may be 16:start-end.
Batch JSON File¶
A JavaScript Object Notation (JSON) file is used to specify a set of sessions when using the batch mode of Snoopy. The idea is to use construct the JSON file with some scripting language (e.g. Python or Perl) rather than having to manually loading files. The general structure of the batch file is an array of sessions as follows:
sessions:
session 1:
variants: array of variant locations | a file listing variant locations
sequence_files: array of sequence files (BAM or CRAM)
session 2:
variants: array of variant locations | a file listing variant locations
sequence_files: array of sequence files (BAM or CRAM)
.
.
.
In the JSON format:
{
"sessions": [
{
"variants": [
"chr-loc",
"chr-loc",
"chr-loc"
],
"sequence_files": [
"path/to/sequence file 1",
"path/to/sequence file 2",
"path/to/sequence file 3"
]
},
.
.
.
]
}
The format of the variants takes the form as that in the Variant List File section.
Example¶
For an explicit example of a batch file:
{
"sessions": [
{
"variants": [
"16-48000491",
"16-48001121",
"16-48001200"
],
"sequence_files": [
"/users/joe/examples/mother1.bam"
"/users/joe/examples/father1.bam"
"/users/joe/examples/offspring1.bam"
]
},
{
"variants": [
"12-18001",
"12-1820491",
"14-1803735",
"15-1840848",
],
"sequence_files": [
"/users/joe/examples/individual.bam"
]
}
]
}
Note
To test whether yout JSON batch file is valid you can use the online tool JSONLint