cath-resolve-hits

Description
-----------

============
cath-resolve-hits v0.14.1-0-gbb6e6a3 [2017-05-31]
============

Collapse a list of domain matches to your query sequence(s) down to the
non-overlapping subset (ie domain architecture) that maximises the sum of the
hits' scores.
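The selection problem can be illustrated in its simplest form. If every hit is a single continuous segment, choosing the non-overlapping subset with the highest total score is classic weighted interval scheduling, which dynamic programming solves exactly. The sketch below assumes that simplification. It is not cath-resolve-hits' own algorithm, which also handles discontinuous hits, and the `Hit` type and `resolve` function are hypothetical names.

```python
from bisect import bisect_right
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical hit: one continuous segment [start, stop], 1-based inclusive."""
    match_id: str
    start: int
    stop: int
    score: float


def resolve(hits: list[Hit]) -> list[Hit]:
    """Return the non-overlapping subset of hits with the maximum total score.

    Weighted interval scheduling: sort by stop, then for each hit keep the better
    of (a) skipping it or (b) taking it plus the best solution among the hits
    that stop strictly before it starts.
    """
    ordered = sorted(hits, key=lambda h: h.stop)
    stops = [h.stop for h in ordered]
    best = [0.0] * (len(ordered) + 1)    # best[i]: best total using the first i hits
    take = [False] * (len(ordered) + 1)  # take[i]: whether hit i-1 is in that best set
    prev = [0] * (len(ordered) + 1)      # prev[i]: how many earlier hits end before hit i-1 starts
    for i, hit in enumerate(ordered, start=1):
        # count the earlier hits whose stop < hit.start (stops are sorted)
        j = bisect_right(stops, hit.start - 1, 0, i - 1)
        prev[i] = j
        with_hit = best[j] + hit.score
        if with_hit > best[i - 1]:
            best[i], take[i] = with_hit, True
        else:
            best[i] = best[i - 1]
    # walk back through the table to recover the chosen hits
    chosen, i = [], len(ordered)
    while i > 0:
        if take[i]:
            chosen.append(ordered[i - 1])
            i = prev[i]
        else:
            i -= 1
    return chosen[::-1]


# made-up hits: A and C do not overlap and together outscore B
hits = [Hit("A", 1, 50, 10.0), Hit("B", 40, 100, 15.0), Hit("C", 60, 120, 8.0)]
print([h.match_id for h in resolve(hits)])  # ['A', 'C']
```

The `O(n log n)` sort-plus-binary-search form is the textbook approach; allowing several segments per hit makes the real problem harder than this sketch.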

Build
-----
   May 31 2017 08:27:41
   Clang version 3.6.2 (branches/release_36)
   GNU libstdc++ version 20160726
   Boost 1_57

Usage
-----

Usage: cath-resolve-hits [options] <input_file>

Collapse a list of domain matches to your query sequence(s) down to the
non-overlapping subset (ie domain architecture) that maximises the sum of the
hits' scores.

When <input_file> is -, the input is read from standard input.

The input data may contain unsorted hits for different query protein sequences.

However, if your input data is already grouped by query protein sequence, then
specify the --input-hits-are-grouped flag for faster runs that use less memory.
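The benefit of grouped input can be illustrated as follows. With grouped input, each query's hits can be handed on as soon as the query ID changes. With unsorted input, every hit must be held until the whole file has been read. This minimal Python sketch assumes each hit is a tuple whose first element is the query ID; the function names are hypothetical and this is not the tool's implementation.

```python
from collections import defaultdict
from itertools import groupby
from typing import Iterable, Iterator


def queries_from_grouped(hits: Iterable[tuple]) -> Iterator[tuple[str, list[tuple]]]:
    """Stream (query_id, hits) pairs, holding only one query's hits at a time.

    Only correct if all hits for a query are contiguous in the input.
    """
    for query_id, group in groupby(hits, key=lambda hit: hit[0]):
        yield query_id, list(group)


def queries_from_unsorted(hits: Iterable[tuple]) -> Iterator[tuple[str, list[tuple]]]:
    """Collect every hit before yielding anything, so memory grows with the whole input."""
    by_query: dict[str, list[tuple]] = defaultdict(list)
    for hit in hits:
        by_query[hit[0]].append(hit)
    yield from by_query.items()


hits = [("q1", "m1"), ("q2", "m2"), ("q1", "m3")]   # made-up, NOT grouped
print([q for q, _ in queries_from_grouped(hits)])   # ['q1', 'q2', 'q1']
print([q for q, _ in queries_from_unsorted(hits)])  # ['q1', 'q2']
```

The first output shows why the flag should only be given for input that really is grouped: a streaming reader sees `q1` twice and would treat its hits as two separate queries.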

Miscellaneous:
  -h [ --help ]                                  Output help message
  -v [ --version ]                               Output version information

Input:
  --input-format <format> (=raw_with_scores)     Parse the input data from <format>, one of available formats:
                                                    hmmer_domtblout  - HMMER domtblout format (must assume all hits are continuous)
                                                    hmmsearch_out    - HMMER hmmsearch output format (can be used to deduce discontinuous hits)
                                                    raw_with_scores  - "raw" format with scores
                                                    raw_with_evalues - "raw" format with evalues
  --min-gap-length <length> (=30)                When parsing starts/stops from alignment data, ignore gaps of less than <length> residues
  --input-hits-are-grouped                       Rely on the input hits being grouped by query protein
                                                 (so the run is faster and uses less memory)
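The effect of `--min-gap-length` can be sketched as turning a hit's aligned query positions into segments, bridging short gaps and splitting at long ones. This is a minimal sketch, not the tool's parser. It assumes that a gap's length is the number of unaligned residues between two aligned ones, and `segments_from_positions` is a hypothetical name.

```python
def segments_from_positions(positions: list[int], min_gap_length: int = 30) -> list[tuple[int, int]]:
    """Turn aligned query positions into (start, stop) segments.

    A run of unaligned residues shorter than min_gap_length is ignored (bridged);
    a run at least that long starts a new segment. Gap length is assumed to be the
    number of unaligned residues between two aligned ones.
    """
    segments: list[tuple[int, int]] = []
    for pos in sorted(positions):
        if segments and pos - segments[-1][1] - 1 < min_gap_length:
            segments[-1] = (segments[-1][0], pos)  # short gap: extend the current segment
        else:
            segments.append((pos, pos))            # first residue or long gap: new segment
    return segments


aligned = list(range(10, 51)) + list(range(60, 101))  # made-up: a 9-residue gap
print(segments_from_positions(aligned))               # [(10, 100)]
print(segments_from_positions(aligned, 5))            # [(10, 50), (60, 100)]
```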

Segment overlap/removal:
  --overlap-trim-spec <trim> (=30/10)            Allow different hits' segments to overlap a bit by trimming all segments using spec <trim>
                                                 of the form n/m (n is a segment length; m is the *total* length to be trimmed off both ends)
                                                 For longer segments, total trim stays at m; for shorter, it decreases linearly (to 0 for length 1).
                                                 To choose: set m to the biggest total trim you'd want for a really long segment;
                                                             then set n to the length of the shortest segment you'd want to have that total trim
  --min-seg-length <length> (=7)                 Ignore all segments that are fewer than <length> residues long
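The `n/m` trim rule can be sketched directly from its description: a segment of at least `n` residues loses `m` residues in total, and shorter segments lose linearly less, down to 0 at length 1. This is a minimal sketch, not the tool's code. It assumes a straight line through (1, 0) and (n, m), an even split of the trim between the two ends, 1-based inclusive coordinates, and no rounding; the function names are hypothetical.

```python
def total_trim(length: int, n: int = 30, m: int = 10) -> float:
    """Total residues trimmed from a segment of `length` residues under spec n/m.

    Linear from 0 (length 1) up to m (length n), then constant at m.
    Whatever rounding the real tool applies is not modelled here.
    """
    if n <= 1 or length >= n:
        return float(m)
    return m * (length - 1) / (n - 1)


def trimmed_segment(start: int, stop: int, n: int = 30, m: int = 10) -> tuple[float, float]:
    """Trim half of the total from each end (the even split is an assumption)."""
    half = total_trim(stop - start + 1, n, m) / 2
    return start + half, stop - half


def keep_segment(start: int, stop: int, min_seg_length: int = 7) -> bool:
    """Mirror --min-seg-length: keep only segments at least min_seg_length residues long."""
    return stop - start + 1 >= min_seg_length


print(total_trim(16))            # about 5.17 with the default 30/10 spec
print(trimmed_segment(1, 100))   # (6.0, 95.0)
```

With the default `30/10`, a 16-residue segment loses about 5.2 residues in total, while any segment of 30 or more residues loses exactly 10.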

Hit preference:
  --long-domains-preference <val> (=0)           Prefer longer hits to degree <val>
                                                 (<val> may be negative to prefer shorter; 0 leaves scores unaffected)
  --high-scores-preference <val> (=0)            Prefer higher scores to degree <val>
                                                 (<val> may be negative to reduce preference for higher scores; 0 leaves scores unaffected)
  --apply-cath-rules                             Apply rules specific to CATH-Gene3D during the parsing and processing

Hit filtering:
  --worst-permissible-evalue <evalue> (=0.001)   Ignore any hits with an evalue worse than <evalue>
  --worst-permissible-bitscore <bitscore> (=10)  Ignore any hits with a bitscore worse than <bitscore>
  --worst-permissible-score <score>              Ignore any hits with a score worse than <score>
  --filter-query-id <id>                         Ignore all input data except that for query protein(s) <id>
                                                 (may be specified multiple times for multiple query proteins)
  --limit-queries [=<num>(=1)]                   Only process the first <num> query protein(s) encountered in the input data
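The filtering options can be sketched as a single pass over the parsed hits. Note that "worse" means *larger* for an evalue but *smaller* for a bitscore or score. This is a minimal sketch with hypothetical names. Several details are assumptions: the `RawHit` fields, keeping hits that exactly equal a threshold, skipping a check when that value is absent, and counting a query towards `--limit-queries` as soon as any of its hits passes the query-ID filter.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class RawHit(NamedTuple):
    """Hypothetical parsed hit; evalue/bitscore may be absent depending on the input format."""
    query_id: str
    match_id: str
    score: float
    evalue: Optional[float] = None
    bitscore: Optional[float] = None


def filter_hits(
    hits: Iterable[RawHit],
    worst_evalue: float = 0.001,
    worst_bitscore: float = 10.0,
    worst_score: Optional[float] = None,
    query_ids: Optional[set[str]] = None,
    limit_queries: Optional[int] = None,
) -> Iterator[RawHit]:
    """Yield only the hits that survive every filter."""
    seen: set[str] = set()
    for hit in hits:
        if query_ids is not None and hit.query_id not in query_ids:
            continue  # --filter-query-id
        if hit.query_id not in seen:
            if limit_queries is not None and len(seen) >= limit_queries:
                continue  # --limit-queries: a new query beyond the first <num>
            seen.add(hit.query_id)
        if hit.evalue is not None and hit.evalue > worst_evalue:
            continue  # a larger evalue is worse
        if hit.bitscore is not None and hit.bitscore < worst_bitscore:
            continue  # a smaller bitscore is worse
        if worst_score is not None and hit.score < worst_score:
            continue  # a smaller score is worse
        yield hit
```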

Output ([...]-to-file options may be specified multiple times):
  --hits-text-to-file <file>                     Write the resolved hits in plain text to file <file>
  --quiet                                        Suppress the default output of resolved hits in plain text to stdout
  --output-trimmed-hits                          When writing out the final hits, output the hits' starts/stops as they are *after trimming*
  --summarise-to-file <file>                     Write a brief text summary of the input data to file <file> (or '-' for stdout)
  --html-output-to-file <file>                   Write the results as HTML to file <file> (or '-' for stdout)
  --json-output-to-file <file>                   Write the results as JSON to file <file> (or '-' for stdout)
  --export-css-file <file>                       Export the CSS used in the HTML output to <file> (or '-' for stdout)

HTML:
  --restrict-html-within-body                    Restrict HTML output to the contents of the body tag.
                                                 The contents should be included inside a body tag of class crh-body
  --html-max-num-non-soln-hits <num> (=80)       Only display up to <num> non-solution hits in the HTML
  --html-exclude-rejected-hits                   Exclude hits rejected by the score filters from the HTML

Detailed help:
  --cath-rules-help                              Show help on the rules activated by the --apply-cath-rules option
  --raw-format-help                              Show help about the raw input formats (raw_with_scores and raw_with_evalues)
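A reader for the raw formats can be sketched as follows. The layout here is an assumption: one whitespace-separated line per hit of the form `query_id match_id value boundaries`, where `value` is a score (raw_with_scores) or an evalue (raw_with_evalues) and `boundaries` looks like `12-79,83-113`. Check `--raw-format-help` for the authoritative description. The function names are hypothetical.

```python
from typing import Iterable, Iterator

Segments = list[tuple[int, int]]


def parse_raw_line(line: str) -> tuple[str, str, float, Segments]:
    """Parse one line assumed to look like: query_id match_id value 12-79,83-113"""
    query_id, match_id, value, boundaries = line.split()
    segments: Segments = []
    for segment in boundaries.split(","):
        start, stop = segment.split("-")
        segments.append((int(start), int(stop)))
    return query_id, match_id, float(value), segments


def parse_raw(lines: Iterable[str]) -> Iterator[tuple[str, str, float, Segments]]:
    """Parse every non-blank line."""
    for line in lines:
        if line.strip():
            yield parse_raw_line(line)


print(parse_raw_line("qseq1 match1 34.3 12-79,83-113"))  # made-up line
# ('qseq1', 'match1', 34.3, [(12, 79), (83, 113)])
```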

The standard output is one line per selected hit, preceded by header lines (beginning "#"), the last of which (beginning "#FIELDS") lists the fields in the file, typically:
  #FIELDS query-id match-id score boundaries resolved
(`boundaries` and `resolved` both give a domain's starts/stops; `resolved` may include adjustments made to resolve overlaps between hits)
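Because the default output names its columns in the `#FIELDS` line, a downstream script can read those names instead of hard-coding them. This is a minimal sketch, assuming values are whitespace-separated and contain no spaces; `parse_resolved_output` is a hypothetical name and the sample text below is made up.

```python
def parse_resolved_output(text: str) -> list[dict[str, str]]:
    """Turn the plain-text output into one dict per hit, keyed by the #FIELDS names."""
    fields: list[str] = []
    records: list[dict[str, str]] = []
    for line in text.splitlines():
        if line.startswith("#FIELDS"):
            fields = line.split()[1:]  # column names follow the "#FIELDS" token
        elif line.startswith("#") or not line.strip():
            continue                   # other header lines and blank lines
        elif not fields:
            raise ValueError("data line found before the #FIELDS header")
        else:
            records.append(dict(zip(fields, line.split())))
    return records


sample = (
    "# made-up header line\n"
    "#FIELDS query-id match-id score boundaries resolved\n"
    "q1 m1 34.3 12-79,83-113 12-79,83-113\n"
)
for record in parse_resolved_output(sample):
    print(record["query-id"], record["resolved"])  # q1 12-79,83-113
```

All values stay strings here; converting `score` to a number is left to the caller.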

Please tell us your cath-tools bugs/suggestions: https://github.com/UCLOrengoGroup/cath-tools/issues/new