cath-map-clusters.ubuntu-20.04

Description

============
cath-map-clusters v0.16.10-0-g99edb28 [2021-04-20]
============

Map names from previous clusters to new clusters based on (the overlaps between)
their members (which may be specified as regions within a parent sequence).
Renumber any clusters with no equivalents.

Build
-----
   Apr 20 2021 13:24:46
   GNU C++ version 10.2.0
   GNU libstdc++ version 20200808
   Boost 1_71

Usage

Usage: cath-map-clusters [options] <input_file>

Map names from previous clusters to new clusters based on (the overlaps between)
their members (which may be specified as regions within a parent sequence).
Renumber any clusters with no equivalents.

When <input_file> is -, the input is read from standard input.

Miscellaneous:
  -h [ --help ]                         Output help message
  -v [ --version ]                      Output version information

Input:
  --map-from-clustmemb-file <file>      Map numbers from previous clusters specified in <file> to their equivalents in the working clusters where possible and
                                        if all the cluster names in <file> are positive integers, renumber leftover new clusters from one plus the largest
                                        or if not, rename with new_cmc_cluster_1, new_cmc_cluster_2, ...
                                        (of, if unspecified, renumber all working clusters from 1 upwards)
  --read-batches-from-input             Read batches of work from the input file with lines of format: `batch_id working_clust_memb_file prev_clust_memb_file` where:
                                         * batch_id             is a unique label for the batch (with no whitespace)
                                         * prev_clust_memb_file is optional

Mapping:
  --min_equiv_dom_ol <percent> (=60)    Define domain equivalence as: sharing more than <percent>% of residues (over the longest domain)
                                        (where <percent> must be ≥ 50)
  --min_equiv_clust_ol <percent> (=60)  Define cluster equivalence as: more than <percent>% of the map-from cluster's members having equivalents in the working cluster
                                        [and them being equivalent to > 20% of the working cluster's entries and > 50% of those that have an equivalence]
                                        (where <percent> must be ≥ 50%)

Output:
  --append-batch-id <id>                Append batch ID <id> as an extra column in the results output (equivalent to the first column in a --multi-batch-file input file)
  --output-to-file <file>               Write output to file <file> (or, if unspecified, to stdout)
  --summarise-to-file <file>            Print a summary of the renumbering to file <file>
  --print-entry-results                 Output the entry (domain)-level mapping results

Detailed help:
  --sorting-help                        Show the criteria for sorting unmapped clusters

The input cluster-membership data should contain lines like:

cluster_name domain_id

...where domain_id is a sequence/protein name, optionally suffixed with the domain's segments in notation like '/100-199,350-399'. The suffixes should be present for all of the domain IDs or for none of them. Domains shouldn't overlap with others in the same cluster-membership data.

Input data doesn't have to be grouped by cluster.

NOTE: When mapping (ie using --map-from-clustmemb-file), the mapping algorithm treats the two cluster membership files differently - see --min_equiv_clust_ol description.

Please tell us your cath-tools bugs/suggestions : https://github.com/UCLOrengoGroup/cath-tools/issues/new