6. Points to note

Author: TGFam-admin, posted 18-07-18 11:42

6.1 If TGFam-Finder installation fails

TGFam-Finder is built on established bioinformatics programs. An automatic installation script is provided to make installation easier, but because the specifications and state of servers and computers differ, the installation sometimes does not complete cleanly and errors may occur.

In that case, check whether the following libraries and modules are properly installed, and then run the automatic installation again.

 

Checklist

yum (Ubuntu: apt-get): use the package manager to install or update the following packages

- Packages for rpm- or yum-based Linux distributions (RedHat / Fedora / CentOS):

zlib-devel, bzip2-devel, xz-devel

- Packages for dpkg-based Linux distributions (Debian / Ubuntu):

zlib1g-dev, libbz2-dev, liblzma-dev
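The package lists above can be turned into an install command for whichever package manager is present. The following is a minimal sketch, not part of TGFam-Finder: the helper names `detect_manager` and `install_command` are hypothetical, and the use of `sudo` and the `-y` flag is an assumption about how you want to run the install.

```python
import shutil

# Package names are copied from the checklist above.
PACKAGES = {
    "yum": ["zlib-devel", "bzip2-devel", "xz-devel"],
    "apt-get": ["zlib1g-dev", "libbz2-dev", "liblzma-dev"],
}


def install_command(manager):
    """Return the install command (as an argument list) for a package manager."""
    if manager not in PACKAGES:
        raise ValueError(f"unsupported package manager: {manager}")
    # Assumption: the install is run with sudo and non-interactively (-y).
    return ["sudo", manager, "install", "-y", *PACKAGES[manager]]


def detect_manager():
    """Return the first supported package manager found on PATH, or None."""
    for manager in PACKAGES:
        if shutil.which(manager):
            return manager
    return None


if __name__ == "__main__":
    found = detect_manager()
    if found is None:
        print("Neither yum nor apt-get was found on PATH.")
    else:
        # Only print the command; running it is left to the user.
        print(" ".join(install_command(found)))
```

The script only prints the command so that it can be reviewed before anything is installed.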

 

Perl 5.6.1 or higher. Version 5.8 or higher is highly recommended; modules are tested against version 5.8 and above.

BioPerl, Perl module YAML, Glib version 3.20 or higher, KentLib, libpng, ncurses

GNU make and a C compiler (e.g. gcc or clang), Boost C++ libraries (version 1.47 or higher)
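Part of this checklist can be checked before re-running the installer. The sketch below covers only the Perl version and the build tools; it does not check BioPerl, KentLib, Boost, or the other libraries. All function names are hypothetical, and it assumes `perl`, `make`, `gcc`, and `clang` would be found on `PATH` under those names.

```python
import re
import shutil
import subprocess

MINIMUM = (5, 6, 1)      # "Perl 5.6.1 or higher"
RECOMMENDED = (5, 8, 0)  # "Version 5.8 or higher is highly recommended"


def parse_version(text):
    """Extract a (major, minor, patch) tuple from text such as 'v5.16.3'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def classify_perl(version):
    """Compare a version tuple against the thresholds in the checklist."""
    if version < MINIMUM:
        return "too old"
    if version < RECOMMENDED:
        return "supported, upgrade recommended"
    return "ok"


def installed_perl_version():
    """Ask perl for its version; return None when perl is not on PATH."""
    if shutil.which("perl") is None:
        return None
    result = subprocess.run(
        ["perl", "-e", 'printf "%vd", $^V'],
        capture_output=True, text=True, check=True,
    )
    return parse_version(result.stdout)


def missing_build_tools():
    """Return the names of build tools from the checklist that are not on PATH."""
    missing = []
    if shutil.which("make") is None:
        missing.append("make")
    if not any(shutil.which(compiler) for compiler in ("gcc", "clang")):
        missing.append("C compiler (gcc or clang)")
    return missing


if __name__ == "__main__":
    version = installed_perl_version()
    print("perl:", "not found" if version is None else classify_perl(version))
    print("missing build tools:", missing_build_tools() or "none")
```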

 

6.2 Configuration

- Users can enter multiple values for 'TARGET_DOMAIN_ID' and 'TARGET_DOMAIN_NAME', separated by commas, but the target domain names must be distinct.
   ex) $TARGET_DOMAIN_ID = "PF00319,cd00265"
      $TARGET_DOMAIN_NAME = "SRF,MEF"

- To obtain a final gene model that includes the existing gene model of the target genome, users need to provide CDS_OF_TARGET_GENOME and GFF3_OF_TARGET_GENOME.

- IDs in the peptide (not RESOURCE_PROTEIN), tsv, coding DNA sequence (CDS), and gff3 files must match. If users use merged peptide and tsv files from genomes including the target and allied species, the IDs in the peptide and tsv files must include the IDs in the CDS and gff3 files.

- If users do not provide RNA-Seq information, ISGAP and related analyses are not performed, and the final gene model is generated by combining protein mapping and Augustus annotation only.

- For annotation of genes with short target domains, such as C2H2 zinc finger, we recommend that users set HMM_MATRIX_NAME to the HMM matrix for the specific domain ID(s). For example, we used PF00096.hmm from the Pfam HMM matrix to annotate C2H2 zinc finger gene families in animal genomes.
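Two of the rules above are easy to check before starting a run: the domain name list must be distinct, and the merged peptide and tsv IDs must include the CDS and gff3 IDs. The sketch below is an illustration, not TGFam-Finder code. The function names are hypothetical, and it assumes that IDs and names are paired by position and therefore must be equal in number, which the text implies by its example but does not state.

```python
def parse_target_domains(domain_ids, domain_names):
    """Split the comma-delimited values and pair each ID with its name."""
    ids = [item.strip() for item in domain_ids.split(",") if item.strip()]
    names = [item.strip() for item in domain_names.split(",") if item.strip()]
    # Assumption: the n-th name labels the n-th domain ID.
    if len(ids) != len(names):
        raise ValueError(f"{len(ids)} domain IDs but {len(names)} domain names")
    duplicates = sorted({name for name in names if names.count(name) > 1})
    if duplicates:
        raise ValueError(f"target domain names must be distinct: {duplicates}")
    return list(zip(ids, names))


def ids_not_covered(required_ids, available_ids):
    """Return required IDs (e.g. from CDS/gff3) absent from the peptide/tsv IDs."""
    available = set(available_ids)
    return sorted(set(required_ids) - available)


if __name__ == "__main__":
    # Values taken from the example in the text.
    print(parse_target_domains("PF00319,cd00265", "SRF,MEF"))
```

An empty list from `ids_not_covered` means every CDS or gff3 ID also appears in the peptide and tsv files.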

 

 

6.3 Cautions for preparation of genomic resources

- The gff3 file should be simplified. As in the capture below, the third column for coding genes should contain gene, mRNA, and CDS, and the last column should include only the ID, such as 'ID=XP00000'.

[Figure: capture of a simplified gff3 file]

- RESOURCE_PROTEIN in RESOURCE.config refers to the combined target protein sequences from multiple genomes, which users must prepare themselves. For example, we extracted and used as RESOURCE_PROTEIN the protein sequences containing FAR1 (PF03101) and NB-ARC (PF00931) in plant genomes, as well as C2H2 zinc finger (PF00096) and homeobox (PF00046) in animal genomes.

- The IDs in the protein and tsv files of the target or allied species should be shorter than 25 characters.
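The gff3 simplification and the ID length limit can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, the input is assumed to be standard 9-column tab-separated gff3, and features other than gene, mRNA, and CDS are simply dropped. The text does not say how parent and child features should be linked once only the ID is kept, so a line without an `ID=` attribute raises an error here rather than being guessed at.

```python
KEPT_FEATURES = {"gene", "mRNA", "CDS"}
MAX_ID_LENGTH = 24  # "shorter than 25 characters"


def simplify_gff3_line(line):
    """Return the line with only its ID attribute, or None if it is dropped."""
    stripped = line.rstrip("\n")
    if not stripped.strip() or stripped.startswith("#"):
        return None  # blank lines and comments/directives are dropped
    fields = stripped.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns: {stripped!r}")
    if fields[2] not in KEPT_FEATURES:
        return None  # assumption: other feature types (exon, UTR, ...) are dropped
    attributes = [item.strip() for item in fields[8].split(";")]
    ids = [item for item in attributes if item.startswith("ID=")]
    if not ids:
        raise ValueError(f"no ID= attribute in: {stripped!r}")
    fields[8] = ids[0]  # keep only the ID, e.g. 'ID=XP00000'
    return "\t".join(fields)


def too_long_ids(ids):
    """Return the IDs that violate the length limit."""
    return [identifier for identifier in ids if len(identifier) > MAX_ID_LENGTH]


if __name__ == "__main__":
    example = "chr1\tsrc\tmRNA\t1\t100\t.\t+\t.\tID=XP00000;Name=demo\n"
    print(simplify_gff3_line(example))
```

Check the simplified output against the capture above before using it, since the expected relationship between gene, mRNA, and CDS IDs is not spelled out in the text.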
 
