Downloads
Summary
This TSV file contains a unique identifier for each family (e.g., F000292, F000581), the classification type of the family, the number of members within the family, and the counts of datasets and scaffolds associated with the family.
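As a minimal sketch of how a tab-separated file like this could be loaded, the example below uses Python's standard csv module. The column names (family_id, type, members, datasets, scaffolds) and all values in the sample are assumptions made up for illustration; check the header row of the downloaded file for the real names.

```python
import csv
import io

def read_tsv(handle):
    """Return the rows of a tab-separated file as a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle, delimiter="\t"))

# Hypothetical sample: column names and values are invented, not taken from the real file.
sample = (
    "family_id\ttype\tmembers\tdatasets\tscaffolds\n"
    "F000292\texample_type\t120\t15\t98\n"
    "F000581\texample_type\t45\t7\t40\n"
)

rows = read_tsv(io.StringIO(sample))

# All values are read as strings, so convert before comparing numerically.
largest = max(rows, key=lambda row: int(row["members"]))
print(largest["family_id"], largest["members"])
```

For the real download, an open file handle can be passed in place of the in-memory sample, for example open(path, newline="") with path pointing at the saved file.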
Metadata
This TSV file contains metadata for genomic families. Each row represents a family and includes details such as the number and percentage of metagenomes, metatranscriptomes, and isolates. Additionally, it provides information on the distribution of these families across different environments, including aerial roots, bulbs, endospheres, nodules, rhizoplanes, rhizospheres, stems, stem tubers, and unclassified regions.
Domains
This TSV file contains PFAM domain annotations for each family. Each row includes information about the PFAM hit, the start and end positions of the hidden Markov model (HMM) alignment, and the corresponding genomic start and end positions. Additionally, it provides an accuracy score for the alignment.
Sequences
This TSV file contains representative sequence data for each family. Each row includes the representative sequence length, the average length of sequences in that family, the header information, and the sequence itself.
Families
This archive contains FASTA files for all protein families. Each file corresponds to a specific family and includes its protein sequences in standard FASTA format. These sequences can be used for downstream analyses such as alignment, annotation, and phylogenetic reconstruction.
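A minimal sketch of reading one of these FASTA files with only the Python standard library is shown below. The function name read_fasta and the sample records are hypothetical; the headers in the real files may follow a different layout.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)

# Hypothetical two-record sample with invented headers and sequences.
sample = ">seq1 hypothetical header\nMKT\nAYI\n>seq2\nMSLL\n"

records = list(read_fasta(io.StringIO(sample)))
lengths = {header: len(sequence) for header, sequence in records}
print(lengths)
```

Established libraries such as Biopython offer more complete FASTA handling; the hand-written parser here only keeps the example free of dependencies.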
Families Aligned
This archive contains FASTA files of aligned sequences for all protein families, where each file includes the multiple sequence alignment of the representative protein sequences within a family.
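Because every sequence in a multiple sequence alignment has the same length, a quick sanity check after loading an aligned file is to confirm that and look at how gappy each column is. The sketch below assumes gaps are written as "-" or "."; the function name and the toy alignment are hypothetical.

```python
def gap_fraction_per_column(aligned, gap_chars="-."):
    """Return the fraction of gap characters in each alignment column.

    `aligned` is a list of equal-length aligned sequence strings.
    """
    if not aligned:
        return []
    width = len(aligned[0])
    if any(len(sequence) != width for sequence in aligned):
        raise ValueError("aligned sequences must all have the same length")
    count = len(aligned)
    return [
        sum(sequence[i] in gap_chars for sequence in aligned) / count
        for i in range(width)
    ]

# Toy alignment with invented sequences, for illustration only.
toy_alignment = ["MK-TA", "MKLTA", "M--TA"]
fractions = gap_fraction_per_column(toy_alignment)
print([round(value, 2) for value in fractions])
```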
HMM
This archive contains HMM profile files for all protein families, representing probabilistic models built from the aligned sequences of each family for use in sequence similarity searches and annotation.
PDB
This archive contains predicted protein structure files in PDB format for all protein families, with each file corresponding to the representative 3D structure of a family's protein sequence.
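As a rough sketch of working with these files, the example below extracts alpha-carbon (CA) coordinates from ATOM records using the fixed column layout of the PDB format (atom name in columns 13-16, x, y, z in columns 31-54). The function name and the three sample records are hypothetical, with invented coordinates.

```python
import io
import math

def read_ca_coordinates(handle):
    """Return (x, y, z) tuples for every CA atom in the ATOM records of a PDB file."""
    coordinates = []
    for line in handle:
        # PDB is a fixed-width format, so fields are sliced by column, not split on spaces.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            coordinates.append((x, y, z))
    return coordinates

# Hypothetical records with invented coordinates, laid out in PDB fixed-width columns.
sample = (
    "ATOM      1  N   MET A   1      10.000  20.000  30.000  1.00 90.00           N\n"
    "ATOM      2  CA  MET A   1      11.000  22.000  33.000  1.00 90.00           C\n"
    "ATOM      3  CA  GLY A   2      14.800  22.000  33.000  1.00 85.00           C\n"
)

ca_atoms = read_ca_coordinates(io.StringIO(sample))

# Distance between the two consecutive CA atoms (math.dist requires Python 3.8+).
distance = math.dist(ca_atoms[0], ca_atoms[1])
print(len(ca_atoms), round(distance, 2))
```

Dedicated structure libraries handle chains, alternate locations, and multi-model files more robustly; the slicing approach above is only meant to show where the coordinates sit in each record.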