ensemble
Ensemble class
- class dpet.ensemble.Ensemble(code: str, data_path: str = None, top_path: str = None, database: str = None, chain_id: str = None, residue_range: Tuple = None)
Bases:
object
Represents a molecular dynamics ensemble.
- Parameters:
code (str) – The code identifier of the ensemble.
data_path (str, optional) – The path to the data associated with the ensemble. It can be a single multi-model PDB file, a folder containing one PDB file per model, or a .xtc or .dcd trajectory file. Default is None.
top_path (str, optional) – The path to the topology file associated with the ensemble. Used when data_path points to a trajectory file (.xtc or .dcd). Default is None.
database (str, optional) – The database from which to download the ensemble. Options are ‘ped’ and ‘atlas’. Default is None.
chain_id (str, optional) – Chain identifier used to select a single chain to analyze in case multiple chains are loaded. Default is None.
residue_range (Tuple, optional) – A tuple indicating the start and end of the residue range (inclusive), using 1-based indexing. Default is None.
Notes
If the database is ‘atlas’, the ensemble code should be provided as a PDB ID with a chain identifier separated by an underscore. Example: ‘3a1g_B’.
If the database is ‘ped’, the ensemble code should be in the PED ID format, which consists of a string starting with ‘PED’ followed by a numeric identifier, and ‘e’ followed by another numeric identifier. Example: ‘PED00423e001’.
The residue_range parameter uses 1-based indexing, meaning the first residue is indexed as 1.
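The code formats and the 1-based residue range described in these notes can be sketched with a few standalone helpers. This is a minimal illustration, not part of dpet: the function names `guess_database` and `residue_range_to_slice` are hypothetical, and the regular expressions are inferred only from the two example codes given above.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns, inferred only from the examples in the notes:
# a 4-character PDB ID, an underscore, then a chain identifier ('3a1g_B'),
# and 'PED' + digits + 'e' + digits ('PED00423e001').
ATLAS_CODE = re.compile(r"^[0-9A-Za-z]{4}_[0-9A-Za-z]+$")
PED_CODE = re.compile(r"^PED\d+e\d+$")


def guess_database(code: str) -> Optional[str]:
    """Return 'ped', 'atlas', or None depending on the shape of the code."""
    if PED_CODE.match(code):
        return "ped"
    if ATLAS_CODE.match(code):
        return "atlas"
    return None


def residue_range_to_slice(residue_range: Tuple[int, int]) -> slice:
    """Convert a 1-based inclusive (start, end) range to a 0-based slice."""
    start, end = residue_range
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Subtract 1 from the start; the inclusive end becomes the exclusive stop.
    return slice(start - 1, end)
```

For a 10-residue chain numbered 1 to 10, `residue_range_to_slice((2, 4))` selects residues 2, 3, and 4.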
- extract_features(featurization: str, *args, **kwargs)
Extract features from the trajectory using the specified featurization method.
- Parameters:
featurization (str) – The method to use for feature extraction. Supported options: ‘ca_dist’, ‘phi_psi’, ‘a_angle’, ‘tr_omega’, ‘tr_phi’, and ‘ca_phi_psi’.
min_sep (int, optional) – The minimum sequence separation for angle calculations. Required for certain featurization methods.
max_sep (int, optional) – The maximum sequence separation for angle calculations. Required for certain featurization methods.
Notes
This method extracts features from the trajectory using the specified featurization method and updates the ensemble’s features attribute.
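As a rough illustration of what a sequence-separation filter does, the sketch below enumerates residue index pairs whose separation lies between min_sep and max_sep. It is an assumption about the general idea, not dpet's implementation: the helper name `select_pairs`, its default values, and the inclusive bounds are all hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Tuple


def select_pairs(n_residues: int, min_sep: int = 2,
                 max_sep: Optional[int] = None) -> List[Tuple[int, int]]:
    """List 0-based residue pairs (i, j), i < j, with min_sep <= j - i <= max_sep.

    A max_sep of None means no upper bound (an assumption of this sketch).
    """
    pairs = []
    for i, j in combinations(range(n_residues), 2):
        sep = j - i
        if sep < min_sep:
            continue
        if max_sep is not None and sep > max_sep:
            continue
        pairs.append((i, j))
    return pairs
```

With four residues and `min_sep=2`, adjacent residues are skipped and only the pairs (0, 2), (0, 3), and (1, 3) remain.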
- get_chains_from_pdb()
Extracts unique chain IDs from a PDB file.
- Raises:
FileNotFoundError – If the specified PDB file or directory does not exist, or if no PDB file is found in the directory.
ValueError – If the specified file is not a PDB file and the path is not a directory.
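Collecting unique chain IDs from PDB text can be sketched as follows. This is a standalone illustration that relies only on the fixed-column PDB format, where ATOM and HETATM records store the chain identifier in column 22. The helper name `chain_ids_from_pdb_lines` is hypothetical, and the sketch does not reproduce the file and directory checks listed above.

```python
from typing import Iterable, List


def chain_ids_from_pdb_lines(lines: Iterable[str]) -> List[str]:
    """Collect unique chain IDs in order of first appearance."""
    seen: List[str] = []
    for line in lines:
        # Only coordinate records carry a chain ID; column 22 is index 21.
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chain = line[21]
            # Skip blank chain fields and chains already recorded.
            if chain.strip() and chain not in seen:
                seen.append(chain)
    return seen
```

To apply it to a file, pass an open file handle, for example `chain_ids_from_pdb_lines(open(path))`, where `path` is whatever PDB file is being inspected.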
- get_features(featurization: str, normalize: bool = False, *args, **kwargs) → Sequence
Get features from the trajectory using the specified featurization method.
- Parameters:
featurization (str) – The method to use for feature extraction. Supported options: ‘ca_dist’, ‘phi_psi’, ‘a_angle’, ‘tr_omega’, ‘tr_phi’, ‘rg’, ‘prolateness’, ‘asphericity’, ‘sasa’, ‘end_to_end’.
min_sep (int) – The minimum sequence separation for angle calculations.
max_sep (int) – The maximum sequence separation for angle calculations.
- Returns:
features – The extracted features.
- Return type:
Sequence
Notes
This method extracts features from the trajectory using the specified featurization method.
- get_num_residues()
- load_trajectory(data_dir: str)
Load a trajectory for the ensemble.
- Parameters:
data_dir (str) – The directory where the trajectory data is located or where generated trajectory files will be saved.
Notes
This method loads a trajectory for the ensemble from its data path. Supported formats are PDB, DCD, and XTC. If the data path is a directory, the method searches it for PDB files and builds a trajectory from them. If the data path is a single PDB file, that file is loaded as the trajectory. If the data path is a DCD or XTC file, it is loaded together with the corresponding topology file (top_path). After loading, the method checks for coarse-grained models, selects a single chain if applicable, and selects the residues of interest (for example, those in residue_range).
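The branching on the data path described above can be sketched as a small dispatcher. This is an illustration under stated assumptions, not dpet's code: the function name `classify_data_path`, the returned labels, and the choice of exceptions are hypothetical, and the sketch only decides which loading branch applies without reading any coordinates.

```python
from pathlib import Path
from typing import Optional


def classify_data_path(data_path: str, top_path: Optional[str] = None) -> str:
    """Decide which loading branch applies to a data path."""
    path = Path(data_path)
    if path.is_dir():
        # Directory: expect one PDB file per model.
        if not any(path.glob("*.pdb")):
            raise FileNotFoundError(f"no PDB files found in {path}")
        return "pdb_directory"
    suffix = path.suffix.lower()
    if suffix == ".pdb":
        # Single PDB file, possibly holding several models.
        return "single_pdb"
    if suffix in (".xtc", ".dcd"):
        # Binary trajectories carry coordinates only, so a topology is needed.
        if top_path is None:
            raise ValueError("a topology file is required for .xtc/.dcd files")
        return "trajectory_with_topology"
    raise ValueError(f"unsupported data path: {data_path}")
```

Keeping the decision separate from the actual file reading makes each branch easy to check without trajectory data on disk.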
- normalize_features(mean: float, std: float)
Normalize the extracted features using the provided mean and standard deviation.
- Parameters:
mean (float) – The mean value used for normalization.
std (float) – The standard deviation used for normalization.
Notes
This method normalizes the ensemble’s features using the provided mean and standard deviation.
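Normalization with a given mean and standard deviation is commonly a z-score transform, (x - mean) / std. The sketch below assumes that reading; the helper name `zscore` and the nested-list representation of the feature matrix are hypothetical and chosen only to keep the example dependency-free.

```python
from typing import List, Sequence


def zscore(features: Sequence[Sequence[float]],
           mean: float, std: float) -> List[List[float]]:
    """Apply (x - mean) / std to every value of a frames-by-features table."""
    if std == 0:
        raise ValueError("std must be non-zero")
    return [[(x - mean) / std for x in row] for row in features]
```

Passing the same mean and std to several ensembles puts their features on a common scale, which is the usual reason for supplying the statistics explicitly rather than computing them per ensemble.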
- random_sample_trajectory(sample_size: int)
Randomly sample frames from the original trajectory.
- Parameters:
sample_size (int) – The number of frames to sample from the original trajectory.
Notes
This method samples frames randomly from the original trajectory and updates the ensemble’s trajectory attribute.
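Random frame selection can be sketched as drawing frame indices and then indexing the trajectory with them. The sketch assumes sampling without replacement and sorts the result to preserve frame order; both choices, along with the helper name `sample_frame_indices` and the `seed` parameter, are assumptions of this illustration rather than documented behavior.

```python
import random
from typing import List, Optional


def sample_frame_indices(n_frames: int, sample_size: int,
                         seed: Optional[int] = None) -> List[int]:
    """Draw sample_size distinct frame indices from range(n_frames)."""
    if sample_size > n_frames:
        raise ValueError("sample_size cannot exceed the number of frames")
    rng = random.Random(seed)  # local generator, so global state is untouched
    # Sorting keeps the sampled frames in their original order.
    return sorted(rng.sample(range(n_frames), sample_size))
```

Fixing the seed makes the subsample reproducible across runs, which helps when comparing analyses of the same ensemble.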