Utilities for Provis
All of these files contain scripts and helper methods that are used by the core classes of provis.
provis.utils.atminfo module
- provis.utils.atminfo.import_atm_mass_info()
Function to load the dictionary storing atomic mass information by atom type.
- Returns:
- dict
dictionary of atomic mass by atom name
- provis.utils.atminfo.import_atm_size_info(vw=False)
Function to load dictionaries storing atomic radii, color coding and Van der Waals radii by atom name.
Coloring from: https://sciencenotes.org/molecule-atom-colors-cpk-colors/
- Parameters:
- vw: bool, optional
Option to return the Van der Waals radius. Default: False.
- Returns:
- dict
dictionary of atomic radius by atom name
- dict
dictionary of color by atom name
- dict
dictionary of Van der Waals radius by atom name
- provis.utils.atminfo.import_res_size_info()
Function to load dictionaries storing residue radii and color coding by residue name.
Coloring from: http://acces.ens-lyon.fr/biotic/rastop/help/colour.htm
- Returns:
- dict
dictionary of radius by residue name
- dict
dictionary of color by residue name
provis.utils.charges_utils module
Utility functions for computing hydrogen bonds and electrostatics on protein surface.
- provis.utils.charges_utils.compute_angle_deviation(a: numpy.ndarray, b: numpy.ndarray, c: numpy.ndarray, theta: float) -> float
Computes the absolute deviation from theta of the angle defined by the three points a, b and c.
- Parameters:
- a: np.ndarray
Coordinate vector of the first point.
- b: np.ndarray
Coordinate vector of the second point.
- c: np.ndarray
Coordinate vector of the third point.
- theta: float
Angle to compute deviation with respect to.
- Returns:
- float
absolute deviation of the angle formed by a, b, c with theta
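As an illustration of the quantity described above, the following minimal sketch measures the angle at b between the segments from b to a and from b to c, then returns its absolute difference from theta. Placing the vertex at b and expressing theta in radians are assumptions, and the name `angle_deviation_sketch` is hypothetical; this is not taken from the provis source.

```python
import numpy as np


def angle_deviation_sketch(a, b, c, theta):
    """Absolute difference between the angle a-b-c (vertex at b) and theta, in radians."""
    ba = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bc = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos_angle = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc))
    # Clip to guard against round-off pushing the cosine outside [-1, 1].
    angle = np.arccos(np.clip(cos_angle, -1.0, 1.0))
    return float(abs(angle - theta))


# A right angle compared against theta = pi / 2 gives no deviation.
right_angle_dev = angle_deviation_sketch([1, 0, 0], [0, 0, 0], [0, 1, 0], np.pi / 2)
```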
- provis.utils.charges_utils.compute_angle_penalty(angle_deviation: float) -> float
Computes the penalty corresponding to a given angle deviation.
- Parameters:
- angle_deviation: float
Angle of deviation.
- Returns:
- float
Angle penalty.
- provis.utils.charges_utils.compute_hbond_helper(atom_name: str, res: Bio.PDB.Residue.Residue, v: numpy.ndarray) -> float
Helper function. Computes the hydrogen bond for the given atom.
- Parameters:
- atom_name: str
Name of atom.
- res: Residue
Residue corresponding to atom.
- v: np.ndarray
Vertices.
- Returns:
- float
hydrogen bonds of the atom
- provis.utils.charges_utils.compute_plane_deviation(a: numpy.ndarray, b: numpy.ndarray, c: numpy.ndarray, d: numpy.ndarray) -> float
Computes the absolute plane deviation of the four points a, b, c and d.
- Parameters:
- a: np.ndarray
Coordinate vector of the first point.
- b: np.ndarray
Coordinate vector of the second point.
- c: np.ndarray
Coordinate vector of the third point.
- d: np.ndarray
Coordinate vector of the fourth point.
- Returns:
- float
absolute plane deviation of the points a, b, c, d
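One common way to quantify how far four points a, b, c, d are from lying in a single plane is through the dihedral angle about the b-c axis: a dihedral of 0 or pi means the points are coplanar. The sketch below follows that idea. Whether provis computes the deviation this way is an assumption, and `plane_deviation_sketch` is a hypothetical name.

```python
import numpy as np


def plane_deviation_sketch(a, b, c, d):
    """Deviation from planarity of four points, via the dihedral angle about b-c."""
    a, b, c, d = (np.asarray(p, dtype=float) for p in (a, b, c, d))
    axis = (c - b) / np.linalg.norm(c - b)
    # Project the two outer segments onto the plane perpendicular to the b-c axis.
    v = (a - b) - np.dot(a - b, axis) * axis
    w = (d - c) - np.dot(d - c, axis) * axis
    dihedral = np.arctan2(np.dot(np.cross(axis, v), w), np.dot(v, w))
    # A dihedral of 0 or pi means coplanar; report the distance to the nearer one.
    return float(min(abs(dihedral), np.pi - abs(dihedral)))


coplanar_dev = plane_deviation_sketch([1, 0, 0], [0, 0, 0], [0, 1, 0], [1, 1, 0])
twisted_dev = plane_deviation_sketch([1, 0, 0], [0, 0, 0], [0, 1, 0], [0, 1, 1])
```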
- provis.utils.charges_utils.compute_satisfied_CO_HN(atoms)
Compute the list of backbone C=O:H-N that are satisfied. These will be ignored.
- Parameters:
- atoms: BioPython atoms
list of atoms to be checked.
- Returns:
- set
set of C=O bonds
- set
set of H-N bonds
- provis.utils.charges_utils.is_acceptor_atom(atom_name: str, res: Bio.PDB.Residue.Residue) -> bool
Check if atom is acceptor atom.
- Parameters:
- atom_name: str
Name of the atom
- res: Residue
The corresponding residue
- Returns:
- bool
True if conditions met
- provis.utils.charges_utils.is_polar_hydrogen(atom_name: str, res_name: str) -> bool
Check if the given atom of a residue is a polar hydrogen.
- Parameters:
- atom_name: str
Name of the atom
- res_name: str
Residue name
- Returns:
- bool
True if conditions met
- provis.utils.charges_utils.normalize_electrostatics(in_elec: numpy.ndarray) -> numpy.ndarray
Normalizes charges on the surface by clipping them to upper and lower thresholds and rescaling all values to the range [-1, 1].
- Parameters:
- in_elec: np.ndarray
Input charges for all surface vertices
- Returns:
- np.ndarray
Normalized surface vertex charges
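The clip-then-rescale step can be sketched as follows. The thresholds of -3 and 3 are assumed values chosen for illustration (the documentation above does not state them), and `normalize_electrostatics_sketch` is a hypothetical name rather than the provis function itself.

```python
import numpy as np


def normalize_electrostatics_sketch(in_elec, lower=-3.0, upper=3.0):
    """Clip values to [lower, upper], then map that interval linearly onto [-1, 1]."""
    # lower and upper are assumed thresholds, not values confirmed by provis.
    elec = np.clip(np.asarray(in_elec, dtype=float), lower, upper)
    return 2.0 * (elec - lower) / (upper - lower) - 1.0


scaled = normalize_electrostatics_sketch(np.array([-10.0, -3.0, 0.0, 1.5, 7.0]))
```

Values outside the thresholds saturate at -1 or 1, so a few extreme vertices do not compress the color range of the rest of the surface.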
provis.utils.surface_feat module
https://github.com/bunnech/holoprot/blob/main/holoprot/feat/surface.py is the base for this file. Modifications were made.
Functions to compute features for a patch on the protein surface.
Some of these are borrowed from https://github.com/LPDI-EPFL/masif under the https://github.com/LPDI-EPFL/masif/blob/master/LICENSE license
- provis.utils.surface_feat.assign_props_to_new_mesh(new_vertices, old_vertices: numpy.ndarray, old_props: numpy.ndarray, feature_interpolation: bool = True) -> numpy.ndarray
Assigns properties to vertices in the modified mesh given the initial mesh. The assignment is carried out using a KDTree data structure to query nearest points.
- Parameters:
- new_vertices: np.ndarray
Vertices on the modified mesh
- old_vertices: np.ndarray
Vertices on the original mesh
- old_props: np.ndarray
Property values for each vertex on the original mesh
- feature_interpolation: bool, (default True)
If set to True interpolates features to new vertices.
- Returns:
- np.ndarray
Property values for vertices on the modified mesh
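To make the nearest-point assignment concrete, here is a small sketch that copies each new vertex's property from the closest original vertex. It deliberately swaps the KDTree for a brute-force distance matrix so that it depends only on NumPy, and it does not attempt the interpolation controlled by feature_interpolation. The function name is hypothetical.

```python
import numpy as np


def assign_props_nearest_sketch(new_vertices, old_vertices, old_props):
    """Copy each new vertex's property from its nearest vertex on the old mesh."""
    new_vertices = np.asarray(new_vertices, dtype=float)
    old_vertices = np.asarray(old_vertices, dtype=float)
    # Pairwise squared distances, shape (n_new, n_old). A KDTree avoids building
    # this full matrix, which matters for meshes with many vertices.
    diff = new_vertices[:, None, :] - old_vertices[None, :, :]
    sq_dist = np.einsum("ijk,ijk->ij", diff, diff)
    nearest = np.argmin(sq_dist, axis=1)
    return np.asarray(old_props)[nearest]


old_vertices = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
old_props = [0.2, 0.8]
new_vertices = [[0.1, 0.0, 0.0], [0.9, 0.1, 0.0], [2.0, 0.0, 0.0]]
new_props = assign_props_nearest_sketch(new_vertices, old_vertices, old_props)
```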
- provis.utils.surface_feat.compute_charges(vertices: numpy.ndarray, pdb_id: str, path: str) -> numpy.ndarray
Computes electrostatics for the surface vertices. The function first calls the PDB2PQR executable to prepare the PDB file for electrostatics. Poisson-Boltzmann electrostatics are then computed using the APBS executable. Multivalue, provided within the APBS suite, is used to assign charges to each vertex. The charges are then normalized.
- Parameters:
- vertices: np.ndarray
Surface vertex coordinates
- pdb_id: str
PDB ID of the protein
- path: str
Path of the pqr file in the form {path}.pqr
- Returns:
- np.ndarray
Charge values for each vertex
- provis.utils.surface_feat.compute_hbonds(vertices: numpy.ndarray, residues: List[Bio.PDB.Residue.Residue], names: List[str]) -> numpy.ndarray
Compute H-bond (hydrogen-bond) induced charges at every vertex.
- Parameters:
- vertices: np.ndarray
Vertices of mesh
- residues: List[Residue]
List of residues to compute
- names: List[str]
List of custom names created by output_pdb_as_xyzrn()
- Returns:
- np.ndarray
Array of bonds, by vertex
- provis.utils.surface_feat.compute_hydrophobicity(names: List[str]) -> numpy.ndarray
Computes the hydrophobicity value for all vertices on the surface. Each surface vertex has a mapping to the corresponding residue from the original protein. This is used to assign a hydrophobicity value to each vertex using the Kyte-Doolittle scale.
- Parameters:
- names: List[str]
Identifier names for each vertex in the surface
- Returns:
- np.ndarray
Hydrophobicity values for each surface vertex
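The per-vertex lookup on the Kyte-Doolittle scale can be sketched as below. The identifier layout, with underscore-separated fields and the residue name in the fourth field, is an assumption made for illustration, as are the example identifiers and the name `hydrophobicity_sketch`. The sketch returns a plain list rather than an array.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "ILE": 4.5, "VAL": 4.2, "LEU": 3.8, "PHE": 2.8, "CYS": 2.5,
    "MET": 1.9, "ALA": 1.8, "GLY": -0.4, "THR": -0.7, "SER": -0.8,
    "TRP": -0.9, "TYR": -1.3, "PRO": -1.6, "HIS": -3.2, "GLU": -3.5,
    "GLN": -3.5, "ASP": -3.5, "ASN": -3.5, "LYS": -3.9, "ARG": -4.5,
}


def hydrophobicity_sketch(names):
    """Look up the Kyte-Doolittle value of the residue named in each identifier."""
    values = []
    for name in names:
        res_name = name.split("_")[3]  # assumed position of the residue name
        values.append(KYTE_DOOLITTLE[res_name])
    return values


# Hypothetical vertex identifiers in the assumed layout.
values = hydrophobicity_sketch(["A_12_x_ILE_CA_Green", "A_13_x_ARG_NH1_Blue"])
```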
- provis.utils.surface_feat.compute_shape_index(mesh) -> numpy.ndarray
Computes shape index for the patches. Shape index characterizes the shape around a point on the surface, computed using the local curvature around each point. These values are derived using Trimesh’s available geometric processing functionality.
- Parameters:
- mesh: Trimesh
The mesh is constructed using information about vertices and faces.
- Returns:
- np.ndarray
Shape index for each vertex
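For readers unfamiliar with the shape index, the sketch below shows one common formulation: the principal curvatures k1 >= k2 are recovered from the mean curvature H and Gaussian curvature K, and the index is (2 / pi) * arctan((k1 + k2) / (k1 - k2)). The curvatures are taken as inputs here instead of being estimated from a mesh, the sign convention is an assumption (it varies between references), and the function name is hypothetical.

```python
import math


def shape_index_sketch(mean_curv, gauss_curv, eps=1e-8):
    """Shape index in [-1, 1] from mean (H) and Gaussian (K) curvature at one vertex."""
    # Principal curvatures are H +/- sqrt(H^2 - K); clamp the discriminant so that
    # round-off or an umbilic point (k1 == k2) does not cause a division by zero.
    disc = max(mean_curv ** 2 - gauss_curv, eps)
    k1 = mean_curv + math.sqrt(disc)
    k2 = mean_curv - math.sqrt(disc)
    return (2.0 / math.pi) * math.atan((k1 + k2) / (k1 - k2))


saddle = shape_index_sketch(0.0, -1.0)  # k1 = 1, k2 = -1
ridge = shape_index_sketch(0.5, 0.0)    # k1 = 1, k2 = 0
```

Under this convention a symmetric saddle sits at 0 and the two extremes, near -1 and 1, correspond to sphere-like points of opposite curvature sign.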
- provis.utils.surface_feat.compute_surface_features(surface: Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, List[str], Dict[str, str]], pdb_file: str, path: str, mesh=None, fix_mesh: bool = False, return_mesh: bool = False, pdb_id: Optional[str] = None) -> Tuple[numpy.ndarray]
Computes all patch features.
- Parameters:
- surface: Surface
Tuple of attributes characterizing the surface. These include vertices, faces, normals to each vertex, areas, and residue identifiers for vertices.
- pdb_file: str
PDB File containing the atomic coordinates
- path: str
Path to files without extensions. Usually data/tmp/{pdb_id}
- mesh: trimesh.Trimesh
The mesh.
- fix_mesh: bool, optional
Whether to fix the mesh by collapsing nodes and edges. Default: False.
- return_mesh: bool, optional
Whether to return the mesh. Default: False.
- pdb_id: str, optional
PDB id of the associated protein. Default: None
- Returns:
- np.ndarray
Shape index
- np.ndarray
Hydrogen bond induced charges
- np.ndarray
Hydrophobicity of each surface vertex
- np.ndarray
Electrostatics of each surface vertex
provis.utils.surface_utils module
https://github.com/bunnech/holoprot/blob/main/holoprot/utils/surface.py is the base for this file. Modifications were made.
Utilities for preparing and computing features on molecular surfaces.
- provis.utils.surface_utils.compute_normal(vertices: numpy.ndarray, faces: numpy.ndarray) -> numpy.ndarray
Compute normals for the vertices and faces.
- Parameters:
- vertices: np.ndarray
Vertices of the mesh
- faces: np.ndarray
Faces of the mesh
- Returns:
- np.ndarray:
Normals of the mesh
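A typical way to obtain per-vertex normals from a triangle mesh is to sum the normals of the faces touching each vertex and normalize the result. The sketch below does this with area weighting. Both the weighting scheme and the name `vertex_normals_sketch` are assumptions for illustration, not a description of what provis does internally.

```python
import numpy as np


def vertex_normals_sketch(vertices, faces):
    """Unit vertex normals obtained by summing the normals of adjacent faces."""
    vertices = np.asarray(vertices, dtype=float)
    faces = np.asarray(faces, dtype=int)
    v0, v1, v2 = (vertices[faces[:, i]] for i in range(3))
    # The cross product's length is twice the face area, so larger faces weigh more.
    face_normals = np.cross(v1 - v0, v2 - v0)
    normals = np.zeros_like(vertices)
    for corner in range(3):
        # np.add.at accumulates correctly when a vertex index appears more than once.
        np.add.at(normals, faces[:, corner], face_normals)
    lengths = np.linalg.norm(normals, axis=1, keepdims=True)
    return normals / np.maximum(lengths, 1e-12)


# A single counter-clockwise triangle in the xy-plane has normals along +z.
tri_normals = vertex_normals_sketch([[0, 0, 0], [1, 0, 0], [0, 1, 0]], [[0, 1, 2]])
```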
- provis.utils.surface_utils.crossp(x: numpy.ndarray, y: numpy.ndarray) -> numpy.ndarray
Computes the cross product of two numpy arrays.
- Parameters:
- x: np.ndarray
Array 1
- y: np.ndarray
Array 2
- Returns:
- np.ndarray:
(Array 1) x (Array 2)
- provis.utils.surface_utils.find_nearest_atom(coords, res_id, new_verts)
- provis.utils.surface_utils.fix_trimesh(mesh, resolution: float = 1.0)
Applies a predefined set of fixes to the mesh, and converts it to a specified resolution. These fixes include removing duplicated vertices within a certain threshold, removing degenerate triangles, splitting longer edges to a given target length, and collapsing shorter edges.
- Parameters:
- mesh: trimesh.Trimesh
Mesh
- resolution: float
Maximum size of edge in the mesh
- Returns:
- trimesh.Trimesh:
mesh with all fixes applied
- provis.utils.surface_utils.get_surface(out_path: str, density: float, center=[0, 0, 0])
Wrapper function that reads in the output from the MSMS executable to build the protein surface.
- Parameters:
- out_path: str
path to output (output path from namechecker) directory. Usually data/tmp
- density: float
Must be the same density as used by the MSMS binary, because the face and vert files have the density included in their names. The variable is needed for loading these files.
- center: List[float], optional
Center of the atom cloud. Easily passed from DataHandler._centroid. Default: [0, 0, 0].
- Returns:
- numpy.ndarray:
vertices
- numpy.ndarray:
faces
- numpy.ndarray:
vertex normals
- list:
list of res_id’s from output_pdb_as_xyzrn()
- dict:
dictionary: residues as keys, areas as values
- provis.utils.surface_utils.output_pdb_as_xyzrn(pdb_file: str, xyzrn_file: str) -> None
Converts a .pdb file to a .xyzrn file.
- Parameters:
- pdb_file: str
path to PDB File to convert (with extension)
- xyzrn_file: str
path to the xyzrn File (with extension)
- provis.utils.surface_utils.prepare_trimesh(vertices: numpy.ndarray, faces: numpy.ndarray, normals: Optional[numpy.ndarray] = None, apply_fixes: bool = False)
Prepare the mesh surface given vertices and faces. Optionally, compute normals and apply fixes to mesh.
- Parameters:
- vertices: np.ndarray
Surface vertices
- faces: np.ndarray
Triangular faces on the mesh
- normals: np.ndarray
Normals for each vertex
- apply_fixes: bool
Optional application of fixes to mesh. Check fix_trimesh for details on fixes. Default: False.
- Returns:
- trimesh.Trimesh:
Mesh
- provis.utils.surface_utils.read_msms(file_root: str) -> Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, List[str]]
Read surface constituents from output files generated using MSMS.
- Parameters:
- file_root: str
Root name for loading .face and .vert files (produced by MSMS). Default location is data/tmp/{pdb_id}s
- Returns:
- numpy.ndarray:
vertices
- numpy.ndarray:
faces
- numpy.ndarray:
vertex normals
- list:
list of res_id’s from output_pdb_as_xyzrn()
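To show roughly what reading the MSMS output involves, the sketch below parses the text of a .vert and a .face file. The layout is assumed: three header lines with the element count first on the third line, vertex rows holding coordinates, normals and an identifier in the tenth column, and face rows holding one-based vertex indices. The sample contents are made up, and the function names are hypothetical.

```python
def parse_msms_vert(text):
    """Parse the contents of an MSMS .vert file into coordinates, normals and names."""
    lines = text.rstrip().split("\n")
    n_vertices = int(lines[2].split()[0])  # third header line starts with the count
    coords, normals, names = [], [], []
    for line in lines[3:3 + n_vertices]:
        fields = line.split()
        coords.append([float(x) for x in fields[0:3]])
        normals.append([float(x) for x in fields[3:6]])
        names.append(fields[9])  # assumed column of the atom identifier
    return coords, normals, names


def parse_msms_face(text):
    """Parse the contents of an MSMS .face file into zero-based triangles."""
    lines = text.rstrip().split("\n")
    n_faces = int(lines[2].split()[0])
    # MSMS numbers vertices from 1, so shift to zero-based indices.
    return [[int(i) - 1 for i in line.split()[0:3]] for line in lines[3:3 + n_faces]]


# Made-up sample contents in the assumed layout.
SAMPLE_VERT = """# MSMS solvent excluded surface vertices (made-up sample)
#vertex #sphere density probe_r
      3      1  3.00  1.50
    0.000     0.000     0.000     0.000     0.000     1.000       0       1  2 A_1_x_ALA_N_Blue
    1.000     0.000     0.000     0.000     0.000     1.000       0       1  2 A_1_x_ALA_CA_Green
    0.000     1.000     0.000     0.000     0.000     1.000       0       1  2 A_1_x_ALA_CB_Green
"""
SAMPLE_FACE = """# MSMS solvent excluded surface faces (made-up sample)
#faces #sphere density probe_r
      1      1  3.00  1.50
     1      2      3  1      1
"""

coords, normals, names = parse_msms_vert(SAMPLE_VERT)
faces = parse_msms_face(SAMPLE_FACE)
```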
Module contents
- provis.utils.get_residues(pdb_file)
- provis.utils.str2bool(v: str) -> bool
Converts str to bool.
- Parameters:
- v: str
String element
- Returns:
- bool
boolean version of v
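A string-to-bool converter of this kind is typically used as the `type` of an argparse option. The sketch below shows that pattern. The accepted spellings, the `--solvent` flag and the name `str2bool_sketch` are all assumptions for illustration; the strings provis itself accepts are not documented above.

```python
import argparse


def str2bool_sketch(v):
    """Convert a command-line string such as "yes" or "0" to a bool."""
    if isinstance(v, bool):
        return v
    value = v.strip().lower()
    if value in ("yes", "true", "t", "y", "1"):
        return True
    if value in ("no", "false", "f", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"Boolean value expected, got {v!r}")


# Hypothetical flag, parsed from an explicit argument list.
parser = argparse.ArgumentParser()
parser.add_argument("--solvent", type=str2bool_sketch, default=False)
args = parser.parse_args(["--solvent", "yes"])
```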