provis.src.processing package

These classes constitute the “brain” of provis. All computation is done here (sometimes by calling the utils package).

The results/outputs of these classes are passed to the plotting classes for plotting.

provis.src.processing.data_handler module

class provis.src.processing.data_handler.DataHandler(nc, fc=None)

Bases: object

The ‘brain’ of provis, when it comes to handling atomic positions.

This class loads information from a variety of files and creates meshes to be plotted. Upper level classes - e.g. Protein - use DataHandler objects to create the meshes.

The DataHandler class loads atom-positional information from a .pdb file and from this information computes the necessary molecular structure mesh. It also loads pre-defined dictionaries from atminfo.py that encode the size, coloring and mass of a given atom or residue. The member functions range from loading the atoms from the .pdb file and storing them by type, to creating the meshes from this information, as well as calculating bonds or the backbone.

get_atom_mesh(atom_data, vw=0, probe=0, phi_res=10, theta_res=10)

Create a list of Spheres and colors representing each atom for plotting. Can later be added to a mesh for plotting.

The code in words: Iterates through the atom_data dictionary by atom type (from get_atoms()). It creates uniform Spheres (same size and color) in the position specified by the coordinates list for each atom type. Also differentiates between Van der Waals and normal radii and handles unknown atoms.

Parameters:

atom_data: dict: Dictionary of atoms and their coordinates, by atom type.
vw: bool, optional: When set to True, Van der Waals atomic radii are used instead of empirical radii. Default: False.
probe: int, optional: Size of probe (representing the solvent size) needed for surface calculation. Default: 0.
phi_res: int, optional: pyvista phi_resolution for Sphere objects representing atoms. Default: 10.
theta_res: int, optional: pyvista theta_resolution for Sphere objects representing atoms. Default: 10.

Returns:

list: List of pyvista Spheres representing each atom
list: List of colors corresponding to each atom
list: List of atom ID’s for each atom
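
The radius selection described above (Van der Waals versus empirical radii, with a fallback for unknown atoms) can be sketched in plain Python. The two tables and the pick_radius helper are hypothetical stand-ins for the dictionaries in atminfo.py. The values are commonly tabulated radii in angstroms and are not necessarily the ones provis uses. The probe offset is left out of this sketch.

```python
# Illustrative radii in angstroms; assumptions, not the values from atminfo.py.
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}
EMPIRICAL_RADII = {"H": 0.25, "C": 0.70, "N": 0.65, "O": 0.60}
DEFAULT_RADIUS = 1.0  # hypothetical fallback for atom types missing from the tables


def pick_radius(atom_type, vw=False):
    """Return the radius for one atom type, falling back for unknown atoms."""
    table = VDW_RADII if vw else EMPIRICAL_RADII
    return table.get(atom_type, DEFAULT_RADIUS)


# "XX" is not in either table, so it receives the fallback radius.
radii = {name: pick_radius(name, vw=True) for name in ("C", "O", "XX")}
```

Because every atom of one type shares a radius and a color, a single lookup per type is enough before the Spheres are placed at the individual coordinates.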

get_atom_trimesh(atom_data, vw=False, probe=0)

Create a list of spheres and colors representing each atom for plotting in a Trimesh format. Used for feature computation in the surface_handler class.

The code in words: Iterates through the atom_data dictionary by atom type (from get_atoms()). It creates uniform Spheres (same size and color) in the position specified by the coordinates list for each atom type. Also differentiates between Van der Waals and normal radii and handles unknown atoms.

Parameters:

atom_data: dict: Dictionary of atoms and their coordinates, by atom type.
vw: bool, optional: When set to True, Van der Waals atomic radii are used instead of empirical radii. Default: False.
probe: int, optional: Size of probe (representing the solvent size) needed for surface calculation. Default: 0.

Returns:

list: List of pyvista Spheres representing each atom
list: List of colors for each atom
list: List of atom ID’s for each atom

get_atoms(show_solvent=False, model_id=0)

Creates a dictionary that stores the 3D coordinates for each atom. The dictionary keys are the atom names. For each atom type in the given molecule the coordinates of the atoms of this type are stored in a list within the dictionary.

The code in words: The .pdb file (loaded in __init__()) is iterated through. For each atom it is checked if the type of this atom is already in the dictionary. If not then a new list is created with the coordinates of this atom and added to the dictionary with the name of the atom as the key. If the name of this atom is already present then the coordinates of the current atom are added to the list of coordinates of this same type of atom. The dictionary is returned.

Parameters:

show_solvent: bool, optional: If True, solvent molecules are also added to the returned dictionary. Default: False.
model_id: int, optional: The dynamic model ID of the desired molecule. Count starts at 0. Leave default value for static molecules. Default: 0.

Returns:

dict: Dictionary of atomic coordinates by atom type.
int: Maximum coordinate in y axis. (Used to create default camera)
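
The grouping step described above can be sketched without any provis or parser dependency. The group_atoms_by_type helper and the (name, coordinate) input records are assumptions made for illustration; the actual method iterates over the structure loaded from the .pdb file.

```python
from collections import defaultdict


def group_atoms_by_type(atoms):
    """Group (atom_name, (x, y, z)) records into a dictionary keyed by atom name."""
    atom_data = defaultdict(list)
    for name, coord in atoms:
        # A missing key starts a new list; an existing key is extended.
        atom_data[name].append(coord)
    # Largest y coordinate, which the docs say is used for the default camera.
    max_y = max((coord[1] for _, coord in atoms), default=0.0)
    return dict(atom_data), max_y


records = [
    ("C", (0.0, 1.0, 0.0)),
    ("N", (1.0, 2.5, 0.0)),
    ("C", (2.0, 0.5, 1.0)),
]
atom_data, max_y = group_atoms_by_type(records)
```

Here atom_data holds two coordinates under "C" and one under "N", matching the by-type layout that get_atom_mesh() expects.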

get_atoms_IDs(model_id=0)

Get dictionary of atomic coordinates (same as get_atoms()) and residue IDs (format from output_pdb_as_xyzrn()) from the xyzrn file. Also returns a list of all the atomic coordinates, in the same order as in the .pdb file.

Used by the Surface class.

The code in words: The .xyzrn file is loaded and is iterated through. The relevant fields are stored to temporary variables. For each atom it is checked if the type of this atom is already in the dictionary. If not then a new list is created with the coordinates of this atom and added to the dictionary with the name of the atom as the key. If the name of this atom is already present then the coordinates of the current atom are added to the list of coordinates of this same type of atom. Regardless of the atom already being present in the dictionary the residue id and the coordinates are added to the two lists. The dictionary and the two lists are returned.

Parameters:

model_id: int, optional: The dynamic model ID of the desired molecule. Count starts at 0. Leave default value for static molecules. Default: 0.

Returns:

dict: Dictionary of atomic coordinates by atom type.
list: List of unique residue IDs (format from output_pdb_as_xyzrn())
list: Atomic coordinates (in same order as the residue IDs)

get_backbone_mesh(model_id=0)

Creates and returns a Spline object representing the backbone of the protein.

The code in words: Iterates through the res_data dictionary by atom type (from get_residues()). Calculates the center of each residue and returns these points as a numpy array. (Later used to create a Spline.)

Parameters:

model_id: int, optional: The dynamic model ID of the desired molecule. Count starts at 0. Leave default value for static molecules. Default: 0.

Returns:

pyvista.Spline: Spline running through coordinates representing the centre of mass of each residue.

get_bond_mesh(model_id=0)

Determine bonds from 3D information.

The color information is as follows: White for all single bonds, Blue for all double bonds, Green for all triple bonds, Red for all amide bonds, Purple for all aromatic bonds, Black for everything else.

The code in words: Parse mol2 file (also works on multi model file). Find where the boundaries of the current molecule are in the file. Extract the atomic and bond information by creating DataFrame. From the DataFrames get the information corresponding to the current bond: create a pyvista.Line() and store the bond type. Return the compiled lists of Lines and bond types (colors).

Parameters:

model_id: int, optional: The dynamic model ID of the desired molecule. Count starts at 0. Leave default value for static molecules. Default: 0.

Returns:

list: List of pyvista lines representing each bond.
list: List of colors corresponding to the lines in the above list
list: List of names of the bonds: single, double, triple, amide, aromatic, unknown
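
The color scheme listed above amounts to a lookup with a default. The sketch below assumes the bond type strings are the usual mol2 codes ("1", "2", "3", "am", "ar"). The bond_style helper and the lowercase color names are hypothetical and only mirror the documented mapping.

```python
# mol2 bond type code -> (bond name, color), following the documented scheme.
BOND_STYLES = {
    "1": ("single", "white"),
    "2": ("double", "blue"),
    "3": ("triple", "green"),
    "am": ("amide", "red"),
    "ar": ("aromatic", "purple"),
}


def bond_style(bond_type):
    """Return (name, color) for a bond type code; anything else is black."""
    return BOND_STYLES.get(bond_type, ("unknown", "black"))


# "du" (dummy bond) is not in the table, so it falls through to the default.
styles = [bond_style(code) for code in ("1", "ar", "du")]
```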

get_residue_info(res, chain, option)

Calculates information about specified residue from mol2 file. Depending on what is specified, either the center of mass (COM) or the charge is computed.

Parameters:

res: str: Residue number of the residue to be looked at.
chain: Chain number corresponding to the residue to be looked at.
option: str: Property of the residue to compute. Options: com for Centre Of Mass, ch for charge.

Returns:

list: List of COM coords of given (exact) residue.

get_residue_mesh(res_data, phi_res=25, theta_res=25)

Create a list of Spheres and colors representing each residue for plotting. Can later be added to a mesh for plotting.

The code in words: Iterates through the res_data dictionary by atom type (from get_residues()). It creates uniform Spheres (same size and color) in the position specified by the coordinates list for each residue type. Also differentiates between Van der Waals and normal radii and handles unknown residues.

Parameters:

res_data: dict: Dictionary of residues and their coordinates by residue type.
phi_res: int, optional: pyvista phi_resolution for Sphere objects representing residues. Default: 25.
theta_res: int, optional: pyvista theta_resolution for Sphere objects representing residues. Default: 25.

Returns:

list: List of pyvista Spheres representing each residue
list: List of colors for each residue
list: List of residue names

get_residues(model_id=0, show_solvent=False)

Creates a dictionary of coordinates by residues from structure object

The code in words: The .pdb file (loaded in __init__()) is iterated through. For each residue it is checked if the type of this residue is already in the dictionary. If not then a new list is created with the coordinates of this residue and added to the dictionary with the type of the residue as the key. If the type of this residue is already present then the coordinates of the current residue are added to the list of coordinates of this same type of residue. The coordinates of the residue are calculated as the arithmetic center of the coordinates. The dictionary is returned.

Parameters:

model_id: int, optional: The dynamic model ID of the desired molecule. Count starts at 0. Leave default value for static molecules. Default: 0.
show_solvent: bool, optional: If True, solvent molecules are also added to the returned dictionary. Default: False.

Returns:

dict: Dictionary of atomic coordinates by residue type.
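
As a rough illustration of the arithmetic center and the by-type grouping described above, the following sketch works on plain tuples. The helper names and the input layout (residue type paired with a list of atom coordinates) are assumptions; the real method reads residues from the loaded structure.

```python
from collections import defaultdict


def arithmetic_center(coords):
    """Mean of a non-empty list of (x, y, z) coordinates, computed per axis."""
    n = len(coords)
    return tuple(sum(axis) / n for axis in zip(*coords))


def group_residue_centers(residues):
    """Map each residue type to the list of centers of its residues."""
    res_data = defaultdict(list)
    for res_type, atom_coords in residues:
        res_data[res_type].append(arithmetic_center(atom_coords))
    return dict(res_data)


residues = [
    ("ALA", [(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)]),
    ("GLY", [(1.0, 1.0, 1.0)]),
    ("ALA", [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]),
]
res_data = group_residue_centers(residues)
```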

get_structure()

Return the loaded structure object

Returns: structure

provis.src.processing.file_converter module

class provis.src.processing.file_converter.FileConverter(nc, density=3.0, convert_all=False)

Bases: object

Class to create and destroy necessary files required in other parts of code. The member functions call the binaries and scripts to convert the files.

Also has a cleanup function that removes everything in the “root_directory”/data/tmp (and data/img if specified) directory. Best practice is to call this function at the end of your main file. However, if you want to plot the same protein many times, it is beneficial to keep the temporary (data/tmp) files: if they exist, provis will not recompute them.

cleanup(delete_img: bool = False, delete_meshes: bool = False)

Deletes all files related to the current .pdb id from data/tmp (and if specified, the data/img and data/meshes) directories.

CAUTION: provis does not recompute existing files. So if you have a molecule that you want to plot multiple times then do not delete the temporary files. WARNING: as provis does not recompute existing files it might occur that an old version of the file is stored in the temporary directories and this might cause provis to fail. If this is the case simply delete all temporary files (as well as meshes).

Parameters:

delete_img: bool, optional: If True all files of the form {pdb_id}_{*} will be deleted from the data/img directory. (pdb_id is the name of the .pdb file without the .pdb extension and {*} stands for “anything”). Default: False.
delete_meshes: bool, optional: If True all files of the form {pdb_id}_{*} will be deleted from the data/meshes directory. (pdb_id is the name of the .pdb file without the .pdb extension and {*} stands for “anything”). Default: False.
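
The {pdb_id}_{*} pattern corresponds to a simple glob. The sketch below shows the idea on a throwaway directory; remove_matching and the file names are hypothetical, and the real cleanup() may select files differently.

```python
import tempfile
from pathlib import Path


def remove_matching(directory, pdb_id):
    """Delete every file named {pdb_id}_* in directory and return the removed names."""
    removed = []
    for path in sorted(Path(directory).glob(f"{pdb_id}_*")):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    # Two files belong to the hypothetical id "2fd7", one to another molecule.
    for name in ("2fd7_0.obj", "2fd7_charge_0", "1a3n_0.obj"):
        (Path(tmp) / name).touch()
    removed = remove_matching(tmp, "2fd7")
    remaining = sorted(p.name for p in Path(tmp).iterdir())
```

Files that belong to other molecules are left alone, which is why stale files for the current id are the ones to delete when provis fails on an outdated cache.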

decompose_traj(path)

As the MSMS binary is unable to work with trajectory .pdb files the large .pdb file containing all models needs to be decomposed to one file per model. This function completes this exact task.

Only executes decomposition if the first file (“{pdb_file_name}_0.pdb”) does not exist, to avoid unnecessary recomputation.

Parameters:

path: str: Name of input (pdb) file (without extension)

Returns:

int: Number of models in the trajectory.
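
One way to picture the decomposition is to split the text on the MODEL and ENDMDL records of the PDB format. This is only a sketch under that assumption; split_models is a hypothetical helper, and the actual function may rely on a PDB parsing library and also writes one file per model.

```python
def split_models(pdb_text):
    """Return one text block per MODEL ... ENDMDL section of a PDB string."""
    models = []
    current = None  # None means we are outside of a MODEL section
    for line in pdb_text.splitlines():
        record = line[:6].strip()  # PDB record names occupy the first six columns
        if record == "MODEL":
            current = []
        elif record == "ENDMDL":
            if current is not None:
                models.append("\n".join(current))
            current = None
        elif current is not None:
            current.append(line)
    return models


trajectory = "\n".join([
    "MODEL        1",
    "ATOM      1  N   ALA A   1",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   ALA A   1",
    "ATOM      2  CA  ALA A   1",
    "ENDMDL",
])
models = split_models(trajectory)
```

The length of the returned list plays the role of the model count that decompose_traj() reports.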

msms(path, dens)

Run the msms binary for given filename.

It takes the {path}.xyzrn file as input and output is written to {path}_out_{dens}. Binary path is read in from environment variable: MSMS_BIN. If the environment variable does not exist, the binary will be looked up in provis/binaries/msms.

Parameters:

path: str: Path to the/Name of the .xyzrn file to be converted.
dens: float: Density of triangulation

Returns:

void: face and vert files
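
The lookup order for the binary (environment variable first, bundled path second) can be sketched as follows. resolve_binary is a hypothetical helper; the environ parameter exists only so the example can be run without touching the real environment.

```python
import os


def resolve_binary(env_var, fallback, environ=None):
    """Return the path stored in env_var, or fallback if the variable is unset or empty."""
    environ = os.environ if environ is None else environ
    return environ.get(env_var) or fallback


# Without MSMS_BIN the bundled location from the docs is used.
default_path = resolve_binary("MSMS_BIN", "provis/binaries/msms", environ={})
# With MSMS_BIN set (hypothetical path), the variable wins.
custom_path = resolve_binary("MSMS_BIN", "provis/binaries/msms", environ={"MSMS_BIN": "/opt/msms/msms"})
```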

pdb_to_mol2(path, outpath)

Run openbabel, to convert pdb to mol2

Parameters:

path: str: Name of input (pdb) file (without extension)
outpath: str: Name of desired output file (without extension). It will add .mol2 to the given path.

Returns:

void: mol2 file

pdb_to_pqr(path, outpath, forcefield='swanson')

Run the pdb2pqr binary for given filename.

It takes the {path}.pdb file as input and output is written to {outpath}.pqr.

Binary path is read in from environment variable: PDB2PQR_BIN. If environment variable does not exist binary will be looked up in binaries/pdb2pqr/pdb2pqr.

Parameters:

path: str: Name of input (pdb) file (without extension)
outpath: str: Name of desired output file (without extension). It will add .pqr to the given path.
forcefield: str, optional: Force field used for charge computation, by binary. Default: swanson. Options: amber, charmm, parse, tyl06, peoepb and swanson

Returns:

void: pqr file

pdb_to_xyzrn(path, output)

Converts .pdb to .xyzrn file.

Parameters:

path: str: Name of input (pdb) file (without extension)
output: str: Name of output (xyzrn) file (without extension)

Returns:

void: xyzrn file

provis.src.processing.name_checker module

class provis.src.processing.name_checker.NameChecker(name, base_path: Optional[str] = None)

Bases: object

This class provides uniform names and path locations to all the other classes of provis. NameChecker has internal variables and a method to return these variables.

Class member variables:

self._pdb_name - Full path to the pdb file without the .pdb extension. Usually PROVIS_PATH/data/pdb/{pdb_id}.
self._out_path - Full path to the temporary files. The names of all temporary files are derived from this variable. Usually PROVIS_PATH/data/tmp/{pdb_id}.
self._base_path - Full path of the provis directory or any directory that has the following directory structure within: {path}/data/data, {path}/data/img, {path}/data/tmp, {path}/data/pdb.

return_all()

Return all class variables. This function is used by all other provis classes to retrieve the paths to the files needed.

Returns:

str: path to the input pdb id (file without the extension)
str: path to the output pdb id - data/tmp/{pdb_id}
str: path to the root of the provis directory. Following file structure HAS TO exist within: {path}/data/data, {path}/data/img, {path}/data/tmp, {path}/data/pdb.
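
The three returned paths follow directly from the base path and the pdb id. The sketch below reproduces the "usual" layout described above; derive_paths is a hypothetical helper, the example base path is made up, and NameChecker itself may resolve names differently (for example when a full path is passed in).

```python
from pathlib import PurePosixPath


def derive_paths(pdb_id, base_path):
    """Return (pdb_name, out_path, base_path) for the usual provis directory layout."""
    base = PurePosixPath(base_path)
    pdb_name = base / "data" / "pdb" / pdb_id  # input file, without the .pdb extension
    out_path = base / "data" / "tmp" / pdb_id  # prefix for all temporary files
    return str(pdb_name), str(out_path), str(base)


pdb_name, out_path, base_path = derive_paths("2fd7", "/home/user/provis")
```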

provis.src.processing.protein module

class provis.src.processing.protein.Protein(pdb_name, base_path=None, density=3.0, model_id=0)

Bases: object

The protein class encapsulates every other class in the provis.src.processing package.

This class contains all the necessary information needed for plotting.

provis.src.processing.residue module

class provis.src.processing.residue.Residue(id=None, chain=0, padding=0)

Bases: object

Residue class is used for plotting a bounding box around the specified residue.

add_residue(id, chain=0)

Add a new residue to the internal list of residues.

Parameters:

id: int, optional: Residue id. Count starting at 0. Default: None.
chain: int, optional: Chain id. Count starting at 0. Specify which chain the residue is on. Default 0 (in case of single chain). Default: 0.

get_res_info()

Returns all internal information of class

Returns:

list: list of the current residues
list: list of chain ID’s corresponding to residues
int: padding for bounding box

remove_residue(id, chain=0)

Remove specified residue from internal list.

Parameters:

id: int, optional: Residue id. Count starting at 0. Default: None.
chain: int, optional: Chain id. Count starting at 0. Specify which chain the residue is on. Default 0 (in case of single chain). Default: 0.

provis.src.processing.surface_handler module

class provis.src.processing.surface_handler.SurfaceHandler(nc, fc=None, dh=None, density=3.0)

Bases: object

The ‘brain’ of provis, when it comes to handling surfaces.

This class loads information from a variety of files and creates meshes to be plotted. Upper level classes - e.g. Surface - use SurfaceHandler objects to create the meshes.

The SurfaceHandler class loads atom-positional information from a .pdb file and from this information computes the surface mesh. The surface can be computed by the MSMS binary or natively. The MSMS version is chemically more accurate and faster, but the MSMS binary has to be downloaded for it to work.

get_assignments()

Get assignments (coloring) for the mesh. File has to exist, no way to produce it with provis. Loads “root directory”/data/tmp/{pdb_id}.pth and returns it.

Returns:

PyTorch object: Coloring of surface.

get_surface_features(mesh, feature, res_id=None)

Get the coloring corresponding to a specific feature.

The code in words: Creates the pqr file if it does not exist. If the res_id variable is not empty retrieve the surface structure (list of specific surface-related information) using the get_surface() function. Else compile the surface structure from the mesh and the find_nearest_atom() function. Next, using the above mentioned surface structure compute all surface feature information. Finally return the coloring corresponding to the specified feature.

Parameters:

mesh: Trimesh: The mesh
feature: str: Name of feature we are interested in. Options: hydrophob, shape, charge, hbonds.
res_id: bool, optional: List of unique residue IDs (format from output_pdb_as_xyzrn()). Default: False.

Returns:

numpy.ndarray: Array of coloring corresponding to surface.

Raises:

NotImplementedError: Raised if an unknown feature is specified.

msms_mesh_and_color(feature=None, patch=False)

Return the mesh and coloring created by the MSMS binary.

The code in words: If self._mesh_needed is set to True - if the mesh could not be loaded from a file - compute the mesh using the .face and .vert files created by the MSMS binary. Finally, if a feature is specified, check if the color information could be loaded from a file (self._color_needed) and compute it if needed. If no feature is specified, set the variable that stores color information to None. This will result in a white mesh.

Parameters:

feature: str, optional: Name of feature, same as in get_surface_features. Options: hydrophob, shape, charge, hbonds. Defaults to None.
patch: bool, optional: Set coloring of mesh manually, from a file. If set to True get_assignments() will be called. Defaults to False.

native_mesh_and_color(feature=None)

Returns a mesh without the need for the MSMS binary. The mesh and coloring are also saved to the following file names:

Mesh: “root directory”/data/meshes/{pdb_id}_{model_id}.obj
Color: “root directory”/data/meshes/{pdb_id}_{feature}_{model_id}

Always returns a pyvista.PolyData mesh of the surface and, if a feature is specified, also returns the coloring according to that feature.

The code in words: If self._mesh_needed is set to True - if the mesh could not be loaded from a file - compute the mesh using the native tools. Get the atomic positional information from the DataHandler class. Extract the surface of the combined mesh and use the o3d.geometry.TriangleMesh.create_from_point_cloud_poisson() method to create a smooth surface. Finally, if a feature is specified check if the color information could be loaded from a file (self._color_needed) and compute it if needed. If no feature specified set the variable that stores color information to None. This will result in a white mesh.

Parameters:

feature: str, optional: Name of feature, same as in get_surface_features. Options: hydrophob, shape, charge, hbonds. Defaults to None.

return_mesh_and_color(msms=False, feature=None, patch=False, model_id=0)

Wrapper function to choose between the msms surface visualization vs the native surface visualization. If you could not download the MSMS binary leave the msms variable as False.

If the mesh and appropriate coloring has already been stored to a file, then this information will be loaded and no computation will be done.

The code in words: First, set a few class member variables that are used in the {*}_mesh_and_color() member functions. Next, check if the mesh and color information has already been computed. If the files already exist, load them and return. No computation will be done. Otherwise, depending on what the msms input variable is set to, call either msms_mesh_and_color() or native_mesh_and_color(). Return the mesh and color information.

Parameters:

msms: bool, optional: If true, surface generated by msms binary is returned, else the native mesh. Default: False.
feature: str, optional: Name of feature, same as in get_surface_features. Options: hydrophob, shape, charge, hbonds. Defaults to None.
patch: bool, optional: Set coloring of mesh manually. If set to True get_assignments() will be called. Defaults to False.
model_id: int, optional: The dynamic model ID of the desired molecule. Count starts at 0. Leave default value for static molecules. Default: 0.

Returns:

trimesh.Trimesh: The mesh corresponding to the surface of the protein.
numpy.ndarray: Coloring map corresponding to specified feature.
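
The load-if-present, compute-otherwise behaviour described above is a file-based cache. The sketch below shows the pattern with JSON in a temporary directory. load_or_compute, the file name and the JSON format are assumptions for illustration and do not reflect how provis actually stores meshes or colorings.

```python
import json
import tempfile
from pathlib import Path


def load_or_compute(cache_file, compute):
    """Return the cached result if the file exists; otherwise compute and store it."""
    cache_file = Path(cache_file)
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    result = compute()
    cache_file.write_text(json.dumps(result))
    return result


calls = []


def expensive_coloring():
    """Stand-in for a costly feature computation; records every invocation."""
    calls.append(1)
    return [0.1, 0.5, 0.9]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "2fd7_charge_0.json"
    first = load_or_compute(cache, expensive_coloring)   # computes and writes the file
    second = load_or_compute(cache, expensive_coloring)  # served from the file
```

The second call never runs the computation, which is the time saving the cache is meant to provide. It is also why an outdated file keeps being returned until it is deleted.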

Module contents

provis.src.processing.get_residues(pdb_file)