Biological Objects

class Gene(name, description=None)

Stores a gene’s identifier and description (multiton).

At any time there can be only one gene with a given identifier: after the first initialization, every subsequent attempt to initialize a gene with the same identifier returns exactly the same object. This is the so-called multiton pattern.

Example

>>> x = Gene('TP53')
>>> y = Gene('TP53')
>>> assert x is y   # passes, there is only one gene
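
The behaviour above can be reproduced with a small registry keyed by identifier. The following is a minimal sketch of the multiton pattern in general, not the library's implementation; the class name MultitonGene and the _instances attribute are hypothetical, as is the choice to keep the description from the first initialization.

```python
class MultitonGene:
    """Hypothetical stand-in for Gene, used only to illustrate the pattern."""

    # Registry mapping each identifier to its single instance.
    _instances = {}

    def __new__(cls, name, description=None):
        # Create the object only the first time an identifier is seen;
        # afterwards return the stored instance unchanged.
        if name not in cls._instances:
            instance = super().__new__(cls)
            instance.name = name
            instance.description = description
            cls._instances[name] = instance
        return cls._instances[name]


x = MultitonGene('TP53', 'tumor protein p53')
y = MultitonGene('TP53')
assert x is y  # same identifier, same object
assert MultitonGene('BRCA1') is not x  # different identifier, different object
```

In this sketch the description passed on a later call is ignored; whether Gene itself behaves that way is not stated in this documentation.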

class Phenotype(name, samples=None)

A phenotype is a collection of samples with a common origin or characteristic.

An example phenotype can be:

(Breast_cancer_sample_1, Breast_cancer_sample_2) named “Breast cancer”.

The common origin/characteristic for the “Breast cancer” phenotype could be “a breast tumour”, even though the samples were collected from two different donors.

Another example is a set of controls:

(Control_sample_1, Control_sample_2) named “Control”.

The common characteristic for these samples is that both are controls.

as_array(): Returns: pandas.DataFrame object with data for all samples.

classmethod from_file(name, file_object, columns_selector=None, samples=None, delimiter='\t', index_col=0, use_header=True, reverse_selection=False, prefix=None, header_line=0, description_column=None)

Create a phenotype (collection of samples) from a CSV/TSV file.

Parameters:

name – the name of the phenotype (or group of samples) which will identify it (like “Tumour_1” or “Control_in_20_degrees”)
file_object –
a file (containing gene expression data) with the following structure:
- names of samples, separated by a tab, in the first row,
- gene symbol/name followed by gene expression values for every sample in the remaining rows;
an additional “description” column is allowed between the gene column and the sample columns, though it has to be explicitly declared with the description_column argument.
columns_selector (Optional[Callable[[Sequence[int]], Sequence[int]]]) – a function which selects (and returns) a subset of the provided column identifiers (do not use with samples)
samples – a list of names of samples to extract from the file (do not use with columns_selector)
reverse_selection – set this to True to use all columns but the selected ones (or all samples but the selected ones)
delimiter (str) – the delimiter of the columns
index_col (int) – the column to use for the gene names
use_header – whether the file has a header
prefix – a prefix for a custom sample naming scheme
header_line – the number of the non-empty line which contains the sample names
description_column – whether a column with descriptions is present in the file (in the second position, after the gene identifiers)
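
To make the expected file layout and the selector arguments concrete, here is a self-contained sketch that uses only the standard library. It is not the implementation of from_file: the helper read_samples, the selector first_two, and the file content are hypothetical, and it assumes that column identifiers are integer positions in the header row.

```python
import csv
import io

# Hypothetical file content in the documented layout: sample names in the
# first row, then one row per gene, with an optional description column.
tsv_text = (
    "gene\tdescription\tTumour_1\tTumour_2\tTumour_3\n"
    "TP53\ttumor protein p53\t5.1\t4.8\t5.5\n"
    "BRCA1\tBRCA1 DNA repair associated\t2.0\t2.3\t1.9\n"
)


def first_two(columns):
    """Example columns_selector: keep the first two column identifiers."""
    return columns[:2]


def read_samples(file_object, columns_selector=None, reverse_selection=False,
                 delimiter='\t', description_column=False):
    """Illustrative reader (assumption: identifiers are header positions)."""
    rows = list(csv.reader(file_object, delimiter=delimiter))
    header, body = rows[0], rows[1:]
    # Sample columns start after the gene column and, if declared,
    # after the description column.
    first_sample = 2 if description_column else 1
    columns = list(range(first_sample, len(header)))
    if columns_selector:
        chosen = list(columns_selector(columns))
        if reverse_selection:
            # Keep every column except the selected ones.
            chosen = [c for c in columns if c not in chosen]
        columns = chosen
    # Map each sample name to its {gene name: expression value} data.
    return {
        header[c]: {row[0]: float(row[c]) for row in body}
        for c in columns
    }


selected = read_samples(io.StringIO(tsv_text), columns_selector=first_two,
                        description_column=True)
remaining = read_samples(io.StringIO(tsv_text), columns_selector=first_two,
                         reverse_selection=True, description_column=True)
```

Here selected holds the data for Tumour_1 and Tumour_2, while remaining holds only Tumour_3, which mirrors the documented meaning of reverse_selection.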

classmethod from_gsea_file(): Stub: if we need to handle very specific files, for various analysis methods, we can extend Phenotype with class methods like from_gsea_file.

class Sample(name, data)

A sample contains expression values for genes.

as_array(): Returns: one-dimensional labeled array with Gene objects as labels

classmethod from_array(name, panda_series, descriptions=False)

Create a sample from a pd.Series or equivalent.

Parameters:

name – name of the sample
panda_series (Series) – series object where columns represent values of genes and names are either gene identifiers or tuples: (gene_identifier, description)
descriptions – are descriptions present in the names of the series object?

classmethod from_names(name, data)

Create a sample from a gene_name: value mapping.

Parameters:

name – name of the sample
data (Mapping[str, float]) – mapping (e.g. dict) where keys represent gene names
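
The shape of the data argument can be illustrated with a plain dict. The sketch below shows the alternate-constructor idea only; SketchSample is a hypothetical stand-in, and the real Sample may store its data differently (for example, labeled with Gene objects, as as_array suggests).

```python
from typing import Mapping


class SketchSample:
    """Hypothetical stand-in for Sample; not the library's class."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)

    @classmethod
    def from_names(cls, name, data: Mapping[str, float]):
        # Alternate constructor: normalise keys to gene names (str)
        # and values to floats before building the sample.
        return cls(name, {str(gene): float(value) for gene, value in data.items()})


sample = SketchSample.from_names('Tumour_1', {'TP53': 5.1, 'BRCA1': 2})
```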
