Biological Objects

class Gene(name, description=None)

Stores a gene’s identifier and description (multiton).

Only one gene with a given identifier can exist at a time: after the first initialization, every subsequent attempt to initialize a gene with the same identifier returns exactly the same object. This is the so-called multiton pattern.

Example

>>> x = Gene('TP53')
>>> y = Gene('TP53')
>>> assert x is y   # passes, there is only one gene
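The multiton behaviour described above can be sketched with a class-level registry that `__new__` consults before creating an object. This is a minimal illustration, not the library's actual implementation; in particular, the sketch assumes that a later call with a different description is ignored and the first object is returned unchanged.

```python
class Gene:
    """Minimal multiton sketch: one instance per gene identifier."""

    # Registry shared by all calls: identifier -> the single Gene object.
    _instances = {}

    def __new__(cls, name, description=None):
        # Reuse the existing object if this identifier was seen before.
        # Assumption of this sketch: the first description wins.
        if name in cls._instances:
            return cls._instances[name]
        instance = super().__new__(cls)
        instance.name = name
        instance.description = description
        cls._instances[name] = instance
        return instance


x = Gene('TP53')
y = Gene('TP53', description='tumor protein p53')
assert x is y  # same object, as in the example above
```

Because identity is guaranteed, `Gene` objects can be compared with `is` and used safely as dictionary keys or index labels.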
class Phenotype(name, samples=None)

Phenotype is a collection of samples of common origin or characteristic.

An example phenotype can be:

(Breast_cancer_sample_1, Breast_cancer_sample_2) named “Breast cancer”.

The common origin/characteristic of the “Breast cancer” phenotype could be “a breast tumour”, even though the samples were collected from two different donors.

Another example is a group of controls:

(Control_sample_1, Control_sample_2) named “Control”.

The common characteristic for these samples is that both are controls.

as_array()

Returns: pandas.DataFrame object with data for all samples.
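Assuming each sample's data is held as a pandas Series indexed by gene (an assumption; the internals are not documented here), a phenotype-level DataFrame can be built by concatenating the series column-wise. The helper `phenotype_as_array` and the sample values below are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: one Series per sample, indexed by gene name.
samples = {
    'Breast_cancer_sample_1': pd.Series({'TP53': 2.5, 'BRCA1': 0.7}),
    'Breast_cancer_sample_2': pd.Series({'TP53': 3.1, 'BRCA1': 1.2}),
}


def phenotype_as_array(samples):
    """Combine per-sample Series into one genes x samples DataFrame."""
    # Concatenating along axis=1 aligns rows on the gene index;
    # the dict keys become the column (sample) names.
    return pd.concat(samples, axis=1)


df = phenotype_as_array(samples)
print(df.shape)  # (2, 2)
```

Aligning on the index means a gene missing from one sample appears as NaN in that sample's column rather than shifting the other values.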

classmethod from_file(name, file_object, columns_selector=None, samples=None, delimiter='\t', index_col=0, use_header=True, reverse_selection=False, prefix=None, header_line=0, description_column=None)

Create a phenotype (collection of samples) from a CSV/TSV file.

Parameters:
  • name – a name of the phenotype (or group of samples) which will identify it (like “Tumour_1” or “Control_in_20_degrees”)
  • file_object

    a file containing gene expression data, with the following structure:

    • names of samples separated by the delimiter (a tab by default) in the first row,
    • a gene symbol/name followed by gene expression values for every sample in each of the remaining rows;

    an additional “description” column is allowed between the gene column and the sample columns, but it has to be declared explicitly with the description_column argument.

  • columns_selector (Optional[Callable[[Sequence[int]], Sequence[int]]]) – a function which will select (and return) a subset of provided column identifiers (do not use with samples)
  • samples – a list of names of samples to extract from the file (do not use with columns_selector)
  • reverse_selection – set this to True to use all columns except the selected ones (or all samples except the selected ones)
  • delimiter (str) – the delimiter of the columns
  • index_col (int) – column to use as the gene names
  • use_header – does the file have a header?
  • prefix – prefix for a custom sample naming scheme
  • header_line – number of the non-empty line that holds the sample names
  • description_column – is a column with descriptions present in the file (in the second position, after the gene identifiers)?
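The sketch below shows how the file layout and the selection parameters described above could map onto `pandas.read_csv`. `read_expression` is a hypothetical helper, not part of this API, and it assumes that `columns_selector` receives the 0-based positions of the sample columns only (counted after the gene and optional description columns).

```python
import io

import pandas as pd


def read_expression(file_object, columns_selector=None, reverse_selection=False,
                    delimiter='\t', index_col=0, description_column=False):
    """Hypothetical helper: load a genes x samples table from a delimited file."""
    table = pd.read_csv(file_object, sep=delimiter, index_col=index_col)

    descriptions = None
    if description_column:
        # The description column sits right after the gene identifiers;
        # keep it aside so that only sample columns remain in the table.
        descriptions = table.iloc[:, 0]
        table = table.iloc[:, 1:]

    if columns_selector is not None:
        # Assumption: the selector receives 0-based positions of sample columns.
        positions = list(range(table.shape[1]))
        chosen = list(columns_selector(positions))
        if reverse_selection:
            # Keep every sample column the selector did NOT pick.
            chosen = [p for p in positions if p not in chosen]
        table = table.iloc[:, chosen]

    return table, descriptions


TSV = (
    "gene\tdescription\tS1\tS2\tS3\n"
    "TP53\ttumor protein p53\t2.5\t3.1\t0.4\n"
    "BRCA1\tBRCA1 DNA repair associated\t0.7\t1.2\t0.9\n"
)

# Select the first two sample columns, then invert the selection: only S3 remains.
table, descriptions = read_expression(
    io.StringIO(TSV),
    columns_selector=lambda positions: positions[:2],
    reverse_selection=True,
    description_column=True,
)
```

Separating the description column before applying the selector keeps column positions meaningful regardless of whether the file has descriptions.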
classmethod from_gsea_file()

Stub: to handle very specific file formats required by various analysis methods, Phenotype can be extended with class methods like from_gsea_file.
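As an illustration of such an extension, the sketch below adds a hypothetical `from_gct_file` class method that reads GSEA's GCT expression format (version 1.2: a version line, a dimensions line, then a table whose first two columns are NAME and Description). Both the simplified `Phenotype` stand-in and the choice of GCT as the "GSEA file" are assumptions, not the library's code.

```python
import io

import pandas as pd


class Phenotype:
    """Simplified stand-in for the documented Phenotype class (hypothetical)."""

    def __init__(self, name, data):
        self.name = name
        self.data = data  # genes x samples DataFrame

    @classmethod
    def from_gct_file(cls, name, file_object):
        """Hypothetical alternative constructor for GCT 1.2 expression files."""
        # Skip the version line ("#1.2") and the dimensions line; the third
        # line is the header: NAME, Description, then one column per sample.
        table = pd.read_csv(file_object, sep='\t', skiprows=2, index_col=0)
        # Drop the Description column so that only expression values remain.
        return cls(name, table.drop(columns=table.columns[0]))


GCT = (
    "#1.2\n"
    "2\t2\n"
    "NAME\tDescription\tS1\tS2\n"
    "TP53\ttumor protein p53\t2.5\t3.1\n"
    "BRCA1\tBRCA1 DNA repair associated\t0.7\t1.2\n"
)

phenotype = Phenotype.from_gct_file('Breast cancer', io.StringIO(GCT))
```

Keeping format-specific parsing in class methods leaves the main constructor independent of any particular file layout.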

class Sample(name, data)

Sample contains expression values for genes.

as_array()

Returns: one-dimensional labeled array with Gene objects as labels

classmethod from_array(name, panda_series, descriptions=False)

Create a sample from pd.Series or equivalent.

Parameters:
  • name – name of the sample
  • panda_series (Series) – series object whose values are the expression values of genes and whose index labels are either gene identifiers or tuples: (gene_identifier, description)
  • descriptions – are descriptions present in the index labels of the series object (i.e. are the labels (gene_identifier, description) tuples)?
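The `descriptions=True` case can be illustrated with plain pandas: a Series built with a list of tuples as its index stores (gene_identifier, description) labels that unpack directly. `split_described_series` is a hypothetical helper showing the idea, not the library's code.

```python
import pandas as pd


def split_described_series(series, descriptions=False):
    """Hypothetical helper: separate values from (gene, description) labels."""
    if not descriptions:
        return series, {}
    # Each index label is a (gene_identifier, description) tuple.
    genes = [gene for gene, _ in series.index]
    notes = {gene: note for gene, note in series.index}
    # Re-index the values by gene identifier alone.
    values = pd.Series(series.to_numpy(), index=genes, name=series.name)
    return values, notes


raw = pd.Series(
    [2.5, 0.7],
    index=[('TP53', 'tumor protein p53'), ('BRCA1', 'BRCA1 DNA repair associated')],
    name='Breast_cancer_sample_1',
)
values, notes = split_described_series(raw, descriptions=True)
```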
classmethod from_names(name, data)

Create a sample from a gene_name: value mapping.

Parameters:
  • name – name of sample
  • data (Mapping[str, float]) – mapping (e.g. dict) where keys represent gene names and values the corresponding expression values
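A minimal sketch of from_names and as_array working together, using simplified stand-ins for `Gene` and `Sample` (hypothetical, not the library's classes): gene-name keys are turned into shared `Gene` objects, and `as_array` wraps the result in a pandas Series labelled with those objects.

```python
import pandas as pd


class Gene:
    """Minimal multiton stand-in (hypothetical): one object per identifier."""

    _instances = {}

    def __new__(cls, name):
        if name not in cls._instances:
            instance = super().__new__(cls)
            instance.name = name
            cls._instances[name] = instance
        return cls._instances[name]

    def __repr__(self):
        return f'Gene({self.name!r})'


class Sample:
    """Simplified stand-in for the documented Sample class (hypothetical)."""

    def __init__(self, name, data):
        self.name = name
        self.data = data  # dict: Gene -> expression value

    @classmethod
    def from_names(cls, name, data):
        # Turn gene-name keys into the shared Gene objects.
        return cls(name, {Gene(gene_name): value for gene_name, value in data.items()})

    def as_array(self):
        # One-dimensional labeled array with Gene objects as labels.
        return pd.Series(self.data, name=self.name)


sample = Sample.from_names('Breast_cancer_sample_1', {'TP53': 2.5, 'BRCA1': 0.7})
series = sample.as_array()
```

Because `Gene` is a multiton, looking up `Gene('TP53')` later returns the very object used as a label, so `series.loc[Gene('TP53')]` finds the value.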