API Tutorial

Aldy can be used as a standalone application or as a Python module.

Here is a quick overview of the functions that Aldy provides. Detailed API documentation is available in API documentation.

One-step genotyping API

You can get the genotypes of a SAM/BAM with a single call:

import aldy.genotype

result = aldy.genotype.genotype(
  "cyp2d6", "/path/to/sample.bam", profile_name="wgs", output_file=None
)
print(result)

Please check aldy.solution.SolvedAllele and aldy.genotype for more details.

Loading a gene

We always start by loading a gene database for a gene that is to be genotyped:

import aldy.gene
gene = aldy.gene.Gene("path/to/gene.yaml", genome="hg38")

Alternatively, we can use genes shipped with Aldy:

import aldy.gene
import aldy.common

gene_path = aldy.common.script_path("aldy.resources.genes/cyp2d6.yml")
gene = aldy.gene.Gene(gene_path)  # for CYP2D6

Loading a SAM/BAM or VCF file

Before working with alignment files, it is necessary to load a profile as follows:

import aldy.profile
profile = aldy.profile.Profile.load(gene, "illumina")

If using VCF files, construct the profile as follows:

profile = Profile("user_provided", cn_solution=["1", "1"])

Various model parameters can be passed to the aldy.profile.Profile.load() method. One such parameter is a custom copy-number neutral region that can be passed via the cn_region parameter (by default, the CYP2D8 region is used). Please consult aldy.profile.Profile for details about the other parameters.

A SAM/BAM/CRAM or a VCF file can be loaded as follows:

import aldy.sam
sample = sample = aldy.sam.Sample(gene, profile, path="my/sample.bam")

Pass a path to the reference genome (reference parameter) when using a CRAM file.

Detecting copy number configurations and fusions

To detect copy number configurations, run:

import aldy.cn
cn_sols = aldy.cn.estimate_cn(gene, profile, sample.coverage, solver="cbc")

You can also use solver="gurobi" if you have Gurobi installed.

The result is a list of aldy.solutions.CNSolution objects described in API.

Calling the major and minor star-alleles

Once you get copy number solutions, you can call valid major star-alleles for each copy number solution:

import aldy.major

major_sols = [sol
              for cn_sol in cn_sols
              for sol in aldy.major.estimate_major(gene, sample.coverage, cn_sol, "cbc")]
# Get the best major star-allele score
min_score = min(major_sols, key=lambda m: m.score).score
# Take the best major star-allele calls
major_sols = sorted([m for m in major_sols if abs(m.score - min_score) < 1e-3],
                   key=lambda m: m.score)

You are pretty much set if you need only major star-alleles. However, if you have multiple equally likely major star-allele calls, or if you need to get the complete star-allele decomposition, then do the following:

import aldy.minor

minor_sols = aldy.minor.estimate_minor(gene, sample.coverage, major_sols, "cbc")
# Get the best minor star-allele score
min_score = min(minor_sols, key=lambda m: m.score).score
# Get the best minor (and major) star-allele
minor_sols = [m for m in minor_sols if abs(m.score - min_score) < 1e-3]

minor_sols will contain a list of aldy.solutions.MinorSolution objects that point to the optimal major star-alleles and the corresponding copy numbers.

Finally, if you want to get a nice diplotype (e.g., *1/*2+*3), just type:

minor_solution.get_major_diplotype()
minor_solution.get_minor_diplotype()  # if interested in minor star-alleles

A more detailed explanation of these functions is available in the API documentation.