API Tutorial¶
Aldy can be used as a standalone application or as a Python module.
Here is a quick overview of the functions that Aldy provides. Detailed API documentation is available in API documentation.
One-step genotyping API¶
You can get the genotypes of a SAM/BAM with a single call:
import aldy.genotype
result = aldy.genotype.genotype(
"cyp2d6", "/path/to/sample.bam", profile_name="wgs", output_file=None
)
print(result)
Please check aldy.solution.SolvedAllele
and aldy.genotype
for more details.
Loading a gene¶
We always start by loading a gene database for a gene that is to be genotyped:
import aldy.gene
gene = aldy.gene.Gene("path/to/gene.yaml", genome="hg38")
Alternatively, we can use genes shipped with Aldy:
import aldy.gene
import aldy.common
gene_path = aldy.common.script_path("aldy.resources.genes/cyp2d6.yml")
gene = aldy.gene.Gene(gene_path) # for CYP2D6
Loading a SAM/BAM or VCF file¶
Before working with alignment files, it is necessary to load a profile as follows:
import aldy.profile
profile = aldy.profile.Profile.load(gene, "illumina")
If using VCF files, construct the profile as follows:
profile = Profile("user_provided", cn_solution=["1", "1"])
Various model parameters can be passed to the aldy.profile.Profile.load()
method.
One such parameter is a custom copy-number neutral region that can be passed via the cn_region
parameter
(by default, the CYP2D8 region is used).
Please consult aldy.profile.Profile
for details about the other parameters.
A SAM/BAM/CRAM or a VCF file can be loaded as follows:
import aldy.sam
sample = sample = aldy.sam.Sample(gene, profile, path="my/sample.bam")
Pass a path to the reference genome (reference
parameter) when using a CRAM file.
Detecting copy number configurations and fusions¶
To detect copy number configurations, run:
import aldy.cn
cn_sols = aldy.cn.estimate_cn(gene, profile, sample.coverage, solver="cbc")
You can also use solver="gurobi"
if you have Gurobi installed.
The result is a list of aldy.solutions.CNSolution
objects described in API.
Calling the major and minor star-alleles¶
Once you get copy number solutions, you can call valid major star-alleles for each copy number solution:
import aldy.major
major_sols = [sol
for cn_sol in cn_sols
for sol in aldy.major.estimate_major(gene, sample.coverage, cn_sol, "cbc")]
# Get the best major star-allele score
min_score = min(major_sols, key=lambda m: m.score).score
# Take the best major star-allele calls
major_sols = sorted([m for m in major_sols if abs(m.score - min_score) < 1e-3],
key=lambda m: m.score)
You are pretty much set if you need only major star-alleles. However, if you have multiple equally likely major star-allele calls, or if you need to get the complete star-allele decomposition, then do the following:
import aldy.minor
minor_sols = aldy.minor.estimate_minor(gene, sample.coverage, major_sols, "cbc")
# Get the best minor star-allele score
min_score = min(minor_sols, key=lambda m: m.score).score
# Get the best minor (and major) star-allele
minor_sols = [m for m in minor_sols if abs(m.score - min_score) < 1e-3]
minor_sols
will contain a list of aldy.solutions.MinorSolution
objects
that point to the optimal major star-alleles and the corresponding copy numbers.
Finally, if you want to get a nice diplotype (e.g., *1/*2+*3), just type:
minor_solution.get_major_diplotype()
minor_solution.get_minor_diplotype() # if interested in minor star-alleles
A more detailed explanation of these functions is available in the API documentation.