brie-quant CLI

The brie-quant CLI (in brie>=2.0.0) uses the newly developed variational inference methods scalable to large data sets, which works both in CPU or GPU with the TensorFlow Backend. For using BRIE1 (<=0.2.4) with MCMC sampler, please refer to BRIE1.

This command allows to quantify the splicing isoform proportion Psi and detect variable splicing event along with cell level features, e.g., cell type, disease condition, development time.

As a Bayesian method, the key philosophy of BRIE is to combine likelihood (data driven) and prior (uninformative or informative). In BRIE2, a variety of prior settings are supported, as follows.

Mode 1: None imputation

In this mode, the prior is uninformative logit-normal distribution with mean=0, and learned variance. Therefore, if a splicing event in a gene doesn’t have any read, it will return a posterior with Psi’s mean=0.5 and 95% confidence interval around 0.95 (most case >0.9).

This setting is used if you have high covered data and you only want to calculate cells with sufficient reads for each interesting genes, e.g., by filtering out all genes with Psi_95CI > 0.3.

Otherwise, the 0.5 imputed genes will be confounded by the expression level, instead of the isoform proportion.

Example command line for mode 1:

brie-quant -i out_dir/brie_count.h5ad -o out_dir/brie_quant_pure.h5ad --interceptMode None

Mode 2: Aggregated imputation

This mode requires argument --interceptMode gene. It aims to learn a prior shared by all cells on each gene. The benefit for this mode is that dimension reduction can be performed, e.g., PCA and UMAP on splicing. As there are many splicing events that are not well covered, it has a high variance in the estimation, and is often suggested filtered out, which will cause missing values. Based on the cell aggregated imputation, most dimension reduction methods can be used, even it doesn’t support missing values.

Example command line for mode 2:

brie-quant -i out_dir/brie_count.h5ad -o out_dir/brie_quant_aggr.h5ad --interceptMode gene

Mode 3: Variable splicing detection

This mode requires argument -c for cell features and --LRTindex for the index (zero-based) of cell features to perform likelihood ratio test. Again we suggest to keep the cell aggregation on each gene by --interceptMode gene.

Then this mode will learn a prior from the given cell level features and perform the second fit by leaving each feature out to calculate the EBLO gain, which can be further used as likelihood ratio test.

Example command line for mode 3:

brie-quant -i out_dir/brie_count.h5ad -o out_dir/brie_quant_cell.h5ad \
    -c $DATA_DIR/cell_info.tsv --interceptMode gene --LRTindex=All

Flexible settings

There could be more flexible settings, for example only use gene features as in BRIE1 by the following command:

brie-quant -i out_dir/brie_count.h5ad -o out_dir/brie_quant_gene.h5ad \
    -g $DATA_DIR/gene_seq_features.tsv --interceptMode cell --LRTindex=All

Or use both gene features and cell features

brie-quant -i out_dir/brie_count.h5ad -o out_dir/brie_quant_all.h5ad \
    -c $DATA_DIR/cell_info.tsv -g $DATA_DIR/gene_seq_features.tsv \
    --interceptMode gene --LRTindex=All

There are more parameters for setting (brie-quant -h always give the version you are using):

Usage: brie-quant [options]

Options:
  -h, --help            show this help message and exit
  -i IN_FILE, --inFile=IN_FILE
                        Input read count matrices in AnnData h5ad or brie npz
                        format.
  -c CELL_FILE, --cellFile=CELL_FILE
                        File for cell features in tsv[.gz] with cell and
                        feature ids.
  -g GENE_FILE, --geneFile=GENE_FILE
                        File for gene features in tsv[.gz] with gene and
                        feature ids.
  -o OUT_FILE, --out_file=OUT_FILE
                        Full path of output file for annData in h5ad [default:
                        $inFile/brie_quant.h5ad]
  --LRTindex=LRT_INDEX  Index (0-based) of cell features to test with LRT:
                        All, None or comma separated integers [default: None]
  --interceptMode=INTERCEPT_MODE
                        Intercept mode: gene, cell or None [default: None]
  --layers=LAYERS       Comma separated layers two or three for estimating Psi
                        [default: isoform1,isoform2,ambiguous]

  Gene filtering:
    --minCount=MIN_COUNT
                        Minimum total counts for fitltering genes [default:
                        50]
    --minUniqCount=MIN_UNIQ_COUNT
                        Minimum unique counts for fitltering genes [default:
                        10]
    --minCell=MIN_CELL  Minimum number of cells with unique count for
                        fitltering genes [default: 30]
    --minMIF=MIN_MIF    Minimum minor isoform frequency in unique count
                        [default: 0.001]

  VI Optimization:
    --MCsize=MC_SIZE    Sample size for Monte Carlo Expectation [default: 3]
    --minIter=MIN_ITER  Minimum number of iterations [default: 5000]
    --maxIter=MAX_ITER  Maximum number of iterations [default: 20000]
    --batchSize=BATCH_SIZE
                        Element size per batch: n_gene * total cell [default:
                        500000]