CALISTA: User Manual
Version 1.2.0 (1 December 2018)
Authors: Nan Papili Gao and Rudiyanto Gunawan
Institute for Chemical and Bioengineering
ETH Zurich
Contact e-mail: nanp@ethz.ch and rgunawan@buffalo.edu

Table of contents
1 Overview
2 System requirements
3 CALISTA package
4 Examples
4.1 Example 1. iPSC differentiation into mesodermal and endodermal cells
4.1.1 Data Import and Preprocessing
4.1.2 Single-cell clustering
4.1.3 Reconstruction of lineage progression
4.1.4 Determination of transition genes
4.1.5 Pseudotemporal ordering of cells
4.1.6 Path analysis
4.2 Example 2. Hematopoietic stem cell differentiation
4.2.1 Data Import and Preprocessing
4.2.2 Single-cell clustering
4.2.3 Reconstruction of lineage progression
4.2.4 Determination of transition genes
4.2.5 Pseudotemporal ordering of cells
4.2.6 Path analysis
4.3 Example 3. Mouse embryonic fibroblast differentiation into neurons (Manual data import)
4.3.1 Data Import and Preprocessing
4.3.2 Single-cell clustering
4.3.3 Reconstruction of lineage progression
4.3.4 Determination of transition genes
4.3.5 Pseudotemporal ordering of cells
4.4 Example 4. Human embryonic stem cell differentiation into endodermal cells
4.4.1 Data Import and Preprocessing
4.4.2 Single-cell clustering
4.4.3 Reconstruction of lineage progression
4.4.4 Determination of transition genes
4.4.5 Pseudotemporal ordering of cells
4.5 Example 5. Running CALISTA without time or cell stage information
4.5.1 Data Import and Preprocessing
4.5.2 Single-cell clustering
4.5.3 Reconstruction of lineage progression and pseudotemporal ordering of cells
4.6 Example 6. Removing undesired clusters
4.6.1 Data Import and Preprocessing
4.6.2 Single-cell clustering
4.6.3 Single-cell clustering after removing undesired clusters
4.6.4 Reconstruction of lineage progression
4.6.5 Determination of transition genes
4.6.6 Pseudotemporal ordering of cells
4.7 Running CALISTA GUI
4.7.1 Data Import and Preprocessing
4.7.2 Single-cell clustering
4.7.3 Reconstruction of lineage progression
4.7.4 Determination of transition genes
4.7.5 Pseudotemporal ordering of cells
4.8 Example 8. Reconstruction of developmental trajectories during zebrafish embryogenesis
4.8.1 Data Import and Preprocessing
4.8.2 Single-cell clustering
4.8.3 Reconstruction of lineage progression
4.8.4 Determination of transition genes
4.8.5 Pseudotemporal ordering of cells
4.8.6 Path analysis
4.9 Example 9. Identification of mouse spinal cord neurons activity during behavior
4.9.1 Data Import and Preprocessing
4.9.2 Single-cell clustering
4.10 Example 10. Analysis of peripheral blood mononuclear cells (PBMCs)
4.10.1 Data Import and Preprocessing
4.10.2 Single-cell clustering
5 Questions and comments

1 Overview

This user manual is for the MATLAB distribution of CALISTA (Clustering And Lineage Inference in Single Cell
Transcriptional Analysis).
CALISTA provides a user-friendly toolbox for the analysis of single-cell expression data. CALISTA accomplishes
three major tasks:
(1) Identification of cell clusters in a cell population based on single-cell gene expression data;
(2) Reconstruction of lineage progression and identification of transition genes;
(3) Pseudotemporal ordering of cells along any given developmental path in the lineage progression.
For detailed information about CALISTA, please refer to the following manuscript:
Papili Gao N., Hartmann T., Fang T., and Gunawan R., CALISTA: Clustering and lineage inference in single-cell
transcriptional analysis, bioRxiv, 2018. https://doi.org/10.1101/257550

2 System requirements

This distribution of CALISTA is written for and developed in MATLAB (http://www.mathworks.com).
CALISTA has been successfully tested on MATLAB 2016b, 2017a, 2018a and 2018b.

3 CALISTA package

The CALISTA package contains the following files and folders:
1. This CALISTA_USER_MANUAL.doc file
2. License.txt: modified BSD license for CALISTA
3. MAIN.m: CALISTA main script (use this script to run CALISTA on your own dataset)
4. MAIN_GUI.m: GUI version of CALISTA (use this script to run CALISTA_GUI on your own dataset)
5. Example scripts on how to use the CALISTA subroutines and the GUI version of CALISTA
6. Save_to_matlab.R: R script describing how to convert a dataset (especially large text files) into MATLAB
   files
7. The folder Two-state model parameters containing:
   a. Parameters.mat: steady-state distribution functions of mRNA level
8. The folder subfunctions containing the following main subroutines (and other subroutines):
   a. import_data.m: upload single-cell expression data and perform preprocessing.
   b. CALISTA_clustering_main.m: single-cell clustering in CALISTA.
   c. CALISTA_transition_main.m: infer lineage progression among cell clusters.
   d. CALISTA_transition_genes_main.m: identify the key genes in lineage progression.
   e. CALISTA_ordering_main.m: perform pseudotemporal ordering of cells.
   f. CALISTA_landscape_plotting_main.m: landscape plots of single cells in the dataset based on
      cell-likelihood values.
   g. CALISTA_path_main.m: perform post-analysis along developmental path(s).
9. The folder EXAMPLES containing the single-cell expression datasets used in the examples below.
10. The folder GUI containing the subroutines used in MAIN_GUI.m
11. The folder SUPPLEMENTARY EXAMPLES containing additional analyses
For further information on running the main subroutines in CALISTA, please use the MATLAB ‘help’
command followed by the function name (for example ‘help import_data’).

4 Examples

In the following, we describe the main steps of CALISTA applied to publicly available single-cell gene
expression data. For each dataset, only the most important results are reported. Please refer to the file
MAIN.m for an example MATLAB script that runs CALISTA.
4.1 Example 1. iPSC differentiation into mesodermal and endodermal cells

Analysis of RT-qPCR data in Bargaje et al., Cell population structure prior to bifurcation predicts
efficiency of directed differentiation in human induced pluripotent cells, Proc. Natl. Acad. Sci. U.S.A. 114,
2271–2276 (2017).

4.1.1 Data Import and Preprocessing

We begin by changing the current directory in MATLAB to the CALISTA folder. Then, we run the
Example_1_BARGAJE_scRT_qPCR.m script in the main folder of CALISTA and import the Bargaje dataset
(available in the subfolder EXAMPLES/BARGAJE).
The following are screenshots from running CALISTA in MATLAB.
[Screenshots: running CALISTA in MATLAB]

4.1.2 Single-cell clustering

In this case, the number of clusters is determined using the eigengap plot. According to the eigengap plot below,
we set the number of clusters to 5. The following are screenshots from CALISTA single-cell clustering analysis.

[Figure: PCA scatter plots of the cells (axes: PC1, PC2). Panel "Original time/cell stage info": cells labeled
by Time/Stage 0, 24, 36, 48, 60, 72, 96 and 120. Panel "Cell Clustering": cells labeled by Cluster 1 to 5.]
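The eigengap heuristic used in this section can be illustrated outside of CALISTA. The Python sketch below is not CALISTA code: it assumes a list of eigenvalues sorted in descending order (the values are made up), and the function name eigengap_num_clusters is hypothetical. It picks the number of clusters at the largest gap between consecutive eigenvalues.

```python
def eigengap_num_clusters(eigenvalues):
    """Return k such that the gap between the k-th and (k+1)-th
    eigenvalue (sorted in descending order) is the largest."""
    vals = sorted(eigenvalues, reverse=True)
    gaps = [vals[i] - vals[i + 1] for i in range(len(vals) - 1)]
    # gaps[i] separates eigenvalue i+1 from eigenvalue i+2 (1-based counting)
    return gaps.index(max(gaps)) + 1


# Made-up eigenvalues, for illustration only
example_eigenvalues = [1.00, 0.97, 0.95, 0.93, 0.90, 0.40, 0.35, 0.30]
num_clusters = eigengap_num_clusters(example_eigenvalues)
print(num_clusters)
```

In this toy input the largest drop occurs after the fifth eigenvalue, so the sketch suggests five clusters. Which matrix the eigenvalues come from, and how they are sorted, is an assumption here.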

If desired, users can remove cells from specific clusters from further analysis. In this example, we do not want to
remove any clusters. Hence, we enter 0 (no cluster removal) and then 1 to proceed with lineage inference.
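Conceptually, removing a cluster means dropping every cell assigned to it before the downstream steps. A minimal Python sketch of that idea follows; it is not CALISTA code, the labels are made up, and the helper name remove_clusters is hypothetical. An empty selection plays the role of entering 0 (no cluster removal).

```python
def remove_clusters(cell_labels, clusters_to_remove):
    """Return the indices of cells whose cluster is NOT in clusters_to_remove."""
    unwanted = set(clusters_to_remove)
    return [i for i, label in enumerate(cell_labels) if label not in unwanted]


labels = [1, 2, 2, 3, 5, 4, 5, 1]              # made-up cluster label per cell
kept_all = remove_clusters(labels, [])          # nothing removed
kept_without_5 = remove_clusters(labels, [5])   # cells of cluster 5 dropped
print(kept_all, kept_without_5)
```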

4.1.3 Reconstruction of lineage progression

During the lineage inference step, CALISTA automatically generates and displays a lineage graph, obtained by
adding edges between pairs of clusters in order of increasing cluster distance, until every cluster is connected
to at least one other cluster. Subsequently, users can manually add or remove one edge at a time based on the
cluster distances.

ATTENTION: to add an edge (press “p”), remove an edge (press “m”) or finalize the lineage progression graph
(press “enter”), the MATLAB figure of the graph must be in the foreground and unmodified (e.g., no zooming or
rotation). Note that edges are added in order of increasing cluster distance and removed in order of decreasing
cluster distance.

ATTENTION: the final graph must be connected (i.e. there exists a path from any node/cluster to any other
node/cluster in the graph); otherwise a warning will be returned.
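The graph construction and the connectivity requirement described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not CALISTA's implementation: the distances are made up, and the function names build_lineage_graph and is_connected are hypothetical.

```python
def build_lineage_graph(distances):
    """distances: dict mapping (cluster_a, cluster_b) -> cluster distance.
    Adds edges in order of increasing distance until every cluster
    has at least one edge."""
    clusters = {c for pair in distances for c in pair}
    edges, touched = [], set()
    for (a, b), _ in sorted(distances.items(), key=lambda kv: kv[1]):
        if touched == clusters:
            break
        edges.append((a, b))
        touched.update((a, b))
    return edges


def is_connected(clusters, edges):
    """Check that every cluster is reachable from every other cluster."""
    clusters = set(clusters)
    neighbours = {c: set() for c in clusters}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    start = next(iter(clusters))
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen == clusters


# Made-up cluster distances for four clusters
distances = {(1, 2): 0.4, (3, 4): 0.5, (2, 3): 0.6,
             (1, 3): 0.9, (2, 4): 1.1, (1, 4): 1.2}
auto_edges = build_lineage_graph(distances)
connected_before = is_connected({1, 2, 3, 4}, auto_edges)
connected_after = is_connected({1, 2, 3, 4}, auto_edges + [(2, 3)])
print(auto_edges, connected_before, connected_after)
```

The toy input shows why the connectivity check matters: every cluster has an edge after two additions, yet the graph still consists of two separate pieces until the next-shortest edge is added.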

[Figure: "Lineage Progression" graph of the five clusters (axes: COMP1, COMP2, Cluster pseudotime), with edges
labeled by cluster distance. Legend: Cluster 1, cluster pseudotime 0.00; Cluster 2, 0.50; Cluster 3, 0.75;
Cluster 4, 1.00; Cluster 5, 1.00.]

Since the transition from cluster 1 to cluster 5 is inconsistent with the capture time information (i.e. the
cluster pseudotime values for clusters 1 and 5 are 0 and 1, respectively), we remove the spurious edge between
clusters 1 and 5 by entering 1 and then [1 5] upon the following query.

[Figure: "Lineage Progression" graph (axes: COMP1, COMP2, Cluster pseudotime), with edges labeled by cluster
distance. Legend: Cluster 1, cluster pseudotime 0.00; Cluster 2, 0.50; Cluster 3, 0.75; Cluster 4, 1.00;
Cluster 5, 1.00.]

The final inferred lineage relationships are displayed below.
[Figure: final "Lineage Progression" graph (axes: PC1, PC2, Cluster pseudotime), with edges labeled by cluster
distance. Legend: Cluster 1, cluster pseudotime 0.00; Cluster 2, 0.50; Cluster 3, 0.75; Cluster 4, 1.00;
Cluster 5, 1.00.]

In addition, CALISTA provides the following.
- Cell clustering plot based on the cluster pseudotime
[Figure: PCA scatter plots of the cells (axes: PC1, PC2), one panel per CALISTA cluster pseudotime value:
0, 0.5, 0.75 and 1.]
- Boxplot, mean, median entropy values calculated for each cluster
[Figure: "Boxplot for the entropy" (Entropy vs. Cluster) and "Entropy Mean and Median" (Entropy value vs.
Cluster; legend: MeanEntropy, MedianEntropy).]
- Plot of mean expression values for each gene based on cell cluster expression level
[Figure: mean expression values, one panel per gene (e.g. ACVR1B, ACVR2A, ACVRL1, ..., WNT5A, WNT5B).]

4.1.4 Determination of transition genes

After reconstructing the lineage progression, we identify the key transition genes for any two connected clusters in
the graph, based on the gene-wise likelihood difference between having the cells separately as two clusters and
together as a single cluster. Larger differences in the gene-wise likelihood point to more informative genes. The
transition genes are selected as those whose gene-wise likelihood differences make up more than a certain
percentage of the cumulative sum of the likelihood differences of all genes. This percentage is set by
INPUTS.thr_transition_genes.
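As a rough illustration of this cumulative-percentage selection, the Python sketch below ranks genes by likelihood difference and keeps the top-ranked ones until their share of the total reaches the threshold. It is not CALISTA code: the likelihood differences are made up, the function name select_transition_genes is hypothetical, and the exact ranking and stopping rule (including whether the comparison is strict) are assumptions.

```python
def select_transition_genes(likelihood_diffs, threshold_percent):
    """likelihood_diffs: dict gene -> gene-wise likelihood difference (>= 0).
    Returns the top-ranked genes, in decreasing order of difference, whose
    cumulative share of the total first reaches threshold_percent."""
    total = sum(likelihood_diffs.values())
    ranked = sorted(likelihood_diffs.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for gene, diff in ranked:
        selected.append(gene)
        cumulative += diff
        if 100.0 * cumulative / total >= threshold_percent:
            break
    return selected


# Made-up likelihood differences for one edge of the lineage graph
diffs = {"GATA6": 500, "GSC": 300, "EOMES": 100, "MIXL1": 60, "DKK1": 40}
selected_75 = select_transition_genes(diffs, 75)
selected_100 = select_transition_genes(diffs, 100)
print(selected_75, selected_100)
```

With a threshold of 75, the two largest differences already cover 80% of the total, so only two genes are kept; a threshold of 100 keeps every gene.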
[Figure: bar plots of the gene-wise likelihood differences (y-axis: v gjk) of the transition genes for each edge:
1-2 (7 transition genes), 2-3 (11 transition genes), 2-5 (11 transition genes) and 3-4 (8 transition genes).]

4.1.5 Pseudotemporal ordering of cells

For pseudotemporal ordering of cells, CALISTA performs maximum likelihood optimization for each cell using a
linear interpolation of the cell likelihoods between any two connected clusters. The pseudotimes of the cells are
computed by linear interpolation of the cluster pseudotimes, and correspond to the maximum point of the
likelihood optimization above. Cells are subsequently assigned to the edges in the lineage progression graph. The
following screenshot gives the results of this cell-to-edge assignment.
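One way to read the procedure above is sketched below in Python. This is an interpretation, not CALISTA's implementation: it assumes that the per-gene probabilities of a cell's expression values are interpolated linearly between the two clusters of an edge, that the weight is found by a simple grid search, and that the cell is assigned to the edge with the highest maximum log-likelihood. All numbers are made up and the function name order_cell_on_edge is hypothetical.

```python
import math


def order_cell_on_edge(p_start, p_end, t_start, t_end, steps=100):
    """p_start / p_end: per-gene probabilities (assumed strictly positive) of the
    cell's expression values under the two clusters at the ends of an edge.
    t_start / t_end: the corresponding cluster pseudotimes.
    Returns (cell pseudotime, maximum log-likelihood) from a grid search over
    the interpolation weight w in [0, 1]."""
    best_w, best_ll = 0.0, -math.inf
    for i in range(steps + 1):
        w = i / steps
        # Linear interpolation of the per-gene likelihoods between the clusters
        ll = sum(math.log((1 - w) * a + w * b) for a, b in zip(p_start, p_end))
        if ll > best_ll:
            best_w, best_ll = w, ll
    # Cell pseudotime: linear interpolation of the cluster pseudotimes at best_w
    return (1 - best_w) * t_start + best_w * t_end, best_ll


# One made-up cell evaluated on two candidate edges of the lineage graph
candidate_edges = {
    (2, 3): dict(p_start=[0.8, 0.2], p_end=[0.2, 0.8], t_start=0.50, t_end=0.75),
    (2, 5): dict(p_start=[0.8, 0.2], p_end=[0.1, 0.1], t_start=0.50, t_end=1.00),
}
results = {edge: order_cell_on_edge(**args) for edge, args in candidate_edges.items()}
assigned_edge = max(results, key=lambda edge: results[edge][1])
cell_pseudotime = results[assigned_edge][0]
print(assigned_edge, cell_pseudotime)
```

In this toy case the cell sits halfway along edge 2-3, so its pseudotime falls midway between the cluster pseudotimes 0.50 and 0.75.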
[Figure: "Cell Ordering" plot of the cells (axes: PC1, PC2, Cell Ordering).]

4.1.6 Path analysis

To perform path-specific analysis, users can enter 1 when queried. In the following, we input two developmental
paths of interest: [1 2 3 4] (mesodermal fate) and [1 2 5] (endodermal fate).
[Figure: "Lineage Progression" graph (axes: PC1, PC2, Cluster pseudotime), with edges labeled by cluster
distance. Legend: Cluster 1, cluster pseudotime 0.00; Cluster 2, 0.50; Cluster 3, 0.75; Cluster 4, 1.00;
Cluster 5, 1.00.]

For each path, the post-analysis in CALISTA generates Clustergrams, moving-averaged gene expression profiles
and co-expression networks for the transition genes detected previously based on cell orderings.
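The idea behind a moving-averaged expression profile can be sketched as follows: sort the cells of a path by pseudotime, then average the expression of a gene over a sliding window. The Python snippet is illustrative only; the data are made up, the window size is arbitrary, and the function name moving_average is hypothetical rather than part of CALISTA.

```python
def moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Made-up (pseudotime, expression) pairs for one gene along one path
cells = [(0.9, 4.0), (0.1, 1.0), (0.5, 3.0), (0.3, 1.0), (0.7, 5.0)]
ordered_expression = [expr for _, expr in sorted(cells)]  # sort by pseudotime
profile = moving_average(ordered_expression, window=3)
print(profile)
```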

[Figure: co-expression matrices for "Path num 1" and "Path num 2" (axes: Source Gene i, Target Gene j; color scale
from -1 to 1) over the transition genes BMP4, DKK1, EOMES, EVX1, FGF12, FGF8, FZD1, GATA4, GATA6, GSC, HAND1,
HNFA4, KDR, KIT, LEFTY1, MESP1, MESP2, MIXL1, MYL4, NUMB, SOX17, T, TGFB2, TNNT2 and WNT4.]

4.2 Example 2. Hematopoietic stem cell differentiation

Analysis of RT-qPCR data in Moignard et al., Characterization of transcriptional networks in blood stem and
progenitor cells using high-throughput single-cell gene expression analysis, Nat. Cell Biol. 15, 363–72 (2013).

4.2.1 Data Import and Preprocessing

We start by changing the current directory in MATLAB to the CALISTA folder. We run the
Example_2_MOIGNARD_scRT_qPCR.m script in the main folder of CALISTA and load the Moignard dataset (in
subfolder EXAMPLES/MOIGNARD).

4.2.2 Single-cell clustering

Following the original publication, we set the number of clusters equal to 5.

CALISTA single-cell clustering results are as follows.

[Figure: PCA scatter plots of the cells (axes: PC1, PC2). Panel "Original time/cell stage info": cells labeled
by Time/Stage 1, 2 and 3. Panel "Cell Clustering": cells labeled by Cluster 1 to 5.]

In this case, we do not need to remove any clusters (by pressing 0 when queried). Then we proceed with further
analysis (by pressing 1 when queried).

4.2.3 Reconstruction of lineage progression

During the lineage inference step, CALISTA automatically generates and displays a lineage graph, obtained by
adding edges between pairs of clusters in order of increasing cluster distance, until every cluster is connected
to at least one other cluster. Subsequently, users can manually add or remove one edge at a time based on the
cluster distances.

ATTENTION: to add an edge (press “p”), remove an edge (press “m”) or finalize the lineage progression graph
(press “enter”), the MATLAB figure of the graph must be in the foreground and unmodified (e.g., no zooming or
rotation). Note that edges are added in order of increasing cluster distance and removed in order of decreasing
cluster distance.

ATTENTION: the final lineage progression graph must be connected (i.e. there is a path from any node/cluster
to any other node/cluster in the graph); otherwise a warning will be returned.

[Figure: "Lineage Progression" graph (axes: COMP1, COMP2, Cluster pseudotime), with edges labeled by cluster
distance. Legend: Cluster 1, cluster pseudotime 0.00; Cluster 2, 0.50; Cluster 3, 0.50; Cluster 4, 1.00;
Cluster 5, 1.00.]

Here, we do not need to remove any spurious edges, and hence we enter 0 when queried.

In addition, CALISTA gives (not shown):
- Cell clustering plot based on the cluster pseudotime
- Boxplot, mean, median entropy values calculated for each cluster
- Plot of mean expression values for each gene based on cell cluster expression level

4.2.4 Determination of transition genes

After reconstructing the lineage progression, we identify the key transition genes for any two connected clusters in
the graph, based on the gene-wise likelihood difference between having the cells separately as two clusters and
together as a single cluster. Larger differences in the gene-wise likelihood point to more informative genes. The
transition genes are selected as those whose gene-wise likelihood differences make up more than a certain
percentage of the cumulative sum of the likelihood differences of all genes. This percentage is set by
INPUTS.thr_transition_genes.

[Figure: bar plots of the gene-wise likelihood differences (y-axis: v gjk) of the transition genes for each edge:
1-2 (5 transition genes), 1-3 (4 transition genes), 2-4 (4 transition genes) and 2-5 (3 transition genes).]

4.2.5 Pseudotemporal ordering of cells

For pseudotemporal ordering of cells, CALISTA performs maximum likelihood optimization for each cell using a
linear interpolation of the cell likelihoods between any two connected clusters. The pseudotimes of the cells are
computed by linear interpolation of the cluster pseudotimes, and correspond to the maximum point of the
likelihood optimization above. Cells are subsequently assigned to the edges in the lineage progression graph. The
following screenshot gives the results of this cell-to-edge assignment.

[Figure: "Cell Ordering" plot of the cells (axes: PC1, PC2, Cell Ordering).]

4.2.6 Path analysis

Finally, we perform post-analysis by entering 1 when queried. Here, we input three developmental paths: [1 3],
[1 2 5], and [1 2 4].

For each path, the post-analysis in CALISTA generates Clustergrams, moving-averaged gene expression profiles
and co-expression networks for the transition genes detected previously based on cell orderings (not shown).

4.3 Example 3. Mouse embryonic fibroblast differentiation into neurons (Manual data import)

Analysis of RNA-seq data in Treutlein et al., Dissecting direct reprogramming from fibroblast to neuron using
single-cell RNA-seq, Nature 534, 391–395 (2016).
**Please unzip the file “3-TREUTLEIN_data_type_3_format_data_5_clusters_4.txt.zip” in
EXAMPLES/TREUTLEIN/ before running CALISTA**

4.3.1 Data Import and Preprocessing

Again, we change the current directory in MATLAB to the CALISTA folder. Here, we run the
Example_3_TREUTLEIN_scRNA_seq.m script in the main folder of CALISTA and load the Treutlein dataset (in
subfolder EXAMPLES/TREUTLEIN).

The text file containing the original dataset can be summarized as follows (preview with the first 25 rows and 12
columns):
[Screenshot: preview of the original text file]

CALISTA imports the dataset by splitting the text data from the expression (numeric) data. In particular, we define
the “imported text data” as:
[Screenshot: imported text data]
and the “imported expression data” as:
[Screenshot: imported expression data]

CALISTA provides a preview and the dimensions of both the imported text and expression data.

Based on the expression data preview, we set the starting and ending rows and columns for the expression values
to [1 405] and [2 22525], respectively, when queried. We exclude the capture time info in the first column.

We press 1 since columns correspond to genes and rows correspond to cells.

We define the gene names using the text data preview, entering [6 22529] as the starting and ending columns.

We load the capture time/cell stage by pressing 1 (i.e. the time/cell stage information is in the expression data
matrix) and selecting column 1 in the data matrix.
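The [start end] ranges entered at the prompts are 1-based and inclusive. The Python sketch below shows, on a tiny made-up table, how such ranges select the expression block while the first column is kept aside as capture time. It is not CALISTA code, and the helper name extract_block is hypothetical.

```python
def extract_block(table, rows, cols):
    """Slice a list-of-rows table using 1-based, inclusive (start, end)
    ranges for rows and columns."""
    (r0, r1), (c0, c1) = rows, cols
    return [row[c0 - 1:c1] for row in table[r0 - 1:r1]]


# Tiny made-up table: column 1 is the capture time, the rest are expression values
table = [
    [0, 1.5, 0.0, 2.1],
    [2, 0.3, 4.2, 0.0],
    [5, 0.0, 1.1, 3.3],
]
expression = extract_block(table, rows=(1, 3), cols=(2, 4))  # skip column 1
capture_time = [row[0] for row in table]                     # column 1 only
print(expression, capture_time)
```

For the Treutlein dataset the analogous ranges are [1 405] for rows and [2 22525] for columns, as stated above.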

4.3.2 Single-cell clustering

In this case, the number of clusters is determined using the eigengap plot. According to the eigengap plot below,
we set the number of clusters to 4.

CALISTA single-cell clustering results are as follows.
[Figure: PCA scatter plots of the cells (axes: PC1, PC2). Panel "Original time/cell stage info": cells labeled
by Time/Stage 0, 2, 5, 20 and 22. Panel "Cell Clustering": cells labeled by Cluster 1 to 4.]

We do not need to remove any cluster (by entering 0 when queried), and continue with further analysis (by entering
1 when queried).

4.3.3 Reconstruction of lineage progression

During the lineage inference step, CALISTA automatically generates and displays a lineage graph, obtained by
adding edges between pairs of clusters in order of increasing cluster distance, until every cluster is connected
to at least one other cluster. Subsequently, users can manually add or remove one edge at a time based on the
cluster distances.

ATTENTION: to add an edge (press “p”), remove an edge (press “m”) or finalize the lineage progression graph
(press “enter”), the MATLAB figure of the graph must be in the foreground and unmodified (e.g., no zooming or
rotation). Note that edges are added in order of increasing cluster distance and removed in order of decreasing
cluster distance.

[Figure: "Lineage Progression" graph (axes: COMP1, COMP2, Cluster pseudotime), with edges labeled by cluster
distance. Legend: Cluster 1, cluster pseudotime 0.00; Cluster 2, 0.09; Cluster 3, 1.00; Cluster 4, 1.00.]

ATTENTION: the final graph must be connected (i.e. there is a path from any node/cluster to any other
node/cluster in the graph); otherwise a warning will be returned.
Here, we do not need to remove any spurious edges, and hence we enter 0 when queried.

In addition, CALISTA returns (not shown):
- Cell clustering plot based on the cluster pseudotime
- Boxplot, mean, median entropy values calculated for each cluster
- Plot of mean expression values for each gene based on cell cluster expression level

4.3.4 Determination of transition genes

After reconstructing the lineage progression, we identify the key transition genes for any two connected clusters in
the graph (results not shown here), based on the gene-wise likelihood difference between having the cells
separately as two clusters and together as a single cluster. Larger differences in the gene-wise likelihood point to
more informative genes. The transition genes are selected as those whose gene-wise likelihood differences make
up more than a certain percentage of the cumulative sum of the likelihood differences of all genes. This percentage
is set by INPUTS.thr_transition_genes.

4.3.5 Pseudotemporal ordering of cells

For pseudotemporal ordering of cells, CALISTA performs maximum likelihood optimization for each cell using a
linear interpolation of the cell likelihoods between any two connected clusters. The pseudotimes of the cells are
computed by linear interpolation of the cluster pseudotimes, and correspond to the maximum point of the
likelihood optimization above. Cells are subsequently assigned to the edges in the lineage progression graph. The
following screenshot gives the results of this cell-to-edge assignment.
[Figure: Cell Ordering. Axes: PC1, PC2 and cell ordering (0 to 1).]

4.4

Example 4. Human embryonic stem cell differentiation into endodermal cells

Analysis of RNA-seq data in Chu et al., Single-cell RNA-seq reveals novel regulators of human embryonic stem
cell differentiation to definitive endoderm, Genome Biol. 17, 173 (2016).
**Please unzip the file “4-CHU__data_type_4_format_data_2_clusters_4.csv.zip” in EXAMPLES/CHU/
before running CALISTA**

4.4.1

Data Import and Preprocessing

We first change the current directory in MATLAB to the CALISTA folder. We then edit the
Example_4_CHU_scRNA_seq.m script in the main folder of CALISTA and import the Chu dataset (in subfolder
EXAMPLES/CHU):

4.4.2
Single-cell clustering
In this case, the number of clusters is determined using the eigengap plot. According to the eigengap plot below,
we set the number of clusters to 4.

NOTE: CALISTA automatically returns the optimal number of clusters based on the MAXIMUM eigengap value.
However, the user might choose the number of clusters to adopt based on the FIRST eigengap.
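
The eigengap heuristic can be illustrated with a short sketch. The manual does not describe here which matrix CALISTA takes the eigenvalues from, so the eigenvalues below are placeholders and the variable names are hypothetical. The sketch only shows how gaps between consecutive eigenvalues can be ranked to give candidate cluster numbers.

```matlab
% Hypothetical eigenvalues, sorted in descending order (illustrative values only).
eigenvalues = [1.00 0.98 0.95 0.93 0.40 0.35 0.33 0.10];

% Eigengap k is the drop between the k-th and (k+1)-th eigenvalue.
eigengaps = abs(diff(eigenvalues));

% Rank candidate cluster numbers by the size of their eigengap.
[~, ranked_k] = sort(eigengaps, 'descend');

best_k    = ranked_k(1);      % cluster number at the maximum eigengap
top_three = ranked_k(1:3);    % shortlist of candidate cluster numbers
```

Here the largest gap follows the fourth eigenvalue, so the maximum-eigengap choice is 4 clusters; the shortlist gives alternatives a user could pick instead.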

CALISTA single-cell clustering result is shown below.
[Figure: two panels. “Original time/cell stage info”, legend: Time/Stage 0, 12, 24, 36, 72, 96. “Cell Clustering”, legend: Cluster 1, 2, 3, 4. Axes: PC1, PC2, PC3.]

We do not need to remove any clusters (we enter 0 when queried) and continue with further analysis (we
enter 1 when queried).

4.4.3

Reconstruction of lineage progression

During the lineage inference step, CALISTA automatically generates and displays a lineage graph, obtained by
adding an edge between two clusters in increasing cluster distances, until all clusters are connected to at least one
other cluster. Subsequently, users can manually add or remove one edge at a time based on the cluster distances.

ATTENTION: to add an edge (press “p”), remove an edge (press “m”) or finalize the lineage progression graph
(press “enter”), the MATLAB figure of the graph must be in the foreground and unmodified (e.g., not zoomed or
rotated). Note that edges are added in increasing order of cluster distance and removed in decreasing order of
cluster distance.
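
The construction of the initial graph described above can be sketched as follows. This is an illustration of the stated rule (add edges by increasing cluster distance until every cluster has at least one edge), not CALISTA's own code; the distance matrix D and all variable names are hypothetical.

```matlab
% Hypothetical symmetric cluster distance matrix for 4 clusters (illustrative values).
D = [0   0.5 1.4 2.0;
     0.5 0   0.8 1.8;
     1.4 0.8 0   1.9;
     2.0 1.8 1.9 0  ];
n = size(D, 1);

% List every cluster pair once, with its distance.
[rows, cols] = find(triu(true(n), 1));
pair_dist = D(sub2ind([n n], rows, cols));
[~, order] = sort(pair_dist, 'ascend');

% Add edges in increasing distance until every cluster has at least one edge.
A = zeros(n);
for k = order'
    if all(sum(A, 2) > 0)
        break;
    end
    A(rows(k), cols(k)) = 1;
    A(cols(k), rows(k)) = 1;
end
```

Note that this stopping rule only guarantees that no cluster is isolated. It does not by itself guarantee a connected graph, which is why connectivity is checked separately before the graph is finalized.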
[Figure: Lineage Progression. Axes: COMP1, COMP2 and cluster pseudotime. Legend: Cluster 1, cluster pseudotime 0.00; Cluster 2, cluster pseudotime 0.12; Cluster 3, cluster pseudotime 0.38; Cluster 4, cluster pseudotime 1.00.]

ATTENTION: the final graph must be connected (i.e. there is a path from any node/cluster to any other
node/cluster in the graph), otherwise a warning will be returned.
Since the transition from cluster 4 to cluster 3 is inconsistent with the capture time info (i.e. cluster pseudotime
values for cluster 4 and 3 are 1 and 0.38 respectively), we add one further edge to produce the following lineage
progression graph by pressing “p” and then “enter”:
[Figure: Lineage Progression after adding one edge. Axes: COMP1, COMP2 and cluster pseudotime. Legend: Cluster 1, cluster pseudotime 0.00; Cluster 2, cluster pseudotime 0.12; Cluster 3, cluster pseudotime 0.38; Cluster 4, cluster pseudotime 1.00.]

Based on our definition of branching point, we consider the previously inferred lineage graph still linear, since there
is only one final cell cluster (cluster 4). Moreover, since the transition from cluster 2 to cluster 4 bypasses a cluster
with intermediate pseudotime (i.e. cluster 3), we remove the spurious edge between clusters 2 and 4 by entering 1
and then [2 4] when queried.
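
The reasoning used to spot the spurious edge can be written as a simple check: an edge is suspicious when another cluster has a pseudotime strictly between those of the two clusters it joins. The sketch below is an illustration of that rule only, and CALISTA leaves the decision to the user. The cluster pseudotimes are those reported for this example; the edge list and variable names are assumptions.

```matlab
% Cluster pseudotimes reported for this example (clusters 1 to 4).
pseudotime = [0.00 0.12 0.38 1.00];

% Assumed edge list for illustration; each row is one edge between two clusters.
edges = [1 2; 2 3; 2 4; 3 4];

% An edge is flagged when some other cluster has a pseudotime strictly
% between the pseudotimes of the two clusters it connects.
bypasses = false(size(edges, 1), 1);
for k = 1:size(edges, 1)
    t_lo = min(pseudotime(edges(k, :)));
    t_hi = max(pseudotime(edges(k, :)));
    bypasses(k) = any(pseudotime > t_lo & pseudotime < t_hi);
end

flagged_edges = edges(bypasses, :);   % candidate spurious edges
```

For a branching lineage this rule would also flag legitimate edges, so it is best treated as a prompt for manual inspection rather than an automatic filter.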

[Figure: Lineage Progression. Axes: COMP1, COMP2 and cluster pseudotime. Legend: Cluster 1, cluster pseudotime 0.00; Cluster 2, cluster pseudotime 0.12; Cluster 3, cluster pseudotime 0.38; Cluster 4, cluster pseudotime 1.00.]

The final inferred lineage relationships are shown below.
[Figure: Lineage Progression (final). Axes: PC1, PC2 and cluster pseudotime. Legend: Cluster 1, cluster pseudotime 0.00; Cluster 2, cluster pseudotime 0.12; Cluster 3, cluster pseudotime 0.38; Cluster 4, cluster pseudotime 1.00.]

In addition, CALISTA returns (not shown):
- Cell clustering plot based on the cluster pseudotime
- Boxplot, mean, median entropy values calculated for each cluster
- Plot of mean expression values for each gene based on cell cluster expression level

4.4.4

Determination of transition genes

After reconstructing the lineage progression, we identify the key transition genes for any two connected clusters in
the graph (results not shown here), based on the gene-wise likelihood difference between having the cells
separately as two clusters and together as a single cluster. Larger differences in the gene-wise likelihood point to
more informative genes. The transition genes are selected as those whose gene-wise likelihood differences make
up more than a certain percentage of the cumulative sum of the likelihood differences of all genes. This
percentage is set by INPUTS.thr_transition_genes.

4.4.5

Pseudotemporal ordering of cells

For pseudotemporal ordering of cells, CALISTA performs maximum likelihood optimization for each cell using a
linear interpolation of the cell likelihoods between any two connected clusters. The pseudotimes of the cells are
computed by linear interpolation of the cluster pseudotimes, and correspond to the maximum point of the
likelihood optimization above. Cells are subsequently assigned to the edges in the lineage progression graph. The
following screenshot gives the results of this cell-to-edge assignment.

[Figure: Cell Ordering. Axes: PC1, PC2 and cell ordering (0 to 1).]

4.5

Example 5. Running CALISTA without time or cell stage information

Analysis of RT-qPCR data in Moignard et al. “Characterization of transcriptional networks in blood stem and
progenitor cells using high-throughput single-cell gene expression analysis”. Nat. Cell Biol. 15, 363–72 (2013).
Here, we report only the main steps of the analysis. For the complete analysis please check Example 4.2.

4.5.1

Data Import and Preprocessing

We change the current directory in MATLAB to the CALISTA folder.
We then edit the Example_5_MOIGNARD_scRT_qPCR_NO_TIME_INFO.m script in the main folder of
CALISTA and load the Moignard dataset (in subfolder EXAMPLES/MOIGNARD):

4.5.2

Single-cell clustering

Following the original publication, we set the number of clusters equal to 5.

4.5.3

Reconstruction of lineage progression and pseudotemporal ordering of cells

We follow the steps as outlined in the other examples above to infer the lineage progression and carry out
pseudotemporal ordering of single cells.
Without the time or cell stage info, CALISTA is still able to recover the cluster progression based on:

a. The specification of the starting cell (e.g. cell 1):

The final inferred lineage relationships are as follows.
[Figure: Plot after cluster relabelling. Axes: COMP1, COMP2 and cluster pseudotime. Legend: Cluster 1, cluster pseudotime 0.00; Cluster 2, cluster pseudotime 0.50; Cluster 3, cluster pseudotime 0.57; Cluster 4, cluster pseudotime 0.91; Cluster 5, cluster pseudotime 1.00.]

CALISTA pseudotemporal ordering gives the following outcome.
[Figure: Cell Ordering. Axes: PC1, PC2 and cell ordering.]

b. The specification of a marker gene (e.g. ‘Erg’) which is downregulated (press 2):

The final inferred lineage relationships:
[Figure: Plot after cluster relabelling. Axes: COMP1, COMP2 and cluster pseudotime. Legend: Cluster 1, cluster pseudotime 0.00; Cluster 2, cluster pseudotime 0.50; Cluster 3, cluster pseudotime 0.57; Cluster 4, cluster pseudotime 0.91; Cluster 5, cluster pseudotime 1.00.]

CALISTA pseudotemporal ordering of cells gives the following result.

[Figure: Cell Ordering. Axes: PC1, PC2 and cell ordering.]

Without any time information, cell stage, starting cell or marker gene, CALISTA is still able
to find the topology of the lineage graph, but the edges are undirected.

CALISTA single-cell clustering result is as follows.
[Figure: two panels. “Original time/cell stage info”, legend: Time/Stage 0. “Cell Clustering”, legend: Cluster 1, 2, 3, 4, 5. Axes: PC1, PC2.]

The final inferred lineage relationships are shown below.
[Figure: Plot after cluster relabelling. Axes: COMP1 and cluster progression. Legend: K= 1 pseudo-stage 1; K= 2 pseudo-stage 2; K= 3 pseudo-stage 3; K= 4 pseudo-stage 4; K= 5 pseudo-stage 5.]

Therefore, CALISTA performs the pseudotemporal ordering of cells as follows:

[Figure: Cell Ordering. Axes: PC1 and cell ordering.]

4.6

Example 6. Removing undesired clusters

Analysis of RT-qPCR data in Moignard et al., Characterization of transcriptional networks in blood stem and
progenitor cells using high-throughput single-cell gene expression analysis, Nat. Cell Biol. 15, 363–72 (2013).

4.6.1

Data Import and Preprocessing

We change the current directory in MATLAB to the CALISTA folder. We run the
Example_1_MOiGNARD_scRT_qPCR_GUI.m script in the main folder of CALISTA and load the Moignard
dataset (in subfolder EXAMPLES/RT-qPCR):

4.6.2

Single-cell clustering

We set the number of clusters equal to 5 following the original publication.

CALISTA single-cell clustering result is shown below.
[Figure: two panels. “Original time/cell stage info”, legend: Time/Stage 1, 2, 3. “Cell Clustering”, legend: K= 1 to K= 5, each with its pseudo-stage. Axes: PC1, PC2, COMP1, COMP2.]

We proceed with removing clusters 3 and 5 by entering 1 and then typing [5 3] when queried.

The indices of cells to remove are saved in a csv file.
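
For readers who want to see what such a file contains, the sketch below collects the indices of cells in the removed clusters and writes them to a CSV file. It is an illustration only: the cluster assignments, the file name and the one-index-per-line layout are assumptions, and the layout of the file that CALISTA itself writes may differ.

```matlab
% Hypothetical cluster assignment for 10 cells (illustrative values only).
cell_cluster = [1 3 2 5 4 3 1 5 2 4];
clusters_to_remove = [5 3];

% Indices of the cells that belong to any of the clusters to remove.
cells_to_remove = find(ismember(cell_cluster, clusters_to_remove));

% Save the indices to a CSV file, one index per line (the file name is an assumption).
out_file = fullfile(tempdir, 'cells_to_remove_example.csv');
dlmwrite(out_file, cells_to_remove(:), ',');

% Read the file back to confirm its contents.
saved_idx = dlmread(out_file)';
```

Recent MATLAB releases recommend writematrix and readmatrix in place of dlmwrite and dlmread; the older functions are used here because they are available in the widest range of versions.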

We then edit the MAIN.m script again, setting the INPUTS as described previously except:
INPUTS.cells_2_cut=0; % Manual removal of cells

We run MAIN.m once more from the workspace and import the Moignard dataset (in subfolder EXAMPLES/RT-qPCR). We also upload the CSV file containing the indices of the cells to remove.

4.6.3
Single-cell clustering after removing undesired clusters
We now set the number of clusters equal to 3.

We obtain the following clustering results.
[Figure: two panels. “Original time/cell stage info”, legend: Time/Stage 1, 2, 3. “Cell Clustering”, legend: K= 1 pseudo-stage 1; K= 2 pseudo-stage 2; K= 3 pseudo-stage 3. Axes: PC1, PC2, COMP1, COMP2.]

4.6.4

Reconstruction of lineage progression

We continue with the lineage inference step. During the lineage inference step, CALISTA provides the minimal
connected graph (with nodes = cell clusters and edges = state transitions) as a starting prediction for the
developmental hierarchy. In addition, the user can also manually add or remove one edge at a time based on the
cluster distance values:

ATTENTION: to add an edge (press “p”), remove an edge (press “m”) or finalize the lineage progression graph
(press “enter”), the MATLAB figure of the graph must be in the foreground and unmodified (e.g., not zoomed or
rotated). Note that edges are added in increasing order of cluster distance and removed in decreasing order of
cluster distance.
ATTENTION: the final graph must be connected (i.e. there exists a path from any node/cluster to any other
node/cluster in the graph), otherwise a warning will be returned.
We do not need to remove spurious edges (we enter 0 when queried).

The final inferred lineage relationship is shown below.

[Figure: Lineage Progression. Axes: COMP1, COMP2 and cluster pseudotime. Legend: Cluster 1, cluster pseudotime 0.00; Cluster 2, cluster pseudotime 0.50; Cluster 3, cluster pseudotime 1.00.]

In addition, CALISTA returns (not shown):
- Cell clustering plot based on the cluster pseudotime
- Boxplot, mean, median entropy values calculated for each cluster
- Plot of mean expression values for each gene based on cell cluster expression level

4.6.5

Determination of transition genes

After reconstructing the lineage progression, we identify the key transition genes for any two connected clusters in
the graph (results not shown here), based on the gene-wise likelihood difference between having the cells
separately as two clusters and together as a single cluster. Larger differences in the gene-wise likelihood point to
more informative genes. The transition genes are selected as those whose gene-wise likelihood differences make
up more than a certain percentage of the cumulative sum of the likelihood differences of all genes. This
percentage is set by INPUTS.thr_transition_genes.

4.6.6

Pseudotemporal ordering of cells

For pseudotemporal ordering of cells, CALISTA performs maximum likelihood optimization for each cell using a
linear interpolation of the cell likelihoods between any two connected clusters. The pseudotimes of the cells are
computed by linear interpolation of the cluster pseudotimes, and correspond to the maximum point of the
likelihood optimization above. Cells are subsequently assigned to the edges in the lineage progression graph. The
following screenshot gives the results of this cell-to-edge assignment.
[Figure: Cell Ordering. Axes: PC1, PC2 and cell ordering.]

4.7

Running CALISTA GUI

Analysis of RT-qPCR data of Bargaje et al. (Bargaje, et al, Cell population structure prior to bifurcation predicts
efficiency of directed differentiation in human induced pluripotent cells. Proc. Natl. Acad. Sci. U. S. A. 114, 2271–
2276 (2017)).
Here, we report only the main steps of the analysis. For the complete analysis please check Example 4.1.

4.7.1

Data Import and Preprocessing

We begin by changing the current directory in MATLAB to the CALISTA folder. Then, we edit the
Example_6_BARGAJE_scRT_qPCR_GUI.m script in the main folder of CALISTA and import the Bargaje dataset
(available in the subfolder EXAMPLES/MOIGNARD).
The following are screenshots from running CALISTA on MATLAB.

4.7.2

Single-cell clustering

In this case, the number of clusters is determined using the eigengap plot. According to the eigengap plot below,
we set the number of clusters to 5. The following are screenshots from CALISTA single-cell clustering analysis.

[Figure: two panels. “Original time/cell stage info”, legend: Time/Stage 0, 24, 36, 48, 60, 72, 96, 120. “Cell Clustering”, legend: Cluster 1, 2, 3, 4, 5. Axes: PC1, PC2.]

If desired, users can remove cells from specific clusters from further analysis. In this example, we do not want to
remove any clusters. Hence, we enter 0 (no cluster removal) and then 1 to proceed with lineage inference.

4.7.3

Reconstruction of lineage progression

During the lineage inference step, CALISTA automatically generates and displays a lineage graph, obtained by
adding an edge between two clusters in increasing cluster distances, until all clusters are connected to at least one
other cluster. Subsequently, users can manually add or remove one edge at a time based on the cluster distances.

ATTENTION: select (or unselect) checkboxes to add (or remove) specific edges. To select (or unselect) all edges
use the “Select all” checkbox.
ATTENTION: to finalize the lineage progression (by pressing the “OK” button), the final graph must be
connected (i.e. there exists a path from any node/cluster to any other node/cluster in the graph).
Since the transition from cluster 1 to cluster 5 is inconsistent with the capture time info (i.e. cluster pseudotime
values for clusters 1 and 5 are 0 and 1 respectively), we remove the spurious edge between clusters 1 and 5 by
unselecting the second checkbox:

We press the “OK” button to confirm and the final inferred lineage relationships are displayed below.
[Figure: Lineage Progression. Axes: COMP1, COMP2 and cluster pseudotime. Legend: Cluster 1, cluster pseudotime 0.00; Cluster 2, cluster pseudotime 0.50; Cluster 3, cluster pseudotime 0.75; Cluster 4, cluster pseudotime 1.00; Cluster 5, cluster pseudotime 1.00.]

In addition, CALISTA gives (not shown):
- Cell clustering plot based on the cluster pseudotime
- Boxplot, mean, median entropy values calculated for each cluster
- Plot of mean expression values for each gene based on cell cluster expression level

4.7.4

Determination of transition genes

After reconstructing the lineage progression, we identify the key transition genes for any two connected clusters in
the graph (results not shown here), based on the gene-wise likelihood difference between having the cells
separately as two clusters and together as a single cluster. Larger differences in the gene-wise likelihood point to
more informative genes. The transition genes are selected as those whose gene-wise likelihood differences make
up more than a certain percentage of the cumulative sum of the likelihood differences of all genes. This
percentage is set by INPUTS.thr_transition_genes.

4.7.5

Pseudotemporal ordering of cells

For pseudotemporal ordering of cells, CALISTA performs maximum likelihood optimization for each cell using a
linear interpolation of the cell likelihoods between any two connected clusters. The pseudotimes of the cells are
computed by linear interpolation of the cluster pseudotimes, and correspond to the maximum point of the
likelihood optimization above. Cells are subsequently assigned to the edges in the lineage progression graph. The
following screenshot gives the results of this cell-to-edge assignment.
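
The idea behind this step can be sketched for a single cell and a single edge. The sketch assumes that the interpolation is applied to per-gene probabilities and that the cell log-likelihood is the sum of their logarithms; this is one reading of the description above, not CALISTA's implementation. All probabilities, pseudotimes and variable names are hypothetical.

```matlab
% Hypothetical per-gene probabilities of one cell's expression values under
% the two clusters joined by an edge (illustrative values, 3 genes).
p_A = [0.70 0.10 0.20];     % under the earlier cluster
p_B = [0.10 0.70 0.20];     % under the later cluster
t_A = 0.50;                 % assumed cluster pseudotimes
t_B = 0.75;

% Candidate positions along the edge: w = 0 is cluster A, w = 1 is cluster B.
w = linspace(0, 1, 101);

% Log-likelihood of the cell at each position, using per-gene probabilities
% interpolated linearly between the two clusters.
log_lik = zeros(size(w));
for k = 1:numel(w)
    p_w = (1 - w(k)) * p_A + w(k) * p_B;
    log_lik(k) = sum(log(p_w));
end

% The best position gives the cell pseudotime by linear interpolation
% of the two cluster pseudotimes.
[max_log_lik, best] = max(log_lik);
cell_pseudotime = (1 - w(best)) * t_A + w(best) * t_B;
```

Repeating the search over every edge and keeping the edge with the highest maximum log-likelihood would then give a cell-to-edge assignment, again under the assumptions stated above.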

4.8

Example 8. Reconstruction of developmental trajectories during zebrafish embryogenesis

Analysis of Drop-seq data of Farrell et al. (Farrell, J. A. et al. Single-cell reconstruction of developmental
trajectories during zebrafish embryogenesis. Science 360, eaar3131 (2018)).

4.8.1

Data Import and Preprocessing

We begin by changing the current directory in MATLAB to the CALISTA folder. Then, we run the
Example_7_FARRELL_scDrop_seq.m script in the main folder of CALISTA and import the Farrell dataset
(available upon request due to the large file size, or run save_to_matlab.R in R to convert the original data into
a MATLAB file).
The following are screenshots from running CALISTA on MATLAB.

CALISTA processes the data from each time point separately:

4.8.2

Single-cell clustering

We run CALISTA clustering for each time point as follows. First, the number of clusters at the final time point is
determined using the eigengap plot. According to the eigengap plot below, we set the number of clusters to 23.
The following are screenshots from CALISTA single-cell clustering analysis.

Then CALISTA will automatically detect the optimal number of clusters for the remaining time points.

If desired, users can remove cells from specific clusters from further analysis. In this example, we do not want to
remove any clusters. Hence, we enter 0 (no cluster removal) and then 1 to proceed with lineage inference.

4.8.3

Reconstruction of lineage progression

During the lineage inference step for time series Drop-seq data, CALISTA automatically generates and displays a
lineage graph, obtained by calculating the shortest path between each cluster at the final cluster progression time
and the starting cluster. The inferred lineage is represented by a tree-based graph:

Here, we do not need to remove any spurious edges, and hence we enter 0 when queried.
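
The shortest-path construction can be illustrated with Dijkstra's algorithm on a small weighted graph. This is a sketch under the assumption that edge weights are cluster distances; it is not CALISTA's own code, and the graph, the distances and the variable names are hypothetical.

```matlab
% Hypothetical weighted graph: W(i,j) is the distance of the edge between
% clusters i and j, Inf when there is no edge (illustrative values only).
W = inf(5);
W(1,2) = 1.0; W(1,3) = 2.5; W(2,3) = 1.0;
W(2,4) = 2.0; W(3,4) = 0.5; W(3,5) = 1.0;
W = min(W, W');                 % make the graph undirected
start_cluster  = 1;
final_clusters = [4 5];

% Dijkstra's algorithm from the starting cluster.
n = size(W, 1);
dist = inf(1, n);
dist(start_cluster) = 0;
prev = zeros(1, n);             % predecessor of each cluster on its shortest path
done = false(1, n);
for step = 1:n
    candidates = dist;
    candidates(done) = Inf;
    [d_min, u] = min(candidates);
    if isinf(d_min), break; end
    done(u) = true;
    for v = find(~done & isfinite(W(u, :)))
        if dist(u) + W(u, v) < dist(v)
            dist(v) = dist(u) + W(u, v);
            prev(v) = u;
        end
    end
end

% Trace each final cluster back to the start; the union of the traced edges
% forms a tree rooted at the starting cluster.
tree = zeros(n);
for f = final_clusters
    v = f;
    while prev(v) ~= 0
        tree(prev(v), v) = 1;
        v = prev(v);
    end
end
```

In this example both final clusters are reached through clusters 2 and 3, so the two shortest paths share their first two edges and the resulting tree has four edges.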

4.8.4

Determination of transition genes

After reconstructing the lineage progression, we identify the key transition genes for any two connected clusters in
the graph, based on the gene-wise likelihood difference between having the cells separately as two clusters and
together as a single cluster. Larger differences in the gene-wise likelihood point to more informative genes. The
transition genes are selected as those whose gene-wise likelihood differences make up more than a certain
percentage of the cumulative sum of the likelihood differences of all genes. This percentage is set by
INPUTS.thr_transition_genes.

4.8.5

Pseudotemporal ordering of cells

For pseudotemporal ordering of cells, CALISTA performs maximum likelihood optimization for each cell using a
linear interpolation of the cell likelihoods between any two connected clusters. The pseudotimes of the cells are
computed by linear interpolation of the cluster pseudotimes, and correspond to the maximum point of the
likelihood optimization above. Cells are subsequently assigned to the edges in the lineage progression graph. The
following screenshot gives the results of this cell-to-edge assignment.

4.8.6

Path analysis

To plot the expression of marker genes along each path, users can enter 1 when queried and load the Excel file
containing the list of genes of interest. In this case the file is in EXAMPLES/FARRELL:

4.9

Example 9. Identification of mouse spinal cord neurons activity during behavior

Analysis of snRNA-seq data of Sathyamurthy et al. (Sathyamurthy, A. et al. Massively Parallel Single
Nucleus Transcriptional Profiling Defines Spinal Cord Neurons and Their Activity during Behavior.
Cell Rep. 22, 2216–2225 (2018)).

4.9.1

Data Import and Preprocessing

We begin by changing the current directory in MATLAB to the CALISTA folder. Then, we run the
Example_8_SATHYAMURTHY_DropNc_seq.m script in the main folder of CALISTA and import the Sathyamurthy
dataset (available upon request due to the large file size).
The following are screenshots from running CALISTA on MATLAB.

4.9.2

Single-cell clustering

The number of clusters is determined using the eigengap plot. According to the eigengap plot below, we set the
number of clusters to 9. The following are screenshots from CALISTA single-cell clustering analysis.
[Figure: Eigengap values plotted against the number of clusters. Legend: First Eigengap: 12; Second Eigengap: 14; Third Eigengap: 9.]

If desired, users can remove cells from specific clusters from further analysis. In this example, we do not want to
remove any clusters. Hence, we enter 0 (no cluster removal) and then 1 to proceed with lineage inference.

We can load the list of marker genes and visualize the mean expression of each predicted cluster:

[Figure: mean expression of marker genes in each predicted cluster. Clusters: Neuron 1, Neuron 2, Oligo, Schwann, Meningeal, Astrocyte, Vascular, OPC, Microglia. Marker genes: Snap25, Syp, Rbfox3, Snhg11, Mbp, Mobp, Mog, Plp1, Mpz, Pmp22, Prx, Dcn, Col3a1, Igf2, Aqp4, Atp1a2, Gja1, Slc1a2, Flt1, Pecam1, Tek, Myl9, Pdgfrb, Cspg4, Gpr17, Pdgfra, Ctss, Itgam, Ptprc.]

4.10

Example 10. Analysis of peripheral blood mononuclear cells (PBMCs)

Analysis of Drop-seq data of Zheng et al. (Zheng, G. X. Y. et al. Massively parallel digital transcriptional
profiling of single cells. Nat. Commun. 8, 14049 (2017)).

4.10.1

Data Import and Preprocessing

We begin by changing the current directory in MATLAB to the CALISTA folder. Then, we run the
Example_8_SATHYAMURTHY_DropNc_seq.m script in the main folder of CALISTA and import the Zheng
dataset (available upon request due to the large file size).
The following are screenshots from running CALISTA on MATLAB.

4.10.2

Single-cell clustering

Following the clustering analysis of the original publication, we set the number of clusters to 10. The following
are screenshots from CALISTA single-cell clustering analysis.
If desired, users can remove cells from specific clusters from further analysis. In this example, we do not want to
remove any clusters. Hence, we enter 0 (no cluster removal) and then 1 to proceed with lineage inference.

5

Questions and comments

Please address any problem or comment to: nanp@ethz.ch or rudiyant@buffalo.edu.
