Description
This dataset accompanies the research article “Shape Matters: Inferring the motility of confluent cells from static images” (Soft Matter, 2025). The data was generated using simulations based on the Cellular Potts model, a computational framework widely used to study collective cell behavior and tissue morphogenesis.
Feature data
The simulations model a confluent layer of cells containing subpopulations that differ in motility. Simulations were run for 6 different conditions (labeled A to F in the table below) and with different numbers of high-motility (Na) and low-motility cells.
| Condition | High-motility cells | Low-motility cells |
|---|---|---|
| A | kappa = 1500 | kappa = 0 |
| B | kappa = 1500 | kappa = 150 |
| C | kappa = 750 | kappa = 150 |
| D | kappa = 500 | kappa = 150 |
| E | kappa = 375 | kappa = 150 |
| F | kappa = 300 | kappa = 150 |
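For scripted analyses, the six conditions can be kept in a small lookup table. The sketch below copies the kappa values listed above; the names `KAPPA` and `kappa_gap` are hypothetical and not part of the dataset, and the difference in kappa is only an illustrative quantity, not a measure defined in the article.

```python
# Motility parameter kappa per condition, copied from the table above:
# (high-motility cells, low-motility cells).
KAPPA = {
    "A": (1500, 0),
    "B": (1500, 150),
    "C": (750, 150),
    "D": (500, 150),
    "E": (375, 150),
    "F": (300, 150),
}


def kappa_gap(condition):
    """Return the difference in kappa between the two subpopulations."""
    high, low = KAPPA[condition]
    return high - low


# List the conditions from the largest to the smallest motility difference.
for condition in sorted(KAPPA, key=kappa_gap, reverse=True):
    print(condition, kappa_gap(condition))
```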
For each individual cell in the simulations, we extracted an extensive list of features (see Table 1 in the manuscript for the definitions). The complete set of extracted features is stored in the folders of this repository.
File Naming Convention:
Files are named using the format:
A/M/ML_data_M_.pkl
Where:
indicates the number of high-motility cells (Na)
denotes the index of the independent simulation replicate
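A minimal sketch of how the feature files might be read in Python, assuming they are standard pickle files and that the repository has been downloaded locally. The function name `load_feature_files` and the folder passed to it are hypothetical; the type of object stored in each file is not stated here, so the sketch returns whatever `pickle.load` yields (if the files hold pandas objects, pandas must be installed).

```python
import pickle
from pathlib import Path


def load_feature_files(condition_dir):
    """Load every .pkl file found under condition_dir.

    Returns a dict that maps each file's path (relative to condition_dir)
    to the unpickled object.
    """
    root = Path(condition_dir)
    loaded = {}
    for path in sorted(root.rglob("*.pkl")):
        with path.open("rb") as handle:
            # pickle.load can execute arbitrary code: only open trusted files.
            loaded[path.relative_to(root).as_posix()] = pickle.load(handle)
    return loaded


# Example call, assuming a local folder named "A" holds one condition:
# features = load_feature_files("A")
```

Keying the result by relative path keeps the number of high-motility cells and the replicate index, which are encoded in the file names, available for later grouping.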
Machine Learning results
The extracted features serve as input to a machine-learning model, which produces a classification report. The model was run either on the complete set of features (All) or on a subset of the features (e.g. Local_and_Shape). The results of these model calculations are stored in the ML folder.
File Naming Convention:
Files are named using the format:
ML/A//on__trained
Where:
indicates which subset of the data has been used
indicates what training set has been used (e.g. itself = same training/testing, 1 = trained on simulations with 1 high-motility cell)
For each condition, 20 independent classification reports are provided.
| Date made available | 19 Jun 2025 |
|---|---|
| Publisher | Zenodo |