import anndata as ad
import pooch
import scanpy as sc
Preprocessing and clustering
=50, facecolor="white") sc.settings.set_figure_params(dpi
The data used in this basic preprocessing and clustering tutorial was collected from bone marrow mononuclear cells of healthy human donors and was part of openproblem’s NeurIPS 2021 benchmarking dataset {cite}luecken2021
. The samples used in this tutorial were measured using the 10X Multiome Gene Expression and Chromatin Accessability kit.
We are reading in the count matrix into an AnnData object, which holds many slots for annotations and different representations of the data.
= pooch.create(
EXAMPLE_DATA =pooch.os_cache("scverse_tutorials"),
path="doi:10.6084/m9.figshare.22716739.v1/",
base_url
) EXAMPLE_DATA.load_registry_from_doi()
= {
samples "s1d1": "s1d1_filtered_feature_bc_matrix.h5",
"s1d3": "s1d3_filtered_feature_bc_matrix.h5",
}= {}
adatas
for sample_id, filename in samples.items():
= EXAMPLE_DATA.fetch(filename)
path = sc.read_10x_h5(path)
sample_adata
sample_adata.var_names_make_unique()= sample_adata
adatas[sample_id]
= ad.concat(adatas, label="sample")
adata adata.obs_names_make_unique()
Downloading file 's1d1_filtered_feature_bc_matrix.h5' from 'doi:10.6084/m9.figshare.22716739.v1/s1d1_filtered_feature_bc_matrix.h5' to '/home/runner/.cache/scverse_tutorials'.
/home/runner/miniconda3/envs/tutorials/lib/python3.12/site-packages/anndata/_core/anndata.py:1756: UserWarning: Variable names are not unique. To make them unique, call `.var_names_make_unique`.
utils.warn_names_duplicates("var")
/home/runner/miniconda3/envs/tutorials/lib/python3.12/site-packages/anndata/_core/anndata.py:1756: UserWarning: Variable names are not unique. To make them unique, call `.var_names_make_unique`.
utils.warn_names_duplicates("var")
Downloading file 's1d3_filtered_feature_bc_matrix.h5' from 'doi:10.6084/m9.figshare.22716739.v1/s1d3_filtered_feature_bc_matrix.h5' to '/home/runner/.cache/scverse_tutorials'.
/home/runner/miniconda3/envs/tutorials/lib/python3.12/site-packages/anndata/_core/anndata.py:1756: UserWarning: Variable names are not unique. To make them unique, call `.var_names_make_unique`.
utils.warn_names_duplicates("var")
/home/runner/miniconda3/envs/tutorials/lib/python3.12/site-packages/anndata/_core/anndata.py:1756: UserWarning: Variable names are not unique. To make them unique, call `.var_names_make_unique`.
utils.warn_names_duplicates("var")
/home/runner/miniconda3/envs/tutorials/lib/python3.12/site-packages/anndata/_core/anndata.py:1754: UserWarning: Observation names are not unique. To make them unique, call `.obs_names_make_unique`.
utils.warn_names_duplicates("obs")
The data contains 8,785 cells and 36,601 measured genes. This tutorial includes a basic preprocessing and clustering workflow.
Quality Control
The scanpy function {func}~scanpy.pp.calculate_qc_metrics
calculates common quality control (QC) metrics, which are largely based on calculateQCMetrics
from scater {cite}McCarthy2017
. One can pass specific gene population to {func}~scanpy.pp.calculate_qc_metrics
in order to calculate proportions of counts for these populations. Mitochondrial, ribosomal and hemoglobin genes are defined by distinct prefixes as listed below.
# mitochondrial genes
"mt"] = adata.var_names.str.startswith("MT-") # "MT-" for human, "Mt-" for mouse
adata.var[# ribosomal genes
"ribo"] = adata.var_names.str.startswith(("RPS", "RPL"))
adata.var[# hemoglobin genes
"hb"] = adata.var_names.str.contains("^HB[^(P)]") adata.var[
=["mt", "ribo", "hb"], inplace=True, log1p=True) sc.pp.calculate_qc_metrics(adata, qc_vars
One can now inspect violin plots of some of the computed QC metrics:
- the number of genes expressed in the count matrix
- the total counts per cell
- the percentage of counts in mitochondrial genes
"n_genes_by_counts", "total_counts", "pct_counts_mt"], jitter=0.4, multi_panel=True) sc.pl.violin(adata, [
Additionally, it is useful to consider QC metrics jointly by inspecting a scatter plot colored by pct_counts_mt
.
"total_counts", "n_genes_by_counts", color="pct_counts_mt") sc.pl.scatter(adata,
Based on the QC metric plots, one could now remove cells that have too many mitochondrial genes expressed or too many total counts by setting manual or automatic thresholds. However, it proved to be beneficial to apply a very permissive filtering strategy in the beginning for your single-cell analysis and filter low quality cells during clustering or revisit the filtering again at a later point. We therefore now only filter cells with less than 100 genes expressed and genes that are detected in less than 3 cells.
Additionally, it is important to note that for datasets with multiple batches, quality control should be performed for each sample individually as quality control thresholds can very substantially between batches.
=100)
sc.pp.filter_cells(adata, min_genes=3) sc.pp.filter_genes(adata, min_cells
Doublet detection
As a next step, we run a doublet detection algorithm. Identifying doublets is crucial as they can lead to misclassifications or distortions in downstream analysis steps. Scanpy contains the doublet detection method Scrublet {cite}Wolock2019
. Scrublet predicts cell doublets using a nearest-neighbor classifier of observed transcriptomes and simulated doublets. {func}scanpy.pp.scrublet
adds doublet_score
and predicted_doublet
to .obs
. One can now either filter directly on predicted_doublet
or use the doublet_score
later during clustering to filter clusters with high doublet scores.
="sample") sc.pp.scrublet(adata, batch_key
Alternative methods for doublet detection within the scverse ecosystem are DoubletDetection and SOLO. You can read more about these in the Doublet Detection chapter of Single Cell Best Practices.
Normalization
The next preprocessing step is normalization. A common approach is count depth scaling with subsequent log plus one (log1p) transformation. Count depth scaling normalizes the data to a “size factor” such as the median count depth in the dataset, ten thousand (CP10k) or one million (CPM, counts per million). The size factor for count depth scaling can be controlled via target_sum
in pp.normalize_total
. We are applying median count depth normalization with log1p transformation (AKA log1PF).
# Saving count data
"counts"] = adata.X.copy() adata.layers[
# Normalizing to median total counts
sc.pp.normalize_total(adata)# Logarithmize the data:
sc.pp.log1p(adata)
Feature selection
As a next step, we want to reduce the dimensionality of the dataset and only include the most informative genes. This step is commonly known as feature selection. The scanpy function pp.highly_variable_genes
annotates highly variable genes by reproducing the implementations of Seurat {cite}Satija2015
, Cell Ranger {cite}Zheng2017
, and Seurat v3 {cite}stuart2019comprehensive
depending on the chosen flavor
.
=2000, batch_key="sample") sc.pp.highly_variable_genes(adata, n_top_genes
sc.pl.highly_variable_genes(adata)
Dimensionality Reduction
Reduce the dimensionality of the data by running principal component analysis (PCA), which reveals the main axes of variation and denoises the data.
sc.tl.pca(adata)
Let us inspect the contribution of single PCs to the total variance in the data. This gives us information about how many PCs we should consider in order to compute the neighborhood relations of cells, e.g. used in the clustering function {func}~scanpy.tl.leiden
or {func}~scanpy.tl.tsne
. In our experience, often a rough estimate of the number of PCs does fine.
=50, log=True) sc.pl.pca_variance_ratio(adata, n_pcs
Visualization
Let us compute the neighborhood graph of cells using the PCA representation of the data matrix.
sc.pp.neighbors(adata)
/home/runner/miniconda3/envs/tutorials/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
We suggest embedding the graph in two dimensions using UMAP (McInnes et al., 2018), see below.
sc.tl.umap(adata)
We can now visualize the UMAP according to the sample
.
="sample") sc.pl.umap(adata, color
Even though the data considered in this tutorial includes two different samples, we only observe a minor batch effect and we can continue with clustering and annotation of our data.
If you inspect batch effects in your UMAP it can be beneficial to integrate across samples and perform batch correction/integration.
Clustering
As with Seurat and many other frameworks, we recommend the Leiden graph-clustering method (community detection based on optimizing modularity) {cite}traag2019louvain
. Note that Leiden clustering directly clusters the neighborhood graph of cells, which we already computed in the previous section.
="igraph") sc.tl.leiden(adata, flavor
=["leiden"]) sc.pl.umap(adata, color
Re-assess quality control and cell filtering
As indicated before, we will now re-assess our filtering strategy by visualizing different QC metrics using UMAP.
"predicted_doublet"] = adata.obs["predicted_doublet"].astype("category")
adata.obs[
sc.pl.umap(
adata,=["leiden", "predicted_doublet", "doublet_score"],
color# increase horizontal space between panels
=0.5,
wspace )
We can now subset the AnnData object to exclude cells predicted as doublets:
= adata[~adata.obs["predicted_doublet"].to_numpy()].copy() adata
sc.pl.umap(=["leiden", "log1p_total_counts", "pct_counts_mt", "log1p_n_genes_by_counts"], wspace=0.5, ncols=2
adata, color )
Cell-type annotation
import celltypist as ct
import decoupler as dc
We have now reached a point where we have obtained a set of cells with decent quality, and we can proceed to their annotation to known cell types. Typically, this is done using genes that are exclusively expressed by a given cell type, or in other words these genes are the marker genes of the cell types, and are thus used to distinguish the heterogeneous groups of cells in our data. Previous efforts have collected and curated various marker genes into available resources, such as CellMarker, TF-Marker, and PanglaoDB.
Commonly and classically, cell type annotation uses those marker genes subsequent to the grouping of the cells into clusters. So, let’s generate a set of clustering solutions which we can then use to annotate our cell types. Here, we will use the Leiden clustering algorithm which will extract cell communities from our nearest neighbours graph.
="igraph", key_added="leiden_res0_02", resolution=0.02)
sc.tl.leiden(adata, flavor="igraph", key_added="leiden_res0_5", resolution=0.5)
sc.tl.leiden(adata, flavor="igraph", key_added="leiden_res2", resolution=2) sc.tl.leiden(adata, flavor
Notably, the number of clusters that we define is largely arbitrary, and so is the resolution
parameter that we use to control for it. As such, the number of clusters is ultimately bound to the stable and biologically-meaningful groups that we can ultimately distringuish, typically done by experts in the corresponding field or by using expert-curated prior knowledge in the form of markers.
sc.pl.umap(
adata,=["leiden_res0_02", "leiden_res0_5", "leiden_res2"],
color="on data",
legend_loc )
Though UMAPs should not be over-interpreted, here we can already see that in the highest resolution our data is over-clustered, while the lowest resolution is likely grouping cells which belong to distinct cell identities.
Marker gene set
Let’s define a set of marker genes for the main cell types that we expect to see in this dataset. These were adapted from Single Cell Best Practices annotation chapter, for a more detailed overview and best practices in cell type annotation, we refer the user to it.
= {
marker_genes "CD14+ Mono": ["FCN1", "CD14"],
"CD16+ Mono": ["TCF7L2", "FCGR3A", "LYN"],
# Note: DMXL2 should be negative
"cDC2": ["CST3", "COTL1", "LYZ", "DMXL2", "CLEC10A", "FCER1A"],
"Erythroblast": ["MKI67", "HBA1", "HBB"],
# Note HBM and GYPA are negative markers
"Proerythroblast": ["CDK6", "SYNGR1", "HBM", "GYPA"],
"NK": ["GNLY", "NKG7", "CD247", "FCER1G", "TYROBP", "KLRG1", "FCGR3A"],
"ILC": ["ID2", "PLCG2", "GNLY", "SYNE1"],
"Naive CD20+ B": ["MS4A1", "IL4R", "IGHD", "FCRL1", "IGHM"],
# Note IGHD and IGHM are negative markers
"B cells": ["MS4A1", "ITGB1", "COL4A4", "PRDM1", "IRF4", "PAX5", "BCL11A", "BLK", "IGHD", "IGHM"],
"Plasma cells": ["MZB1", "HSP90B1", "FNDC3B", "PRDM1", "IGKC", "JCHAIN"],
"Plasmablast": ["XBP1", "PRDM1", "PAX5"], # Note PAX5 is a negative marker
"CD4+ T": ["CD4", "IL7R", "TRBC2"],
"CD8+ T": ["CD8A", "CD8B", "GZMK", "GZMA", "CCL5", "GZMB", "GZMH", "GZMA"],
"T naive": ["LEF1", "CCR7", "TCF7"],
"pDC": ["GZMB", "IL3RA", "COBLL1", "TCF4"],
}
def group_max(adata: sc.AnnData, groupby: str) -> str:
import pandas as pd
= sc.get.aggregate(adata, by=groupby, func="mean")
agg return pd.Series(agg.layers["mean"].sum(1), agg.obs[groupby]).idxmax()
="leiden_res0_02") sc.pl.dotplot(adata, marker_genes, groupby
Here, we can see that cluster {eval}group_max(adata[:, marker_genes["NK"]], "leiden_res0_02")
perhaps contains an admixture of monocytes and dendritic cells, while in cluster {eval}group_max(adata[:, marker_genes["B cells"]], "leiden_res0_02")
we have different populations of B lymphocytes. Thus, we should perhaps consider a higher clustering resolution.
="leiden_res0_5") sc.pl.dotplot(adata, marker_genes, groupby
This seems like a resolution that suitable to distinguish most of the different cell types in our data. Ideally, one would look specifically into each cluster, and attempt to subcluster those if required.
Automatic label prediction
In addition to using marker collections to annotate our labels, there exist approaches to automatically annotate scRNA-seq datasets. One such tool is CellTypist, which uses gradient-descent optimised logistic regression classifiers to predict cell type annotations.
First, we need to retrive the CellTypist models that we wish to use, in this case we will use models with immune cell type and subtype populations generated using 20 tissues from 18 studies (Domínguez Conde, et al. 2022).
=["Immune_All_Low.pkl"], force_update=True) ct.models.download_models(model
📜 Retrieving model list from server https://celltypist.cog.sanger.ac.uk/models/models.json
📚 Total models in list: 50
📂 Storing models in /home/runner/.celltypist/data/models
💾 Total models to download: 1
💾 Downloading model [1/1]: Immune_All_Low.pkl
Then we predict the major cell type annotations. In this case we will enable majority_voting
, which will assign a label to the clusters that we obtained previously.
= ct.models.Model.load(model="Immune_All_Low.pkl")
model = ct.annotate(adata, model="Immune_All_Low.pkl", majority_voting=True, over_clustering="leiden_res0_5")
predictions # convert back to anndata||
= predictions.to_adata() adata
⚠️ Warning: invalid expression matrix, expect ALL genes and log1p normalized expression to 10000 counts per cell. The prediction result may not be accurate
🔬 Input data has 16828 cells and 23427 genes
🔗 Matching reference genes in the model
🧬 5852 features used for prediction
⚖️ Scaling input data
🖋️ Predicting labels
✅ Prediction done!
🗳️ Majority voting the predictions
✅ Majority voting done!
Let’s examine the results of automatic clustering:
="majority_voting", ncols=1) sc.pl.umap(adata, color
Note that our previously ‘Unknown’ cluster is now assigned as ‘Pro-B cells’.
Annotation with enrichment analysis
Automatic cell type labelling with methods that require pre-trained models will not always work as smoothly, as such classifiers need to be trained on and be representitive for a given tissue and the cell types within it. So, as a more generalizable approach to annotate the cells, we can also use the marker genes from any database, for example PanglaoDB. Here we will use it with simple multi-variate linear regression, implemented in decoupler. Essentially, this will test if any collection of genes are enriched in any of the cells. Ultimately, this approach is similar to many other marker-based classifiers.
Let’s get canonical cell markers using with decoupler which queries the OmniPath metadata-base to obtain the PanglaoDB marker gene database with cannonical cell type markers.
# Query Omnipath and get PanglaoDB
= dc.get_resource(name="PanglaoDB", organism="human")
markers # Keep canonical cell type markers alone
= markers[markers["canonical_marker"]]
markers
# Remove duplicated entries
= markers[~markers.duplicated(["cell_type", "genesymbol"])]
markers markers.head()
Downloading data from `https://omnipathdb.org/queries/enzsub?format=json`
Downloading data from `https://omnipathdb.org/queries/interactions?format=json`
Downloading data from `https://omnipathdb.org/queries/complexes?format=json`
Downloading data from `https://omnipathdb.org/queries/annotations?format=json`
Downloading data from `https://omnipathdb.org/queries/intercell?format=json`
Downloading data from `https://omnipathdb.org/about?format=text`
Downloading data from `https://omnipathdb.org/resources?format=json`
0.00B [00:00, ?B/s]56.5kB [00:00, 168MB/s]
Downloading annotations for all proteins from the following resources: `['PanglaoDB']`
Downloading data from `https://omnipathdb.org/annotations?entity_types=protein&format=tsv&resources=PanglaoDB`
0.00B [00:00, ?B/s]449kB [00:00, 4.59MB/s]1.78MB [00:00, 6.86MB/s]4.76MB [00:00, 15.6MB/s]5.91MB [00:00, 15.9MB/s]
genesymbol | canonical_marker | cell_type | germ_layer | human | human_sensitivity | human_specificity | mouse | mouse_sensitivity | mouse_specificity | ncbi_tax_id | organ | ubiquitiousness | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | CTRB1 | True | Acinar cells | Endoderm | True | 1.000000 | 0.000629 | True | 0.957143 | 0.015920 | 9606 | Pancreas | 0.017 |
2 | KLK1 | True | Endothelial cells | Mesoderm | True | 0.000000 | 0.008420 | True | 0.000000 | 0.014915 | 9606 | Vasculature | 0.013 |
5 | KLK1 | True | Principal cells | Mesoderm | True | 0.000000 | 0.008145 | True | 0.285714 | 0.014058 | 9606 | Kidney | 0.013 |
6 | KLK1 | True | Acinar cells | Endoderm | True | 0.833333 | 0.005031 | True | 0.314286 | 0.012826 | 9606 | Pancreas | 0.013 |
7 | KLK1 | True | Plasmacytoid dendritic cells | Mesoderm | True | 0.000000 | 0.008202 | True | 1.000000 | 0.012914 | 9606 | Immune system | 0.013 |
=adata, net=markers, weight=None, source="cell_type", target="genesymbol", verbose=True, use_raw=False) dc.run_mlm(mat
Running mlm on mat with 16828 samples and 23427 targets for 118 sources.
0%| | 0/2 [00:00<?, ?it/s] 50%|█████ | 1/2 [00:05<00:05, 5.44s/it]100%|██████████| 2/2 [00:09<00:00, 4.45s/it]100%|██████████| 2/2 [00:09<00:00, 4.60s/it]
The obtained results are stored in the .obsm key, with mlm_estimate
representing coefficient t-values:
"mlm_estimate"].head() adata.obsm[
Endothelial cells | Principal cells | Acinar cells | Plasmacytoid dendritic cells | Paneth cells | Fibroblasts | Ductal cells | Macrophages | Luminal epithelial cells | Pulmonary alveolar type II cells | ... | Enterochromaffin cells | Urothelial cells | Peritubular myoid cells | Germ cells | Distal tubule cells | Follicular cells | Taste receptor cells | GABAergic neurons | Meningeal cells | Merkel cells | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AAACCCAAGGATGGCT-1 | -0.707396 | -0.849383 | -0.845672 | -0.492002 | -1.367829 | 0.578436 | 0.238030 | -1.056763 | 2.040976 | -0.677132 | ... | -0.254653 | -0.732992 | -0.524543 | -0.425869 | 0.134747 | -0.597432 | -0.938093 | -0.469835 | 0.039471 | -0.104978 |
AAACCCAAGGCCTAGA-1 | -0.222257 | 0.336049 | 0.262638 | 0.574957 | 1.222508 | 2.367615 | 0.674121 | -1.631422 | 1.335576 | -0.514058 | ... | -0.363779 | -1.063999 | 0.262833 | -0.071403 | -1.137959 | -0.811549 | -0.769173 | -0.573635 | -0.149589 | -1.024753 |
AAACCCAAGTGAGTGC-1 | -0.619661 | 0.646598 | 0.380288 | -1.134570 | 0.735214 | 0.339773 | -1.128929 | 1.977640 | 2.373118 | -0.229449 | ... | -0.106915 | -0.388254 | -0.557494 | 1.438753 | -0.358688 | -0.401626 | -0.733657 | -0.284470 | 0.014862 | -0.348243 |
AAACCCACAAGAGGCT-1 | -0.259110 | -1.807428 | -1.449395 | 12.765751 | -1.157377 | 2.245884 | 0.020253 | 1.010314 | 2.840580 | -1.037617 | ... | -0.700302 | -1.006705 | 2.407655 | -0.961870 | -0.362841 | -0.810489 | -0.560645 | -0.565012 | -0.357717 | 0.842545 |
AAACCCACATCGTGGC-1 | 0.826456 | -0.495052 | 1.222837 | -1.506009 | -0.619046 | 1.809103 | -0.611490 | 1.060724 | -0.686973 | 0.737467 | ... | -0.995505 | -0.386358 | -0.702472 | -0.736160 | -0.276282 | -0.195236 | 1.622501 | -0.207567 | -0.410453 | 3.025584 |
5 rows × 118 columns
To visualize the obtianed scores, we can re-use any of scanpy’s plotting functions. First though, we will extract them from the adata object.
= dc.get_acts(adata=adata, obsm_key="mlm_estimate")
acts
sc.pl.umap(
acts,=[
color"majority_voting",
"B cells",
"T cells",
"Monocytes",
"Erythroid-like and erythroid precursor cells",
"NK cells",
],=0.5,
wspace=3,
ncols )
These results further confirm the our automatic annotation with Celltypist. In addition, here we can also transfer the max over-representation score estimates to assign a label to each cluster.
= dc.summarize_acts(acts, groupby="leiden_res0_5", min_std=1.0)
mean_enr = dc.assign_groups(mean_enr)
annotation_dict "dc_anno"] = [annotation_dict[clust] for clust in adata.obs["leiden_res0_5"]] adata.obs[
Let’s compare all resulting annotations here
=["majority_voting", "dc_anno"], ncols=1) sc.pl.umap(adata, color
... storing 'dc_anno' as categorical
Great. We can see that the different approaches to annotate the data are largely concordant. Though, these annotations are decent, cell type annotation is laborous and repetitive task, one which typically requires multiple rounds of sublucstering and re-annotation. Nevertheless, we now have a good basis with which we can further proceed with manually refining our annotations.
Differentially-expressed Genes as Markers
Furthermore, one can also calculate marker genes per cluster and then look up whether we can link those marker genes to any known biology, such as cell types and/or states. This is typically done using simple statistical tests, such as Wilcoxon and t-test, for each cluster vs the rest.
# Obtain cluster-specific differentially expressed genes
="leiden_res0_5")
sc.tl.rank_genes_groups(adata, groupby# Filter those
=1.5) sc.tl.filter_rank_genes_groups(adata, min_fold_change
We can then visualize the top 5 differentially-expressed genes on a dotplot.
="leiden_res0_5", standard_scale="var", n_genes=5) sc.pl.rank_genes_groups_dotplot(adata, groupby
WARNING: dendrogram data not found (using key=dendrogram_leiden_res0_5). Running `sc.tl.dendrogram` with default parameters. For fine tuning it is recommended to run `sc.tl.dendrogram` independently.
We see that LYZ, ACT8, S100A6, S100A4, and CST3 are all highly expressed in cluster 3
. Let’s visualize those at the UMAP space:
= ["LYZ", "ACTB", "S100A6", "S100A4", "CST3"]
cluster3_genes =[*cluster3_genes, "leiden_res0_5"], legend_loc="on data", frameon=False, ncols=3) sc.pl.umap(adata, color
Similarly, we can also generate a Violin plot with the distrubtions of the same genes across the clusters.
=cluster3_genes[0:3], groupby="leiden_res0_5", multi_panel=True) sc.pl.violin(adata, keys
Using categorical units to plot a list of strings that are all parsable as floats or dates. If these strings should be plotted as numbers, cast to the appropriate data type before plotting.
Using categorical units to plot a list of strings that are all parsable as floats or dates. If these strings should be plotted as numbers, cast to the appropriate data type before plotting.
Using categorical units to plot a list of strings that are all parsable as floats or dates. If these strings should be plotted as numbers, cast to the appropriate data type before plotting.
Using categorical units to plot a list of strings that are all parsable as floats or dates. If these strings should be plotted as numbers, cast to the appropriate data type before plotting.
Using categorical units to plot a list of strings that are all parsable as floats or dates. If these strings should be plotted as numbers, cast to the appropriate data type before plotting.
Using categorical units to plot a list of strings that are all parsable as floats or dates. If these strings should be plotted as numbers, cast to the appropriate data type before plotting.
Using categorical units to plot a list of strings that are all parsable as floats or dates. If these strings should be plotted as numbers, cast to the appropriate data type before plotting.
Using categorical units to plot a list of strings that are all parsable as floats or dates. If these strings should be plotted as numbers, cast to the appropriate data type before plotting.
Using categorical units to plot a list of strings that are all parsable as floats or dates. If these strings should be plotted as numbers, cast to the appropriate data type before plotting.
Using categorical units to plot a list of strings that are all parsable as floats or dates. If these strings should be plotted as numbers, cast to the appropriate data type before plotting.
Using categorical units to plot a list of strings that are all parsable as floats or dates. If these strings should be plotted as numbers, cast to the appropriate data type before plotting.
Using categorical units to plot a list of strings that are all parsable as floats or dates. If these strings should be plotted as numbers, cast to the appropriate data type before plotting.
Using categorical units to plot a list of strings that are all parsable as floats or dates. If these strings should be plotted as numbers, cast to the appropriate data type before plotting.
Using categorical units to plot a list of strings that are all parsable as floats or dates. If these strings should be plotted as numbers, cast to the appropriate data type before plotting.
Using categorical units to plot a list of strings that are all parsable as floats or dates. If these strings should be plotted as numbers, cast to the appropriate data type before plotting.
Using categorical units to plot a list of strings that are all parsable as floats or dates. If these strings should be plotted as numbers, cast to the appropriate data type before plotting.
Using categorical units to plot a list of strings that are all parsable as floats or dates. If these strings should be plotted as numbers, cast to the appropriate data type before plotting.
Using categorical units to plot a list of strings that are all parsable as floats or dates. If these strings should be plotted as numbers, cast to the appropriate data type before plotting.