Beyond the annotation bottleneck: AI tactics for large-scale screens

What stood out across the BC2 2025 seminar was how fast “large‑screen” biology (gigapixel slides, spatial omics, and high‑content perturbation assays) is learning to live with fewer, better labels. Rather than chasing an ever‑growing mountain of annotations, groups are cleaning data locally, aligning tissues globally, and inviting experts back into the loop exactly where algorithms disagree. The result isn’t headline‑grabbing automation so much as dependable acceleration.

Stephanie Hicks’ team offered the clearest blueprint for that shift on the spatial side. First, fix the data where biology and artifacts get confounded. In spot‑based platforms, library size varies with tissue composition, so white matter looks “low quality” by global thresholds and gets thrown away. SpotSweeper swaps those brittle global cutoffs for local QC, flagging outliers and region‑level artifacts relative to each spot’s neighborhood rather than the whole slide. It’s a pragmatic change that preserves real signal and standardizes what downstream models see [1].

Second, integrate modalities to define structure once and reuse it everywhere. Proust, a graph‑based, contrastive autoencoder, fuses RNA, protein immunofluorescence, and H&E into spatial domains that map cleanly onto known anatomy and travel across datasets. The point isn’t a new leaderboard score; it’s a reusable scaffold of tissue organization for analysis and model training [2].
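The contrastive intuition behind this kind of multimodal fusion fits in a few lines: embeddings of the same spot from two modalities should score higher against each other than against other spots. The InfoNCE‑style loss below, on simulated embeddings, is a stand‑in for that idea, not Proust’s actual graph autoencoder.

```python
import numpy as np

def info_nce(a, b, temperature=0.1):
    """InfoNCE loss over row-matched embeddings a, b of shape (n, d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                 # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_p))                # matched pairs on the diagonal

rng = np.random.default_rng(1)
z = rng.normal(size=(64, 8))                   # shared "tissue domain" signal
rna = z + 0.1 * rng.normal(size=z.shape)       # RNA view of each spot
prot = z + 0.1 * rng.normal(size=z.shape)      # protein view of each spot

aligned = info_nce(rna, prot)
shuffled = info_nce(rna, prot[rng.permutation(64)])
print(f"aligned loss {aligned:.2f} < shuffled loss {shuffled:.2f}")
```

Minimizing such a loss pulls the two modality views of each spot together, which is what makes the resulting domains reusable across datasets.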

Third, line up sections across samples and atlases so “where” is comparable before we ask “what changes.” STalign uses diffeomorphic metric mapping to register spatial datasets, even with partial overlaps, and align them to 3D frameworks, enabling multi‑sample synthesis and consistent region‑of‑interest definitions without hand‑tuned warps [3].
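The simplest version of that registration problem, a rigid rotation plus translation between two sections’ spot coordinates, has a closed‑form Procrustes solution; diffeomorphic methods like STalign generalize to nonlinear, partially overlapping cases. A toy sketch on simulated coordinates:

```python
import numpy as np

rng = np.random.default_rng(2)
ref = rng.uniform(0, 100, size=(200, 2))          # reference section's spots

theta = np.deg2rad(30)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
mov = ref @ R_true.T + np.array([15.0, -7.0])     # same spots, rotated + shifted

# Procrustes: center both clouds, solve for the rotation with an SVD.
ref_c, mov_c = ref - ref.mean(0), mov - mov.mean(0)
U, _, Vt = np.linalg.svd(mov_c.T @ ref_c)
D = np.diag([1.0, np.sign(np.linalg.det(U @ Vt))])  # guard against reflections
R = U @ D @ Vt
aligned = mov_c @ R + ref.mean(0)

rmse = np.sqrt(((aligned - ref) ** 2).sum(1).mean())
print(f"residual RMSE after rigid alignment: {rmse:.2e}")
```

Real sections also deform nonrigidly and overlap only partially, which is exactly where the diffeomorphic machinery earns its keep.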

If spatial pipelines are becoming cleaner and more comparable, the field is also getting more honest about uncertainty. Jieran Sun and colleagues stress‑tested 22 spatially aware clustering methods across 15 datasets. No single tool won; manual “ground truth” wasn’t reliably so; and performance swung with platform and tissue. Their SACCELERATOR workflow lands on a consensus map and, crucially, marks high‑entropy regions where methods disagree, explicit invitations for expert review rather than silent model drift [4]. That pattern mirrors what’s working in digital pathology: concentrate expert time where the algorithm can’t be confident.
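The disagreement‑flagging step is easy to illustrate. Assuming label spaces have already been harmonized across methods (the hard part, glossed over here), the entropy of each spot’s label votes marks where methods split; this toy reduction is not the SACCELERATOR workflow itself.

```python
import numpy as np

def vote_entropy(labels):
    """labels: (n_methods, n_spots) integer domain labels in a shared vocabulary."""
    n_methods, n_spots = labels.shape
    ent = np.empty(n_spots)
    for s in range(n_spots):
        _, counts = np.unique(labels[:, s], return_counts=True)
        p = counts / n_methods
        ent[s] = -(p * np.log(p)).sum()   # 0 when all methods agree
    return ent

# Three methods agree on spots 0-3 and split three ways on spot 4.
labels = np.array([[0, 0, 1, 1, 0],
                   [0, 0, 1, 1, 1],
                   [0, 0, 1, 1, 2]])
ent = vote_entropy(labels)
review = np.flatnonzero(ent > 0.5)        # hypothetical review threshold
print("entropies:", ent.round(2), "-> spots for expert review:", review)
```

Only the contested spot goes to the expert; the rest of the slide inherits the consensus label without human time.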

On the high‑content screening side, the message was similar: benchmark what matters biologically, not only statistically. Jean Radig’s scArchon puts single‑cell perturbation predictors through side‑by‑side tests that include gene‑level readouts of the actual perturbation signature. Some deep models that shine on aggregate metrics miss those core signatures, while simpler baselines hold up, evidence that rigor in evaluation is at least as valuable as novelty in architecture for drug‑response forecasting [5].
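A synthetic example of why aggregate metrics mislead: a model that simply echoes the control profile earns a high overall correlation with the perturbed profile, because most genes don’t change, yet recovers none of the signature‑gene effects. All numbers below are simulated, not from scArchon.

```python
import numpy as np

rng = np.random.default_rng(3)
n_genes = 2000
sig = np.arange(20)                                 # 20 true signature genes
control = rng.gamma(2.0, 1.0, n_genes)              # control expression profile
true_pert = control.copy()
true_pert[sig] += rng.normal(3.0, 0.5, sig.size)    # the real perturbation effect

pred = control + rng.normal(0, 0.05, n_genes)       # "no-change" predictor

# Aggregate metric: correlation of whole predicted vs observed profiles.
overall_r = np.corrcoef(pred, true_pert)[0, 1]

# Gene-level metric: correlation of per-gene effect sizes on signature genes.
eff_true = true_pert - control
eff_pred = pred - control
sig_r = np.corrcoef(eff_pred[sig], eff_true[sig])[0, 1]

print(f"overall r = {overall_r:.3f}, signature-gene effect r = {sig_r:.3f}")
```

The first number looks publishable; the second shows the model learned nothing about the perturbation, which is the failure mode gene‑level readouts are designed to expose.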

The organoid screening efforts pushed in a compatible direction: automate the mechanics (thousands of samples across tissues), quantify effects with platform‑agnostic distances, and then use model‑flagged outliers to call in domain expertise. It’s less “AI replaces the pathologist” and more “AI narrows the search space so the pathologist and the biologist can decide.” For now, the most reproducible wins come from careful pipelines and explicit uncertainty, not from end‑to‑end black boxes.
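One common choice of platform‑agnostic distance is the energy distance between control and treated populations of per‑sample feature vectors; whether the organoid pipelines use exactly this statistic wasn’t specified, so treat it as illustrative, with simulated data.

```python
import numpy as np
from scipy.spatial.distance import cdist

def energy_distance(X, Y):
    """Energy distance between two samples of feature vectors (rows)."""
    dxy = cdist(X, Y).mean()        # mean cross-sample distance
    dxx = cdist(X, X).mean()        # mean within-X distance
    dyy = cdist(Y, Y).mean()        # mean within-Y distance
    return 2 * dxy - dxx - dyy      # ~0 for identical distributions

rng = np.random.default_rng(4)
control = rng.normal(0.0, 1, size=(150, 10))
weak    = rng.normal(0.2, 1, size=(150, 10))   # small perturbation effect
strong  = rng.normal(2.0, 1, size=(150, 10))   # large perturbation effect

e_weak = energy_distance(control, weak)
e_strong = energy_distance(control, strong)
print(f"weak effect E-distance = {e_weak:.2f}, strong effect = {e_strong:.2f}")
```

Because it only needs pairwise distances, the same effect score works whether the per‑sample features come from imaging, transcriptomics, or morphology embeddings, which is what makes ranking and outlier‑flagging comparable across platforms.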

Taken together, these threads offer a practical answer to the label bottleneck in high‑throughput biology:

  • Make the screen clean. Local QC protects biology from global heuristics and removes barcode‑ or chemistry‑driven artifacts before they poison training [1].
  • Make space consistent. Atlas‑aware alignment and multimodal domain detection let models learn over shared anatomical units rather than slide‑specific quirks [3, 2].
  • Make uncertainty visible. Consensus maps and gene‑level evaluations show where models fail and where experts should focus, speeding iteration without overclaiming generalization [4, 5].

If industry follows this arc, the near‑term win won’t be a single, universal annotator. It will be a reliable reduction: fewer labels, placed where they matter; fewer artifacts, caught before they spread; and fewer silent failures, replaced by flagged ambiguity. That’s how big screens get faster, and how AI earns trust in the loop rather than trying to step out of it.

Milad Adibi

References

  • [1] Totty, M., S. C. Hicks, and B. Guo. 2025. “SpotSweeper: Spatially Aware Quality Control for Spatial Transcriptomics.” Nature Methods 22: 1520–1530.
  • [2] Yao, J., et al. 2024. “Spatial Domain Detection Using Contrastive Self‑Supervised Learning for Spatial Multi‑Omics Technologies.” bioRxiv (February 2, 2024). https://doi.org/10.1101/2024.02.02.578662.
  • [3] Clifton, K., et al. 2023. “STalign: Alignment of Spatial Transcriptomics Data Using Diffeomorphic Metric Mapping.” Nature Communications 14: 8123.
  • [4] Sun, J., et al. 2025. “Beyond Benchmarking: An Expert‑Guided Consensus Approach to Spatially Aware Clustering.” bioRxiv (June 23, 2025). https://doi.org/10.1101/2025.06.23.660861.
  • [5] Radig, J., et al. 2025. “Tracking Biological Hallucinations in Single‑Cell Perturbation Predictions Using scArchon, a Comprehensive Benchmarking Platform.” bioRxiv (June 23, 2025). https://doi.org/10.1101/2025.06.23.661046.