ArXiv AI • Fresh, collected in 5h
ZeroFolio: Domain-Free Algorithm Selection

Beats hand-crafted features for algorithm selection across 7 domains with zero domain expertise
30-Second TL;DR
What Changed
Uses pretrained text embeddings on raw instance files without domain knowledge
Why It Matters
Simplifies algorithm selection in AutoML by eliminating the need for feature engineering. Enables cross-domain portability, potentially accelerating the construction of solver portfolios for optimization tasks.
What To Do Next
Test ZeroFolio on ASlib datasets using Sentence Transformers for embeddings.
Who should care: Researchers & Academics
Deep Insight
Enhanced Key Takeaways
- ZeroFolio addresses the 'feature engineering bottleneck' in Algorithm Selection (AS) by bypassing the need for domain-specific feature extractors, which are notoriously difficult and expensive to design for new problem classes.
- The approach leverages the inherent structural information present in raw problem files (e.g., DIMACS format for SAT), treating them as unstructured text to capture latent features via Large Language Model (LLM) embeddings.
- By utilizing a non-parametric k-NN approach, ZeroFolio avoids the training overhead associated with deep learning-based end-to-end selectors, making it highly adaptable to new domains without retraining the core model.
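The raw-text idea in the bullets above can be sketched end to end. The hashed bag-of-tokens embedding below is a deliberately simple, self-contained stand-in for the pretrained sentence-transformer encoders the paper relies on; `embed_instance` and the toy DIMACS string are illustrative, not from the paper.

```python
import hashlib

def embed_instance(raw_text: str, dim: int = 64) -> list[float]:
    """Toy stand-in for an LLM embedding: hashed bag-of-tokens.

    A real ZeroFolio-style pipeline would instead feed raw_text to a
    pretrained text encoder (e.g. a sentence transformer).
    """
    vec = [0.0] * dim
    for token in raw_text.split():
        # Hash each whitespace token into one of `dim` buckets.
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    return vec

# A tiny DIMACS-format SAT instance, read as plain text.
dimacs = "p cnf 3 2\n1 -2 0\n2 3 0\n"
print(embed_instance(dimacs)[:8])
```

Note that a bag-of-tokens embedding ignores line order, so it is trivially invariant to shuffling clause lines; a transformer encoder is not, which is one motivation for the line-shuffling augmentation discussed below.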
Competitor Analysis
| Feature | ZeroFolio | ASlib-based Random Forests | Deep Learning Selectors (e.g., NeuroSAT) |
|---|---|---|---|
| Feature Engineering | None (Raw text) | Manual/Domain-specific | Learned/End-to-end |
| Training Overhead | Minimal (k-NN) | Moderate | High |
| Generalization | High (Domain-free) | Low (Domain-specific) | Moderate |
| Benchmarks | 11 ASlib scenarios | ASlib standard | Varies by architecture |
Technical Deep Dive
- Embedding Strategy: Utilizes pretrained transformer-based encoders to map raw instance files into high-dimensional vector spaces.
- Line Shuffling: A data augmentation technique applied to instance files to ensure the model remains invariant to the order of constraints or clauses, preventing overfitting to file formatting.
- Distance Metric: Employs Manhattan distance (L1 norm) for k-NN, which has been empirically shown to be more robust than Euclidean distance in high-dimensional embedding spaces for this task.
- Weighting Scheme: Implements inverse-distance weighting to prioritize the performance of the most similar historical instances when predicting the optimal algorithm for a new query.
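Taken together, the bullets above describe a small non-parametric selector. A minimal sketch, assuming each historical instance is stored as an (embedding, per-algorithm runtime) pair; the function names and toy data are illustrative, not the paper's implementation:

```python
def manhattan(a, b):
    """L1 distance between two embedding vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def select_algorithm(query, history, k=3, eps=1e-9):
    """Pick the algorithm with the lowest inverse-distance-weighted
    runtime over the k nearest historical instances.

    history: list of (embedding, {algorithm: runtime}) pairs.
    """
    nearest = sorted(history, key=lambda h: manhattan(query, h[0]))[:k]
    scores, total_w = {}, 0.0
    for emb, runtimes in nearest:
        # Closer neighbours get larger weights; eps guards against
        # division by zero on exact matches.
        w = 1.0 / (manhattan(query, emb) + eps)
        total_w += w
        for algo, t in runtimes.items():
            scores[algo] = scores.get(algo, 0.0) + w * t
    # Lower weighted runtime is better.
    return min(scores, key=lambda a: scores[a] / total_w)

# Toy history: two regions of embedding space favour different solvers.
history = [
    ([0.0, 0.0], {"solver_a": 1.0, "solver_b": 9.0}),
    ([0.1, 0.0], {"solver_a": 2.0, "solver_b": 8.0}),
    ([5.0, 5.0], {"solver_a": 9.0, "solver_b": 1.0}),
]
print(select_algorithm([0.05, 0.0], history, k=2))  # prints "solver_a"
```

Because the selector is just a weighted nearest-neighbour lookup, adding a new domain only means appending embedded instances to `history`; no model is retrained, which matches the adaptability claim in the takeaways.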
Future Implications
ZeroFolio will reduce the barrier to entry for deploying automated algorithm selection in industrial optimization pipelines.
Eliminating the requirement for expert-crafted feature extractors allows non-specialists to apply algorithm selection to proprietary problem formats.
Future iterations will likely integrate multi-modal embeddings to incorporate both structural and semantic information from problem files.
Current text-based embeddings may miss high-level structural properties that could be captured by graph-aware or hybrid neural architectures.
Timeline
2025-09
Initial research phase exploring LLM-based embeddings for combinatorial problem instances.
2026-02
Development of the ZeroFolio framework and validation against standard ASlib benchmarks.
2026-04
Publication of the ZeroFolio research paper on ArXiv.
AI-curated news aggregator. All content rights belong to original publishers.
Original source: ArXiv AI