ArXiv AI • Fresh, collected in 5h
ZeroFolio: Domain-Free Algorithm Selection

Beats hand-crafted features for algorithm selection across 7 domains with zero domain expertise
30-Second TL;DR
What Changed
Uses pretrained text embeddings on raw instance files without domain knowledge
Why It Matters
Simplifies algorithm selection in AutoML by eliminating the need for feature engineering. Enables cross-domain portability, potentially accelerating the construction of solver portfolios for optimization tasks.
What To Do Next
Test ZeroFolio on ASlib datasets using Sentence Transformers for embeddings.
Who should care: Researchers & Academics
Deep Insight
Enhanced Key Takeaways
- ZeroFolio addresses the 'feature engineering bottleneck' in Algorithm Selection (AS) by bypassing the need for domain-specific feature extractors, which are notoriously difficult and expensive to design for new problem classes.
- The approach leverages the inherent structural information present in raw problem files (e.g., DIMACS format for SAT), treating them as unstructured text to capture latent features via Large Language Model (LLM) embeddings.
- By utilizing a non-parametric k-NN approach, ZeroFolio avoids the training overhead associated with deep learning-based end-to-end selectors, making it highly adaptable to new domains without retraining the core model.
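The raw-text idea in the bullets above can be sketched end to end. The hashed bag-of-tokens embedding below is a deliberately simple, self-contained stand-in for the pretrained sentence-transformer encoders the paper relies on; `embed_instance` and the toy DIMACS string are illustrative, not from the paper.

```python
import hashlib

def embed_instance(raw_text: str, dim: int = 64) -> list[float]:
    """Toy stand-in for an LLM embedding: hashed bag-of-tokens.

    A real ZeroFolio-style pipeline would instead feed raw_text to a
    pretrained text encoder (e.g. a sentence transformer).
    """
    vec = [0.0] * dim
    for token in raw_text.split():
        # Hash each whitespace token into one of `dim` buckets.
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    return vec

# A tiny DIMACS-format SAT instance, read as plain text.
dimacs = "p cnf 3 2\n1 -2 0\n2 3 0\n"
print(embed_instance(dimacs)[:8])
```

Note that a bag-of-tokens embedding ignores line order, so it is trivially invariant to shuffling clause lines; a transformer encoder is not, which is one motivation for the line-shuffling augmentation discussed below.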
Competitor Analysis
| Feature | ZeroFolio | ASlib-based Random Forests | Deep Learning Selectors (e.g., NeuroSAT) |
|---|---|---|---|
| Feature Engineering | None (Raw text) | Manual/Domain-specific | Learned/End-to-end |
| Training Overhead | Minimal (k-NN) | Moderate | High |
| Generalization | High (Domain-free) | Low (Domain-specific) | Moderate |
| Benchmarks | 11 ASlib scenarios | ASlib standard | Varies by architecture |
Technical Deep Dive
- Embedding Strategy: Utilizes pretrained transformer-based encoders to map raw instance files into high-dimensional vector spaces.
- Line Shuffling: A data augmentation technique applied to instance files to ensure the model remains invariant to the order of constraints or clauses, preventing overfitting to file formatting.
- Distance Metric: Employs Manhattan distance (L1 norm) for k-NN, which has been empirically shown to be more robust than Euclidean distance in high-dimensional embedding spaces for this task.
- Weighting Scheme: Implements inverse-distance weighting to prioritize the performance of the most similar historical instances when predicting the optimal algorithm for a new query.
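Taken together, the bullets above describe a small non-parametric selector. A minimal sketch, assuming each historical instance is stored as an (embedding, per-algorithm runtime) pair; the function names and toy data are illustrative, not the paper's implementation:

```python
def manhattan(a, b):
    """L1 distance between two embedding vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def select_algorithm(query, history, k=3, eps=1e-9):
    """Pick the algorithm with the lowest inverse-distance-weighted
    runtime over the k nearest historical instances.

    history: list of (embedding, {algorithm: runtime}) pairs.
    """
    nearest = sorted(history, key=lambda h: manhattan(query, h[0]))[:k]
    scores, total_w = {}, 0.0
    for emb, runtimes in nearest:
        # Closer neighbours get larger weights; eps guards against
        # division by zero on exact matches.
        w = 1.0 / (manhattan(query, emb) + eps)
        total_w += w
        for algo, t in runtimes.items():
            scores[algo] = scores.get(algo, 0.0) + w * t
    # Lower weighted runtime is better.
    return min(scores, key=lambda a: scores[a] / total_w)

# Toy history: two regions of embedding space favour different solvers.
history = [
    ([0.0, 0.0], {"solver_a": 1.0, "solver_b": 9.0}),
    ([0.1, 0.0], {"solver_a": 2.0, "solver_b": 8.0}),
    ([5.0, 5.0], {"solver_a": 9.0, "solver_b": 1.0}),
]
print(select_algorithm([0.05, 0.0], history, k=2))  # prints "solver_a"
```

Because the selector is just a weighted nearest-neighbour lookup, adding a new domain only means appending embedded instances to `history`; no model is retrained, which matches the adaptability claim in the takeaways.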
Future Implications
ZeroFolio will reduce the barrier to entry for deploying automated algorithm selection in industrial optimization pipelines.
Eliminating the requirement for expert-crafted feature extractors allows non-specialists to apply algorithm selection to proprietary problem formats.
Future iterations will likely integrate multi-modal embeddings to incorporate both structural and semantic information from problem files.
Current text-based embeddings may miss high-level structural properties that could be captured by graph-aware or hybrid neural architectures.
Timeline
2025-09
Initial research phase exploring LLM-based embeddings for combinatorial problem instances.
2026-02
Development of the ZeroFolio framework and validation against standard ASlib benchmarks.
2026-04
Publication of the ZeroFolio research paper on ArXiv.
AI-curated news aggregator. All content rights belong to original publishers.
Original source: ArXiv AI