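Explores semantic labeling for third-party risk assessment (TPRA) questionnaires using LLMs and a hybrid semi-supervised semantic labeling (SSSL) approach. Compares direct LLM labeling of every question against clustering followed by label propagation. Retrieval improves when the labels are discriminative.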
Key Points
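1. Clusters questions in embedding space
2. Labels a subset with the LLM, then propagates those labels to the remaining questions via k-NN
3. Reduces LLM cost substantially, since only the subset is sent to the LLM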
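The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the 2-D "embeddings" are toy values, `fake_llm_label` is a hypothetical keyword stub standing in for a real LLM call, the clustering is a tiny hand-rolled k-means, and choosing the point nearest each centroid as the labeled subset is one plausible selection rule among several.

```python
import math
from collections import Counter

def kmeans(points, k, iters=10):
    """Tiny k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid, then recompute the means.
        assign = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign, centroids

def pick_representatives(points, assign, centroids):
    """Index of the point closest to each centroid: the subset sent to the LLM."""
    reps = []
    for j, c in enumerate(centroids):
        members = [i for i, a in enumerate(assign) if a == j]
        if members:
            reps.append(min(members, key=lambda i: math.dist(points[i], c)))
    return reps

def propagate_knn(points, labeled, k=1):
    """Give each unlabeled point the majority label of its k nearest labeled points."""
    out = dict(labeled)
    for i, p in enumerate(points):
        if i in labeled:
            continue
        nearest = sorted(labeled, key=lambda j: math.dist(p, points[j]))[:k]
        out[i] = Counter(labeled[j] for j in nearest).most_common(1)[0][0]
    return [out[i] for i in range(len(points))]

calls = []  # records every (stubbed) LLM call so the saving is visible

def fake_llm_label(text):
    """Hypothetical stand-in for an LLM labeling call (keyword rule only)."""
    calls.append(text)
    return "encryption" if "encrypt" in text.lower() else "access-control"

# Toy questionnaire items with made-up 2-D embeddings (assumption).
questions = [
    "Do you encrypt data at rest?",
    "Is TLS enforced for data in transit?",
    "Are backups protected with strong ciphers?",
    "Do you enforce MFA for admins?",
    "How are user access rights reviewed?",
    "Is there a password policy?",
]
embeddings = [(0.9, 0.1), (0.8, 0.2), (1.0, 0.0),
              (0.1, 0.9), (0.2, 0.8), (0.0, 1.0)]

assign, centroids = kmeans(embeddings, k=2)                 # step 1: cluster
reps = pick_representatives(embeddings, assign, centroids)
labeled = {i: fake_llm_label(questions[i]) for i in reps}   # step 2a: label subset
labels = propagate_knn(embeddings, labeled, k=1)            # step 2b: propagate

for q, lab in zip(questions, labels):
    print(f"{lab:15s} {q}")
print(f"LLM calls: {len(calls)} of {len(questions)} questions")  # step 3: cost
```

In this toy run only the cluster representatives reach the stubbed LLM, and every other question inherits a label from its nearest labeled neighbour. With a larger labeled subset, a higher `k` would let the majority vote smooth over occasional mislabeled representatives.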
Impact Analysis
Automates tailoring of cybersecurity assessments. Boosts retrieval beyond keyword matching. Scales to large question repositories, because LLM labeling is limited to a subset.
Technical Details
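Hybrid semi-supervised pipeline: an LLM labels a subset of questions and k-NN propagates those labels to the rest. Retrieval in label space is compared against retrieval by direct similarity. Gains depend on the labels being consistent.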
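To make the contrast between the two retrieval modes concrete, here is a small hedged sketch. The source does not say how label-space matching is scored, so the Jaccard overlap between label sets used below is an assumption, as are the toy vectors, the label names, and the function names `retrieve_direct` and `retrieve_by_label`.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def jaccard(a, b):
    """Overlap between two label sets (assumed scoring rule, not from the source)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve_direct(query_vec, question_vecs, top_k=2):
    """Rank questions by embedding similarity to the query."""
    scores = [(cosine(query_vec, v), i) for i, v in enumerate(question_vecs)]
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))[:top_k]]

def retrieve_by_label(query_labels, question_labels, top_k=2):
    """Rank questions by label overlap with the query; drop zero-overlap hits."""
    scores = [(jaccard(query_labels, ls), i) for i, ls in enumerate(question_labels)]
    ranked = sorted(scores, key=lambda s: (-s[0], s[1]))
    return [i for s, i in ranked[:top_k] if s > 0]

# Toy repository: made-up embeddings and labels (assumptions).
question_vecs = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
question_labels = [
    {"encryption", "data-at-rest"},
    {"encryption", "data-in-transit"},
    {"access-control", "mfa"},
    {"access-control", "password-policy"},
]

query_vec = (1.0, 0.0)
query_labels = {"encryption", "data-in-transit"}

direct_hits = retrieve_direct(query_vec, question_vecs)
label_hits = retrieve_by_label(query_labels, question_labels)
print("direct similarity:", direct_hits)
print("label space:      ", label_hits)
```

In this toy data the two modes return the same questions in a different order, because the labels separate "in transit" from "at rest" where the embeddings barely do. The sketch also shows why consistency matters: if one question were tagged `data_in_transit` and the query `data-in-transit`, the overlap would drop and the label-space ranking would degrade.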