Publications

"Neural Variance-aware Dueling Bandits with Deep Representation and Shallow Exploration." AISTATS, 2026.
"Robust Linear Dueling Bandits with Post-serving Context under Unknown Delays and Adversarial Corruptions." ICML, 2026.
"ATLAS: Adaptive TDA-guided Landscape-Aware Transistor Sizing." ICCAD, 2026.

ICCAD 2026

ATLAS: Adaptive TDA-guided Landscape-Aware Transistor Sizing

ICML 2026

Robust Linear Dueling Bandits with Post-serving Context under Unknown Delays and Adversarial Corruptions

"ATLAS: Adaptive TDA-guided Landscape-Aware Transistor Sizing"

Analog transistor sizing, finding design parameters that simultaneously satisfy multiple performance specifications, is a labor-intensive bottleneck in circuit design. To support analog circuit experts, various automation methods have been proposed, including Bayesian optimization (BO), reinforcement learning (RL), and others. Yet existing methods are oblivious to the topological structure of a feasible design space, which can fragment into disconnected regions due to operating-regime transitions, conflicting specification trade-offs, and nonconvex device physics. This topological blindness causes the optimizer to converge within a single feasible region while missing others that may contain superior designs. To address this limitation, we propose ATLAS, a BO framework utilizing Topological Data Analysis (TDA). At each iteration, a Mapper graph is constructed over a surrogate-predicted feasible region to estimate connected regions, enabling topology-aware exploration from the very first iteration without any observed feasible points. A topological sensitivity score classifies candidates as bridge, frontier, or interior points, injecting a targeted exploration bonus into the acquisition function. Experiments on four analog circuit benchmarks in the GF180 and SKY130 processes demonstrate that ATLAS finds feasible designs with significantly fewer simulations than baselines, including RL and BO methods. To our knowledge, this is the first work to apply topological data analysis to analog circuit design automation.
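To make the idea of a fragmented feasible region concrete, here is a minimal sketch. It does not implement Mapper or the ATLAS pipeline. As a deliberately simpler stand-in, it links predicted-feasible sample points that lie within a distance eps of each other and reports the resulting connected components. The function name, the eps threshold, and the boolean feasibility labels are all assumptions made for illustration.

```python
def feasible_components(points, predicted_feasible, eps):
    """Group predicted-feasible points into connected components.

    Stand-in for a Mapper graph: two feasible points are linked when their
    Euclidean distance is at most eps (a hypothetical threshold).
    """
    idx = [i for i, ok in enumerate(predicted_feasible) if ok]
    parent = {i: i for i in idx}

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for pos, a in enumerate(idx):
        for b in idx[pos + 1:]:
            dist2 = sum((p - q) ** 2 for p, q in zip(points[a], points[b]))
            if dist2 <= eps * eps:
                parent[find(a)] = find(b)

    groups = {}
    for i in idx:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two separated clusters of feasible points plus one infeasible point.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.5, 5.0), (2.5, 2.5)]
feasible = [True, True, True, True, True, False]
components = feasible_components(points, feasible, eps=0.6)
```

In this toy input the feasible points form two components. An optimizer that only refines around the first component would never sample the second, which is the failure mode the abstract calls topological blindness.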

"Robust Linear Dueling Bandits with Post-serving Context under Unknown Delays and Adversarial Corruptions"

We study linear dueling bandits in volatile environments where post-serving contexts, delayed feedback, and adversarial corruption are present simultaneously. Feedback is subject to unknown stochastic or adversarial delays and a cumulative corruption budget C. To address these challenges, we propose RCDP-UCB, which integrates a learned approximator that predicts post-serving contexts from pre-serving information. It further employs an adaptive weighting strategy that clips feature vectors to mitigate the impact of corrupted and delayed observations at the same time. Under standard regularity conditions and a parametric post-serving mapping, we prove that our algorithm is delay-regime-agnostic, achieving a regret upper bound of Õ(d(√T + C + D)), where d is the total feature dimension, C is the corruption budget, and D captures the delay complexity. Crucially, our analysis reveals an additive cost structure between corruption and delay, avoiding the multiplicative degradation typical of prior works. We further establish lower bounds that match our upper bounds up to a √d factor for adversarial delays in the absence of post-serving contexts.
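The adaptive weighting step can be illustrated with a small sketch. The abstract does not state the exact weighting rule of RCDP-UCB, so the form below, w = min(1, alpha / ||z||) with the norm taken with respect to the inverse Gram matrix, is an assumption borrowed from the general style of corruption-robust linear bandit methods. The function names and the threshold alpha are hypothetical.

```python
import math


def quad_form(z, V_inv):
    """Return z^T V_inv z for a vector z and a square matrix V_inv (nested lists)."""
    d = len(z)
    return sum(z[i] * V_inv[i][j] * z[j] for i in range(d) for j in range(d))


def adaptive_weight(z, V_inv, alpha):
    """Assumed rule: weight 1 for well-explored directions, alpha / norm otherwise.

    z is the feature difference of the two duelling arms; a large norm under
    V_inv means the direction is poorly explored, so a corrupted observation
    there could move the estimate a lot and is therefore down-weighted.
    """
    norm = math.sqrt(quad_form(z, V_inv))
    return 1.0 if norm <= alpha else alpha / norm


def weighted_gram_update(V, z, w):
    """Return the new Gram matrix V + w * z z^T."""
    d = len(z)
    return [[V[i][j] + w * z[i] * z[j] for j in range(d)] for i in range(d)]


identity = [[1.0, 0.0], [0.0, 1.0]]
w_large = adaptive_weight([3.0, 4.0], identity, alpha=1.0)   # norm is 5, so weight is 0.2
w_small = adaptive_weight([0.3, 0.4], identity, alpha=1.0)   # norm is 0.5, so weight is 1.0
V_next = weighted_gram_update(identity, [3.0, 4.0], w_large)
```

Under this assumed rule, no single observation can contribute more than a bounded amount to the estimator, which is one way to limit the damage a corrupted or stale sample can do.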

AISTATS 2026

Neural Variance-aware Dueling Bandits with Deep Representation and Shallow Exploration

"Neural Variance-aware Dueling Bandits with Deep Representation and Shallow Exploration"

Reinforcement learning from human feedback (RLHF) is essential for aligning AI with human intent because it transforms the subjective task of evaluation into a series of pairwise "duels" between model outputs, a framework known as the Contextual Dueling Bandit problem. This approach is more reliable than absolute scoring, as humans are better at making relative comparisons than at assigning consistent numerical ratings. Our research introduces NVLDB, which bridges this alignment process with deep learning: it uses neural networks to approximate complex, non-linear human preferences, and we prove that the cumulative error, or regret, remains sublinear under standard assumptions. Previous neural dueling bandit methods incurred massive computational overhead by using the gradients of all learnable parameters, and they required an impractical network width of T^14. In contrast, our approach uses a "shallow exploration" strategy that relies only on the final layer's gradients. This significantly improves computational efficiency, reduces the required network width to a more realistic T^6, and introduces variance-awareness to better handle the inherent noise in human preference feedback.
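A rough sketch of what "shallow exploration" can look like in code follows. It is not the NVLDB algorithm. It assumes a network whose final layer is linear, so the gradient with respect to the final-layer weights is simply the last hidden activation. The single ReLU layer, the confidence scale beta, and all function names are hypothetical.

```python
import math


def relu_features(x, W, b):
    """Hypothetical hidden layer: phi(x) = max(0, W x + b)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def duel_bonus(x_a, x_b, W, b, V_inv, beta=1.0):
    """Exploration bonus for the duel (x_a, x_b) using last-layer features only.

    With a linear final layer, the gradient with respect to its weights is
    phi(x), so the bonus needs a matrix of the hidden width rather than one
    sized by the total parameter count.
    """
    phi_a = relu_features(x_a, W, b)
    phi_b = relu_features(x_b, W, b)
    z = [a - c for a, c in zip(phi_a, phi_b)]
    d = len(z)
    q = sum(z[i] * V_inv[i][j] * z[j] for i in range(d) for j in range(d))
    return beta * math.sqrt(max(q, 0.0))


W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
half = [[0.5, 0.0], [0.0, 0.5]]
bonus_fresh = duel_bonus([1.0, 0.0], [0.0, 1.0], W, b, identity)
bonus_seen = duel_bonus([1.0, 0.0], [0.0, 1.0], W, b, half)
bonus_same = duel_bonus([1.0, 0.0], [1.0, 0.0], W, b, identity)
```

The bonus shrinks as the inverse Gram matrix shrinks, that is, as more duels are observed in that feature direction, and it is zero when both arms map to the same features.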