UT Austin, Google, Google
Abstract
In-context Ranking (ICR) is an emerging paradigm for Information Retrieval
(IR) that leverages the contextual understanding of LLMs by placing the task
description, candidate documents, and query directly in the model's input
prompt and tasking the LLM with identifying the relevant document(s). While
effective, this paradigm faces a significant efficiency challenge, especially
as the candidate list grows, because the attention operation scales
quadratically (super-linearly) with context length. To address this, this paper first
identifies inherent and exploitable structures in the attention of LLMs
fine-tuned for ICR: (1) inter-document block sparsity: attention is dense within
each document block but sparse across different documents in the context; and
(2) query-document block relevance: the attention scores from certain query
tokens to a document block in middle layers strongly correlate with that
document's actual relevance. Motivated by these observations, we introduce
BlockRank (Blockwise In-context Ranking), a novel method that adapts the
attention operation in an LLM by (a) architecturally enforcing the observed
inter-document block sparsity, reducing attention complexity from quadratic to
linear without loss in performance, and (b) optimizing query-document block
relevance for true relevant documents during fine-tuning using an auxiliary
contrastive training objective, improving retrieval in attention. Experiments
on BEIR, MSMarco, and NQ with Mistral-7B demonstrate that BlockRank Mistral
matches or outperforms existing SOTA listwise rankers and a controlled
fine-tuned baseline while being significantly more efficient at inference
(4.7x faster for 100 MSMarco documents in context) and scaling gracefully to
long-context shortlists of around 500 in-context documents (approximately
100K tokens of context) within a second, presenting a scalable and effective
solution for ICR.
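The inter-document block sparsity described above can be illustrated with a small mask-construction sketch. This is a minimal illustration under an assumed prompt layout (shared prefix, then document blocks, then query tokens), not the paper's implementation: document blocks attend only to the shared prefix and to themselves, while query tokens attend to the full context.

```python
import numpy as np

def blockrank_mask(doc_lens, query_len, prefix_len):
    """Boolean attention mask with inter-document block sparsity
    (an illustrative sketch, not the paper's implementation):
    document blocks attend to a shared prefix and to themselves;
    query tokens attend to the full context; everything is causal."""
    total = prefix_len + sum(doc_lens) + query_len
    allowed = np.zeros((total, total), dtype=bool)
    allowed[:prefix_len, :prefix_len] = True  # prefix attends within itself
    start = prefix_len
    for n in doc_lens:
        allowed[start:start + n, :prefix_len] = True      # doc -> prefix
        allowed[start:start + n, start:start + n] = True  # doc -> itself
        start += n
    allowed[start:, :] = True                             # query -> all
    causal = np.tril(np.ones((total, total), dtype=bool))
    return allowed & causal

# 2 prefix tokens, two documents (3 and 4 tokens), 2 query tokens.
mask = blockrank_mask(doc_lens=[3, 4], query_len=2, prefix_len=2)
```

Each document row attends to at most `prefix_len + doc_len` positions, so total attention cost grows linearly in the number of documents rather than quadratically.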
Southern University of Sc
Abstract
Stochastic transitivity is central for rank aggregation based on pairwise
comparison data. The existing models, including the Thurstone, Bradley-Terry
(BT), and nonparametric BT models, adopt a strong notion of stochastic
transitivity, known as strong stochastic transitivity (SST). This assumption
imposes restrictive monotonicity constraints on the pairwise comparison
probabilities, which is often unrealistic for real-world applications. This
paper introduces a maximum score estimator for aggregating ranks, which only
requires the assumption of weak stochastic transitivity (WST), the weakest
assumption needed for the existence of a global ranking. The proposed estimator
allows for sparse settings where the comparisons between many pairs are missing
with possibly nonuniform missingness probabilities. We show that the proposed
estimator is consistent, in the sense that the proportion of discordant pairs
converges to zero in probability as the number of players diverges. We also
establish that the proposed estimator is nearly minimax optimal for the
convergence of a loss function based on Kendall's tau distance. The power of
the proposed method is shown via a simulation study and an application to rank
professional tennis players.
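As a concrete toy version of a maximum score estimator, the following brute-force sketch (feasible only for tiny player sets, and with the names `wins` and `max_score_ranking` chosen for illustration, not taken from the paper) searches for the ranking that agrees with the empirical winner of the largest number of observed pairs, simply skipping pairs with no comparisons to mirror the sparse, missing-data setting:

```python
from itertools import permutations

def max_score_ranking(wins, n):
    """Brute-force maximum score estimator over n players.
    `wins[(i, j)]` counts how often player i beat player j; unobserved
    pairs carry no signal and are skipped."""
    def score(order):
        rank = {p: r for r, p in enumerate(order)}  # lower rank = better
        s = 0
        for i in range(n):
            for j in range(i + 1, n):
                w_ij = wins.get((i, j), 0)
                w_ji = wins.get((j, i), 0)
                if w_ij == w_ji:
                    continue  # missing or tied pair: no signal
                # Credit the pair when the ranking agrees with the
                # empirical majority winner.
                if (w_ij > w_ji) == (rank[i] < rank[j]):
                    s += 1
        return s
    return max(permutations(range(n)), key=score)
```

The real estimator must of course avoid enumerating all n! orderings; this sketch only makes the objective (concordance with observed pairwise majorities) concrete.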
AI Insights - Extends Bradley-Terry-Luce to sparse graphs, delivering asymptotic guarantees rarely seen in BT work.
- Proposes a maximum-score estimator that tolerates non-uniform missingness, achieving consistency in the discordant-pair proportion.
- Establishes a nearly minimax-optimal bound for the Kendall's tau loss, tightening risk limits under weak stochastic transitivity.
- Simulation results show the estimator outperforms the classic BT and nonparametric BT models on synthetic sparse tournaments.
- Applies the method to rank professional tennis players, recovering plausible hierarchies from incomplete match data.
- Insights extend to social-science surveys and machine-learning preference learning, offering a high-dimensional toolkit for noisy, incomplete comparisons.
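The loss appearing in the bullets above, the proportion of discordant pairs, is the normalized Kendall's tau distance between two rankings. A direct computation of this standard definition (not code from the paper):

```python
def discordant_proportion(est, truth):
    """Normalized Kendall's tau distance between two rankings,
    each given as a list of players from best to worst."""
    n = len(est)
    r_est = {p: r for r, p in enumerate(est)}
    r_true = {p: r for r, p in enumerate(truth)}
    disc = sum(
        1
        for idx, i in enumerate(truth)
        for j in truth[idx + 1:]
        if (r_est[i] < r_est[j]) != (r_true[i] < r_true[j])
    )
    return disc / (n * (n - 1) / 2)  # fraction of the n*(n-1)/2 pairs
```

Consistency in the paper's sense means this quantity converges to zero in probability as the number of players grows.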