University of Pisa, Indian
Abstract
Continual learning is an essential capability of human cognition, yet it
poses significant challenges for current deep learning models. The primary
issue is that new knowledge can interfere with previously learned information,
causing the model to forget earlier knowledge in favor of the new, a phenomenon
known as catastrophic forgetting. Although large pre-trained models can
partially mitigate forgetting by leveraging their existing knowledge and
over-parameterization, they often struggle when confronted with novel data
distributions. Parameter-Efficient Fine-Tuning (PEFT) methods, such as LoRA,
enable efficient adaptation to new knowledge. However, they still face
challenges in scaling to dynamic learning scenarios and long sequences of
tasks, as maintaining one adapter per task introduces complexity and increases
the potential for interference. In this paper, we introduce Hierarchical
Adapters Merging (HAM), a novel framework that dynamically combines adapters
from different tasks during training. This approach enables HAM to scale
effectively, allowing it to manage more tasks than competing baselines with
improved efficiency. To achieve this, HAM maintains a fixed set of groups that
hierarchically consolidate new adapters. For each task, HAM trains a low-rank
adapter along with an importance scalar, then dynamically groups tasks based on
adapter similarity. Within each group, adapters are pruned, scaled, and merged,
facilitating transfer learning between related tasks. Extensive experiments on
three vision benchmarks show that HAM significantly outperforms
state-of-the-art methods, particularly as the number of tasks increases.
AI Insights
- HAM trains a low‑rank LoRA adapter per task with an importance scalar that informs pruning during merging.
- Tasks are clustered into a fixed number of hierarchical groups by adapter similarity, enabling efficient knowledge reuse.
- Within each group, adapters are pruned, scaled, and merged into a single group adapter, cutting parameter growth.
- The hierarchical merge preserves task relationships, allowing transfer learning while reducing interference.
- On ImageNet‑style benchmarks, HAM outperforms prior PEFT baselines by up to 4 % accuracy when tasks exceed 50.
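As a rough sketch of the grouping and merging steps listed above, the following PyTorch snippet illustrates one plausible realization (helper names such as assign_to_group and merge_group are hypothetical, and the exact pruning and scaling rules are assumptions, not the authors' implementation):

```python
# Minimal sketch of hierarchical adapter merging in the spirit of HAM.
# Each adapter is a dict {"A": (r x d), "B": (d x r)} of LoRA factors,
# paired with a learned scalar importance. Not the authors' code.
import torch
import torch.nn.functional as F


def adapter_similarity(a, b):
    """Cosine similarity between two flattened LoRA updates (B @ A)."""
    da = (a["B"] @ a["A"]).flatten()
    db = (b["B"] @ b["A"]).flatten()
    return F.cosine_similarity(da, db, dim=0).item()


def assign_to_group(new_adapter, groups):
    """Route a new task adapter to the most similar of a fixed set of groups."""
    return max(range(len(groups)),
               key=lambda g: max((adapter_similarity(new_adapter, a)
                                  for a, _ in groups[g]), default=-1.0))


def merge_group(adapters, keep_ratio=0.5):
    """Prune each adapter's smallest-magnitude entries, scale by its learned
    importance, then average into a single group adapter."""
    merged = {"A": torch.zeros_like(adapters[0][0]["A"]),
              "B": torch.zeros_like(adapters[0][0]["B"])}
    total = sum(imp for _, imp in adapters)
    for adapter, importance in adapters:
        for name in ("A", "B"):
            w = adapter[name]
            k = max(1, int(keep_ratio * w.numel()))
            # Threshold that keeps the k largest-magnitude entries.
            thresh = w.abs().flatten().kthvalue(w.numel() - k + 1).values
            pruned = torch.where(w.abs() >= thresh, w, torch.zeros_like(w))
            merged[name] += (importance / total) * pruned
    return merged
```

In this sketch the merged group keeps a single pair of LoRA factors regardless of how many tasks it absorbs, which is the property that lets the parameter count stay bounded as the task sequence grows; HAM's actual consolidation rule may differ in the details.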
The University of Sydney
Abstract
Graph continual learning (GCL) aims to learn from a continuous sequence of
graph-based tasks. Regularization methods are vital for preventing catastrophic
forgetting in GCL, particularly in the challenging replay-free,
class-incremental setting, where each task consists of a set of unique classes.
In this work, we first establish a general regularization framework for GCL
based on the curved parameter space induced by the Fisher information matrix
(FIM). We show that the dominant Elastic Weight Consolidation (EWC) and its
variants are a special case within this framework, using a diagonal
approximation of the empirical FIM based on parameters from previous tasks. To
overcome their limitations, we propose a new unbiased online curvature
approximation of the full FIM based on the model's current learning state. Our
method directly estimates the regularization term in an online manner without
explicitly evaluating and storing the FIM itself. This enables the model to
better capture the loss landscape when learning new tasks while retaining the
knowledge learned from previous tasks. Extensive experiments on three graph
datasets demonstrate that our method significantly outperforms existing
regularization-based methods, achieving a superior trade-off between stability
(retaining old knowledge) and plasticity (acquiring new knowledge).
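For concreteness, here is a minimal PyTorch sketch contrasting the two regularizers discussed above: EWC's diagonal empirical FIM penalty, and a matrix-free estimate of the full quadratic form (theta - theta*)^T F (theta - theta*) that never materializes the FIM. This is an illustrative construction based on the identity F = E[g g^T]; it is not the paper's exact online estimator.

```python
# Illustrative sketch only: EWC-style diagonal FIM penalty vs. a matrix-free
# estimate of the full FIM quadratic form. Names and details are assumptions.
import torch


def diagonal_fim(model, data_loader, loss_fn):
    """EWC-style diagonal empirical FIM: average squared gradient per parameter.
    (Per-batch gradients are used for brevity; the standard empirical FIM
    averages squared per-sample gradients.)"""
    fim = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    n_batches = 0
    for x, y in data_loader:          # x = a (graph) batch, y = labels
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fim[n] += p.grad.detach() ** 2
        n_batches += 1
    return {n: f / max(n_batches, 1) for n, f in fim.items()}


def ewc_penalty(model, fim, old_params, lam=1.0):
    """(lam / 2) * sum_i F_ii * (theta_i - theta*_i)^2, with F_ii frozen at the
    previous task's parameters -- the diagonal special case of the framework."""
    loss = 0.0
    for n, p in model.named_parameters():
        loss = loss + (fim[n] * (p - old_params[n]) ** 2).sum()
    return 0.5 * lam * loss


def full_fim_penalty(model, old_params, x, y, loss_fn, lam=1.0):
    """Matrix-free stochastic estimate of (theta - theta*)^T F (theta - theta*).
    Since F = E[g g^T], the quadratic form equals E[(g^T d)^2] with
    d = theta - theta*, so one gradient and one dot product per batch suffice;
    the full FIM is never formed or stored."""
    model.zero_grad()
    loss_fn(model(x), y).backward()
    g_dot_d = 0.0
    for n, p in model.named_parameters():
        if p.grad is not None:
            g_dot_d = g_dot_d + (p.grad.detach() * (p - old_params[n])).sum()
    return 0.5 * lam * g_dot_d ** 2
```

Here old_params holds a copy of the parameters saved after the previous task. The diagonal variant fixes its curvature estimate at those old parameters, whereas the matrix-free penalty is re-estimated from gradients at the current parameters, which gives a flavor of how an online, current-state curvature estimate can track the loss landscape during new-task training.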