Abstract
Word Sense Disambiguation (WSD) is a key task in Natural Language Processing (NLP): selecting the correct
meaning of a word based on its context. With Pretrained Language Models (PLMs) such as BERT and DeBERTa now well established,
significant progress has been made in understanding contextual semantics. Nevertheless, how well these models inherently
disambiguate word senses remains uncertain. In this work, we evaluate several encoder-only PLMs across two popular sense inventories,
namely WordNet and the Oxford Dictionary of English (ODE), by analyzing their ability to separate word senses without any task-specific
fine-tuning. We compute a centroid for each word sense from contextual embeddings and measure the similarity of target-word representations to these centroids, assessing performance across different layers. Our
results show that DeBERTa-v3 delivers the best performance on the task, with the middle layers (specifically the 7th and 8th layers)
achieving the highest accuracy, outperforming the output layer by approximately 15 percentage points. Our experiments also
explore the inherent structure of the WordNet and ODE sense inventories, highlighting their influence on overall model behavior
and performance. Finally, based on our findings, we develop a small, efficient model for the WSD task that attains robust
performance while significantly reducing the carbon footprint.
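The following is a minimal, illustrative sketch of the centroid-based layer-wise evaluation described above; the model choice, helper names, and naive sub-token matching are assumptions for illustration, not the paper's exact setup.

```python
# Illustrative sketch: per-sense centroids of contextual embeddings and
# 1-nearest-centroid prediction by cosine similarity, evaluated at a chosen layer.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "microsoft/deberta-v3-base"  # placeholder; any encoder-only PLM works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True).eval()

def target_embedding(sentence: str, target: str, layer: int) -> torch.Tensor:
    """Hidden state of the target word at a given layer (sub-tokens mean-pooled)."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states[layer][0]   # (seq_len, dim)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    # naive string match to locate sub-tokens of the target word
    idx = [i for i, t in enumerate(tokens) if target.lower() in t.lower().lstrip("▁#")]
    return hidden[idx].mean(dim=0) if idx else hidden.mean(dim=0)

def sense_centroids(examples: dict[str, list[tuple[str, str]]], layer: int) -> dict:
    """examples maps a sense id to (sentence, target word) pairs."""
    return {
        sense: torch.stack([target_embedding(s, w, layer) for s, w in pairs]).mean(dim=0)
        for sense, pairs in examples.items()
    }

def predict_sense(sentence: str, target: str, centroids: dict, layer: int) -> str:
    """Assign the sense whose centroid is most cosine-similar to the target embedding."""
    emb = target_embedding(sentence, target, layer)
    sims = {s: torch.cosine_similarity(emb, c, dim=0).item() for s, c in centroids.items()}
    return max(sims, key=sims.get)
```

A layer sweep would then simply repeat centroid construction and prediction for each hidden-state index and compare accuracies across layers.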