As Good as It KAN Get: High-Fidelity Audio Representation

Abstract:

Implicit neural representations (INRs) have gained prominence for efficiently encoding multimedia data, yet their application to audio signals remains limited. This study introduces the Kolmogorov-Arnold Network (KAN), a novel architecture using learnable activation functions, as an effective INR model for audio representation. KAN demonstrates superior perceptual performance over previous INRs, achieving the lowest Log-Spectral Distance of 1.29 and the highest Perceptual Evaluation of Speech Quality of 3.57 for 1.5 s audio. To extend KAN’s utility, we propose FewSound, a hypernetwork-based architecture that enhances INR parameter updates. FewSound outperforms the state-of-the-art HyperSound, with a 33.3% improvement in MSE and 60.87% in SI-SNR. These results show KAN as a robust and adaptable audio representation with the potential for scalability and integration into various hypernetwork frameworks. The source code is publicly available.

Authors: Maciej Rut, Piotr Kawa, Przemysław Spurek, Piotr Syga
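
The abstract compares FewSound and HyperSound using MSE and SI-SNR. As a rough illustration of what those two waveform metrics measure, the sketch below computes them in plain Python using their commonly used definitions. It is not taken from the authors' code: the function names, the 16 kHz sample rate, and the synthetic test tone are all assumptions made for this example, and the paper's exact evaluation setup may differ.

```python
import math


def mse(estimate, target):
    """Mean squared error between two equal-length signals (lower is better)."""
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)


def si_snr(estimate, target, eps=1e-12):
    """Scale-invariant signal-to-noise ratio in dB (higher is better)."""
    n = len(target)
    # Remove the DC offset so the measure ignores constant shifts.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]
    # Project the estimate onto the target; this cancels any gain mismatch.
    scale = sum(e * t for e, t in zip(est, tgt)) / (sum(t * t for t in tgt) + eps)
    signal_energy = sum((scale * t) ** 2 for t in tgt)
    noise_energy = sum((e - scale * t) ** 2 for e, t in zip(est, tgt))
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


# Hypothetical test signal: a 440 Hz tone, 1.5 s long, at an assumed 16 kHz rate.
sample_rate = 16000
num_samples = int(1.5 * sample_rate)
clean = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(num_samples)]
# A small deterministic perturbation standing in for reconstruction error.
recon = [c + 0.01 * math.cos(2 * math.pi * 1000 * i / sample_rate)
         for i, c in enumerate(clean)]

print(f"MSE:    {mse(recon, clean):.2e}")
print(f"SI-SNR: {si_snr(recon, clean):.2f} dB")
```

Because SI-SNR projects the estimate onto the target before measuring the residual, rescaling `recon` by a constant gain leaves the score unchanged, whereas MSE would penalize it. That is one reason the two metrics are often reported together.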
