RippleSense: Scalable and Efficient Wideband Spectrum Sensing
Andreas Kuster, Yanbo Zhang, Mo Li
ACM/IEEE International Conference on Embedded Artificial Intelligence and Sensing Systems (SenSys'26),
May 2026
Dynamic Spectrum Sharing (DSS) is essential for optimizing spectrum utilization in modern wireless systems, but it requires high analog-to-digital converter (ADC) sampling rates, leading to increased costs and power consumption. Existing sub-Nyquist sampling techniques partially address these issues but struggle with dense spectrum adaptability, real-time performance, and efficiency.
This paper presents RippleSense, a scalable and efficient wideband spectrum sensing approach capable of capturing GHz of densely occupied spectrum at sub-Nyquist ADC sampling rates. It introduces a novel sub-Nyquist sampling method that injects distinct signatures into the observed signals in different Nyquist zones before sampling, so that the full spectrum can be reconstructed programmatically even after the Nyquist zones fold to baseband because the ADC sampling rate is too low. To showcase the scalability of this approach, a high-performance, multi-GHz spectrum sensing platform is implemented together with a highly parallelizable reconstruction algorithm that can process the data stream in real time.
Experimental evaluation has shown that the proposed approach supports operating configurations with signal-to-noise ratios as low as –10 dB and time resolutions down to 10 ns, enabling the capture of single radar pulses across bandwidths of up to 10 GHz using our prototype.
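The folding problem that the abstract refers to can be illustrated with a few lines of Python. This is a minimal sketch of textbook aliasing for real-valued sampling, not the RippleSense reconstruction method; the function names and the 1 GHz sampling rate are assumptions chosen for illustration.

```python
def nyquist_zone(f_hz, fs_hz):
    """Return the 1-based Nyquist zone that contains f_hz when sampling at fs_hz."""
    return int(f_hz // (fs_hz / 2)) + 1


def folded_frequency(f_hz, fs_hz):
    """Return the baseband frequency at which a real tone at f_hz appears after sampling."""
    r = f_hz % fs_hz
    return min(r, fs_hz - r)


if __name__ == "__main__":
    fs = 1_000_000_000  # assumed ADC sampling rate: 1 GS/s
    # Four tones in four different Nyquist zones (illustrative values only).
    for f in (300_000_000, 700_000_000, 1_300_000_000, 1_700_000_000):
        zone = nyquist_zone(f, fs)
        alias = folded_frequency(f, fs)
        print(f"{f / 1e6:7.1f} MHz  zone {zone}  appears at {alias / 1e6:5.1f} MHz")
```

All four tones land on the same baseband frequency, so the sampled data alone cannot tell which zone a signal came from. That ambiguity is what a per-zone signature has to resolve.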
MC-LoRa: Multi-node Concurrent Localization for LoRaWAN Indoors and Outdoors
Han Hao, Wei Xi, Andreas Kuster, Amalinda Gamage, Xianjin Xia
UbiComp/Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT'25),
March 2025
Multi-node localization is crucial for large-scale and densely deployed Internet of Things (IoT) devices connected via LoRaWAN. Due to limitations in bandwidth and the number of RX chains (antennas), existing LoRaWAN-based localization methods often rely on frequency hopping or additional infrastructure to improve location accuracy. Although promising, these methods struggle to localize multiple nodes during packet collisions. In this paper, we propose MC-LoRa, which features a multi-node localization pipeline that includes reliable preamble detection under the near-far effect, tackling inter-symbol interference among multiple packets, and a virtual antenna array method to obtain extra channel state measurements within a single channel. This approach not only enhances angle resolution in our AoA-based system but also eliminates the need for time-consuming frequency hopping, requiring only software processing in existing gateways. Our extensive evaluation results show that MC-LoRa achieves median errors of 7.1m (single-node), 9.2m (multi-node) in an outdoor area of 140m × 100m, and 2.0m (single-node), 3.9m (multi-node) in an indoor area of 20m × 16m, which represent improvements of 1.1×, 2× and 1.5×, 1.7× compared to the baseline. Additionally, MC-LoRa can provide localization service for hundreds of LoRaWAN nodes with accuracy comparable to that of a state-of-the-art single-node system. Its wide localization range and high accuracy enable MC-LoRa to benefit a variety of applications, including asset tracking, navigation in vast indoor spaces (e.g., airports, warehouses and halls), and smart cities.
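For readers unfamiliar with angle-of-arrival (AoA) estimation, the basic two-antenna relation between phase difference and arrival angle can be sketched as follows. This is the standard far-field formula, not the MC-LoRa pipeline or its virtual antenna array method; the function name, the 868 MHz carrier, and the half-wavelength spacing are assumptions for illustration.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def aoa_from_phase(delta_phi_rad, spacing_m, wavelength_m):
    """Estimate the arrival angle in degrees (0 = broadside) from the phase
    difference between two antennas, assuming a far-field plane wave:
    delta_phi = 2 * pi * spacing * sin(theta) / wavelength."""
    s = delta_phi_rad * wavelength_m / (2 * math.pi * spacing_m)
    if abs(s) > 1:
        raise ValueError("phase difference is not physical for this spacing")
    return math.degrees(math.asin(s))


if __name__ == "__main__":
    wavelength = SPEED_OF_LIGHT / 868e6   # assumed carrier frequency
    spacing = wavelength / 2              # assumed half-wavelength spacing
    print(aoa_from_phase(math.pi / 2, spacing, wavelength))
```

With more antenna positions, real or virtual, more phase measurements are available, which is what improves angle resolution.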
@article{10.1145/3712279,
author = {Hao, Han and Xi, Wei and Kuster, Andreas and Gamage, Amalinda and Xia, Xianjin},
title = {MC-LoRa: Multi-node Concurrent Localization for LoRaWAN Indoors and Outdoors},
year = {2025},
issue_date = {March 2025},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {9},
number = {1},
url = {https://doi.org/10.1145/3712279},
doi = {10.1145/3712279},
abstract = {Multi-node localization is crucial for large-scale and densely deployed Internet of Things (IoT) devices connected via LoRaWAN. Due to limitations in bandwidth and the number of RX chains (antennas), existing LoRaWAN-based localization methods often rely on frequency hopping or additional infrastructure to improve location accuracy. Although promising, these methods struggle to localize multiple nodes during packet collisions. In this paper, we propose MC-LoRa, which features a multi-node localization pipeline that includes reliable preamble detection under the near-far effect, tackling inter-symbol interference among multiple packets, and a virtual antenna array method to obtain extra channel state measurements within a single channel. This approach not only enhances angle resolution in our AoA-based system but also eliminates the need for time-consuming frequency hopping, requiring only software processing in existing gateways. Our extensive evaluation results show that MC-LoRa achieves median errors of 7.1m (single-node), 9.2m (multi-node) in an outdoor area of 140m \texttimes{} 100m, and 2.0m (single-node), 3.9m (multi-node) in an indoor area of 20m \texttimes{} 16m, which represent improvements of 1.1\texttimes{}, 2\texttimes{} and 1.5\texttimes{}, 1.7\texttimes{} compared to the baseline. Additionally, MC-LoRa can provide localization service for hundreds of LoRaWAN nodes with accuracy comparable to that of a state-of-the-art single-node system. Its wide localization range and high accuracy enable MC-LoRa to benefit a variety of applications, including asset tracking, navigation in vast indoor spaces (e.g., airports, warehouses and halls), and smart cities.},
journal = {Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.},
month = mar,
articleno = {4},
numpages = {32},
keywords = {AoA, LoRaWAN, concurrent localization}
}
Beyond the Noise: Innovating Information Verification in the Digital Age
Andreas Kuster
St. Gallen Symposium, Switzerland,
One of the three winners of the 53rd St.Gallen Symposium's Global Essay Competition,
May 2024
Physical memory protection (PMP) is a hardware mechanism designed to prevent unauthorized access to specific memory regions, enabling the deployment of Trusted Execution Environments (TEEs). The RISC-V instruction set architecture specifies PMP for RISC-V cores but leaves other system bus masters, as found in heterogeneous computing systems, out of scope. This work presents Protego, an open-source I/O physical memory protection (IOPMP) unit based on the RISC-V PMP specification that extends PMP to other system bus masters. We demonstrate that Protego is effective in protecting sensitive data in memory and preventing unauthorized access at a small hardware cost of below 40 kGE for a 64-bit system and with negligible performance impact, making it a valuable tool for creating TEEs in heterogeneous computing systems.
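The kind of check such a unit performs can be sketched in software. The model below is a simplified assumption, not Protego's design or the RISC-V register encoding: regions are plain base/size ranges tagged with a hypothetical bus master ID, the first overlapping entry decides the access, and an access with no matching entry is denied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """One protection entry (hypothetical layout, for illustration only)."""
    base: int
    size: int
    perms: str      # any combination of "r" and "w"
    master_id: int  # bus master this entry applies to


def check_access(regions, master_id, addr, length, kind):
    """Return True if the bus master may perform the access.

    kind is "r" or "w". Entries are checked in order; the first entry that
    overlaps the access decides, and the access must lie fully inside it.
    """
    end = addr + length
    for region in regions:
        if region.master_id != master_id:
            continue
        region_end = region.base + region.size
        overlaps = addr < region_end and end > region.base
        if overlaps:
            contained = region.base <= addr and end <= region_end
            return contained and kind in region.perms
    return False  # assumed default for I/O masters: deny


if __name__ == "__main__":
    table = [
        Region(0x8000_0000, 0x1000, "r", master_id=1),    # read-only page
        Region(0x8000_0000, 0x10000, "rw", master_id=1),  # larger buffer
    ]
    print(check_access(table, 1, 0x8000_0010, 4, "w"))  # inside read-only page
    print(check_access(table, 1, 0x8000_2000, 4, "w"))  # inside buffer only
    print(check_access(table, 2, 0x8000_2000, 4, "r"))  # unknown master
```

Because entry order sets the priority, the small read-only page overrides the larger read-write buffer that contains it.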
Python FPGA Programming with Data-Centric Multi-Level Design
Johannes de Fine Licht, Tiziano De Matteis, Tal Ben-Nun, Andreas Kuster, Oliver Rausch, Manuel Burger, Carl-Johannes Johnsen, Torsten Hoefler
Although high-level synthesis (HLS) tools have significantly improved programmer productivity over hardware description languages, developing for FPGAs remains tedious and error-prone. Programmers must learn and implement a large set of vendor-specific syntax, patterns, and tricks to optimize (or even successfully compile) their applications, while dealing with ever-changing toolflows from the FPGA vendors. We propose a new way to develop, optimize, and compile FPGA programs. The Data-Centric parallel programming (DaCe) framework allows applications to be defined by their dataflow and control flow through the Stateful DataFlow multiGraph (SDFG) representation, capturing the abstract program characteristics, and exposing a plethora of optimization opportunities. In this work, we show how extending SDFGs with multi-level Library Nodes incorporates both domain-specific and platform-specific optimizations into the design flow, enabling knowledge transfer across application domains and FPGA vendors. We present the HLS-based FPGA code generation backend of DaCe, and show how code is generated from SDFGs for either FPGA vendor, emitting efficient HLS code that is structured and annotated to implement the desired architecture.
StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems
Johannes de Fine Licht, Andreas Kuster, Tiziano De Matteis, Tal Ben-Nun, Dominic Hofer, Torsten Hoefler
In Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization (CGO'21),
May 2021
Spatial computing devices have been shown to significantly accelerate stencil computations, but have so far relied on unrolling the iterative dimension of a single stencil operation to increase temporal locality. This work considers the general case of mapping directed acyclic graphs of heterogeneous stencil computations to spatial computing systems, assuming large input programs without an iterative component. StencilFlow maximizes temporal locality and ensures deadlock freedom in this setting, providing end-to-end analysis and mapping from a high-level program description to distributed hardware. We evaluate our generated architectures on a Stratix 10 FPGA testbed, yielding 1.31 TOp/s and 4.18 TOp/s on single-device and multi-device, respectively, demonstrating the highest performance recorded for stencil programs on FPGAs to date. We then leverage the framework to study a complex stencil program from a production weather simulation application. Our work enables productively targeting distributed spatial computing systems with large stencil programs, and offers insight into architecture characteristics required for their efficient execution in practice.
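To make the notion of a directed acyclic graph of heterogeneous stencils concrete, the sketch below evaluates a tiny two-stage stencil program in dependency order. It is a plain software model under stated assumptions, not StencilFlow's input format or its hardware mapping: the program dictionary, the stage names, and the 1D fields are hypothetical, and each stage reads a single input.

```python
from graphlib import TopologicalSorter


def apply_stencil(field, weights):
    """Apply a 1D stencil centered on each interior point; boundaries are copied."""
    half = len(weights) // 2
    out = list(field)
    for i in range(half, len(field) - half):
        out[i] = sum(w * field[i + k - half] for k, w in enumerate(weights))
    return out


# Hypothetical program: stage name -> (input field name, stencil weights).
PROGRAM = {
    "smooth": ("input", [0.25, 0.5, 0.25]),  # 3-point smoothing
    "grad": ("smooth", [-0.5, 0.0, 0.5]),    # central difference
}


def run(program, fields):
    """Evaluate every stage after the stage that produces its input."""
    deps = {name: {src} for name, (src, _) in program.items()}
    for name in TopologicalSorter(deps).static_order():
        if name in program:  # skip pure inputs such as "input"
            src, weights = program[name]
            fields[name] = apply_stencil(fields[src], weights)
    return fields


if __name__ == "__main__":
    result = run(PROGRAM, {"input": [0.0, 1.0, 2.0, 3.0, 4.0]})
    print(result["smooth"])
    print(result["grad"])
```

On a spatial architecture the stages would run concurrently as a pipeline rather than one after another, which is where buffering between stages and deadlock freedom become a concern.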
@inproceedings{stencilflow,
author = {de Fine Licht, Johannes and Kuster, Andreas and De Matteis, Tiziano and Ben-Nun, Tal and Hofer, Dominic and Hoefler, Torsten},
title = {StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems},
year = {2021},
booktitle = {Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization (CGO'21)},
series = {CGO '21},
}
Poster: StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems
Andreas Kuster, Johannes de Fine Licht, Tiziano De Matteis, Tal Ben-Nun, Dominic Hofer, Torsten Hoefler
In Proceedings of the Platform for Advanced Scientific Computing (PASC’21),
May 2021
Spatial computing devices have been shown to significantly accelerate stencil computations, but have so far relied on unrolling the iterative dimension of a single stencil operation to increase temporal locality. This work considers the general case of mapping directed acyclic graphs of heterogeneous stencil computations to spatial computing systems, assuming large input programs without an iterative component. StencilFlow maximizes temporal locality and ensures deadlock freedom in this setting, providing end-to-end analysis and mapping from a high-level program description to distributed hardware. We evaluate our generated architectures on a Stratix 10 FPGA testbed, yielding 1.31 TOp/s and 4.18 TOp/s on single-device and multi-device, respectively, demonstrating the highest performance recorded for stencil programs on FPGAs to date. We then leverage the framework to study a complex stencil program from a production weather simulation application. Our work enables productively targeting distributed spatial computing systems with large stencil programs, and offers insight into architecture characteristics required for their efficient execution in practice.
reproducing "ner and pos when nothing is capitalized"
Andreas Kuster, Jakub Filipek, and Viswa Virinchi Muppirala
Capitalization is an important feature in many NLP tasks such as Named Entity Recognition (NER) and Part-of-Speech (POS) tagging. We attempt to reproduce the results of a paper that shows how to mitigate the significant performance drop that occurs when casing is mismatched between training and test data. In particular, we show that lowercasing 50% of the dataset provides the best performance, matching the claims of the original paper. We also obtain slightly lower performance in almost all of the experiments we tried to reproduce, suggesting that some hidden factors may affect our results. Lastly, we make all of our work available in a public GitHub repository.
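The augmentation described in the abstract, lowercasing half of the training data, can be sketched as below. This is an illustration under assumptions rather than the code from the repository: the function name, the random choice of sentences, and the fixed seed are hypothetical, and labels are left out because lowercasing only touches the tokens.

```python
import random


def lowercase_fraction(sentences, fraction=0.5, seed=0):
    """Return a copy of the tokenized sentences with a random fraction lowercased."""
    rng = random.Random(seed)
    count = round(len(sentences) * fraction)
    chosen = set(rng.sample(range(len(sentences)), count))
    return [
        [token.lower() for token in sentence] if index in chosen else list(sentence)
        for index, sentence in enumerate(sentences)
    ]


if __name__ == "__main__":
    data = [
        ["Alice", "visited", "Zurich"],
        ["Bob", "works", "at", "ETH"],
        ["Paris", "is", "in", "France"],
        ["Carol", "met", "Dave"],
    ]
    for sentence in lowercase_fraction(data):
        print(" ".join(sentence))
```

Training on such a mix exposes the model to both cased and uncased text, so it relies less on capitalization as a cue.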
DaCe - Data Centric Parallel Programming
Tal Ben-Nun, Tiziano De Matteis, Oliver Rausch, Carl Johnsen, Saurabh Raje, Andreas Kuster, Philipp Schaad, Manuel Burger, Neville Walo, Luca Lavarini, Stefan Scholbe, Dominic Hofer, Lukas Trümper, Andrei Ivanov, Gabriel Gavrilas, Thomas Baumann, Berke Ates, Benjamin Simmonds, Noah Huetter, Jan Kleine, Marc Widmer, Timo Schneider, Tom Hu, Florian Deconinck, Felix Thaler, Johann Dahm, Mamy Ratsimbazafy, Simon Jacob, Backes Thierry, Till Ehrengruber, Valentin Anklin
DaCe is a parallel programming framework that takes code in Python/NumPy and other programming languages and maps it to high-performance CPU, GPU, and FPGA programs, which can be optimized to achieve state-of-the-art performance. Internally, DaCe uses the Stateful DataFlow multiGraph (SDFG) data-centric intermediate representation: a transformable, interactive representation of code based on data movement. Since the input code and the SDFG are separate, it is possible to optimize a program without changing its source, so that it stays readable. Transformations, in turn, are customizable and user-extensible, so they can be written once and reused in many applications. With data-centric parallel programming, we enable direct knowledge transfer of performance optimization, regardless of the application or the target processor.