publications | A. Kuster

2024

SGS

Beyond the Noise: Innovating Information Verification in the Digital Age

Andreas Kuster

May 2024

In this exploration of the digital age’s challenges, particularly the verification of information amidst widespread misinformation, we confront the complexities of navigating an information ecosystem saturated with both accurate and false content. This essay argues for a comprehensive strategy that integrates technological innovation, educational reform, and policy intervention to address the scarcity of effective information verification mechanisms. However, it also reflects critically on whether the conventional approach of striving for more (more technology, more education, more regulation) is sufficient or appropriate for addressing the root causes of misinformation’s pervasiveness and appeal. The digital landscape, characterized by platforms like TikTok and YouTube Shorts, caters to and amplifies our preference for brief, easily digestible content, often at the expense of depth and accuracy. This tendency not only facilitates the spread of misinformation but also fosters a culture of shallow engagement and polarized discussions. Recognizing this, the essay suggests that alongside enhancing verification mechanisms through AI, blockchain technology, and automated fact-checking tools, there is a significant need for a societal shift towards mindful information consumption. It advocates balancing the pursuit of technological and regulatory solutions with cultivating a culture that values critical thinking, patience, and a deeper engagement with complex issues. This dual approach, aiming for a more informed society while also thriving with less sensational and superficial content, offers a holistic strategy to mitigate misinformation’s impact. By addressing both the supply of verified information and the demand for quick, sensational content, we can foster a digital ecosystem where truth and depth are valued over speed and sensationalism.

2023

SSH-SOC

Protego: A Low-Overhead Open-Source I/O Physical Memory Protection Unit for RISC-V

In Proceedings of the 1st Safety and Security in Heterogeneous Open System-on-Chip Platforms Workshop (SSH-SOC ’23), May 2023

Physical memory protection is a hardware mechanism designed to prevent unauthorized access to specific memory regions, enabling the deployment of Trusted Execution Environments (TEEs). The RISC-V instruction set architecture specifies PMP for RISC-V cores but leaves other system bus masters as found in heterogeneous computing systems out of scope. This work presents Protego, an open-source I/O physical memory protection (IOPMP) unit based on the RISC-V PMP specification that extends PMP to other system bus masters. We demonstrate that Protego is effective in protecting sensitive data in memory and preventing unauthorized access at small hardware costs of below 40 kGE for a 64-bit system and negligible performance impact, making it a valuable tool for creating TEEs in heterogeneous computing systems.
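
To make the idea of a region-based protection check concrete, here is a minimal software sketch. It is not the Protego hardware design. The `Region` type and the `access_allowed` function are hypothetical names introduced for illustration. The model assumes plain base/size regions instead of the address-matching encodings used by real PMP entries, and it assumes that an access matching no region is denied. Following the RISC-V PMP rule, the first region that overlaps the access decides the outcome, and a partial overlap fails.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical protection entry: an address range plus read/write permissions."""
    base: int
    size: int
    read: bool
    write: bool


def access_allowed(regions, addr, length, is_write):
    """Check an access of `length` bytes at `addr` against prioritized regions.

    The first region that overlaps the access decides the outcome.
    A partial overlap is rejected. No match means deny (an assumption here).
    """
    end = addr + length
    for region in regions:
        region_end = region.base + region.size
        overlaps = addr < region_end and region.base < end
        if not overlaps:
            continue
        fully_contained = region.base <= addr and end <= region_end
        if not fully_contained:
            return False
        return region.write if is_write else region.read
    return False


# Example: a read-only region followed by a read-write region.
regions = [
    Region(base=0x8000_0000, size=0x1000, read=True, write=False),
    Region(base=0x8000_1000, size=0x1000, read=True, write=True),
]
read_ok = access_allowed(regions, 0x8000_0010, 4, is_write=False)
write_blocked = access_allowed(regions, 0x8000_0010, 4, is_write=True)
straddling = access_allowed(regions, 0x8000_0FFC, 8, is_write=False)
```

In this sketch the straddling access is rejected even though both neighbouring regions permit reads, because it only partially overlaps the first matching region.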

2022

arXiv

Python FPGA Programming with Data-Centric Multi-Level Design

May 2022

Although high-level synthesis (HLS) tools have significantly improved programmer productivity over hardware description languages, developing for FPGAs remains tedious and error prone. Programmers must learn and implement a large set of vendor-specific syntax, patterns, and tricks to optimize (or even successfully compile) their applications, while dealing with ever-changing toolflows from the FPGA vendors. We propose a new way to develop, optimize, and compile FPGA programs. The Data-Centric parallel programming (DaCe) framework allows applications to be defined by their dataflow and control flow through the Stateful DataFlow multiGraph (SDFG) representation, capturing the abstract program characteristics, and exposing a plethora of optimization opportunities. In this work, we show how extending SDFGs with multi-level Library Nodes incorporates both domain-specific and platform-specific optimizations into the design flow, enabling knowledge transfer across application domains and FPGA vendors. We present the HLS-based FPGA code generation backend of DaCe, and show how SDFGs are code generated for either FPGA vendor, emitting efficient HLS code that is structured and annotated to implement the desired architecture.

2021

CGO

StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems

In Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization (CGO’21), May 2021

Spatial computing devices have been shown to significantly accelerate stencil computations, but have so far relied on unrolling the iterative dimension of a single stencil operation to increase temporal locality. This work considers the general case of mapping directed acyclic graphs of heterogeneous stencil computations to spatial computing systems, assuming large input programs without an iterative component. StencilFlow maximizes temporal locality and ensures deadlock freedom in this setting, providing end-to-end analysis and mapping from a high-level program description to distributed hardware. We evaluate our generated architectures on a Stratix 10 FPGA testbed, yielding 1.31 TOp/s and 4.18 TOp/s on single-device and multi-device, respectively, demonstrating the highest performance recorded for stencil programs on FPGAs to date. We then leverage the framework to study a complex stencil program from a production weather simulation application. Our work enables productively targeting distributed spatial computing systems with large stencil programs, and offers insight into architecture characteristics required for their efficient execution in practice.

@inproceedings{stencilflow,
  author = {},
  title = {StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems},
  year = {2021},
  booktitle = {Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization (CGO'21)},
  series = {CGO '21},
}

PASC

Poster: StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems

In Proceedings of the Platform for Advanced Scientific Computing (PASC’21), May 2021

Spatial computing devices have been shown to significantly accelerate stencil computations, but have so far relied on unrolling the iterative dimension of a single stencil operation to increase temporal locality. This work considers the general case of mapping directed acyclic graphs of heterogeneous stencil computations to spatial computing systems, assuming large input programs without an iterative component. StencilFlow maximizes temporal locality and ensures deadlock freedom in this setting, providing end-to-end analysis and mapping from a high-level program description to distributed hardware. We evaluate our generated architectures on a Stratix 10 FPGA testbed, yielding 1.31 TOp/s and 4.18 TOp/s on single-device and multi-device, respectively, demonstrating the highest performance recorded for stencil programs on FPGAs to date. We then leverage the framework to study a complex stencil program from a production weather simulation application. Our work enables productively targeting distributed spatial computing systems with large stencil programs, and offers insight into architecture characteristics required for their efficient execution in practice.

arXiv

reproducing "ner and pos when nothing is capitalized"

Andreas Kuster, Jakub Filipek, and Viswa Virinchi Muppirala

May 2021

Capitalization is an important feature in many NLP tasks such as Named Entity Recognition (NER) or Part of Speech Tagging (POS). We attempt to reproduce the results of a paper that shows how to mitigate the significant performance drop that occurs when casing is mismatched between training and testing data. In particular, we show that lowercasing 50% of the dataset provides the best performance, matching the claims of the original paper. We also observe slightly lower performance in almost all of the experiments we tried to reproduce, suggesting that hidden factors may be affecting our results. Lastly, we make all of our work available in a public GitHub repository.
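
As a rough illustration of the "lowercase 50% of the dataset" idea, the following sketch lowercases a random half of the training sentences and leaves the rest untouched. It is not the code from the repository mentioned above. The function name `lowercase_half`, the per-sentence selection, the fixed seed, and the toy sentences are all assumptions made for this example.

```python
import random


def lowercase_half(sentences, fraction=0.5, seed=0):
    """Return a copy of `sentences` with a random `fraction` of them lowercased.

    Each sentence is a list of tokens. Labels are assumed to be stored
    separately, so they are unaffected by the casing change.
    """
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    n_lower = int(len(sentences) * fraction)
    chosen = set(rng.sample(range(len(sentences)), n_lower))
    return [
        [token.lower() for token in sentence] if i in chosen else list(sentence)
        for i, sentence in enumerate(sentences)
    ]


# Toy training data (hypothetical).
train = [
    ["Alice", "visited", "Paris"],
    ["Bob", "works", "at", "IBM"],
    ["The", "Alps", "are", "high"],
    ["Zurich", "is", "in", "Switzerland"],
]
mixed = lowercase_half(train)
num_changed = sum(1 for before, after in zip(train, mixed) if before != after)
```

Selecting whole sentences keeps casing consistent within each training example. Whether the original experiments sampled per sentence or per token is not stated in the abstract.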

2019

DaCe - Data Centric Parallel Programming

Tal Ben-Nun, Tiziano De Matteis, Oliver Rausch, and 29 more authors

May 2019

DaCe is a parallel programming framework that takes code in Python/NumPy and other programming languages and maps it to high-performance CPU, GPU, and FPGA programs, which can be optimized to achieve state-of-the-art performance. Internally, DaCe uses the Stateful DataFlow multiGraph (SDFG) data-centric intermediate representation: a transformable, interactive representation of code based on data movement. Since the input code and the SDFG are separate, it is possible to optimize a program without changing its source, so that it stays readable. At the same time, transformations are customizable and user-extensible, so they can be written once and reused in many applications. With data-centric parallel programming, we enable direct knowledge transfer of performance optimization, regardless of the application or the target processor.