Publications

I started publishing during my master's studies in 2009. Most of my publications are on GPU algorithms, GPU work scheduling, and related topics with a focus on rendering. However, I also publish in the fields of visualization, information visualization, human-computer interaction, and augmented reality. For an up-to-date list of citations, see my Google Scholar page. Below you will find my self-maintained list of publications with some additional information and material about them. If you have any questions about individual publications, do not hesitate to contact me.


  • J20
    ShapeGenetics: Using Genetic Algorithms for Procedural Modeling
    Karl Haubenwallner, Hans-Peter Seidel, Markus Steinberger:
    Abstract: In this paper, we show that genetic algorithms (GA) can be used to control the output of procedural modeling algorithms. We propose an efficient way to encode the choices that have to be made during a procedural generation as a hierarchical genome representation. In combination with mutation and reproduction operations specifically designed for controlled procedural modeling, our GA can evolve a population of individual models close to any high-level goal. Possible scenarios include a volume that should be filled by a procedurally grown tree or a painted silhouette that should be followed by the skyline of a procedurally generated city. These goals are easy to set up for an artist compared to the tens of thousands of variables that describe the generated model and are chosen by the GA. Previous approaches for controlled procedural modeling either use Reversible Jump Markov Chain Monte Carlo (RJMCMC) or Stochastically-Ordered Sequential Monte Carlo (SOSMC) as workhorse for the optimization. While RJMCMC converges slowly, requiring multiple hours for the optimization of larger models, it produces high quality models. SOSMC shows faster convergence under tight time constraints for many models, but can get stuck due to choices made in the early stages of optimization. Our GA shows faster convergence than SOSMC and generates better models than RJMCMC in the long run.
    Computer Graphics Forum / Eurographics (EG'17), 2017
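The GA loop described in the abstract can be illustrated with a deliberately tiny sketch. This is not the paper's implementation: the "genome" here is a flat list of integer choices, the "procedural model" is simply their sum, and the high-level goal is a target value, standing in for the hierarchical genome and geometric goals of ShapeGenetics.

```python
import random

random.seed(0)
TARGET = 42  # stand-in for a high-level goal such as a silhouette or volume

def decode(genome):
    # Stand-in for procedural generation: sum the encoded choices.
    return sum(genome)

def fitness(genome):
    return -abs(decode(genome) - TARGET)

def mutate(genome, rate=0.2):
    return [random.randint(0, 9) if random.random() < rate else g for g in genome]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

population = [[random.randint(0, 9) for _ in range(10)] for _ in range(30)]
for _ in range(200):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]  # elitist selection keeps the best models
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(len(population) - len(parents))]
    population = parents + children

best = max(population, key=fitness)
print(decode(best))  # converges toward TARGET
```

Because the parents survive unchanged, the best individual never regresses, mirroring the steady convergence the paper reports against RJMCMC and SOSMC.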
  • J19
    A GPU-adapted Structure for Unstructured Grids
    Rhaleb Zayer, Markus Steinberger, Hans-Peter Seidel:
    Abstract: A key advantage of working with structured grids (e.g., images) is the ability to directly tap into the powerful machinery of linear algebra. This is not much so for unstructured grids where intermediate bookkeeping data structures stand in the way. On modern high performance computing hardware, the conventional wisdom behind these intermediate structures is further challenged by costly memory access, and more importantly by prohibitive memory resources on environments such as graphics hardware. In this paper, we bypass this problem by introducing a sparse matrix representation for unstructured grids which not only reduces the memory storage requirements but also cuts down on the bulk of data movement from global storage to the compute units. In order to take full advantage of the proposed representation, we augment ordinary matrix multiplication by means of action maps, local maps which encode the desired interaction between grid vertices. In this way, geometric computations and topological modifications translate into concise linear algebra operations. In our algorithmic formulation, we capitalize on the nature of sparse matrix-vector multiplication which allows avoiding explicit transpose computation and storage. Furthermore, we develop an efficient vectorization to the demanding assembly process of standard graph and finite element matrices.
    Computer Graphics Forum / Eurographics (EG'17), 2017
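As a loose, CPU-side illustration of the core idea (the paper's representation and its action maps are considerably more involved), an unstructured grid can be encoded as a face-vertex sparse matrix so that geometric computation becomes a sparse product, and the transpose acts by scattering without ever being stored; scipy stands in for the GPU kernels here:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Two triangles sharing an edge; rows = faces, cols = vertices.
vertices = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
faces = [(0, 1, 2), (0, 2, 3)]

rows = np.repeat(np.arange(len(faces)), 3)
cols = np.array(faces).ravel()
vals = np.full(rows.shape, 1.0 / 3.0)            # uniform averaging weights
M = csr_matrix((vals, (rows, cols)), shape=(len(faces), len(vertices)))

centroids = M @ vertices                          # face centroids via SpMV
print(centroids)

# The transpose acts without explicit storage: scatter face data to vertices.
vertex_valence = M.T @ np.ones(len(faces)) * 3.0  # faces incident per vertex
print(vertex_valence)
```

Topological queries (here, how many faces touch each vertex) and geometric ones (centroids) both reduce to matrix-vector products over the same structure.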
  • J18
    Hierarchical Bucket Queuing for Fine-Grained Priority Scheduling on the GPU
    Bernhard Kerbl, Michael Kenzel, Dieter Schmalstieg, Hans-Peter Seidel, Markus Steinberger:
    Abstract: While the modern graphics processing unit (GPU) offers massive parallel compute power, the ability to influence the scheduling of these immense resources is severely limited. Therefore, the GPU is widely considered to be only suitable as an externally controlled co-processor for homogeneous workloads which greatly restricts the potential applications of GPU computing. To address this issue, we present a new method to achieve fine-grained priority scheduling on the GPU: hierarchical bucket queuing. By carefully distributing the workload among multiple queues and efficiently deciding which queue to draw work from next, we enable a variety of scheduling strategies. These strategies include fair-scheduling, earliest-deadline-first scheduling and user-defined dynamic priority scheduling. In a comparison with a sorting-based approach, we reveal the advantages of hierarchical bucket queuing over previous work. Finally, we demonstrate the benefits of using priority scheduling in real-world applications by example of path tracing and foveated micropolygon rendering.
    Computer Graphics Forum, 2016
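A heavily simplified, sequential sketch of the bucket-queuing principle (the paper's structure is hierarchical and lives on the GPU): work items are binned into a fixed set of priority buckets, and workers always draw from the highest-priority non-empty bucket, which enables strategies such as earliest-deadline-first by mapping deadlines to bucket indices.

```python
from collections import deque

class BucketQueue:
    """Fixed number of FIFO buckets; lower index = higher priority."""

    def __init__(self, num_buckets):
        self.buckets = [deque() for _ in range(num_buckets)]

    def push(self, priority, item):
        self.buckets[priority].append(item)

    def pop(self):
        for bucket in self.buckets:               # scan high to low priority
            if bucket:
                return bucket.popleft()
        raise IndexError("queue is empty")

q = BucketQueue(4)
q.push(2, "render tile")
q.push(0, "foveal region")                        # urgent work jumps ahead
q.push(3, "background path trace")
order = [q.pop() for _ in range(3)]
print(order)
```

The item names are illustrative only; they echo the paper's path-tracing and foveated-rendering use cases.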
  • J17
    Representing and Scheduling Procedural Generation using Operator Graphs
    Pedro Boechat, Mark Dokter, Michael Kenzel, Hans-Peter Seidel, Dieter Schmalstieg, Markus Steinberger:
    Abstract: In this paper, we present the concept of operator graph scheduling for high performance procedural generation on the graphics processing unit (GPU). The operator graph forms an intermediate representation that describes all possible operations and objects that can arise during a specific procedural generation. While previous methods have focused on parallelizing a specific procedural approach, the operator graph is applicable to all procedural generation methods that can be described by a graph, such as L-systems, shape grammars, or stack-based generation methods. Using the operator graph, we show that all partitions of the graph correspond to possible ways of scheduling a procedural generation on the GPU, including the scheduling strategies of previous work. As the space of possible partitions is very large, we describe three search heuristics, aiding an optimizer in finding the fastest valid schedule for any given operator graph. The best partitions found by our optimizer increase performance by 8 to 30x over the previous state of the art in GPU shape grammar and L-system generation.
    ACM Transactions on Graphics (SIGGRAPH Asia'16), 2016
  • C13
    How naive is naive SpMV on the GPU?
    Markus Steinberger, Andreas Derler, Rhaleb Zayer, Hans-Peter Seidel:
    Abstract: Sparse matrix vector multiplication (SpMV) is the workhorse for a wide range of linear algebra computations. In a serial setting, naive implementations for direct multiplication and transposed multiplication achieve very competitive performance. In parallel settings, especially on graphics hardware, it is widely believed that naive implementations cannot reach the performance of highly tuned parallel implementations and complex data formats. Most often, the cost for data conversion to these specialized formats as well as the cost for transpose operations are neglected, as they do not arise in all algorithms. In this paper, we revisit the naive implementation of SpMV for the GPU. Relying on recent advances in GPU hardware, such as fast hardware supported atomic operations and better cache performance, we show that a naive implementation can reach the performance of state-of-the-art SpMV implementations. In case the cost of format conversion and transposition cannot be amortized over many SpMV operations a naive implementation can even outperform state-of-the-art implementations significantly. Experimental results over a variety of data sets suggest that the adoption of the naive serial implementation to the GPU is not as inferior as it used to be on previous hardware generations. The integration of some naive strategies can potentially speed up state-of-the-art GPU SpMV implementations, especially in the transpose case.
    HPEC '16 Best Paper Nominee
    High Performance Extreme Computing, 2016
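The "naive" gather and scatter products the paper revisits can be sketched serially over CSR arrays; the scatter in the transposed product plays the role the paper assigns to fast hardware atomic adds, and lets A^T be applied without ever being built or stored.

```python
import numpy as np

def spmv(row_ptr, col_idx, vals, x):
    """Direct y = A x: gather along each CSR row."""
    y = np.zeros(len(row_ptr) - 1)
    for r in range(len(y)):
        for j in range(row_ptr[r], row_ptr[r + 1]):
            y[r] += vals[j] * x[col_idx[j]]
    return y

def spmv_transposed(row_ptr, col_idx, vals, x, n_cols):
    """y = A^T x without forming A^T: scatter (atomicAdd on the GPU)."""
    y = np.zeros(n_cols)
    for r in range(len(row_ptr) - 1):
        for j in range(row_ptr[r], row_ptr[r + 1]):
            y[col_idx[j]] += vals[j] * x[r]
    return y

# A = [[1, 0, 2],
#      [0, 3, 0]] in CSR form
row_ptr = np.array([0, 2, 3])
col_idx = np.array([0, 2, 1])
vals = np.array([1.0, 2.0, 3.0])
print(spmv(row_ptr, col_idx, vals, np.array([1.0, 1.0, 1.0])))        # A x
print(spmv_transposed(row_ptr, col_idx, vals, np.array([1.0, 1.0]), 3))  # A^T x
```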

  • C12
    Visualization-Guided Evaluation of Simulated Minimally Invasive Cancer Treatment
    Philip Voglreiter, Michael Hofmann, Christoph Ebner, Roberto Blanco Sequeiros, Horst Rupert Portugaller, Jurgen Fütterer, Michael Moche, Markus Steinberger, Dieter Schmalstieg:
    Abstract: We present a visualization application supporting interventional radiologists during analysis of simulated minimally invasive cancer treatment. The current clinical practice employs only rudimentary, manual measurement tools. Our system provides visual support throughout three evaluation stages, starting with determining prospective treatment success of the simulation parameterization. In case of insufficiencies, Stage 2 includes a simulation scalar field for determining a new configuration of the simulation. For complex cases, where Stage 2 does not lead to a decisive strategy, Stage 3 reinforces analysis of interdependencies of scalar fields via bivariate visualization. Our system is designed to be immediately applicable in medical practice. We analyze the design space of potentially useful visualization techniques and appraise their effectiveness in the context of our design goals. Furthermore, we present a user study, which reveals the disadvantages of manual analysis in the measurement stage of evaluation and highlights the demand for computer support through our system.
    Visual Computing for Biology and Medicine, 2016
  • J16
    Fast ANN for High-Quality Collaborative Filtering
    Yun-Ta Tsai, Markus Steinberger, Dawid Pajak, Kari Pulli:
    Abstract: Collaborative filtering collects similar patches, jointly filters them, and scatters the output back to input patches; each pixel gets a contribution from each patch that overlaps with it, allowing signal reconstruction from highly corrupted data. Exploiting self-similarity, however, requires finding matching image patches, which is an expensive operation. We propose a GPU-friendly approximate-nearest-neighbor (ANN) algorithm that produces high-quality results for any type of collaborative filter. We evaluate our ANN search against state-of-the-art ANN algorithms in several application domains. Our method is orders of magnitude faster, yet provides similar or higher quality results than the previous work.
    Computer Graphics Forum, 2016
  • J15
    Fast volume reconstruction from motion corrupted stacks of 2D slices
    Bernhard Kainz, Markus Steinberger, Wolfgang Wein, Maria Kuklisova-Murgasova, Christina Malamateniou, Kevin Keraudren, Thomas Torsney-Weir, Mary Rutherford, Paul Aljabar, Joseph V Hajnal, Daniel Rueckert:
    Abstract: Capturing an enclosing volume of moving subjects and organs using fast individual image slice acquisition has shown promise in dealing with motion artefacts. Motion between slice acquisitions results in spatial inconsistencies that can be resolved by slice-to-volume reconstruction (SVR) methods to provide high quality 3D image data. Existing algorithms are, however, typically very slow, specialised to specific applications and rely on approximations, which impedes their potential clinical use. In this paper, we present a fast multi-GPU accelerated framework for slice-to-volume reconstruction. It is based on optimised 2D/3D registration, super-resolution with automatic outlier rejection and an additional (optional) intensity bias correction. We introduce a novel and fully automatic procedure for selecting the image stack with least motion to serve as an initial registration target. We evaluate the proposed method using artificial motion corrupted phantom data as well as clinical data, including tracked freehand ultrasound of the liver and fetal Magnetic Resonance Imaging. We achieve speed-up factors greater than 30 compared to a single CPU system and greater than 10 compared to currently available state-of-the-art multi-core CPU methods. We ensure high reconstruction accuracy by exact computation of the point-spread function for every input data point, which has not previously been possible due to computational limitations. Our framework and its implementation is scalable for available computational infrastructures and tests show a speed-up factor of 1.70 for each additional GPU. This paves the way for the online application of image based reconstruction methods during clinical examinations. The source code for the proposed approach is publicly available.
    IEEE Transactions on Medical Imaging, 2015
  • J14
    Interactive Disassembly Planning for Complex Objects
    Bernhard Kerbl, Denis Kalkofen, Markus Steinberger, Dieter Schmalstieg:
    Abstract: We present an approach for the automatic generation, interactive exploration and real-time modification of disassembly procedures for complex, multipartite CAD data sets. In order to lift the performance barriers prohibiting interactive disassembly planning, we run a detailed analysis on the input model to identify recurring part constellations and efficiently determine blocked part motions in parallel on the GPU. Building on the extracted information, we present an interface for computing and editing extensive disassembly sequences in real-time while considering user-defined constraints and avoiding unstable configurations. To evaluate the performance of our C++/CUDA implementation, we use a variety of openly available CAD data sets, ranging from simple to highly complex. In contrast to previous approaches, our work enables interactive disassembly planning for objects which consist of several thousand parts and require cascaded translations during part removal.
    Computer Graphics Forum / Eurographics (EG'15), 2015
  • J13
    An overview of dynamic resource scheduling on graphics processors
    Markus Steinberger:
    Abstract: In this paper, we present a series of scheduling approaches targeted for massively parallel architectures, which in combination allow a wider range of algorithms to be executed on modern graphics processors. At first, we describe a new processing model which enables the efficient execution of dynamic, irregular workloads. Then, we present the currently fastest queuing algorithm for graphics processors, the most efficient dynamic memory allocator for massively parallel architectures, and the only autonomous scheduler for graphics processing units that can dynamically support different granularities of parallelism. Finally, we show how these scheduling approaches help to advance the state of the art in rendering, visualization, and procedural modeling.
    it - Information Technology, 2015
  • C11
    Reyes Rendering on the GPU
    Martin Sattlecker, Markus Steinberger:
    Abstract: In this paper we investigate the possibility of real-time Reyes rendering with advanced effects such as displacement mapping and multidimensional rasterization on current graphics hardware. We describe a first GPU Reyes implementation executing within an autonomously executing persistent Megakernel. To support high quality rendering, we integrate displacement mapping into the renderer, which has only marginal impact on performance. To investigate rasterization for Reyes, we start with an approach similar to nearest neighbor filling, before presenting a precise sampling algorithm. This algorithm can be enhanced to support motion blur and depth of field using three dimensional sampling. To evaluate the performance quality trade-off of these effects, we compare three approaches: coherent sampling across time and on the lens, essentially overlaying images; randomized sampling along all dimensions; and repetitive randomization, in which the randomization pattern is repeated among subgroups of pixels. We evaluate all approaches, showing that high quality images can be generated with interactive to real-time refresh rates for advanced Reyes features.
    Spring Conference on Computer Graphics, 2015
  • PA01
    Efficient Approximate-Nearest-Neighbor (ANN) Search for High-Quality Collaborative Filtering
    Dawid Pajak, Yun-Ta Tsai, Markus Steinberger:
    Abstract: A computer implemented method of performing an approximate-nearest-neighbor search is disclosed. The method comprises dividing an image into a plurality of tiles. Further, for each of the plurality of tiles, perform the following in parallel on a processor: (a) dividing image patches into a plurality of clusters, wherein each cluster comprises similar image patches, and wherein the dividing continues recursively until a size of a cluster is below a threshold value; (b) performing a nearest-neighbor query within each of the plurality of clusters; and (c) performing collaborative filtering in parallel for each image patch, wherein the collaborative filtering aggregates and processes nearest neighbor image patches from a same cluster containing a respective image patch to form an output image.
    US Patent, 2015
  • J12
    Whippletree: Task-based Scheduling of Dynamic Workloads on the GPU
    Markus Steinberger, Michael Kenzel, Pedro Boechat, Bernhard Kerbl, Mark Dokter, Dieter Schmalstieg:
    ACM Transactions on Graphics (SIGGRAPH Asia'14), 2014
  • J11
    FlexISP: A flexible camera image processing framework
    Felix Heide, Markus Steinberger, Yun-Ta Tsai, Nasa Rouf, Dawid Pajak, Dikpal Reddy, Orazio Gallo, Jing Liu, Wolfgang Heidrich, Karen Egiazarian, Jan Kautz, Kari Pulli:
    Abstract: Conventional pipelines for capturing, displaying, and storing images are usually defined as a series of cascaded modules, each responsible for addressing a particular problem. While this divide-and-conquer approach offers many benefits, it also introduces a cumulative error, as each step in the pipeline only considers the output of the previous step, not the original sensor data. We propose an end-to-end system that is aware of the camera and image model, enforces natural-image priors, while jointly accounting for common image processing steps like demosaicking, denoising, deconvolution, and so forth, all directly in a given output representation (e.g., YUV, DCT). Our system is flexible and we demonstrate it on regular Bayer images as well as images from custom sensors. In all cases, we achieve large improvements in image quality and signal reconstruction compared to state-of-the-art techniques. Finally, we show that our approach is capable of very efficiently handling high-resolution images, making even mobile implementations feasible.
    ACM Transactions on Graphics (SIGGRAPH Asia'14), 2014
  • C10
    Fast ANN for High-Quality Collaborative Filtering
    Yun-Ta Tsai, Markus Steinberger, Dawid Pajak, Kari Pulli:
    Abstract: Collaborative filtering collects similar patches, jointly filters them, and scatters the output back to input patches; each pixel gets a contribution from each patch that overlaps with it, allowing signal reconstruction from highly corrupted data. Exploiting self-similarity, however, requires finding matching image patches, which is an expensive operation. We propose a GPU-friendly approximated-nearest-neighbor algorithm that produces high-quality results for any type of collaborative filter. We evaluate our ANN search against state-of-the-art ANN algorithms in several application domains. Our method is orders of magnitude faster, yet provides similar or higher-quality results than the previous work.
    HPG'14 Best Paper Award
    High Performance Graphics (HPG '14), 2014
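The recursive clustering at the heart of the method (also claimed in patent PA01 below) can be sketched on the CPU; the names, pivot rule, and threshold here are illustrative, not taken from the paper, which runs this per tile and in parallel on the GPU.

```python
import numpy as np

rng = np.random.default_rng(1)

def split(indices, patches, threshold):
    """Recursively partition patch indices into clusters of similar patches."""
    if len(indices) <= threshold:
        return [indices]
    a, b = patches[rng.choice(indices, 2, replace=False)]   # two random pivots
    near_a = np.linalg.norm(patches[indices] - a, axis=1) <= \
             np.linalg.norm(patches[indices] - b, axis=1)
    left, right = indices[near_a], indices[~near_a]
    if len(left) == 0 or len(right) == 0:
        return [indices]                                    # degenerate split
    return split(left, patches, threshold) + split(right, patches, threshold)

def ann(query_idx, patches, clusters):
    """Answer the NN query only within the query's own cluster."""
    for c in clusters:
        if query_idx in c:
            others = c[c != query_idx]
            if len(others) == 0:
                return None
            d = np.linalg.norm(patches[others] - patches[query_idx], axis=1)
            return others[np.argmin(d)]

patches = rng.normal(size=(64, 16))               # 64 flattened 4x4 patches
clusters = split(np.arange(64), patches, threshold=8)
neighbor = ann(0, patches, clusters)
print(neighbor)
```

Restricting the search to one small cluster is what makes the query cheap; the quality trade-off is that the true nearest neighbor may sit just across a cluster boundary.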

  • O02
    Dynamisches Ressourcen Scheduling auf Grafik Prozessoren (Dynamic Resource Scheduling on Graphics Processors)
    Markus Steinberger:
    Ausgezeichnete Informatikdissertationen 2013 (German), 2014
  • J10
    Parallel Irradiance Caching for Interactive Monte-Carlo Direct Volume Rendering
    Rostislav Khlebnikov, Philip Voglreiter, Markus Steinberger, Bernhard Kainz, Dieter Schmalstieg:
    Abstract: We propose a technique to build the irradiance cache for isotropic scattering simultaneously with Monte Carlo progressive direct volume rendering on a single GPU, which allows us to achieve up to four times increased convergence rate for complex scenes with arbitrary sources of light. We use three procedures that run concurrently on a single GPU. The first is the main rendering procedure. The second procedure computes new cache entries, and the third one corrects the errors that may arise after creation of new cache entries. We propose two distinct approaches to allow massive parallelism of cache entry creation. In addition, we show a novel extrapolation approach which outputs high quality irradiance approximations and a suitable prioritization scheme to increase the convergence rate by dedicating more computational power to more complex rendering areas.
    Computer Graphics Forum / EuroVis, 2014
  • C09
    Show Me the Invisible: Visualizing Hidden Content
    Thomas Geymayer, Markus Steinberger, Alexander Lex, Marc Streit, Dieter Schmalstieg:
    Abstract: Content on computer screens is often inaccessible to users because it is hidden, e.g., occluded by other windows, outside the viewport, or overlooked. In search tasks, the efficient retrieval of sought content is important. Current software, however, only provides limited support to visualize hidden occurrences and rarely supports search synchronization crossing application boundaries. To remedy this situation, we introduce two novel visualization methods to guide users to hidden content. Our first method generates awareness for occluded or out-of-viewport content using see-through visualization. For content that is either outside the screen's viewport or for data sources not opened at all, our second method shows off-screen indicators and an on-demand smart preview. To reduce the chances of overlooking content, we use visual links, i.e., visible edges, to connect the visible content or the visible representations of the hidden content. We show the validity of our methods in a user study, which demonstrates that our technique enables a faster localization of hidden content compared to traditional search functionality and thereby assists users in information retrieval tasks.
    CHI '14 Honorable Mention Award
    SIGCHI Conference on Human Factors in Computing Systems (CHI'14), 2014

  • J09
    On-the-fly Generation and Rendering of Infinite Cities on the GPU
    Markus Steinberger, Michael Kenzel, Bernhard Kainz, Peter Wonka, Dieter Schmalstieg:
    Abstract: In this paper, we present a new approach for shape-grammar-based generation and rendering of huge cities in real-time on the graphics processing unit (GPU). Traditional approaches rely on evaluating a shape grammar and storing the geometry produced as a preprocessing step. During rendering, the pregenerated data is then streamed to the GPU. By interweaving generation and rendering, we overcome the problems and limitations of streaming pregenerated data. Using our methods of visibility pruning and adaptive level of detail, we are able to dynamically generate only the geometry needed to render the current view in real-time directly on the GPU. We also present a robust and efficient way to dynamically update a scene's derivation tree and geometry, enabling us to exploit frame-to-frame coherence. Our combined generation and rendering is significantly faster than all previous work. For detailed scenes, we are capable of generating geometry more rapidly than even just copying pregenerated data from main memory, enabling us to render cities with thousands of buildings at up to 100 frames per second, even with the camera moving at supersonic speed.
    Computer Graphics Forum / Eurographics (EG'14), 2014
  • J08
    Parallel Generation of Architecture on the GPU
    Markus Steinberger, Michael Kenzel, Bernhard Kainz, Jörg Müller, Peter Wonka, Dieter Schmalstieg:
    Abstract: In this paper, we present a novel approach for the parallel evaluation of procedural shape grammars on the graphics processing unit (GPU). Unlike previous approaches that are either limited in the kind of shapes they allow, the amount of parallelism they can take advantage of, or both, our method supports state-of-the-art procedural modeling including stochasticity and context-sensitivity. To increase parallelism, we explicitly express independence in the grammar, reduce inter-rule dependencies required for context-sensitive evaluation, and introduce intra-rule parallelism. Our rule scheduling scheme avoids unnecessary back and forth between CPU and GPU and reduces round trips to slow global memory by dynamically grouping rules in on-chip shared memory. Our GPU shape grammar implementation is multiple orders of magnitude faster than the standard in CPU-based rule evaluation, while offering equal expressive power. In comparison to the state of the art in GPU shape grammar derivation, our approach is nearly 50 times faster, while adding support for geometric context-sensitivity.
    Honorable Mention Award (among top 3 papers)
    Computer Graphics Forum / Eurographics (EG'14), 2014
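As a minimal illustration of the kind of rule evaluation being parallelized in the grammar papers above (a classic deterministic L-system, far simpler than the stochastic, context-sensitive grammars the paper supports): rules rewrite symbols, and the derivation grows from an axiom.

```python
# Lindenmayer's algae system: A -> AB, B -> A.
rules = {"A": "AB", "B": "A"}

def derive(axiom, steps):
    s = axiom
    for _ in range(steps):
        # Every symbol is rewritten independently, which is exactly the
        # independence a parallel evaluator can exploit.
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

print(derive("A", 5))  # ABAABABAABAAB
```

Each rewriting step touches every symbol independently, so a GPU evaluator can assign symbols (or groups of rules) to threads; the hard part, which the paper addresses, is scheduling that work when rule outputs differ wildly in size.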

  • J07
    Noise-based volume rendering for the visualization of multivariate volumetric data
    Rostislav Khlebnikov, Bernhard Kainz, Markus Steinberger, Dieter Schmalstieg:
    Abstract: Analysis of multivariate data is of great importance in many scientific disciplines. However, visualization of 3D spatially-fixed multivariate volumetric data is a very challenging task. In this paper we present a method that allows simultaneous real-time visualization of multivariate data. We redistribute the opacity within a voxel to improve the readability of the color defined by a regular transfer function, and to maintain the see-through capabilities of volume rendering. We use predictable procedural noise--random-phase Gabor noise--to generate a high-frequency redistribution pattern and construct an opacity mapping function, which allows to partition the available space among the displayed data attributes. This mapping function is appropriately filtered to avoid aliasing, while maintaining transparent regions. We show the usefulness of our approach on various data sets and with different example applications. Furthermore, we evaluate our method by comparing it to other visualization techniques in a controlled user study. Overall, the results of our study indicate that users are much more accurate in determining exact data values with our novel 3D volume visualization method. Significantly lower error rates for reading data values and high subjective ranking of our method imply that it has a high chance of being adopted for the purpose of visualization of multivariate 3D data.
    Transactions on Visualization and Computer Graphics (Vis'13), 2013
  • P02
    Volume Rendering with Advanced GPU Scheduling Strategies
    Philip Voglreiter, Markus Steinberger, Rostislav Khlebnikov, Bernhard Kainz, Dieter Schmalstieg:
    Abstract: Modern GPUs are powerful enough to enable interactive display of high-quality volume data even despite the fact that many volume rendering methods do not present a natural fit for current GPU hardware. However, there still is a vast amount of computational power that remains unused due to the inefficient use of the available hardware. In this work, we demonstrate how advanced scheduling methods can be employed to implement volume rendering algorithms in a way that better utilizes the GPU by example of three different state-of-the-art volume rendering techniques.
    VIS'13 Honorable Mention Poster Award
    IEEE Scientific Visualization poster (Vis'13), 2013

  • C08
    Adaptive Ghosted Views for Augmented Reality
    Denis Kalkofen, Eduardo Veas, Stefanie Zollmann, Markus Steinberger, Dieter Schmalstieg:
    Abstract: In Augmented Reality (AR), ghosted views allow a viewer to explore hidden structure within the real-world environment. A body of previous work has explored which features are suitable to support the structural interplay between occluding and occluded elements. However, the dynamics of AR environments pose serious challenges to the presentation of ghosted views. While a model of the real world may help determine distinctive structural features, changes in appearance or illumination detriment the composition of occluding and occluded structure. In this paper, we present an approach that considers the information value of the scene before and after generating the ghosted view. Hereby, a contrast adjustment of preserved occluding features is calculated, which adaptively varies their visual saliency within the ghosted view visualization. This allows us to not only preserve important features, but to also support their prominence after revealing occluded structure, thus achieving a positive effect on the perception of ghosted views.
    International Symposium on Mixed and Augmented Reality (ISMAR'13), 2013
  • C07
    OmniKinect: Real-Time Dense Volumetric Data Acquisition and Applications
    Bernhard Kainz, Stefan Hauswiesner, Gerhard Reitmayr, Markus Steinberger, Raphael Grasset, Lukas Gruber, Eduardo Veas, Denis Kalkofen, Hartmut Seichter, Dieter Schmalstieg:
    Abstract: Real-time three-dimensional acquisition of real-world scenes has many important applications in computer graphics, computer vision and human-computer interaction. Inexpensive depth sensors such as the Microsoft Kinect allow to leverage the development of such applications. However, this technology is still relatively recent, and no detailed studies on its scalability to dense and view-independent acquisition have been reported. This paper addresses the question of what can be done with a larger number of Kinects used simultaneously. We describe an interference-reducing physical setup, a calibration procedure and an extension to the KinectFusion algorithm, which allows to produce high quality volumetric reconstructions from multiple Kinects whilst overcoming systematic errors in the depth measurements. We also report on enhancing image based visual hull rendering by depth measurements, and compare the results to KinectFusion. Our system provides practical insight into achievable spatial and radial range and into bandwidth requirements for depth data acquisition. Finally, we present a number of practical applications of our system.
    Symposium On Virtual Reality Software And Technology (VRST'12), 2012
  • J06
    Softshell: Dynamic Scheduling on GPUs
    Markus Steinberger, Bernhard Kainz, Bernhard Kerbl, Stefan Hauswiesner, Michael Kenzel, Dieter Schmalstieg:
    Softshell: Dynamic Scheduling on GPUs
    Abstract: In this paper we present Softshell, a novel execution model for devices composed of multiple processing cores operating in a single instruction, multiple data fashion, such as graphics processing units (GPUs). The Softshell model is intuitive and more flexible than the kernel-based adaptation of the stream processing model, which is currently the dominant model for general purpose GPU computation. Using the Softshell model, algorithms with a relatively low local degree of parallelism can execute efficiently on massively parallel architectures. Softshell has the following distinct advantages: (1) work can be dynamically issued directly on the device, eliminating the need for synchronization with an external source, i.e., the CPU; (2) its three-tier dynamic scheduler supports arbitrary scheduling strategies, including dynamic priorities and real-time scheduling; and (3) the user can influence, pause, and cancel work already submitted for parallel execution. The Softshell processing model thus brings capabilities to GPU architectures that were previously only known from operating-system designs and reserved for CPU programming. As a proof of our claims, we present a publicly available implementation of the Softshell processing model realized on top of CUDA. The benchmarks of this implementation demonstrate that our processing model is easy to use and also performs substantially better than the state-of-the-art kernel-based processing model for problems that have been difficult to parallelize in the past.
    ACM Transactions on Graphics (SIGGRAPH Asia'12), 2012
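    The core scheduling idea from the abstract, priority-ordered dispatch of dynamically submitted work that can still be cancelled, can be modelled with an ordinary priority queue. The sketch below is a hypothetical, sequential CPU illustration (all names are invented), not the actual CUDA implementation.

    ```python
    import heapq
    import itertools

    class PriorityScheduler:
        """Toy model of priority-ordered work dispatch.

        Work items can be enqueued at any time, and cancelled
        before they run, mirroring advantages (1) and (3) above.
        """

        def __init__(self):
            self._heap = []                     # (priority, seq, item_id)
            self._counter = itertools.count()   # tie-breaker keeps FIFO order
            self._cancelled = set()

        def submit(self, item_id, priority):
            # Lower number = higher priority, as in typical priority queues.
            heapq.heappush(self._heap, (priority, next(self._counter), item_id))

        def cancel(self, item_id):
            self._cancelled.add(item_id)        # lazy deletion

        def dispatch(self):
            # Pop the highest-priority item that was not cancelled.
            while self._heap:
                _, _, item_id = heapq.heappop(self._heap)
                if item_id not in self._cancelled:
                    return item_id
            return None

    sched = PriorityScheduler()
    sched.submit("render_tile", priority=1)
    sched.submit("physics_step", priority=0)
    sched.submit("audio_mix", priority=2)
    sched.cancel("audio_mix")
    order = [sched.dispatch(), sched.dispatch(), sched.dispatch()]
    print(order)  # physics_step first; the cancelled item never runs
    ```

    A real GPU scheduler must of course make these operations safe under massive concurrency; the point here is only the queue discipline.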
  • C06
    Volumetric Real-Time Particle-Based Representation of Large Unstructured Tetrahedral Polygon Meshes
    Philip Voglreiter, Markus Steinberger, Dieter Schmalstieg, Bernhard Kainz:
    Volumetric Real-Time Particle-Based Representation of Large Unstructured Tetrahedral Polygon Meshes
    Abstract: In this paper we propose a particle-based volume rendering approach for unstructured, three-dimensional, tetrahedral polygon meshes. We stochastically generate millions of particles per second and project them onto the screen in real-time. In contrast to previous rendering techniques of tetrahedral volume meshes, our method does not require prior depth sorting of the geometry. Instead, the rendered image is generated by choosing particles closest to the camera. Furthermore, we use spatial superimposition: each pixel is constructed from multiple subpixels. This approach not only increases projection accuracy, but also allows combining subpixels into one superpixel, which creates the well-known translucency effect of volume rendering. We show that our method is fast enough for the visualization of unstructured three-dimensional grids with hard real-time constraints and that it scales well for a high number of particles.
    Mesh Processing in Medical Image Analysis (MICCAI'12), 2012
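    The "closest particle wins" image formation mentioned above can be pictured as a per-pixel depth test instead of a global sort. The following is a hedged, CPU-only toy model with invented names; the paper's GPU pipeline works on far larger particle counts.

    ```python
    # Toy model of sort-free particle splatting: every stochastically
    # generated particle is projected, and each pixel keeps the particle
    # nearest to the camera -- effectively a z-test, no depth sorting.

    WIDTH, HEIGHT = 4, 4

    def render(particles):
        """particles: list of (x, y, depth, value) with integer pixel coords."""
        depth_buf = [[float("inf")] * WIDTH for _ in range(HEIGHT)]
        image = [[None] * WIDTH for _ in range(HEIGHT)]
        for x, y, depth, value in particles:
            if depth < depth_buf[y][x]:   # closer than what we kept so far
                depth_buf[y][x] = depth
                image[y][x] = value
        return image

    parts = [(1, 1, 5.0, "far"), (1, 1, 2.0, "near"), (0, 0, 3.0, "only")]
    img = render(parts)
    print(img[1][1], img[0][0])  # the nearer particle wins at (1, 1)
    ```

    The subpixel superimposition described in the abstract would run this selection per subpixel and then blend subpixels into one superpixel to obtain translucency.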
  • J05
    Procedural Texture Synthesis for Zoom-Independent Visualization of Multivariate Data
    Rostislav Khlebnikov, Bernhard Kainz, Markus Steinberger, Marc Streit, Dieter Schmalstieg:
    Procedural Texture Synthesis for Zoom-Independent Visualization of Multivariate Data
    Abstract: Simultaneous visualization of multiple continuous data attributes in a single visualization is a task that is important for many application areas. Unsurprisingly, many methods have been proposed to solve this task. However, the behavior of such methods during the exploration stage, when the user tries to understand the data with panning and zooming, has not been given much attention. In this paper, we propose a method that uses procedural texture synthesis to create zoom-independent visualizations of three scalar data attributes. The method is based on random-phase Gabor noise, whose frequency is adapted for the visualization of the first data attribute. We ensure that the resulting texture frequency lies in the range that is perceived well by the human visual system at any zoom level. To enhance the perception of this attribute, we also apply a specially constructed transfer function that is based on statistical properties of the noise. Additionally, the transfer function is constructed in a way that it does not introduce any aliasing to the texture. We map the second attribute to the texture orientation. The third attribute is color coded and combined with the texture by modifying the value component of the HSV color model. The necessary contrast needed for texture and color perception was determined in a user study. In addition, we conducted a second user study that shows significant advantages of our method over current methods with similar goals. We believe that our method is an important step towards creating methods that not only succeed in visualizing multiple data attributes, but also adapt to the behavior of the user during the data exploration stage.
    Computer Graphics Forum (EuroVis'12), 2012
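    The zoom-independence idea can be sketched in a few lines: Gabor noise has an explicit frequency parameter, so the data-space frequency can be rescaled with the zoom factor to hold the on-screen frequency in a perceptually comfortable range. The kernel and the numbers below are illustrative assumptions, not the paper's implementation.

    ```python
    import math

    def gabor_kernel(x, y, frequency, orientation, bandwidth=1.0):
        """2D Gabor kernel: Gaussian envelope times an oriented cosine wave."""
        envelope = math.exp(-math.pi * bandwidth**2 * (x * x + y * y))
        u = x * math.cos(orientation) + y * math.sin(orientation)
        return envelope * math.cos(2.0 * math.pi * frequency * u)

    def screen_frequency(data_frequency, zoom):
        # Zooming in by `zoom` stretches the texture on screen,
        # dividing the perceived frequency.
        return data_frequency / zoom

    def zoom_adapted_frequency(target_screen_freq, zoom):
        # Compensate so the on-screen frequency stays constant.
        return target_screen_freq * zoom

    for zoom in (0.5, 1.0, 4.0):
        f = zoom_adapted_frequency(0.1, zoom)  # cycles per data unit
        print(screen_frequency(f, zoom))       # constant on screen
    ```

    In the paper, the second attribute additionally drives the `orientation` parameter, and a transfer function built from the noise statistics maps the filtered values without aliasing.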
  • J04
    Interactive Self-Organizing Windows
    Markus Steinberger, Manuela Waldner, Dieter Schmalstieg:
    Interactive Self-Organizing Windows
    Abstract: In this paper, we present the design and implementation of a dynamic window management technique that changes the perception of windows as fixed-sized rectangles. The primary goal of self-organizing windows is to automatically display the most relevant information for a user's current activity, which removes the burden of organizing and arranging windows from the user. We analyze the image-based representation of each window and identify coherent pieces of information. The windows are then automatically moved, scaled and composed in a content-aware manner to fit the most relevant information into the limited area of the screen. During the design process, we consider findings from previous experiments and show how users can benefit from our system. We also describe how the immense processing power of current graphics processing units can be exploited to build an interactive system that finds an optimal solution within the complex design space of all possible window transformations in real time.
    Computer Graphics Forum / Eurographics (EG'12), 2012
  • C05
    ScatterAlloc: Massively Parallel Dynamic Memory Allocation for the GPU
    Markus Steinberger, Michael Kenzel, Bernhard Kainz, Dieter Schmalstieg:
    ScatterAlloc: Massively Parallel Dynamic Memory Allocation for the GPU
    Abstract: In this paper, we analyze the special requirements of a dynamic memory allocator that is designed for massively parallel architectures such as Graphics Processing Units (GPUs). We show that traditional strategies, which work well on CPUs, are not well suited for use on GPUs and present the thorough design of ScatterAlloc, which can efficiently deal with hundreds of requests in parallel. Our allocator greatly reduces collisions and congestion by scattering memory requests based on hashing. We analyze ScatterAlloc in terms of allocation speed, data access time and fragmentation, and compare it to current state-of-the-art allocators, including the one provided with the NVIDIA CUDA toolkit. Our results show that ScatterAlloc clearly outperforms these other approaches, yielding speed-ups between 10 and 100.
    Innovative Parallel Computing (InPar'12), 2012
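    The scattering idea from the abstract can be illustrated without any GPU: each request hashes to a start page, so concurrent threads probe different pages first and rarely contend for the same free list. This is a conceptual sketch with invented names and a simple multiplicative hash, not the real ScatterAlloc code.

    ```python
    NUM_PAGES = 16
    CHUNKS_PER_PAGE = 4

    def scatter_hash(thread_id):
        # Knuth-style multiplicative hash (illustrative, not the paper's hash).
        return (thread_id * 2654435761) % NUM_PAGES

    def scatter_alloc(thread_id, free_chunks):
        """Pick a start page by hashing, then linearly probe for a free chunk.

        `free_chunks` maps page index -> number of free chunks on that page.
        Returns the page the allocation was served from, or None if full.
        """
        start = scatter_hash(thread_id)              # scatter step
        for offset in range(NUM_PAGES):              # wrap-around probe
            page = (start + offset) % NUM_PAGES
            if free_chunks[page] > 0:
                free_chunks[page] -= 1
                return page
        return None

    free = {p: CHUNKS_PER_PAGE for p in range(NUM_PAGES)}
    pages = {scatter_alloc(t, free) for t in range(32)}
    # Distinct threads land on many distinct pages instead of all
    # competing for page 0, which is the point of scattering.
    print(len(pages) > 1)
    ```

    On the GPU the decrement of a page's free counter would be an atomic operation; the sequential loop above only models where requests land.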
  • C04
    Multi-GPU Image-based Visual Hull Rendering
    Stefan Hauswiesner, Rostislav Khlebnikov, Markus Steinberger, Mathias Straka, Gerhard Reitmayr:
    Multi-GPU Image-based Visual Hull Rendering
    Abstract: Many virtual mirror and telepresence applications require novel viewpoint synthesis with little latency to user motion. Image-based visual hull (IBVH) rendering is capable of rendering arbitrary views from segmented images without an explicit intermediate data representation, such as a mesh or a voxel grid. By computing depth images directly from the silhouette images, it usually outperforms indirect methods. GPU-hardware accelerated implementations exist, but due to the lack of an intermediate representation no multi-GPU parallel strategies and implementations are currently available. This paper suggests three ways to parallelize the IBVH-pipeline and maps them to the sorting classification that is often applied to conventional parallel rendering systems. In addition to sort-first parallelization, we suggest a novel sort-last formulation that regards cameras as scene objects. We enhance this method’s performance by a block-based encoding of the rendering results. For interactive systems with hard real-time constraints, we combine the algorithm with a multi-frame rate (MFR) system. We suggest a combination of forward and backward image warping to improve the visual quality of the MFR rendering. We observed the runtime behavior of the suggested methods and assessed how their performance scales with respect to input and output resolutions and the number of GPUs. By using additional GPUs, we reduced rendering times by up to 60%. Multi-frame rate viewing can even be ten times faster.
    Eurographics Symposium on Parallel Graphics and Visualization (EGPGV'12), 2012
  • J03
    Ray Prioritization Using Stylization and Visual Saliency
    Markus Steinberger, Bernhard Kainz, Stefan Hauswiesner, Rostislav Khlebnikov, Denis Kalkofen, Dieter Schmalstieg:
    Ray Prioritization Using Stylization and Visual Saliency
    Abstract: This paper presents a new method to control scene sampling in complex ray-based rendering environments. It proposes to constrain image sampling density with a combination of object features, which are known to be well perceived by the human visual system, and image space saliency, which captures effects that are not based on the object’s geometry. The presented method uses Non-Photorealistic Rendering techniques for the object space feature evaluation and combines the image space saliency calculations with image warping to infer quality hints from previously generated frames. In order to map different feature types to sampling densities, we also present an evaluation of the object space and image space features’ impact on the resulting image quality. In addition, we present an efficient, adaptively aligned fractal pattern that is used to reconstruct the image from sparse sampling data. Furthermore, this paper presents an algorithm which uses our method in order to guarantee a desired minimal frame rate. Our scheduling algorithm maximizes the utilization of each given time slice by rendering features in the order of visual importance values until a time constraint is reached. We demonstrate how our method can be used to boost or stabilize the rendering time in complex ray-based image generation consisting of geometric as well as volumetric data.
    Computers & Graphics, 2012
  • C03
    Display-Adaptive Window Management for Irregular Surfaces
    Manuela Waldner, Raphael Grasset, Markus Steinberger, Dieter Schmalstieg:
    Display-Adaptive Window Management for Irregular Surfaces
    Abstract: Current projectors can easily be combined to create an everywhere display, using all suitable surfaces in offices or meeting rooms for the presentation of information. However, the resulting irregular display is not well supported by traditional desktop window managers, which are optimized for rectangular screens. In this paper, we present novel display-adaptive window management techniques, which provide semi-automatic placement for desktop elements (such as windows or icons) for users of large, irregularly shaped displays. We report results from an exploratory study, which reveals interesting emerging strategies of users in the manipulation of windows on large irregular displays and shows that the new techniques increase subjective satisfaction with the window management interface.
    Interactive Tabletops and Surfaces (ITS'11), 2011
  • J02
    Context-Preserving Visual Links
    Markus Steinberger, Manuela Waldner, Marc Streit, Alexander Lex, Dieter Schmalstieg:
    Context-Preserving Visual Links
    Abstract: Evaluating, comparing, and interpreting related pieces of information are tasks that are commonly performed during visual data analysis and in many kinds of information-intensive work. Synchronized visual highlighting of related elements is a well-known technique used to assist this task. An alternative approach, which is more invasive but also more expressive, is visual linking, in which line connections are rendered between related elements. In this work, we present context-preserving visual links as a new method for generating visual links. The method specifically aims to fulfill the following two goals: first, visual links should minimize the occlusion of important information; second, links should visually stand out from surrounding information by minimizing visual interference. We employ an image-based analysis of visual saliency to determine the important regions in the original representation. A consequence of the image-based approach is that our technique is application-independent and can be employed in a large number of visual data analysis scenarios in which the underlying content cannot or should not be altered. We conducted a controlled experiment that indicates that users can find linked elements in complex visualizations more quickly and with greater subjective satisfaction than in complex visualizations in which plain highlighting is used. Context-preserving visual links were perceived as visually more attractive than traditional visual links that do not account for the context information.
    InfoVis '11 Best Paper Award
    IEEE Transactions on Visualization and Computer Graphics (InfoVis '11), 2011

  • C02
    Stylization-based ray prioritization for guaranteed frame rates
    Bernhard Kainz, Markus Steinberger, Stefan Hauswiesner, Rostislav Khlebnikov, Dieter Schmalstieg:
    Stylization-based ray prioritization for guaranteed frame rates
    Abstract: This paper presents a new method to control graceful scene degradation in complex ray-based rendering environments. It proposes to constrain the image sampling density with object features, which are known to support the comprehension of the three-dimensional shape. The presented method uses Non-Photorealistic Rendering (NPR) techniques to extract features such as silhouettes, suggestive contours, suggestive highlights, ridges and valleys. To map different feature types to sampling densities, we also present an evaluation of the features' impact on the resulting image quality. To reconstruct the image from sparse sampling data, we use linear interpolation on an adaptively aligned fractal pattern. With this technique, we are able to present an algorithm that guarantees a desired minimal frame rate without much loss of image quality. Our scheduling algorithm maximizes the use of each given time slice by rendering features in order of their corresponding importance values until a time constraint is reached. We demonstrate how our method can be used to boost and guarantee the rendering time in complex ray-based environments consisting of geometric as well as volumetric data.
    NPAR '11 Best Paper Award in Rendering
    Non-photorealistic Animation and Rendering (NPAR '11), 2011

  • J01
    Importance-Driven Compositing Window Management
    Manuela Waldner, Markus Steinberger, Raphael Grasset, Dieter Schmalstieg:
    Importance-Driven Compositing Window Management
    Abstract: In this paper we present importance-driven compositing window management, which considers windows not only as basic rectangular shapes but also integrates the importance of the windows' content using a bottom-up visual attention model. Based on this information, importance-driven compositing optimizes the spatial window layout for maximum visibility and interactivity of occluded content in combination with see-through windows. We employ this technique for emerging window manager functions to minimize information overlap caused by popping up windows or floating toolbars and to improve the access to occluded window content. An initial user study indicates that our technique provides a more effective and satisfactory access to occluded information than the well-adopted Alt+Tab window switching technique and see-through windows without optimized spatial layout.
    CHI '11 Honorable Mention Award
    Human Factors in Computing Systems (CHI '11), 2011

  • P01
    Using Perceptual Features to Prioritize Ray-based Image Generation
    Bernhard Kainz, Markus Steinberger, Stefan Hauswiesner, Rostislav Khlebnikov, Denis Kalkofen, Dieter Schmalstieg:
    Using Perceptual Features to Prioritize Ray-based Image Generation
    Abstract: A common challenge in interactive image generation is maintaining high interactivity of the applications that use computationally demanding rendering algorithms. This is usually achieved by sacrificing some of the image quality in order to decrease the rendering time. Most such algorithms achieve interactive frame rates while trying to preserve as much image quality as possible by applying the reduction steps non-uniformly. However, high-end rendering systems, such as those presented by Parker et al. in 2010, aim to generate highly realistic images of very complex scenes. In such systems, ordinary sampling approaches often give visually unacceptable results. To optimize the ratio between the sampling rate of the scene and its resulting perceptual quality, we present a new sampling strategy which uses information about object features that are known to support the comprehension of 3D shape. We control the sampling density by exploiting line extraction techniques commonly used in Non-Photorealistic Rendering.
    Symposium on Interactive 3D Graphics and Games 2011 (I3D), 2011
  • C01
    Wavelet-based Multiresolution Isosurface Rendering
    Markus Steinberger, Markus Grabner:
    Wavelet-based Multiresolution Isosurface Rendering
    Abstract: We present an interactive rendering method for isosurfaces in a voxel grid. The underlying trivariate function is represented as a spline wavelet hierarchy, which allows for adaptive (view-dependent) selection of the desired level-of-detail by superimposing appropriately weighted basis functions. Different root finding techniques are compared with respect to their precision and efficiency. Both wavelet reconstruction and root finding are implemented in CUDA to utilize the high computational performance of Nvidia's hardware and to obtain high quality results. We tested our methods with datasets of up to 512³ voxels and demonstrate interactive frame rates for a viewport size of up to 1024x768 pixels.
    Eurographics/IEEE VGTC Symposium on Volume Graphics, 2010
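    One of the root-finding schemes compared in such raycasters is plain bisection on the signed difference to the isovalue along the ray. The sketch below is a hypothetical illustration: a smooth analytic density stands in for the spline-wavelet reconstruction that the paper evaluates on the GPU.

    ```python
    import math

    def density(t):
        return math.cos(t)          # stand-in for the reconstructed function

    def bisect_isosurface(f, iso, t0, t1, iters=48):
        """Find t in [t0, t1] with f(t) == iso, assuming one sign change."""
        g0 = f(t0) - iso
        for _ in range(iters):
            tm = 0.5 * (t0 + t1)
            gm = f(tm) - iso
            if g0 * gm <= 0.0:      # root lies in the left half
                t1 = tm
            else:                   # root lies in the right half
                t0, g0 = tm, gm
        return 0.5 * (t0 + t1)

    t_hit = bisect_isosurface(density, 0.5, 0.0, math.pi)
    print(abs(density(t_hit) - 0.5) < 1e-9)  # converged to the isovalue
    ```

    Bisection is robust but needs many iterations; faster alternatives such as regula falsi trade robustness for convergence speed, which is the kind of precision/efficiency trade-off the abstract refers to.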
  • O01
    Multiresolution Isosurface Rendering
    Markus Steinberger:
    Multiresolution Isosurface Rendering
    Abstract: In this paper we propose a new technique for isosurface rendering of volume data. Medical data visualization, for example, relies on exact and interactive isosurface renderings. We show how to construct a multiresolution view of the data using bi-orthogonal spline wavelets and how to perform fast rendering using raycasting implemented on the GPU. This approach benefits from the properties of both the wavelet transform and the reconstruction using three-dimensional splines. The smoothness of surfaces is zoom level independent and data can be compressed to speed up rendering while still being able to show full detail quickly. Ray evaluation is implemented in model space to enable perspective rendering. Due to the fact that the isosurface is not extracted from the data beforehand, the isosurface level as well as the current resolution level can be changed without any further computation.
    CESCG '09 3rd Best Paper Award
    Central European Seminar on Computer Graphics (CESCG '09), 2009