Publications

I started publishing during my master's studies in 2009. Most of my publications are on GPU algorithms, GPU work scheduling, and related topics with a focus on rendering. However, I also publish in the fields of visualization, information visualization, human-computer interaction, and augmented reality. For an up-to-date list of citations, see my Google Scholar page. Below you will find my self-maintained list of publications with some additional information and material about them. If you have any questions about individual publications, do not hesitate to contact me.


  • J20
    ShapeGenetics: Using Genetic Algorithms for Procedural Modeling
    Karl Haubenwallner, Hans-Peter Seidel, Markus Steinberger:
    Abstract: In this paper, we show that genetic algorithms (GA) can be used to control the output of procedural modeling algorithms. We propose an efficient way to encode the choices that have to be made during a procedural generation as a hierarchical genome representation. In combination with mutation and reproduction operations specifically designed for controlled procedural modeling, our GA can evolve a population of individual models close to any high-level goal. Possible scenarios include a volume that should be filled by a procedurally grown tree or a painted silhouette that should be followed by the skyline of a procedurally generated city. These goals are easy to set up for an artist compared to the tens of thousands of variables that describe the generated model and are chosen by the GA. Previous approaches for controlled procedural modeling either use Reversible Jump Markov Chain Monte Carlo (RJMCMC) or Stochastically-Ordered Sequential Monte Carlo (SOSMC) as workhorse for the optimization. While RJMCMC converges slowly, requiring multiple hours for the optimization of larger models, it produces high quality models. SOSMC shows faster convergence under tight time constraints for many models, but can get stuck due to choices made in the early stages of optimization. Our GA shows faster convergence than SOSMC and generates better models than RJMCMC in the long run.
    Computer Graphics Forum / Eurographics (EG'17), 2017
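The GA loop described in the abstract can be illustrated with a deliberately tiny sketch. This is not the paper's implementation: the "genome" here is a flat list of integer choices, the "procedural model" is simply their sum, and the high-level goal is a target value, standing in for the hierarchical genome and geometric goals of ShapeGenetics.

```python
import random

random.seed(0)
TARGET = 42  # stand-in for a high-level goal such as a silhouette or volume

def decode(genome):
    # Stand-in for procedural generation: sum the encoded choices.
    return sum(genome)

def fitness(genome):
    return -abs(decode(genome) - TARGET)

def mutate(genome, rate=0.2):
    return [random.randint(0, 9) if random.random() < rate else g for g in genome]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

population = [[random.randint(0, 9) for _ in range(10)] for _ in range(30)]
for _ in range(200):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]  # elitist selection keeps the best models
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(len(population) - len(parents))]
    population = parents + children

best = max(population, key=fitness)
print(decode(best))  # converges toward TARGET
```

Because the parents survive unchanged, the best individual never regresses, mirroring the steady convergence the paper reports against RJMCMC and SOSMC.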
  • J19
    A GPU-adapted Structure for Unstructured Grids
    Rhaleb Zayer, Markus Steinberger, Hans-Peter Seidel:
    Abstract: A key advantage of working with structured grids (e.g., images) is the ability to directly tap into the powerful machinery of linear algebra. This is not much so for unstructured grids where intermediate bookkeeping data structures stand in the way. On modern high performance computing hardware, the conventional wisdom behind these intermediate structures is further challenged by costly memory access, and more importantly by prohibitive memory resources on environments such as graphics hardware. In this paper, we bypass this problem by introducing a sparse matrix representation for unstructured grids which not only reduces the memory storage requirements but also cuts down on the bulk of data movement from global storage to the compute units. In order to take full advantage of the proposed representation, we augment ordinary matrix multiplication by means of action maps, local maps which encode the desired interaction between grid vertices. In this way, geometric computations and topological modifications translate into concise linear algebra operations. In our algorithmic formulation, we capitalize on the nature of sparse matrix-vector multiplication which allows avoiding explicit transpose computation and storage. Furthermore, we develop an efficient vectorization to the demanding assembly process of standard graph and finite element matrices.
    Computer Graphics Forum / Eurographics (EG'17), 2017
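As a loose, CPU-side illustration of the core idea (the paper's representation and its action maps are considerably more involved), an unstructured grid can be encoded as a face-vertex sparse matrix so that geometric computation becomes a sparse product, and the transpose acts by scattering without ever being stored; scipy stands in for the GPU kernels here:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Two triangles sharing an edge; rows = faces, cols = vertices.
vertices = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
faces = [(0, 1, 2), (0, 2, 3)]

rows = np.repeat(np.arange(len(faces)), 3)
cols = np.array(faces).ravel()
vals = np.full(rows.shape, 1.0 / 3.0)            # uniform averaging weights
M = csr_matrix((vals, (rows, cols)), shape=(len(faces), len(vertices)))

centroids = M @ vertices                          # face centroids via SpMV
print(centroids)

# The transpose acts without explicit storage: scatter face data to vertices.
vertex_valence = M.T @ np.ones(len(faces)) * 3.0  # faces incident per vertex
print(vertex_valence)
```

Topological queries (here, how many faces touch each vertex) and geometric ones (centroids) both reduce to matrix-vector products over the same structure.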
  • J18
    Hierarchical Bucket Queuing for Fine-Grained Priority Scheduling on the GPU
    Bernhard Kerbl, Michael Kenzel, Dieter Schmalstieg, Hans-Peter Seidel, Markus Steinberger:
    Abstract: While the modern graphics processing unit (GPU) offers massive parallel compute power, the ability to influence the scheduling of these immense resources is severely limited. Therefore, the GPU is widely considered to be only suitable as an externally controlled co-processor for homogeneous workloads which greatly restricts the potential applications of GPU computing. To address this issue, we present a new method to achieve fine-grained priority scheduling on the GPU: hierarchical bucket queuing. By carefully distributing the workload among multiple queues and efficiently deciding which queue to draw work from next, we enable a variety of scheduling strategies. These strategies include fair-scheduling, earliest-deadline-first scheduling and user-defined dynamic priority scheduling. In a comparison with a sorting-based approach, we reveal the advantages of hierarchical bucket queuing over previous work. Finally, we demonstrate the benefits of using priority scheduling in real-world applications by example of path tracing and foveated micropolygon rendering.
    Computer Graphics Forum, 2016
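A heavily simplified, sequential sketch of the bucket-queuing principle (the paper's structure is hierarchical and lives on the GPU): work items are binned into a fixed set of priority buckets, and workers always draw from the highest-priority non-empty bucket, which enables strategies such as earliest-deadline-first by mapping deadlines to bucket indices.

```python
from collections import deque

class BucketQueue:
    """Fixed number of FIFO buckets; lower index = higher priority."""

    def __init__(self, num_buckets):
        self.buckets = [deque() for _ in range(num_buckets)]

    def push(self, priority, item):
        self.buckets[priority].append(item)

    def pop(self):
        for bucket in self.buckets:               # scan high to low priority
            if bucket:
                return bucket.popleft()
        raise IndexError("queue is empty")

q = BucketQueue(4)
q.push(2, "render tile")
q.push(0, "foveal region")                        # urgent work jumps ahead
q.push(3, "background path trace")
order = [q.pop() for _ in range(3)]
print(order)
```

The item names are illustrative only; they echo the paper's path-tracing and foveated-rendering use cases.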
  • J17
    Representing and Scheduling Procedural Generation using Operator Graphs
    Pedro Boechat, Mark Dokter, Michael Kenzel, Hans-Peter Seidel, Dieter Schmalstieg, Markus Steinberger:
    Abstract: In this paper, we present the concept of operator graph scheduling for high performance procedural generation on the graphics processing unit (GPU). The operator graph forms an intermediate representation that describes all possible operations and objects that can arise during a specific procedural generation. While previous methods have focused on parallelizing a specific procedural approach, the operator graph is applicable to all procedural generation methods that can be described by a graph, such as L-systems, shape grammars, or stack-based generation methods. Using the operator graph, we show that all partitions of the graph correspond to possible ways of scheduling a procedural generation on the GPU, including the scheduling strategies of previous work. As the space of possible partitions is very large, we describe three search heuristics, aiding an optimizer in finding the fastest valid schedule for any given operator graph. The best partitions found by our optimizer increase performance by 8 to 30x over the previous state of the art in GPU shape grammar and L-system generation.
    ACM Transactions on Graphics (SIGGRAPH Asia'16), 2016
  • C13
    How naive is naive SpMV on the GPU?
    Markus Steinberger, Andreas Derler, Rhaleb Zayer, Hans-Peter Seidel:
    Abstract: Sparse matrix vector multiplication (SpMV) is the workhorse for a wide range of linear algebra computations. In a serial setting, naive implementations for direct multiplication and transposed multiplication achieve very competitive performance. In parallel settings, especially on graphics hardware, it is widely believed that naive implementations cannot reach the performance of highly tuned parallel implementations and complex data formats. Most often, the cost for data conversion to these specialized formats as well as the cost for transpose operations are neglected, as they do not arise in all algorithms. In this paper, we revisit the naive implementation of SpMV for the GPU. Relying on recent advances in GPU hardware, such as fast hardware supported atomic operations and better cache performance, we show that a naive implementation can reach the performance of state-of-the-art SpMV implementations. In case the cost of format conversion and transposition cannot be amortized over many SpMV operations a naive implementation can even outperform state-of-the-art implementations significantly. Experimental results over a variety of data sets suggest that the adoption of the naive serial implementation to the GPU is not as inferior as it used to be on previous hardware generations. The integration of some naive strategies can potentially speed up state-of-the-art GPU SpMV implementations, especially in the transpose case.
    HPEC '16 Best Paper Nominee
    High Performance Extreme Computing, 2016
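The "naive" gather and scatter products the paper revisits can be sketched serially over CSR arrays; the scatter in the transposed product plays the role the paper assigns to fast hardware atomic adds, and lets A^T be applied without ever being built or stored.

```python
import numpy as np

def spmv(row_ptr, col_idx, vals, x):
    """Direct y = A x: gather along each CSR row."""
    y = np.zeros(len(row_ptr) - 1)
    for r in range(len(y)):
        for j in range(row_ptr[r], row_ptr[r + 1]):
            y[r] += vals[j] * x[col_idx[j]]
    return y

def spmv_transposed(row_ptr, col_idx, vals, x, n_cols):
    """y = A^T x without forming A^T: scatter (atomicAdd on the GPU)."""
    y = np.zeros(n_cols)
    for r in range(len(row_ptr) - 1):
        for j in range(row_ptr[r], row_ptr[r + 1]):
            y[col_idx[j]] += vals[j] * x[r]
    return y

# A = [[1, 0, 2],
#      [0, 3, 0]] in CSR form
row_ptr = np.array([0, 2, 3])
col_idx = np.array([0, 2, 1])
vals = np.array([1.0, 2.0, 3.0])
print(spmv(row_ptr, col_idx, vals, np.array([1.0, 1.0, 1.0])))        # A x
print(spmv_transposed(row_ptr, col_idx, vals, np.array([1.0, 1.0]), 3))  # A^T x
```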

  • C12
    Visualization-Guided Evaluation of Simulated Minimally Invasive Cancer Treatment
    Philip Voglreiter, Michael Hofmann, Christoph Ebner, Roberto Blanco Sequeiros, Horst Rupert Portugaller, Jurgen Fütterer, Michael Moche, Markus Steinberger, Dieter Schmalstieg:
    Abstract: We present a visualization application supporting interventional radiologists during analysis of simulated minimally invasive cancer treatment. The current clinical practice employs only rudimentary, manual measurement tools. Our system provides visual support throughout three evaluation stages, starting with determining prospective treatment success of the simulation parameterization. In case of insufficiencies, Stage 2 includes a simulation scalar field for determining a new configuration of the simulation. For complex cases, where Stage 2 does not lead to a decisive strategy, Stage 3 reinforces analysis of interdependencies of scalar fields via bivariate visualization. Our system is designed to be immediately applicable in medical practice. We analyze the design space of potentially useful visualization techniques and appraise their effectiveness in the context of our design goals. Furthermore, we present a user study, which reveals the disadvantages of manual analysis in the measurement stage of evaluation and highlights the demand for computer support through our system.
    Visual Computing for Biology and Medicine, 2016
  • J16
    Fast ANN for High-Quality Collaborative Filtering
    Yun-Ta Tsai, Markus Steinberger, Dawid Pajak, Kari Pulli:
    Abstract: Collaborative filtering collects similar patches, jointly filters them, and scatters the output back to input patches; each pixel gets a contribution from each patch that overlaps with it, allowing signal reconstruction from highly corrupted data. Exploiting self-similarity, however, requires finding matching image patches, which is an expensive operation. We propose a GPU-friendly approximate-nearest-neighbor (ANN) algorithm that produces high-quality results for any type of collaborative filter. We evaluate our ANN search against state-of-the-art ANN algorithms in several application domains. Our method is orders of magnitude faster, yet provides similar or higher quality results than the previous work.
    Computer Graphics Forum, 2016
  • J15
    Fast volume reconstruction from motion corrupted stacks of 2D slices
    Bernhard Kainz, Markus Steinberger, Wolfgang Wein, Maria Kuklisova-Murgasova, Christina Malamateniou, Kevin Keraudren, Thomas Torsney-Weir, Mary Rutherford, Paul Aljabar, Joseph V Hajnal, Daniel Rueckert:
    Abstract: Capturing an enclosing volume of moving subjects and organs using fast individual image slice acquisition has shown promise in dealing with motion artefacts. Motion between slice acquisitions results in spatial inconsistencies that can be resolved by slice-to-volume reconstruction (SVR) methods to provide high quality 3D image data. Existing algorithms are, however, typically very slow, specialised to specific applications and rely on approximations, which impedes their potential clinical use. In this paper, we present a fast multi-GPU accelerated framework for slice-to-volume reconstruction. It is based on optimised 2D/3D registration, super-resolution with automatic outlier rejection and an additional (optional) intensity bias correction. We introduce a novel and fully automatic procedure for selecting the image stack with least motion to serve as an initial registration target. We evaluate the proposed method using artificial motion corrupted phantom data as well as clinical data, including tracked freehand ultrasound of the liver and fetal Magnetic Resonance Imaging. We achieve speed-up factors greater than 30 compared to a single CPU system and greater than 10 compared to currently available state-of-the-art multi-core CPU methods. We ensure high reconstruction accuracy by exact computation of the point-spread function for every input data point, which has not previously been possible due to computational limitations. Our framework and its implementation is scalable for available computational infrastructures and tests show a speed-up factor of 1.70 for each additional GPU. This paves the way for the online application of image based reconstruction methods during clinical examinations. The source code for the proposed approach is publicly available.
    IEEE Transactions on Medical Imaging, 2015
  • J14
    Interactive Disassembly Planning for Complex Objects
    Bernhard Kerbl, Denis Kalkofen, Markus Steinberger, Dieter Schmalstieg:
    Abstract: We present an approach for the automatic generation, interactive exploration and real-time modification of disassembly procedures for complex, multipartite CAD data sets. In order to lift the performance barriers prohibiting interactive disassembly planning, we run a detailed analysis on the input model to identify recurring part constellations and efficiently determine blocked part motions in parallel on the GPU. Building on the extracted information, we present an interface for computing and editing extensive disassembly sequences in real-time while considering user-defined constraints and avoiding unstable configurations. To evaluate the performance of our C++/CUDA implementation, we use a variety of openly available CAD data sets, ranging from simple to highly complex. In contrast to previous approaches, our work enables interactive disassembly planning for objects which consist of several thousand parts and require cascaded translations during part removal.
    Computer Graphics Forum / Eurographics (EG'15), 2015
  • J13
    An overview of dynamic resource scheduling on graphics processors
    Markus Steinberger:
    Abstract: In this paper, we present a series of scheduling approaches targeted for massively parallel architectures, which in combination allow a wider range of algorithms to be executed on modern graphics processors. At first, we describe a new processing model which enables the efficient execution of dynamic, irregular workloads. Then, we present the currently fastest queuing algorithm for graphics processors, the most efficient dynamic memory allocator for massively parallel architectures, and the only autonomous scheduler for graphics processing units that can dynamically support different granularities of parallelism. Finally, we show how these scheduling approaches help to advance the state of the art in rendering, visualization, and procedural modeling.
    it - Information Technology, 2015
  • C11
    Reyes Rendering on the GPU
    Martin Sattlecker, Markus Steinberger:
    Abstract: In this paper we investigate the possibility of real-time Reyes rendering with advanced effects such as displacement mapping and multidimensional rasterization on current graphics hardware. We describe a first GPU Reyes implementation executing within an autonomously executing persistent Megakernel. To support high quality rendering, we integrate displacement mapping into the renderer, which has only marginal impact on performance. To investigate rasterization for Reyes, we start with an approach similar to nearest neighbor filling, before presenting a precise sampling algorithm. This algorithm can be enhanced to support motion blur and depth of field using three dimensional sampling. To evaluate the performance quality trade-off of these effects, we compare three approaches: coherent sampling across time and on the lens, essentially overlaying images; randomized sampling along all dimensions; and repetitive randomization, in which the randomization pattern is repeated among subgroups of pixels. We evaluate all approaches, showing that high quality images can be generated with interactive to real-time refresh rates for advanced Reyes features.
    Spring Conference on Computer Graphics, 2015
  • PA01
    Efficient Approximate-Nearest-Neighbor (ANN) Search for High-Quality Collaborative Filtering
    Dawid Pajak, Yun-Ta Tsai, Markus Steinberger:
    Abstract: A computer implemented method of performing an approximate-nearest-neighbor search is disclosed. The method comprises dividing an image into a plurality of tiles. Further, for each of the plurality of tiles, perform the following in parallel on a processor: (a) dividing image patches into a plurality of clusters, wherein each cluster comprises similar image patches, and wherein the dividing continues recursively until a size of a cluster is below a threshold value; (b) performing a nearest-neighbor query within each of the plurality of clusters; and (c) performing collaborative filtering in parallel for each image patch, wherein the collaborative filtering aggregates and processes nearest neighbor image patches from a same cluster containing a respective image patch to form an output image.
    US Patent, 2015
  • J12
    Whippletree: Task-based Scheduling of Dynamic Workloads on the GPU
    Markus Steinberger, Michael Kenzel, Pedro Boechat, Bernhard Kerbl, Mark Dokter, Dieter Schmalstieg:
    ACM Transactions on Graphics (SIGGRAPH Asia'14), 2014
  • J11
    FlexISP: A flexible camera image processing framework
    Felix Heide, Markus Steinberger, Yun-Ta Tsai, Nasa Rouf, Dawid Pajak, Dikpal Reddy, Orazio Gallo, Jing Liu, Wolfgang Heidrich, Karen Egiazarian, Jan Kautz, Kari Pulli:
    Abstract: Conventional pipelines for capturing, displaying, and storing images are usually defined as a series of cascaded modules, each responsible for addressing a particular problem. While this divide-and-conquer approach offers many benefits, it also introduces a cumulative error, as each step in the pipeline only considers the output of the previous step, not the original sensor data. We propose an end-to-end system that is aware of the camera and image model, enforces natural-image priors, while jointly accounting for common image processing steps like demosaicking, denoising, deconvolution, and so forth, all directly in a given output representation (e.g., YUV, DCT). Our system is flexible and we demonstrate it on regular Bayer images as well as images from custom sensors. In all cases, we achieve large improvements in image quality and signal reconstruction compared to state-of-the-art techniques. Finally, we show that our approach is capable of very efficiently handling high-resolution images, making even mobile implementations feasible.
    ACM Transactions on Graphics (SIGGRAPH Asia'14), 2014
  • C10
    Fast ANN for High-Quality Collaborative Filtering
    Yun-Ta Tsai, Markus Steinberger, Dawid Pajak, Kari Pulli:
    Abstract: Collaborative filtering collects similar patches, jointly filters them, and scatters the output back to input patches; each pixel gets a contribution from each patch that overlaps with it, allowing signal reconstruction from highly corrupted data. Exploiting self-similarity, however, requires finding matching image patches, which is an expensive operation. We propose a GPU-friendly approximated-nearest-neighbor algorithm that produces high-quality results for any type of collaborative filter. We evaluate our ANN search against state-of-the-art ANN algorithms in several application domains. Our method is orders of magnitude faster, yet provides similar or higher-quality results than the previous work.
    HPG'14 Best Paper Award
    High Performance Graphics (HPG '14), 2014
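The recursive clustering at the heart of the method (also claimed in patent PA01 below) can be sketched on the CPU; the names, pivot rule, and threshold here are illustrative, not taken from the paper, which runs this per tile and in parallel on the GPU.

```python
import numpy as np

rng = np.random.default_rng(1)

def split(indices, patches, threshold):
    """Recursively partition patch indices into clusters of similar patches."""
    if len(indices) <= threshold:
        return [indices]
    a, b = patches[rng.choice(indices, 2, replace=False)]   # two random pivots
    near_a = np.linalg.norm(patches[indices] - a, axis=1) <= \
             np.linalg.norm(patches[indices] - b, axis=1)
    left, right = indices[near_a], indices[~near_a]
    if len(left) == 0 or len(right) == 0:
        return [indices]                                    # degenerate split
    return split(left, patches, threshold) + split(right, patches, threshold)

def ann(query_idx, patches, clusters):
    """Answer the NN query only within the query's own cluster."""
    for c in clusters:
        if query_idx in c:
            others = c[c != query_idx]
            if len(others) == 0:
                return None
            d = np.linalg.norm(patches[others] - patches[query_idx], axis=1)
            return others[np.argmin(d)]

patches = rng.normal(size=(64, 16))               # 64 flattened 4x4 patches
clusters = split(np.arange(64), patches, threshold=8)
neighbor = ann(0, patches, clusters)
print(neighbor)
```

Restricting the search to one small cluster is what makes the query cheap; the quality trade-off is that the true nearest neighbor may sit just across a cluster boundary.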

  • O02
    Dynamisches Ressourcen Scheduling auf Grafik Prozessoren (Dynamic Resource Scheduling on Graphics Processors)
    Markus Steinberger:
    Ausgezeichnete Informatikdissertationen 2013 (German), 2014
  • J10
    Parallel Irradiance Caching for Interactive Monte-Carlo Direct Volume Rendering
    Rostislav Khlebnikov, Philip Voglreiter, Markus Steinberger, Bernhard Kainz, Dieter Schmalstieg:
    Abstract: We propose a technique to build the irradiance cache for isotropic scattering simultaneously with Monte Carlo progressive direct volume rendering on a single GPU, which allows us to achieve up to four times increased convergence rate for complex scenes with arbitrary sources of light. We use three procedures that run concurrently on a single GPU. The first is the main rendering procedure. The second procedure computes new cache entries, and the third one corrects the errors that may arise after creation of new cache entries. We propose two distinct approaches to allow massive parallelism of cache entry creation. In addition, we show a novel extrapolation approach which outputs high quality irradiance approximations and a suitable prioritization scheme to increase the convergence rate by dedicating more computational power to more complex rendering areas.
    Computer Graphics Forum / EuroVis, 2014
  • C09
    Show Me the Invisible: Visualizing Hidden Content
    Thomas Geymayer, Markus Steinberger, Alexander Lex, Marc Streit, Dieter Schmalstieg:
    Abstract: Content on computer screens is often inaccessible to users because it is hidden, e.g., occluded by other windows, outside the viewport, or overlooked. In search tasks, the efficient retrieval of sought content is important. Current software, however, only provides limited support to visualize hidden occurrences and rarely supports search synchronization crossing application boundaries. To remedy this situation, we introduce two novel visualization methods to guide users to hidden content. Our first method generates awareness for occluded or out-of-viewport content using see-through visualization. For content that is either outside the screen's viewport or for data sources not opened at all, our second method shows off-screen indicators and an on-demand smart preview. To reduce the chances of overlooking content, we use visual links, i.e., visible edges, to connect the visible content or the visible representations of the hidden content. We show the validity of our methods in a user study, which demonstrates that our technique enables a faster localization of hidden content compared to traditional search functionality and thereby assists users in information retrieval tasks.
    CHI '14 Honorable Mention Award
    SIGCHI Conference on Human Factors in Computing Systems (CHI'14), 2014

  • J09
    On-the-fly Generation and Rendering of Infinite Cities on the GPU
    Markus Steinberger, Michael Kenzel, Bernhard Kainz, Peter Wonka, Dieter Schmalstieg:
    Abstract: In this paper, we present a new approach for shape-grammar-based generation and rendering of huge cities in real-time on the graphics processing unit (GPU). Traditional approaches rely on evaluating a shape grammar and storing the geometry produced as a preprocessing step. During rendering, the pregenerated data is then streamed to the GPU. By interweaving generation and rendering, we overcome the problems and limitations of streaming pregenerated data. Using our methods of visibility pruning and adaptive level of detail, we are able to dynamically generate only the geometry needed to render the current view in real-time directly on the GPU. We also present a robust and efficient way to dynamically update a scene's derivation tree and geometry, enabling us to exploit frame-to-frame coherence. Our combined generation and rendering is significantly faster than all previous work. For detailed scenes, we are capable of generating geometry more rapidly than even just copying pregenerated data from main memory, enabling us to render cities with thousands of buildings at up to 100 frames per second, even with the camera moving at supersonic speed.
    Computer Graphics Forum / Eurographics (EG'14), 2014
  • J08
    Parallel Generation of Architecture on the GPU
    Markus Steinberger, Michael Kenzel, Bernhard Kainz, Jörg Müller, Peter Wonka, Dieter Schmalstieg:
    Abstract: In this paper, we present a novel approach for the parallel evaluation of procedural shape grammars on the graphics processing unit (GPU). Unlike previous approaches that are either limited in the kind of shapes they allow, the amount of parallelism they can take advantage of, or both, our method supports state-of-the-art procedural modeling including stochasticity and context-sensitivity. To increase parallelism, we explicitly express independence in the grammar, reduce inter-rule dependencies required for context-sensitive evaluation, and introduce intra-rule parallelism. Our rule scheduling scheme avoids unnecessary back and forth between CPU and GPU and reduces round trips to slow global memory by dynamically grouping rules in on-chip shared memory. Our GPU shape grammar implementation is multiple orders of magnitude faster than the standard in CPU-based rule evaluation, while offering equal expressive power. In comparison to the state of the art in GPU shape grammar derivation, our approach is nearly 50 times faster, while adding support for geometric context-sensitivity.
    Honorable Mention Award (among top 3 papers)
    Computer Graphics Forum / Eurographics (EG'14), 2014
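As a minimal illustration of the kind of rule evaluation being parallelized in the grammar papers above (a classic deterministic L-system, far simpler than the stochastic, context-sensitive grammars the paper supports): rules rewrite symbols, and the derivation grows from an axiom.

```python
# Lindenmayer's algae system: A -> AB, B -> A.
rules = {"A": "AB", "B": "A"}

def derive(axiom, steps):
    s = axiom
    for _ in range(steps):
        # Every symbol is rewritten independently, which is exactly the
        # independence a parallel evaluator can exploit.
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

print(derive("A", 5))  # ABAABABAABAAB
```

Each rewriting step touches every symbol independently, so a GPU evaluator can assign symbols (or groups of rules) to threads; the hard part, which the paper addresses, is scheduling that work when rule outputs differ wildly in size.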

  • J07
    Noise-based volume rendering for the visualization of multivariate volumetric data
    Rostislav Khlebnikov, Bernhard Kainz, Markus Steinberger, Dieter Schmalstieg:
    Abstract: Analysis of multivariate data is of great importance in many scientific disciplines. However, visualization of 3D spatially-fixed multivariate volumetric data is a very challenging task. In this paper we present a method that allows simultaneous real-time visualization of multivariate data. We redistribute the opacity within a voxel to improve the readability of the color defined by a regular transfer function, and to maintain the see-through capabilities of volume rendering. We use predictable procedural noise--random-phase Gabor noise--to generate a high-frequency redistribution pattern and construct an opacity mapping function, which allows to partition the available space among the displayed data attributes. This mapping function is appropriately filtered to avoid aliasing, while maintaining transparent regions. We show the usefulness of our approach on various data sets and with different example applications. Furthermore, we evaluate our method by comparing it to other visualization techniques in a controlled user study. Overall, the results of our study indicate that users are much more accurate in determining exact data values with our novel 3D volume visualization method. Significantly lower error rates for reading data values and high subjective ranking of our method imply that it has a high chance of being adopted for the purpose of visualization of multivariate 3D data.
    Transactions on Visualization and Computer Graphics (Vis'13), 2013
  • P02
    Volume Rendering with Advanced GPU Scheduling Strategies
    Philip Voglreiter, Markus Steinberger, Rostislav Khlebnikov, Bernhard Kainz, Dieter Schmalstieg:
    Abstract: Modern GPUs are powerful enough to enable interactive display of high-quality volume data even despite the fact that many volume rendering methods do not present a natural fit for current GPU hardware. However, there still is a vast amount of computational power that remains unused due to the inefficient use of the available hardware. In this work, we demonstrate how advanced scheduling methods can be employed to implement volume rendering algorithms in a way that better utilizes the GPU by example of three different state-of-the-art volume rendering techniques.
    VIS'13 Honorable Mention Poster Award
    IEEE Scientific Visualization poster (Vis'13), 2013

  • C08
    Adaptive Ghosted Views for Augmented Reality
    Denis Kalkofen, Eduardo Veas, Stefanie Zollmann, Markus Steinberger, Dieter Schmalstieg:
    Abstract: In Augmented Reality (AR), ghosted views allow a viewer to explore hidden structure within the real-world environment. A body of previous work has explored which features are suitable to support the structural interplay between occluding and occluded elements. However, the dynamics of AR environments pose serious challenges to the presentation of ghosted views. While a model of the real world may help determine distinctive structural features, changes in appearance or illumination detriment the composition of occluding and occluded structure. In this paper, we present an approach that considers the information value of the scene before and after generating the ghosted view. Hereby, a contrast adjustment of preserved occluding features is calculated, which adaptively varies their visual saliency within the ghosted view visualization. This allows us to not only preserve important features, but to also support their prominence after revealing occluded structure, thus achieving a positive effect on the perception of ghosted views.
    International Symposium on Mixed and Augmented Reality (ISMAR'13), 2013
  • C07
    OmniKinect: Real-Time Dense Volumetric Data Acquisition and Applications
    Bernhard Kainz, Stefan Hauswiesner, Gerhard Reitmayr, Markus Steinberger, Raphael Grasset, Lukas Gruber, Eduardo Veas, Denis Kalkofen, Hartmut Seichter, Dieter Schmalstieg:
    Abstract: Real-time three-dimensional acquisition of real-world scenes has many important applications in computer graphics, computer vision and human-computer interaction. Inexpensive depth sensors such as the Microsoft Kinect allow to leverage the development of such applications. However, this technology is still relatively recent, and no detailed studies on its scalability to dense and view-independent acquisition have been reported. This paper addresses the question of what can be done with a larger number of Kinects used simultaneously. We describe an interference-reducing physical setup, a calibration procedure and an extension to the KinectFusion algorithm, which allows to produce high quality volumetric reconstructions from multiple Kinects whilst overcoming systematic errors in the depth measurements. We also report on enhancing image based visual hull rendering by depth measurements, and compare the results to KinectFusion. Our system provides practical insight into achievable spatial and radial range and into bandwidth requirements for depth data acquisition. Finally, we present a number of practical applications of our system.
    Symposium On Virtual Reality Software And Technology (VRST'12), 2012
  • J06
    Softshell: Dynamic Scheduling on GPUs
    Markus Steinberger, Bernhard Kainz, Bernhard Kerbl, Stefan Hauswiesner, Michael Kenzel, Dieter Schmalstieg:
    Softshell: Dynamic Scheduling on GPUs
    Abstract: In this paper we present Softshell, a novel execution model for devices composed of multiple processing cores operating in a single instruction, multiple data fashion, such as graphics processing units (GPUs). The Softshell model is intuitive and more flexible than the kernel-based adaptation of the stream processing model, which is currently the dominant model for general purpose GPU computation. Using the Softshell model, algorithms with a relatively low local degree of parallelism can execute efficiently on massively parallel architectures. Softshell has the following distinct advantages: (1) work can be dynamically issued directly on the device, eliminating the need for synchronization with an external source, i.e., the CPU; (2) its three-tier dynamic scheduler supports arbitrary scheduling strategies, including dynamic priorities and real-time scheduling; and (3) the user can influence, pause, and cancel work already submitted for parallel execution. The Softshell processing model thus brings capabilities to GPU architectures that were previously only known from operating-system designs and reserved for CPU programming. As a proof of our claims, we present a publicly available implementation of the Softshell processing model realized on top of CUDA. The benchmarks of this implementation demonstrate that our processing model is easy to use and also performs substantially better than the state-of-the-art kernel-based processing model for problems that have been difficult to parallelize in the past.
    ACM Transactions on Graphics (SIGGRAPH Asia'12), 2012
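    The core scheduling idea from the abstract, priority-ordered dispatch of dynamically submitted work that can still be cancelled, can be modelled with an ordinary priority queue. The sketch below is a hypothetical, sequential CPU illustration (all names are invented), not the actual CUDA implementation.

    ```python
    import heapq
    import itertools

    class PriorityScheduler:
        """Toy model of priority-ordered work dispatch.

        Work items can be enqueued at any time, and cancelled
        before they run, mirroring advantages (1) and (3) above.
        """

        def __init__(self):
            self._heap = []                     # (priority, seq, item_id)
            self._counter = itertools.count()   # tie-breaker keeps FIFO order
            self._cancelled = set()

        def submit(self, item_id, priority):
            # Lower number = higher priority, as in typical priority queues.
            heapq.heappush(self._heap, (priority, next(self._counter), item_id))

        def cancel(self, item_id):
            self._cancelled.add(item_id)        # lazy deletion

        def dispatch(self):
            # Pop the highest-priority item that was not cancelled.
            while self._heap:
                _, _, item_id = heapq.heappop(self._heap)
                if item_id not in self._cancelled:
                    return item_id
            return None

    sched = PriorityScheduler()
    sched.submit("render_tile", priority=1)
    sched.submit("physics_step", priority=0)
    sched.submit("audio_mix", priority=2)
    sched.cancel("audio_mix")
    order = [sched.dispatch(), sched.dispatch(), sched.dispatch()]
    print(order)  # physics_step first; the cancelled item never runs
    ```

    A real GPU scheduler must of course make these operations safe under massive concurrency; the point here is only the queue discipline.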
  • C06
    Volumetric Real-Time Particle-Based Representation of Large Unstructured Tetrahedral Polygon Meshes
    Philip Voglreiter, Markus Steinberger, Dieter Schmalstieg, Bernhard Kainz:
    Volumetric Real-Time Particle-Based Representation of Large Unstructured Tetrahedral Polygon Meshes
    Abstract: In this paper we propose a particle-based volume rendering approach for unstructured, three-dimensional, tetrahedral polygon meshes. We stochastically generate millions of particles per second and project them onto the screen in real-time. In contrast to previous rendering techniques of tetrahedral volume meshes, our method does not require prior depth sorting of the geometry. Instead, the rendered image is generated by choosing particles closest to the camera. Furthermore, we use spatial superimposition: each pixel is constructed from multiple subpixels. This approach not only increases projection accuracy, but also allows combining subpixels into one superpixel, which creates the well-known translucency effect of volume rendering. We show that our method is fast enough for the visualization of unstructured three-dimensional grids with hard real-time constraints and that it scales well for a high number of particles.
    Mesh Processing in Medical Image Analysis (MICCAI'12), 2012
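    The "closest particle wins" image formation mentioned above can be pictured as a per-pixel depth test instead of a global sort. The following is a hedged, CPU-only toy model with invented names; the paper's GPU pipeline works on far larger particle counts.

    ```python
    # Toy model of sort-free particle splatting: every stochastically
    # generated particle is projected, and each pixel keeps the particle
    # nearest to the camera -- effectively a z-test, no depth sorting.

    WIDTH, HEIGHT = 4, 4

    def render(particles):
        """particles: list of (x, y, depth, value) with integer pixel coords."""
        depth_buf = [[float("inf")] * WIDTH for _ in range(HEIGHT)]
        image = [[None] * WIDTH for _ in range(HEIGHT)]
        for x, y, depth, value in particles:
            if depth < depth_buf[y][x]:   # closer than what we kept so far
                depth_buf[y][x] = depth
                image[y][x] = value
        return image

    parts = [(1, 1, 5.0, "far"), (1, 1, 2.0, "near"), (0, 0, 3.0, "only")]
    img = render(parts)
    print(img[1][1], img[0][0])  # the nearer particle wins at (1, 1)
    ```

    The subpixel superimposition described in the abstract would run this selection per subpixel and then blend subpixels into one superpixel to obtain translucency.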
  • J05
    Procedural Texture Synthesis for Zoom-Independent Visualization of Multivariate Data
    Rostislav Khlebnikov, Bernhard Kainz, Markus Steinberger, Marc Streit, Dieter Schmalstieg:
    Procedural Texture Synthesis for Zoom-Independent Visualization of Multivariate Data
    Abstract: Simultaneous visualization of multiple continuous data attributes in a single visualization is a task that is important for many application areas. Unsurprisingly, many methods have been proposed to solve this task. However, the behavior of such methods during the exploration stage, when the user tries to understand the data with panning and zooming, has not been given much attention. In this paper, we propose a method that uses procedural texture synthesis to create zoom-independent visualizations of three scalar data attributes. The method is based on random-phase Gabor noise, whose frequency is adapted for the visualization of the first data attribute. We ensure that the resulting texture frequency lies in the range that is perceived well by the human visual system at any zoom level. To enhance the perception of this attribute, we also apply a specially constructed transfer function that is based on statistical properties of the noise. Additionally, the transfer function is constructed in a way that it does not introduce any aliasing to the texture. We map the second attribute to the texture orientation. The third attribute is color coded and combined with the texture by modifying the value component of the HSV color model. The necessary contrast needed for texture and color perception was determined in a user study. In addition, we conducted a second user study that shows significant advantages of our method over current methods with similar goals. We believe that our method is an important step towards creating methods that not only succeed in visualizing multiple data attributes, but also adapt to the behavior of the user during the data exploration stage.
    Computer Graphics Forum (EuroVis'12), 2012
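    The zoom-independence idea can be sketched in a few lines: Gabor noise has an explicit frequency parameter, so the data-space frequency can be rescaled with the zoom factor to hold the on-screen frequency in a perceptually comfortable range. The kernel and the numbers below are illustrative assumptions, not the paper's implementation.

    ```python
    import math

    def gabor_kernel(x, y, frequency, orientation, bandwidth=1.0):
        """2D Gabor kernel: Gaussian envelope times an oriented cosine wave."""
        envelope = math.exp(-math.pi * bandwidth**2 * (x * x + y * y))
        u = x * math.cos(orientation) + y * math.sin(orientation)
        return envelope * math.cos(2.0 * math.pi * frequency * u)

    def screen_frequency(data_frequency, zoom):
        # Zooming in by `zoom` stretches the texture on screen,
        # dividing the perceived frequency.
        return data_frequency / zoom

    def zoom_adapted_frequency(target_screen_freq, zoom):
        # Compensate so the on-screen frequency stays constant.
        return target_screen_freq * zoom

    for zoom in (0.5, 1.0, 4.0):
        f = zoom_adapted_frequency(0.1, zoom)  # cycles per data unit
        print(screen_frequency(f, zoom))       # constant on screen
    ```

    In the paper, the second attribute additionally drives the `orientation` parameter, and a transfer function built from the noise statistics maps the filtered values without aliasing.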
  • J04
    Interactive Self-Organizing Windows
    Markus Steinberger, Manuela Waldner, Dieter Schmalstieg:
    Interactive Self-Organizing Windows
    Abstract: In this paper, we present the design and implementation of a dynamic window management technique that changes the perception of windows as fixed-sized rectangles. The primary goal of self-organizing windows is to automatically display the most relevant information for a user's current activity, which removes the burden of organizing and arranging windows from the user. We analyze the image-based representation of each window and identify coherent pieces of information. The windows are then automatically moved, scaled and composed in a content-aware manner to fit the most relevant information into the limited area of the screen. During the design process, we consider findings from previous experiments and show how users can benefit from our system. We also describe how the immense processing power of current graphics processing units can be exploited to build an interactive system that finds an optimal solution within the complex design space of all possible window transformations in real time.
    Computer Graphics Forum / Eurographics (EG'12), 2012
  • C05
    ScatterAlloc: Massively Parallel Dynamic Memory Allocation for the GPU
    Markus Steinberger, Michael Kenzel, Bernhard Kainz, Dieter Schmalstieg:
    ScatterAlloc: Massively Parallel Dynamic Memory Allocation for the GPU
    Abstract: In this paper, we analyze the special requirements of a dynamic memory allocator that is designed for massively parallel architectures such as Graphics Processing Units (GPUs). We show that traditional strategies, which work well on CPUs, are not well suited for use on GPUs and present the thorough design of ScatterAlloc, which can efficiently deal with hundreds of requests in parallel. Our allocator greatly reduces collisions and congestion by scattering memory requests based on hashing. We analyze ScatterAlloc in terms of allocation speed, data access time and fragmentation, and compare it to current state-of-the-art allocators, including the one provided with the NVIDIA CUDA toolkit. Our results show that ScatterAlloc clearly outperforms these other approaches, yielding speed-ups between 10 and 100.
    Innovative Parallel Computing (InPar'12), 2012
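    The scattering idea from the abstract can be illustrated without any GPU: each request hashes to a start page, so concurrent threads probe different pages first and rarely contend for the same free list. This is a conceptual sketch with invented names and a simple multiplicative hash, not the real ScatterAlloc code.

    ```python
    NUM_PAGES = 16
    CHUNKS_PER_PAGE = 4

    def scatter_hash(thread_id):
        # Knuth-style multiplicative hash (illustrative, not the paper's hash).
        return (thread_id * 2654435761) % NUM_PAGES

    def scatter_alloc(thread_id, free_chunks):
        """Pick a start page by hashing, then linearly probe for a free chunk.

        `free_chunks` maps page index -> number of free chunks on that page.
        Returns the page the allocation was served from, or None if full.
        """
        start = scatter_hash(thread_id)              # scatter step
        for offset in range(NUM_PAGES):              # wrap-around probe
            page = (start + offset) % NUM_PAGES
            if free_chunks[page] > 0:
                free_chunks[page] -= 1
                return page
        return None

    free = {p: CHUNKS_PER_PAGE for p in range(NUM_PAGES)}
    pages = {scatter_alloc(t, free) for t in range(32)}
    # Distinct threads land on many distinct pages instead of all
    # competing for page 0, which is the point of scattering.
    print(len(pages) > 1)
    ```

    On the GPU the decrement of a page's free counter would be an atomic operation; the sequential loop above only models where requests land.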
  • C04
    Multi-GPU Image-based Visual Hull Rendering
    Stefan Hauswiesner, Rostislav Khlebnikov, Markus Steinberger, Mathias Straka, Gerhard Reitmayr:
    Multi-GPU Image-based Visual Hull Rendering
    Abstract: Many virtual mirror and telepresence applications require novel viewpoint synthesis with little latency to user motion. Image-based visual hull (IBVH) rendering is capable of rendering arbitrary views from segmented images without an explicit intermediate data representation, such as a mesh or a voxel grid. By computing depth images directly from the silhouette images, it usually outperforms indirect methods. GPU-hardware accelerated implementations exist, but due to the lack of an intermediate representation no multi-GPU parallel strategies and implementations are currently available. This paper suggests three ways to parallelize the IBVH-pipeline and maps them to the sorting classification that is often applied to conventional parallel rendering systems. In addition to sort-first parallelization, we suggest a novel sort-last formulation that regards cameras as scene objects. We enhance this method’s performance by a block-based encoding of the rendering results. For interactive systems with hard real-time constraints, we combine the algorithm with a multi-frame rate (MFR) system. We suggest a combination of forward and backward image warping to improve the visual quality of the MFR rendering. We observed the runtime behavior of the suggested methods and assessed how their performance scales with respect to input and output resolutions and the number of GPUs. By using additional GPUs, we reduced rendering times by up to 60%. Multi-frame rate viewing can even be ten times faster.
    Eurographics Symposium on Parallel Graphics and Visualization (EGPGV'12), 2012
  • J03
    Ray Prioritization Using Stylization and Visual Saliency
    Markus Steinberger, Bernhard Kainz, Stefan Hauswiesner, Rostislav Khlebnikov, Denis Kalkofen, Dieter Schmalstieg:
    Ray Prioritization Using Stylization and Visual Saliency
    Abstract: This paper presents a new method to control scene sampling in complex ray-based rendering environments. It proposes to constrain image sampling density with a combination of object features, which are known to be well perceived by the human visual system, and image space saliency, which captures effects that are not based on the object’s geometry. The presented method uses Non-Photorealistic Rendering techniques for the object space feature evaluation and combines the image space saliency calculations with image warping to infer quality hints from previously generated frames. In order to map different feature types to sampling densities, we also present an evaluation of the object space and image space features’ impact on the resulting image quality. In addition, we present an efficient, adaptively aligned fractal pattern that is used to reconstruct the image from sparse sampling data. Furthermore, this paper presents an algorithm which uses our method in order to guarantee a desired minimal frame rate. Our scheduling algorithm maximizes the utilization of each given time slice by rendering features in the order of visual importance values until a time constraint is reached. We demonstrate how our method can be used to boost or stabilize the rendering time in complex ray-based image generation consisting of geometric as well as volumetric data.
    Computers & Graphics, 2012
  • C03
    Display-Adaptive Window Management for Irregular Surfaces
    Manuela Waldner, Raphael Grasset, Markus Steinberger, Dieter Schmalstieg:
    Display-Adaptive Window Management for Irregular Surfaces
    Abstract: Current projectors can easily be combined to create an everywhere display, using all suitable surfaces in offices or meeting rooms for the presentation of information. However, the resulting irregular display is not well supported by traditional desktop window managers, which are optimized for rectangular screens. In this paper, we present novel display-adaptive window management techniques, which provide semi-automatic placement for desktop elements (such as windows or icons) for users of large, irregularly shaped displays. We report results from an exploratory study, which reveals interesting emerging strategies of users in the manipulation of windows on large irregular displays and shows that the new techniques increase subjective satisfaction with the window management interface.
    Interactive Tabletops and Surfaces (ITS'11), 2011
  • J02
    Context-Preserving Visual Links
    Markus Steinberger, Manuela Waldner, Marc Streit, Alexander Lex, Dieter Schmalstieg:
    Context-Preserving Visual Links
    Abstract: Evaluating, comparing, and interpreting related pieces of information are tasks that are commonly performed during visual data analysis and in many kinds of information-intensive work. Synchronized visual highlighting of related elements is a well-known technique used to assist this task. An alternative approach, which is more invasive but also more expressive, is visual linking, in which line connections are rendered between related elements. In this work, we present context-preserving visual links as a new method for generating visual links. The method specifically aims to fulfill the following two goals: first, visual links should minimize the occlusion of important information; second, links should visually stand out from surrounding information by minimizing visual interference. We employ an image-based analysis of visual saliency to determine the important regions in the original representation. A consequence of the image-based approach is that our technique is application-independent and can be employed in a large number of visual data analysis scenarios in which the underlying content cannot or should not be altered. We conducted a controlled experiment that indicates that users can find linked elements in complex visualizations more quickly and with greater subjective satisfaction than in complex visualizations in which plain highlighting is used. Context-preserving visual links were perceived as visually more attractive than traditional visual links that do not account for the context information.
    InfoVis '11 Best Paper Award
    IEEE Transactions on Visualization and Computer Graphics (InfoVis '11), 2011

  • C02
    Stylization-based ray prioritization for guaranteed frame rates
    Bernhard Kainz, Markus Steinberger, Stefan Hauswiesner, Rostislav Khlebnikov, Dieter Schmalstieg:
    Stylization-based ray prioritization for guaranteed frame rates
    Abstract: This paper presents a new method to control graceful scene degradation in complex ray-based rendering environments. It proposes to constrain the image sampling density with object features, which are known to support the comprehension of the three-dimensional shape. The presented method uses Non-Photorealistic Rendering (NPR) techniques to extract features such as silhouettes, suggestive contours, suggestive highlights, ridges and valleys. To map different feature types to sampling densities, we also present an evaluation of the features' impact on the resulting image quality. To reconstruct the image from sparse sampling data, we use linear interpolation on an adaptively aligned fractal pattern. With this technique, we are able to present an algorithm that guarantees a desired minimal frame rate without much loss of image quality. Our scheduling algorithm maximizes the use of each given time slice by rendering features in order of their corresponding importance values until a time constraint is reached. We demonstrate how our method can be used to boost and guarantee the rendering time in complex ray-based environments consisting of geometric as well as volumetric data.
    NPAR '11 Best Paper Award in Rendering
    Non-photorealistic Animation and Rendering (NPAR '11), 2011

  • J01
    Importance-Driven Compositing Window Management
    Manuela Waldner, Markus Steinberger, Raphael Grasset, Dieter Schmalstieg:
    Importance-Driven Compositing Window Management
    Abstract: In this paper we present importance-driven compositing window management, which considers windows not only as basic rectangular shapes but also integrates the importance of the windows' content using a bottom-up visual attention model. Based on this information, importance-driven compositing optimizes the spatial window layout for maximum visibility and interactivity of occluded content in combination with see-through windows. We employ this technique for emerging window manager functions to minimize information overlap caused by popping up windows or floating toolbars and to improve the access to occluded window content. An initial user study indicates that our technique provides a more effective and satisfactory access to occluded information than the well-adopted Alt+Tab window switching technique and see-through windows without optimized spatial layout.
    CHI '11 Honorable Mention Award
    Human Factors in Computing Systems (CHI '11), 2011

  • P01
    Using Perceptual Features to Prioritize Ray-based Image Generation
    Bernhard Kainz, Markus Steinberger, Stefan Hauswiesner, Rostislav Khlebnikov, Denis Kalkofen, Dieter Schmalstieg:
    Using Perceptual Features to Prioritize Ray-based Image Generation
    Abstract: A common challenge in interactive image generation is maintaining high interactivity of the applications that use computationally demanding rendering algorithms. This is usually achieved by sacrificing some of the image quality in order to decrease the rendering time. Most such algorithms achieve interactive frame rates while trying to preserve as much image quality as possible by applying the reduction steps non-uniformly. However, high-end rendering systems, such as those presented by Parker et al. in 2010, aim to generate highly realistic images of very complex scenes. In such systems, ordinary sampling approaches often give visually unacceptable results. To optimize the ratio between the sampling rate of the scene and its resulting perceptual quality, we present a new sampling strategy which uses information about object features that are known to support the comprehension of 3D shape. We control the sampling density by exploiting line extraction techniques commonly used in Non-Photorealistic Rendering.
    Symposium on Interactive 3D Graphics and Games 2011 (I3D), 2011
  • C01
    Wavelet-based Multiresolution Isosurface Rendering
    Markus Steinberger, Markus Grabner:
    Wavelet-based Multiresolution Isosurface Rendering
    Abstract: We present an interactive rendering method for isosurfaces in a voxel grid. The underlying trivariate function is represented as a spline wavelet hierarchy, which allows for adaptive (view-dependent) selection of the desired level-of-detail by superimposing appropriately weighted basis functions. Different root finding techniques are compared with respect to their precision and efficiency. Both wavelet reconstruction and root finding are implemented in CUDA to utilize the high computational performance of Nvidia's hardware and to obtain high quality results. We tested our methods with datasets of up to 512³ voxels and demonstrate interactive frame rates for a viewport size of up to 1024x768 pixels.
    Eurographics/IEEE VGTC Symposium on Volume Graphics, 2010
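    One of the root-finding schemes compared in such raycasters is plain bisection on the signed difference to the isovalue along the ray. The sketch below is a hypothetical illustration: a smooth analytic density stands in for the spline-wavelet reconstruction that the paper evaluates on the GPU.

    ```python
    import math

    def density(t):
        return math.cos(t)          # stand-in for the reconstructed function

    def bisect_isosurface(f, iso, t0, t1, iters=48):
        """Find t in [t0, t1] with f(t) == iso, assuming one sign change."""
        g0 = f(t0) - iso
        for _ in range(iters):
            tm = 0.5 * (t0 + t1)
            gm = f(tm) - iso
            if g0 * gm <= 0.0:      # root lies in the left half
                t1 = tm
            else:                   # root lies in the right half
                t0, g0 = tm, gm
        return 0.5 * (t0 + t1)

    t_hit = bisect_isosurface(density, 0.5, 0.0, math.pi)
    print(abs(density(t_hit) - 0.5) < 1e-9)  # converged to the isovalue
    ```

    Bisection is robust but needs many iterations; faster alternatives such as regula falsi trade robustness for convergence speed, which is the kind of precision/efficiency trade-off the abstract refers to.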
  • O01
    Multiresolution Isosurface Rendering
    Markus Steinberger:
    Multiresolution Isosurface Rendering
    Abstract: In this paper we propose a new technique for isosurface rendering of volume data. Medical data visualization, for example, relies on exact and interactive isosurface renderings. We show how to construct a multiresolution view of the data using bi-orthogonal spline wavelets and how to perform fast rendering using raycasting implemented on the GPU. This approach benefits from the properties of both the wavelet transform and the reconstruction using three-dimensional splines. The smoothness of surfaces is zoom level independent and data can be compressed to speed up rendering while still being able to show full detail quickly. Ray evaluation is implemented in model space to enable perspective rendering. Due to the fact that the isosurface is not extracted from the data beforehand, the isosurface level as well as the current resolution level can be changed without any further computation.
    CESCG '09 3rd Best Paper Award
    Central European Seminar on Computer Graphics (CESCG '09), 2009