Berk Geveci

VTK - my perspective on direction and upcoming developments

We have been having great discussions on the VTK developers mailing list about encouraging new developers to join VTK and about keeping existing contributors engaged. Here is a great quote from one of David Gobbi's emails:

Community engagement requires:

  • Responsive communication. You don't need to always have the answer, you just need to not be silent.

  • Dedication. Share your dreams and show that you care about the future. Help your employees to do the same.

  • Transparency. Lay out the roadmap for all to see. Summarize important development meetings on the wiki.

Here is my attempt at addressing the dedication and transparency parts. At Kitware, we have been doing some wonderful things with and to VTK, and we plan to do much more. However, we have not been great at communicating these efforts. Here is a quick summary of the things that I have been involved in doing and planning.

Overhaul of the rendering infrastructure

We are renovating geometry and volume rendering in VTK. The goal is to move to a much more modern OpenGL implementation and to support mobile platforms through OpenGL ES. See Marcus' blog on the geometry side of the work; a blog on the volume rendering side is in the works. This is part of a larger effort under the VTK Maintenance Grant from NIH. See here for details.

Improved multi-threaded execution

We have introduced new infrastructure to support the development of multi-threaded algorithms in VTK. The goal is to support both structured and unstructured data. See this document for a summary of the initial work. These are just the first steps; we plan on doing much more in the near future.

Improved data model

This is a broad category that covers many things, from no-copy data adaptors for in situ analysis ([1], [2]) to better support for ghost arrays and blanking. Some of our goals include:

  • Support for correctness in distributed filters (better support for blanking and ghosts)

  • Support for more dataset types

  • Support for different memory layouts for the same dataset types

  • More efficient APIs

  • Better support for thread safety

This will be an area of strong focus in the coming years.
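For concreteness, here is a minimal Python sketch of how ghost information surfaces in the current data model. The array name is the VTK 6 convention, and the loop is illustrative; these conventions are part of what we want to improve:

```python
import vtk

# 'grid' stands in for one piece of a distributed dataset, e.g. the
# output of a parallel reader; an empty grid keeps the sketch runnable.
grid = vtk.vtkUnstructuredGrid()

# Ghost cells are currently marked by a cell-data array.
ghosts = grid.GetCellData().GetArray("vtkGhostLevels")
for cid in range(grid.GetNumberOfCells()):
    if ghosts is not None and ghosts.GetTuple1(cid) > 0:
        continue  # skip ghosts so quantities are not double-counted
    # process the interior cell 'cid' here
```

Note that blanking on structured grids goes through a separate mechanism (vtkStructuredGrid's BlankCell() and IsCellVisible(), for example), which is part of why a more uniform treatment is on the list.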

Better pipeline support for distributed parallel execution

See this document for some recent work on this. There are some aspects of the data parallel pipeline that do not work well with all filter and data types. For example, the imaging pipeline and its extent-based streaming mechanism did not always work well when mixed with piece-based parallelism. We made some progress in addressing these issues, but there is more work to do. One particular pet peeve of mine is how the imaging filters allocate their output before RequestData(), which requires that all meta-data about scalars etc. be available during RequestInformation(). I'd like to see meta-data become optional: if it is there, great, use it for optimization; if not, fall back to a slower path. This would make it easier to deal with mixed imaging and unstructured pipelines as well as multi-block datasets.
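To make the idea concrete, here is a sketch, in Python, of the optional meta-data pattern I have in mind, written against the VTKPythonAlgorithmBase helper. The class and the fast/slow split are illustrative, not an existing VTK filter:

```python
import vtk
from vtk.util.vtkAlgorithm import VTKPythonAlgorithmBase

class TolerantImageFilter(VTKPythonAlgorithmBase):
    """Illustrative only; not an existing VTK class."""

    def __init__(self):
        VTKPythonAlgorithmBase.__init__(self,
            nInputPorts=1, inputType='vtkImageData',
            nOutputPorts=1, outputType='vtkImageData')

    def RequestInformation(self, request, inInfo, outInfo):
        info = inInfo[0].GetInformationObject(0)
        scalarInfo = vtk.vtkDataObject.GetActiveFieldInformation(
            info, vtk.vtkDataObject.FIELD_ASSOCIATION_POINTS,
            vtk.vtkDataSetAttributes.SCALARS)
        if scalarInfo is not None:
            # Fast path: scalar type/components are known up front and
            # could drive pre-allocation or type-based optimizations.
            pass
        # No meta-data? Fine: defer all decisions to RequestData().
        return 1

    def RequestData(self, request, inInfo, outInfo):
        inp = vtk.vtkImageData.GetData(inInfo[0])
        out = vtk.vtkImageData.GetData(outInfo)
        # Slow path: inspect the actual scalars now that data exists.
        out.ShallowCopy(inp)
        return 1
```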

Better data model and pipeline support for multi-resolution streaming

The goal is to support hierarchical datasets (octrees of volumes or particles, for example) that can be streamed adaptively through the pipeline. Most of the key infrastructure for this is already in place. We need some formalization around information keys, data layouts in composite datasets, reader interfaces, etc.
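The piece-based streaming that this builds on is already usable today. A small sketch, using the VTK 6 era executive API (the file name, piece count, and contour value are placeholders):

```python
import vtk

reader = vtk.vtkXMLUnstructuredGridReader()
reader.SetFileName("data.vtu")  # placeholder input
contour = vtk.vtkContourFilter()
contour.SetInputConnection(reader.GetOutputPort())
contour.SetValue(0, 100.0)

numPieces = 8
executive = contour.GetExecutive()  # a vtkStreamingDemandDrivenPipeline
contour.UpdateInformation()
for piece in range(numPieces):
    # Ask for one piece at a time instead of the whole dataset.
    executive.SetUpdateExtent(0, piece, numPieces, 0)
    contour.Update()
    # consume contour.GetOutput() for this piece here
```

The multi-resolution work would extend this so that the pieces come from levels of a hierarchy and are selected adaptively, rather than enumerated in a flat loop.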

Better pipeline support for ensemble datasets

Think of ensemble datasets as an arbitrary collection of related datasets. They may be generated by traversing an input parameter space in a simulation, by injecting various types of uncertainty into a simulation, or by running the same simulation using different models. The common goals for the analysis and visualization of ensemble datasets are comparative visualization, linked parameter-space and ensemble-member visualization, statistical analysis of ensemble members, and uncertainty visualization. Scientific data is rarely a single dataset these days, and VTK needs to expand to provide first-class support for ensembles. We made some progress towards improvements in the pipeline to support streaming of ensemble datasets (which I will write about in the future), but there is a lot more work to be done.
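As a strawman, today one can iterate over ensemble members by hand, as in the sketch below; the file names and the 'pressure' array are made up. First-class support would push this loop into the pipeline itself:

```python
import vtk
from vtk.numpy_interface import dataset_adapter as dsa
from vtk.numpy_interface import algorithms as algs

reader = vtk.vtkXMLImageDataReader()
member_means = []
for fname in ("run_000.vti", "run_001.vti", "run_002.vti"):
    reader.SetFileName(fname)
    reader.Update()
    member = dsa.WrapDataObject(reader.GetOutput())
    # One statistic per member; a real ensemble pipeline would do this
    # without re-executing the reader by hand.
    member_means.append(algs.mean(member.PointData['pressure']))
```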

Improved numpy integration

VTK has had a decent relationship with Python and numpy for a long time. However, things could be much better. We recently migrated ParaView's numpy integration code to VTK and made some significant improvements to it. Although I call this work "numpy integration", it is much more than simply exposing VTK arrays as numpy arrays. VTK brings to the picture a full-featured data model that goes beyond n-dimensional arrays. In addition, the code in VTK fully supports distributed parallelism, and can support other types of parallelism, such as multi-threaded parallelism, vector parallelism, and GPU acceleration, as applied to its data model. How we can expose some of this functionality while maintaining numpy-style ease-of-use is something we are likely to focus on in the future.
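For a taste of the current interface (using vtkRTAnalyticSource as a stand-in data source):

```python
import vtk
from vtk.numpy_interface import dataset_adapter as dsa
from vtk.numpy_interface import algorithms as algs

source = vtk.vtkRTAnalyticSource()  # synthetic image data source
source.Update()

image = dsa.WrapDataObject(source.GetOutput())
rtdata = image.PointData['RTData']   # behaves like a numpy array
grad = algs.gradient(rtdata)         # data-model aware: uses the dataset's
                                     # structure, not just the raw values
image.PointData.append(grad, 'gradient')
```

Note that gradient here knows about the dataset's geometry, and reductions such as algs.mean perform the necessary global communication when the same script runs distributed under MPI.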

Fixes to executives

Executives have grown organically and collected a lot of gunk. We need to clean them up to make them more accessible to developers: they need to be easy to understand and to extend. We also have some issues in pipeline traversal that become problematic when dealing with diamond-shaped pipelines.
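For reference, a diamond is simply a fan-out from a shared source that merges again downstream; a minimal example:

```python
import vtk

source = vtk.vtkSphereSource()

# Two branches consuming the same upstream output.
shrink = vtk.vtkShrinkPolyData()
shrink.SetInputConnection(source.GetOutputPort())
smooth = vtk.vtkSmoothPolyDataFilter()
smooth.SetInputConnection(source.GetOutputPort())

# The branches merge again, closing the diamond.
append = vtk.vtkAppendPolyData()
append.AddInputConnection(shrink.GetOutputPort())
append.AddInputConnection(smooth.GetOutputPort())
append.Update()  # one Update() traverses both branches
```

When the two branches make different requests of the shared source (different update extents or ghost levels, say), the source can end up executing more than once per Update(), which is one of the traversal issues we want to fix.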

This is a partial list. There are many more improvements that will likely be made at the dataset and algorithm level. Also, I have focused only on work that I am pretty familiar with. I hope that others will chime in and add their perspectives to mine.

Note: This article was originally published on the Kitware blog. Please see the Kitware web site, the VTK web site and the ParaView web site for more information.