WholeSlide

Technical References

Organizations

This page contains a cross-section of the software currently in use. It is by no means complete, but it may be useful to those entering the field. For a comprehensive list of tools, visit one of the following sites:

Managing Data

The entry point for this problem, image acquisition, falls outside the scope of this document. Here we assume that data have been collected with an optimized imaging protocol on modern equipment. The image data are stored in a temporary location until the metadata are fully populated and a final archival location has been chosen. A good example of this form of workflow organization can be found at NCMIR/CRBS, UCSD. Three key decisions must be made for final archival: storage format, metadata content, and server architecture.

FORMATS

There are many formats designed specifically for large image content. These formats offer high compression factors, fast random access, and good file-system compatibility, which matters because each file may be multiple gigabytes in size. The most commonly used is the JPEG 2000 (JP2K) file format, which provides a lossless compression ratio of roughly 3:1 or greater, with random access support and embedded metadata. As an example, a 2 GB raw ISH image (715 megapixels) encoded in lossless JP2K takes up approximately 400 MB of disk space, a ratio of about 5:1. A few viable formats are listed below:

  • BigTIFF – Extension of the TIFF format that uses 64-bit offsets to support files beyond the 4 GB limit of classic TIFF, with embedded metadata
  • DeepZoom – Microsoft’s tiled pyramid stack format, used in SeaDragon and Pivot.
  • BioHDF – Extension of the HDF5 file format to biological domain (DNA, expression data, images)
  • To add: NDPI, SVS, VMS, Olympus
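
The storage arithmetic in the example above, and the layout of a tiled pyramid such as the one Deep Zoom uses, can be sketched as follows. This is a minimal sketch, assuming 3 bytes per pixel (8-bit RGB), a 256-pixel tile size, and a pyramid in which each level halves the previous one. The function names and the example slide dimensions are illustrative and do not come from any particular library.

```python
import math


def raw_size_bytes(megapixels, bytes_per_pixel=3):
    """Uncompressed size of an image; 3 bytes/pixel assumes 8-bit RGB."""
    return int(megapixels * 1_000_000 * bytes_per_pixel)


def compression_ratio(raw_bytes, stored_bytes):
    """Ratio of raw size to stored size (5.0 means 5:1)."""
    return raw_bytes / stored_bytes


def pyramid_levels(width, height, tile_size=256):
    """List (width, height, tile_columns, tile_rows) per level, full resolution first.

    Each level halves the previous one (rounding up) until the image is 1x1.
    """
    levels = []
    w, h = width, height
    while True:
        cols = math.ceil(w / tile_size)
        rows = math.ceil(h / tile_size)
        levels.append((w, h, cols, rows))
        if w == 1 and h == 1:
            break
        w = max(1, math.ceil(w / 2))
        h = max(1, math.ceil(h / 2))
    return levels


# The 715-megapixel example from the text: about 2 GB raw, 400 MB stored.
raw = raw_size_bytes(715)
print(f"raw size: {raw / 1e9:.2f} GB")
print(f"ratio: {compression_ratio(raw, 400e6):.1f}:1")

# Hypothetical 32768 x 21820 slide (about 715 megapixels).
levels = pyramid_levels(32768, 21820)
print(len(levels), "levels")
for w, h, cols, rows in levels[:3]:
    print(w, h, cols, rows)
```

Under these assumptions the example image compresses a little better than 5:1, which is consistent with the "3:1 or greater" figure quoted for lossless JP2K. The pyramid adds roughly one third to the pixel count of the full-resolution level, which is the price paid for fast access at every zoom level.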

METADATA

Primary image metadata describes image size, format, native resolution, organism, creator, and date acquired. Secondary metadata is information generated by operations (analysis, transformations, and statistics) performed on the raw data. Data provenance is essential for shared image analysis and collaborative work, because it ensures reproducibility after publication. Two common approaches to storing metadata are internal (within the image file itself) and external (within a separate database). While proprietary data structures exist for metadata, XML often serves as common ground between the internal and external storage methods, with a shared vocabulary defined by a standard data object model and/or schema.

  • DICOM Supplement 145 – Recent supplement to the DICOM standard supporting pyramidal file formats, based on JP2K
  • To add: GeoTIFF, KML
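
To illustrate how one XML record can serve both the internal and the external storage approach, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The element names (`image_record`, `primary`, `secondary`, `operation`) and all field values are hypothetical placeholders, not a published schema.

```python
import xml.etree.ElementTree as ET


def build_metadata(primary, operations):
    """Build an XML record with primary fields and a provenance trail.

    Element and attribute names here are hypothetical, not a published schema.
    """
    root = ET.Element("image_record")
    primary_el = ET.SubElement(root, "primary")
    for key, value in primary.items():
        ET.SubElement(primary_el, key).text = str(value)
    secondary_el = ET.SubElement(root, "secondary")
    for op in operations:
        # Each processing step becomes one <operation> element.
        attributes = {k: str(v) for k, v in op.items()}
        ET.SubElement(secondary_el, "operation", attrib=attributes)
    return root


record = build_metadata(
    primary={
        "width": 32768,
        "height": 21820,
        "format": "jpeg2000",
        "resolution_um_per_pixel": 0.5,
        "organism": "Mus musculus",
        "creator": "example_user",
        "date_acquired": "2011-01-01",
    },
    operations=[
        {"name": "registration", "tool": "example_tool", "version": "1.0"},
        {"name": "segmentation", "tool": "example_tool", "version": "1.0"},
    ],
)

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)

# The same string could be embedded in the image file or stored in a
# database column, then parsed back when needed.
parsed = ET.fromstring(xml_text)
print(parsed.findtext("primary/organism"))
print([op.get("name") for op in parsed.iter("operation")])
```

Recording each operation in order, with the tool and version that performed it, is one simple way to keep the provenance trail needed to reproduce a result.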

SERVER ARCHITECTURE

Server configurations vary based on the format chosen and access requirements needed for the data itself. Three configurations exist: basic, interactive, and analytical.

A basic server configuration provides raw data storage (FTP/SFTP) and database access for metadata. It should be viewed as a redundant storage location, with all visualization and analysis performed on secondary workstations. Many research groups use this configuration for image storage.

An interactive configuration extends the basic configuration with a layer of interaction, often presented through a web interface or thin-client solution. This configuration requires middleware, often developed internally by a lab or research group, to present pertinent data to the user. Many examples of this solution exist online.

An analytical server configuration extends the interactive configuration with a processing backend, usually in the form of a grid computing resource. These configurations require significant planning before implementation, as well as additional software tools to handle processing workflows and job management. Very few public examples of this type of configuration exist in practice, and there are no unifying standards or steering groups.

  • OMERO (Open Microscopy Environment Remote Objects) – “OMERO is client-server software for visualisation, management and analysis of biological microscope images”. Java-based, with broad image format and analysis support.
  • caMicroscope/caBIG – Pathology tools for whole slide imaging, with a focus on cancer research. Extensively used but difficult to implement without third-party support and funding.
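
As an illustration of the middleware layer in an interactive configuration, the sketch below shows the core of a tile request handler: it validates a request path and converts it into a pixel region of the stored image. The URL layout, the 256-pixel tile size, and the convention that level 0 is full resolution are all assumptions made for this example; a real deployment would pass the resulting region to an image library and return the encoded tile.

```python
import re

TILE_SIZE = 256  # assumed tile edge length in pixels

# Hypothetical URL layout: /tiles/<slide_id>/<level>/<column>_<row>.jpg
TILE_PATTERN = re.compile(
    r"^/tiles/(?P<slide>[\w-]+)/(?P<level>\d+)/(?P<col>\d+)_(?P<row>\d+)\.jpg$"
)


def parse_tile_request(path):
    """Turn a request path into (slide_id, level, column, row), or None if invalid."""
    match = TILE_PATTERN.match(path)
    if match is None:
        return None
    return (
        match["slide"],
        int(match["level"]),
        int(match["col"]),
        int(match["row"]),
    )


def tile_region(level, column, row, tile_size=TILE_SIZE):
    """Pixel region (x, y, width, height) in full-resolution coordinates.

    Level 0 is taken to be full resolution; each further level is half the size,
    so one tile at a higher level covers a larger area of the original image.
    """
    scale = 2 ** level
    edge = tile_size * scale
    return (column * edge, row * edge, edge, edge)


request = parse_tile_request("/tiles/slide-001/2/3_1.jpg")
print(request)

slide_id, level, column, row = request
region = tile_region(level, column, row)
print(region)

# Anything that does not match the expected layout is rejected.
print(parse_tile_request("/etc/passwd"))
```

Rejecting any path that does not match the pattern keeps the handler from touching files outside the slide store, which matters once the server is exposed through a web interface.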

STORAGE

The final component of server configuration is the storage location itself. Several middleware solutions are available to manage large data repositories. These tools abstract the physical storage (big iron, network, and RAID configurations) to simplify interfacing and management. Cloud-based storage has become a viable, cost-effective option and should be considered when implementing an archival storage solution (e.g. Amazon’s S3 and Elastic Block Store options).

  • iRODS – Integrated Rule-Oriented Data System, currently used by CRBS at UCSD. See also HDF-iRODS.
  • Hadoop Distributed File System (HDFS) – a distributed file system built for the Hadoop grid management software.
  • To add: NoSQL databases with spatial indices: MongoDB, GeoCouch
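
The idea of abstracting physical storage behind a simple interface can be sketched as below. This is a minimal, hypothetical interface, not the API of iRODS, HDFS, or any cloud provider: `ArchiveStore`, `LocalStore`, and `archive` are names invented for this example, and the local directory backend stands in for whatever storage a site actually uses.

```python
import hashlib
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class ArchiveStore(ABC):
    """Hypothetical minimal interface a storage middleware might expose."""

    @abstractmethod
    def put(self, key, data):
        """Store bytes under a key and return their SHA-256 checksum."""

    @abstractmethod
    def get(self, key):
        """Return the bytes stored under a key."""


class LocalStore(ArchiveStore):
    """Stores objects as files under a root directory."""

    def __init__(self, root):
        self.root = Path(root)

    def put(self, key, data):
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return hashlib.sha256(data).hexdigest()

    def get(self, key):
        return (self.root / key).read_bytes()


def archive(store, key, data):
    """Write data, read it back, and confirm the checksum before trusting the copy."""
    checksum = store.put(key, data)
    if hashlib.sha256(store.get(key)).hexdigest() != checksum:
        raise IOError(f"checksum mismatch for {key}")
    return checksum


with tempfile.TemporaryDirectory() as tmp:
    store = LocalStore(tmp)
    digest = archive(store, "slides/example.jp2", b"placeholder image bytes")
    print(digest)
```

Because calling code only sees `put` and `get`, a cloud or grid backend could be swapped in later by writing another subclass, without changing the archival workflow.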

ANALYSIS PLATFORMS

Far more software tools are available than this document can cover. Below are a few commonly used platforms, libraries, and environments.

INTEGRATED TOOLS

Most microscopy companies provide integrated tools for analyzing data collected from their microscopes. These tools often include smart thresholding, volume rendering, 4-D particle tracking, and statistical image analyses.

  • MicroBrightField – Commercial options for neuroscience, microscopy analysis, and stereology
  • Bitplane Scientific Software – Commercial options for neuroscience and stack microscopy
  • Visage Imaging: Amira – Commercial visualization package with plugins for multiple data types and usage scenarios. Also maintains common mesh/contour formats.

LIBRARIES AND DEVELOPMENT KITS

A large amount of work has been done by both the open and commercial image analysis communities to produce toolkits and processing libraries that can be licensed freely for academic use. These toolkits include image conversion, transformation, registration, and segmentation algorithms developed across a wide range of technical fields. Many of these libraries can be chained together directly or through minimal wrapping code (Python, MATLAB, or similar).

  • Insight Toolkit (ITK) – Heavily optimized C++ image analysis library, considered the gold standard in large-scale image operations and analysis. Supported by NIH.
  • OpenSlide – Virtual slide library, provides access to many third-party virtual slide formats
  • ImageJ/FIJI – Open-source image processing package developed in Java, supported by NIH. Powerful scripting and plugin development environment with rapid pixel operations. http://rsbweb.nih.gov/ij/
  • Enthought Tool Suite – collection of scientific software tools wrapped together with Python interconnects to create a very capable cross-platform processing environment. See also: Python(x,y) and Sage.
  • Point Cloud Library (PCL)
  • OpenCV
  • NASA Vision Toolbox
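
The "minimal wrapping code" mentioned above often amounts to chaining a few functions, each of which hands its output to the next. The sketch below shows that pattern with stand-in steps written in plain Python on a tiny grayscale grid. In practice each step would wrap a call into one of the libraries listed; the step functions here (`invert`, `threshold`, `count_foreground`) are hypothetical and exist only to make the example self-contained.

```python
from functools import reduce


def compose(*steps):
    """Chain processing steps left to right into a single callable."""
    return lambda image: reduce(lambda result, step: step(result), steps, image)


def invert(image, max_value=255):
    """Stand-in for a transformation step: invert an 8-bit grayscale image."""
    return [[max_value - px for px in row] for row in image]


def threshold(level):
    """Stand-in for a segmentation step: build a binary mask at a given level."""
    def apply(image):
        return [[1 if px >= level else 0 for px in row] for row in image]
    return apply


def count_foreground(mask):
    """Stand-in for a statistics step: count foreground pixels in a mask."""
    return sum(sum(row) for row in mask)


pipeline = compose(invert, threshold(128), count_foreground)

image = [
    [0, 10, 200],
    [250, 30, 90],
    [255, 128, 5],
]
print(pipeline(image))  # prints 5
```

Keeping each step a plain callable means a step can be replaced by a wrapper around a different library without touching the rest of the chain.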

Recent trends toward cloud computing have generated interest in all-in-one packages built from the OS up. Debian Med, Debian CogSci, and NeuroDebian are Linux distributions that include many domain-relevant analysis packages. These can be built into virtual machine images that are either run locally or hosted on a public cloud such as Amazon EC2, allowing rapid deployment and scaling of resources to match the computational problem.

VISUALIZATION

Visualization of multimodal data in a shared environment is a difficult task. Efforts to create a coordinate space ‘rosetta stone’ are well under way at multiple institutions (UCSD, ABA, INCF). Once the data are positioned in a common space, rendering engines are used to display and interact with the different data types. Controls for visual properties, display order, and post-processing effects (transformations, filters, OpenGL shaders) are common. To implement such a display, there are capable libraries, integrated tools, and commercial platforms. Not covered here are volume-rendering tools, e.g. V3D, PSC volume browser, ImageVis3D, etc.
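
To make the idea of positioning data in a shared coordinate space concrete, here is a deliberately simplified sketch. It assumes each data layer differs from the shared space only by an isotropic scale (microns per pixel) and a translation, which is far simpler than the registration used by real atlas projects. The `Layer` class and all values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    scale: float    # microns per pixel (assumed isotropic)
    offset: tuple   # (x, y) of the layer origin in the shared space, in microns
    order: int      # lower values are drawn first

    def to_shared(self, px, py):
        """Map a pixel coordinate in this layer to the shared coordinate space."""
        return (px * self.scale + self.offset[0], py * self.scale + self.offset[1])

    def to_pixel(self, x, y):
        """Inverse mapping: shared space back to this layer's pixel coordinates."""
        return ((x - self.offset[0]) / self.scale, (y - self.offset[1]) / self.scale)


layers = [
    Layer("annotation", scale=2.0, offset=(100.0, 50.0), order=2),
    Layer("histology", scale=0.5, offset=(0.0, 0.0), order=0),
    Layer("expression", scale=1.0, offset=(20.0, 20.0), order=1),
]

# Display order control: draw layers from lowest to highest order value.
draw_order = [layer.name for layer in sorted(layers, key=lambda layer: layer.order)]
print(draw_order)

# A point picked in the histology layer, located in the annotation layer.
shared = layers[1].to_shared(400, 300)
print(shared)
print(layers[0].to_pixel(*shared))
```

Routing every lookup through the shared space means each layer needs only one transform, rather than a separate mapping to every other layer.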

LIBRARIES

  • VTK – Visualization Toolkit. Similar to the Insight Toolkit, this is an NIH-supported toolkit to handle the display of almost every type of image and polygonal data. Also supports complete volume rendering tools.
  • Visualization Library – a less-supported toolkit, also developed for visualization purposes with a C++ backend.

INTEGRATED OPTIONS

Python distributions, C++ frameworks, and extensible software packages.

  • DeVIDE – a Python distribution that integrates ITK, VTK, NumPy, and a workflow management system. Cross-platform.
  • MITK – Medical Imaging Interaction Toolkit. Similar to DeVIDE, designed as a common environment to extend ITK/VTK for medical visualization and the development of surgical tools.
  • MeVisLab – Commercial framework, integrates ITK, VTK and workflow tools.
  • ParaView – A framework developed by Kitware around VTK. Can be hooked into ITK, OpenFOAM, and other simulation tools as a visualization frontend.
  • 3D Slicer – A framework for neuroimaging analysis and display. One of the most powerful tools available with a large user community and robust plugin architecture.
  • VisTrails – Visualization framework, integrates ITK/VTK in a python environment.
  • GoFigure2 – “Slicer3 for cells”. A tool for smart segmentation and volume rendering.
  • StackVis – HRI tile viewer with 3D display and annotation. Originally open source, but its current licensing is unclear. Developed at UC Davis by E. Jones.
  • VANO – Volume rendering and annotation library. Developed at HHMI/Janelia by C. Peng.
  • Brain Explorer (1/2) – Allen Institute software to visualize the reference atlas and gene expression results in a full 3D environment. Free but closed source.
  • Mayavi – See Enthought Tool Suite
  • Blender – 3D modeling tool, scriptable in Python
  • MeshLab