The InfoVis Toolkit

Jean-Daniel Fekete

INRIA

Jean-Daniel.Fekete@inria.fr

The contents of this book can be freely used and distributed as long as the source is cited as a reference, that is, its URL: http://www.lri.fr/~fekete/InfovisToolkit/manual.html.

Revision History
Revision 0.4	March 2004
Extended the tutorial and installation instructions.
Revision 0.3	February 2004
Updated to release 0.6 of the Toolkit
Revision 0.2	October 2003
Description of columns, Metadata and tables.
Revision 0.1	January 2003
Start of document

Table of Contents

1. Introduction

Installing the InfoVis Toolkit
Sample Data and File Formats
Supported File Formats

2. Tutorial

Starting with the InfoVis Toolkit

Organization of the InfoVis Toolkit

Carrying on: Visualizing a Tree as a Treemap

Specifying Visual Attributes

Using Fisheye Lenses or Dynamic Labeling

Creating and Managing Data Structures

Managing Tables

Managing Trees

Managing Graphs

3. Data Structures

Columns

Presentation Format
NumberColumns
Dense Columns
Sparse Columns
Column Dependencies

Standard Description
Aggregation
Converting to Number Columns
Color Scheme Categories

4. Visualizations

Implementation of the Visualization Classes

Displaying Items in Visualizations
Visual Column Management
Color Management
Picking

Table Visualizations

Stroking and Link Visualizations

Tree Visualizations

Graph Visualizations

Dynamic Labeling

Fisheye Lenses

5. Interaction

Dynamic Queries

6. Readers and Writers

Table Readers and Writers
Tree Readers and Writers
Graph Readers and Writers

7. Design Patterns

Abstract Factory
Observer
Visitor

Index

List of Figures

1.1. Examples of Visualizations Provided by the InfoVis Toolkit
3.1. The Table interface
3.2. The Tree interface
3.3. The Graph interface

List of Examples

2.1. Simple visualization program for time series data
2.2. Simple visualization program for hierarchical data
3.1. Creating a LongColumn Reading Dates in the Unix Format

Chapter 1. Introduction

Table of Contents

Installing the InfoVis Toolkit
Sample Data and File Formats
Supported File Formats

The InfoVis toolkit is a software package aimed at simplifying the development of Information Visualization Systems. It is written in Java, capitalizing on its rich interactive graphics environment and portability.

Information Visualization is a domain that emerged in the early 1990s and has expanded at a steady pace since then, showing great results, innovative concepts and techniques. There are now so many concepts and techniques that it is challenging to keep pace with the implementation of the most useful ones. The InfoVis toolkit is designed to be a repository of know-how for building high-quality information visualization systems.

Practically, visualizations translate data items into visual marks. Data items are made of typed data attributes organized in a data structure, such as employee records in a table, or files in a hierarchical file system. Visual marks are eventually made of colored pixels on a screen, but can more accurately be described as geometric entities displayed with graphical attributes such as color or transparency. A visualization technique consists in translating each data item into a geometry and related graphical attributes to draw them on screen in a specified order. The visualization toolkit provides several visualization techniques for three main data structures: tables, trees and graphs. It also provides mechanisms to add visualization techniques and data structures.

Figure 1.1. Examples of Visualizations Provided by the InfoVis Toolkit

Visualization also involves interacting with the visualized items and the data structures. Interactions include navigation in the data structure (including zooming), dynamic filtering of visualized items (including dynamic queries), and various dynamic techniques such as space deformation using fisheye lenses or dynamic labeling. The visualization toolkit provides several techniques for interacting with the visualizations and mechanisms to add new interaction techniques.

Interacting with visualizations imposes stringent time constraints. Dynamic updates of a visualization should be performed in less than 100ms to appear smooth and continuous. This constraint requires the data structures to be organized in memory in a way specially crafted for fast access. The visualization toolkit provides homogeneous data structures to store visualization data in memory. All data structures are represented as tables of columns. Each column contains data of a homogeneous data type such as integers, floating point values or strings. This mechanism allows new columns to be added to existing data structures, so that new attributes can be computed and used to enrich visualizations in a uniform way.

This document is both a tutorial and a user's manual. The reference information can be found in the HTML documentation produced by Javadoc, which stays synchronized with the code more closely than this manual.

Installing the InfoVis Toolkit

The InfoVis Toolkit is a Java library, designed to work with Java/Swing components. It requires a Java 1.4 runtime or above for running the library programs and examples. A Java compiler is required to create applications using the InfoVis Toolkit. The InfoVis Toolkit is composed of one main library jar file called infovis.jar and several supporting libraries and programs. A sample application can be run directly from the infovis.jar library using the following command, assuming the file infovis.jar is in the current directory: "java -jar infovis.jar"

To compile applications or examples using the InfoVis Toolkit, the file infovis.jar should be added to the Java class path. This is generally done using an option of your favorite IDE for Java or by setting the shell variable CLASSPATH to the location of the file infovis.jar. Furthermore, you need to have other library files available to run the InfoVis Toolkit, all distributed with the InfoVis Toolkit in the "lib" directory:

agile2d.jar and agile2d_opengl.jar: Agile2D library files. These files are used by the InfoVis Toolkit to support hardware accelerated graphics when available. See the official site of Agile2D.
antlrall.jar: ANTLR library file, used to parse some complex file formats such as the Newick Tree format or the DOT Graph format. See the official site of ANTLR.
gl4java.jar, gl4java-glffonts.jar and gl4java-glutfonts.jar: The Gl4java library files. These files are used by Agile2D to access the OpenGL low level library of hardware accelerated graphics. gl4java can be downloaded at http://www.jausoft.com/products/gl4java/gl4java_main.html
png.jar: Library for loading PNG image files.
xml-writer.jar: A library simplifying the writing of XML files following the syntactic conventions of XML. This library is used for XML file writers.

Furthermore, some programs are used by the toolkit to lay out graphs: dot, neato and twopi. These programs are provided by the GraphViz package of AT&T, available at the official site of GraphViz. These programs should be in the executable PATH because they will be called from the InfoVis Toolkit when graph layout is used.

Sample Data and File Formats

The InfoVis Toolkit is distributed with a set of sample data files to try and test the toolkit or user applications. They are also used as examples of the file formats supported.

52weeks.tqd and salivary.tqd: Time series data provided by Harry Hochheiser from the University of Maryland. The TQD format is a variant of the CSV format used by Excel and similar applications.
ABCYIM.clustalw and ABCYabc.clustalw: Phylogenetic trees in Newick format, used by the InfoVis 2003 Contest and provided by Elie Dassa from Institut Pasteur, France.
ABCYIM.xml and ABCYabc.xml: Phylogenetic trees in XML format, same as previous files in another format.
election.tm3 and nba.tm3: Analysis data for the USA 2002 elections in a tree format, by region and state. These files are part of the Treemap4 distribution from the University of Maryland.
nodelinktest.xml, testtree.xml, testtreeml.xml and tree3.xml: Various trees in XML using the treeml dtd provided in the same directory.
jsort.dot and testgraph2.dot: Graphs in the DOT format. The first graph is distributed with the GraphViz system and the second is a sample graph provided for testing. The file jsort.out.dot provides the jsort.dot graph positioned by the dot program.
testgraph.xml and testgraph2.xml: Two graphs using the GraphML DTD provided in the same directory.

Supported File Formats

The InfoVis Toolkit supports a limited but extensible set of file formats. Chapter 6, Readers and Writers, explains how file formats are read and written and how to extend the list of supported file formats.

CSV Format

Format used by Excel and similar programs to export files using a readable format. CSV files contain tabular data organized as lines, each element separated by a specific character. The InfoVis Toolkit reader for CSV files can be configured to read any delimiter but, by default, the separator is the semicolon character.

In some CSV files, the first line contains the names of the columns. The CSV reader can be configured to read them accordingly. In some files, the second line also contains column type names such as "integer" or "string". The CSV reader can also be configured to take that line into account.

Finally, some CSV files use initial lines for heading information not relevant to the table. The CSV reader can be configured to skip a specified number of initial lines.

So far, all these configurations must be specified by program. A useful extension to the toolkit would allow a user to load a file and specify interactively the right parameters to load any kind of CSV file, but this module remains to be done. Meanwhile, applications should configure the loader programmatically.

Many file readers use that format as a base. This format supports an arbitrary number of attributes associated with table rows.
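The reader options described above (separator, skipped heading lines, name line, type line) can be illustrated in plain Java. This is only a minimal sketch under stated assumptions: the class CsvSketch and its parameters are hypothetical, it is not the toolkit's CSV reader, and quoting rules are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: not part of the InfoVis Toolkit API.
final class CsvSketch {
    /** Result of parsing: optional column names, optional type names, data rows. */
    static final class Parsed {
        String[] names;                      // null when there is no name line
        String[] types;                      // null when there is no type line
        final List<String[]> rows = new ArrayList<>();
    }

    static Parsed parse(List<String> lines, char separator, int skipLines,
                        boolean hasNames, boolean hasTypes) {
        Parsed result = new Parsed();
        // Quote the separator so that characters such as '|' are taken literally.
        String regex = Pattern.quote(String.valueOf(separator));
        int i = skipLines;                   // ignore heading lines
        if (hasNames && i < lines.size()) {
            result.names = lines.get(i++).split(regex, -1);
        }
        if (hasTypes && i < lines.size()) {
            result.types = lines.get(i++).split(regex, -1);
        }
        for (; i < lines.size(); i++) {
            result.rows.add(lines.get(i).split(regex, -1));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "exported by some tool",         // heading line to skip
            "name;size",
            "string;integer",
            "file1;1024",
            "file2;10");
        Parsed p = parse(lines, ';', 1, true, true);
        // prints: size integer 2
        System.out.println(p.names[1] + " " + p.types[1] + " " + p.rows.size());
    }
}
```

Keeping every option as an explicit parameter mirrors the situation described above, where the application has to supply the configuration itself.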

XML format

XML is a meta-syntax for describing data files. The InfoVis Toolkit supports several instances of XML documents for all its data types. The specific XML syntax is usually specified inside an XML file using a DTD declaration, or by a starting "tag" expressing its contents. The InfoVis Toolkit recognizes some specific XML syntaxes and can load any XML file as a tree.

TQD

The TQD format is a variant of the CSV format using some initial lines to describe the file contents and a comma as column separator. You should take a look at the sample files for more hints about the format.

Newick format

The Newick format is used in Biology to describe phylogenetic trees. It is a parenthesized format described at this URL: http://evolution.genetics.washington.edu/phylip/newicktree.html.

This format supports a limited number of attributes associated with nodes.

TM3 format

The TM3 format is the internal format used by the Treemap4 program. It is a variant of the CSV format, using the TAB character as separator. Its first line contains the column names, the second line contains the column types. The remaining lines contain one entry per tree node. The path of the node is encoded after the last column, following two TAB characters. It contains the path to the node: the first name is the name of the root node, the second name is a child of the root node etc.

This format supports an arbitrary number of attributes associated with tree nodes.
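As an illustration of the line layout described above, the following sketch splits one TM3 data line into its attribute values and its path. It is not the toolkit's TM3 reader: the class name Tm3LineSketch and the sample values are hypothetical, and the layout (columns, one empty field produced by the two consecutive TABs, then the path) is assumed from the description.

```java
import java.util.Arrays;

// Hypothetical sketch of splitting one TM3 data line; not the toolkit's reader.
final class Tm3LineSketch {
    final String[] attributes;   // one value per declared column
    final String[] path;         // node names from the root down to this node

    Tm3LineSketch(String line, int columnCount) {
        String[] fields = line.split("\t", -1);
        // Assumed layout: columns, one empty field (the two TABs), path elements.
        if (fields.length < columnCount + 2 || !fields[columnCount].isEmpty()) {
            throw new IllegalArgumentException("malformed TM3 line: " + line);
        }
        attributes = Arrays.copyOfRange(fields, 0, columnCount);
        path = Arrays.copyOfRange(fields, columnCount + 1, fields.length);
    }

    public static void main(String[] args) {
        // Made-up values, two declared columns followed by a three-level path.
        Tm3LineSketch node = new Tm3LineSketch("12\t3.5\t\tUSA\tWest\tOregon", 2);
        System.out.println(Arrays.toString(node.attributes)); // [12, 3.5]
        System.out.println(String.join("/", node.path));       // USA/West/Oregon
    }
}
```

Using the declared column count rather than searching for the first pair of TABs keeps the split unambiguous when an attribute value is empty.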

The DOT format

Format used by the GraphViz applications to represent graphs. It is readable and supports an arbitrary number of attributes, although some of these attributes have a meaning for the layout algorithms and cannot be given an arbitrary meaning.

The XML TreeML DTD

An XML DTD has been developed for describing general trees, mainly for the InfoVis 2004 Contest. The treeml.dtd file contains a good description of its elements and several files show samples of the format.

This format supports an arbitrary number of attributes associated with tree nodes.

The XML GraphML DTD

The GraphML format is meant for exchanging graphs. It is described in http://graphml.graphdrawing.org. The InfoVis Toolkit implements it partially.

This format supports an arbitrary number of attributes associated with vertices and edges.

Chapter 2. Tutorial

Table of Contents

Starting with the InfoVis Toolkit

Organization of the InfoVis Toolkit

Carrying on: Visualizing a Tree as a Treemap

Specifying Visual Attributes

Using Fisheye Lenses or Dynamic Labeling

Creating and Managing Data Structures

Managing Tables

Managing Trees

Managing Graphs

Starting with the InfoVis Toolkit

Several sample applications of the InfoVis Toolkit are provided in the example folder in the distribution. Visualizing a table as time-series data can be done as follows:

Example 2.1. Simple visualization program for time series data

public class Example1 {
    public static void main(String args[]) {
        String fileName =
            (args.length == 0) ? "data/salivary.tqd" : args[0];
        DefaultTable t = new DefaultTable();
        t.setName(fileName);
        AbstractReader reader =
            TableReaderFactory.createReader(fileName, t);
        if (reader == null || !reader.load()) {
            System.err.println("cannot load " + fileName);
            return;
        }
        TimeSeriesVisualization visualization =
            new TimeSeriesVisualization(t);
        VisualizationPanel panel =
            new VisualizationPanel(visualization);
            
        JFrame frame = new JFrame(fileName);
        frame.getContentPane().add(panel);
        frame.pack();
        frame.setVisible(true);
    }
}

The import declarations have been omitted for clarity. The main program creates and displays a simple time series visualization from the file name specified in the first argument of the program. First, a table is created and named with the file name. Loading the file is done in two steps: finding a reader and actually loading the file from this reader. The reader is created through a factory object. Factories are used to create objects indirectly according to some specified parameters (we will describe them later in depth). Here, a reader object is created from a file name and a table object. The factory analyzes the file name and maybe the file contents to create the most suitable reader for it. If no reader is returned or the reader cannot load the file, the program exits with an error message. Otherwise, a visualization object is created and inserted in a standard Java/Swing JFrame inside a visualization panel.

In this example, the visualization is not interactive because we haven't created any control panel associated with the visualization. We can have the control panel if we replace the creation of the visualization panel by the following code:

        ControlPanel panel =
            ControlPanelFactory.sharedInstance()
              .createControlPanel(visualization);

Factories are quite simple and very useful for extending the InfoVis Toolkit. They are simple objects with one creator method that looks at its arguments and creates an object according to them. The TableReaderFactory looks at the file name and, if it recognizes that it ends with the ".csv" extension, returns a new CSVTableReader. If it receives a file name ending with ".csv.gz", it recognizes that the file is compressed by GZIP and decompresses it on the fly. New file readers can then be defined by programmers and added to newer releases of the InfoVis Toolkit, and the same program will then be able to read these new file formats without modification. Other kinds of customizations are allowed by the "factory" pattern but we will describe them in a separate section.
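The idea of a factory that selects a reader from the file name can be sketched independently of the toolkit. The following is a minimal illustration, not the implementation of TableReaderFactory: the class ReaderFactorySketch, its Reader interface and the register method are hypothetical names introduced for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of an extension-based reader factory.
final class ReaderFactorySketch {
    /** Stand-in for a file reader; only describes itself in this sketch. */
    interface Reader { String describe(); }

    private final Map<String, Function<String, Reader>> creators =
        new LinkedHashMap<>();

    /** Registers a creator for file names ending with the given extension. */
    void register(String extension, Function<String, Reader> creator) {
        creators.put(extension, creator);
    }

    /** Returns a reader for the file name, or null when no creator matches. */
    Reader createReader(String fileName) {
        boolean compressed = fileName.endsWith(".gz");
        String base = compressed
            ? fileName.substring(0, fileName.length() - 3) : fileName;
        for (Map.Entry<String, Function<String, Reader>> e : creators.entrySet()) {
            if (base.endsWith(e.getKey())) {
                Reader inner = e.getValue().apply(fileName);
                if (compressed) {
                    // Wrap the reader so that decompression happens first.
                    return () -> "gzip+" + inner.describe();
                }
                return inner;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ReaderFactorySketch factory = new ReaderFactorySketch();
        factory.register(".csv", name -> () -> "csv");
        System.out.println(factory.createReader("data.csv").describe());    // csv
        System.out.println(factory.createReader("data.csv.gz").describe()); // gzip+csv
        System.out.println(factory.createReader("data.xyz"));               // null
    }
}
```

Because callers only see createReader, registering a creator for a new extension makes the format available to existing programs without changing them.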

Organization of the InfoVis Toolkit

Although the InfoVis Toolkit may seem large by the number of classes it defines, it follows a very simple structure. At the top level, it defines the most important interfaces and classes: Column, Table, Tree, Graph, Visualization and Metadata. The first level defines the column classes in the column package, basic input/output classes in the io package, several metadata categories in the metadata package, control panels and interaction components in the panel package, a visualization package for the visualization internals and subpackages for the data structures: table, tree and graph. Finally, a utils package contains utility classes that don't fit in other packages.

Each data-structure related package also contains sub-packages for its input/output components and its visualizations. This is where the default readers and writers are defined, as well as the visualizations provided by default.

Carrying on: Visualizing a Tree as a Treemap

To visualize a tree data-structure using the "Treemap" visualization technique, the code would be:

Example 2.2. Simple visualization program for hierarchical data

import infovis.tree.DefaultTree;
import infovis.tree.io.TreeReaderFactory;
import infovis.tree.visualization.TreemapVisualization;
import infovis.io.AbstractReader;
import infovis.panel.ControlPanel;
import infovis.panel.ControlPanelFactory;

import javax.swing.JFrame;

public class Example2 {
    public static void main(String[] args) {
        String fileName =
            (args.length == 0) ? "data/election.tm3" : args[0];
        DefaultTree t = new DefaultTree();
        AbstractReader reader =
            TreeReaderFactory.createReader(fileName, t);
        if (reader == null || !reader.load()) {
            System.err.println("cannot load " + fileName);
            return;
        }

        // The import for Squarified is not shown in this listing.
        TreemapVisualization visualization =
            new TreemapVisualization(t, null, Squarified.SQUARIFIED);
        ControlPanel control =
            ControlPanelFactory.sharedInstance().createControlPanel(
                visualization);

        JFrame frame = new JFrame(fileName);
        frame.getContentPane().add(control);
        frame.pack();
        frame.setVisible(true);
    }
}

The import declarations for the program Example 2.1, “Simple visualization program for time series data” are:

import infovis.io.AbstractReader;
import infovis.table.DefaultTable;
import infovis.table.io.TableReaderFactory;
import infovis.table.visualization.TimeSeriesVisualization;
import infovis.panel.ControlPanel;
import infovis.panel.ControlPanelFactory;

import javax.swing.JFrame;

Specifying Visual Attributes

Each visualization technique uses a layout algorithm and several visual attributes to set the color or geometric properties of the displayed items. For example, scatter plots compute the position of items based on two visual attributes: one for the X coordinate and one for the Y coordinate. Each item can have a size and a color. These visual attributes can be associated with data columns or set to default values.

Some visual attributes are generic to all visualizations and others are specific. The generic attributes are color, label and size. For example, if you know that the tree in Example 2.2, “Simple visualization program for hierarchical data” has a column called "name", you can specify it as the label column like this:

            visualization.setVisualColumn(
                Visualization.VISUAL_LABEL,
                t.getColumn("name"));

The label will then appear with a form depending on the visualization: inside items for scatter plots or trees. If dynamic labels are enabled, they will use the content of this column for their labels.

Visualization techniques may use additional visual attributes such as shape for scatter plots. Some visual attributes, such as color and stroke, are defaulted when they are not specified. The default value can be set through methods of the form: setDefault<ATTRIB>(ATTRIB default).

Using Fisheye Lenses or Dynamic Labeling

Allowing Fisheye Lenses is just a visualization option:

            visualization.setFisheyes(new Fisheyes());

Enabling Dynamic Labeling involves creating a new visualization layer above the main visualization. Here is the required code:

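        TimeSeriesVisualization visualization =
            new TimeSeriesVisualization(t);
        // 'excentric' denotes the dynamic labeling layer created above
        // 'visualization'; its creation is not shown in this listing.
        ControlPanel control =
            ControlPanelFactory.sharedInstance().
                createControlPanel(excentric);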

The toolkit provides higher-level mechanisms to create visualizations and display them using option panels allowing fisheye views and dynamic labeling to be enabled or disabled at will.

Creating and Managing Data Structures

The InfoVis Toolkit comes with three main data types suited to visualizations: tables, trees and graphs. This section shows how to create and manage them.

Managing Tables

Table is the basic structure for storing data managed by the InfoVis Toolkit. A Table is made of Columns. Here is an example of table creation and initialization. The table will be made of three columns called "name", "date" and "size":

Table table = new DefaultTable();
StringColumn name = new StringColumn("name");
table.addColumn(name);
LongColumn date = new LongColumn("date");
table.addColumn(date);
IntColumn size = new IntColumn("size");
table.addColumn(size);

At this point, the table only contains columns, no values. To add two values to the table, the columns should be filled:

name.add("file1");
date.add(124334);
size.add(1024);
name.add("file2");
date.add(143434);
size.add(10);

Adding dates as long integers is not very convenient. Columns can convert strings into the values they hold using a Format:

date.setFormat(new UTCDateFormat());
name.add("file1");
date.addValue("03/23/2003 15:32:10");
size.add(1024);
name.add("file2");
date.addValue("03/10/2004 10:43:21");
size.add(10);

This syntax is much simpler to read and is also useful for reading from a data file. Formats transform a string into a typed value and, in the other direction, transform an internal representation into a printable string. Dates could be managed as Java Objects using an ObjectColumn but their memory footprint and speed would be much worse. In general, it is wise to transform an external representation into the most suitable internal representation. If you want to manage a table containing names that are categories, such as the days of the week or the states of the USA, you should use an IntColumn to hold the category number and use a CategoricalFormat to map from the category name to the category number in the column automatically.
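The mapping between category names and category numbers can be sketched as a subclass of java.text.Format. This is an illustration only, assuming that numbers are allocated in order of first appearance; the class CategoryFormatSketch and its numberOf helper are hypothetical, and the toolkit's own CategoricalFormat may behave differently.

```java
import java.text.FieldPosition;
import java.text.Format;
import java.text.ParsePosition;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the toolkit's own CategoricalFormat may differ.
final class CategoryFormatSketch extends Format {
    private final Map<String, Integer> numbers = new HashMap<>();
    private final List<String> names = new ArrayList<>();

    /** Parses a category name into its number, allocating a new one if needed. */
    @Override
    public Object parseObject(String source, ParsePosition pos) {
        String name = source.substring(pos.getIndex());
        Integer number = numbers.get(name);
        if (number == null) {
            number = names.size();
            numbers.put(name, number);
            names.add(name);
        }
        pos.setIndex(source.length());   // the whole string has been consumed
        return number;
    }

    /** Appends the name of the category whose number is given. */
    @Override
    public StringBuffer format(Object obj, StringBuffer out, FieldPosition pos) {
        return out.append(names.get(((Number) obj).intValue()));
    }

    /** Convenience helper returning the category number as an int. */
    int numberOf(String name) {
        return (Integer) parseObject(name, new ParsePosition(0));
    }

    public static void main(String[] args) {
        CategoryFormatSketch days = new CategoryFormatSketch();
        System.out.println(days.numberOf("Monday"));   // 0
        System.out.println(days.numberOf("Tuesday"));  // 1
        System.out.println(days.numberOf("Monday"));   // 0
        System.out.println(days.format(1));            // Tuesday
    }
}
```

With such a format, a column of small integers can hold the categories while the names are stored only once.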

Columns can have missing values. Suppose you have a dataset containing a pollution value sent by a sensor every hour. If this sensor stops working properly and is fixed four hours later, your serial dataset will miss four values. Here is an example of code that reads a file containing values where empty lines are considered missing values:

IntColumn pollution = new IntColumn("pollution");
for (int i = 0; true; i++) {
    String line = input.readLine();
    if (line == null)
        break;
    if (line.length() != 0) {
        pollution.setValue(i, line);
    }
}

Once a table has been created and filled, it can be visualized. The table can also be dynamically modified while it is visualized since columns implement a mechanism to notify interested objects when they are modified.

The section called “Tables” provides all the details on manipulating tables.

Managing Trees

Trees are data structures based on tables. Here is an example of tree creation:

Tree tree = new DefaultTree();
StringColumn name = new StringColumn("name");
tree.addColumn(name);
LongColumn date = new LongColumn("date");
tree.addColumn(date);
IntColumn size = new IntColumn("size");
tree.addColumn(size);
date.setFormat(new UTCDateFormat());
int node = tree.addNode(Tree.ROOT);
name.setValue(node, "file1");
date.setValue(node, "03/23/2003 15:32:10");
size.setExtend(node, 1024);
node = tree.addNode(Tree.ROOT);
name.setValue(node, "file2");
date.setValue(node, "03/10/2004 10:43:21");
size.setExtend(node,10);

This example is similar to the table example but instead of adding data in columns, it uses node numbers created by the tree as indexes. Contrary to the table example, the indexes returned by the tree method addNode(int parent) can be in any order and the attribute values should be set at the node index. This explains why we used the methods setValue(int index, String v) and setExtend(int index, TYPE v) instead of addValue(String v) and add(TYPE v) in the table example.
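To make the role of node numbers concrete, here is a minimal sketch of a tree whose topology and attributes are both stored in columns indexed by node number. It is an assumption made for illustration, not the layout used by DefaultTree; the class ColumnTreeSketch and its members are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a tree stored as parallel columns indexed by node number.
final class ColumnTreeSketch {
    static final int ROOT = 0;
    private final List<Integer> parent = new ArrayList<>(); // topology column
    final List<String> name = new ArrayList<>();             // attribute column

    ColumnTreeSketch() {
        parent.add(-1);   // the root has no parent
        name.add(null);   // its attribute is still undefined
    }

    /** Creates a child of the given node and returns its node number. */
    int addNode(int parentNode) {
        parent.add(parentNode);
        name.add(null);   // keep every column the same length
        return parent.size() - 1;
    }

    int getParent(int node) {
        return parent.get(node);
    }

    /** Number of edges between the node and the root. */
    int depth(int node) {
        int d = 0;
        for (int n = node; n != ROOT; n = getParent(n)) {
            d++;
        }
        return d;
    }

    public static void main(String[] args) {
        ColumnTreeSketch tree = new ColumnTreeSketch();
        int dir = tree.addNode(ROOT);
        int file = tree.addNode(dir);
        tree.name.set(dir, "docs");
        tree.name.set(file, "file1");    // set at the node index, not appended
        System.out.println(tree.name.get(file) + " depth " + tree.depth(file));
        // prints: file1 depth 2
    }
}
```

Attribute values are written at the index returned by addNode, which is why the tree example sets values by index instead of appending them.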

The section called “Trees” provides all the details on manipulating trees.

Managing Graphs

Graph data structures are based on tables. A graph is made of two tables: the vertex table and the edge table. Here is an example of graph creation:

Graph graph = new DefaultGraph();
StringColumn name = new StringColumn("name");
graph.getVertexTable().addColumn(name);
LongColumn date = new LongColumn("date");
graph.getVertexTable().addColumn(date);
IntColumn size = new IntColumn("size");
graph.getVertexTable().addColumn(size);
date.setFormat(new UTCDateFormat());
int v1 = graph.addVertex();
name.setValue(v1, "file1");
date.setValue(v1, "03/23/2003 15:32:10");
size.setExtend(v1, 1024);
int v2 = graph.addVertex();
name.setValue(v2, "file2");
date.setValue(v2, "03/10/2004 10:43:21");
size.setExtend(v2,10);

IntColumn refCount = new IntColumn("refCount");
graph.getEdgeTable().addColumn(refCount);
int e1 = graph.addEdge(v1, v2);
refCount.setExtend(e1, 1);

This example creates a new graph and adds the same three columns as in the previous examples to the vertex table. It also creates a column named "refCount" associated with the edge table, so the graph has attributes associated both with its vertices ("name", "date" and "size") and with its edges ("refCount").

The section called “Graphs” provides all the details on manipulating graphs.

Chapter 3. Data Structures

Table of Contents

Columns

Presentation Format
NumberColumns
Dense Columns
Sparse Columns
Column Dependencies

Standard Description
Aggregation
Converting to Number Columns
Color Scheme Categories

Data structures define two things: a topology and data attributes. Classical topologies include tables, trees and graphs. Data attributes are usually defined as tuples associated with the topology. For example, an employee data type can be described by a set of attributes such as "birth date", "first name", "last name", and "social security number". A table of employees is an indexed container that contains one employee per defined index.

Table, tree and graph data structures are well documented in textbooks such as [Knuth97] [Cormen01], albeit with a different focus than ours. The data attributes associated with the topology are well described in databases (where they are sometimes called tuples and managed as tables), and programming languages also provide simple means to define data types and aggregate them in compound structures such as tables, trees or graphs. Advanced computer languages allow the definition of tables of employees, trees of employees or graphs of employees related by some relationship.

These systems usually represent tables as an array of tuples where each tuple is a row and each attribute is a column. They allow for easy insertion or deletion of new tuples, but don't allow the easy modification of the data structure.

The InfoVis Toolkit takes a different approach from database tuples and language-level aggregated data structures: it uses "columns" internally so that new attributes can easily be added to existing data structures. Inserting or removing rows is as easy as inserting or removing attributes.

This approach is important in Information Visualization as it is in spreadsheet calculators since several computations or visualizations require new values to be computed and manipulated.
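The benefit of a column-oriented organization can be sketched as follows. This is a simplified illustration, assuming columns of double values only; the class ColumnTableSketch is hypothetical and much simpler than the toolkit's Table interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a column-oriented table holding double columns.
final class ColumnTableSketch {
    private final Map<String, double[]> columns = new LinkedHashMap<>();
    private final int rowCount;

    ColumnTableSketch(int rowCount) {
        this.rowCount = rowCount;
    }

    /** Adds an attribute to every row at once by adding one column. */
    void addColumn(String name, double[] values) {
        if (values.length != rowCount) {
            throw new IllegalArgumentException("column length must be " + rowCount);
        }
        columns.put(name, values);
    }

    double[] getColumn(String name) {
        return columns.get(name);
    }

    int getColumnCount() {
        return columns.size();
    }

    public static void main(String[] args) {
        ColumnTableSketch table = new ColumnTableSketch(2);
        table.addColumn("width", new double[] {2, 3});
        table.addColumn("height", new double[] {5, 4});

        // A computed attribute is just one more column; existing rows and
        // columns are left untouched.
        double[] area = new double[2];
        for (int row = 0; row < 2; row++) {
            area[row] = table.getColumn("width")[row] * table.getColumn("height")[row];
        }
        table.addColumn("area", area);
        System.out.println(table.getColumnCount() + " " + area[0] + " " + area[1]);
        // prints: 3 10.0 12.0
    }
}
```

With an array of tuples, the same change would require redefining the tuple type and copying every row.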

In the next sections, we explain the main properties of a column and how it can be used. We then describe how traditional data structures are built on top of the columns.

Columns

A column is an indexed container that contains values of a homogeneous data type, such as integers, floating point values or strings. Columns also manage Metadata, notification and undefined values.

Column is a Java interface with several concrete implementations. There are two derivation paths for columns: one for concrete data types and the other for dense versus sparse columns.

Dense columns are based on primitive arrays, providing a constant access time and a memory footprint linear with the number of entries. Sparse columns are based on sorted trees, providing a log(n) access time and a memory footprint linear with the number of defined entries. They come specialized for the most used data types such as int, long, float, double, String, Object, boolean and some others used internally.
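The trade-off of a sparse column can be sketched with java.util.TreeMap, which is a sorted (red-black) tree with logarithmic access time. This is an illustration under that assumption only; the class SparseIntColumnSketch is hypothetical and does not reproduce the toolkit's sparse column classes.

```java
import java.util.TreeMap;

// Hypothetical sketch of a sparse int column backed by a sorted tree.
final class SparseIntColumnSketch {
    // Only defined rows are stored, so memory grows with the defined entries.
    private final TreeMap<Integer, Integer> values = new TreeMap<>();

    void set(int row, int value) {
        values.put(row, value);              // O(log n)
    }

    boolean isUndefined(int row) {
        return !values.containsKey(row);     // O(log n)
    }

    int get(int row) {
        Integer v = values.get(row);
        if (v == null) {
            throw new IllegalStateException("undefined row " + row);
        }
        return v;
    }

    /** Rows span from 0 to the largest defined index. */
    int getRowCount() {
        return values.isEmpty() ? 0 : values.lastKey() + 1;
    }

    int getDefinedCount() {
        return values.size();
    }

    public static void main(String[] args) {
        SparseIntColumnSketch column = new SparseIntColumnSketch();
        column.set(3, 42);
        column.set(1000000, 7);
        // One million rows, but only two stored entries.
        System.out.println(column.getRowCount() + " " + column.getDefinedCount());
        // prints: 1000001 2
    }
}
```

A dense column of the same length would allocate an array of 1,000,001 elements, whereas the sparse sketch stores two entries.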

The interface of the Column should be described using an extended syntax with parameterized types to be complete. Each concrete class implementing the Column interface for storing objects of class TYPE should provide a set of methods using this concrete type. For example, the IntColumn class implements all the methods with "TYPE" replaced by int so the get method returns an int.

The Java language doesn't allow this kind of type parametrization yet — although a newer experimental version does — so the declaration doesn't express this requirement and the Java compiler doesn't enforce these methods, but we have worked hard to implement them and check them by hand. If you want to implement a new kind of Column, you will also have to implement them.

The RowComparator interface allows elements to be compared and sorted without having to know their type. It declares the compare(int,int) method for comparing the values of two rows in the column.
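Sorting rows through a compare(int,int) method can be sketched as follows. The nested RowComparator interface below only mirrors the method named above and is declared locally so the example is self-contained; the class RowSortSketch and the sortedRows helper are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: sorting row indices without knowing the value type.
final class RowSortSketch {
    /** Local stand-in mirroring the compare(int,int) method described above. */
    interface RowComparator {
        int compare(int row1, int row2);
    }

    /** Returns the row indices ordered according to the comparator. */
    static int[] sortedRows(int rowCount, RowComparator comparator) {
        Integer[] rows = new Integer[rowCount];
        for (int i = 0; i < rowCount; i++) {
            rows[i] = i;
        }
        // The sort only exchanges indices; the column values never move.
        Arrays.sort(rows, (a, b) -> comparator.compare(a, b));
        int[] result = new int[rowCount];
        for (int i = 0; i < rowCount; i++) {
            result[i] = rows[i];
        }
        return result;
    }

    public static void main(String[] args) {
        String[] names = {"pear", "apple", "fig"};
        int[] order = sortedRows(names.length,
            (r1, r2) -> names[r1].compareTo(names[r2]));
        System.out.println(Arrays.toString(order)); // [1, 2, 0]
    }
}
```

The sorting code never reads a value itself, so the same routine works for any column that can compare two of its rows.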

The basic interface specifies that column elements can always be manipulated in their textual representations. This is convenient for loading column contents from a textual file or saving them. It also provides a representation for using the columns for labeling visualizations. However, the real value stored in a concrete column can be of any type.

When the concrete type of the column is known, the concrete methods, shown as parametrized in the interface, such as get or set, can be used to access the concrete values of the column elements. These methods are fast since they don't involve any allocation or transformation.

Presentation Format

Transforming from String values to internal values is performed by java.text.Format objects. These objects can transform a String into an Object through the parseObject method. They can also transform an Object into a String through the format method. This mechanism is used internally by columns and can be extended by users if needed.

For example, date and time information associated with computer files is usually represented as a 32-bit or 64-bit integer value. Storing file dates in a table or tree structure can be done either as a column of String or as a column of long. The latter offers two advantages: it is more compact and dates can then be compared using standard integer comparison. Translating from a standard date format such as the Unix date for files can be specified as follows:

Example 3.1. Creating a LongColumn Reading Dates in the Unix Format

      LongColumn dateColumn = new LongColumn("date");
      dateColumn.setFormat(new UTCDateFormat());
      dateColumn.addValueOrNull("29 Oct 2003 21:47:32");

If the values are read as integers, maybe because they have been read directly from a function that returns them as a long, the format can still be specified but the value can now be added as a long using the add method.

Java Formats transform a string into an object or an object into a string. To work in a Column, the format should return an object compatible with the concrete type of the column. For example, number columns expect the object to be a Java Number to extract the value from it. General Java Formats may not return the right kind of object, requiring the derivation of the Format class to return the proper value. This is what the UTCDateFormat does for dates.
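Deriving a Format so that parsing returns a Number rather than a Date can be sketched with the standard library. This is an assumption about how such a class could be written, not the source of the toolkit's date format; the class name LongDateFormatSketch, the date pattern and the choice of milliseconds since the epoch are all hypothetical.

```java
import java.text.FieldPosition;
import java.text.Format;
import java.text.ParseException;
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch; the toolkit's date format may be implemented differently.
final class LongDateFormatSketch extends Format {
    private final SimpleDateFormat delegate =
        new SimpleDateFormat("dd MMM yyyy HH:mm:ss", Locale.ENGLISH);

    LongDateFormatSketch() {
        delegate.setTimeZone(TimeZone.getTimeZone("UTC"));
    }

    /** Returns a Long (milliseconds since the epoch) instead of a Date. */
    @Override
    public Object parseObject(String source, ParsePosition pos) {
        Date date = delegate.parse(source, pos);
        return (date == null) ? null : Long.valueOf(date.getTime());
    }

    /** Accepts any Number holding milliseconds since the epoch. */
    @Override
    public StringBuffer format(Object obj, StringBuffer out, FieldPosition pos) {
        return delegate.format(new Date(((Number) obj).longValue()), out, pos);
    }

    public static void main(String[] args) throws ParseException {
        LongDateFormatSketch format = new LongDateFormatSketch();
        Object value = format.parseObject("29 Oct 2003 21:47:32");
        System.out.println(value instanceof Long);   // true
        System.out.println(format.format(value));    // 29 Oct 2003 21:47:32
    }
}
```

A column of long values can then store the parsed result directly, and the same object turns stored values back into readable dates.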

NumberColumns

Very often, Columns contain numbers of different types such as integers, floats, longs or doubles. It is however very convenient to abstract the differences between these representations of numbers and manipulate them all in a similar way. NumberColumn is an interface that allows columns holding numeric types to be manipulated in a similar way. For example, when a visualization needs to map a number column to the X screen position, it usually applies a simple linear transformation to the values of the column, regardless of their concrete type. Without the NumberColumn abstraction, the mapping would have to consider each concrete number type, which would be painful and error prone.
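The benefit of such an abstraction can be sketched in a few lines. In the example below, NumericValues is a hypothetical stand-in for the NumberColumn idea (it is not the toolkit's interface), and mapToRange is an assumed helper that linearly maps any numeric column to a screen range without knowing its concrete type.

```java
import java.util.Arrays;

class LinearMappingSketch {
    /** Hypothetical stand-in for a number column: every row is readable as a double. */
    interface NumericValues {
        int getRowCount();
        double getDoubleAt(int row);
    }

    static NumericValues ofInts(final int... values) {
        return new NumericValues() {
            public int getRowCount() { return values.length; }
            public double getDoubleAt(int row) { return values[row]; }
        };
    }

    static NumericValues ofFloats(final float... values) {
        return new NumericValues() {
            public int getRowCount() { return values.length; }
            public double getDoubleAt(int row) { return values[row]; }
        };
    }

    /** Maps the column minimum to lo and the column maximum to hi, linearly in between. */
    static double[] mapToRange(NumericValues column, double lo, double hi) {
        int n = column.getRowCount();
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (int row = 0; row < n; row++) {
            double v = column.getDoubleAt(row);
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double span = max - min;
        double[] positions = new double[n];
        for (int row = 0; row < n; row++) {
            // A constant column is mapped to lo to avoid dividing by zero.
            double t = span == 0 ? 0 : (column.getDoubleAt(row) - min) / span;
            positions[row] = lo + t * (hi - lo);
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(mapToRange(ofInts(10, 20, 30), 0, 100)));
        System.out.println(Arrays.toString(mapToRange(ofFloats(1f, 2f, 5f), 0, 400)));
    }
}
```

The same mapToRange code serves both the int-backed and the float-backed column, which is the point of the abstraction.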

NumberColumn should define methods to get or set a row with all the Java number types. These methods are get<TYPE>At(int) and set<TYPE>At(int,TYPE). The implementation should take care not to lose precision when possible. The safest bet when getting values from a NumberColumn is to ask for a double. Only some long values may lose precision when represented as a double. Ideally, this should be checked but isn't currently. When setting a value, it should be specified using its "natural" type to ensure that the conversion will be as lossless as possible.
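The precision issue with long values can be checked in plain Java. The sketch below is independent of the toolkit; isExactAsDouble is a hypothetical helper that reports whether a long survives conversion to a double unchanged. A double has a 53-bit significand, so every long of magnitude up to 2^53 is exact, while larger values may be rounded.

```java
import java.math.BigDecimal;

class LongAsDoubleCheck {
    /** Returns true when the long value is exactly representable as a double. */
    static boolean isExactAsDouble(long value) {
        // new BigDecimal(double) holds the exact binary value of the double,
        // so the comparison is not affected by rounding or saturation.
        return new BigDecimal((double) value).compareTo(BigDecimal.valueOf(value)) == 0;
    }

    public static void main(String[] args) {
        long exact = 1L << 53;        // 2^53 fits in the significand
        long inexact = (1L << 53) + 1; // needs 54 bits, so it is rounded
        System.out.println(exact + " exact as double: " + isExactAsDouble(exact));
        System.out.println(inexact + " exact as double: " + isExactAsDouble(inexact));
    }
}
```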

Methods are also defined to get the minimum and maximum values contained in the columns: get<TYPE>Min() and get<TYPE>Max(). Finally, a round(double) method returns the closest representation of a number for the column, as a double. For example, on a column of integers, round(3.5) will return 3.

Dense Columns

Dense columns are based on primitive Java arrays. For now, they use one large array but this may change to several array chunks to avoid copying large blocks when resizing the array. Access time and modification time are constant, not counting the time required to perform notification when changing the values.

Undefined values are still allowed. For columns containing object values (non-literal values), undefined rows contain the null value. For columns containing literal types, undefined rows are stored as integers in a balanced binary tree.
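The combination of a primitive array and a separate set of undefined rows can be sketched as follows. DenseIntColumnSketch is a hypothetical class written for this illustration, not the toolkit's IntColumn; it uses java.util.TreeSet (a red-black tree, hence a balanced binary tree) to record which row indexes are undefined.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical dense column of int values with support for undefined rows.
class DenseIntColumnSketch {
    private int[] values = new int[8];
    private int size = 0;
    // Row indexes whose value is undefined, kept in a balanced binary tree.
    private final TreeSet<Integer> undefinedRows = new TreeSet<Integer>();

    void add(int value) {
        ensureCapacity(size + 1);
        values[size++] = value;
    }

    void addUndefined() {
        ensureCapacity(size + 1);
        undefinedRows.add(Integer.valueOf(size));
        values[size++] = 0; // placeholder, never returned by get
    }

    void set(int row, int value) {
        checkRow(row);
        values[row] = value;
        undefinedRows.remove(Integer.valueOf(row)); // the row is now defined
    }

    int get(int row) {
        if (isValueUndefined(row)) {
            throw new IllegalStateException("Row " + row + " is undefined");
        }
        return values[row];
    }

    boolean isValueUndefined(int row) {
        checkRow(row);
        return undefinedRows.contains(Integer.valueOf(row));
    }

    int getRowCount() {
        return size;
    }

    private void checkRow(int row) {
        if (row < 0 || row >= size) {
            throw new IndexOutOfBoundsException("Row " + row + " out of range");
        }
    }

    private void ensureCapacity(int needed) {
        if (needed > values.length) {
            // One large array: growing it copies the whole block.
            values = Arrays.copyOf(values, Math.max(needed, values.length * 2));
        }
    }

    public static void main(String[] args) {
        DenseIntColumnSketch column = new DenseIntColumnSketch();
        column.add(12);
        column.addUndefined();
        column.add(7);
        System.out.println(column.get(0) + " " + column.isValueUndefined(1) + " " + column.get(2));
    }
}
```

Storing only the undefined indexes keeps the array free of sentinel values, at the cost of a logarithmic lookup when testing whether a row is defined.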

Literal Columns: IntColumn, LongColumn, FloatColumn and DoubleColumn

Literal columns contain values considered as literal in Java: integers, longs, floating point values and double values. These number types are not objects but are much more efficient than general objects in terms of storage and computation. This is why the InfoVis Toolkit relies on them for efficiency.

BooleanColumn

Boolean columns are literal columns containing boolean values. They are also number columns, returning 0 for a false value and 1 for a true value. They also implement the ListSelection interface.

FilterColumn

StringColumn and ObjectColumn

Sparse Columns

Column Dependencies

The values of one column can be the result of a computation made from values taken from other columns. The dependencies can be maintained so that each time a column value is modified, all the dependent columns are updated.
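One possible way to maintain such a dependency is through change notification. The sketch below is only an assumption about how this could work, not the toolkit's mechanism: DoubleValues and ChangeListener are hypothetical types, and sumOf builds a column whose rows are kept equal to the sum of two source columns.

```java
import java.util.ArrayList;
import java.util.List;

class DependentColumnSketch {
    /** Hypothetical listener notified when one row of a column changes. */
    interface ChangeListener {
        void rowChanged(int row);
    }

    /** Hypothetical fixed-size column of doubles with change notification. */
    static class DoubleValues {
        private final double[] values;
        private final List<ChangeListener> listeners = new ArrayList<ChangeListener>();

        DoubleValues(int rows) {
            values = new double[rows];
        }

        double get(int row) {
            return values[row];
        }

        void set(int row, double value) {
            values[row] = value;
            for (ChangeListener listener : listeners) {
                listener.rowChanged(row);
            }
        }

        void addListener(ChangeListener listener) {
            listeners.add(listener);
        }
    }

    /** Builds a column whose rows always hold a[row] + b[row]. */
    static DoubleValues sumOf(final DoubleValues a, final DoubleValues b, int rows) {
        final DoubleValues sum = new DoubleValues(rows);
        ChangeListener update = new ChangeListener() {
            public void rowChanged(int row) {
                sum.set(row, a.get(row) + b.get(row));
            }
        };
        a.addListener(update);
        b.addListener(update);
        for (int row = 0; row < rows; row++) {
            update.rowChanged(row); // initial computation
        }
        return sum;
    }

    public static void main(String[] args) {
        DoubleValues width = new DoubleValues(2);
        DoubleValues margin = new DoubleValues(2);
        DoubleValues total = sumOf(width, margin, 2);
        width.set(0, 1.5);
        margin.set(0, 2.0);
        System.out.println(total.get(0));
    }
}
```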

MORE TO COME

Tables

Tables are the simplest data structure of the InfoVis toolkit. A simple way to create and populate a table is:

Table table = new DefaultTable();
table.setName("Cities");
StringColumn name = StringColumn.findColumn(table, "name");
IntColumn population = IntColumn.findColumn(table, "population");
name.add("New York");
population.add(2000000);
name.add("Los Angeles");
population.add(1000000);

Figure 3.1. The Table interface

 public interface Table extends Metadata, TableModel {
}

The Table interface is described in Figure 3.1, “The Table interface”. A Table also manages Metadata and implements the TableModel interface, enabling its use in a standard Java Swing JTable. Table defines methods to add, remove and access columns by name or index. Tables don't maintain their own row counts: the getRowCount method returns the largest row count among the columns in the table. The columns contained in a Table may have different row counts. This is not a problem since columns can contain undefined rows; columns with fewer elements are considered as having undefined values at their ends.
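The row-count rule can be illustrated with plain Java lists standing in for columns. The helper names rowCount and valueAt are hypothetical and are not part of the Table interface; null is used here to represent an undefined value.

```java
import java.util.Arrays;
import java.util.List;

class RowCountSketch {
    /** The row count of a table is the largest row count among its columns. */
    static int rowCount(List<List<String>> columns) {
        int max = 0;
        for (List<String> column : columns) {
            max = Math.max(max, column.size());
        }
        return max;
    }

    /** Rows past the end of a shorter column are treated as undefined (null here). */
    static String valueAt(List<String> column, int row) {
        return row < column.size() ? column.get(row) : null;
    }

    public static void main(String[] args) {
        List<String> name = Arrays.asList("New York", "Los Angeles", "Chicago");
        List<String> state = Arrays.asList("NY", "CA");
        List<List<String>> columns = Arrays.asList(name, state);
        System.out.println(rowCount(columns));
        System.out.println(valueAt(state, 2));
    }
}
```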

The default implementation of the Table interface is the DefaultTable class. Concrete implementations of Tree and Graph derive from this class.

Trees

Figure 3.2. The Tree interface

 public interface Tree extends Table {
}

The InfoVis Toolkit implements rooted trees with the interface described in Figure 3.2, “The Tree interface”. A Tree manages a topology and associated attributes, stored in columns. Creating a file-system-like tree can be done using this example program:

    Tree tree = new DefaultTree();
    StringColumn name = StringColumn.findColumn(tree, "name");
    LongColumn date = LongColumn.findColumn(tree, "date");
    date.setFormat(new UTCDateFormat());
    
    name.setValueAt(Tree.ROOT, "/");
    date.setValueAt(Tree.ROOT, "29 Oct 2003 21:47:32");

    int n1 = tree.add(Tree.ROOT); // Child of tree root
    name.setValueAt(n1, ".classpath");
    date.setValueAt(n1, "23 Feb 2004 22:46:30");

    int n2 = tree.add(Tree.ROOT); // 2nd child of tree root
    name.setValueAt(n2, "CVS");
    date.setValueAt(n2, "6 May 2003 23:32:34");

    int n3 = tree.add(n2); // child of node n2
    name.setValueAt(n3, "Entries");
    date.setValueAt(n3, "24 Feb 2003 23:30:13");

A Tree is implemented as a Table. Nodes are simply indexes. Currently, four internal columns are used to manage the topology. These columns can be read and even modified, but doing so would certainly break the structure so it should be avoided. The internal columns are all IntColumns. Here is their description:

"#parent"

Contains the parent node for each node. The first line in the following examples is equivalent to the next two lines:

       int p = tree.getParent(node);

       IntColumn parentColumn = IntColumn.findColumn(tree, Tree.PARENT_COLUMN);
       int p = parentColumn.get(node);

The parent of the node Tree.ROOT is Tree.NIL. All other valid nodes have non-nil parents.

"#child"

Contains the first child of a node or Tree.NIL for a leaf node.

"#next"

Contains the next sibling of nodes, or Tree.NIL if the node is the last child of a node.

"#last"

Contains the last child of a node or Tree.NIL for a leaf node. The last child of a node could be computed by following the siblings of the first child of a node, but maintaining the last node is faster.

Other internal columns can be maintained automatically by the DefaultTree implementation to speed up topological queries such as the depth of a node, its degree (number of children) or to access the children list faster. The depth column is created and maintained using the method createDepthColumn. The degree column is created and maintained using the method createDegreeColumn.
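The four topology columns can be sketched with four parallel integer lists. TreeTopologySketch below is a hypothetical illustration and not the DefaultTree implementation; in particular, the values chosen for ROOT (0) and NIL (-1) are assumptions. It shows why keeping the last child makes appending a node a constant-time operation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree topology stored as four parallel "columns" of node indexes.
class TreeTopologySketch {
    static final int NIL = -1; // assumed value
    static final int ROOT = 0; // assumed value

    private final List<Integer> parent = new ArrayList<Integer>();
    private final List<Integer> child = new ArrayList<Integer>(); // first child
    private final List<Integer> next = new ArrayList<Integer>();  // next sibling
    private final List<Integer> last = new ArrayList<Integer>();  // last child

    TreeTopologySketch() {
        addRow(NIL); // the root has no parent
    }

    private int addRow(int parentNode) {
        parent.add(parentNode);
        child.add(NIL);
        next.add(NIL);
        last.add(NIL);
        return parent.size() - 1;
    }

    /** Appends a new node as the last child of parentNode and returns its index. */
    int add(int parentNode) {
        int node = addRow(parentNode);
        int lastChild = last.get(parentNode);
        if (lastChild == NIL) {
            child.set(parentNode, node); // first child of this parent
        } else {
            next.set(lastChild, node);   // link after the previous last child
        }
        last.set(parentNode, node);      // no sibling traversal needed
        return node;
    }

    int getParent(int node) {
        return parent.get(node);
    }

    List<Integer> children(int node) {
        List<Integer> result = new ArrayList<Integer>();
        for (int c = child.get(node); c != NIL; c = next.get(c)) {
            result.add(c);
        }
        return result;
    }

    /** Depth computed on demand by walking up the parent links. */
    int depth(int node) {
        int depth = 0;
        for (int p = parent.get(node); p != NIL; p = parent.get(p)) {
            depth++;
        }
        return depth;
    }

    public static void main(String[] args) {
        TreeTopologySketch tree = new TreeTopologySketch();
        int n1 = tree.add(ROOT);
        int n2 = tree.add(ROOT);
        int n3 = tree.add(n2);
        System.out.println(tree.children(ROOT) + " depth of n3: " + tree.depth(n3));
    }
}
```

A maintained depth or degree column trades the walk performed in depth for extra storage that must be updated when the topology changes.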

Graphs

Figure 3.3. The Graph interface

 public interface Graph extends Table {
}

Graphs are composed of two tables: a vertex table and a link table. The first holds topological information about the vertices and their associated attributes as columns. The second holds the topological information about the links. The Graph interface is described in Figure 3.3, “The Graph interface”.

Metadata

Metadata is information about the data. Columns and tables can contain metadata implemented as an associative map between a key and a value, usually strings.

There are many types of metadata; well-known uses include the description of the dataset in terms of a title, an author, a creation date, etc. Another use could be describing the processing performed on a table or on a column to create it. The InfoVis Toolkit uses the metadata for several different purposes that we describe here in no particular order.

Standard Description

It is useful to describe a dataset using standard and well-understood metadata categories. We use the Dublin Core metadata vocabulary, meant to standardize simple and useful description attributes.

MORE TO COME

Aggregation

Aggregation information applies to columns in a Tree. Some columns only define values for the leaf nodes of the tree. For example, when loading a file directory in a Tree, InfoVis doesn't provide size information for directories, only for files. However, we know what the file size of a directory means: it is the sum of the file sizes. In that situation, the column containing the file sizes will have an aggregation metadata explaining just that: the file sizes add up with the hierarchy.

Adding up with the hierarchy is a common aggregation method, but others exist as well. First, some columns don't aggregate at all. For example, the file names don't aggregate, but it turns out they are defined for all the directories so we don't need to invent a new name. Usually, nominal and categorical information doesn't aggregate. If your files have types, such as image or text, the directory cannot simply compute a similar category. We will see later that we could still create an aggregation function in similar cases, but let's continue with simpler cases.

The InfoVis Toolkit defines seven well-understood aggregation types: additive, max, min, mean, concat, atleaf and none. We have already discussed the additive type. The "max", "min" and "mean" are similar. "Max" computes the maximum over all the children as the aggregation function. "Min" and "mean" compute the minimum and the mean respectively. As an example, consider file dates in a directory tree. If you are interested in finding the latest work performed, you want to aggregate dates on the maximum date value of each directory.

The "concat" type is for string values and simply specifies that the values will be concatenated into a string with a space between them. Finally, the "atleaf" type means that the attributes are only defined at the leaves, not for interior nodes. In that case, any of the numeric aggregation functions can be freely applied to the column if it is a numerical column, and the concatenation function can be applied in all cases.

There is one class defined to manage each aggregation function. These classes are useful to compute the aggregated values of a column or to check whether a column belongs to one aggregation category. Furthermore, new aggregation classes can be defined if you need them. In that case, you will need to define how to recognize a column that aggregates using your function and how to compute the function. The AggregationCategory class is a factory for aggregation functions so you can add yours and it will be correctly applied.
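The additive and max aggregation types can be sketched independently of the toolkit classes. The code below is a hypothetical illustration: the tree is given as an array of parent indexes (with NIL assumed to be -1), values are only meaningful at the leaves, and the sketch assumes that every node has a larger index than its parent, which holds when nodes are created after their parent.

```java
import java.util.Arrays;

class AggregationSketch {
    static final int NIL = -1; // assumed marker for "no parent"

    enum Mode { ADDITIVE, MAX }

    /**
     * Fills interior nodes with the sum or maximum of the leaf values below them.
     * Assumes parent[node] < node for every node except the root.
     */
    static double[] aggregate(int[] parent, double[] leafValues, Mode mode) {
        int n = parent.length;
        boolean[] hasChild = new boolean[n];
        for (int node = 0; node < n; node++) {
            if (parent[node] != NIL) {
                hasChild[parent[node]] = true;
            }
        }
        double[] result = new double[n];
        for (int node = 0; node < n; node++) {
            if (hasChild[node]) {
                // Interior nodes start from the neutral element of the operation.
                result[node] = mode == Mode.MAX ? Double.NEGATIVE_INFINITY : 0.0;
            } else {
                result[node] = leafValues[node];
            }
        }
        // Visiting nodes from last to first guarantees that a node is complete
        // before it is folded into its parent.
        for (int node = n - 1; node >= 0; node--) {
            int p = parent[node];
            if (p == NIL) {
                continue;
            }
            result[p] = mode == Mode.MAX
                ? Math.max(result[p], result[node])
                : result[p] + result[node];
        }
        return result;
    }

    public static void main(String[] args) {
        // Node 0 is the root, nodes 1 and 2 are its children, node 3 is a child of node 2.
        int[] parent = { NIL, 0, 0, 2 };
        double[] sizes = { 0, 100, 0, 250 }; // only the leaves (1 and 3) are meaningful
        System.out.println(Arrays.toString(aggregate(parent, sizes, Mode.ADDITIVE)));
        System.out.println(Arrays.toString(aggregate(parent, sizes, Mode.MAX)));
    }
}
```

With file sizes, the additive mode gives each directory the total size of the files below it; with dates stored as numbers, the max mode gives each directory the date of its most recent file.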

Aggregation information is stored with the AGGREGATION_TYPE metadata key. The following constants are defined in infovis.metadata.AggregationConstants.java:

AGGREGATION_TYPE_NONE
AGGREGATION_TYPE_ATLEAF
AGGREGATION_TYPE_ADDITIVE
AGGREGATION_TYPE_MAX
AGGREGATION_TYPE_MIN
AGGREGATION_TYPE_MEAN
AGGREGATION_TYPE_CONCAT

The Aggregation Class

 public Aggregation implements AggregationConstants {
  public static final short AGGREGATE_NO = 0;
  public static final short AGGREGATE_YES = 1;
  public static final short AGGREGATE_COMPATIBLE = -1;
  public short isAggregating(Column col);
  public Column aggregate(Column src,
                          Tree tree,
                          Column dst);
}

Converting to Number Columns

Color Scheme Categories

Chapter 4. Visualizations

Table of Contents

Implementation of the Visualization Classes

Displaying Items in Visualizations
Visual Column Management
Color Management
Picking

Table Visualizations

Stroking and Link Visualizations

Tree Visualizations

Graph Visualizations

Dynamic Labeling

Fisheye Lenses

Visualizations transform a data structure into a visual representation, allowing exploration and interaction. To create the visual representation, a visualization relies on the topology of the data structure and a set of attributes implemented as columns in the data structure.

More formally, a visualization creates one graphical mark for each table row. This mark is defined by a shape, position, color and transparency. The shape itself can be decomposed into a basic shape, a size and a position. These attributes altogether are called visual attributes.

All visualizations implement the interface Visualization. A visualization is always related to one and only one table and should be installed in a Java JComponent to show in a window. The InfoVis toolkit provides the VisualizationPanel for that purpose.

Implementation of the Visualization Classes

The main implementation of the Visualization interface is the DefaultVisualization class. All the specific visualizations derive from it.

When a visualization is created, it should be configured to map table columns to visual columns. Visualizations use several types of visual properties, some being standard and some being specific to a visualization technique. The standard visual properties are the following:

Color

The column used to specify the color of items. For example, if the visualized table contains a number column named "length", using it for coloring the items can be done in the following way:

visualization.setVisualColumn(
	 Visualization.VISUAL_COLOR, table.getColumn("length"));

The section called “Color Management” provides more details on the mechanisms for color management.

The constant string Visualization.VISUAL_COLOR contains the name of the visual column for colors.

When no column is mapped to the color visual attribute, a default color is used, specified by the setDefaultColor(Color) method.

Size

The column used to specify the size of items. Sizes have different meanings depending on the specific visualization. For Treemaps, it is taken as the relative size of Treemap rectangles. On scatter plots, it controls the size of each item, varying between a minimum and a maximum size specific to the scatter plot visualization class.

The constant string Visualization.VISUAL_SIZE contains the name of the visual column for sizes.

When no column is mapped to the size visual attribute, a default size is used, specified by the setDefaultSize(double) method.

Alpha

The column used to specify the transparency or Alpha channel of items. The constant string Visualization.VISUAL_ALPHA contains the name of the visual column for transparencies.

When no column is mapped to the alpha visual attribute, a default alpha value is used, specified by the setDefaultAlpha(double) method.

Label

The column used to specify the name or label of items. The constant string Visualization.VISUAL_LABEL contains the name of the visual column for labels.

When no column is mapped to the label visual attribute, the items are labeled "item" and the item number.

Other attributes can be specified using the same mechanism, although they are not "visual" attributes per se. These attributes are:

Selection

The column used to specify which items are selected. The constant string Visualization.VISUAL_SELECTION contains the name of the visual column for selection. It should be a boolean column. One important use of this column is to share selection among several visualizations showing the same table.

When no column is mapped to the selection visual attribute, a new selection column is created.

Filter

The column used to specify which items are filtered. The constant string Visualization.VISUAL_FILTER contains the name of the visual column for filtering. It should be a FilterColumn. One important use of this column is to share filtering among several visualizations showing the same table.

When no column is mapped to the filter visual attribute, a new filter column is created.

Sort

The column used to specify the item order in the visualization. The constant string Visualization.VISUAL_SORT contains the name of the visual column for sorting.

When no column is mapped to the sort visual attribute, the items are visualized in the table order.

Displaying Items in Visualizations

Displaying items is performed through several steps, each being configurable or modifiable by class derivation.

Layout

This step computes a shape for each visualized item. A Shape is a Java interface that describes a general geometry. Specific visualizations can decide to limit the shapes they compute and position to rectangles or squares. For that purpose, a Java Rectangle2D implements the Shape interface.

The computed shapes will be used for displaying the items and for hit detection when the mouse pointer moves over displayed items. The computed shapes are stored in a visual column called Visualization.VISUAL_SHAPE and can be queried from outside the visualization if required.

The method used to compute the layout is called computeShapes and is triggered when the layout needs to be computed. The method is called just before painting or hit detection when either of the following events occurred:

The visualization has just been displayed on screen, no layout has been computed before;
A visual column involved in computing the layout has been associated with a new table column;
A visual column involved in computing the layout has been modified.

There is no automatic way to guess whether a visual column is used for computing the layout; it should be specified by each specific visualization class for each visual column. This is usually done from the constructor of the visualization class by calling the method setVisualColumnInvalidate(String,boolean). When the second argument is true, changing the specified column will trigger a recomputation of the layout. When it is false, the visualization will be redisplayed without changing its layout.

By default, the "color" visual attribute does not trigger a layout recomputation whereas the "size" attribute does. Stroke-based visualizations such as time series or parallel visualizations change the latter setting: for them, the size only changes the stroke thickness, not the shape geometry, so changing the size visual column only needs to trigger a redisplay.
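The invalidation logic described above can be sketched as follows. LayoutInvalidationSketch is a hypothetical stand-in, not the DefaultVisualization class: it only mirrors the name setVisualColumnInvalidate from the text, while visualColumnChanged, paint and getLayoutCount are names invented for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical visualization that recomputes its layout only when needed.
class LayoutInvalidationSketch {
    private final Map<String, Boolean> invalidates = new HashMap<String, Boolean>();
    private boolean layoutValid = false; // nothing computed before the first display
    private int layoutCount = 0;

    /** Declares whether a change of the named visual column invalidates the layout. */
    void setVisualColumnInvalidate(String visualName, boolean invalidate) {
        invalidates.put(visualName, Boolean.valueOf(invalidate));
    }

    /** Called when a visual column is modified or bound to another table column. */
    void visualColumnChanged(String visualName) {
        if (Boolean.TRUE.equals(invalidates.get(visualName))) {
            layoutValid = false;
        }
        // In both cases the visualization would be redisplayed.
    }

    /** The layout is computed lazily, just before painting. */
    void paint() {
        if (!layoutValid) {
            computeShapes();
            layoutValid = true;
        }
        // Drawing of the items would happen here.
    }

    private void computeShapes() {
        layoutCount++; // stands for the actual shape computation
    }

    int getLayoutCount() {
        return layoutCount;
    }

    public static void main(String[] args) {
        LayoutInvalidationSketch visualization = new LayoutInvalidationSketch();
        visualization.setVisualColumnInvalidate("color", false);
        visualization.setVisualColumnInvalidate("size", true);
        visualization.paint();
        visualization.visualColumnChanged("color");
        visualization.paint();
        visualization.visualColumnChanged("size");
        visualization.paint();
        System.out.println("layouts computed: " + visualization.getLayoutCount());
    }
}
```

A stroke-based visualization would simply register the size column with false, so that a size change leads to a redisplay with the existing shapes.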

Items display

When the shapes have been computed, the items can be displayed one at a time, in the sorting order, using the shapes for the geometry and other visual columns for visual attributes. The default implementation uses the paintItems(Graphics2D g,Rectangle2D bounds) method, implemented as follows:

            for (RowIterator iter = iterator(); iter.hasNext();) {
                int row = iter.nextRow();
                if (isFiltered(row))
                    continue;
                paintItem(graphics, row);
            }

The method uses a RowIterator to iterate over all the items in sorting order, skips the filtered items using the isFiltered(int row) method and draws the items using the paintItem(Graphics2D g,int row) method.
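The order in which rows reach the painting step can be sketched without any graphics code. The helper below is hypothetical and does not use the toolkit's RowIterator: it takes an array of sort keys and an array of filter flags, and returns the rows that would be painted, in sorting order, with filtered rows skipped.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class PaintOrderSketch {
    /** Returns the unfiltered rows ordered by increasing sort key. */
    static List<Integer> rowsToPaint(final double[] sortKeys, boolean[] filtered) {
        List<Integer> rows = new ArrayList<Integer>();
        for (int row = 0; row < sortKeys.length; row++) {
            if (!filtered[row]) {
                rows.add(Integer.valueOf(row)); // filtered rows are never painted
            }
        }
        // Collections.sort is stable: rows with equal keys keep the table order.
        Collections.sort(rows, new Comparator<Integer>() {
            public int compare(Integer a, Integer b) {
                return Double.compare(sortKeys[a.intValue()], sortKeys[b.intValue()]);
            }
        });
        return rows;
    }

    public static void main(String[] args) {
        double[] keys = { 3.0, 1.0, 2.0 };
        boolean[] filtered = { false, false, true };
        System.out.println(rowsToPaint(keys, filtered));
    }
}
```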

The method can be overridden in a derived class if, for example, the visualization wants to grey-out the filtered items.

Item Display

Displaying an item needs several steps: installing all the graphic attributes, transforming the shape by a fisheye lens, drawing the shape, drawing its outline and drawing a label. Each of these steps is done inside the paintItem(Graphics2D g, int row) method as follows:

        Shape s = getShapeAt(row);
        if (s == null)
            return;
        s = transformShape(s);

        installAlpha(graphics, row);
        installColor(graphics, row, s);
        displayedItems++;
        paintShape(graphics, row, s);
        paintOutline(graphics, row, s);
        paintLabel(graphics, row, s);

All of the called methods can be overridden by subclasses if required. For example, the StrokingVisualization class interprets shapes as stroked instead of filled. It redefines the paintShape method to do nothing and the paintOutline method to draw the shape, using the selected color if the item is selected or the item's color otherwise, as follows:

    public void paintOutline(Graphics2D graphics, int row, Shape s) {
        if (selection != null
            && (!selection.isValueUndefined(row))
            && selectedColor != null) {
            graphics.setColor(selectedColor);
        }
        installSize(graphics, row);
        graphics.draw(s);
        graphics.setStroke(savedStroke);
    }

Visual Column Management

Visual columns are used internally by visualizations to compute the mapping between table rows and visual marks. They are also managed to trigger redisplays when one of the visual columns has changed.
