Technical information

If you feel inclined to take a deeper look into CaGe, or possibly extend it, this page is for you.

The production process:
Generators and Embedders

CaGe is a "control center" for a collection of programs that do the actual work. These external programs are our generators and embedders.

CaGe communicates with these programs using pipes, the input/output mechanism best known from the UNIX world. The external programs aren't actually "aware" that they are working inside CaGe; they only read and write data using the three standard I/O streams: standard input, standard output and standard error. CaGe sets these streams up and reads from or writes into them.
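As an illustration of this pipe mechanism (not of CaGe's own Java code), the following minimal Python sketch starts an external program and collects what it writes to standard output, keeping standard error separate. The function name read_generator_output and the stand-in "generator" command are hypothetical and exist only for this example.

```python
import subprocess
import sys


def read_generator_output(command):
    """Start `command` and return the bytes it writes to standard output."""
    process = subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,  # this stand-in generator needs no input
        stdout=subprocess.PIPE,    # graphs are read from this pipe
        stderr=subprocess.PIPE,    # diagnostics arrive on a separate pipe
    )
    output, errors = process.communicate()
    if process.returncode != 0:
        raise RuntimeError(errors.decode(errors="replace"))
    return output


# Hypothetical stand-in for a real generator: a one-line Python program
# that prints a single-edge graph in writegraph form.
STAND_IN_GENERATOR = [
    sys.executable,
    "-c",
    "print('>>writegraph<<'); print('1 2'); print('2 1')",
]
```

The external program only ever sees its standard streams, so any executable that prints graphs in a supported format could take the place of the stand-in command.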

A Generator can be any program that produces graphs on standard output, in any of the formats understood by CaGe. From a technical point of view, a generator is nothing more than such a graph-producing program controlled via its command line, possibly supported by a Java class to define its own options window within CaGe. If you would like to use a generator program without writing Java code for the options window, you can do so via the external generator button in CaGe's first window, providing a command line with all option parameters there.

An Embedder takes one graph and computes (2D or 3D) coordinates for its vertices. Like the generators, it must use the graph formats known to CaGe, and it must output the graph's vertices in the same order as they were supplied in the input. CaGe needs some additional information about an embedder: it must have methods to manipulate the embedder command line so that the embedder spends more time trying to achieve a good embedding, and so that it starts a 2D re-embedding with a new exterior face.

After a generator has been selected and output options chosen, CaGe starts the generator program and reads graphs from it. Graphs selected for output are passed on to the appropriate embedder. (If both 2D and 3D output were chosen, embedder processes are run separately for the two dimensions.)
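The embedder side of this process can be sketched in the same way: one graph is written to the embedder's standard input and the embedded graph is read back from its standard output. The Python sketch below is an illustration under stated assumptions, not CaGe's implementation. The stand-in "embedder" is hypothetical and simply inserts two zero coordinates after each vertex number; the helper vertex_order shows one way to check the requirement that vertices come back in the order they were supplied.

```python
import subprocess
import sys


def run_embedder(command, graph_text):
    """Send one graph to `command` on stdin and return its stdout as text."""
    completed = subprocess.run(
        command,
        input=graph_text.encode("ascii"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout.decode("ascii")


def vertex_order(writegraph_text):
    """Return the vertex numbers in the order in which they are listed."""
    return [line.split()[0] for line in writegraph_text.splitlines() if line.split()]


# Hypothetical stand-in embedder: copies each vertex line and inserts the
# coordinates "0 0" after the vertex number. A real embedder would compute
# meaningful coordinates here.
STAND_IN_EMBEDDER = [
    sys.executable,
    "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    f = line.split()\n"
    "    if f: print(f[0], '0 0', *f[1:])\n",
]
```

A caller could compare vertex_order of the input with vertex_order of the output and reject the result if the two lists differ.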

Adding a Generator

If you want to use a new generator with its own options window, you need to provide CaGe with a Java class extending cage.GeneratorPanel, a subclass of javax.swing.JPanel. CaGe will instantiate an object of this class, add it to a window, call the method showing to allow some extra preparation, and then show the window. After the user has clicked "Next" on another panel in the same window, CaGe calls getGeneratorInfo, which must return an object of type cage.GeneratorInfo. This object contains several items of information, such as the command line that you would otherwise enter in the "external generator" options window.
You need to add entries to the configuration file CaGe.ini to make CaGe offer the new generator to the user. The existing entries will provide you with a template. To access other entries from the configuration file, you can use the static getConfigProperty method of the cage.CaGe class.

CaGe has little flexibility as yet with respect to the format a generator is expected to produce, even if a GeneratorPanel is provided. All generators are currently required to output one of CaGe's two input formats described below.

Adding an Embedder

Adding your own embedder to CaGe is a fairly uncommon use case in our eyes. If, however, you do feel the need to use a specific embedder that is not part of CaGe, then this guide might provide you with the necessary information and tips.

Formats

There are currently two graph formats that CaGe can read, so generators and embedders must use one of these for their output. CaGe will try to recognize which of these formats it is reading.

Format headers

CaGe's input formats have an optional header identifying the format in an unambiguous way. If there is no header in an input that CaGe is reading, the program will guess from the first byte, but false guesses are possible. A header consists of the format name enclosed in double angle brackets >>...<<. These brackets can also contain a comment, separated from the format name by white space. The header can be on its own line (i.e. followed by a line separator) if the format is line-oriented (which currently applies to the writegraph format, described next).
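To make the header rules concrete, here is a small Python sketch that splits an optional header off the start of some input. It follows only what is described above (format name, optional comment separated by white space, enclosed in >> and <<). The function name parse_header is hypothetical, and the sketch makes no attempt to reproduce the first-byte guessing CaGe falls back on when no header is present.

```python
def parse_header(data):
    """Split an optional >>name comment<< header off the start of `data`.

    Returns (format_name, comment, rest). If no header is found, the
    result is (None, None, data) and the caller has to guess the format.
    """
    if not data.startswith(b">>"):
        return None, None, data
    end = data.find(b"<<", 2)
    if end == -1:
        return None, None, data
    words = data[2:end].decode("ascii").split(None, 1)
    if not words:
        return None, None, data
    name = words[0]
    comment = words[1].strip() if len(words) > 1 else ""
    return name, comment, data[end + 2:]
```

For a line-oriented format, the returned remainder still starts with the line separator that followed the header, so a line-based reader can simply skip the empty first line.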

The Writegraph format (input/output)

Format names used in headers: writegraph, writegraph2d, writegraph3d

This format originated from Combinatorica, a Mathematica package. It is a plain-text format and easily readable for humans. There is one line for each vertex, and this line contains, separated by white space,
• the vertex number (sequentially numbered, starting with one),
• vertex coordinates,
• the numbers of all vertices adjacent to the current one.
Since we deal with lists of graphs, we define a line containing just the number zero as the separator between two graphs (as zero is not a valid vertex number).
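The line layout described above can be read with a few lines of code. The following Python sketch is a hypothetical reader, not CaGe's parser: it assumes the caller already knows how many coordinates each vertex line carries (0, 2 or 3), and it skips a header line if one is present.

```python
def parse_writegraph(text, dimension=0):
    """Parse a list of graphs in writegraph form.

    `dimension` is the number of coordinates on each vertex line.
    Returns a list of graphs; each graph is a list of
    (vertex_number, coordinates, neighbours) tuples in file order.
    """
    graphs, current = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields or line.startswith(">>"):
            continue  # blank line or format header
        if fields == ["0"]:
            # A line holding just zero separates two graphs.
            if current:
                graphs.append(current)
                current = []
            continue
        number = int(fields[0])
        coordinates = tuple(float(x) for x in fields[1:1 + dimension])
        neighbours = [int(x) for x in fields[1 + dimension:]]
        current.append((number, coordinates, neighbours))
    if current:
        graphs.append(current)
    return graphs
```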

When encoding an embedded graph (with coordinates for the vertices), writegraph is the format of choice. It then contains either the 2D or 3D embedding, and the format name in the header is expected to communicate this fact. For a non-embedded graph, it is possible to include no coordinates (and use "writegraph" without a dimension in the header), but there is also the convention of including zero coordinates (and the format name should then specify the number of zero coordinates used for each vertex as "2d" or "3d"). One comment sometimes given in the header after the format name is "planar", signifying that the order in which vertices are listed can be used to construct a planar embedding of the graph. See "Planar Code" for details.
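The naming convention for the header can be illustrated with a small writer. This Python sketch is an assumption-laden example rather than CaGe's own output routine: the function name format_writegraph and the in-memory graph representation are hypothetical, and the header is placed on its own line.

```python
def format_writegraph(graphs, dimension=0):
    """Format a list of graphs as writegraph text with a header.

    Each graph is a list of (coordinates, neighbours) pairs, one per
    vertex, in vertex order. `dimension` must be 0, 2 or 3 and selects
    the header name: writegraph, writegraph2d or writegraph3d.
    """
    if dimension not in (0, 2, 3):
        raise ValueError("dimension must be 0, 2 or 3")
    name = "writegraph" + (f"{dimension}d" if dimension else "")
    blocks = []
    for graph in graphs:
        lines = []
        for number, (coordinates, neighbours) in enumerate(graph, start=1):
            if len(coordinates) != dimension:
                raise ValueError("coordinate count does not match dimension")
            fields = [number, *coordinates, *neighbours]
            lines.append(" ".join(str(field) for field in fields))
        blocks.append("\n".join(lines) + "\n")
    # A line holding just zero separates consecutive graphs.
    return f">>{name}<<\n" + "0\n".join(blocks)
```

Passing a non-embedded triangle with zero coordinates and dimension=2 yields a writegraph2d header followed by lines such as "1 0 0 2 3", which matches the zero-coordinate convention described above.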

The Planar Code format (input/output)

Format names used in headers: planar_code, embed_code

This is a binary format not including coordinate information. Planar code does however contain a hint for a 2D embedding as a convention. This information lies in the order that a vertex's adjacencies are listed.

A Planar Code representation of a graph starts with a number giving the total number of vertices. Then, for each vertex, follows a sequence of numbers specifying that vertex's neighbours. This sequence must enumerate the neighbours as they appear when you go clockwise in one circle around the vertex. Going anti-clockwise is allowed as well, but the direction must be the same for all vertices of the graph. This information is actually a partial encoding of a 2D embedding of the graph, and it is used by our 2D embedder.
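The following Python sketch shows how such a representation might be produced and why the neighbour order carries embedding information. It rests on assumptions that the text above does not state: each number is stored as one byte, vertices are numbered from one, and each neighbour sequence is terminated by a zero byte. The function names are hypothetical. The second function traces the faces implied by the neighbour order, so Euler's formula (vertices - edges + faces = 2) can be checked for a connected planar graph.

```python
def encode_planar_code(rotation_system):
    """Encode one graph; rotation_system[i] lists the neighbours of
    vertex i+1 in rotational order (same direction for every vertex).

    Assumed layout: one byte per number, a zero byte after each list.
    """
    count = len(rotation_system)
    if count > 255:
        raise ValueError("this sketch only handles up to 255 vertices")
    data = bytearray([count])
    for neighbours in rotation_system:
        data.extend(neighbours)
        data.append(0)
    return bytes(data)


def count_faces(rotation_system):
    """Count the faces obtained by following the neighbour order."""
    seen = set()
    faces = 0
    for u, neighbours in enumerate(rotation_system, start=1):
        for v in neighbours:
            if (u, v) in seen:
                continue
            faces += 1
            a, b = u, v
            while (a, b) not in seen:
                seen.add((a, b))
                ring = rotation_system[b - 1]
                # Leave b towards the neighbour that follows a in b's list.
                a, b = b, ring[(ring.index(a) + 1) % len(ring)]
    return faces


# Tetrahedron: 4 vertices, 6 edges; a consistent neighbour order gives 4 faces.
TETRAHEDRON = [[2, 3, 4], [1, 4, 3], [1, 2, 4], [1, 3, 2]]
```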

Other formats (output only): PDB and CML

CaGe can save graphs in two other formats, PDB and CML. Both are popular in the chemistry world and are "rich" formats, providing ways of expressing large amounts of different chemical information. By the nature of its production process, CaGe doesn't know much about the chemistry of its results, and thus only uses a small part of the CML and PDB languages. To keep the output compatible with other software, the >>...<< format headers defined above are not used with CML and PDB.
In PDB, CaGe uses "ATOM" and "CONECT" records to encode vertices and edges of its graphs. The CML features used by CaGe have been chosen to produce CML output that is both compact and readable by one of its viewers, Jmol, which CaGe "feeds" with graphs to display via this CML format. CML's <molecule> tag is amended by CaGe with a convention="MathGraph" attribute. This instructs our "tailored" version of the Jmol applet to strictly adhere to the atom coordinates and bonds given in the input, rather than try to embed any atoms itself or second-guess the existence or absence of bonds. (By contrast, there is no way to get the Rasmol viewer out of its bond-guessing habit.)
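As a rough illustration of how vertices and edges map onto "ATOM" and "CONECT" records, the Python sketch below writes a graph using the fixed-column layout of the PDB format. It does not claim to reproduce the exact fields CaGe emits: the function name format_pdb, the atom name " C  ", the residue name UNK, the chain identifier A and the residue number 1 are all placeholder assumptions made for this example.

```python
def format_pdb(coordinates, adjacency):
    """Return PDB text with one ATOM record per vertex and CONECT
    records listing each vertex's neighbours.

    coordinates[i] is the (x, y, z) position of vertex i+1 and
    adjacency[i] lists the numbers of its neighbours.
    """
    lines = []
    for serial, (x, y, z) in enumerate(coordinates, start=1):
        # Columns: record name 1-6, serial 7-11, atom name 13-16,
        # residue name 18-20, chain 22, residue number 23-26,
        # x 31-38, y 39-46, z 47-54. Placeholder values are assumptions.
        lines.append(
            f"ATOM  {serial:5d}  C   UNK A   1    {x:8.3f}{y:8.3f}{z:8.3f}"
        )
    for serial, neighbours in enumerate(adjacency, start=1):
        # A CONECT record holds at most four bonded atoms, so vertices
        # of higher degree are spread over several records.
        for start in range(0, len(neighbours), 4):
            chunk = neighbours[start:start + 4]
            lines.append(f"CONECT{serial:5d}" + "".join(f"{n:5d}" for n in chunk))
    return "\n".join(lines) + "\n"
```

Because the records are column-based rather than white-space separated, the field widths in the format strings matter more than the visible spacing.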

Download