Technical information
If you feel inclined to take a deeper look into CaGe, or possibly to extend it, this page is for you.
The production process: Generators and Embedders
CaGe is a "control center" for a collection of programs that do the actual work. These external programs are our generators and embedders. CaGe communicates with them through pipes, the input/output mechanism best known from the UNIX world. The external programs aren't actually "aware" that they are running inside CaGe; they only read and write data using the three standard I/O streams: standard input, standard output and standard error. CaGe sets up these streams and reads from or writes to them.
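The following is a minimal Java sketch of the pipe mechanism, not CaGe's actual code: it starts an external program with java.lang.ProcessBuilder, closes the child's standard input, lets its standard error pass through, and collects its standard output line by line. Reading lines suits a text format such as writegraph; a binary format such as planar code would have to be read as raw bytes instead. The class name PipeSketch and its methods are illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PipeSketch {
    /** Reads every line from a stream, e.g. the standard output of a child process. */
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.US_ASCII))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    /** Starts an external program and collects what it writes to standard output. */
    static List<String> run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Pass standard error straight through, so an unread error pipe cannot fill up and block the child.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();
        process.getOutputStream().close(); // we send no input: the child sees end-of-file on standard input
        List<String> lines = readLines(process.getInputStream());
        int status = process.waitFor();
        if (status != 0) {
            throw new IOException(command.get(0) + " exited with status " + status);
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java PipeSketch <program> [options...]");
            return;
        }
        for (String line : run(Arrays.asList(args))) {
            System.out.println(line);
        }
    }
}
```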
A generator can be any program that writes graphs to standard output in one of the formats understood by CaGe. From a technical point of view, a generator is nothing more than such a graph-producing program controlled via its command line, possibly supported by a Java class that defines its own options window within CaGe. If you would like to use a generator program without writing Java code for an options window, you can do so via the external generator button in CaGe's first window, entering a command line with all option parameters there.
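As an illustration of how little a generator needs to do, here is a hypothetical generator written in Java (any language would do): it reads a maximum size from its command line and writes the cycles C_3 to C_max to standard output in the writegraph format described under Formats below, with the optional header and a line containing 0 between two graphs. The class name and the choice of cycles are ours, purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical example generator: writes the cycles C_3 ... C_max in writegraph format. */
class CycleGenerator {
    /** Writegraph lines (no coordinates) for the cycle on n vertices: vertex number, then its two neighbours. */
    static List<String> cycle(int n) {
        List<String> lines = new ArrayList<>();
        for (int v = 1; v <= n; v++) {
            int prev = v == 1 ? n : v - 1;
            int next = v == n ? 1 : v + 1;
            lines.add(v + " " + prev + " " + next);
        }
        return lines;
    }

    public static void main(String[] args) {
        int max = args.length > 0 ? Integer.parseInt(args[0]) : 6; // the only command-line option
        StringBuilder out = new StringBuilder(">>writegraph<<\n"); // optional header on its own line
        for (int n = 3; n <= max; n++) {
            if (n > 3) {
                out.append("0\n"); // a line containing just zero separates two graphs
            }
            for (String line : cycle(n)) {
                out.append(line).append('\n');
            }
        }
        System.out.print(out);
        System.out.flush();
    }
}
```

Such a program could then be used through the external generator button with a command line like `java CycleGenerator 8`.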
An embedder takes one graph and computes 2D or 3D coordinates for its vertices. Like a generator, it must use the graph formats known to CaGe, and it must output the graph's vertices in the same order in which they were supplied in the input. CaGe also needs some additional information about an embedder: there must be methods that manipulate the embedder's command line, one to make the embedder spend more time trying to achieve a good embedding, and one to start a 2D re-embedding with a new exterior face.
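To show the ordering requirement in isolation, here is a deliberately naive 2D "embedder" sketch (hypothetical, not one of CaGe's embedders): it places the vertices evenly on the unit circle and writes writegraph2d lines in exactly the input order. It does not model the two command-line adjustments CaGe needs and ignores any planar structure, so it is only a toy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy embedder: vertex i of n goes to angle 2*pi*(i-1)/n on the unit circle. */
class CircleEmbedder {
    /** adjacency.get(i) holds the neighbours of vertex i+1, in input order. */
    static List<String> embed(List<int[]> adjacency) {
        int n = adjacency.size();
        List<String> lines = new ArrayList<>();
        // Iterate in input order so the output lists the vertices in the same order as the input.
        for (int i = 0; i < n; i++) {
            double angle = 2 * Math.PI * i / n;
            StringBuilder line = new StringBuilder();
            line.append(i + 1)
                .append(String.format(Locale.ROOT, " %.6f %.6f", Math.cos(angle), Math.sin(angle)));
            for (int w : adjacency.get(i)) {
                line.append(' ').append(w); // neighbour numbers are copied unchanged
            }
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<int[]> square = List.of(new int[] {2, 4}, new int[] {1, 3}, new int[] {2, 4}, new int[] {1, 3});
        System.out.println(">>writegraph2d<<");
        embed(square).forEach(System.out::println);
    }
}
```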
After a generator has been selected and output options have been chosen, CaGe starts the generator program and reads graphs from it. Graphs selected for output are passed on to the appropriate embedder. (If both 2D and 3D output were chosen, separate embedder processes are run for the two dimensions.)
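The flow just described can be modelled in a few lines. This is a hypothetical in-process sketch: graphs are plain strings, the generator is an Iterator, the output selection is a Predicate, and each embedder is a Function (null if that dimension was not chosen). In CaGe itself the generator and the embedders are separate processes connected by pipes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical model of the production flow: generate, select, embed per dimension. */
class ProductionSketch {
    static List<String> produce(Iterator<String> generator,
                                Predicate<String> selectedForOutput,
                                Function<String, String> embedder2d,
                                Function<String, String> embedder3d) {
        List<String> results = new ArrayList<>();
        while (generator.hasNext()) {
            String graph = generator.next();
            if (!selectedForOutput.test(graph)) {
                continue; // graphs not selected for output are never embedded
            }
            // 2D and 3D output come from separate embedders.
            if (embedder2d != null) results.add(embedder2d.apply(graph));
            if (embedder3d != null) results.add(embedder3d.apply(graph));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> out = produce(List.of("g1", "g2").iterator(), g -> true, g -> "2D(" + g + ")", null);
        System.out.println(out);
    }
}
```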
Adding a Generator
If you want to use a new generator with its own options window, you need to provide CaGe with a Java class extending cage.GeneratorPanel, a subclass of javax.swing.JPanel. CaGe will instantiate an object of this class, add it to a window, call its showing method to allow some extra preparation, and then show the window. After the user has clicked "Next" on another panel in the same window, CaGe calls getGeneratorInfo, which must return an object of type cage.GeneratorInfo. This object contains several items of information, such as the command line that you would otherwise enter in the "external generator" options window.
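An options panel's main job is to turn widget states into a generator command line. The sketch below shows only that part, under clear assumptions: since cage.GeneratorPanel and cage.GeneratorInfo are not available here (and this page does not show how a GeneratorInfo is constructed), the class extends javax.swing.JPanel directly and returns the command line as a list of strings. The generator name cyclegen and its -e option are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import javax.swing.JCheckBox;
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.JSpinner;
import javax.swing.SpinnerNumberModel;

/** Stand-in options panel; a real one would extend cage.GeneratorPanel instead of JPanel. */
class CycleOptionsPanel extends JPanel {
    private final JSpinner maxSize = new JSpinner(new SpinnerNumberModel(10, 3, 200, 1));
    private final JCheckBox evenOnly = new JCheckBox("even cycles only");

    CycleOptionsPanel() {
        add(new JLabel("maximum cycle length:"));
        add(maxSize);
        add(evenOnly);
    }

    /** The command line that would otherwise be typed into the "external generator" window. */
    List<String> commandLine() {
        List<String> cmd = new ArrayList<>();
        cmd.add("cyclegen"); // hypothetical generator program
        cmd.add(String.valueOf(maxSize.getValue()));
        if (evenOnly.isSelected()) {
            cmd.add("-e"); // hypothetical option
        }
        return cmd;
    }

    /** Sets the widgets programmatically, e.g. for testing. */
    void setOptions(int max, boolean even) {
        maxSize.setValue(max);
        evenOnly.setSelected(even);
    }

    public static void main(String[] args) {
        System.out.println(new CycleOptionsPanel().commandLine());
    }
}
```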
You need to add entries to the configuration file CaGe.ini to make CaGe offer the new generator to the user; the existing entries will serve as a template. To read other entries from the configuration file, you can use the static getConfigProperty method of the cage.CaGe class.
CaGe is not yet flexible with respect to the format a generator is expected to produce, even if a GeneratorPanel is provided: all generators are currently required to output one of CaGe's two input formats, described below.
Adding an Embedder
Adding your own embedder to CaGe is, in our view, a fairly uncommon use case. If, however, you do need a specific embedder that is not part of CaGe, this guide may provide you with the necessary information and tips.
Formats
There are currently two graph formats that CaGe can read, so generators and embedders must use one of these for their output. CaGe will try to recognize which of the two formats it is reading.
Format headers
CaGe's input formats have an optional header that identifies the format unambiguously. If an input that CaGe is reading has no header, the program will guess the format from the first byte, but false guesses are possible. A header consists of the format name enclosed in double angle brackets, >>...<<. The brackets can also contain a comment, separated from the format name by white space. If the format is line-oriented (which currently applies only to the writegraph format, described next), the header can be on its own line, i.e. followed by a line separator.
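A sketch of header handling in Java (our illustration, not CaGe's reader): parse reads a leading >>name comment<< header and skips a following line separator only for the line-oriented writegraph formats, since in binary planar code the next byte is data. guessFormat is a hypothetical first-byte heuristic, included only to show why guesses can go wrong: a headerless planar code graph with 49 vertices starts with the byte for the digit 1.

```java
import java.nio.charset.StandardCharsets;

class FormatHeader {
    final String name;    // e.g. "writegraph3d"
    final String comment; // e.g. "planar"; empty if there is none
    final int length;     // bytes occupied by the header, plus a line separator if one was skipped

    FormatHeader(String name, String comment, int length) {
        this.name = name;
        this.comment = comment;
        this.length = length;
    }

    /** Parses a leading >>name comment<< header; returns null if the data does not start with one. */
    static FormatHeader parse(byte[] data) {
        if (data.length < 2 || data[0] != '>' || data[1] != '>') {
            return null;
        }
        for (int i = 2; i + 1 < data.length; i++) {
            if (data[i] == '<' && data[i + 1] == '<') {
                String inside = new String(data, 2, i - 2, StandardCharsets.US_ASCII).trim();
                String[] parts = inside.split("\\s+", 2); // format name, then optional comment
                String comment = parts.length > 1 ? parts[1] : "";
                int end = i + 2;
                // Only a line-oriented format may put its header on its own line; in planar code
                // the next byte is data, even if it happens to equal '\n'.
                if (parts[0].startsWith("writegraph")) {
                    if (end < data.length && data[end] == '\r') end++;
                    if (end < data.length && data[end] == '\n') end++;
                }
                return new FormatHeader(parts[0], comment, end);
            }
        }
        return null; // ">>" without a closing "<<"
    }

    /** Hypothetical fallback guess from the first byte (not necessarily CaGe's actual rule). */
    static String guessFormat(byte[] data) {
        if (data.length == 0) return "unknown";
        int first = data[0] & 0xFF;
        // Writegraph text starts with a vertex number or white space; planar code starts with a
        // binary vertex count, which can coincide with such a character, hence false guesses.
        return Character.isDigit(first) || Character.isWhitespace(first) ? "writegraph" : "planar_code";
    }

    public static void main(String[] args) {
        FormatHeader h = parse(">>writegraph3d planar<<\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(h.name + " / " + h.comment + " / " + h.length);
    }
}
```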
The Writegraph format (input/output)
Format names used in headers: writegraph, writegraph2d, writegraph3d
This format originated from Combinatorica, a Mathematica package. It is a plain-text format that is easy for humans to read. There is one line for each vertex, containing, separated by white space, the vertex number (vertices are numbered sequentially, starting with one), the vertex coordinates, and the numbers of all vertices adjacent to the current one. Since we deal with lists of graphs, a line containing just the number zero serves as the separator between two graphs (zero is not a valid vertex number).
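The following Java sketch (hypothetical class WritegraphReader) splits writegraph lines into graphs. The number of coordinates per vertex, dim, cannot reliably be inferred from the vertex lines alone, so the caller supplies it, for example from the header name: 0 for writegraph, 2 for writegraph2d, 3 for writegraph3d.

```java
import java.util.ArrayList;
import java.util.List;

class WritegraphReader {
    /** One vertex: its coordinates (possibly none) and its neighbours' vertex numbers. */
    static final class Vertex {
        final double[] coords;
        final int[] neighbours;

        Vertex(double[] coords, int[] neighbours) {
            this.coords = coords;
            this.neighbours = neighbours;
        }
    }

    /** Splits writegraph lines into graphs; dim is the number of coordinates per vertex (0, 2 or 3). */
    static List<List<Vertex>> parse(List<String> lines, int dim) {
        List<List<Vertex>> graphs = new ArrayList<>();
        List<Vertex> current = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith(">>")) {
                continue; // skip blank lines and a header line
            }
            String[] tok = line.split("\\s+");
            if (tok.length == 1 && tok[0].equals("0")) { // separator between two graphs
                if (!current.isEmpty()) graphs.add(current);
                current = new ArrayList<>();
                continue;
            }
            if (tok.length < 1 + dim) {
                throw new IllegalArgumentException("too few fields: " + line);
            }
            int number = Integer.parseInt(tok[0]);
            if (number != current.size() + 1) { // vertices are numbered 1, 2, 3, ...
                throw new IllegalArgumentException("unexpected vertex number " + number);
            }
            double[] coords = new double[dim];
            for (int i = 0; i < dim; i++) coords[i] = Double.parseDouble(tok[1 + i]);
            int[] neighbours = new int[tok.length - 1 - dim];
            for (int i = 0; i < neighbours.length; i++) neighbours[i] = Integer.parseInt(tok[1 + dim + i]);
            current.add(new Vertex(coords, neighbours));
        }
        if (!current.isEmpty()) graphs.add(current); // the last graph may lack a trailing separator
        return graphs;
    }

    public static void main(String[] args) {
        List<List<Vertex>> graphs = parse(List.of("1 2", "2 1", "0", "1 2 3", "2 1 3", "3 1 2"), 0);
        System.out.println(graphs.size() + " graphs");
    }
}
```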
When encoding an embedded graph (with coordinates for the vertices), writegraph is the format of choice. It then contains either the 2D or the 3D embedding, and the format name in the header is expected to say which. For a non-embedded graph, it is possible to include no coordinates (and use "writegraph" without a dimension in the header), but there is also the convention of including zero coordinates; the format name should then specify the number of zero coordinates per vertex as "2d" or "3d". One comment sometimes given in the header after the format name is "planar", signifying that the order in which each vertex's neighbours are listed can be used to construct a planar embedding of the graph. See "Planar Code" for details.
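For a non-embedded graph, switching from the coordinate-free form to the zero-coordinate convention is a purely mechanical rewrite. A sketch with the hypothetical helper ZeroCoordinates.pad: it inserts dim zeros after each vertex number, keeps the 0 separator lines, and writes the matching writegraph2d or writegraph3d header (dropping any old header, including a planar comment).

```java
import java.util.ArrayList;
import java.util.List;

class ZeroCoordinates {
    /** Converts coordinate-free writegraph lines ("v n1 n2 ...") to zero coordinates, dim = 2 or 3. */
    static List<String> pad(List<String> lines, int dim) {
        if (dim != 2 && dim != 3) throw new IllegalArgumentException("dim must be 2 or 3");
        List<String> out = new ArrayList<>();
        out.add(">>writegraph" + dim + "d<<"); // header names the number of zero coordinates
        String zeros = " 0".repeat(dim);
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith(">>")) continue; // drop blank lines and the old header
            if (line.equals("0")) {                                 // graph separator stays as is
                out.add(line);
                continue;
            }
            String[] tok = line.split("\\s+", 2);                   // vertex number, then the neighbours
            out.add(tok[0] + zeros + (tok.length > 1 ? " " + tok[1] : ""));
        }
        return out;
    }

    public static void main(String[] args) {
        pad(List.of(">>writegraph<<", "1 2 3", "2 1 3", "3 1 2"), 3).forEach(System.out::println);
    }
}
```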
The Planar Code format (input/output)
Format names used in headers: planar_code, embed_code
This is a binary format that does not include coordinate information. By convention, however, planar code contains a hint for a 2D embedding: the order in which each vertex's neighbours are listed.
A Planar Code representation of a graph starts with a number giving the total number of vertices. Then, for each vertex, follows a sequence of numbers specifying that vertex's neighbours. This sequence must list the neighbours in the order in which they appear when you go once clockwise around the vertex. Going anti-clockwise is allowed as well, but the direction must be the same for all vertices of the graph. This information is actually a partial encoding of a 2D embedding of the graph, and it is used by our 2D embedder.
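A Java sketch of writing and reading planar code follows. Some details are not spelled out above, so we assume plantri's planar_code conventions: the header >>planar_code<<, one unsigned byte per number (so at most 255 vertices; larger graphs use a two-byte variant not handled here), vertices numbered from 1, and each vertex's neighbour list terminated by a 0. The class and method names are ours.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PlanarCode {
    static final byte[] HEADER = ">>planar_code<<".getBytes(StandardCharsets.US_ASCII);

    /** Appends one graph; rotation[v] lists the neighbours of vertex v + 1 in clockwise order. */
    static void writeGraph(ByteArrayOutputStream out, int[][] rotation) {
        if (rotation.length == 0 || rotation.length > 255) {
            throw new IllegalArgumentException("this sketch handles only 1 to 255 vertices");
        }
        out.write(rotation.length); // number of vertices
        for (int[] neighbours : rotation) {
            for (int w : neighbours) {
                out.write(w); // vertex numbers start at 1, leaving 0 free as a terminator
            }
            out.write(0); // end of this vertex's neighbour list
        }
    }

    /** Reads all graphs, skipping an optional header. */
    static List<int[][]> read(byte[] data) {
        int pos = startsWith(data, HEADER) ? HEADER.length : 0;
        List<int[][]> graphs = new ArrayList<>();
        while (pos < data.length) {
            int n = data[pos++] & 0xFF;
            if (n == 0) {
                throw new UnsupportedOperationException("two-byte variant (more than 255 vertices) not handled");
            }
            int[][] rotation = new int[n][];
            for (int v = 0; v < n; v++) {
                int start = pos;
                while (pos < data.length && data[pos] != 0) pos++;
                if (pos >= data.length) throw new IllegalArgumentException("truncated planar code");
                rotation[v] = new int[pos - start];
                for (int i = start; i < pos; i++) rotation[v][i - start] = data[i] & 0xFF;
                pos++; // skip the terminating 0
            }
            graphs.add(rotation);
        }
        return graphs;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(HEADER, 0, HEADER.length);
        writeGraph(out, new int[][] {{2, 3}, {3, 1}, {1, 2}}); // a triangle
        System.out.println(Arrays.deepToString(read(out.toByteArray()).get(0)));
    }
}
```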
Other formats (output only): PDB and CML
CaGe can also save graphs in two other formats, PDB and CML. Both are popular in the chemistry world and are "rich" formats, providing ways of expressing large amounts of diverse chemical information. Because of the way its results are produced, CaGe doesn't know much about their chemistry and thus uses only a small part of the CML and PDB languages. The >>...<< format headers defined above are not written into CML and PDB files, so that the output stays compatible with other programs.
In PDB, CaGe uses "ATOM" and "CONECT" records to encode the vertices and edges of its graphs. The CML features used by CaGe have been chosen to produce output that is both compact and readable by one CML viewer, Jmol, which CaGe "feeds" with graphs to display via this format. CaGe amends CML's <molecule> tag with a convention="MathGraph" attribute. This instructs our "tailored" version of the Jmol applet to adhere strictly to the atom coordinates and bonds given in the input, rather than trying to embed any atoms itself or second-guessing the existence or absence of bonds. (By contrast, there is no way to get the Rasmol viewer out of its bond-guessing habit.)