Guide to the report generator
How the pieces of the report generator fit together
The report generator was designed to be as modular as possible, to allow for easy changes or substitutions in the processing pipeline. It is currently written to take data from crl_flow and store them in RRDs.
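At a high level, the pipeline looks roughly like this (a simplified sketch; each tool is described in the sections below):

crl_flow -> spoolcat -> store_monitor_data -> RRD files -> create_report -> display_report (web)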
Collection
For the time series graphs to be useful, monitor data should be output frequently enough to show trends, but not so often as to throw off the flow counts. The default crl_flow interval of 5 minutes is usually sufficient.
Example usage:
crl_flow -I -r -o %s.t2 if:fxp0
Note: The .t2 suffix is a holdover from when the main flow analysis program was crl_traffic2, and is used only out of habit. You may choose whatever filenames you desire.
Transport and processing
The simple script spoolcat aids in the transport and organizing of multiple interval files. It can be used locally or piped over ssh (or equivalent) to another machine for processing. It can also delete or store the files once they've been output. If you want to merge subinterfaces or interfaces together, use t2_merge. At this point, the data are processed by store_monitor_data and stored in the desired format. Note that due to the way RRDtool stores information, storing a table with many entries (source and destination ASes, for instance) can create many RRD files and subsequently use up a lot of disk space, perhaps tens of gigabytes. Keep this in mind.
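For example, when collection and processing happen on the same machine, the interval files can be fed straight in. This is a hypothetical invocation; it assumes the same argument order as the remote example shown later in this guide:

spoolcat -d '*.t2' | store_monitor_data report.conf subif_map.conf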
Report generation
Once the data have been stored, they can be graphed or turned into tables or whatever other form the user wants. Another simple script to help with this process is create_report, which checks to see if the data are newer than the graphs, calls the graph/table generation scripts if necessary, and then copies the files to the web server. It does not process and store data; its operation is separate from store_monitor_data. Ideally, it will be called regularly via a cron job or similar. It requires passwordless file transfer; otherwise the transfers will fail.
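One common way to arrange passwordless transfers (assuming OpenSSH) is to generate a key with an empty passphrase and install it on the web server, subject to your site's security policy:

ssh-keygen -t rsa
ssh-copy-id www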
For those who specifically want to control a single part of the process, each step can be run manually (a sketch of the manual sequence follows this list):
Config parsing
Since the report configuration file contains per-monitor information, but the graphing configuration is done on a per-graph basis, config_graphs is needed to convert the global configuration into specific commands for the grapher.
Graph generation
create_graphs takes the commands generated above and creates graphs, as well as any associated text data used by the web page.
Upload
Once generated, graphs and text need to be transferred to the appropriate directories on a web server.
Web page
Reports are currently all viewed via a single CGI web page, generated by display_report.
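Run by hand, the whole sequence might look something like the following. This is a hypothetical sketch: it assumes config_graphs writes its grapher commands to stdout and create_graphs reads them from stdin, and the directories shown are placeholders; check each script's usage for the actual interface.

config_graphs report.conf | create_graphs
scp graphs/* www:/htdocs/example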
Putting it all together
This is an example of how one would set up and configure a standard report generator. It is assumed that all the scripts are installed and in one's default path.

First, you need a link to monitor. For this example we'll monitor an interface that captures all traffic in and out of a university campus. There is a machine that listens on a link (which we'll call fxp0), and all this data will be classified as coming from the 'campus' monitor. To start collecting data, we'd use the command:

crl_flow -I -r -o %s.t2 if:fxp0
Now we make sure our configuration files are correctly set up. Make a copy of report.conf and subif_map.conf from the doc directory. You want a different subinterface map for each machine you collect data from, so we'll rename subif_map.conf to campus.conf, and the entries must be changed to match the interfaces used. We'll map the interface/subinterface 0[0] to our monitor name, campus. Anything listed as 'REQUIRED' in the example config files is necessary for proper operation, and the scripts may not run properly if it is missing. Some entries in report.conf that must be changed to ensure proper operation are the rrd_dir and graph_dir values. If you want to output larger versions of the timeseries graphs, you must change big_dir, and if you want to output text tables of stored data, you must change table_dir. Also, a valid input file for ASFinder is required in the routes entry if you want to do AS or country lookups.
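The relevant entries might end up looking something like the following. This is an illustrative sketch only: the key names are those described above, but the layout and paths are placeholders; consult the example report.conf for the actual syntax.

rrd_dir    /data/report/rrd
graph_dir  /data/report/graphs
big_dir    /data/report/graphs/big
table_dir  /data/report/tables
routes     /data/report/routes.bgp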
Then, the files created by crl_flow will be transferred off the monitoring machine and onto a processing machine. (Of course, one could have the monitoring, processing, and web machines all be the same, but they are separated here for the sake of clarity.) Those data are then stored into RRD files for later graphing and table generation. We do this with the following command:
spoolcat -d '*.t2' | ssh computeserver "store_monitor_data report.conf campus.conf"
Assuming the configuration files have been correctly set up, all the data will be copied to the processing machine, converted into the desired table types, stored in RRDs, and deleted from the monitoring machine.
At this point, you should make sure the config file is set up to allow transfer from the processing machine to the web server. The appropriate block in your report.conf is named transfer, and requires that you specify the name of the server you wish to transfer graphs and tables to, the cgi_dir directory where display_report will run from, and the html_dir where non-CGI items (i.e., image files) will be stored. The cp_cmd entry specifies what command will be used to transfer the files, along with any desired options; currently only scp and rsync have been tested.
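A transfer block might look roughly like this (again, an illustrative sketch: the entries are those described above, but the server name, paths, and exact layout here are placeholders):

transfer
    server   www
    cgi_dir  /cgi-bin/example
    html_dir /htdocs/example
    cp_cmd   scp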
Once that's set up, you should be able to simply use:

create_report report.conf
In most cases, you'll want some sort of periodic report generation, such as from a cron job. Here's an example crontab entry:

PATH=/bin:/usr/bin:/usr/local/bin:/usr/local/Coral/bin
0,15,30,45 * * * * create_report report.conf

(It's important to set the PATH appropriately in order for these scripts to work properly.)
Once the data files are on the web server, you need to put display_report in the correct CGI directory. If CoralReef is installed on the web server, you can either copy display_report into the CGI directory or make a symbolic link to the installed version. Otherwise, you will have to manually copy files from a machine with a CoralReef installation:

scp /usr/local/Coral/bin/display_report www:/cgi-bin/example
scp /usr/local/Coral/lib/CAIDA/Traffic2/ReportSupport.pm www:/cgi-bin/example/lib/CAIDA/Traffic2

(The directories have to exist prior to copying files into them.)
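Since scp will not create missing directories, you may need to create them on the web server first; for example (assuming the same placeholder paths as above):

ssh www "mkdir -p /cgi-bin/example/lib/CAIDA/Traffic2"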
To view the reports, you would go to the appropriate URL for the specified CGI directory. This will likely differ depending on the web server's configuration. An example URL might look like:
http://www.example.com/cgi-bin/example/display_report
Proper viewing of reports will require configuration of the cgi.conf file in the cgi-bin directory, and each monitor's subdirectory will require a configured monitor.conf file. If these are set up according to the comments in the files, you should be able to view reports about your monitor's traffic!
Transitioning from the old report generator to the new one
The most important data to transfer from the old report generator (t2_report++) are the RRD files used to generate timeseries graphs. The new report generator system relies on RRD files for all its data storage, although at some point that will be replaced with a more general archival system.
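Copying the old RRD files into the new rrd_dir can be done with any ordinary file transfer tool; for example, with rsync (the paths here are placeholders):

rsync -av oldhost:/old/rrd/ /data/report/rrd/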
Different versions of the report generator have used different directory
structures and file formats, necessitating conversion when upgrading.
Information on updating RRD directories can be found in a separate document.
Certain features of t2_report++ have not yet been implemented in the new report generator. In particular, it does not show IP addresses for the most recent interval, as it currently only displays information stored in RRD files.