Characterization of the Traffic between SLAC and the Internet
Connie Logg
July 17, 2003, August 8, 2001, March 1, 2001


INDEX
[Analysis] [Challenges] [Introduction] [Methodology]
[Data mining]
Index to the SLAC Netflow Analysis

Introduction

The SLAC <-> Internet connection is instrumented with a Cisco Catalyst 6509 and an MSFC router. Cisco's Netflow is enabled on the switch, and Netflow records are sent to a Linux server. There the records are collected, buffered, and written to disk every 10 minutes by Cisco's Netflow Collector software. The Netflow data is then analyzed by Perl scripts written at SLAC.

There are several aspects to the analysis, including (* indicates that the information is viewable only within SLAC):

Analysis

Analysis performed includes:

Methodology

Terminology: Analysis methodology:

Challenges

Netflow running on the SLAC DMZ can generate over 8 million records a day. A record can represent more than one flow, so the challenge of analyzing the data correlates with the number of records cut per day, not the number of flows.

DMZ1 is only used for the route going to Stanford (i.e. not ESnet routes).

  • Application Port Identification - Please note that this analysis is evolving as research continues into how the various TCP and UDP ports are used. Documentation on port usage often indicates that a given port is used for a specific application in both TCP and UDP. Recent examination of the actual packets shows that in some cases this is not so. For example, the graphs above for TCP show some TCP AFS traffic, but an examination of these packets indicates that they are not AFS. In addition, ports which have been "defined" may also be used for other purposes.

  • Identifying the "Application" - Which port (source or destination) should be used to classify the application traffic? To make this decision, the following algorithm is used:

  • Naming the TCP and UDP application ports - What names should be used to name the TCP and UDP application ports? Different documents use different names.

  • IP Address to IP Name Translation - By far the greatest challenge in analyzing the Netflow data lies in resolving the IP addresses in a record to IP names. Translating a single address can take anywhere from less than 1 second to several minutes. With 2 IP addresses per record, it has been necessary to develop techniques for optimizing this process. Some tricks used are:

  • Understanding How Netflow Works - The documentation on the actual operation of Netflow is minimal and conflicting. Some examples are:
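As an aside on the IP address to IP name translation item above: one generic way to limit slow lookups is to resolve each distinct address only once and remember the answer, including failures. The sketch below is an illustration only and is not taken from the SLAC Perl scripts; the class name `ReverseLookupCache` and the stub resolver are hypothetical, and the default resolver uses Python's standard `socket.gethostbyaddr`.

```python
import socket


def _default_resolver(addr):
    """Reverse-resolve one IP address with the system resolver (may be slow)."""
    return socket.gethostbyaddr(addr)[0]


class ReverseLookupCache:
    """Hypothetical helper: remember the name found for each address.

    Every distinct address is resolved at most once. Failed lookups are
    cached as well, so an unresolvable address is not retried per record.
    """

    def __init__(self, resolver=None):
        # The resolver is injectable so the cache can be exercised offline.
        self._resolver = resolver or _default_resolver
        self._names = {}
        self.lookups = 0  # number of real resolver calls made

    def name_for(self, addr):
        if addr not in self._names:
            self.lookups += 1
            try:
                self._names[addr] = self._resolver(addr)
            except OSError:
                # socket.herror and socket.gaierror are OSError subclasses.
                # Fall back to the address itself and remember the failure.
                self._names[addr] = addr
        return self._names[addr]


def stub_resolver(addr):
    """Stand-in for DNS, holding two pairs copied from the example output."""
    table = {
        "134.79.33.11": "SSRL.SLAC.STANFORD.EDU",
        "171.64.15.100": "ELAINE25.STANFORD.EDU",
    }
    if addr not in table:
        raise OSError("no name for " + addr)
    return table[addr]


if __name__ == "__main__":
    cache = ReverseLookupCache(resolver=stub_resolver)
    for addr in ["134.79.33.11", "171.64.15.100", "134.79.33.11", "10.0.0.1"]:
        print(addr, cache.name_for(addr))
    print("resolver calls:", cache.lookups)
```

Because the same hosts tend to appear in many records, the number of resolver calls grows with the number of distinct addresses rather than with the number of records.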

    Data Mining and Validation Script

    FLOWMINE - /afs/slac/package/netmon/netflow/src/flowmine is a script which facilitates independent mining of the Netflow data. It can take arguments either on the command line or from a file. Arguments passed on the command line supersede any arguments in the file. The only required argument is the switch name. Note that the Netflow data is read protected, and only authorized users can access it.
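The precedence rule described above (command-line arguments supersede those in the file) can be sketched as follows. This is not flowmine's actual code, which is not shown in this document; it assumes the specification file holds the same `-option value` pairs, one per line, as the cfg1 example in this section. The function names `parse_options`, `read_spec_text`, and `merge_specs` are hypothetical.

```python
import shlex


def parse_options(tokens):
    """Turn ['-srcport', 'telnet', '-test'] into {'srcport': 'telnet', 'test': True}."""
    options = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if not token.startswith("-"):
            raise ValueError("expected an option name, got %r" % token)
        name = token[1:]
        # An option followed by a non-option token takes that token as its
        # value; otherwise it is treated as a bare switch such as -test.
        if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
            options[name] = tokens[i + 1]
            i += 2
        else:
            options[name] = True
            i += 1
    return options


def read_spec_text(text):
    """Parse spec-file contents; shlex keeps quoted values as one token."""
    tokens = []
    for line in text.splitlines():
        tokens.extend(shlex.split(line, comments=True))
    return parse_options(tokens)


def merge_specs(spec_text, argv):
    """Start from the spec file, then let command-line options override it."""
    merged = read_spec_text(spec_text)
    merged.update(parse_options(argv))
    return merged


if __name__ == "__main__":
    spec = '-srcport telnet\n-data "srcaddr;srcname;dstaddr;dstname;srcport;pkts;octets"\n'
    # -srcport on the command line replaces the value from the spec text.
    print(merge_specs(spec, ["-switch", "swh-dmz1", "-srcport", "ftp"]))
```

Applying the file first and the command line second is what makes the command line win: a later `update` overwrites any key both sources define.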
    
    flowmine -switch swh-dmz1 [-specfile /u2/CSCOnfc/html/examples/cfg1] [-test] -search_specifications
    
    ############################################################################################
    #
    # flowmine -switch switchname [-test] [-specfile file] [search_specifications]
    #
    #   takes as input specifications for mining flow data and where and what data to output
    #   Note that the switch name or IP address must be supplied on the command line. 
    #   If -test is desired, it must also be specified on the command line. Currently the output
    #   is written to STDOUT.  It is recommended that you pipe the command to more, less or 
    #   redirect it to a file.
    #
    #   Options provided on the command line override any in the specification file
    #
    #   options available only on the command line include:
    ##   -specfile file  where file is a file with the specs for the mining
    #
    ##   -test process the options and print out what the specs are for the 
    #          output, as well as mine the data
    #
    ##   options available in the specification file or on the command line:
    #   -date = timeframe - default "today"
    #    in the form: 
    #         "today", 
    #         "yesterday", 
    #         "date"
    #         "startdate;enddate"
    #         "3/21/01"
    #      
    #   -starttime - 24 hour time "start time frame" ex. "00:00"
    #   -endtime - 24 hour time "end time frame" ex "20:30"
    #
    #   -srcaddr list of sources
    #      if no destination list is provided, all traffic from the sources
    #      meeting the specs is mined
    #      These can be node names or IP addresses
    #   -dstaddr list of destinations
    #      if no source list is provided, all traffic to all destinations 
    #      meeting the specs is mined
    #      These can be node names or IP addresses
    #   -proto protocol (UDP, TCP, ICMP, GRE, other by number)
    #      list of protocols to be mined for specified port(s)
    #   -srcport 
    #      list of source ports (by number or from the keys in this list)
    #   -dstport 
    #      list of destination ports (by number or from the keys in this list)
    #   -tos (may not be currently implemented)
    #
    # OUTPUT Options  
    #   -csv 
    #      character to be used to separate the data in the output file,
    #      default is ","
    #   -data
    #     list of data values to be output in csv file, and the order. Default is all in netflow
    #     order
    #       srcaddr,dstaddr,srcport,dstport,prot,tos,pkts,octets,flows,starttime,endtime,activetime
    #     available data fields are:
    #       srcaddr,srcname,dstaddr,dstname,srcport,dstport,prot,tos,pkts,octets,flows,starttime,endtime,activetime
    #     if srcname and/or dstname are provided in this parameter, the script will attempt to 
    #       translate the IP address to a name
    #
    ##############################################################################################################
    
    Examples:
    
    flowmine -switch swh-dmz1 -srcport telnet -data "srcaddr;srcname;dstaddr;dstname;srcport;pkts;octets"
    gives:
    
    #srcaddr,srcname;dstaddr;dstname;srcport;pkts;octets
    171.64.15.100,ELAINE25.STANFORD.EDU,134.79.32.248,MICKY.SLAC.STANFORD.EDU,23,283,30499,
    134.79.33.11,SSRL.SLAC.STANFORD.EDU,24.176.209.28,C1133586-A.STCLA1.SFBA.HOME.COM,23,6,2141,
    171.64.11.151,ELF1.STANFORD.EDU,134.79.80.78,MDNTDL39.SLAC.STANFORD.EDU,23,31,2170,
    134.79.33.11,SSRL.SLAC.STANFORD.EDU,171.66.179.163,GEORGE-PBDSL2.STANFORD.EDU,23,363,22309,
    134.158.105.13,CCAHP03.IN2P3.FR,134.79.16.101,FLORA06.SLAC.STANFORD.EDU,23,12,564,
    
    or
    
    flowmine -switch swh-dmz1 -specfile /u2/CSCOnfc/html/examples/cfg1
    
    where
    
    /u2/CSCOnfc/html/examples/cfg1 contains:
    
    -srcport telnet
    -data "srcaddr;srcname;dstaddr;dstname;srcport;pkts;octets"
    
    will give the same answer.
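Output in the form shown in the example above can be loaded for further analysis with a few lines of code. The sketch below is an illustration under stated assumptions, not part of flowmine: it assumes the default "," separator, skips the header line beginning with "#", tolerates the trailing separator seen in the sample, and takes the field names from the caller (the same list given to -data). The names `parse_flowmine_output` and `octets_by_source` are hypothetical.

```python
from collections import Counter

# Fields converted to integers; chosen from the sample output (an assumption).
NUMERIC_FIELDS = ("pkts", "octets")


def parse_flowmine_output(text, fields, sep=","):
    """Return one dict per data line, keyed by the caller-supplied field names."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header line
        values = line.split(sep)
        if values and values[-1] == "":
            values.pop()  # sample lines end with a trailing separator
        if len(values) != len(fields):
            raise ValueError("expected %d fields: %r" % (len(fields), line))
        record = dict(zip(fields, values))
        for name in NUMERIC_FIELDS:
            if name in record:
                record[name] = int(record[name])
        records.append(record)
    return records


def octets_by_source(records):
    """Total the octets column for each source address."""
    totals = Counter()
    for record in records:
        totals[record["srcaddr"]] += record["octets"]
    return totals


# Header and data lines copied from the example output in this section.
SAMPLE = """\
#srcaddr,srcname;dstaddr;dstname;srcport;pkts;octets
171.64.15.100,ELAINE25.STANFORD.EDU,134.79.32.248,MICKY.SLAC.STANFORD.EDU,23,283,30499,
134.79.33.11,SSRL.SLAC.STANFORD.EDU,24.176.209.28,C1133586-A.STCLA1.SFBA.HOME.COM,23,6,2141,
171.64.11.151,ELF1.STANFORD.EDU,134.79.80.78,MDNTDL39.SLAC.STANFORD.EDU,23,31,2170,
134.79.33.11,SSRL.SLAC.STANFORD.EDU,171.66.179.163,GEORGE-PBDSL2.STANFORD.EDU,23,363,22309,
134.158.105.13,CCAHP03.IN2P3.FR,134.79.16.101,FLORA06.SLAC.STANFORD.EDU,23,12,564,
"""
FIELDS = ["srcaddr", "srcname", "dstaddr", "dstname", "srcport", "pkts", "octets"]

if __name__ == "__main__":
    for addr, total in octets_by_source(parse_flowmine_output(SAMPLE, FIELDS)).items():
        print(addr, total)
```

The field names are passed in explicitly rather than read from the header line, because the header in the sample mixes "," and ";" separators.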
    
    
    
    
    

    Please provide Feedback to the Designing Author: Connie Logg