Hardware, Configuration
All I/O for the binary archive format is based on the LowLevelIO class in the Tools directory.
Via ToolsConfig.h, this class can be configured to use "fd" file descriptor I/O (open, write, lseek, ...) or "FILE *" calls from stdio (fopen, fwrite, ...).
In addition, all ChannelArchiver files can be compiled either with debug information or optimized, depending on the HOST_OPT setting, which is usually set in EPICS/base/config/CONFIG_SITE but can be overridden in ArchiverConfig.h.

The ChannelArchiver/Engine directory contains a simple benchmark program, bench.cpp. Depending on the I/O and debug settings, vastly different results were obtained:
  Machine            Settings    Values/sec. written
  800 MHz NT4.0      fd          35000
  800 MHz NT4.0      FILE        20000
  500 MHz RH 6.1     debug       60 (!)
  500 MHz RH 6.1     FILE, -O    14500
  2x333 MHz, RAID5   debug       42 (!)
  2x333 MHz, RAID5   FILE, -O    8600

In the Win32 case, raw fd I/O seems to be faster; debug settings have little influence.
For Linux, fd I/O is terribly slow in any case, as is FILE I/O compiled without the optimizing -O flag. For Linux, only FILE I/O with optimization yields acceptable results.

This underlying I/O performance limits the number of values that the ArchiveEngine can handle.
Archive Engine Limits
Test: "Up to 10000 values per second"
The ArchiveEngine was started with this configuration file:
#Archive channels of example CA server (excas)
!file_size 10
!write_period 10
fred     1.0  Monitor
freddy   1.0  Monitor
janet    0.1  Monitor
alan     1.0  Monitor
jane0    0.1  Monitor
jane1    0.1  Monitor
jane2    0.1  Monitor
# ... and so on until
jane999  0.1  Monitor

Those channels were served by the example CA server, running on a 233 MHz Linux machine, launched as:
excas -c1000

The "jane" channels change at about 10 Hz. Together with the other channels, including the array "alan", this should provide at least 10000 values per second.
Both machines were on a 10/100BaseT hub, but the Linux box only supports 10BaseT.
Observed behaviour
This plot shows the CPU load on a 800MHz PC (Windows NT 4.0) archiving about 10000 values per second:
While the machine is quite busy archiving, it does still respond to user input. It cannot be used for much else, though: additional load like launching and using Paint Shop Pro to create this CPU-load snapshot can cause delays, resulting in messages like
"Warning: WriteThread called while busy"
More load can lead to "overwrite" errors (= data loss) and is to be avoided.

This image shows the limit for the less performant Linux machine, where bench.cpp reported about 7700 writes/second (233 MHz, RedHat 6.1):
The setup is similar to the 10000 val/sec example but uses only 4000 values per second.
The next two images show a dual-233 Mhz pentium machine with a RAID-5 disk array archiving at different rates:
5000 values/sec
4000 values/sec
At 5000 values/sec, the write thread does not finish in time, so numerous "called while busy" warnings result. No overwrites were reported, but this rate can probably not be maintained. In the 4000 val/sec case the write thread is called every 10 seconds and finishes in time, resulting in many brief peaks in CPU load.
(Point 1: The average CPU load is constant, but the individual CPUs are loaded randomly.

Point 2: The machine and disks might be faster than the previous test machine, and the RAID array provides data safety and good read performance; the write performance, though, is no better than for the older PC.)

So one could conclude that archiving rates of 10k values/sec are possible on a dedicated machine, except for the additional problem of ...
Channel Access Flow Control
The Archive Engine uses a dedicated write thread to balance the CPU load. It also tries to increase the default TCP buffer size in order to buffer incoming values while the program is busy writing samples to the disk.
The current Channel Access library, though, silently implements a flow control mechanism. It is not based on a TCP buffer "high-water mark" idea; instead, it can drop values when the "ca_poll" routine is called successfully several times, causing it to believe that it might get behind the CA server.
Ways to detect losses due to flow control are:
- Instrument the CA server code, e.g. on a test IOC, to display dropped value counts.
- Have an IOC serve 0...10 ramps and compile the engine with CA_STATISTICS defined.
  (An ArchiveEngine compiled this way can only be used for this specific test!)

When this was done, the result showed occasional flow control losses at 10000 values per second, so the conclusion is that the current engine can archive "close to 10000 vps".
Behaviour when approaching limit
When archiving more and more values per second, the write thread needs an increasing amount of time. The peaks in the CPU load snapshot show the write thread running. Because the write thread takes certain mutex semaphores, the engine's web server is delayed while writing. (If the web server request does not need to access the channel data, as in the case of the "stop" request, the server and write thread actually run concurrently, depending only on the scheduling as handled by the operating system.)
When even more values are received, the write thread cannot flush the buffers within the configured write period. If this happens rarely, it simply results in "write thread called while busy" messages and no data loss. If the incoming data rate is too high, additional "overwrite" errors occur and data is lost.
The web server will always have to wait for the write thread to finish. Consequently, occasional delays in a browser request are expected. While approaching the load limit, this will happen more often, up to the point where the web interface no longer responds because the write thread is constantly running and data is lost.
ChannelArchiver Manual