AIDA Troubleshooting

 

This document describes programming problems that have occurred in Aida development, together with their solutions, as a reference for future development. 

Run-time Problems

This section describes problems that may occur at Aida run-time. 

General Runtime, not specific to client or server.

Problem: At server image activation, the following errors are reported:
Error [125] in bind() call!
err:: Address already in use
Socket transport failed to init.
Transport dt_socket failed to initialize, rc = -1.
FATAL ERROR in native method: No transports initialized
Cause: Another process on the same host is already using the port that the Aida server wants to use. Aida server ports are assigned in their *Server.conf file, so that Aida servers can be made "persistent" CORBA servers.
Solution: Search the port assignments reported by netstat or lsof for the port assigned to the Aida server in its *Server.conf file. Then arrange to change the port of the Aida server, or of the other process, so that there is no longer a conflict. In the following case a CaRepeater process had taken the port.
[mccas0]:perf/common/tool> more ${CD_CONFSYS}/${AIDA_CA_NAME}.conf
...
ooc.orb.oa.endpoint=iiop --port 58996

[mccas0]:perf/common/tool> lsof -i -P | grep 58996
caRepeate   906  cddev   11u  IPv4 0x301....56e8        0t0  TCP *:58996 (LISTEN)
caRepeate   906  cddev   12u  IPv4 0x301....0628      0t170  TCP somehost.slac.stanford.edu:58996->somehost.slac.stanford.edu:50797 (BOUND)
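
The same check can be made from Java before starting a server. The following is a minimal sketch only, not part of Aida: the class PortCheck and its method names are hypothetical. It pulls the port number out of an endpoint line of the form shown above and tests whether a listening socket can currently be bound to it.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the Aida code base.
final class PortCheck {
    private static final Pattern PORT = Pattern.compile("--port\\s+(\\d+)");

    // Returns the port named in an endpoint line, or -1 if none is found.
    static int extractPort(String confLine) {
        Matcher m = PORT.matcher(confLine);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    // True if this JVM can bind a listening socket to the port right now.
    // Expects a port in the range 0-65535.
    static boolean isPortFree(int port) {
        try (ServerSocket probe = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            // Typically "Address already in use"; may also be a permission problem.
            return false;
        }
    }
}
```

A false result only says that the port cannot be bound; lsof or netstat is still needed to find out which process holds it.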

 

Problem: At image activation, the following exception is thrown: 

Exception java.lang.NoSuchMethodError: org.omg.PortableInterceptor.IORInterceptorOperations.components_established(Lorg/omg/PortableInterceptor/IORInfo;)V
at com.ooc.OB.PIManager.componentsEstablished(PIManager.java:634)
at com.ooc.OBPortableServer.POA_impl.init(POA_impl.java:479)
at com.ooc.OBPortableServer.POA_impl.<init>(POA_impl.java:676)
at com.ooc.OB.ORBControl.initializeRootPOA(ORBControl.java:580)

Cause: ORBACUS non-compliance with Java 1.4
Solution: Recompile your code under Java 1.3. On Solaris this means just "setenv JAVAVER 1.3" prior to compilation.

 

Client run-time problems

Problem: Exception java.lang.NoClassDefFoundError
Cause: The run-time environment can't find a class
Solution: 
  • Check the pathname used in the "java" command. 
  • Check that the CLASSPATH environment variable in effect includes all the class file directories and .jar files you need (eg OB.jar, classes111.jar etc). 
  • If in an IDE, check that the CLASSPATH within the IDE is set correctly. 
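
The checks above can be partly automated. The following is a sketch under assumptions, not an Aida utility: the class ClasspathCheck and its methods are hypothetical names. It reports classpath entries that do not exist on disk and tests whether a named class can be loaded.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the Aida code base.
final class ClasspathCheck {
    // Returns the classpath entries that do not exist on disk, in order.
    static List<String> missingEntries(String classpath) {
        List<String> missing = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            if (entry.isEmpty()) {
                continue;
            }
            if (!new File(entry).exists()) {
                missing.add(entry);
            }
        }
        return missing;
    }

    // True if the class can be found by this class's loader (without initializing it).
    static boolean canLoad(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

For example, ClasspathCheck.missingEntries(System.getProperty("java.class.path")) lists the entries of the classpath actually in effect that point nowhere.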

 

Problem: Exception org.omg.CORBA.BAD_PARAM

Cause: The package name with which the client was compiled did not match the package name with which the server was compiled. In this case the server was just defined as in "package Slc", but the client had been built using CORBA client side code generated from IDL in which the full package name "edu.stanford.slac.aida.slc" was used. This caused the id() check in the <interface>.narrow( OR ) call to fail, which throws CORBA.BAD_PARAM.
Solution: Make sure the package names match. This may mean changing the IDL and re-IDL compiling the server, and restarting it so that it inserts a corrected Object Reference (which includes the package name) into the Stringified Object Reference (SOR) it publishes. The package name should be specified in the jidl compile line, and there should be no #prefix directive in the IDL file (which overrides --prefix-package): Eg jidl --prefix-package da edu.stanford.edu.aida idl\da.idl 

 

Problem: ORB<property> unknown message issued by CORBA.ParseArgs
Cause: You are using a version of the ORB which does not recognize the property. Perhaps the property has been deprecated, or you are using an old ORB that predates it.
Solution: Check the valid properties for the version of the ORB you are using. Check that you are using the version of the ORB you think you are.

 

Server Run-time problems

IMR Problems

Problem: An IMR client, like the IMR Console, can't start a server. 
Symptom: The server may appear to start momentarily, but then goes away.
Cause: The bat file that starts the server produces output. 
Solution: Put "@echo off" at the top of the bat file.
Cause: The CLASSPATH under which the server is started remotely, is incorrect or incomplete.
Solution: Check the CLASSPATH on the host running the server. Note that, for NT-based servers run by the IMR, the CLASSPATH used is the one set from the System control panel, not any CLASSPATH set by the bat file the IMR uses to start the server, such as Start<servername>Server.bat. If you change the required CLASSPATH of a server executed remotely by the IMR, you must change the CLASSPATH on the host of every OAD that may run the server, and you must restart the IMR in a new window so that it gets the new CLASSPATH.
Symptom: 

[ IMR: exec: ca: success ]
[ IMR: updating: ca state: stopped ]

Cause: The server MAX SPAWN Count has been exceeded. In particular, check that an instance of the server isn't already running on the host as a background process which is not showing up in the IMR console for whatever reason. 

Diagnose on unix with: slcs1> ps -ef | grep greg, looking, for instance, for java jdk processes running in the background.

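Solution: Kill all the background processes of the server that are already running before attempting to start a new instance from the IMR. E.g. slcs1> ps -ef | awk '/greg.*jdk1/ {print "kill "$2}' > killgreg.sht. This writes one kill command per matching process to killgreg.sht, which can then be reviewed and executed.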
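
For reference, the logic of the awk one-liner above can be sketched in Java as follows. This is an illustration only: the class PsFilter is a hypothetical name, and it works on lines of ps -ef output supplied by the caller rather than running ps itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; mirrors awk '/regex/ {print "kill "$2}'.
final class PsFilter {
    // Builds a "kill <pid>" line for every ps -ef line that matches the regex.
    // The PID is taken to be the second whitespace-separated field, as awk's $2.
    static List<String> killCommands(List<String> psLines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> commands = new ArrayList<>();
        for (String line : psLines) {
            if (!pattern.matcher(line).find()) {
                continue;
            }
            String[] fields = line.trim().split("\\s+");
            if (fields.length >= 2) {
                commands.add("kill " + fields[1]);
            }
        }
        return commands;
    }
}
```

As with the awk version, the result should be reviewed before any of the commands are executed, since the pattern may match processes other than the server.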

 

Problem: Can't remove a non-existent server from the IMR.
Symptom: If you stop an OAD which the IMR thinks is running a server that is in fact not running (for whatever reason), then you can't "Stop..." or "Delete..." the server from the IMR console. You get contradictory messages: you can't stop the server because the IMR says it's not running (which is true), and you can't delete the server because the IMR says it's running!
Cause: Unknown
Solution: Restart the OAD on the host. This prompts the IMR to update its database, and mark the server "not-running".

 

Problem: A restarted IMR can't make contact with OAD on some host
Symptom: After restarting the IMR, say to change from administrative mode to non-admin, the IMR cannot contact the OAD on some host, and issues "warning: IMR: Could not contact OAD at: corbaloc::<host>:/OAD".
Cause: Unknown
Solution: Restart the OAD on the host. The restarted OAD does successfully contact the IMR and tells it that it's running. 

 

Problem: OAD cannot be contacted by IMR after IMR restart
Cause: Unknown. Of course check that the ooc.imr.*port* settings match in both OAD and IMR conf files.
Solution: Restart the OAD on each host which could not be contacted after restarting the IMR. Ie, on each host, issue a command of the form:
imr -ORBconfig %AIDASCRIPT%\oad.conf 
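
The check that the ooc.imr.*port* settings agree can be sketched in Java as below. This is a sketch under assumptions: the class ImrPortCheck is a hypothetical name, and it assumes both conf files have already been read into java.util.Properties objects (their key=value layout suits Properties.load).

```java
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of the Aida code base.
final class ImrPortCheck {
    // Returns the ooc.imr.*port* keys that are set in both configurations
    // but with different values, in sorted order.
    static Set<String> mismatchedPorts(Properties imr, Properties oad) {
        Set<String> mismatched = new TreeSet<>();
        for (String key : imr.stringPropertyNames()) {
            if (!key.startsWith("ooc.imr.") || !key.contains("port")) {
                continue;
            }
            String other = oad.getProperty(key);
            if (other != null && !other.trim().equals(imr.getProperty(key).trim())) {
                mismatched.add(key);
            }
        }
        return mismatched;
    }
}
```

An empty result means no conflicting port values were found among the keys present in both files; it does not show that every required port property is actually set.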

 

Problem:  imradmin operation hangs (never completes) 
Cause: Unknown. 
Solution: Check that the IMR process itself isn't waiting for a RETURN to be typed to it!

 

Problem:  Client can't contact IMR
Symptom: exception message "IMRDomain not currently reachable"
Cause: The ports don't agree on client and server.
Solution: Check the port specification on both sides; perhaps add an explicit port number specification of 10000.

 

SLC Data Source Servant

Problem: java.lang.UnsatisfiedLinkError
MCCDEV> java "edu.stanford.slac.aida.slc.Server"
java.lang.UnsatisfiedLinkError: no CorbaDBShr in java.library.path
at java.lang.ClassLoader.loadLibrary (ClassLoader.java:1325) (pc 343)

 

Cause: CorbaDBShr was not defined
Solution: Run @SETLOGICAL.COM to do the define/log CorbaDBShr. Make sure CorbaDBShr exists where the logical points to.

 

Problem: java.lang.UnsatisfiedLinkError
MCCDEV> java "edu.stanford.slac.aida.slc.Server"
Server ready
Init called!
java.lang.UnsatisfiedLinkError
        at edu.stanford.slac.aida.slc.SlcI_impl.DbInit
        at edu.stanford.slac.aida.slc.SlcI_impl.Init (SlcI_impl.java:47) (pc 27)

 

Cause: A JNI routine couldn't be resolved at runtime because the VMS shareable in which it was defined hadn't been rebuilt with the correctly defined UNIVERSAL name for the JNI routine. In this case, the long package name "edu.stanford.slac.aida" had caused the fully qualified name of the JNI routine dbInit to be longer than 31 characters. Names longer than 31 characters are automatically shortened to 31 characters by the Java and javah compilers. Care must be taken to compile the .c source code implementing the JNI routines with the correct qualifier, /name=(shortened, as_is), so that the C compiler produces the same shortened symbol name, and to use SCAN_GLOBALS_FOR_OPTION.COM to build a .OPT file which correctly defines the UNIVERSAL symbols when linking the shareable. 
Solution:  First check that the name of the called function as defined in the .c file matches the name defined in the output of the javah compiler (the .h file) exactly, in length and case. If that doesn't solve it, check whether the name is longer than 31 characters and, if it is, that the correct, shortened names are defined in the shareable being called.
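
To see in advance whether a native method will run into the 31 character limit, the JNI symbol name can be computed from the class and method names. The following is a simplified sketch: the class JniName is a hypothetical name, and it handles only the "." and "_" escapes of the JNI naming rules (Unicode characters and overloaded-method signatures are left out). It does not attempt to reproduce the shortening algorithm itself.

```java
// Hypothetical helper for illustration; simplified JNI name mangling.
final class JniName {
    static final int VMS_SYMBOL_LIMIT = 31;

    // Escapes one identifier: '_' becomes "_1" (other JNI escapes omitted).
    private static String escape(String identifier) {
        return identifier.replace("_", "_1");
    }

    // Builds the short-form JNI symbol for a native method of the given class.
    static String mangle(String className, String methodName) {
        StringBuilder symbol = new StringBuilder("Java");
        for (String part : className.split("\\.")) {
            symbol.append('_').append(escape(part));
        }
        return symbol.append('_').append(escape(methodName)).toString();
    }

    // True if the symbol is longer than the 31 character limit described above.
    static boolean exceedsVmsLimit(String symbol) {
        return symbol.length() > VMS_SYMBOL_LIMIT;
    }
}
```

For the class edu.stanford.slac.aida.slc.SlcI_impl and the method DbInit, mangle produces a 49 character symbol, so the shortened form is the one that must appear in the shareable.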

 

Problem: server never gets up and says "Server up and ready"
Cause: Some other process is holding the Oracle db lock while the SLC server is trying to put its OR into the db.
Solution: Find the process holding the Oracle db lock, and stop it. Eg check SQLPlus is not holding lock on interfaces table.

Unix

Problem: command line in shell script appears to have become garbled. 
Cause: 1) Unix (on slcs1 anyway) seems to have a limited command line buffer, possibly 512 characters. Check that, after variable expansion, the line that must be interpreted is still < 512 characters. 

2) Also check whether any environment variable used has a hidden character, perhaps a CR at the end due to being defined in a script that was mistakenly edited on NT before being executed on Unix! 

Solution: Shorten the command line. Eg, remove -classpath and use the CLASSPATH env variable instead. 
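
Hidden characters of the kind described in 2) are hard to spot by eye. The following sketch, with the hypothetical class name EnvCheck, flags environment variables whose values contain a carriage return or another ASCII control character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of the Aida code base.
final class EnvCheck {
    // True if the value contains a CR or another ASCII control character.
    // Tab is allowed, since it is sometimes used deliberately.
    static boolean hasHiddenControlChar(String value) {
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c != '\t' && (c < 0x20 || c == 0x7f)) {
                return true;
            }
        }
        return false;
    }

    // Names of the variables with a suspicious value, in sorted order.
    static List<String> suspiciousNames(Map<String, String> env) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, String> entry : new TreeMap<>(env).entrySet()) {
            if (hasHiddenControlChar(entry.getValue())) {
                names.add(entry.getKey());
            }
        }
        return names;
    }
}
```

Calling EnvCheck.suspiciousNames(System.getenv()) from a small test program run in the same shell shows which variables to redefine.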

 

Problem: cvs commands causing unexpected results or failing to take effect
Cause: AFS token expired
Solution: Acquire new AFS token, and re-issue CVS command

 

Packaging and Deployment

Jar file packaging

Problem: java -jar <jarfile> command results in "Failed to load Main-Class manifest attribute"
Eg:
Cause: The format of the contents of the manifest template file which specifies the main class is very particular. There must be a <CR> at the end of the line, even if there is only one line (the one containing the name of the main class).
Solution: Add a <CR> to the end of the Main-Class line. 

 

Problem: The jar file packaging command, given with the m option to include a manifest template file that specifies a main class, results in "java.io.IOException: invalid header field name: Main-Class".
Eg: 
Cause: The Main-Class attribute in the manifest template file, in the above case called MainClass.txt, had a space before the ":" (!)
Solution: Remove the space between "Main-Class" and ":" in the manifest file.
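
Both manifest problems above can be caught before running the jar command. The following is a minimal sketch with the hypothetical class name ManifestTemplateCheck; it takes the text of the template file and reports an unterminated last line and any whitespace before the ":" of a header.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; checks only the two problems above.
final class ManifestTemplateCheck {
    // Returns a description of each problem found; empty if none.
    static List<String> problems(String template) {
        List<String> found = new ArrayList<>();
        if (!template.endsWith("\n") && !template.endsWith("\r")) {
            found.add("last line is not terminated by a newline");
        }
        for (String line : template.split("\\r?\\n|\\r")) {
            if (line.startsWith(" ")) {
                continue; // continuation line, not a header
            }
            int colon = line.indexOf(':');
            if (colon > 0 && Character.isWhitespace(line.charAt(colon - 1))) {
                found.add("whitespace before ':' in: " + line);
            }
        }
        return found;
    }
}
```

A template consisting of the single line "Main-Class: edu.stanford.slac.aida.slc.Server" followed by a newline passes both checks.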

 

Compile Time

VMS

Problem: Can't find a class.
Eg MCCDEV> javac -classpath ".;udslc/greg/dev/aida" @allslc.list
edu/stanford/slac/aida/slc/SlcIPOA.java:21: Superclass org.omg.PortableServer.Se
rvant of class edu.stanford.slac.aida.slc.SlcIPOA not found.
extends org.omg.PortableServer.Servant
Cause: Incorrect CLASSPATH, or JAVA$CLASSPATH
Solution: Don't override classpath unless you're sure JAVA$CLASSPATH is wrong or incomplete. Check the logical JAVA$CLASSPATH, in the JOB table. Eg just MCCDEV> javac @allslc.list

 

Problem: Wrong or no package name alert
Eg error: File ./edu/stanford/slac/aida/slc/SlcI_impl.class does not contain type e
du.stanford.slac.aida.slc.SlcI_impl as expected, but class Slc.SlcI_impl. Please
 remove the file, or make sure it appears in the correct subdirectory of the cla
ss path.
edu/stanford/slac/aida/slc/Server.java:16: Class edu.stanford.slac.aida.slc.SlcI
_impl not found.
      SlcI_impl slciImpl = new SlcI_impl();
      ^
Cause: In the .java file the "package" directive at the top of the file was wrong: it read just "package Slc", not "package edu.stanford.slac.aida.slc;"
Solution: Correct the package name in the package directive at the top of the .java file.
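
A mismatch of this kind can be found mechanically by comparing each source file's package directive with its directory. The following is a sketch with the hypothetical class name PackageCheck; it assumes the path is given relative to the root of the source tree.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the Aida code base.
final class PackageCheck {
    private static final Pattern PACKAGE =
            Pattern.compile("(?m)^\\s*package\\s+([\\w.]+)\\s*;");

    // Returns the package named in the source text, or "" if there is none.
    static String declaredPackage(String source) {
        Matcher m = PACKAGE.matcher(source);
        return m.find() ? m.group(1) : "";
    }

    // Returns the package implied by a path relative to the source root.
    static String expectedPackage(String relativePath) {
        String path = relativePath.replace('\\', '/');
        if (path.startsWith("./")) {
            path = path.substring(2);
        }
        int slash = path.lastIndexOf('/');
        return slash < 0 ? "" : path.substring(0, slash).replace('/', '.');
    }

    // True if the directive in the source agrees with the file's location.
    static boolean matches(String relativePath, String source) {
        return expectedPackage(relativePath).equals(declaredPackage(source));
    }
}
```

For the file in the error above, the expected package is edu.stanford.slac.aida.slc, while the directive read Slc, so matches returns false.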

 

Problem: Undefined Symbols at link
Eg MCCDEV> @buildclib
%LINK-W-NUDFSYMS, 3 undefined symbols:
%LINK-I-UDFSYM,         Java_edu_stanford_slac_aida_slc_SlcI_1impl_DbGet
%LINK-I-UDFSYM,         Java_edu_stanford_slac_aida_slc_SlcI_1impl_DbInit
%LINK-I-UDFSYM,         Java_edu_stanford_slac_aida_slc_SlcI_1impl_DbPut
%LINK-W-USEUNDEFSYMV, undefined symbol Java_edu_stanford_slac_aida_slc_SlcI_1impl_DbInit referenced 
        in symbol vector option
%LINK-W-USEUNDEFSYMV, undefined symbol Java_edu_stanford_slac_aida_slc_SlcI_1impl_DbGet referenced
        in symbol vector option
%LINK-W-USEUNDEFSYMV, undefined symbol Java_edu_stanford_slac_aida_slc_SlcI_1impl_DbPut referenced
        in symbol vector option
%DCL-I-SUPERSEDE, previous value of CORBADBSHR has been superseded
Cause: Note that in this case the undefined symbols were in the symbol vector option. The reason is that the symbols are > 31 characters long, and the linker .OPT file that defined them as UNIVERSAL referred to them by their full names. But java shortens names over 31 characters. The VMS utility DCL file SCAN_GLOBALS_FOR_OPTION.COM should be run on the .OBJ file to build the .OPT file, translating the long names to ones no longer than 31 characters using the same algorithm as used by the javac compiler.
Solution: Get SCAN_GLOBALS_FOR_OPTION.COM and run it on the .obj files which define the functions whose symbols are undefined in the message. SCAN_GLOBALS_FOR_OPTION.COM requires JAVA$BUILD_OPTION.EXE and JAVA$STUBS_DEFINED.EXE, and both executables must be in the working directory from which SCAN_GLOBALS_FOR_OPTION.COM is run. SCAN_GLOBALS_FOR_OPTION.COM is part of the JNI_EXAMPLES saveset distributed from Compaq. It may be found in udslc:[rcs.java.exampes]

 



Author:  Greg White, 15-Jul-2001
Modified by: