Some questions and answers on tagged containers
-----------------------------------------------

98.06.16 version - original
(Any future changes will be flagged explicitly.)

Most of these questions were originally submitted in their present form
in late April. Some of them have their roots in earlier versions of the
tagged container software, but I've continued to include them as I
believe the answers are still of use and interest.

Most of the answers date from 2 June. Some minor edits have been
performed on both the questions and answers since then, mostly in order
to make them more tightly connected to current TC technology, and to
extend some of the answers in response to followup questions.

The original set of questions came from Al Eisner of the SVT group. I
thank him for posing them and for helping me make sure that my answers
were illuminating, as well as for granting permission for the exchange
to be made available to the public.

* * *

> 1) For some time I've had some incomplete "temporary" documentation on
> OdfContainer. By poking around yesterday from the DataFlow web page,
> I found an undated revised version of this, which at least has most
> headings filled in. Is this current documentation? Or, perhaps a better
> way to ask this: is it the most recent and is it reasonably accurate?

The best available documentation - which isn't very good - is the
combination of the old OdfContainer-dfa.html file in the OdfContainer
package, the migration guide Migration-9805.html in that package, and
these answers. Brian Naranjo is working on proper merged documentation
returned to the original FrameMaker format.

> I'll assume the answer to one of those questions is yes, and continue on
> that basis. A lot of the document is clear, especially the introductory
> portions.
> However, I've had difficulty understanding some things, not
> necessarily (although possibly) because of the concepts, but at least in
> part because of innocuous-seeming English words that are being used in
> manners less than clear to me. So, part of what I'm after here is a
> glossary. I'll start with some questions regarding TypeId's, since that
> seems rather central.
>
> 2) What is a secondary key for TypeId's in practice? That is, could you
> provide a concrete example?

Secondary IDs were requested by Dave Brown for use in calibration. Until
the V00-03-xx revisions to OdfContainer and the deployment of the
DataFlow alpha release there were no known specific uses, so I couldn't
describe them. We are now using them in a few places.

The most important present example is the input container. The input
container at the segment level has (though this is still in flux) a
primary identity of "odfXTC", a secondary identity of "Any" (i.e., the
default, or, in its implementation, zero), a primary contains ID of
"odfElement", and a secondary contains ID of {subsystem}.

If you were to look at the code which builds them, schematically you
would see something like

    _identity = odfTypeIdQualified( odfTypeRep<odfXTC>::primary() );
    _contains = odfTypeIdQualified( odfTypeRep<odfElement>::primary(),
                                    odfTypeNumSecondary::Id_Svt );

for SVT input containers, although this is not exactly how the input
container's initialization is done internally. As you can see, the new
version of the system includes an odfTypeNumSecondary class that defines
specific secondary key values.

> 3) The document states that the base class odfTC can only contain children
> of one type. Is that also true for derived classes? For extensions? I
> suspect the answer is Yes, since an iterator would otherwise be difficult.

Yes, the policy is that all TCs must be homogeneous.

To be completely pedantic about it, the real policy is that TCs *when
viewed through iterators* must appear homogeneous.
You are perfectly free to include additional data in an odfTC or odfXTC
subclass, and access it through member functions of the specific
subclass. This might, typically, be summary data describing the set of
children in the TC.

In some cases, for example for subsystems where chip and channel data
are inextricably merged in the input container (such as the SVT), it has
been possible to write more than one type of iterator for a given
container, with each iterator returning a specific type of data, either
chips or channels. Even then, however, through any given iterator, the
container still looks homogeneous.

> 4) Under "How to deal with TypeIds", it says that each subsystem should
> assign and register its own primary Ids. Does that include even such
> basic types as d_Ushort? Also, the recommendation appears to be that
> there be an independent set per package, rather than per subsystem,
> although the base values are labelled as being per subsystem.

Your summary is correct; we are aware of that mismatch. The rationale
was that it would have been too complicated to have assigned allocation
ranges in advance for all packages, or to have assigned a new range
every time a new package was created. However, it was also required that
the packages themselves be self-contained at compile and link time. In
order to bridge this gap, it seemed like a reasonable request to make of
a subsystem that it provide some central administration of its IDs. I
would in general suggest that internal subdivisions of the subsystem's
allocation range by package would probably work well, but I don't want
to enforce that.

See the migration guide for information on how to register TypeIDs.

The fundamental types unsigned char, signed char, d_[U]Short, d_[U]Long,
d_Float, and d_Double are centrally registered automatically, if you
include the OdfContainer library in your link.
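
To make the suggestion in the answer to Question 4 concrete, here is a
minimal sketch of how a subsystem might carve its TypeId allocation
range into per-package blocks. Everything in it is hypothetical: the
namespace, the base value, the block size, and the package names are
invented for illustration, and TypeIdValue merely stands in for d_ULong
(assumed here to be a 32-bit unsigned integer). The actual registration
procedure remains the one described in the migration guide.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for d_ULong (assumption: 32-bit unsigned).
typedef std::uint32_t TypeIdValue;

namespace svtTypeIds {   // hypothetical subsystem-wide header

// Invented numbers: the start of the subsystem's range and the
// number of IDs set aside for each package within it.
const TypeIdValue subsystemBase = 0x00010000;
const TypeIdValue idsPerPackage = 0x0100;

// Hypothetical packages belonging to the subsystem.
enum Package { SvtRaw = 0, SvtDigi = 1, SvtCalib = 2 };

// First ID in the block reserved for a package.
inline TypeIdValue packageBase(Package p) {
    return subsystemBase + static_cast<TypeIdValue>(p) * idsPerPackage;
}

// ID for the n-th type of a package; n must stay inside the block.
inline TypeIdValue primaryIdFor(Package p, TypeIdValue n) {
    assert(n < idsPerPackage);
    return packageBase(p) + n;
}

}  // namespace svtTypeIds
```

With a layout of this kind each package only needs to know its own
Package value, so packages can stay self-contained at compile and link
time while the subsystem header is the single place where blocks are
handed out.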

> 4-followup) If I need the TypeId for one of the fundamental types you
> list, is the proper procedure to use odfTypeRep::primary()? What
> happens if a type is not yet registered?

If a type T is not registered, you will get a link error for any use of
odfTypeRep<T>::primary(). Registration is done by static initialization,
which is a link-time operation.

In general, you may use either "odfTypeRep<T>::primary()" or a
construction of the form "odfTypeIdPrimary( odfTypeNumPrimary::Id_T )"
to obtain the primary TypeId for the fundamental types and for the core
OdfContainer, OdfContainerTools, and Odf classes. The advantages of the
second form are that it does not require type registration, and that it
compiles to fewer instructions. The advantage of the first form is that
it can be used in writing a class templated on T.

As of this writing (98.6.2), the following are available through
odfTypeNumPrimary:

    Id_unsigned_char   Id_signed_char
    Id_d_UShort        Id_d_Short
    Id_d_ULong         Id_d_Long
    Id_d_Float         Id_d_Double
    Id_odfTC           Id_odfXTC
    Id_odfTestXTC      Id_odfSimpleXTC
    Id_odfContigXTC    Id_odfFlatTCBase
    Id_odfElement      Id_odfMap
    Id_odfMapNode      Id_odfMapChannelSet

> 5) In the first sentence for odfTypeIdQualified in the reference manual,
> what do "fronting" and "fully qualified" mean?

"Fronting" is just jargon. What I had in mind is that primary and
qualified TypeIds are both just "really" d_ULong values, and the two
different classes, odfTypeIdPrimary and odfTypeIdQualified, are just
facades for this underlying representation.

"Fully qualified" is sloppy writing. "Qualified" would be the correct
version.

> 6) The constructors for odfTypeIdQualified and some other classes take
> as arguments objects of types like odfTypeIdPrimary_t. What are these?
> (I didn't find them in the documentation, and they don't appear in the
> file names for the current OdfContainer package.)

They are just typedefs for d_ULong. They are defined in
OdfContainer/odfTypeId.hh .
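
The "facade" idea from the answers to Questions 5 and 6 can be sketched
as two thin classes wrapped around one raw integer. This is only an
illustration under stated assumptions: PrimaryId, QualifiedId and
RawTypeId are hypothetical names (not odfTypeIdPrimary,
odfTypeIdQualified or d_ULong themselves), the raw value is assumed to
be 32 bits wide, and the packing of the secondary key into the high 16
bits is invented here, since the real layout is not described in these
answers. The only detail taken from above is that a secondary value of
zero means "Any".

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t RawTypeId;   // stand-in for d_ULong (assumed 32-bit)

// Facade over a raw primary TypeId value.
class PrimaryId {
public:
    explicit PrimaryId(RawTypeId v) : value_(v) {
        assert(v <= 0xFFFFu);      // must fit the assumed 16-bit field
    }
    RawTypeId value() const { return value_; }
private:
    RawTypeId value_;
};

// Facade over a raw value carrying a primary and a secondary key.
// Assumed packing (illustration only): secondary in the high 16 bits,
// primary in the low 16 bits.  Secondary 0 plays the role of "Any".
class QualifiedId {
public:
    explicit QualifiedId(PrimaryId p, RawTypeId secondary = 0)
        : value_((secondary << 16) | p.value()) {
        assert(secondary <= 0xFFFFu);
    }
    PrimaryId primary() const { return PrimaryId(value_ & 0xFFFFu); }
    RawTypeId secondary() const { return value_ >> 16; }
    RawTypeId value() const { return value_; }
private:
    RawTypeId value_;
};

// The wrappers add type safety but no storage beyond the raw value.
static_assert(sizeof(PrimaryId) == sizeof(RawTypeId), "facade adds no data");
static_assert(sizeof(QualifiedId) == sizeof(RawTypeId), "facade adds no data");
```

The explicit constructors are a design choice of the sketch: they keep a
bare integer from being silently taken for a TypeId, which is the main
thing a facade of this kind buys over the plain typedef.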

> 7) In the summary of odfTypeRep, what does "polymorphic" mean? If one
> actually had multiple objects of this class (for a particular T), wouldn't
> the accessors like getSize() give the same answer for all of them?
> More generally, what is a "polymorphic delete" (e.g., as near the end
> of the Arena section of the introduction)?

"Polymorphic" is used in its usual C++ sense. In this particular case,
there is a single common base class for all odfTypeRep<T>'s, namely
odfTypeRepBase. You can treat any odfTypeRep<T> object as if it were an
object of type odfTypeRepBase, and still use getPrimary(), getSize(),
etc. on it. There is no application for the base class in currently
released code, but there will be later.

For any given T, of course, you are correct. Regardless of the number of
instances you have instantiated of a specific odfTypeRep<T> class, all
of their getPrimary member functions, for example, will return the same
value.

"Polymorphic delete" is more C++ jargon. It is an attempt to delete an
object of a class through a pointer to a base class of that class. If
the classes in question have virtual destructors, this is permissible.
You may write

    class Base {
    public:
      // ...
      virtual ~Base();
      // ...
    };

    class Derived : public Base {
    public:
      // ...
      virtual ~Derived();
      // ...
    };

    // ...

    // Create an object of class Derived, but treat it as a Base.
    Base* bp = new Derived();

    // ...

    // Delete the object.
    delete bp;

This is not allowed for tagged containers, as they do not and may not
under any circumstances have virtual destructors or any other virtual
functions. Therefore, you must not use polymorphic deletes on tagged
containers.

In fact, you will probably almost never find an occasion under which it
would be appropriate to use "delete" on a tagged container at all, as
the extended object system means that most TCs do not own their own
storage, but rather are created within some "arena" that owns the
storage. You would be more likely to use "delete" on an arena than on
the TC itself.
Use of "delete" on a TC with extended data is likely to result in a
memory leak.

> Next, a few things on arenas, etc.
>
> 8) In the construction of an odfExtendedObject, is EXTENSION a maximum size
> which must therefore be known in advance? The Reference section on this
> class says that it is "intended to be used with odfTransition as its
> template argument". Does that mean it is irrelevant to a pure-offline
> application such as Digi-to-Output TC? If so, what is the offline
> alternative? Or is none needed?

Yes, EXTENSION is a fixed maximum size which must be known at compile
time.

odfExtendedObject is probably less useful since the V00-03-xx changes to
OdfContainer. However, it could still be used as an alternative to
odfSimpleArena. It's usable for creating an extended object of any
subclass of odfXTC which has a default constructor. So, if your output
TC has a default constructor which allows you to create an empty
instance of the TC, to which you can then append its data items, you
could use odfExtendedObject along with odfXTCBufferArena to manage
memory for the extended data.

> 9) Actually, I'm confused about something else here: arenas are
> introduced seemingly with the notion of extending the length of an
> object, implying an extent larger than sizeof(); whereas the concrete
> arena examples discussed seem to refer to reserving chunks of memory,
> with a particular sizeof(), but a smaller extent. Could you clarify
> this?

Only the odfExtendedObject/odfXTCBufferArena scheme has the "reverse"
behavior you describe.

In any arena application, something must reserve a "big block" of memory
as the starting point for any work - in particular for any
"new(odfArena)" operations.

In the odfSimpleArena case, a completely empty block of memory is
reserved by creating an object of that class which owns the storage. It
is then filled via "new(odfArena)".
In the odfExtendedObject case, an instance of a specific odfXTC subclass
is created with a reserved area of memory following it -- with the
odfExtendedObject object actually owning all the memory.
odfXTCBufferArena is then used to manage that memory.

In the post-V00-03-xx version of OdfContainer, the odfXTC class itself
can be used as a memory allocator, albeit one without bounds checking,
unlike the odfArena classes. This method of allocation must be used in
order to get optimal performance in DataFlow, particularly in feature
extraction. To use it, just include "OdfContainer/odfXTCNew.hh" and use
the "new (odfXTC&)" or "new (odfXTC*)" operators. Storage can be
allocated via the odfPool mechanisms - see the core DataFlow
documentation - or via odfSimpleArena.

> 10) The impression I get from "Use of odfXTCBufferArena" is that one puts
> the data into the extended TC one datum at a time. Is this true, both
> for this and for other methods of creating an extended TC?

This is a policy decision which is entirely up to you. You can append
data one item at a time or in bulk, as long as you correctly update the
extent of the base odfXTC before you hand it off to any other code to
process.

> 11) This is in part a follow-up to question (3), but is also about the
> mechanism for inserting data in the extended part of a TC, via
> new(arena) (or some other unspecified mechanism). Is there any required
> relationship between the data type inserted via new(arena) and the
> "contains field" of the TC? Specifically: suppose we have several
> different types, each a wrapper for one fundamental type (e.g.,
> d_Ushort), but with different behavior. Can one build an output TC
> containing in its extended portion mixed data of these types, provided
> there is enough information for an Iterator to figure out how to
> navigate? The impression I get from your documentation is that this is
> not permissible; one might instead have to extract the d_Ushort's and
> use them directly.
> (Or is there a sort of casting which could be done,
> i.e., is it both possible and desirable?).

There is no necessary relationship at all between the types of the
objects created via "new (odfArena)" or "new (odfXTC)" and the
"contains" value of a container to which they may be appended.

Normally one would expect that there would be such a relationship, but
the real requirement is that the "usual" iterator to be applied to the
container will return "T*" for the type "T" corresponding to the primary
"contains" value. If you can use this mechanism to hide some casting
that you do internally, and to hide a non-uniform structure of the
actual data in the TC, that's OK, though you may thereby be designing a
system that is difficult to understand or maintain.

Of course, "contains" can only represent one specific type. For a
homogeneous container, naturally, primary "contains" should correspond
to the C++ type of the contained object. (There is no general policy for
the use of the secondary part of "contains", as mentioned in the answer
to Question 2 above.)

For a heterogeneous container, there is no official policy except in one
special and important case, described below. Apart from that case, I
would recommend a) rethink: do you really need a heterogeneous
container, and then b) have "contains" match the "most important" or
"most frequently used" contained type.

A potentially important warning for the future: if/when we have
"iterator registration" working, allowing automatic browsing of entire
container hierarchies, only contained objects matching "contains" will
be guaranteed to be accessible through such a browser.

The special case is that where all the contained objects share what I'll
call a "non-trivial" common base class, where the container can be
navigated successfully without knowing the concrete types of its
children, and where the base class of the children makes available
enough information to make the concrete children useful through that
base class.
In some cases of this sort, it may be reasonable for "contains" to match
the base class rather than the concrete derived classes.

The canonical example of this is that "odfFlatTCBase" is considered an
acceptable value for "contains" for a container that in fact contains
several objects of differing types "odfFlatTC<T>" (i.e., with different
specific "T"s) -- all of which in fact inherit from odfFlatTCBase. If
the outer container can be iterated over in terms of odfFlatTCBase
without knowing the concrete subclasses involved (as is the case for
odfContigXTC), then this scheme is useful -- and in fact it provides the
much-discussed "mezzanine" container for subsystems with inhomogeneous
data. This has been discussed in a recent note from the DIRC group on
their feature extraction software.

> Next, on odfTCIterator:
>
> 12)What does odfTCIterator::operator*() do?

It dereferences to the "current" item pointed to by the iterator.
Instead of returning a pointer to that item, it returns a reference.

    odfTCIterator<Foo> it;
    // ... initialize iterator ...

    // Two ways to access the third item in a container:
    // (Remember the post-increment semantics of the next() operator!)

    // 1) Through the pointer returned by next()
    {
      Foo* fp = it.first();  // point to the first item
      fp = it.next();        // point to the first item, advance to the second
      fp = it.next();        // point to the second item, advance to the third
      fp = it.next();        // point to the third item, advance to the fourth
      Foo f = *fp;           // copy over the third item
    }

    // 2) Through operator*
    {
      Foo* fp = it.first();  // point to the first item
      fp = it.next();        // point to the first item, advance to the second
      fp = it.next();        // point to the second item, advance to the third
      Foo f = *it;           // copy over the third item
    }

Note that operator* has no way of returning a null value to indicate
"error" or "off the end", so under these circumstances using that
operator is not allowed and can lead to an addressing exception or other
fatal error.
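
The post-increment behaviour of next() and the lack of an error return
from operator* can be seen in a small stand-alone sketch. FooIterator
and Foo below are hypothetical stand-ins written for this note; they are
not the odfTCIterator interface, and they simply walk a plain array. The
assert in operator* is an addition of the sketch, there to make the
"off the end" misuse visible in a debug build.

```cpp
#include <cassert>
#include <cstddef>

struct Foo { int value; };

// Hypothetical iterator over a contiguous run of Foo objects.
class FooIterator {
public:
    FooIterator(Foo* begin, std::size_t count)
        : begin_(begin), end_(begin + count), cur_(begin) {}

    // Reset to the first item and return it (null if the run is empty).
    Foo* first() { cur_ = begin_; return cur_ != end_ ? cur_ : nullptr; }

    // Return the current item, then advance; null once off the end.
    Foo* next() { return cur_ != end_ ? cur_++ : nullptr; }

    // Reference to the current item.  There is no null to hand back,
    // so the caller must not use this once the iterator is off the end.
    Foo& operator*() const { assert(cur_ != end_); return *cur_; }

private:
    Foo* begin_;
    Foo* end_;
    Foo* cur_;
};

// Way 1: keep the pointer returned by the third call to next().
inline int thirdViaPointer(FooIterator it) {
    Foo* fp = it.first();
    fp = it.next();   // first item, iterator now at the second
    fp = it.next();   // second item, iterator now at the third
    fp = it.next();   // third item, iterator now at the fourth
    return fp ? fp->value : -1;
}

// Way 2: advance twice, then dereference the iterator itself.
inline int thirdViaDeref(FooIterator it) {
    it.first();
    it.next();        // iterator now at the second item
    it.next();        // iterator now at the third item
    return (*it).value;
}
```

Code that cannot be sure how many items remain should test the pointer
returned by next() rather than rely on operator*.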

> 13)What is the role of the odfAdrContext argument to odfTCIterator::use?
> Does it say where to start? Or is it used purely to associate addresses
> with the TC itself (and hence make it possible for the iterator to return
> the addresses at any position)?

The latter. It is not expected that every node in the full event's TC
hierarchy will itself contain enough data for an iterator to be able to
return full dAdr and pAdr values without a "hint" from the
odfAdrContext.

> 14) What does "clone" mean, both generally and in the context of the create()
> documentation? The description of create() says that an uninitialized
> copy is created, i.e., without even a TC specified; that hardly fits my
> naive reaction to the word.

In the specific context of odfTCIterator and its related classes, the
"clone" returned by the "create()" operator is an object of the same
concrete subclass of odfTCIterator as the object on which create() was
invoked, though returned by pointer to the base class. "Clone" is fairly
widely used in the C++ literature to describe this operation. It does
not in general carry a definite implication as to whether the entire
state of the "cloned" object should be copied.

Note that the word "uninitialized" you mentioned above referred only to
the fact that the clone should behave as though "use" had not been
called on it. It _will_ have whatever initialization is done by the
iterator's constructor(s).

The motivation for this scheme is the creation of an "iterator
registry", in which instances of iterators are stored, indexed by
{identity, contains} key pairs. One would not want these iterators to be
pre-associated with specific containers, but rather just to be ready to
have use() called on them (or on further clones of them) as required
for, e.g., browsing a container hierarchy.

> Miscellany:
>
> 15)What is a "self-relative pointer"?

An ordinary pointer points to an absolute memory address.
This is clearly inappropriate for inclusion in data that is to be
transmitted around "over the wire" and recreated on remote machines, or
even for data in shared memory, which may be mapped at different
addresses in different processes.

The basic solution to this is the relative pointer -- an offset from
some known base address. One way of locating data items within an
extended object might be relative to the beginning of the object, for
instance. As long as this convention is known, it will work on virtually
all platforms at virtually all absolute addresses at which the data may
be loaded.

However, this scheme requires a convention defining the location of the
base against which the relative pointers will be applied. A simple and
globally applicable convention is that of the self-relative pointer. The
SRP is a value which is to be interpreted as an offset from the address
of the SRP itself in memory, rather than from some outside "base point".

The odfSelfRelPointer templated class allows the construction of
type-safe self-relative pointers using any fundamental signed type as
the offset. Users are responsible for checking that the offsets don't
exceed the numeric limits of the offset type. Remember, the offsets are
the distance in memory between an SRP and the object to which it points.
If, for instance, the SRP is on the stack (an automatic variable) and
the data to which it points is on the heap, on some platforms these
address spaces are very far apart and only "wide" offset types will be
usable. "Narrow" SRPs (based on "signed char" or "signed short"/d_Short)
should be used in a manner which avoids the creation of automatic
temporaries; in particular, they should never be passed to or returned
from functions by value!

Time permitting, bounds checking on SRPs, selected at compile time, may
be provided at some point. For the time being, care must be taken.

> Enough on the general document, for the moment.
> I'll turn next to your
> recent posting on "TCs in AbsEvent".
> 16)In order to read this document with greater comprehension, it would
> be helpful to know: what is a "handle"?

A "handle" in general usually is a name for a pointer to a pointer.

An odfTCHandle is a pointer to a pointer to an odfTC with extended data,
with the special feature that it has the "correct" ownership properties
for the storage occupied by the TC and any extended data. The "correct"
properties may vary depending on how the TC and the odfTCHandle were
created; specific subclasses of odfTCHandle have different specific
ownership properties.

It is intended, however, that the odfTCHandle base class can be used
safely pretty much as though it were an ordinary C++ object. In
particular, it may be deleted (subject to other conventions and
policies, such as the usual BaBar policy that all objects in an AbsEvent
are owned by the event and may not be deleted singly) with the assurance
that no memory leak is being created.

-----------------------------------------------------------------------------
Gregory Dubois-Felsmann