gdb.info: Character Sets

Character Sets

If the program you are debugging uses a different character set to
represent characters and strings than the one GDB uses itself, GDB can
automatically translate between the character sets for you.  The
character set GDB uses we call the "host character set"; the one the
inferior program uses we call the "target character set".
   For example, if you are running GDB on a GNU/Linux system, which
uses the ISO Latin 1 character set, but you are using GDB's remote
protocol (*note Remote Debugging: Remote.) to debug a program running
on an IBM mainframe, which uses the EBCDIC character set, then the host
character set is Latin 1, and the target character set is EBCDIC.  If
you give GDB the command `set target-charset EBCDIC-US', then GDB
translates between EBCDIC and Latin 1 as you print character or string
values, or use character and string literals in expressions.
   GDB has no way to automatically recognize which character set the
inferior program uses; you must tell it, using the `set target-charset'
command, described below.
   Here are the commands for controlling GDB's character set support:
`set target-charset CHARSET'
     Set the current target character set to CHARSET.  We list the
     character set names GDB recognizes below, but if you type `set
     target-charset' followed by <TAB><TAB>, GDB will list the target
     character sets it supports.
`set host-charset CHARSET'
     Set the current host character set to CHARSET.
     By default, GDB uses a host character set appropriate to the
     system it is running on; you can override that default using the
     `set host-charset' command.
     GDB can only use certain character sets as its host character set.
     We list the character set names GDB recognizes below, and
     indicate which can be host character sets, but if you type `set
     host-charset' followed by <TAB><TAB>, GDB will list the host
     character sets it supports.
`set charset CHARSET'
     Set the current host and target character sets to CHARSET.  As
     above, if you type `set charset' followed by <TAB><TAB>, GDB will
     list the names of the character sets that can be used for both host
     and target.
`show charset'
     Show the names of the current host and target charsets.
`show host-charset'
     Show the name of the current host charset.
`show target-charset'
     Show the name of the current target charset.
   GDB currently includes support for the following character sets:
`ASCII'
     Seven-bit U.S. ASCII.  GDB can use this as its host character set.
`ISO-8859-1'
     The ISO Latin 1 character set.  This extends ASCII with accented
     characters needed for French, German, and Spanish.  GDB can use
     this as its host character set.
`EBCDIC-US'
`IBM1047'
     Variants of the EBCDIC character set, used on some of IBM's
     mainframe operating systems.  (GNU/Linux on the S/390 uses U.S.
     ASCII.)  GDB cannot use these as its host character set.
   Note that these are all single-byte character sets.  More work inside
GDB is needed to support multi-byte or variable-width character
encodings, like the UTF-8 and UCS-2 encodings of Unicode.
   Here is an example of GDB's character set support in action.  Assume
that the following source code has been placed in the file
`charset-test.c':
     #include <stdio.h>
     char ascii_hello[]
       = {72, 101, 108, 108, 111, 44, 32, 119,
          111, 114, 108, 100, 33, 10, 0};
     char ibm1047_hello[]
       = {200, 133, 147, 147, 150, 107, 64, 166,
          150, 153, 147, 132, 90, 37, 0};

     int
     main (void)
     {
       printf ("Hello, world!\n");
       return 0;
     }

   In this program, `ascii_hello' and `ibm1047_hello' are arrays
containing the string `Hello, world!' followed by a newline, encoded in
the ASCII and IBM1047 character sets.
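   As a quick check on that claim, the following C sketch decodes the
IBM1047 bytes through a small lookup table and compares the result with
the ASCII bytes.  It is only an illustration, not how GDB itself
performs the translation.  The table is read off by pairing the two
arrays position by position, so it covers just the eleven distinct byte
values that occur in `ibm1047_hello' and is not a complete IBM1047
table.  The helper names `ibm1047_to_ascii' and `decode_ibm1047' are
hypothetical, and the sketch assumes it is compiled on a system whose
own character set is ASCII.

```c
#include <stddef.h>
#include <string.h>

/* The two arrays from charset-test.c, declared unsigned here so that
   values above 127 are stored exactly as written.  */
static const unsigned char ascii_hello[]
  = {72, 101, 108, 108, 111, 44, 32, 119,
     111, 114, 108, 100, 33, 10, 0};
static const unsigned char ibm1047_hello[]
  = {200, 133, 147, 147, 150, 107, 64, 166,
     150, 153, 147, 132, 90, 37, 0};

/* Hypothetical helper: map one IBM1047 byte to its ASCII equivalent.
   Only the byte values used by ibm1047_hello are covered; any other
   value yields '?'.  */
static unsigned char
ibm1047_to_ascii (unsigned char c)
{
  switch (c)
    {
    case 200: return 'H';
    case 133: return 'e';
    case 147: return 'l';
    case 150: return 'o';
    case 107: return ',';
    case 64:  return ' ';
    case 166: return 'w';
    case 153: return 'r';
    case 132: return 'd';
    case 90:  return '!';
    case 37:  return '\n';
    case 0:   return '\0';
    default:  return '?';
    }
}

/* Hypothetical helper: decode the NUL-terminated IBM1047 string SRC
   into DST, which must have room for the terminator as well.  */
static void
decode_ibm1047 (const unsigned char *src, unsigned char *dst)
{
  size_t i = 0;

  do
    dst[i] = ibm1047_to_ascii (src[i]);
  while (src[i++] != 0);
}
```

   Calling `decode_ibm1047 (ibm1047_hello, out)' with a buffer `out' of
`sizeof ibm1047_hello' bytes should leave `out' byte-for-byte equal to
`ascii_hello'.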
   We compile the program, and invoke the debugger on it:
     $ gcc -g charset-test.c -o charset-test
     $ gdb -nw charset-test
     GNU gdb 2001-12-19-cvs
     Copyright 2001 Free Software Foundation, Inc.
     ...
     (gdb)
   We can use the `show charset' command to see what character sets GDB
is currently using to interpret and display characters and strings:
     (gdb) show charset
     The current host and target character set is `ISO-8859-1'.
     (gdb)
   For the sake of printing this manual, let's use ASCII as our initial
character set:
     (gdb) set charset ASCII
     (gdb) show charset
     The current host and target character set is `ASCII'.
     (gdb)
   Let's assume that ASCII is indeed the correct character set for our
host system -- in other words, let's assume that if GDB prints
characters using the ASCII character set, our terminal will display
them properly.  Since our current target character set is also ASCII,
the contents of `ascii_hello' print legibly:
     (gdb) print ascii_hello
     $1 = 0x401698 "Hello, world!\n"
     (gdb) print ascii_hello[0]
     $2 = 72 'H'
     (gdb)
   GDB uses the target character set for character and string literals
you use in expressions:

     (gdb) print '+'
     $3 = 43 '+'
     (gdb)

   The ASCII character set uses the number 43 to encode the `+'
character.
   GDB relies on the user to tell it which character set the target
program uses.  If we print `ibm1047_hello' while our target character
set is still ASCII, we get gibberish:
     (gdb) print ibm1047_hello
     $4 = 0x4016a8 "\310\205\223\223\226k@\246\226\231\223\204Z%"
     (gdb) print ibm1047_hello[0]
     $5 = 200 '\310'
     (gdb)
   If we type `set target-charset' followed by <TAB><TAB>, GDB tells
us the character sets it supports:
     (gdb) set target-charset
     ASCII       EBCDIC-US   IBM1047     ISO-8859-1
     (gdb) set target-charset
   We can select IBM1047 as our target character set, and examine the
program's strings again.  Now the ASCII string is wrong, but GDB
translates the contents of `ibm1047_hello' from the target character
set, IBM1047, to the host character set, ASCII, and they display
correctly:
     (gdb) set target-charset IBM1047
     (gdb) show charset
     The current host character set is `ASCII'.
     The current target character set is `IBM1047'.
     (gdb) print ascii_hello
     $6 = 0x401698 "\110\145%%?\054\040\167?\162%\144\041\012"
     (gdb) print ascii_hello[0]
     $7 = 72 '\110'
     (gdb) print ibm1047_hello
     $8 = 0x4016a8 "Hello, world!\n"
     (gdb) print ibm1047_hello[0]
     $9 = 200 'H'
     (gdb)
   As above, GDB uses the target character set for character and string
literals you use in expressions:

     (gdb) print '+'
     $10 = 78 '+'
     (gdb)

   The IBM1047 character set uses the number 78 to encode the `+'
character.
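   The two `print '+'' results can be pictured as a host-to-target
translation applied to each literal before it is evaluated.  The C
sketch below illustrates the idea under stated assumptions: the host
character set is ASCII, the `enum charset' type and the
`encode_literal' function are hypothetical names rather than GDB
internals, and the IBM1047 table holds only the `+' value given above
plus the values that can be read off the `ibm1047_hello' array.

```c
/* Hypothetical tag for the target character set in this sketch.  */
enum charset { CHARSET_ASCII, CHARSET_IBM1047 };

/* Hypothetical helper: return the number that encodes the ASCII host
   character C in the character set TARGET, or -1 if this partial
   table does not cover it.  */
static int
encode_literal (unsigned char c, enum charset target)
{
  if (target == CHARSET_ASCII)
    return c;                   /* Host and target agree.  */

  switch (c)                    /* Partial IBM1047 table.  */
    {
    case '+':  return 78;
    case 'H':  return 200;
    case 'e':  return 133;
    case 'l':  return 147;
    case 'o':  return 150;
    case ',':  return 107;
    case ' ':  return 64;
    case 'w':  return 166;
    case 'r':  return 153;
    case 'd':  return 132;
    case '!':  return 90;
    case '\n': return 37;
    default:   return -1;
    }
}
```

   With these definitions, `encode_literal ('+', CHARSET_ASCII)' gives
43 and `encode_literal ('+', CHARSET_IBM1047)' gives 78, matching the
values GDB printed for `$3' and `$10'.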