Wednesday, November 26, 2008

my VIM setting

" All system-wide defaults are set in $VIMRUNTIME/debian.vim (usually just
" /usr/share/vim/vimcurrent/debian.vim) and sourced by the call to :runtime
" you can find below. If you wish to change any of those settings, you should
" do it in this file (/etc/vim/vimrc), since debian.vim will be overwritten
" everytime an upgrade of the vim packages is performed. It is recommended to
" make changes after sourcing debian.vim since it alters the value of the
" 'compatible' option.

" This line should not be removed as it ensures that various options are
" properly set to work with the Vim-related packages available in Debian.
runtime! debian.vim

" Uncomment the next line to make Vim more Vi-compatible
" NOTE: debian.vim sets 'nocompatible'. Setting 'compatible' changes numerous
" options, so any other options should be set AFTER setting 'compatible'.
"set compatible

" Vim5 and later versions support syntax highlighting. Uncommenting the next
" line enables syntax highlighting by default.
syntax on

" If using a dark background within the editing area and syntax highlighting
" turn on this option as well
"set background=dark

" Uncomment the following to have Vim jump to the last position when
" reopening a file
if has("autocmd")
au BufReadPost * if line("'\"") > 0 && line("'\"") <= line("$")
\| exe "normal g'\"" | endif
endif

" Uncomment the following to have Vim load indentation rules according to the
" detected filetype. Per default Debian Vim only load filetype specific
" plugins.
if has("autocmd")
filetype indent on
endif

set wm=8 " set wrapmargin
set nohls " turn off highlight on search
set et " turn on expand tab
" " for Makefiles
" " added some special formatting in Makefiles
autocmd BufEnter ?akefile* set noet ts=8 sw=8 nocindent list lcs=tab:>-,trail:x
" for source code
autocmd BufEnter *.cpp,*.h,*.c,*.java,*.pl set et ts=3 sw=3 cindent
" change the filetype
autocmd BufEnter *.pro,*.prolog set et ts=3 sw=3 cindent ft=prolog
" for html
autocmd BufEnter *.html set et ts=3 sw=3 wm=8 nocindent

"set softtabstop=3
"set shiftwidth=3

" The following are commented out as they cause vim to behave a lot
" differently from regular Vi. They are highly recommended though.
set showcmd " Show (partial) command in status line.
"set showmatch " Show matching brackets.
"set ignorecase " Do case insensitive matching
set smartcase " Do smart case matching
"set incsearch " Incremental search
"set autowrite " Automatically save before commands like :next and :make
"set hidden " Hide buffers when they are abandoned
"set mouse=a " Enable mouse usage (all modes) in terminals

set foldmethod=syntax
set foldlevel=100
" added for taglists
let Tlist_Show_One_File=1
let Tlist_Exit_OnlyWindow=1
let g:winManagerWindowLayout='FileExplorer|TagList'
nmap wm :WMToggle<cr>
" Source a global configuration file if available
" XXX Deprecated, please move your changes here in /etc/vim/vimrc
if filereadable("/etc/vim/vimrc.local")
source /etc/vim/vimrc.local
endif

Big endian VS little endian, which one is better ?

Which is Better?

You may see a lot of discussion about the relative merits of the two
formats, mostly religious arguments based on the relative merits of
the PC versus the Mac. Both formats have their advantages and
disadvantages.

In "Little Endian" form, assembly language instructions for picking up
a 1, 2, 4, or longer byte number proceed in exactly the same way for
all formats: first pick up the lowest order byte at offset 0. Also,
because of the 1:1 relationship between address offset and byte number
(offset 0 is byte 0), multiple precision math routines are
correspondingly easy to write.

In "Big Endian" form, by having the high-order byte come first, you
can always test whether the number is positive or negative by looking
at the byte at offset zero. You don't have to know how long the number
is, nor do you have to skip over any bytes to find the byte containing
the sign information. The numbers are also stored in the order in
which they are printed out, so binary to decimal routines are
particularly efficient.
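
The two layouts can be illustrated with a short Python sketch that uses only the standard library. The values 0x12345678, 0x1234, and -2 are arbitrary examples chosen for illustration; they do not come from the text above.

```python
# Illustrative sketch: byte layouts of a 32-bit integer in both formats.
value = 0x12345678

little = value.to_bytes(4, "little")  # lowest-order byte at offset 0
big = value.to_bytes(4, "big")        # highest-order byte at offset 0

print(little.hex())  # 78563412
print(big.hex())     # 12345678

# Little endian: offset 0 holds the low byte regardless of the width.
low_byte_16 = (0x1234).to_bytes(2, "little")[0]
low_byte_32 = (0x1234).to_bytes(4, "little")[0]


def is_negative_big_endian(raw: bytes) -> bool:
    """Big endian: the two's-complement sign bit lives in the byte at offset 0."""
    return bool(raw[0] & 0x80)


neg = (-2).to_bytes(4, "big", signed=True)
print(is_negative_big_endian(neg))  # True
print(is_negative_big_endian(big))  # False
```

Note that the sign test never needs the length of the number, while the low-byte read never needs an offset calculation; that is the trade-off described above.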

Monday, November 24, 2008

install puppy linux on thumb drive without cdrom

Puppy Linux can run in live CD mode and can also boot from USB. To
make a Linux system that can be taken anywhere, I want to install it
on a thumb drive. The standard method is to download the .iso file,
burn a bootable CD, and run the install app after booting from that
CD. The problem is that I don't have a writable CD; I just want to
install it on the USB disk.

It is simple:

1. Format the thumb drive with a FAT filesystem.
I am not sure whether it must be FAT16, but a "no bootable operating
system" error occurred when I used FAT32 instead. On Linux the
command is
mkfs.vfat -F 16 /dev/sdb1
(assuming /dev/sdb1 is the thumb drive). If you are using Windows, a
separate formatting tool may be needed.

2. Get the .iso file, extract it, and copy all of the following files
to the thumb drive.

-r--r--r-- 1 root root 2048 2008-11-02 10:22
-rw-r--r-- 1 root root 1008 2008-10-18 15:03 boot.msg
-rw-r--r-- 1 root root 1268722 2008-11-02 10:22 initrd.gz
-rw-rw-r-- 1 1026 1026 12241 2008-11-01 13:34 isolinux.bin
-rw-r--r-- 1 root root 112 2008-11-02 10:22 isolinux.cfg
-rwxr--r-- 1 root root 95563776 2008-11-02 10:22 pup_411.sfs
-rw-r--r-- 1 root root 1627180 2008-11-02 10:18 vmlinuz

3. Rename isolinux.cfg to syslinux.cfg and delete the text
"pmedia=cd" from it.

4. run syslinux /dev/sdb1 (if this is the thumb drive)

5. reboot with BIOS configured to "boot from USB disk"
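
The edit in step 3 can be sketched in Python as a plain text transformation. This is only an illustration: the function name and the sample configuration contents below are hypothetical, and the real isolinux.cfg on the ISO may look different.

```python
def isolinux_to_syslinux(cfg_text: str) -> str:
    """Return cfg_text with every 'pmedia=cd' boot argument removed."""
    out_lines = []
    for line in cfg_text.splitlines():
        words = [w for w in line.split(" ") if w != "pmedia=cd"]
        out_lines.append(" ".join(words))
    return "\n".join(out_lines) + "\n"


# Hypothetical isolinux.cfg contents, for illustration only.
sample = (
    "default puppy\n"
    "label puppy\n"
    "kernel vmlinuz\n"
    "append initrd=initrd.gz pmedia=cd\n"
)
result = isolinux_to_syslinux(sample)
print(result, end="")
```

The returned text would then be saved as syslinux.cfg on the thumb drive before running syslinux in step 4.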

Booting is a bit slow, but the system runs fast once it is up, because
the whole image is loaded into RAM during the boot stage and Linux
then runs from RAM. As you know, however, the storage medium is a USB
flash device, so saving data may take a long time, and flash should
not be written to too frequently. Currently I don't have enough time
to look into how the system is optimized for this medium. Things that
might be worth looking at:
a. udev may be needed, because it creates the filesystem and device
nodes only in RAM, not on the USB disk.
b. check the block device driver
c. ?
d. how logs are created
e. anything special for package management

Anyway, it is a good and useful Linux distribution.

Friday, November 21, 2008

Linux Serial Console HOWTO


Have you ever needed to connect a dumb terminal (like a Wyse 50) to a Linux host? Do you need to log in to a Linux server from a laptop to perform administrative functions, because there is no monitor or keyboard attached to the server? If you are accustomed to administering routers, switches, or firewalls in this manner, then you may be interested in doing the same with some of your GNU/Linux hosts. This HOWTO will explain, step-by-step, how to set up a serial console for Red Hat 9, although most of it should apply to other distributions as well.

Why did I write this document? Although there are lots of documents available on the Internet dealing with Linux serial ports, most of them seemed to be either out of date, or focused on modem dial-in/dial-out. I wanted concise documentation on how to set up simple terminal access via RS-232-C serial ports for Red Hat 9.


I was using Red Hat 9 for this test. My test machine consisted of:

  • Motherboard: Gigabyte Technology GA-7VA motherboard (Rev. 2.0)
  • Chipset: VIA KT400A
  • CPU: AMD-K7 (Duron 1400)
  • RAM: 256MB DDR333
  • Serial Ports: 2 built-in ports with 16550A UARTs, DB-9 male
  • Linux kernel: 2.4.20-24.9

Step 1: Check your system's serial support

First, let's make sure that your operating system recognizes serial ports in your hardware. You should make a visual inspection and make sure that you have one or more serial ports on your motherboard or add-in PCI card. Most motherboards have two built-in ports, which are called COM1: and COM2: in the DOS/Windows world. You may need to enable them in BIOS before the OS can recognize them. After your system boots, you can check for serial ports with the following commands:

[root@oscar root]# dmesg | grep tty
ttyS0 at 0x03f8 (irq = 4) is a 16550A
ttyS1 at 0x02f8 (irq = 3) is a 16550A

[root@oscar root]# setserial -g /dev/ttyS[01]
/dev/ttyS0, UART: 16550A, Port: 0x03f8, IRQ: 4
/dev/ttyS1, UART: 16550A, Port: 0x02f8, IRQ: 3

As you can see, the two built-in serial ports are /dev/ttyS0 and /dev/ttyS1.

Step 2: Configure your inittab to support serial console logins

The /etc/inittab file must be reconfigured to allow serial console logins. You will note that the mingetty daemon is used to listen for virtual consoles (like the 6 that run by default with your keyboard and monitor). You will need to configure agetty or mgetty to listen on the serial ports, because they are capable of responding to input on physical serial ports. In the past, I have used both full-featured gettys. In this document, I will only discuss agetty, since it is already included in the default Red Hat 9 installation. It handles console/dumb terminal connections as well as dial-in modem connections.

What is a getty?

A getty is a program that opens a tty port, prompts for a login name, and runs the /bin/login command. It is normally invoked by init.

Before you edit /etc/inittab, which is a very important config file, you should make a backup copy:

[root@oscar etc]# cp /etc/inittab /etc/ 

The required /etc/inittab additions are the agetty lines under the "Run agetty on COM1/ttyS0 and COM2/ttyS1" comment:


# System initialization.

l0:0:wait:/etc/rc.d/rc 0
l1:1:wait:/etc/rc.d/rc 1
l2:2:wait:/etc/rc.d/rc 2
l3:3:wait:/etc/rc.d/rc 3
l4:4:wait:/etc/rc.d/rc 4
l5:5:wait:/etc/rc.d/rc 5
l6:6:wait:/etc/rc.d/rc 6

ca::ctrlaltdel:/sbin/shutdown -t3 -r now

pf::powerfail:/sbin/shutdown -f -h +2 "Power Failure; System Shutting Down"

# If power was restored before the shutdown kicked in, cancel it.
pr:12345:powerokwait:/sbin/shutdown -c "Power Restored; Shutdown Cancelled"

# Run gettys in standard runlevels
1:2345:respawn:/sbin/mingetty tty1
2:2345:respawn:/sbin/mingetty tty2
3:2345:respawn:/sbin/mingetty tty3
4:2345:respawn:/sbin/mingetty tty4
5:2345:respawn:/sbin/mingetty tty5
6:2345:respawn:/sbin/mingetty tty6

# Run agetty on COM1/ttyS0 and COM2/ttyS1
s0:2345:respawn:/sbin/agetty -L -f /etc/issueserial 9600 ttyS0 vt100
s1:2345:respawn:/sbin/agetty -L -f /etc/issueserial 38400 ttyS1 vt100
#s1:2345:respawn:/sbin/agetty -L -i 38400 ttyS1 vt100

# Run xdm in runlevel 5
x:5:respawn:/etc/X11/prefdm -nodaemon

agetty options explained:

  • -L    force line to be local line with no need for carrier detect (when you have no modem).
  • -f    alternative /etc/issue file. This is what a user sees at the login prompt.
  • -i    do not display any messages at the login prompt.
  • 9600    serial line rate in bps. Set this to your dumb terminal or terminal emulator line rate.
  • ttyS0    this is the serial port identifier.
  • vt100    is the terminal emulation. You can use others, but VT100 is the most common or "standard". Another widely used terminal type is VT102.
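
To make the layout of an inittab entry concrete, here is a small Python sketch that splits one of the agetty lines above into its four colon-separated fields (id, runlevels, action, process). The class and function names are hypothetical helpers for illustration, and picking the last three words of the command as rate, port, and terminal type only works for lines shaped like the ones in this HOWTO.

```python
from typing import NamedTuple


class InittabEntry(NamedTuple):
    ident: str
    runlevels: str
    action: str
    process: str


def parse_inittab_line(line: str) -> InittabEntry:
    # The process field may itself contain colons, so split at most 3 times.
    ident, runlevels, action, process = line.strip().split(":", 3)
    return InittabEntry(ident, runlevels, action, process)


entry = parse_inittab_line(
    "s0:2345:respawn:/sbin/agetty -L -f /etc/issueserial 9600 ttyS0 vt100"
)
argv = entry.process.split()
baud, port, term = argv[-3:]

print(entry.ident, entry.runlevels, entry.action)  # s0 2345 respawn
print(baud, port, term)                            # 9600 ttyS0 vt100
```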

Possible serial line rates (sometimes called baud rates) for the 16550A UART:

  • 110 bps
  • 300 bps
  • 1200 bps
  • 2400 bps
  • 4800 bps
  • 9600 bps
  • 19,200 bps
  • 38,400 bps
  • 57,600 bps
  • 115,200 bps

I have tried all of these line rates. 9600 bps is generally O.K., and is a very common setting for networking hardware. 38,400 bps is the speed of the standard Linux console, so it is my second choice. If your dumb terminal or terminal emulator cannot handle 38,400 bps, then try 19,200 bps: it is reasonably speedy and you will not be annoyed.
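
To get a feel for what these rates mean in practice, the following Python sketch estimates character throughput. It assumes 8 data bits, no parity, and 1 stop bit, so each character costs 10 bits on the wire including the start bit. The 80x24 screen size is also an assumption made for illustration.

```python
# Assumed framing: 1 start bit + 8 data bits + no parity + 1 stop bit.
BITS_PER_CHAR = 1 + 8 + 0 + 1


def chars_per_second(bps: int) -> float:
    return bps / BITS_PER_CHAR


def seconds_to_fill_screen(bps: int, cols: int = 80, rows: int = 24) -> float:
    # Time to send one full screen of text, ignoring escape sequences.
    return cols * rows / chars_per_second(bps)


for rate in (9600, 19200, 38400):
    print(rate, chars_per_second(rate), seconds_to_fill_screen(rate))
```

Under these assumptions a full screen takes about 2 seconds at 9600 bps and about half a second at 38,400 bps, which matches the impression that the faster rates feel noticeably snappier.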

Here was my custom issue file, /etc/issueserial. It uses escape sequences defined in the agetty manpage to add some useful information, such as the serial port number, line speed, and how many users are currently logged on:

Connected on \l at \b bps
\U

Now, you must activate the changes that you made in /etc/inittab. This is done with the following command, which forces the init process to re-read the configuration file:

[root@oscar root]# init q 

Now, let's make sure that the agetty process is listening on the serial ports:

[root@oscar root]$ ps -ef | grep agetty
root 958 1 0 Dec13 ttyS0 00:00:00 /sbin/agetty -L -f /etc/issueserial 9600 ttyS0 vt100
root 1427 1 0 Dec13 ttyS1 00:00:00 /sbin/agetty -L -f /etc/issueserial 38400 ttyS1 vt100

Step 3: Test serial port login with an external dumb terminal or terminal emulator

I have tested this setup with a WYSE dumb terminal, a Linux laptop running Minicom, and Windows 2000/XP laptops running HyperTerminal. They all worked just fine.

Terminal settings:  should be 9600, N, 8, 1. Terminal emulation should be set to VT100 or VT102. Turn flow control off. If you want to use the 38,400 bps serial port on ttyS1, then your settings should be adjusted to 38400, N, 8, 1.

Cable:  To connect a laptop to the serial port on the Linux host, you need to have a null-modem cable. The purpose of a null-modem cable is to permit two RS-232 DTE devices to communicate with each other without modems between them. While you can construct this yourself, a good, sturdy manufactured null-modem cable is inexpensive and will last longer.

If you insist on making the cable yourself, then check out Nullmodem.Com for the wiring and pinout diagram.

Connectors:  Motherboard serial ports are typically male DB-9 connectors, but some serial ports use DB-25 connectors. You may need some DB-9 to DB-25 converters or gender-changers in order to connect to your terminal. For a typical laptop to server connection, a DB-9 null-modem cable should be sufficient.

Here is what you should see on the dumb terminal or terminal emulator:

Connected on ttyS1 at 38400 bps
3 users login:

Note:  If you want to be able to log in via serial console as the root user, you will need to edit the /etc/securetty config file and add entries for the serial ports (ttyS0 and ttyS1 in this setup).

Step 4: Modifying the agetty settings

If you want to change the baud rate or some other agetty setting, you will need to perform these 3 steps:

  1. Modify the /etc/inittab configuration file
  2. Activate the config change by forcing init to re-read the config file
  3. Restart the agetty daemons

Here is an example of steps 2 and 3:

[root@oscar root]# init q
[root@oscar root]# pkill agetty

Optional:  Configure serial port as THE system console

You can use options in /etc/grub.conf to redirect console output to one of your serial ports. This can be handy if you do not have a keyboard or monitor available for the Linux host in question. You can also see all of the bootup and shutdown messages from your terminal. In this example, we will make the /dev/ttyS1 port be the console. The text to add is the console=ttyS1,38400 argument at the end of the kernel line:

# grub.conf generated by anaconda
title Red Hat Linux (2.4.20-24.9)
root (hd0,0)
kernel /vmlinuz-2.4.20-24.9 ro root=LABEL=/ console=ttyS1,38400
initrd /initrd-2.4.20-24.9.img

Now, if you drop your system into single user mode with the "init 1" command, you will still be able to administer the system from your serial-connected terminal. No monitor or keyboard is required!

Warning!:   The kudzu hardware detection program may "choke" on boot when the serial port becomes the console, instead of the video adapter. To remedy this situation, you should disable kudzu (assuming that your hardware is configured properly and won't be changing). This is how you would do that:

[root@oscar root]# chkconfig kudzu off
[root@oscar root]# chkconfig --list kudzu
kudzu 0:off 1:off 2:off 3:off 4:off 5:off 6:off

You should also know how to break into the Grub bootloader during system startup and edit the kernel line. By deleting the console argument from the kernel line, you can boot the system with the standard console, which uses the video card and attached keyboard. You have been warned!


Now, you should be able to login from the serial ports on your GNU/Linux host. This could be useful for maintenance or for serving a whole room full of dumb terminals. In the future, I will investigate a PCI multiport serial card in the latter role.

Have fun!

Saturday, November 15, 2008

the max number of threads that can be created in linux


kernel/fork.c: fork_init()
max_threads = mempages / ( 8 * THREAD_SIZE / PAGE_SIZE);
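
Here is a worked example of the formula in Python. The page size (4 KiB) and kernel stack size (8 KiB) are assumptions typical of 32-bit x86, and mempages is approximated as total RAM divided by the page size; the kernel's actual page count will differ somewhat.

```python
PAGE_SIZE = 4096    # assumed: 4 KiB pages
THREAD_SIZE = 8192  # assumed: 8 KiB kernel stack per thread


def max_threads(ram_bytes: int) -> int:
    mempages = ram_bytes // PAGE_SIZE
    # Same operator order as the C expression: 8 * THREAD_SIZE / PAGE_SIZE
    return mempages // (8 * THREAD_SIZE // PAGE_SIZE)


limit = max_threads(256 * 1024 * 1024)
print(limit)  # 4096
```

Rearranging the formula gives max_threads * THREAD_SIZE = mempages * PAGE_SIZE / 8, so the default limit caps the kernel stacks of all threads at roughly one eighth of memory.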

Itanium C++ ABI: Exception Handling

Itanium C++ ABI: Exception Handling ($Revision: 1.22 $)



In this document, we define the C++ exception handling ABI, at three levels:
  1. the base ABI, interfaces common to all languages and implementations;
  2. the C++ ABI, interfaces necessary for interoperability of C++ implementations; and
  3. the specification of a particular runtime implementation.

This specification is based on the general model described roughly in the Itanium Software Conventions and Runtime Architecture Guide. However, the Level I (base ABI) specification here contradicts that document in some particulars, and is being proposed as a modification. That document describes a framework which can be used by an arbitrary implementation, with a complete definition of the stack unwind mechanism, but no significant constraints on the language-specific processing. In particular, it is not sufficient to guarantee that two object files compiled by different C++ compilers could interoperate, e.g. throwing an exception in one of them and catching it in the other.

In Section I below, we will elaborate missing details from this base document, largely in the form of specifying the APIs to be used in accessing the language-independent stack unwind facilities, namely the unwind descriptor tables and the personality routines. This specification should be implemented by any Itanium psABI-compliant system.

In Section II below, we will specify the API of the C++ exception handling facilities, specifically for raising and catching exceptions. These APIs should be implemented by any C++ system compliant with the Itanium C++ ABI. Note that the level II and level III specifications are not completed at this time.


The descriptions below make use of the following definitions:

landing pad
A section of user code intended to catch, or otherwise clean up after, an exception. It gains control from the exception runtime via the personality routine, and after doing the appropriate processing either merges into the normal user code or returns to the runtime by resuming or raising a new exception.

Base Documents

This document is based on the C++ ABI for Itanium, and the Level II specification below is considered to be part of that document (Chapter 4). See Base Documents in that document for further references.

Level I. Base ABI

This section defines the Unwind Library interface, expected to be provided by any Itanium psABI-compliant system. This is the interface on which the C++ ABI exception-handling facilities are built. We assume as a basis the unwind descriptor tables described in the base Itanium Software Conventions & Runtime Architecture Guide. Our focus here will be on the APIs for accessing those structures.

It is intended that nothing in this section be specific to C++, though some parts are clearly intended to support C++ features.

The unwinding library interface consists of at least the following routines:

   _Unwind_RaiseException,
   _Unwind_Resume,
   _Unwind_DeleteException,
   _Unwind_GetGR,
   _Unwind_SetGR,
   _Unwind_GetIP,
   _Unwind_SetIP,
   _Unwind_GetRegionStart,
   _Unwind_GetLanguageSpecificData,
   _Unwind_ForcedUnwind

In addition, two datatypes are defined (_Unwind_Context and _Unwind_Exception) to interface a calling runtime (such as the C++ runtime) and the above routines. All routines and interfaces behave as if defined extern "C". In particular, the names are not mangled. All names defined as part of this interface have a "_Unwind_" prefix.

Lastly, a language and vendor specific personality routine will be stored by the compiler in the unwind descriptor for the stack frames requiring exception processing. The personality routine is called by the unwinder to handle language-specific tasks such as identifying the frame handling a particular exception.

1.1 Exception Handler Framework

Reasons for Unwinding

There are two major reasons for unwinding the stack:

  • exceptions, as defined by languages that support them (such as C++)
  • "forced" unwinding (such as caused by longjmp or thread termination).
The interface described here tries to keep both similar. There is a major difference, however.

  • In the case an exception is thrown, the stack is unwound while the exception propagates, but it is expected that the personality routine for each stack frame knows whether it wants to catch the exception or pass it through. This choice is thus delegated to the personality routine, which is expected to act properly for any type of exception, whether "native" or "foreign". Some guidelines for "acting properly" are given below.

  • During "forced unwinding", on the other hand, an external agent is driving the unwinding. For instance, this can be the longjmp routine. This external agent, not each personality routine, knows when to stop unwinding. The fact that a personality routine is not given a choice about whether unwinding will proceed is indicated by the _UA_FORCE_UNWIND flag.

To accommodate these differences, two different routines are proposed. _Unwind_RaiseException performs exception-style unwinding, under control of the personality routines. _Unwind_ForcedUnwind, on the other hand, performs unwinding, but gives an external agent the opportunity to intercept calls to the personality routine. This is done using a proxy personality routine, that intercepts calls to the personality routine, letting the external agent override the defaults of the stack frame's personality routine.

As a consequence, it is not necessary for each personality routine to know about any of the possible external agents that may cause an unwind. For instance, the C++ personality routine need deal only with C++ exceptions (and possibly disguising foreign exceptions), but it does not need to know anything specific about unwinding done on behalf of longjmp or pthreads cancellation.

The Unwind Process

The standard ABI exception handling / unwind process begins with the raising of an exception, in one of the forms mentioned above. This call specifies an exception object and an exception class.

The runtime framework then starts a two-phase process:

  • In the search phase, the framework repeatedly calls the personality routine, with the _UA_SEARCH_PHASE flag as described below, first for the current PC and register state, and then unwinding a frame to a new PC at each step, until the personality routine reports either success (a handler found in the queried frame) or failure (no handler) in all frames. It does not actually restore the unwound state, and the personality routine must access the state through the API.

  • If the search phase reports failure, e.g. because no handler was found, it will call terminate() rather than commence phase 2.

    If the search phase reports success, the framework restarts in the cleanup phase. Again, it repeatedly calls the personality routine, with the _UA_CLEANUP_PHASE flag as described below, first for the current PC and register state, and then unwinding a frame to a new PC at each step, until it gets to the frame with an identified handler. At that point, it restores the register state, and control is transferred to the user landing pad code.

Each of these two phases uses both the unwind library and the personality routines, since the validity of a given handler and the mechanism for transferring control to it are language-dependent, but the method of locating and restoring previous stack frames is language independent.

A two-phase exception-handling model is not strictly necessary to implement C++ language semantics, but it does provide some benefits. For example, the first phase allows an exception-handling mechanism to dismiss an exception before stack unwinding begins, which allows resumptive exception handling (correcting the exceptional condition and resuming execution at the point where it was raised). While C++ does not support resumptive exception handling, other languages do, and the two-phase model allows C++ to coexist with those languages on the stack.

Note that even with a two-phase model, we may execute each of the two phases more than once for a single exception, as if the exception was being thrown more than once. For instance, since it is not possible to determine if a given catch clause will rethrow or not without executing it, the exception propagation effectively stops at each catch clause, and if it needs to restart, restarts at phase 1. This process is not needed for destructors (cleanup code), so the phase 1 can safely process all destructor-only frames at once and stop at the next enclosing catch clause.

For example, if the first two frames unwound contain only cleanup code, and the third frame contains a C++ catch clause, the personality routine in phase 1 does not indicate that it found a handler for the first two frames. It must do so for the third frame, because it is unknown how the exception will propagate out of this third frame, e.g. by rethrowing the exception or throwing a new one in C++.
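
The two-phase process can be modelled with a toy Python simulation. This is a conceptual sketch only, not the unwind library API: the Frame class, the raise_exception function, and the log strings are all hypothetical, and a real personality routine works on unwind tables and register state rather than on Python objects.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    name: str
    has_cleanup: bool = False  # frame has destructors to run
    catches: bool = False      # frame has a catch clause for this exception


def raise_exception(stack: List[Frame], log: List[str]) -> Optional[str]:
    """stack[0] is the innermost frame. Returns the handler frame name or None."""
    # Phase 1 (search): walk outward without modifying anything.
    handler_index = None
    for i, frame in enumerate(stack):
        log.append(f"search {frame.name}")
        if frame.catches:
            handler_index = i
            break
    if handler_index is None:
        # No handler anywhere: phase 2 never starts.
        log.append("terminate")
        return None
    # Phase 2 (cleanup): run cleanups up to the handler frame, then land there.
    for frame in stack[:handler_index]:
        if frame.has_cleanup:
            log.append(f"cleanup {frame.name}")
    log.append(f"land in {stack[handler_index].name}")
    return stack[handler_index].name


stack = [
    Frame("f1", has_cleanup=True),
    Frame("f2", has_cleanup=True),
    Frame("f3", catches=True),
    Frame("main"),
]
log: List[str] = []
handler = raise_exception(stack, log)
print(handler)
print(log)
```

In this model the first two frames contain only cleanup code, so the search phase passes over them and stops at the third frame, mirroring the example in the text.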

The API specified by the Itanium psABI for implementing this framework is described in the following sections.

1.2 Data Structures

Reason Codes

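The unwind interface uses reason codes in several contexts to identify the reasons for failures or other actions, defined as follows:

    typedef enum {
	_URC_NO_REASON = 0,
	_URC_FOREIGN_EXCEPTION_CAUGHT = 1,
	_URC_FATAL_PHASE2_ERROR = 2,
	_URC_FATAL_PHASE1_ERROR = 3,
	_URC_NORMAL_STOP = 4,
	_URC_END_OF_STACK = 5,
	_URC_HANDLER_FOUND = 6,
	_URC_INSTALL_CONTEXT = 7,
	_URC_CONTINUE_UNWIND = 8
    } _Unwind_Reason_Code;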

The interpretations of these codes are described below.

Exception Header

The unwind interface uses a pointer to an exception header object as its representation of an exception being thrown. In general, the full representation of an exception object is language- and implementation-specific, but it will be prefixed by a header understood by the unwind interface, defined as follows:

    typedef void (*_Unwind_Exception_Cleanup_Fn)
		(_Unwind_Reason_Code reason,
		 struct _Unwind_Exception *exc);

    struct _Unwind_Exception {
	    uint64			 exception_class;
	    _Unwind_Exception_Cleanup_Fn exception_cleanup;
	    uint64			 private_1;
	    uint64			 private_2;
    };

An _Unwind_Exception object must be double-word aligned. The first two fields are set by user code prior to raising the exception, and the latter two should never be touched except by the runtime.

The exception_class field is a language- and implementation-specific identifier of the kind of exception. It allows a personality routine to distinguish between native and foreign exceptions, for example. By convention, the high 4 bytes indicate the vendor (for instance HP\0\0), and the low 4 bytes indicate the language. For the C++ ABI described in this document, the low four bytes are C++\0.
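
The vendor/language convention can be illustrated by packing the two 4-byte tags into one 64-bit value. The helper names in this Python sketch are hypothetical, and "high 4 bytes" is read here as the most significant bytes of the integer.

```python
from typing import Tuple


def make_exception_class(vendor: bytes, language: bytes) -> int:
    """Pack a 4-byte vendor tag (high) and a 4-byte language tag (low)."""
    if len(vendor) != 4 or len(language) != 4:
        raise ValueError("vendor and language must be 4 bytes each")
    return int.from_bytes(vendor + language, "big")


def split_exception_class(exception_class: int) -> Tuple[bytes, bytes]:
    raw = exception_class.to_bytes(8, "big")
    return raw[:4], raw[4:]


def is_cxx(exception_class: int) -> bool:
    # Only the low four bytes identify the language.
    return exception_class & 0xFFFFFFFF == int.from_bytes(b"C++\0", "big")


hp_cxx = make_exception_class(b"HP\0\0", b"C++\0")
print(hex(hp_cxx))  # 0x48500000432b2b00
print(split_exception_class(hp_cxx))
print(is_cxx(hp_cxx))  # True
```

A personality routine could use a comparison like is_cxx to decide whether an exception is one it understands or a foreign one.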

The exception_cleanup routine is called whenever an exception object needs to be destroyed by a different runtime than the runtime which created the exception object, for instance if a Java exception is caught by a C++ catch handler. In such a case, a reason code (see above) indicates why the exception object needs to be deleted:

  • _URC_FOREIGN_EXCEPTION_CAUGHT = 1: This indicates that a different runtime caught this exception. Nested foreign exceptions, or rethrowing a foreign exception, result in undefined behaviour.

  • _URC_FATAL_PHASE1_ERROR = 3: The personality routine encountered an error during phase 1, other than the specific error codes defined.

  • _URC_FATAL_PHASE2_ERROR = 2: The personality routine encountered an error during phase 2, for instance a stack corruption.

    NOTE: Normally, all errors should be reported during phase 1 by returning from _Unwind_RaiseException. However, landing pad code could cause stack corruption between phase 1 and phase 2. For a C++ exception, the runtime should call terminate() in that case.

The private unwinder state (private_1 and private_2) in an exception object should be neither read by nor written to by personality routines or other parts of the language-specific runtime. It is used by the specific implementation of the unwinder on the host to store internal information, for instance to remember the final handler frame between unwinding phases.

In addition to the above information, a typical runtime such as the C++ runtime will add language-specific information used to process the exception. This is expected to be a contiguous area of memory after the _Unwind_Exception object, but this is not required as long as the matching personality routines know how to deal with it, and the exception_cleanup routine de-allocates it properly.

Unwind Context

The _Unwind_Context type is an opaque type used to refer to a system-specific data structure used by the system unwinder. This context is created and destroyed by the system, and passed to the personality routine during unwinding.

    struct _Unwind_Context 

1.3 Throwing an Exception

    _Unwind_Reason_Code _Unwind_RaiseException
	      ( struct _Unwind_Exception *exception_object );

Raise an exception, passing along the given exception object, which should have its exception_class and exception_cleanup fields set. The exception object has been allocated by the language-specific runtime, and has a language-specific format, except that it must contain an _Unwind_Exception struct (see Exception Header above). _Unwind_RaiseException does not return, unless an error condition is found (such as no handler for the exception, bad stack format, etc.). In such a case, an _Unwind_Reason_Code value is returned. Possibilities are:

  • _URC_END_OF_STACK: The unwinder encountered the end of the stack during phase 1, without finding a handler. The unwind runtime will not have modified the stack. The C++ runtime will normally call uncaught_exception() in this case.

  • _URC_FATAL_PHASE1_ERROR: The unwinder encountered an unexpected error during phase 1, e.g. stack corruption. The unwind runtime will not have modified the stack. The C++ runtime will normally call terminate() in this case.

If the unwinder encounters an unexpected error during phase 2, it should return _URC_FATAL_PHASE2_ERROR to its caller. In C++, this will usually be __cxa_throw, which will call terminate().

NOTE: The unwind runtime will likely have modified the stack (e.g. popped frames from it) or register context, or landing pad code may have corrupted them. As a result, the caller of _Unwind_RaiseException can make no assumptions about the state of its stack or registers.

    typedef _Unwind_Reason_Code (*_Unwind_Stop_Fn)
		(int version,
		 _Unwind_Action actions,
		 uint64 exceptionClass,
		 struct _Unwind_Exception *exceptionObject,
		 struct _Unwind_Context *context,
		 void *stop_parameter );

    _Unwind_Reason_Code _Unwind_ForcedUnwind
	      ( struct _Unwind_Exception *exception_object,
		_Unwind_Stop_Fn stop,
		void *stop_parameter );

Raise an exception for forced unwinding, passing along the given exception object, which should have its exception_class and exception_cleanup fields set. The exception object has been allocated by the language-specific runtime, and has a language-specific format, except that it must contain an _Unwind_Exception struct (see Exception Header above).

Forced unwinding is a single-phase process (phase 2 of the normal exception-handling process). The stop and stop_parameter parameters control the termination of the unwind process, instead of the usual personality routine query. The stop function parameter is called for each unwind frame, with the parameters described for the usual personality routine below, plus an additional stop_parameter.

When the stop function identifies the destination frame, it transfers control (according to its own, unspecified, conventions) to the user code as appropriate without returning, normally after calling _Unwind_DeleteException. If not, it should return an _Unwind_Reason_Code value as follows:

  • _URC_NO_REASON: This is not the destination frame. The unwind runtime will call the frame's personality routine with the _UA_FORCE_UNWIND and _UA_CLEANUP_PHASE flags set in actions, and then unwind to the next frame and call the stop function again.

  • _URC_END_OF_STACK: In order to allow _Unwind_ForcedUnwind to perform special processing when it reaches the end of the stack, the unwind runtime will call it after the last frame is rejected, with a NULL stack pointer in the context, and the stop function must catch this condition (i.e. by noticing the NULL stack pointer). It may return this reason code if it cannot handle end-of-stack.

  • _URC_FATAL_PHASE2_ERROR: The stop function may return this code for other fatal conditions, e.g. stack corruption.

If the stop function returns any reason code other than _URC_NO_REASON, the stack state is indeterminate from the point of view of the caller of _Unwind_ForcedUnwind. Rather than attempt to return, therefore, the unwind library should return _URC_FATAL_PHASE2_ERROR to its caller.

<b>NOTE</b>: Example: longjmp_unwind()

The expected implementation of longjmp_unwind() is as follows. The setjmp() routine will have saved the state to be restored in its customary place, including the frame pointer. The longjmp_unwind() routine will call _Unwind_ForcedUnwind with a stop function that compares the frame pointer in the context record with the saved frame pointer. If equal, it will restore the setjmp() state as customary, and otherwise it will return _URC_NO_REASON or _URC_END_OF_STACK.
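
The stop-function protocol described above can be sketched as a toy model. This is not the real unwinder: every type and function name below (Reason, ToyContext, stop_at_saved_frame, toy_forced_unwind) is hypothetical, frames are represented only by their frame-pointer values, and the model returns a TargetReached code where a real stop function would transfer control without returning.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the unwinder's reason codes.
enum class Reason { NoReason, EndOfStack, FatalPhase2Error, TargetReached };

struct ToyContext {
    std::uintptr_t frame_pointer;  // 0 models the NULL stack pointer at end of stack
};

using StopFn = Reason (*)(const ToyContext& ctx, void* stop_parameter);

// Stop function in the style of longjmp_unwind(): compare the frame pointer in
// the context with the one that setjmp() saved (passed via stop_parameter).
Reason stop_at_saved_frame(const ToyContext& ctx, void* stop_parameter) {
    const std::uintptr_t saved = *static_cast<const std::uintptr_t*>(stop_parameter);
    if (ctx.frame_pointer == 0) return Reason::EndOfStack;         // ran off the stack
    if (ctx.frame_pointer == saved) return Reason::TargetReached;  // real code would not return
    return Reason::NoReason;                                       // not the destination frame
}

// Single-phase walk: call the stop function once per frame, then once more with
// a null frame pointer after the last frame. Returns how many frames were
// unwound before the target was reached, or -1 if it was never reached.
int toy_forced_unwind(const std::vector<std::uintptr_t>& frames, StopFn stop, void* param) {
    int unwound = 0;
    for (std::uintptr_t fp : frames) {
        const Reason r = stop(ToyContext{fp}, param);
        if (r == Reason::TargetReached) return unwound;
        if (r != Reason::NoReason) return -1;  // caller would see a fatal phase 2 error
        ++unwound;  // a real runtime calls the frame's personality routine here
    }
    const Reason r = stop(ToyContext{0}, param);  // end-of-stack notification
    return r == Reason::TargetReached ? unwound : -1;
}
```

With frames 0x1000, 0x2000, 0x3000 and a saved frame pointer of 0x3000, the walk unwinds two frames before stopping; with a saved value that matches no frame, the end-of-stack call is reached and the model reports failure.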

<b>NOTE</b>: If a future requirement for two-phase forced unwinding were identified, an alternate routine could be defined to request it, and an actions parameter flag defined to support it.

    void _Unwind_Resume (struct _Unwind_Exception *exception_object); 

Resume propagation of an existing exception e.g. after executing cleanup code in a partially unwound stack. A call to this routine is inserted at the end of a landing pad that performed cleanup, but did not resume normal execution. It causes unwinding to proceed further.

<b>NOTE 1</b>: _Unwind_Resume should not be used to implement rethrowing. To the unwinding runtime, the catch code that rethrows was a handler, and the previous unwinding session was terminated before entering it. Rethrowing is implemented by calling _Unwind_RaiseException again with the same exception object.

<b>NOTE 2</b>: This is the only routine in the unwind library which is expected to be called directly by generated code: it will be called at the end of a landing pad in a "landing-pad" model.

1.4 Exception Object Management

    void _Unwind_DeleteException (struct _Unwind_Exception *exception_object);

Deletes the given exception object. If a given runtime resumes normal execution after catching a foreign exception, it will not know how to delete that exception. Such an exception will be deleted by calling _Unwind_DeleteException. This is a convenience function that calls the function pointed to by the exception_cleanup field of the exception header.

1.5 Context Management

These functions are used for communicating information about the unwind context (i.e. the unwind descriptors and the user register state) between the unwind library and the personality routine and landing pad. They include routines to read or set the context record images of registers in the stack frame corresponding to a given unwind context, and to identify the location of the current unwind descriptors and unwind frame.

    uint64 _Unwind_GetGR (struct _Unwind_Context *context, int index);

This function returns the 64-bit value of the given general register. The register is identified by its index: 0 to 31 are for the fixed registers, and 32 to 127 are for the stacked registers.

During the two phases of unwinding, only GR1 has a guaranteed value, which is the Global Pointer (GP) of the frame referenced by the unwind context. If the register has its NAT bit set, the behaviour is unspecified.

    void _Unwind_SetGR
        (struct _Unwind_Context *context,
         int index,
         uint64 new_value);

This function sets the 64-bit value of the given register, identified by its index as for _Unwind_GetGR. The NAT bit of the given register is reset.

The behaviour is guaranteed only if the function is called during phase 2 of unwinding, and applied to an unwind context representing a handler frame, for which the personality routine will return _URC_INSTALL_CONTEXT. In that case, only registers GR15, GR16, GR17, GR18 should be used. These scratch registers are reserved for passing arguments between the personality routine and the landing pads.

    uint64 _Unwind_GetIP (struct _Unwind_Context *context);

This function returns the 64-bit value of the instruction pointer (IP).

During unwinding, the value is guaranteed to be the address of the bundle immediately following the call site in the function identified by the unwind context. This value may be outside of the procedure fragment for a function call that is known to not return (such as _Unwind_Resume).

    void _Unwind_SetIP
        (struct _Unwind_Context *context,
         uint64 new_value);

This function sets the value of the instruction pointer (IP) for the routine identified by the unwind context.

The behaviour is guaranteed only when this function is called for an unwind context representing a handler frame, for which the personality routine will return _URC_INSTALL_CONTEXT. In this case, control will be transferred to the given address, which should be the address of a landing pad.

    uint64 _Unwind_GetLanguageSpecificData (struct _Unwind_Context *context);

This routine returns the address of the language-specific data area for the current stack frame.

<b>NOTE</b>: This routine is not strictly required: the data could be accessed through _Unwind_GetIP using the documented format of the UnwindInfoBlock, but since this work has been done for finding the personality routine in the first place, it makes sense to cache the result in the context. We could also pass it as an argument to the personality routine.

    uint64 _Unwind_GetRegionStart (struct _Unwind_Context *context);

This routine returns the address of the beginning of the procedure or code fragment described by the current unwind descriptor block.

This information is required to access any data stored relative to the beginning of the procedure fragment. For instance, a call site table might be stored relative to the beginning of the procedure fragment that contains the calls. During unwinding, the function returns the start of the procedure fragment containing the call site in the current stack frame.

1.6 Personality Routine

    _Unwind_Reason_Code (*__personality_routine)
        (int version,
         _Unwind_Action actions,
         uint64 exceptionClass,
         struct _Unwind_Exception *exceptionObject,
         struct _Unwind_Context *context);

The personality routine is the function in the C++ (or other language) runtime library which serves as an interface between the system unwind library and language-specific exception handling semantics. It is specific to the code fragment described by an unwind info block, and it is always referenced via the pointer in the unwind info block, and hence it has no psABI-specified name.

1.6.1 Parameters

The personality routine parameters are as follows:

version
Version number of the unwinding runtime, used to detect a mismatch between the unwinder conventions and the personality routine, or to provide backward compatibility. For the conventions described in this document, version will be 1.

actions
Indicates what processing the personality routine is expected to perform, as a bit mask. The possible actions are described below.

exceptionClass
An 8-byte identifier specifying the type of the thrown exception. By convention, the high 4 bytes indicate the vendor (for instance HP\0\0), and the low 4 bytes indicate the language. For the C++ ABI described in this document, the low four bytes are C++\0.

<b>NOTE</b>: This is not a null-terminated string. Some implementations may use no null bytes.

exceptionObject
The pointer to a memory location recording the necessary information for processing the exception according to the semantics of a given language (see the Exception Header section above).

context
Unwinder state information for use by the personality routine. This is an opaque handle used by the personality routine in particular to access the frame's registers (see the Unwind Context section above).

return value
The return value from the personality routine indicates how further unwind should happen, as well as possible error conditions. See the following section.
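
As a small illustration of the exceptionClass convention, the sketch below packs a 4-byte vendor tag into the high half and a 4-byte language tag into the low half of a 64-bit value. The helper names (make_exception_class, is_cxx_exception) are hypothetical and are not part of any unwind library; only the HP\0\0 and C++\0 tags come from the text above.

```cpp
#include <cstdint>

// Pack two 4-byte tags into the 8-byte exception class. The tags are raw
// bytes, not C strings: exactly four bytes of each argument are used, and the
// array size of 5 matches a 4-character string literal plus its terminator.
inline std::uint64_t make_exception_class(const char (&vendor)[5],
                                          const char (&language)[5]) {
    std::uint64_t value = 0;
    for (int i = 0; i < 4; ++i)  // vendor ends up in the high 4 bytes
        value = (value << 8) | static_cast<unsigned char>(vendor[i]);
    for (int i = 0; i < 4; ++i)  // language ends up in the low 4 bytes
        value = (value << 8) | static_cast<unsigned char>(language[i]);
    return value;
}

// True when the low 4 bytes spell "C++\0" (0x43 0x2B 0x2B 0x00).
inline bool is_cxx_exception(std::uint64_t exception_class) {
    return (exception_class & 0xFFFFFFFFull) == 0x432B2B00ull;
}
```

A personality routine could use a check of this kind to decide whether an incoming exception is native or foreign before touching any language-specific fields.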

1.6.2 Personality Routine Actions

The actions argument to the personality routine is a bitwise OR of one or more of the following constants:

    typedef int _Unwind_Action;
    static const _Unwind_Action _UA_SEARCH_PHASE = 1;
    static const _Unwind_Action _UA_CLEANUP_PHASE = 2;
    static const _Unwind_Action _UA_HANDLER_FRAME = 4;
    static const _Unwind_Action _UA_FORCE_UNWIND = 8;

_UA_SEARCH_PHASE
Indicates that the personality routine should check if the current frame contains a handler, and if so return _URC_HANDLER_FOUND, or otherwise return _URC_CONTINUE_UNWIND. _UA_SEARCH_PHASE cannot be set at the same time as _UA_CLEANUP_PHASE.

_UA_CLEANUP_PHASE
Indicates that the personality routine should perform cleanup for the current frame. The personality routine can perform this cleanup itself, by calling nested procedures, and return _URC_CONTINUE_UNWIND. Alternatively, it can set up the registers (including the IP) for transferring control to a "landing pad", and return _URC_INSTALL_CONTEXT.

_UA_HANDLER_FRAME
During phase 2, indicates to the personality routine that the current frame is the one which was flagged as the handler frame during phase 1. The personality routine is not allowed to change its mind between phase 1 and phase 2, i.e. it must handle the exception in this frame in phase 2.

_UA_FORCE_UNWIND
During phase 2, indicates that no language is allowed to "catch" the exception. This flag is set while unwinding the stack for longjmp or during thread cancellation. User-defined code in a catch clause may still be executed, but the catch clause must resume unwinding with a call to _Unwind_Resume when finished.
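
To make the flag combinations concrete, here is a minimal sketch of how a personality routine might branch on the actions bit mask. It is a simplified model only: the constant values are taken from the listing above, but the k-prefixed names, the Step enum, and the two boolean inputs (which stand in for a real lookup in the frame's language-specific data) are all assumptions of this sketch.

```cpp
// Values copied from the _UA_* constants above; the names are this sketch's own.
const int kSearchPhase  = 1;
const int kCleanupPhase = 2;
const int kHandlerFrame = 4;
const int kForceUnwind  = 8;

enum class Step { HandlerFound, ContinueUnwind, InstallContext, Invalid };

// Decide what a personality routine would report for one frame.
// frame_has_handler / frame_has_cleanup are hypothetical inputs.
Step decide(int actions, bool frame_has_handler, bool frame_has_cleanup) {
    const bool search  = (actions & kSearchPhase) != 0;
    const bool cleanup = (actions & kCleanupPhase) != 0;
    if (search == cleanup) return Step::Invalid;  // exactly one phase flag must be set

    if (search)  // phase 1: only report whether a handler exists
        return frame_has_handler ? Step::HandlerFound : Step::ContinueUnwind;

    // Phase 2. A frame flagged in phase 1 must handle the exception now,
    // unless this is a forced unwind, in which case nothing may catch it.
    if ((actions & kHandlerFrame) != 0 && (actions & kForceUnwind) == 0)
        return Step::InstallContext;

    // Otherwise enter a landing pad only if there is cleanup code to run.
    return frame_has_cleanup ? Step::InstallContext : Step::ContinueUnwind;
}
```

For example, a frame with a handler but no cleanup yields HandlerFound in the search phase, InstallContext when revisited with the cleanup and handler-frame flags, and ContinueUnwind during a forced unwind.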

1.6.3 Transferring Control to a Landing Pad

If the personality routine determines that it should transfer control to a landing pad (in phase 2), it may set up registers (including IP) with suitable values for entering the landing pad (e.g. with landing pad parameters), by calling the context management routines above. It then returns _URC_INSTALL_CONTEXT.

Prior to executing code in the landing pad, the unwind library restores registers not altered by the personality routine, using the context record, to their state in that frame before the call that threw the exception, as follows. All registers specified as callee-saved by the base ABI are restored, as well as scratch registers GR15, GR16, GR17 and GR18 (see below). Except for those exceptions, scratch (or caller-saved) registers are not preserved, and their contents are undefined on transfer. The accessibility of registers in the frame will be restored to that at the point of call, i.e. the same logical registers will be accessible, but their mappings to physical registers may change. Further, the state of stacked registers beyond the current frame is unspecified, i.e. they may be either in physical registers or on the register stack.

The landing pad can either resume normal execution (as, for instance, at the end of a C++ catch), or resume unwinding by calling _Unwind_Resume and passing it the exceptionObject argument received by the personality routine. _Unwind_Resume will never return.

_Unwind_Resume should be called if and only if the personality routine did not return _URC_HANDLER_FOUND during phase 1. As a result, the unwinder can allocate resources (for instance memory) and keep track of them in the exception object reserved words. It should then free these resources before transferring control to the last (handler) landing pad. It does not need to free the resources before entering non-handler landing pads, since _Unwind_Resume will ultimately be called.

The landing pad may receive arguments from the runtime, typically passed in registers set using _Unwind_SetGR by the personality routine. For a landing pad that can call _Unwind_Resume, one argument must be the exceptionObject pointer, which must be preserved to be passed to _Unwind_Resume.

The landing pad may receive other arguments, for instance a switch value indicating the type of the exception. Four scratch registers are reserved for this use (GR15, GR16, GR17 and GR18).

1.6.4 Rules for Correct Inter-Language Operation

The following rules must be observed for correct operation between languages and/or runtimes from different vendors:

An exception which has an unknown class must not be altered by the personality routine. The semantics of foreign exception processing depend on the language of the stack frame being unwound. This covers in particular how exceptions from a foreign language are mapped to the native language in that frame.

If a runtime resumes normal execution, and the caught exception was created by another runtime, it should call _Unwind_DeleteException. This is true even if it understands the exception object format (such as would be the case between different C++ runtimes).

A runtime is not allowed to catch an exception if the _UA_FORCE_UNWIND flag was passed to the personality routine.

Example: Foreign Exceptions in C++. In C++, foreign exceptions can be caught by a catch(...) statement. They can also be caught as if they were of a __foreign_exception class, defined in <exception>. The __foreign_exception may have subclasses, such as __java_exception and __ada_exception, if the runtime is capable of identifying some of the foreign languages.

The behavior is undefined in the following cases:

  • A __foreign_exception catch argument is accessed in any way (including taking its address).

  • A __foreign_exception is active at the same time as another exception (either there is a nested exception while catching the foreign exception, or the foreign exception was itself nested).

  • uncaught_exception(), set_terminate(), set_unexpected(), terminate(), or unexpected() is called at a time a foreign exception exists (for example, calling set_terminate() during unwinding of a foreign exception).

All these cases might involve accessing C++ specific content of the thrown exception, for instance to chain active exceptions.

Otherwise, a catch block catching a foreign exception is allowed:

  • to resume normal execution, thereby stopping propagation of the foreign exception and deleting it, or

  • to rethrow the foreign exception. In that case, the original exception object must be unaltered by the C++ runtime.

A catch-all block may be executed during forced unwinding. For instance, a longjmp may execute code in a catch(...) during stack unwinding. However, if this happens, unwinding will proceed at the end of the catch-all block, whether or not there is an explicit rethrow.

Setting the low 4 bytes of exception class to C++\0 is reserved for use by C++ runtimes compatible with the common C++ ABI.

Level II: C++ ABI

2.1 Introduction

The second level of specification is the minimum required to allow interoperability in the sense described above. This level requires agreement on:

  • Standard runtime initialization, e.g. pre-allocation of space for out-of-memory exceptions.

  • The layout of the exception object created by a throw and processed by a catch clause.

  • When and how the exception object is allocated and destroyed.

  • The API of the personality routine, i.e. the parameters passed to it, the logical actions it performs, and any results it returns (either function results to indicate success, failure, or continue, or changes in global or exception object state), for both the phase 1 handler search and the phase 2 cleanup/unwind.

  • How control is ultimately transferred back to the user program at a catch clause or other resumption point. That is, will the last personality routine transfer control directly to the user code resumption point, or will it return information to the runtime allowing the latter to do so?

  • Multithreading behavior.

2.2 Data Structures

2.2.1 C++ Exception Objects

A complete C++ exception object consists of a header, which is a wrapper around an unwind object header with additional C++ specific information, followed by the thrown C++ exception object itself. The structure of the header is as follows:

      struct __cxa_exception {
        std::type_info *    exceptionType;
        void (*exceptionDestructor) (void *);
        unexpected_handler  unexpectedHandler;
        terminate_handler   terminateHandler;
        __cxa_exception *   nextException;

        int                 handlerCount;
        int                 handlerSwitchValue;
        const char *        actionRecord;
        const char *        languageSpecificData;
        void *              catchTemp;
        void *              adjustedPtr;

        _Unwind_Exception   unwindHeader;
      };

The fields in the exception object are as follows:

  • The exceptionType field encodes the type of the thrown exception. The exceptionDestructor field contains a function pointer to a destructor for the type being thrown, and may be NULL. These pointers must be stored in the exception object since non-polymorphic and built-in types can be thrown.

  • The fields unexpectedHandler and terminateHandler contain pointers to the unexpected and terminate handlers at the point where the exception is thrown. The ISO C++ Final Draft International Standard [lib.unexpected] states that the handlers to be used are those active immediately after evaluating the throw argument. If destructors change the active handlers during unwinding, the new values are not used until unwinding is complete.

  • The nextException field is used to create a linked list of exceptions (per thread).

  • The handlerCount field contains a count of how many handlers have caught this exception object. It is also used to determine exception life-time (see Section ??? [was 11.12]).

  • The handlerSwitchValue, actionRecord, languageSpecificData, catchTemp, and adjustedPtr fields cache information that is best computed during pass 1, but useful during pass 2. By storing this information in the exception object, the cleanup phase can avoid re-examining action records. These fields are reserved for use of the personality routine for the stack frame containing the handler to be invoked.

  • The unwindHeader structure is used to allow correct operation of exceptions in the presence of multiple languages or multiple runtimes for the same language. The _Unwind_Exception type is described in Section 1.2.

By convention, a __cxa_exception pointer points at the C++ object representing the exception being thrown, immediately following the header. The header structure is accessed at a negative offset from the __cxa_exception pointer. This layout allows consistent treatment of exception objects from different languages (or different implementations of the same language), and allows future extensions of the header structure while maintaining binary compatibility.
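
The "header at a negative offset" convention can be illustrated with a small model. ToyHeader below is a hypothetical two-field stand-in for __cxa_exception, and the toy_* functions are this sketch's own names; the sketch also assumes that sizeof(ToyHeader) keeps the thrown object suitably aligned, which holds for the int used in the usage note but is not guaranteed for arbitrary types.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical, much-reduced stand-in for the real exception header.
struct ToyHeader {
    int handlerCount;
    void (*exceptionDestructor)(void*);
};

// Allocate header and thrown object in one block, and hand back a pointer to
// the object part, which sits immediately after the header.
void* toy_allocate_exception(std::size_t thrown_size) {
    void* raw = std::malloc(sizeof(ToyHeader) + thrown_size);
    if (raw == nullptr) return nullptr;
    ToyHeader* header = new (raw) ToyHeader{0, nullptr};
    return header + 1;
}

// Recover the header by stepping back one header-sized slot from the object.
ToyHeader* toy_header_of(void* thrown_object) {
    return static_cast<ToyHeader*>(thrown_object) - 1;
}

// Release the whole block; the allocation starts at the header.
void toy_free_exception(void* thrown_object) {
    std::free(toy_header_of(thrown_object));
}
```

Code that only knows the thrown object's address (for example, a catch clause) can therefore reach bookkeeping fields such as handlerCount without the object's own layout being affected by the header.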

Version information is not required, since the general unwind library framework specifies an exception class identifier, which will change should the layout of the exception object change significantly.

2.2.2 Caught Exception Stack

Each thread in a C++ program has access to an object of the following class:

      struct __cxa_eh_globals {
        __cxa_exception *   caughtExceptions;
        unsigned int        uncaughtExceptions;
      };

The fields of this structure are defined as follows:

  • The caughtExceptions field is a list of the active exceptions, organized as a stack with the most recent first, linked through the nextException field of the exception header.

  • The uncaughtExceptions field is a count of uncaught exceptions, for use by the C++ library uncaught_exceptions() routine.

This information is maintained on a per-thread basis. Thus, caughtExceptions is a list of exceptions thrown and caught by the current thread, and uncaughtExceptions is a count of exceptions thrown and not yet caught by the current thread. (This includes rethrown exceptions, which may still have active handlers, but are not considered caught.)

The __cxa_eh_globals for the current thread can be obtained by using either of the APIs:

  • __cxa_eh_globals *__cxa_get_globals(void) : 
    Return a pointer to the __cxa_eh_globals structure for the current thread, initializing it if necessary.

  • __cxa_eh_globals *__cxa_get_globals_fast(void) : 
    Return a pointer to the __cxa_eh_globals structure for the current thread, assuming that at least one prior call to __cxa_get_globals has been made from the current thread.
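
The per-thread bookkeeping can be modelled in a few lines. This is a toy, not a real runtime: ToyException, ToyEhGlobals, and the toy_* functions are hypothetical names, C++11 thread_local storage is assumed as the per-thread mechanism, and the points at which the counter changes (incremented on throw, decremented when a handler is entered) are an assumption of this sketch based on the field descriptions above.

```cpp
// Reduced stand-ins for __cxa_exception and __cxa_eh_globals.
struct ToyException {
    ToyException* nextException;
    int id;
};

struct ToyEhGlobals {
    ToyException* caughtExceptions;
    unsigned int uncaughtExceptions;
};

// One instance per thread, created on first use.
ToyEhGlobals* toy_get_globals() {
    thread_local ToyEhGlobals globals{nullptr, 0};
    return &globals;
}

// A throw leaves one more exception in flight on this thread.
void toy_note_throw() { ++toy_get_globals()->uncaughtExceptions; }

// Entering a handler: push onto the caught stack, most recent first.
void toy_begin_catch(ToyException* e) {
    ToyEhGlobals* g = toy_get_globals();
    e->nextException = g->caughtExceptions;
    g->caughtExceptions = e;
    --g->uncaughtExceptions;
}

// Leaving a handler: pop the most recently caught exception, if any.
void toy_end_catch() {
    ToyEhGlobals* g = toy_get_globals();
    if (g->caughtExceptions != nullptr)
        g->caughtExceptions = g->caughtExceptions->nextException;
}
```

If exception b is thrown and caught inside the handler for exception a, the stack reads b then a, and leaving the inner handler restores a as the top entry.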

2.3 Standard Runtime Initialization

2.4 Throwing an Exception

This section specifies the process by which the C++ generated code and runtime library throw an exception, transferring control to the unwind library for handling.

2.4.1 Overview of Throw Processing

In broad outline, a possible implementation of the processing necessary to throw an exception includes the following steps:

  • Call __cxa_allocate_exception to create an exception object (see Section 2.4.2).

  • Evaluate the thrown expression, and copy it into the buffer returned by __cxa_allocate_exception, possibly using a copy constructor. If evaluation of the thrown expression exits by throwing an exception, that exception will propagate instead of the expression itself. Cleanup code must ensure that __cxa_free_exception is called on the just allocated exception object. (If the copy constructor itself exits by throwing an exception, terminate() is called.)

  • Call __cxa_throw to pass the exception to the runtime library (see Section 2.4.3). __cxa_throw never returns.

Based on this outline, throwing an object X as in:

    throw X;

will produce code approximating the template:

    // Allocate -- never throws:
    temp1 = __cxa_allocate_exception(sizeof(X));

    // Construct the exception object:
    #if COPY_ELISION
      [evaluate X into temp1]
    #else
      [evaluate X into temp2]
      copy-constructor(temp1, temp2)
      // Landing Pad L1 if this throws
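
The three steps in the overview (allocate, construct, throw) can also be exercised by hand. The sketch below assumes a toolchain that implements this ABI and exposes the entry points through <cxxabi.h>, as GCC and Clang do on typical Unix-like systems; the struct X and the function name throw_and_catch_via_abi are hypothetical. Ordinary code should simply write throw X{42}; this is only meant to show the calls a compiler would generate.

```cpp
#include <cxxabi.h>
#include <new>
#include <typeinfo>

struct X {
    int code;
};

// Perform by hand what "throw X{42};" would do, then catch the result.
int throw_and_catch_via_abi() {
    try {
        // Step 1: allocate space for the exception object.
        void* buffer = abi::__cxa_allocate_exception(sizeof(X));

        // Step 2: construct the thrown object in that buffer.
        new (buffer) X{42};

        // Step 3: hand the object to the runtime. X is trivially destructible,
        // so no destructor is registered. This call never returns.
        abi::__cxa_throw(buffer, const_cast<std::type_info*>(&typeid(X)), nullptr);
    } catch (const X& caught) {
        return caught.code;  // the runtime frees the exception after the handler exits
    }
    return -1;  // not reached
}
```

Calling throw_and_catch_via_abi() returns the code stored in the thrown object, which shows that the normal catch machinery cannot tell a hand-built exception from a compiler-generated one.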

Friday, November 14, 2008

kernel c++

C++ for Kernel Mode Drivers: Pros and Cons

Updated: February 1, 2007
On This Page
Introduction
Considerations for Kernel-Mode Code
Using the C++ Compiler for Kernel-Mode Code
C++ Issues for Kernel-Mode Drivers
Summary
References

C++ with its object features appears to be a natural match for the semantics of Microsoft Windows Driver Model (WDM) and Windows Driver Foundation (WDF) drivers. However, some C++ language features can cause problems for kernel-mode drivers that can be difficult to find and solve. To help you make an informed choice, this paper shares current insights and recommendations from Microsoft's ongoing investigation of using C++ to write kernel-mode drivers for the Windows family of operating systems.

This information applies for the following operating systems:
Microsoft Windows 2000
Microsoft Windows XP
Microsoft Windows Server 2003
Microsoft Windows Vista
Microsoft Windows Server 2008


With its object features, C++ appears to be a natural match for the semantics of Microsoft Windows Driver Model (WDM) and Windows Driver Foundation (WDF) drivers, and it is appealing for the added convenience and expressive power it provides to developers. However, technical issues associated with writing kernel-mode code in C++ using the currently available Microsoft compilers can cause problems in driver code.

Many developers use the C++ compiler as a "super-C" without using the full C++ language, because the C++ compiler enforces certain rules more strictly than standard C compilers and provides some additional features that happen to be safe for use in the context of drivers. Using the C++ compiler in this way is typically expected to work for kernel-mode code. It is "advanced" C++ features such as non-POD ("plain ol' data", as defined by the C++ standard) classes and inheritance, templates, and exceptions that present problems for kernel-mode code. These problems are due more to the C++ implementation and the kernel environment than to the inherent properties of the C++ language.

Microsoft is investigating issues related to using C++ to write kernel-mode drivers for the Microsoft Windows family of operating systems. This paper shares current insights from Microsoft developers about the tradeoffs of writing drivers in C++.

The information in this paper applies to the standard Windows Driver Development Kit (DDK) build environment for creating kernel-mode drivers as of the Windows Server 2003 Service Pack 1 (SP1) DDK. If you are using build environments or compilers other than those provided with the DDK or Windows Driver Kit (WDK), you should determine whether any of the issues noted here apply to your development environment and whether there are additional concerns. The information to determine this might be available as documentation from the compiler provider, but it is more likely that you will have to inspect generated code and link maps, as discussed below.

This paper does not attempt to explain how to write kernel-mode drivers in C++. This paper assumes that you understand the basic principles of writing kernel-mode drivers. For general information about writing kernel-mode drivers, see the Kernel-Mode Architecture Guide and device-specific information in the Windows DDK documentation.

Considerations for Kernel-Mode Code

Kernel-mode code must take into account the following considerations, to avoid data corruption, unstable systems, and operating system crashes.

The kernel manages its own memory pages:
You must manage the two conflicting requirements of correct operation and minimizing memory footprint.

Code and data must be in memory if code is to be executed when paging is not allowed. That is, when the system is running at IRQL DISPATCH_LEVEL or higher, the pages that contain the currently executing routine, any routines that it calls, or data it accesses (and so on down the chain of function calls) must be locked into memory until the IRQL drops below DISPATCH_LEVEL. Otherwise, a page fault occurs and the system crashes.

To increase the amount of memory available for user applications, drivers should make their code and data segments pageable where reasonable. This can improve system performance.

Not all processor resources are available all the time.

On x86 systems, the floating point and multimedia units are not available in kernel mode unless specifically requested. Using them improperly may or may not cause a floating-point fault at raised IRQL (which will crash the system), but it can also silently corrupt data in unrelated processes; such problems are often difficult to debug.

On Intel Itanium systems, not all of the floating-point registers are available.

Resources, particularly the stack, are severely constrained. Resources that are "inexpensive" in user space may be expensive or require different methods to obtain in kernel mode. Specifically, the size of the kernel stack is 3 pages.

Not all of the standard libraries (C or C++) are present in kernel mode.

Versions of standard libraries provided with the build environment for use in kernel mode are not necessarily the same as those provided in user mode, because they cannot rely on the Win32 API and because they must be written to conform to kernel mode requirements. Kernel-mode implementations of standard libraries may have limited functionality or be constrained by other properties of kernel mode.

User-mode implementations of library routines might not work in kernel mode. Some do not link, some do not run, and some might appear to run, but with unintended side-effects.

Using the C++ Compiler for Kernel-Mode Code

It is important to remember that the compiler generates correct object code, but it may not be the code you expect, organized in the way that you expect. This is always true, but it is more likely to be a problem for C++ than for C. You must examine the object code to be sure it matches your expectations, or at least will work correctly in the kernel environment.

Output from the currently available C++ compiler is not guaranteed to work in kernel mode across all platforms and versions. The more your code uses the "advanced" C++ features, the more you risk problems with interoperability.

Key Areas for Kernel-Mode Code
The following areas require particular care in kernel-mode drivers. These apply to both languages (C and C++), but may be more problematic in C++ code because the compiler does more things automatically, and you may not realize that it has created a problem.

Floating-point instructions must be properly protected—for example, with KeSaveFloatingPointState and KeRestoreFloatingPointState or other mechanisms described in Windows DDK documentation.

The InterlockedXxx functions should insert memory barrier instructions in the generated code. Check the output to ensure that the barriers you require are present.

The semantics of the volatile keyword must be carefully understood so that the intended level of indirection is the volatile object. Sometimes the volatile item is the pointer, sometimes it is the object itself, and sometimes both pointer and object are volatile. Applying the volatile keyword to the wrong thing is a common error, so carefully check your use of this keyword. For example, if you intend to use a non-volatile pointer to a volatile location, ensure (by careful code reading) that your code does not implement a volatile pointer to a non-volatile location.
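
The difference between the three placements of volatile can be checked at compile time. The sketch below is a host-side illustration that assumes a C++11 compiler with <type_traits>, which the DDK compilers discussed in this paper may not provide; the alias names and the helper pointee_is_volatile are hypothetical.

```cpp
#include <type_traits>

// Non-volatile pointer to a volatile location (typical for a device register).
using PtrToVolatile = volatile int*;
// Volatile pointer to a non-volatile location (the pointer itself may change).
using VolatilePtr = int* volatile;
// Both the pointer and the location it refers to are volatile.
using VolatilePtrToVolatile = volatile int* volatile;

// True when the object the pointer type P refers to is volatile-qualified.
template <typename P>
constexpr bool pointee_is_volatile() {
    return std::is_volatile<typename std::remove_pointer<P>::type>::value;
}

static_assert(!std::is_volatile<PtrToVolatile>::value, "pointer itself is not volatile");
static_assert(pointee_is_volatile<PtrToVolatile>(), "pointee is volatile");

static_assert(std::is_volatile<VolatilePtr>::value, "pointer itself is volatile");
static_assert(!pointee_is_volatile<VolatilePtr>(), "pointee is not volatile");

static_assert(std::is_volatile<VolatilePtrToVolatile>::value, "pointer is volatile");
static_assert(pointee_is_volatile<VolatilePtrToVolatile>(), "pointee is volatile");
```

Reading the declaration from right to left gives the same answer: in volatile int* the qualifier applies to the int, while in int* volatile it applies to the pointer.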

Stack frames are severely limited. For example, on x86 systems the total stack available to a thread is 12K.

Non-obvious jumps or memory usage in function source code creates the risk of an unexpected page fault. Specifically, the compiler can generate functions and data objects whose existence is not immediately obvious. For details about objects that might not be expected, see "Code in Memory" later in this paper.

The use of inline functions (and __forceinline) to ensure that code is resident in memory interacts with the compiler's optimization rules:

A function you expect to be inlined might not be inlined. Consequently, using the function might cause a page fault.

The compiler might generate inline code for a function when you do not expect it to.

Safe and Unsafe C++ Constructs
Although it is not currently possible to provide a strict and testable definition of the "completely safe" subset of C++ for use in kernel-mode code, some useful guidelines are available for constructs that are usually safe and those that are usually not.

A good rule of thumb is that a C++ construct is probably safe if there is an obvious way to rearrange the code to make it legal C. An example is the relaxed ordering of declarations, including declaring variables in for statements.

The stricter type checking in C++ may disallow a technically legal, but semantically incorrect, construct. Such stricter type checking is a useful means of improving the reliability of the driver.

Anything involving class hierarchies or templates, exceptions, or any form of dynamic typing is likely to be unsafe. Using these constructs requires extremely careful analysis of the generated object code. Limiting use of classes to POD classes significantly reduces the risks.
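
One way to keep a type inside the lower-risk POD subset is to have the compiler reject it when it stops being POD. The sketch below assumes a C++11 compiler with <type_traits>, which is newer than the DDK toolchain this paper describes; the struct names and the is_pod_like helper are hypothetical, and "trivial and standard-layout" is used here as the working definition of POD.

```cpp
#include <type_traits>

// Treat a type as POD when it is both trivial and standard-layout.
template <typename T>
struct is_pod_like
    : std::integral_constant<bool,
                             std::is_trivial<T>::value &&
                                 std::is_standard_layout<T>::value> {};

// Plain data only: no virtual functions, no user-provided special members.
struct DeviceStatePod {
    unsigned long flags;
    void* context;
};

// A virtual destructor adds a virtual function table, so this is not POD.
struct DeviceStateWithVirtual {
    virtual ~DeviceStateWithVirtual() {}
    unsigned long flags;
};

static_assert(is_pod_like<DeviceStatePod>::value, "expected a POD type");
static_assert(!is_pod_like<DeviceStateWithVirtual>::value, "virtual members break POD-ness");
```

Placing such an assertion next to each driver structure turns an accidental virtual function or non-trivial constructor into a compile error instead of an unexpected piece of compiler-generated code.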

Reviewing Generated Code
One of the original design goals for C was that it be fairly easy to determine what the generated object code would be, thus making it quite suitable for kernel-mode work. C++ is a much more complex language, and consequently making it work in the kernel environment has proven to be much more difficult.

To write drivers in C++, you must understand the code generated by the compiler, ensure that it meets kernel-mode requirements, and ensure that it avoids the problems discussed in this paper. Be prepared to read the object code and to scan the link map to be sure that code and data are placed in the proper locations and that only kernel-safe libraries are used. Check code for pageability, inline functions, and correct program ordering.

We strongly recommend that you begin such code reading and testing now, rather than waiting until the source code is complete. Check early prototypes and test potentially troublesome usage so that if you encounter an insurmountable problem with C++, you have time to find and implement an alternative solution.

C++ Issues for Kernel-Mode Drivers

Microsoft developers have discovered a number of areas where C++ presents particular problems for kernel-mode drivers.

Code in Memory
The most severe problem with using C++ for writing kernel-mode drivers is the management of memory pages, particularly code in memory, rather than data. It is important that large drivers be pageable, and paged code is not always in memory. All of the code that will be needed must be resident before the system enters a state in which paging cannot occur.

The way the C++ compiler generates code for non-POD classes and templates makes it particularly difficult to know where all the code required to execute a function might go and thus difficult to make the code safely pageable. The compiler automatically generates code for at least the following objects. These objects are put "out of line," and the developer has no direct control over the section in which they are inserted, which means they could happen to be paged out when needed.

Compiler-generated code such as constructors, destructors, casts, and assignment operators. (These can often be explicitly provided, but it requires taking care to recognize that they need to be provided.)

Adjustor thunks, used to convert between various classes in a hierarchy.

Virtual function thunks, used to implement calls to virtual functions.

Virtual function table thunks, used to manage base classes and polymorphism.

Template code bodies, which are emitted at first use unless explicitly instantiated.

The virtual function tables themselves.
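The remark about template bodies can be illustrated with an explicit instantiation, which fixes the translation unit in which a template's member functions are emitted instead of leaving that to the point of first use. This is a sketch only: FixedStack is a hypothetical class, and whether the emitted code actually lands in the section you intend is something to confirm in the link map.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity stack; all names are illustrative only.
template <typename T, std::size_t N>
class FixedStack {
public:
    FixedStack() : count_(0) {}
    bool Push(const T& value);
    bool Pop(T* out);
    std::size_t Size() const { return count_; }

private:
    T items_[N];
    std::size_t count_;
};

template <typename T, std::size_t N>
bool FixedStack<T, N>::Push(const T& value)
{
    if (count_ == N) {
        return false;               // full
    }
    items_[count_++] = value;
    return true;
}

template <typename T, std::size_t N>
bool FixedStack<T, N>::Pop(T* out)
{
    assert(out != 0);
    if (count_ == 0) {
        return false;               // empty
    }
    *out = items_[--count_];
    return true;
}

// Explicit instantiation definition: the member functions of
// FixedStack<int, 4> are emitted here, in this translation unit, rather
// than wherever the template happens to be used first.
template class FixedStack<int, 4>;
```

Explicit instantiation addresses only the "where is it emitted" question for the template bodies; it does not by itself resolve the pragma scoping problems described in this section.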

The C++ compiler does not provide mechanisms for direct control of where these entities are placed in memory. The pragmas necessary to control memory placement were not designed with C++ in mind. #pragma alloc_text cannot be used to control the location of a member function because (for several reasons) there is no way to name the member function. The scope of #pragma code_seg is ambiguous for compiler-generated functions, expanded template bodies, and compiler-generated thunks. There is no mechanism at all for controlling the location of virtual function tables, since they are not quite either code or data from the point of view of the compiler (they go into a section all their own).

If a function in a header is declared inline, but the compiler does not generate inline code for it, the function may be emitted in more than one code segment depending on where the function is used. When a class template is instantiated, it is generated in the section that is current at the point of first use, and it is not always immediately obvious which section that is. Both of these issues can lead to code being pageable when it should not be, or vice versa.

If a class hierarchy is in use, whether code for a base class needs to be in memory when the derived class is accessed depends on exactly which functions in the base class are called from the derived class (and whether the compiler can inline them), as well as what sections they were emitted in. For example, if the derived class provides a method that uses no base class methods, the base class code need not be in memory. However, it is difficult to know when that is the case. Additionally, any thunks used with the hierarchy and its classes might also need to be resident in memory.

Stack
The compiler has always been free to generate additional data on the stack, such as creating temporary objects, deferring call cleanup, and other actions that use the stack in a hidden fashion. There are few differences between C and C++ in the way a single function uses the stack, but because the additional C++ mechanisms usually result in more function calls, C++ will often use more total stack. You should keep stack size in mind, as you would in any programming language when stack space is limited.

Exceptions also have an effect on the stack. See "Exceptions and RTTI" later in this paper.

Dynamic Memory
Driver development tools such as Driver Verifier rely on tagged memory to validate memory usage in drivers. Using operator new and operator delete to allocate and free memory weakens the ability of these tools to detect memory leaks and other problems in driver code.

In user space, operator new and operator delete are convenient, but they can become cumbersome in drivers that use multiple memory pools or tagged memory. Because "placement new" takes additional operands, it is possible to pass the information needed to select memory pools or generate tags into an overloaded operator new, but this is not much easier than using the memory functions directly. There is no "placement delete" with additional arguments, so there is no way to pass in a tag or pool type (or memory control, if needed) when using operator delete. This makes it impossible to check that the tag at the point of release was the intended one, which defeats much of the benefit of using tagged memory. It is possible to delete memory without providing a tag, but in each case you will need to decide whether the risks and disadvantages of not using tags in driver code outweigh the apparent convenience.
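The asymmetry between new and delete can be seen in a small user-mode sketch. Everything here is an assumption made for illustration: AllocateWithTag and FreeWithTag are hypothetical stand-ins built on malloc and free, not the kernel pool routines, and Packet and kPacketTag are invented names. The code assumes a C++11 or later compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

typedef std::uint32_t PoolTag;

// Hypothetical user-mode stand-in for a tagged pool allocator. The header
// is over-aligned so that the memory following it suits any object type.
struct alignas(std::max_align_t) TagHeader {
    PoolTag tag;
};

static TagHeader* HeaderOf(void* memory)
{
    assert(memory != 0);
    return reinterpret_cast<TagHeader*>(
        static_cast<unsigned char*>(memory) - sizeof(TagHeader));
}

void* AllocateWithTag(std::size_t size, PoolTag tag)
{
    void* raw = std::malloc(sizeof(TagHeader) + size);
    if (raw == 0) {
        return 0;
    }
    static_cast<TagHeader*>(raw)->tag = tag;
    return static_cast<unsigned char*>(raw) + sizeof(TagHeader);
}

// Frees the block only if it carries the expected tag. On a mismatch it
// returns false and leaves the block allocated.
bool FreeWithTag(void* memory, PoolTag expectedTag)
{
    TagHeader* header = HeaderOf(memory);
    if (header->tag != expectedTag) {
        return false;
    }
    std::free(header);
    return true;
}

PoolTag TagOf(void* memory) { return HeaderOf(memory)->tag; }

const PoolTag kPacketTag = 0x504B5431u;  // arbitrary illustrative value

struct Packet {
    std::uint32_t length;

    // Placement form: "new (kPacketTag) Packet()" passes the tag through.
    // noexcept makes the new-expression check for a null result.
    static void* operator new(std::size_t size, PoolTag tag) noexcept
    {
        return AllocateWithTag(size, tag);
    }

    // Matching placement delete: the compiler calls this only if the
    // constructor throws. A delete-expression cannot reach it.
    static void operator delete(void* memory, PoolTag tag) noexcept
    {
        FreeWithTag(memory, tag);
    }

    // Usual delete: "delete p" lands here, with no tag to verify.
    static void operator delete(void* memory) noexcept
    {
        if (memory != 0) {
            std::free(HeaderOf(memory));
        }
    }
};
```

Releasing with a checked tag therefore means bypassing delete: call the destructor explicitly (p->~Packet()) and then FreeWithTag(p, kPacketTag). That is the same amount of work as calling the memory functions directly, which is the point the paragraph above makes.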

Memory tracing tools often record the return address of the function that made an allocation. Some C++ compilers implement operator new as a function, causing all allocations to appear to come from a single location and defeating the purpose of that aspect of the memory tracing tool. This can be addressed, but you will have to determine for yourself if there is a benefit in doing so over using memory allocation directly.

Libraries
There are a number of distinct concerns in creating and using libraries:

The name of exported C++ functions can vary from one release to another.

Not all of the functions available in user mode are available in the kernel-mode libraries.

The Standard Template Library is designed to work with data objects from a single DLL.

C++ functions are exported based on their entire signature, not on their name alone (as C functions are). The name of a C++ function is "mangled" to contain type information, which becomes part of its signature. Although the rules for name mangling are fairly stable, there is no guarantee that the mangled names will be the same from release to release of the compiler. Therefore, C++ functions cannot be reliably exported to a library from one release to the next, although functions that can be represented as extern "C" functions can. In addition, the use of a .def file can help mitigate the problem. Note that extern "C" functions are unique only on the basis of name, not the entire signature as in C++.
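A minimal sketch of the extern "C" approach follows. The function names are hypothetical. The internal overloads are distinguished by signature and are kept out of the exported surface; the exported entry points use C linkage and therefore need distinct names.

```cpp
#include <cassert>
#include <cstdint>

namespace {

// Internal C++ overloads (hypothetical). Each has its own mangled name,
// which may differ between compiler releases, so neither is exported.
std::uint32_t Clamp(std::uint32_t value, std::uint32_t low, std::uint32_t high)
{
    assert(low <= high);
    return value < low ? low : (value > high ? high : value);
}

std::int64_t Clamp(std::int64_t value, std::int64_t low, std::int64_t high)
{
    assert(low <= high);
    return value < low ? low : (value > high ? high : value);
}

}  // namespace

// Exported surface: C linkage and C-compatible types. Because extern "C"
// names are unique by name only, the two entry points cannot share a name
// the way the overloads above do.
extern "C" std::uint32_t ClampU32(std::uint32_t value,
                                  std::uint32_t low,
                                  std::uint32_t high)
{
    return Clamp(value, low, high);
}

extern "C" std::int64_t ClampI64(std::int64_t value,
                                 std::int64_t low,
                                 std::int64_t high)
{
    return Clamp(value, low, high);
}
```

Callers built with a different compiler release, or written in C, depend only on the names ClampU32 and ClampI64.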

Not all library functions are available in kernel mode, particularly those associated with the "advanced" C++ language features. The Standard Template Library is the "usual" way to implement many C++ concepts such as variably sized arrays. However, it is unsafe to simply assume that the Standard Template Library is present and usable. Although much of the Standard Template Library is implemented as source code in headers, it occasionally uses library functions or other features that are not available or usable in the kernel environment.

The Standard Template Library is also based on the assumption that each data object it uses exists in only a single DLL. Although in most cases it works to pass references to POD objects across DLL boundaries, passing references to more complex structures such as lists may cause runtime failures that can be hard to diagnose. Known issues include the fact that freeing memory in a DLL other than the one in which the memory was allocated can cause failures (at least for debug-mode compiles) and that the "end of list" marker differs between DLLs, which can cause unexpected runaway list searches. You must be aware of these problems and take steps to prevent them.

We do not recommend using Standard Template Library functions in a kernel-mode driver, because it is not possible to assume that the Standard Template Library is there and "just works." In the case of kernel-mode code, understanding precisely how a particular data structure is implemented helps assure that it does not violate the requirements of kernel space. It is also possible that a specialized implementation will be smaller than the more general Standard Template Library functions, although the library is often very good in that regard.

Exceptions and RTTI
It is tempting to use C++ exceptions, but they are difficult to implement in kernel mode. C++ exceptions require a kernel-mode-safe library, which does not currently exist. They also present an unavoidable runtime problem, because the exception records generated when an exception is thrown are large objects on the very limited stack. On x86 systems, exception records are not particularly large (although they are large compared with many typical stack frames), but on Intel Itanium systems they are quite large: 3K to 4K, or one-eighth to one-sixth of the available 24K stack space. To preserve portability of a driver to 64-bit platforms, exceptions would have to be used in a very limited way, even on the x86 architecture. The rethrow operator can cause multiple exception records on the stack. Note that Structured Exception Handling (__try/__except/__finally) is available in kernel mode, although the space concerns remain. C++ exceptions have a number of semantic subtleties that prevent them from simply mapping onto Structured Exception Handling.

Run-time type information (RTTI) also requires a library that does not currently exist for C++ in kernel mode. So far, there have been few, if any, requests for this in kernel-mode code. Whether this lack of demand is a consequence of the other problems masking it or because it is not useful in kernel mode is unknown.

Compiler Versions
Although the C++ language standard is stable, implementation techniques are still evolving. Consequently, compiler versions may change the way generated code operates. Such changes are unlikely to affect user-mode code, but they can affect kernel-mode code in which more of the underlying implementation is exposed to (and sometimes provided by) driver developers; version-to-version interoperability of kernel-mode code is not guaranteed.

You should carefully control any interface between two drivers or a driver and the operating system, usually by writing the interface in C instead of C++. Otherwise, version-to-version incompatibilities in the C++ implementation may cause interoperability failures.

Static and Global Scope Variables and Initialization
C++ static variables (declared at either global or local scope) present a number of problems for drivers.

The C++ standard allows static variables declared at local scope to be initialized at the time of first use (the first time the scope is entered). The way this is implemented creates both the possibility of race conditions during initialization and a particularly high risk of unintended data sharing between threads, because variables declared static are globally static, not per-thread. For globally static data (shared among threads), it is best to declare the data explicitly at global scope, to make sure that access protections appropriate to the situation are applied.
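The sketch below shows shared data declared explicitly at global scope with its protection visible beside it. It is a user-mode illustration under stated assumptions: DeviceStats and the function names are hypothetical, and the std::atomic spin loop is only a stand-in for whatever kernel lock suits the driver. It assumes a C++11 or later compiler.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct DeviceStats {
    std::uint32_t opens;
    std::uint32_t closes;
};

// Shared state declared explicitly at global scope. Both objects are
// constant-initialized, so nothing runs at first use.
static DeviceStats g_stats = { 0, 0 };
static std::atomic<bool> g_statsLocked(false);

static void LockStats()
{
    while (g_statsLocked.exchange(true, std::memory_order_acquire)) {
        // spin until the current holder releases the lock
    }
}

static void UnlockStats()
{
    g_statsLocked.store(false, std::memory_order_release);
}

void RecordOpen()
{
    LockStats();
    ++g_stats.opens;
    UnlockStats();
}

void RecordClose()
{
    LockStats();
    assert(g_stats.closes < g_stats.opens);  // cannot close more than opened
    ++g_stats.closes;
    UnlockStats();
}

// Returns a consistent copy taken under the lock.
DeviceStats SnapshotStats()
{
    LockStats();
    DeviceStats copy = g_stats;
    UnlockStats();
    return copy;
}
```

Compared with a static variable hidden inside a function, the sharing here is obvious at the point of declaration, and every access goes through a function that takes the lock.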

If a C++ global object that requires initialization (a global constructor) is declared, there is no mechanism for the constructor to be called. Either avoid global objects that require constructors, or develop a mechanism to ensure that the constructor is called. Several sources on the Web claim to have solved this problem, and one of those solutions might work for you.

The order of initialization of global objects is not specified by the C++ standard, so even if there were a mechanism to call their constructors, either the order of initialization must be explicitly controlled by the driver code, or it must not matter.
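One possible mechanism, offered as a sketch rather than a recommendation, is to reserve raw static storage and construct the objects explicitly, in a fixed order, from the driver's entry routine. Logger, DeviceTable, InitGlobals, DestroyGlobals, and GlobalsReady are hypothetical names, and the code assumes a C++11 or later compiler (for alignas).

```cpp
#include <cassert>
#include <new>

// Hypothetical classes with constructors, invented for this sketch.
class Logger {
public:
    Logger() : ready_(true) {}
    bool Ready() const { return ready_; }
private:
    bool ready_;
};

class DeviceTable {
public:
    // Depends on the logger having been constructed first.
    explicit DeviceTable(const Logger* logger)
        : loggerWasReady_(logger != 0 && logger->Ready()) {}
    bool LoggerWasReady() const { return loggerWasReady_; }
private:
    bool loggerWasReady_;
};

// Raw static storage: zero-initialized at load time, and no constructor
// needs to run before the driver's own code does.
alignas(Logger) static unsigned char g_loggerStorage[sizeof(Logger)];
alignas(DeviceTable) static unsigned char g_tableStorage[sizeof(DeviceTable)];
static Logger* g_logger = 0;
static DeviceTable* g_table = 0;

// Call once from the entry routine. The order of construction is whatever
// this function says it is.
void InitGlobals()
{
    assert(g_logger == 0 && g_table == 0);
    g_logger = new (g_loggerStorage) Logger();
    g_table = new (g_tableStorage) DeviceTable(g_logger);
}

// Call once at unload; destroys in the reverse order.
void DestroyGlobals()
{
    g_table->~DeviceTable();
    g_table = 0;
    g_logger->~Logger();
    g_logger = 0;
}

bool GlobalsReady()
{
    return g_table != 0 && g_table->LoggerWasReady();
}
```

This makes both the call and the order explicit, at the cost of writing the setup and teardown by hand. The generated code still needs the same review as any other C++ in the driver.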


Microsoft neither endorses nor prohibits the use of C++ for kernel-mode drivers. This conservative position is driven in part by the issues described in this paper, and in part by the need to support all platforms. You must be aware of known problems and risks described in this paper before attempting any development in C++ for kernel mode, and you should be alert for other issues not yet identified.

Microsoft is actively investigating ways of making C++ more usable in the kernel. It is not yet known whether all of the C++ features that can be applied to user-mode code can be made available for kernel-mode code.

The use of the C++ compiler as a "super-C" is typically expected to work, but such use of the compiler is at the developer's own risk.

It is currently impractical to identify problematic C++ constructs mechanically, so developers must carefully analyze the compiler output to ensure that the generated code is suitable for kernel mode.

Before committing to the use of C++, you should carefully assess whether it will work for you. In particular, you should test C++ constructs early in the development process, to ensure that the constructs do not cause the problems described in this paper or otherwise violate the principles of kernel-mode driver writing.

Some of the problems discussed in this paper might not become apparent until near the end of development, and solving them might require code to be completely rewritten.

Several of the most insidious problems are extremely difficult to reproduce on demand while testing the driver, so a driver with an inherent unreliability might appear to run for extended periods with no problems and fail at random times. This reinforces the need for careful analysis.

It is possible to avoid many problems by careful coding and close examination of generated code. Other problems are very difficult to overcome. All of them require extra care and careful analysis on the part of the developer.





© 2008 Microsoft Corporation. All rights reserved.