Tuesday, March 31, 2009
Monday, March 30, 2009
banned by google

For example, I can visit http://blog.harrylau.com/balabala.html but cannot access http://blog.harrylau.com or http://mail.harrylau.com. I am lucky that I don't have any important emails to check right now; otherwise I might be in trouble.
Someone told me that people in mainland China can already visit Blogger, but it does not seem to be true. Is it?
Sunday, March 29, 2009
what the hell
We're sorry...
... but your query looks similar to automated requests from a computer
virus or spyware application. To protect our users, we can't process
your request right now.
We'll restore your access as quickly as possible, so try again soon.
In the meantime, if you suspect that your computer or network has been
infected, you might want to run a virus checker or spyware remover to
make sure that your systems are free of viruses and other spurious
software.
If you're continually receiving this error, you may be able to resolve
the problem by deleting your Google cookie and revisiting Google. For
browser-specific instructions, please consult your browser's online
support center.
If your entire network is affected, more information is available in
the Google Web Search Help Center.
We apologize for the inconvenience, and hope we'll see you again on Google.
Saturday, March 28, 2009
[test] can i post a file here ?
atl2-2.0.5-2-i686.pkg.tar.gz.gz contains an executable file. For
security reasons, Gmail does not allow you to send this type of file
>>??
Sunday, March 22, 2009
My archlinux on eeepc (1)
linux distribution but it is really slim by default.
I installed this distribution several months ago. I have only used
the command line so far, but I finally found that I need a real
desktop instead of the text-based console, because my virus protection
software on Windows is no longer valid. I decided to install a desktop on
Arch Linux, even though I really need several commercial Windows programs
in my daily life.
1. desktop environment or window manager ?
All I really need is a window manager. A window manager can do anything I
really need on Linux, but it seems that I am not very interested in
doing all the configuration work for a desktop myself. So a
lightweight desktop environment is not a bad idea for me.
GNOME or KDE or Xfce or LXDE ?
I don't like GNOME, though I have to use it on working days. I like
KDE, but it seemed very buggy, at least in the earlier versions from several
years ago. It seems Xfce and LXDE both suit me, but Xfce uses too
many GNOME packages. The important thing is that I don't know if LXDE is
usable ... haha...
Final decision: try LXDE first using the default window manager for
LXDE - openbox
2. base system
pacman -Syu
pacman -Sy gcc make binutils   # g++ comes with gcc, ld with binutils
pacman -Sy udev hal syslogd
pacman -Sy vi
pacman -Sy netutils
pacman -Sy dnsutils
3. drivers for eeepc
I installed the wireless driver manually several months ago. I am not
sure if it is supported in the latest kernel.
4. desktop
pacman -Sy xorg
Need to edit the configuration file for xorg (xorg.conf) to enable the
touch pad and adjust the screen settings. (also multi-touch ??)
Install LXDE: just install the default packages, then add the start command
to the xinit file following the instructions on wiki.archlinux.org.
The panel, file manager, launcher, menu etc. are all included in LXDE.
(randr does not seem to work properly, right?)
bash-3.2# pacman -Sg lxde
lxde gpicview
lxde lxappearance
lxde lxde-common
lxde lxlauncher
lxde lxmenu-data
lxde lxpanel
lxde lxrandr
lxde lxsession-lite
lxde lxtask
lxde lxterminal
lxde menu-cache
lxde pcmanfm
5. applications:
browser: firefox
audio: Goggles Music Manager (gmm) -- this is my first try, it looks nice
video: mplayer+codecs
IM: pidgin skype
input: scim-pinyin
office: epdfview
tools: wine
fonts: wenquan
programming: cscope
6. what's next?
a. My biggest problem is how to use Windows software on it. I need to
try wine, but what can I do if it fails?
b. Fonts may need further configuration because the English fonts are
so ugly in my current setting.
c. Play with ALSA... the volume is so low sometimes.
d. Play with the wireless setting -- randr is not working properly. I may
write a script to configure it manually.
e. Tools for hardware: Bluetooth? camera? battery monitor?
Especially, ACPI configuration?
f. At least I need to find out how to power off the camera and Bluetooth
and how to switch to power-saving mode when I am on battery.
Maybe I don't have time to solve all of these one by one. It works
well now at least. :)
After all, it is amazing that I did not encounter any issues during
the desktop setup. Compared with about 8 years ago, when I was
playing with Red Hat, this is paradise.
Saturday, March 21, 2009
fluxbox or openbox
source code in c.
openbox: XML configuration file, which means it is easy to configure
fluxbox supports tabs, openbox doesn't..
license: fluxbox is MIT, openbox is GPL ==> is it important for a normal
user?? hehe
fluxbox includes a panel and openbox itself does not
which one will you choose ? :)
openbox, just because it is the default window manager in LXDE.
Here are some comments from the FreeBSD forum:
Quote (originally posted by alie):
so i decided to choose between fluxbox or openbox.
anyway i have a lot of questions to this community:
1) is openbox and fluxbox still in active development ?
2) is JWM theme based Window manager ?
3) is there any window manager that can support plugin or applet ?
4) is there any good file browser that doesnt have a lot of
dependencies(note: i like ROX filer) ?
5) is there any text editor like GEdit that doesnt have a lot of dependencies ?
I think you made a good choice. Both Fluxbox and Openbox are in very
active development and have a very large user base. They might be in
too active development because they are slowly getting bloated. In my
bias opinion Openbox is better What kind of answer could you expect
from somebody who used Openbox for a very long time. Joking aside the
big thing for Openbox is the fact that is coded in pure C. If you like
to have panel, task bar and those things Fluxbox is probably better
choice.
All those things can be added to Openbox but then you end up
installing another thing or two. For instance Openbox+Xfce panel is a
killer combo for people who like more or less full desktop
environment.
Let me answer the rest of your questions
@2 JWM is not based on anything. It is written from the scratch.
It is really very, very good for people who like to have built in
task bar, pager, and the launching bar. It is incredible that all that
is just little bit bigger than dwm which has 2000 lines of C code.
@3 What do you mean by that?
@4 Rox is not a browser. It is a file manager. There are three
principle types of file managers.
i) Orthodox file managers (Norton commander type) Examples are
Deco, OFM, Midnight Commander,vifm (VI file manager) (console/xterm
based) and Norther Commander, Worker, emelfm2, krusader and similar if
you want to GUI.
ii) Navigational file managers (or explorer type) typical example is
konqueror web-browser, xfm (X File Explorer).
iii) Spatial file managers. Examples include Xfm, Rox, PCmanFM,
Desktop File Manager DFM, Nautilus file manager, Thunar, Xfiler the
part of Siag suite and similar.
Then you have file managers that really could not be categorized
easily like XTree file manager, Clex, pfm, (personal file manager),
vfu, TkDesk.
I probably left out some interesting examples but you got the idea.
The least bloated file manager is command line + commands
(ls, du, rm, mv, cp) and few filters. If you have to have a spatial
file manager I really like ROX. For a Ortodox file managers I really
like deco but it is not in active development and it is useless in
console as it can not adjust the screen. OFM is very good code base
for somebody who wants to write a good non bloated console based
ortodox file manager.
Be aware that it is GPL so you will have to write everything from
scratch if you prefer BSD license like me.
@5 Learn ed and vi pronto or you will regret very soon. I do NOT like
VIM for me vi is nvi that comes with the base or Heirloom vi.
Emacs sucks IMHO and I have used it seriously. I even learned the Lisp
because of it.
Saturday, March 07, 2009
Thursday, February 26, 2009
Linux kernel glossary
0-9
2Q algorithm
MM algorithm based on two areas, one managed as a FIFO queue, and
one as an LRU list.
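A minimal sketch of the two-area idea, in Python for brevity rather than kernel C. The promotion rule (a page moves from the FIFO queue to the LRU list when it is referenced a second time) follows the simplified 2Q scheme and is an assumption here, as are the class name `TwoQueueCache` and the sizes. It is not the kernel's MM code.

```python
from collections import OrderedDict, deque


class TwoQueueCache:
    """Toy 2Q cache: a FIFO for pages seen once, an LRU for pages seen again."""

    def __init__(self, fifo_size, lru_size):
        self.fifo_size = fifo_size
        self.lru_size = lru_size
        self.fifo = deque()        # oldest page on the left
        self.lru = OrderedDict()   # least recently used page first

    def access(self, page):
        if page in self.lru:
            # Already hot: refresh its position in the LRU list.
            self.lru.move_to_end(page)
            return "lru-hit"
        if page in self.fifo:
            # Second reference: promote from the FIFO to the LRU list.
            self.fifo.remove(page)
            self.lru[page] = True
            if len(self.lru) > self.lru_size:
                self.lru.popitem(last=False)   # evict least recently used
            return "promoted"
        # First reference: enter the FIFO, evicting its oldest entry if full.
        self.fifo.append(page)
        if len(self.fifo) > self.fifo_size:
            self.fifo.popleft()
        return "miss"
```

The point of the split is that a one-off scan only churns the FIFO area and cannot push frequently used pages out of the LRU area.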
8259 PIC
Outdated interrupt controller present on Intel hardware.
A
ABI
Application Binary Interface, the interface of passed structures
between the user processes (and libraries) and the kernel. For
compatibility, it is important that these remain as static as possible
(i.e. making sure that variables and structure members have the same
bytesize as before, and in the same ordering). Occasionally breakage
is necessary, requiring re-compilation of the user-space sources (note
that this does not affect source-compatibility; that is a separate
issue).
ACPI
Advanced Configuration and Power Interface, replacement for APM
that has the advantage of allowing O/S control of power management
facilities as well as exporting the set of hardware currently present on
the system.
AGI
Address Generation Interlocking, on x86. When execution of an
instruction requires an address resulting from a non-completed
instruction, the CPU must wait - this is known as an AGI stall.
AGP
Accelerated Graphics Port, on x86 boxes.
AIO
Asynchronous IO, IO that is performed without the issuing process
blocking on IO completion.
Anticipatory Scheduler
A disk IO scheduler that leaves the disk idle after a read, in
anticipation of the next read.
anonymous
Generally, used for something which doesn't have the usual
associated object. For example an anonymous address space is not
interested in user address space (that is, no process context). Some
common ones are :
Anonymous page
A page of memory that is not associated with a file on a file
system. This can come from expanding the process's data segment with
brk(), shared memory segments, or mmap() with a MAP_ANON or
MAP_PRIVATE flag. MAP_PRIVATE, although it maps in data from a file,
is considered anonymous because any changes do not get written back to
the file (any dirty pages have to be moved to swap if the page is
freed from main memory).
Anonymous buffer
The buffer cache contains buffers of data on their way to/from the
disk. An anonymous buffer is not associated with a file. One example
of this is data from a deleted file - it will not be written to any
file, but is kept around until it is flushed.
ALSA
Advanced Linux Sound Architecture.
APIC
See local APIC and IO-APIC.
APM
Advanced Power Management, power management standard superseded by
ACPI. APM and SMP just don't mix.
ARP
Address Resolution Protocol, the way a machine on a network
associates an IP address with a hardware address.
ASN.1
Abstract Syntax Notation, a protocol for structured data, used,
for example, in the Q.3 management protocol.
ast
Professor Andrew S. Tanenbaum, author of MINIX and several
essential O/S books.
ATAPI
ATA Packet Interface, used by most CD-ROMs, and other devices.
AQuoSA
Adaptive Quality of Service Architecture.
B
balancing
Technique used in the VM code, referring to balancing various
parameters such as the number of pages currently free, to avoid
thrashing and other bad memory capacity artefacts. See zones, kswapd
bug.
BAR
Base Address Registers, for PCI devices.
BCD
Binary-Coded Decimal - see a textbook.
bigmem
See highmem.
big lock
kernel_lock, which locks the entire kernel from entry (no other
task may run in the kernel code). It is recursive per process and
dropped automatically when a process gives up the CPU, then regained
on wake-up, in contrast to other spinlocks.
bit error
Used colloquially to mean a single bit error in some memory
address. Often due to faulty memory (ECC memory can correct single bit
errors). Often results in fake oopsen, with addresses like 0x0008000.
Also seen are values some small offset from zero, plus a bit error,
which is where the value passed a NULL check due to the bit error, and
then the kernel tried to access a structure member by means of the
pointer, leading to the offset.
block bitmap
In UNIX-like filesystems, the usage of disks blocks is recorded in
the block bitmap, where each set bit indicates a specific allocated
block.
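As a rough illustration of the block bitmap entry, here is a small Python sketch where bit n is set when block n is allocated. The class name `BlockBitmap` and the first-fit search are assumptions made for the example; real filesystems keep one such bitmap per block group and search it more cleverly.

```python
class BlockBitmap:
    """Toy block bitmap: one bit per block, set bit = allocated."""

    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.bits = bytearray((nblocks + 7) // 8)   # 8 blocks per byte

    def is_allocated(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self):
        # First fit: take the lowest-numbered free block.
        for block in range(self.nblocks):
            if not self.is_allocated(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        raise OSError("no free blocks")

    def free(self, block):
        # Clear the bit; the mask keeps the value inside one byte.
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF
```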
bottom-half handler
A set of standard kernel threads that execute tasks on a queue
that have been registered with that type of bottom-half handler for
execution. The code is run on return to user space or at the end of a
hardware interrupt. In 2.3.43 a more general solution with softirqs
and tasklets was implemented. Sometimes abbreviated to "bh", which
should not be confused with buffer head, which is also abbreviated to
"bh".
bounce buffer
An intermediate buffer. Used for example, in "faking" alignment to
a client from non-aligned resources.
brlocks
Big-reader locks, used when there are many contending for read
access to a resource, and very few contending for writes (thus the
balance is towards very fast read locking, and very slow write
locking).
BSP
BootStrap Processor, or the CPU which enables the other CPUs in an
SMP system.
bss
Block Storage Segment. This is the memory mapping section
containing the data allocated for a binary image at execution time.
Also known as "Block Started by Symbol" and "Bull-Shit Storage".
BTB
Branch Target Buffer, on x86 processors, the cache of recent
conditional jump results.
buddy allocator
The memory allocation scheme used in the kernel. A vector of lists
of free pages is kept, ordered by the size of the chunk (in powers of
two). When a chunk is allocated, it is removed from the relevant list.
When a chunk is freed back to the free pages pool, it is placed in the
relevant list, starting from the top. If it is physically contiguous
with a present chunk, they are merged and placed in the list above
(i.e. where the chunks are twice the size), and this operation
percolates up the vector. As regions are merged whenever possible,
this design helps to reduce memory fragmentation. FIXME
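The split-and-merge behaviour described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: chunks are identified by their starting page number, `free_lists[k]` holds free chunks of 2**k pages, and the buddy of a chunk is found by flipping bit k of its start. The class and method names are made up for the example and this is not the kernel's allocator code.

```python
class BuddyAllocator:
    """Toy buddy allocator over 2**max_order pages."""

    def __init__(self, max_order):
        self.max_order = max_order
        # free_lists[k] = set of start pages of free chunks of 2**k pages
        self.free_lists = [set() for _ in range(max_order + 1)]
        self.free_lists[max_order].add(0)   # initially one big free chunk

    def alloc(self, order):
        # Find the smallest free chunk that is large enough.
        k = order
        while k <= self.max_order and not self.free_lists[k]:
            k += 1
        if k > self.max_order:
            raise MemoryError("no chunk large enough")
        start = min(self.free_lists[k])
        self.free_lists[k].remove(start)
        # Split it down to the requested size, keeping each upper half free.
        while k > order:
            k -= 1
            self.free_lists[k].add(start + (1 << k))
        return start

    def free(self, start, order):
        # Merge with the buddy as long as the buddy is also free.
        while order < self.max_order:
            buddy = start ^ (1 << order)
            if buddy not in self.free_lists[order]:
                break
            self.free_lists[order].remove(buddy)
            start = min(start, buddy)
            order += 1
        self.free_lists[order].add(start)
```

Freeing the last allocated page makes the merges percolate all the way up, so the allocator ends with a single maximal chunk again.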
buffer cache
The buffer cache is a hash table of buffers, indexed by device and
block number. LRU lists are maintained for the buffers in the various
states, with separate lists for buffers of different sizes. With 2.3's
unification of the buffer and page caches, each buffer head points to
part or all of a page structure, through which the buffer's actual
contents are available. FIXME
buffer head
A structure containing information on I/O for some page in real
memory. A buffer can be locked during I/O, or in several other states
depending on its usage or whether it is free. Each buffer is
associated with one page, but every page may have several buffers
(consider the floppy on x86, where the I/O blocksize is 512 bytes, but
each page is commonly 4096 bytes).
BUG()
Used in kernel code in tests for "impossible" conditions. Signifies
a kernel bug or faulty hardware.
bus mastering
Giving a card on a bus (e.g. ISA,PCI) the ability to read/write
directly to main memory. This is how DMA is performed on PCI busses.
byte sex
Endianness.
C
cache affinity
Where the cache of a CPU represents the current memory set used by
a task, there is said to be cache affinity with that task. A good
thing if the task is regularly scheduled on that CPU. See processor
affinity.
cache coherency
On an SMP system, ensuring that the local memory cache of each CPU
is consistent with respect to the values which may be stored in other
CPUs' caches, avoiding coherency problems such as the "lost update".
This is achieved by the hardware in concert with the operating system.
cache line
A section of the hardware cache, around 32 bytes large. Kernel
structures are often designed such that the commonly-accessed members
all fit into one cache-line, which reduces cache pollution. Structures
such as this are cache line aligned.
cache ping-pong
A hardware phenomenon in an SMP system, where two tasks on
different CPUs are both accessing the same physical memory in a cache
line. This means as each task runs, when it changes the memory, it
must invalidate the other CPU's relevant cache line (to ensure cache
coherency). Then, when the task on the other CPU runs, it must reload
the cache line (as it's set invalid), before changing it. Repeat ad
jocularum. A bad thing (TM). A common reason for putting a lock on a
different cache line than the data mutexed by the lock : then the
"other" task can grab and drop the lock without having to necessarily
invalidate the cache line on the first CPU. FIXME
cache pollution
Where during execution of a task, another task is scheduled onto
that CPU which disrupts useful lines of the current cache contents,
which will be used soon. That is, cache pollution is a non-optimal
situation where the other process would have been better scheduled
on a different CPU or at a different time. The aim is to minimise the
need to replace cache lines, obviously increasing efficiency.
call gate
x86 hardware support for mode switch to kernel (i.e. system call).
In Linux, int 0x80 will trigger the call gate.
CAP_*
These are defined names of capabilities for specific tasks
provided by the kernel, e.g. CAP_SYS_NICE.
CBQ
Class Based Queueing, a hierarchical packet fair queueing qdisc.
CBQ Homepage
CFS
Completely Fair Scheduler
CFQ
Completely Fair Queueing, an alternative to the Anticipatory IO
scheduler (and the default from 2.6.18 onwards) which allocates IO
priority equally between processes.
chroot jail
A process under the aegis of a chroot() syscall is in a chroot
jail, and cannot access the file system above its notion of root
directory /.
Classifier
(also: filter or tcf) classifies a network packet by inspecting
it, used by QDiscs.
cli/sti
x86 assembler instructions for disabling and enabling interrupts,
respectively. There are CPU-local and global variants of these. Code
running with interrupts disabled must be fast, for obvious reasons
(this is called interrupt latency).
CML2
Eric Raymond's proposal for a replacement to the current kernel
build system. See http://www.tuxedo.org/~esr/kbuild.
cold cache
A cache whose content is invalid or irrelevant with respect to
some task to be run.
completion ports
I/O interface used in O/S's such as Windows NT. Userspace notifies
the kernel of each file descriptor the program is interested in. The O/S
uses a callback for each fd to indicate that I/O is ready.
contention
Where two tasks each want an exclusive resource. You may hear talk
of, for example, spinlock contention, which is where one or more tasks
is commonly busy-waiting for a spinlock to become unlocked, as it is
being taken by other tasks.
Context switch
Switching the CPU from running one thread to running another thread.
This refers to the changes necessary in the CPU when the scheduler
schedules a different process to run on the CPU. This involves
invalidating the TLB, loading the registers with the saved values,
etc. There is an associated cost with such a switch, so it is best to
avoid unnecessary context switches when possible. Note that the
division of kernel-mode and user-mode means a similar, but simpler,
operation is necessary when a syscall moves into kernel mode. However
this is not called a context switch, as the mode switch doesn't change
the current process. See lazy TLB. One good feature of Linux is its
extremely low context and mode switch cost, compared to an operating
system like Solaris.
Copy-on-Write
(also: COW) Reuse and share existing objects, and do not copy them
until a modification is required.
An efficiency method where a page or other resource
is shared until an attempt to write is made. In that case a copy is
made, and the write is done to the copy.
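A minimal Python sketch of the copy-on-write idea: two handles share one underlying buffer until one of them writes. The `CowPage` class, the `share()` method and the reference count stored next to the data are assumptions for illustration only. The kernel does this per page, using page table protections and a page fault to detect the write.

```python
class CowPage:
    """Toy copy-on-write page: the data is shared until somebody writes."""

    def __init__(self, data, _frame=None):
        # A "frame" stands in for the shared physical page plus its use count.
        if _frame is None:
            _frame = {"data": bytearray(data), "refs": 1}
        self._frame = _frame

    def share(self):
        # Like fork(): hand out another reference to the same frame.
        self._frame["refs"] += 1
        return CowPage(None, _frame=self._frame)

    def read(self, index):
        return self._frame["data"][index]

    def write(self, index, value):
        if self._frame["refs"] > 1:
            # Still shared: drop our reference and write to a private copy.
            self._frame["refs"] -= 1
            self._frame = {"data": bytearray(self._frame["data"]), "refs": 1}
        self._frame["data"][index] = value
```

Reads never copy anything. The copy is made only at the first write, and only by the writer.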
CPL
Current Privilege Level
critical path
A vital code path which should be optimised for the common case.
Critical paths are executed frequently and form the important trunk
routes of various kernel operations. An example would be buffer head
manipulation during file I/O.
CSS
Code Storage Segment, aka text section. This is the memory mapping
containing the executable code (text) for a binary image.
Current
a kernel variable which points to the task_struct structure of the
process currently running on this CPU.
D
Device Mapper
A technology for presenting arbitrary groupings of underlying
sectors on physical devices in a consistent logical fashion usable by
higher level algorithms. Heavily used by kernel technologies such as
LVM.
DAG
Directed Acyclic Graph
dancing makefiles
An experimental new Makefile set up for configuring and compiling
the kernel, written by Michael Elizabeth Chastain.
dcache
The cache of dentry structures. Under UNIX an entry in a
particular directory must be searched for linearly, so even if the
disk block containing the directory entry list is in-core, there is an
associated cost. The dcache stores recent results of these searches
which in general speeds up these disk searches by a large factor.
Recent 2.3 work uses the dentries to allow multiple mounting, union
mount, and more.
The hardware data cache, by contrast, is usually referred to as the D-cache.
deadlock
Any of a number of situations where two or more processes cannot
proceed because they are both waiting for the other to release some
resource. FIXME(give good references).
delayed write
See write behind.
demand zero
In demand paging, where the page is to be zeroed when actually
created (common case: bss segment of an executable image, which is
uninitialised heap data for the executable). Also called ZFOD.
dentry
Directory entry, in-core structure defining a file's details:
inode, parent dentry etc. Cached in a hash table indexed by hashed
filename (see dcache).
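To make the dcache/dentry relationship concrete, here is a rough Python sketch: a linear directory scan is the slow path, and a hash table keyed by (parent, name) remembers the result. The `Dcache` class, the `directories` layout and the `slow_lookups` counter are all invented for the example and are far simpler than the kernel's structures.

```python
class Dcache:
    """Toy dentry cache: remembers (parent directory, name) -> inode number."""

    def __init__(self, directories):
        # directories maps a directory inode to a list of (name, inode) pairs,
        # standing in for the on-disk directory entry list.
        self.directories = directories
        self.cache = {}
        self.slow_lookups = 0   # how often the linear scan was needed

    def lookup(self, parent, name):
        key = (parent, name)
        if key in self.cache:
            return self.cache[key]          # fast path: hash table hit
        self.slow_lookups += 1
        for entry_name, inode in self.directories[parent]:   # linear scan
            if entry_name == name:
                self.cache[key] = inode
                return inode
        return None
```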
DF
IP packet bit indicating it should not be fragmented. The remote
host will return ICMP notifications if the packet had to be split
anyway, and these are used in MTU discovery.
directory notification
Provides hooks for notifying tasks when the contents of a
directory has changed. Note "contents" can refer to dentries, the file
inodes, or even the file contents themselves (file notification).
DOD
Dial-On-Demand for net connections over POTS.
drop behind
In stream I/O conditions, data that has already been read and
processed is not needed again. The VM ideally should recognise this
and mark the used pages as un-needed, so they can be discarded first.
This technique is called "drop behind".
dss
Data Storage Segment, aka data section. This is the memory mapping
containing the initialised data for a binary image.
dual-issue
Processors such as the Pentium Pro, that can decode and execute
two instructions simultaneously.
dupe
Abbrev. fr. duplication.
Dwarf
Debugging Information Format
dword
Double word, i.e. 4 bytes on x86.
E
EA
See extended attributes.
eager coalescing
What the buddy allocator currently does, i.e. merge adjacent
blocks as soon as possible.
edge-triggered interrupt
The interrupt is triggered by the rising or falling edge of the
interrupt line. This makes IRQ line sharing difficult, as an edge may
occur whilst an ISR is running, and it could be easily missed; to
allow sharing level-triggered interrupts are usually used.
EIP
Extended Instruction Pointer. This register contains the PC value
of a task, that is, it points to the next instruction to be fetched,
decoded etc.
elevator algorithm
This algorithm, often used in disk accesses, keeps an ordered list
of requests. When the current request on the disk (e.g. the disk
block) has been satisfied, the next strictly greater request on the
list is dealt with. When a new request arrives, it is inserted into
the ordered list in position (e.g. if the new requested block number
is less than the current handled request, it goes before it in the
list). When reaching the end of the list, the elevator changes
direction, and the situation is reversed.
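The sweep can be sketched as follows in Python. This is a simplified model with assumptions: the request list is fixed up front (no new arrivals while sweeping), the head starts by moving towards higher block numbers, and a request for exactly the current position is served immediately instead of being skipped. The function name `elevator_order` is made up for the example.

```python
import bisect


def elevator_order(pending, head):
    """Return the order in which the pending block numbers get serviced."""
    queue = sorted(pending)   # the ordered request list
    order = []
    direction = +1            # +1 = towards higher blocks, -1 = towards lower
    while queue:
        if direction > 0:
            i = bisect.bisect_left(queue, head)        # next request at or above head
            if i == len(queue):                        # nothing left above: turn around
                direction = -1
                continue
        else:
            i = bisect.bisect_right(queue, head) - 1   # next request at or below head
            if i < 0:                                  # nothing left below: turn around
                direction = +1
                continue
        head = queue.pop(i)
        order.append(head)
    return order
```

For example, with the head at block 53 the requests above 53 are served in ascending order first, and only then the ones below it in descending order.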
ELF
Executable and Linkable Format, a popular binary format, the default
for Linux on most architectures.
ematch
Extended Match, small classification helper attached to classifiers.
EPIC
Explicitly-Parallel Instruction set Computing, an instruction set
architecture where every dependency for an instruction is encoded into
the instruction itself. This has the potential to be faster as the
compiler can encode the data dependencies in the instructions.
exponential back-off
A general algorithm for dealing with contention cases; for
example, collisions on a network bus, or contention for a spinlock.
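A small Python sketch of one common form of exponential back-off: the waiting window doubles after every failed attempt, up to a cap, and the actual delay is picked at random inside the window so that contenders do not retry in lockstep. The base delay, the cap and the random jitter are assumptions of this sketch, and the function name `backoff_delays` is hypothetical.

```python
import random


def backoff_delays(attempts, base=1.0, cap=64.0, rng=random.random):
    """Return one delay per attempt; the window doubles each time up to cap."""
    delays = []
    for attempt in range(attempts):
        window = min(cap, base * (2 ** attempt))   # 1, 2, 4, ... capped
        delays.append(rng() * window)              # random point inside the window
    return delays
```

Passing a different `rng` (for example one that always returns 1.0) shows the raw doubling windows, which is handy for testing.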
extended attributes
Also known as multi-part or multi-stream files, files with
extended attributes deviate from the principle of files being a simple
single data stream. An example of extended attributes is the
Macintosh's "resource fork", which is associated with a specific file
(known as the "data fork").
F
fair scheduler
A scheduler which ensures fairness between users, such that a
user's process count and associated cost only impacts that user,
rather than the whole system as currently. Rik van Riel and Borislav
Deianov have both produced different patches to implement this.
false sharing
On SMP caches, when two parts of a single block are accessed,
neither of which collides with the other, the cache coherency protocol
may not be able to detect this, and mark the block as "shared" even
when it isn't. This is known as false sharing.
fastpath
The code path most commonly taken, often optimised heavily at the
expense of less frequently-taken blocks of code. This is the reason
you see so many gotos in core functions - it produces common-path code
far more efficient than an optimising compiler can manage.
fd
file descriptor
filemap
The mapping of a file's contents into memory.
filesystem--sys
"gauges" filesystem-based view of kernel objects
filesystem--config
"knobs" filesystem-based manager of kernel objects, or config_items
filesystem--proc
repository for all things task related
filesystem--dev
devices (with various exceptions, contradictions, confusions, and
hysterical raisins ...)
fixed mmap
A user-space request for a mmap starting at a fixed virtual
address. Generally not useful or guaranteed to work; a notable
exception is overlayed mmaps, where a mmaped area has further mmaps of
different types at fixed positions in the map.
FQDN
Fully-Qualified Domain Name, e.g. martyr.darrenemerson.co.uk.
G
GART
For AGP setups, Graphics Aperture Relocation Table.
gdoc
GNOME's source documentation system (similar to javadoc).
Available by CVS from gnome. Kernel driver interface descriptions,
built from source using gdoc, are currently being written in 2.3.
gdt
Global Descriptor Table. Something to do with x86 memory
segmentation I think (FIXME). See ldt.
get
In the kernel, often means "get a reference to". This may be as
simple as incrementing a usage count, or it may imply attempting to
retrieve an object from a cache of some sort, or allocating kernel
memory. See put.
GKHI
Generalised Kernel Hook Infrastructure, an IBM patch to implement
hooks into the kernel code.
GPL
I just had to point out that lkml is for Linux kernel development
discussions. Please please don't engage in any threads concerning
licensing issues, Microsoft, or Richard Stallman. Please.
group descriptor
On-disk filesystem structure, containing information for a block
group, such as the inode bitmap and block bitmap.
GRUB
GRand Unified Bootloader, a popular bootloader for Linux, BSD, and
other OSes.
GSI
Global System Interrupt. Mainly used in the context of ACPI. Stupid acronym
H
Highmem
high memory, or memory that is not permanently mapped into kernel
memory. Common on 32 bit x86 systems. See HighMemory.
HPET
High Precision Event Timer (HPET) is a replacement timer for the
8254 Programmable Interval Timer and the Real-time clock's (RTC)
periodic interrupt function. HPET is a successor to pmtimer, and is
far more efficient to read.
The HPET can produce periodic interrupts at a much higher
resolution than the RTC and is often used to synchronize multimedia
streams, providing smooth playback and reducing the need to use other
timestamp calculations such as an x86 cpu's RDTSC instruction. HPET
support in linux requires that the BIOS expose the HPET (via acpi).
HTB
Hierarchical Token Bucket, a qdisc based on TBF and CBQ. HTB Theory
I
IPVS
IP Virtual Server, the kernel part of the LVS (Linux Virtual
Server) project. IPVS redirects incoming client requests to one of
several "real" servers, usually for the purpose of load balancing a
service.
ISR
Interrupt Service Routine, the function in each device driver that
gets called when an interrupt happens.
J
Jiffies
An incrementing counter representing system "uptime" in ticks - or
the number of timer interrupts since boot. Ultimately the entire
original concept of a jiffy will likely vanish as systems use timer
events only when necessary and become "jiffyless".
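Since jiffies count timer ticks, converting them to wall-clock time only needs the tick rate. In the kernel that rate is the build-time constant HZ (100, 250 and 1000 are common values); the source above does not name it, so treat the `hz` parameter and the helper names in this Python sketch as assumptions.

```python
def jiffies_to_seconds(jiffies, hz):
    """Seconds represented by a tick count at hz ticks per second."""
    return jiffies / hz


def seconds_to_jiffies(seconds, hz):
    """Whole ticks that fit into the given number of seconds."""
    return int(seconds * hz)
```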
K
kswapd
a kernel thread that frees up memory by evicting data from caches
and paging out userspace memory, part of the virtual memory subsystem.
L
LBA
Logical Block Addressing. A way to address IDE disks without
Cylinder/Head/Sector (CHS) coordinates, using linear sector numbers
from the start of the disk. Allows for the use of very large IDE
disks.
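The relation between the two addressing schemes is simple arithmetic. The sketch below uses the conventional CHS to LBA formula, where sectors are numbered from 1 and cylinders and heads from 0; the function names and the example geometry (16 heads, 63 sectors per track) are just illustrative choices.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Linear sector number for a CHS address (sector numbering starts at 1)."""
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba, heads_per_cylinder, sectors_per_track):
    """Inverse of chs_to_lba for the same disk geometry."""
    cylinder, rest = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(rest, sectors_per_track)
    return cylinder, head, sector_index + 1


# Example geometry: 16 heads, 63 sectors per track.
print(chs_to_lba(1, 0, 1, 16, 63))
```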
Linux Device Drivers, 3rd Edition
online edition.
LKM
Linux Kernel Module. A (often dynamically loadable at system
runtime) kernel extension ("driver") to support, for example, some
kind of new hardware device or generic software abstraction.
LKML
Linux Kernel Mailing List. The primary virtual watering hole
(meeting ground) for kernel developers to share ideas and bounce
opinions off one another during the course of the kernel development
process. FAQ at http://www.tux.org/lkml/.
LSM
Linux Security Module. A security framework for providing
different security levels.
LVM
Logical Volume Management. A technology for providing an arbitrary
logical view of underlying data storage in a fashion supporting
resizing and restructuring of storage on the fly. Currently in version
2, originally written by Sistina (now Redhat).
LXR
a cross-reference tool that can be used to navigate the Linux
kernel source code, available at lxr.linux.no.
M
mem_map
A contiguous virtual array of struct pages representing the
entirety of physical memory pages available within a system.
MMU
Memory Management Unit, part of the CPU hardware that enforces
memory boundaries, and throws page faults, upon which the OS builds its
coherent protection. The MMU maps virtual memory to actual, where
protections allow.
MUTEX
MUTual EXclusion locks. This locking primitive is simpler and
semantically tighter than the others, and hence is easier to make
faster, and to prove correct. Some constraints are: the lock has one owner
at a time, the locker, who must also be the unlocker. Read
Documentation/mutex-design.txt for much more.
MSI
Message Signaled Interrupts. A PCI mode where the interrupt
numbers are extended from 8 bits to 32. These also use the normal PCI
data lanes rather than some magic all over the chipset, which means that a
device can basically have as many interrupts as it wants rather than 4
(1 in practice) for legacy PCI interrupts. There are also no
interrupt sharing issues, since there are so many numbers available for
interrupts. For more, see
http://en.wikipedia.org/wiki/Message_Signaled_Interrupts
N
NAPI
NAPI ("New API") is a modification to the device driver packet
processing framework, which is designed to improve the performance of
high-speed networking. See
http://www.linux-foundation.org/en/Net:NAPI.
Netfilter
Netfilter is a framework that provides a set of hooks within the
Linux kernel for intercepting and manipulating network packets. See
http://en.wikipedia.org/wiki/Netfilter and http://www.netfilter.org.
netlink
Communication protocol between kernel and userspace
O
P
Page cache
a cache of file data and filesystem metadata, dynamically grown and
shrunk depending on other memory use.
Page table
data structure used by the MMU to translate virtual memory
addresses to physical memory addresses.
PDA
Per Processor Data Area is the x86 implementation of per-cpu memory.
PFN
Page Frame Number, index into the mem_map[] array which describes
physical memory pages.
PGD
Page Global Directory, the top level of the page table tree. The
page table hierarchy is pgd -> pud -> pmd -> pte.
PID
Process IDentifier (POSIX thread identifier)
PMD
Page Mid-level Directory, note that pmds are folded into pgds on
systems with 2 level page tables.
Process descriptor
kernel data structure that describes/accounts process data related
to a single process.
PTE
Page Table Entry
PUD
Page Upper Directory, note that puds are folded into pmds, except
on systems with 4-level page tables.
Q
QDisc
Queueing Discipline, queues packets before they are sent out to
the network device, enforces QoS requirements, provides traffic
shaping and prioritizing capabilities.
QoS
Quality of Service, method to define the importance/priority of
network services
R
RCU
Read Copy Update, a mechanism for SMPSynchronisation
Rlimit
resource limit, eg. "maximum amount of virtual memory" or "maximum
number of processes". Can be per process or per user.
S
Semaphore
a lock mechanism that works per process context, see SMPSynchronisation
Scheduler
the part of the kernel that chooses a suitable process to run on
the cpu, see the schedule() function.
Shared/Paged Socket Buffer
(also: pskb) Socket Buffer with a non-contiguous data buffer, used for
zero copy, TSO and Scatter/Gather capable network cards.
Slab cache
a fast, SMP scalable kernel memory allocator.
Socket Buffer
(also: skb or sk_buff) data structure used to hold the data and
attributes of a network packet. See
http://www.linux-foundation.org/en/Net:SK_Buff and
http://vger.kernel.org/~davem/skb.html for details.
SoftIRQ
kind of bottom half rarely used.
Spin lock
a simple SMP lock, see SMPSynchronisation
Swap token
a token to temporarily protect a process from pageout, an
alternative approach to memory scheduling, thrashing control. See the
Token Based Thrashing Control paper by Song Jiang and the Linux-MM
wiki.
System call
(also: syscall) the way a program transitions from userspace into
kernel space, to call a kernel space function.
sysenter/sysexit
A pair of instructions on Pentium II and later CPUs that replace the older INT
instruction based syscall mechanism. See
http://manugarg.googlepages.com/systemcallinlinux2_6.html
System.map
symbol table used by ksymoops to resolve numbers to function names
in Oops. Also used by ps and top for WCHAN field.
T
TASK_INTERRUPTIBLE
State of a task that is sleeping (not on the run queue). The
task will sleep until some event occurs that changes its state to
TASK_RUNNING. A task in this state can be awakened by signals.
TASK_RUNNING
State of a task that is on the run queue (but not necessarily running).
TASK_STOPPED
State of a task that has stopped and is not ready to run
(happens when a task receives SIGSTOP, SIGTSTP, SIGTTIN or SIGTTOU, or
when any signal is received while the task is being debugged).
TASK_UNINTERRUPTIBLE
State of a task that is sleeping (not on the run queue) and
must be explicitly awakened. A task in this state cannot be awakened
by signals.
TASK_ZOMBIE
State of a task that has called exit() but whose parent task
has not yet called wait4(). The task's descriptor is kept in memory and only
released when the parent task calls wait4().
TBF
Token Bucket Filter, a qdisc used for rate limiting
TGID
Task Group IDentifier (POSIX process identifier)
U
Use-once
the page replacement algorithm used by the Linux 2.6 kernel, based
on the ideas behind the 2Q page replacement algorithm, also see the
AdvancedPageReplacement page.
V
VDSO
Virtual Dynamically-linked Shared Object, a kernel-provided shared
library that helps userspace perform a few kernel actions without the
overhead of a system call, as well as automatically choosing the most
efficient syscall mechanism. Also called the "vsyscall page".
VFS
Virtual File System, an interface through which multiple
filesystems can be hooked into the kernel.
Virtual memory
every process in the system gets its own memory address space,
independent of the other processes.
Vsyscall page
see VDSO.
W
*
X
Xen
A paravirtualisation engine for Linux, an efficient way to run
multiple Linux OSes on one computer. Also runs BSD, Plan9 and other
OSes. (See website for more information.)
XIP
eXecute In Place, the ability to run an executable directly from
the filesystem (usually ROM or flash), instead of loading it into
memory.
Y
*
Z
Zero-Copy
A special networking code path where data is sent to the network
directly from userspace memory; this avoids unnecessary copying of
data and improves performance.
Monday, February 02, 2009
How to run a shared library on Linux
In my previous blog post I wrote about how to run shared libraries on
Open-Solaris.
http://bhushanverma.blogspot.com/2008/06/how-to-run-shared-library-on-open.html
A shared object needs the following in order to run:
1. +x permission, which the static linker (program linker) gives by
default when creating a shared object.
2. An entry point at which the program/shared library starts to run.
3. An interpreter (run-time linker) that is used to run the shared library
after it has been loaded by the kernel's exec().
The entry point at which the program/shared library starts to run can be
given by passing -Wl,-e,entry_point to gcc on the command line.
To create the .interp section with GNU gcc, use the following line of
code on Linux:
const char my_interp[] __attribute__((section(".interp"))) =
"/lib/ld-linux.so.2";
Here /lib/ld-linux.so.2 is the path of the interpreter (run-time linker) on Linux.
On Open-Solaris we passed -Wl,-I,/usr/lib/ld.so.1 to the Sun linker to
create this section.
I think the GNU linker has this option too, but there it does something different.
Demo on Linux machine:
-------------------------
$ cat func.c
const char my_interp[] __attribute__((section(".interp"))) =
    "/lib/ld-linux.so.2";
#include <stdio.h>
#include <stdlib.h>
void bar();
int
func()
{
    printf("Hacking\n");
    bar();
    exit (0);
}
void
bar()
{
    printf("Bye...\n");
}
$ gcc -fPIC -o func.so -shared -Wl,-e,func func.c
You can check that func.so has a .interp section and an INTERP program header:
# readelf -l func.so
Elf file type is DYN (Shared object file)
Entry point 0x4dc
There are 7 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x00000034 0x00000034 0x000e0 0x000e0 R E 0x4
INTERP 0x0005a3 0x000005a3 0x000005a3 0x00013 0x00013 R 0x1
[Requesting program interpreter: /lib/ld-linux.so.2]
LOAD 0x000000 0x00000000 0x00000000 0x005bc 0x005bc R E 0x1000
LOAD 0x0005bc 0x000015bc 0x000015bc 0x00104 0x0010c RW 0x1000
DYNAMIC 0x0005d4 0x000015d4 0x000015d4 0x000c0 0x000c0 RW 0x4
NOTE 0x000114 0x00000114 0x00000114 0x00024 0x00024 R 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version
.gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata
.interp .eh_frame
03 .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt .bss
04 .dynamic
05 .note.gnu.build-id
06
You can clearly see that func.so has a .interp section and an INTERP program header.
Now try to run func.so:
$ ./func.so
Hacking
Bye...
Thursday, December 11, 2008
An Introduction to ARM Assembly Language (Jason Fuller)
An Introduction to ARM Assembly Language
Jason Fuller
Who is this document for?
This document is intended for anyone who occasionally needs to debug compiled ARM code at the assembly language level.
Why would I want to do that?
Because retail builds have compiler optimizations turned on, and compiler optimizations confuse the source-level debugger. For example, the values it displays for your local variables are often wrong because the real values are kept in registers, not on the stack where the debugger knows how to find them.
Registers
The ARM CPU has 16 registers, r0 through r15:
r0 through r3 are used as general purpose registers, but they are also used to pass the first four parameters into a function. Their values are not guaranteed to be preserved across function calls.
r0 is also used to return the return value from a function.
r4 through r11 are general purpose registers. It is the responsibility of a called function to preserve these values, i.e., to ensure that the values of these registers are the same when the function exits as when it was entered.
r12 is a scratch register. Like r0 through r3, its value is not guaranteed to be preserved across function calls.
sp (r13) is the Stack Pointer. A called function must also leave it with the same value on exit as on entry.
lr (r14) is the Link Register, which stores the return address when a function is called. But note that lr can also be used as a general-purpose register when it is not being used as the link register, so be careful.
pc (r15) is the Program Counter, i.e., the instruction pointer.
Condition Flags
There are four condition flags that are set as the result of executing instructions:
Flag  Name      Meaning
----  --------  -----------------------------------------------
N     Negative  Set if result is negative
Z     Zero      Set if result is zero
C     Carry     Set if a carry occurs, or a bit is shifted off the end by a shift instruction
V     Overflow  Set if overflow occurs, i.e., a signed result is bigger than 31 bits
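To make the flag definitions concrete, here is a minimal C sketch of the flags a compare (a subtraction whose result is thrown away) would produce. The struct and function names are made up for illustration. It also relies on one detail the table does not spell out: for a subtraction, ARM sets C when no borrow occurs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four condition flags. */
struct arm_flags {
    bool n, z, c, v;
};

/* Flags produced by "cmp a, b", which computes a - b and discards the result. */
struct arm_flags flags_after_cmp(uint32_t a, uint32_t b)
{
    uint32_t result = a - b;
    struct arm_flags f;

    f.n = (result >> 31) != 0;   /* bit 31 of the result is the sign bit */
    f.z = (result == 0);
    f.c = (a >= b);              /* for subtraction, C means "no borrow" */
    /* Signed overflow: a and b have different signs, and the sign of the
       result differs from the sign of a. */
    f.v = (((a ^ b) & (a ^ result)) >> 31) != 0;
    return f;
}
```

Comparing 5 with 5 sets Z and C, comparing 3 with 5 sets only N, and comparing 0x80000000 with 1 sets C and V.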
Instructions
A full list of ARM instructions can be found at http://www.arm.com/pdfs/QRC0001H_rvct_v2.1_arm.pdf, but it's not particularly readable or educational, so I'll go over the most common instructions here.
Throughout this document, I'll use C code in the right column to explain what the assembly instruction in the left column does. I use C syntax simply because everyone is familiar with it.
Instruction              C language equivalent
-----------------------  ---------------------------------------------
mov rx, ry               rx = ry          // Move register ry into rx
mov rx, ry, lsl #5       rx = ry << 5     // Logical Shift Left
mov rx, ry, lsr #6       rx = ry >> 6     // Logical Shift Right
mov rx, #0x12            rx = 0x12        // Move 0x12 into rx
mov rx, #0x21, 28        rx = 0x21 rotated right 28 bits
                         (see below)
str rx, [ry, #0x12]      DWORD *ry; ry[0x12/4] = rx
                         // Store rx into memory at ry + 0x12
str rx, [ry, rz]         DWORD *ry; ry[rz/4] = rx
                         // Store rx into memory at ry + rz
ldr rx, [ry], -rz        rx = *ry; ry -= rz
                         // Load from ry, then subtract rz from ry
cmp rx, ry               Compare rx to ry, and set the Condition flags
                         accordingly, for example, Z = (rx == ry)
cmp rx, #0x12            Compare rx to 0x12.
add rx, ry, #0x12        rx = ry + 0x12
sub rx, ry, #0x12        rx = ry - 0x12
mul rx, ry, rz           rx = ry * rz     // mul only takes registers
orr rx, ry, #2           rx = ry | 2
and rx, ry, #2           rx = ry & 2
bic rx, ry, #5           rx = ry & ~5     // Bit Clear
bx rx                    rx();            // Jump to the address in rx
A little explanation is in order about instructions that use "shifter operands" such as mov rx, #0x21, 28. It may seem like an odd instruction but is actually quite common, because it allows the compiler to stuff many 32-bit constant values into just 12 bits of an instruction: 8 bits for the constant (0x21) and 4 bits for the rotation amount (28), which can be any even number between 0 and 30. Without this trick of stuffing the constant into the 32-bit instruction, the compiler would have to load the constant from memory, which is much more expensive.
By the way, the example we've been using:
mov rx, #0x21, 28        rx = 0x21 rotated right 28 bits
is essentially the same as:
rx = 0x21 << (32 - 28)
or
rx = 0x21 << 4
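If you want to check a constant by hand while reading a disassembly, the rotation is easy to reproduce in C. This is just a sketch; ror32 and arm_immediate are names I made up for illustration.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n is taken modulo 32). */
uint32_t ror32(uint32_t value, unsigned n)
{
    n &= 31;
    if (n == 0)
        return value;   /* avoid an undefined shift by 32 below */
    return (value >> n) | (value << (32 - n));
}

/* Value of a shifter-operand constant written as "#imm8, rot". */
uint32_t arm_immediate(uint8_t imm8, unsigned rot)
{
    return ror32(imm8, rot);
}
```

arm_immediate(0x21, 28) gives 0x210, the same as 0x21 << 4.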
PC-relative addressing
Even though the use of shifter operands can sometimes allow the compiler to fit a constant into an instruction, sometimes the constant simply won't fit and must be loaded from memory. Where does the compiler put these constants? Right in the instruction stream, between where one function ends and the next one starts. This allows the compiler to load a constant using "pc-relative addressing", that is, using the program counter as if it were a pointer to data. For example:
ldr r1, [pc, #0x1C] DWORD *pc; r1 = pc[0x1c / 4];
The one catch is that the value of pc used in the pointer arithmetic is not the address of the instruction itself. It is the address of the instruction plus 8. (This is just an artifact of the way the chip works. By the time the instruction actually executes, the pc has already been incremented.) So, for example:
Address Instruction Disassembly
------- ----------- -----------
01F05640 e59f101c ldr r1, [pc, #0x1C]
…
01F05664 12345678 ??? // The data is at address 1F05640 + 8 + 1C
By the way, this is why when you're looking at a disassembly, some instructions will look weird, or show as ???. It's because they're not really instructions, they're data.
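The address arithmetic for a pc-relative load is worth writing down once, since you will do it often when reading disassembly. A minimal sketch (the function name is hypothetical):

```c
#include <stdint.h>

/*
 * Address read by "ldr rX, [pc, #offset]" when the ldr instruction itself
 * is located at instruction_address. In ARM mode the pc value used is the
 * instruction's address plus 8.
 */
uint32_t pc_relative_address(uint32_t instruction_address, uint32_t offset)
{
    return instruction_address + 8 + offset;
}
```

For the ldr at 01F05640 with offset 0x1C this gives 01F05664, which is where the data word sits in the listing.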
Suffixes
B and H
By default, instructions operate on 32-bit words. (Note that this definition of a word is different from the Win32 concept of a WORD, which is 16 bits.) However, an instruction that has the H suffix operates on halfwords (16 bits), and an instruction with the B suffix operates on bytes. For example:
strb r1, [r8, #0x28]     ((BYTE*)r8)[0x28] = (BYTE)r1
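In C terms, the B and H suffixes on a store just mean that only the low 8 or 16 bits of the register reach memory. Here is a small sketch with made-up helper names. The halfword version stores in the byte order of the machine running the C code, which is an assumption of the sketch and not a statement about any particular ARM configuration.

```c
#include <stdint.h>
#include <string.h>

/* strb: store the low 8 bits of value at base + offset. */
void store_byte(uint8_t *base, uint32_t offset, uint32_t value)
{
    base[offset] = (uint8_t)value;
}

/* strh: store the low 16 bits of value at base + offset. */
void store_halfword(uint8_t *base, uint32_t offset, uint32_t value)
{
    uint16_t half = (uint16_t)value;
    memcpy(base + offset, &half, sizeof half);   /* host byte order */
}
```

Storing 0x12345678 with store_byte writes only 0x78, and the neighbouring bytes are left untouched.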
S
Some instructions take an optional S suffix, which means "update the condition flags based on the result of this instruction".
Conditional suffixes
All ARM instructions are conditional, i.e., they can all be modified by a suffix indicating under what conditions the instructions should be executed. Here are the most common condition suffixes:
Suffix  Condition under which instruction is to execute
------  --------------------------------------------------------
eq      equal                        Z == 1
ne      not equal                    Z == 0
hi      unsigned higher              C == 1 && Z == 0
ls      unsigned lower or same       C == 0 || Z == 1
ge      signed greater or equal      N == V
lt      signed less than             N != V
gt      signed greater than          Z == 0 && N == V
le      signed less than or equal    Z == 1 || N != V
Don't worry too much about the third column. The suffixes work the way you would expect them to. For example:
cmp r1, #0x5
addeq r2, r1, r3 if (r1 == 5) r2 = r1 + r3;
movne r3, #0x18 if (r1 != 5) r3 = 0x18;
movgt r3, #0x18 if (r1 > 5) r3 = 0x18;
bleq Function if (r1 == 5) Function();
If the condition is not true, then the instruction does nothing.
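If you ever do need to work out by hand whether a conditional instruction will execute, the flag tests can be written as a small C function. This is only a sketch and the function name is made up. Note that le is true when Z is set or when N differs from V, which is how the ARM architecture defines it.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if an instruction with the given condition suffix would
   execute, given the current values of the N, Z, C and V flags. */
bool condition_passes(const char *suffix, bool n, bool z, bool c, bool v)
{
    if (strcmp(suffix, "eq") == 0) return z;
    if (strcmp(suffix, "ne") == 0) return !z;
    if (strcmp(suffix, "hi") == 0) return c && !z;
    if (strcmp(suffix, "ls") == 0) return !c || z;
    if (strcmp(suffix, "ge") == 0) return n == v;
    if (strcmp(suffix, "lt") == 0) return n != v;
    if (strcmp(suffix, "gt") == 0) return !z && n == v;
    if (strcmp(suffix, "le") == 0) return z || n != v;
    return true;   /* no recognised suffix: treat as "always execute" */
}
```

After cmp r1, #0x5 with r1 equal to 5, the flags are N=0, Z=1, C=1, V=0, so eq, ge and le pass while ne and gt do not.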
Conditional instructions allow the compiler to compile simple "if then else" statements without using any "jump" instructions. For example:
if (r1 == 5)
r2 = 6;
else
r3 = 7;
would become :
cmp r1, #5
moveq r2, #6
movne r3, #7
whereas the x86 compiler (generally) would have to generate two "jump" instructions: one to jump over the "then" clause if the condition was not met, and one to jump over the "else" clause if it was.
Data alignment
The ARM CPU can only access DWORDs in memory that are aligned on addresses that are divisible by 4. Likewise, it can only access 16-bit values on addresses divisible by 2. An unaligned access will result in a Datatype Misalignment exception.
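When you hit one of these exceptions, a quick check of the faulting address usually tells the story. A minimal sketch of the test (the function name is hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* True if address is a multiple of size. size must be a power of two,
   e.g. 4 for a DWORD, 2 for a 16-bit value, 1 for a byte. */
bool is_aligned(uint32_t address, uint32_t size)
{
    return (address & (size - 1)) == 0;
}
```

So a DWORD access at 0x1002 is misaligned, while a 16-bit access at the same address is fine.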
Function Calls
One of the most important mechanisms to understand is how a function call happens.
Step 1 - Parameters:
The caller sets up the parameters. r0 through r3 are used to transfer the first four parameters of a function. If there are more than four, the rest are pushed on the stack. (The rightmost parameter is pushed first).
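The rule above can be sketched in C as follows. This is a simplification under a stated assumption: every parameter is a single 32-bit value. The struct and function names are invented for the example.

```c
#include <stdbool.h>

/* Where a parameter lives at the moment of the call. */
struct arg_location {
    bool in_register;   /* true: passed in r0..r3 */
    int  index;         /* register number, or byte offset from sp */
};

/* position is 0-based: 0 is the first (leftmost) parameter. */
struct arg_location locate_argument(int position)
{
    struct arg_location loc;

    if (position < 4) {
        loc.in_register = true;
        loc.index = position;             /* r0, r1, r2 or r3 */
    } else {
        /* The rightmost parameter is pushed first, so the fifth
           parameter ends up at the lowest address, [sp]. */
        loc.in_register = false;
        loc.index = (position - 4) * 4;
    }
    return loc;
}
```

With six parameters, the fifth is found at [sp] and the sixth at [sp, #4] when the callee is entered.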
Step 2 - Call:
The function is called by executing the Branch and Link instruction:
bl MyFunction            lr = address of the instruction after the bl; MyFunction();
Note how this instruction sets lr to be the address to return to when the function is done.
For C++ method calls, you'll see the bx instruction instead of bl. Bx jumps to the address in a register. For example, if r0 is your C++ "this" pointer:
ldr r2, [r0]             r2 = &vtable
ldr r3, [r2, #0xC]       r3 = vtable[3]   // == fourth method
mov lr, pc               Manually set up lr since we're not using bl
                         (pc reads as this instruction + 8, i.e., the
                         instruction after the bx)
bx r3                    Jump to r3, i.e., call the fourth method
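The same virtual call can be written out by hand in C, which makes the two loads easier to recognise. This is only a sketch: the struct layout, the four demo methods and the table are made up, and the real layout is whatever your C++ compiler chooses.

```c
struct object;
typedef int (*method_fn)(struct object *self);

struct object {
    const method_fn *vtable;   /* first word of the object */
    int value;
};

/* Four hypothetical methods, so that index 3 is the fourth one. */
int method0(struct object *self) { return self->value; }
int method1(struct object *self) { return self->value + 1; }
int method2(struct object *self) { return self->value + 2; }
int method3(struct object *self) { return self->value + 3; }

const method_fn demo_vtable[4] = { method0, method1, method2, method3 };

int call_fourth_method(struct object *self)
{
    const method_fn *vtable = self->vtable;   /* ldr r2, [r0]        */
    method_fn target = vtable[3];             /* ldr r3, [r2, #0xC]  */
    return target(self);                      /* mov lr, pc ; bx r3  */
}
```

The #0xC offset in the assembly is 3 * 4 because pointers are 4 bytes on 32-bit ARM. On a machine with 8-byte pointers the same C code would use a different byte offset.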
Step 3 - Preserve registers:
The first instruction in a function usually looks something like this:
stmdb sp!, {r4 - r6, lr}     push(lr); push(r6);
                             push(r5); push(r4);
This is the "Store Multiple Decrement Before" instruction, which is my favorite CPU instruction of all time. It pushes an entire specified set of registers onto the stack in one instruction. This serves two purposes. First, it preserves the values of whichever of r4 through r11 the function is going to modify (remember it is the responsibility of the called function to preserve these). Second, it safely stores away the return address, lr.
Note that the order in which it pushes the registers may be the opposite of what you expect. The nice thing about this, though, is that if you are looking at a memory dump of the stack, the registers will be in the same order in memory as they are listed in the code.
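Here is a small C model of what the instruction does to memory, in case the ordering is still confusing. It is a sketch, not an emulator: the function name and the way registers are passed in are my own invention.

```c
#include <stdint.h>

/*
 * Model of "stmdb sp!, {reglist}".
 * regs[] holds the values of r0..r15, reg_mask has bit i set when ri is
 * in the register list, and sp points into an array of 32-bit words.
 * Returns the updated stack pointer.
 */
uint32_t *stmdb(uint32_t *sp, const uint32_t regs[16], uint16_t reg_mask)
{
    /* Highest-numbered register first, so that the lowest-numbered
       register ends up at the lowest address. */
    for (int i = 15; i >= 0; i--) {
        if (reg_mask & (1u << i)) {
            sp--;            /* decrement before ... */
            *sp = regs[i];   /* ... storing */
        }
    }
    return sp;
}
```

After pushing {r4 - r6, lr}, the words at sp[0], sp[1], sp[2] and sp[3] hold r4, r5, r6 and lr, in the same order as the register list.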
Step 4 - Locals:
Next, the callee decrements the stack pointer in order to reserve space on the stack for its local variables:
sub sp, sp, #0xC sp -= 12;
Note that the size it reserves may not be what you expect from looking at the local variables in the C code. This is because the optimizing compiler may not need to store a local on the stack at all; it may be able to get away with using registers.
Step 5 - Body:
Next, the body of the function is executed. Somewhere along the way, (the optimizing compiler is free to decide where) the compiler sets r0 to the return value of the function.
Step 6 - Return:
If space was allocated on the stack for locals, it is released:
add sp, sp, #0xC
Then the registers that we saved away at the beginning of the function are restored using the "Load Multiple Increment After" instruction:
ldmia sp!, {r4 - r6, lr}     pop(r4); pop(r5); pop(r6); pop(lr);
And finally we jump to the return address:
bx lr
Note, if you are debugging an app compiled for a version of Windows Mobile prior to v5.0, a different code sequence will be used:
ldmia sp!, {r4 - r6, pc}     pop(r4); pop(r5); pop(r6); pop(pc);
Note that this is the same list of registers as used in the stmdb instruction at the beginning of the function, except that now lr has been replaced by pc. So the value of lr when the function was entered, i.e., the return address, is now loaded into pc, the program counter. This has the effect of jumping to the return address. In other words, returning from the function.
The Frame Pointer
The Frame Pointer is not a register; it's just a concept. The Frame Pointer is the value of the stack pointer while the body of a function is executing. In other words, it's a pointer to a stack frame. What's in a stack frame? Well, from steps 3 and 4 above, a stack frame contains local variables, followed by the registers that the function needed to preserve.
Every function call in a call stack has a frame pointer. Platform Builder will show you the frame pointers if you right-click in the Call Stack window and check "Frame Pointer". In Visual Studio 2005, you need to double-click on the frame in the Call Stack window, then look at the value of "sp" in the Registers window. (If the Registers window says "No Data Available", right-click the Registers window and check "Device Registers".)
Finding local variables
Now that you understand the basics of ARM assembly, you can use them to do some cool debugging tricks. For example, the debugger often can't give you the value of local variables in a retail build, because the debugger assumes locals are on the stack, and the optimizing compiler often keeps them in registers instead. But, by looking at the disassembly window, and figuring out what the disassembly is doing and how it relates to the C source code, you can find where the compiler is storing your local variables.
Understanding the optimized code the compiler generates can be tricky, and there's no cookbook to finding your locals, but here's one tip:
Before a function is called, the parameters have to be loaded into r0, r1, r2, and r3. So it's fairly easy to look at what the compiler is loading into these registers and to match them up to the C code. For example:
RECT rc;
GetWindowRect(hWnd, &rc);
becomes:
add r1, sp, #0x20 // This tells us that rc is
// located at sp + 0x20
mov r0, r4 // This tells us that hWnd was in r4
// before this snippet of code
bl GetWindowRect
Finding local variables in other stack frames
Now let's tackle a harder problem. Suppose you want to know the value of a local that lives in a function that is not at the top of the call stack? For example, suppose your call stack looks like this:
Generic.exe!MyRegisterClass
Generic.exe!InitInstance
Generic.exe!WinMain
Generic.exe!WinMainCRTStartup
And you want to know what the value of hInstance was in WinMain. The first step is to do what we did before: read the disassembly and figure out where the value was before InitInstance was called:
int WINAPI WinMain(HINSTANCE hInstance,
HINSTANCE hPrevInstance,
LPTSTR lpCmdLine,
int nCmdShow)
{
00011960 stmdb sp!, {r4, lr}
00011964 sub sp, sp, #0x1C
00011968 mov r4, r0
MSG msg;
// Perform application initialization:
if (!InitInstance(hInstance, nCmdShow))
0001196C mov r1, r3
00011970 bl |InitInstance ( 117f8h )|
Since hInstance is the first parameter to WinMain, hInstance must have been in r0 when WinMain was entered. The mov r4,r0 instruction tells you that hInstance was in r4 when InitInstance was called. Of course, it's not still in r4 by the time we got to MyRegisterClass. So how do you figure out what r4 used to be?
Remember that it is the responsibility of a called function to preserve the values of r4 through r11. So, one solution is to just step out in the debugger until you are back in WinMain and then look at r4. But there are a number of reasons why this might not be possible: you may be looking at a post-mortem Watson dump; the debugger might be misbehaving; you might need to do more investigation before you step back out and lose your current state, etc. So then what do you do?
Since it is the responsibility of a called function to preserve r4 - r11, that means that one of the functions in the call stack (above the function you care about) must have preserved your r4. So, starting at WinMain (since that's where your local lives), walk up the stack one frame at a time, looking for a function that preserved r4 on the stack.
So we start at InitInstance. Find the beginning of the function:
BOOL InitInstance(HINSTANCE hInstance, int nCmdShow)
{
000117F8 stmdb sp!, {r4 - r7, lr}
000117FC sub sp, sp, #0x75, 30
And there it is. InitInstance preserves r4. But where exactly did it put r4? Remember what the stack looks like:
MyRegisterClass locals                 <- MyRegisterClass frame pointer
MyRegisterClass preserved registers
InitInstance locals                    <- InitInstance frame pointer
InitInstance preserved registers
WinMain locals                         <- WinMain frame pointer
WinMain preserved registers
So first you need to find the frame pointer for InitInstance, which Platform Builder's call stack window will give you. Or, if you are using Visual Studio 2005, double-click InitInstance's frame in the Call Stack window, then look at the value of sp in the Registers window. (If for some reason you don't have a debugger, you can even figure out the frame pointer yourself by starting with the current value of the stack pointer and looking at the prolog of each function in the stack to see how big each frame is.)
So now we know InitInstance's frame pointer, which points to its locals. The "sub sp,sp,#0x75,30" instruction tells us that InitInstance has 0x1D4 bytes of locals (0x1D4 == 0x75 << (32-30)). So InitInstance's preserved registers start at the frame pointer + 0x1d4. And since r4 is the first of the preserved registers, the r4 that InitInstance saved (which is WinMain's r4) lives at the frame pointer + 0x1d4. And there you have it, you found the value of WinMain's local hInstance variable.
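The arithmetic in this section can be sketched as a small helper. It is only a sketch under assumptions: the prolog has the two-instruction shape shown above (an stmdb followed by a single sub), the function name and parameters are hypothetical, and the frame pointer value in the example that follows is invented.

```c
#include <stdint.h>

/*
 * Address of a register saved by a prolog of the form
 *     stmdb sp!, {reglist}
 *     sub   sp, sp, #locals_size
 * frame_pointer is the value of sp after the prolog, and reg_mask has
 * bit i set when ri is in the register list.
 * Returns 0 if reg is not in the list.
 */
uint32_t saved_register_address(uint32_t frame_pointer, uint32_t locals_size,
                                uint16_t reg_mask, unsigned reg)
{
    if (reg > 15 || !(reg_mask & (1u << reg)))
        return 0;

    /* Registers are stored lowest-numbered first, so count how many
       saved registers come before the one we want. */
    unsigned slot = 0;
    for (unsigned i = 0; i < reg; i++) {
        if (reg_mask & (1u << i))
            slot++;
    }
    return frame_pointer + locals_size + 4 * slot;
}
```

For example, with a made-up frame pointer of 0x2000, the register list {r4 - r7, lr} (mask 0x40F0) and 0x1D4 bytes of locals, saved_register_address(0x2000, 0x1D4, 0x40F0, 4) gives 0x21D4.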
By the way, if you reach the very top of the call stack, and none of the functions preserved r4, that means that none of them trash r4, and so the current value of r4 is what you are looking for.