Android’s kernel for beagleboard-xm

The Rowboat project enables TI devices in Android's Linux kernel; the port is available at

1, Build

   make CROSS_COMPILE=arm-eabi- distclean
   make CROSS_COMPILE=arm-eabi- omap3_beagle_android_defconfig
   make CROSS_COMPILE=arm-eabi- uImage

The generated uImage is in arch/arm/boot.

2, Modifications

2.1 arm\mach-omap2\board-omap3beagle.c

This file is the main BSP file for the beagleboard. It defines __mach_desc_OMAP3_BEAGLE, which describes the machine's architecture features, as follows:

MACHINE_START(OMAP3_BEAGLE, "OMAP3 Beagle Board")
    /* Maintainer: Syed Mohammed Khasim - */
    .phys_io    = 0x48000000,
    .io_pg_offst    = ((0xfa000000) >> 18) & 0xfffc,
    .boot_params    = 0x80000100,
    .map_io     = omap3_beagle_map_io,
    .init_irq   = omap3_beagle_init_irq,
    .init_machine   = omap3_beagle_init,
    .timer      = &omap_timer,
MACHINE_END

The MACHINE_START macro places this structure in the .arch.info.init section, which is collected by the linker script in arm\kernel\

    .init : {           /* Init code and data       */
        _stext = .;
        _sinittext = .;
        _einittext = .;
        __proc_info_begin = .;
            *(.proc.info.init)
        __proc_info_end = .;
        __arch_info_begin = .;
            *(.arch.info.init)
        __arch_info_end = .;
        __tagtable_begin = .;
            *(.taglist.init)
        __tagtable_end = .;

The machine description structure is as follows:

struct machine_desc {
    /*
     * Note! The first four elements are used
     * by assembler code in head.S, head-common.S
     */
    unsigned int        nr;     /* architecture number  */
    unsigned int        phys_io;    /* start of physical io */
    unsigned int        io_pg_offst;    /* byte offset for io
                         * page table entry  */

    const char      *name;      /* architecture name    */
    unsigned long       boot_params;    /* tagged list      */

    unsigned int        video_start;    /* start of video RAM   */
    unsigned int        video_end;  /* end of video RAM */

    unsigned int        reserve_lp0 :1; /* never has lp0    */
    unsigned int        reserve_lp1 :1; /* never has lp1    */
    unsigned int        reserve_lp2 :1; /* never has lp2    */
    unsigned int        soft_reboot :1; /* soft reboot      */
    void            (*fixup)(struct machine_desc *,
                     struct tag *, char **,
                     struct meminfo *);
    void            (*map_io)(void);/* IO mapping function  */
    void            (*init_irq)(void);
    struct sys_timer    *timer;     /* system tick timer    */
    void            (*init_machine)(void);
};
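How does the kernel pick this entry at boot? The boot loader passes a machine number in r1 (u-boot sets 1546 for this board, as described in the u-boot notes further down), and the early ARM boot code walks the table between __arch_info_begin and __arch_info_end looking for a matching nr; in 2.6.32 that walk is the assembly routine __lookup_machine_type in head-common.S. The C sketch below only models that search: struct machine_desc_lite, the static table standing in for the linker-built section, and lookup_machine_type() are illustrative assumptions, not kernel code.

```c
#include <stddef.h>

/* Trimmed-down stand-in for struct machine_desc (illustrative only). */
struct machine_desc_lite {
    unsigned int  nr;          /* architecture (machine) number */
    const char   *name;        /* architecture name */
    unsigned long boot_params; /* physical address of the tagged list */
};

/* Stand-in for the .arch.info.init section that the linker gathers
 * between __arch_info_begin and __arch_info_end. */
static const struct machine_desc_lite arch_info[] = {
    { 1546, "OMAP3 Beagle Board", 0x80000100UL },
};

static const struct machine_desc_lite *const arch_info_begin = arch_info;
static const struct machine_desc_lite *const arch_info_end =
    arch_info + sizeof(arch_info) / sizeof(arch_info[0]);

/* Return the entry whose nr matches, or NULL if the machine is unknown. */
const struct machine_desc_lite *lookup_machine_type(unsigned int nr)
{
    const struct machine_desc_lite *p;

    for (p = arch_info_begin; p < arch_info_end; p++)
        if (p->nr == nr)
            return p;
    return NULL;
}
```

Calling lookup_machine_type(1546) returns the beagle entry; any other number returns NULL, the case in which the real kernel gives up with an unrecognized-machine error.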





Android’s Binder


The Binder communicates between processes using a small custom kernel module. This is used instead of standard Linux IPC facilities so that we can efficiently model our IPC operations as “thread migration”. That is, an IPC between processes looks as if the thread instigating the IPC has hopped over to the destination process to execute the code there, and then hopped back with the result.

Why does Android need IPC (the Binder)? Although all Android apps are written in Java, Android runs them on the Dalvik VM, and unlike a traditional OSGi setup where everything shares one JVM, each Dalvik app resides in its own Linux process. As Radoslav Gerganow said, this prevents all apps from being killed when one VM breaks. IPC is therefore necessary for Android apps to communicate with each other.

Android's Binder is based on OpenBinder, with some modifications. The Binder protocol version used in the Android 2.6.32 kernel is 7. Binder IPC in Android is implemented by the driver drivers/staging/android/binder.c.

1 Workflow


2 Binder Driver

When a user-space thread wants to participate in Binder IPC (either to send an IPC to another process or to receive an incoming IPC), the first thing it must do is open the driver supplied by the Binder kernel module. This associates a file descriptor with that thread, which the kernel module uses to identify the initiators and recipients of Binder IPCs.

2.1 binder_init()

  • Create the procfs directory /proc/binder with these entries:
    • state
    • stats
    • transactions
    • transaction_log
    • failed_transaction_log
  • Register the binder device via misc_register()

2.2 binder_ioctl()


The main command is BINDER_WRITE_READ, which sends zero or more Binder operations, then blocks waiting to receive incoming operations and returns with a result. (This is the same as doing a normal write() followed by a read() on the file descriptor, just a little more efficient.)

The ioctl's data structure is:

struct binder_write_read {
    signed long write_size; /* bytes to write */
    signed long write_consumed; /* bytes consumed by driver */
    unsigned long   write_buffer;
    signed long read_size;  /* bytes to read */
    signed long read_consumed;  /* bytes consumed by driver */
    unsigned long   read_buffer;
};

Upon calling the driver, write_buffer contains a series of commands for it to perform, and upon return read_buffer is filled in with a series of responses for the thread to execute.
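A minimal sketch of how user space might fill in this structure and issue the ioctl. It assumes a Linux build where <sys/ioctl.h> provides _IOWR, mirrors the 2.6.32 structure locally instead of including a kernel header, and takes BINDER_WRITE_READ = _IOWR('b', 1, struct binder_write_read) from that kernel's binder.h; the helper names bwr_setup and binder_submit are hypothetical. A real process opens /dev/binder once and mmap()s it before it can receive transactions (Android's ProcessState does this); that step is left out here.

```c
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local mirror of the 2.6.32 struct binder_write_read shown above. */
struct binder_write_read {
    signed long   write_size;     /* bytes to write */
    signed long   write_consumed; /* bytes consumed by driver */
    unsigned long write_buffer;   /* user address of the command stream */
    signed long   read_size;      /* bytes to read */
    signed long   read_consumed;  /* bytes consumed by driver */
    unsigned long read_buffer;    /* user address for the responses */
};

/* Request code as defined in that kernel's binder.h. */
#define BINDER_WRITE_READ _IOWR('b', 1, struct binder_write_read)

/* Hypothetical helper: describe one combined write-then-read call. */
void bwr_setup(struct binder_write_read *bwr,
               const void *cmds, size_t cmds_len,
               void *replies, size_t replies_len)
{
    memset(bwr, 0, sizeof(*bwr));  /* both consumed counters start at 0 */
    bwr->write_buffer = (unsigned long)cmds;
    bwr->write_size   = (signed long)cmds_len;
    bwr->read_buffer  = (unsigned long)replies;
    bwr->read_size    = (signed long)replies_len;
}

/* Hypothetical helper: send cmds on an already-open binder fd and block
 * until responses arrive. Returns the number of response bytes placed in
 * replies, or -1 if the ioctl fails. */
long binder_submit(int fd, const void *cmds, size_t cmds_len,
                   void *replies, size_t replies_len)
{
    struct binder_write_read bwr;

    bwr_setup(&bwr, cmds, cmds_len, replies, replies_len);
    if (ioctl(fd, BINDER_WRITE_READ, &bwr) < 0)
        return -1;
    return bwr.read_consumed;
}
```

After the call, write_consumed and read_consumed tell the caller how much of each buffer the driver actually used, which is why they are zeroed before every request.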

Here is a list of the commands that can be sent by a process to the driver, with comments describing the data that follows each command in the buffer:

enum BinderDriverCommandProtocol {
    BC_TRANSACTION = _IOW('c', 0, struct binder_transaction_data),
    BC_REPLY = _IOW('c', 1, struct binder_transaction_data),
    /*
     * binder_transaction_data: the sent command.
     */

    BC_ACQUIRE_RESULT = _IOW('c', 2, int),
    /*
     * not currently supported
     * int:  0 if the last BR_ATTEMPT_ACQUIRE was not successful.
     * Else you have acquired a primary reference on the object.
     */

    BC_FREE_BUFFER = _IOW('c', 3, int),
    /*
     * void *: ptr to transaction data received on a read
     */

    BC_INCREFS = _IOW('c', 4, int),
    BC_ACQUIRE = _IOW('c', 5, int),
    BC_RELEASE = _IOW('c', 6, int),
    BC_DECREFS = _IOW('c', 7, int),
    /*
     * int: descriptor
     */

    BC_INCREFS_DONE = _IOW('c', 8, struct binder_ptr_cookie),
    BC_ACQUIRE_DONE = _IOW('c', 9, struct binder_ptr_cookie),
    /*
     * void *: ptr to binder
     * void *: cookie for binder
     */

    BC_ATTEMPT_ACQUIRE = _IOW('c', 10, struct binder_pri_desc),
    /*
     * not currently supported
     * int: priority
     * int: descriptor
     */

    BC_REGISTER_LOOPER = _IO('c', 11),
    /*
     * No parameters.
     * Register a spawned looper thread with the device.
     */

    BC_ENTER_LOOPER = _IO('c', 12),
    BC_EXIT_LOOPER = _IO('c', 13),
    /*
     * No parameters.
     * These two commands are sent as an application-level thread
     * enters and exits the binder loop, respectively.  They are
     * used so the binder can have an accurate count of the number
     * of looping threads it has available.
     */

    BC_REQUEST_DEATH_NOTIFICATION = _IOW('c', 14, struct binder_ptr_cookie),
    /*
     * void *: ptr to binder
     * void *: cookie
     */

    BC_CLEAR_DEATH_NOTIFICATION = _IOW('c', 15, struct binder_ptr_cookie),
    /*
     * void *: ptr to binder
     * void *: cookie
     */

    BC_DEAD_BINDER_DONE = _IOW('c', 16, void *),
    /*
     * void *: cookie
     */
};

The most interesting commands here are BC_TRANSACTION and BC_REPLY, which initiate an IPC transaction and return a reply for a transaction, respectively. The data structure following these commands is:

enum transaction_flags {
    TF_ONE_WAY  = 0x01, /* this is a one-way call: async, no return */
    TF_ROOT_OBJECT  = 0x04, /* contents are the component's root object */
    TF_STATUS_CODE  = 0x08, /* contents are a 32-bit status code */
    TF_ACCEPT_FDS   = 0x10, /* allow replies with file descriptors */
};

struct binder_transaction_data {
    /* The first two are only used for bcTRANSACTION and brTRANSACTION,
     * identifying the target and contents of the transaction.
     */
    union {
        size_t  handle; /* target descriptor of command transaction */
        void    *ptr;   /* target descriptor of return transaction */
    } target;
    void        *cookie;    /* target object cookie */
    unsigned int    code;       /* transaction command */

    /* General information about the transaction. */
    unsigned int    flags;
    pid_t       sender_pid;
    uid_t       sender_euid;
    size_t      data_size;  /* number of bytes of data */
    size_t      offsets_size;   /* number of bytes of offsets */

    /* If this transaction is inline, the data immediately
     * follows here; otherwise, it ends with a pointer to
     * the data buffer.
     */
    union {
        struct {
            /* transaction data */
            const void  *buffer;
            /* offsets from buffer to flat_binder_object structs */
            const void  *offsets;
        } ptr;
        uint8_t buf[8];
    } data;
};

Thus, to initiate an IPC transaction, you essentially perform a BINDER_WRITE_READ ioctl with the write buffer containing bcTRANSACTION followed by a binder_transaction_data. In this structure, target is the handle of the object that should receive the transaction, code tells the object what to do when it receives the transaction, flags carries options such as TF_ONE_WAY, and there is a data buffer containing the transaction data, as well as an (optional) additional offsets buffer of meta-data.
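Here is a sketch of that step: appending BC_TRANSACTION and its binder_transaction_data to a write buffer. It mirrors the structure shown above locally, derives BC_TRANSACTION with the _IOW macro from <sys/ioctl.h> just as the enum above does (assuming a Linux build), and uses a hypothetical helper, put_transaction(). It sends no offsets, so the payload is assumed to contain no flat_binder_object entries.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>

/* Local mirror of struct binder_transaction_data shown above. */
struct binder_transaction_data {
    union {
        size_t handle;       /* target descriptor of command transaction */
        void  *ptr;          /* target descriptor of return transaction */
    } target;
    void        *cookie;     /* target object cookie */
    unsigned int code;       /* transaction command */
    unsigned int flags;
    pid_t        sender_pid;
    uid_t        sender_euid;
    size_t       data_size;    /* number of bytes of data */
    size_t       offsets_size; /* number of bytes of offsets */
    union {
        struct {
            const void *buffer;  /* transaction data */
            const void *offsets; /* offsets to flat_binder_object structs */
        } ptr;
        uint8_t buf[8];
    } data;
};

#define TF_ONE_WAY     0x01
#define BC_TRANSACTION _IOW('c', 0, struct binder_transaction_data)

/* Hypothetical helper: append BC_TRANSACTION plus its payload at offset
 * `used` of wbuf. Returns the new fill level, or 0 if it does not fit. */
size_t put_transaction(uint8_t *wbuf, size_t used, size_t cap,
                       size_t handle, unsigned int code,
                       const void *payload, size_t payload_len, int one_way)
{
    uint32_t cmd = BC_TRANSACTION;   /* commands are 32-bit words */
    struct binder_transaction_data tr;

    if (used > cap || cap - used < sizeof(cmd) + sizeof(tr))
        return 0;

    memset(&tr, 0, sizeof(tr));
    tr.target.handle = handle;           /* which remote object */
    tr.code = code;                      /* what the object should do */
    tr.flags = one_way ? TF_ONE_WAY : 0; /* one-way: no reply expected */
    tr.data_size = payload_len;
    tr.offsets_size = 0;                 /* no embedded binder objects */
    tr.data.ptr.buffer = payload;
    tr.data.ptr.offsets = NULL;

    memcpy(wbuf + used, &cmd, sizeof(cmd));
    memcpy(wbuf + used + sizeof(cmd), &tr, sizeof(tr));
    return used + sizeof(cmd) + sizeof(tr);
}
```

The filled buffer and its length would then become write_buffer and write_size of a binder_write_read passed to the BINDER_WRITE_READ ioctl.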

Given the target handle, the driver determines which process that object lives in and dispatches this transaction to one of the waiting threads in its thread pool (spawning a new thread if needed). That thread is waiting in a BINDER_WRITE_READ ioctl() to the driver, and so returns with its read buffer filled in with the commands it needs to execute. These commands are very similar to the write commands, for the most part corresponding to write operations on the other side:

enum BinderDriverReturnProtocol {
    BR_ERROR = _IOR('r', 0, int),
    /*
     * int: error code
     */

    BR_OK = _IO('r', 1),
    /* No parameters! */

    BR_TRANSACTION = _IOR('r', 2, struct binder_transaction_data),
    BR_REPLY = _IOR('r', 3, struct binder_transaction_data),
    /*
     * binder_transaction_data: the received command.
     */

    BR_ACQUIRE_RESULT = _IOR('r', 4, int),
    /*
     * not currently supported
     * int: 0 if the last bcATTEMPT_ACQUIRE was not successful.
     * Else the remote object has acquired a primary reference.
     */

    BR_DEAD_REPLY = _IO('r', 5),
    /*
     * The target of the last transaction (either a bcTRANSACTION or
     * a bcATTEMPT_ACQUIRE) is no longer with us.  No parameters.
     */

    BR_TRANSACTION_COMPLETE = _IO('r', 6),
    /*
     * No parameters... always refers to the last transaction requested
     * (including replies).  Note that this will be sent even for
     * asynchronous transactions.
     */

    BR_INCREFS = _IOR('r', 7, struct binder_ptr_cookie),
    BR_ACQUIRE = _IOR('r', 8, struct binder_ptr_cookie),
    BR_RELEASE = _IOR('r', 9, struct binder_ptr_cookie),
    BR_DECREFS = _IOR('r', 10, struct binder_ptr_cookie),
    /*
     * void *:  ptr to binder
     * void *: cookie for binder
     */

    BR_ATTEMPT_ACQUIRE = _IOR('r', 11, struct binder_pri_ptr_cookie),
    /*
     * not currently supported
     * int: priority
     * void *: ptr to binder
     * void *: cookie for binder
     */

    BR_NOOP = _IO('r', 12),
    /*
     * No parameters.  Do nothing and examine the next command.  It exists
     * primarily so that we can replace it with a BR_SPAWN_LOOPER command.
     */

    BR_SPAWN_LOOPER = _IO('r', 13),
    /*
     * No parameters.  The driver has determined that a process has no
     * threads waiting to service incoming transactions.  When a process
     * receives this command, it must spawn a new service thread and
     * register it via bcENTER_LOOPER.
     */

    BR_FINISHED = _IO('r', 14),
    /*
     * not currently supported
     * stop threadpool thread
     */

    BR_DEAD_BINDER = _IOR('r', 15, void *),
    /*
     * void *: cookie
     */
    BR_CLEAR_DEATH_NOTIFICATION_DONE = _IOR('r', 16, void *),
    /*
     * void *: cookie
     */

    BR_FAILED_REPLY = _IO('r', 17),
    /*
     * The last transaction (either a bcTRANSACTION or
     * a bcATTEMPT_ACQUIRE) failed (e.g. out of memory).  No parameters.
     */
};

The recipient, in user space, will then hand this transaction over to the target object for it to execute and return its result. Upon getting the result, a new write buffer is created containing the bcREPLY reply command with a binder_transaction_data structure containing the resulting data. This is returned with a BINDER_WRITE_READ ioctl() on the driver, sending the reply back to the original process and leaving the thread waiting for the next transaction to perform.

The original thread finally returns back from its own BINDER_WRITE_READ with a brREPLY command containing the reply data.

Note that the original thread may also receive BR_TRANSACTION commands while it is waiting for a reply. This represents recursion across processes: the receiving thread is making a call on an object back in the original process. It is the responsibility of the driver to keep track of all active transactions, so it can dispatch transactions to the correct thread when recursion happens.
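On the receiving side, a thread walks its read buffer one command at a time, switching on each BR_ code. The sketch below assumes a Linux build for the _IO/_IOR macros, handles only a few of the return codes listed above, and uses hypothetical counters in place of real handlers. Each entry is a 32-bit command word followed by a payload whose size depends on the command, so the parser must know every command it accepts.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* A subset of BinderDriverReturnProtocol, with the values shown above. */
#define BR_ERROR                _IOR('r', 0, int)
#define BR_TRANSACTION_COMPLETE _IO('r', 6)
#define BR_NOOP                 _IO('r', 12)
#define BR_SPAWN_LOOPER         _IO('r', 13)
#define BR_DEAD_BINDER          _IOR('r', 15, void *)

/* Hypothetical per-command counters standing in for real handlers. */
struct br_counts {
    int error, transaction_complete, noop, spawn_looper, dead_binder;
};

/* Walk `len` bytes of a read buffer returned by BINDER_WRITE_READ.
 * Returns the bytes consumed; stops early on an unknown command or a
 * truncated payload, leaving that entry unconsumed. */
size_t parse_read_buffer(const uint8_t *buf, size_t len, struct br_counts *c)
{
    size_t pos = 0;

    while (len - pos >= sizeof(uint32_t)) {
        uint32_t cmd;
        size_t payload;
        int *counter;

        memcpy(&cmd, buf + pos, sizeof(cmd));
        switch (cmd) {
        case BR_ERROR:                payload = sizeof(int);    counter = &c->error; break;
        case BR_TRANSACTION_COMPLETE: payload = 0;              counter = &c->transaction_complete; break;
        case BR_NOOP:                 payload = 0;              counter = &c->noop; break;
        case BR_SPAWN_LOOPER:         payload = 0;              counter = &c->spawn_looper; break;
        case BR_DEAD_BINDER:          payload = sizeof(void *); counter = &c->dead_binder; break;
        /* BR_TRANSACTION and BR_REPLY would consume
         * sizeof(struct binder_transaction_data) here. */
        default:                      return pos;
        }
        if (len - pos - sizeof(cmd) < payload)
            return pos;                /* payload cut off: stop before it */
        (*counter)++;
        pos += sizeof(cmd) + payload;
    }
    return pos;
}
```

A real looper would answer BR_SPAWN_LOOPER by starting a new thread and hand BR_TRANSACTION to the target object before writing back BC_REPLY.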








Android on my beagleboard-xm

Ha, I got Android booting on my beagleboard-xm, with output to a 24-inch screen at 1920 * 1080.

It seems the startup frame buffer is not correct:

It draws a big Android… the screen is too large…

It seems to take about 5 minutes to boot from the welcome screen to the Android desktop. I think booting directly from MMC is too slow; I need to try putting the rootfs on a USB disk, but that requires enabling the USB hub early.



beagleboard-xm research(2)–u-boot

1, Build

Download u-boot 1.3.3 for the beagleboard from

If you use the latest ARM gcc from CodeSourcery, you may get the following error:

arm-none-linux-gnueabi-gcc -g  -Os   -fno-strict-aliasing  -fno-common -ffixed-r8 -msoft-float  -D__KERNEL__ -DTEXT_BASE=0x80e80000 -I/home/ken/bb/u-boot/u-boot-beagle/include -fno-builtin -ffreestanding -nostdinc -isystem /opt/sourcery_g++/bin/../lib/gcc/arm-none-linux-gnueabi/4.3.3/include -pipe  -DCONFIG_ARM -D__ARM__ -march=armv7a  -Wall -Wstrict-prototypes -c -o hello_world.o hello_world.c
hello_world.c:1: error: bad value (armv7a) for -march= switch

This is caused by a change in recent GCC versions for the ARMv7-A architecture: the option must be spelled -march=armv7-a, not -march=armv7a.

To fix it, in u-boot\cpu\omap3\, change the following line:

PLATFORM_CPPFLAGS += -march=armv7a

to:

PLATFORM_CPPFLAGS += -march=armv7-a

Although the u-boot.bin image builds successfully, the beagleboard-xm fails to boot it:

a) the serial baudrate is changed from 115200 to 57600

b) the system hangs after finding no NAND memory.

However, a u-boot image built from the mainline git tree git:// with the omap3 patches works correctly; please refer to

   git clone git:// u-boot-main
   cd u-boot-main
   git checkout --track -b omap3 origin/master

   make CROSS_COMPILE=arm-none-linux-gnueabi- mrproper
   make CROSS_COMPILE=arm-none-linux-gnueabi- omap3_beagle_config
   make CROSS_COMPILE=arm-none-linux-gnueabi-

As mentioned in the previous post, u-boot.bin is loaded near the start of SDRAM, at address 0x80008000. So in uboot\board\ti\beagle\

   #
   # Physical Address:
   # 8000'0000 (bank0)
   # A000/0000 (bank1)
   # Linux-Kernel is expected to be at 8000'8000, entry 8000'8000
   # (mem base + reserved)

   # For use with external or internal boots.
   CONFIG_SYS_TEXT_BASE = 0x80008000

CONFIG_SYS_TEXT_BASE is passed to the compiler as a macro:

arm-none-linux-gnueabi-gcc   -D__ASSEMBLY__ -g  -Os   -fno-common -ffixed-r8 -msoft-float   -D__KERNEL__ -DCONFIG_SYS_TEXT_BASE=0x80008000 -I/home/ken/bb/u-boot/u-boot-mailine/include -fno-builtin -ffreestanding -nostdinc -isystem /opt/sourcery_g++/bin/../lib/gcc/arm-none-linux-gnueabi/4.3.3/include -pipe  -DCONFIG_ARM -D__ARM__ -marm  -mabi=aapcs-linux -mno-thumb-interwork -march=armv5   -o start.o start.S -c

(BTW: some interesting compiler options are used here: -ffreestanding, -isystem, -mabi=aapcs-linux)

It is worth mentioning that u-boot keeps some information in a global_data structure at the top of the stack; the structure is defined in uboot\arch\arm\include\asm\global_data.h:

   typedef    struct    global_data {
       bd_t        *bd;
       unsigned long    flags;
       unsigned long    baudrate;
       unsigned long    have_console;    /* serial_init() was called */
       unsigned long    env_addr;    /* Address  of Environment struct */
       unsigned long    env_valid;    /* Checksum of Environment valid? */
       unsigned long    fb_base;    /* base address of frame buffer */
   #ifdef CONFIG_VFD
       unsigned char    vfd_type;    /* display type */
   #endif
   #ifdef CONFIG_FSL_ESDHC
       unsigned long    sdhc_clk;
   #endif
   #ifdef CONFIG_AT91FAMILY
       /* "static data" needed by at91's clock.c */
       unsigned long    cpu_clk_rate_hz;
       unsigned long    main_clk_rate_hz;
       unsigned long    mck_rate_hz;
       unsigned long    plla_rate_hz;
       unsigned long    pllb_rate_hz;
       unsigned long    at91_pllb_usb_init;
   #endif
   #ifdef CONFIG_ARM
       /* "static data" needed by most of timer.c on ARM platforms */
       unsigned long    timer_rate_hz;
       unsigned long    tbl;
       unsigned long    tbu;
       unsigned long long    timer_reset_value;
       unsigned long    lastinc;
   #endif
       unsigned long    relocaddr;    /* Start address of U-Boot in RAM */
       phys_size_t    ram_size;    /* RAM size */
       unsigned long    mon_len;    /* monitor len */
       unsigned long    irq_sp;        /* irq stack pointer */
       unsigned long    start_addr_sp;    /* start_addr_stackpointer */
       unsigned long    reloc_off;
   #if !(defined(CONFIG_SYS_NO_ICACHE) && defined(CONFIG_SYS_NO_DCACHE))
       unsigned long    tlb_addr;
   #endif
       void        **jt;        /* jump table */
       char        env_buf[32];    /* buffer for getenv() before reloc. */
   } gd_t;

The structure size may differ depending on the configuration macros, so at the beginning of the build a script calculates the current size of the global data:

   arm-none-linux-gnueabi-gcc -DDO_DEPS_ONLY \
           -g  -Os   -fno-common -ffixed-r8 -msoft-float   -D__KERNEL__ -DCONFIG_SYS_TEXT_BASE=0x80008000 -I/home/ken/bb/u-boot/u-boot-mailine/include -fno-builtin -ffreestanding -nostdinc -isystem /opt/sourcery_g++/bin/../lib/gcc/arm-none-linux-gnueabi/4.3.3/include -pipe  -DCONFIG_ARM -D__ARM__ -marm  -mabi=aapcs-linux -mno-thumb-interwork -march=armv5 -Wall -Wstrict-prototypes -fno-stack-protector   \
           -o lib/asm-offsets.s lib/asm-offsets.c -c -S
   Generating include/generated/generic-asm-offsets.h
   tools/scripts/make-asm-offsets lib/asm-offsets.s include/generated/generic-asm-offsets.h

EFI has a similar design, which puts PeiCore's private data at the top of the stack as global data.

2, Memory Map

Top of SDRAM (after relocation):

0x9fff0000: TLB table

0x9ff7f000 ~ 0x9fff0000: reserved for U-boot (0x71000 bytes, 452K)

0x9ff1f000 ~ 0x9ff7f000: malloc area (384K)

0x9ff1efe0 ~ 0x9ff1f000: board info (32 bytes)

0x9ff1ef68 ~ 0x9ff1efe0: global data (120 bytes)

0x9ff1ef68: new stack pointer

Start of SDRAM (where u-boot is loaded):

0x80008000: reset vector

0x80008020 ~ 0x80008028: interrupt vectors

0x80000100: Linux boot parameters

Internal SRAM (before relocation):

0x4020FF80 ~ 0x40210000: global_data

0x4020F800 ~ 0x4020FF80: stack


3, Workflow

  1. uboot\arch\arm\cpu\armv7\start.S
    • Like x-load, start.S provides the first-stage assembly loader for u-boot.
    • The first instruction is the reset vector, and the interrupt/exception handlers are placed right after it, as the symbol map shows:
      80008000 T _start
      80008020 t _undefined_instruction
      80008024 t _software_interrupt
      80008028 t _prefetch_abort
      8000802c t _data_abort
      80008030 t _not_used
      80008034 t _irq
      80008038 t _fiq

    • Switch CPU to SVC32 mode.
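      mrs    r0, cpsr          @ read the current program status register
      bic    r0, r0, #0x1f     @ clear the mode bits M[4:0]
      orr    r0, r0, #0xd3     @ 0x13 = SVC mode, 0xc0 = mask IRQ and FIQ
      msr    cpsr,r0           @ write the new mode back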

    • Copy the interrupt vectors to the ROM indirect address 0x4020F800.
    • Because the beagleboard-xm has no NAND/OneNAND device, the DPLL initialization code must be copied to the ROM indirect area after the interrupt vectors.
    • Initialize the CPU in assembly, like x-load:
      • Set up important registers: MMU, cache
      • Set up memory timing.
    • Set up the stack for C code at 0x4020FF80 (uboot\include\configs\omap3_beagle.h):
      #define CONFIG_SYS_INIT_RAM_ADDR    0x4020f800
      #define CONFIG_SYS_INIT_RAM_SIZE    0x800
      #define CONFIG_SYS_INIT_SP_ADDR     (CONFIG_SYS_INIT_RAM_ADDR + \
                           CONFIG_SYS_INIT_RAM_SIZE - \
                           GENERATED_GBL_DATA_SIZE)


As mentioned above, global_data is stored just below the top of the stack. Its size is generated at build time in include\generated\generic-asm-offsets.h:

#define GENERATED_GBL_DATA_SIZE (128) /* (sizeof(struct global_data) + 15) & ~15 */
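Putting these definitions together: the 120-byte global data listed in the memory map above rounds up to 128, and the initial stack pointer lands exactly where the global data begins, 0x4020FF80. The sketch below redoes that arithmetic in C. The function names are hypothetical, and gd_addr() assumes that board_init_f() of that era places gd at (CONFIG_SYS_INIT_SP_ADDR & ~0x07), which is how we read the source.

```c
/* Values from include/configs/omap3_beagle.h, as quoted above. */
#define CONFIG_SYS_INIT_RAM_ADDR 0x4020f800UL
#define CONFIG_SYS_INIT_RAM_SIZE 0x800UL

/* Round sizeof(struct global_data) up to a multiple of 16, as the
 * comment on GENERATED_GBL_DATA_SIZE describes. */
unsigned long gbl_data_size(unsigned long sizeof_global_data)
{
    return (sizeof_global_data + 15UL) & ~15UL;
}

/* CONFIG_SYS_INIT_SP_ADDR: top of the init RAM minus the global data. */
unsigned long init_sp_addr(unsigned long gbl_size)
{
    return CONFIG_SYS_INIT_RAM_ADDR + CONFIG_SYS_INIT_RAM_SIZE - gbl_size;
}

/* Assumed placement of gd: the initial SP rounded down to 8 bytes. */
unsigned long gd_addr(unsigned long init_sp)
{
    return init_sp & ~0x07UL;
}
```

With 120 bytes of global data: 0x4020f800 + 0x800 - 128 = 0x40210000 - 0x80 = 0x4020FF80, so the stack grows down from 0x4020FF80 and global_data occupies 0x4020FF80 ~ 0x40210000.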

    • Call C function board_init_f (uboot\arch\arm\lib\board.c):
      • Assign/Init the global data structure at 0x4020FF80
      • Insert a compiler optimization barrier, much like MemoryFence() in edk2's MdePkg:
      __asm__ __volatile__("": : :"memory");

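Because hardware initialization and I/O code often writes and reads the same MMIO address, the compiler might optimize such accesses away or reorder the reads and writes, which would break the code. The empty asm volatile statement with a "memory" clobber prevents this: it emits no instruction, but the compiler may not cache memory values in registers or move memory accesses across it.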
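A minimal sketch of where such a barrier matters, assuming GCC or Clang (the empty asm statement is a compiler extension). Volatile accesses are already kept in order relative to each other, but ordinary stores may be moved past a volatile one; the barrier stops that. The descriptor struct, fake_doorbell, and kick_dma() are hypothetical, and the doorbell is a plain variable so the code runs on any host. This is only a compiler barrier: on ARMv7 it emits no dmb/dsb instruction.

```c
#include <stdint.h>

/* Same statement as in board_init_f(): no code, but a "memory" clobber. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical DMA descriptor kept in ordinary (non-volatile) memory. */
struct dma_desc {
    uint32_t src, dst, len;
};

/* Hypothetical doorbell register; on hardware this would be MMIO. */
volatile uint32_t fake_doorbell;

void kick_dma(struct dma_desc *d, uint32_t src, uint32_t dst, uint32_t len)
{
    d->src = src;        /* plain stores: without the barrier the      */
    d->dst = dst;        /* compiler could sink them below the         */
    d->len = len;        /* volatile doorbell write                    */
    barrier();           /* compiler-only fence                        */
    fake_doorbell = 1u;  /* the device would now read the descriptor   */
}
```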

      • Call each function listed in the init_sequence array:
        • timer_init (arch/arm/cpu/armv7/omap-common/timer.c)
          • GPTIMER2 is used here (OMAP3 has 12 general-purpose timers); its base address is 0x49032000
        • Initialize the environment: because there is no NAND, the environment is relocated to RAM, as in arch\arm\include\asm\global_data.h
      #define    GD_FLG_RELOC        0x00001    /* Code was relocated to RAM        */

            By default, some configuration values, such as baudrate, come from the global variable default_environment in common\env_common.c.

        • Serial initialization

The beagleboard-xm uses an NS16550-compatible serial port at COM3 (0x49020000); its datasheet is at
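u-boot's NS16550 driver programs the UART's divisor latch from the UART input clock and the requested baud rate. The sketch below assumes a 48 MHz UART clock (our understanding of the OMAP3 beagle configuration) and 16x oversampling with round-to-nearest, which is how we read u-boot's calc_divisor(); the function names are hypothetical. A factor-of-two error in either the assumed clock or the divisor halves or doubles the resulting rate, which is one way a 115200 setting can come out at 57600, though we have not confirmed that as the cause of the failure noted in the build section above.

```c
/* Divisor latch value for a 16x-oversampling NS16550, rounded to the
 * nearest integer: (clk + baud * 8) / (baud * 16). */
unsigned int ns16550_divisor(unsigned long uart_clk_hz, unsigned long baud)
{
    return (unsigned int)((uart_clk_hz + baud * 8UL) / (baud * 16UL));
}

/* Baud rate a given divisor actually produces (truncated to an integer). */
unsigned long ns16550_actual_baud(unsigned long uart_clk_hz,
                                  unsigned int divisor)
{
    return uart_clk_hz / (16UL * divisor);
}
```

For 115200 baud with a 48 MHz clock this gives a divisor of 26 and an actual rate of about 115384 baud, an error of roughly 0.16%.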

        • Initialize the stage-1 console for printing
        • Print CPU/board information
        • Initialize the I2C device
        • Initialize the SDRAM and calculate each bank's size
      • Reserve memory for u-boot at the top of the first RAM bank, which starts at 0x80000000
      • Relocate the code to its new location at 0x9ff7f000, with the stack at 0x9ff1ef60 (arch\arm\cpu\armv7\start.S, relocate_code())
      • Jump to board_init_r() at its new location in RAM (this sequence is very much like PeiCore relocation in EFI)
      • In board_init_r() and the beagleboard hooks it calls (board\ti\beagle\beagle.c):
        • Init GPMC
        • Set the board ID for Linux to 1546
        • Set the boot parameter address to 0x80000100
        • Init the MMC driver
        • Init stdio drivers such as serial and nulldev
        • Init jumptable?
        • Evaluate the board version; for the beagleboard-xm, set VAUX2 to 1.8 V for the EHCI PHY, and print the DIE ID at 0x4830A200
        • Init the IRQ and FIQ stacks, 4K each
        • Change CPSR to enable interrupts
        • Enter main_loop() to read the boot/user script …

beagleboard-xm research(1) — Initialization & x-load

1, General boot process and device

The initialization process for the OMAP DM37x beagleboard:

  • Preinitialization
  • Power/clock/reset ramp sequence
  • Boot ROM
  • Boot Loader
  • OS/application

Six external pins (sys_boot[5:0]) are used to select the interfaces or devices for booting. The interfaces are GPMC, MMC1, MMC2, USB and UART.

The ROM code has two booting functions: peripheral booting and memory booting:

  • In peripheral booting, the ROM code polls a selected communication interface such as UART or USB, downloads the executable code over the interface, and executes it in internal SRAM. Software downloaded from an external host can be used to program flash memories connected to the device.
  • In memory booting, the ROM code finds bootstrap in permanent memories such as flash memory or memory cards and executes it. The process is normally performed after cold or warm device reset.

Overall boot sequence is as follows:

The following is the 32K SRAM memory map of a GP device, which is used only during the booting process.

The beagleboard-xm is an OMAP3-class device, so it uses 64K of SRAM in the range 0x40200000-0x4020FFFF.

2, Boot from MMC/SD card

In general, the beagleboard-xm uses memory booting from an MMC/SD card, because this board does not have a NAND device. There are some limitations, as follows:

  • Supports MMC/SD cards compliant with the Multimedia Card System Specification v4.2 from the MMCA Technical Committee and the SD I/O Card Specification v2.0 from the SD Association. Includes high-capacity (size >2GB) cards: HC-SD and HC MMC
  • 3-V power supply, 3-V I/O and 1.8-V I/O voltages on port 1
  • Supports eMMC/eSD (1.8-V I/O voltage and 3.0-V Core voltage) on port 2. The external transceiver mode on port 2 is not supported.
  • Initial 1-bit MMC mode, 4-bit SD mode
  • Clock frequency:
    • Identification mode: 400 kHz
    • Data transfer mode: 20 MHz
  • Only one card connected to the bus 
  • Raw mode, image data read directly from card sectors 
  • FAT12/16/32 support, with or without a master boot record (MBR). 
  • For a FAT (12/16/32)-formatted memory card, the booting file must not exceed 128 KB.
  • For a raw-mode memory card, the booting image must not exceed 128 KB.  

The image used by the booting procedure is taken from a booting file named MLO. This file must be in the root directory on an active primary partition of type FAT12/16 or FAT32.

An MMC/SD card can be configured as floppy-like or hard-drive-like:

  • When acting like a floppy, the content of the card is a single FAT12/16/32 file system without an MBR holding a partition table.
  • When acting like a hard drive, an MBR is present in the first sector of the card. This MBR holds a table of partitions, one of which must be FAT12/16/32, primary, and active.

3, MLO image format

For a GP device, the image is simple and must contain a small header having the size of the software to load and the destination address of where to store it when a booting device is other than XIP. The XIP device image is even simpler and starts with executable code.

4, x-load

4.1 Why use x-load

As mentioned above, the SRAM on the beagleboard-xm is tiny (64K), while the u-boot image is almost 196K, so the beagleboard-xm cannot use u-boot as its MLO. Instead x-load is used; it can be thought of as a u-boot loader, and its size is around 24K.

4.2 How to build x-load

  • Get mainline x-load source code from

git clone git://

make CROSS_COMPILE=arm-none-linux-gnueabi- omap3530beagle_config

make CROSS_COMPILE=arm-none-linux-gnueabi-
Although the beagleboard-xm uses a DM3735 processor, the config file in the x-load mainline tree has been updated for it, so it is OK to reuse the omap3530beagle_config target.

  • Generate MLO file

After building, x-load.bin is generated as a raw executable binary. As described above for the non-XIP image format, the size and load address must be prepended as the image's first 8 bytes; the signGP tool does this. The source code of signGP is
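A sketch of what signGP does to produce MLO, assuming the GP header is the two fields described in section 3 (size, then load address), each stored as a little-endian 32-bit word, followed by the raw x-load.bin. make_mlo() and put_le32() are hypothetical names; for x-load the load address would be TEXT_BASE, 0x40200800. The sketch does not enforce the 128 KB limit listed in section 2.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value little-endian, independent of host byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Build an MLO image in out: [size][load address][x-load.bin bytes].
 * Returns the MLO length, or 0 if out is too small. */
size_t make_mlo(uint8_t *out, size_t out_cap,
                const uint8_t *bin, uint32_t bin_len, uint32_t load_addr)
{
    if (out_cap < 8u || out_cap - 8u < bin_len)
        return 0;
    put_le32(out, bin_len);          /* size of the code to load */
    put_le32(out + 4, load_addr);    /* where the ROM code copies it */
    memcpy(out + 8, bin, bin_len);   /* the raw x-load.bin itself */
    return 8u + (size_t)bin_len;
}
```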

4.3 x-load’s research

4.3.1 memory map

On the beagleboard-xm, the ROM code loads the x-load binary into the 64K SRAM range (0x40200000 ~ 0x4020FFFF). The range is allocated as follows:

Runtime stack: 0x40200000 ~ 0x402007FF

MLO                : 0x40200800 ~ 0x4020FFFF

Please refer to board\omap3530beagle\ for the TEXT_BASE setting:

   # For XIP in 64K of SRAM or debug (GP device has it all available)
   # SRAM 40200000-4020FFFF base
   # initial stack at 0x4020fffc used in s_init (below xloader).
   # The run time stack is (above xloader, 2k below)
   # If any globals exist there needs to be room for them also
   TEXT_BASE = 0x40200800

Please refer to cpu\omap3\start.S for the stack pointer setting:

   /* Set up the stack                            */
   stack_setup:
       ldr    r0, _TEXT_BASE        /* upper 128 KiB: relocated uboot   */
       sub    sp, r0, #128        /* leave 32 words for abort-stack   */
       and    sp, sp, #~7        /* 8 byte aligned for (ldr/str)d    */
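The stack_setup arithmetic, worked through in C: TEXT_BASE minus 128 bytes is 0x40200780, which is already 8-byte aligned, leaving 0x780 bytes of SRAM below it for the runtime stack. The function name is hypothetical; the constants come from the two snippets above.

```c
#define TEXT_BASE 0x40200800UL   /* from the board config quoted above */

/* Runtime stack top as computed by stack_setup in cpu/omap3/start.S. */
unsigned long xload_runtime_sp(unsigned long text_base)
{
    unsigned long sp = text_base - 128UL;  /* sub sp, r0, #128 */
    return sp & ~7UL;                      /* and sp, sp, #~7  */
}
```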

Because x-load is non-XIP code, TEXT_BASE is passed to the compiler:

arm-none-linux-gnueabi-gcc -Wa,-gstabs -D__ASSEMBLY__ -g  -Os   -fno-strict-aliasing  -fno-common -ffixed-r8  -D__KERNEL__ -DTEXT_BASE=0x40200800 -I/home/ken/bb/x-load/mainline/include -fno-builtin -ffreestanding -nostdinc -isystem /usr/lib/gcc/i486-linux-gnu/4.4.3/include -pipe  -DCONFIG_ARM -D__ARM__ -march=armv7-a  -c -o cpu/omap3/start.o /home/ken/bb/x-load/mainline/cpu/omap3/start.S


4.3.2 startup process

  1. The boot is started from cpu\omap3\start.S and the first instruction is reset vector.
  2. Set the CPU to Supervisor (SVC) 32-bit mode.
  3. Copy vectors to indirect address 0x4020F800 (SRAM_OFFSET0 + SRAM_OFFSET1 + SRAM_OFFSET2).
  4. Relocate the clock code into SRAM, where it is safer to execute.
  5. Initialize CPU
    1. Invalidate instruction, L2 cache, and invalidate TLBs, disable MMU
    2. Initialize SRAM stack at 0x4020FFFC, so can use C code now.
    3. In C code, s_init does some early initialization, such as the watchdog and SDRAM configuration.
  6. Relocate code section
  7. Set runtime stack
  8. Clear the bss section (uninitialized variables).
  9. Jump to C code start_armboot().
    1. Initialize the serial device
    2. print version information like
      Texas Instruments X-Loader 1.4.4ss
    3. Initialize I2C which base address is 0x48070000 in L4 core.
    4. Read GPIO173, 172 and 171 to determine the beagleboard version, then print it; for the beagleboard-xm the values should be 0, 0, 0.
    5. Initialize the MMC card and load u-boot.bin from it into PoP SDRAM at 0x80008000.
      1. If no MMC card is found, try to boot from OneNAND or NAND, but the beagleboard-xm has neither device.
      2. try to boot from serial ……
    6. Jump to 0x80008000, over for x-load.
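The revision check in step 9.4 can be sketched as packing the three GPIO levels into one number. The bit order (GPIO171 as bit 0, GPIO173 as bit 2) and every revision name other than xM are assumptions based on u-boot's beagle board code, not on the text above; both function names are hypothetical.

```c
/* Pack the three revision GPIO levels into one value (assumed order:
 * GPIO171 = bit 0, GPIO172 = bit 1, GPIO173 = bit 2). */
int beagle_revision(int gpio173, int gpio172, int gpio171)
{
    return ((gpio173 & 1) << 2) | ((gpio172 & 1) << 1) | (gpio171 & 1);
}

/* Map the packed value to a board name. Only 0 (xM) comes from the
 * text above; the other names are assumptions. */
const char *beagle_revision_name(int rev)
{
    switch (rev) {
    case 7:  return "Ax/Bx";
    case 6:  return "C1/C2/C3";
    case 5:  return "C4";
    case 0:  return "xM";
    default: return "unknown";
    }
}
```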

4.3.3 x-load vs EFI’s SEC phase

So we can see that x-load is very much like the SEC phase in the UEFI specification. In Intel's Tiano implementation, the SEC phase mainly does the following:

  • Enter protected mode.
  • Prepare an early C stack in CAR (Cache-As-RAM), which uses the processor's cache as a temporary stack/heap, just like OMAP's internal SRAM during the boot phase. Unlike SRAM, CAR is torn down once permanent memory is available.
  • Initialize the CPU, for example the MTRRs covering the flash range.
  • Initialize the early ACPI timer for performance collection.
  • Find PeiCore in flash and shadow it into CAR for the PEI phase.

5, Reference
1) DM37x Multimedia Device Silicon Revision 1.x


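Notice that the iPhone has crossed instruction sets (v6 and v7): earlier models used the ARM11 architecture (v6 instruction set), while the 3GS uses a Cortex A8 (v7 instruction set), so I personally think going from the 3G to the 3GS was quite an expensive step for development.
Whenever Nokia moves to a new instruction set, it usually releases a new FP (Feature Pack) as a transition (for example S60 3rd FP2), so the change is more visible. Worst of all, Nokia's compatibility is quite poor (perhaps deliberately): S60v2 and S60v3 software are completely incompatible. v2 supports CPUs with the v5te and v4t instruction sets, while v3 supports v5te and v6 CPUs. Regrettably, S60 does not even support the v7 instruction set yet, which is why Nokia specially built a Linux device to use the Cortex A8.
Google currently seems to prefer supporting v6 CPUs in exchange for a more unified platform (the Nexus One's ARM11 CPU).
Tip: RTOS stands for real-time OS. It usually does not appear on phone platforms, only on devices with extremely high reliability requirements (medical devices, life-support equipment).
      The platform OS shown in red is our phone system, or an embedded system such as a set-top box.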




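Translated from Programming the Microsoft Windows Driver Model by Walter Oney, 2nd ed., Chapter 12, Section 2, "Working with the Bus Driver", with some of my own understanding added. I hope it helps you write USB device drivers; corrections are welcome.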

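Unlike other device drivers, a USB device driver does not talk to the underlying hardware directly. Instead, it builds a data structure called a USB request block (URB) and sends it to its parent driver, which operates the hardware according to the information in the URB; the parent driver here is usually the USB bus driver. A URB can be sent with an IRP whose major function code is IRP_MJ_INTERNAL_DEVICE_CONTROL, or by directly calling the interface functions the parent driver provides.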

1. Initializing the request
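USB_DEVICE_DESCRIPTOR dd;   // receives the device descriptor
URB urb;
UsbBuildGetDescriptorRequest(&urb, sizeof(struct _URB_CONTROL_DESCRIPTOR_REQUEST),
    USB_DEVICE_DESCRIPTOR_TYPE, 0, 0, &dd, NULL, sizeof(dd), NULL);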


A URB is a union of request-specific structures:

typedef struct _URB {
    union {
        struct _URB_HEADER                            UrbHeader;
        struct _URB_SELECT_INTERFACE                  UrbSelectInterface;
        struct _URB_SELECT_CONFIGURATION              UrbSelectConfiguration;
        struct _URB_PIPE_REQUEST                      UrbPipeRequest;
        struct _URB_FRAME_LENGTH_CONTROL              UrbFrameLengthControl;
        struct _URB_GET_FRAME_LENGTH                  UrbGetFrameLength;
        struct _URB_SET_FRAME_LENGTH                  UrbSetFrameLength;
        struct _URB_GET_CURRENT_FRAME_NUMBER          UrbGetCurrentFrameNumber;
        struct _URB_CONTROL_TRANSFER                  UrbControlTransfer;
        struct _URB_BULK_OR_INTERRUPT_TRANSFER        UrbBulkOrInterruptTransfer;
        struct _URB_ISOCH_TRANSFER                    UrbIsochronousTransfer;

        // for standard control transfers on the default pipe
        struct _URB_CONTROL_DESCRIPTOR_REQUEST        UrbControlDescriptorRequest;
        struct _URB_CONTROL_GET_STATUS_REQUEST        UrbControlGetStatusRequest;
        struct _URB_CONTROL_FEATURE_REQUEST           UrbControlFeatureRequest;
        struct _URB_CONTROL_VENDOR_OR_CLASS_REQUEST   UrbControlVendorClassRequest;
        struct _URB_CONTROL_GET_INTERFACE_REQUEST     UrbControlGetInterfaceRequest;
        struct _URB_CONTROL_GET_CONFIGURATION_REQUEST UrbControlGetConfigurationRequest;
    };
} URB, *PURB;

Each _URB_XXX name here is itself a structure type predefined in USBDI.H. For example, _URB_CONTROL_DESCRIPTOR_REQUEST is defined in USBDI.H as follows:

struct _URB_CONTROL_DESCRIPTOR_REQUEST {
#ifdef OSR21_COMPAT
    struct _URB_HEADER;
#else
    struct _URB_HEADER Hdr;                 // function code indicates get or set.
#endif
    PVOID Reserved;
    ULONG Reserved0;
    ULONG TransferBufferLength;
    PVOID TransferBuffer;
    PMDL TransferBufferMDL;             // *optional*
    struct _URB *UrbLink;               // *optional* link to next urb request
                                        // if this is a chain of commands
    struct _URB_HCD_AREA hca;               // fields for HCD use
    USHORT Reserved1;
    UCHAR Index;
    UCHAR DescriptorType;
    USHORT LanguageId;
    USHORT Reserved2;
};



A get-descriptor URB is initialized with the UsbBuildGetDescriptorRequest macro, whose parameters are:

VOID UsbBuildGetDescriptorRequest(
    IN OUT PURB  Urb,                      // pointer to the URB to initialize
    IN USHORT  Length,                     // size of the URB: sizeof(struct _URB_CONTROL_DESCRIPTOR_REQUEST)
    IN UCHAR  DescriptorType,              // type of descriptor to retrieve
    IN UCHAR  Index,                       // device-defined index of the descriptor to retrieve
    IN USHORT  LanguageId,                 // language ID when DescriptorType is USB_STRING_DESCRIPTOR_TYPE; must be zero otherwise
    IN PVOID  TransferBuffer  OPTIONAL,    // resident buffer that receives the descriptor, or NULL if an MDL is supplied
    IN PMDL  TransferBufferMDL  OPTIONAL,  // MDL describing a resident buffer for the descriptor, or NULL if TransferBuffer is supplied
    IN ULONG  TransferBufferLength,        // length of the buffer in TransferBuffer or described by TransferBufferMDL
    IN PURB  Link  OPTIONAL                // reserved; must be NULL
    );

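2. Sending the URB

The SendAwaitUrb helper, used in the following sections, submits a URB to the bus driver and waits for it to complete: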
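NTSTATUS SendAwaitUrb(PDEVICE_OBJECT fdo, PURB urb)
{
    PDEVICE_EXTENSION pdx = (PDEVICE_EXTENSION) fdo->DeviceExtension;
    KEVENT event;                 // signaled when the IRP completes, making the call synchronous
    IO_STATUS_BLOCK iostatus;     // receives the IRP's final status

    KeInitializeEvent(&event, NotificationEvent, FALSE);

    // Build an IRP_MJ_INTERNAL_DEVICE_CONTROL request that carries the URB
    PIRP Irp = IoBuildDeviceIoControlRequest(IOCTL_INTERNAL_USB_SUBMIT_URB,
        pdx->LowerDeviceObject, NULL, 0, NULL, 0, TRUE, &event, &iostatus);
    if (Irp == NULL)
        return STATUS_INSUFFICIENT_RESOURCES;

    // The URB pointer travels in Argument1 of the next (lower) stack location
    PIO_STACK_LOCATION stack = IoGetNextIrpStackLocation(Irp);
    stack->Parameters.Others.Argument1 = (PVOID) urb;

    NTSTATUS status = IoCallDriver(pdx->LowerDeviceObject, Irp);
    if (status == STATUS_PENDING)
    {
        // Wait for completion; the final status is then in iostatus
        KeWaitForSingleObject(&event, Executive, KernelMode, FALSE, NULL);
        status = iostatus.Status;
    }
    return status;
}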


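3. URB return status

SendAwaitUrb returns the NTSTATUS of the IRP that carried the URB. The USB-specific result of the request itself is stored in the URB header and is read with the URB_STATUS macro:

NTSTATUS status = SendAwaitUrb(fdo, &urb);
USBD_STATUS ustatus = URB_STATUS(&urb);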
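As I understand the convention in usb.h, the top two bits of a USBD_STATUS encode its state (00 success, 01 pending, 1x error, with 11 meaning the endpoint is halted), which is what the USBD_SUCCESS, USBD_PENDING, USBD_ERROR, and USBD_HALTED macros test. The portable sketch below reimplements that check under this assumption; UrbState and classify_usbd_status are hypothetical names, and real driver code should use the DDK macros directly.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: bits 31..30 of a USBD_STATUS give its state, as in usb.h. */
typedef enum { URB_OK, URB_PENDING, URB_ERROR, URB_HALTED } UrbState;

static UrbState classify_usbd_status(uint32_t status)
{
    switch (status >> 30) {
    case 0:  return URB_OK;       /* like USBD_SUCCESS */
    case 1:  return URB_PENDING;  /* like USBD_PENDING */
    case 2:  return URB_ERROR;    /* negative (USBD_ERROR) but endpoint not halted */
    default: return URB_HALTED;   /* like USBD_HALTED: error and endpoint halted */
    }
}
```

A driver that sees a halted state typically has to reset the pipe before retrying, which is why the two error cases are kept apart.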


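4. Configuration

     The bus driver automatically detects a newly attached USB device and reads its device descriptor to determine what kind of device it is. The vendor and product identifier fields of the device descriptor, together with some other descriptors, determine which driver must be loaded.

    The PnP Manager normally calls the driver's AddDevice function, which creates a device object, attaches it to the device stack, and so on. The PnP Manager eventually sends the driver an IRP_MN_START_DEVICE Plug and Play request, which causes the driver to call a function named StartDevice. Its outline is as follows: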




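NTSTATUS StartDevice(PDEVICE_OBJECT fdo /* , other parameters omitted */)
{
    PDEVICE_EXTENSION pdx = (PDEVICE_EXTENSION) fdo->DeviceExtension;
    <configure device>
    return STATUS_SUCCESS;
}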




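    Suppose the USB device's vendor ID is 0x0547 and its product ID is 0x102A. When the device is plugged in, the PnP Manager looks for a registry entry for a device named USB\VID_0547&PID_102A. If there is no matching entry, the PnP Manager launches the Found New Hardware wizard, which asks for an INF file that describes the device; using that INF file, the wizard installs the appropriate driver and updates the registry. Once the PnP Manager has located the registry entry, it can load the driver dynamically.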
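The device name the PnP Manager searches for is built from the two IDs as four upper-case hexadecimal digits each. A small sketch of that formatting; make_usb_hardware_id is a hypothetical helper, not a Windows API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format "USB\VID_xxxx&PID_yyyy". Returns 0 on success, -1 if buf is too small. */
static int make_usb_hardware_id(uint16_t vid, uint16_t pid, char *buf, size_t len)
{
    assert(buf != NULL);
    int n = snprintf(buf, len, "USB\\VID_%04X&PID_%04X", (unsigned)vid, (unsigned)pid);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}
```

For the example above, make_usb_hardware_id(0x0547, 0x102A, ...) produces USB\VID_0547&PID_102A.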


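6. Reading the configuration descriptor

The configuration descriptor is read in two steps. The first request fetches only the fixed-size USB_CONFIGURATION_DESCRIPTOR to learn wTotalLength, the total size of the configuration including its interface and endpoint descriptors; the second request reads all of it into a buffer of that size.

ULONG iconfig = 0;                  // index of the configuration to read
URB urb;
USB_CONFIGURATION_DESCRIPTOR tcd;   // fixed-size header only

// Step 1: read just the configuration descriptor to learn wTotalLength
UsbBuildGetDescriptorRequest(&urb, sizeof(struct _URB_CONTROL_DESCRIPTOR_REQUEST),
    USB_CONFIGURATION_DESCRIPTOR_TYPE, iconfig, 0, &tcd, NULL, sizeof(tcd), NULL);
SendAwaitUrb(fdo, &urb);

ULONG size = tcd.wTotalLength;      // configuration + all interface/endpoint descriptors
PUSB_CONFIGURATION_DESCRIPTOR pcd = (PUSB_CONFIGURATION_DESCRIPTOR)
    ExAllocatePool(NonPagedPool, size);
if (pcd == NULL)
    return STATUS_INSUFFICIENT_RESOURCES;

// Step 2: read the complete configuration into the buffer
UsbBuildGetDescriptorRequest(&urb, sizeof(struct _URB_CONTROL_DESCRIPTOR_REQUEST),
    USB_CONFIGURATION_DESCRIPTOR_TYPE, iconfig, 0, pcd, NULL, size, NULL);
SendAwaitUrb(fdo, &urb);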
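The buffer returned by the second request is a packed sequence of descriptors, each starting with a bLength byte and a bDescriptorType byte (USB 2.0 specification, chapter 9). The sketch below walks such a buffer and counts interface and endpoint descriptors. It works on raw bytes rather than the DDK structures, and count_descriptors is a hypothetical name; in a real driver, helpers such as USBD_ParseConfigurationDescriptorEx do this parsing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_CONFIGURATION 0x02
#define DESC_INTERFACE     0x04
#define DESC_ENDPOINT      0x05

/* Count interface and endpoint descriptors in a full configuration.
 * Returns 0 on success, -1 if the buffer is malformed. */
static int count_descriptors(const uint8_t *buf, size_t size,
                             int *interfaces, int *endpoints)
{
    assert(buf != NULL && interfaces != NULL && endpoints != NULL);
    if (size < 9 || buf[1] != DESC_CONFIGURATION)
        return -1;
    /* wTotalLength is little-endian at offset 2 */
    size_t total = (size_t)buf[2] | ((size_t)buf[3] << 8);
    if (total > size)
        return -1;

    *interfaces = *endpoints = 0;
    for (size_t off = 0; off < total; ) {
        uint8_t len = buf[off];
        if (len < 2 || off + len > total)
            return -1;              /* a short length would loop forever or overrun */
        if (buf[off + 1] == DESC_INTERFACE)
            (*interfaces)++;
        else if (buf[off + 1] == DESC_ENDPOINT)
            (*endpoints)++;
        off += len;
    }
    return 0;
}
```

Validating bLength before advancing matters: a zero-length descriptor from a buggy device would otherwise hang the loop.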



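7. Selecting a configuration

The select-configuration URB is built with USBD_CreateConfigurationRequestEx:

PURB USBD_CreateConfigurationRequestEx( IN PUSB_CONFIGURATION_DESCRIPTOR ConfigurationDescriptor, IN PUSBD_INTERFACE_LIST_ENTRY InterfaceList );


ConfigurationDescriptor: pointer to a configuration descriptor that contains all the interface, endpoint, vendor-specific, and class-specific descriptors retrieved from the USB device.

InterfaceList: pointer to the first element of a variable-length array of USBD_INTERFACE_LIST_ENTRY structures; the array ends with an entry whose InterfaceDescriptor member is NULL.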
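A driver usually fills InterfaceList by picking interface descriptors out of the configuration descriptor it just read. The sketch below collects alternate setting 0 of every interface and terminates the list with a NULL entry. InterfaceListEntry is a hypothetical stand-in for USBD_INTERFACE_LIST_ENTRY (whose members are InterfaceDescriptor and Interface), and the code works on raw bytes so it compiles outside the DDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for USBD_INTERFACE_LIST_ENTRY. */
typedef struct {
    const uint8_t *InterfaceDescriptor;  /* NULL marks the end of the list */
    void          *Interface;            /* filled in by USBD in the real API */
} InterfaceListEntry;

/* Collect alternate setting 0 of every interface. list holds max entries,
 * including the terminator. Returns the number of interfaces, or -1. */
static int build_interface_list(const uint8_t *cd, size_t total,
                                InterfaceListEntry *list, size_t max)
{
    assert(cd != NULL && list != NULL);
    if (max == 0)
        return -1;
    size_t n = 0;
    for (size_t off = 0; off < total; ) {
        uint8_t len = cd[off];
        if (len < 2 || off + len > total)
            return -1;
        /* interface descriptor: type 0x04, bAlternateSetting at byte 3 */
        if (cd[off + 1] == 0x04 && len >= 9 && cd[off + 3] == 0) {
            if (n + 1 >= max)
                return -1;               /* keep room for the terminator */
            list[n].InterfaceDescriptor = cd + off;
            list[n].Interface = NULL;
            n++;
        }
        off += len;
    }
    list[n].InterfaceDescriptor = NULL;  /* terminator */
    list[n].Interface = NULL;
    return (int)n;
}
```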

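      I feel that a plain translation cannot fully explain WDM-based USB driver development, because it involves a great deal of basic WDM knowledge that is hard to grasp. Also, this text only outlines the rough steps of USB driver development and does not add up to a complete example, so I recommend studying it alongside the code of a specific driver. I am still rereading the other chapters of Programming the Microsoft Windows Driver Model, translating as I go and comparing with examples. It is a bit of a struggle, but sticking with it will pay off.

       The road ahead is long and winding; I will keep searching high and low. Onward...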

WinDBG (A Brief Analysis of How the Windows Kernel Debugger Works) [Repost]


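  When WinDBG is not attached, KiDebugRoutine points to KdpStub, whose handling is simple. It mainly deals with exceptions raised by int 0x2d, such as DbgPrint, DbgPrompt, and symbol loading and unloading (int 0x2d exceptions are covered in detail later), by adding 1 to Context.Eip so that execution skips the int 0x3 instruction that follows the int 0x2d.
The function that actually implements WinDBG's functionality is KdpTrap, which handles all STATUS_BREAKPOINT and STATUS_SINGLE_STEP (single-step) exceptions. STATUS_BREAKPOINT exceptions include int 0x3, DbgPrint, DbgPrompt, and symbol loading and unloading. DbgPrint is the simplest: KdpTrap just sends the debugger a packet containing the string. DbgPrompt must both print a string and read one back, so it first sends the packet containing the prompt, then loops waiting for a packet from the debugger containing the reply. Symbol loading and unloading go through KdpReportSymbolsStateChange, while int 0x3 breakpoints and int 0x1 single-step exceptions (the two exceptions a kernel debugger handles most) go through KdpReportExceptionStateChange. The two functions are very similar; both call KdpSendWaitContinue.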
  KdpSendWaitContinue sends the state-change packet and then loops, dispatching on the ApiNumber of each manipulate-state packet the debugger sends back. An excerpt of that switch (most case bodies omitted):
  switch (ManipulateState.ApiNumber) {
  case DbgKdReadVirtualMemoryApi:          /* ... */
  case DbgKdReadVirtualMemory64Api:        /* ... */
  case DbgKdWriteVirtualMemoryApi:         /* ... */
  case DbgKdWriteVirtualMemory64Api:       /* ... */
  case DbgKdReadPhysicalMemoryApi:         /* ... */
  case DbgKdWritePhysicalMemoryApi:        /* ... */
  case DbgKdGetContextApi:                 /* ... */
  case DbgKdSetContextApi:                 /* ... */
  case DbgKdWriteBreakPointApi:            /* ... */
  case DbgKdRestoreBreakPointApi:          /* ... */
  case DbgKdReadControlSpaceApi:           /* ... */
  case DbgKdWriteControlSpaceApi:          /* ... */
  case DbgKdReadIoSpaceApi:                /* ... */
  case DbgKdWriteIoSpaceApi:               /* ... */
  case DbgKdContinueApi:
      if (NT_SUCCESS(ManipulateState.u.Continue.ContinueStatus) != FALSE) {
          return ContinueSuccess;
      } else {
          return ContinueError;
      }
  case DbgKdContinueApi2:
      if (NT_SUCCESS(ManipulateState.u.Continue2.ContinueStatus) != FALSE) {
          return ContinueSuccess;
      } else {
          return ContinueError;
      }
  case DbgKdRebootApi:                     /* ... */
  case DbgKdReadMachineSpecificRegister:   /* ... */
  case DbgKdWriteMachineSpecificRegister:  /* ... */
  case DbgKdSetSpecialCallApi:             /* ... */
  case DbgKdClearSpecialCallsApi:          /* ... */
  case DbgKdSetInternalBreakPointApi:      /* ... */
  case DbgKdGetInternalBreakPointApi:      /* ... */
  case DbgKdGetVersionApi:                 /* ... */
  case DbgKdCauseBugCheckApi:              /* ... */
  case DbgKdPageInApi:                     /* ... */
  case DbgKdWriteBreakPointExApi:
      Status = KdpWriteBreakPointEx(&ManipulateState, /* ... */);
      if (Status) {
          ManipulateState.ApiNumber = DbgKdContinueApi;
          ManipulateState.u.Continue.ContinueStatus = Status;
          return ContinueError;
      }
      break;
  case DbgKdRestoreBreakPointExApi:        /* ... */
  case DbgKdSwitchProcessor:
      KdPortRestore();
      ContinueStatus = KeSwitchFrozenProcessor(ManipulateState.Processor);
      KdPortSave();
      return ContinueStatus;
  case DbgKdSearchMemoryApi:
      KdpSearchMemory(&ManipulateState, &MessageData, ContextRecord);
      break;
  /* ... */
  }
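  Every time the kernel debugger takes control of the system, it does so through KiDispatchException calling KiDebugRoutine (KdpTrap). But execution only reaches KiDispatchException when an exception occurs, and the kernel debugger talks to the target system only over a serial port, which raises interrupts, not exceptions. So how does the debugger make the system raise an exception? The answer lies in KeUpdateSystemTime. After each clock interrupt, HalpClockInterrupt does some low-level processing and then jumps to this function to update the system time (it jumps rather than calls, which is why HalpClockInterrupt's address never shows up in a stack backtrace after WinDBG breaks in); it is one of the most frequently executed functions in the system. KeUpdateSystemTime checks whether KdDebuggerEnabled is TRUE. If it is, it calls KdPollBreakIn to check whether a break-in packet has arrived from the kernel debugger, and if one has, it calls DbgBreakPointWithStatus, which executes an int 0x3 instruction. Once exception dispatch reaches KdpTrap, it sends packets to the kernel debugger as the situation requires and then loops indefinitely waiting for the debugger's reply. This explains why, after you break into the system from WinDBG, the stack backtrace shows KeUpdateSystemTime->RtlpBreakWithStatusInstruction and the system appears to be stopped on an int 0x3 instruction (the int 0x3 has in fact already executed; Eip has merely been decremented by 1): execution is actually inside KiDispatchException->KdpTrap, and control has been handed to the kernel debugger.
  Besides int 0x3, the system interacts with the debugger through DbgPrint, DbgPrompt, and symbol loading and unloading, all of which obtain the service by calling DebugService: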
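  The break-in path described above can be modeled in a few lines: on every clock tick, if a debugger is enabled, poll the serial receive buffer for a break-in request and, if one is there, take a breakpoint. This is a user-mode simulation, not kernel code. The 0x62 ('b') break-in byte is my recollection of the KD serial protocol and should be treated as an assumption, and Target, poll_break_in, and clock_tick are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BREAKIN_BYTE 0x62  /* assumption: 'b' from the host requests a break-in */

/* Hypothetical model of the target machine. */
typedef struct {
    bool           debugger_enabled;  /* plays the role of KdDebuggerEnabled */
    const uint8_t *rx;                /* bytes received from the host */
    size_t         rx_len;
    size_t         rx_pos;
    unsigned       breakpoints_hit;   /* counts simulated int 0x3 */
} Target;

/* Like KdPollBreakIn: drain pending bytes, report whether a break-in arrived. */
static bool poll_break_in(Target *t)
{
    bool found = false;
    while (t->rx_pos < t->rx_len)
        if (t->rx[t->rx_pos++] == BREAKIN_BYTE)
            found = true;
    return found;
}

/* Called once per simulated clock interrupt, like KeUpdateSystemTime. */
static void clock_tick(Target *t)
{
    assert(t != NULL);
    /* ... update system time ... */
    if (t->debugger_enabled && poll_break_in(t))
        t->breakpoints_hit++;   /* stands in for DbgBreakPointWithStatus */
}
```

  Because the check only runs on a clock tick, a break-in request can wait up to one timer period before it is noticed, which matches the latency described later.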
  NTSTATUS DebugService(
      ULONG   ServiceClass,
      PVOID   Arg1,
      PVOID   Arg2
      )
  {
      NTSTATUS    Status;
      __asm {
          mov     eax, ServiceClass
          mov     ecx, Arg1
          mov     edx, Arg2
          int     0x2d
          int     0x3          ; skipped: the handler advances Eip past it
          mov     Status, eax
      }
      return Status;
  }
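  ServiceClass can be BREAKPOINT_PRINT (0x1), BREAKPOINT_PROMPT (0x2), BREAKPOINT_LOAD_SYMBOLS (0x3), or BREAKPOINT_UNLOAD_SYMBOLS (0x4). Why is there an int 0x3 after the int 0x2d? Microsoft says it is there to share code with int 0x3 (I never figured out what that means -_-), since the int 0x2d trap handler does some processing and then jumps into the int 0x3 trap handler to continue. In practice, however, nothing processes this int 0x3 instruction; Eip is simply incremented by 1 to skip it, so this int 0x3 could be replaced by any single byte.
The exception records (EXCEPTION_RECORD) generated by int 0x2d and int 0x3 both have ExceptionRecord.ExceptionCode equal to STATUS_BREAKPOINT (0x80000003). The difference is that for an int 0x2d exception, ExceptionRecord.NumberParameters > 0 and ExceptionRecord.ExceptionInformation holds the corresponding ServiceClass, such as BREAKPOINT_PRINT. In fact, once the kernel debugger is attached, DbgPrint and similar functions no longer send their strings to the debugger through the int 0x2d trap service; they send packets directly. In Microsoft's words, this is safer because KdEnterDebugger and KdExitDebugger do not need to be called.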
  typedef struct _KD_PACKET {
      ULONG PacketLeader;
      USHORT PacketType;
      USHORT ByteCount;
      ULONG PacketId;
      ULONG Checksum;
  } KD_PACKET, *PKD_PACKET;
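  I once asked jiurl why WinDBG's single-stepping is so slow (compared with SoftICE), and he said he didn't find it slow?! (I was floored...) Now it is clear why WinDBG is slow both at single-stepping and at breaking into a running system. Single-stepping is slow because every step, on top of the necessary processing, means sending and receiving packets over the serial line. Breaking in is slow because the target only accepts WinDBG's break-in packet after a clock interrupt occurs and execution reaches KeUpdateSystemTime. Next, consider why you cannot set a breakpoint inside KiDispatchException, yet you can single-step through it. If a breakpoint is set somewhere in KiDispatchException, hitting it raises an exception, which brings execution back into KiDispatchException and to the same int 0x3 again, and so on forever; the code overwritten by the int 0x3 can never be restored. An int 0x1, however, is triggered by the TF bit in the EFLAGS register, which is cleared automatically each time, so the system can keep running without an infinite loop. Knowing these internals, we could call the KdXXX functions to build a kernel debugger like WinDBG, or even replace KiDebugRoutine (KdpTrap) with our own function to build a more powerful debugger. Heh.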
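  To make the per-step cost concrete, here is a sketch of framing a packet with the KD_PACKET header fields shown above and checking it on the receiving side. Assumptions, from my recollection of the KD serial protocol: data packets start with the leader 0x30303030 ("0000"), and Checksum is the plain 32-bit sum of the payload bytes. KdPacket, kd_checksum, kd_make_header, and kd_validate are hypothetical names, and the real structure uses ULONG/USHORT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PACKET_LEADER 0x30303030u  /* assumption: "0000" marks a data packet */

/* Hypothetical portable mirror of KD_PACKET. */
typedef struct {
    uint32_t PacketLeader;
    uint16_t PacketType;
    uint16_t ByteCount;
    uint32_t PacketId;
    uint32_t Checksum;
} KdPacket;

/* Assumed checksum: 32-bit sum of the payload bytes. */
static uint32_t kd_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return sum;
}

static KdPacket kd_make_header(uint16_t type, uint32_t id,
                               const uint8_t *data, uint16_t len)
{
    KdPacket p = { PACKET_LEADER, type, len, id, kd_checksum(data, len) };
    return p;
}

/* Receiver side: accept only if the leader and checksum both match. */
static int kd_validate(const KdPacket *p, const uint8_t *data)
{
    assert(p != NULL && data != NULL);
    return p->PacketLeader == PACKET_LEADER &&
           p->Checksum == kd_checksum(data, p->ByteCount);
}
```

  Each single step means at least one such packet in each direction over the serial line, plus acknowledgements, which is where the time goes.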
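  SoftICE takes a different approach: it replaces the handlers for the following interrupt vectors:
  0x1:    single-step trap handler
  0x2:    NMI (non-maskable interrupt)
  0x3:    debug (breakpoint) trap handler
  0x6:    invalid-opcode trap handler
  0xb:    segment-not-present trap handler
  0xc:    stack-fault trap handler
  0xd:    general-protection-fault trap handler
  0xe:    page-fault trap handler
  0x2d:    debug-service trap handler
  0x2e:    system-service trap handler
  0x31:    8042 keyboard controller interrupt handler
  0x33:    serial port 2 (COM2) interrupt handler
  0x34:    serial port 1 (COM1) interrupt handler
  0x37:    parallel port interrupt handler
  0x3c:    PS/2 mouse interrupt handler
  0x41:    unused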
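  The key replacements are the 0x3 debug trap handler and the 0x31 i8042 keyboard interrupt handler (the keyboard is controlled by the i8042 chip); these are the two places from which SoftICE gains control of the system.
Handling int 0x3 is much the same: if the console is not active, SoftICE first activates it and masks every interrupt except the keyboard, the mouse, and the 8259A-2 interrupt controller, then enters the same loop code.
  For comparison, let's look at how SoftICE handles int 0x3 and single-stepping. When an int 0x3 executes, SoftICE activates the console and masks interrupts, disassembles the instructions around the int 0x3, writes them into the memory-mapped video address space along with the latest register values, and finally loops in the background waiting for keyboard commands. When the command is F10, it sets the TF bit in EFLAGS, clears the interrupt mask register in the 8259A interrupt controller to enable all interrupts, and clears the console. It then returns from the loop code to its replacement keyboard (or int 0x3) interrupt handler, which in turn returns to the original keyboard (or int 0x3) interrupt handler, and from there iret resumes the interrupted code. After one instruction executes, a single-step exception occurs and execution re-enters the background loop.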