beagleboard-xm research(2)–u-boot

1, Build

Download u-boot 1.3.3 for the beagleboard from http://code.google.com/p/beagleboard/wiki/BeagleSourceCode.

If you use the latest ARM GCC from CodeSourcery, you may get the following error:

arm-none-linux-gnueabi-gcc -g  -Os   -fno-strict-aliasing  -fno-common -ffixed-r8 -msoft-float  -D__KERNEL__ -DTEXT_BASE=0x80e80000 -I/home/ken/bb/u-boot/u-boot-beagle/include -fno-builtin -ffreestanding -nostdinc -isystem /opt/sourcery_g++/bin/../lib/gcc/arm-none-linux-gnueabi/4.3.3/include -pipe  -DCONFIG_ARM -D__ARM__ -march=armv7a  -Wall -Wstrict-prototypes -c -o hello_world.o hello_world.c
hello_world.c:1: error: bad value (armv7a) for -march= switch

This is caused by a change in recent GCC versions: the ARMv7-A architecture must be selected with -march=armv7-a, not -march=armv7a.

To fix it, change the following line in u-boot\cpu\omap3\config.mk:

PLATFORM_CPPFLAGS += -march=armv7a

To:

PLATFORM_CPPFLAGS += -march=armv7-a

Although the u-boot.bin image builds successfully, beagleboard-xm fails to boot it:

a) the serial baud rate changes from 115200 to 57600;

b) the system hangs after it finds no NAND memory.

However, a u-boot image built from the git mainline (git://git.denx.de/u-boot.git) with the omap3 patch works correctly; see http://www.elinux.org/BeagleBoard

    git clone git://git.denx.de/u-boot.git u-boot-main
    cd u-boot-main
    git checkout --track -b omap3 origin/master

Build

    make CROSS_COMPILE=arm-none-linux-gnueabi- mrproper
    make CROSS_COMPILE=arm-none-linux-gnueabi- omap3_beagle_config
    make CROSS_COMPILE=arm-none-linux-gnueabi-

As discussed in the previous post, x-load loads u-boot.bin near the start of SDRAM, at address 0x80008000. Accordingly, uboot\board\ti\beagle\config.mk contains:

    #
    # Physical Address:
    # 8000'0000 (bank0)
    # A000'0000 (bank1)
    # Linux-Kernel is expected to be at 8000'8000, entry 8000'8000
    # (mem base + reserved)

    # For use with external or internal boots.
    CONFIG_SYS_TEXT_BASE = 0x80008000

CONFIG_SYS_TEXT_BASE is passed to the compiler as a macro definition:

arm-none-linux-gnueabi-gcc   -D__ASSEMBLY__ -g  -Os   -fno-common -ffixed-r8 -msoft-float   -D__KERNEL__ -DCONFIG_SYS_TEXT_BASE=0x80008000 -I/home/ken/bb/u-boot/u-boot-mailine/include -fno-builtin -ffreestanding -nostdinc -isystem /opt/sourcery_g++/bin/../lib/gcc/arm-none-linux-gnueabi/4.3.3/include -pipe  -DCONFIG_ARM -D__ARM__ -marm  -mabi=aapcs-linux -mno-thumb-interwork -march=armv5   -o start.o start.S -c

(BTW: some interesting compiler options are used here: -ffreestanding, -isystem, -mabi=aapcs-linux.)

It is worth mentioning that u-boot keeps some information in a global_data structure at the top of the stack. The structure is defined in uboot\arch\arm\include\asm\global_data.h:

    typedef struct global_data {
        bd_t            *bd;
        unsigned long   flags;
        unsigned long   baudrate;
        unsigned long   have_console;   /* serial_init() was called */
        unsigned long   env_addr;       /* Address  of Environment struct */
        unsigned long   env_valid;      /* Checksum of Environment valid? */
        unsigned long   fb_base;        /* base address of frame buffer */
    #ifdef CONFIG_VFD
        unsigned char   vfd_type;       /* display type */
    #endif
    #ifdef CONFIG_FSL_ESDHC
        unsigned long   sdhc_clk;
    #endif
    #ifdef CONFIG_AT91FAMILY
        /* "static data" needed by at91's clock.c */
        unsigned long   cpu_clk_rate_hz;
        unsigned long   main_clk_rate_hz;
        unsigned long   mck_rate_hz;
        unsigned long   plla_rate_hz;
        unsigned long   pllb_rate_hz;
        unsigned long   at91_pllb_usb_init;
    #endif
    #ifdef CONFIG_ARM
        /* "static data" needed by most of timer.c on ARM platforms */
        unsigned long   timer_rate_hz;
        unsigned long   tbl;
        unsigned long   tbu;
        unsigned long long  timer_reset_value;
        unsigned long   lastinc;
    #endif
        unsigned long   relocaddr;      /* Start address of U-Boot in RAM */
        phys_size_t     ram_size;       /* RAM size */
        unsigned long   mon_len;        /* monitor len */
        unsigned long   irq_sp;         /* irq stack pointer */
        unsigned long   start_addr_sp;  /* start_addr_stackpointer */
        unsigned long   reloc_off;
    #if !(defined(CONFIG_SYS_NO_ICACHE) && defined(CONFIG_SYS_NO_DCACHE))
        unsigned long   tlb_addr;
    #endif
        void            **jt;           /* jump table */
        char            env_buf[32];    /* buffer for getenv() before reloc. */
    } gd_t;

The structure size may differ depending on the configuration macros, so at the beginning of the build a script calculates the current size of the global data:

    arm-none-linux-gnueabi-gcc -DDO_DEPS_ONLY \
            -g  -Os   -fno-common -ffixed-r8 -msoft-float   -D__KERNEL__ -DCONFIG_SYS_TEXT_BASE=0x80008000 -I/home/ken/bb/u-boot/u-boot-mailine/include -fno-builtin -ffreestanding -nostdinc -isystem /opt/sourcery_g++/bin/../lib/gcc/arm-none-linux-gnueabi/4.3.3/include -pipe  -DCONFIG_ARM -D__ARM__ -marm  -mabi=aapcs-linux -mno-thumb-interwork -march=armv5 -Wall -Wstrict-prototypes -fno-stack-protector   \
            -o lib/asm-offsets.s lib/asm-offsets.c -c -S
    Generating include/generated/generic-asm-offsets.h
    tools/scripts/make-asm-offsets lib/asm-offsets.s include/generated/generic-asm-offsets.h
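The value the script produces is just sizeof(struct global_data) rounded up to a multiple of 16, as the comment on GENERATED_GBL_DATA_SIZE in generic-asm-offsets.h shows. The host-side C sketch below illustrates only that rounding. demo_global_data is a hypothetical, trimmed stand-in (the real gd_t layout depends on the config macros), and the step where make-asm-offsets turns lib/asm-offsets.s into a header is not reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Round up to the next multiple of 16, as in the comment on
 * GENERATED_GBL_DATA_SIZE: (sizeof(struct global_data) + 15) & ~15 */
#define ROUND_UP_16(x) (((size_t)(x) + 15) & ~(size_t)15)

/* Hypothetical, heavily trimmed stand-in for gd_t; not the real layout. */
struct demo_global_data {
	void *bd;
	unsigned long flags;
	unsigned long baudrate;
	unsigned long have_console;
	char env_buf[32];
};

/* Check against this article's numbers: a 120-byte global data block
 * rounds up to the 128 bytes reported in generic-asm-offsets.h. */
static_assert(ROUND_UP_16(120) == 128, "120 rounds up to 128");

size_t round_up_16(size_t n)
{
	return ROUND_UP_16(n);
}

size_t demo_gbl_data_size(void)
{
	return ROUND_UP_16(sizeof(struct demo_global_data));
}
```

Rounding to 16 bytes keeps the stack pointer placed just below the global data suitably aligned, whatever the exact struct size turns out to be.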

EFI has a similar design: PeiCore keeps its private data at the top of the stack as global data.

2, Memory Map

0x9fff0000  ~  TLB table

0x9ff7f000 ~ 0x9fff0000 : Reserved for U-boot (449K)

0x9ff1f000 ~ 0x9ff7f000: for malloc(384k)

0x9ff1efe0 ~ 0x9ff1f000: board info (32 bytes)

0x9ff1ef68 ~ 0x9ff1efe0: global data (120 bytes)

0x9ff1ef68: new stack pointer

0x80008000                              reset vector

0x8007020 ~0x80008028      interrupt vectors

0x80000100: Linux boot parameters

SDRAM #1

0x4020FF80 ~ 0x40210000  global_data

0x4020F800 ~0x4020FF80   stack
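The relocated part of the map is carved out top-down, each region placed directly below the previous one. A simplified C sketch, assuming the sizes read off the map above: the U-Boot region spans 0x9ff7f000 to 0x9fff0000 (0x71000 bytes), malloc takes 384K, board info 32 bytes and global data 120 bytes. plan_layout and struct reloc_layout are hypothetical names, and the alignment steps a real board_init_f() performs are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the sizes used in the example come from the map. */
struct reloc_layout {
	uint32_t uboot_start;  /* relocated U-Boot image */
	uint32_t malloc_start; /* malloc arena */
	uint32_t bd_start;     /* board info (bd_t) */
	uint32_t gd_start;     /* global data (gd_t); also the new stack top */
};

/* Carve regions downward from 'top' (the TLB table base).
 * Alignment done by real U-Boot is deliberately omitted. */
struct reloc_layout plan_layout(uint32_t top, uint32_t uboot_len,
				uint32_t malloc_len, uint32_t bd_len,
				uint32_t gd_len)
{
	struct reloc_layout l;

	l.uboot_start = top - uboot_len;
	l.malloc_start = l.uboot_start - malloc_len;
	l.bd_start = l.malloc_start - bd_len;
	l.gd_start = l.bd_start - gd_len;
	return l;
}
```

Calling plan_layout(0x9fff0000, 0x71000, 384 * 1024, 32, 120) reproduces the SDRAM addresses listed in the map, ending at 0x9ff1ef68.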

 

3, Workflow

  1. uboot\arch\arm\cpu\armv7\start.S
    • Like x-load, start.S provides the first-stage assembly loader for u-boot.
    • The first instruction is the reset vector, and the interrupt/exception handlers are placed right after it, as system.map shows:
80008000 T _start

80008020 t _undefined_instruction

80008024 t _software_interrupt

80008028 t _prefetch_abort

8000802c t _data_abort

80008030 t _not_used

80008034 t _irq

80008038 t _fiq

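    • Switch the CPU to SVC32 mode:
    mrs    r0, cpsr           @ read the current program status register
    bic    r0, r0, #0x1f      @ clear the mode bits M[4:0]
    orr    r0, r0, #0xd3      @ SVC mode (0x13), IRQ and FIQ masked (I/F bits)
    msr    cpsr,r0            @ write it back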

    • Copy the interrupt vectors to the ROM indirect address 0x4020F800.
    • Because beagleboard-xm has no NAND/OneNAND device, the DPLL initialization code must also be copied to that SRAM area, right after the interrupt vectors.
    • Initialize the CPU in assembly, as x-load does:
      • Set up key registers: MMU, caches.
      • Set up memory timing.
    • Set up the stack for C code at 0x4020FF80 (uboot\include\configs\omap3_beagle.h):
    #define CONFIG_SYS_INIT_RAM_ADDR    0x4020f800
    #define CONFIG_SYS_INIT_RAM_SIZE    0x800
    #define CONFIG_SYS_INIT_SP_ADDR     (CONFIG_SYS_INIT_RAM_ADDR + \
                                         CONFIG_SYS_INIT_RAM_SIZE - \
                                         GENERATED_GBL_DATA_SIZE)

 

As mentioned above, global_data is stored at the top of this area, just above the initial stack pointer. Its size is generated at build time (include\generated\generic-asm-offsets.h):

    #define GENERATED_GBL_DATA_SIZE (128) /* (sizeof(struct global_data) + 15) & ~15 */
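Plugging the numbers into these macros locates the initial stack: 0x4020f800 + 0x800 - 128 = 0x4020ff80, which leaves the top 128 bytes of the init RAM (0x4020ff80 to 0x40210000) for global data, matching the SRAM part of the memory map. The definitions are copied into a host-side C sketch so the arithmetic can be checked; init_sp and init_ram_end are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted above from omap3_beagle.h and generic-asm-offsets.h,
 * reproduced here so the sketch compiles on a host. */
#define CONFIG_SYS_INIT_RAM_ADDR 0x4020f800u
#define CONFIG_SYS_INIT_RAM_SIZE 0x800u
#define GENERATED_GBL_DATA_SIZE  128u

#define CONFIG_SYS_INIT_SP_ADDR (CONFIG_SYS_INIT_RAM_ADDR + \
				 CONFIG_SYS_INIT_RAM_SIZE - \
				 GENERATED_GBL_DATA_SIZE)

/* ARM EABI expects an 8-byte aligned stack pointer. */
static_assert((CONFIG_SYS_INIT_SP_ADDR & 7u) == 0, "SP is 8-byte aligned");

/* The stack grows down from here; global data sits above it. */
uint32_t init_sp(void)
{
	return CONFIG_SYS_INIT_SP_ADDR;
}

/* One past the last byte of the init RAM area. */
uint32_t init_ram_end(void)
{
	return CONFIG_SYS_INIT_RAM_ADDR + CONFIG_SYS_INIT_RAM_SIZE;
}
```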

    • Call the C function board_init_f (uboot\arch\arm\lib\board.c):
      • Set up the global data structure at 0x4020FF80.
      • Insert a compiler memory barrier (not a cache operation), much like MemoryFence() in edk2's MdePkg:
    __asm__ __volatile__("": : :"memory");

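Hardware initialization and I/O code often reads and writes the same MMIO address several times. Without a barrier, the compiler may optimize some of these accesses away or reorder them, which breaks the hardware sequence. The empty asm volatile statement with a "memory" clobber tells the compiler not to move memory accesses across it and not to keep memory values cached in registers across it.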
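A small C sketch of the two tools involved: the empty asm statement as a compiler barrier, and volatile pointers for register accesses, in the style of U-Boot's readl/writel (mmio_read32, mmio_write32 and pulse_reset are hypothetical names). A compiler barrier only constrains the compiler; ordering on the bus would need a hardware barrier instruction, which this sketch does not use.

```c
#include <assert.h>
#include <stdint.h>

/* Compiler barrier: emits no instruction, but the compiler must assume
 * memory changed, so it cannot move loads/stores across this point or
 * keep memory values cached in registers across it. It does not order
 * accesses on the bus (that needs e.g. DMB/DSB on ARMv7). */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* volatile forces every access to be emitted, in program order
 * relative to other volatile accesses. */
static inline void mmio_write32(volatile uint32_t *reg, uint32_t val)
{
	*reg = val;
}

static inline uint32_t mmio_read32(const volatile uint32_t *reg)
{
	return *reg;
}

/* Hypothetical sequence: both writes to the same register must reach
 * the hardware, even though the first looks "dead" to an optimizer. */
void pulse_reset(volatile uint32_t *reg)
{
	mmio_write32(reg, 1u); /* assert reset */
	barrier();
	mmio_write32(reg, 0u); /* deassert reset */
}
```

On a host the "register" can simply be an ordinary variable, which is how the sketch can be exercised without hardware.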

      • Call each function in the init_sequence array:
        • timer_init (arch/arm/cpu/armv7/omap-common/timer.c)
          • This uses GPTIMER2 (OMAP3 has 12 general-purpose timers), whose base address is 0x49032000.
        • Initialize the environment. Because there is no NAND, the environment is kept in RAM; see the flag in arch\arm\include\asm\global_data.h:
    #define    GD_FLG_RELOC        0x00001    /* Code was relocated to RAM        */

            By default, some configuration values, such as baudrate, come from the global variable default_environment in common\env_common.c.

        • Serial initialization

beagleboard-xm uses an NS16550-compatible UART (COM3) at 0x49020000; datasheet: http://www.national.com/ds/PC/PC16550D.pdf
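The UART speed comes from a divisor programmed into the 16550's divisor latch: divisor = clock / (16 * baud). A C sketch, assuming a 48 MHz UART clock, the value commonly used for OMAP3 boards in U-Boot (verify it for your board). It also shows how far apart the divisors for the two rates mentioned earlier, 115200 and 57600, are. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 48 MHz UART functional clock on OMAP3. */
#define UART_CLK_HZ 48000000u

/* 16x-oversampling divisor, rounded to the nearest integer. */
uint16_t ns16550_divisor(uint32_t clk_hz, uint32_t baud)
{
	return (uint16_t)((clk_hz + baud * 8u) / (baud * 16u));
}

/* Baud rate actually produced by a divisor (integer approximation). */
uint32_t ns16550_actual_baud(uint32_t clk_hz, uint16_t div)
{
	return clk_hz / (16u * div);
}

/* Split the divisor into the DLL (low) and DLM (high) bytes, which are
 * written while LCR bit 7 (DLAB) is set. */
void ns16550_split(uint16_t div, uint8_t *dll, uint8_t *dlm)
{
	*dll = (uint8_t)(div & 0xffu);
	*dlm = (uint8_t)(div >> 8);
}
```

With a 48 MHz clock, 115200 baud needs a divisor of 26 (about 115384 baud in practice, within the usual tolerance), while 57600 needs 52.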

        • Initialize the stage-1 console for printing.
        • Print CPU/board information.
        • Initialize the I2C devices.
        • Initialize SDRAM and calculate each bank's size.
      • Reserve memory for u-boot at the top of the first RAM bank, which starts at 0x80000000.
      • Relocate the code to its new location at 0x9ff7f000 and the stack to 0x9ff1ef60 (arch\arm\cpu\armv7\start.S, relocate_code()).
      • Jump to board_init_r() at the new location in RAM (this sequence is very much like PeiCore relocation in EFI).
      • In beagleboard's board_init_r() (board\ti\beagle\beagle.c):
        • Initialize GPMC.
        • Set the board (machine) ID for Linux to 1546.
        • Set the boot parameter address to 0x80000100.
        • Initialize the MMC driver.
        • Initialize stdio drivers such as serial and nulldev.
        • Initialize the jump table (?).
        • Evaluate the board version; on beagleboard-xm, set VAUX2 to 1.8 V for the EHCI PHY, and print the die ID at 0x4830A200.
        • Initialize the IRQ and FIQ stacks, 4K each.
        • Change CPSR to enable interrupts.
        • Enter main_loop() to read the boot/user script …
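The init_sequence step in board_init_f() is a NULL-terminated table of function pointers called in order. Below is a host-side C sketch of that pattern. The init_fnc_t typedef mirrors U-Boot's, while the demo_* functions and run_init_sequence are hypothetical stand-ins. Unlike this sketch, U-Boot calls hang() when a step fails instead of returning.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as U-Boot's init_fnc_t: returns 0 on success. */
typedef int (init_fnc_t)(void);

/* Hypothetical stand-ins for timer_init, env_init, serial_init, ... */
static int calls;
static int demo_timer_init(void)  { calls++; return 0; }
static int demo_env_init(void)    { calls++; return 0; }
static int demo_serial_init(void) { calls++; return 0; }

/* NULL-terminated table, walked in order. */
static init_fnc_t *demo_init_sequence[] = {
	demo_timer_init,
	demo_env_init,
	demo_serial_init,
	NULL,
};

/* Returns the index of the first failing step, or -1 if all succeed. */
int run_init_sequence(init_fnc_t **seq)
{
	for (int i = 0; seq[i] != NULL; i++) {
		if (seq[i]() != 0)
			return i;
	}
	return -1;
}

int demo_calls(void)
{
	return calls;
}
```

The table form lets each board add or drop init steps through #ifdefs in one array, without touching the loop.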

beagleboard-xm research(1) — Initialization & x-load

1, General boot process and device

The initialization process for the OMAP DM37x beagleboard:

  • Preinitialization
  • Power/clock/reset ramp sequence
  • Boot ROM
  • Boot Loader
  • OS/application

Six external pins (sys_boot[5:0]) select the interfaces or devices used for booting. The interfaces are GPMC, MMC1, MMC2, USB and UART.

The ROM code has two booting functions: peripheral booting and memory booting:

  • In peripheral booting, the ROM code polls a selected communication interface such as UART or USB, downloads the executable code over the interface, and executes it in internal SRAM. Software downloaded from an external host can be used to program flash memories connected to the device.
  • In memory booting, the ROM code finds bootstrap in permanent memories such as flash memory or memory cards and executes it. The process is normally performed after cold or warm device reset.

The overall boot sequence is as follows:

The following is the 32K SRAM memory map of a GP device, which is used only during the booting process.

beagleboard-xm is OMAP3-based, so it uses 64K of SRAM in the range 0x40200000-0x4020FFFF.

2, Boot from MMC/SD card

In general, beagleboard-xm uses memory booting from an MMC/SD card, because the board has no NAND device. MMC/SD booting has the following limitations:

  • Supports MMC/SD cards compliant with the Multimedia Card System Specification v4.2 from the MMCA Technical Committee and the SD I/O Card Specification v2.0 from the SD Association. Includes high-capacity (size >2GB) cards: HC-SD and HC MMC
  • 3-V power supply, 3-V I/O and 1.8-V I/O voltages on port 1
  • Supports eMMC/eSD (1.8-V I/O voltage and 3.0-V Core voltage) on port 2. The external transceiver mode on port 2 is not supported.
  • Initial 1-bit MMC mode, 4-bit SD mode
  • Clock frequency:
                –   Identification mode: 400 kHz
                –   Data transfer mode: 20 MHz
  • Only one card connected to the bus 
  • Raw mode, image data read directly from card sectors 
  • FAT12/16/32 support, with or without a master boot record (MBR). 
  • For a FAT (12/16/32)-formatted memory card, the booting file must not exceed 128 KB.
  • For a raw-mode memory card, the booting image must not exceed 128 KB.  

The image used by the booting procedure is taken from a booting file named MLO. This file must be in the root directory on an active primary partition of type FAT12/16 or FAT32.

An MMC/SD card can be configured as floppy-like or hard-drive-like:

  • When acting like a floppy, the content of the card is a single FAT12/16/32 file system without an MBR holding a partition table.
  • When acting like a hard drive, an MBR is present in the first sector of the card. This MBR holds a table of partitions, one of which must be FAT12/16/32, primary, and active.
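To make the hard-drive-like layout concrete, here is a C sketch that scans the four MBR partition entries for an active FAT12/16/32 primary partition. It assumes sector 0 is already known to be an MBR: a floppy-like card's FAT boot sector also ends with the 0x55AA signature, so telling the two apart needs extra checks not shown here. This is not the ROM code's algorithm; the function names are hypothetical and the type list covers only the common FAT partition IDs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MBR_PART_TABLE_OFF 446 /* four 16-byte entries start here */
#define MBR_SIG_OFF        510 /* bytes 0x55, 0xAA */

/* Common MBR partition type IDs for FAT12/16/32. */
static int is_fat_type(uint8_t t)
{
	switch (t) {
	case 0x01: /* FAT12 */
	case 0x04: /* FAT16, < 32 MB */
	case 0x06: /* FAT16 */
	case 0x0B: /* FAT32, CHS */
	case 0x0C: /* FAT32, LBA */
	case 0x0E: /* FAT16, LBA */
		return 1;
	default:
		return 0;
	}
}

/* Returns the index (0-3) of the first active FAT primary partition in
 * a 512-byte sector 0, or -1 if there is none. */
int find_active_fat_partition(const uint8_t sector0[512])
{
	if (sector0[MBR_SIG_OFF] != 0x55 || sector0[MBR_SIG_OFF + 1] != 0xAA)
		return -1;
	for (int i = 0; i < 4; i++) {
		const uint8_t *e = sector0 + MBR_PART_TABLE_OFF + 16 * i;

		/* byte 0: boot indicator (0x80 = active); byte 4: type */
		if (e[0] == 0x80 && is_fat_type(e[4]))
			return i;
	}
	return -1;
}
```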

3, MLO image format

For a GP device the image format is simple: when the booting device is not XIP, the image must start with a small header giving the size of the software to load and the destination address where it should be stored. An XIP device image is even simpler and starts directly with executable code.
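Assuming the header is two little-endian 32-bit words, size first and destination address second, here is a C sketch that parses it from an in-memory image and applies the 128 KB limit quoted in the MMC/SD list above. gp_header_parse and struct gp_header are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GP_HDR_LEN   8u
#define MLO_MAX_SIZE (128u * 1024u) /* limit from the MMC/SD list */

struct gp_header {
	uint32_t size; /* bytes of code following the header */
	uint32_t dest; /* load address in SRAM */
};

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 0 on success, -1 if the buffer is too short or the declared
 * size is zero, over the limit, or larger than the data present. */
int gp_header_parse(const uint8_t *img, size_t len, struct gp_header *h)
{
	if (len < GP_HDR_LEN)
		return -1;
	h->size = get_le32(img);
	h->dest = get_le32(img + 4);
	if (h->size == 0 || h->size > MLO_MAX_SIZE || h->size > len - GP_HDR_LEN)
		return -1;
	return 0;
}
```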



4, x-load

4.1 Why use x-load

As mentioned above, the SRAM on beagleboard-xm is tiny (64K), while the u-boot image is almost 196K, so beagleboard-xm cannot use u-boot as MLO. Instead, x-load is used; it can be thought of as a u-boot loader, and its size is around 24K.

4.2 How to build x-load

  • Get the mainline x-load source code and build it:

git clone git://gitorious.org/x-load-omap3/mainline.git

cd mainline

make CROSS_COMPILE=arm-none-linux-gnueabi- omap3530beagle_config

make CROSS_COMPILE=arm-none-linux-gnueabi-
       Although beagleboard-xm uses a DM3730 processor, the omap3530beagle configuration in the x-load mainline tree has been updated for it, so omap3530beagle_config can be reused.

  • Generate MLO file

After the build, x-load.bin is generated as a raw executable binary. As described above for the non-XIP image format, the image size and load address must be prepended as the image's first 8 bytes. The signGP tool does this; its source is at http://beagleboard.googlecode.com/files/signGP.c
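In memory, the signing step amounts to prepending the size/address header to the raw binary. Here is a C sketch of the idea. It assumes the 8-byte header (size, then load address, both little-endian) and uses TEXT_BASE 0x40200800 as the load address. It is not the signGP.c source and leaves out file I/O; gp_sign is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Load address: x-load's TEXT_BASE from config.mk. */
#define XLOAD_LOAD_ADDR 0x40200800u

static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

/* Prepend the 8-byte GP header (size, load address) to a raw binary.
 * Returns a malloc'd buffer of len + 8 bytes, or NULL on failure;
 * the caller frees it. */
uint8_t *gp_sign(const uint8_t *bin, uint32_t len, uint32_t load_addr)
{
	uint8_t *out = malloc((size_t)len + 8u);

	if (out == NULL)
		return NULL;
	put_le32(out, len);
	put_le32(out + 4, load_addr);
	memcpy(out + 8, bin, len);
	return out;
}
```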

4.3 x-load’s research

4.3.1 memory map

On beagleboard-xm, the ROM code loads the x-load binary into the 64K SRAM range (0x40200000 ~ 0x4020FFFF). The range is allocated as follows:

Runtime stack: 0x40200000 ~ 0x402007FF

MLO          : 0x40200800 ~ 0x4020FFFF

See board\omap3530beagle\config.mk for the TEXT_BASE setting:

    # For XIP in 64K of SRAM or debug (GP device has it all available)
    # SRAM 40200000-4020FFFF base
    # initial stack at 0x4020fffc used in s_init (below xloader).
    # The run time stack is (above xloader, 2k below)
    # If any globals exist there needs to be room for them also
    TEXT_BASE = 0x40200800

See cpu\omap3\start.S for the stack pointer setup:

    /* Set up the stack                            */
    stack_setup:
        ldr    r0, _TEXT_BASE        /* upper 128 KiB: relocated uboot   */
        sub    sp, r0, #128          /* leave 32 words for abort-stack   */
        and    sp, sp, #~7           /* 8 byte aligned for (ldr/str)d    */
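The two stack_setup instructions are plain address arithmetic. Here is a C sketch that mirrors them, with TEXT_BASE taken from config.mk; xload_runtime_sp is a hypothetical helper. With TEXT_BASE = 0x40200800, the runtime stack starts at 0x40200780 and grows down, below the x-load image.

```c
#include <assert.h>
#include <stdint.h>

#define TEXT_BASE 0x40200800u /* from board/omap3530beagle/config.mk */

/* Mirrors stack_setup in cpu/omap3/start.S:
 *   sub sp, r0, #128  -> leave 32 words (128 bytes) for the abort stack
 *   and sp, sp, #~7   -> 8-byte alignment for ldrd/strd */
uint32_t xload_runtime_sp(uint32_t text_base)
{
	uint32_t sp = text_base - 128u;

	return sp & ~7u;
}
```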

Because x-load is non-XIP code, TEXT_BASE is passed to the compiler:

arm-none-linux-gnueabi-gcc -Wa,-gstabs -D__ASSEMBLY__ -g  -Os   -fno-strict-aliasing  -fno-common -ffixed-r8  -D__KERNEL__ -DTEXT_BASE=0x40200800 -I/home/ken/bb/x-load/mainline/include -fno-builtin -ffreestanding -nostdinc -isystem /usr/lib/gcc/i486-linux-gnu/4.4.3/include -pipe  -DCONFIG_ARM -D__ARM__ -march=armv7-a  -c -o cpu/omap3/start.o /home/ken/bb/x-load/mainline/cpu/omap3/start.S

 

4.3.2 startup process

  1. Boot starts in cpu\omap3\start.S, whose first instruction is the reset vector.
  2. Set the CPU to 32-bit Supervisor (SVC) mode.
  3. Copy the vectors to the indirect address 0x4020F800 (SRAM_OFFSET0 + SRAM_OFFSET1 + SRAM_OFFSET2).
  4. Relocate the clock code into SRAM, where it is safer to execute.
  5. Initialize the CPU:
    1. Invalidate the instruction and L2 caches and the TLBs, and disable the MMU.
    2. Set up the SRAM stack at 0x4020FFFC so that C code can run.
    3. In C, s_init() does early initialization such as the watchdog and SDRAM configuration.
  6. Relocate the code section.
  7. Set up the runtime stack.
  8. Clear the BSS section (uninitialized data).
  9. Jump to the C function start_armboot():
    1. Initialize the serial device.
    2. Print version information, such as:
      Texas Instruments X-Loader 1.4.4ss
    3. Initialize I2C, whose base address is 0x48070000 in the L4 core.
    4. Read GPIO173, 172 and 171 to determine the beagleboard version and print it; on a beagleboard-xm the values should be 0, 0, 0.
    5. Initialize the MMC card and load u-boot.bin from it into the POP SDRAM at 0x80008000.
      1. If no MMC card is found, try to boot from OneNAND or NAND, but beagleboard-xm has neither device.
      2. Try to boot from serial ……
    6. Jump to 0x80008000; x-load is done.
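The board-revision check in start_armboot() packs three GPIO levels into one number. Here is a C sketch that assumes GPIO173 is the most significant bit (a hypothetical packing order). Only the 0, 0, 0 case stated above is decoded, and both function names are hypothetical.

```c
#include <assert.h>

/* Pack the three revision GPIOs: GPIO173 -> bit 2, GPIO172 -> bit 1,
 * GPIO171 -> bit 0 (assumed order). */
int beagle_rev_bits(int gpio173, int gpio172, int gpio171)
{
	return (gpio173 & 1) << 2 | (gpio172 & 1) << 1 | (gpio171 & 1);
}

/* Per the text, beagleboard-xm reads 0, 0, 0; other values are not
 * decoded here. */
const char *beagle_rev_name(int bits)
{
	return bits == 0 ? "beagleboard-xm" : "unknown (not decoded here)";
}
```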

4.3.3 x-load vs EFI’s SEC phase

x-load is therefore very much like the SEC phase in the UEFI specification. In Intel's Tiano implementation, the SEC phase mainly does the following:

  • Enter protected mode.
  • Prepare an early C stack in CAR (cache-as-RAM), which uses the processor's cache as a temporary stack/heap, just like OMAP's internal SRAM during the boot phase. Unlike SRAM, CAR is torn down once permanent memory has been initialized.
  • Initialize the CPU, for example the MTRRs for the flash range.
  • Initialize the ACPI timer early for performance measurement.
  • Find PeiCore in flash and shadow it into CAR for the PEI phase.

5, Reference
=======
1) DM37x Multimedia Device Silicon Revision 1.x

About ARM instruction sets



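As everyone knows, Intel leads the x86 processor market by releasing new "multimedia instruction sets", namely SSE.
If the relevant vendors fail to keep up with Intel in time, the latest applications cannot use those instruction set interfaces for optimization...
ARM processors have instruction sets too.
By the same token, if the processor's instruction set is updated but the application software is not optimized for the new instructions, no real performance gain is achieved...
Why bring this up here? Mainly because every embedded system project has to get over this hurdle: choosing an instruction set.
Usually, once a small company chooses an instruction set, it keeps following it. Why? Because rewriting code is very expensive, and the compiler also has to be redeveloped; otherwise third-party software, once compiled, cannot run more efficiently on CPUs with the new instruction set.
Notice that the iPhone spans instruction sets (v6 and v7): earlier models used the ARM11 architecture (v6 instruction set), while the 3GS uses a Cortex-A8 (v7 instruction set). So I personally think moving development from the 3G to the 3GS was quite an expensive step.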
______________________________________________________________
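Of course, Google's and Nokia's systems also span instruction sets. Nokia is the veteran here, while Google's situation is similar to Apple's.
Every time Nokia moves to a new instruction set, it usually releases a new Feature Pack (FP) as a transition (for example, S60 3rd FP2), so the change is quite visible. Worst of all, Nokia's compatibility is rather poor (perhaps deliberately): S60v2 and S60v3 software are completely incompatible. v2 supports CPUs with the v5TE and v4T instruction sets, while v3 supports CPUs with the v5TE and v6 instruction sets; regrettably, the S60 system does not yet support the v7 instruction set at all, which is why Nokia built a dedicated Linux device to use the Cortex-A8.
Of course, this is not necessarily because Nokia lacks capability; compatibility is genuinely hard to maintain.
If Apple had also started from the v4T instruction set like Nokia, could it have guaranteed full compatibility all the way?
Google currently seems to lean toward CPUs with the v6 instruction set, in exchange for a more unified platform (the Nexus One's ARM11 CPU).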
________________________________________________________________
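Seen this way, the pros and cons of supporting the A8 become clear.
Supporting the A8: development cost rises (and no performance gain is visible until clock speeds increase), and the development cycle gets longer, but it makes a painless future transition to A9 processors easier. It can also widen the range of uses for the OS, since there is headroom left for performance improvements.
Not supporting the A8: it saves development cost, keeps compatibility, shortens the development cycle and lowers development difficulty, but it narrows the range of uses for the device, because performance is already near its ceiling.
This is a sensitive question while GPGPU is being prepared for: if GPGPU can relieve the CPU performance bottleneck, will portable devices still be so hungry for CPU speed? Perhaps the real architecture of the A4 processor will reveal what Apple is thinking...
Tip: RTOS stands for real-time operating system. It usually does not appear on phone platforms, only on devices with extremely high reliability requirements (medical equipment, life-support equipment).
      The platform OS shown in red is our phone system, or an embedded system such as a set-top box.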

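My beagleboard-xm has arrived

I haven't played with OMAP for a long time. After going around in circles these past few years, I've finally come back from Intel's architecture to ARM. The world changes so fast!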