Device Memory, MMIO and SSE instruction set

Introduction

This post summarizes my recent research into how PCI devices, MMIO, and the SSE instruction set interact. I was recently working on a topic that required reading and copying device memory (BAR memory). My previous understanding was that BAR memory is essentially the same as physical memory, just mapped to the device somehow.

I changed my mind after running into a problem while copying device memory with the kernel's optimized memcpy. When I used memcpy on the memory-mapped region, every byte came back as 0xFF, yet copying through a plain pointer worked fine. That pushed me to dig into what was actually happening.

PCI Device

A PCI device is a device that conforms to the PCI specification. One feature of the specification makes communicating with such a device much easier: memory-mapped I/O (MMIO).

A PCI device has its own device memory and registers inside the hardware. The registers are the interface through which software outside the device, such as the device driver, communicates with it; typical examples of such devices are an SPI controller, a keyboard controller, or a USB controller. A register is expected to be 1, 2, 4, or 8 bytes wide.

Memory-Mapped I/O (MMIO) and Base Address Register (BAR)

MMIO provides a convenient way to access and communicate with a device, but how does it work, and what is behind it? In the PCI specification, each device exposes Base Address Registers (BARs) that let it take part in address decoding: once the BIOS has programmed the BARs of each device, every device owns a window of physical address space through which it can be accessed like memory. Interestingly, any physical RAM that would otherwise sit at those addresses is no longer reachable there, and the redirection is completely transparent to the OS. Consider the following instruction, which talks to a device:

mov dword [mmio_space], 0x1

It is a very simple instruction that writes 0x1 to mmio_space. However, assuming mmio_space points into a BAR window, nothing magically maps real memory onto the device register. Instead, the access is redirected: when the address decoder sees an address that falls inside a device's BAR window, it identifies the target device and, rather than sending the request to the memory controller, forwards it to that device's I/O controller.
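
To make the redirection concrete, here is a minimal user-space sketch in C. It is only a toy model of the routing decision, not real hardware behavior: the window address and size (BAR_BASE, BAR_SIZE), the single device register, and the function names are all assumptions made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical BAR window, chosen only for this simulation. */
#define BAR_BASE 0xFE000000u
#define BAR_SIZE 0x1000u

static uint8_t  ram[256];      /* stand-in for system RAM */
static uint32_t device_reg;    /* stand-in for the device's only register */
static int      device_writes; /* number of requests the device received */

/* Returns 1 when addr falls inside the BAR window. */
static int in_bar_window(uint32_t addr)
{
    return addr >= BAR_BASE && addr - BAR_BASE < BAR_SIZE;
}

/* Models what happens to a 32-bit store issued by the CPU. */
static void bus_write32(uint32_t addr, uint32_t value)
{
    if (in_bar_window(addr)) {
        /* Claimed by the device: RAM is never touched. In this toy
         * model every offset in the window hits the same register. */
        device_reg = value;
        device_writes++;
    } else if (addr <= sizeof(ram) - 4) {
        /* Ordinary memory write into the simulated RAM. */
        memcpy(&ram[addr], &value, sizeof(value));
    }
    /* Any other address is ignored by this model. */
}
```

Calling bus_write32(0x10, ...) changes ram, while bus_write32(BAR_BASE, 0x1) only updates device_reg and device_writes. The store looks identical from the CPU's side; only the address decides where it ends up.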

In summary, when we access an MMIO address with mov, we are not actually moving memory. It is just another way of sending an I/O request to the device, used in place of the IN / OUT instructions, and it makes life faster and easier for the kernel programmer.
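
In C, the usual counterpart of such a mov is a load or store through a volatile pointer. The sketch below assumes that base already points at a mapped BAR (in a real driver the mapping would come from the OS) and that off is a multiple of 4; mmio_write32 and mmio_read32 are hypothetical helper names, not an existing API.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes one 32-bit value at byte offset off from base.
 * volatile tells the compiler to emit this access as written,
 * so it is neither dropped nor merged with neighboring accesses. */
static inline void mmio_write32(volatile void *base, size_t off, uint32_t v)
{
    *(volatile uint32_t *)((volatile uint8_t *)base + off) = v;
}

/* Reads one 32-bit value at byte offset off from base. */
static inline uint32_t mmio_read32(const volatile void *base, size_t off)
{
    return *(const volatile uint32_t *)((const volatile uint8_t *)base + off);
}
```

The point of fixing the type to uint32_t is that each call is intended to become exactly one 4-byte access, which matches a 4-byte register.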

SSE2 Instruction

SSE2 is a SIMD instruction set whose instructions can move a block of memory in a single operation. SSE2 works on 128-bit XMM registers, so one load or store transfers 128 bits at once; the later AVX and AVX-512 extensions widen this to 256 and 512 bits.
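
As an illustration of what a 128-bit move looks like, here is a small sketch using the SSE2 intrinsics from emmintrin.h. It is not the kernel's memcpy, just an assumed stand-in showing the access width; copy_sse2 is a name invented for this example, and it builds only on x86 targets with SSE2.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Copies memory 16 bytes at a time with unaligned SSE2 loads and stores.
 * Trailing bytes beyond the last full 16-byte block are not copied. */
static void copy_sse2(void *dst, const void *src, size_t len)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    for (size_t i = 0; i + 16 <= len; i += 16) {
        /* One 128-bit read from the source. */
        __m128i chunk = _mm_loadu_si128((const __m128i *)(s + i));
        /* One 128-bit write to the destination. */
        _mm_storeu_si128((__m128i *)(d + i), chunk);
    }
}
```

On ordinary RAM this is simply a fast copy. The important detail for the rest of this post is that every iteration asks for 16 bytes in one request.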

Conflict

As far as I know, a device register is 1, 2, 4, or 8 bytes wide. If you request more data than that in a single access, the result may be unexpected. In my case, the optimized memcpy read 128 bits at a time with SSE2 instructions, and every byte it returned was 0xFF.
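
The sketch below reproduces the symptom in a toy model and shows the workaround of copying at register width. Everything in it is an assumption for illustration: dev_regs, dev_read, and copy_regwidth are invented names, and the rule that an unsupported width yields all ones is modeled on what I observed, not taken from a specification. In a Linux driver, one would normally reach for the kernel's own MMIO accessors, such as readl() or memcpy_fromio(), rather than a hand-written loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy device: 16 bytes of register space, assumed for this example. */
static const uint8_t dev_regs[16] = {
    0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77,
    0x88, 0x99, 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0xFF
};

/* Models one read request of width bytes at offset.
 * Assumption: widths other than 1, 2, 4, 8 are not completed and the
 * requester sees all ones, as in the failure described above. */
static void dev_read(size_t offset, size_t width, void *out)
{
    int ok = (width == 1 || width == 2 || width == 4 || width == 8) &&
             offset + width <= sizeof(dev_regs);

    if (ok)
        memcpy(out, dev_regs + offset, width);
    else
        memset(out, 0xFF, width);
}

/* Copies device registers using only 4-byte requests.
 * Trailing bytes beyond the last full 4-byte block are not copied. */
static void copy_regwidth(void *dst, size_t offset, size_t len)
{
    uint8_t *d = dst;

    for (size_t i = 0; i + 4 <= len; i += 4)
        dev_read(offset + i, 4, d + i);
}
```

A single dev_read(0, 16, buf) fills buf with 0xFF, while copy_regwidth(buf, 0, 16) returns the register contents, because it issues four requests the device can serve.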
