How To Use Registers In Embedded C

When working with peripherals, we need to be able to read and write to the device's internal registers. How we accomplish this in C depends on whether we're working with memory-mapped IO or port-mapped IO. Port-mapped IO typically requires compiler/language extensions, whereas memory-mapped IO can be accommodated with the standard C syntax.

Embedded "Hello, World!"

We all know the embedded equivalent of the "Hello, world!" programme is flashing the LED, so true to form I'm going to use that as an example.

The examples are based on an STM32F407 chip using the GNU Arm Embedded Toolchain.

The STM32F4 uses a port-based GPIO (General Purpose Input Output) model, where each port can manage 16 physical pins. The LEDs are mapped to external pins 55-58, which map internally onto GPIO Port D pins 8-11.

Flashing the LEDs

Flashing the LEDs is fairly straightforward; at the port level there are only two registers we are interested in.

Mode Register – this defines, on a pin-by-pin basis, what its function is, e.g. we want this pin to behave as an output pin.
Output Data Register – writing a '1' to the appropriate pin will generate voltage and writing a '0' will ground the pin.

Mode Register (MODER)

Each port pin has four modes of operation, thus requiring two configuration bits per pin (pin 0 is configured using mode bits 0-1, pin 1 uses mode bits 2-3, and so on):

00 Input
01 Output
10 Alternative function (details configured via other registers)
11 Analogue

So, for example, to configure pin 8 for output, we must write the value 01 into bits 16 and 17 in the MODER register (that is, bit 16 => 1, bit 17 => 0).
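The same arithmetic applies to any pin: pin n owns bits 2n and 2n+1. A minimal sketch of a helper follows. The function name gpio_set_mode and the GPIO_MODE_* constants are made up for illustration (they do not come from any vendor header), and the register is passed in as a pointer so that the logic can be tried on a host against an ordinary variable.

```c
#include <stdint.h>

/* Mode encodings from the table above (names are illustrative only). */
enum {
    GPIO_MODE_INPUT     = 0x0,
    GPIO_MODE_OUTPUT    = 0x1,
    GPIO_MODE_ALTERNATE = 0x2,
    GPIO_MODE_ANALOGUE  = 0x3
};

/* Set the two mode bits for one pin (0-15) using a read-modify-write. */
static inline void gpio_set_mode(volatile uint32_t *moder, unsigned pin, uint32_t mode)
{
    const unsigned shift = pin * 2u;          /* pin n uses bits 2n and 2n+1 */
    uint32_t value = *moder;                  /* read the current settings   */
    value &= ~(UINT32_C(3) << shift);         /* clear both mode bits        */
    value |= (mode & UINT32_C(3)) << shift;   /* insert the requested mode   */
    *moder = value;                           /* single write back           */
}
```

For pin 8 the shift is 16, so gpio_set_mode(reg, 8, GPIO_MODE_OUTPUT) sets bit 16 and clears bit 17 while leaving the other pins untouched.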

Output Data Register (ODR)

In the Output Data Register (ODR) each bit represents an I/O pin on the port. The bit number matches the pin number.

If a pin is set to output (in the MODER register) then writing a 1 into the appropriate bit will drive the I/O pin high. Writing 0 into the appropriate bit will drive the I/O pin low.

There are 16 IO pins, but the register is 32 bits wide. Reserved bits are read as '0'.
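Because the bit number equals the pin number, driving a pin reduces to setting or clearing a single bit. The sketch below is illustrative only: odr_write_pin is a hypothetical name, and the register is passed as a pointer so the code can be exercised on a host. Note that it is a read-modify-write, so it is not atomic with respect to interrupts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Drive one pin (0-15) high or low via the output data register. */
static inline void odr_write_pin(volatile uint32_t *odr, unsigned pin, bool high)
{
    const uint32_t mask = UINT32_C(1) << pin;   /* bit number == pin number */
    if (high) {
        *odr |= mask;                           /* drive the pin high */
    } else {
        *odr &= ~mask;                          /* drive the pin low  */
    }
}
```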

Port D Addresses

The absolute addresses for the MODER and ODR of Port D are:

MODER – 0x40020C00
ODR – 0x40020C14

Pointer access to registers

Typically when we access registers in C based on memory-mapped IO we use a pointer notation to 'trick' the compiler into generating the correct load/store operations at the absolute address needed.

So for Port D we might see something along the lines of (I'll keep the code brief and use magic numbers for simplicity):

          #include <stdint.h>

          volatile uint32_t* const portd_moder   = (uint32_t*) 0x40020C00;
          volatile uint32_t* const portd_odr     = (uint32_t*) 0x40020C14;

          extern void sleep(uint32_t ms); // use systick to busy-wait

          int main(void) {
            uint32_t moder = *portd_moder;
            moder |= (1 << 16);
            moder &= ~(1 << 17);
            *portd_moder = moder;

            while(1) {
              *portd_odr |= (1 << 8);   // led-on
              sleep(500);
              *portd_odr &= ~(1 << 8);  // led-off
              sleep(500);
            }
          }

Alternatively we may see the registers defined using the pre-processor, e.g.

          #include <stdint.h>

          #define PORTD_MODER   (*((volatile uint32_t*) 0x40020C00))
          #define PORTD_ODR     (*((volatile uint32_t*) 0x40020C14))

          extern void sleep(uint32_t ms); // use systick to busy-wait

          int main(void) {
            uint32_t moder = PORTD_MODER;
            moder |= (1 << 16);
            moder &= ~(1 << 17);
            PORTD_MODER = moder;

            while(1) {
              PORTD_ODR |= (1 << 8);  // led-on
              sleep(500);
              PORTD_ODR &= ~(1 << 8); // led-off
              sleep(500);
            }
          }

There is a misconception amongst many C programmers that the pointer model is less efficient than the #define model. With C99 and modern compilers this is not the case; they will generate identical code (C99 allows the compiler to optimise away const objects).

Enabling Port D

We are missing one final step; each peripheral on the STM32F407 is clock gated. The clock signal does not reach the peripheral until we tell it to do so by way of setting a bit in a specific register. By default, clock signals never reach peripherals that are not in use, thus saving power.

To enable the clock to reach GPIO port D, the GPIODEN (GPIO D Enable) bit (bit 3) of the AHB1ENR (AMBA High-performance Bus 1 Enable) register in the RCC (Reset and Clock Control) peripheral needs setting.

          #include <stdint.h>

          volatile uint32_t* const portd_moder   = (uint32_t*) 0x40020C00;
          volatile uint32_t* const portd_odr     = (uint32_t*) 0x40020C14;

          volatile uint32_t* const rcc_ahb1enr   = (uint32_t*) 0x40023830;

          extern void sleep(uint32_t ms); // use systick to busy-wait

          int main(void) {
            *rcc_ahb1enr |= (1 << 3);     // enable PortD's clock

            uint32_t moder = *portd_moder;
            moder |= (1 << 16);
            moder &= ~(1 << 17);
            *portd_moder = moder;

            while(1) {
              *portd_odr |= (1 << 8);     // led-on
              sleep(500);
              *portd_odr &= ~(1 << 8);    // led-off
              sleep(500);
            }
          }

Using structs

The code so far works just fine, but has a number of shortcomings.

First, to support multiple IO ports we would have to define a set of pointers for each set of registers for each port, e.g.:

          volatile uint32_t* const porta_moder   = (uint32_t*) 0x40020000;
          volatile uint32_t* const porta_odr     = (uint32_t*) 0x40020014;

          volatile uint32_t* const portb_moder   = (uint32_t*) 0x40020400;
          volatile uint32_t* const portb_odr     = (uint32_t*) 0x40020414;

          volatile uint32_t* const portc_moder   = (uint32_t*) 0x40020800;
          volatile uint32_t* const portc_odr     = (uint32_t*) 0x40020014;

          volatile uint32_t* const portd_moder   = (uint32_t*) 0x40020C00;
          volatile uint32_t* const portd_odr     = (uint32_t*) 0x40020C14;

          volatile uint32_t* const porte_moder   = (uint32_t*) 0x40021000;
          volatile uint32_t* const porte_odr     = (uint32_t*) 0x40021014;

Considering the port actually has 10 different registers we may want to access, this involves a lot of repetition. Where there is repetition, bugs that are simple to make but difficult to track down can creep in (did you spot the deliberate mistake?).

In addition, and more significantly, we can see that the port's ODR is always at an offset of 0x14 bytes from the MODER. The MODER is always at offset 0x00 from the port address (thus the MODER is also the port's base address).

In Software Engineering terms we'd view this separate declaration of related pointers as a lack of cohesion in the code. One of our goals is to strive for high cohesion, thus grouping things together that should naturally be together (as change affects them all).
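The regularity in the address list can be made explicit: consecutive ports sit 0x400 bytes apart, and each register lives at a fixed offset from its port's base. The sketch below only does the arithmetic (it never touches the hardware); the macro and function names are hypothetical and chosen for illustration.

```c
#include <stdint.h>

#define GPIO_BASE          UINT32_C(0x40020000)  /* Port A base address          */
#define GPIO_PORT_STEP     UINT32_C(0x400)       /* spacing between ports        */
#define GPIO_MODER_OFFSET  UINT32_C(0x00)        /* MODER is at the port base    */
#define GPIO_ODR_OFFSET    UINT32_C(0x14)        /* ODR is 0x14 bytes further on */

/* port_index: 0 = A, 1 = B, 2 = C, 3 = D, 4 = E */
static inline uint32_t gpio_reg_address(unsigned port_index, uint32_t reg_offset)
{
    return GPIO_BASE + (GPIO_PORT_STEP * port_index) + reg_offset;
}
```

With this, gpio_reg_address(3, GPIO_ODR_OFFSET) yields 0x40020C14 for Port D, and gpio_reg_address(2, GPIO_ODR_OFFSET) yields 0x40020814 for Port C. Deriving addresses rather than typing each one by hand removes one source of copy-and-paste slips.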

struct Overlay

The full register layout for the STM32F4 GPIO port is shown below:

By using a struct to define the relative memory offsets, we can get the compiler to generate all the correct address accesses relative to the base address.

          #include <stdint.h>

          typedef struct {
            uint32_t MODER;   // mode register,                     offset: 0x00
            uint32_t OTYPER;  // output type register,              offset: 0x04
            uint32_t OSPEEDR; // output speed register,             offset: 0x08
            uint32_t PUPDR;   // pull-up/pull-down register,        offset: 0x0C
            uint32_t IDR;     // input data register,               offset: 0x10
            uint32_t ODR;     // output data register,              offset: 0x14
            uint32_t BSRR;    // bit set/reset register,            offset: 0x18
            uint32_t LCKR;    // configuration lock register,       offset: 0x1C
            uint32_t AFRL;    // GPIO alternate function registers, offset: 0x20
            uint32_t AFRH;    // GPIO alternate function registers, offset: 0x24
          } GPIO_t;

Now we define the pointer as before, but this time using the struct type rather than a uint32_t:

          volatile GPIO_t*   const portd       = (GPIO_t*)0x40020C00;

Finally we can use it as before, but this time use struct-pointer dereferencing to access the individual registers:

          int main(void) {
            *rcc_ahb1enr |= (1 << 3); // enable PortD's clock

            uint32_t moder = portd->MODER;
            moder |= (1 << 16);
            moder &= ~(1 << 17);
            portd->MODER = moder;

            while (1) {
              portd->ODR |= (1 << 8);  // led-on
              sleep(500);
              portd->ODR &= ~(1 << 8); // led-off
              sleep(500);
            }
          }

Now when we access the ODR via the statement:

          portd->ODR |= (1 << 8);     // led-on

the compiler can calculate the offset (0x14) of the ODR member relative to the base address of the pointer (0x40020C00).
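The calculation the compiler performs can be reproduced by hand with the standard offsetof macro. The following sketch repeats the GPIO_t layout so that it stands alone; gpio_odr_address is a hypothetical helper that only does address arithmetic and never dereferences anything.

```c
#include <stddef.h>   /* offsetof */
#include <stdint.h>

typedef struct {
    uint32_t MODER;   /* offset: 0x00 */
    uint32_t OTYPER;  /* offset: 0x04 */
    uint32_t OSPEEDR; /* offset: 0x08 */
    uint32_t PUPDR;   /* offset: 0x0C */
    uint32_t IDR;     /* offset: 0x10 */
    uint32_t ODR;     /* offset: 0x14 */
    uint32_t BSRR;    /* offset: 0x18 */
    uint32_t LCKR;    /* offset: 0x1C */
    uint32_t AFRL;    /* offset: 0x20 */
    uint32_t AFRH;    /* offset: 0x24 */
} GPIO_t;

/* Address of the ODR for a port whose registers start at port_base. */
static inline uintptr_t gpio_odr_address(uintptr_t port_base)
{
    return port_base + offsetof(GPIO_t, ODR);
}
```

Assuming no padding has been inserted, gpio_odr_address(0x40020C00) evaluates to 0x40020C14, the same address that was hard-coded for portd_odr earlier.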

This means that we only need one pointer per port rather than ten, e.g.

          volatile GPIO_t* const   porta       = (GPIO_t*)0x40020000;
          volatile GPIO_t* const   portb       = (GPIO_t*)0x40020400;
          volatile GPIO_t* const   portc       = (GPIO_t*)0x40020800;
          volatile GPIO_t* const   portd       = (GPIO_t*)0x40020C00;
          volatile GPIO_t* const   porte       = (GPIO_t*)0x40021000;

Alternatively we could do the same with #defines:

          #define PORTA       ((volatile GPIO_t*) 0x40020000)
          #define PORTB       ((volatile GPIO_t*) 0x40020400)
          #define PORTC       ((volatile GPIO_t*) 0x40020800)
          #define PORTD       ((volatile GPIO_t*) 0x40020C00)
          #define PORTE       ((volatile GPIO_t*) 0x40021000)

Note that in the #defines the leading '*' used as a dereference has been dropped, so access to the register is coded thus:

          PORTD->ODR |= (1 << 8);     // led-on

If we left the dereference in:

          #define PORTD       (*((volatile GPIO_t*) 0x40020C00))

the code would be:

          PORTD.ODR |= (1 << 8);  // led-on

It's a matter of style; the generated instructions are the same.
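One practical consequence of having a single pointer per port is that code can be written once against the struct type and reused for every port. The sketch below is an illustration under that assumption: gpio_make_output and gpio_toggle are hypothetical names, and the GPIO_t layout is repeated so the block stands alone.

```c
#include <stdint.h>

typedef struct {
    uint32_t MODER;   /* offset: 0x00 */
    uint32_t OTYPER;  /* offset: 0x04 */
    uint32_t OSPEEDR; /* offset: 0x08 */
    uint32_t PUPDR;   /* offset: 0x0C */
    uint32_t IDR;     /* offset: 0x10 */
    uint32_t ODR;     /* offset: 0x14 */
    uint32_t BSRR;    /* offset: 0x18 */
    uint32_t LCKR;    /* offset: 0x1C */
    uint32_t AFRL;    /* offset: 0x20 */
    uint32_t AFRH;    /* offset: 0x24 */
} GPIO_t;

/* Configure one pin (0-15) of any port as an output (mode bits = 01). */
static inline void gpio_make_output(volatile GPIO_t *port, unsigned pin)
{
    uint32_t moder = port->MODER;
    moder &= ~(UINT32_C(3) << (pin * 2u));   /* clear both mode bits */
    moder |=  (UINT32_C(1) << (pin * 2u));   /* 01 = output          */
    port->MODER = moder;
}

/* Invert the output level of one pin of any port. */
static inline void gpio_toggle(volatile GPIO_t *port, unsigned pin)
{
    port->ODR ^= (UINT32_C(1) << pin);
}
```

On the target this would be called as gpio_make_output(portd, 8) or gpio_make_output(PORTD, 8); on a host the same functions can be pointed at an ordinary GPIO_t variable for testing.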

Code Comparison

So how does the struct code compare to our original pointer code (compiled with optimisation flag -Og)?

Original code

          $ arm-none-eabi-objdump -d -S main.o
          ...
            *portd_odr |= (1 << 8);       // led-on
            1a:   4c0b            ldr     r4, [pc, #44]   ; (48 <main+0x48>)
            1c:   6823            ldr     r3, [r4, #0]
            1e:   f443 7380       orr.w   r3, r3, #256    ; 0x100
            22:   6023            str     r3, [r4, #0]
          ...

The assembler code does the following:

Load the value 0x40020C14 into r4
Read the contents of 0x40020C14 [r4 + 0] as a 32-bit value into r3
Or 0x100 with the contents of r3 (set bit 8)
Store r3 as a 32-bit value at address 0x40020C14

Comparing this to the struct access:

          $ arm-none-eabi-objdump -d -S main.o
          ...
            portd->ODR |= (1 << 8);       // led-on
            1a:   4c0a            ldr     r4, [pc, #40]   ; (44 <main+0x44>)
            1c:   6963            ldr     r3, [r4, #20]
            1e:   f443 7380       orr.w   r3, r3, #256    ; 0x100
            22:   6163            str     r3, [r4, #20]
          ...

So how does this differ? Only in the use of an offset load:

Load the value 0x40020C00 into r4
Read the contents of 0x40020C14 [r4 + 20] as a 32-bit value into r3
Or the value 0x100 with the contents of r3
Store r3 as a 32-bit value at address 0x40020C14 – [r4 + 0x14]

This code demonstrates that, from a size and performance perspective, there is no difference between the two approaches (at least for the Arm).

Note: An Arm load (ldr) instruction with or without a secondary offset takes 2 cycles.

Caveats

Before we rush off and refactor legacy code to use structs, there are a couple of factors we are relying on, which may vary from compiler to compiler.

First, what can we be sure of?

The offset of the first struct member is always 0x0 from the object's address (this is not guaranteed in C++, but normally is the case).
The compiler cannot reorder the members, so OTYPER will always come at a higher address in memory than MODER and at a lower address than OSPEEDR.

However, we cannot guarantee that the compiler will not introduce padding between members, as the standard states:

There may be unnamed padding within a structure object, but not at its beginning.

So we cannot guarantee that the address of OTYPER is equal to the address of MODER + 4 bytes.

That said, in practical terms, with modern compilers, it is unlikely to be a problem (for this code). Padding tends to occur when a data member would otherwise cross its natural boundary (i.e. a 32-bit type is not word aligned), e.g.

          typedef struct {
            int  a;
            char b;
            int  c;
          } Padding_t;

would likely return a result of 12 from sizeof(Padding_t), because three padding bytes are added after char b to align the int c definition.
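The padding can be measured rather than guessed. This small sketch uses offsetof to count the bytes the compiler has inserted between b and c; padding_after_b is a hypothetical helper name, and the actual result depends on the target's ABI.

```c
#include <stddef.h>   /* offsetof, size_t */

typedef struct {
    int  a;
    char b;
    int  c;
} Padding_t;

/* Bytes the compiler inserted between the end of b and the start of c. */
static inline size_t padding_after_b(void)
{
    return offsetof(Padding_t, c) - (offsetof(Padding_t, b) + sizeof(char));
}
```

On a target where int is 4 bytes wide and 4-byte aligned, this would be expected to return 3, matching the sizeof result of 12 described above.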

Mitigating the risk

The obvious, and most straightforward, approach is to ensure you have a unit test that checks the size of the generated structure, e.g.

          void test_GPIO_t_struct_size(void) {
              TEST_ASSERT_EQUAL(40, sizeof(GPIO_t));
          }

Alternatively, one of the compelling reasons to use C11 is the introduction of static_assert, e.g.

          #include <assert.h>   // provides the static_assert macro in C11

          int main(void) {
            static_assert(sizeof(GPIO_t) == 40, "padding in GPIO_t present");
          }

This is a compile-time check; if padding was present, then the following compiler error is generated:

          src/main.c: In function 'main':
          src/main.c:87:3: error: static assertion failed: "padding in GPIO_t present"
             static_assert(sizeof(GPIO_t) == 40, "padding in GPIO_t present");
             ^
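A size check catches padding, but it would not notice two same-sized members declared in the wrong order. One possible refinement, sketched here and not part of the original code, is to assert the offset of each member by name using offsetof. The offsets are those listed in the struct comments; the GPIO_t layout is repeated so the block stands alone.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

typedef struct {
    uint32_t MODER;
    uint32_t OTYPER;
    uint32_t OSPEEDR;
    uint32_t PUPDR;
    uint32_t IDR;
    uint32_t ODR;
    uint32_t BSRR;
    uint32_t LCKR;
    uint32_t AFRL;
    uint32_t AFRH;
} GPIO_t;

/* File-scope checks: a failure stops the build, nothing runs on the target. */
static_assert(offsetof(GPIO_t, MODER)   == 0x00, "MODER offset");
static_assert(offsetof(GPIO_t, OTYPER)  == 0x04, "OTYPER offset");
static_assert(offsetof(GPIO_t, OSPEEDR) == 0x08, "OSPEEDR offset");
static_assert(offsetof(GPIO_t, PUPDR)   == 0x0C, "PUPDR offset");
static_assert(offsetof(GPIO_t, IDR)     == 0x10, "IDR offset");
static_assert(offsetof(GPIO_t, ODR)     == 0x14, "ODR offset");
static_assert(offsetof(GPIO_t, BSRR)    == 0x18, "BSRR offset");
static_assert(offsetof(GPIO_t, LCKR)    == 0x1C, "LCKR offset");
static_assert(offsetof(GPIO_t, AFRL)    == 0x20, "AFRL offset");
static_assert(offsetof(GPIO_t, AFRH)    == 0x24, "AFRH offset");
static_assert(sizeof(GPIO_t)            == 0x28, "GPIO_t size");
```

Because static_assert is also permitted at file scope, the checks can sit next to the struct definition in the header instead of inside main.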

If you're not using C11 (I've yet to see an embedded C project using it) then a final approach is to try to ensure no padding is present by requesting that the compiler 'pack' the struct to the most compact memory layout.

This is always a compiler-specific request, which may be done through #pragmas. However, GCC uses its own 'attribute' approach instead of pragmas.

Defining the structure with the attribute 'packed' will normally remove any potential padding, e.g.

          #include <assert.h>
          #include <stdint.h>

          typedef struct {
            uint32_t MODER;   // mode register,                  offset: 0x00
            uint32_t OTYPER;  // output type register,           offset: 0x04
            uint32_t OSPEEDR; // output speed register,          offset: 0x08
            uint32_t PUPDR;   // pull-up/pull-down register,     offset: 0x0C
            uint32_t IDR;     // input data register,            offset: 0x10
            uint32_t ODR;     // output data register,           offset: 0x14
            uint32_t BSRR;    // bit set/reset register,         offset: 0x18
            uint32_t LCKR;    // configuration lock register,    offset: 0x1C
            uint32_t AFRL;    // alternate function registers,   offset: 0x20
            uint32_t AFRH;    // alternate function registers,   offset: 0x24
          } __attribute__((packed)) GPIO_t;

          typedef struct {
            int  a;
            char b;
            int  c;
          } __attribute__((packed)) Padding_t;

          int main(void) {
            static_assert(sizeof(GPIO_t) == 40, "padding in GPIO_t present");
            static_assert(sizeof(Padding_t) == 9, "padding in Padding_t present");
          }

Unaligned access can cause a whole host of problems, including performance issues, so be extremely careful when using packing.
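To make the risk concrete, the sketch below shows where c ends up once the padding is removed. It assumes a GCC-compatible compiler (the packed attribute is an extension), and Packed_t and read_c are hypothetical names used only here. The compiler knows how to access p->c directly, but a plain int* formed from &p->c carries no such knowledge, so copying the bytes out with memcpy is one conservative way to read the member.

```c
#include <stddef.h>   /* offsetof */
#include <string.h>   /* memcpy */

typedef struct {
    int  a;
    char b;
    int  c;   /* no padding before c, so it is no longer int-aligned */
} __attribute__((packed)) Packed_t;   /* GCC/Clang extension */

/* Read c without assuming alignment: copy its bytes into an aligned int. */
static inline int read_c(const Packed_t *p)
{
    int value;
    memcpy(&value, (const unsigned char *)p + offsetof(Packed_t, c), sizeof value);
    return value;
}
```

With packing, offsetof(Packed_t, c) is sizeof(int) + sizeof(char), which is why the struct shrinks to 9 bytes when int is 4 bytes wide.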

Vendor Supplied Headers

On most modern microcontrollers you are likely to find headers provided with register definitions already supplied. Many years ago Arm introduced the Cortex Microcontroller Software Interface Standard (CMSIS). As part of the standard it is expected that, between Arm and the vendor, register definitions will be supplied.

For example, ST supply a series of headers for their STM32 family of microcontrollers. Searching out the ST provided file stm32f407xx.h you will find definitions for all peripherals included in the 407 variant.

On line 544 of this header file (based on version V2.1.0) you will find the following definition:

          typedef struct {
            __IO uint32_t MODER;    /*!< GPIO port mode register,               Address offset: 0x00      */
            __IO uint32_t OTYPER;   /*!< GPIO port output type register,        Address offset: 0x04      */
            __IO uint32_t OSPEEDR;  /*!< GPIO port output speed register,       Address offset: 0x08      */
            __IO uint32_t PUPDR;    /*!< GPIO port pull-up/pull-down register,  Address offset: 0x0C      */
            __IO uint32_t IDR;      /*!< GPIO port input data register,         Address offset: 0x10      */
            __IO uint32_t ODR;      /*!< GPIO port output data register,        Address offset: 0x14      */
            __IO uint16_t BSRRL;    /*!< GPIO port bit set/reset low register,  Address offset: 0x18      */
            __IO uint16_t BSRRH;    /*!< GPIO port bit set/reset high register, Address offset: 0x1A      */
            __IO uint32_t LCKR;     /*!< GPIO port configuration lock register, Address offset: 0x1C      */
            __IO uint32_t AFR[2];   /*!< GPIO alternate function registers,     Address offset: 0x20-0x24 */
          } GPIO_TypeDef;

This is a slightly different interpretation of the register layout from earlier, notably:

The BSRR has been split into two 16-bit registers (BSRRL and BSRRH).
The AFR has been combined into an array of two elements (rather than a High and a Low).

There could be a risk of padding between BSRRL and BSRRH, but it is unlikely and does not occur here.

The __IO macro simply maps onto volatile. There is a macro for __I (volatile const) to define 'read only' access, and there is a __O (volatile) to indicate 'write only' access, but this can't be enforced in C.
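The effect of these qualifiers can be sketched with stand-in macros. The names REG_RO, REG_WO and REG_RW are hypothetical (identifiers beginning with a double underscore are reserved, so the CMSIS names are not redefined here), and Demo_t is an invented two-register peripheral used purely for illustration.

```c
#include <stdint.h>

/* Illustrative stand-ins for the CMSIS access qualifiers. */
#define REG_RO volatile const   /* read-only, like __I                        */
#define REG_WO volatile         /* write-only, like __O (cannot be enforced)  */
#define REG_RW volatile         /* read/write, like __IO                      */

typedef struct {
    REG_RW uint32_t CTRL;     /* hypothetical control register */
    REG_RO uint32_t STATUS;   /* hypothetical status register  */
} Demo_t;

/* Reading a read-only register is fine. Writing to it, for example
   dev->STATUS = 0, would be rejected at compile time because the
   member is const-qualified. */
static inline uint32_t demo_read_status(const Demo_t *dev)
{
    return dev->STATUS;
}
```

The write-only case has no equivalent protection: REG_WO expands to plain volatile, so nothing stops code from reading such a register.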

Further down in the file (line 1130):

          #define GPIOD               ((GPIO_TypeDef *) GPIOD_BASE)

Again, another slight difference in the code is the choice to put the volatile directive in the struct rather than at the pointer definition.

The RCC struct definition is on line 615 with the #define on line 1137.

The CMSIS code to drive the LED is:

          #include "stm32f407xx.h"
          #include "timer.h"

          int main(void) {
            RCC->AHB1ENR |= (1 << 3);  // enable PortD's clock, leaving other enable bits untouched

            uint32_t moder = GPIOD->MODER;
            moder |= (1 << 16);
            moder &= ~(1 << 17);
            GPIOD->MODER = moder;

            while (1) {
              GPIOD->ODR |= (1 << 8);  // led-on
              sleep(500);
              GPIOD->ODR &= ~(1 << 8); // led-off
              sleep(500);
            }
          }

In summary

Programs can be decomposed into modules in several ways, of which one is chosen during the design process (assuming design happens!). The choice of decomposition has a critical effect on the architecture and thus on the product's quality attributes, such as the maintainability, reliability, modifiability, and testability of the final system.

Cohesion is one of the most important concepts in software decomposition. High cohesion is fundamental to good design principles and patterns, guiding separation of concerns and maintainability.

Using a struct-based model for device access improves cohesion through good abstraction models, making code easier to understand and maintain.

In the next article I shall start to compare the relative merits and consequences of using the #define model versus the pointer model.


Co-Founder and Managing director of Feabhas since 1995.
Niall has been designing and programming embedded systems for over 30 years. He has worked in different sectors, including aerospace, telecomms, government and banking.
His current interests lie in IoT Security and Agile for Embedded Systems.

Niall Cooling


Source: https://blog.feabhas.com/2019/01/peripheral-register-access-using-c-structs-part-1/

Posted by: turnerwhangs.blogspot.com