C6000 Embedded Design Workshop Student Guide Rev1.20

Intro to the TI-RTOS Kernel Workshop - Cover 0 - 1
C6000 Embedded Design Workshop
Student Guide, Rev 1.20 November 2013
Technical Training
Notice
Creation of derivative works unless agreed to in writing by the copyright owner is forbidden. No
portion of this publication may be reproduced, stored in a retrieval system or transmitted in any
form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the
prior written permission from the copyright holder.
Texas Instruments reserves the right to update this Guide to reflect the most current product
information for the spectrum of users. If there are any differences between this Guide and a
technical reference manual, references should always be made to the most current reference
manual. Information contained in this publication is believed to be accurate and reliable.
However, responsibility is assumed neither for its use nor any infringement of patents or rights of
others that may result from its use. No license is granted by implication or otherwise under any
patent or patent right of Texas Instruments or others.
Copyright 2013 by Texas Instruments Incorporated. All rights reserved.
Technical Training Organization
Semiconductor Group
Texas Instruments Incorporated
7839 Churchill Way, MS 3984
Dallas, TX 75251-1903
Revision History
Rev 1.00 - Oct 2013 - Re-formatted labs/ppts to fit alongside new TI-RTOS Kernel workshop
Rev 1.10 - Oct 2013 - Added chapter 10 (Dyn Memory) as first optional chapter
Rev 1.20 - Nov 2013 - Upgraded all labs to use UIA/SA
TI-RTOS Workshop - Using Dynamic Memory 10 - 1
Using Dynamic Memory
Introduction
In this chapter, you will learn about how to pass data between threads and how to protect
resources during critical sections of code, including using Events, mutexes, BIOS “containers”
such as Mailboxes and Queues, and other methods of helping threads (mainly Tasks)
communicate with each other.
Objectives
- Compare/contrast static and dynamic systems
- Define heaps and describe how to configure the different types of heaps (std, HeapBuf, etc.)
- Describe how to eliminate the drawbacks of using std heaps (fragments, non-determinism)
- Implement dynamic object creation
- Lab: Using the previous Task/Sem lab, create our Semaphores and Tasks dynamically
Module Topics
Using Dynamic Memory
  Module Topics
  Static vs. Dynamic
  Dynamic Memory Concepts
    Using Dynamic Memory
    Creating A Heap
  Different Types of Heaps
    HeapMem
    HeapBuf
    HeapMultiBuf
    Default System Heap
  Dynamic Module Creation
  Custom Section Placement
  Lab 10: Using Dynamic Memory
    Lab 10 Procedure: Using Dynamic Task/Sem
    Import Project
    Check Dynamic Memory Settings
    Inspect New Code in main()
    Delete the Semaphore and Add It Dynamically
    Build, Load, Run, Verify
    Delete Task and Add It Dynamically
  Additional Information
  Notes
    More Notes
Static vs. Dynamic
Static vs Dynamic Systems

Static Memory (allocated at LINK time)
Link Time:
  - Allocate Buffers
Execute:
  - Read data
  - Process data
  - Write data
+ Easy to manage (less thought/planning)
+ Smaller code size, faster startup
+ Deterministic, atomic (interrupts won’t mess it up)
- Fixed allocation of memory resources
Optimal when most resources are needed concurrently

Dynamic Memory (HEAP) (allocated at RUN time)
Create:
  - Allocate Buffers
Execute:
  - R/W & Process
Delete:
  - FREE Buffers
+ Limited resources are SHARED
+ Objects (buffers) can be freed back to the heap
+ Smaller RAM budget due to re-use
- Larger code size, more difficult to manage
- NOT deterministic, NOT atomic
Optimal when multiple threads share the same resource or memory needs are not known until runtime

SYS/BIOS allows either method.
BIOS Runtime Cfg: Dynamic Memory
MAU: Minimum Addressable Unit
- Memory allocation sizes are measured in MAUs
- 8 bits: C6000, MSP430, ARM
- 16 bits: C28x
Memory Policies: Dynamic or Static?
- Dynamic is the default policy (recommended)
- Static policy can save some code/data memory
- Select via .CFG GUI
Note: ~5K bytes savings on a C6000 choosing “static only” vs. “dynamic”
Dynamic Memory Concepts
Using Dynamic Memory
Dynamic Memory Usage (Heap)
[Diagram: CPU with Program Cache and Data Cache, Internal SRAM, and EMIF to External Memory; Stack and Heap shown in memory]
Using Memory Efficiently
Stack, Heap: common memory reuse within C language
A Heap (i.e. system memory) allocates, then frees chunks of memory from a common system block
Code Example…
Dynamic Example (Heap)

“Normal” (static) C Coding:
#define SIZE 32
char x[SIZE];          /* allocate */
char a[SIZE];
x={…};                 /* initialize */
a={…};
filter(…);             /* execute */

“Dynamic” C Coding:
#define SIZE 32
x=malloc(SIZE);        // Create (SIZE is in MAUs)
a=malloc(SIZE);        // MAUs
x={…};                 // Execute
a={…};
filter(…);
free(x);               // Delete
free(a);
High-performance DSP users have traditionally used static embedded systems
As DSPs and compilers have improved, the benefits of dynamic systems often
allow enhanced flexibility (more threads) at lower costs
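The create/execute/delete pattern above can be sketched in standard C as follows. This is a minimal host-side illustration, not code from the workshop: the `filter()` body, the placeholder data and the function name `run_dynamic_filter` are assumptions made only to give the slide's pseudocode something concrete to run.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 32  /* buffer length in MAUs (bytes on an 8-bit-MAU target) */

/* Hypothetical stand-in for the slide's filter(): sum of in[i] * coeff[i]. */
static long filter(const char *in, const char *coeff, size_t n)
{
    long acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (long)in[i] * (long)coeff[i];
    }
    return acc;
}

/* Create, execute, delete. Returns 0 on success, -1 if an allocation fails. */
int run_dynamic_filter(long *result)
{
    assert(result != NULL);

    char *x = malloc(SIZE);          /* create */
    char *a = malloc(SIZE);
    if (x == NULL || a == NULL) {    /* the heap can run out, so always check */
        free(x);                     /* free(NULL) is a no-op */
        free(a);
        return -1;
    }

    memset(x, 1, SIZE);              /* initialize with placeholder data */
    memset(a, 2, SIZE);
    *result = filter(x, a, SIZE);    /* execute */

    free(x);                         /* delete: return both blocks to the heap */
    free(a);
    return 0;
}
```

Unlike the static version, both buffers exist only while `run_dynamic_filter()` is running, so the same heap space can be reused by other code afterwards.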
Dynamic Memory (Heap)
What if I need two heaps? Say, a big image array off-chip and a fast scratch-memory heap on-chip?
Multiple Heaps
[Diagram: CPU with Program Cache and Data Cache, Internal SRAM, and EMIF to External Memory; Stack, Heap and Heap2 shown in memory]
- BIOS enables multiple heaps to be created
- Create and name heaps in .CFG file or via C code
- Use Memory_alloc() function to allocate memory and specify which heap
Memory_alloc()

Using Memory functions:
#define SIZE 32
x = Memory_alloc(NULL, SIZE, align, &eb);     // NULL: Default System Heap
a = Memory_alloc(myHeap, SIZE, align, &eb);   // myHeap: Custom heap; &eb: Error Block (more details later)
x = {…};
a = {…};
filter(…);
Memory_free(NULL,x,SIZE);
Memory_free(myHeap,a,SIZE);

Standard C syntax:
#define SIZE 32
x=malloc(SIZE);
a=malloc(SIZE);
x={…};
a={…};
filter(…);
free(a);
free(x);

Notes:
- malloc(size) API is translated to Memory_alloc(NULL,size,0,&eb) in SYS/BIOS
- Memory_calloc/valloc also available
Creating A Heap (HeapMem)
Static (.CFG GUI):
1. Use HeapMem (Available Products)
2. Create HeapMem (myHeap): size, alignment, name
OR…
Dynamic (C code):
HeapMem_Params_init(&prms);
prms.size = 256;
myHeap = HeapMem_create(&prms, &eb);
Usage:
buf1 = Memory_alloc(myHeap, 64, 0, &eb);
Different Types of Heaps
Heap Types
Users can choose from 3 different types of Heaps:
- HeapMem: Allocate variable-size blocks. Default system heap type
- HeapBuf: Allocate fixed-size blocks
- HeapMultiBuf: Specify variable-size blocks, but internally, allocate from a variety of fixed-size blocks
HeapMem
- Most flexible: allows allocation of variable-sized blocks (like malloc())
- Ideal when size of memory is not known until runtime
- Creation: .CFG (static) or C code (dynamic)
- Like malloc(), there are drawbacks:
  - NOT Deterministic: Memory Manager traverses linked list to find blocks
  - Fragmentation: After frequent allocate/free, fragments occur
Is there a heap type without these drawbacks?
HeapBuf
- Allows allocation of fixed-size blocks (no fragmentation)
- Deterministic, no reentrancy problems
- Ideal when using a varying number of fixed-size blocks (e.g. 4-6 buffers of 64 bytes each)
- Creation: .CFG (static) or C code (dynamic)
- For blockSize=64: Ask for 16, get 64. Ask for 66, get NULL
How do you create a HeapBuf?
[Diagram: a pool of fixed-size BUF blocks; labels: HeapBuf_create(), SWI, Memory_alloc(), TSK, Memory_free(), HeapBuf_delete()]
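To make the fixed-size behavior concrete ("Ask for 16, get 64. Ask for 66, get NULL"), here is a minimal host-side sketch of a fixed-size block pool in standard C. It is not the HeapBuf implementation; the type `FixedPool` and the functions `pool_init`, `pool_alloc` and `pool_free` are hypothetical names chosen for illustration, and the sketch does no locking.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 64   /* size of every block, in MAUs */
#define NUM_BLOCKS 4

/* Hypothetical pool type for illustration; not the HeapBuf implementation. */
typedef struct {
    unsigned char storage[NUM_BLOCKS][BLOCK_SIZE];
    void *free_list[NUM_BLOCKS];  /* stack of pointers to free blocks */
    size_t free_count;
} FixedPool;

void pool_init(FixedPool *p)
{
    assert(p != NULL);
    for (size_t i = 0; i < NUM_BLOCKS; i++) {
        p->free_list[i] = p->storage[i];
    }
    p->free_count = NUM_BLOCKS;
}

/* Constant-time allocation: any request up to BLOCK_SIZE gets one whole block. */
void *pool_alloc(FixedPool *p, size_t size)
{
    assert(p != NULL);
    if (size > BLOCK_SIZE || p->free_count == 0) {
        return NULL;              /* request too big, or pool exhausted */
    }
    return p->free_list[--p->free_count];
}

/* Constant-time free: push the block back onto the free stack. */
void pool_free(FixedPool *p, void *block)
{
    assert(p != NULL && block != NULL && p->free_count < NUM_BLOCKS);
    p->free_list[p->free_count++] = block;
}
```

Because every block has the same size, allocation never searches a list and freed blocks never leave unusable gaps, which is where the determinism and the absence of fragmentation come from.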
Creating A HeapBuf
Static (.CFG GUI):
1. Use HeapBuf (Available Products)
2. Create HeapBuf (myBuf): blk size, # of blocks, name
OR…
Dynamic (C code):
prms.blockSize = 64;
prms.numBlocks = 8;
prms.bufSize = 512;     // must hold numBlocks * blockSize MAUs
myHeapBuf = HeapBuf_create(&prms, &eb);
Usage:
buf1 = Memory_alloc(myHeapBuf, 64, 0, &eb);
What if I need multiple sizes (16, 32, 128)?
Multiple HeapBufs
1024 MAUs in 3 HeapBufs:
  heapBuf1: 8 x 16-bytes
  heapBuf2: 8 x 32-bytes
  heapBuf3: 5 x 128-bytes
Given this configuration, what happens when we allocate the 9th 16-byte location from heapBuf1?
What “mechanism” would you want to exist to avoid the NULL return pointer?
HeapMultiBuf
- Allows variable-size allocation from a variety of fixed-size blocks
- Services requests for ANY memory size, but always returns the most efficient-sized available block
- Can be configured to “block borrow” from the “next size up”
- Creation: .CFG (static) or C code (dynamic)
- Ask for 17, get 32. Ask for 36, get 128.
1024 MAUs in 3 Buffers:
  8 x 16-byte
  8 x 32-byte
  5 x 128-byte
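The size selection and "block borrow" behavior described above can be sketched with simple bookkeeping. This is a host-side illustration only: `MultiPool`, `multipool_init` and `multipool_alloc` are hypothetical names, the sketch tracks free-block counts rather than real memory, and letting a request keep moving up through every larger size when borrowing is enabled is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_CLASSES 3

/* Hypothetical bookkeeping only: tracks free block counts, not real memory. */
typedef struct {
    size_t block_size[NUM_CLASSES];   /* in ascending order */
    size_t free_blocks[NUM_CLASSES];
    bool borrow;                      /* allow "next size up" when a size is empty */
} MultiPool;

/* Sets up the slide's configuration: 8 x 16, 8 x 32, 5 x 128 (1024 MAUs). */
void multipool_init(MultiPool *mp, bool borrow)
{
    assert(mp != NULL);
    mp->block_size[0] = 16;  mp->free_blocks[0] = 8;
    mp->block_size[1] = 32;  mp->free_blocks[1] = 8;
    mp->block_size[2] = 128; mp->free_blocks[2] = 5;
    mp->borrow = borrow;
}

/* Returns the size of the block granted, or 0 if the request cannot be met. */
size_t multipool_alloc(MultiPool *mp, size_t size)
{
    assert(mp != NULL);
    for (size_t i = 0; i < NUM_CLASSES; i++) {
        if (size > mp->block_size[i]) {
            continue;                 /* block too small for this request */
        }
        if (mp->free_blocks[i] > 0) {
            mp->free_blocks[i]--;
            return mp->block_size[i];
        }
        if (!mp->borrow) {
            return 0;                 /* best-fit size is empty, no borrowing */
        }
        /* borrowing enabled: try the next larger size */
    }
    return 0;
}
```

With borrowing enabled, the 9th 16-MAU request is served from a 32-MAU block instead of failing; with borrowing disabled it returns 0, which corresponds to the NULL pointer in the question on the previous slide.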
Default System Heap
BIOS automatically creates a default system heap of type HeapMem
How do you configure the default heap?
In the .CFG GUI, of course.
How to USE this heap?
buf1 = Memory_alloc(NULL, 128, 0, &eb);   // NULL: uses default heap; 0: align
myAlgo(buf1);
Memory_free(NULL, buf1, 128);
Dynamic Module Creation
Dynamically Creating SYS/BIOS Objects
Module_create
- Allocates memory for object out of heap
- Returns a Module_Handle to the created object
Module_delete
- Frees the object’s memory
Modules: Hwi, Swi, Task, Semaphore, Stream, Mailbox, Timer, Clock, List, Event, Gate
Example: Semaphore creation/deletion:
#define COUNT 0
Semaphore_Handle hMySem;
hMySem = Semaphore_create(COUNT,NULL,&eb);   // Create (NULL: params)
Semaphore_post(hMySem);                      // Execute
Semaphore_delete(&hMySem);                   // Delete
Note: always check return value of _create APIs!
Example: Dynamic Task API
Task_Handle hMyTsk;
Task_Params taskParams;
Task_Params_init(&taskParams);
taskParams.priority = 3;
hMyTsk = Task_create(myCode,&taskParams,&eb);   // Create
// "MyTsk" now active w/priority = 3 ...        // Execute
Task_delete(&hMyTsk);                           // Delete
taskParams includes: heap location, priority, stack ptr/size, environment ptr, name
What is Error Block?
Setup Code:
Error_Block eb;
Error_init(&eb);
Usage:
buf1 = Memory_alloc(myBuf, 64, 0, &eb);
- Most SYS/BIOS APIs that expect an error block also return a handle to the created object or allocated memory
- If NULL is passed instead of an initialized Error_Block and an error occurs, the application aborts and the error can be output using System_printf(). This may be the best behavior in systems where an error is fatal and you do not want to do any error checking
- The main advantage of passing and testing Error_Block is that your program controls when it aborts
- Typically, systems pass Error_Block and check the resource pointer to see if it is NULL, then make a decision…
- Can check Error_Block using: Error_check()
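The two policies described above (abort when no error block is passed, or let the caller decide when one is) can be mimicked on a host machine. The sketch below is an analogy in standard C, not SYS/BIOS code: `DemoErrorBlock`, `demo_error_init`, `demo_error_check` and `demo_alloc` are hypothetical stand-ins, and the `limit` parameter is an invented way to simulate a small heap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an error block; not the SYS/BIOS Error_Block type. */
typedef struct {
    bool raised;
    const char *msg;
} DemoErrorBlock;

void demo_error_init(DemoErrorBlock *eb)
{
    assert(eb != NULL);
    eb->raised = false;
    eb->msg = NULL;
}

bool demo_error_check(const DemoErrorBlock *eb)
{
    assert(eb != NULL);
    return eb->raised;
}

/* Allocates size bytes out of a simulated budget of limit bytes.
 * With eb == NULL a failure aborts; with an error block the caller decides. */
void *demo_alloc(size_t size, size_t limit, DemoErrorBlock *eb)
{
    void *p = (size <= limit) ? malloc(size) : NULL;
    if (p == NULL) {
        if (eb == NULL) {
            abort();                  /* no error block: failure is fatal */
        }
        eb->raised = true;            /* record the error and return NULL */
        eb->msg = "out of memory";
    }
    return p;
}
```

A caller that passes an initialized block checks the returned pointer (or `demo_error_check()`) and chooses its own recovery; a caller that passes NULL accepts that any failure stops the program.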
Custom Section Placement
Custom Placement of Data and Code
Problem #1: You have a function or buffer that you want to place at a specific address in the
memory map. How is this accomplished?
[Diagram: myFxn in section .myCode placed in Mem1; myBuffer in section .myBuf placed in Mem2]
Problem #2: You have two buffers and want one to be linked at Ram1 and the other at Ram2.
How do you “split” the .bss (compiler’s default) section?
[Diagram: buf1 and buf2 in .bss; buf1 placed in Ram1, buf2 placed in Ram2]
Making Custom Sections
Create custom code & data sections using:
#pragma CODE_SECTION (myFxn, ".myCode");
void myFxn(*ptr, *ptr2, …){ };
#pragma DATA_SECTION (myBuffer, ".myBuf");
int16_t myBuffer[32];
- myFxn & myBuffer are the names of the fxn/var
- .myCode & .myBuf are the names of the custom sections
Split default compiler section using SUB sections:
#pragma DATA_SECTION(buf1, ".bss:buf1");
int16_t buf1[8];
#pragma DATA_SECTION(buf2, ".bss:buf2");
int16_t buf2[8];
How do you LINK these custom sections?
Linking Custom Sections
[Diagram: “Build” of app.cfg produces app.cmd (MEMORY { } and SECTIONS { }); the Linker takes app.cmd and userlinker.cmd and outputs app.out and .map]
userlinker.cmd:
SECTIONS
{ .myCode: > Mem1
  .myBuf: > Mem2
  .bss:buf1 > Ram1
  .bss:buf2 > Ram2
}
- Create your own linker.cmd file for custom sections
- CCS projects can have multiple linker CMD files
- May need to create custom MEMORY segments also (device-specific)
- “.bss:” used as protection against custom section not being linked
- “-w” warns if unexpected section encountered
Lab 10: Using Dynamic Memory
You might notice this system block diagram looks the same as what we used back in Lab 8.
That’s because it IS.
We’ll have the same objects and events; it’s just that we will create the objects dynamically
instead of statically.
In this lab, you will delete the current STATIC configuration of the Task and Semaphore and
create them dynamically. Then, if your LED blinks once again, you were successful.
Lab 10: Creating Task/Sem Dynamically
main.c:
main() {
    init_hw();
    Timer (500ms)
    BIOS_start();
}
Hwi ISR:
    Semaphore_post(LedSem);
Task (ledToggleTask):
ledToggle() {
    while(1) {
        Semaphore_pend(LedSem);
        Toggle_LED;
    }
}
[Diagram: Scheduler with Hwi, Task and Idle]
Procedure
- Import archived (.zip) project (from Task lab)
- Delete Task/Sem objects (for ledToggle)
- Write code to create Task/Sem Dynamically
- Build, “Play”, Debug
- Use ROV/UIA to debug/analyze
Time: 30 min
Lab 10 Procedure: Using Dynamic Task/Sem
In this lab, you will import the solution for the Task lab from before and modify it by DELETING
the static declaration of the Task and Semaphore in the .cfg file and then add code to create
them DYNAMICALLY in main().
Import Project
1. Open CCS and make sure all existing projects are closed.
► Close any open projects (right-click, Close Project) before moving on. With many main.c
and app.cfg files floating around, it is easy to get confused about WHICH file you are
editing.
► Also, make sure all file windows are closed.
2. Import existing project from \Lab10.
Just like last time, the author has already created a project for you and it’s contained in an
archived .zip file in your lab folder.
► Import the following archive from your /Lab_10 folder:
Lab_10_TARGET_STARTER_blink_Mem.zip
► Click Finish.
The project “blink_TARGET_MEM” should now be sitting in your Project Explorer. This is the
SOLUTION of the earlier Task lab with a few modifications explained later.
► Expand the project to make sure the contents look correct.
3. Build, load and run the project to make sure it works properly.
We want to make sure the imported project runs fine before moving on. Because this is the
solution from the previous lab, it should build and run.
► Build and fix any errors.
► Then run it and make sure it works. If all is well, move on to the next step.
If you’re having any difficulties, ask a neighbor for help.
Check Dynamic Memory Settings
4. Open BIOS → Runtime and check settings.
► Open app.cfg and click on BIOS → Runtime.
Make sure the “Enable Dynamic Instance Creation” checkbox is checked (it should already
be checked):
► Check the Runtime Memory Options and make sure the settings below are set properly for
stack and heap sizes.
We need SOME heap to create the Semaphore and Task out of, so 256 is a decent number
to start with. We will see if it is large enough as we go along.
Save app.cfg.
The author also wants you to know that these numbers are duplicated throughout the
.cfg file, which causes some confusion, especially for new users. First, BIOS → Runtime is
THE place to change the stack and heap sizes.
Other areas of the app.cfg file are “followers” of these numbers; they reflect these
settings. Sometimes they are displayed correctly in other “modules” and some show “zero”.
No worries, just use the BIOS → Runtime numbers and ignore all the rest.
But, you need to see for yourself that these numbers actually show up in four places in the
app.cfg file. Of course, BIOS → Runtime is the first and ONLY place you should use.
► However, click on the following modules and see where these numbers show up (don’t
modify any numbers, just click and look):
- Hwi
- Memory
- Program
Yes, this can be confusing, but now you know. Just use BIOS → Runtime and ignore the other
locations for these settings.
Hint: If you change the stack or heap sizes in any of these other windows, it may result in a
BIOS CFG warning of some kind. So, the author will say this one more time: ONLY use
BIOS → Runtime to change stack and heap sizes.
Inspect New Code in main()
5. Open main.c and inspect the new code.
The author has already written some code for you in main(). Why? Well, instead of making
you type the code and make spelling or syntax errors and deal with the build errors, it is just
easier to provide commented code and have you uncomment it. Plus, when you create the
Task dynamically, the casting of the Task function pointer is a bit odd.
► Open main.c and find main().
Inspect the new code that creates the Semaphore and Task dynamically (DO NOT
UNCOMMENT ANYTHING YET):
As you go through this lab, you will be uncommenting pieces of this code to create the
Semaphore and Task dynamically and you’ll have to fill in the “????” with the proper names
or values. Hey, we couldn’t do ALL the work for you.
Also notice in the global variable declaration area that two handles, one for the
Semaphore and one for the Task, are also provided.
In order to use functions like Semaphore_create() and Task_create(), you will need to
uncomment the necessary #include for the header files also.
Delete the Semaphore and Add It Dynamically
6. Get rid of the Semaphore in app.cfg.
Remove ledToggleSem from the app.cfg file and save app.cfg.
7. Uncomment the two lines of code associated with creating ledToggleSem dynamically.
In the global declaration area above main(), uncomment the line associated with the
handle for the Semaphore and name the Semaphore ledToggleSem.
In main(), uncomment the line of code for Semaphore_create() and use the same
name for the Semaphore.
In the #include section near the top of main.c, uncomment the #include for
Semaphore.h.
Save main.c.
Build, Load, Run, Verify
8. Build, load and run your code.
Build the new code, load it and run it for 5 blinks.
Is it working? If not, it is debug time. If it is working, you can move on.
9. Check heap in ROV.
So, how much heap memory does a Semaphore take? Where do you find the heap sizes and
how much was used? ROV, of course…
Open ROV and click on HeapMem (the standard heap type), then click on Detailed:
So, in this example (C28x), the starting heap size was 0x100 (256) and 0xd0 is still free
(208), so the Semaphore object took 48 16-bit locations on the C28x (assuming nothing else
is on the heap). Ok. So, we didn’t run out of heap. Good thing.
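The heap arithmetic above (0x100 total, 0xd0 free, so 48 locations used) is easy to check in code. The helper names below are hypothetical; the sketch only restates the subtraction and converts MAUs to 8-bit bytes using the MAU widths given earlier in this chapter (8 bits on C6000, 16 bits on C28x).

```c
#include <assert.h>
#include <stddef.h>

/* MAUs consumed from a heap, given the total and free sizes reported by ROV. */
size_t heap_used_maus(size_t total_maus, size_t free_maus)
{
    assert(free_maus <= total_maus);
    return total_maus - free_maus;
}

/* Converts MAUs to 8-bit bytes; bits_per_mau is 8 on C6000 and 16 on C28x. */
size_t maus_to_bytes(size_t maus, size_t bits_per_mau)
{
    assert(bits_per_mau % 8 == 0);
    return maus * (bits_per_mau / 8);
}
```

For the C28x screen capture, `heap_used_maus(0x100, 0xd0)` gives 48 MAUs, and `maus_to_bytes(48, 16)` gives 96 bytes, since each C28x location is 16 bits wide.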
Write down how many bytes your Semaphore required here: _____________
How much free size do you have left over? ____________
So, when you create a Task, which has its own stack, if you create it with a stack larger than
the free size left over, what might happen?
_______________________________________________________
Well, let’s go try it…
Delete Task and Add It Dynamically
10. Delete the Task in app.cfg.
Remove the Task from the app.cfg file and save app.cfg.
11. Uncomment some lines of code and declarations.
► Uncomment the #include for Task.h.
Uncomment the declaration of the Task_Handle.
Uncomment the code in main() that creates the Task (ledToggleTask) and fill in the
???? properly.
Create the Task at priority 2.
Save main.c.
12. Build, load, run, verify.
Build and run your code for five blinks. No blink? Read further…
Halt your code.
Your code is probably sitting at abort(). How would the author know that? Well,
when you create a Task, it needs a stack. On the C6000, the default stack size is 2048 bytes.
For C28x, it is 256.
You probably aborted with a message that looks similar to this:
What happened? Two things. First, your heap is not big enough to create a Task from
because the Task requires a stack that is larger than the entire heap.
Also, did you pass an error block in the Task_create() function? Probably not. So, what
happens if you get a NULL pointer back and you do NOT pass an error block? BIOS aborts.
Well, that’s what it looks like.
13. Open ROV to see the damage.
Open ROV and click on Task. You should see something similar to this:
► Look at the size of “stackSize” for ledToggle (name may or may not show up). This
screen capture was for C28x, so your size may be different (probably larger).
What size did you set the heap to in BIOS → Runtime? __________ bytes
What is the size of the stack needed for ledToggle (shown in ROV)? __________ bytes
Get the picture? You need to increase the size of the heap.
14. Go back and increase the size of the heap.
► Open BIOS → Runtime and use the following heap sizes:
C28x: 1024
C6000: 4096
MSP430: 1024
TM4C: 4096
We probably don’t need THIS large of a heap for this application (it could be tuned better);
we’re just using a larger number to see the application work.
Save app.cfg.
15. Wait, what about Error Block?
In a real application, the user has a choice whether to use Error Block or not. For debug
purposes, maybe it is best to leave it off so that your program aborts when the handle to the
requested resource is NULL. If you don’t like that, then use Error Block and check the return
handle and deal with it however you choose; it is user preference.
In our lab, we chose to ignore Error Block, but at least you know it is there, how to initialize
one and how it works.
16. Rebuild and run again.
► Rebuild and run the new project with the larger heap. Run for 5 blinks; it should work fine
now.
17. Terminate your debug session, close the project and close CCS.
You’re finished with this lab. Help a neighbor who is struggling. You know you
KNOW IT when you can help someone else, and it’s being a good neighbor. But, if
you want to be selfish and just leave the room because the workshop is OVER, no
one will look at you funny!
Additional Information
Placing a Specific Section into Memory
SYS/BIOS GUI now supports specific placements of sections (like .far, .bss, etc.)
into specific memory segments (like IRAM, DDR, etc.):
- Via the Platform File (C6000 Only): hi-level, but works fine
- Via the app.cfg GUI (finer control): GUI or CFG script

TYP vs. MIN footprints, C28x (hex, decimal in parentheses)
           .text         .econst        .ebss
TYP        3568          1b4e           11c0
MIN        2940          4BF            752
Savings    C28 (3112)    168F (5775)    A6E (2670)

SAVINGS OVERALL (decimal)
FLASH    RAM     TOTAL
8887     2670    11557
C6000 Embedded Design Workshop - C6000 Introduction 11 - 1
C6000 Introduction
Introduction
This is the first chapter that specifically addresses ONLY the C6000 architecture. All chapters
from here on assume the student has already taken the 2-day TI-RTOS Kernel workshop.
During those past two days, some specific C6000 architecture items were skipped in favor of
covering all TI EP processors with the same focus. Now, it is time to dive deeper into the C6000
specifics.
The first part of this chapter focuses on the C6000 family of devices. The 2nd part dives deeper
into topics already discussed in the previous two days of the TI-RTOS Kernel workshop. In a way,
this chapter is “catching up” all the C6000 users to understand this target environment
specifically.
After this chapter, we plan to dive even deeper into specific parts of the architecture like
optimizations, cache and EDMA.
Objectives
- Introduce the C6000 Core and the C6748 target device
- Highlight a few uncommon pieces of the architecture, e.g. the SCR and PRU
- “Catch up” on C6000-specific topics from the TI-RTOS Kernel discussions, such as
  Interrupts, Platforms and Target Config Files
- Lab 11: Create a custom platform and create an Hwi to respond to the audio interrupts
Module Topics
C6000 Introduction
  Module Topics
  TI EP Product Portfolio
  DSP Core
  Devices & Documentation
  Peripherals
    PRU
    SCR / EDMA3
    Pin Muxing
  Example Device: C6748 DSP
  Choosing a Device
  C6000 Arch “Catchup”
    C64x+ Interrupts
    Event Combiner
    Target Config Files
    Creating Custom Platforms
  Quiz
    Quiz - Answers
  Using Double Buffers
  Lab 11: An Hwi-Based Audio System
    Lab 11 Procedure
    Hack LogicPD’s BSL types.h
    PART B (Optional): Using the Profiler Clock
  Additional Information
  Notes
TI EP Product Portfolio
TI’s Embedded Processor Portfolio
Microcontrollers (MCU) and Application (MPU) families:

MSP430 (16-bit): Ultra Low Power & Cost
  Core: MSP430 ULP RISC MCU
  Features: Low Pwr Mode 0.1 µA, 0.5 µA (RTC); Analog I/F; RF430
  Software: TI RTOS (SYS/BIOS)
  Memory: Flash: 512K, FRAM: 64K
  Speed: 25 MHz
  Price: $0.25 to $9.00

C2000 (32-bit): Real-time
  Core: Real-time C28x MCU; ARM M3+C28
  Features: Motor Control; Digital Power; Precision Timers/PWM
  Software: TI RTOS (SYS/BIOS)
  Memory: 512K Flash
  Speed: 300 MHz
  Price: $1.85 to $20.00

Tiva-C (32-bit): All-around MCU
  Core: ARM Cortex-M3, Cortex-M4F
  Features: 32-bit Float; Nested Vector Int Ctrl (NVIC); Ethernet (MAC+PHY)
  Software: TI RTOS (SYS/BIOS)
  Memory: 512K Flash
  Speed: 80 MHz
  Price: $1.00 to $8.00

Safety (32-bit)
  Core: ARM Cortex-R4
  Features: Lock step Dual-core R4; ECC Memory; SIL3 Certified
  Software: N/A
  Memory: 256K to 3M Flash
  Speed: 220 MHz
  Price: $5.00 to $30.00

Sitara (32-bit): Linux, Android
  Core: ARM Cortex-A8, Cortex-A9
  Features: $5 Linux CPU; 3D Graphics; PRU-ICSS industrial subsys
  Software: Linux, Android, SYS/BIOS
  Memory: L1: 32K x 2, L2: 256K
  Speed: 1.35 GHz
  Price: $5.00 to $25.00

DSP (16/32-bit): All-around DSP
  Core: DSP C5000, C6000
  Features: C5000 Low Power DSP; 32-bit fix/float C6000 DSP
  Software: C5x: DSP/BIOS; C6x: SYS/BIOS
  Memory: L1: 32K x 2, L2: 256K
  Speed: 800 MHz
  Price: $2.00 to $25.00

Multicore (32-bit): Massive Performance
  Core: C66 + C66; A15 + C66; A8 + C64; ARM9 + C674
  Features: Fix or Float; Up to 12 cores (4 A15 + 8 C66x); DSP MMAC’s: 352,000
  Software: Linux, SYS/BIOS
  Memory: L1: 32K x 2, L2: 1M + 4M
  Speed: 1.4 GHz
  Price: $30.00 to $225.00
DSP Core
What Problem Are We Trying To Solve?
Digital sampling of an analog signal:
[Figure: analog signal, amplitude A vs. time t; signal path x -> ADC -> DSP -> DAC -> Y]
Most DSP algorithms can be expressed with MAC:
$$Y = \sum_{i=1}^{count} coeff_i \cdot x_i$$
for (i = 0; i < count; i++){
    Y += coeff[i] * x[i]; }
How is the architecture designed to maximize computations like this?
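The MAC loop above can be written as a small, complete C function. This is only a sketch: the function name mac16, the 16-bit sample type and the 32-bit accumulator are assumptions made for illustration, not something taken from the workshop code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply-accumulate (MAC) over 16-bit samples.
 * Each 16x16 product is widened to 32 bits before it is added.
 * A 32-bit accumulator is enough for short blocks; very long
 * blocks of full-scale data may need a wider accumulator. */
int32_t mac16(const int16_t *coeff, const int16_t *x, size_t count)
{
    assert(coeff != NULL && x != NULL);

    int32_t y = 0;
    for (size_t i = 0; i < count; i++) {
        y += (int32_t)coeff[i] * (int32_t)x[i];
    }
    return y;
}
```

For example, with coefficients {1, 2, 3} and samples {4, 5, 6}, the function accumulates 1*4 + 2*5 + 3*6. Written in this "natural C" style, the loop is the kind of code the compiler can map onto the multiplier and ALU units.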
'C6x CPU Architecture
‘C6x Compiler excels at Natural C
Multiplier (.M) and ALU (.L) provide up to 8 MACs/cycle (8x8 or 16x16)
Specialized instructions accelerate intensive, non-MAC oriented calculations. Examples include: Video compression, Machine Vision, Reed Solomon, …
While MMACs speed math intensive algorithms, flexibility of 8 independent functional units allows the compiler to quickly perform other types of processing
‘C6x CPU can dispatch up to eight parallel instructions each cycle
All ‘C6x instructions are conditional, allowing efficient hardware pipelining
[Figure: Memory connected to the CPU; register files A0-A31 and B0-B31; functional units .L1, .S1, .M1, .D1 and .L2, .S2, .M2, .D2 (the .M units perform the MACs); Controller/Decoder]
Note: More details later
C6000 DSP Family CPU Roadmap
[Figure: roadmap of C6000 CPU generations. Fixed Point: C62x, C621x, C64x, C64x+. Floating Point: C67x, C671x, C67x+. Fixed and Floating Point: C674, C66x. Callout: "Available on the most recent releases".]
Feature callouts shown in the figure, in the order they appear:
  EDMA, L1 Cache, L2 Cache/RAM, Lower Cost
  1GHz, EDMA (v2), 2x Register Set, SIMD Instrs (Packed Data Proc)
  DMAX (PRU), 2x Register Set, FFT enhancements
  1.2 GHz, EDMA3, SPLOOP, 32x32 Int Multiply, Enhanced Instr for FIR/FFT/Complex
  Combined Instr Sets from C64x+/C67x+, Incr Floating-pt MHz, Lower power, EDMA3, PRU
  L1 RAM and/or Cache, Timestamp Counter, Compact Instr’s, Exceptions, Supervisor/User modes
Devices & Documentation
DSP Generations : DSP and ARM+DSP
Fixed-Point
Cores
Float-Point
Cores DSP DSP+DSP
(Multi-core) ARM+DSP
C62x C67x C620x, C670x
C621x C67x C6211, C671x
C64x C641x
DM642
C67x+ C672x
C64x+ DM643x
C645x C647x
DM64xx,
OMAP35x, DM37x
C674x C6748 OMAP-L138*
C6A8168
C66x Future C667x
C665x (new)
DSP/BIOS Real-Time Operating System
SPRU423 -DSP/BIOS (v5) User’s Guide
SPRU403 -DSP/BIOS (v5) C6000 API Guide
SPRUEX3 -SYS/BIOS (v6) User’s Guide
Code Generation Tools
SPRU186 -Assembly Language Tools User’s Guide
SPRU187 -Optimizing C Compiler User’s Guide
Key C6000 Manuals
C64x/C64x+ C674 C66x
CPU Instruction Set Ref Guide SPRU732 SPRUFE8 SPRUGH7
Megamodule/Corepac Ref Guide SPRU871 SPRUFK5 SPRUGW0
Peripherals Overview Ref Guide SPRUE52 SPRUFK9 N/A
Cache User’s Guide SPRU862 SPRUG82 SPRUGY8
Programmers Guide SPRU198 SPRA198
SPRAB27
To find a manual, go to www.ti.com
and enter the document number
in the Keyword field:
or…
www.ti.com/lit/<litnum>
Peripherals
[Figure: device block diagram with ARM, Graphics Accelerator, C6x DSP, Video Accelerator(s), Peripherals, Video/Display Subsystem (Capture, Analog Display, Digital Display, LCD Controller) and PRU (Soft Peripheral: UART, CAN, “What’s Next? DIY…”)]
Peripherals by category:
  Serial   Storage   Master    Timing
  McBSP    DDR2      PCIe      Timers
  McASP    DDR3      USB 2.0   Watch
  ASP      SDRAM     EMAC      PWM
  UART     Async     uPP       eCAP
  SPI      SD/MMC    HPI       RTC
  I2C      ATA/CF    EDMA3
  CAN      SATA      SCR       GPIO
We’ll just look at three of these: PRU and SCR/EDMA3
PRU
Programmable Realtime Unit (PRU)
Use as a soft peripheral to implement add’l on-chip peripherals
Example implementations include:
  Soft UART
  Soft CAN
Create custom peripherals or set up non-linear DMA moves.
No C compiler (ASM only)
Implement smart power controller:
  Allows switching off both ARM and DSP clocks
  Maximize power down time by evaluating system events before waking up DSP and/or ARM
PRU consists of:
  2 Independent, Realtime RISC Cores
  Access to pins (GPIO)
  Its own interrupt controller
  Access to memory (master via SCR)
  Device power mgmt control (ARM/DSP clock gating)
PRU SubSystem : IS / IS-NOT
Is: Dual 32-bit RISC processor specifically designed for manipulation of packed memory mapped data structures and implementing system features that have tight real time constraints.
Is Not: Is not a H/W accelerator used to speed up algorithm computations.

Is: Simple RISC ISA:
  Approximately 40 instructions
  Logical, arithmetic, and flow control ops all complete in a single cycle
Is Not: Is not a general purpose RISC processor:
  No multiply hardware/instructions
  No cache or pipeline
  No C programming

Is: Simple tooling:
  Basic commandline assembler/linker
Is Not: Is not integrated with CCS. Doesn’t include advanced debug options

Is: Includes example code to demonstrate various features. Examples can be used as building blocks.
Is Not: No Operating System or high-level application software stack
SCR / EDMA3
System Architecture: SCR/EDMA
[Figure: “Masters” (ARM, DSP, EDMA3 CC with TC0, TC1, TC2, PCI, HPI, EMAC) connect through the SCR (Switched Central Resource) to “Slaves” (C64 Mem, DDR2, EMIF64, TCP, VCP, PCI, McBSP, Utopia)]
SCR = Switched Central Resource
Masters initiate accesses to/from slaves via the SCR
Most Masters (requestors) and Slaves (resources) have their own port to the SCR
Lower bandwidth masters (HPI, PCI66, etc) share a port
There is a default priority (0 to 7) to SCR resources that can be modified.
Note: this picture is the “general idea”. Every device has a different scheme for SCRs and peripheral muxing. In other words, “check your data sheet”.
TMS320C6748 Interconnect Matrix
Note: not ALL connections are valid
Pin Muxing
What is Pin Multiplexing?
How many pins are on your device?
How many pins would all your peripherals require?
Pin Multiplexing is the answer: only so many peripherals can be used at the same time. In other words, to reduce costs, peripherals must share available pins
Which ones can you use simultaneously?
Designers examine app use cases when deciding best muxing layout
Read datasheet for final authority on how pins are muxed
Graphical utility can assist with figuring out pin-muxing (Pin mux utility...)
[Figure: Pin Mux Example (HPI and uPP)]
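The "which ones can you use simultaneously?" question can be pictured as a bitmask check: give every peripheral a mask of the pins it needs and look for overlap. The sketch below is purely illustrative. The pin masks (PINS_HPI, PINS_UPP, PINS_UART) and both function names are hypothetical; real pin assignments come from the device datasheet or the pin mux utility.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pin map: each bit stands for one physical pin.
 * These masks are invented for illustration only. */
#define PINS_HPI   0x000000FFu  /* pins 0-7 */
#define PINS_UPP   0x00000FF0u  /* pins 4-11, overlaps HPI on pins 4-7 */
#define PINS_UART  0x00003000u  /* pins 12-13 */

/* Two peripherals conflict when they need at least one common pin. */
bool pins_conflict(uint32_t a, uint32_t b)
{
    return (a & b) != 0u;
}

/* Returns true if every peripheral in the list can be used at once. */
bool can_use_together(const uint32_t *masks, int n)
{
    assert(masks != NULL && n >= 0);

    uint32_t used = 0u;
    for (int i = 0; i < n; i++) {
        if (pins_conflict(used, masks[i])) {
            return false;   /* this peripheral needs a pin already taken */
        }
        used |= masks[i];
    }
    return true;
}
```

With these made-up masks, HPI and uPP cannot be enabled together, while HPI and UART can. The graphical pin mux tools perform the same kind of check against the real pin tables.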
Pin Muxing Tools
Graphical Utilities For Determining which Peripherals can be Used Simultaneously
Provides Pin Mux Register Configurations. Warns user about conflicts.
ARM-based devices: www.ti.com/tool/pinmuxtool others: see product page
Example Device: C6748 DSP
TMS320C674x Architecture - Overview
[Figure: TMS320C6748 block diagram. C674x+ DSP Core (Fixed & Floating-Pt CPU) with 32KB L1P Cache/SRAM, 32KB L1D Cache/SRAM and 256K L2; 4-32x PLL; EDMA3; Switched Central Resource (SCR) connecting 128K L3, 16-bit EMIF, DDR2/mDDR, McASP, MMC/SD, EMAC, HPI, SATA, I2C, SPI, UART, USB, Timers, LCD, PWM, eCAP and uPP. Bus widths shown: 128 and 256.]
Performance & Memory
  Up to 456MHz
  256K L2 (cache/SRAM)
  32K L1P/D Cache/SRAM
  16-bit DDR2-266
  16-bit EMIF (NAND Flash)
  64-Channel EDMA 3.0
Communications
  10/100 EMAC
  USB 1.1 & 2.0
  SATA
Power/Packaging
  13x13mm nPBGA & 16x16mm PBGA
  Pin-to-pin compatible w/OMAP L138 (+ARM9), 361-pin pkg
  Dynamic voltage/freq scaling
  Total Power < 420mW
Choosing a Device
DSP & ARM MPU Selection Tool
http://focus.ti.com/en/multimedia/flash/selection_tools/dsp/dsp.html
C6000 Arch “Catchup”
C64x+ Interrupts
How do Interrupts Work?
1. An interrupt occurs (EDMA, McASP, Timer, Ext’l pins)
2. Interrupt Selector (124+4 sources mapped to 12 CPU interrupts)
3. Sets flag in Interrupt Flag Register (IFR)
4. Is this specific interrupt enabled? (IER)
5. Are interrupts globally enabled? (GIE/NMIE)
6. CPU Acknowledge: auto hardware sequence, HWI Dispatcher (vector), branch to ISR
7. Interrupt Service Routine (ISR): Context Save, ISR, Context Restore
User is responsible for setting up the following:
#2 Interrupt Selector: choose which 12 of 128 interrupt sources to use
#4 Interrupt Enable Register (IER): individually enable the proper interrupt sources
#5 Global Interrupt Enable (GIE/NMIE): globally enable all interrupts
C64x+ Hardware Interrupts
C6748 has 128 possible interrupt sources (but only 12 CPU interrupts)
[Figure: event sources 0-127 (e.g. MCASP0_INT) feed the Interrupt Selector, which drives HWI4 through HWI15; each passes through IFR, IER and GIE to the Vector Table. Callouts 1-4 mark the programming steps below.]
4-Step Programming:
1. Interrupt Selector: choose which of the 128 sources are tied to the 12 CPU ints
2. IER: enable the individual interrupts that you want to “listen to” (in BIOS .cfg)
3. GIE: enable global interrupts (turned on automatically if BIOS is used)
4. Note: HWI Dispatcher performs “smart” context save/restore (automatic for BIOS Hwi)
Note: NMIE must also be enabled. BIOS automatically sets NMIE=1. If BIOS is NOT used, the user must turn on both GIE and NMIE manually.
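The chain of conditions (flag set, individually enabled, globally enabled) can be captured in a small software model. The sketch below is not register-access code. The struct CpuIntState and the function int_is_serviced are hypothetical names that simply mirror the IFR, IER, GIE and NMIE terms used on the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of the CPU interrupt gating described above. */
typedef struct {
    uint16_t ifr;   /* Interrupt Flag Register: bit n set when INTn is pending */
    uint16_t ier;   /* Interrupt Enable Register: bit n enables INTn */
    bool     gie;   /* Global Interrupt Enable */
    bool     nmie;  /* must also be 1 for maskable interrupts to be taken */
} CpuIntState;

/* True if CPU interrupt intNum (4..15) would reach its ISR. */
bool int_is_serviced(const CpuIntState *s, unsigned intNum)
{
    assert(s != NULL && intNum >= 4u && intNum <= 15u);

    uint16_t bit = (uint16_t)(1u << intNum);
    bool pending = (s->ifr & bit) != 0u;   /* did it occur? */
    bool enabled = (s->ier & bit) != 0u;   /* do we listen to it? */

    return pending && enabled && s->gie && s->nmie;
}
```

In this model, an interrupt with its IFR bit set but its IER bit clear is never serviced, which is a useful picture to keep in mind when an ISR never seems to run.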
Event Combiner
Event Combiner (ECM)
[Figure: events EVT 4-31, EVT 32-63, EVT 64-95 and EVT 96-127 are latched in EVTFLAG[0..3] (“Occur?”), gated by EVTMASK[0..3] (“Care?”) to give MEVTFLAG[0..3] (“Both Yes?”), and combined into EVT0-EVT3. EVT0-EVT3 and EVT 4-127 enter the 128:12 Interrupt Selector, which feeds the CPU.]
Use only if you need more than 12 interrupt events
ECM combines multiple events (e.g. 4-31) into one event (e.g. EVT0)
EVTx ISR must parse MEVTFLAG to determine which event occurred
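Parsing MEVTFLAG inside a combined-event ISR amounts to scanning a 32-bit word for set bits. The sketch below models that in plain C. The function names are hypothetical, and the mask polarity (a 1 in EVTMASK hides the event) is an assumption that should be checked against the device reference guide.

```c
#include <assert.h>
#include <stdint.h>

/* Events that both occurred and are cared about.
 * Assumption: a 1 in evtMask hides the event from the combiner. */
uint32_t masked_events(uint32_t evtFlag, uint32_t evtMask)
{
    return evtFlag & ~evtMask;
}

/* Returns the lowest-numbered pending event in one group, or -1 if none.
 * group is 0..3; the event number is group * 32 + bit position. */
int next_event(uint32_t mevtFlag, unsigned group)
{
    assert(group < 4u);

    for (unsigned bit = 0; bit < 32u; bit++) {
        if (mevtFlag & (UINT32_C(1) << bit)) {
            return (int)(group * 32u + bit);
        }
    }
    return -1;
}
```

A combined ISR would typically call something like next_event() in a loop, handle the reported event, clear its flag and repeat until -1 is returned. For example, bit 29 of group 1 corresponds to event 61.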
Target Config Files
Creating a New Target Config File (.ccxml)
Target Configuration defines your “target”, i.e. emulator/device used, GEL
scripts (replaces the old CCS Setup)
Create user-defined configurations (select based on chosen board)
“click”
Advanced Tab
More on GEL files...
Specify GEL script here
What is a GEL File ?
GEL = General Extension Language (not much help, but there you go…)
A GEL file is basically a “batch file” that sets up the CCS debug
environment including:
Memory Map
Watchdog
UART
Other periphs
The board manufacturer (e.g. SD or LogicPD) supplies GEL files
with each board.
To create a “stand-alone” or “bootable” system, the user must
write code to perform these actions (optional chapter covers these details)
Creating Custom Platforms
Creating Custom Platforms -Procedure
Most users will want to create their own custom platform package (Stellaris/C28x maybe not; they will use a .cmd file directly)
Here is the process:
1. Create a new platform package
2. Select repository, add to project path, select device
3. Import the existing “seed” platform
4. Modify settings
5. [Save]: creates a custom platform pkg
6. Build Options: select new custom platform
Creating Custom Platforms -Procedure
1. Create New Platform (via DEBUG perspective)
2. Configure New Platform
“Add Repository to Path”: adds platform path to project path
Custom Repository vs. XDC default location
Platform Package Name
Creating Custom Platforms -Procedure
3. New Device Page: Click “Import” (copy “seed” platform)
4. Customize Settings
Creating Custom Platforms -Procedure
5. [SAVE] New Platform (creates custom platform package)
6. Select New Platform in Build Options (RTSC tab)
Custom Repository vs. XDC default location
With path added, the tools find new platform
Quiz
Chapter Quiz
1. How many functional units does the C6000 CPU have?
2. What is the size of a C6000 instruction word?
3. What is the name of the main “bus arbiter” in the architecture?
4. What is the main difference between a bus “master” and “slave”?
5. Fill in the names of the following blocks of memory and bus:
[Figure: CPU block diagram with blank memory blocks and bus to label; bus widths 256 and 128 are shown]
Quiz - Answers
Chapter Quiz
1. How many functional units does the C6000 CPU have?
   8 functional units or “execution units”
2. What is the size of a C6000 instruction word?
   256 bits (8 units x 32-bit instructions per unit)
3. What is the name of the main “bus arbiter” in the architecture?
   Switched Central Resource (SCR)
4. What is the main difference between a bus “master” and “slave”?
   Masters can initiate a memory transfer (e.g. EDMA, CPU…)
5. Fill in the names of the following blocks of memory and bus:
   L1P, L1D, L2 (memory blocks) and SCR (bus)
[Figure: CPU block diagram with the blocks labeled L1P, L1D, L2 and SCR; bus widths 256 and 128 are shown]
Using Double Buffers
Single vs Double Buffer Systems
Single buffer system: collect data or process data, not both!
[Figure: timeline Hwi, Swi/Task, Hwi, Swi/Task, all sharing one BUF]
Nowhere to store new data when prior data is being processed
Double buffer system: process and collect data, real-time compliant!
[Figure: timeline Hwi, Swi/Task, Hwi, Swi/Task, alternating between BUF x and BUF y]
One buffer can be processed while another is being collected
When Swi/Task finishes buffer, it is returned to Hwi
Task is now ‘caught up’ and meeting real-time expectations
Hwi must have priority over Swi/Task to get new data while prior data is being processed (standard in SYS/BIOS)
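The double-buffer idea can be sketched in a few lines of C. This is a host-side model only, not the lab code: DoubleBuf, db_put and BUF_LEN are hypothetical names, and the buffer is kept very short so the behavior is easy to follow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_LEN 4   /* hypothetical block size, small for illustration */

typedef struct {
    int16_t  buf[2][BUF_LEN]; /* BUF x and BUF y */
    unsigned fill;            /* index of the buffer being collected into */
    unsigned cnt;             /* samples written to that buffer so far */
} DoubleBuf;

/* Called once per sample (the Hwi side of the picture).
 * Returns a pointer to the buffer that just became full and is ready
 * for the Swi/Task to process, or NULL if the block is not complete. */
const int16_t *db_put(DoubleBuf *db, int16_t sample)
{
    assert(db != NULL && db->fill < 2u && db->cnt < BUF_LEN);

    db->buf[db->fill][db->cnt++] = sample;
    if (db->cnt < BUF_LEN) {
        return NULL;
    }

    const int16_t *full = db->buf[db->fill];
    db->fill ^= 1u;   /* collect into the other buffer from now on */
    db->cnt = 0u;
    return full;
}
```

After BUF_LEN calls the function hands back the full buffer while new samples start landing in the other one. The processing side must finish with the returned buffer before the collector wraps around to it again, which is the real-time deadline the slide describes.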
Lab 11: An Hwi-Based Audio System
In this lab, we will use an Hwi to respond to McASP interrupts. The McASP/AIC3106 init code has
already been written for you. The McASP interrupts have been enabled. However, it is your
challenge to create an Hwi and ensure all the necessary conditions to respond to the interrupt are
set up properly.
This lab also employs double buffers: ping and pong. Both the RCV and XMT sides have a ping
and pong buffer. The concept here is that when you are processing one, the other is being filled.
A Boolean variable (pingPong) is used to keep track of which “side” you’re on.
Application: Audio pass-thru using Hwi and McASP/AIC3106
Key Ideas: Hwi creation, Hwi conditions to trigger an interrupt, Ping-Pong
memory management
Pseudo Code:
main(): init BSL, init LED, return to BIOS scheduler
isrAudio(): responds to McASP interrupt, reads data from RCV XBUF and puts it in RCV
buffer, acquires data from XMT buffer, writes to XBUF. When buffer is full, copy RCV to
XMT buffer. Repeat.
FIR_process(): memcpy RCV to XMT buffer. Dummy “algo” for FIR later on
Lab 11: Hwi Audio
[Figure: Audio Input (48 KHz) -> ADC (AIC3106) -> McASP XBUF12 -> isrAudio (Hwi) -> McASP XBUF11 -> DAC (AIC3106) -> Audio Output (48 KHz). Double Buffers rcvPing and xmtPing (RCV and XMT) with a COPY from RCV to XMT. Source files: mcasp.c, aic3106.c, isr.c]
isrAudio:
    datIn = XBUF12
    pIn[cnt] = datIn
    datOut = pOut[cnt]
    XBUF11 = datOut
    if (cnt >= BUF){
        Copy RCV→XMT
    }
Procedure:
1. Import existing project (Lab11)
2. Create your own CUSTOM PLATFORM
3. Config Hwi to respond to McASP interrupt
4. Debug Interrupt Problems
Time = 45min
Lab 11 Procedure
If you can’t remember how to perform some of these steps, please refer back to the previous labs
for help. Or, if you really get stuck, ask your neighbor. If you AND your neighbor are stuck, then
ask the instructor (who is probably doing absolutely NOTHING important) for help.
Import Existing Project
1. Close ALL open projects and files and then open CCS.
2. Import Lab11 project.
As before, import the archived starter project from:
C:\TI-RTOS\C6000\Labs\Lab_11\
This starter file contains all the starting source files for the audio project including the setup
code for the A/D and D/A on the OMAP-L138 target board. It also has UIA activated.
3. Check the Properties to ensure you are using the latest XDC, BIOS and UIA.
For every imported project in this workshop, ALWAYS check to make sure the latest tools
(XDC, BIOS and UIA) are being used. The author created these projects at time “x” and you
may have updated the tools on your student PC at “x+1”, some time later. The author used
the tools available at time “x” to create the starter projects and solutions which may or may
not match YOUR current set of tools.
Therefore, you may be importing a project that is NOT using the latest versions of the tools
(XDC, BIOS, UIA) or the compiler.
Check ALL settings for the Properties of the project (XDC, BIOS, UIA) and the compiler
and update the imported project to the latest tools before moving on and save all settings.
Hack LogicPD’s BSL types.h
4. Edit Logic PD’s types.h file (already done for you…but take a look at what the author
did).
Logic PD’s types.h contains typedefs that conflict with BIOS. So, in order for them to play
together nicely, users need to “hack” this file (like the author did for you already).
Open the following file via CCS or any editor:
C:\TI_RTOS\Labs\LogicPD_BSL\DSP BSL\inc\types.h
At the top of the file, notice the following two lines of code:
Close types.h.
Now that this file is hacked, you will be able to use Logic PD’s types.h for all future labs
without a ton of warnings when you build.
Application (Audio Pass-Thru) Overview
5. Let’s review what this audio pass-thru code is doing.
As discussed in the lab description, this application performs an audio pass-thru. The best
way to understand the process is via I-P-O:
Input (RCV): each analog audio sample from the audio INPUT port is converted by the
A/D and sent to the McASP port on the C6748. For each sample, the McASP generates
an interrupt to the CPU. In the ISR, the CPU reads this sample and puts it in a buffer
(RCV ping or pong). Once the buffer fills up (BUFFSIZE), processing begins…
Process: Our algorithm is very fancy. It is a COPY from the RCV buffer to the XMT
buffer.
Output (XMT): When the McASP transmit buffer is empty, it interrupts the CPU and asks
for another sample. In the ISR (same ISR for the RCV side), the CPU reads a sample
from the XMT buffer and writes to the McASP transmit register. The McASP sends this
sample to the D/A, and it is then transmitted to the audio OUTPUT port.
Several source files are needed to create this application. Let’s explore those briefly…
Source Code Overview
6. Inspect the source code.
Following is a brief description of the source code. Because this workshop can be targeted at
many processors (MSP430, Stellaris-M3, C28x, C6000, ARM), some of the hardware details
will be minimized and saved for the target-specific chapter.
Feel free to open any of these files and inspect them as you read:
main.h: same as before, but contains more function prototypes
aic3106_TTO.c: initializes the analog interface chip (AIC) on the EVM. This is the A/D
and D/A combo device.
fir.c: this is a placeholder for the algorithm. Currently, it is simply a copy function to
copy RCV to XMT buffers.
isr.c: this is the interrupt service routine (isrAudio). When the interrupt from the
McASP fires (RCV or XMT), the BIOS HWI (soon to be set up) will call this routine to
read/write audio samples.
main.c: sets up the McASP and AIC and then calls BIOS_start().
mcasp_TTO.c: init code for the McASP on the C6748 device.
More Detailed Code Analysis
7. Open main.c for editing.
Near the top of the file, you will see the buffer allocations:
Notice that we have separate buffers for Ping and Pong for both RCV and XMT. Where is
BUFFSIZE defined? In main.h. We’ll see it in a minute.
As you go into main(), you’ll see the zeroing of the buffers to provide initial conditions of
ZERO. Think about this for a minute. Is that ok? Well, it depends on your system. If
BUFFSIZE is 256, that means 256 ZEROs will be transmitted to the DAC during the first 256
interrupts. What will that sound like? Do we care? Some systems require solid initial
conditions so keep that in mind. We will just live with the zeros for now.
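As a rough picture of what the buffer allocation and zeroing look like, here is a minimal sketch. It is not copied from the lab's main.c: the element type int16_t, the names rcvPong and xmtPong, the helper functions and the BUFFSIZE value of 256 are assumptions for illustration (rcvPing and xmtPing follow the names in the lab diagram).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BUFFSIZE 256   /* example value from the text; the lab defines it in main.h */

/* Separate ping and pong buffers for the receive and transmit sides. */
int16_t rcvPing[BUFFSIZE], rcvPong[BUFFSIZE];
int16_t xmtPing[BUFFSIZE], xmtPong[BUFFSIZE];

/* Give all four buffers a known initial condition of zero. */
void zero_buffers(void)
{
    memset(rcvPing, 0, sizeof rcvPing);
    memset(rcvPong, 0, sizeof rcvPong);
    memset(xmtPing, 0, sizeof xmtPing);
    memset(xmtPong, 0, sizeof xmtPong);
}

/* Helper for checking: true if every sample in the buffer is zero. */
bool all_zero(const int16_t *buf, size_t n)
{
    assert(buf != NULL);
    for (size_t i = 0; i < n; i++) {
        if (buf[i] != 0) {
            return false;
        }
    }
    return true;
}
```

With zeroed transmit buffers, the first BUFFSIZE samples sent to the DAC are silence. Whether that is acceptable depends on the system, as the text points out.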
Then, you’ll see the calls to the init routines for the McASP and AIC3106. Previously, with
DSP/BIOS, this is where an explicit call to init interrupts was located. However, with
SYS/BIOS, this is done via the GUI. Lastly, there is a call to McASP_Start(). This is where
the McASP is taken out of reset and the clocks start operating and data starts being shifted
in/out. Soon thereafter, we will get the first interrupt.
8. Open mcasp_TTO.c for editing.
This file is responsible for initializing and starting the McASP; hence, two functions (init and
start). In particular, look at line numbers 83 and 84 (approximately). This is where the
serializers are chosen. This specifies XBUF11 (XMT) and XBUF12 (RCV). Also, look at line
numbers 111-114. This is where the McASP interrupts are enabled. So, if they are enabled
correctly, we should get these interrupts to fire to the CPU.
9. Open isr.c for editing.
Well, this is where all the real work happens: inside the ISR. This code should look pretty
familiar to you already. There are 3 key concepts to understand in this code:
Ping/Pong buffer management: notice that two “local” pointers are used to point to the
RCV/XMT buffers. This was done as a precursor to future labs but works just fine here
too. Notice at the top of the function that the pointers are initialized only if blkCnt is
zero (i.e. it is time to switch from ping to pong buffers or vice versa) and we’re done with
the previous block. blkCnt is used as an index into the buffers.
McASP reads/writes: refer to the read/write code in the middle. When an interrupt
occurs, we don’t know if it was the RRDY (RCV) or XRDY (XMT) bit that triggered the
interrupt. We must first test those bits, then perform the proper read or write accordingly.
On EVERY interrupt, we EITHER read one sample OR write one sample. All McASP
reads and writes are 32 bits. Period. Even if your word length is 16 bits (like ours is).
Because we are “MSB first”, the 16 bits of interest land in the UPPER half of the 32 bits.
We turned on ROR (rotate-right) of 16 bits on rcv/xmt to make our code look more
readable (and save time vs. >> 16 via the compiler).
At the end of the block, what happens? Look at the bottom of the code. When
BUFFSIZE is reached, blkCnt is zeroed and the pingPong Boolean switches. Then, a
call to FIR_process() is made that simply copies RCV buffer to XMT buffer. Then, the
process happens all over again for the “other” (PING or PONG) buffers.
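The point about 16-bit samples landing in the upper half of a 32-bit word can be illustrated on a host PC. The sketch below is not McASP code. The three helper names are hypothetical and only show why rotating right by 16 lets the ISR use the low half directly instead of shifting.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word right by n bits (0 < n < 32). */
uint32_t ror32(uint32_t w, unsigned n)
{
    assert(n > 0u && n < 32u);
    return (w >> n) | (w << (32u - n));
}

/* No rotation applied: the sample sits in bits 31..16, so shift it down.
 * The cast from uint16_t to int16_t relies on the usual two's complement
 * behavior of common compilers. */
int16_t sample_from_upper(uint32_t word)
{
    return (int16_t)(uint16_t)(word >> 16);
}

/* 16-bit rotate-right already applied: the sample sits in bits 15..0. */
int16_t sample_from_lower(uint32_t word)
{
    return (int16_t)(uint16_t)(word & 0xFFFFu);
}
```

Reading a rotated word with sample_from_lower() gives the same sample as reading the raw word with sample_from_upper(), which is why enabling the rotate in the peripheral removes the shift from the ISR.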
10. Open fir.c for editing.
This is currently a placeholder for a future FIR algorithm to filter our audio. For now, we
simply “pass through” the data from RCV to XMT. In future labs, a FIR filter written in C will
magically appear and we’ll analyze its performance quite extensively.
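A pass-through placeholder of this kind might look like the sketch below. The signature shown here is an assumption for illustration. The lab's actual FIR_process() in fir.c may take different arguments.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Placeholder "algorithm": copy one block from the receive buffer to the
 * transmit buffer. A real FIR filter would replace the memcpy later on.
 * The parameter list is hypothetical. */
void FIR_process(const int16_t *rcv, int16_t *xmt, size_t count)
{
    assert(rcv != NULL && xmt != NULL);
    memcpy(xmt, rcv, count * sizeof *rcv);
}
```

Keeping the copy behind a function with the same shape as the eventual filter means the ISR does not need to change when the real algorithm arrives.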
11. Open main.h for editing.
main.h is actually a workhorse. It contains all of the #includes for BSL and other items,
#defines for BUFFSIZE and PING/PONG, prototypes for all functions and externs for all
variables that require them. Whenever you are asked to “change BUFFSIZE”, this is the file
to change it in.
Creating A Custom Platform
12. Create a custom platform file.
In previous labs, we specified a platform file during creation of a new project. In this lab, we
will create our own custom platform that we will use throughout the rest of the labs. Plus, this
is a good skill to know how to do.
Whenever you create your own project, you should always IMPORT the seed platform file for
the specific target board and then make changes. This is what we plan to do next.
In Debug Perspective, select: Tools → RTSC Tools → Platform → New
When the following dialogue appears:
Give your platform a name: evmc6748_student (the author used _TTO for his)
Point the repository to the path shown (this is where the platform package is stored)
Then select the Device Family/Name as shown
Check the box “Add Repository to Project Package Path” (so we can find it later).
When you check this box, select your current project in the listing that pops up. This also
adds this repository to the list of Repositories in the Properties → General → RTSC tab
dialogue.
Click Next.
When the new platform dialogue appears, click the IMPORT button to copy the seed file
we used before:
This will copy all of the initial default settings for the board and then we can modify them.
When the dialogue box pops up, select the proper seed file as shown (select the _TTO
version of the platform file that the author already created for you):
Modify the memory settings to allocate all code, data and stacks into internal memory
(IRAM) as shown. They may already be SET this way, so just double check.
BEFORE YOU SAVE, HAVE THE INSTRUCTOR CHECK THIS FILE.
Then save the new platform. This will build a new platform package.
13. Tell the tools to use this new custom platform in your project.
We have created a new platform file, but we have not yet ATTACHED it to our project. When
the project was created, we were asked to specify a platform file and we chose the default
seed platform. How do we get back to the configuration screen?
Right-click on the project and select Properties → General and then select the RTSC tab.
Look near the bottom and you’ll see that the default seed platform is still specified. We
need to change this.
Click on the down arrow next to the Platform File. The tools should access your new
repository with your new custom platform file: evmc6748_student.
Select YOUR STUDENT PLATFORM FILE and click Ok. Now, your project is using the
new custom platform. Very nice…
evmc6748_student
Add Hwi to the Project
14. Use Hwi module and configure the hardware interrupt for the McASP.
Ok, FINALLY, we get to do some real work to get our code running. For most targets, an
interrupt source (e.g. McASP) will have an interrupt EVENT ID (specified in the datasheet).
This event id needs to be tied to a specific CPU interrupt. The details change based on the
target device. For the C6748, the EVENT ID is #61 and the CPU interrupt we’re using is INT5
(there are 16 interrupts on the C6748; again, target specific).
So, we need to do two things: (1) tell the tools we want to USE the Hwi BIOS module; (2)
configure a specific interrupt to point to our ISR routine (isrAudio).
During the 2-day TI-RTOS Kernel Workshop, you performed these actions, so this should
be review. But that’s ok. Review is good.
First, make sure you are viewing the hwi.cfg file.
In the list of Available Products, locate Hwi, right-click and select “Use Hwi”. It will now
show up on the right-hand Outline View.
Then, right click on Hwi in the Outline View and select “New Hwi”.
When the dialogue appears, which is different than what you see below, click OK.
Then click on the new Hwi (hwi0) (you’ll see a new dialogue like below) and fill in the
following:
Make sure “Enabled at startup” is NOT checked (checking this box sets the IER bit
on the C6748). This will provide us with something to debug later. Once again, you can click
on the new HWI and see the corresponding Source script code.
Build, Load, Run.
15. Build, load and run the audio pass-thru application.
Before you Run, make sure audio is playing into the board and your headphones are set
up so you can hear the audio.
Also, make sure that Windows Media Player is set to REPEAT forever. If the music stops
(the input is air), and you click Run, you might think there is a problem with your code. Nope,
there is no music playing.
Build and fix any errors. After a successful build, debug the application.
Once the program is loaded, click Run.
Do you hear audio? If not, it’s debug time. It SHOULD NOT be working (by design). One
quick tip for debug is to place a breakpoint in the isrAudio() routine and see if the program
stops there. If not, no interrupt is being generated. Move on to the next steps to debug the
problem.
Hint: The McASP on the C6748 cannot be restarted after a halt, i.e. you can’t just hit halt,
then Run. Once you halt the code, you must click the restart button and then Play.
Debug Interrupt Problem
As we already know, we decided early on to NOT enable the IER bit in the static configuration of
the Hwi. Ok. But debugging interrupt problems is a crucial skill. The next few steps walk you
through HOW to do this. You may not know WHERE your interrupt problem occurred, so using
these brief debug skills may help in the future.
16. Pause for a moment to reflect on the “dominos” in the interrupt game:
An interrupt must occur (McASP init code should turn ON this source)
The individual interrupt must be enabled (IER, BITx)
Global Interrupts must be turned on (GIE = 1, handled by BIOS)
HWI Dispatcher must be used to provide proper context save/restore
Keep this all in mind as you do the following steps.
17. McASP interrupt firing: IFR bit set?
The McASP interrupt is set to fire properly, but is it setting the IFR bit? You configured
HWI_INT5, so that would be a “1” in bit 5 of the IFR.
Go there now (View → Registers → Core Registers). Look down the list to find the IFR
and IER, the two of most interest at the moment. (author note: could it have been set, then
auto-cleared already?). You can also DISABLE the IER bit (as it is already in the CFG file),
build/run, and THEN look at IFR (this is a nice trick).
Write your debug “checkmarks” here:
IFR bit set? Yes No
18. Is the IER bit set?
Interrupts must be individually enabled. When you look at IER bit 5, is it set to “1”? Probably
NOT, because we didn’t check that “Enabled at startup” checkbox.
Open up the config for HWI_INT5 and check the proper checkbox. Then, hit build and your
code will build and load automatically regardless of which perspective you are in.
IER bit set? Yes No
Do you hear audio now? You probably should. But let’s check one more thing…
19. Is GIE set?
The Global Interrupt Enable (GIE) Bit is located in the CPU’s CSR register. SYS/BIOS turns
this on automatically and then manages it as part of the O/S. So, no need to check on this.
GIE bit set? Yes No
Hint: If you create a project that does NOT use SYS/BIOS, it is the responsibility of the user to
turn on not only GIE (in the CSR register) but also NMIE (in the IER register). Otherwise, NO
interrupts will be recognized. Ever. Did I say ever?
Other Debug/Analysis Items
20. Using “Load Program After Build” Option and Restart.
Often times, users want to make a minor change in their code and rebuild and run quickly.
After you launch a debug session and connect to the target (which takes time), there is NO
NEED to terminate the session to make code changes. After pausing (halting) the code
execution, make a change to code (using the Edit perspective or Debug perspective) and hit
“Build”. CCS will build and load your new .out file WITHOUT taking the time to launch a new
debug session or re-connecting to the target. This is very handy. TRY THIS NOW.
Because we are using the McASP, any underrun will cause the McASP to crash (no more
audio to the speaker/headphone). So, how can you halt and then start again quickly?
Halt your code and then select Run → Restart or click the Restart button (arrow with
PLAY):
So, try this now.
Run your code and halt (pause). Run again. Do you hear audio? Nope. Click the restart
button and run again. Now it should work.
These will be handy tips for all lab steps now and in the future.
That’s It. You’re Done!!
21. Note about benchmarks, UIA and Logs in this lab.
There is really no extra work we can do in terms of UIA and Logs. These services will be
used in all future labs. If you have time and want to add a Log or benchmark using
Timestamp to the code, go ahead.
You spent the past two days in the Kernel workshop playing with these tools. The point of this
lab was to get you up to speed on Platforms and to focus more on the C6000 as the specific
target. In the future labs, though, you’ll have more chances to use UIA and Logs to test the
compiler, optimizer and cache settings.
22. Close the project and delete it from the workspace.
Terminate the debug session and close CCS. Power cycle the board.
RAISE YOUR HAND and get the instructor’s attention when you
have completed PART A of this lab. If time permits, you can
quickly do the next optional part…
PART B (Optional) Using the Profiler Clock
23. Turn on the Profiler Clock and perform a benchmark.
Set two breakpoints anywhere you like (double click in left pane of code): one at the
“start” point and another at the “end” point that you want to benchmark.
Turn on the Profiler clock by selecting: Run → Clock → Enable
In the bottom right-hand part of the screen, you should see a little CLK symbol that looks like
this:
Run to the first breakpoint, then double-click on the clock symbol to zero it. Run again and
the number of CPU cycles will display.
C6000 Embedded Design Workshop - C6000 CPU Architecture 12 - 1
C64x+/C674x CPU Architecture
Introduction
In this chapter, we will take a deeper look at the C64x+ architecture and assembly code. The
point here is not to cover HOW to write assembly; it is just a convenient way to understand the
architecture better.
Objectives
Provide a detailed overview of the C64x+/C674x CPU architecture
Describe the basic ASM language and h/w needed to solve a SOP
Analyze how the hardware pipeline works
Learn basics of software pipelining
Module Topics
C64x+/C674x CPU Architecture
  Module Topics
  What Does A DSP Do?
  CPU From the Inside Out
  Instruction Sets
  “MAC” Instructions
    C66x “MAC” Instructions
  Hardware Pipeline
  Software Pipelining
  Chapter Quiz
    Quiz - Answers
What Does A DSP Do?
What Problem Are We Trying To Solve?
Digital sampling of an analog signal:
[Figure: sampled waveform, amplitude (A) versus time (t)]
Signal path: ADC → x → DSP → Y → DAC
Most DSP algorithms can be expressed with MAC:
$Y = \sum_{i=1}^{count} coeff_i \cdot x_i$
for (i = 0; i < count; i++) {
    Y += coeff[i] * x[i];
}
How is the architecture designed to maximize computations like this?
'C6x CPU Architecture
[Figure: Memory; Register File A (A0 to A31) with units .S1, .D1, .L1, .M1; Register File B (B0 to B31) with units .S2, .D2, .L2, .M2; Controller/Decoder; the .M and .L units are marked “MACs”]
‘C6x Compiler excels at Natural C
Multiplier (.M) and ALU (.L) provide up to 8 MACs/cycle (8x8 or 16x16)
Specialized instructions accelerate intensive, non-MAC oriented calculations. Examples include: Video compression, Machine Vision, Reed Solomon, …
While MMACs speed math intensive algorithms, flexibility of 8 independent functional units allows the compiler to quickly perform other types of processing
‘C6x CPU can dispatch up to eight parallel instructions each cycle
All ‘C6x instructions are conditional, allowing efficient hardware pipelining
Note: More details later
CPU From the Inside Out
The Core of DSP : Sum of Products
$y = \sum_{n=1}^{40} c_n \cdot x_n$
The ‘C6000: designed to handle DSP’s math-intensive calculations
Mult (.M unit) and ALU (.L unit):
    MPY .M c, x, prod
    ADD .L y, prod, y
Note: You don’t have to specify functional units (.M or .L)
Where are the variables stored?
Working Variables : The Register File
$y = \sum_{n=1}^{40} c_n \cdot x_n$
Register File A (16 or 32 registers, 32 bits each) holds c, x, prod, y, ...
    MPY .M c, x, prod
    ADD .L y, prod, y
How can we loop our ‘MAC’?
Making Loops
1. Program flow: the branch instruction (B loop)
2. Initialization: setting the loop count (MVK 40, cnt)
3. Decrement: subtract 1 from the loop counter (SUB cnt, 1, cnt)
$y = \sum_{n=1}^{40} c_n \cdot x_n$
“.S” Unit: Branch and Shift Instructions
        MVK .S 40, cnt
loop:
        MPY .M c, x, prod
        ADD .L y, prod, y
        SUB .L cnt, 1, cnt
        B .S loop
Register File A (16 or 32 registers, 32 bits each) holds c, x, prod, y, ..., cnt
How is the loop terminated?
Conditional Instruction Execution
To minimize branching, all instructions are conditional:
    [condition] B loop
Execution is based on the [zero/non-zero] value of the specified variable:
Code Syntax    Execute if:
[ cnt ]        cnt ≠ 0
[ !cnt ]       cnt = 0
Note: If the condition is false, execution is essentially replaced with a NOP
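The conditional syntax above can be read as a guard on a single instruction. The sketch below models it in C on a host; the function names are hypothetical and only illustrate the “execute, or behave like a NOP” rule.

```c
#include <stdint.h>

/* Models "[cnt] ADD y, prod, y": the add takes effect only when cnt != 0. */
static int32_t add_if_nonzero(int32_t cnt, int32_t y, int32_t prod)
{
    return cnt ? y + prod : y;   /* false condition: result unchanged, like a NOP */
}

/* Models "[!cnt] ADD y, prod, y": the add takes effect only when cnt == 0. */
static int32_t add_if_zero(int32_t cnt, int32_t y, int32_t prod)
{
    return !cnt ? y + prod : y;
}
```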
Loop Control via Conditional Branch
$y = \sum_{n=1}^{40} c_n \cdot x_n$
        MVK .S 40, cnt
loop:
        MPY .M c, x, prod
        ADD .L y, prod, y
        SUB .L cnt, 1, cnt
  [cnt] B .S loop
Register File A (16 or 32 registers, 32 bits each) holds c, x, prod, y, ..., cnt
How are the c and x array values brought in from memory?
Memory Access via “.D” Unit
$y = \sum_{n=1}^{40} c_n \cdot x_n$
        MVK .S 40, cnt
loop:
        LDH .D *cp, c
        LDH .D *xp, x
        MPY .M c, x, prod
        ADD .L y, prod, y
        SUB .L cnt, 1, cnt
  [cnt] B .S loop
Data Memory: x(40), a(40), y
Register File A (16 or 32 registers) holds c, x, prod, y, cnt, *cp, *xp, *yp
Note: No restrictions on which regs can be used for address or data!
What does the “H” in LDH signify?
Instr.    Description         C Type    Size
LDB       load byte           char      8-bits
LDH       load half-word      short     16-bits
LDW       load word           int       32-bits
LDDW*     load double-word    double    64-bits
* Except C62x & C67x generations

How do we increment through the arrays?
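As a rough illustration of the load widths in the table, the sketch below models what LDH does to a 16-bit value in memory: it is read as a signed half-word and sign-extended into a 32-bit register. This is a host-side model only; the function name `load_half` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Host-side model of LDH: fetch 16 bits from memory and sign-extend
 * the value into a 32-bit "register". */
static int32_t load_half(const void *addr)
{
    int16_t h;
    memcpy(&h, addr, sizeof h);   /* copy exactly two bytes from memory */
    return (int32_t)h;            /* sign-extend to 32 bits */
}
```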
Auto-Increment of Pointers
$y = \sum_{n=1}^{40} c_n \cdot x_n$
        MVK .S 40, cnt
loop:
        LDH .D *cp++, c
        LDH .D *xp++, x
        MPY .M c, x, prod
        ADD .L y, prod, y
        SUB .L cnt, 1, cnt
  [cnt] B .S loop
Data Memory: x(40), a(40), y
Register File A (16 or 32 registers) holds c, x, prod, y, cnt, *cp, *xp, *yp
How do we store results back to memory?
Storing Results Back to Memory
$y = \sum_{n=1}^{40} c_n \cdot x_n$
        MVK .S 40, cnt
loop:
        LDH .D *cp++, c
        LDH .D *xp++, x
        MPY .M c, x, prod
        ADD .L y, prod, y
        SUB .L cnt, 1, cnt
  [cnt] B .S loop
        STW .D y, *yp
Data Memory: x(40), a(40), y
Register File A (16 or 32 registers) holds c, x, prod, y, cnt, *cp, *xp, *yp
But wait - that’s only half the story...
Dual Resources : Twice as Nice
[Figure: Register File A (A0 to A15 or A31, 32 bits each) with units .M1, .L1, .S1, .D1, and Register File B (B0 to B15 or B31, 32 bits each) with units .M2, .L2, .S2, .D2]
Registers used in the listing: A0 = cn, A1 = xn, A2 = cnt, A3 = prd, A4 = sum, A5 = *c, A6 = *x, A7 = *y
Our final view of the sum of products example...
$y = \sum_{n=1}^{40} c_n \cdot x_n$
        MVK .S1 40, A2
loop:   LDH .D1 *A5++, A0
        LDH .D1 *A6++, A1
        MPY .M1 A0, A1, A3
        ADD .L1 A4, A3, A4
        SUB .S1 A2, 1, A2
  [A2]  B .S1 loop
        STW .D1 A4, *A7
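For readers more comfortable in C, here is a minimal sketch of what the listing above computes. It is not compiler output. The function name `sum_of_products` is hypothetical, the accumulator is assumed to start at zero (the listing never initializes A4), and `cnt` is assumed to be at least 1 because the loop tests the counter only after the first pass.

```c
#include <stdint.h>

/* C rendering of the listing: cp and xp play the roles of A5 and A6,
 * y of A4, cnt of A2, and yp of A7. */
static void sum_of_products(const int16_t *cp, const int16_t *xp,
                            int32_t *yp, int cnt)
{
    int32_t y = 0;                /* assumption: sum starts at zero */
    do {
        int32_t c = *cp++;        /* LDH .D1 *A5++, A0 */
        int32_t x = *xp++;        /* LDH .D1 *A6++, A1 */
        int32_t prod = c * x;     /* MPY .M1 A0, A1, A3 */
        y += prod;                /* ADD .L1 A4, A3, A4 */
        cnt--;                    /* SUB .S1 A2, 1, A2 */
    } while (cnt);                /* [A2] B .S1 loop */
    *yp = y;                      /* STW .D1 A4, *A7 */
}
```

The slides call it with a count of 40; any count of 1 or more works in this model.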
Optional - Resource Specific Coding
[Figure: Register File A (A0 to A15 or A31, 32 bits each) holding cn, xn, prd, sum, cnt, *c, *x, *y, with units .M1, .L1, .S1, .D1]
It’s easier to use symbols rather than register names, but you can use either method.
Instruction Sets
‘C62x RISC-like instruction set
.S Unit: ADD, ADDK, ADD2, AND, B, CLR, EXT, MV, MVC, MVK, MVKH, NEG, NOT, OR, SET, SHL, SHR, SSHL, SUB, SUB2, XOR, ZERO
.L Unit: ABS, ADD, AND, CMPEQ, CMPGT, CMPLT, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBC, XOR, ZERO
.M Unit: MPY, MPYH, MPYLH, MPYHL, SMPY, SMPYH
.D Unit: ADD, ADDAB (B/H/W), LDB (B/H/W), MV, NEG, STB (B/H/W), SUB, SUBAB (B/H/W), ZERO
No Unit Used: IDLE, NOP

‘C67x: Superset of Fixed-Point
.S Unit: ADD, ADDK, ADD2, AND, B, CLR, EXT, MV, MVC, MVK, MVKH, NEG, NOT, OR, SET, SHL, SHR, SSHL, SUB, SUB2, XOR, ZERO, plus ABSSP, ABSDP, CMPGTSP, CMPEQSP, CMPLTSP, CMPGTDP, CMPEQDP, CMPLTDP, RCPSP, RCPDP, RSQRSP, RSQRDP, SPDP
.L Unit: ABS, ADD, AND, CMPEQ, CMPGT, CMPLT, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBC, XOR, ZERO, plus ADDSP, ADDDP, SUBSP, SUBDP, INTSP, INTDP, SPINT, DPINT, SPTRUNC, DPTRUNC, DPSP
.M Unit: MPY, MPYH, MPYLH, MPYHL, SMPY, SMPYH, plus MPYSP, MPYDP, MPYI, MPYID
.D Unit: ADD, ADDAB (B/H/W), LDB (B/H/W), MV, NEG, STB (B/H/W), SUB, SUBAB (B/H/W), ZERO, plus LDDW
No Unit Required: IDLE, NOP
'C64x: Superset of C62x Instruction Set
.L Unit
  Data Pack/Un: PACK2, PACKH2, PACKLH2, PACKHL2, PACKH4, PACKL4, UNPKHU4, UNPKLU4, SWAP2/4
  Dual/Quad Arith: ABS2, ADD2, ADD4, MAX, MIN, SUB2, SUB4, SUBABS4
  Bitwise Logical: ANDN
  Shift & Merge: SHLMB, SHRMB
  Load Constant: MVK (5-bit)
.M Unit
  Bit Operations: BITC4, BITR, DEAL, SHFL
  Move: MVD
  Average: AVG2, AVG4
  Shifts: ROTL, SSHVL, SSHVR
  Multiplies: MPYHI, MPYLI, MPYHIR, MPYLIR, MPY2, SMPY2, DOTP2, DOTPN2, DOTPRSU2, DOTPNRSU2, DOTPU4, DOTPSU4, GMPY4, XPND2/4
.D Unit
  Mem Access: LDDW, LDNW, LDNDW, STDW, STNW, STNDW
  Load Constant: MVK (5-bit)
  Dual Arithmetic: ADD2, SUB2
  Bitwise Logical: AND, ANDN, OR, XOR
  Address Calc.: ADDAD
.S Unit
  Data Pack/Un: PACK2, PACKH2, PACKLH2, PACKHL2, UNPKHU4, UNPKLU4, SWAP2, SPACK2, SPACKU4
  Dual/Quad Arith: SADD2, SADDUS2, SADD4
  Bitwise Logical: ANDN
  Shifts & Merge: SHR2, SHRU2, SHLMB, SHRMB
  Compares: CMPEQ2, CMPEQ4, CMPGT2, CMPGT4
  Branches/PC: BDEC, BPOS, BNOP, ADDKPC
C64x+ Additions
.L Unit: ADDSUB, ADDSUB2, DPACK2, DPACKX2, SADDSUB, SADDSUB2, SHFL3, SSUB2
.M Unit: CMPY, CMPYR, CMPYR1, DDOTP4, DDOTPH2, DDOTPH2R, DDOTPL2, DDOTPL2R, GMPY, MPY2IR, MPY32 (32-bit result), MPY32 (64-bit result), MPY32SU, MPY32U, MPY32US, SMPY32, XORMPY
.S Unit: CALLP, DMV, RPACK2
.D Unit: None
No Unit: DINT, RINT, SPKERNEL, SPKERNELR, SPLOOP, SPLOOPD, SPLOOPW, SPMASK, SPMASKR, SWE, SWENR
“MAC” Instructions
DOTP2 with LDDW
        LDDW  .D1 *A4++, A1:A0    ; A1:A0 = a3 a2 : a1 a0
   ||   LDDW  .D2 *B4++, B1:B0    ; B1:B0 = x3 x2 : x1 x0
        DOTP2 A0, B0, A2          ; A2 = a1*x1 + a0*x0
   ||   DOTP2 A1, B1, B2          ; B2 = a3*x3 + a2*x2
        ADD   A2, A3, A3          ; intermediate sum
   ||   ADD   B2, B3, B3          ; intermediate sum
        ADD   A3, B3, A4          ; final sum
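To make the packed arithmetic concrete, here is a small host-side model of what one DOTP2 does with two 32-bit registers that each hold a pair of signed 16-bit values. The names `pack2` and `dotp2_model` are hypothetical. This is a sketch of the arithmetic only, not of the instruction's timing.

```c
#include <stdint.h>

/* Pack two signed 16-bit values into one 32-bit word (hi in the upper half). */
static uint32_t pack2(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* Host-side model of DOTP2: hi(a)*hi(b) + lo(a)*lo(b), all operands signed. */
static int32_t dotp2_model(uint32_t a, uint32_t b)
{
    int16_t a_hi = (int16_t)(a >> 16), a_lo = (int16_t)(a & 0xFFFFu);
    int16_t b_hi = (int16_t)(b >> 16), b_lo = (int16_t)(b & 0xFFFFu);
    return (int32_t)a_hi * b_hi + (int32_t)a_lo * b_lo;
}
```

Two such calls plus one add cover four terms of the sum, which mirrors the pair of DOTP2 instructions followed by the ADDs in the listing.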
Block Real FIR Example (DDOTPL2)
for (i = 0; i < ndata; i++) {
    sum = 0;
    for (j = 0; j < ncoef; j++) {
        sum = sum + (d[i+j] * c[j]);
    }
    y[i] = sum;
}
Loop iterations:
  i = 0: d0c0 + d1c1 + d2c2 + d3c3 + ...
  i = 1: d1c0 + d2c1 + d3c2 + ...
Four 16x16 multiplies in each .M unit every cycle adds up to 8 MACs/cycle, or 8000 MMACS
Bottom Line: Two loop iterations for the price of one
    DDOTPL2 d3d2:d1d0, c1c0, sum1:sum0
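The “two iterations for the price of one” idea can be sketched on a host. `fir_ref` is the C loop from the slide. `ddotpl2_model` is a hypothetical model based only on the slide's iteration diagram: sum0 = d0*c0 + d1*c1 and sum1 = d1*c0 + d2*c1. It is not taken from the instruction set reference.

```c
#include <stdint.h>

/* Reference block FIR, as in the slide's C loop. */
static void fir_ref(const int16_t *d, const int16_t *c, int32_t *y,
                    int ndata, int ncoef)
{
    for (int i = 0; i < ndata; i++) {
        int32_t sum = 0;
        for (int j = 0; j < ncoef; j++)
            sum += (int32_t)d[i + j] * c[j];
        y[i] = sum;
    }
}

/* Assumed model of one DDOTPL2: two neighbouring outputs from one
 * pair of coefficients, following the slide's diagram. */
static void ddotpl2_model(const int16_t d[3], const int16_t c[2],
                          int32_t *sum0, int32_t *sum1)
{
    *sum0 = (int32_t)d[0] * c[0] + (int32_t)d[1] * c[1];
    *sum1 = (int32_t)d[1] * c[0] + (int32_t)d[2] * c[1];
}
```

With two coefficients, one call to the model produces the same two outputs as two passes of the outer loop in `fir_ref`.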
Complex Multiply (CMPY)
A0 = r1 : i1
A1 = r2 : i2
CMPY A0, A1, A3:A2    ; A3 = r1*r2 - i1*i2 (32 bits), A2 = i1*r2 + r1*i2 (32 bits)
Four 16x16 multiplies per .M unit (single .M unit)
Using two CMPYs, a total of eight 16x16 multiplies per cycle
Floating-point version (CMPYSP) uses:
  64-bit inputs (register pair)
  128-bit packed products (register quad)
  You then need to add/subtract the products to get the final result
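The CMPY result shown above is the ordinary complex product. A minimal host-side sketch follows; the type and function names are hypothetical, and corner cases such as saturation are not modeled.

```c
#include <stdint.h>

typedef struct { int16_t re, im; } cplx16;   /* packed 16-bit complex input   */
typedef struct { int32_t re, im; } cplx32;   /* 32-bit real and imaginary out */

/* Model of CMPY as drawn on the slide:
 * re = r1*r2 - i1*i2, im = i1*r2 + r1*i2 (four 16x16 multiplies). */
static cplx32 cmpy_model(cplx16 a, cplx16 b)
{
    cplx32 p;
    p.re = (int32_t)a.re * b.re - (int32_t)a.im * b.im;
    p.im = (int32_t)a.im * b.re + (int32_t)a.re * b.im;
    return p;
}
```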
C66x “MAC” Instructions
C66x: QMPY32 (fixed), QMPYSP (float)
A3:A2:A1:A0     =  c3 : c2 : c1 : c0
A7:A6:A5:A4     =  x3 : x2 : x1 : x0
QMPY32 or QMPYSP:
A11:A10:A9:A8   =  c3*x3 : c2*x2 : c1*x1 : c0*x0    (32 bits each)
Four 32x32 multiplies per .M unit (single .M unit)
Total of eight 32x32 multiplies per cycle
Fixed or floating-point versions
Output is 128-bit packed result (register quad)
C66x: Complex Matrix Multiply (CMAXMULT)
src1 = r1 i1 : r2 i2
src2 = ra ia : rb ib : rc ic : rd id    (src2_3 : src2_2 : src2_1 : src2_0)
dest (four 32-bit fields, single .M unit):
  r1*ra - i1*ia + r2*rc - i2*ic
  : r1*ia + i1*ra + r2*ic + i2*rc
  : r1*rb - i1*ib + r2*rd - i2*id
  : r1*ib + i1*rb + r2*id + i2*rd
Single .M unit implements complex matrix multiply using 16 MACs (all in 1 cycle)
Achieve 32 16x16 multiplies per cycle using both .M units
$[\, M9 \;\; M8 \,] = [\, M7 \;\; M6 \,] \cdot \begin{bmatrix} M3 & M2 \\ M1 & M0 \end{bmatrix}$
M9 = M7*M3 + M6*M1
M8 = M7*M2 + M6*M0
Where Mx represents a packed 16-bit complex number
Hardware Pipeline
Pipeline Phases: Program Fetch (PG, PS, PW, PR), Decode (DP, DC), Execute (E1)
Pipeline Full
[Figure: seven instructions, each passing through PG PS PW PR DP DC E1 and each starting one cycle after the previous one, until the pipe is full]
Software Pipelining
Instruction Delays
All 'C64x instructions require only one cycle to execute, but some results are delayed ...
Description     Instr.                   # Delay
Single Cycle    All instrs except ...    0
Multiply        MPY, SMPY                1
Load            LDB, LDH, LDW            4
Branch          B                        5
Would This Code Work As Is ??
$y = \sum_{n=1}^{40} c_n \cdot x_n$
        MVK .S1 40, A2
loop:   LDH .D1 *A5++, A0
        LDH .D1 *A6++, A1
        MPY .M1 A0, A1, A3
        ADD .L1 A4, A3, A4
        SUB .S1 A2, 1, A2
  [A2]  B .S1 loop
        STW .D1 A4, *A7
[Figure: Register File A (A0 to A15 or A31, 32 bits each) holding cn, xn, prd, sum, cnt, *c, *x, *y, with units .M1, .L1, .S1, .D1]
Need to add NOPs to get this code to work properly…
NOP = “Not Optimized Properly”
How many instructions can this CPU execute every cycle?
Software Pipelined Algorithm
[Figure: functional units (.L1, .L2, .S1, .S2, .M1, .M2, .D1, .D2) versus cycles 0 to 7. In the PROLOG the ldw pair starts first, then sub, B, mpy/mpyh and add join in later cycles; in the LOOP all eight units are busy every cycle]
Software Pipelined ‘C6x Code
c0:          ldw  .D1  *A4++,A5
     ||      ldw  .D2  *B4++,B5
c1:          ldw  .D1  *A4++,A5
     ||      ldw  .D2  *B4++,B5
     || [B0] sub  .S2  B0,1,B0
c2_3_4:      ldw  .D1  *A4++,A5
     ||      ldw  .D2  *B4++,B5
     || [B0] sub  .S2  B0,1,B0
     || [B0] B    .S1  loop
     .
     .
     .
c5_6:        ldw  .D1  *A4++,A5
     ||      ldw  .D2  *B4++,B5
     || [B0] sub  .S2  B0,1,B0
     || [B0] B    .S1  loop
     ||      mpy  .M1x A5,B5,A6
     ||      mpyh .M2x A5,B5,B6
     .
     .
*** Single-Cycle Loop
loop:        ldw  .D1  *A4++,A5
     ||      ldw  .D2  *B4++,B5
     || [B0] sub  .S2  B0,1,B0
     || [B0] B    .S1  loop
     ||      mpy  .M1x A5,B5,A6
     ||      mpyh .M2x A5,B5,B6
     ||      add  .L1  A7,A6,A7
     ||      add  .L2  B7,B6,B7
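As a reading aid, the sketch below models what each pass of the single-cycle loop computes: every ldw brings in a word holding two 16-bit values, mpy multiplies the low halves, mpyh multiplies the high halves, and two separate accumulators collect the products. The function name `dotp_packed` is hypothetical, and the final combination of the two accumulators is assumed to happen after the loop, since the listing does not show it.

```c
#include <stdint.h>

/* Host-side model of the loop body: two 16x16 MACs per 32-bit word pair. */
static int32_t dotp_packed(const uint32_t *a, const uint32_t *b, int nwords)
{
    int32_t sum_lo = 0, sum_hi = 0;                 /* roles of A7 and B7 */
    for (int i = 0; i < nwords; i++) {
        int16_t a_lo = (int16_t)(a[i] & 0xFFFFu), a_hi = (int16_t)(a[i] >> 16);
        int16_t b_lo = (int16_t)(b[i] & 0xFFFFu), b_hi = (int16_t)(b[i] >> 16);
        sum_lo += (int32_t)a_lo * b_lo;             /* mpy  + add .L1 */
        sum_hi += (int32_t)a_hi * b_hi;             /* mpyh + add .L2 */
    }
    return sum_lo + sum_hi;                         /* assumed final combine */
}
```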
Chapter Quiz
1. Name the four functional units and types of instructions they execute:
2. How many 16x16 MACs can a C674x CPU perform in 1 cycle? C66x ?
3. Where are CPU operands stored and how do they get there?
4. What is the purpose of a hardware pipeline?
5. What is the purpose of s/w pipelining, which tool does this for you?
Quiz - Answers
1. Name the four functional units and types of instructions they execute:
   M unit: Multiplies (fixed, float)
   L unit: ALU, arithmetic and logical operations
   S unit: Branches and shifts
   D unit: Data loads and stores
2. How many 16x16 MACs can a C674x CPU perform in 1 cycle? C66x ?
   C674x: 8 MACs/cycle, C66x: 32 MACs/cycle
3. Where are CPU operands stored and how do they get there?
   Register Files (A and B); Load (LDx) data from memory
4. What is the purpose of a hardware pipeline?
   To break up instruction execution enough to reach min cycle count,
   thereby allowing single cycle execution when pipeline is FULL
5. What is the purpose of s/w pipelining, which tool does this for you?
   Maximize performance: use as many functional units as possible in
   every cycle; the COMPILER/OPTIMIZER performs SW pipelining
C6000 Embedded Design Workshop - C and System Optimizations 13 - 1
C and System Optimizations
Introduction
In this chapter, we will cover the basics of optimizing C code and some useful tips on system
optimization. Also included here are some other system-wide optimizations you can take
advantage of in your own application if they are necessary.
Outline
Objectives
Describe how to configure and use the various compiler/optimizer options
Discuss the key techniques to increase performance or reduce code size
Demonstrate how to use optimized libraries
Overview key system optimizations
Lab 13 - Use FIR algo on audio data, optimize using the compiler, benchmark
Module Topics
C and System Optimizations
  Module Topics
  Introduction - “Optimal” and “Optimization”
  C Compiler and Optimizer
    “Debug” vs. “Optimized”
    Levels of Optimization
    Build Configurations
      Code Space Optimization (-ms)
    File and Function Specific Options
    Coding Guidelines
  Data Types and Alignment
    Data Types
    Data Alignment
    Using DATA_ALIGN
    Upcoming Changes - ELF vs. COFF
  Restricting Memory Dependencies (Aliasing)
  Access Hardware Features - Using Intrinsics
  Give Compiler MORE Information
    Pragma - Unroll()
    Pragma - MUST_ITERATE()
    Keyword - Volatile
    Setting MAX interrupt Latency (-mi option)
    Compiler Directive - _nassert()
  Using Optimized Libraries
    Libraries - Download and Support
  System Optimizations
    BIOS Libraries
    Custom Sections
    Use Cache
    Use EDMA
    System Architecture - SCR
  Chapter Quiz
    Quiz - Answers
  Lab 13 - C Optimizations
  Lab 13 - C Optimizations - Procedure
    PART A - Goals and Using Compiler Options
      Determine Goals and CPU Min
      Using Debug Configuration (-g, NO opt)
      Using Release Configuration (-o2, no -g)
      Using “Opt” Configuration
    Part B - Code Tuning
    Part C - Minimizing Code Size (-ms)
    Part D - Using DSPLib
    Conclusion
  Additional Information
  Notes
Introduction - “Optimal” and “Optimization”
Know Your Goal and Your Limits…
$Y = \sum_{i=1}^{count} coeff_i \cdot x_i$
for (i = 0; i < count; i++) {
    Y += coeff[i] * x[i];
}
Goals:
  A typical goal of any system’s algo is to meet real-time
  You might also want to approach or achieve “CPU Min” in order to maximize #channels processed
CPU Min (the “limit”):
  The minimum # cycles the algo takes based on architectural limits (e.g. data size, #loads, math operations required)
Real-time vs. CPU Min:
  Often, meeting real-time only requires setting a few compiler options (easy)
  However, achieving “CPU Min” often requires extensive knowledge of the architecture (harder, requires more time)
C Compiler and Optimizer
“Debug” vs. “Optimized”
“Debug” vs. “Optimized” Benchmarks
FIR:
for (j = 0; j < nr; j++) {
    sum = 0;
    for (i = 0; i < nh; i++)
        sum += x[i + j] * h[i];
    r[j] = sum >> 15;
}
Dot Product:
for (i = 0; i < count; i++) {
    Y += coeff[i] * x[i];
}
“Debug” - get your code LOGICALLY correct first (no optimization)
“Opt” - increase performance using compiler options (easier)
“CPU Min” - it depends. Could require extensive time…
Benchmarks:
Algo                   FIR (256, 64)    DOTP (256-term)
Debug (no opt, -g)     817K             4109
“Opt” (-o3, no -g)     18K              42
Add’l pragmas          7K               42
(DSPLib)               7K               42
CPU Min                4096             42

“Debug” vs. “Optimized” Environments
“Debug” (-g, NO opt): Get Code Logically Correct
  Provides the best “debug” environment with full symbolic support, no code motion, easy to single step
  Code is NOT optimized, i.e. very poor performance
  Create test vectors on FUNCTION boundaries (use same vectors as Opt Env)
“Opt” (-o3, -g): Increase Performance
  Higher levels of “opt” result in code motion; functions become black boxes (hence the use of FXN vectors)
  Optimizer can find “errors” in your code (use volatile)
  Highly optimized code (can reach “CPU Min” w/some algos)
  Each level of optimization increases optimizer’s “scope”
Levels of Optimization
Option       Scope       Optimizer works
-o0, -o1     LOCAL       within a single block
-o2          FUNCTION    across blocks
-o3          FILE        across functions
-pm -o3      PROGRAM     across files
[Figure: FILE1.C and FILE2.C with nested { } blocks showing each scope]
Increasing levels of opt affect: scope and code motion, build times, visibility
Build Configurations
Code Space Optimization (-ms)
File and Function Specific Options
Coding Guidelines
Programming the ‘C6000
Source        Tool                  Efficiency*    Effort
C, C++        Compiler Optimizer    80 - 100%      Low
Linear ASM    Assembly Optimizer    95 - 100%      Med
ASM           Hand Optimize         100%           High
Data Types and Alignment
Data Types
Data Alignment
Using DATA_ALIGN
Upcoming Changes - ELF vs. COFF
EABI : ELF ABI
Starting with v7.2.0, the C6000 Code Gen Tools (CGT) will begin shipping two versions of the Linker:
1. COFF: Binary file-format used by TI tools for over a decade
2. ELF: New binary file-format which provides additional features like dynamic/relocatable linking
You can choose either format
  v7.3.x default may become ELF (prior to this, choose ELF for new features)
  Continue using COFF for projects already in progress using the “--abi=coffabi” compiler option (support will continue for a long time)
Formats are not compatible
  Your program’s binary files (.obj, .lib) must all be built with the same format
  If building libraries used for multiple projects, we recommend building two libraries, one with each format
Migration Issues
  EABI long’s are 32 bits; new TI type (__int40_t) created to support 40-bit data
  COFF adds a leading underscore to symbol names, but the EABI does not
See: http://processors.wiki.ti.com/index.php/C6000_EABI_Migration
Restricting Memory Dependencies (Aliasing)
Aliasing?
void fcn(*in, *out)
{
    LDW *in++, A0
    ADD A0, 4, A1
    STW A1, *out++
}
[Figure: memory holding a, b, c, d, e, ... with pointers “in” and “in + 4”]
What happens if the function is called like this?
    fcn(*myVector, *myVector+1)
Definitely aliased pointers
*in and *out could point to the same address
But how does the compiler know?
If you tell the compiler there is no aliasing, this code will break (LDs in software pipelined loop)
One solution is to “restrict” the writes - *out (see next slide…)
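A minimal C sketch of the “restrict” idea follows, using the standard C99 `restrict` qualifier. The function name `add4` is hypothetical and the body mirrors the slide's pseudo-code (load, add 4, store). The qualifier is a promise from the programmer, not a check: calling it with overlapping buffers, as in the fcn(myVector, myVector+1) case above, breaks the promise and the result is undefined.

```c
#include <stdint.h>

/* 'restrict' promises that in and out never refer to the same memory,
 * so loads from in may be scheduled ahead of earlier stores to out. */
static void add4(const int32_t *restrict in, int32_t *restrict out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = in[i] + 4;       /* LDW, ADD 4, STW from the slide */
}
```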
Access Hardware Features - Using Intrinsics
Give Compiler MORE Information
Pragma - Unroll()
Pragma MUST_ITERATE()
4. MUST_ITERATE(min, max, %factor)
#pragma UNROLL(2)
#pragma MUST_ITERATE(10, 100, 2)
for (i = 0; i < count; i++) {
    sum += a[i] * x[i];
}
Gives the compiler information about the trip (loop) count
In the code above, we are promising that:
    count >= 10, count <= 100, and count % 2 == 0
If you break your promise, you might break your code
MIN helps with code size and software pipelining
MULT allows for efficient loop unrolling (and “odd” cases)
The #pragma must come right before the for() loop
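Here is a self-contained sketch of the promise described above. The pragma is wrapped in a check for the TI compiler so the same file also builds cleanly on a host compiler. The function names `dot` and `trip_count_ok` are hypothetical; the second one simply restates the three promises so a caller (or a debug build) can verify them before calling.

```c
#include <stdbool.h>
#include <stdint.h>

/* Restates the MUST_ITERATE(10, 100, 2) promise as a runtime check. */
static bool trip_count_ok(int count)
{
    return count >= 10 && count <= 100 && count % 2 == 0;
}

static int32_t dot(const int16_t *a, const int16_t *x, int count)
{
    int32_t sum = 0;
    int i;
#ifdef __TI_COMPILER_VERSION__        /* pragma only seen by the TI compiler */
    #pragma MUST_ITERATE(10, 100, 2)
#endif
    for (i = 0; i < count; i++)
        sum += a[i] * x[i];
    return sum;
}
```

A caller would typically guard the call with `trip_count_ok(count)` during development and drop the check once the trip count is known to be safe.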
Keyword - Volatile
Setting MAX interrupt Latency (-mi option)
-mi Details
-mi0
  Compiler’s code is not interruptible
  User must guarantee no interrupts will occur
-mi1
  Compiler uses single assignment and never produces a loop less than 6 cycles
-mi1000 (or any number > 1)
  Tells the compiler your system must be able to see interrupts every 1000 cycles
When not using -mi (compiler’s default)
  Compiler will software pipeline (when using -o2 or -o3)
  Interrupts are disabled for s/w pipelined loops
Notes:
  Be aware that the compiler is unaware of issues such as memory wait-states, etc.
  Using -mi, the compiler only counts instruction cycles
MUST_ITERATE Example
int dot_prod(short *a, Short *b, int n)
{int i, sum = 0;
#pragma MUST_ITERATE ( ,512)
for (i = 0; i < n; i++)
sum += a[i] * b[i];
return sum;
}
Provided:
If interrupt threshold was set at 1000 cycles (-mi 1000),
Assuming this can compile as a single-cycle loop,
And 512 = max# for Loop count (per MUST_ITERATE pragma).
Result:
The compiler knows a 1-cycle kernel will execute no more than 512 times
which is less than the 1000 cycle interrupt disable option (–mi1000)
Uninterruptible loop works fine
Verdict:
3072 cycle loop (512 x 6) can become a 512 cycle loop
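The arithmetic behind this verdict can be sketched as a simple check. This is only a model of the slide's reasoning (kernel cycles times maximum trip count, compared against the -mi threshold); it is not how the compiler actually makes the decision, and the function name `fits_interrupt_threshold` is hypothetical.

```c
#include <assert.h>

/* Returns 1 if a loop whose kernel takes kernel_cycles per iteration and
 * runs at most max_trip_count times stays within the -mi threshold.
 * Prolog/epilog cycles and memory stalls are ignored in this model. */
int fits_interrupt_threshold(int kernel_cycles, int max_trip_count,
                             int mi_threshold)
{
    assert(kernel_cycles > 0 && max_trip_count > 0 && mi_threshold > 0);
    return (long)kernel_cycles * max_trip_count <= mi_threshold;
}
```

With the slide's numbers, a 1-cycle kernel running at most 512 times fits under 1000 cycles, while a 6-cycle loop (512 x 6 = 3072) does not.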
Compiler Directive - _nassert()
Using Optimized Libraries
Libraries Download and Support
System Optimizations
BIOS Libraries
Custom Sections
Use Cache
Use EDMA
Using EDMA
[Figure: block diagram with labels CPU, Internal RAM, EDMA, External Memory, EMIF, Program (func1, func2, func3), 0x8000]
Program the EDMA to automatically transfer data/code from one location to another.
Operation is performed WITHOUT CPU intervention
All details covered in a later chapter
Multiple DMA’s : EDMA3 and QDMA
[Figure: block diagram with labels VPSS, C64x+ DSP (L1P, L1D, L2), EDMA3 (System DMA), DMA (sync), QDMA (async), Master, Periph]
DMA
Enhanced DMA (version 3)
DMA to/from peripherals
Can be sync’d to peripheral events
Handles up to 64 events
QDMA
Quick DMA
DMA between memory
Async – must be started by CPU
4-8 channels available
Both Share (number depends upon specific device)
128-256 Parameter RAM sets (PARAMs)
64 transfer complete flags
2-4 Pending transfer queues
System Architecture – SCR
[Figure: SCR (Switched Central Resource) block diagram. “Masters” side: SRIO, CPU, TC0, TC1, TC2, TC3 (CC), PCIe, HPI, EMAC. “Slaves” side: C64 Mem, DDR2, EMIF64, TCP, VCP, PCI66, McBSP, Utopia.]
SCR – Switched Central Resource
Masters initiate accesses to/from
slaves via the SCR
Most Masters (requestors) and Slaves
(resources) have their own port to SCR
Lower bandwidth masters (HPI,
PCIe, etc) share a port
There is a default priority (0 to 7) to
SCR resources that can be modified:
SRIO, HOST (PCI/HPI), EMAC
TC0, TC1, TC2, TC3
CPU accesses (cache misses)
Priority Register: MSTPRI
Note: refer to your specific datasheet for register names…
Chapter Quiz
1. How do you turn ON the optimizer?
2. Why is there such a performance delta between “Debug” and “Opt”?
3. Name 4 compiler techniques to increase performance besides -o?
4. Why is data alignment important?
5. What is the purpose of the -mi option?
6. What is the BEST feedback mechanism to test the compiler’s efficiency?
Quiz – Answers
1. How do you turn ON the optimizer?
Project -> Properties, use -o2 or -o3 for best performance
2. Why is there such a performance delta between “Debug” and “Opt”?
Debug allows for single-step (NOPs), “Opt” fills delay slots optimally
3. Name 4 compiler techniques to increase performance besides -o?
Data alignment, MUST_ITERATE, restrict, -mi, intrinsics, _nassert()
4. Why is data alignment important?
Performance. The CPU can only perform 1 non-aligned LD per cycle
5. What is the purpose of the -mi option?
To specify the max # cycles a loop will go “dark” responding to INTs
6. What is the BEST feedback mechanism to test the compiler’s efficiency?
Benchmarks, then LOOK AT THE ASSEMBLY FILE. Look for LDDW & SPLOOP
Lab 13 – C Optimizations
In the following lab, you will gain some experience benchmarking the use of optimizations using
the C optimizer switches. While your own mileage may vary greatly, you will gain an
understanding of how the optimizer works, where the switches are located and their possible
effects on speed and size.
Lab 13 – C Optimizations – Procedure
PART A – Goals and Using Compiler Options
Determine Goals and CPU Min
1. Determine Real-Time Goal
Because we are running audio, our “real-time” goal is for the processing (using a low-pass FIR
filter) to keep up with the I/O, which is sampling at 48 kHz. So, if we were doing a “single
sample” FIR, our processing time would have to be less than 1/48K = 20.8 us. However, we
are using double buffers, so our time requirement is relaxed to 20.8 us * BUFFSIZE = 20.8 us *
256 = 5.33 ms. Alright, any DSP worth its salt should be able to do this work inside 5 ms.
Right? Hmmm…
Real-time goal: music sounds fine.
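The time budget above can be reproduced with a few lines of C. This is a sketch of the arithmetic only; the function name `buffer_budget_us` is hypothetical and does not appear in the lab code.

```c
#include <assert.h>

/* Time available to process one buffer, in microseconds:
 * (one sample period) * (samples per buffer). */
double buffer_budget_us(double sample_rate_hz, int buff_size)
{
    assert(sample_rate_hz > 0.0 && buff_size > 0);
    return (1.0e6 / sample_rate_hz) * buff_size;
}
```

For 48 kHz and a 256-sample buffer this gives roughly 5333 us, the 5.33 ms figure quoted in step 1.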
2. Determine CPU Min.
What is the theoretical minimum based on the C674x architecture? This is based on several
factors: data type (16-bit), number of loads required and the type of mathematical operations involved.
What kind of algorithm are we using? FIR. So, let’s figure this out:
256 data samples * 64 coeffs = 16384 cycles. This assumes 1 MAC/cycle
Data type = 16-bit data
# loads possible = 8 16-bit values (aligned). Two LDDW (load double words).
Mathematical operation – DDOTP (cross multiply/accumulate) = 8 per cycle
So, the CPU Min = 16384/8 = 2048 cycles + overhead.
If you look at the inner loop (which is a simple dot product), it will take 64/8 cycles = 8 cycles
per inner loop. Add 8 cycles of overhead for prologue and epilogue (pre-loop and post-loop
code), so the inner loop is 16 cycles. Multiply that by the buffer size = 256, so the
approximate CPU min = 16*256 = 4096.
CPU Min = 4096 cycles.
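The two estimates in this step can be written out as a short calculation. The sketch below only restates the numbers from the text (256 samples, 64 coefficients, 8 MACs per cycle, 8 cycles of loop overhead); the function names are hypothetical.

```c
#include <assert.h>

/* Ideal bound: every cycle performs macs_per_cycle multiply-accumulates. */
int cpu_min_ideal(int data_size, int num_coeffs, int macs_per_cycle)
{
    assert(data_size > 0 && num_coeffs > 0 && macs_per_cycle > 0);
    return (data_size * num_coeffs) / macs_per_cycle;
}

/* Bound including prologue/epilogue overhead on each inner loop. */
int cpu_min_with_overhead(int data_size, int num_coeffs,
                          int macs_per_cycle, int loop_overhead)
{
    assert(data_size > 0 && num_coeffs > 0 && macs_per_cycle > 0);
    return (num_coeffs / macs_per_cycle + loop_overhead) * data_size;
}
```

Calling these with (256, 64, 8) and (256, 64, 8, 8) returns 2048 and 4096 cycles respectively.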
3. Import Lab 13 Project.
Import Lab 13 Project from \Labs\Lab13 folder. Change the build properties to use
YOUR student platform file and ensure the latest BIOS/XDC/UIA tools are selected.
4. Analyze new items – FIR_process and COEFFs
Open fir.c. You will notice that this file is quite different. It has the same overall TSK
structure (Semaphore_pend, if ping/pong, etc). Notice that after the if(pingPong), we
process the data using a FIR filter.
Scroll on down to cfir(). This is a simple nested for() loop. The outer loop runs once
for every block size (in our case, this is DATA_SIZE). The inner loop runs the size of
COEFFS[] times (in our case, 64).
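For readers without the lab files open, the nested loop described above looks roughly like the sketch below. This is not the lab's fir.c source: the name `cfir_sketch`, the parameter names and the Q15 scaling (the shift by 15) are assumptions, with the argument order taken from the call shown later in the lab.

```c
#include <assert.h>

/* x: input samples including history, at least nr + nh - 1 elements
 * h: filter coefficients (nh of them), r: output samples (nr of them) */
void cfir_sketch(const short *x, const short *h, short *r, int nh, int nr)
{
    int i, j, sum;

    assert(nh > 0 && nr > 0);
    for (i = 0; i < nr; i++) {          /* outer loop: once per output sample */
        sum = 0;
        for (j = 0; j < nh; j++) {      /* inner loop: once per coefficient */
            sum += x[i + j] * h[j];
        }
        /* Assumed Q15 scaling; an arithmetic right shift is expected for
         * negative sums, as on the C6000 compiler. */
        r[i] = (short)(sum >> 15);
    }
}
```

The inner loop is the dot product that the rest of the lab tries to speed up.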
Open coeffs.c. Here you will see the coefficients for the symmetric FIR filter. There are
3 sets – low-pass, hi-pass and all-pass. We’ll use the low-pass for now.
Using Debug Configuration (–g, NO opt)
5. Using the Debug Configuration, build and play.
Build your code and run it. The audio sounds terrible (if you can hear it at all). What is
happening ?
6. Analyze poor audio.
The first thing you might think is that the code is not meeting real-time. And, you’d be right.
Let’s use some debugging techniques to find out what is going on.
7. Check CPU load.
Make sure you clicked Restart. Run again. What do the CPU loads and Log_info’s report?
Hmmm. The CPU Load graph (for the author), showed NOTHING – no line at all.
Right now, the CPU is overloaded (> 100%). In that condition, results cannot be sent to the
tools because the Idle thread is never run.
But, if you look at Raw Logs, you can see the CPU load reported as ZERO (which we know is
not the case) and benchmark is:
About 913K cycles. Whoa. Maybe we need to OPTIMIZE this thing.
What were your results? Write them down below:
Debug (-g, no opt) benchmark for cfir()? _________________ cycles
Did we meet our real-time goal (music sounding fine?): ____________
Can anyone say “heck no”? The audio sounds terrible. We have failed to meet our only real-
time goal.
But hey, it’s using the Debug Configuration. And if we wanted to single step our code, we
can. It is a very nice debug-friendly environment although the performance is abysmal. This
is to be expected.
8. Check Semaphore count of mcaspReadySem.
If the semaphore count for mcaspReadySem is anything other than ZERO after the
Semaphore_pend in FIR_process(), we have troubles. This will indicate that we are NOT
keeping up with real time. In other words, the Hwi is posting the semaphore but the
processing algorithm is NOT keeping up with these posts. Therefore, if the count is higher
than 0, then we are NOT meeting realtime.
Use ROV and look at the Semaphore module. Your results may vary, but you’ll see the
semaphore counts pretty high (darn, even ledToggleSem is out of control):
My goodness – a number WELL greater than zero. We are definitely not meeting realtime.
9. View Debug compiler options.
FYI – if you looked at the options for the Debug configuration, you’d see the following:
Full symbolic debug is turned on and NO optimizations. Ok, nice fluffy debug environment to
make sure we’re getting the right answers, but not good enough to meet realtime. Let’s “kick
it up a notch”…
Using Release Configuration (–o2, no –g)
10. Change the build configuration from Debug to Release.
Next, we’ll use the Release build configuration.
In the project view, right-click on the project and choose “Build Configuration” and select
Release:
Check Properties → Include directory. Make sure the BSL \inc folder is specified.
Also, double-check your PLATFORM file. Make sure all code/data/stacks are in internal
memory and that your project is USING the proper platform in this NEW build configuration.
Once again, these configurations are containers of options. Even though Debug had the
proper platform file specified, Release might NOT !!
11. Rebuild and Play.
Build and Run. If you get errors, did you remember to set the INCLUDE path for the BSL
library? Remember, the Debug configuration is a container of options including your path
statements and platform file. So, if you switch configs (Debug to Release), you must also add
ALL path statements and other options you want. Don’t forget to modify the RTSC settings to
point to your _student platform AGAIN!
Once built and loaded, your audio should sound fine now – that is, if you like to hear music
with no treble…
12. Benchmark cfir() – release mode.
Using the same method as before, observe the benchmark for cfir().
Release (-o2, no -g) benchmark for cfir()? __________ cycles
Meet real-time goal? Music sound better? ____________
Here’s our picture:
Ok, now we’re talkin’ – it went from 913K to 37K just by switching to the release
configuration. So, the bottom line is TURN ON THE OPTIMIZER !!
13. Study release configuration build properties.
Here’s a picture of the build options for release:
The “biggie” is that -o2 is selected.
Can we improve on this benchmark a little? Maybe…
Using “Opt” Configuration
14. Create a NEW build configuration named “Opt”.
Really? Yep. And it’s easy to do. Using the Release configuration, right-click on the project
and select properties (where you’ve been many times already).
Click on Basic Options and notice they are currently set to -o2 -g. Look up a few
inches and you’ll see the “Configuration:” drop-down dialogue. Click on the down arrow
and you’ll see “Debug” and “Release”.
Click on the “Manage” button:
Click New:
(also note the Remove button where you can delete build configurations).
Give the new configuration a name: “Opt” and choose to copy the existing configuration
from Release. Click Ok.
Change the Active Configuration to “Opt”
15. Change the “Opt” build properties to use -o3 and NO -g (the “blank” choice).
The only change that needs to be made is to turn UP the optimization level to -o3 vs. -o2
which was used in the Release Configuration. Also, make sure -g is turned OFF (which it
should already be).
Open the Opt Config Build Properties and verify it contains NO -g (blank) and
optimization level of -o3. Rebuild your code and benchmark (FYI – LED may stop
blinking…don’t worry).
Follow the same procedure as before to benchmark cfir:
Opt (-o3, no -g) benchmark for cfir()? __________ cycles
The author’s number was about 18K cycles – another pretty significant performance increase
over -o2, -g. We simply went to -o3 and killed -g and WHAM, we went from 37K to 18K.
This is why the author has stated before that the Opt settings we used in this lab SHOULD be
the RELEASE settings. But I am not king.
So, as you can see, we went from 913K to 18K in about 30 minutes. Wow. But what was the
CPU Min? About 4K? Ok…we still have some room for improvement…
Just for kicks and grins, try single stepping your code and/or adding breakpoints in the
middle of a function (like cfir). Is this more difficult with -g turned OFF and -o3 applied? Yep.
Note: With -g turned OFF, you still get symbol capability, i.e. you can enter symbol
names into the watch and memory windows. However, it is nearly impossible to single
step C code, hence the suggestion to create test vectors at function boundaries to
check the LOGICAL part of your code when you build with the Debug Configuration.
When you turn off -g, you need to look at the answers on function boundaries to make
sure it is working properly.
16. Turn on verbose and interlist and then see what the .asm file looks like for fir.asm.
As noted in the discussion material, to “see it all”, you need to turn on three switches. Turn
them on now, then build, then peruse the fir.asm file. You will see some interesting
information about software pipelining for the loops in fir.c.
Turn on:
RunTime Model Options → Verbose pipeline info (-mw)
Optimizations → Interlist (-os)
Assembler Options → Keep .ASM file (-k)
Part B – Code Tuning
17. Use #pragma MUST_ITERATE in cfir().
Uncomment the #pragmas for MUST_ITERATE on the two for loops. This pragma gives
the compiler some information about the loops and how to unroll them efficiently. As
always, the more info you can provide to the compiler, the better.
Use the “Opt” build configuration. Rebuild (use the Build button – it is an incremental build
and WAY faster when you’re making small code changes like this). Then Run.
Opt + MUST_ITERATE (-o3, no -g) cfir()? __________ cycles
The author’s results were close to the previous results – about 15K. Well, this code tuning
didn’t help THIS algo much, but it might help yours. At least you know how to apply it now.
18. Use restrict keyword on the results array.
You actually have a few options to tell the compiler there is NO ALIASING. The first method
is to tell the compiler that your entire project contains no aliasing (using the -mt compiler
option). However, it is best to narrow the scope and simply tell the compiler that the results
array has no aliasing (because the WRITES are destructive, we RESTRICT the output array).
So, in fir.c, add the following keyword (restrict) to the results (r) parameter of the fir
algorithm as shown:
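A sketch of what the change looks like, assuming the FIR function takes the parameters in the order used by the cfir() call later in this lab. The name `cfir_restrict`, the parameter names and the Q15 shift are assumptions for illustration, not the lab's actual source; only the position of `restrict` on the results pointer comes from the text.

```c
#include <assert.h>

/* 'restrict' on r promises that nothing else in this function accesses the
 * output array, so loads from x and h may be scheduled ahead of earlier
 * stores to r. */
void cfir_restrict(const short *x, const short *h, short * restrict r,
                   int nh, int nr)
{
    int i, j, sum;

    assert(nh > 0 && nr > 0);
    for (i = 0; i < nr; i++) {
        sum = 0;
        for (j = 0; j < nh; j++) {
            sum += x[i + j] * h[j];
        }
        r[i] = (short)(sum >> 15);   /* assumed Q15 scaling */
    }
}
```

The promise is the caller's responsibility: passing an output buffer that overlaps x or h gives undefined results.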
Build, then run again. Now benchmark your code again. Did it improve?
Opt + MUST_ITERATE + restrict (-o3, no –g) cfir()? __________ cycles
Here is what the author got:
Well, getting rid of ALIASING was a big help to our algo. We went from about 15K down to
7K cycles. You could achieve the same result by using the “-mt” compiler switch, but that tells the
compiler that there is NO aliasing ANYWHERE – the scope is huge. Restrict is more restricted.
19. Use _nassert() to tell optimizer about data alignment.
Because the receive buffers are set up using STRUCTURES, the compiler may or may not
be able to determine the alignment of an ELEMENT (i.e. rcvPingL.hist) inside that structure
thus causing the optimizer to be conservative and use redundant loops. You may have seen
the benchmarks have two results the same, and one larger. Or, you may not have. It usually
happens on Thursdays….
It is possible that using _nassert() may help this situation. Again, this “fix” is only needed
in this specific case where the memory buffers were allocated using structures (see main.h
if you want a looksy).
Uncomment the two _nassert() intrinsics in fir.c inside the cfir() function and rebuild/run
and check the results.
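As a rough illustration of the idea, the sketch below tells the optimizer that two pointers are 8-byte aligned. It is not the code in fir.c: the function name `dot_prod_aligned`, the helper `is_aligned8` and the host fallback macro are assumptions. `_nassert()` itself is a TI compiler intrinsic, so on other compilers the sketch defines it away.

```c
#include <assert.h>
#include <stdint.h>

/* Fallback so the sketch builds on non-TI compilers (assumption for
 * host testing only). */
#ifndef __TI_COMPILER_VERSION__
#define _nassert(cond) ((void)0)
#endif

static int is_aligned8(const void *p)
{
    return ((uintptr_t)p & 7u) == 0u;
}

int dot_prod_aligned(const short *a, const short *b, int n)
{
    int i, sum = 0;

    /* Run-time check for debug builds; _nassert() generates no code. */
    assert(is_aligned8(a) && is_aligned8(b));

    /* Promise to the optimizer: both arrays start on an 8-byte boundary. */
    _nassert(((int)a & 7) == 0);
    _nassert(((int)b & 7) == 0);

    for (i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

As with MUST_ITERATE, a false promise here can break the generated code, which is why the sketch pairs it with a run-time check.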
Here is what the author got (same as before…but hey, worth a try):
20. Turn on symbolic debug with FULL optimization.
This is an important little trick that you need to know. As we have stated before, it is
impossible to single step your code when you have optimization turned on to level -o3. You
are able to place breakpoints at function entry/exit points and check your answers, but that’s
it. This is why FUNCTION LEVEL test vectors are important.
There are two ways to accomplish this. Some companies use script code to place
breakpoints at specific disassembly symbols (function entry/exit) and run test vectors through
automatically. Others simply want to manually set breakpoints in their source code and hit
RUN and see the results.
While still in the Debug perspective with your program loaded, select:
Restart
The execution pointer is at main, but do you see your main() source file? Probably not. Ok,
pop over to Edit perspective and open fir.c. Set a breakpoint at the beginning of the
function. Hit RUN. Your program will stop at that breakpoint, but in the Debug perspective, do
you see your source file associated with the disassembly? Again, probably not.
Again, hit Restart to start your program at main() again.
How do you tell the compiler to add JUST ENOUGH debug info to allow your source files to
SYNC with the disassembly but not affect optimization? There is a little known option that
allows this…
Make sure you have the Opt configuration selected, right click and choose Properties.
Next, check the box below (at C6000 Compiler → Runtime Model Options) to turn on
symbolic debug with FULL Optimization (-mn):
TURN ON -g (symbolic debug). -mn only makes sense if -g is turned ON. Go back to the
basic options and select Full Symbolic Debug.
Rebuild and load your program. The execution pointer should now show up along with
your main.c file.
Hit Restart again.
Set a breakpoint in the middle of FIR_process() function inside fir.c. You can’t do it. The
breakpoint snaps to the beginning or end of the function, right?
Make sure the breakpoint is at the beginning of FIR_process() and hit RUN. You can now
see your source code synced with the disassembly. Very nice.
But did this affect your optimization and your benchmark? Go try it.
Hit Restart again and remove all breakpoints.
Then RUN. Halt your program and check your benchmark. Is it about the same? It should
be…
Part C – Minimizing Code Size (-ms)
21. Determine current cfir benchmark and .text size.
Select the “Opt” configuration and also make sure MUST_ITERATE and restrict are used
in your code (this is the same setting as the previous lab step).
Rebuild and Run.
Write down your fastest benchmark for cfir:
Opt (-o3, NO -g, NO -ms3) cfir, ____________ cycles
.text (NO -ms) = ___________ h
Open the .map file generated by the linker. Hmmm. Where is it located?
Try to find it yourself without asking anyone else. Hint: which build config did you use
when you hit “build” ?
22. Add -ms3 to Opt Config.
Open the build properties and add -ms3 to the compiler options (under Basic Options).
We will just put the “pedal to the metal” for code size optimizations and go all the way to
-ms3 first. Note here that we also have -o3 set (which is required for the -ms option).
In this scenario, the compiler may choose to keep the “slow version” of the redundant loops
(fast or slow) due to the presence of -ms.
Rebuild and run.
Opt + -ms (-o3, NO -g, -ms3) cfir, ____________ cycles
.text (-ms3) = ___________ h
Did your benchmark get worse with -ms3? How much code size did you save? What
conclusions would you draw from this?
____________________________________________________________________
____________________________________________________________________
Keep in mind that you can also apply -ms3 (or most of the basic options) to a specific
function using #pragma FUNCTION_OPTIONS( ).
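A hedged sketch of the per-function form. The option string "--opt_for_space=3" is assumed here to be the long form of -ms3, and the placement of the pragma ahead of the function definition is also an assumption; check the compiler user's guide for your tool version. The function `init_table` is a hypothetical example of non-critical init code; other compilers ignore the pragma.

```c
#include <assert.h>

/* Ask for code-size optimization on this one function only
 * (option string is an assumption, see the note above). */
#pragma FUNCTION_OPTIONS(init_table, "--opt_for_space=3")
void init_table(int *table, int n)
{
    int i;

    assert(table != 0 && n > 0);
    for (i = 0; i < n; i++) {
        table[i] = i * i;    /* one-time setup work, not time critical */
    }
}
```

This keeps the size optimization away from time-critical routines such as the FIR loop.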
FYI – the author saved about 2.2K bytes total out of the .text section and the benchmark was
about 33K. HOWEVER, most of the .text section is LIBRARY code which is not affected by
-ms3. So, of the NON .lib code which IS affected by -ms3, using -ms3 saved 50% on code
size (original byte count was 6881 bytes and was reduced to 3453 bytes). This is pretty
significant. Yes, the benchmark ended up being 33K, but now you know the tradeoff.
Also remember that you can apply -ms3 on a FILE BY FILE basis. So, a smart way to apply
this is to use it on init routines and keep it far away from your algos that require the best
performance.
Part D – Using DSPLib
23. Download and install the appropriate DSP Library.
This, fortunately for you, has already been done for you. This directory is located at:
C:\SYSBIOSv4\Labs\dsplib64x+\lib
24. Link the appropriate library to your project.
Find the lib file in the above folder and link it to your project (non ELF version).
Also, add the include path for this library to your build properties.
25. Add #include to the fir.c file.
Add the proper #include for the header file for this library to fir.c
26. Replace the calls to the fir function in fir.c.
THIS MUST BE DONE 4 TIMES (Ping, Pong, L and R = 4). Should I say it again? There
are FOUR calls to the fir routine that need to be replaced by something new. Ok, twice should
be enough. ;-)
Replace:
cfir(rcvPongL.hist, COEFFS, xmt.PongL, ORDER, DATA_SIZE);
with
DSP_fir_gen(rcvPongL.hist, COEFFS, xmt.PongL, ORDER, DATA_SIZE);
27. Build, load, verify and BENCHMARK the new FIR routine in DSPLib.
28. What are the best-case benchmarks?
Yours (compiler/optimizer):___________ DSPLib: ___________
Wow, for what we wanted in THIS system (a fast simple FIR routine), we would have been
better off just using DSPLib. Yep. But, in the process, you’ve learned a great deal about
optimization techniques across the board that may or may not help your specific system.
Remember, your mileage may vary.
Conclusion
Hopefully this exercise gave you a feel for how to use some of the basic compiler/optimizer
switches for your own application. Everyone’s mileage may vary and there just might be a
magic switch that helps your code and doesn’t help someone else’s. That’s the beauty of trial
and error.
Conclusion? TURN ON THE OPTIMIZER ! Was that loud enough?
Here’s what the author came up with – how did your results compare?
Optimizations                      Benchmark
Debug Bld Config – No opt          913K
Release (-o2, -g)                  37K
Opt (-o3, no -g)                   18K
Opt + MUST_ITERATE                 15K
Opt + MUST_ITERATE + restrict      7K
DSPLib (FIR)                       7K
Regarding -ms3, use it wisely. It is more useful to add this option to functions that are large
but not time critical – like IDL functions, init code, maintenance type items. You can save
some code space (important) and lose some performance (probably a don’t care). For your
time-critical functions, do not use -ms ANYTHING. This is just a suggestion – again, your
mileage may vary.
CPU Min was 4K cycles. We got close, but didn’t quite reach it. The authors believe that it is
possible to get closer to the 4K benchmark by using intrinsics and the DDOTP instruction.
The biggest limiting factor in optimizing the cfir routine is the “sliding window”. The processor
is only allowed ONE non-aligned load each cycle. This would happen 75% of the time. So,
the compiler is already playing some games and optimizing extremely well given the
circumstances. It would require “hand-tweaking” via intrinsics and intimate knowledge of the
architecture to achieve much better.
29. Terminate the Debug session, close the project and close CCS. Power-cycle the board.
Throw something at the instructor to let him know that you’re done with
the lab. Hard, sharp objects are most welcome…
Additional Information
IDMA0 Programming Details
IDMA0 operates on a block of 32 contiguous 32-bit registers (both src/dst blocks
must be aligned on a 32-word boundary). Optionally generate CPU interrupt if needed.
User provides: Src, Dst, Count and “mask” (Reference: SPRU871)
Count = # of 32-register blocks to xfr (up to 16)
Mask = 32-bit mask that determines WHICH registers to transfer (“0” = xfr, “1” = NO xfr)
Example Transfer using MASK (not all regs typically need to be programmed):
[Figure: a block of 32 registers (0 to 31, 32 bits each) at the source address in L1D/L2 and a matching block at the destination address in Periph Cfg space. Registers 0, 1, 4, 5, 6, 8, 10, 12, 22, 23, 27, 29 and 31 are marked in both blocks.]
Mask = 01010111001111111110101010001100
User must write to IDMA0 registers in the following order (COUNT written triggers transfer):
IDMA0_MASK = 0x573FEA8C; //set mask for 13 regs above
IDMA0_SOURCE = reg_ptr; //set src addr in L1D/L2
IDMA0_DEST = MMR_ADDRESS; //set dst addr to config location
IDMA0_COUNT = 0; //set count for 1 block of 32 registers (this write triggers the transfer)
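To check a mask value before writing it to the hardware, the selection rule (“0” = transfer) can be decoded on the host. This is a sketch under the assumption that bit n of the mask corresponds to register n of the block, which matches the example above; the function name `idma0_selected_regs` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fills regs[] with the register numbers whose mask bit is 0 (transfer)
 * and returns how many were selected. */
int idma0_selected_regs(uint32_t mask, int regs[32])
{
    int bit, count = 0;

    assert(regs != 0);
    for (bit = 0; bit < 32; bit++) {
        if (((mask >> bit) & 1u) == 0u) {
            regs[count++] = bit;
        }
    }
    return count;
}
```

For 0x573FEA8C this returns 13 registers: 0, 1, 4, 5, 6, 8, 10, 12, 22, 23, 27, 29 and 31.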
C6000 Embedded Design Workshop Using BIOS - Cache & Internal Memory 11 - 1
Cache & Internal Memory
Introduction
In this chapter the memory options of the C6000 will be considered. By far, the easiest and
highest performance option is to place everything in on-chip memory. In systems where this is
possible, it is the best choice. To place code and initialize data in internal RAM in a production
system, refer to the chapters on booting and DMA usage.
Most systems will have more code and data than the internal memory can hold. As such, placing
everything off-chip is another option, and can be implemented easily, but most users will find the
performance degradation to be significant. As such, the ability to enable caching to accelerate the
use of off-chip resources will be desirable.
For optimal performance, some systems may benefit from a mix of on-chip memory and cache.
Fine tuning of code for use with the cache can also improve performance, and assure reliability in
complex systems. Each of these constructs will be considered in this chapter.
Objectives
Compare/contrast different uses of
memory (internal, external, cache)
Define cache terms and definitions
Describe C6000 cache architecture
Demonstrate how to configure and use
cache optimally
Lab 14 – modify an existing system to
use cache – benchmark solutions
Module Topics
Cache & Internal Memory
Module Topics
Why Cache?
Cache Basics – Terminology
Cache Example
L1P – Program Cache
L1D – Data Cache
L2 – RAM or Cache ?
Cache Coherency (or Incoherency?)
  Coherency Example
  Coherency – Reads & Writes
  Cache Functions – Summary
  Coherency – Use Internal RAM !
  Coherency – Summary
  Cache Alignment
Turning OFF Cacheability (MAR)
Additional Topics
Chapter Quiz
  Quiz – Answers
Lab 14 – Using Cache
  Lab Overview:
  Lab 14 – Using Cache – Procedure
    A. Run System From Internal RAM
    B. Run System From External DDR2 (no cache)
    C. Run System From DDR2 (cache ON)
Why Cache?
Cache Basics – Terminology
Cache Example
Direct-Mapped Cache Example
[Figure: a 16-line direct-mapped cache (Index 0 through F) with Valid and Tag columns. The lines hold instructions from the code below (LDH, MPY, ADD, SUB, B); the tags shown are 000 and 002.]
Address  Code
0003h    L1    LDH
...
0026h    L2    ADD
0027h          SUB cnt
0028h          [!cnt] B L1
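The mapping in this example can be sketched in a few lines of C, assuming the simplified cache of the slide: 16 lines, one instruction per line, the low 4 address bits used as the index and the remaining bits as the tag. The function names are hypothetical.

```c
#include <assert.h>

enum { CACHE_LINES = 16 };   /* Index 0 through F, as in the slide */

/* Which cache line an address maps to. */
unsigned cache_index(unsigned addr)
{
    return addr % CACHE_LINES;
}

/* The tag stored with that line to identify which address it holds. */
unsigned cache_tag(unsigned addr)
{
    return addr / CACHE_LINES;
}

/* Two different addresses evict each other when they share an index. */
int cache_conflict(unsigned a, unsigned b)
{
    assert(CACHE_LINES > 0);
    return a != b && cache_index(a) == cache_index(b);
}
```

Address 0026h lands on index 6 with tag 002, while 0003h lands on index 3 with tag 000, matching the tags in the figure.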
L1P – Program Cache
L1P Cache Comparison
Device              Scheme         Size       Linesize             New Features
C62x/C67x           Direct Mapped  4K bytes   64 bytes (16 instr)  N/A
C64x                Direct Mapped  16K bytes  32 bytes (8 instr)   N/A
C64x+, C674x, C66x  Direct Mapped  32K bytes  32 bytes (8 instr)   Cache/RAM, Cache Freeze, Memory Protection
Next two slides discuss Cache/RAM and Freeze features.
Memory Protection is not discussed in this workshop.
All L1P memories provide zero waitstate access.
L1D Data Cache
L2 RAM or Cache ?
Cache Coherency (or Incoherency?)
Coherency Example
Coherency Reads & Writes
Coherency Solution: Write (Flush/Writeback)
[Slide: CPU with L1D and L2 holding RcvBuf and XmtBuf; a writeback copies XmtBuf to DDR2, where the EDMA picks it up.]
When the CPU is finished with the data (and has written it to XmtBuf in L2), it can
be sent to external memory with a cache writeback.
A writeback is a copy operation from cache to memory, writing back the modified
(i.e. dirty) memory locations. All writebacks operate on full cache lines.
Use BIOS Cache APIs to force a writeback:
BIOS: Cache_wb (XmtBuf, BUFFSIZE, CACHE_NOWAIT);
What happens with the "next" RCV buffer?
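Because writebacks operate on full cache lines, a buffer that is not line-aligned causes more memory to be written back than the buffer itself occupies. The sketch below only does the address arithmetic on a host machine; it does not call any BIOS API. The 128-byte L2 line size is taken from the quiz answers in this chapter, and the names `LineSpan` and `wb_line_span` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define L2_LINE_SIZE 128u   /* bytes, per the quiz answers in this chapter */

/* The line-aligned region actually touched by a writeback request. */
typedef struct {
    uintptr_t start;   /* address of the first cache line touched */
    size_t    bytes;   /* total bytes covered, a multiple of the line size */
} LineSpan;

/* Expand [addr, addr + byteCnt) outward to whole cache lines. */
static LineSpan wb_line_span(uintptr_t addr, size_t byteCnt, size_t lineSize)
{
    LineSpan  span;
    uintptr_t mask;
    uintptr_t end;

    assert(byteCnt > 0);
    assert(lineSize != 0 && (lineSize & (lineSize - 1)) == 0); /* power of 2 */

    mask       = ~(uintptr_t)(lineSize - 1);
    span.start = addr & mask;                          /* round start down */
    end        = (addr + byteCnt + lineSize - 1) & mask; /* round end up   */
    span.bytes = (size_t)(end - span.start);
    return span;
}
```

For example, a 256-byte buffer starting 64 bytes into a line spans three 128-byte lines (384 bytes), so neighboring data that shares the first and last line is written back too. This is one reason to align and pad buffers to the cache line size.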
Cache Functions Summary
Coherency Use Internal RAM !
Coherency Summary
Cache Alignment
Turning OFF Cacheability (MAR)
"Turn Off" the DATA Cache (MAR)
[Slide: CPU with L1D and L2; RcvBuf and XmtBuf in DDR2, accessed by the EDMA.]
Memory Attribute Registers (MARs) enable/disable DATA caching for memory ranges.
Don’t use MAR to solve basic cache coherency: performance will be too slow.
Use MAR when you have to always read the latest value of a memory location,
such as a status register in an FPGA, or switches on a board.
MAR is like “volatile”. You must use both to always read a memory location: MAR
for the cache; volatile for the compiler.
Looking more closely at the MAR registers ...
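The compiler half of that pairing can be sketched as follows. This is a host-side illustration only: the status location is passed in as a pointer, whereas on a real board it would be the address of the FPGA register or switch port, in a region whose MAR bit marks it non-cacheable. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Poll a status location until any bit in 'mask' is set, or until
 * 'maxPolls' reads have been made. Returns 1 if the bit was seen,
 * 0 if polling gave up.
 *
 * 'volatile' forces the compiler to perform a real read on every pass
 * instead of reusing a value held in a register. It does nothing about
 * the data cache; that part is the MAR setting's job. */
static int wait_for_status(volatile const uint32_t *status,
                           uint32_t mask, unsigned maxPolls)
{
    unsigned i;

    assert(status != 0 && mask != 0u);
    for (i = 0; i < maxPolls; i++) {
        if ((*status & mask) != 0u) {
            return 1;
        }
    }
    return 0;
}
```

Without `volatile`, an optimizing compiler is free to hoist the read out of the loop. Without the MAR setting, every read could be satisfied from a stale cache line. Either omission alone is enough to miss a change in the external device.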
Additional Topics
Chapter Quiz
1. How do you turn ON the cache ?
2. Name the three types of caches & their associated memories:
3. All cache operations affect an aligned cache line. How big is a line?
4. Which bit(s) turn on/off cacheability and where do you set these?
5. How do you fix coherency when two bus masters access external memory?
6. If a dirty (newly written) cache line needs to be evicted, how does
that dirty line get written out to external memory?
Quiz Answers
1. How do you turn ON the cache ?
   Set size > 0 in platform package (or via Cache_setSize() during runtime)
2. Name the three types of caches & their associated memories:
   Direct Mapped (L1P), 2-way (L1D), 4-way (L2)
3. All cache operations affect an aligned cache line. How big is a line?
   L1P 32 bytes (256 bits), L1D 64 bytes, L2 128 bytes
4. Which bit(s) turn on/off cacheability and where do you set these?
   MAR (Mem Attribute Register), affects 16MB Ext’l data space, .cfg
5. How do you fix coherency when two bus masters access external memory?
   Invalidate before a read, writeback after a write (or use L2 mem)
6. If a dirty (newly written) cache line needs to be evicted, how does
   that dirty line get written out to external memory?
   Cache controller takes care of this
Lab 14 – Using Cache
In the following lab, you will gain some experience benchmarking the use of cache in the system.
First, we’ll run the code with EVERYTHING (buffers, code, etc) off chip with NO cache. Then,
we’ll turn on the cache and compare the results. Then, we’ll move everything ON chip and
compare the cache results with using on-chip memory only.
This will provide a decent understanding of what you can expect when using cache in your own
application.
Lab Overview:
There are two goals in this lab: (1) to learn how to turn on and off cache and the effects of each
on the data buffers and program code; (2) to optimize a hi-pass FIR filter written in C. To gain
this basic knowledge you will:
A. Learn to use the platform and CFG files to setup cache memory address range (MAR bits)
and turn on L2 and L1 caches.
B. Benchmark the system performance with running code/data externally (DDR2) vs. with the
cache on vs. internal (IRAM).
Lab 14 – Using Cache – Procedure
A. Run System From Internal RAM
1. Close all previous projects and import Lab14.
This project is actually the solution for Lab 13 (OPT) with all optimizations in place.
Ensure the proper platform (student) and the latest XDC/BIOS/UIA versions are being
used.
Note: For all benchmarks throughout this lab, use the “Opt” build configuration when you build.
Do NOT use the Debug or Release config.
2. Ensure BUFFSIZE is 256 in main.h.
In order to compare our cache lab to the OPT lab, we need to make sure the buffer sizes
are the same which is 256.
3. Find out where code and data are mapped to in memory.
First, check Build Properties for the Opt configuration. Make sure you are using YOUR
student platform file in this configuration. Then, view the platform file and determine which
memory segments (like IRAM) contain the following sections:
Section Memory Segment
.text
.bss
.far
It’s not so simple, is it? .bss and .far sections are “data” and .text is “code”. If you didn’t know
that, you couldn’t answer the question. So, they are all allocated in IRAM. If not, please
make sure they are before moving on.
4. Which cache areas are turned on/off (circle your answer)?
L1P OFF/ON
L1D OFF/ON
L2 OFF/ON
Leave the settings as is.
5. Build, load.
BEFORE YOU RUN, open up the Raw Logs window.
Click Run and write down below the benchmarks for cfir():
Data Internal (L1P/D cache ON): __________ cycles
The benchmark from the Log_info should be around 8K cycles. We’ll compare this “internal
RAM” benchmark to “all external” and “all external with cache ON” numbers. You just might
be surprised…
B. Run System From External DDR2 (no cache)
6. Place the buffers (data) in external DDR2 memory and turn OFF the cache.
Edit your platform file and place the data external (DDR). Leave stacks and code in IRAM.
Modify the L1P/D cache sizes to ZERO (0K).
In this scenario, the audio data buffers are all external. Cache is not turned on. This is the
worst case situation.
Do you expect the audio to sound ok? ____________________
Match the settings you see below (0K for all cache sizes, Data Memory in DDR) :
7. Clean project, build, load, run using the “Opt” configuration.
Select Project → Clean (this will ensure your platform file is correct). Then build and
load your code. Run your code. Listen to the audio. How does it sound? It’s DEAD,
that’s how it sounds: just air, bad air. It is the absence of noise. Plus, we can’t see
anything because the CPU is overloaded, and therefore there are no RTA tools.
Ah, but Log_info() just might save us again. Go look at the Raw Logs and see if the
benchmark is getting reported.
All Code/Data External: ___________ cycles
Did you get a cycle count? The author experienced a total loss: absolute NOTHING. I think
the system is so out of it, it crashes. In fact, CCS crashed a few times in this mode. Yikes. I
vote for calling it “the national debt” #cycles. Uh, what is it now, $15 Trillion? Ok, 15 trillion
cycles… ;-)
C. Run System From DDR2 (cache ON)
8. Turn on the cache (L1P/D, L2) in the platform file.
Choose the following settings for the cache (L2=64K, L1P/D = 32K):
Set L1D/P to 32K and L2 to 64K. IF YOU DON’T SET L2 CACHE ON, YOU WILL
CACHE IN L1 ONLY. Watch it, though: when you reconfigure cache sizes, it wipes your
memory section selections. Redo those properly after you set the cache sizes.
These sizes are larger than we need, but it is good enough for now. Leave code/data in DDR
and stacks in IRAM. Click Ok to rebuild the platform package.
The system we now have is identical to one of the slides in the discussion material.
9. Wait, what about the MAR bits?
In the discussion material, we talked about the MAR bits specifying which regions were
cacheable and which were not. Don’t we have to set the MAR bits for the external region of
DDR for them to get cached? Yep.
In order to modify (or even SEE) the MAR bits OR use any BIOS Cache APIs (like invalidate
or writeback), you need to add the C64p Cache Module to your .cfg file. Or, you can
simply right-click (and Use) the Cache module listed under: Available Products → SYS/BIOS →
Target Specific Support → C674 → Cache (as shown in the discussion material).
Save the .cfg file. This SHOULD add the module to your outline view. When it shows up
in the outline view, click on it. Do you see the MAR bits?
The MAR region we are interested in, by the way, for DDR2 is MAR 192-223. As a courtesy
to users, the platform file already turned on the proper MAR bits for us for the DDR2 region.
Check it out:
The good news is that we don’t need to worry about the MAR bits for now.
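The MAR numbers quoted above follow directly from the address map. Each MAR covers 16MB of address space (see the quiz answers), so the MAR index is the address divided by 16MB. The sketch below assumes the DDR2 base of C000_0000 and the 512MB size listed elsewhere in this guide; the function names are hypothetical and nothing here touches real registers.

```c
#include <assert.h>
#include <stdint.h>

#define MAR_REGION_SHIFT 24u   /* 16MB = 2^24 bytes per MAR */

/* Index of the MAR that governs cacheability of 'addr'. */
static unsigned mar_index(uint32_t addr)
{
    return (unsigned)(addr >> MAR_REGION_SHIFT);
}

/* First and last MAR index covering [base, base + sizeBytes). */
static void mar_range(uint32_t base, uint32_t sizeBytes,
                      unsigned *first, unsigned *last)
{
    assert(sizeBytes > 0u);
    assert(first != 0 && last != 0);
    *first = mar_index(base);
    *last  = mar_index(base + (sizeBytes - 1u));
}
```

Calling `mar_range(0xC0000000u, 512u * 1024u * 1024u, &first, &last)` yields 192 and 223, which matches the MAR 192-223 range the platform file turns on for DDR2.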
10. Build, load, run using the Opt (duh) Configuration.
Run the program. View the CPU load graph and benchmark stat and write them down
below:
All Code/Data External (cache ON): _________ cycles
With code/data external AND the cache ON, the benchmark should be close to 8K cycles,
the SAME as running from internal IRAM (L2). In fact, what you’re seeing is the L1D/P
numbers. Why? Because L2 is cached in L1D/P, the closest memory to the CPU. This is
what a cache does for you, especially with this architecture.
Here’s what the author got:
11. What about cache coherency?
So, how does the audio sound with the buffers in DDR2 and the cache on? Shouldn’t we be
experiencing cache coherency problems with data in DDR2? Well, the audio sounds great, so
why bother? Think about this for a while. What is your explanation as to why there are NO
cache coherency problems in this lab?
Answer: _______________________________________________________________
12. Conclusion and Summary (long read, but worth it…)
It is amazing that you get the same benchmarks from all code/data in internal IRAM (L2) and
L1 cache turned on as you do with code/data external and L2/L1 cache turned on. In fact, if
you place the buffers DIRECTLY in L1D as SRAM, the benchmark is the same. How can this
be? That’s an efficient cache, eh? Just let the cache do its thing. Place your buffers in DDR2,
turn on the cache and move on to more important jobs.
Here’s another way to look at this. Cache is great for looping code (program, L1P) and
sequentially accessed data (e.g. buffers). However, cache is not as effective at random
access of variables. So, what would be a smart choice for part of L1D as SRAM? Coefficient
tables, algorithm tables, globals and statics that are accessed frequently, but randomly (not
sequential) and even frequently used ISRs (to avoid cache thrashing). The random data
items would most likely fall into the .bss compiler section. Keep that in mind as you design
your system.
Let’s look at the final results:
System                           Benchmark
Buffers in IRAM (internal)       8K cycles
All External (DDR2), cache OFF   ~4M
All External (DDR2), cache ON    8K cycles
Buffers in L1D SRAM              7K cycles
So, will you experience the same results? 150x improvement with cache on and not much
difference between internal memory only and external with cache on? Probably something
similar. The point here is that turning the cache ON is a good idea. It works well, and
little thinking is required unless you have peripherals hooked to external memory
(coherency). For what it is worth, you’ve seen the benefits in action and you know the issues
and techniques that are involved. Mission accomplished.
RAISE YOUR HAND and get the instructor’s attention
when you have completed PART A of this lab. If time
permits, move on to the next OPTIONAL part…
You’re finished with this lab. If time permits, you may move on to additional
“optional” steps on the following pages if they exist.
C6000 Embedded Design Workshop - Using EDMA3 15 - 1
Using EDMA3
Introduction
In this chapter, you will learn the basics of the EDMA3 peripheral. This transfer engine in the
C64x+ architecture can perform a wide variety of tasks within your system, from memory to
memory transfers to event synchronization with a peripheral and auto sorting data into separate
channels or buffers in memory. No programming is covered. For programming concepts, see
ACPY3/DMAN3, LLD (Low Level Driver, covered in the Appendix) or CSL (Chip Support
Library). Heck, you could even program it in assembly, but don’t call ME for help.
Objectives
At the conclusion of this module, you should be able to:
Understand the basic terminology related to EDMA3
Be able to describe how a transfer starts, how it is configured and what happens after the
transfer completes
Understand how EDMA3 interrupts are generated
Be able to easily read EDMA3 documentation and have a great context to work from to
program the EDMA3 in your application
Module Topics
Using EDMA3
Module Topics
Overview
  What is a “DMA” ?
  Multiple “DMAs”
  EDMA3 in C64x+ Device
Terminology
  Overview
  Element, Frame, Block ACNT, BCNT, CCNT
  Simple Example
  Channels and PARAM Sets
Examples
Synchronization
Indexing
Events – Transfers Actions
  Overview
  Triggers
  Actions – Transfer Complete Code
EDMA Interrupt Generation
Linking
Chaining
Channel Sorting
Architecture & Optimization
Programming EDMA3 Using Low Level Driver (LLD)
Chapter Quiz
  Quiz Answers
Additional Information
Notes
Overview
What is a “DMA” ?
Multiple “DMAs”
Multiple DMAs: EDMA3 and QDMA
[Slide: Master Periph, VPSS and the C64x+ DSP (L1P, L1D, L2) connected to EDMA3 (System DMA), which contains DMA (sync) and QDMA (async).]
DMA
  Enhanced DMA (version 3)
  DMA to/from peripherals
  Can be sync’d to peripheral events
  Handles up to 64 events
QDMA
  Quick DMA
  DMA between memory
  Async: must be started by CPU
  4-16 channels available
Both Share (number depends upon specific device)
  128-256 Parameter RAM sets (PARAMs)
  64 transfer complete flags
  2-4 Pending transfer queues
EDMA3 in C64x+ Device
SCR & EDMA3
[Slide: block diagram of the C64x+ MegaModule (CPU, L1P/L1D/L2 memory controllers, IDMA, AET) and EDMA3 (CC, TC0-TC2) connected through the DATA SCR and CFG SCR to masters and slaves such as EMAC, HPI, PCI, ARM, McASP, McBSP, DDR2/3, EMIF and L3. M = Master, S = Slave.]
EDMA3 is a master on the DATA SCR: it can initiate data transfers.
EDMA3’s configuration registers are accessed via the CFG SCR (by the CPU).
Each TC has its own connection (and priority) to the DATA SCR. Refer to the connection matrix to determine valid connections.
SCR = Switched Central Resource
PERIPH = All peripherals’ Cfg registers
Terminology
Overview
Element, Frame, Block ACNT, BCNT, CCNT
How Much to Move?
Element (# of contig bytes): “A” Count (Element Size)
Frame (Elem 1, Elem 2, ... Elem N): “B” Count (# Elements)
Block (Frame 1, Frame 2, ... Frame M): “C” Count (# Frames)
Transfer Configuration (bits 31-16 | bits 15-0):
  Options
  Source
  Transfer Count: B | A
  Destination
  Index | Index
  Cnt Reload | Link Addr
  Index | Index
  Rsvd | C
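The relationship between the three counts and the eight-word transfer configuration can be sketched in C. This is an illustration of the layout on the slide, not the CSL or LLD definition: the struct name, field names and helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Eight 32-bit words, in the order shown in the Transfer Configuration. */
typedef struct {
    uint32_t options;
    uint32_t source;
    uint32_t a_b_cnt;       /* B count in bits 31-16, A count in bits 15-0 */
    uint32_t destination;
    uint32_t b_index;       /* two 16-bit B indexes                        */
    uint32_t link_reload;   /* Cnt Reload in bits 31-16, Link Addr in 15-0 */
    uint32_t c_index;       /* two 16-bit C indexes                        */
    uint32_t c_cnt;         /* C count in bits 15-0, upper half reserved   */
} ParamSetSketch;

/* Pack two 16-bit fields into one 32-bit configuration word. */
static uint32_t pack_hi_lo(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | (uint32_t)lo;
}

/* Total bytes moved by one block:
 * ACNT bytes per element, BCNT elements per frame, CCNT frames. */
static uint64_t total_bytes(uint16_t acnt, uint16_t bcnt, uint16_t ccnt)
{
    assert(acnt > 0u && bcnt > 0u && ccnt > 0u);
    return (uint64_t)acnt * bcnt * ccnt;
}
```

For instance, 2-byte elements, 2 elements per frame and 4 frames move 16 bytes in total, and the transfer count word for that setup would be `pack_hi_lo(2, 2)`.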
Simple Example
Channels and PARAM Sets
Examples
Synchronization
Indexing
Events – Transfers Actions
Overview
Triggers
Actions – Transfer Complete Code
EDMA Interrupt Generation
Linking
Chaining
Channel Sorting
Architecture & Optimization
EDMA Architecture
[Slide: in the Channel Controller (CC), events E0, E1 ... E63 from the peripherals enter through the Evt Reg (ER) and Evt Enable Reg (EER), the Evt Set Reg (ESR) or the Chain Evt Reg (CER) and go to queues Q0-Q3. Each transfer is mapped to a PSET (PSET 0 ... PSET 255) and a TR is submitted to the Transfer Controllers (TC0-TC3) on the DATA SCR. Completion Detection (Early TCC or Normal TCC) feeds the Int Pending Reg (IPR) and Int Enable Reg (IER), which generate EDMAINT.]
EDMA consists of two parts: Channel Controller (CC) and Transfer Controller (TC).
An event (from periph-ER/EER, manual-ESR or via chaining-CER) sends the transfer
to 1 of 4 queues (Q0 is mapped to TC0, Q1 to TC1, etc. Note: McBSP can use TC1 only).
The transfer is mapped to 1 of 256 PSETs and submitted to the TC (1 TR, or transfer request, per ACNT
bytes or per ACNT*BCNT bytes, based on sync).
Note: Dst FIFO allows buffering of writes while more reads occur.
The TC performs the transfer (read/write) and then sends back a transfer completion code (TCC).
The EDMA can then interrupt the CPU and/or trigger another transfer (chaining, Chap 6).
SCR = Switched Central Resource
Programming EDMA3 Using Low Level Driver (LLD)
*** this page used to have very valuable information on it ***
Chapter Quiz
1. Name the 4 ways to trigger a transfer?
2. Compare/contrast linking and chaining
3. Fill out the following values for this channel sorting example (5 min):
16-bit stereo audio (interleaved)
Use EDMA to auto “channel sort” to memory
PERIPH: L0 R0 L1 R1 L2 R2 L3 R3
MEM (BUFSIZE): L0 L1 L2 L3 R0 R1 R2 R3
ACNT: _____
BCNT: _____
CCNT: _____
BIDX: _____
CIDX: _____
Could you calculate these ?
Quiz Answers
1. Name the 4 ways to trigger a transfer?
   Manual start, Event sync, chaining and (QDMA trigger word)
2. Compare/contrast linking and chaining
   linking: copy new configuration from existing PARAM (link field)
   chaining: completion of one channel triggers another (TCC) to start
3. Fill out the following values for this channel sorting example (5 min):
16-bit stereo audio (interleaved)
Use EDMA to auto “channel sort” to memory
PERIPH: L0 R0 L1 R1 L2 R2 L3 R3
MEM (BUFSIZE): L0 L1 L2 L3 R0 R1 R2 R3
ACNT: 2
BCNT: 2
CCNT: 4
BIDX: 8
CIDX: -6
Could you calculate these ?
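One way to check the channel sorting answers is to walk the destination address by hand, or with a small host-side simulation. The sketch below assumes A-synchronized indexing, where BIDX is applied between elements of a frame and CIDX is applied from the start of the last element of a frame to the start of the next frame. It is not EDMA3 driver code: the function name is hypothetical, and the peripheral is stood in for by a source pointer that simply advances one element per read.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulate the destination side of a 3D transfer with A-sync indexing.
 * src      : stand-in for the peripheral's stream of samples
 * dst      : destination buffer in memory
 * acnt     : bytes per element
 * bcnt     : elements per frame
 * ccnt     : frames per block
 * dstBidx  : byte offset between elements within a frame
 * dstCidx  : byte offset from the last element of a frame to the
 *            first element of the next frame */
static void edma_sort_sim(const uint8_t *src, uint8_t *dst,
                          unsigned acnt, unsigned bcnt, unsigned ccnt,
                          int dstBidx, int dstCidx)
{
    long     off = 0;   /* current destination byte offset */
    unsigned b, c;

    assert(acnt > 0u && bcnt > 0u && ccnt > 0u);
    for (c = 0; c < ccnt; c++) {
        for (b = 0; b < bcnt; b++) {
            memcpy(dst + off, src, acnt);   /* move one element */
            src += acnt;
            if (b + 1u < bcnt) {
                off += dstBidx;             /* next element in the frame */
            }
        }
        off += dstCidx;                     /* step to the next frame */
    }
}
```

With ACNT = 2, BCNT = 2, CCNT = 4, BIDX = 8 and CIDX = -6, the destination offsets visited are 0, 8, 2, 10, 4, 12, 6, 14. The left samples therefore land at bytes 0-7 and the right samples at bytes 8-15, which is the sorted layout shown in the quiz.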
Additional Information
Notes
C6000 Embedded Design Workshop - "Grab Bag" Topics 16 - 1
"Grab Bag" Topics
“Grab Bag” Explanation
Several other topics of interest remain. However, there is not enough time to cover them all. Most
topics take over an hour to complete, especially if the labs are done. Students can vote which
ones they’d like to see first, second, third in the remaining time available.
Shown below is the current list of topics. Vote for your favorite two and the instructor will tally the
results and make any final changes to the remaining agenda.
While all of these topics cannot be covered, the notes are in your student guide. So, at a
minimum, you have some reference material on the topics not covered live to take home with
you.
Topic Choices
C6000 Embedded Design Workshop - Intro to DSP/BIOS Grabbag 16a - 1
Intro to DSP/BIOS
Introduction
In this chapter an introduction to the general nature of real-time systems and the DSP/BIOS
operating system will be considered. Each of the concepts noted here will be studied in greater
depth in succeeding chapters.
Objectives
Grab bag chapter assumes students have already been through Intro to SYS/BIOS
Describe how to create a new BIOS project
Learn how to configure BIOS using TCF files
Lab 16a: Create and debug a simple DSP/BIOS application
Module Topics
Intro to DSP/BIOS
Module Topics
DSP/BIOS Overview
Threads and Scheduling
Real-Time Analysis Tools
DSP/BIOS Configuration Using TCF Files
Creating A DSP/BIOS Project
Memory Management Using the TCF File
Lab 16a: Intro to DSP/BIOS
  Lab 16a Procedure
    Create a New Project
    Add a New TCF File and Modify the Settings
    Build, Load, Play, Verify…
    Benchmark and Use Runtime Object Viewer (ROV)
Additional Information & Notes
Notes
DSP/BIOS Overview
Threads and Scheduling
Real-Time Analysis Tools
DSP/BIOS Configuration Using TCF Files
Creating A DSP/BIOS Project
Memory Management Using the TCF File
Remember ?
How do you define the memory segments (e.g. IRAM, FLASH, DDR2) ?
How do you place the sections into these memory segments ?
Sections: .text, .bss, .far, .cinit, .cio, .stack
Memory Segments:
  1180_0000  256K IRAM
  6400_0000  4MB FLASH
  C000_0000  512MB DDR2
How do we accomplish this with a .tcf file ?
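The memory segments on this slide amount to a small table of name, base address and length. As a rough host-side sketch of that idea (this is not TCF syntax), the table can be written in C with a lookup that reports which segment an address falls in. The pairing of each base address with its segment is an assumption read off the slide, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One memory segment: a name, a base address and a length in bytes. */
typedef struct {
    const char *name;
    uint32_t    base;
    uint32_t    size;
} MemSegment;

/* Assumed pairing of the addresses and sizes shown on the slide. */
static const MemSegment segments[] = {
    { "IRAM",  0x11800000u, 256u * 1024u          },
    { "FLASH", 0x64000000u, 4u * 1024u * 1024u    },
    { "DDR2",  0xC0000000u, 512u * 1024u * 1024u  },
};

/* Return the name of the segment containing 'addr', or NULL if none. */
static const char *segment_for(uint32_t addr)
{
    size_t i;

    for (i = 0; i < sizeof segments / sizeof segments[0]; i++) {
        assert(segments[i].size > 0u);
        /* Unsigned subtraction wraps for addr < base, so the result
         * is only below 'size' when addr lies inside the segment. */
        if (addr - segments[i].base < segments[i].size) {
            return segments[i].name;
        }
    }
    return NULL;
}
```

Placing a section such as .text or .far then comes down to choosing one of these segment names for it, which is what the memory settings in the configuration file express.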
Lab 16a: Intro to DSP/BIOS
Now that you’ve been through creating projects, building and running code, we now turn the page
to learn about how DSP/BIOS-based projects work. This lab, while quite simple in nature, will
help guide you through the steps of creating (possibly) your first BIOS project in CCSv4.
This lab will be used as a “seed” for future labs.
Application: blink USER LED_1 on the EVM every second
Key Ideas: main() returns to BIOS scheduler, IDL fxn runs to blink LED
What will you learn? .tcf file mgmt, IDL fxn creation/use, creation of BIOS
project, benchmarking code, ROV
Pseudo Code:
main(): init BSL, init LED, return to BIOS scheduler
ledToggle(): IDL fxn that toggles LED_1 on EVM
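The control flow in the pseudo code can be mimicked on a host machine to see the idea before touching the board. Everything below is a stand-in: `board_init()` replaces the BSL and LED initialization, a plain variable replaces LED_1, and `run_idle_loop()` replaces the BIOS idle loop that would call the IDL function after main() returns. None of these names come from the BSL or from DSP/BIOS, and the one-second delay is left out.

```c
#include <assert.h>
#include <stdbool.h>

static bool     led1_on      = false;  /* stand-in for USER LED_1     */
static unsigned toggle_count = 0;      /* how many times it toggled   */

/* Hypothetical stand-in for "init BSL, init LED". */
static void board_init(void)
{
    led1_on      = false;
    toggle_count = 0;
}

/* The idle function: toggles the LED each time it is called. */
static void ledToggle(void)
{
    led1_on = !led1_on;
    toggle_count++;
}

/* Stand-in for the scheduler's idle loop, which keeps calling the
 * registered idle function whenever nothing else is ready to run. */
static void run_idle_loop(unsigned passes)
{
    assert(passes > 0u);
    while (passes-- > 0u) {
        ledToggle();
    }
}
```

On the target, main() would call the real initialization routines and then simply return. The scheduler takes over from there, and ledToggle() runs because it is registered as an IDL function in the TCF file, not because main() calls it.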
Lab 16a Procedure
If you can’t remember how to perform some of these steps, please refer back to the previous labs
for help. Or, if you really get stuck, ask your neighbor. If you AND your neighbor are stuck, then
ask the instructor (who is probably doing absolutely NOTHING important) for help.
Create a New Project
1. Create a new project named “bios_led”.
Create your new project in the following directory:
C:\TI-RTOS\C6000\Labs\Lab16a\Project
When the following screen appears, make sure you click Next instead of Finish:
2. Choose a Project template.
This screen was brand new in CCSv4.2.2. And it is not intuitive to the casual observer that
the Next button above even exists. You see Finish, you click it. Ah, but the hidden secret is
the Next button. The CCS developers are actually trying to do us a favor, IF you understand
what a BIOS template is.
As you can see, there are many choices. Empty Projects are just that: empty, with just a path to
the include files for the selected processor. Go ahead and click on “Basic Examples” to see
what’s inside. Click on all the other + signs to see what they contain. Ok, enough playing
around. We are using BIOS 5.41.xx.xx in this workshop. So, the correct + sign to choose in
the end is the one that is highlighted above.
3. Choose the specific BIOS template for this workshop.
Next, you’ll see the following screen:
Select “Empty Example”. This will give us the paths to the BIOS include directories. The other
examples contain example code and .tcf files. NOW you can click Finish.
4. Add files to your project.
From the lab’s \Files directory, ADD the following files:
led.c, main.c, main.h
Open each and inspect them. They should be pretty self explanatory.
5. Link the LogicPD BSL library to your project as before.
6. Add an include path for the BSL library \inc directory.
Right-click on the project and select “Build Properties”. Select C6000 Compiler, then Include
Options (you’ve done this before). Add the proper path for the BSL include dir (else you will
get errors when you build).
At this point in time, what files are we missing? There are 3 of them. Can you name them?
______________ ______________ ______________
Add a New TCF File and Modify the Settings
7. Add a new TCF file.
As discussed earlier, you have several options available to you regarding the TCF file. In this
lab, we chose to use an EMPTY BIOS example from the project templates. Therefore, no
TCF file exists.
Referring back to the material in this chapter, create a NEW TCF file (File → New →
DSP/BIOS v5.x Config File). Name it: bios_led.tcf. When prompted to pick a platform
seed tcf file, type “evm6748” into the filter field and choose the tcf that pops up.
CCS should have placed your new TCF file in the project directory AND added it to your
project. Check to make sure both of these statements are true.
If the new TCF file did not open automatically when created, double-click on the new TCF file
(bios_led.tcf) to open it.
8. Create a HEAP in memory.
All BIOS projects need a heap. Why this doesn’t get created for you in the “seed” tcf file is a
good question. The fact that it doesn’t causes a heap full of troubles. If you ever get any
strange unexplainable errors when you build BIOS projects, check THIS first.
Open the TCF file (if it’s not already) and click on System. Right-click on MEM and select
Properties. The checkbox for “No Dynamic Heaps” is most likely not checked (because we
used an existing TCF file that had this selection as default).
UNCHECK this box (if not already done) to specify that you want a heap created. A warning
will bark at you that you haven’t defined a memory segment yet (no kidding). Just ignore the
warning and click OK. (Note: this warning probably won’t occur because we used an existing
TCF file).
Click the + next to MEM. This will display the “seed” TCF memory areas already defined.
Thank you.
Right-click IRAM and select properties.
Check the box that says “create a heap in this memory” (if not already checked) and
change the heap size to 4000h.
Click Ok.
Now that we HAVE a heap in IRAM (that’s another name for L2 by the way), we need to tell
the mother ship (MEM) where our heap is.
Right-click on MEM and select Properties. Click on both down arrows and select IRAM for
both (again, this is probably already done for you). Click OK. Now she’s happy…
Save the TCF file.
Note: FYI, throughout the labs, we will throw in the “top 10 or 20” tips that cause Debug
nightmares during development. Here’s your first one…
Hint: TIP #1: Always create a HEAP when working with BIOS projects.
Build, Load, Play, Verify
9. Ensure you have the proper target config file selected as Default.
10. Build your project.
Fix any errors that occur (and there will be some, just keep reading). You didn’t make
errors, did you? Of course you did. Remember when we said that ANY BIOS project needs
the cfg.h file included in one of the source files? Yep. And it was skipped on purpose to
drive the point home.
Open main.h for editing and add the following line as the FIRST include in main.h:
#include "bios_ledcfg.h"
Rebuild and see if the errors go away. They should. If you have more, then you really DO
need to debug something. If not, move on…
Hint: TIP #2: When using BIOS, always #include the cfg.h file in your application code as the
FIRST included header file.
11. Inspect the “generated” files resulting from our new TCF file.
In the project view, locate the following files and inspect them (actually, you’ll need to BUILD
the project before these show up):
bios_ledcfg.h
bios_ledcfg.cmd
There are other files that get generated by the existence of .tcf which we will cover in later
labs. The .cmd file is automatically added to your project as a source file. However, your code
must #include the cfg.h file or the compiler will think all the BIOS stuff is “declared implicitly”.
12. Debug and “Play” your code.
Click the Debug “Bug”; this is equivalent to “Debug Active Project”. Remember, this code
blinks LED_1 near the bottom of the board. When you Play your code and the LED blinks,
you’re done.
When the execution arrow reaches main(), hit “Play”. Does the LED blink?
No? What is going on?
Think back to the scheduling diagram and our discussions. To turn BIOS ON, what is the
most important requirement? main() must RETURN or fall out via a brace }. Check
main.c and see if this is true. Many users still have while() loops in their code and
wonder why BIOS isn’t working. If you never return from main(), BIOS will never run.
Hint: TIP #3: BIOS will NOT run if you don’t exit main().
Ok, so no funny tricks there - that checks out.
Next question: how is the function ledToggle() getting called? Was it called in main()?
Hmmm. Who is supposed to call ledToggle()?
When your code returns from main(), where does it go? The BIOS scheduler. And,
according to our scheduling diagram and the threads we have in the system, which THREAD
will the scheduler run when it returns from main()?
Can you explain what needs to be done? ________________________________________
13. Add IDL object to your TCF.
The answer is: the scheduler will run the IDL thread when nothing else exists. All other thread
types are higher priority. So, how do you make the IDL thread call ledToggle()?
Simple. Add an IDL object and point it to our function.
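The idea that main() must return and that the idle thread then calls your function can be sketched on a host PC. This is a minimal mock, not the DSP/BIOS implementation: idleRegister(), appMain() and schedulerRun() are hypothetical names that stand in for the IDL object in the TCF, the lab's main(), and the BIOS scheduler.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the BIOS idle machinery; not the DSP/BIOS API. */
typedef void (*IdleFxn)(void);

#define MAX_IDLE_FXNS 4

static IdleFxn idleTable[MAX_IDLE_FXNS];
static size_t idleCount = 0;
static unsigned ledState = 0;       /* stands in for LED_1 */
static unsigned toggleCount = 0;

/* Conceptually what the lab's ledToggle() does: flip the LED on each call. */
void ledToggle(void)
{
    ledState ^= 1u;
    toggleCount++;
}

/* Rough equivalent of inserting an IDL object and naming its function. */
int idleRegister(IdleFxn fxn)
{
    if (fxn == NULL || idleCount >= MAX_IDLE_FXNS) {
        return -1;
    }
    idleTable[idleCount++] = fxn;
    return 0;
}

/* The application "main": initialize, then RETURN so the scheduler can run. */
void appMain(void)
{
    ledState = 0;
}

/* Mock scheduler: with nothing of higher priority ready, call every idle
 * function in turn. The real idle loop never ends; 'passes' bounds it here
 * so the sketch terminates. */
void schedulerRun(unsigned passes)
{
    unsigned p;
    size_t i;

    for (p = 0; p < passes; p++) {
        for (i = 0; i < idleCount; i++) {
            idleTable[i]();
        }
    }
}
```

If appMain() contained a while() loop that never exited, schedulerRun() would never be reached, which is the situation TIP #3 warns about.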
Open the TCF file and click on Scheduling. Right-click on IDL and select “Insert IDL”. Name
the IDL object “IDL_ledToggle”.
Now that we have the object, we need to tell the object what to do, i.e. which fxn to run.
Right-click on IDL_ledToggle and select Properties. You’ll notice a spot to type in the function
name.
Ok, make room for another important tip. BIOS is written in ASSEMBLY. The ledToggle()
function is written in C. How does the compiler distinguish between an assembly label or
symbol and a C label? The magic underscore “_”. All C symbols and labels (from an
assembly point of view) are preceded with an underscore.
Hint: TIP #4: When entering a fxn name into BIOS objects, precede the name with an
underscore “_”. Otherwise you will get a symbol referencing error which is difficult to
locate.
So, the fxn name you type in here must be preceded by an underscore: _ledToggle
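The naming rule in TIP #4 can be written down as a tiny helper. This is only an illustration of the convention described above (one leading underscore on C symbols as seen from assembly); cToAsmSymbol() is a hypothetical name and is not part of BIOS or the compiler tools.

```c
#include <stddef.h>
#include <string.h>

/* Build the assembly-level name for a C function by prepending one
 * underscore, as described in TIP #4.
 * Returns 0 on success, -1 on empty input or if 'out' is too small. */
int cToAsmSymbol(const char *cName, char *out, size_t outSize)
{
    size_t len;

    if (cName == NULL || out == NULL) {
        return -1;
    }
    len = strlen(cName);
    if (len == 0 || len + 2 > outSize) {   /* '_' + name + terminating NUL */
        return -1;
    }
    out[0] = '_';
    memcpy(out + 1, cName, len + 1);       /* copies the NUL as well */
    return 0;
}
```

For the function in this lab, passing "ledToggle" yields "_ledToggle", which is the text to type into the IDL object's function field.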
You have now created an IDL object that is associated with a fxn. By the way, when you
create HWI, SWI and TSK objects later on, guess what? It is the SAME procedure. You’ll get
sick of this by the end of the week: right-click, insert, rename, right-click and select
Properties, type some stuff. There, that is DSP/BIOS in a nutshell.
14. Build and Debug AGAIN.
When the execution arrow hits main(), click “Play”. You should now see the LED blinking. If
you ever HALT/PAUSE, it will probably pause inside a library fxn that has no source
associated with it. Just X that thing.
At this point, your first BIOS project is working. Do NOT “terminate all” yet. Simply click on the
C/C++ perspective and move on to a few more goodies…
Benchmark and Use Runtime Object Viewer (ROV)
15. Benchmark LED BSL call.
So, how long does it take to toggle an LED? 10, 20, 50 instruction cycles? Well, you would be
off by several orders of magnitude. So, let’s use the CLK module in BIOS to determine how
long the LED_toggle() BSL call takes.
This same procedure can be used quickly and effectively to benchmark any area in code and
then display the results either via a local variable (our first try) or via another BIOS module
called LOG (our 2nd try).
BIOS uses a hardware timer for all sorts of things which we will investigate in different labs.
The high-resolution time count can be accessed through a call to CLK_gethtime() API.
Let’s use it…
Open led.c for editing.
Allocate three new variables: start, finish and time. First, we’ll get the CLK value
just before the BSL call and then again just after. Subtract the two numbers and you have a
benchmark called time. This will show up as a local variable when we use a breakpoint to
pause execution.
Your new code in led.c should look something like this:
Don’t type in the call to LOG_printf() just yet. We’ll do that in a few moments…
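The start/finish/time pattern described in this step can be sketched as follows. It is a host-side approximation so that it compiles without the target headers: CLK_gethtime_stub() and LED_toggle_stub() are hypothetical stand-ins for the BIOS CLK_gethtime() call and the BSL LED call, and the 1000-count cost is made up. On the EVM the real calls replace the stubs.

```c
#include <stdint.h>

/* Fake high-resolution counter, started near wrap-around on purpose. */
static uint32_t fakeTimer = 0xFFFFFF00u;

/* Stand-in for CLK_gethtime(): returns the current timer count. */
uint32_t CLK_gethtime_stub(void)
{
    return fakeTimer;
}

/* Stand-in for the BSL LED call: pretend it costs 1000 timer counts. */
void LED_toggle_stub(void)
{
    fakeTimer += 1000u;
}

/* Benchmark pattern from the lab: timestamp, call, timestamp, subtract. */
uint32_t benchmarkLedToggle(void)
{
    uint32_t start, finish, time;

    start  = CLK_gethtime_stub();
    LED_toggle_stub();
    finish = CLK_gethtime_stub();

    /* Unsigned subtraction stays correct across a single counter wrap. */
    time = finish - start;
    return time;
}
```

Keeping all three variables unsigned and of the counter's width is the design choice that matters: the difference is then right even when the timer rolls over between the two reads.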
16. Build, Debug, Play.
When finished, build your project; it should auto-download to the EVM. Switch to the Debug
perspective and set a breakpoint as shown in the previous diagram. Click “Play”.
When the code stops at your breakpoint, select View → Local. Here’s the picture of what that
will look like:
Are you serious? 1.57M CPU cycles. Of course. This mostly has to do with going through I2C
and a PLD and waiting forever for acknowledge signals (can anyone say “BUS HOLD”?).
Also, don’t forget we’re using the “Debug” build configuration with no optimization. More on
that later. Nonetheless, we have our benchmark.
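To get a feel for what 1.57M cycles means in wall-clock time, the count can be converted with a one-line formula. This is a sketch under assumptions: cyclesToMicroseconds() is a hypothetical helper, and the CPU clock must be whatever your PLL setup actually produces (the 300 MHz used in the usage note is only an illustrative value).

```c
#include <stdint.h>

/* Convert a CPU cycle count to whole microseconds for a given CPU clock.
 * Returns 0 if cpuHz is 0 to avoid dividing by zero. */
uint64_t cyclesToMicroseconds(uint64_t cycles, uint64_t cpuHz)
{
    if (cpuHz == 0) {
        return 0;
    }
    return (cycles * 1000000u) / cpuHz;
}
```

With 1,570,000 cycles and an assumed 300 MHz clock this returns 5233, i.e. a little over 5 ms for a single LED toggle.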
17. Open up TWO .tcf files: is this a problem?
The author has found a major “uh oh” that you need to be aware of. Open your .tcf file and
keep it open. Double-click on the project’s TCF file AGAIN. Another “instance” of this window
opens. Nuts. If you change one and save the other, what happens? Oops. So, we
recommend you NOT minimize TCF windows and then forget you already have one open
and open another. Just BEWARE…
18. Add LOG Object and LOG_printf() API to display benchmark.
Open led.c for editing and add the LOG_printf() statement as shown in a previous
diagram.
Open the TCF for editing. Under Instrumentation, add a new LOG object named “trace”.
Remember? Right-click on LOG, insert log, rename to trace, click OK.
Save the TCF.
19. Pop over to Windows Explorer and analyse the \Project folder.
Remember when we said that another folder would be created if you were using BIOS? It
was called .gconf. This is the GRAPHICAL config tool in action that is fed by the .cdb file.
When you add a .tcf file, the graphical and textual tools must both exist and follow each
other. Go check it out. Is it there? Ok…back to the action…
20. Build, Debug, Play: use ROV.
When the code loads, remove the breakpoint in led.c. Then, click Play. PAUSE the execution
after about 5 seconds. Open the ROV tool via Tools → ROV. When ROV opens, select LOG
and one of the sequence numbers like 2 or 3:
Notice the result of the LOG_printf() under “message”. You can choose other sequence
numbers and see what their times were.
You can also choose to see the LOG messages via Tools → RTA → Printf Logs. Try that now and
see what you get. If you’d like to change the behaviour of the LOGging, go back to the LOG
object and try a bigger buffer, circular (last N samples) or fixed (first N samples). Experiment
away…
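The difference between a fixed and a circular log buffer can be illustrated with a few lines of host-side C. This is a generic sketch of the two policies, not the BIOS LOG implementation; DemoLog, the DEMO_ names and the capacity of 4 records are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_LOG_CAPACITY 4   /* hypothetical buffer length in records */

typedef enum { DEMO_LOG_FIXED, DEMO_LOG_CIRCULAR } DemoLogMode;

typedef struct {
    uint32_t    records[DEMO_LOG_CAPACITY];
    size_t      count;    /* records currently held (<= capacity) */
    size_t      next;     /* index of the next slot to write */
    DemoLogMode mode;
} DemoLog;

void demoLogInit(DemoLog *log, DemoLogMode mode)
{
    log->count = 0;
    log->next = 0;
    log->mode = mode;
}

/* Fixed: stop recording once full (keeps the FIRST N samples).
 * Circular: overwrite the oldest slot (keeps the LAST N samples). */
void demoLogWrite(DemoLog *log, uint32_t value)
{
    if (log->count == DEMO_LOG_CAPACITY && log->mode == DEMO_LOG_FIXED) {
        return;
    }
    log->records[log->next] = value;
    log->next = (log->next + 1) % DEMO_LOG_CAPACITY;
    if (log->count < DEMO_LOG_CAPACITY) {
        log->count++;
    }
}

/* Oldest record still held. Caller must have written at least one record. */
uint32_t demoLogOldest(const DemoLog *log)
{
    size_t oldest = (log->count < DEMO_LOG_CAPACITY) ? 0 : log->next;
    return log->records[oldest];
}
```

Writing the values 1 through 6 into each kind of log leaves 1, 2, 3, 4 in the fixed one and 3, 4, 5, 6 in the circular one, which is the trade-off to keep in mind when choosing the LOG object's buffer type.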
When we move on to a TSK-based system, the ROV will come in very handy. This tool
actually replaced the older KOV (kernel object viewer) in the previous CCS. Also, in future
labs, we’ll use the RTA (Real-time Analysis) tools to view Printf logs directly. By then, you’ll
know two different ways to access debug info.
Note: Explain this to me: the tool is called ROV, which stands for RUNTIME Object Viewer.
But the only way to VIEW the OBJECT is in STOP time. Hmmm. Marketing? Illegal drug
use? Ok, so it “collects” the data during runtime…but still…to the author, this is a stretch
and confuses new users. Ah, but now you know the “rest of the story”…
Terminate the Debug Session and close the project.
You’re finished with this lab. Please raise your hand and let the
instructor know you are finished with this lab (maybe throw something
heavy at them to get their attention or say “CCS crashed AGAIN!”;
that will get them running…)
C6000 Embedded Design Workshop - Booting From Flash GrabBag 16b - 1
Booting From Flash
Introduction
In this chapter the steps required to migrate code from being loaded and run via CCS to running
autonomously in flash will be considered. Given the AISgen and SPIWriter tools, this is a simple
process that is desired toward the end of the design cycle.
Objectives
Compare/contrast the startup events of CCS (GEL) vs. booting from flash
Describe how to use AISgen and SPI Flash Writer utilities to create and burn a flash boot image
Lab 16b: Convert the keystone lab to a bootable flash image, POR, run
Module Topics
Booting From Flash
  Boot Modes Overview
  System Startup
  Init Files
  AISgen Conversion
  Build Process
  SPIWriter Utility (Flash Programmer)
  ARM + DSP Boot
  Additional Info…
  C6748 Boot Modes (S7, DIP_x)
Lab 16b: Booting From Flash
  Lab16b Booting From Flash - Procedure
    Tools Download and Setup (Students: SKIP STEPS 1-6 !!)
    Build Keystone Project: [Src → .OUT File]
    Use AISgen To Convert [.OUT → .BIN]
    Program the Flash: [.BIN → SPI1 Flash]
    Optional DDR Usage
Additional Information
Notes
Booting From Flash
Boot Modes Overview
System Startup
Init Files
AISgen Conversion
Build Process
SPIWriter Utility (Flash Programmer)
ARM + DSP Boot
Additional Info…
C6748 Boot Modes (S7, DIP_x)
[Figure: Flash Pin Settings, C6748 EVM. DIP switch SW7 (positions 8 to 1) shown for EMU MODE and for SPI BOOT. Switch labels: BOOT[4], BOOT[3], BOOT[2], BOOT[1], I/O (1.8/3.3), NC, Audio EN, LCD EN. Default = SPI BOOT.]
Lab 16b: Booting From Flash
In this lab, a .out file will be loaded to the on-board flash memory so that the program may be run
when the board is powered up, with no connection to CCS.
Any lab solution would work for this lab, but again we’ll standardize on the “keystone” lab so that
we ensure a known quantity.
Lab 16b: Booting From Flash - Procedure
Hint: This lab procedure will work with either the C6748 SOM or OMAP-L138 SOM. The basic
procedure is the same but a few steps are VERY different. These will be noted clearly in
this document. So, please pay attention to the HINTS and grey boxes like this one along
the way.
Tools Download and Setup (Students: SKIP STEPS 1-6 !!)
The following steps in THIS SECTION ONLY have already been performed. So,
workshop attendees can skip to the next section. These steps are provided in order to show
exactly where and how the flash/boot environment was set up (for future reference).
1. Download AISgen utility SPRAB41c.
Download the pdf file from here:
http://focus.ti.com/dsp/docs/litabsmultiplefilelist.tsp?docCategoryId=1&familyId=1621&literatureNumber=sprab41c&sectionId=3&tabId=409
A screen cap of the pdf file is here:
The contents of this zip are shown here:
2. Create directories to hold tools and projects.
Four directories need to be created:
C:\TI-RTOS\C6000\Labs\Lab16b_keystone will contain the audio project
(keystone) to build into a .OUT file.
C:\TI-RTOS\C6000\Labs\Lab13b_ARM_Boot will contain the ARM boot
code required to start up the DSP after booting.
C:\TI-RTOS\C6000\Labs\Lab13b_SPIWriter will contain the SPIWriter.out
file used to program the flash on the EVM.
C:\TI-RTOS\C6000\Labs\Lab13b_AIS contains the AISgen.exe file (shown
above) and is where the resulting AIS script (bin) will be located after running the utility
(.OUT → .BIN).
Place the “keystone” files into the \Lab16b_keystone\Files directory. Users will build a
new project to get their .OUT file.
Place the recently downloaded AISgen.exe file into \Lab16b_AIS directory.
3. Download SPI Flash Utilities.
You can find the SPI Flash Utility here:
http://processors.wiki.ti.com/index.php/Serial_Boot_and_Flash_Loading_Utility_for_OMAP-L138
This is actually a TI wiki page:
From here, locate the following and click “here” to go to the download page:
This will take you to a SourceForge site that will contain the tools you need to download.
Click on the latest version under OMAP-L138 and download the tar.gz file. UnTAR the
contents and you’ll see this:
The path we need is \OMAP-L138. If we dive down a bit, we will find the SPIWriter.out
file that is used to program the flash with our boot image (.bin).
4. Copy the SPIWriter.out file to \Lab13b_SPIWriter\ directory.
Shown below is the initial contents of the Flash Utility download:
Copy the following file to the \Lab13b_SPIWriter\ directory:
SPIWriter_OMAP-L138.out
5. Install AISgen.
Find the download of the AISgen.exe file and double-click it to install. After installation, copy a
shortcut to the desktop for this program:
6. Create the keystone project.
Create a new CCSv5 SYS/BIOS project with the source files listed in
C:\SYSBIOSv4\Lab13b_keystone\Files. Create this project in the neighboring
\Project folder. Also, don’t forget to add the BSL library and BSL includes (as normal)
Make sure you use the RELEASE configuration only.
Hint: [workshop students: START HERE]
Build Keystone Project: [Src → .OUT File]
7. Import keystone audio project and make a few changes.
Import the “keystone_flash” project from the following directory:
C:\TI-RTOS\C6000\Labs\Lab16b_keystone\Project
This project was built for emulation with CCSv5, i.e., there is a GEL file that sets up our PLL,
DDR2, etc. This is actually the SOLUTION to the clk_rta_audio lab (with the platform file
set to all data/code INTERNAL). In creating a boot image, as discussed in the chapter, we
have to perform these actions in code vs. the GEL creating this nice environment for us.
So, we have a choice here: write code that runs in main to set up PLL0, PLL1, DDR, etc.
OR have the bootloader do it FOR US. Having the bootloader perform these actions offers
several advantages: fewer mistakes by human programmers AND these settings are done
at bootload time vs. waiting all the way until main() for the settings to take effect.
Hint: The following step is for OMAP-L138 SOM Users ONLY !!
8. Set address of reset vector for DSP
Here is one of the “tricks” that must be employed when using both the ARM and DSP. The
ARM code has to know the entry point (reset vector, c_int00) of the DSP. Well, if you just
compile and link, it could go anywhere in L2. If your class is based on SYS/BIOS, please
follow those instructions. If you’re interested in how this is done with DSP/BIOS, that solution
is also provided for your reference.
SYS/BIOS Users must add two lines of script code to the CFG file as shown. This script
forces the reset vector address for the DSP to 0x11830000. Locate this in the given .cfg file
and UNCOMMENT these two lines of code.
DSP/BIOS Users must create a linker.cmd file as shown below to force the address of
the reset vector. This little command file specifies EXACTLY where the .boot section should
go for a BIOS project (this is not necessary for a non-BIOS program).
9. Examine the platform file.
In the previous step, we told the tools to place the DSP reset vector specifically at address
0x11830000. This is the upper 64K of the 256K block of L2 RAM. One of our labs in the
workshop specified L2 cache as 64K. Guess what? If that setting is still true, L2 cache
effectively starts at the same address, which means that this address is NOT available for the
reset vector. WHOOPS.
Select Build Options and determine WHICH platform file is associated with this project. Once
you have determined which platform it is, open it and examine it. Make sure L2 cache is
turned off or ZERO and that all code/data/stack segments are allocated in IRAM. If this is
not true, then “make it so”.
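The clash between the reset vector and a 64K L2 cache is simple address arithmetic, sketched below. The base address 0x11800000 and the 256K size come from the text of this lab; addrInL2Cache() is a hypothetical helper, and it assumes (as this step describes) that cache is carved out of the top of L2.

```c
#include <stdint.h>

#define L2_BASE  0x11800000u          /* start of DSP L2 RAM, per the lab text */
#define L2_SIZE  (256u * 1024u)       /* 256K block of L2 */

/* Returns 1 if 'addr' lies in the part of L2 given over to cache, 0 if not,
 * and -1 if the requested cache size is larger than L2 itself.
 * Assumes the cache occupies the TOP of the L2 block. */
int addrInL2Cache(uint32_t addr, uint32_t cacheBytes)
{
    uint32_t l2End = L2_BASE + L2_SIZE;    /* one past the last L2 byte */
    uint32_t cacheStart;

    if (cacheBytes > L2_SIZE) {
        return -1;
    }
    cacheStart = l2End - cacheBytes;
    return (addr >= cacheStart && addr < l2End) ? 1 : 0;
}
```

With 64K of cache the cached region starts at 0x11830000, exactly where the reset vector was placed; with cache set to zero the same address is ordinary RAM.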
10. Build the keystone project.
Update all tools for XDC, BIOS, UIA. Kill Agent. Update Compiler. Basically, update
everything to your latest tool set to get rid of errors and warnings.
Using the DEBUG build configuration, build the project. This should create the .OUT file. Go
check the \Debug directory and locate the .OUT file:
keystone_flash.out
Load the .OUT file and make sure it executes properly. We don’t want to flash something that
isn’t working.
Do not close the Debug session yet.
11. Determine silicon rev of the device you are currently using.
AISgen will want to know which silicon rev you are using. Well, you can either attempt to read
it off the device itself (which is nearly impossible) or you can visit a convenient place in
memory to see it.
Now that you have the Debug perspective open, this should be relatively straightforward.
Open a memory view window and type in the following address:
0x11700000
Can you see it? No? Shame on you. Ok. Try changing the style view to “Character” instead.
See something different?
Like this?
That says “d800k002”, which per the list below means Rev 1.0 silicon. That’s an older rev…but whatever
yours is…write it down below:
Silicon REV: ____________________
FYI: for OMAP-L138 (and C6748), note the following:
d800k002 = Rev 1.0 silicon (common, but old)
d800k004 = Rev 1.1 silicon (fairly common)
d800k006 = Rev 2.0 silicon (if you have a newer board, this is the latest)
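The list above is a plain lookup table, which can be captured in a few lines if you ever want to decode the ID programmatically. This is a host-side sketch: siliconRevFromId() is a hypothetical helper, it only knows the three IDs listed in this lab, and reading the 8 characters from 0x11700000 on the target is left out.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *romId;       /* 8-character ID string seen in memory */
    const char *siliconRev;  /* revision as listed in this lab */
} RevEntry;

static const RevEntry revTable[] = {
    { "d800k002", "1.0" },
    { "d800k004", "1.1" },
    { "d800k006", "2.0" },
};

/* Look up the silicon revision for an ID string.
 * Returns NULL for an ID that is not in the lab's list. */
const char *siliconRevFromId(const char *romId)
{
    size_t i;

    if (romId == NULL) {
        return NULL;
    }
    for (i = 0; i < sizeof revTable / sizeof revTable[0]; i++) {
        if (strncmp(romId, revTable[i].romId, 8) == 0) {
            return revTable[i].siliconRev;
        }
    }
    return NULL;
}
```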
There ARE some differences between Rev1 and Rev2 silicon that we’ll mention later in this
lab; they are very important in terms of how the ARM code is written.
You will probably NEVER need to change the memory view to “Character” ever again, so
enjoy the moment.
Next, we need to convert this .out file and combine it with the ARM .out file and create a
single flash image for both using the AIS script via AISgen…
12. Use the Debug GEL script to locate the Silicon Rev.
This script can be run at any time to debug the state of your silicon and all of the important
registers and frequencies your device is running at. This file works for both OMAP-L137/8
and C6747/8 devices. It is a great script to provide feedback for your hardware engineer.
It goes kind of like this: we want a certain frequency for PLL1. We read the documentation
and determine that these registers need to be programmed to a, b and c. You write the code,
program them and then build/run. Well, is PLL1 set to the frequency you thought it should
be? Run the debug script and find out what the processor is “reporting” the setting is. Nice.
This script outputs its results to the Console window.
Let’s use the debug script to determine the silicon rev as in the previous step.
First, we need to LOAD the gel file. This file can be downloaded from the wiki shown in the
chapter. We have already done that for you and placed that GEL file in the \gel directory next
to the GEL file you’ve been using for CCS.
Select Tools → GEL Files.
Right-click in the empty area under the currently loaded GEL file and select: Load Gel.
The \gel directory should show up and the file OMAPL1x_debug.gel should be listed. If not,
browse to C:\SYSBIOSv4\Labs\DSP_BSL\gel.
Click Open.
This will load the new GEL file and place the scripts under the “Scripts” menu.
Select “Scripts” → Diagnostics → Run All:
You can choose to run only a specific script or “All” of them. Notice the output in the Console
window. Scroll up and find the silicon revision. Also make note of all of the registers and
settings this GEL file reports. Quite extensive.
Does your report show the same rev as you found in the previous step? Let’s hope so…
Write down the Si Rev again here:
Silicon Rev (again): ______________________
Use AISgen To Convert [.OUT → .BIN]
AISgen (Application Image Script Generator) is a free downloadable tool from TI; check out
the beginning of this lab for the links to get this tool.
13. Locate AISgen.exe (only if requiring installation…if not, see next step).
The installation file has already been downloaded for you and is sitting in the following
directory:
C:\SYSBIOSv4\Labs\Lab13b_AIS
Here, you will find the following install file:
This is the INSTALL file (fyi). You don’t need to use this if the tool is already installed on your
computer…
14. Run AISgen.
There should be an icon on your desktop that looks like this:
If not, you will need to install the tool by double-clicking on the install file, installing it and then
creating a shortcut to it on the desktop (you’ll find it in Programs → Texas Instruments →
AISgen).
Double-click on the icon to launch AISgen and fill out the dialogue box as shown on the next
page…there are several settings you need…so be careful and go SLOWLY here…
It is usually BEST to place all of your PLL and DDR settings in the flash image and have the
bootloader set these up vs. running code on the DSP to do it. Why? Because the DSP then
comes out of reset READY to go at the top speeds vs. running “slow” until your code in
main() is run. So, that’s what we plan to do….
Note: Each dialogue has its own section below. It is quite a bit of setup…but hey, you are
enabling the bootloader to set up your entire system. This is good stuff…but it takes
some work…
Hint: When you actually use the DSP to burn the flash in a later step, the location you store
your .bin file to (name of the .bin file AND the directory path you place the .bin file in)
CANNOT have ANY SPACES IN THE PATH OR FILENAME.
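A quick way to guard against the spaces problem is to check the output path before you start. The helper below is a minimal sketch; pathIsFlashSafe() is a hypothetical name, and the paths in the usage note are made-up examples, not directories from this lab.

```c
#include <stddef.h>

/* Returns 1 if the path is non-empty and contains no space characters,
 * otherwise 0. */
int pathIsFlashSafe(const char *path)
{
    size_t i;

    if (path == NULL || path[0] == '\0') {
        return 0;
    }
    for (i = 0; path[i] != '\0'; i++) {
        if (path[i] == ' ') {
            return 0;
        }
    }
    return 1;
}
```

For example, a path such as C:\Labs\boot.bin passes the check, while C:\My Labs\boot.bin does not.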
Main dialogue: basic settings.
Fill out the following on this page:
Device Type (match it up with what you determined before)
For OMAP-L138 SOM (ARM + DSP), choose “ARM”. If you’re using the 6748 SOM,
choose “DSP”.
Boot Mode: SPI1 Flash. On the OMAP-L138, the SPI1 port and UART2 ports are
connected to the flash.
For now, wait on filling in the Application and Output files.
Hint: For C6748 SOM, choose “DSP” as the Device type
Hint: For OMAP-L138 SOM, choose “ARM” as the Device type
Note: you will type in these paths in a future step; do NOT do it now
Configure PLL0, PLL0 Tab
On the “General” tab, check the box for “Configure PLL0” as shown:
Then click on the PLL0 tab and view these settings. You will see the defaults show up. Make
the following modifications as shown below.
Change the multiplier value from 20 to 25 and notice the values in the bottom RH corner
change.
Peripheral Tab
Next, click on the Peripheral tab. This is where you will set the SPI Clock. It is a function
(divide down) from the CPU clock. If you leave it at 1MHz, well, it will work, but the bootload
will take WAY longer. So, this is a “speed up” enhancement.
Type “20” into the SPI Clock field as shown:
Also check the “Enable Sequential Read” checkbox. Why is this important? Speed of the boot
load. If this box is unchecked, the ROM code will send out a read command (0x03) plus a 24-
bit address before every single BYTE. That is a TON of read commands.
However, if we CHECK this box, the ROM code will send out a single 24-bit address
(0x000000) and then proceed to read out the ENTIRE boot image. WAY WAY faster.
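The speed-up from sequential read and from a faster SPI clock can be estimated with simple arithmetic. The sketch below counts bytes clocked over SPI under two assumptions that are not stated in the lab: in sequential mode the command and address are sent once, and there are no dummy bytes or gaps between transfers. Both function names are hypothetical.

```c
#include <stdint.h>

#define SPI_CMD_BYTES   1u   /* read command (0x03) */
#define SPI_ADDR_BYTES  3u   /* 24-bit address */

/* Bytes clocked over SPI to fetch an image of 'imageBytes' bytes. */
uint64_t spiBytesClocked(uint64_t imageBytes, int sequentialRead)
{
    uint64_t header = SPI_CMD_BYTES + SPI_ADDR_BYTES;

    if (sequentialRead) {
        return header + imageBytes;        /* one header, then stream data */
    }
    return imageBytes * (header + 1u);     /* header before every data byte */
}

/* Transfer time in whole microseconds at a given SPI clock
 * (8 clocks per byte). Returns 0 if spiHz is 0. */
uint64_t spiTransferMicros(uint64_t bytesClocked, uint64_t spiHz)
{
    if (spiHz == 0) {
        return 0;
    }
    return (bytesClocked * 8u * 1000000u) / spiHz;
}
```

For a 1000-byte image, byte-at-a-time reads clock 5000 bytes (about 40 ms at 1 MHz), while a sequential read clocks 1004 bytes (about 0.4 ms at 20 MHz), which is why both settings are worth changing.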
Configure PLL1
Just in case you EVER want to put code or data into the DDR, PLL1 needs to be set in the
flash image and therefore configured by the bootloader.
So, click the checkbox next to “Configure PLL1”, click on that tab, and use the following
settings:
This will clock the DDR at 300MHz. This is equivalent to what our GEL file sets the DDR
frequency to. We don’t have any code in DDR at the moment, but now we have it set up just
in case we ever do later on. Now, we need to write values to the DDR config registers…
Configure DDR
You know the drill. Click the proper checkbox on the main dialogue page and click on the
DDR tab. Fill in the following values as shown. If you want to know what each of the values
on the right means, look it up in the datasheet.
Configure PSC0, PSC0 Tab
Next, we need to configure the Local Power Sleep Controller (LPSC) to allow the ARM to write
to the DSP’s L2 memory. If both the ARM and DSP code resided in L3, well, the ARM
bootloader could then easily write to L3. But, with a BIOS program, BIOS wants to live in L2
DSP memory (around 0x11800000). In order for the ARM bootloader code to write to this
address, we need to have the DSP clocks powered up. Enabling PSC0 does this for us.
On the main page, “check” the box next to “Configure PSC” and go to the PSC tab.
In the GEL file we’ve been using in the workshop, a function named
PSC_All_On_Full_EVM() runs to set all the PSC values. We could cheat and just type in
“15” as shown below:
Minimum Setting (don’t use this for the lab):
This would Enable module 15 of the PSC which says “de-assert the reset on the DSP
megamodule” and enable the clocks so that the ARM can write to the DSP memory located in
L2. However, this setting does NOT match what the GEL file did for us. So, we need to
enable MORE of the PSC modules so that we match the GEL file.
Note: When doing this for your own system, you’ll need to pick and choose the PSC modules
that are important to your specific system.
Better Setting (USE THIS ONE for the lab, or as a starting point for your own system)
The numbers scroll out of sight, so here are the values:
PSC0: 0;1;2;3;4;5;9;10;11;12;13;15
PSC1: 0;1;2;3;4;5;6;7;9;10;11;12;13;14;15;16;17;18;19;20;21;24;25;26;27;28;29;30;31
Note: PSC1 is MISSING modules 8, 22-23 (see datasheet for more details on these).
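One way to double-check a module list before typing it into AISgen is to turn it into a bit mask and test individual modules. The sketch below is only an illustration: psc_list_to_mask() and psc_module_enabled() are hypothetical helper names, not part of AISgen or any TI library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Convert a semicolon-separated module list such as "0;1;2;15" into a
 * 32-bit mask with one bit per PSC module number (0..31).
 * If ok is non-NULL, *ok is set to 1 on success and 0 on malformed input. */
uint32_t psc_list_to_mask(const char *list, int *ok)
{
    uint32_t mask = 0;
    const char *p = list;
    int good = 1;

    assert(list != NULL);
    while (*p != '\0') {
        char *end;
        long n = strtol(p, &end, 10);
        if (end == p || n < 0 || n > 31) { good = 0; break; }
        mask |= (uint32_t)1 << n;
        if (*end == ';') { end++; }
        else if (*end != '\0') { good = 0; break; }
        p = end;
    }
    if (ok != NULL) { *ok = good; }
    return good ? mask : 0;
}

/* Returns nonzero if module n is present in the mask. */
int psc_module_enabled(uint32_t mask, unsigned n)
{
    return n < 32 && ((mask >> n) & 1u) != 0;
}
```

With the lists above, the PSC0 mask comes out as 0xBE3F and the PSC1 mask as 0xFF3FFEFF, which confirms that bits 8, 22 and 23 are clear in PSC1.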
Notice for SATA users:
PSC1 Module 8 (SATA) is specifically NOT being enabled. There is a note in the System
Reference Guide saying that you need to set the FORCE bit in MDCTL when enabling SATA.
That’s not an option in the GUI/bootROM so we simply cannot enable it. If you ignore the
author’s advice and enable module 8 in PSC1, you’ll find the boot ROM gets stuck in a spin
loop waiting for SATA to transition and so ultimately your boot fails as a result.
So, there are really two pieces to this puzzle if using SATA:
A. Make sure you do NOT try to enable PSC1 Module 8 through AISgen
B. If you need SATA, make sure you enable this through your application code and be sure
to set the FORCE bit in MDCTL when doing so.
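As a rough sketch of step B, the helper below only computes the value an application might write to the SATA module’s MDCTL register. The field layout (NEXT state in the low bits, state 3 meaning Enable, FORCE in bit 31) is an assumption here and should be confirmed against the System Reference Guide; the macro and function names are hypothetical. The actual register write, the transition command and the status polling are device-specific and are not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed MDCTL field layout; confirm against the device reference guide. */
#define MDCTL_NEXT_MASK    0x0000001Fu  /* NEXT state field (low bits)     */
#define MDCTL_NEXT_ENABLE  0x00000003u  /* assumed encoding for "Enable"   */
#define MDCTL_FORCE        0x80000000u  /* assumed position of FORCE bit   */

/* Compute the MDCTL value that requests the Enable state while preserving
 * the other bits of the current value, optionally setting the FORCE bit. */
uint32_t mdctl_enable_value(uint32_t current, int force)
{
    uint32_t v = (current & ~MDCTL_NEXT_MASK) | MDCTL_NEXT_ENABLE;
    if (force) {
        v |= MDCTL_FORCE;
    }
    return v;
}
```

For SATA, the application would call this with force set to 1; for other modules, force would normally be 0.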
FINAL CHECK - SUMMARY
So, your final main dialogue should look like this with all of these tabs showing. Please
double-check you didn’t forget something:
Save your .cfg file in the \Lab13b_AIS folder for potential use later on. You don’t want to
have to re-create all of these steps again if you can avoid it. If you look in that folder, it
already contains this .cfg file done for you. Ok, so we could have told you that earlier, but
then the learning would have been crippled.
The author named the solution’s config file:
OMAP-L138-ARM-DSP-LAB13B_TTO.cfg
Hint: C6748 Users: You will only specify ONE output file (DSP.out)
Hint: OMAP-L138 Users: You will specify TWO files (an ARM.out and a DSP.out).
ARM/DSP Application & Output Files
Ok, we’re almost done with the AISgen settings.
Hint: 6748 SOM Users follow THESE directions (OMAP Users can skip this part)
For the “DSP Application File”, browse to the .OUT file that was created when you built your
keystone project: keystone_flash.out
Hint: OMAP-L138 SOM Users follow THESE directions:
For OMAP-L138 users: you will enter the paths to both files and AISgen will combine them
into ONE image (.bin) to burn into the flash. You must FIRST specify the ARM.out file
followed by the DSP.out file; this order MATTERS.
Follow these steps in order carefully.
Click the “…” button shown above next to “ARM Application File” to browse to (use \Lab13b
instead):
Click Open.
Your screen should now look like this (except for using \Lab13b…):
This ARM code is for Rev1 silicon. It should also work on Rev2 silicon, but this has not been tested.
Next, click on the “+” sign (yours will say \Lab13b):
and browse to your keystone_flash.out file you built earlier. You should now have two .out
files listed under “ARM Application File”: first the ARM.out, then the DSP.out, separated by a
semicolon. Double-check this is the case.
The AISgen software won’t allow you to see both paths at once in that tiny box, but here is a
picture of the “middle” of the path showing the semicolon in the middle of the two .out files.
Again, the ARM.out file needs to be first, followed by the DSP.out file (use \Lab13b instead):
Hint: ALL SOM Users Follow THIS STEP…
For the Output file, name it “flash.bin” and use the following path:
C:\SYSBIOSv4\Labs\Lab13b_AIS\flash.bin
Hint: Again, the path and filename CANNOT contain any spaces. When you run the flash
writer later on, that program will barf on the file if there are any spaces in the path or
filename.
Before you click the “Generate AIS” button, notice the other configuration options you have here.
If you wanted AIS to write the code to configure any of these options, simply check them and fill
out the info on the proper tab. This is a WAY cool interface. And, the bootloader does “system
setup” for you instead of writing code to do it and making mistakes and debugging those
mistakes…and getting frustrated…like getting tired of reading this rambling text from the
author….
15. Generate AIS script (flash.bin).
Click the “Generate AIS” button. When complete, it will provide a little feedback as to how
many bytes were written. Like this:
So, what did you just do?
For OMAP-L138 (ARM+DSP) users, you just combined the ARM.out and DSP.out files into
one flash image: flash.bin. For C6748 Users, you simply converted your .out file to a flash
image.
The next step is to burn the flash with this image and then let the bootloader do its thing…
Program the Flash: [.BIN → SPI1 Flash]
16. Check target config and pin settings.
Use the standard XDS510 Target Config file that uses one GEL file (like all the other labs in
this workshop). Make sure it is the default.
Also, make sure pins 5 and 8 on the EVM (S7 switch 7) are ON/UP so that we are in
EMU mode NOT flash boot mode.
17. Load SPIWriter.out into CCS.
The SPIWriter.out file should already be copied into a convenient place:
C:\SYSBIOSv4\Labs\Lab13b_SPIWriter
In CCS,
Launch a debug session (right-click on the target config file and click “launch”)
Connect to target
Select “Load program” and browse to this location:
C:\SYSBIOSv4\Labs\Lab13b_SPIWriter\SPIWriter_OMAP-L138.out
18. PLAY !
Click Play. The console window will pop up and ask you a question about whether this is a
UBL image. The answer is NO. Only if you were using a TI UBL, which would then boot
U-Boot, would the answer be yes. That case assumes that Linux is running. Our ARM code has no O/S.
Type a lowercase “n” and hit [ENTER]. To respond to the next question, provide the path
name for your .BIN file (flash.bin) created in a previous step, i.e.:
C:\SYSBIOSv4\Labs\Lab13b_AIS\flash.bin
Hint: Do NOT have any spaces in this path name for SPIWriter; it NO WORK that way.
Here’s a screen capture from the author (although, you are using the \Lab13b_ais dir, not
\Lab12b) :
Let it run; it shouldn’t take too long: 15-20 seconds (with an XDS510 emulator). You will see
some progress msgs and then see “success”, like this:
19. Terminate the Debug session, close CCS.
20. Ensure DIP switches are set correctly and get music playing, then power-cycle!
Make sure ALL DIP switches on S7 are DOWN [OFF]. This will place the EVM into the SPI-1
boot mode. Get some music playing. Power cycle the board and THERE IT GOES…
No need to re-flash anything like a POST; just leave your neat little program in there for
some unsuspecting person to stumble on one day when they forget to set the DIP switches
back to EMU mode and they automagically hear audio coming out of the speakers when they
turn on the power. Freaky. You should see the LED blinking as well…great work !!
Hint: DO NOT SKIP THE FOLLOWING STEP.
21. Change the boot mode pins on the EVM back to their original state.
Please ensure DIP_5 and DIP_8 of S7 (the one on the right) are UP [ON].
RAISE YOUR HAND and get the instructor’s attention when you have completed
this lab. If time permits, move on to the next OPTIONAL part…
Optional DDR Usage
Go back to your keystone project and link the data buffers into DDR memory (just like we did in
the cache lab) via the platform file. Re-compile and generate a new .out file. Then, use AISgen to
create a new flash.bin file and flash it with SPIWriter. Then reset the board and see if it worked.
Did it?
FYI: to make things go quicker, we have a .cfg file pre-loaded for AISgen. It is located at (use
\Lab13b_AIS):
When running AISgen, you can simply load this config file and it contains ALL of the settings from
this lab. Edit, recompile, load this cfg, generate .bin, burn, reset. Quick.
Or, you can simply use the .cfg file you saved earlier in this lab…
Additional Information
Notes
C6000 Embedded Design Workshop - Stream I/O and Drivers (PSP/IOM) GrabBag - 16c - 1
Stream I/O and Drivers (PSP/IOM)
Introduction
In this chapter a technique to exchange buffers of data between input/output devices and
processing threads will be considered. The BIOS “stream” interface will be seen to provide a
universal interface between I/O and processing threads, making coding easier and more easily
reused.
Objectives
  Analyze BIOS streams (SIO) and the key APIs used
  Adapt a TSK to use SIO (Stream I/O)
  Describe the benefits of multi-buffer streams
  Learn the basics of PSP drivers
Module Topics
Stream I/O and Drivers (PSP/IOM)
  Module Topics
  Driver I/O - Intro
  Using Double Buffers
  PSP/IOM Drivers
  Additional Information
  Notes
Driver I/O - Intro
Using Double Buffers
Double Buffer Stream TSK Coding Example
// prolog: prime the process…
status = SIO_issue(&sioIn, pIn1, SIZE, NULL);
status = SIO_issue(&sioIn, pIn2, SIZE, NULL);
status = SIO_issue(&sioOut, pOut1, SIZE, NULL);
status = SIO_issue(&sioOut, pOut2, SIZE, NULL);

// while loop: iterate the process…
while (condition == TRUE) {
    size = SIO_reclaim(&sioIn, (Ptr *)&pInX, NULL);
    size = SIO_reclaim(&sioOut, (Ptr *)&pOutX, NULL);
    // DSP... to pOut
    status = SIO_issue(&sioIn, pInX, SIZE, NULL);
    status = SIO_issue(&sioOut, pOutX, SIZE, NULL);
}

// epilog: wind down the process…
status = SIO_flush(&sioIn);   // stop input
status = SIO_idle(&sioOut);   // idle output, then stop
size = SIO_reclaim(&sioIn, (Ptr *)&pIn1, NULL);
size = SIO_reclaim(&sioIn, (Ptr *)&pIn2, NULL);
size = SIO_reclaim(&sioOut, (Ptr *)&pOut1, NULL);
size = SIO_reclaim(&sioOut, (Ptr *)&pOut2, NULL);
Double Buffer Stream TSK Coding Example
// prolog: prime the process…
status = SIO_issue(&sioIn, pIn1, SIZE, NULL);
status = SIO_issue(&sioIn, pIn2, SIZE, NULL);
size = SIO_reclaim(&sioIn, (Ptr *)&pInX, NULL);
// DSP... to pOut1
status = SIO_issue(&sioIn, pInX, SIZE, NULL);
size = SIO_reclaim(&sioIn, (Ptr *)&pInX, NULL);
// DSP... to pOut2
status = SIO_issue(&sioIn, pInX, SIZE, NULL);
status = SIO_issue(&sioOut, pOut1, SIZE, NULL);
status = SIO_issue(&sioOut, pOut2, SIZE, NULL);

// while loop: iterate the process…
while (condition == TRUE) {
    size = SIO_reclaim(&sioIn, (Ptr *)&pInX, NULL);
    size = SIO_reclaim(&sioOut, (Ptr *)&pOutX, NULL);
    // DSP... to pOut
    status = SIO_issue(&sioIn, pInX, SIZE, NULL);
    status = SIO_issue(&sioOut, pOutX, SIZE, NULL);
}

// epilog: wind down the process…
status = SIO_flush(&sioIn);   // stop input
status = SIO_idle(&sioOut);   // idle output, then stop
size = SIO_reclaim(&sioIn, (Ptr *)&pIn1, NULL);
size = SIO_reclaim(&sioIn, (Ptr *)&pIn2, NULL);
size = SIO_reclaim(&sioOut, (Ptr *)&pOut1, NULL);
size = SIO_reclaim(&sioOut, (Ptr *)&pOut2, NULL);
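The issue/reclaim pattern in the examples above can be tried on a host PC with a small stand-in for a stream. The sketch below is NOT the BIOS SIO API: MockStream, mock_issue() and mock_reclaim() are hypothetical names for a plain FIFO of buffer pointers, and there is no driver behind it filling or draining the buffers. It only shows how two buffers rotate between the thread and the stream.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_DEPTH 2   /* double buffering: at most two buffers in flight */

/* Hypothetical stand-in for a stream: a FIFO of buffer pointers. */
typedef struct {
    void  *queue[MOCK_DEPTH];
    size_t head;    /* index of the oldest issued buffer               */
    size_t count;   /* number of buffers currently owned by the stream */
} MockStream;

void mock_init(MockStream *s)
{
    assert(s != NULL);
    s->head = 0;
    s->count = 0;
}

/* Hand a buffer to the stream; returns 0 on success, -1 if it is full. */
int mock_issue(MockStream *s, void *buf)
{
    assert(s != NULL);
    if (s->count == MOCK_DEPTH) {
        return -1;
    }
    s->queue[(s->head + s->count) % MOCK_DEPTH] = buf;
    s->count++;
    return 0;
}

/* Take back the oldest buffer; returns NULL if none is outstanding.
 * A real driver would block here until the buffer was filled or drained. */
void *mock_reclaim(MockStream *s)
{
    void *buf;

    assert(s != NULL);
    if (s->count == 0) {
        return NULL;
    }
    buf = s->queue[s->head];
    s->head = (s->head + 1) % MOCK_DEPTH;
    s->count--;
    return buf;
}
```

After priming the mock with two buffers, each reclaim returns the buffer that was issued earliest, so the thread always works on one buffer while the stream holds the other.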
PSP/IOM Drivers
Additional Information
Notes
C6000 Embedded Design Workshop - C66x Introduction GrabBag - 16d - 1
C66x Introduction
Introduction
This chapter provides a high-level overview of the architecture of the C66x devices along with a
brief overview of the MCSDK (Multicore Software Development Kit).
Objectives
  Describe the basic architecture of the C66x family of devices
  Provide an overview of each device subsystem
  Describe the basic features of the Multicore Software Development Kit (MCSDK)
Module Topics
C66x Introduction
  Module Topics
  C66x Family Overview
    C6000 Roadmap
    C667x Architecture Overview
  C665x Low-Power Devices
  MCSDK Overview
    What is the MCSDK ?
    Software Architecture
  For More Info…
  Notes
    More Notes
C66x Family Overview
C6000 Roadmap
[Figure: C6000 roadmap showing performance improvement from the fixed-point cores (C64x, C64x+) and the floating-point cores (C67x, C67x+) through C674x to the C66x ISA]
C64x (fixed-point value)
  Advanced fixed-point instructions
  Four 16-bit or eight 8-bit MACs
  Two-level cache
C64x+ (fixed-point value)
  SPLOOP and 16-bit instructions for smaller code size
  Flexible level one memory architecture
  iDMA for rapid data transfers between local memories
C67x (floating-point value)
  IEEE 754 native instructions for SP & DP
  Advanced VLIW architecture
C67x+ (floating-point value)
  2x registers
  Enhanced floating-point add capabilities
C674x
  100% upward object code compatible with C64x, C64x+, C67x and C67x+
  Best of fixed-point and floating-point architecture for better system performance
  and faster time-to-market
C66x ISA (enhanced DSP core)
  100% upward object code compatible
  4x performance improvement for multiply operation
  32 16-bit MACs
  Improved support for complex arithmetic and matrix computation
C667x Architecture Overview
CorePac
  1 to 8 C66x CorePac DSP Cores operating at up to 1.25 GHz
    Fixed/Floating-pt operations
    Code compatible with other C64x+ and C67x+ devices
  L1 Memory
    Partition as Cache or RAM
    32KB L1P/D per core
  Dedicated L2 Memory
    Partition as Cache or RAM
    512 KB to 1 MB per core
  Direct connection to memory subsystem
[Block diagram: C667x device with 1 to 8 C66x CorePacs (L1P, L1D, L2) at up to 1.25 GHz, Memory Subsystem (MSMC, MSM SRAM, 64-bit DDR3 EMIF), Multicore Navigator, Network Coprocessor, Application-Specific Coprocessors, HyperLink, TeraNet, and Common and App-specific I/O]
Memory Subsystem
  Multicore Shared Memory (MSM SRAM)
    2 to 4MB (Program or Data)
    Available to all cores
  Multicore Shared Mem (MSMC)
    Arbitrates access to shared memory and DDR3 EMIF
    Provides CorePac access to coprocessors and I/O
    Provides address extension to 64G (36 bits)
  DDR3 External Memory Interface (EMIF) 8GB
    Support for 16/32/64-bit modes
    Specified at up to 1600 MT/s
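The numbers on the Memory Subsystem slide can be sanity-checked with a little arithmetic. The sketch below (function names are made up for illustration) computes the size of an address space from its width in bits, and a theoretical peak DDR bandwidth from the transfer rate and bus width; sustained bandwidth on real hardware will be lower.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of an address space with the given number of address bits. */
uint64_t address_space_bytes(unsigned bits)
{
    assert(bits < 64);
    return (uint64_t)1 << bits;
}

/* Theoretical peak bandwidth in bytes per second:
 * transfers per second times bus width in bytes. */
uint64_t ddr_peak_bytes_per_sec(uint64_t mega_transfers_per_sec, unsigned bus_bits)
{
    return mega_transfers_per_sec * 1000000u * (bus_bits / 8u);
}
```

A 36-bit address reaches 64 GB, which is 16 times the 4 GB reachable with a 32-bit address, and 1600 MT/s on a 64-bit bus works out to a theoretical peak of 12.8 GB/s.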
Multicore Navigator
  Provides seamless inter-core communications (msgs and data) between cores, IP, and
  peripherals. “Fire and forget”
  Low-overhead processing and routing of packet traffic to/from cores and I/O
  Supports dynamic load optimization
  Consists of a Queue Manager Subsystem (QMSS) and multiple, dedicated Packet DMA engines
Multicore Navigator Architecture
[Block diagram: Multicore Navigator architecture. The QMSS contains the Queue Manager, Config RAM, internal Link RAM, Descriptor RAMs, two APDSPs (Accumulator and Monitor) with timers and accumulation memory, an Interrupt Distributor (queue interrupts) and an internal PKTDMA. A PKTDMA hardware block contains the Rx and Tx cores, Rx/Tx channel control and FIFOs, Tx scheduling control and Tx DMA scheduler, an Rx coherency unit, Rx/Tx streaming interfaces for input (ingress) and output (egress), and a Tx scheduling interface (AIF2 only). Both connect over VBUS to the host (App SW) and to link RAM and buffer memory in L2 or DDR through the Queue Manager, PKTDMA and accumulator command register interfaces.]
Network Coprocessor
  Provides H/W accelerators to perform L2, L3, L4 processing and encryption (often done in S/W)
  Packet Accelerator (PA)
    8K multi-in/out HW queues
    Single IP address option
    UDP/TCP checksum and CRCs
    Quality of Service (QoS) support
    Multi-cast to multiple queues
  Security Accelerator (SA)
    HW encryption, decryption and authentication
    Supports protocols: IPsec ESP, IPsec AH, SRTP, 3GPP
[Block diagram: Network Coprocessor containing the PA, the SA and an Ethernet switch with 2x SGMII]
External Interfaces
  2x SGMII ports support 10/100/1000 Ethernet
  4x SRIO lanes for inter-DSP xfrs
  SPI for boot operations
  UART for development/test
  2x PCIe at 5Gbps
  I2C for EPROM at 400 Kbps
  GPIO
  App-specific interfaces
[Block diagram: external interfaces SRIO x4, PCIe x2, UART, SPI, I2C, GPIO, application-specific I/O, and the Ethernet switch with SGMII x2]
TeraNet Switch Fabric
  Non-blocking switch fabric that enables fast and contention-free data movement
  Can configure/manage traffic queues and priorities of xfrs while minimizing core involvement
  High-bandwidth transfers between cores, subsystems, peripherals and memory
TeraNet Data Connections
  Facilitates high-bandwidth communication links between DSP cores, subsystems,
  peripherals, and memories.
  Supports parallel orthogonal communication links
[Diagram: TeraNet data connections. A 256-bit TeraNet running at CPUCLK/2 and a 128-bit TeraNet running at CPUCLK/3 connect masters (M) to slaves (S). Masters shown include EDMA_0 (TPCC, 16ch QDMA, TC0 and TC1), EDMA_1,2 (two TPCCs, 64ch QDMA each, TC2 to TC9), SRIO, PCIe, QMSS, Network Coprocessor, HyperLink, DebugSS, XMC, AIF / PktDMA, FFTC / PktDMA, RAC_BE0,1 and TAC_FE. Slaves shown include the cores, L2 0-3, shared L2 and DDR3 through the MSMC, SRIO, PCIe, QMSS, HyperLink, TAC_BE, RAC_FE, TCP3d, TCP3e_W/R and VCP2 (x4).]
Diagnostic Enhancements
  Embedded Trace Buffers (ETB) enhance CorePac’s diagnostic capabilities
  CP Monitor provides diagnostics on TeraNet data traffic
    Automatic statistics collection and exporting (non-intrusive)
    Can monitor individual events
    Monitor all memory transactions
    Configure triggers to determine when data is collected
HyperLink Bus
  Expands the TeraNet Bus to external devices
  Supports 4 lanes with up to 12.5Gbaud per lane
Miscellaneous Elements
  Boot ROM
  HW Semaphore provides atomic access to shared resources
  Power Management
  PLL1 (CorePacs), PLL2 (DDR3), PLL3 (Packet Acceleration)
  Three EDMA Controllers
  16 64-bit Timers
  Inter-Processor Communication (IPC) Registers
App-Specific: Wireless Applications
Wireless Applications (C6670)
  Wireless-specific Coprocessors
    2x FFT Coprocessor (FFTC)
    Turbo Dec/Enc (TCP3D/3E)
    4x Viterbi Coprocessor (VCP2)
    Bit-rate Coprocessor (BCP)
    2x Rake Search Accel (RSA)
  Wireless-specific Interfaces
    6x Antenna Interface (AIF2)
App-Specific: General Purpose
General Purpose Applications
  2x Telecom Serial Port (TSIP)
  EMIF 16 (EMIF-A):
    Connects memory up to 256MB
    Three modes:
      Synchronized SRAM
      NAND Flash
      NOR Flash
C665x Low-Power Devices
Keystone C6655/57 Device Features
C6655/57 Low-Power Devices
  C66x CorePac
    C6655 (1 core) @ 1/1.25 GHz
    C6657 (2 cores) @ 0.85, 1.0 or 1.25 GHz
  Memory Subsystem
    1MB Local L2 per core
    MSMC, 32-bit DDR3 I/F
  Hardware Coprocessors
    TCP3d, VCP2
  Multicore Navigator
  Interfaces
    2x McBSP, SPI, I2C, UPP, UART
    1x 10/100/1000 SGMII port
    Hyperlink, 4x SRIO, 2x PCIe
    EMIF 16, GPIO
  Debug and Trace (ETB/STB)
[Block diagram: C6655/57 device with 1 or 2 C66x CorePacs at up to 1.25 GHz (second core on C6657 only), each with 32 KB L1P cache, 32 KB L1D cache and 1024KB L2 cache; Memory Subsystem with MSMC, 1MB MSM SRAM and 32-bit DDR3 EMIF; coprocessors TCP3d and VCP2 x2; Multicore Navigator (Queue Manager, Packet DMA); HyperLink; TeraNet; Ethernet MAC (SGMII), SRIO x4, SPI, UART x2, PCIe x2, I2C, UPP, McBSP x2, GPIO, EMIF16; Boot ROM, Debug & Trace, Power Management, Semaphore, Security / Key Manager, Timers, PLL and EDMA]
Keystone C6654 Power Optimized
C6654 Power Optimized
  C66x CorePac
    C6654 (1 core) @ 850 MHz
  Memory Subsystem
    1MB Local L2
    MSMC, 32-bit DDR3 I/F
  Multicore Navigator
  Interfaces
    2x McBSP, SPI, I2C, UPP, UART
    1x 10/100/1000 SGMII port
    EMIF 16, GPIO
  Debug and Trace (ETB/STB)
[Block diagram: C6654 device with 1 C66x CorePac at 850 MHz (32 KB L1P cache, 32 KB L1D cache, 1024KB L2 cache); Memory Subsystem with MSMC and 32-bit DDR3 EMIF; Multicore Navigator (Queue Manager, Packet DMA); TeraNet; Ethernet MAC (SGMII), SPI, UART x2, PCIe x2, I2C, UPP, McBSP x2, GPIO, EMIF16; Boot ROM, Debug & Trace, Power Management, Semaphore, Security / Key Manager, Timers, PLL and EDMA]
Keystone C665x Comparisons
HW Feature                       C6654       C6655            C6657
CorePac Frequency (GHz)          0.85        1 @ 1.0, 1.25    2 @ 0.85, 1.0, 1.25
Multicore Shared Mem (MSM)       No          1MB SRAM         1MB SRAM
DDR3 Maximum Data Rate           1066        1333             1333
Serial Rapid I/O (SRIO) Lanes    No          4x               4x
HyperLink                        No          Yes              Yes
Viterbi CoProcessor (VCP)        No          2x               2x
Turbo Decoder (TCP3d)            No          Yes              Yes
MCSDK Overview
What is the MCSDK ?
The Multicore Software Development Kit (MCSDK) provides the core foundational building blocks for
customers to quickly start developing embedded applications on TI high performance multicore DSPs.
  Uses the SYS/BIOS or Linux real-time operating system
  Accelerates customer time to market by focusing on ease of use and performance
  Provides multicore programming methodologies
  Available for free on the TI website, bundled in one installer. All the software in the
  MCSDK is in source form along with pre-built libraries
Software Development Ecosystem
Multicore Performance, Single-core Simplicity
[Diagram: on the host computer, Code Composer Studio (Eclipse based) provides the Editor, CodeGen, OpenMP, Profiler, Debugger, Remote Debug, Multicore System Analyzer and Visualization, alongside third-party plug-ins (PolyCore, ENEA Optima, 3L, Critical Blue) and the Multicore Software Development Kit. The host connects to the target board through XDS 560 V2 or XDS 560 Trace.]
Software Architecture
Migrating Development Platform
[Diagram: four software stacks, each made up of an application on top of the Network Dev Kit, IPC, LLD, EDMA etc., Tools (UIA) and CSL, running on a platform:
  1. TI Demo Application on TI Evaluation Platform
  2. TI Demo Application on Customer Platform
  3. Customer Application on Customer Platform
  4. Customer App on Next Generation TI SOC Platform
The components are color-coded by how much work migration requires:
  No modifications required
  May be used “as is” or customer can implement value-add modifications
  Needs to be modified or replaced with customer version]
Software may be different, but API remain the same (CSL, LLD, etc.)
BIOS-MCSDK Software
[Diagram: BIOS-MCSDK software layers, from top to bottom]
  Demonstration Applications: HUA/OOB, IO Bmarks, Image Processing
  Software Framework Components: Interprocessor Communication, Instrumentation (MCSA)
  Communication Protocols: TCP/IP Networking (NDK)
  Algorithm Libraries: DSPLIB, IMGLIB, MATHLIB
  Low-Level Drivers (LLDs): EDMA3, PCIe, PA, QMSS, SRIO, CPPI, FFTC, HyperLink, TSIP
  Chip Support Library (CSL)
  Platform/EVM Software: Bootloader, Platform Library, POST, OSAL, Resource Manager,
  Transports (IPC, NDK)
  SYS/BIOS RTOS
  Hardware
Interprocessor Communication (IPC)
[Diagram: two devices, each with SoC hardware and peripherals and two cores. Every core runs BIOS with IPC and hosts Process 1 and Process 2.]
[Diagram: one device with SoC hardware and peripherals and cores 1 to N. Core 1 runs Linux with SysLink; cores 2 to N run BIOS with IPC. Every core hosts Process 1 and Process 2.]
IPC Transports
                   Task to Task    Core to Core    Device to Device
Shared Memory           x               x
Navigator/QMSS          x               x
SRIO                    x               x                 x
PCIe                    x               x                 x
HyperLink               x               x                 x
For More Info…
Linux/BIOS MCSDK C66x Lite EVM Details
EVM Flash Contents
  NAND (64 MB): Linux MCSDK Demo
  NOR (16 MB): BIOS MCSDK “Out of Box” Demo
  EEPROM (128 KB): POST, IBL
DVD Contents
  Factory default recovery
    EEPROM: POST, IBL
    NOR: BIOS MCSDK Demo
    NAND: Linux MCSDK Demo
    EEPROM/Flash writers
  CCS 5.0
    IDE
    C667x EVM GEL/XML files
  BIOS MCSDK 2.0
    Source/binary packages
  Linux MCSDK 2.0
    Source/binary packages
Online Collateral
  TMS320C667x processor website
    http://focus.ti.com/docs/prod/folders/print/tms320c6678.html
    http://focus.ti.com/docs/prod/folders/print/tms320c6670.html
  MCSDK website for updates
    http://focus.ti.com/docs/toolsw/folders/print/bioslinuxmcsdk.html
  CCS v5
    http://processors.wiki.ti.com/index.php/Category:Code_Composer_Studio_v5
  Developers website
    Linux: http://linux-c6x.org/
    BIOS: http://processors.wiki.ti.com/index.php/BIOS_MCSDK_2.0_User_Guide
For More Information
For questions regarding topics covered in this training, visit the following e2e support forums:
http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639.aspx
http://e2e.ti.com/support/embedded/f/355.aspx
Download MCSDK software:
http://focus.ti.com/docs/toolsw/folders/print/bioslinuxmcsdk.html
Refer to the MCSDK User’s Guide:
http://processors.wiki.ti.com/index.php/BIOS_MCSDK_2.0_User_Guide
Notes
More Notes
*** the very very end ***
